Handling big data can at first seem daunting for any organisation, not least because of the responsibilities that come with the vast amount of information it contains. Storing big data is a particular security challenge: the amount of data stored has a direct effect on the consequences of a breach, and it also influences the strategic and tactical approaches needed to ensure compliance and privacy. But it needn’t cause a headache.

When collecting information for big data, organisations have to strike the right balance between the utility of the data and privacy. In practice this means anonymising the data, encrypting it, putting proper access control and security monitoring in place, carrying out a risk assessment and making sure storage complies with local regulations.

Anonymising Data

Before the data is stored, it should be adequately anonymised, which involves removing any unique identifiers for a user. This can be a security challenge in itself, as removing unique identifiers might not be enough to guarantee that the data remains anonymous: the anonymised data could be cross-referenced with other available data using de-anonymisation techniques. Therefore, it should also be encrypted.
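The cross-referencing risk can be checked before storage. The sketch below is a minimal illustration, assuming hypothetical records and field names: it counts how many records share each combination of indirect identifiers (a k-anonymity style check, a technique chosen here for illustration rather than one prescribed above) and flags any combination that appears fewer than k times.

```python
from collections import Counter

# Hypothetical "anonymised" records: names removed, indirect identifiers kept.
records = [
    {"postcode": "AB1", "birth_year": 1980, "gender": "F", "diagnosis": "asthma"},
    {"postcode": "AB1", "birth_year": 1980, "gender": "F", "diagnosis": "diabetes"},
    {"postcode": "AB2", "birth_year": 1975, "gender": "M", "diagnosis": "flu"},
]

# Assumed set of fields an attacker could match against other data sources.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "gender")


def risky_records(rows, quasi_identifiers, k=2):
    """Return rows whose quasi-identifier combination is shared by fewer than k rows."""
    def key(row):
        return tuple(row[q] for q in quasi_identifiers)

    counts = Counter(key(row) for row in rows)
    return [row for row in rows if counts[key(row)] < k]


# Rows that stand alone on their indirect identifiers are candidates for
# further generalisation or suppression before the data is stored.
unique_rows = risky_records(records, QUASI_IDENTIFIERS)
```

A record flagged this way is not necessarily re-identifiable, but it is one that a single matching row in another data set could single out.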


Encrypting Data

Both the raw data and the outcome of analytics should be adequately protected with encryption. In the case of cloud services, users cannot send data encrypted with conventional schemes if the cloud needs to perform operations over it. One solution is “Fully Homomorphic Encryption” (FHE), which allows the cloud to perform operations directly over the encrypted data, producing new encrypted data as the result. Communications should be protected as well: data in transit should be adequately protected to ensure its confidentiality and integrity.
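To make the idea of computing on encrypted data concrete, here is a toy sketch of the Paillier scheme using only the Python standard library. Note the assumptions: Paillier is only additively homomorphic (it supports adding encrypted values, not arbitrary operations), so it is a simpler relative of FHE rather than FHE itself, and the tiny primes are for illustration only and offer no security.

```python
import math
import secrets

# Toy primes for illustration only; real keys use far larger primes.
P, Q = 61, 53
N = P * Q
N_SQ = N * N
G = N + 1
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # modular inverse (Python 3.8+)


def encrypt(m):
    """Encrypt an integer 0 <= m < N with fresh randomness."""
    while True:
        r = secrets.randbelow(N - 1) + 1  # random value in 1..N-1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c):
    """Recover the plaintext using the private values LAM and MU."""
    x = pow(c, LAM, N_SQ)
    return ((x - 1) // N * MU) % N


def add_encrypted(c1, c2):
    """Combine two ciphertexts so the result decrypts to the sum of the plaintexts."""
    return (c1 * c2) % N_SQ


# The party holding only ciphertexts can compute the encrypted total
# without ever seeing 120 or 35.
encrypted_total = add_encrypted(encrypt(120), encrypt(35))
total = decrypt(encrypted_total)
```

Only the holder of the private values can read `total`; the party doing the addition works purely on ciphertexts, which is the property that makes homomorphic schemes attractive for cloud processing.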

Access Control & Security Monitoring

Adequate access control mechanisms will also be key in protecting the data. Access control has traditionally been provided by operating systems or applications restricting access to the information, which typically exposes all the information if the system or application is hacked. A better approach is to protect the information using encryption that only allows decryption if the entity trying to access the information is authorised by an access control policy.

One problem that may need to be overcome is that software commonly used to store big data, such as Hadoop, doesn’t always come with user authentication enabled by default. This makes access control trickier, as a default installation would leave the information open to unauthenticated users. Real-time security monitoring helps here: access to the data is monitored and threat intelligence is applied in order to detect and prevent unauthorised access.
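As a rough illustration of monitoring access against a policy and a threat feed, the sketch below reviews individual access events. Everything in it is hypothetical: the `AccessEvent` fields, the `ALLOWED` policy table and the `BLOCKED_IPS` list stand in for whatever a real deployment would take from its authentication system and threat intelligence source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AccessEvent:
    user: Optional[str]  # None models an unauthenticated request
    dataset: str
    source_ip: str


# Hypothetical access policy: which datasets each user may read.
ALLOWED = {"alice": {"sales"}, "bob": {"sales", "hr"}}

# Hypothetical threat intelligence feed of known bad sources.
BLOCKED_IPS = {"203.0.113.7"}


def review(event):
    """Return the reasons to deny an event; an empty list means allow."""
    reasons = []
    if event.user is None:
        reasons.append("unauthenticated")
    elif event.dataset not in ALLOWED.get(event.user, set()):
        reasons.append("not authorised for dataset")
    if event.source_ip in BLOCKED_IPS:
        reasons.append("source on threat list")
    return reasons
```

For example, `review(AccessEvent(None, "hr", "203.0.113.7"))` reports both the missing authentication and the flagged source, while a request from `alice` for `sales` from an unlisted address returns no reasons.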

Risk Assessment & Compliance

Organisations should run a risk assessment over the data they collect and consider whether it includes customer information that should be kept private, so that they can establish adequate policies that protect both the data and clients’ right to privacy. They should also carefully account for regional laws around handling customer data, such as the EU Data Protection Directive.

If the data is shared with other organisations, careful consideration should be given to how this is done. Deliberately released data that turns out to infringe on privacy can have a huge impact on an organisation from a reputational and economic point of view. Anyone using third-party cloud providers to store or process data needs to ensure that those providers comply with regulations.

The main challenge introduced by big data is how to identify sensitive pieces of information stored within unstructured data sets. It is crucial to bear in mind that security is a process, not a product. Organisations using big data will therefore need to introduce adequate processes and apply traditional information lifecycle management, which helps them balance managing the data effectively with protecting it and their customers’ privacy.
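As one small part of such a process, unstructured text can be scanned for values that look sensitive. The sketch below is a minimal illustration using regular expressions; the two patterns and the sample text are assumptions chosen for the example, and a real classification step would need broader, validated rules.

```python
import re

# Illustrative patterns only; they will miss some formats and match some false positives.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "card_like_number": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def find_sensitive(text):
    """Return (label, matched_text) pairs for every pattern hit in the text."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return hits


# Hypothetical free-text record of the kind found in unstructured data sets.
sample = "Contact jane.doe@example.com, card 4111 1111 1111 1111, order 42."
findings = find_sensitive(sample)
```

Flagged records can then be routed to the anonymisation and encryption steps described earlier before they reach long-term storage.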