As news unfolds of a major data breach involving the computer systems of a US Department of Defense contractor – with 24,000 'sensitive' Pentagon files lost in the process – the incident is a classic example of what can happen when large volumes of sensitive data are not adequately identified and protected.

Although the exact details of the data loss are still emerging, the Pentagon is likely in the process of trying to establish how much of the stolen data was sensitive and where it was taken from. Much of it was also probably unstructured or semi-structured, residing on file shares accessible throughout the organization.

With 26,000 employees, the Pentagon generates a vast amount of digital data every single day. Attempting to manage and protect all of this data without some type of automation – one that provides visibility into what data is sensitive, where it resides and how its access permissions are set – leaves it vulnerable.

Unstructured data typically comprises documents, emails, spreadsheets, images, videos and the like, much of it stored on file shares. The problem with unstructured data – which typically accounts for 80 per cent of the data in a large enterprise – is that it has been a challenge for IT to manage and protect using native OS auditing features.

While many applications let you establish quickly that a specific employee accessed a given database record at 13:37 last Tuesday afternoon, applying the same level of auditing to unstructured or semi-structured data is virtually impossible without some form of software automation, because of poor native auditing functionality and the sheer volume of data files involved.
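As a rough illustration of the kind of automation being described, the sketch below walks a file share, flags files whose contents match sensitive-data markers, and records whether they are readable by everyone on the system. The share path, the patterns and the sampling size are all hypothetical choices for the example, not anything specific to the Pentagon's environment.

```python
import os
import re
import stat

# Hypothetical markers that would flag a document as sensitive.
SENSITIVE_PATTERNS = [
    re.compile(rb"CONFIDENTIAL"),
    re.compile(rb"\b\d{3}-\d{2}-\d{4}\b"),  # e.g. a US Social Security number
]

def scan_share(root):
    """Walk a file share and report files that look sensitive,
    noting whether they are world-readable."""
    findings = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    head = fh.read(64 * 1024)  # sample the first 64 KB
            except OSError:
                continue  # unreadable file: skip here; a real tool would log it
            if any(p.search(head) for p in SENSITIVE_PATTERNS):
                mode = os.stat(path).st_mode
                world_readable = bool(mode & stat.S_IROTH)
                findings.append((path, world_readable))
    return findings

# Example: list sensitive, over-exposed files under a (hypothetical) share.
# for path, open_to_all in scan_share("/srv/fileshare"):
#     if open_to_all:
#         print("over-exposed:", path)
```

A production tool would go much further – classifying content, reading ACLs rather than simple permission bits, and tracking changes over time – but even this crude pass answers the two questions the article raises: what is sensitive, and who can reach it.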

And this is the kind of data that, when not identified and managed, can enable someone to grab 24,000 sensitive Pentagon files simply by downloading large chunks of data from a file share.
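One control that would catch exactly this pattern is volume-based alerting on file-share access logs: nobody legitimately opens hundreds of distinct documents in an hour. A minimal sketch, assuming a hypothetical log format of (user, file path, timestamp) tuples and an arbitrary threshold:

```python
from datetime import timedelta

def flag_bulk_readers(events, window=timedelta(hours=1), threshold=500):
    """Flag any user who read more than `threshold` distinct files
    inside a single time window -- the signature of someone
    hoovering up a file share in bulk."""
    flagged = set()
    events = sorted(events, key=lambda e: e[2])  # order by timestamp
    for i, (user, _path, start) in enumerate(events):
        # Distinct files this user touched in the window starting here.
        distinct = {p for u, p, t in events[i:]
                    if u == user and t - start <= window}
        if len(distinct) > threshold:
            flagged.add(user)
    return flagged
```

The quadratic scan is fine for a sketch; a real monitoring product would stream events and keep a sliding window per user, but the detection logic is the same.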

It is interesting to note that the data went missing while under the stewardship of a contractor, which suggests that the data was possibly stored in a secure private cloud-like repository somewhere in the US.

Private clouds are extremely popular in major enterprises on both sides of the public/private sector divide: they offer economies of scale similar to those of a public cloud service such as Amazon's, while letting you impose your own levels of security on the data repositories, since you are effectively contracting the data storage out to your own rented datacentre.

No one is saying exactly what the data was, or who was responsible for the successful incursion. One thing is for sure, however: the new cyber security rules that the US DoD has drawn up will only stop this sort of thing happening again if they are implemented with technology that shows organizations what data is sensitive, where it resides, who has access to it and who is actually using that access.

It will be interesting to see what levels of authentication, authorization and auditing these new rules impose on the US government's data centres and contractors. Anyone who handles large volumes of data will be watching the news reports with great interest.