Big data made the cover of The Economist over four years ago now with the headline “The data deluge”. Considering what else has happened in the last four years, I can’t help but feel that we haven’t made big data universal yet. That in turn raises two questions: a) why not? and b) should big data be for everyone?
This brought me back to something I’ve blogged about before: how IDC framed big data with two simple questions:
- Do you know what questions to ask your data?
This sounds like an obvious question, but helping people understand the questions they can ask is an important part of making big data for everyone. For example:
- “Tell me everything this customer has ever done”
- “Show me how market conditions affected sales”
- “What’s the entire data center capacity right now”
- “I need to see a global view of security risks”
- Can you ask these questions without big data?
If big data is to deliver on its promise then there are some potentially significant and critical challenges that need to be addressed. For example:
- “There is too much data/it’s moving too fast for it to be feasible”
- “It takes too long, is too expensive and the data is all over the place”
- “It is just too difficult for me as a non-technical user”
If we take those two questions and apply them to a wide range of potential users of big data we see that, if big data is to become universal, it needs to be packaged, presented and consumed in different ways. Taking the example of big data in Hadoop, consider the following possible uses:
If you’re a data scientist or analyst right now then you’re in demand. The ability to turn business and operational questions into meaningful insight and results from big data is a hot topic. The bad news is that you’re only going to become more popular and more will be asked of you. Data scientists are going to need to search big data and package it up as business analytics and operational intelligence.
The packaging, presentation and convenience of consumption of Hadoop data rests with architecture. How do you let Hadoop data coexist with what you have (using principles like schema on the fly, without the ETL and fixed-schema challenges of data warehouses) and yet make it available for everyone (e.g. allowing data scientists to ask questions, marketing to visualise data and developers to build apps on top of data)?
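To make “schema on the fly” a little more concrete, here is a minimal sketch in Python. It assumes the raw data is newline-delimited JSON, and the event records, field names and both helper functions are hypothetical, made up purely for illustration. The point is only that the data is stored as it arrived and each question picks the fields it needs when it reads, rather than forcing everything through a fixed schema first.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (no upfront ETL).
RAW_EVENTS = [
    '{"user": "alice", "action": "purchase", "amount": 30}',
    '{"user": "bob", "action": "login"}',
    '{"user": "alice", "action": "purchase", "amount": 12.5, "channel": "mobile"}',
]


def read_with_schema(raw_lines, fields):
    """Apply a schema at read time: keep only the fields this question
    needs, using None where a record does not carry a field."""
    for line in raw_lines:
        record = json.loads(line)
        yield {name: record.get(name) for name in fields}


def total_spend(raw_lines, user):
    """Answer one question ("how much has this customer spent?") by
    projecting only the fields that question cares about."""
    rows = read_with_schema(raw_lines, ["user", "action", "amount"])
    return sum(
        row["amount"]
        for row in rows
        if row["user"] == user
        and row["action"] == "purchase"
        and row["amount"] is not None
    )
```

A different question (say, which channels customers use) could call `read_with_schema` with a different field list against the same untouched raw data.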
IT Ops will be both a supplier and consumer of big data. If big data does become something for everyone, it will be IT Ops’ job to make sure this mission-critical system that business decisions rest upon is always available. However, IT Ops may well choose to keep all their historical data in Hadoop and consume it in order to spot patterns of potential outage and correlate data from cloud/virtualised and on-premise infrastructure to ensure uptime and resolve issues.
If you’re building an application (be it web, mobile or otherwise) then you’re probably going to be asked to get some information out of Hadoop via MapReduce. I’d suggest that you’d probably rather use REST, or even build the app in your language of choice with an SDK on top of your big data. Call them “big data apps” or just consider Hadoop as a data source – you want that data consumable and packaged in the way that suits you as a developer.
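As one illustration of the REST route, Hadoop ships with WebHDFS, which exposes HDFS files over HTTP under the `/webhdfs/v1` prefix. The sketch below only builds the request URL; the host name, port, file path and user are all assumptions for the example, and the right port in particular depends on your Hadoop version and cluster configuration.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, hdfs_path, op, user=None):
    """Build a WebHDFS URL for an operation such as OPEN or LISTSTATUS.

    host, port and hdfs_path are placeholders supplied by the caller;
    nothing here contacts a cluster.
    """
    params = {"op": op}
    if user:
        # WebHDFS accepts the caller's identity as the user.name parameter
        # when security is off.
        params["user.name"] = user
    return f"http://{host}:{port}/webhdfs/v1{quote(hdfs_path)}?{urlencode(params)}"


# Hypothetical cluster details, for illustration only.
url = webhdfs_url("namenode.example.com", 50070, "/data/sales/2014.csv", "OPEN", user="dev")
```

From there any HTTP client in your language of choice can fetch the URL, which is rather the point: the developer sees a plain web resource instead of a MapReduce job.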
With customer experience a hot topic and consumers becoming more informed, how does marketing find the value “needle” in a haystack of big data? Marketing will have more data to analyse than ever before from mobile devices, social media, etc. Marketing is going to have to use this data for marketing analytics, data-driven next best action, improved personalisation and more accurate marketing campaigns to improve conversion rates.
With all of this information, enterprise security is increasingly a big data problem. Most security teams now have more information and events to manage than ever before, and big data needs to be packaged so they can combine and correlate data into real-time security analytics and present this as real-time threat intelligence.
Increasingly we all want to see the data someone has on record about us. This could be everything from all of an individual’s historic purchases with an e-commerce site through to their personal health records. This is a big data issue in terms of the normal “V’s” of big data but also, to stretch the “V’s” a bit further, in terms of the “validity” and “vibe” of the data. Is the data that an organisation keeps valid (and what is the impact of invalid data)? And what is the vibe (how does an individual feel about this data being stored)? Organisations are going to need to get the right balance of transparency, security and using the data for the right things.
Let’s be honest, most people use Excel to collect, explore and analyse their data. If we’re ever going to crack the problem of putting all the operational big data somewhere everyone can use it (that isn’t a cloud-based spreadsheet), it has to be as familiar as Excel or Google Docs to use, share and create data visualisations. Alternatively, if we do stick with spreadsheets until the end of time, it would be invaluable to use Hadoop as a data source.
My view, having spent some time at events talking to people, sitting in on some analyst events and thinking about the subject for this post, is that we’re getting there. If we’re going to go beyond big data just being able to store lots of data, self-service analytics for everyone could be the first “killer app”. As they say, patience is a virtue, and if big data is going to be for everyone and deliver on its promise then ease of consumption and time to value from the data are probably going to be make or break.