With more than two billion Internet users globally, a number growing by the minute, an astonishing amount of data is created on a daily basis. Every minute, YouTube users upload 48 hours of video, Facebook users share 684,478 pieces of content, Instagram users share 3,600 new photos, and Tumblr sees 27,778 new posts published. That’s every minute of every day.
All this data comprises an important new economic resource – “Big Data.” And no wonder Big Data is on everyone’s to-do list. Almost weekly there is yet another new Big Data startup or a new powerful-sounding piece of technology that will help turn your Big Data into valuable business insight. However, although Big Data is on the minds of just about everyone, IT departments are still struggling to find ways to capture, manage, make sense of and ultimately get cash value from all this data and information.
In response, let’s address the fundamentals of how to generate value from Big Data by working within your means, with the tools, systems and skills you already have. The key: everything needs to be evolutionary, not revolutionary. Here are my top common-sense recommendations for making that slogan actually work for your Big Data ambitions.
First, beware of technology in search of problems. We are enjoying an explosion of new software technology, made accessible via Open Source. The flip side of this innovation is that the wealth of new ‘toys’ distracts us from the actual business problem. The message has to be, don’t become so enamoured with technology that you forget to solve the immediate problems in your business; identify those first, and then consider how to apply any great new technology to address them.
Second, don’t be so focused on the future that you forget the past. Semantic Analysis, for example, is a very attractive activity. But what about first paying attention to the problem of how to integrate existing data sources, like data warehouses or operational systems, with the new Big Data sources?
Third, pick useful, relevant problems to address. So think about gathering CRM data and combining it with your Web traffic history to see if there is a correlation between clicks and revenue, for example. That kind of practical focus can give your Big Data project a solid foundation.
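To make that concrete, here is a minimal sketch of the clicks-versus-revenue check in Python, using only the standard library. The customer IDs, the figures and the function names (pearson, clicks_vs_revenue) are all invented for illustration; in practice the two dictionaries would be filled from your own CRM export and Web logs.

```python
from math import sqrt

# Hypothetical sample data (assumption): revenue per customer from a CRM
# export, and click counts per customer from Web traffic history.
crm_revenue = {"c1": 120.0, "c2": 300.0, "c3": 80.0, "c4": 450.0}
web_clicks = {"c1": 14, "c2": 35, "c3": 9, "c4": 51, "c5": 7}


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def clicks_vs_revenue(crm, clicks):
    """Correlate clicks with revenue for customers present in both sources."""
    shared = sorted(crm.keys() & clicks.keys())
    return pearson([clicks[c] for c in shared], [crm[c] for c in shared])


r = clicks_vs_revenue(crm_revenue, web_clicks)
print(f"Correlation between clicks and revenue: {r:.3f}")
```

Only customers that appear in both sources are compared, which is the simplest way to combine the two data sets. A value near 1 or -1 suggests a strong linear relationship and a value near 0 suggests little or none; either way, correlation alone does not show that clicks cause revenue.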
Fourth, given the sheer volume of Big Data, it is important to identify what should be kept and what should be thrown out. Take machine status logs, for example: even though the log records a reading every millisecond, collecting a thousand “status OK” records every second has no value.
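One simple way to act on this, sketched below in Python, is to keep a status record only when it differs from the previous one for the same machine. The record layout (timestamp in milliseconds, machine name, status), the sample readings and the function name keep_status_changes are assumptions made for the example, not a description of any particular logging system.

```python
def keep_status_changes(records):
    """Yield only records whose status differs from the previous
    record seen for the same machine."""
    last_status = {}
    for timestamp_ms, machine, status in records:
        if last_status.get(machine) != status:
            last_status[machine] = status
            yield (timestamp_ms, machine, status)


# Hypothetical millisecond readings from one machine (assumption).
readings = [
    (0, "press-1", "OK"),
    (1, "press-1", "OK"),
    (2, "press-1", "OVERHEAT"),
    (3, "press-1", "OVERHEAT"),
    (4, "press-1", "OK"),
]

kept = list(keep_status_changes(readings))
print(f"Kept {len(kept)} of {len(readings)} records")
```

The repeated “status OK” readings are dropped, but every transition is still recorded with its timestamp, so you can still tell how long each state lasted.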
Don’t worry if not all the data you are collecting can be understood, at least not yet. Unstructured data sources (e.g. email, blogs and other forms of business ‘chatter’) will eventually be analysable as quickly as a SQL table is now, but the technology is still in its infancy, although it is developing fast. Keep collecting the unstructured material while focusing your attention on the structured data to get some encouraging early wins. That is to say, look for success at every turn rather than dwelling on failures and roadblocks.
Learn to match the Big Data problems you want to solve with the skills you already have in your toolbox. That means don’t waste time lamenting that you don’t have the right skills, or worrying about how long it will take to get up to speed with new ones, e.g. Hadoop’s MapReduce programming model. Certainly, do go ahead and install Hadoop and learn MapReduce, but see what you can already accomplish just using Hive, whose SQL-like query language is easier to pick up.
Above all, don’t be afraid of making a mistake. Rather, view any mistakes you make as a fruitful learning exercise. Ideally, your goal should be not to run away from problems, but to run toward them in order to beat them. After all, the analytic process is in essence trial and error: if you succeed, refine your analysis; if you fail, refine it too. In both instances, you’re better equipped for the next iteration.
Finally, the single most valuable bit of Big Data common sense I can give has nothing to do with product, technology or strategy. It is simply this: engage in Big Data discussions with the business whenever possible, in order to move forward productively, minimising customer churn and maximising customer loyalty. Why? Because if you don’t, you risk being outpaced by your competitors, who are chasing the same customers in the same tough market you are.
And let’s close on the long-term view with some advice for today’s youngsters. The future of Big Data looks brighter than ever and will be filled with high-paying jobs for years ahead, so tell your children to become statisticians, not lawyers!