Organisations are increasingly becoming data-driven. For commercial enterprises, data is effectively a competitive weapon that underpins innovation and differentiation. Data-driven companies are rapidly gaining market share, and big data is no longer a nice-to-have but a necessity.
Hadoop is at the centre of the big data revolution and is changing how data is stored, processed and analysed. Hadoop represents a new data and compute stack that provides huge operational advantages and is being used to change how organisations compete.
Data across the enterprise is growing quickly, both in volume and in variety of data types. Historically, new applications and data sources have resulted in the creation of dedicated information silos. Organisations today are struggling with multiple fast-growing data sources: machine-generated log files, sensor data and social media are just a few examples. Instead of erecting more specialised processing and analytic silos to deal with this growth, visionaries are deploying enterprise data hubs.
An enterprise data hub provides a nexus for data sources. The hub may contain data from CRM systems, websites, manufacturing systems as well as external data such as social media, and a myriad of other unstructured data including text and video.
One of the initial uses for a data hub is to offload processing and data storage from more expensive systems. For example, a data hub can act as an offload area for the Extract, Transform and Load (ETL) processes that prepare data for analysis within data warehouses.
Instead of loading large volumes of raw data into a data warehouse and performing complex transforms there (an ELT process), significant speed and cost savings can be realised by performing the transformations directly on the Hadoop cluster. Additional savings are realised by offloading “cold” data from a data warehouse. The typical cost per terabyte of data contained in a data warehouse is £10,000 or more.
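To make the transform-on-Hadoop idea concrete, the sketch below expresses a warehouse-style aggregation in the MapReduce model that Hadoop runs across a cluster. It is a minimal, pure-Python illustration only: the record format, field names and the in-process “shuffle” are all hypothetical stand-ins for what Hadoop would do at scale.

```python
# Minimal sketch of an ETL-style transformation in the MapReduce model.
# The raw record format (date,channel,currency,amount) is hypothetical.
from collections import defaultdict

raw_records = [
    "2014-03-01,web,GBP,120.50",
    "2014-03-01,store,GBP,75.00",
    "2014-03-02,web,GBP,200.00",
]

def map_phase(record):
    """Extract and transform: parse one raw CSV line into (key, value) pairs."""
    date, channel, currency, amount = record.split(",")
    yield (date, float(amount))

def reduce_phase(key, values):
    """Aggregate per key, e.g. daily revenue ready for loading into a warehouse."""
    return key, sum(values)

# Shuffle: group mapped values by key (on a real cluster, Hadoop does this
# between the map and reduce phases across the network).
grouped = defaultdict(list)
for record in raw_records:
    for key, value in map_phase(record):
        grouped[key].append(value)

results = dict(reduce_phase(k, v) for k, v in grouped.items())
print(results)  # {'2014-03-01': 195.5, '2014-03-02': 200.0}
```

The point of the structure is that the map and reduce functions carry no shared state, which is what lets Hadoop run them in parallel on the cheap commodity servers that make the offload economics work.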
In contrast, data can be offloaded to Hadoop for a few hundred pounds; as long as the Hadoop platform has the requisite data availability and protection features, the data can be stored long-term with confidence. An enterprise data hub can also support a range of analytics that are performed directly on the data.
In essence, Hadoop allows you to load all of these different data sets into an expandable cluster of servers and then distribute computational, analytical or indexing workloads across the servers and data sets. These workloads can combine operational processing with analytics to solve pressing business problems.
For example, comScore, a digital marketing intelligence provider, uses Hadoop to process over 1.7 trillion Internet and mobile records every month. comScore uses this data to produce reports that allow their clients to gain behavioural insights into their mobile and online customers. The move to Hadoop removed several key bottlenecks resulting in a 10x increase in computation speed.
Another example is Cisco, which uses Hadoop as part of business intelligence processes across large, globally distributed data ssets that include both structured and unstructured information. The complete infrastructure solution built around Hadoop lets Cisco analyse service sales opportunities in one-tenth the time, at one-tenth the cost, and it generated $40 million in incremental service bookings in the current fiscal year.
It is no accident that organisations such as comScore and Cisco have invested in Hadoop as the basis for platforms capable of delivering new insights. The tangible cost savings, reduced complexity and ability to scale are key benefits of the data hub, and compelling reasons to examine Hadoop technology.