Whether specialising in shoes or cellular phone service, auto insurance or auto parts, businesses everywhere seek to better understand their customers – what they buy, how they evaluate products, what they like and dislike.

And ever since the first online shopper pointed and clicked to make a purchase, the Internet has opened perhaps the largest window in recent memory on what consumers are doing and thinking. Mining clickstream data has become de rigueur for companies seeking intelligence about their customers, with wide variation in terms of analytic sophistication.

Urgency + Exploding Data Volumes = Today’s Analytic Challenge 

Beyond collecting data about shopping carts and click-throughs, companies now need to know things like how well their latest mobile ad campaign is performing, how their brand is faring on social networking sites such as Facebook and Twitter, and what people are saying about their products in online reviews and on comparison sites.

They also need to be able to analyse web-based information in conjunction with data from a variety of back-office systems, such as CRM or accounting. With decision cycles more urgent than ever, businesses also need to get this deep level of insight in as near real-time as possible so that they can take action to optimise online promotions, identify new revenue opportunities, and respond to potential competitive threats.

But getting the right information at the right time is complicated by the tsunami of online data that businesses must now wade through to extract targeted intelligence. In addition to clickstream data, this includes terabytes of information generated by a host of applications and sources, including social networking and gaming sites, smart phones, e-mail servers, and even devices such as Xboxes and PlayStations.

How can companies capture, integrate, mine and analyse millions of rows and terabytes of data generated by thousands of devices and millions of transactions? How can they quickly transform this data into intelligence? And how can they do it without spending a fortune?

Wait… Aren’t there lots of web analytic tools?

There are, of course, both free web analytics tools such as Google Analytics and fee-based products from companies like Omniture or Webtrends. These provide aggregated reporting on web traffic and clickstream data: what content is being viewed and by how many visitors; time-based summaries and comparisons; and how visitors are getting to a website and where they go from there. They can also keep track of visitor actions, such as how many clicked to download a white paper or responded to an ad.

But because these tools usually have pre-defined queries and look at everything from the point of view of the website (providing a “page-based” view of the data), their insight into customer behaviour is limited. While businesses can see which pages are of most interest to the most people or understand overall traffic trends, they can’t answer detailed questions about particular customers or groups of customers.

For example: “What are the characteristics of my best customers, and do my best customers use my website differently than other customers?” Or: “Can I predict, from what visitors do, who is most likely to move from a lead to a customer?”

Without an ability to run custom queries and integrate online data with other sources, businesses can’t do the kind of ad-hoc and iterative analysis that provides the most valuable and competitive insight.

The New Analytics Paradigm

In online and mobile-dominated environments, analytics need to be faster, simpler and more resource-efficient, especially compared to the hardware-intensive data management and business intelligence platforms that have come before. They also need to be detailed enough to address the dynamic information demands of today’s businesses. The new analytics paradigm demands the following:

Flexible querying capabilities – Because traditional data warehousing and management solutions are typically built to address specific tasks (such as producing financial trend reports), they are not well suited to the web, where intelligence needs are constantly changing and near real-time analysis is critical. Retrofitting these solutions to handle ad-hoc queries requires an enormous amount of manual fine-tuning, as database designers must create indexes, partition data and perform other work to ensure fast query response. Indexing and partitioning also increase database size, in some cases by a factor of two or more. This usually means that you need more processing capacity and certainly more storage. For online analysis, users need to be able to answer many types of questions and achieve predictable performance regardless of whether the query was prepared in advance or thought up on the spur of the moment. Thus, a simple yet flexible way to query data is critical.
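The manual tuning described above can be seen in miniature with any relational database. The sketch below (using Python's built-in SQLite, purely for illustration) shows how an ad-hoc filter falls back to a full-table scan unless a designer has anticipated it and built an index in advance – the kind of pre-tuning that doesn't scale when queries change daily:

```python
import sqlite3

# Illustrative sketch: pre-tuning a database for one anticipated query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (visitor_id INTEGER, page TEXT)")
conn.executemany("INSERT INTO visits VALUES (?, ?)",
                 [(i % 100, "/page%d" % i) for i in range(1000)])

query = "EXPLAIN QUERY PLAN SELECT * FROM visits WHERE visitor_id = 7"
plan_before = conn.execute(query).fetchone()[-1]  # typically a full-table SCAN

# The manual tuning step: this index speeds the one query the designer
# anticipated, but it enlarges the database and slows loading.
conn.execute("CREATE INDEX idx_visitor ON visits (visitor_id)")
plan_after = conn.execute(query).fetchone()[-1]   # now a SEARCH via the index

print(plan_before)
print(plan_after)
```

A genuinely ad-hoc question – one nobody indexed for – gets the slow plan, which is why predictable performance without per-query tuning matters for web analysis.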

Fast data loading – Web analytic applications must also be able to load large volumes of data as fast as possible. Detailed intelligence is useless, after all, if it arrives too late. Traditional batch processing is not an ideal loading option when there are low-latency requirements, so organisations need to look at different approaches such as trickle feeding, micro-batches, query-while-load or parallel loading capabilities.
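The micro-batch idea can be sketched in a few lines: rather than waiting for a nightly bulk load, events are committed in small groups so that queries see fresh rows within moments of their arrival. This is a minimal illustration (using SQLite as a stand-in for an analytic database; table and batch sizes are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (visitor_id INTEGER, page TEXT, ts INTEGER)")

BATCH_SIZE = 3  # tiny for illustration; real feeds might commit thousands of rows

def load_micro_batches(events):
    """Commit events in small batches so readers see fresh data quickly."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) >= BATCH_SIZE:
            conn.executemany("INSERT INTO clicks VALUES (?, ?, ?)", batch)
            conn.commit()  # each commit makes rows visible to concurrent queries
            batch.clear()
    if batch:  # flush the final partial batch
        conn.executemany("INSERT INTO clicks VALUES (?, ?, ?)", batch)
        conn.commit()

events = [(i, "/product", i) for i in range(10)]
load_micro_batches(events)
count = conn.execute("SELECT COUNT(*) FROM clicks").fetchone()[0]
print(count)  # 10
```

Trickle feeding takes the same idea to its limit (a batch of one), trading some loading throughput for the lowest possible latency.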

Efficient ways to handle big data – Increasing data volumes are bumping up against the ability of most organisations to store and analyse it all. Continuing to throw more servers at the problem creates massive infrastructure footprints that are extremely costly to scale, house, power and maintain. Columnar databases (which store data column-by-column rather than row-by-row) have emerged as an alternative architecture for high-volume analytics. Because most analytic queries only involve a subset of the columns in a table, a columnar database focuses on retrieving only the data that is required, thus speeding queries and reducing disk I/O and compute load. Combined with other innovative technologies, these types of databases can enable significant data compression and accelerated query processing. This means that users don’t need as many servers or as much storage to analyse the same volume of data – a particularly compelling capability when it comes to web analytics.

Support for data diversity – Finally, fully understanding customer behaviour demands a single view of customers’ demographics, history and activities across diverse channels and information silos. These sources may include traditional clickstream data, event and log data, third-party analytic sources (such as Google), mobile activities, as well as information from back-office systems. Data must therefore be properly integrated and transformed before running queries, requiring tools that support ELT (extract, load and transform) capabilities.
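The ELT pattern means landing raw data first and doing the transformation inside the database, where the engine's query power is available. A minimal sketch (SQLite and invented tables standing in for a real warehouse, raw clickstream and CRM feeds):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Extract + Load: land the raw feeds first, untransformed.
conn.execute("CREATE TABLE raw_clicks (visitor_id INTEGER, page TEXT)")
conn.execute("CREATE TABLE crm (visitor_id INTEGER, segment TEXT)")
conn.executemany("INSERT INTO raw_clicks VALUES (?, ?)",
                 [(1, "/pricing"), (1, "/signup"), (2, "/blog")])
conn.executemany("INSERT INTO crm VALUES (?, ?)",
                 [(1, "best"), (2, "prospect")])

# Transform: the "T" runs inside the database after loading, joining
# web activity with back-office CRM data into an analysis-ready table.
conn.execute("""
    CREATE TABLE clicks_by_segment AS
    SELECT c.segment, COUNT(*) AS clicks
    FROM raw_clicks r JOIN crm c ON r.visitor_id = c.visitor_id
    GROUP BY c.segment
""")
result = dict(conn.execute("SELECT segment, clicks FROM clicks_by_segment"))
print(result)  # {'best': 2, 'prospect': 1}
```

Because the raw tables stay in place, analysts can re-run or revise the transformation later – a flexibility that pre-load ETL pipelines lack.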

Businesses operating in the online and mobile arena require analytic solutions that provide the data integration, transformation, storage and query performance capabilities that take them beyond clickstream analysis. Speed, affordability, simplicity and low maintenance are equally important so that users can get the timely answers they need – without going to great expense or draining IT resources. Approaches that address these next-generation requirements will help organisations more effectively meet the challenges of a Web 2.0 world.