Often when you see this sort of heading you expect to read something like “long live the EDW” later in the piece. Not this time. The EDW (enterprise data warehouse) is dead. Period. Like a dodo. Like Monty Python’s parrot.
This came up last week at SAS’ analyst conference in Athens. I was having dinner with Keith Collins, CTO of SAS, and he asked me what I thought the future of the EDW was. I said that I thought that it had no future. You might think that this would have led to a long and interesting debate. It didn’t: he completely agreed with me. We swiftly moved on to other topics.
There are two ways to look at this. The first is to consider the broader picture. In this context there are three types of data that you want to analyse: structured, unstructured and streamed. Of course, these terms are hopelessly confusing. What is the difference between a 140-character product description, stored in a table in a relational database, and a tweet?
The truth is that data is neither inherently structured nor unstructured. We extract structure from data by using BI or search or similar capabilities and we impose structure by the way we store and manage data. Storing data in a relational database imposes structure on data, which is reflected as metadata.
What Hadoop is doing is imposing structure on so-called unstructured data. Streamed data is also structured; the metadata is simply external: we know that a stock tick consists of a stock symbol followed by the value of the tick.
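To make the point about external metadata concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that a tick arrives as a line of text with the symbol and the value separated by whitespace; the names `Tick`, `TICK_SCHEMA` and `parse_tick` are hypothetical and not taken from any real feed or product.

```python
from typing import NamedTuple


class Tick(NamedTuple):
    symbol: str
    price: float


# The "external metadata": field order and types are known to the reader
# of the stream, not carried inside the stream itself. (Assumed layout.)
TICK_SCHEMA = (("symbol", str), ("price", float))


def parse_tick(raw: str) -> Tick:
    """Impose the externally known schema on one raw tick."""
    parts = raw.split()
    if len(parts) != len(TICK_SCHEMA):
        raise ValueError(f"expected {len(TICK_SCHEMA)} fields, got {len(parts)}")
    # Convert each text field using the type the schema assigns to it.
    values = [cast(text) for (_, cast), text in zip(TICK_SCHEMA, parts)]
    return Tick(*values)


# Two made-up ticks, standing in for a live stream.
ticks = [parse_tick(line) for line in ["SAS 101.25", "IBM 187.5"]]
```

The raw lines carry no field names or types at all; the structure only appears once the reader applies what it already knows about the layout.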
So what we are really talking about is how you store and process data for query purposes, and structured, unstructured and streamed data (assuming we can’t get rid of this terminology) have very different requirements. It is just about possible to conceive of a platform that supports relational, Hadoop-like and streaming data at some point in the future, but it’s not going to happen soon, if it ever does. So, certainly for the time being, there is no prospect of an EDW supporting all of these different types of query processing.
However, even if that were conceivable, the EDW is not going to make a comeback, even in the narrower field of structured data alone. There are two reasons for this. The first is that I don’t think the concept of a single monolithic EDW was ever the right one: it was too time-consuming and expensive to set up and run.
The second is that, even if you think it was conceptually the right approach, it has been overtaken by practical considerations. In particular, data marts and application-specific appliances have proliferated throughout large enterprises, and that is not going to change. Further, data virtualisation and federation are now mature technologies that allow federated query access across data marts and warehouses, so there is no longer any incentive to try to centralise everything.
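As a toy illustration of what federated query access means, the sketch below uses Python’s built-in sqlite3 module and SQLite’s ATTACH DATABASE statement to join two separate databases through a single connection, without copying either into a central store. This is only a stand-in: real data virtualisation products work across heterogeneous systems and are far more sophisticated. The marts, tables and figures (`sales_mart`, `finance_mart`, `orders`, `credit`) are invented for the example.

```python
import sqlite3

# One connection acts as the "federation layer" over two separate databases.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS sales_mart")
conn.execute("ATTACH DATABASE ':memory:' AS finance_mart")

# Each mart keeps its own tables (hypothetical schemas and data).
conn.execute("CREATE TABLE sales_mart.orders (customer TEXT, amount REAL)")
conn.execute("CREATE TABLE finance_mart.credit (customer TEXT, credit_limit REAL)")
conn.executemany(
    "INSERT INTO sales_mart.orders VALUES (?, ?)",
    [("acme", 120.0), ("acme", 80.0), ("globex", 50.0)],
)
conn.executemany(
    "INSERT INTO finance_mart.credit VALUES (?, ?)",
    [("acme", 500.0), ("globex", 40.0)],
)


def customers_over_limit(connection):
    """Join across both marts in one query; nothing is centralised first."""
    return connection.execute(
        """
        SELECT o.customer, SUM(o.amount) AS total, c.credit_limit
        FROM sales_mart.orders AS o
        JOIN finance_mart.credit AS c ON c.customer = o.customer
        GROUP BY o.customer, c.credit_limit
        HAVING SUM(o.amount) > c.credit_limit
        ORDER BY o.customer
        """
    ).fetchall()


over_limit = customers_over_limit(conn)
```

The query names both marts directly, which is the essence of the argument: if the join can be expressed across stores, the case for physically moving everything into one warehouse weakens.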
Of course, smaller organisations might get away with a single data warehouse, but this would be an SMEDW. And, no doubt, there will be diehards who have invested so much money in their existing infrastructure that they are afraid to admit that they got it wrong. But, going forward: RIP the EDW.