The suggestion going around these days is that, with the proliferation of “big data,” data warehouses are fast becoming obsolete: today’s version of the mainframe (or even the buggy whip). It’s an understandable perception, since there is no doubt that big data puts a massive strain on data organization and processing capabilities. But in the financial industry especially, it’s important to dig a bit below that first impression.
Data warehouse systems can’t handle really big data, that’s true, and new methods have had to be developed to manage such vast amounts of information. But that doesn’t mean data warehouses have become obsolete, any more than railroads were abandoned when the automobile and the airplane were invented. A second look at the situation shows that data warehouses have plenty of applications in the real world, where in most instances the data streams that matter most for decision-making are far too small to count as “big data.”
Big data refers to the task of managing huge volumes of data and is increasingly important in science, in certain kinds of commerce, and in social media. A common dividing line is 100 terabytes of information. Strictly speaking, “big data” means data sets too big for common software tools to manage within an acceptable period of time, so the threshold in bytes shifts as processing capacity improves.
What is certain, however, is that big data refers largely to unstructured data, and while that kind of volume is of increasing concern, it doesn’t describe the whole universe of business data processing requirements. The financial industry in particular seldom deals with anything close to 100 terabytes of data. What’s more, data warehouses are designed for a related but somewhat different task: dealing with structured data from operational systems rather than high volumes of raw data from immediate transaction sources. Most organizations deal with no more than 15 terabytes of data; that is well within the capacity of the data warehouse model of processing, and it stands to benefit further from improved methods.
Beyond the question of size, the purposes for which data is processed in the financial industry and many others (things like compliance, legal reporting, and risk management) are better served by structured data than by sheer volume, and that fits the data warehouse model of processing better than anything developed to deal with big data.
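To make the contrast concrete, here is a minimal sketch of a warehouse-style compliance check run over structured records. The table, columns, and reporting threshold are invented for illustration, and the in-memory SQLite database simply stands in for a real warehouse:

```python
import sqlite3

# Hypothetical, simplified "transactions" table standing in for a warehouse fact table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE transactions (
        account_id TEXT,
        trade_date TEXT,
        amount     REAL,
        country    TEXT
    )
""")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [
        ("A-100", "2024-01-02",  9500.0, "US"),
        ("A-100", "2024-01-03", 12000.0, "US"),
        ("B-200", "2024-01-02",  3000.0, "GB"),
    ],
)

# A compliance-style report: total daily volume per account above a
# (hypothetical) reporting threshold.
REPORTING_THRESHOLD = 10000.0
report = conn.execute(
    """
    SELECT account_id, trade_date, SUM(amount) AS daily_total
    FROM transactions
    GROUP BY account_id, trade_date
    HAVING daily_total > ?
    """,
    (REPORTING_THRESHOLD,),
).fetchall()

for account_id, trade_date, daily_total in report:
    print(f"{account_id} {trade_date}: {daily_total:,.2f}")
```

A schema like this is what makes compliance and risk questions cheap to answer: the report is a few lines of aggregation over well-defined columns, not a scan over raw, unstructured feeds.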
In short, predictions of the demise of the data warehouse are premature; in fact, we are likely to find more use for the data warehouse in the future than ever before, especially once we consider where the real problem lies.
Some CIOs and lead architects are looking into big data because they have the impression that their data warehouses are becoming bloated and cumbersome. On examination, the problem can usually be addressed by improving the way the data is organized, in ways tailored specifically to the financial industry. The real problem for that industry and some others is not the volume of data but its complexity.
This is not to say that there aren’t times when financial companies need to store vast volumes of data, such as historical tick data. But in the vast majority of cases the challenge is not sheer volume; it is getting multiple computer systems to work together without either losing or duplicating data. That is a complexity puzzle, and it is best solved by applying the data warehouse model, not by reaching for the slick new technologies on offer.
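As an illustration of that complexity puzzle, here is a minimal sketch of the kind of reconciliation a warehouse integration layer performs: merging records from two hypothetical source systems on a shared business key so that every customer appears exactly once and no attribute is dropped. The system names and fields are invented for the example:

```python
# Records exported from two hypothetical operational systems. Both describe
# the same customers, keyed by a shared business identifier ("customer_id").
core_banking = [
    {"customer_id": "C001", "name": "Acme Ltd", "balance": 1500.0},
    {"customer_id": "C002", "name": "Beta PLC", "balance": 320.0},
]
trading_system = [
    {"customer_id": "C002", "name": "Beta PLC", "open_positions": 4},
    {"customer_id": "C003", "name": "Gamma SA", "open_positions": 1},
]

def reconcile(*sources):
    """Merge records from several source systems by business key.

    Each customer appears exactly once in the result (no duplication),
    and attributes from every source are retained (no loss).
    """
    merged = {}
    for source in sources:
        for record in source:
            key = record["customer_id"]
            merged.setdefault(key, {}).update(record)
    return merged

golden_records = reconcile(core_banking, trading_system)
for key in sorted(golden_records):
    print(golden_records[key])
# C001 comes only from core banking, C003 only from trading,
# and C002 is merged from both -- once, not twice.
```

A real integration layer also needs rules for resolving conflicting values between systems, and that, rather than the number of bytes involved, is where most of the difficulty lives.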
The potential of big data is exciting, to be sure. However, banks shouldn’t see it as a replacement for data warehouse methods that have proven useful over time and remain useful. Remember not to let the tail wag the dog: technology is a tool for business purposes such as profitability, security, compliance, and productivity, not an end in itself. Accordingly, financial institutions shouldn’t be quick to swap one proven, useful method for another; they should instead weigh the virtues of both.