Recently, MathWorks came out with a report (source no longer available as of July 2018) in which they surveyed the financial industry and concluded that the quality of data, rather than the volume of data, is by far the biggest data-related challenge financial institutions face. For anyone working with data warehousing at a bank or insurance company, this comes as no surprise: data quality is widely considered one of the biggest challenges in creating and maintaining a data warehouse, especially in the financial sector.
Having worked for about 15 years with data warehousing and reporting in the financial industry, I’ve seen a lot of customers struggle with data quality. In my experience, one key mistake many organizations make is having too narrow an understanding of what quality actually is: they equate factually correct data with quality data, and overlook other factors that can lower data quality even when the facts are accurate. A close cousin of this mistake is assuming that there can be only one “correct” version of the data, often referred to as “the truth”. A common consequence is that these organizations build a data warehouse that holds only a single value for each field.
The problem with this approach is that there are often multiple valid definitions or interpretations of the value of a specific field, each yielding a result that is factually “correct”, yet in conflict with the others. In some cases one user group (say, legal reporting) will want to use a certain price for a security while another user group (say, risk and performance) will want to use a different price for the same security. If your data warehouse supports only one value per field, in this case price, then at least one of the user groups will be unsatisfied with the data and claim that the information is “wrong”.
Just before the holidays I met with a senior manager at an international bank who was experiencing exactly this problem. Their data warehouse supported only one definition of price, and two influential departments had conflicting definitions. The two departments were locked in an ongoing battle: one would register a change request to have a price changed to the close price; two days later the other would register a change request to have it changed back to the mid price. Both prices were factually correct, but both departments vehemently claimed that the price was wrong and had to be changed in accordance with their change request. During this struggle, the perception grew in the organization that the data held in their data warehouse was of poor quality (which it was, but not due to inaccuracy), much to the disappointment of the management, who had made a significant investment in the data warehouse specifically to address the organization’s continuous complaints about poor data quality.
The solution to this problem is not to try to unify the data definitions across every department, nor to double down on data management, but to support multiple, parallel definitions of the same concept. For your data warehouse to be accepted as the default source of data for analytical and reporting purposes, the well-founded needs of the users must outweigh the desire to establish a company-wide definition of each piece of data, even if a “single version of the truth” might seem like a compelling proposition at first glance. Without proper support for multiple, parallel definitions of the same data entity, you risk being unable to capitalize on your investment in your data warehouse.
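One way to picture this is to key the stored value by both the entity and the definition in use, rather than by the entity alone. The sketch below is a minimal illustration of that idea; all names (`PriceType`, `PriceStore`, the example security identifier) are hypothetical and not taken from any particular system:

```python
from dataclasses import dataclass
from enum import Enum


class PriceType(Enum):
    """Parallel, equally valid definitions of 'price'."""
    CLOSE = "close"  # e.g. preferred by legal reporting
    MID = "mid"      # e.g. preferred by risk and performance


@dataclass(frozen=True)
class Price:
    security_id: str
    price_type: PriceType
    value: float


class PriceStore:
    """Prices keyed by (security, definition), so both can coexist."""

    def __init__(self):
        self._prices = {}

    def set_price(self, price: Price) -> None:
        # Storing a close price never overwrites the mid price,
        # and vice versa: each definition has its own slot.
        self._prices[(price.security_id, price.price_type)] = price

    def get_price(self, security_id: str, price_type: PriceType) -> float:
        return self._prices[(security_id, price_type)].value


# Both departments register their own "correct" price for the same security:
store = PriceStore()
store.set_price(Price("ACME-BOND-2030", PriceType.CLOSE, 101.25))
store.set_price(Price("ACME-BOND-2030", PriceType.MID, 101.10))
```

With this shape there is no change request battle: each user group reads the price under the definition it needs, and both values remain factually correct side by side. In a relational warehouse the equivalent move is simply adding the price definition to the key of the price table.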
A data warehouse must be flexible enough to treat data in all of the ways that your organization requires. Failure to do this can result in poor data quality even when the accuracy of data has been assured with meticulous care.