One of the more difficult tasks when designing and building a data warehouse for an asset manager, insurance company or some other financial institution, is to create a data model that allows your data warehouse to fulfill the needs for as many users as possible. The data model is by far the most important part of your entire data warehouse system. If the data model is inadequate, then sooner or later your system will be unable to properly store the data required to create the reports or do the analysis that the users want to do, which in turn will render your financial data warehouse useless.
Over the years, I’ve seen a lot of projects for financial data warehouses come and go. One thing that many projects have in common is that there is wide gap between the business side and the IT side. One side understands the needs of the business and the other understands the technology, but neither understands both and the communication between them is sometimes poor. The result is often an inadequate data model and disappointing project deliverables.
I thought I’d make an effort to bridge that gap and describe a basic data model for a financial data warehouse along with some hopefully useful tips. Please note that this is a very broad description and obviously requires more work to fit your organization’s specific needs. Also note that the definition of data warehouse that I’ll use might be a little different than your traditional one. Rather than viewing a data warehouse as an architecture, I’ll use a more real world definition;
A data warehouse is a repository of data that enables all the users in your organization to do reports and analysis better.
The keyword here is “all.” In order to create something that will be truly useful and provide a long-term strategic benefit, you need to design your repository in such a way that it can used by the entire organization and not just by a small subset of users. This will ensure that you will be able to continually build on your previous work rather than starting over every time the requirements or conditions change significantly. The ability to continuously build on your previous work is one of the major points of differentiation between a data warehouse and a data mart.
Before we start, I’d like to share some general advice or strategies that I have found to be useful over the years.
1. Model your data so that you can reuse it as much as possible. This means applying as little logic or aggregation to your data as possible, prior to storing it. This will allow you to reuse the data in your repository across your organization for as many different cases as possible. If you require to unify the data before providing it to users, that should be done when your system reads the data, not when it stores it.
2. Complexity is the real problem, not data volume. Very, very few financial organizations have a need to store more than 10 Tb in their data warehouse and that’s well within the capabilities of all the major database vendors. So, design to be able to deal with complexity, not volume. (This, however, does not mean that you should bloat your database. I’ll have more to say on that later.)
3. Make sure that you can store all the data required by your organization. Being able hold all the data required for a reporting or analytical task is the key. If your repository does not hold or cannot hold the data the user needs, then the user has no use for your warehouse and has to create a similar repository to solve his/her needs, defeating one of the core purposes for building the warehouse in the first place.
Some general notes:
1. The natural key for securities should never be Reuter, Bloomberg or ISIN code. The natural key should at least be the combination of ISIN, exchange, currency and series/class.
2. Some derivatives have multiple legs which will produce more than one row in your positions view regardless of your choice of natural key.
3. Make sure that you include the source for every record.
4. Depending on your business, you might need to have multiple physical structures for at least positions and transactions. If not, you could potentially bloat your data warehouse. You structure your data differently if you are working with 10 000 accounts than if you are working with 10 000 000 accounts, and your data warehouse needs to support the optimized kinds of structures as well.
In addition to this basic structure, you need to create a supporting structure that can hold data that your core model does not support initially. This supporting structure is a value-pair structure that will allow you to easily and dynamically expand your data model as new requirements (or past mistakes) are discovered. In terms of performance, this additional structure will not be as fast as the rest of your data model, so at some point you may need to consider migrating some of the new fields into your core data model to improve performance.
These two building blocks together with the additional tips will give your data warehouse project a good starting point and help you steer clear from some of the mistakes I myself have made.
If you have any feedback or comments, don’t hesitate to send me an email at jonas.olsson[a]graz.se.