The Business Vault is the layer in the Data Vault 2.0 architecture where business logic is implemented to transform, cleanse and modify the data.
The book “Building a Scalable Data Warehouse with Data Vault 2.0” by Scalefree’s founders Dan Linstedt and Michael Olschimke and the Data Vault 2.0 Boot Camp show how to implement such business logic using various Business Vault entities, such as computed satellites.
However, it is worth noting that this is only half the story. The book shows computed satellites (and other entities) with a load date in the primary key of the computed satellite. Such satellites are great for capturing the results of business logic that is applied to the incoming deltas. However, there are two different granularities for business logic in the Business Vault:
- In some cases, business logic is applied to incoming deltas. For example, if phone numbers should be cleansed, the data cleansing rule is typically applied to all incoming deltas, regardless of whether a given delta is actually used in information delivery. This may sound odd, but due to the capabilities of SQL optimizers, it is actually the more efficient approach.
- In other cases, the granularity of the business logic doesn’t rely on the incoming deltas – instead it relies on the outgoing information.
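The first, delta-driven granularity can be illustrated with a minimal Python sketch. The cleansing rule `cleanse_phone`, the sample rows and the key values are assumptions made up for this illustration; the point is only that the computed satellite keeps the same primary key (hash key and load date) as the raw satellite it is derived from.

```python
import re
from datetime import datetime


def cleanse_phone(raw: str) -> str:
    """Hypothetical cleansing rule: keep digits and an optional leading plus sign."""
    raw = raw.strip()
    prefix = "+" if raw.startswith("+") else ""
    return prefix + re.sub(r"\D", "", raw)


# Raw satellite deltas (illustrative data): (customer hash key, load date, phone)
raw_sat = [
    ("a1", datetime(2024, 1, 1), "+49 (511) 123-456"),
    ("a1", datetime(2024, 3, 5), "0511/987 654"),
]

# Computed satellite: one row per incoming delta,
# identified by the same (hash key, load date) primary key.
computed_sat = {
    (hash_key, load_date): cleanse_phone(phone)
    for hash_key, load_date, phone in raw_sat
}
```

Every delta produces exactly one cleansed row, whether or not a later report ever reads it.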
The second granularity is best described by a practical example: consider the daily calculation of a customer’s lifetime value. The formula depends on time: the more time has passed since the last order, the lower the lifetime value, until the customer places another order.
In such a case, the lifetime value will decrease even though no new data is arriving in the Raw Data Vault, so there is no incoming delta. Instead, the value should be calculated for each customer on a regular basis (e.g., daily). This regular basis matches the granularity of the snapshots defined in information delivery, which are identified by the snapshot date.
With that in mind, the computed satellite is modified by replacing the load date in its primary key with the snapshot date, so that it matches the granularity of the outgoing information.
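A snapshot-driven computed satellite could be sketched as follows. The decay formula (a half-life of 30 days applied to the total order value), the constant `HALF_LIFE_DAYS` and the sample data are purely hypothetical; the article does not prescribe a specific lifetime value formula. What matters is that the rows are keyed by hash key and snapshot date, and that the value changes between snapshots without any new order arriving.

```python
from datetime import date, timedelta

HALF_LIFE_DAYS = 30  # hypothetical decay parameter, not taken from the article


def lifetime_value(orders, snapshot_date):
    """orders: list of (order_date, amount); only orders known at the snapshot count."""
    known = [(d, amount) for d, amount in orders if d <= snapshot_date]
    if not known:
        return 0.0
    total = sum(amount for _, amount in known)
    days_idle = (snapshot_date - max(d for d, _ in known)).days
    # The value halves for every HALF_LIFE_DAYS without a new order.
    return total * 0.5 ** (days_idle / HALF_LIFE_DAYS)


# Illustrative input: a single order and three snapshot dates.
orders_by_customer = {"a1": [(date(2024, 1, 1), 100.0)]}
snapshot_dates = [date(2024, 1, 1) + timedelta(days=n) for n in (0, 30, 60)]

# Computed satellite keyed by (hash key, snapshot date) instead of load date.
computed_sat = {
    (hash_key, snapshot_date): lifetime_value(orders, snapshot_date)
    for hash_key, orders in orders_by_customer.items()
    for snapshot_date in snapshot_dates
}
```

With this sample data the single order yields three satellite rows, one per snapshot, each with a lower value than the one before.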
For performance reasons, the computed satellite is often used in conjunction with a PIT table. It is worth recognizing that the primary key of such a computed satellite matches the PIT’s alternate key, which consists of the hash key of the PIT’s parent and the snapshot date. In turn, the information-based computed satellite described here can be joined against the alternate key of the PIT table in information delivery.
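The join can be sketched with an in-memory SQLite database driven from Python. All table and column names (`pit_customer`, `sat_customer`, `csat_customer_clv`, `sat_customer_ldts`) and the sample rows are assumptions for illustration, and the PIT's own primary key is omitted for brevity. The raw satellite is reached through the load date stored in the PIT, while the snapshot-based computed satellite is joined directly on the PIT's alternate key.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Raw satellite: primary key is (hash key, load date)
CREATE TABLE sat_customer (
    customer_hk TEXT, load_date TEXT, name TEXT,
    PRIMARY KEY (customer_hk, load_date));
-- Snapshot-based computed satellite: primary key is (hash key, snapshot date)
CREATE TABLE csat_customer_clv (
    customer_hk TEXT, snapshot_date TEXT, lifetime_value REAL,
    PRIMARY KEY (customer_hk, snapshot_date));
-- PIT table: alternate key is (parent hash key, snapshot date)
CREATE TABLE pit_customer (
    customer_hk TEXT, snapshot_date TEXT, sat_customer_ldts TEXT,
    UNIQUE (customer_hk, snapshot_date));

INSERT INTO sat_customer VALUES ('a1', '2024-01-01', 'Alice');
INSERT INTO csat_customer_clv VALUES ('a1', '2024-01-01', 100.0),
                                     ('a1', '2024-01-31', 50.0);
INSERT INTO pit_customer VALUES ('a1', '2024-01-01', '2024-01-01'),
                                ('a1', '2024-01-31', '2024-01-01');
""")

rows = con.execute("""
SELECT pit.snapshot_date, sat.name, clv.lifetime_value
FROM pit_customer AS pit
JOIN sat_customer AS sat
  ON sat.customer_hk = pit.customer_hk
 AND sat.load_date = pit.sat_customer_ldts      -- raw satellite via stored load date
JOIN csat_customer_clv AS clv
  ON clv.customer_hk = pit.customer_hk
 AND clv.snapshot_date = pit.snapshot_date      -- computed satellite via alternate key
ORDER BY pit.snapshot_date
""").fetchall()
```

Both joins are plain equi-joins, so the query needs no date-range logic to locate the computed value for a snapshot.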
Conclusion
These are more advanced techniques for developing the Business Vault, but they are very helpful when designing the data flows from the source system into the information asset to be produced for the business.