Federate Multiple Sets of Non-Historized Data
Welcome to another edition of Data Vault Friday! In this session, we’ll address an interesting question: how to integrate a second system into an existing setup with a non-historized link. This specific case involves transactions that can be updated for the last two months. We’ll explore the necessity of satellites, the role of non-historized links, and the concept of federating data in the business vault or information mart (BV/IM).
Let’s break it down, discuss the challenges, and provide actionable solutions.
In this article:
The Challenge
The question revolves around a scenario where two systems provide transactional data. The goal is to integrate these datasets into a single fact entity in the information mart, enabling aggregation across both systems. A specific challenge arises when transactions from one system need to be updated within the last two months, necessitating a satellite to track those changes. So, how do we effectively design and implement this integration?
Understanding Non-Historized Links
Non-historized links play a central role in integrating datasets across source systems. These links typically store only insert operations, ensuring that facts—such as transactions—remain immutable. However, the reality is often more complex, especially when updates or changes occur.
For transactions that change, the non-historized link’s primary purpose is to maintain a consistent structure. Attributes frequently used for aggregations, such as amounts, are stored directly in the link structure. Meanwhile, descriptive attributes that casual users don’t aggregate (e.g., line descriptions) are placed in satellites to keep the link slim and optimized for storage and performance.
Federation Across Systems
One of the first decisions when integrating two systems is determining whether their data can share the same link structure. The key considerations are:
- Semantic Consistency: Do both systems provide the same type of transactional data?
- Granularity: Are the transactions at the same level of detail?
- Key Uniqueness: Is there any overlap in identifiers that might require adding source-specific elements like a tenant ID?
If the answers indicate alignment, both systems can load into the same non-historized link. However, each source system would have its own satellite, ensuring clear separation of descriptive attributes.
Handling Updates in Transactions
Updates to transactional data pose a significant challenge, particularly when attributes such as amounts or hub references change. The typical strategy involves **technical counter transactions**, which ensure insert-only behavior while maintaining an accurate representation of changes. Here’s how it works:
- For a new transaction, insert the original record into the link.
- If an update occurs (e.g., an amount changes from €5 to €7), insert two additional records:
- A “counter” record negating the original amount (e.g., -€5).
- A new record reflecting the updated amount (e.g., +€7).
- Aggregations will reflect the latest transaction value while retaining a complete history of changes.
This approach keeps the data model consistent and avoids direct updates to the non-historized link.
Federation in the Business Vault and Information Mart
While the raw Data Vault integrates data from multiple systems, the Business Vault (BV) and Information Mart (IM) are where federated datasets shine. By standardizing and transforming data, you can provide end-users with a unified view of all transactions.
The information mart combines the transactional data from both systems into a single fact entity. This entity supports aggregations across all transactions, ensuring that analytical queries deliver accurate and actionable insights.
Best Practices and Recommendations
To ensure a robust implementation, consider these best practices:
- Design for Scalability: Keep the link structure slim to optimize performance, especially with large datasets.
- Clear Satellite Separation: Create one satellite per source system to maintain clarity and prevent mixing attributes from different sources.
- Use Technical Counter Transactions: Handle updates efficiently while preserving insert-only behavior.
- Test Aggregations: Validate the unified fact entity in the information mart to ensure accurate reporting across systems.
- Monitor Data Quality: Regularly check for inconsistencies, especially when integrating systems with different update patterns.
Watch the Video
Meet the Speaker
Michael Olschimke
Michael has more than 15 years of experience in Information Technology. During the last eight years he has specialized in Business Intelligence topics such as OLAP, Dimensional Modelling, and Data Mining. Challenge him with your questions!