Understanding Non-Historized Links and Satellites
In the world of data vault modeling, we often encounter situations where we need to deal with non-historized links. A non-historized link is a structure used to capture relationships between entities (hubs) in a data vault. Unlike historized links, these structures are not designed to track changes over time, making them particularly useful in certain high-performance scenarios.
However, when integrating with systems that can update transactions for recent periods, additional structures like non-historized satellites become essential. These satellites allow us to track updates to attributes associated with the link while maintaining the integrity and performance of the data model.
In diesem Artikel:
Design Considerations for Non-Historized Links
The design of non-historized links requires careful planning, particularly when dealing with wide datasets or platforms with specific storage constraints. A wide link, for instance, might contain attributes like transaction ID, account segments, accounting document, accounting period, and timestamps. These attributes uniquely identify a transaction and establish relationships with other entities in the data vault.
Here are key considerations when designing non-historized links:
- Include Essential Identifiers: Attributes required to uniquely identify a transaction and establish relationships with hubs should always be part of the link. This typically includes the transaction ID and any necessary alternate keys.
- Balance Row Width: Including too many attributes in the link can lead to performance issues on row-based storage systems, as wide rows reduce the number of records stored per page.
- Use Satellites for Additional Attributes: Attributes that are not essential for casual queries or dashboard aggregation should be moved to a satellite structure to keep the link lean and efficient.
Role of Non-Historized Satellites
Non-historized satellites play a critical role in complementing non-historized links. They store additional descriptive attributes that are not frequently queried or aggregated by typical users. This approach minimizes the join operations required during querying, improving performance.
For instance, while attributes like debit and credit amounts might be included in the link due to their importance in aggregations, other less frequently used attributes, such as transaction type or invoice descriptions, can reside in the satellite. This division ensures a balance between storage efficiency and query performance.
Query Performance Optimization
A significant design challenge lies in ensuring optimal query performance, particularly on column-based storage systems. Joining a link with its satellite can significantly impact performance. To mitigate this, consider the following:
- Prejoin Frequently Queried Attributes: Move attributes that are commonly used in aggregations or group-by operations into the link structure to eliminate the need for joins.
- Tailor Satellites for Specific Use Cases: Design satellites to serve specialized needs, such as data science or detailed analysis, where users can afford the higher cost of join operations.
- Adopt Hybrid Storage Strategies: Leverage hybrid storage solutions, such as Snowflake’s mini partitions, to combine row-based and column-based advantages and support wider rows for frequently queried data.
Practical Example
Let’s consider a scenario with a financial transactions table containing over a billion rows. The table includes attributes such as transaction ID, account segments, accounting document, accounting period, debit amount, credit amount, and timestamps.
Here’s how we can structure the non-historized link and satellite:
- Non-Historized Link: Include transaction ID, account segments, accounting document, accounting period, debit amount, and credit amount. These attributes are essential for aggregations and dimension references.
- Non-Historized Satellite: Store additional attributes like transaction type and invoice descriptions, which are not frequently queried by casual users.
This structure ensures that typical queries for dashboards and reports are efficient, while still supporting detailed analysis for specialized users.
Schlussfolgerung
Designing non-historized links and their satellites requires a deep understanding of data usage patterns and storage platform constraints. By carefully selecting which attributes to include in the link and which to offload to a satellite, we can strike a balance between query performance and storage efficiency. For most scenarios, prioritizing attributes that support frequent aggregations and groupings in the link structure is key, while satellites serve as a repository for less critical details.
As data modeling continues to evolve, adapting these principles to modern storage technologies and hybrid platforms will be crucial for building robust and performant data vault systems.
Das Video ansehen
Treffen mit dem Sprecher
Michael Olschimke
Michael has more than 20 years of experience in Information Technology. During the last eight years, he has specialised in Business Intelligence topics such as OLAP, Dimensional Modelling, and Data Mining. Challenge him with your questions!