Unit of Work in Data Vault

What Is the Unit of Work?

The Unit of Work is a fundamental concept in Data Vault modeling. It refers to the idea of capturing all the related business keys from a source system together in a single Link structure. This ensures that the relationships represented in the raw data are maintained intact in your Raw Data Vault.

Think of it as keeping all pieces of a puzzle together. When you break the pieces apart without a clear strategy to reassemble them, you risk losing the original picture—or worse, creating a picture that never existed in the first place.

The question posed in our Data Vault Friday session was:

“It would be good to refresh the topic of the Unit of Work. What is the information I’d be losing if I split wider Links into smaller pairs (DrivingKey -> Key1/Key2 etc)? I can think of a technical setup where I’d have multiple Links and Effectivity Satellites and I should be able to get the same results to any point in time… What information exactly gets lost if I split Links?”

Let’s start with a simple example to illustrate the point:

Imagine you have a relationship between three business entities—Customer, Store, and Product. Customers go into Stores and purchase Products. If you model this relationship as a single Link that includes all three Hubs (Customer, Store, Product), you have a clear and accurate picture of that transaction.

Now, suppose you decide to split this Link into two smaller Links:

  • Customer -> Store
  • Customer -> Product

At first glance, this may seem fine. However, when you attempt to reconstruct the original dataset by joining these two Links, you may create new combinations that never existed. For example:

  • Customer A visited Store K and purchased Product X (valid)
  • Customer A visited Store L (also valid), but when you join this with Product X, you create a false relationship—Customer A purchased Product X at Store L (which never happened in reality).

This phenomenon, where new, artificial rows are generated during joins, leads to data inconsistencies. You’ve effectively introduced relationships that didn’t exist in your source system. That’s the major risk when splitting Links.
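The fan-out described above can be shown with a few lines of plain Python. This is a sketch with hypothetical data, using sets in place of real Link tables:

```python
# Hypothetical source transactions: (customer, store, product)
source = {
    ("A", "K", "X"),   # Customer A bought Product X at Store K
    ("A", "L", "Y"),   # Customer A bought Product Y at Store L
}

# Split the 3-way relationship into two pairwise Links by projecting away one key each
customer_store = {(c, s) for (c, s, p) in source}
customer_product = {(c, p) for (c, s, p) in source}

# Attempt to reconstruct the original rows by joining the two Links on Customer
reconstructed = {
    (c1, s, p)
    for (c1, s) in customer_store
    for (c2, p) in customer_product
    if c1 == c2
}

# The join invents combinations that never existed in the source
false_rows = reconstructed - source
print(sorted(false_rows))  # → [('A', 'K', 'Y'), ('A', 'L', 'X')]
```

The two extra rows claim Customer A bought Product Y at Store K and Product X at Store L, neither of which ever happened.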

The Jedi Test: Verifying Your Model

To avoid these issues, Michael recommends applying what he calls the Jedi Test (yes, Star Wars fans, rejoice!).

The Jedi Test is a simple but powerful validation: Try to reconstruct the original source dataset from your target Data Vault model. If you can recreate the original rows exactly—no more, no less—you’re golden. If you end up with extra rows or missing rows, you’ve got a problem.

This test helps you ensure that your Links are modeled correctly and that no data is lost or inaccurately recreated when splitting Links.
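As a sketch, the Jedi Test boils down to an exact set comparison between the source rows and whatever your model can reconstruct (the function name and data here are illustrative, not part of any standard tooling):

```python
def jedi_test(source_rows: set, reconstructed_rows: set) -> bool:
    """Pass only if the target model reproduces the source exactly."""
    extra = reconstructed_rows - source_rows     # artificial rows created by joins
    missing = source_rows - reconstructed_rows   # rows lost by the decomposition
    return not extra and not missing

# A single 3-way Link stores the rows as-is, so it trivially passes
source = {("A", "K", "X"), ("A", "L", "Y")}
print(jedi_test(source, source))  # → True
```

Any decomposition of the Link must still pass this check after rejoining; extra rows and missing rows are equally fatal.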

Driving Keys and Unit of Work

One scenario where splitting Links definitely causes problems is when you’re dealing with Driving Keys. A Driving Key is the subset of a Link’s Hub references that identifies the relationship being tracked over time; when the remaining keys change for the same Driving Key, the previous relationship is considered superseded.

Michael explains that splitting components of a Driving Key almost always results in a Unit of Work problem. If you break apart a Driving Key, you risk losing the integrity of your data relationships. That’s why it’s best to keep all Hub references from the same source together in a Unit of Work Link within the Raw Data Vault.

He emphasizes that although he’s not a fan of Driving Keys (calling it a “love-hate relationship”), there are scenarios where they are necessary, and when they are, you must be careful not to split them.
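The driving-key mechanic can be sketched in a few lines. This is hypothetical data, not real Effectivity Satellite SQL: in a Customer -> Store Link where Customer is the Driving Key, each new load for the same customer supersedes the previous store relationship:

```python
# Hypothetical Link loads: (load_ts, customer, store); Customer is the Driving Key
link_rows = [
    ("2024-01-01", "A", "K"),
    ("2024-03-01", "A", "L"),   # Customer A switched to Store L
    ("2024-02-01", "B", "K"),
]

# For each Driving Key, the most recently loaded relationship is "current" —
# this is the bookkeeping an Effectivity Satellite performs on top of the Link
current = {}
for load_ts, customer, store in sorted(link_rows):
    current[customer] = store   # later loads supersede earlier ones per Driving Key

print(current)  # → {'A': 'L', 'B': 'K'}
```

If the Driving Key’s Hub references were spread across several smaller Links, no single Link would see the supersession, and the effectivity logic above could not be computed reliably.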

The best practice? Keep all Hub references from the same source table together in one Link—the Unit of Work Link.

This approach guarantees that the full relationship is captured as it appeared in the source data. You can always split these relationships later in the Business Vault, where transformations and additional rules can be applied. But the Raw Data Vault should act as your trusted, immutable source of truth.

By maintaining the Unit of Work in your Raw Data Vault, you protect yourself from potential data inconsistencies and ensure you can always rebuild accurate, reliable data structures downstream.

Sometimes, technical limitations force you to split a Link. For instance:

  • Your database system limits the number of Hub references you can include in a composite index.
  • You have descriptive data at different granularities, making a wide Link impractical.

In these cases, Michael suggests a few strategies:

  1. Keep the Unit of Work Link as your base Link.
  2. Create additional Links to capture specific granularity needs (Customer -> Product, Customer -> Store, etc.).
  3. If you split Links, ensure you also capture any missing relationships—for example, if you have Customer -> Store and Customer -> Product, you might also need Store -> Product to maintain the complete relationship picture.

These extra Links allow you to recreate the original dataset accurately if needed, but they add complexity. More Links mean more entities, more joins, and potentially slower query performance. That’s why Michael emphasizes the importance of an efficient model and recommends keeping Links together whenever possible.
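Continuing the earlier hypothetical dataset, the sketch below shows why the easily forgotten third Link (Store -> Product) matters: it constrains the rejoin enough to recover the source in this example. Note the final equality check, which is simply the Jedi Test again; for some datasets even all pairwise Links cannot reproduce the source, so the test must always be run:

```python
source = {("A", "K", "X"), ("A", "L", "Y")}

# All three pairwise Links, including the "missing" Store -> Product relationship
customer_store = {(c, s) for (c, s, p) in source}
customer_product = {(c, p) for (c, s, p) in source}
store_product = {(s, p) for (c, s, p) in source}

# Rejoin on Customer, then keep only combinations confirmed by Store -> Product
reconstructed = {
    (c, s, p)
    for (c, s) in customer_store
    for (c2, p) in customer_product
    if c2 == c
    if (s, p) in store_product
}

print(reconstructed == source)  # → True
```

With only the first two Links, the same rejoin would have produced two false rows; the third Link filters them out here, at the price of an extra entity and an extra join.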

Efficiency Considerations

From a performance perspective, fewer Links with more Hub references often result in faster queries. You reduce the number of joins and simplify your model. Conversely, more Links increase complexity and the potential for errors.

Therefore, the trade-off is clear: Simplicity and efficiency with a Unit of Work Link versus flexibility with additional Links—but at the cost of complexity and higher risk.

Final Thoughts and Recommendations

To summarize:

  • Always aim to keep Hub references from the same source together in a Unit of Work Link in your Raw Data Vault.
  • Perform the Jedi Test to ensure you aren’t losing or falsely creating data relationships.
  • If you must split Links, document and manage them carefully, ensuring you maintain all necessary relationships (even those not obvious at first glance).
  • Save complex splits and transformations for the Business Vault, where they can be more easily corrected without reloading the entire dataset.

At the end of the day, a well-structured Unit of Work ensures your Data Vault remains a solid, reliable foundation for analytics, reporting, and data science initiatives.

Meet the Speaker

Michael Olschimke

Michael has more than 15 years of experience in Information Technology. During the last eight years he has specialized in Business Intelligence topics such as OLAP, Dimensional Modelling, and Data Mining. Challenge him with your questions!
