Null Business Keys
Data Vault is a methodology used for modeling data in large-scale data warehouse environments. It’s designed to handle rapidly changing data and easily scalable. One of the challenges faced by data modelers is handling null business keys in Data Vault. In this article, we’ll explore how to model links with null business keys, using an example scenario involving the Entity_Roles
table.
In this article:
Understanding the Challenge
The question we’re dealing with here revolves around how to model the Entity_Roles
table when there are null values in the linked data fields. In particular, the table might have records where certain fields like Project
or Task
are null, creating a challenge when attempting to create links between different entities. Let’s break down the situation and find the best way to handle these null values and ensure smooth data modeling.
Context: The Entity_Roles Table
The Entity_Roles
table holds information about the relationship between entities and their roles in different tasks and projects. However, there may be situations where a given record in the table has missing (null) values for certain business keys, such as Project
or Task
.
What Are Business Keys and Links?
Before diving into the solution, let’s first clarify what business keys and links are in Data Vault. A business key is a unique identifier for a specific business object, such as a project or task. Links are used in Data Vault to represent relationships between business keys. The Entity_Roles
table, for example, might act as a link between an entity, its role, and the task or project it’s associated with.
Approaching the Problem: Handling Null Business Keys
The key challenge here is dealing with the null values in the Entity_Roles
table. Null values in business keys can create issues when trying to establish relationships between entities. A typical solution in Data Vault is to replace these null values with zero keys. A zero key represents an unknown business key, allowing the link to still function properly despite missing data.
Option 1: Using Zero Keys
One approach to handling null values is to use zero keys for any missing business keys. For instance, if a Task
or Project
is null, we can assign a zero key to represent the unknown value. The advantage of using zero keys is that it helps maintain query efficiency by ensuring consistent joins, and it allows us to deal with missing data without breaking the link.
Option 2: Splitting the Link into Two Views
Another option could be to pre-stage and create two views to split the Entity_Roles
table: one where Task
is not null, and another where Project
is not null. However, this approach is not recommended because splitting the data may lead to unnecessary complexity and potential inconsistencies in the data. In general, it’s a good practice in Data Vault to avoid splitting data at this level.
Option 3: Handling Multi-Activity Data
If the Entity_Roles
table contains multiple rows for the same relationship (e.g., an entity linked to both a task and a project), you may need to account for multi-activity data. This can be done by using a Multi-Active Satellite, which allows you to capture multiple descriptions of the same link. This ensures that you can store all the variations of a relationship, such as different roles or project-task assignments, in the same link.
Designing the Solution
When it comes to designing the solution, there are a few key design decisions to make:
1. Create a Single Link
Rather than splitting the Entity_Roles
table, it’s generally better to keep it as a single link. This approach allows you to maintain all relationships in one place and makes the system simpler to manage.
2. Using a Multi-Active Satellite
If there are multiple active roles or tasks associated with a single entity, then using a Multi-Active Satellite is a good option. This satellite would store all the variations of the role-task-project relationships, allowing you to track all relevant data points.
3. Handling Null Values with Zero Keys
As mentioned earlier, replacing null values with zero keys is a good practice in Data Vault. This helps maintain referential integrity, ensures that your links remain intact, and avoids the need for complex filtering or splitting of data.
4. Use of CDC Satellites for Intraday Changes
If your data involves intraday changes (i.e., updates that occur within a single day), you may want to use Change Data Capture (CDC) Satellites. These satellites track changes at the record level, allowing you to preserve the history of the relationships between entities, roles, tasks, and projects.
Additional Considerations
There are a few other considerations when modeling links with null business keys:
1. Should I Create a Weak Hub for ID?
No, it’s generally not recommended to create a weak hub for the ID. Instead, treat the ID as a descriptive attribute and store it in the satellite attached to the link.
2. Where Should I Put Descriptive Data?
Descriptive data should go in the satellite. This can include things like roles, entity types, or other attributes that provide more context to the link between entities.
3. Should I Put the Role in the Link?
If the role is an important business key, you could consider putting it in its own hub. However, if the role is just a descriptive attribute, it’s best to store it in the satellite attached to the link. If the role becomes a business key later, you can refactor the design and create a hub for it.
Final Thoughts
Modeling links with null business keys is a common challenge in Data Vault. The key is to handle null values appropriately, whether that’s by using zero keys, leveraging multi-active satellites, or managing intraday changes with CDC satellites. By maintaining a clear and consistent approach to handling these null values and relationships, you can ensure that your Data Vault design is scalable, efficient, and capable of handling complex data scenarios.
Watch the Video
Meet the Speaker

Michael Olschimke
Michael has more than 15 years of experience in Information Technology. During the last eight years he has specialized in Business Intelligence topics such as OLAP, Dimensional Modelling, and Data Mining. Challenge him with your questions!