Using Multi-Active Satellites the Correct Way

Multi-Active Satellites in Data Vault 2.0

With multi-active satellites, you’re able to store multiple active records for one business key. Depending on how the data arrives from your source, there are different ways to implement multi-activity in Data Vault 2.0. In this post, we’ll explain your options for modeling.

In this article:

What are Multi-Active Satellites?
When to use Multi-Active Satellites?
An Example
Different Options for Implementation
Multi-Active Satellites Rows
Type
Extra Hub
Add the Multi-Active Attribute to the Composite Key
Conclusion

What are Multi-Active Satellites?

Multi-active satellites are similar to standard satellites and their structure. As said before, they store multiple active records per key at a point in time. This exact structure depends on the use case though.
See the example Data Vault model in Figure 1.

Multi-active satellites - Hub with Satellite

Figure 1: Data Vault model

When to use Multi-Active Satellites?

For this reason, multi-active satellites are suitable for multiple active records. So, we use multi-active satellites when the source system delivers them. Furthermore, it is also appropriate for use when the delivery mechanism doesn’t provide enough metadata, for example by using XML.

An Example

Let’s say you get your customer’s data delivered in semi-structured XML files. Within it are several phone numbers in an array. However, the metadata doesn’t provide more information. So the number, type, and order of the records are unknown. When flattening the XML file, you will receive as many records as the number of tuples in the array (figure 2). If you create a standard satellite out of these, only one row per customer business key can be active at the same time. Otherwise, you would run into a primary key violation having the same hash key and ldts.

Figure 2 – customer.xml

Different Options for Implementation

To avoid these problems, and also be able to query the active phone numbers at any point in time, there are several options. These depend on whether a “multi-active key” is available though, in addition to performance issues and the usability of it when you query the data. In the following section, we’ll briefly describe them. That said, in the next post, we’re going to go more into detail.

Multi-Active Satellites Rows

In case you don’t have a multi-active key and you want to use multi-active satellites, you need to create your own one by adding a sub-sequence number per business key in the staging area. In addition to the hash key and ldts, you need to add this to the primary key (figure 3). Here, though, you have to pay close attention if the data changes in the next load. For example, if the order of the phone numbers changes, you cannot assign a specific phone number to a sub-sequence.

Figure 3 – Multi-active satellite with sub-sequence

Type

If it is possible to define type codes and add them to the source feed, the best approach is to add these to the primary key of the satellite. Here, you rarely need to change types as well. In our example, we would define types like ‘home’, ‘business’, ‘cell’, and ‘fax’ as shown in Figure 4. But remember, this must delivered by the source system. If not, you have to use the sub-sequences.

Figure 4 – Type code solution

Another option to avoid changing types requires pivoting in which you create different columns per type. Here, you receive the same granularity as the hub. But, while also considering the need for reengineering in case of changing types of attributes.

Extra Hub

The last way to solve issues regarding multi-active records is to create an extra, weaker hub for the multi-active attribute, in this case, the phone number, and link them to the main hub. This requires that this attribute qualifies as a business key. Here, the number of hubs may increase heavily depending on how many multi-active data sets per business key you have. Furthermore, the new hub might not present as a real business object. So, while this is not the first way to go, it is a feasible solution in some cases.

Add the Multi-Active Attribute to the Composite Key

It’s not recommended to simply add the multi-active key to the business key. This will change the original grain and the semantic meaning. Which makes the results in that the original intention of the hub disappears.
Keep these rules for a Raw Data Vault hub in mind:

Don’t super- or subtype in a hub
Don’t mix different semantic meanings in a hub
Don’t mix granularities in a hub

Conclusion

Implementing multi-active satellites can be used in many data vault projects, especially when working with semi-structured data. That said, the best practice in the case of multi-active records is to work with type codes that are added to the source feed. However, when they aren’t available, you need to use the sub-sequence. Though, as a last solution you can create an extra hub.

In the next post, we’re going to talk more about the advantages and disadvantages of the options and how exactly to implement them. Topics will be delta-checks, ldts, PITs and we’ll talk about a specific hash function to calculate multi-row hashdiffs.

– by Marvin Geerken (Scalefree)

Using Multi-Active Satellites the Correct Way – Part 1

Multi-Active Satellites in Data Vault 2.0

What are Multi-Active Satellites?

When to use Multi-Active Satellites?

An Example

Different Options for Implementation

Multi-Active Satellites Rows

Type

Extra Hub

Add the Multi-Active Attribute to the Composite Key

Conclusion

Build your path to a scalable and resilient Data Platform

Subscribe to our
free monthly newsletter

Leave a Reply Cancel Reply

Subscribe to our
free monthly newsletter

SOLUTIONS

TRAININGS

EVENTS

KNOWLEDGE HUB

CAREERS

COMPANY

Using Multi-Active Satellites the Correct Way – Part 1

Multi-Active Satellites in Data Vault 2.0

What are Multi-Active Satellites?

When to use Multi-Active Satellites?

An Example

Different Options for Implementation

Multi-Active Satellites Rows

Type

Extra Hub

Add the Multi-Active Attribute to the Composite Key

Conclusion

Build your path to a scalable and resilient Data Platform

Subscribe to our free monthly newsletter

You May Also Like

How to Combine Managed Self-Service BI with Data Vault 2.0?

Managed Self-Service BI: Success in Spite of Stringent Regulation

Data Vault 2.0: Best of Breed from Data Warehousing and Data Lakes

Leave a Reply Cancel Reply

Subscribe to our free monthly newsletter

SOLUTIONS

TRAININGS

EVENTS

KNOWLEDGE HUB

CAREERS

COMPANY

Subscribe to our
free monthly newsletter

Subscribe to our
free monthly newsletter