Modelling Salesforce History Tables in Data Vault

Modelling Salesforce History Tables

Salesforce tracks changes to configured attributes by storing them in history tables. This data, which includes record ID, field name, old and new values, and timestamps, presents a unique challenge for Data Vault modeling. In this article, we’ll explore an optimal way to model this data using Data Vault principles.

In this article:

Understanding Salesforce History Tables
Challenges in Modeling Salesforce History Data
Approach: Multi-Active Satellite
Optimized Approach: Non-Historized Link
Efficient Data Retrieval: Pivoting
Alternative Consideration: JSON Storage
Conclusion
Watch the Video
Meet the Speaker

Understanding Salesforce History Tables

Salesforce allows tracking of specific attribute changes within objects like Contacts. These changes are stored in history tables such as ContactHistory. Each entry logs:

Record ID (e.g., Contact ID)
Field Name
Old Value
New Value
Timestamp

Challenges in Modeling Salesforce History Data

When designing a Data Vault model for this history data, there are key challenges to consider:

Handling multiple changes for the same record within a short time frame
Maintaining referential integrity
Efficiently querying and pivoting data for reporting

Approach: Multi-Active Satellite

A common initial approach is to model the history table as a multi-active satellite attached to a Contact Hub, with the field name as the dependent key. However, this approach has pitfalls:

Duplicates can arise if multiple changes occur for the same field in the same batch
Timestamp-based keys are unreliable due to possible duplicate timestamps

To counter this, a unique sequence number should be assigned in the staging area and used as a dependent key.

Optimized Approach: Non-Historized Link

Instead of a multi-active satellite, a non-historized link can be used to model Salesforce history data more efficiently. Here’s how it works:

Create a non-historized link connecting the Contact and User hubs.
Store change-related attributes (field name, old value, new value, timestamp) directly within this link.
Use the timestamp as an event-based attribute rather than part of the primary key.

This approach avoids the need for complex joins and simplifies querying.

Efficient Data Retrieval: Pivoting

Since history tables are structured in a key-value format, queries often require pivoting. By using database pivot functions, we can restructure the data into a more usable format for reporting without excessive joins.

Alternative Consideration: JSON Storage

Another approach is to store change data as a JSON object in a standard satellite. This method offers flexibility, particularly when dealing with a large number of attributes. However, it complicates querying and should be used only when necessary.

Conclusion

For most cases, a non-historized link is the optimal way to model Salesforce history tables in Data Vault. It simplifies data storage, reduces the need for extensive joins, and enhances query performance. Multi-active satellites are an alternative but require careful handling of duplicate timestamps and field changes.

Watch the Video

Meet the Speaker

Michael Olschimke

Michael has more than 15 years of experience in Information Technology. During the last eight years he has specialized in Business Intelligence topics such as OLAP, Dimensional Modelling, and Data Mining. Challenge him with your questions!

Modelling Salesforce History Tables in Data Vault