In our initial post of this series, we delved into the creation of our staging layer using DataVault4dbt, an open-source package designed for Data Vault 2.0 within dbt. In this installment, we embark on the journey to construct our first standard entities in the Raw Vault, including Hubs, Links, and Satellites.
As in our previous post, we recommend staying up-to-date with the latest changes and adaptations in the DataVault4dbt package by referring to the project’s GitHub repository Wiki.
Before We Start
Before we get started, ensure that you have the DataVault4dbt package correctly installed in your packages.yml file and that you’ve executed dbt deps.
For this tutorial, we’ll be using the TPCH Snowflake Sample Data. Moreover, we assume you’ve already established your staging model, which includes the calculation of hashkeys and hashdiffs. Here’s a snippet from our staging model’s configuration, which we’ll need later when creating the Raw Vault entities:
stg_orders
A. Standard Hub
Hubs are constructed based on a unique list of business keys, making their configuration relatively straightforward. In this example, we’ll be creating the Hub for orders:
order_h
- hashkey: the hashkey name in the staging model
- business_keys: name of the business key used as input for the previously mentioned hashkey
- source_models: name of our staging model
B. Standard Link
Link models establish connections between business keys. In our case, we’ll create a connection between the previously formed Order Hub and the Customer Hub:
order_customer_1
- link_hashkey: hashkey of the Link, generated using the foreign keys from the Hubs in the staging layer
- foreign_haskeys: a list of foreign hashkeys to be included in our link
- source_models: name of our staging model
C. Standard Satellite Version 0
Following Data Vault 2.0 standards, Version 0 Satellites are created as incremental tables. In our example, the Satellite will be connected to the previously generated Order Hub:
order_0s
- parent_hashkey: name of the parent entity’s hashkey, in our case, the Order Hub
- src_hashdiff: hashdiff already calculated on the staging model
- src_payload: original columns used in the hashdiff calculation
- source_model: name of our staging model
D. Standard Satellite Version 1
Additionally, the Version 1 Satellite is a virtually generated entity created on top of our Version 0 Satellite. Beyond the materialization type, the main difference with the V0 Satellite is the introduction of a new column for calculating the load end date. The load end date will be useful for us downstream when dealing with PIT tables in the Business Vault.
order_s
- sat_v0: name of the related Version 0 Satellite
- hashkey: hashkey name of the parent entity, in our case, the order Hub
- hashdiff: hashdiff already calculated on the staging model
- ledts_alias: name of the load end date column to be generated
- add_is_current_flag: when true, it generates a new column flagging the last loaded rows based on the load end date
Final Remarks
In this journey through the creation of Raw Vault standard entities, we’ve established a strong foundation for our Data Vault 2.0 architecture. By utilizing DataVault4dbt within dbt, we’ve simplified the development of Hubs, Links, and Satellites. These fundamental building blocks are the cornerstone of a robust and scalable data warehousing solution.
As we progress in this series, we’ll continue to explore advanced concepts and delve into the intricacies of Data Vault modeling, preparing us to unlock the full potential of our data.
– Hernan Revale (Scalefree)
Get Updates and Support
Please send inquiries and feature requests to [email protected].
For Data Vault training and on-site training inquiries, please contact [email protected] or register at www.scalefree.com.
To support the creation of Visual Data Vault drawings in Microsoft Visio, a stencil is implemented that can be used to draw Data Vault models. The stencil is available at www.visualdatavault.com.