It can be a bit overwhelming for beginners to start using Data Vault 2.0, as well as learning how and where to implement it. It’s important to note that Data Vault 2.0 is often assumed to be only a modeling technique but it encompasses a lot more than that. Not only that, but it is a whole BI solution composed of agile methodology, architecture, implementation, and modeling.
So why start using Data Vault?
- Data Vault 2.0 allows you to build automated loading processes/patterns and generate models very easily
- Platform independence
- Auditability
- Scalability
- Supports ELT instead of ETL processes
Now that we answered the why, you may be wondering what steps are needed to implement Data Vault 2.0 in your project ?
It depends on a lot of factors like your business case, the architecture you want to have in place, how your sources are loaded, the sprint timeline of your project, etc.
Feature by Feature Architecture
One thing is for sure: the architecture should be built vertically, not horizontally. This means not layer by layer but feature by feature.
A common approach here is the Tracer Bullet approach. Based on business value, which is defined by a report, a dashboard or an information mart, the source data needs to be identified, modeled, and loaded through all layers of the architecture.
For example, let’s say the business request was to build a dashboard to analyze the company’s sales:
1. Extract
First thing, we need to extract the data from the source systems and load the data as it is somewhere. In this example, we put it in a Transient Staging Area but you could choose a persistent one in a Data Lake as well.
2. Transform
Next, you should apply some hard rules if necessary, be careful with this as you do not want to make business calculations here, using a transformation tool. There are a lot of different warehouse automation tools that you can choose from: dbt, Coalesce, WhereScape, etc.
3. Load
Load your Raw Stage into the Raw Vault.
4. Model Business requirements
Model the Data Vault entities needed for the business requirement to be fulfilled. If we have some Sales transactions and customers data, for example, we will model a Non-historized Link, also known as Transactional Link, and a Customer Hub, along with any additional Satellites for holding the Customer descriptive data that we want to see in the Sales Dashboard in the end.
5. Apply Business Logic
Next we need some calculations and aggregations to be performed, so we will build some business logic on top of the raw entities, loading it into the business vault.
6. Build an Information Mart
Now, we could directly use the data stored in the Raw and Business Vault into charts/dashboard, but we want to structure the data, so it can be easily read and fetched by business users, so we will build an information mart with a star schema model with a fact table and dimensions.
7. Visualize Data
To build the Sales Dashboard in a BI visualization tool like PowerBI or Tableau, we now fetch directly from the star schema in the information mart, which has all the information we need, using a connection to my data warehouse in our database.
Data Vault 2.0 offers an agile, scalable, and flexible approach to Data Warehousing Automation. As demonstrated in the example, we only modeled the Data Vault tables that were necessary for accomplishing the handed task of building a Sales dashboard. This way you can scale up your business by demand, so you don’t have to figure out and map out the whole enterprise in one go.
The answer to how to implement Data Vault 2.0 can be translated into a simple phrase: Focus on business value!
If you would like to see an explanation of this step by step implementation with some demonstration of actual data using dbt as the chosen transformation tool, check out the webinar recording.
– Barbara Schlottfeldt Maia (Scalefree)
Get Updates and Support
Please send inquiries and feature requests to [email protected].
For Data Vault training and on-site training inquiries, please contact [email protected] or register at www.scalefree.com.
To support the creation of Visual Data Vault drawings in Microsoft Visio, a stencil is implemented that can be used to draw Data Vault models. The stencil is available at www.visualdatavault.com.