Data is an important asset in the decision making process. As we have previously discussed in another post, Data Vault 2.0 is the right choice when the goal of an enterprise data warehouse is to have fully historized and integrated data. Additionally, it is also better suited to instances in which data from many source systems needs to be combined. You can find the previous blog post here.
While Data Vault 2.0 focuses on the “what”, there are many options for the “how” of technically translating a Data Vault model into physical tables and views in the enterprise data warehouse, as well as for orchestrating and loading/processing the procedures. And this is where dbt comes in.
ABOUT DBT
The data build tool (dbt) transforms your data directly in your data warehouse. For reference, dbt is the “T” in ELT. Therefore, dbt assumes that data is already loaded into a database from which the current database can query from. In contrast, ETL extracts data then transforms it and before loading it into the target. With ELT, the data is not transformed as it moves into the data warehouse. (with transform we mean transformation by soft business rules which would change the meaning of the data. Of course we have to make sure that the data fits into the target table (data type adjustments etc.). Here we are talking about “hard rules”.)
Dbt is particularly compatible and useful in cloud DWH solutions such as Snowflake, Azure Synapse Analytics, BigQuery and Redshift and performs transformations and modeling directly on the database in order to take advantage of the performance of these highly scalable platforms.
HOW DBT WORKS
Models and SQL statements can be easily created, tested and managed in dbt itself. A powerful combination of the scripting language, Jinja2, and the all-time classic SQL allows users to build models. The simple interface enables data analysts without engineering know-how to initiate appropriate transformations. Data team workflows become more efficient and cost-effective because of this. Behind dbt sits an open source community that is constantly and passionately developing the tool. As such, dbt is available both as a free, reduced core version and as a comprehensive and flexible cloud version.
What is the place of dbt in Data Vault 2.0?
Part of the Data Vault 2.0 methodology is the model which focuses on how to design the core data warehouse with a scalable solution. The core Data Vault entities are hubs, links and satellites. That said, dbt provides the ability to generate Data Vault models and also allows you to write your data transformations using SQL and code-reusable macros powered by Jinja2 to run your data pipelines in a clean and efficient way.
dbt does not reinvent the world, but when it comes to building a new EDW – especially if it is in the clouds – dbt provides a very helpful basic framework with many important functions for Continuous Integration and Deployment already defined. dbt brings the standards of software development into the world of data transformation. This allows developers to concentrate on the core tasks of data modeling and business logic. Especially, but not only for smaller projects, dbt offers a lightweight and extremely affordable alternative to other data warehouse automation solutions.
– by Ole Bause (Scalefree)
Get Updates and Support
Please send inquiries and feature requests to [email protected].
For Data Vault training and on-site training inquiries, please contact [email protected] or register at www.scalefree.com.
To support the creation of Visual Data Vault drawings in Microsoft Visio, a stencil is implemented that can be used to draw Data Vault models. The stencil is available at www.visualdatavault.com.
Hi Ole,
since you mention dbt in the context of the Data Vault, it’s not clear whether you actually meant dbtvault, which supports the Raw Vault structures, but does not, for example, currently support Azure Synapse Analytics, or whether your intention was to recommend using the generic dbt for generating the Raw Vault structures. Could you clarify?
Thanks,
Gideon
Hi Gideon,
this article just describes how you can benefit from using dbt in your Data Vault project in general.
AutomateDV(formerly dbtvault) is currently not planning to support Synapse in the future and has also some drawbacks e.g. in loading patters that are not consistent with the Data Vault 2.0 standards.
We have released our own package datavault4dbt which is fully compliant with data vault 2.0 standards and also supports more entity types and we are currently working on Synapse support for our package. I can definetely recommend using this package.
If you can’t wait until the support is released you would need to implement everything by your own. Then you should definetely take use of macros to have repeatable patterns to generate your raw vault structures.