
Introduction

The larger a data warehouse project grows, the more people begin to rely on and work with the data it provides. This work could be consuming the data, applying business rules, modeling facts and dimensions, or other typical tasks in a data environment. In a large organization, these users might be scattered across different divisions, and the data they work with might belong to different business domains.

At some point, the entire organization faces the challenge of data sharing and governance guidelines, which might, for example, prohibit users in the sales department from accessing data from the finance department. A data mesh offers a solution that helps organizations deal with these challenges. If you want to learn more about the data mesh, check our recent blog article about Data Vault and data mesh here!

We also offer a webinar on exactly this topic. Don't miss it, and watch the recording for free!

What is dbt Mesh?

dbt Mesh is a recently added feature that makes dbt Cloud work more efficiently with a data mesh approach. The already familiar {{ ref() }} function is no longer limited to models within one dbt project. Instead, it can also refer to models in other dbt projects.

Why would I want to refer to other dbt projects?

Imagine a big organization that uses dbt Cloud for its Data Vault implementation. The project might have 400 sources defined and 2,000 models implemented, and might be actively used by 30 developers. Out of these 30 developers, there might be 5 people working specifically on the Business Data Vault and Information Mart layers for finance-related objects. Another 5 developers work on the same layers, but for sales-related objects.

 

At some point, you might want to avoid finance people messing around with the sales-related dbt models, so you decide to implement a data mesh architecture. This allows the organization to define policies regarding data sharing, data ownership, and other governance measures.

With dbt Mesh, both the sales and the finance team would get their own dbt project. Since both should be based on the same Raw Data Vault, an additional foundational dbt project is created exclusively for staging and Raw Data Vault objects. Both domain-specific dbt projects, sales and finance, can now refer to Raw Data Vault objects inside the foundational dbt project without physically replicating the data.

Scale Up your Data Vault Project - with dbt Mesh

How can I leverage dbt Mesh in a Data Vault powered Data Mesh?

Define Data Contracts

dbt models, or groups of models, can now be configured with data contracts. Inside the already familiar .yml files, models can be made publicly available (within an organization), owners can be assigned, and table schemas can be locked so that a model only builds if its columns and data types match the declared contract.
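
As a rough sketch of what such a contract could look like in a model .yml file: the file path, the Hub name customer_h, and its columns are hypothetical and only serve as an illustration, and the data types depend on your data platform. The access, contract, and constraints keys are dbt's model-level settings.

```yaml
# models/raw_vault/raw_vault.yml (hypothetical path, model, and column names)
version: 2

models:
  - name: customer_h              # hypothetical Hub model
    access: public                # other dbt projects are allowed to reference this model
    config:
      contract:
        enforced: true            # the build fails if columns or types differ from the list below
    columns:
      - name: hk_customer_h       # hash key
        data_type: string         # platform-specific; e.g. varchar on some warehouses
        constraints:
          - type: not_null
      - name: customer_bk         # business key
        data_type: string
      - name: ldts                # load date timestamp
        data_type: timestamp
      - name: rsrc                # record source
        data_type: string
```

With an enforced contract, every column of the model has to be listed with a data type, so consumers in other projects can rely on the shape of the table.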

Create a Foundational dbt project

In a data mesh architecture, the most common way to implement Data Vault 2.0 is to have a commonly shared Raw Data Vault as a foundation, while the Business Vault and Information Marts are divided by business domain. In dbt Mesh, this is reflected in a foundational dbt project that includes all staging and Raw Data Vault objects. Only the Raw Data Vault objects would be configured to be accessible by other dbt projects, since the staging models should not be used outside of Raw Data Vault models.
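
One possible way to express this split is folder-level access in the foundational project's dbt_project.yml. This is a fragment under assumptions: the project name raw_vault_foundation and the folder names staging and raw_vault are hypothetical, other required keys of dbt_project.yml are omitted, and setting access at folder level requires a reasonably recent dbt version.

```yaml
# dbt_project.yml of the foundational project (fragment; hypothetical names)
name: raw_vault_foundation

models:
  raw_vault_foundation:
    staging:
      +access: protected   # dbt's default: referenceable only from within this project
    raw_vault:
      +access: public      # Hubs, Links, and Satellites may be referenced from other projects
```

Because protected is already the default, the staging entry is only there to make the intent explicit. The line that actually opens up the Raw Data Vault is the public setting.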

Add domain-level dbt projects

Based on the foundational Raw Data Vault dbt project, each domain team can now work in its own dbt project. The team accesses the Raw Data Vault via the (extended) {{ ref() }} function and doesn’t have to worry about maintaining these Raw Data Vault objects. Additionally, each team can define which of its artifacts might be useful for other domains and share them via its own data contracts.
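
To illustrate, a domain project declares the foundational project as an upstream dependency in a dependencies.yml file. The project name raw_vault_foundation below is a hypothetical example and has to match the name of the upstream dbt project.

```yaml
# dependencies.yml in the sales project (hypothetical upstream project name)
projects:
  - name: raw_vault_foundation
```

A model in the sales project could then select from {{ ref('raw_vault_foundation', 'customer_h') }}, the two-argument form of ref that names the upstream project first and the model second. Here, customer_h is again a hypothetical Hub that would have to be marked as public in the foundational project.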

Distribute Responsibilities

Typically, a power user does not create Hubs, Links, and Satellites, and it is not their responsibility to ensure a reliable Raw Data Vault to build transformations on. Therefore, it is important to define responsibilities within each dbt project. Objects that are shared outside of a project, in particular, should always have data contracts and defined owners. This ensures that users of these shared objects can rely on them.
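
Ownership can be made explicit with dbt groups. The following is a minimal sketch, assuming a hypothetical group raw_vault_team, a placeholder contact address, and the hypothetical Hub customer_h:

```yaml
# models/raw_vault/_groups.yml (hypothetical path and names)
version: 2

groups:
  - name: raw_vault_team
    owner:
      name: Raw Vault Team                    # placeholder owner
      email: raw-vault-team@example.com       # placeholder address

models:
  - name: customer_h
    group: raw_vault_team   # the group above is the declared owner of this model
    access: public          # shared with other projects, so it should have an owner
```

This way, anyone consuming a shared model from another project can see which team is responsible for it.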

Summary

All in all, dbt Mesh offers a fantastic way to properly implement a true data mesh approach. It is especially relevant when different business domains of one organization work together in dbt to create trustworthy deliverables. In most scenarios, it makes sense to start using dbt Mesh early, even if your project is not very large yet. Having clear responsibilities and data contracts always helps maintain trust and transparency for your data!

 

If you want to see dbt Mesh in action, watch the recording of the webinar covering the powerful combination of Data Vault and dbt Mesh!

- Tim Kirschke (Scalefree)
