Understanding Data Vault Mixed Models: Integrating Non-Data Vault Entities in the Business Vault
Data Vault architecture is a widely used methodology in data warehousing, providing a highly adaptable model for managing complex data environments. It organizes data primarily in three core components: Hubs, Links, and Satellites. These elements support business keys, relationships, and descriptive data to create a comprehensive data structure within the Raw Data Vault layer. However, as with many methodologies, real-world data often introduces elements outside the strict boundaries of this structure, sparking questions around flexibility.
This article explores the concept of a “mixed model” in Data Vault, where non-Data Vault entities coexist with Raw Data Vault components, and examines how they might be integrated within a Business Vault structure. The purest Data Vault models focus on auditability and lineage and lend themselves to automation, but mixed models can sometimes be practical if managed thoughtfully. So, is it permissible to mix non-Data Vault entities with the Raw Data Vault within a Business Vault? Let’s dive into this topic.
What is a Data Vault Mixed Model?
A “mixed model” in Data Vault refers to a scenario where traditional Data Vault structures (Hubs, Links, and Satellites) are used alongside other non-Data Vault tables or entities. In essence, while most data resides in the structured Raw Data Vault, there are other data components within the same database that do not conform to Data Vault architecture. This raises questions about integrating these disparate data types in the Business Vault.
The Business Vault is designed to serve as a refined layer on top of the Raw Data Vault. It enhances the raw data with business logic and transformations so that the results are ready for reporting and analysis. In scenarios where a mixed model is necessary, the goal is often to leverage existing non-Data Vault tables to derive business insights while minimizing disruption to the original data model.
Can You Integrate Non-Data Vault Entities with Raw Data in the Business Vault?
According to Data Vault principles, the ideal approach is to structure all data as Hubs, Links, and Satellites to ensure consistency, auditability, and lineage. However, a mixed model approach can sometimes be necessary. For instance, you may have a database that combines data stored in the Raw Data Vault with tables or entities that don’t follow Data Vault structures. So, is it allowed?
The short answer is yes, you can technically integrate non-Data Vault entities within the Business Vault, but it comes with caveats. Here’s a deeper look at the implications:
- Temporary Solutions Only
Mixing non-Data Vault data with Raw Data Vault entities is generally seen as a temporary solution. It may help in quickly bridging data that doesn’t yet fit into the Data Vault model, allowing for rapid integration. However, over time, this approach can lead to complexity in querying and reduce the consistency that Data Vault offers.
- Impact on Automation and Maintainability
Introducing non-standard tables complicates automation within the Business Vault. Data Vault design leverages automation tools like dbt, WhereScape, and VaultSpeed, among others. These tools facilitate a streamlined workflow in Data Vault implementations by allowing for automated lineage, auditing, and data transformations. When introducing non-Data Vault entities, the automation capabilities are hindered, requiring custom scripts or queries that deviate from standard Data Vault patterns.
- Jeopardizing Auditability and Lineage
One of Data Vault’s strongest value propositions is its focus on data lineage and auditability. In a mixed model, these aspects may be compromised. Without adhering to the structure of Hubs, Links, and Satellites, it becomes challenging to track data history, maintain version control, and capture all changes comprehensively. For organizations that rely on these features for regulatory or quality purposes, compromising lineage may be a serious drawback.
- User Mart as an Alternative
An alternative approach is to build a “User Mart” for ad hoc or analytical queries that combine data from the Raw Data Vault with non-Data Vault tables. This User Mart allows users to query both Raw Data Vault and external entities without disrupting the core Business Vault structure. This approach is particularly useful when users have specific reporting or analytical requirements that may not require full Data Vault transformation.
- Pragmatic Approach: Virtual Hubs and Links
A practical solution in Data Vault projects is to create “virtual” Hubs, Links, and Satellites for non-standard tables, which serve as placeholders within the Raw Data Vault structure. This approach allows for quick integration while maintaining some level of standardization. For example, if there’s a reference table with country names and codes, you might create a virtual Hub for the country and map descriptive details as a virtual Satellite. This doesn’t achieve full lineage but can serve as a bridge until a proper Data Vault structure can be implemented.
- Reference Tables and Non-Critical Data
In scenarios where data like reference tables (e.g., country codes, zip codes) doesn’t require full lineage or version tracking, a flat and wide reference table can be used. If a reference Hub and Satellite are unnecessary, keeping the data simple with a primary key and descriptive columns is often sufficient. This approach can work well for non-essential data, where maintaining Data Vault-style rigor may not be worth the effort.
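The country reference table example above can be sketched with database views. The snippet below is a minimal illustration, not a definitive implementation: it uses Python's built-in sqlite3 module as a stand-in for the warehouse, and the names ref_country, v_hub_country, v_sat_country, and record_source are hypothetical. For brevity it uses the business key (the country code) as the join key and leaves out hash keys and load dates, which a real implementation would typically add.

```python
import sqlite3

# Hypothetical names throughout; SQLite stands in for the real warehouse.
VIRTUAL_COUNTRY_VAULT = """
-- Flat, non-Data Vault reference table: primary key plus descriptive columns.
CREATE TABLE ref_country (
    country_code TEXT PRIMARY KEY,
    country_name TEXT NOT NULL
);

-- Virtual Hub: exposes only the business key.
CREATE VIEW v_hub_country AS
SELECT country_code, 'ref_country' AS record_source
FROM ref_country;

-- Virtual Satellite: exposes the descriptive attributes for that key.
CREATE VIEW v_sat_country AS
SELECT country_code, country_name, 'ref_country' AS record_source
FROM ref_country;
"""


def build_virtual_country_vault(conn: sqlite3.Connection) -> None:
    """Create the reference table, the two views, and a few sample rows."""
    conn.executescript(VIRTUAL_COUNTRY_VAULT)
    conn.executemany(
        "INSERT INTO ref_country VALUES (?, ?)",
        [("DE", "Germany"), ("US", "United States")],
    )
    conn.commit()


def country_names(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """Query the views the same way a physical Hub and Satellite would be joined."""
    return conn.execute(
        """
        SELECT h.country_code, s.country_name
        FROM v_hub_country AS h
        JOIN v_sat_country AS s ON s.country_code = h.country_code
        ORDER BY h.country_code
        """
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_virtual_country_vault(connection)
    print(country_names(connection))
```

Because the views hold no data of their own, they carry no history: a change to ref_country simply replaces what the virtual Satellite shows. That is the lineage gap mentioned above, and the reason this pattern works as a bridge and not as a final design.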
Strategies for Long-Term Success with a Mixed Model
If you decide to proceed with a mixed model, it’s crucial to plan for a future transition toward a fully Data Vault-compliant design. Here are some tips:
- Prioritize Refactoring Non-Data Vault Entities
Establish a clear roadmap for converting non-Data Vault tables into Hubs, Links, and Satellites over time. This phased approach enables you to work within existing constraints while planning for a more robust and compliant Business Vault.
- Minimize Technical Debt
Track instances of non-Data Vault elements within your data ecosystem and treat them as “technical debt” to be managed and resolved in the long term. This keeps you aware of areas where auditability or automation might be compromised.
- Use Metadata-Driven Automation
Employ metadata-driven automation tools as much as possible to simplify future integrations and transitions. These tools enable automated data processing across the Data Vault pipeline, making it easier to add and transform new data sources into compliant Data Vault structures.
- Implement Strict Governance for User-Generated Data
In cases where users introduce their own data models within the User Mart or Business Vault, set governance policies to standardize data usage and maintain some level of alignment with Data Vault patterns. These policies can mitigate risks related to data quality and ensure that non-Data Vault data remains manageable.
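Tracking non-Data Vault tables as technical debt can start with a simple inventory built from the database catalog. The following sketch assumes a naming convention in which Data Vault tables are prefixed with hub_, link_, or sat_; that convention, the table names, and the use of SQLite's sqlite_master catalog are assumptions made for illustration. Other platforms expose a similar catalog, for example through information_schema.

```python
import sqlite3

# Assumed naming convention for this sketch; adjust to your own standards.
DATA_VAULT_PREFIXES = ("hub_", "link_", "sat_")


def find_non_data_vault_tables(conn: sqlite3.Connection) -> list[str]:
    """Return user tables whose names do not follow the assumed prefixes."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [
        name
        for (name,) in rows
        if not name.startswith("sqlite_")  # skip SQLite's internal tables
        and not name.lower().startswith(DATA_VAULT_PREFIXES)
    ]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    # Hypothetical mixed model: three Data Vault tables and two others.
    for table in (
        "hub_customer",
        "link_customer_account",
        "sat_customer_details",
        "ref_country",
        "tmp_marketing_segments",
    ):
        connection.execute(f"CREATE TABLE {table} (id TEXT)")
    for name in find_non_data_vault_tables(connection):
        print(f"technical debt: {name}")
```

A list like this can feed the refactoring roadmap: each entry is either scheduled for conversion or consciously accepted as a flat reference table.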
Practical Example of a Mixed Model in Action
Consider a financial services organization that maintains a Raw Data Vault with transaction data but also has a separate schema for customer reference tables, such as customer demographics and location details. Rather than directly integrating these tables into the Business Vault, the organization could create virtual Hubs and Links that connect customer IDs and locations to transactions. This allows them to continue working within the Raw Data Vault framework while planning to remodel the reference tables in alignment with Data Vault standards.
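A rough sketch of this scenario follows. Every table, view, and column name here is hypothetical, SQLite stands in for the actual platform, and business keys are used directly as join keys where a real model would typically use hash keys. The physical Hub and Link represent the Raw Data Vault; the location Hub and the customer-to-location Link are views over the non-Data Vault reference table.

```python
import sqlite3

# Hypothetical schema; SQLite stands in for the real warehouse.
SCHEMA = """
-- Raw Data Vault (physical, simplified).
CREATE TABLE hub_customer (customer_id TEXT PRIMARY KEY);
CREATE TABLE link_customer_transaction (customer_id TEXT, transaction_id TEXT);

-- Non-Data Vault reference table kept outside the vault.
CREATE TABLE ref_customer_location (customer_id TEXT, city TEXT);

-- Virtual Hub for locations, derived from the reference table.
CREATE VIEW v_hub_location AS
SELECT DISTINCT city FROM ref_customer_location;

-- Virtual Link between customers and locations.
CREATE VIEW v_link_customer_location AS
SELECT customer_id, city FROM ref_customer_location;
"""


def build_example(conn: sqlite3.Connection) -> None:
    """Create the schema and load a few made-up sample rows."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO hub_customer VALUES (?)", [("C-1",), ("C-2",)])
    conn.executemany(
        "INSERT INTO link_customer_transaction VALUES (?, ?)",
        [("C-1", "T-100"), ("C-2", "T-101"), ("C-1", "T-102")],
    )
    conn.executemany(
        "INSERT INTO ref_customer_location VALUES (?, ?)",
        [("C-1", "Berlin"), ("C-2", "Hamburg")],
    )
    conn.commit()


def transactions_with_city(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """Join physical vault tables to the virtual Link and Hub."""
    return conn.execute(
        """
        SELECT t.transaction_id, l.city
        FROM hub_customer AS h
        JOIN link_customer_transaction AS t ON t.customer_id = h.customer_id
        JOIN v_link_customer_location AS cl ON cl.customer_id = h.customer_id
        JOIN v_hub_location AS l ON l.city = cl.city
        ORDER BY t.transaction_id
        """
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_example(connection)
    for row in transactions_with_city(connection):
        print(row)
```

Queries written against the views keep the same join shape they would have against physical Hubs and Links, so replacing the views with real tables later should not require rewriting downstream logic.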
Another example might involve a large retail company where user-generated data models in the User Mart are frequently used to support marketing analysis. Here, the organization could implement a temporary mixed model that accommodates fast-paced analysis while planning for a phased migration to Data Vault structures over time.
Conclusion: Balancing Flexibility with Data Vault Integrity
While a mixed model is not ideal within Data Vault architecture, it can serve as a temporary, pragmatic solution when there’s an immediate need to integrate non-Data Vault entities. Virtual Hubs and Links, User Marts, and strict governance policies can help manage the complexity introduced by non-standard tables. However, organizations should prioritize migrating all data into the Data Vault model over time to preserve the long-term benefits of auditability, lineage, and automation that Data Vault offers.
In the end, remember that the strength of Data Vault lies in its flexibility, auditability, and scalability. Introducing non-Data Vault tables as a quick fix is feasible, but for sustainable and reliable insights, a fully Data Vault-compliant model remains the optimal choice.
Meet the Speaker
Michael Olschimke
Michael has more than 15 years of experience in Information Technology. During the last eight years he has specialized in Business Intelligence topics such as OLAP, Dimensional Modelling, and Data Mining. Challenge him with your questions!