With the advent of Data Vault 2.0, which adds architecture and process definitions to the Data Vault 1.0 standard, Dan Linstedt standardized the Data Vault symbols used in modeling. Based on these standardized symbols, the Visual Data Vault (VDV) modeling language was developed, which can be used by EDW architects to build Data Vault models. The authors of the book “Building a Scalable Data Warehouse”, who are the founders of Scalefree, required a visual approach to model the concepts of Data Vault in the book. For this purpose, they developed the graphical modeling language, which focuses on the logical aspects of Data Vault. The Microsoft Visio stencils and a detailed white paper are available on www.visualdatavault.com as a free download.
Hubs in Visual Data Vault
Business keys play an important role in every business, because they are referenced by business transactions and relationships between business objects. Whenever a business identifies and tracks business objects, business keys are used throughout business processes. This is one of the reasons why Data Vault is based on the business keys. In Data Vault models, business keys are stored in hub entities. The challenge is to identify the business keys which represent a business object uniquely. That can be just one business key, but also a composite key or a smart key. The first image shows a hub with only one business key attribute:
Here, the attribute Invoice Number is sufficient to identify the invoice. No other attribute is required (such as the invoice year). In other cases, it is not as easy, as the following diagram shows:
In this case, the accountant is identified by a Country Code attribute (such as the ISO2 code) and an Employee Number attribute. One attribute alone would not be sufficient to identify the accountant: the employee number by itself has only a local meaning and might overlap across countries (employee number 10006 might be used in multiple countries and identify a different accountant in each). Therefore, the local key is extended by the country code to uniquely identify the accountant. Note that the country code has to be present in the source data to make this a valid Data Vault model: in the Raw Data Vault, we model the source data, not the model the business would like to have.
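The reasoning above can be checked against sample data. The following is a minimal sketch, not part of the Visual Data Vault notation: the rows, the accountant names, and the helper `duplicate_keys` are hypothetical and only illustrate why the employee number alone fails as a business key while the composite key works.

```python
from collections import Counter

# Hypothetical source rows: (country_code, employee_number, name).
source_rows = [
    ("DE", "10006", "Accountant A"),
    ("US", "10006", "Accountant B"),
    ("DE", "10007", "Accountant C"),
]

def duplicate_keys(rows, key_of):
    """Return the key values that occur in more than one row."""
    counts = Counter(key_of(row) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)

# The employee number alone collides across countries.
local_key_duplicates = duplicate_keys(source_rows, lambda r: r[1])

# Country code plus employee number identifies each accountant uniquely.
composite_key_duplicates = duplicate_keys(source_rows, lambda r: (r[0], r[1]))

print(local_key_duplicates)      # ['10006']
print(composite_key_duplicates)  # []
```

A check like this against real source data is one way to confirm that a candidate business key is unique before modeling the hub.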
Another example extends this concept into a so-called smart key:
Here, the IBAN, which is used to identify bank accounts internationally, consists of four elements:
- Country code
- Checking number
- Account number
- Bank identifier code
To model a smart key (a key that comprises multiple parts or keys), add a smart key to the hub and then add business keys to identify the sections of the smart key. As you can see from the figure above, the logical symbol of a smart key is similar to that of a business key. However, the icons are slightly different and the shape indicates a stack.
In this example, each business key is modeled as an individual attribute in the hub entity. The combination identifies a business object in the business. The checking number is not modeled, because it carries no business value beyond serving as a technical checksum. However, it is not wrong to add it to the model as well.
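To make the sections of the smart key concrete, here is a minimal sketch that splits an IBAN into the four elements listed above. It rests on two assumptions: the fixed positions used here apply to German IBANs only (other countries lay out the bank identifier and account number differently), and the names `IbanParts`, `split_german_iban`, and `checksum_ok` are hypothetical helpers, not part of any Data Vault tooling. The sample value is a commonly published example IBAN.

```python
from typing import NamedTuple

class IbanParts(NamedTuple):
    country_code: str
    check_digits: str       # the "checking number" from the list above
    bank_identifier: str
    account_number: str

def split_german_iban(iban: str) -> IbanParts:
    """Split a German IBAN into its parts (assumed layout: 2 + 2 + 8 + 10)."""
    compact = iban.replace(" ", "").upper()
    if len(compact) != 22 or not compact.startswith("DE"):
        raise ValueError("this sketch only handles 22-character German IBANs")
    return IbanParts(compact[0:2], compact[2:4], compact[4:12], compact[12:22])

def checksum_ok(iban: str) -> bool:
    """Mod-97 check: move the first four characters to the end, map letters
    to numbers (A=10 ... Z=35), and test that the remainder is 1."""
    compact = iban.replace(" ", "").upper()
    rearranged = compact[4:] + compact[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1

parts = split_german_iban("DE89 3704 0044 0532 0130 00")

# Only the parts with business value become business keys in the hub.
hub_business_keys = (parts.country_code, parts.bank_identifier, parts.account_number)
print(hub_business_keys)
```

The check digits are parsed but left out of `hub_business_keys`, which mirrors the modeling choice described above: they can be recomputed from the other parts and serve only as a technical checksum.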