In a previous blog post, we discussed how to implement ghost records within a Data Vault 2.0 solution. This time around, we’d like to talk about “the other” concept, namely zero keys, which oftentimes are referenced interchangeably with ghost records.
As discussed in the previous part of this series, a ghost record is a dummy record in satellite entities containing default values. Simply put, zero keys are the entry in each hub and link entity that is a counterpart to the satellite’s ghost record containing its hash key. In this manner, the term “zero key” is oftentimes used to describe the ghost record’s hash key, which might show up in other Data Vault entities such as in Point-in-Time (PIT) tables or links. Accompanying the zero hash key is, similar to a ghost record, a default value for the business key . Or, in the case of a composite business key, multiple default values for each of its components.
With the hub and link entry for the zero key in place, each and every entry in its related satellite will then have a parent hash key, avoiding so-called hash key orphans.
In Data Vault 2.0, it is only required to insert a single ghost record to each satellite entity. However, it is possible to have multiple zero keys in place. At Scalefree internally and in many of our projects, we distinguish two types of missing objects through different hub zero keys.
Please note the hash algorithm in use is MD5:
- 00000000000000000000000000000000 (32 times the digit ‘0’) for general “unknown” cases where a business key is missing.
- ffffffffffffffffffffffffffffffff (32 times the letter ‘f’): a dedicated zero key for “erroneous” cases of missing business keys that show.
A good example that calls for the “error” zero key is in an erroneous or broken mandatory object relationship in the source. In that case, the zero key ffffffffffffffffffffffffffffffff will be found in the link entity, indicating an unexpectedly absent hub reference. Bear in mind, should you choose to implement the error zero key, it is not required to insert a ghost record with the error zero key as a parent hash key in satellite entities.
As for the zero key in link entities, it is only necessary to have one entry containing the zero hash key as both link hash key and hub reference.
It is also important to point out that all examples we provide in this blog series involve the hash algorithm MD5, which outputs 32-hexadecimal-digit sequences. For Data Vault 2.0 projects that adopt other hash algorithms, such as SHA256, simply adjust the length of the zero keys we proposed (“0000…” / “ffff…”) to the desired hash output length.
Conclusion
We hope that this blog post helped to clarify the implementation of zero keys in a Data Vault 2.0 solution and the differences between the concepts of ghost records and zero keys. Feel free to share your experience with implementing these concepts in the comments below!
– by Trung Ta (Scalefree)
Get Updates and Support
Please send inquiries and feature requests to [email protected].
For Data Vault training and on-site training inquiries, please contact [email protected] or register at www.scalefree.com.
To support the creation of Visual Data Vault drawings in Microsoft Visio, a stencil is implemented that can be used to draw Data Vault models. The stencil is available at www.visualdatavault.com.
Before hashkeys became a thing I used the Kimball error codes as surrogate keys for precisely this purpose. It allowed us to monitor the movement from records with missing keys, into complete records (a case of late arriving keys, usually) and also enabled us to set reporting on the volume of active satellite records for the “missing key” hub item. This was supposed to stay under a certain level, because otherwise the load had probably not loaded all it should have been loading.
Hello Ronald,
Thank you for sharing the neat idea!
Cheers!
Trung
Thanks for this article. Very informative!
Would like to ask how would you handle incase of composite business key?
Let’s say a composite business key with 3 columns in a hub. Bk1 + Bk2 + Bk3
If the value of all of the 3 business keys comes as Null from source, then ofcourse, i think can apply the replacement for all 3 to be (unknown) or (error).
How about if only part of bk comes null?
Bk1 =123
Bk2=abc
Bk3=NULL
What i can think of is that
Replace Bk3 with (unknown) and put it as part of normal hash process
Any thoughts on this?
Hi Nikunj,
in case of composite business key:
– If the source delivers all BK parts with NULL values, then it is quite clearly that the BK is simply missing, which is a case for ‘(unknown)’ and the 0000 zero key. You can argue that for example a master data record must not miss the BK, and you’d be right – in that case, we recommend writing this record to an Error mart, and you’ll have to handle this record separately. Either way, that record will not be inserted into the target Hub and Satellite, as your loading pattern should deny zero keys from incoming data feed.
– If the source delivers one/some part(s) with NULL values, then the business key is erroneous, meaning even in the source, the business key you’d expect cannot be fully constructed as per your BK definition. In that case, you can ouput ‘(error)’ as default value and the ffff zero key to indicate the fact that the business key was not delivered properly.
Hope this helps – let us know if we can assist you further on this!
Trung Ta