Satellite splitting criteria play a vital role in a satellite’s structure. It is not recommended to store all descriptive data related to a business key in a single satellite structure. Instead, raw data should be split by certain criteria.
In general, we have defined the following types of satellite splits:
- Splitting by source system
- Splitting by rate of change
Additionally, we have defined two more types of splits as mentioned below:
- Splitting by level of security and by the level of privacy
- Business-driven split
A satellite split by source system is strongly recommended because it prevents two issues when loading data into the enterprise data warehouse. First, if two source systems with different relational structures are loaded into the same satellite entity, a transformation of the structure might be required. However, structural transformation sooner or later requires business logic, and that should be deferred to the information delivery stage to support fully auditable environments as well as the application of multiple business perspectives.
The second issue is that loading two sources into the same satellite entity leads to the so-called “flip-flop effect”: if the two systems store contradicting data (e.g. because they are out of sync) about the business key being described, each load finds a delta against the row written by the other system. The satellite then absorbs two deltas per day, capturing both descriptions, which leads to high storage consumption and data inconsistencies. Splitting a satellite by source system therefore helps to reduce storage consumption drastically.
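The flip-flop effect can be illustrated with a minimal sketch. It assumes a simplified, in-memory satellite with hash-based delta detection; the function names, the business key `C-1001`, and the sample data are hypothetical and not taken from any specific implementation.

```python
import hashlib


def hash_diff(attributes):
    """Hash the descriptive attributes so two payloads can be compared cheaply."""
    payload = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def load_satellite(satellite, business_key, load_date, source, attributes):
    """Insert a row only if the payload differs from the latest row of the key."""
    rows = satellite.setdefault(business_key, [])
    new_hash = hash_diff(attributes)
    if rows and rows[-1]["hash_diff"] == new_hash:
        return False  # no delta detected, nothing is written
    rows.append({"load_date": load_date, "source": source,
                 "hash_diff": new_hash, **attributes})
    return True


# Hypothetical out-of-sync descriptions of the same customer.
crm = {"city": "Hannover"}
erp = {"city": "Hanover"}

shared = {}                     # one satellite fed by both sources
split = {"crm": {}, "erp": {}}  # one satellite per source system

for day in range(1, 6):  # five daily loads, the source data never changes
    for source, attrs in (("crm", crm), ("erp", erp)):
        load_satellite(shared, "C-1001", day, source, attrs)
        load_satellite(split[source], "C-1001", day, source, attrs)

shared_rows = len(shared["C-1001"])
split_rows = sum(len(sat["C-1001"]) for sat in split.values())
print(shared_rows, split_rows)  # 10 2
```

Although neither source changes during the five days, the shared satellite stores ten rows because each load overwrites the other system's latest description, while the split satellites store one row each.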
Splitting satellites by source system also improves parallelism, because data from multiple source systems can be loaded in parallel. In addition, it allows real-time data to be integrated without having to combine it with raw data from a batch load.
In addition to the split by source system, storage consumption can be further reduced by splitting the satellite by rate of change.
Figure: Multiple satellites (split by source system) depend on a hub
To split a satellite by rate of change, one should determine how frequently each attribute changes and group the attributes into those that never change, sometimes change, or change very frequently. Because a change to any attribute writes a new satellite row containing all of the satellite's attributes, separating the quickly changing attributes from the slowly changing ones prevents unchanged values from being stored again every time a quickly changing attribute changes.
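The storage effect can be estimated with a small simulation. This is a rough sketch under simplifying assumptions: it counts stored attribute values only and ignores per-row metadata such as hash keys and load dates, and the attribute names and snapshots are hypothetical.

```python
def load(columns, snapshots):
    """Return the rows a satellite with these columns would store.

    A new row is written whenever at least one of its columns changed.
    """
    rows = []
    for snapshot in snapshots:
        row = tuple(snapshot[column] for column in columns)
        if not rows or rows[-1] != row:
            rows.append(row)
    return rows


def stored_cells(satellites, snapshots):
    """Count attribute values stored across all given satellites."""
    return sum(len(load(columns, snapshots)) * len(columns)
               for columns in satellites)


# Ten daily snapshots: name never changes, last_login changes every day.
snapshots = [
    {"name": "Ada", "address": "Main St 1", "last_login": f"2024-01-{day:02d}"}
    for day in range(1, 11)
]
# The address changes once, starting with the sixth snapshot.
snapshots[5:] = [dict(s, address="Oak Ave 7") for s in snapshots[5:]]

single = [("name", "address", "last_login")]
by_rate = [("name",), ("address",), ("last_login",)]

single_cells = stored_cells(single, snapshots)
split_cells = stored_cells(by_rate, snapshots)
print(single_cells, split_cells)  # 30 13
```

In the single satellite, the daily change of `last_login` forces the unchanged `name` and `address` to be stored again on every load. In a real system each additional satellite also adds its own row metadata, so the actual saving depends on the ratio of payload to metadata.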
The satellite split by source system and the technical split by rate of change are common and recommended practices for splitting descriptive attributes, although the split by rate of change is not required when page compression is available in the database. However, we have decided to split raw data even further, both technically and by business meaning.
In our approach, the security levels range from:
- The lowest confidentiality level, levels 0 and 1: no security measures required, for public data
- Limited access for certain internal parties, levels A, R, C and F
- To the highest confidentiality level, level S: top secret
Moving forward, the business-driven satellite split distributes raw data into different satellite tables according to the business meaning of the data content.
For this purpose, we have defined several classifications, to name just a few: "Contact" for contact data and "Activity" for data that tracks users' interactions with the source record.
In addition, data modelers can define custom business classifications for certain unique business meanings in business objects.
For example, all data attributes of an application installed on the CRM platform Salesforce are often stored within a single satellite structure. The main reason behind business-driven satellites is that we can add or remove apps while reducing the impact of structural changes on the EDW.
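A business-driven split can be sketched as a simple routing step. The attribute names, the mapping, and the fallback classification `general` below are hypothetical and only illustrate the idea; only the classifications "contact" and "activity" come from the text above.

```python
from collections import defaultdict

# Hypothetical mapping of source attributes to business classifications.
CLASSIFICATION = {
    "email": "contact",
    "phone": "contact",
    "last_viewed_date": "activity",
    "last_modified_by": "activity",
}


def split_by_classification(record, classification, default="general"):
    """Group the attributes of one source record by business classification.

    Attributes without a mapping fall into the assumed `default` group.
    """
    payloads = defaultdict(dict)
    for attribute, value in record.items():
        payloads[classification.get(attribute, default)][attribute] = value
    return dict(payloads)


record = {
    "email": "ada@example.com",
    "phone": "+49 511 000000",
    "last_viewed_date": "2024-01-05",
    "industry": "Retail",
}
payloads = split_by_classification(record, CLASSIFICATION)
for name, payload in payloads.items():
    print(name, sorted(payload))
```

Each resulting payload would be loaded into its own satellite, so adding or removing a classification only touches the mapping and the corresponding satellite table.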
Below is an example of a satellite name in our internal EDW solution:
kunde_kontakt_sfdc_lcp_s
The above is a satellite of the business object Customer and holds customers’ contact information from the source system Salesforce. As the lcp part of the name indicates, its content has a low rate of change, a security level of C and contains personal data.
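The naming pattern can be made explicit with a small parser. Note that the layout used here (business object, classification, source system, three single-letter flags, and the suffix `s`) is inferred from this single example and is an assumption, not a documented convention; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SatelliteName:
    business_object: str
    classification: str
    source_system: str
    rate_of_change: str
    security_level: str
    privacy: str


def parse_satellite_name(name):
    """Parse a name shaped like <object>_<class>_<source>_<flags>_s.

    Assumes that no component contains an underscore and that the flags
    are exactly three characters: rate of change, security level, privacy.
    """
    parts = name.split("_")
    if len(parts) != 5 or parts[-1] != "s":
        raise ValueError(f"unexpected satellite name: {name!r}")
    business_object, classification, source_system, flags, _ = parts
    if len(flags) != 3:
        raise ValueError(f"expected three flag characters, got {flags!r}")
    return SatelliteName(business_object, classification, source_system,
                         flags[0], flags[1], flags[2])


parsed = parse_satellite_name("kunde_kontakt_sfdc_lcp_s")
print(parsed)
```

Encoding the split criteria in the name lets loading and access-control tooling derive the source system, security level and privacy flag without a separate lookup, provided the convention is applied consistently.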
Summary
This blog post introduced the Data Vault entity Satellite and outlined our basic recommendations on how to split a satellite in different ways, along with the benefits of each. We have also recommended additional ways to split a satellite based on source data, which we follow at Scalefree. In our next blog post, we will take a deeper look at satellite modelling with regard to structural changes made in the source system.
– by Samatha Balla (Scalefree)
Get Updates and Support
Please send inquiries and feature requests to [email protected].
For inquiries about Data Vault training and on-site training, please contact [email protected] or register at www.scalefree.com.
To support the creation of Visual Data Vault drawings in Microsoft Visio, a stencil has been implemented that can be used to draw Data Vault models. The stencil is available at www.visualdatavault.com.