In today’s digital age, GDPR compliance is a crucial aspect for any organization dealing with personal data. With the rise of data warehousing and advanced modeling solutions like Data Vault 2.0 (DV 2.0), questions often arise about how to handle Personally Identifiable Information (PII) within these frameworks. This article addresses some common concerns and provides practical recommendations for ensuring GDPR compliance in data warehouses.



Understanding the Challenge

GDPR mandates that personal data must be handled with the utmost care, ensuring individuals’ privacy and security. In the context of data warehousing, this often translates to managing business keys that might contain PII. Let’s dive into some specific questions raised around this topic:

  1. How should activity history be managed when the main hub contains a PII business key?
  2. Is it best practice to use hashed business keys in link tables to improve load performance?
  3. Should artificial keys originate from each business domain, and how should they be managed if not?

Question #1: Managing Activity History with PII Business Keys

The Problem

In a typical data warehouse model, customer records might include PII, such as social security numbers or tax IDs. According to GDPR, it’s crucial that activity history is not traceable back to the individual once they exercise their right to be forgotten.

The Solution

One effective approach is to split descriptive attributes into different satellites: one for personal data and another for non-personal data. When a deletion request is made, only the personal satellite needs to be purged. The non-personal satellite keeps its history, which no longer carries personal attributes, so the dataset stays intact while the request is honored.
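The split can be sketched with two satellite tables that share a hash key. This is a minimal illustration using Python's built-in `sqlite3` module, not a reference implementation; the table names, column names, and sample values (`sat_customer_personal`, `sat_customer_activity`, `customer_hk`, `hk_001`) are assumptions made for the example. It covers only the satellites; how the hub's business key is handled is a separate concern.

```python
import sqlite3


def build_demo_vault() -> sqlite3.Connection:
    """Create an in-memory database with two satellites (hypothetical schema)."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE sat_customer_personal (
            customer_hk TEXT, load_date TEXT, full_name TEXT, email TEXT);
        CREATE TABLE sat_customer_activity (
            customer_hk TEXT, load_date TEXT, segment TEXT, order_count INTEGER);
        """
    )
    # Personal attributes live only in the personal satellite.
    con.execute(
        "INSERT INTO sat_customer_personal VALUES (?, ?, ?, ?)",
        ("hk_001", "2024-01-01", "Jane Doe", "jane@example.com"),
    )
    # Non-personal activity history lives in its own satellite.
    con.execute(
        "INSERT INTO sat_customer_activity VALUES (?, ?, ?, ?)",
        ("hk_001", "2024-01-01", "retail", 12),
    )
    con.commit()
    return con


def forget_customer(con: sqlite3.Connection, customer_hk: str) -> int:
    """Purge only the personal satellite; return the number of rows removed."""
    cur = con.execute(
        "DELETE FROM sat_customer_personal WHERE customer_hk = ?", (customer_hk,)
    )
    con.commit()
    return cur.rowcount
```

Calling `forget_customer(con, "hk_001")` removes the personal rows and leaves the activity rows untouched, which is the behavior the split is meant to provide.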


Question #2: Using Hashed Business Keys in Link Tables

The Problem

Hashing business keys is often recommended in DV 2.0 to improve load performance. However, directly using business keys in link tables can pose a challenge, especially when those keys contain PII.

The Solution

In DV 2.0, it is standard practice to use hashed values of business key components rather than the business keys themselves. Fixed-length hash keys are typically faster to load and join than variable-length business keys, and they keep the raw key values out of link tables. Here’s how it works:

  1. Hash the Business Key: Use a cryptographic hash function (e.g., SHA-256) to convert the business key into a hashed key.
  2. Use Hashed Keys in Link Tables: The hashed key then serves as the foreign key in link tables, ensuring that PII is not directly exposed.
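The two steps above can be sketched with Python's standard `hashlib` module. This is an illustration under stated assumptions rather than a prescribed implementation: the helper name `hash_key`, the `||` delimiter, the trim-and-uppercase normalization, and the sample keys are all choices made for this example.

```python
import hashlib

DELIMITER = "||"  # assumed separator between business key components


def hash_key(*business_key_parts: str) -> str:
    """Return a SHA-256 hex digest over the normalized business key parts."""
    # Normalize so that " c-1001 " and "C-1001" produce the same hash key.
    normalized = [part.strip().upper() for part in business_key_parts]
    payload = DELIMITER.join(normalized).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Hub hash keys, each derived from a single business key.
customer_hk = hash_key("C-1001")
order_hk = hash_key("SO-77")

# The link row carries only hash keys, never the raw business keys.
link_row = {
    "customer_order_hk": hash_key("C-1001", "SO-77"),
    "customer_hk": customer_hk,
    "order_hk": order_hk,
}
```

The delimiter keeps composite keys such as `("A", "B")` and `("AB",)` from hashing to the same value. Note that a hash of a low-entropy identifier, such as a national ID number, can be recomputed by enumerating candidate values, so a hashed key on its own is usually treated as pseudonymized rather than anonymized data.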

Question #3: Originating and Managing Artificial Keys

The Problem

There’s a debate on whether artificial keys should be generated within each business domain or within the data warehouse itself. This raises concerns about consistency and management, especially if the artificial key must be derived from PII.

The Solution

Artificial keys should ideally be generated within the data warehouse to maintain consistency and control. Here’s the process:

  1. Generate a UUID: Use a randomly generated universally unique identifier (UUID) for the artificial key. Because the value is random, it carries no information about the person, and the chance of duplication is negligible.
  2. Link Artificial Keys to Business Keys: Establish a relationship between the artificial key and the business key within the data warehouse, ensuring that the artificial key is never exposed in operational systems.
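The process above can be sketched as a small in-memory registry built on Python's standard `uuid` module. The class name `KeyRegistry` and its methods are hypothetical, and a real warehouse would keep this mapping in a protected table rather than a dictionary; the sketch only shows the get-or-create logic and how dropping the mapping severs the link to the business key.

```python
import uuid


class KeyRegistry:
    """Map PII business keys to random artificial keys (illustrative only)."""

    def __init__(self) -> None:
        self._by_business_key = {}

    def get_or_create(self, business_key: str) -> str:
        """Return the existing artificial key, or generate a new random UUID."""
        if business_key not in self._by_business_key:
            # uuid4() is random, so the key reveals nothing about the person.
            self._by_business_key[business_key] = str(uuid.uuid4())
        return self._by_business_key[business_key]

    def forget(self, business_key: str) -> bool:
        """Drop the mapping; return True if a mapping existed."""
        return self._by_business_key.pop(business_key, None) is not None
```

Repeated calls to `get_or_create` with the same business key return the same artificial key, so loads stay consistent. After `forget`, rows that carry only the artificial key can no longer be tied back to the business key through this registry.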

Handling Scenarios Without Artificial Keys

If the business domains do not supply artificial keys, the data warehouse should generate them upon ingestion. This method ensures that all keys are managed consistently and securely.


Ensuring Compliance and Security

Satellite Splitting

By splitting satellites into personal and non-personal data, organizations can easily manage deletion requests without compromising data integrity.

Cryptographic Hashing

Using cryptographic hashes of business keys in link tables keeps raw PII out of those tables and supports load performance, both of which help with GDPR compliance.

Artificial Keys Management

Generating artificial keys within the data warehouse ensures consistency and security, reducing the risk of PII exposure.

Regular audits and consultations with legal experts ensure ongoing compliance with GDPR and other regulations. Implementing these practices helps organizations stay ahead of potential compliance issues.


Conclusion

Handling PII in data warehouses requires careful planning and robust solutions. By implementing satellite splitting, cryptographic hashing, and consistent artificial key management, organizations can ensure GDPR compliance while maintaining data integrity and performance. Regular audits and legal advice further bolster these practices, ensuring that data handling processes remain secure and compliant with evolving regulations.

Meet the Speaker

Michael Olschimke

Michael has more than 15 years of experience in Information Technology. During the last eight years he has specialized in Business Intelligence topics such as OLAP, Dimensional Modelling, and Data Mining. Challenge him with your questions!
