Running Modern ETL Processes with Framework-Based Tools 2/2

Moderne ETL-Prozesse mit Framework-basierten Tools durchführen - Teil 2

Unter Sandra KrennOktober 26, 2021#!29Mo, 26 Feb 2024 10:02:29 +0100+01:002929#29Mo, 26 Feb 2024 10:02:29 +0100+01:00-10Europe/Berlin2929Europe/Berlin202429 26am29am-29Mo, 26 Feb 2024 10:02:29 +0100+01:0010Europe/Berlin2929Europe/Berlin2024292024Mo, 26 Feb 2024 10:02:29 +01000210022amMontag=965#!29Mo, 26 Feb 2024 10:02:29 +0100+01:00Europe/Berlin2#Februar 26th, 2024#!29Mo, 26 Feb 2024 10:02:29 +0100+01:002929#/29Mo, 26 Feb 2024 10:02:29 +0100+01:00-10Europe/Berlin2929Europe/Berlin202429#!29Mo, 26 Feb 2024 10:02:29 +0100+01:00Europe/Berlin2#

In the last Blog-Beitrag, we introduced Singer, the open-source framework, as a powerful tool for ETL processes. This time, we’d like to discuss how you can implement the framework in your own projects.

How to start working with Singer

Starting a test run is rather simple. First, you need to create a python environment, for which step-by-step instructions to do so are available online.

As soon as you’ve done that, it’s time to create your first virtual environment inside python.
Please note before beginning that it’s a best practice to create and use an individual virtual environment for every tap and target. This avoids any conflicts between module requirements for the different modules.

The next step is to install the tap and target you’ve chosen into their corresponding virtual environment. This installation can be performed very easily using a pip install command. This example command installs the tap-salesforce to the load data from your Salesforce account:

After installing the tap, you’ll need to configure the connection to the Salesforce api.
This is done by creating a config.json inside your working directory containing all required configuration parameters. If searching for the file, remember that these can always be found on the tap-specific documentation and mainly consists of the api-credentials.

Singer uses a method called discover to identify all available data objects in your source system. This information is stored either in a catalog.json or in a properties.json, based on the tap you are using. For example, if you are using tap-salesforce, you can create a properties.json with the following command:

Der properties.json is used to select all the objects you want to load with your ETL process. Objects are selected by adding a selected: true row to the metadata section of every object you want to select inside the properties.json.

Now, your tap is ready to go and can either be combined with a target or be run as a standalone process, sending the data to stdout, using the following command:

Please note that the square brackets are not a part of the code and are only used to show the option of piping the source data to a target, in this case the target-csv.

Since the setup of a new tap or target is generally the same, you can easily add more source systems or target databases using the same process.

Though easy to install, your growing Singer instance may affect a few comfort features. A further complication that may arise is the eventual need to schedule your ELT processes to run on a daily basis for consistent staging purposes. Which might require a bit more work to accomplish. Luckily, there also is something for you in that case.

Ausblick

Overall, we consider the Singer framework a great environment for easy-to-use ETL processes. That said, while the process is straightforward, you may run into some problems when working with a larger number of different taps and targets. This is due to the presence of different virtual environments which make the commands grow longer and more complex with time.

Though, there is another open-source tool that uses the Singer standards and even improves upon its ease of use and flexibility.

This tool is called Meltano and we’ll discuss its benefits in the next newsletter!

Stay tuned!

-von Tim Kirschke (Scalefree)

Updates und Support erhalten

Bitte senden Sie Anfragen und Funktionswünsche an [email protected].

Für Anfragen zu Data Vault-Schulungen und Schulungen vor Ort wenden Sie sich bitte an [email protected] oder registrieren Sie sich unter www.scalefree.com.

Zur Unterstützung bei der Erstellung von Visual Data Vault-Zeichnungen in Microsoft Visio wurde eine Schablone entwickelt, mit der Data Vault-Modelle gezeichnet werden können. Die Schablone ist erhältlich bei www.visualdatavault.com.

Beteiligen Sie sich an der Diskussion 2 Comments

ijannick sagt:

Oktober 27, 2021 um 2:33 pm Uhr

Hi Tim,

could you please ealborate a bit more on the ‘T’.

I am having the perspective on ETL processes as the transformation of data into valuable information for, in that wording, information clients or customers. While seeing singer providing ins and outs – taps and targets connected via json interchange, where and how does the actual creation of new information based on sourced data and defined logics takes place. So the ‘and’ between the in and out?

Regards
Ingo

Antworten
- Tim Kirschke sagt:
  
  Februar 10, 2022 um 1:09 pm Uhr
  
  Hi Ingo and thanks for your interesting question!
  
  Singer alone does not support any transformations on the way from the input to the output. But there are Tools like Meltano that wrap around Singer and add the functionality of using dbt for data transformation. You can read more about Meltano in two recent blog articles by a colleague here:
  – Part 1 : https://www.scalefree.com/scalefree-newsletter/open-source-production-grade-data-integration-part-1/
  – Part 2:https://www.scalefree.com/scalefree-newsletter/open-source-production-grade-data-integration-part-2/
  
  These articles give you an introduction of Meltano, but cover the transformation aspect only briefly. But to learn more about that specific part, you can check out the Meltano documentation about their transformation implementation here: https://docs.meltano.com/guide/transformation .
  I hope this helps you, feel free to reach out again!
  
  Greetings, Tim
  
  Antworten

Moderne ETL-Prozesse mit Framework-basierten Tools durchführen - Teil 2

How to start working with Singer

Ausblick

Updates und Support erhalten

Abonnieren Sie unseren
kostenlosen monatlichen Newsletter

Beteiligen Sie sich an der Diskussion 2 Comments

Eine Antwort hinterlassen Antwort abbrechen

Abonnieren Sie unseren
kostenlosen monatlichen Newsletter

LÖSUNGEN

AUSBILDUNGEN

VERANSTALTUNGEN

KNOWLEDGE HUB

KARRIERE

UNTERNEHMEN

Moderne ETL-Prozesse mit Framework-basierten Tools durchführen - Teil 2

How to start working with Singer

Ausblick

Updates und Support erhalten

Abonnieren Sie unseren kostenlosen monatlichen Newsletter

Das könnte Ihnen auch gefallen

Meltano Open Source Datenintegration für die Produktion - Teil 2

Die Potenziale von Microsoft Fabric für Business Intelligence-Lösungen

Erforschung von Datavault4DBT: Eine praktische Serie über das DBT-Paket für Data Vault 2.0 - Band 1: Die Bereitstellungsschicht

Beteiligen Sie sich an der Diskussion 2 Comments

Eine Antwort hinterlassen Antwort abbrechen

Abonnieren Sie unseren kostenlosen monatlichen Newsletter

LÖSUNGEN

AUSBILDUNGEN

VERANSTALTUNGEN

KNOWLEDGE HUB

KARRIERE

UNTERNEHMEN

Abonnieren Sie unseren
kostenlosen monatlichen Newsletter

Abonnieren Sie unseren
kostenlosen monatlichen Newsletter