Meltano in action
In our last overview, we talked about Meltano and its architecture. Now we would like to illustrate how easily you can use Meltano to create a data integration pipeline.
Before we start, please ensure that you have already installed Meltano on your machine. If you haven’t yet, you can follow Meltano’s official installation guide.
First we will initialize a Meltano project.
Initialize a new project in a directory of your choice by using “meltano init”.
This will create a new directory with, among other things, your “meltano.yml” project file.
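For example (the project name "my-meltano-project" below is just a placeholder, pick any name you like):

```bash
# Initialize a new Meltano project; this creates the project directory
# together with meltano.yml and a few supporting folders.
meltano init my-meltano-project

# All further commands are run from inside the project directory.
cd my-meltano-project
```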
The project folder
The project folder is the single source of truth for all your data needs. It is a plain directory that can be put under version control.
The main part of your project is the meltano.yml project file. This file defines your plugins, pipelines and configuration.
The project and its meltano.yml file can be managed using the Meltano CLI and are straightforward to containerize for Docker or Kubernetes deployments. Note, however, that no plugins or pipeline schedules have been defined yet.
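For reference, a freshly initialized meltano.yml is still almost empty; depending on your Meltano version it looks roughly like this:

```yaml
# Sketch of a freshly initialized meltano.yml; exact contents vary by version.
version: 1
project_id: <generated-uuid>   # generated automatically by "meltano init"
```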
We will do this in the next step.
Adding an extractor and loader
Now, let’s add the pipeline’s components. The first plugin you’ll want to add is an extractor, which will be responsible for pulling data out of your data source.
To find out whether an extractor for your data source is supported out of the box, you can check the Extractors list on MeltanoHub or run “meltano discover extractors”.
We will use the GitLab tap (tap-gitlab) in this example, as it does not require us to create API credentials.
Meltano manages the plugin’s setup and configuration and handles its invocation.
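Adding the extractor is a single command; the exact variant that gets installed depends on the default defined on MeltanoHub:

```bash
# Add the GitLab extractor; Meltano installs it into an isolated
# virtual environment and registers it in meltano.yml.
meltano add extractor tap-gitlab
```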
We will configure the extractor to pull data from the meltano/meltano repository. Additionally, we will restrict it to data from 1 January 2021 onwards and select only the “tags” stream.
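A sketch of the corresponding commands; projects and start_date are tap-gitlab settings, and the select pattern assumes the stream is named tags:

```bash
# Only pull data from the meltano/meltano repository
meltano config tap-gitlab set projects meltano/meltano

# Only extract data from 1 January 2021 onwards
meltano config tap-gitlab set start_date 2021-01-01T00:00:00Z

# Restrict extraction to the "tags" stream (all of its attributes)
meltano select tap-gitlab tags "*"
```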
Data selection this way is much easier than with plain Singer, where you would have to edit the catalog file by hand!
The extractor is now set up.
Now we will add a loader to store the data in CSV files and define our destination path.
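A sketch of the commands, assuming the target-csv variant that exposes a destination_path setting; the output directory name is just an example:

```bash
# Add the CSV loader
meltano add loader target-csv

# Create the output directory up front (it is not created automatically)
mkdir output

# Point target-csv at that directory
meltano config target-csv set destination_path output
```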
Remember that the destination directory needs to be created beforehand, as it will not be created automatically.
This is what our meltano.yml looks like:
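The following is a sketch; the exact variant and pip_url entries written by “meltano add” depend on your Meltano version and the plugin variants chosen on MeltanoHub:

```yaml
version: 1
plugins:
  extractors:
  - name: tap-gitlab
    # variant and pip_url are filled in by "meltano add" and may differ
    config:
      projects: meltano/meltano
      start_date: '2021-01-01T00:00:00Z'
    select:
    - tags.*
  loaders:
  - name: target-csv
    config:
      destination_path: output
```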
Instead of using the CLI, we can make changes directly in the YAML.
Alternatively, it is also possible to configure the extractors and loaders in the Meltano UI:
Start the Meltano UI web server using meltano ui. Unless configured otherwise, the UI will now be available at http://localhost:5000.
Run a pipeline
Now it’s time to run a pipeline. To run a one-time pipeline, we can just use the meltano elt command:
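A sketch of the run, assuming a Meltano version that supports the --job_id flag; the job ID gitlab-to-csv is just an example name used to track the pipeline’s state between runs:

```bash
# Run the extractor and loader once; the job ID ties the run to stored
# state, so subsequent runs only pull data that is new since the last run.
meltano elt tap-gitlab target-csv --job_id=gitlab-to-csv
```

Once the run has finished, the extracted tag data ends up as a CSV file in the output directory.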
Result
And we are done! It took us just ten commands to create a data integration pipeline.
Summary
It’s clear why Meltano is a great choice for building your data platform: it’s powerful but simple to maintain and its open-source model makes it flexible, budget-friendly and reliable.
- by Ole Bause (Scalefree)
Get updates and support
Please send inquiries and feature requests to [email protected].
For inquiries about Data Vault training and on-site training, please contact [email protected] or register at www.scalefree.com.
To support the creation of Visual Data Vault drawings in Microsoft Visio, a stencil has been implemented that can be used to draw Data Vault models. The stencil is available at www.visualdatavault.com.