Meltano in action
In our last overview, we talked about Meltano and its architecture. Now we would like to illustrate how easily you can use Meltano to create a data integration pipeline.
Before we start, please ensure that you have already installed Meltano on your machine. If you haven’t yet, you can follow Meltano’s official installation guide.
First, we will initialize a Meltano project.
Initialize a new project in a directory of your choice by using “meltano init”.
This will create a new directory with, among other things, your “meltano.yml” project file.
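For example, assuming we name the project “meltano-demo” (the name is purely illustrative), this looks as follows:

# create a new project folder called "meltano-demo" (the name is just an example)
meltano init meltano-demo
# all following Meltano commands are run from inside the project
cd meltano-demo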
The project folder
The project folder is the single source of truth for all your data needs. It is a plain directory that can be put under version control.
The main part of your project is the meltano.yml project file. This file defines your plugins, pipelines and configuration.
The project and its meltano.yml file can be managed entirely through the Meltano CLI, and the project can easily be containerized for deployment on Docker or Kubernetes. Note, however, that no plugins or pipeline schedules have been defined yet.
We will do this in the next step.
Adding an extractor and loader
Now, let’s set up the pipeline’s components. The first plugin you’ll want to add is an extractor, which is responsible for pulling data out of your data source.
To find out if an extractor for your data source is supported out of the box, you can check the Extractors list on MeltanoHub or run “meltano discover”.
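In older Meltano versions, listing the known extractors from the command line looks roughly like this (newer versions rely on MeltanoHub instead):

# list all extractors Meltano knows about
meltano discover extractors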
We will use the GitLab tap (tap-gitlab) in this example, as we don’t need to create API credentials for it.
Meltano manages the plugin’s setup and configuration and handles its invocation.
We will configure the extractor to pull data from the repository meltano/meltano. Additionally, we will configure it to extract only data from 1 January 2021 onwards and to include only the “Tags” stream.
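A minimal sketch of these steps on the command line, assuming the tap-gitlab settings “projects” and “start_date” and the stream name “tags” (run “meltano config tap-gitlab list” to see the settings available in your version):

# add the GitLab extractor to the project
meltano add extractor tap-gitlab
# pull data from the meltano/meltano repository only
meltano config tap-gitlab set projects meltano/meltano
# extract data from 1 January 2021 onwards
meltano config tap-gitlab set start_date 2021-01-01T00:00:00Z
# select all attributes of the "tags" stream and nothing else
meltano select tap-gitlab tags "*"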
Data selection is much easier this way than with plain Singer!
The extractor is now set up.
Now, we will add a loader to store the data in a CSV file and define our destination path:
Remember that the destination directory must already exist, as it will not be created automatically.
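A sketch of the corresponding commands, assuming the target-csv loader and its “destination_path” setting (the setting name may differ between target-csv variants), with “output” as a purely illustrative directory name:

# add the CSV loader to the project
meltano add loader target-csv
# the destination directory must exist before the pipeline runs
mkdir output
# write the CSV files into the "output" directory
meltano config target-csv set destination_path output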
This is what our meltano.yml looks like:
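Here is a sketch with the plugins and settings from above; the exact layout, plugin variants and pip_url entries generated by Meltano will vary with your version:

version: 1
plugins:
  extractors:
  - name: tap-gitlab
    config:
      projects: meltano/meltano
      start_date: '2021-01-01T00:00:00Z'
    select:
    - tags.*
  loaders:
  - name: target-csv
    config:
      destination_path: output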
Instead of using the CLI, we can make changes directly in the YAML.
Alternatively, it is also possible to configure the extractors and loaders in the Meltano UI:
Start the Meltano UI web server using “meltano ui”. Unless configured otherwise, the UI will now be available at http://localhost:5000.
Run a pipeline
Now it’s time to run a pipeline. To run a one-time pipeline, we can just use the meltano elt command:
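For example, with the plugins added above (the job ID is just an illustrative name; newer Meltano versions offer “meltano run tap-gitlab target-csv” as the preferred equivalent):

# run a one-off extract-load from GitLab into CSV files;
# --job_id names the run so incremental state can be stored (the flag name may differ in newer versions)
meltano elt tap-gitlab target-csv --job_id=gitlab-to-csv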
Result
And we are done! It took us just ten commands to create a data integration pipeline.
Summary
It’s clear why Meltano is a great choice for building your data platform: it’s powerful yet simple to maintain, and its open-source model makes it flexible, budget-friendly and reliable.
– by Ole Bause (Scalefree)
Get updates and support
Please send inquiries and feature requests to [email protected].
For inquiries about Data Vault training and on-site training, please contact [email protected] or register at www.scalefree.com.
To support the creation of Visual Data Vault drawings in Microsoft Visio, a stencil has been developed that can be used to draw Data Vault models. The stencil is available at www.visualdatavault.com.