In a previous blog post, we introduced Singer, an open source framework for ELT processes. Singer can be wrapped in another open source tool that adds features such as installation, environment setup, monitoring, scheduling and orchestration. At Scalefree, we moved all of our ELT pipelines into this framework on AWS and are pleased with the results.
Many platforms exist for managing data integration, but robust, easy-to-use, free open source solutions are scarce. The Meltano project aims to fill that gap. Meltano is a full-package data integration platform that challenges the most established players in the data space. It is built on top of the best open source tools for data integration and infuses them with DataOps best practices.
Meltano is the easiest way to build, run and orchestrate ELT pipelines made up of Singer taps, targets and dbt models. It is open source, self-hosted, version controlled and containerized.
Meltano’s open source model lets you easily adapt it to your own needs and reduces cost.
What is an ELT pipeline?
ELT stands for “extract, load, and transform” — the processes a data pipeline uses to replicate data from a source system into a target system such as a cloud data warehouse.
- Extraction: This first step involves copying data from the source system.
- Loading: During the loading step, the pipeline replicates data from the source into the target system, which might be a data warehouse or data lake.
- Transformation: Once the data is in the target system, organizations can run whichever transformations they need. Often organizations will transform raw data in different ways for use with different tools or business processes.
ELT provides a modern alternative to ETL. Instead of transforming the data before it’s written, ELT leverages the target system to do the transformation. The data is copied to the target and then transformed in place.
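The three steps can be sketched in a few lines of Python. This is a minimal illustration and not how Meltano is implemented: the source rows, the table names `raw_orders` and `customer_totals`, and the function names are all hypothetical, and an in-memory SQLite database stands in for the data warehouse. The point it shows is that the raw data is loaded unchanged and the transformation runs as SQL inside the target.

```python
import sqlite3

# Hypothetical source rows; in practice they would come from an API or a database.
SOURCE_ROWS = [
    {"id": 1, "customer": "alice", "amount_cents": 1250},
    {"id": 2, "customer": "bob", "amount_cents": 300},
    {"id": 3, "customer": "alice", "amount_cents": 450},
]


def extract():
    """Extract: copy the rows from the source system unchanged."""
    return list(SOURCE_ROWS)


def load(conn, rows):
    """Load: write the raw rows into the target without reshaping them."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders "
        "(id INTEGER PRIMARY KEY, customer TEXT, amount_cents INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO raw_orders VALUES (:id, :customer, :amount_cents)",
        rows,
    )


def transform(conn):
    """Transform: let the target system derive a new table from the raw data."""
    conn.execute("DROP TABLE IF EXISTS customer_totals")
    conn.execute(
        "CREATE TABLE customer_totals AS "
        "SELECT customer, SUM(amount_cents) / 100.0 AS total "
        "FROM raw_orders GROUP BY customer"
    )


def run_pipeline(conn):
    """Run extract, load and transform in order and return the transformed rows."""
    load(conn, extract())
    transform(conn)
    return conn.execute(
        "SELECT customer, total FROM customer_totals ORDER BY customer"
    ).fetchall()


if __name__ == "__main__":
    print(run_pipeline(sqlite3.connect(":memory:")))
```

Because `raw_orders` stays in the target untouched, further transformations can be added later without extracting the data again.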
What is Meltano?
Meltano is a self-hosted ELT solution created by GitLab. Initially made for internal use by the GitLab data team, Meltano quickly grew into an independent entity when its team became aware that many organizations were facing the same issues that Meltano intended to solve.
Meltano’s mission is to enable every organization to make the best decisions possible by becoming data-informed. To achieve this mission, the team has built an open source platform for the complete DataOps lifecycle. It integrates best-in-class open source components and enables teams to collaborate on data projects and pipelines more efficiently and with higher confidence.
Meltano is ELT for the DataOps era:
- Open source
- Self-hosted
- CLI-first
- Debuggable
- Extensible
Meltano is built in a modular fashion, combining open source tools like Singer and dbt.
You can develop and test locally before deploying into production within your orchestrator of choice or with the built-in Airflow integration.
Not “all or nothing”: Incremental adoption is encouraged
Meltano brings together different tools in a single project repo:
- Extract & Load: Singer
- Transform: dbt
- Testing: dbt test (soon: Great Expectations)
- Orchestration: Airflow (built-in) (also: Dagster, Prefect and more)
- Analysis: Meltano UI (soon: Jupyter, Superset)
Embracing Singer
Meltano has embraced Singer and provides a clear path to production with existing Singer taps and targets where there wasn’t one before. Meltano supports every Singer tap and target and thus offers the possibility to use an incredible number of integrations for source systems.
The team has launched MeltanoHub for Singer, a single central place to find Singer taps and targets. It is the Singer equivalent of PyPI or Docker Hub. At the time of writing, 290 taps (connectors for source systems) are already listed.
Meltano’s ELT Pipelines
As the pipelines in Meltano are code, you can apply any modern software development principle. Additionally, they are ready to be version controlled, containerized and developed continuously.
Meltano makes sure everything “just works”:
- Installs taps and targets
- Manages credential storage
- Orchestrates data exchange between taps and targets
- Triggers dbt to transform your data
- and much more
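To make the data exchange between taps and targets more concrete: in Singer, a tap writes JSON messages of type SCHEMA, RECORD and STATE to its standard output, one per line, and a target reads them from its standard input. The sketch below imitates that exchange inside a single Python process. The functions `tap`, `target` and `run` and the `users` stream are hypothetical names for illustration only; with Meltano, the tap and the target run as separate programs and Meltano connects them.

```python
import io
import json


def tap(rows, out):
    """A toy tap: emit SCHEMA, RECORD and STATE messages as JSON lines."""
    schema = {
        "type": "SCHEMA",
        "stream": "users",
        "schema": {
            "properties": {"id": {"type": "integer"}, "name": {"type": "string"}}
        },
        "key_properties": ["id"],
    }
    out.write(json.dumps(schema) + "\n")
    for row in rows:
        message = {"type": "RECORD", "stream": "users", "record": row}
        out.write(json.dumps(message) + "\n")
    if rows:
        # The state lets a later run resume after the last replicated id.
        state = {"type": "STATE", "value": {"users": rows[-1]["id"]}}
        out.write(json.dumps(state) + "\n")


def target(lines):
    """A toy target: collect records per stream and remember the latest state."""
    loaded, state = {}, None
    for line in lines:
        message = json.loads(line)
        if message["type"] == "RECORD":
            loaded.setdefault(message["stream"], []).append(message["record"])
        elif message["type"] == "STATE":
            state = message["value"]
    return loaded, state


def run(rows):
    """Connect the toy tap to the toy target through an in-memory buffer."""
    buffer = io.StringIO()  # stands in for the pipe between the two programs
    tap(rows, buffer)
    return target(buffer.getvalue().splitlines())


if __name__ == "__main__":
    print(run([{"id": 1, "name": "Ada"}, {"id": 2, "name": "Linus"}]))
```

Because every tap and target speaks the same line-based message format, any tap can in principle be combined with any target, which is what makes the large catalog of existing connectors reusable.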
Write your own tap or target
With Meltano, writing a tap for a new data source is rather easy. The SDK for Singer Taps and Targets enables developers to build their own connectors without having to be an expert on the Singer specification.
Outlook
Now that we have gone through an overview of Meltano and its advantages, you may ask yourself whether it really works that way. To answer that, we will build a complete data pipeline in Meltano as an example in the next newsletter.
So, don’t forget to sign up for our newsletter if you haven’t already done so and don’t miss your opportunity to level up your ELT processes.
-by Ole Bause (Scalefree)
Get Updates and Support
Please send inquiries and feature requests to [email protected].
For Data Vault training and on-site training inquiries, please contact [email protected] or register at www.scalefree.com.
To support the creation of Visual Data Vault drawings in Microsoft Visio, a stencil is implemented that can be used to draw Data Vault models. The stencil is available at www.visualdatavault.com.