dbt Core

These instructions will help you set up an integration that syncs metadata from dbt Core to Rasgo via PyRasgo. If you're just wondering what will be synced from dbt Core, skip to the list at the bottom of this page.

1. Set up PyRasgo, the Rasgo Python SDK

You can install PyRasgo via pip and get your API key from the Rasgo UI. For instructions, go here.

2. Write a script to push your dbt manifest file to the Rasgo API

Every time you call dbt run, dbt generates a manifest file with the details of your run. Rasgo parses this file to ingest datasets, metrics, and associated metadata for you. Adapt this Python script to pick up the manifest file generated after each dbt run and push it to Rasgo. You can schedule it with your preferred orchestration system (Airflow, AWS Lambda, GitHub Actions, etc.).
import json

import pyrasgo

# PYRASGO_API_KEY is your API key string from the Rasgo UI
rasgo = pyrasgo.connect(PYRASGO_API_KEY)

manifest_filepath = '/dbt/manifest_test.json'  # <-- location of your manifest.json file
with open(manifest_filepath) as manifest:
    rasgo.publish.dbt_manifest(manifest=json.load(manifest))
The import process runs asynchronously after the manifest file is submitted, so you do not need to keep the Python session open. Imports can take a while, depending on how many dbt sources and models you have.
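If your scheduled job cannot hard-code the manifest path, a small helper can locate the newest manifest.json under your dbt project before pushing it. This is a minimal sketch: the helper names (find_latest_manifest, push_manifest) are illustrative, not part of PyRasgo; only the rasgo.publish.dbt_manifest call comes from the script above.

```python
import json
from pathlib import Path


def find_latest_manifest(dbt_project_dir: str) -> Path:
    """Return the most recently modified manifest.json under the project.

    dbt writes manifest.json into its target/ directory on every run, so
    the newest one corresponds to the latest run.
    """
    candidates = list(Path(dbt_project_dir).rglob("manifest.json"))
    if not candidates:
        raise FileNotFoundError(f"no manifest.json found under {dbt_project_dir}")
    return max(candidates, key=lambda p: p.stat().st_mtime)


def push_manifest(rasgo, manifest_path: Path) -> None:
    """Load the manifest file and submit it to Rasgo, as in the script above."""
    with open(manifest_path) as f:
        rasgo.publish.dbt_manifest(manifest=json.load(f))
```

Your orchestrator would then call `push_manifest(rasgo, find_latest_manifest("/path/to/dbt_project"))` after each dbt run.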

3. dbt Sources and Models show up in Rasgo automatically

Once the Python script is running on a schedule, the integration is set up and good to go! Rasgo will re-ingest your dbt metadata every time the script runs, keeping it up to date in Rasgo.
Here is the metadata that Rasgo will ingest from dbt Core:
  • dbt sources -> Rasgo datasets
    • Lineage
    • Columns
      • Column descriptions
  • dbt models -> Rasgo datasets
    • Description
    • Lineage
    • Columns
      • Column descriptions
    • SQL
  • dbt metrics -> Rasgo metrics
    • Metric definition
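To make the mapping above concrete, here is a heavily abridged sketch of the manifest sections it refers to. The example dict is illustrative only: real manifests contain many more keys, the project and model names here are made up, and exact field names vary by dbt version.

```python
# Abridged, hypothetical manifest illustrating the sections Rasgo reads.
manifest = {
    "sources": {
        "source.my_project.raw.orders": {
            "columns": {"id": {"name": "id", "description": "Order ID"}},
        },
    },
    "nodes": {
        "model.my_project.stg_orders": {
            "resource_type": "model",
            "description": "Cleaned orders",
            "columns": {"id": {"name": "id", "description": "Order ID"}},
            "depends_on": {"nodes": ["source.my_project.raw.orders"]},
        },
    },
    "metrics": {
        "metric.my_project.order_count": {"calculation_method": "count"},
    },
}


def summarize(manifest: dict) -> dict:
    """Count the dbt objects that map to Rasgo objects per the list above."""
    models = [
        n for n in manifest["nodes"].values()
        if n.get("resource_type") == "model"
    ]
    return {
        "sources": len(manifest["sources"]),
        "models": len(models),
        "metrics": len(manifest["metrics"]),
    }
```

Calling `summarize(manifest)` on this example returns `{'sources': 1, 'models': 1, 'metrics': 1}`: one source, one model, and one metric would become Rasgo datasets and metrics.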