Read the latest artifacts from an environment
This guide shows how to query dbt Cloud metadata for a given environment by using the dbt Cloud Discovery API. It requires neither a JOB ID nor a JOB RUN ID, only the dbt Cloud ENVIRONMENT ID. Notably, with this method dbterd no longer needs to download files beforehand; the ERD is generated on the fly 🚀.
dbterd now understands the GraphQL connection exposed by the dbt Cloud Discovery API endpoint, https://metadata.{YOUR_ACCESS_URL}/graphql.
Replace {YOUR_ACCESS_URL} with the appropriate Access URL for your region and plan.
Prerequisites
- dbt Cloud multi-tenant or single-tenant account ☁️
- You must be on a Team or Enterprise plan 💰
- Your projects must be on dbt version 1.0 or later 🏃
This guide assumes that you already have a dbt Cloud project with at least one environment, and at least one job that has run successfully in that environment.
1. Prepare the environment variables
As mentioned above, the API endpoint looks like https://metadata.{YOUR_ACCESS_URL}/graphql, where metadata.{YOUR_ACCESS_URL} is the host_url.
For example, if your multi-tenant region is North America, your endpoint is https://metadata.cloud.getdbt.com/graphql. If your multi-tenant region is EMEA, your endpoint is https://metadata.emea.dbt.com/graphql.
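The pattern in these two examples can be sketched as a small POSIX shell helper. This is only an illustration: discovery_host and discovery_endpoint are hypothetical names, not part of dbterd, and the sketch assumes the multi-tenant pattern shown above, where the host is the Access URL prefixed with metadata.

```shell
# Hypothetical helpers (not part of dbterd): derive the Discovery API host
# and endpoint from a dbt Cloud Access URL such as cloud.getdbt.com.
# Assumes the "metadata.<Access URL>" pattern from the examples above.
discovery_host() {
  # Strip an optional scheme and a trailing slash, then add the prefix.
  access_url=${1#https://}
  access_url=${access_url#http://}
  access_url=${access_url%/}
  printf 'metadata.%s\n' "$access_url"
}

discovery_endpoint() {
  printf 'https://%s/graphql\n' "$(discovery_host "$1")"
}

# North America multi-tenant example from the text above.
discovery_endpoint "cloud.getdbt.com"
```

The output of discovery_host is the value that DBTERD_DBT_CLOUD_HOST_URL expects. Single-tenant or cell-based accounts may use a different Access URL, so check the value shown in your account settings.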
The environment ID is the environment_id part of the dbt Cloud environment's URL.
These URL parts map to dbterd settings as follows:
| URL Part | Environment Variable | CLI Option | Description |
|---|---|---|---|
| host_url | DBTERD_DBT_CLOUD_HOST_URL | --dbt-cloud-host-url | Host URL: the Access URL prefixed with metadata. |
| environment_id | DBTERD_DBT_CLOUD_ENVIRONMENT_ID | --dbt-cloud-environment-id | dbt Cloud environment ID |
We also need one more value, and it is an important one: the service token.
- Go to Account settings / Service tokens. Click + New token
- Enter Service token name e.g. "ST_dbterd_metadata"
- Click Add and select the Metadata Only permission. Optionally, select the right project (all projects by default)
- Click Save
- Copy the token and pass it to the environment variable (DBTERD_DBT_CLOUD_SERVICE_TOKEN) or the CLI option (--dbt-cloud-service-token)
Finally, fill in your_value and run the commands below (Linux or macOS):
export DBTERD_DBT_CLOUD_HOST_URL=your_value  # e.g. metadata.cloud.getdbt.com
export DBTERD_DBT_CLOUD_SERVICE_TOKEN=your_value
export DBTERD_DBT_CLOUD_ENVIRONMENT_ID=your_value
Or in PowerShell:
$env:DBTERD_DBT_CLOUD_HOST_URL="your_value"
$env:DBTERD_DBT_CLOUD_SERVICE_TOKEN="your_value"
$env:DBTERD_DBT_CLOUD_ENVIRONMENT_ID="your_value"
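Before moving on, it can help to confirm that all three variables are actually set. The following is a minimal POSIX shell sketch for the Linux or macOS variant. missing_dbterd_vars is a hypothetical helper, not part of dbterd, and treating the literal placeholder your_value as "unset" is an assumption made for this example.

```shell
# Hypothetical pre-flight check (not part of dbterd): print the name of each
# required variable that is empty, unset, or still the your_value placeholder.
missing_dbterd_vars() {
  for name in DBTERD_DBT_CLOUD_HOST_URL \
              DBTERD_DBT_CLOUD_SERVICE_TOKEN \
              DBTERD_DBT_CLOUD_ENVIRONMENT_ID; do
    # eval does an indirect lookup of the variable whose name is in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ] || [ "$value" = "your_value" ]; then
      printf '%s\n' "$name"
    fi
  done
}

# Report the problem on stderr without aborting the shell session.
missing=$(missing_dbterd_vars)
if [ -n "$missing" ]; then
  printf 'Still unset:\n%s\n' "$missing" >&2
fi
```

The check only reads the variables and never prints their values, so the service token does not end up in the terminal output.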
2. Generate ERD file
We'll use a new command, dbterd run-metadata, which tells dbterd to use the dbt Cloud Discovery API with all of the variables above.
With the environment variables set, the command in its simplest form is dbterd run-metadata.
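As a sketch, the same run can also pass the three values as CLI options instead of environment variables. The option names are the ones listed in step 1. generate_erd is a hypothetical wrapper name introduced only for this example, and the values in the sample call are placeholders.

```shell
# Sketch: run dbterd with explicit CLI options instead of environment variables.
# generate_erd is a hypothetical wrapper, not part of dbterd; the three options
# are the ones listed in step 1.
generate_erd() {
  host_url=$1        # e.g. metadata.cloud.getdbt.com
  environment_id=$2  # dbt Cloud environment ID
  service_token=$3   # service token with the Metadata Only permission
  dbterd run-metadata \
    --dbt-cloud-host-url "$host_url" \
    --dbt-cloud-environment-id "$environment_id" \
    --dbt-cloud-service-token "$service_token"
}

# Example call (placeholders, not real values):
# generate_erd metadata.cloud.getdbt.com your_environment_id your_service_token
```

Passing the token on the command line can leave it in the shell history, so the environment variable is usually the safer choice for the service token.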
Behind the scenes, it uses the built-in ERD GraphQL query at include/erd_query.gql.
Here is a sample console log:
2024-02-03 19:57:57,514 - dbterd - INFO - Run with dbterd==1.0.0 (main.py:54)
2024-02-03 19:57:57,515 - dbterd - INFO - Looking for the query in: (hidden)/dbterd/adapters/dbt_cloud/include/erd_query.gql (query.py:25)
2024-02-03 19:57:57,516 - dbterd - DEBUG - Getting erd data...[URL: https://metadata.cloud.getdbt.com/graphql/, VARS: {'environment_id': '(hidden)', 'model_first': 500, 'source_first': 500, 'exposure_first': 500, 'test_first': 500}] (graphql.py:40)
2024-02-03 19:57:58,865 - dbterd - DEBUG - Completed [status: 200] (graphql.py:48)
2024-02-03 19:57:58,868 - dbterd - INFO - Metadata result: 5 model(s), 2 source(s), 1 exposure(s), 21 test(s) (discovery.py:169)
2024-02-03 19:57:58,880 - dbterd - INFO - Collected 5 table(s) and 1 relationship(s) (test_relationship.py:44)
2024-02-03 19:57:58,881 - dbterd - INFO - (hidden)\target (base.py:179)
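The DEBUG lines in the log show a GraphQL request to the /graphql endpoint with variables such as environment_id and model_first. To make that step more concrete, here is a rough POSIX shell sketch of a similar request. It rests on several assumptions: the query is a minimal stand-in based on the public Discovery API schema and is not dbterd's built-in erd_query.gql, build_payload and fetch_models are hypothetical names, curl is assumed to be installed, and the environment ID is assumed to be numeric.

```shell
# Hypothetical sketch (not dbterd's implementation): build the JSON body of a
# Discovery API request. The query is a minimal stand-in, NOT erd_query.gql.
build_payload() {
  environment_id=$1
  first=${2:-500}   # page size; 500 matches the value seen in the log above
  # Assumption: the environment ID is numeric, so it can be embedded unquoted.
  case $environment_id in
    ''|*[!0-9]*) echo "environment_id must be numeric" >&2; return 1 ;;
  esac
  query='query ($environment_id: BigInt!, $model_first: Int!) { environment(id: $environment_id) { applied { models(first: $model_first) { edges { node { uniqueId name } } } } } }'
  printf '{"query": "%s", "variables": {"environment_id": %s, "model_first": %s}}\n' \
    "$query" "$environment_id" "$first"
}

# Send the request with the variables from step 1. This function is only
# defined here, not called, because it needs network access and a valid token.
fetch_models() {
  curl --silent --show-error \
    --request POST "https://${DBTERD_DBT_CLOUD_HOST_URL}/graphql" \
    --header "Authorization: Bearer ${DBTERD_DBT_CLOUD_SERVICE_TOKEN}" \
    --header "Content-Type: application/json" \
    --data "$(build_payload "$DBTERD_DBT_CLOUD_ENVIRONMENT_ID")"
}
```

Calling fetch_models after exporting the three variables should return a JSON document listing model names, which is a quick way to check the token and environment ID independently of dbterd.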
Voila! Happy ERD with dbt Cloud Metadata 🎉!