Read the latest artifacts from an environment¶
This is a guideline on how to query the dbt cloud metadata given an environment by using dbt CLoud Discovery API. It's neither not requiring JOB ID
nor JOB RUN ID
, this is the dbt Cloud's ENVIRONMENT ID
. Especially, with this method, dbterd
doesn't require to download files before hands anymore, the ERD will be generated on fly 🚀.
dbterd
is now understanding GraphQL connection which is exposed by dbt CLoud Discovery API endpoint:
Replace
{YOUR_ACCESS_URL}
with the appropriate Access URL for your region and plan
Prerequisites
- dbt Cloud multi-tenant or single tenant account ☁️
- You must be on a Team or Enterprise plan 💰
- Your projects must be on dbt version 1.0 or later 🏃
The assumption is that you've already get the dbt Cloud project ready and is having at least 1 environment, and 1 job run successfully in this environment.
1. Prepare the environment variables¶
As mentioned above, the API Endpoint will look like:
For example, if your multi-tenant region is North America, your endpoint is https://metadata.cloud.getdbt.com/graphql
. If your multi-tenant region is EMEA, your endpoint is https://metadata.emea.dbt.com/graphql
.
And the dbt Cloud's Environment will have the URL constructed as:
In the above:
URL Part | Environment Variable | CLI Option | Description |
---|---|---|---|
host_url | DBTERD_DBT_CLOUD_HOST_URL | --dbt-cloud-host-url | Host URL (also known as Access URL) with prefix of metadata. |
environment_id | DBTERD_DBT_CLOUD_ENVIRONMENT_ID | --dbt-cloud-environment-id | dbt Cloud environment ID |
Besides, we need another one which is very important, the service token:
- Go to Account settings / Service tokens. Click + New token
- Enter Service token name e.g. "ST_dbterd_metadata"
- Click Add and select
Metadata Only
permission. Optionally, select the right project or all by default - Click Save
- Copy token & Pass it to the Environment Variable (
DBTERD_DBT_CLOUD_SERVICE_TOKEN
) or the CLI Option (--dbt-cloud-service-token
)
Finally, fill in your_value
and execute the (Linux or Macos) command below:
export DBTERD_DBT_CLOUD_HOST_URL=your_value e.g. metadata.cloud.getdbt.com
export DBTERD_DBT_CLOUD_SERVICE_TOKEN=your_value
export DBTERD_DBT_CLOUD_ENVIRONMENT_ID=your_value
Or in Powershell:
$env:DBTERD_DBT_CLOUD_HOST_URL="your_value"
$env:DBTERD_DBT_CLOUD_SERVICE_TOKEN="your_value"
$env:DBTERD_DBT_CLOUD_ENVIRONMENT_ID="your_value"
2. Generate ERD file¶
We're going to use a new command as dbterd run-metadata
to tell dbterd
to use dbt Cloud Discovery API with all above variables.
The command will be looks like:
Behind the scenes, it will try use to the ERD GraphQL query built-in at include/erd_query.gql
and then, here is the sample console log:
2024-02-03 19:57:57,514 - dbterd - INFO - Run with dbterd==1.0.0 (main.py:54)
2024-02-03 19:57:57,515 - dbterd - INFO - Looking for the query in: (hidden)/dbterd/adapters/dbt_cloud/include/erd_query.gql (query.py:25)
2024-02-03 19:57:57,516 - dbterd - DEBUG - Getting erd data...[URL: https://metadata.cloud.getdbt.com/graphql/, VARS: {'environment_id': '(hidden)', 'model_first': 500, 'source_first': 500, 'exposure_first': 500, 'test_first': 500}] (graphql.py:40)
2024-02-03 19:57:58,865 - dbterd - DEBUG - Completed [status: 200] (graphql.py:48)
2024-02-03 19:57:58,868 - dbterd - INFO - Metadata result: 5 model(s), 2 source(s), 1 exposure(s), 21 test(s) (discovery.py:169)
2024-02-03 19:57:58,880 - dbterd - INFO - Collected 5 table(s) and 1 relationship(s) (test_relationship.py:44)
2024-02-03 19:57:58,881 - dbterd - INFO - (hidden)\target (base.py:179)
Voila! Happy ERD with dbt Cloud Metadata 🎉!