Developing an Algorithm Adapter
Want to teach dbterd a new trick for sniffing out relationships in your dbt project? Perhaps you've got a clever way to detect foreign keys using naming conventions, or maybe you want to leverage some metadata that the existing algorithms don't understand. Welcome to the algorithm adapter development guide!
What is an Algorithm Adapter?
An algorithm adapter is responsible for parsing dbt artifacts (manifest and catalog) to extract tables and relationships. Different algorithms use different strategies to discover how tables are connected:
- test_relationship - Uses dbt's relationships test to find foreign keys
- semantic - Uses dbt's Semantic Layer entities (primary/foreign) to determine connections
Your custom algorithm might detect relationships through naming conventions (e.g., *_id columns), comments, tags, or any other creative method you can dream up.
Quick Start
Here's the minimal skeleton:
"""My relationship detection algorithm for dbterd."""

from dbterd.core.adapters.algo import BaseAlgoAdapter
from dbterd.core.models import Ref, Table
from dbterd.core.registry.decorators import register_algo
from dbterd.helpers.log import logger
from dbterd.types import Catalog, Manifest


@register_algo("my_algo", description="Detect relationships using my clever method")
class MyAlgo(BaseAlgoAdapter):
    """My custom algorithm adapter."""

    def parse_artifacts(self, manifest: Manifest, catalog: Catalog, **kwargs) -> tuple[list[Table], list[Ref]]:
        """Parse from file-based manifest/catalog artifacts."""
        # Extract tables using inherited helper
        tables = self.get_tables(manifest=manifest, catalog=catalog, **kwargs)
        tables = self.filter_tables_based_on_selection(tables=tables, **kwargs)

        # Extract relationships using YOUR logic
        relationships = self.get_relationships(manifest=manifest, **kwargs)
        relationships = self.make_up_relationships(relationships=relationships, tables=tables)

        logger.info(f"Collected {len(tables)} table(s) and {len(relationships)} relationship(s)")
        return (
            sorted(tables, key=lambda tbl: tbl.node_name),
            sorted(relationships, key=lambda rel: rel.name),
        )

    def parse_metadata(self, data: dict, **kwargs) -> tuple[list[Table], list[Ref]]:
        """Parse from dbt Cloud metadata API response."""
        # Implement if you want to support dbt Cloud metadata
        raise NotImplementedError("Metadata API not supported yet")

    def get_relationships(self, manifest: Manifest, **kwargs) -> list[Ref]:
        """Your custom relationship detection logic."""
        refs = []
        # Your magic here!
        return refs
The Base Class
Your adapter must inherit from BaseAlgoAdapter, which provides:
| Method | Type | Description |
|---|---|---|
| parse() | method | Entry point - dispatches to parse_artifacts() or parse_metadata() |
| parse_artifacts() | abstract | You implement this - parse file-based artifacts |
| parse_metadata() | abstract | You implement this - parse dbt Cloud metadata API |
| get_tables() | inherited | Extracts tables from manifest/catalog |
| get_tables_from_metadata() | inherited | Extracts tables from metadata API |
| filter_tables_based_on_selection() | inherited | Filters tables by selection rules |
| make_up_relationships() | inherited | Filters refs and applies entity name format |
| get_unique_refs() | inherited | Deduplicates relationships |
| enrich_tables_from_relationships() | inherited | Adds missing columns from relationships |
| find_related_nodes_by_id() | virtual | Override to support single-model ERDs |
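To make the split between the entry point and the two abstract methods concrete, here is a toy stand-in for the base class. It is only a sketch: SketchAlgoAdapter, EchoAlgo, and the rule "a data kwarg means metadata" are assumptions made for illustration, not dbterd's actual dispatch logic.

```python
from abc import ABC, abstractmethod


class SketchAlgoAdapter(ABC):
    """Toy stand-in for BaseAlgoAdapter; the dispatch rule is an assumption."""

    def parse(self, **kwargs):
        # Assumption: metadata input is signalled by a `data` keyword argument.
        if "data" in kwargs:
            return self.parse_metadata(**kwargs)
        return self.parse_artifacts(**kwargs)

    @abstractmethod
    def parse_artifacts(self, manifest, catalog, **kwargs):
        """Subclasses return (tables, relationships) from file artifacts."""

    @abstractmethod
    def parse_metadata(self, data, **kwargs):
        """Subclasses return (tables, relationships) from API metadata."""


class EchoAlgo(SketchAlgoAdapter):
    """Minimal subclass that only reports which path was taken."""

    def parse_artifacts(self, manifest, catalog, **kwargs):
        return (["from-artifacts"], [])

    def parse_metadata(self, data, **kwargs):
        return (["from-metadata"], [])
```

Because both methods are abstract, a subclass that forgets one of them cannot be instantiated, which is why the skeleton above raises NotImplementedError in parse_metadata() rather than omitting it.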
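Step-by-Step Guide
1. Create Your Adapter File
Create a new file in dbterd/adapters/algos/, named after your algorithm. The complete example later in this guide would live in naming_convention.py, matching the import path used in its tests.
2. Implement the Required Methods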
parse_artifacts() - The main workhorse for file-based artifacts:
def parse_artifacts(self, manifest: Manifest, catalog: Catalog, **kwargs) -> tuple[list[Table], list[Ref]]:
    """Parse tables and relationships from dbt artifacts."""
    # Step 1: Get all tables (the base class does the heavy lifting)
    tables = self.get_tables(manifest=manifest, catalog=catalog, **kwargs)

    # Step 2: Apply selection filters (--select, --exclude)
    tables = self.filter_tables_based_on_selection(tables=tables, **kwargs)

    # Step 3: Get relationships using YOUR detection method
    relationships = self.get_relationships(manifest=manifest, **kwargs)

    # Step 4: Filter relationships to only include selected tables
    # and apply entity_name_format to table names
    relationships = self.make_up_relationships(relationships=relationships, tables=tables)

    # Step 5: (Optional) Add columns discovered from relationships
    tables = self.enrich_tables_from_relationships(tables=tables, relationships=relationships)

    logger.info(f"Collected {len(tables)} table(s) and {len(relationships)} relationship(s)")

    # Return sorted for deterministic output
    return (
        sorted(tables, key=lambda tbl: tbl.node_name),
        sorted(relationships, key=lambda rel: rel.name),
    )
parse_metadata() - For dbt Cloud metadata API support:
def parse_metadata(self, data: dict, **kwargs) -> tuple[list[Table], list[Ref]]:
    """Parse from dbt Cloud metadata API response."""
    data_list = data if isinstance(data, list) else [data]

    # Get tables from metadata
    tables = self.get_tables_from_metadata(data=data_list, **kwargs)
    tables = self.filter_tables_based_on_selection(tables=tables, **kwargs)

    # Get relationships from metadata (implement your logic)
    relationships = self.get_relationships_from_metadata(data=data_list, **kwargs)
    relationships = self.make_up_relationships(relationships=relationships, tables=tables)

    logger.info(f"Collected {len(tables)} table(s) and {len(relationships)} relationship(s)")
    return (
        sorted(tables, key=lambda tbl: tbl.node_name),
        sorted(relationships, key=lambda rel: rel.name),
    )
3. Register Your Algorithm
The @register_algo decorator automatically registers your adapter:
@register_algo("my_algo", description="Detect relationships using naming conventions")
class MyAlgo(BaseAlgoAdapter):
    ...
Once registered, users can select your algorithm from the CLI by its registered name (my_algo in this example).
Understanding the Data Flow
┌─────────────────┐     ┌─────────────────┐
│  manifest.json  │     │  catalog.json   │
└────────┬────────┘     └────────┬────────┘
         │                       │
         └───────────┬───────────┘
                     │
                     ▼
         ┌───────────────────────┐
         │    Your Algorithm     │
         │   parse_artifacts()   │
         └───────────┬───────────┘
                     │
         ┌───────────┴───────────┐
         │                       │
         ▼                       ▼
┌─────────────────┐     ┌─────────────────┐
│   list[Table]   │     │    list[Ref]    │
│                 │     │                 │
│ - name          │     │ - name          │
│ - columns       │     │ - table_map     │
│ - database      │     │ - column_map    │
│ - schema        │     │ - type          │
│ - ...           │     │ - ...           │
└─────────────────┘     └─────────────────┘
         │                       │
         └───────────┬───────────┘
                     │
                     ▼
         ┌───────────────────────┐
         │    Target Adapter     │
         │  (DBML, Mermaid...)   │
         └───────────────────────┘
Working with Manifest and Catalog
Manifest Structure
The manifest is a typed object with these key attributes:
manifest.nodes # Dict of models, tests, seeds, snapshots
manifest.sources # Dict of source definitions
manifest.exposures # Dict of exposures
manifest.semantic_models # Dict of semantic models (dbt 1.6+)
Each node has properties like:
node = manifest.nodes["model.my_project.users"]
node.name # "users"
node.database # "analytics"
node.schema_ # "public" (note the underscore!)
node.columns # Dict of column metadata
node.depends_on.nodes # List of upstream dependencies
node.meta # Custom metadata dict
node.description # Model description
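As one way to use these attributes, the sketch below scans manifest.nodes for models that carry a given key in their meta dict. The manifest here is a SimpleNamespace stand-in rather than dbterd's real Manifest type, and the erd_parent meta key is purely hypothetical.

```python
from types import SimpleNamespace

# Stand-in for a parsed manifest; only the attributes described above are mimicked.
manifest = SimpleNamespace(
    nodes={
        "model.my_project.users": SimpleNamespace(name="users", meta={}),
        "model.my_project.orders": SimpleNamespace(
            name="orders",
            meta={"erd_parent": "model.my_project.users"},  # hypothetical meta key
        ),
        "test.my_project.not_null_orders_id": SimpleNamespace(
            name="not_null_orders_id", meta=None
        ),
    }
)


def models_with_meta_key(manifest, key: str) -> dict[str, str]:
    """Map model unique IDs to the value stored under `key` in their meta dict."""
    found = {}
    for unique_id, node in manifest.nodes.items():
        if not unique_id.startswith("model."):
            continue  # nodes also holds tests, seeds and snapshots
        # meta may be missing or None, so fall back to an empty dict
        value = (getattr(node, "meta", None) or {}).get(key)
        if value is not None:
            found[unique_id] = value
    return found
```

Filtering on the unique ID prefix keeps tests and other resource types out of the result, since they share the same nodes dict as models.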
Catalog Structure
The catalog contains runtime information:
catalog.nodes # Dict with column types from the actual database
catalog.sources # Source column information
Each catalog node has:
catalog_node = catalog.nodes["model.my_project.users"]
catalog_node.columns # Dict with actual column types
# catalog_node.columns["id"].type -> "INTEGER"
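Catalog entries can be absent (for example when a model has not been built yet), so lookups are safer with a fallback. The helper below is a minimal sketch: column_type is a hypothetical function, the catalog is a SimpleNamespace stand-in, and the case-insensitive match is an assumption about how you might want to handle warehouses that change column-name casing.

```python
from types import SimpleNamespace


def column_type(catalog, unique_id: str, column: str, default: str = "unknown") -> str:
    """Return the catalog type for a column, or `default` when anything is missing."""
    node = catalog.nodes.get(unique_id)
    if node is None:
        return default
    columns = getattr(node, "columns", None) or {}
    # Match case-insensitively, since warehouses differ in how they case names.
    by_lower = {name.lower(): col for name, col in columns.items()}
    col = by_lower.get(column.lower())
    return getattr(col, "type", None) or default


# Stand-in catalog with a single model whose column name is upper-cased.
catalog = SimpleNamespace(
    nodes={
        "model.my_project.users": SimpleNamespace(
            columns={"ID": SimpleNamespace(type="INTEGER")}
        )
    }
)
```

With this stand-in, looking up "id" on the users model yields "INTEGER", while an unknown model or column yields the default instead of raising.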
Inherited Helper Methods
Table Extraction
You typically don't need to reimplement table extraction - just use the inherited methods:
# For file-based artifacts
tables = self.get_tables(manifest=manifest, catalog=catalog, **kwargs)
# For metadata API
tables = self.get_tables_from_metadata(data=data_list, **kwargs)
# Filter by --select and --exclude
tables = self.filter_tables_based_on_selection(tables=tables, **kwargs)
The entity_name_format kwarg controls table naming:
- resource.package.model → model.jaffle_shop.orders
- schema.model → public.orders
- database.schema.table → analytics.public.orders
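The mapping from format string to table name can be pictured as token substitution. The function below is an illustrative sketch only: format_entity_name and the parts dict are hypothetical, and dbterd's own implementation may differ in which tokens it accepts.

```python
def format_entity_name(fmt: str, parts: dict[str, str]) -> str:
    """Build a display name by substituting each dot-separated token in `fmt`."""
    try:
        return ".".join(parts[token] for token in fmt.split("."))
    except KeyError as exc:
        raise ValueError(f"Unsupported token in entity_name_format: {exc.args[0]}") from None


# Values for the jaffle_shop orders model used in the examples above.
orders = {
    "resource": "model",
    "package": "jaffle_shop",
    "model": "orders",
    "database": "analytics",
    "schema": "public",
    "table": "orders",
}

print(format_entity_name("resource.package.model", orders))  # model.jaffle_shop.orders
print(format_entity_name("schema.model", orders))            # public.orders
print(format_entity_name("database.schema.table", orders))   # analytics.public.orders
```

This is also why your algorithm should work with node names internally: the displayed name changes with the format, while the node name stays stable.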
Relationship Processing
After you create your Ref objects, use these helpers:
# Remove duplicates
refs = self.get_unique_refs(refs=refs)
# Filter to selected tables and apply entity_name_format
relationships = self.make_up_relationships(relationships=refs, tables=tables)
# Add missing columns discovered from relationships
tables = self.enrich_tables_from_relationships(tables=tables, relationships=relationships)
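To give a feel for what the first two helpers do conceptually, here is a self-contained sketch using a stand-in RefSketch dataclass. The function names and the choice of (table_map, column_map) as the deduplication key are assumptions for illustration; the inherited helpers may use different criteria and also rename tables according to entity_name_format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefSketch:
    """Stand-in for dbterd's Ref with only the fields used here."""

    name: str
    table_map: tuple[str, str]   # (parent node name, child node name)
    column_map: tuple[str, str]  # (parent column, child column)


def unique_refs(refs):
    """Keep the first ref seen for each (table_map, column_map) pair."""
    seen, result = set(), []
    for ref in refs:
        key = (ref.table_map, ref.column_map)
        if key not in seen:
            seen.add(key)
            result.append(ref)
    return result


def refs_within(refs, selected_node_names):
    """Drop refs whose parent or child table was filtered out by selection."""
    selected = set(selected_node_names)
    return [r for r in refs if r.table_map[0] in selected and r.table_map[1] in selected]
```

Filtering after table selection matters because a relationship pointing at a table that is not in the diagram cannot be rendered by the target adapter.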
Complete Example
Here's a complete algorithm that detects relationships based on *_id naming conventions:
"""Naming convention algorithm adapter for dbterd.

Detects relationships by matching column names ending in '_id'
to primary key columns in other tables.
"""

from dbterd.core.adapters.algo import BaseAlgoAdapter
from dbterd.core.models import Ref, Table
from dbterd.core.registry.decorators import register_algo
from dbterd.helpers.log import logger
from dbterd.types import Catalog, Manifest


@register_algo("naming_convention", description="Detect relationships via *_id naming patterns")
class NamingConventionAlgo(BaseAlgoAdapter):
    """Algorithm adapter using naming conventions.

    Finds relationships by matching columns ending in '_id' to
    tables with matching names (e.g., user_id -> users.id).
    """

    def parse_artifacts(self, manifest: Manifest, catalog: Catalog, **kwargs) -> tuple[list[Table], list[Ref]]:
        """Parse from file-based manifest/catalog artifacts."""
        tables = self.get_tables(manifest=manifest, catalog=catalog, **kwargs)
        tables = self.filter_tables_based_on_selection(tables=tables, **kwargs)

        relationships = self.get_relationships(tables=tables, **kwargs)
        relationships = self.make_up_relationships(relationships=relationships, tables=tables)
        tables = self.enrich_tables_from_relationships(tables=tables, relationships=relationships)

        logger.info(f"Collected {len(tables)} table(s) and {len(relationships)} relationship(s)")
        return (
            sorted(tables, key=lambda tbl: tbl.node_name),
            sorted(relationships, key=lambda rel: rel.name),
        )

    def parse_metadata(self, data: dict, **kwargs) -> tuple[list[Table], list[Ref]]:
        """Parse from dbt Cloud metadata API response."""
        data_list = data if isinstance(data, list) else [data]

        tables = self.get_tables_from_metadata(data=data_list, **kwargs)
        tables = self.filter_tables_based_on_selection(tables=tables, **kwargs)

        relationships = self.get_relationships(tables=tables, **kwargs)
        relationships = self.make_up_relationships(relationships=relationships, tables=tables)

        logger.info(f"Collected {len(tables)} table(s) and {len(relationships)} relationship(s)")
        return (
            sorted(tables, key=lambda tbl: tbl.node_name),
            sorted(relationships, key=lambda rel: rel.name),
        )

    def get_relationships(self, tables: list[Table], **kwargs) -> list[Ref]:
        """Detect relationships based on naming conventions.

        Looks for columns ending in '_id' and matches them to tables
        with the corresponding singular or plural name.

        Args:
            tables: List of parsed tables
            **kwargs: Additional options

        Returns:
            List of detected relationships
        """
        refs = []
        # Index tables by the last segment of their node name (the model name)
        table_names = {t.node_name.split(".")[-1].lower(): t for t in tables}

        for table in tables:
            for column in table.columns:
                col_name = column.name.lower()

                # Only columns ending in '_id' are candidates; a bare 'id'
                # column does not end in '_id', so it is skipped here too
                if not col_name.endswith("_id"):
                    continue

                # Try to find the referenced table
                # e.g., user_id -> users, order_id -> orders
                base_name = col_name[:-3]  # Remove '_id'
                possible_tables = [
                    base_name,  # user
                    f"{base_name}s",  # users
                    f"{base_name}es",  # boxes
                ]

                for possible_name in possible_tables:
                    if possible_name in table_names:
                        target_table = table_names[possible_name]
                        refs.append(
                            Ref(
                                name=f"naming.{table.node_name}.{col_name}",
                                table_map=(target_table.node_name, table.node_name),
                                column_map=("id", col_name),
                                type="n1",  # many-to-one
                            )
                        )
                        logger.debug(
                            f"Found relationship: {table.node_name}.{col_name} -> {target_table.node_name}.id"
                        )
                        break  # Stop at the first matching table name

        return self.get_unique_refs(refs=refs)
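The name-matching step in get_relationships() is easy to try in isolation, which also makes its limits visible. The helper below is a hypothetical extraction of that step (it is not part of dbterd) and mirrors the three candidates tried in the example.

```python
def candidate_table_names(column_name: str) -> list[str]:
    """Return table names a '*_id' column might point to, in the order tried."""
    col = column_name.lower()
    if not col.endswith("_id") or col == "_id":
        return []  # not a foreign-key candidate under this convention
    base = col[: -len("_id")]
    return [base, f"{base}s", f"{base}es"]


print(candidate_table_names("user_id"))      # ['user', 'users', 'useres']
print(candidate_table_names("box_id"))       # ['box', 'boxs', 'boxes']
print(candidate_table_names("category_id"))  # ['category', 'categorys', 'categoryes']
print(candidate_table_names("id"))           # []
```

Note that category_id never produces categories, so irregular plurals are missed by this simple rule. If your project uses such names, you would need to extend the candidate list or take the mapping from configuration.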
Supporting dbt Cloud Metadata API
If you want to support dbt Cloud's metadata API, implement parse_metadata(). The data structure is different from file-based artifacts:
def get_relationships_from_metadata(self, data: list, **kwargs) -> list[Ref]:
    """Extract relationships from metadata API response."""
    refs = []
    for data_item in data:
        # Models are under data_item["models"]["edges"]
        for model in data_item.get("models", {}).get("edges", []):
            node = model.get("node", {})
            unique_id = node.get("uniqueId")
            # Your logic here...
    return self.get_unique_refs(refs=refs)
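Before writing detection logic against this structure, it can help to confirm you are walking it correctly. The sketch below collects model unique IDs from a hand-written sample payload; the sample only mimics the models/edges/node/uniqueId nesting shown above, and the tolerance for None values is an assumption about what a partial response might contain.

```python
def model_unique_ids(data: list[dict]) -> list[str]:
    """Collect uniqueId values from models.edges[].node, skipping incomplete items."""
    ids = []
    for item in data:
        # `or {}` / `or []` guard against keys that are present but set to None
        edges = (item.get("models") or {}).get("edges") or []
        for edge in edges:
            unique_id = (edge.get("node") or {}).get("uniqueId")
            if unique_id:
                ids.append(unique_id)
    return ids


# Hand-written sample; real responses carry many more fields per node.
sample = [
    {"models": {"edges": [{"node": {"uniqueId": "model.project.users"}}, {"node": {}}]}},
    {"models": None},
    {},
]

print(model_unique_ids(sample))  # ['model.project.users']
```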
Testing Your Adapter
Create tests in tests/unit/adapters/algos/test_my_algo.py:
"""Tests for naming convention algorithm adapter."""

import pytest

from dbterd.adapters.algos.naming_convention import NamingConventionAlgo
from dbterd.core.models import Column, Table


@pytest.fixture
def algo():
    return NamingConventionAlgo()


@pytest.fixture
def sample_tables():
    return [
        Table(
            name="users",
            database="db",
            schema="public",
            columns=[Column(name="id", data_type="integer")],
            node_name="model.project.users",
        ),
        Table(
            name="orders",
            database="db",
            schema="public",
            columns=[
                Column(name="id", data_type="integer"),
                Column(name="user_id", data_type="integer"),
            ],
            node_name="model.project.orders",
        ),
    ]


class TestNamingConventionAlgo:
    def test_get_relationships_finds_user_id(self, algo, sample_tables):
        refs = algo.get_relationships(tables=sample_tables)

        assert len(refs) == 1
        assert refs[0].table_map == ("model.project.users", "model.project.orders")
        assert refs[0].column_map == ("id", "user_id")

    def test_get_relationships_ignores_id_column(self, algo):
        tables = [
            Table(
                name="items",
                database="db",
                schema="public",
                columns=[Column(name="id", data_type="integer")],
                node_name="model.project.items",
            ),
        ]

        refs = algo.get_relationships(tables=tables)

        assert len(refs) == 0
Run the tests with pytest, pointing it at the test file you just created.
Tips and Best Practices
- Start with existing algorithms - Study test_relationship.py and semantic.py to understand the patterns and edge cases.
- Use the inherited helpers - Don't reinvent table extraction. Use get_tables() and focus your effort on relationship detection.
- Return node_name in table_map - The make_up_relationships() helper expects full node names (e.g., model.project.users), not formatted names.
- Handle missing data gracefully - Manifest and catalog may have incomplete information. Check for None and missing attributes.
- Log your discoveries - Use logger.debug() to help users understand what relationships your algorithm found (and why).
- Support find_related_nodes_by_id() - If you want to support single-model ERDs, override this method to find related tables.
- Support relationship types - If your detection method can determine cardinality, set the type field appropriately:
  - "0n" - zero-to-many
  - "01" - zero-to-one
  - "11" - one-to-one
  - "nn" - many-to-many
  - "1n" - one-to-many
  - "n1" - many-to-one (default)
- Document your algorithm - Add a doc page in docs/nav/guide/ explaining how to use your algorithm and what it detects.
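If your algorithm accepts relationship types from user input or metadata, validating them early avoids surprises at render time. The sketch below encodes the type codes listed above; REF_TYPES, DEFAULT_REF_TYPE, and normalize_ref_type are hypothetical names introduced for illustration and are not part of dbterd.

```python
from typing import Optional

# Type codes and meanings as listed in the tips above.
REF_TYPES = {
    "0n": "zero-to-many",
    "01": "zero-to-one",
    "11": "one-to-one",
    "nn": "many-to-many",
    "1n": "one-to-many",
    "n1": "many-to-one",
}
DEFAULT_REF_TYPE = "n1"


def normalize_ref_type(value: Optional[str]) -> str:
    """Validate a type code, falling back to the default when none is given."""
    if value is None:
        return DEFAULT_REF_TYPE
    code = value.strip().lower()
    if code not in REF_TYPES:
        raise ValueError(
            f"Unknown relationship type {value!r}; expected one of {sorted(REF_TYPES)}"
        )
    return code
```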
Happy relationship hunting! 🔍