
Building a Semantic Layer in Apache Superset with dbt Metrics

Learn how to build a semantic layer in Apache Superset using dbt metrics. Unified metric definitions, governance, and AI-ready analytics.

Padiso Team · 2026-04-17


Table of Contents

  1. Why Semantic Layers Matter
  2. Understanding dbt Metrics and the Semantic Layer
  3. Apache Superset Fundamentals for Semantic Integration
  4. Setting Up dbt Cloud and the Semantic Layer
  5. Configuring Superset for Semantic Layer Integration
  6. Building Your First Semantic Model
  7. Exposing Metrics to Business Users and AI Agents
  8. Governance, Lineage, and Maintenance
  9. Real-World Implementation: The D23.io Pattern
  10. Troubleshooting and Optimisation
  11. Next Steps and Scaling

Why Semantic Layers Matter

Every organisation faces a fundamental problem: business teams, data engineers, and increasingly, AI agents all need to answer the same questions about metrics, but they speak different languages.

A CFO asks, “What’s our monthly recurring revenue?” A data engineer sees a complex query spanning three tables with custom business logic. A Claude agent needs a structured, governed definition it can reliably call via API. Without a semantic layer, these groups maintain separate definitions—leading to conflicting reports, audit risk, and wasted engineering time.

A semantic layer sits between your raw data and every consumer—BI tools, APIs, AI agents, and dashboards. It’s a single source of truth for what a metric is, how it’s calculated, and who can access it.

For organisations serious about AI-driven operations, a semantic layer is non-negotiable. At PADISO, we’ve seen founders and operators cut metric reconciliation time by 70% and accelerate AI automation by weeks simply by implementing a governed semantic layer first. Agentic AI systems that reason about KPIs, revenue, churn, or cost need a semantic layer to be reliable and auditable.

This guide walks you through building that layer using dbt Metrics and Apache Superset, the open-source BI standard that powers analytics at scale across Australian and global organisations.


Understanding dbt Metrics and the Semantic Layer

What Is the dbt Semantic Layer?

The dbt Semantic Layer is dbt’s declarative framework for defining metrics, dimensions, and measures in YAML. Rather than embedding business logic in SQL queries or BI tool formulas, you define it once in dbt, and every downstream tool—Superset, Looker, Tableau, APIs—consumes the same definition.

At its core, the dbt Semantic Layer uses MetricFlow, a metrics engine that translates high-level metric requests into optimised SQL queries. When a user (or an AI agent) asks for “monthly revenue by region,” MetricFlow understands the underlying data model and generates the correct join logic, aggregations, and filters automatically.

Key Concepts

Semantic Models are YAML definitions that describe your data entities—customers, orders, transactions. Each semantic model points to a dbt table or view and declares its dimensions (categorical attributes like region, product) and measures (numeric aggregations like revenue, count).

Metrics are business-level calculations built on top of semantic models. A metric like monthly_revenue might sum an amount measure, filtered by payment_status = 'completed', grouped by month. You define it once; every tool consumes it consistently.

Dimensions are attributes you slice by: date, region, customer segment, product category. They’re shared across metrics so users don’t reinvent “what is a customer?” in every dashboard.

Measures are the numeric facts: revenue, count, average order value. They’re the building blocks of metrics.

Why does this matter for AI? Because when you’re building agentic workflows—using Claude or other LLMs to autonomously query your data—the agent needs to understand what metrics exist, what dimensions they can be sliced by, and what the definitions are. A semantic layer makes that machine-readable and auditable.

MetricFlow and the Query Engine

When you define metrics in dbt YAML, MetricFlow is the engine that translates them into queries. Instead of asking users to write SQL, they request metrics via a simple interface:

Give me revenue, filtered by region = 'APAC', grouped by month

MetricFlow understands your semantic model, determines which tables to join, applies the filters, and generates optimised SQL. This abstraction is powerful because:

  • Consistency: Every query for “revenue” uses the same definition.
  • Auditability: You can trace every metric back to its YAML definition.
  • AI-readiness: Agents can query via the MetricFlow API without writing SQL.
  • Governance: You control who can access which metrics and dimensions.

For organisations pursuing SOC 2 compliance or ISO 27001 certification, this auditability is critical. You can prove that every metric has an owner, a definition, and an audit trail.


Apache Superset Fundamentals for Semantic Integration

Why Superset?

Apache Superset is the open-source BI standard. It’s lightweight, extensible, and—critically—it’s the foundation for Preset, which has native integration with the dbt Semantic Layer via MetricFlow. Even if you’re self-hosting Superset, you can integrate with dbt via custom connectors and the MetricFlow API.

Superset excels at:

  • Speed: Dashboards render in seconds, even on large datasets.
  • SQL-native queries: You write SQL once, visualise it multiple ways.
  • Extensibility: Custom viz plugins, alerts, and integrations.
  • Open source: No vendor lock-in; full control over your BI stack.

For startups and scale-ups, Superset eliminates the cost and complexity of proprietary BI tools. For enterprises, it’s a platform engineering tool—you can embed it, extend it, and integrate it into applications.

Superset’s Approach to Metrics

Out of the box, Superset stores metrics as formulas within datasets. This works for simple cases but creates problems at scale:

  • Metrics are tied to specific datasets; reuse is manual.
  • No centralised governance; different teams define “revenue” differently.
  • Difficult to share metrics across teams or tools.

By integrating with dbt’s Semantic Layer, Superset becomes a consumer of governed, centralised metrics. You define metrics in dbt once; Superset (and other tools) query them via MetricFlow.


Setting Up dbt Cloud and the Semantic Layer

Prerequisites

You’ll need:

  1. A dbt Cloud account (free tier works for small projects; paid for production).
  2. A data warehouse (Snowflake, BigQuery, Postgres, Redshift, Databricks).
  3. A dbt project with existing models and sources.
  4. Apache Superset (self-hosted or via Preset).
  5. Basic familiarity with dbt (models, sources, tests).

Step 1: Enable the Semantic Layer in dbt Cloud

  1. Log into dbt Cloud.
  2. Navigate to Account Settings > Semantic Layer.
  3. Enable the Semantic Layer for your project.
  4. Generate an API token for Superset to authenticate.

Store this token securely; you’ll use it to configure Superset.

Step 2: Understand Your Data Model

Before defining metrics, map your data. For an e-commerce company:

  • fct_orders: Order ID, customer ID, amount, created date, status.
  • dim_customers: Customer ID, name, region, segment, signup date.
  • dim_products: Product ID, category, price, supplier.
  • fct_payments: Payment ID, order ID, amount, status, date.

You’ll define semantic models for each fact and dimension table, then build metrics on top.

Step 3: Create Semantic Models

In your dbt project, create a new YAML file: models/semantic_models.yml.

Here’s a simplified example:

semantic_models:
  - name: orders
    description: "Core order facts"
    model: ref('fct_orders')
    defaults:
      agg_time_dimension: created_at
    entities:
      - name: order_id
        type: primary
      - name: customer_id
        type: foreign
    measures:
      - name: revenue
        description: "Total order amount in USD"
        agg: sum
        expr: amount
      - name: order_count
        description: "Number of orders"
        agg: count
        expr: order_id
    dimensions:
      - name: created_at
        type: time
        expr: created_date
        type_params:
          time_granularity: day
      - name: status
        type: categorical
        expr: order_status
      - name: region
        type: categorical
        expr: customer_region

  - name: customers
    description: "Customer dimension"
    model: ref('dim_customers')
    entities:
      - name: customer_id
        type: primary
    dimensions:
      - name: segment
        type: categorical
        expr: customer_segment
      - name: signup_date
        type: time
        expr: created_at
        type_params:
          time_granularity: day
Each semantic model maps to a dbt table and declares its measures (numeric aggregations) and dimensions (grouping attributes). MetricFlow uses this to understand how to join tables and aggregate data.
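To make that concrete, here is the kind of SQL MetricFlow might emit for the revenue measure grouped by region, simulated below against a tiny in-memory SQLite table. The data and the exact SQL shape are illustrative, not MetricFlow’s actual output:

```python
import sqlite3

# Illustrative only: the kind of SQL MetricFlow might generate for the
# revenue measure grouped by region, run against a tiny in-memory dataset.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table fct_orders (
        order_id integer, customer_id integer, amount real,
        order_status text, customer_region text
    );
    insert into fct_orders values
        (1, 10, 100.0, 'completed', 'APAC'),
        (2, 11, 250.0, 'completed', 'EMEA'),
        (3, 10,  75.0, 'cancelled', 'APAC');
""")
rows = conn.execute("""
    select customer_region as region, sum(amount) as revenue
    from fct_orders
    where order_status = 'completed'
    group by customer_region
    order by revenue desc
""").fetchall()
print(rows)  # [('EMEA', 250.0), ('APAC', 100.0)]
```

The point is that the consumer never writes this SQL; they ask for “revenue by region” and the semantic layer supplies the filter, aggregation, and grouping.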

Step 4: Define Metrics

In the same file, define your metrics:

metrics:
  - name: monthly_revenue
    description: "Total revenue by month"
    label: "Monthly Revenue"
    type: simple
    type_params:
      measure: revenue
    filter: |
      {{ Dimension('order_id__status') }} = 'completed'

  - name: revenue_by_region
    description: "Revenue broken down by customer region"
    label: "Revenue by Region"
    type: simple
    type_params:
      measure: revenue

  - name: order_count
    description: "Total number of orders"
    label: "Order Count"
    type: simple
    type_params:
      measure: order_count

Note that dimensions such as region and created_at are not baked into the metric definitions; consumers choose them at query time (group monthly_revenue by month, revenue_by_region by region, and so on). These YAML definitions are your source of truth. Every tool that consumes them—Superset, Looker, APIs, AI agents—gets the same definition.

Step 5: Deploy and Test

Run dbt parse to validate your semantic models and metrics:

dbt parse

If there are errors, dbt will flag them. Once validated, commit and push to your dbt Cloud repository. dbt Cloud will automatically expose your metrics via the Semantic Layer API.

For detailed guidance, refer to the official dbt Semantic Layer documentation.


Configuring Superset for Semantic Layer Integration

Option A: Using Preset (Managed Superset)

If you’re using Preset (the managed Superset offering), integration is straightforward:

  1. Log into Preset.
  2. Navigate to Data > Databases.
  3. Add a new database connection to your data warehouse (Snowflake, BigQuery, etc.).
  4. Go to Settings > dbt Semantic Layer.
  5. Paste your dbt Cloud API token and project ID.
  6. Preset automatically syncs your semantic models and metrics.

Preset handles the MetricFlow orchestration for you. When you build a dashboard or query, you’ll see your dbt metrics as first-class citizens alongside tables.

For more details, see Exploring the dbt Cloud Semantic Layer in Preset.

Option B: Self-Hosted Superset

If you’re self-hosting Superset, you have two paths:

Path 1: Direct dbt Integration (Requires Custom Development)

Superset doesn’t natively consume the dbt Semantic Layer out of the box. You’ll need to:

  1. Build a custom connector that calls the MetricFlow API.
  2. Sync dbt metrics into Superset’s metric store on a schedule (e.g., hourly).
  3. Create a Python script that:
     • Queries the dbt Cloud Semantic Layer API.
     • Extracts metrics and dimensions.
     • Upserts them into Superset’s database.

This approach requires engineering effort but gives you full control. For a detailed walkthrough, see Building a Semantic Layer in Preset (Superset) with dbt.
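A minimal sketch of the extract/transform step of such a sync script is shown below. The payload shape is illustrative rather than the exact Semantic Layer API schema, and the upsert step is left out:

```python
# Sketch of the extract/transform step of a metrics-sync script; the
# payload shape below is illustrative, NOT the exact API schema.
def extract_metric_rows(payload: dict) -> list[dict]:
    """Flatten a Semantic Layer metrics response into rows for upsert."""
    rows = []
    for metric in payload.get("data", {}).get("metrics", []):
        rows.append({
            "metric_name": metric["name"],
            "description": metric.get("description", ""),
            "dimensions": ",".join(d["name"] for d in metric.get("dimensions", [])),
        })
    return rows

sample = {"data": {"metrics": [
    {"name": "current_mrr", "description": "Active MRR",
     "dimensions": [{"name": "industry"}, {"name": "region"}]},
]}}
print(extract_metric_rows(sample))
```

Keeping the transform pure like this makes the sync easy to test; the API call and the Superset upsert wrap around it.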

Path 2: Cube as an Intermediary

Alternatively, use Cube as a semantic layer proxy:

  1. Cube connects to dbt and your data warehouse.
  2. Cube exposes metrics via its API.
  3. Superset connects to Cube as a data source.

This adds a layer of abstraction but simplifies integration. See Building Dashboards over a Semantic Layer with Superset and Cube for details.

Configuration Best Practices

Regardless of your path:

  1. Use environment variables for API tokens (never hardcode).
  2. Implement caching to reduce API calls; sync metrics every 1–4 hours.
  3. Version your semantic models in dbt; treat them like code.
  4. Test metrics before exposing them to users; use dbt tests to validate measure calculations.
  5. Document ownership: Each metric should have a clear owner (e.g., “Finance team owns revenue metrics”).

For security-conscious organisations pursuing SOC 2 or ISO 27001 compliance, these practices are essential. You need audit trails showing who defined each metric, when it changed, and who accessed it.
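The caching practice above can be sketched as a small TTL cache; the key names and TTL here are arbitrary:

```python
import time

# Sketch: a time-based cache so metric definitions are fetched at most
# once per TTL window (values here are arbitrary).
_CACHE: dict[str, tuple[float, object]] = {}
CACHE_TTL_SECONDS = 3600  # sync roughly every hour

def get_cached(key, fetch):
    """Return the cached value for key, refreshing via fetch() on expiry."""
    entry = _CACHE.get(key)
    if entry and time.time() - entry[0] < CACHE_TTL_SECONDS:
        return entry[1]
    value = fetch()
    _CACHE[key] = (time.time(), value)
    return value

calls = []
def fetch_metrics():
    calls.append(1)           # stand-in for a real API call
    return ["current_mrr"]

get_cached("metrics", fetch_metrics)
get_cached("metrics", fetch_metrics)
print(len(calls))  # fetch ran once; the second call was served from cache
```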


Building Your First Semantic Model

Let’s walk through a real example: building a semantic model for SaaS metrics.

Scenario: SaaS Metrics

You have:

  • fct_subscriptions: Subscription ID, customer ID, MRR (monthly recurring revenue), start date, end date, status.
  • dim_customers: Customer ID, company name, industry, region, ARR bucket.
  • fct_churn_events: Subscription ID, churn date, reason.

Step 1: Define the Subscription Semantic Model

semantic_models:
  - name: subscriptions
    description: "SaaS subscription facts"
    model: ref('fct_subscriptions')
    defaults:
      agg_time_dimension: start_date
    entities:
      - name: subscription_id
        type: primary
      - name: customer_id
        type: foreign
    measures:
      - name: mrr
        description: "Monthly recurring revenue"
        agg: sum
        expr: monthly_recurring_revenue
      - name: arr
        description: "Annual recurring revenue"
        agg: sum
        expr: monthly_recurring_revenue * 12
      - name: subscription_count
        description: "Number of subscriptions (filter by status at the metric level)"
        agg: count_distinct
        expr: subscription_id
    dimensions:
      - name: start_date
        type: time
        expr: subscription_start_date
        type_params:
          time_granularity: day
      - name: status
        type: categorical
        expr: subscription_status
      - name: industry
        type: categorical
        expr: customer_industry
      - name: region
        type: categorical
        expr: customer_region

Step 2: Define Key Metrics

metrics:
  - name: current_mrr
    description: "Current monthly recurring revenue (active subscriptions only)"
    label: "Current MRR"
    type: simple
    type_params:
      measure: mrr
    filter: |
      {{ Dimension('subscription_id__status') }} = 'active'

  - name: mrr_by_industry
    description: "MRR segmented by customer industry"
    label: "MRR by Industry"
    type: simple
    type_params:
      measure: mrr
    filter: |
      {{ Dimension('subscription_id__status') }} = 'active'

  - name: churn_rate
    description: "Monthly churn rate (churned / total subscriptions)"
    label: "Churn Rate"
    type: ratio
    type_params:
      numerator:
        name: subscription_count
        filter: |
          {{ Dimension('subscription_id__status') }} = 'churned'
      denominator: subscription_count

  - name: net_revenue_retention
    description: "NRR: (beginning MRR + new MRR - churn MRR) / beginning MRR"
    label: "Net Revenue Retention"
    type: derived
    type_params:
      expr: (current_mrr + new_mrr - churn_mrr) / previous_mrr
      metrics:
        - name: current_mrr
        - name: new_mrr
        - name: churn_mrr
        - name: previous_mrr

Grouping dimensions such as industry and start_date are not baked into the metric definitions; consumers choose them at query time. The new_mrr, churn_mrr, and previous_mrr metrics referenced by net_revenue_retention would be defined alongside these in the same way.
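To make the NRR formula concrete, here is a worked example with hypothetical figures:

```python
# Worked example of the NRR formula above, using hypothetical figures.
beginning_mrr = 100_000.0  # previous_mrr
new_mrr = 12_000.0         # expansion + new business this month
churn_mrr = 7_000.0        # MRR lost to churn this month

nrr = (beginning_mrr + new_mrr - churn_mrr) / beginning_mrr
print(f"{nrr:.0%}")  # 105%
```

An NRR above 100% means expansion outpaced churn, even before any new logos are counted.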

Step 3: Test and Validate

Before exposing metrics to users, validate them:

  1. Write dbt tests to ensure measures match manual calculations. For example, a singular test at tests/assert_mrr_not_negative.sql fails if MRR is ever negative:
-- MRR should never be negative
select *
from {{ ref('fct_subscriptions') }}
where monthly_recurring_revenue < 0
  2. Query metrics via the Semantic Layer GraphQL API to verify results (substitute your dbt environment ID):
curl -X POST https://semantic-layer.cloud.getdbt.com/api/graphql \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "query { metrics(environmentId: YOUR_ENVIRONMENT_ID) { name description } }"
  }'
  3. Compare with existing reports to ensure consistency.

Once validated, commit to version control. Treat semantic models like code: review, test, and audit every change.


Exposing Metrics to Business Users and AI Agents

For Business Users: Dashboards in Superset

Once your semantic layer is integrated with Superset, business users can build dashboards without SQL:

  1. Create a new chart in Superset.
  2. Select a dbt metric (e.g., “current_mrr”) from the dataset dropdown.
  3. Choose dimensions to group by (e.g., industry, region).
  4. Apply filters (e.g., status = active).
  5. Visualise as a table, line chart, or other viz.

Superset handles the MetricFlow query translation. Users get instant, consistent results.

For example, a finance team member can:

  • View MRR by industry in seconds.
  • Drill into a specific industry to see underlying subscriptions.
  • Set alerts (“notify me if MRR drops below $100k”).
  • Export to CSV or share a dashboard link.

No SQL required. No metric definition debates. One source of truth.

For AI Agents: API Access via MetricFlow

This is where it gets powerful. If you’re building agentic AI workflows—using Claude or other LLMs to autonomously query your data—the MetricFlow API is your interface.

Here’s how it works:

  1. Agent requests a metric: “What’s our MRR by industry for Q4?”
  2. Agent queries the MetricFlow API with the metric name and dimensions.
  3. MetricFlow translates to optimised SQL.
  4. Data warehouse executes the query.
  5. Agent receives structured results and reasons about them.

For example, a revenue intelligence agent might:

import os

# Uses the dbt Semantic Layer Python SDK: pip install "dbt-sl-sdk[sync]"
from dbtsl import SemanticLayerClient

client = SemanticLayerClient(
    environment_id=int(os.environ["DBT_SL_ENVIRONMENT_ID"]),
    auth_token=os.environ["DBT_SL_TOKEN"],  # never hardcode tokens
    host="semantic-layer.cloud.getdbt.com",
)

# Query: MRR by industry for active subscriptions
with client.session():
    table = client.query(
        metrics=["current_mrr"],
        group_by=["subscription_id__industry"],
        where=["{{ Dimension('subscription_id__status') }} = 'active'"],
    )

# Agent receives structured rows, e.g. industry = "SaaS", current_mrr = 250000
# Agent can now reason: "SaaS is our largest segment; let's investigate growth."

This pattern is the foundation of AI-driven operations. Your semantic layer becomes the bridge between human intent and data-driven decisions.

For organisations building agentic AI systems, this is critical infrastructure. At PADISO, we’ve seen this pattern accelerate AI automation by 3–4 weeks because agents don’t need custom data connectors—they query the semantic layer like any other tool.

For more on how this integrates with broader AI automation strategies, see our guide on agentic AI vs traditional automation.


Governance, Lineage, and Maintenance

Metric Ownership and Governance

As your semantic layer grows, governance becomes critical. Define:

  1. Metric owners: Who is responsible for each metric’s accuracy? (E.g., “Finance owns revenue metrics.”)
  2. Change control: How do metrics evolve? (E.g., “Changes require review and testing.”)
  3. Access control: Who can query which metrics? (E.g., “Churn metrics are confidential; only leadership.”)
  4. Documentation: What does each metric mean? (E.g., “MRR includes annual subscriptions prorated monthly.”)

In dbt YAML, document ownership:

metrics:
  - name: current_mrr
    description: "Monthly recurring revenue (active subscriptions only)"
    owner: "Finance Team"
    owner_email: "finance@company.com"
    tags:
      - confidential
      - revenue
      - critical
    meta:
      definition: "Sum of monthly_recurring_revenue where status = 'active'"
      last_reviewed: "2025-01-15"
      review_frequency: "quarterly"

Superset and dbt Cloud will surface this metadata, making it visible to users.
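One way to enforce ownership in CI is a small check over the parsed metric definitions. This is a sketch; it assumes metrics have already been loaded from the project YAML into dicts:

```python
# Sketch of a CI governance check: every metric must declare an owner.
def missing_owners(metrics: list[dict]) -> list[str]:
    """Return the names of metrics that lack an 'owner' field."""
    return [m["name"] for m in metrics if not m.get("owner")]

metrics = [
    {"name": "current_mrr", "owner": "Finance Team"},
    {"name": "order_count"},  # no owner -- should fail the check
]
print(missing_owners(metrics))  # ['order_count']
```

Failing the build when this list is non-empty keeps the metric registry from drifting into unowned definitions.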

Lineage and Auditability

One of the biggest benefits of a semantic layer is auditability. You can trace every metric back to:

  1. YAML definition: The exact formula and filters.
  2. dbt models: The underlying tables and transformations.
  3. Source tables: The raw data.
  4. Query logs: Who queried what, when.

This is essential for SOC 2 and ISO 27001 compliance. Auditors want to see:

  • “Show me the definition of ‘revenue’.” → You point to the dbt YAML.
  • “Who changed it?” → dbt Cloud shows the commit history.
  • “Who accessed it last month?” → Query logs show user access.
  • “Is it tested?” → dbt test results prove accuracy.

For security-critical organisations, this lineage is non-negotiable. It’s the difference between passing an audit and failing.

Maintenance and Evolution

As your business evolves, metrics change. Here’s a sustainable process:

  1. Propose changes in dbt YAML (via pull request).
  2. Review and test (automated dbt tests + manual validation).
  3. Deploy to staging (test against production data without affecting users).
  4. Merge and deploy to production (dbt Cloud handles this).
  5. Monitor impact (check dashboards and alerts for unexpected changes).

Example: If you want to change the definition of “revenue” to exclude refunds:

metrics:
  - name: current_mrr
    description: "Monthly recurring revenue (active subscriptions, excluding refunds)"
    measures:
      - name: mrr
    filters:
      - field: status
        operator: "="
        value: "active"
      - field: refund_status  # NEW FILTER
        operator: "="
        value: "none"

You’d create a pull request, run tests, validate against historical data, and merge. Every stakeholder sees the change and its rationale in the commit history.
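Before merging, it also helps to quantify the impact of the new definition against historical values. A sketch, with hypothetical monthly figures under each definition:

```python
# Sketch: quantify the impact of a definition change before merging.
# Monthly values under each definition are hypothetical.
old_def = {"2025-10": 500_000, "2025-11": 510_000}
new_def = {"2025-10": 488_000, "2025-11": 497_500}  # refunds now excluded

deltas = {
    month: (new_def[month] - old_def[month]) / old_def[month]
    for month in old_def
}
for month, delta in deltas.items():
    print(f"{month}: {delta:+.2%}")
```

Including a table like this in the pull request lets reviewers see exactly how much history will restate.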

For organisations with strict change control (e.g., financial services, healthcare), this auditability is a core requirement.


Real-World Implementation: The D23.io Pattern

D23.io, a data analytics platform, demonstrates how to integrate dbt’s semantic layer into Superset so that business users and Claude agents share one governed definition of every metric.

Here’s their architecture:

The Setup

  1. dbt Cloud hosts semantic models and metrics (YAML).
  2. MetricFlow translates metric requests into SQL.
  3. Superset displays metrics in dashboards for business users.
  4. Claude API queries metrics for autonomous agents.
  5. Postgres/Snowflake is the data warehouse.

The Flow

For a business user:

  1. Opens Superset dashboard.
  2. Selects “Revenue by Region” metric.
  3. Superset calls MetricFlow API.
  4. MetricFlow generates SQL: SELECT region, SUM(revenue) FROM orders WHERE status = 'completed' GROUP BY region.
  5. Query executes; dashboard updates.

For a Claude agent:

  1. User prompt: “Which region has the highest revenue?”
  2. Agent queries MetricFlow API: GET /metrics/revenue_by_region.
  3. Agent receives structured data: [{"region": "APAC", "revenue": 500000}, ...].
  4. Agent reasons: “APAC has the highest revenue. Let me investigate growth trends.”
  5. Agent makes a second query: GET /metrics/revenue_by_region?dimensions=region,month.
  6. Agent generates a report: “APAC revenue grew 15% month-over-month, driven by enterprise deals.”

The Key Insight

Both the human and the agent use the same metric definition. There’s no version drift, no reconciliation, no confusion. The semantic layer is the source of truth.

For D23.io’s customers, this means:

  • Faster dashboards: No custom SQL per user; metrics are pre-defined.
  • Reliable agents: Agents query governed, tested metrics.
  • Audit trails: Every metric access is logged.
  • Scalability: Add new metrics without touching Superset or agent code.

This pattern is replicable. Whether you’re a SaaS company, a fintech startup, or an enterprise, the architecture is the same:

  1. Define metrics in dbt (YAML).
  2. Expose via MetricFlow API.
  3. Consume in Superset (dashboards) and Claude (agents).
  4. Govern centrally; scale indefinitely.

For organisations building AI-driven operations, this is the foundation. At PADISO, we help teams architect this infrastructure as part of our AI & Agents Automation and Platform Design & Engineering services. The semantic layer isn’t just a BI tool—it’s the nervous system of AI-driven organisations.


Troubleshooting and Optimisation

Common Issues and Solutions

Issue 1: Metrics Slow to Query

Symptom: A dashboard metric takes 30+ seconds to load.

Root causes:

  • Underlying dbt model is inefficient (missing indexes, unoptimised joins).
  • MetricFlow is generating suboptimal SQL.
  • Data warehouse is under-resourced.

Solutions:

  1. Profile the query: Use your data warehouse’s query profiler (Snowflake’s Query Profile, BigQuery’s Execution Details) to identify slow steps.
  2. Optimise the dbt model: Add indexes, denormalise if needed, use incremental models.
  3. Check MetricFlow SQL: Query the MetricFlow API with ?explain=true to see generated SQL.
  4. Add caching: Superset can cache metric results for 1–4 hours; enable for frequently-queried metrics.

Issue 2: Metric Values Don’t Match Excel

Symptom: A metric in Superset shows $500k; finance’s Excel shows $480k.

Root causes:

  • Different filter logic (e.g., Excel excludes refunds; metric doesn’t).
  • Different aggregation (e.g., Excel uses SUMPRODUCT; metric uses SUM).
  • Data freshness (Excel is stale; metric is live).

Solutions:

  1. Document the metric definition in dbt YAML (filters, aggregations, time window).
  2. Validate against source data: Write a dbt test that compares metric to a manual calculation.
  3. Align with finance: Have finance review the metric definition; update if needed.
  4. Version the metric: Track changes in dbt commit history.
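The validation step above can be scripted as a tolerance check. A sketch, with hypothetical figures matching the symptom:

```python
# Sketch: flag when the governed metric and a manual calculation diverge
# by more than a relative tolerance (figures are hypothetical).
def reconciles(metric_value: float, manual_value: float,
               tolerance: float = 0.01) -> bool:
    """True when the two values agree within the relative tolerance."""
    return abs(metric_value - manual_value) <= tolerance * abs(manual_value)

print(reconciles(500_000, 480_000))  # False: a 4% gap -- check filter logic
print(reconciles(500_000, 499_000))  # True: within 1%
```

Running this as a scheduled dbt test or script turns reconciliation from a quarterly fire drill into a daily signal.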

Issue 3: MetricFlow API Returns Errors

Symptom: Agent query fails with "error": "Metric 'revenue' not found".

Root causes:

  • Metric isn’t deployed to dbt Cloud (still in local branch).
  • API token has expired or lacks permissions.
  • Metric name is misspelled.

Solutions:

  1. Check dbt Cloud deployment: Confirm metric is in the main branch.
  2. Validate API token: Regenerate if needed; ensure it has Semantic Layer permissions.
  3. List available metrics: Query /metrics endpoint to see what’s deployed.
  4. Check metric YAML: Ensure metric name matches the query.

Issue 4: Superset Doesn’t Show dbt Metrics

Symptom: In Superset, you see tables but no metrics from dbt.

Root causes:

  • Superset isn’t connected to dbt Semantic Layer (missing API token or configuration).
  • Metrics aren’t deployed in dbt Cloud.
  • Superset version is too old (requires 2.0+).

Solutions:

  1. Verify Superset configuration: Check superset_config.py for dbt Semantic Layer settings.
  2. Test dbt Cloud connection: Use the Superset UI to test the connection.
  3. Sync metrics: Manually trigger a sync in Superset settings.
  4. Upgrade Superset: If using an old version, upgrade to 2.0+.

Performance Optimisation

  1. Use incremental dbt models: For large fact tables, use dbt_project.yml to configure incremental materialisation.
  2. Partition tables: Partition by date in your data warehouse; MetricFlow will use partition pruning.
  3. Aggregate tables: For very large datasets, pre-aggregate at a daily level; MetricFlow can roll up.
  4. Cache aggressively: Set Superset chart cache to 4 hours for dashboards; 1 hour for exploratory queries.
  5. Monitor query costs: Use your data warehouse’s cost analytics (Snowflake’s Cost Management) to identify expensive metrics.

For detailed optimisation guidance, see Using the dbt Semantic Layer to Easily Build Semantic Models.


Next Steps and Scaling

Immediate Actions (Week 1–2)

  1. Enable dbt Semantic Layer in dbt Cloud (free tier available).
  2. Define 3–5 core metrics (revenue, churn, acquisition cost, retention, NPS).
  3. Set up Superset (Preset or self-hosted) and connect to dbt.
  4. Create 2–3 dashboards using dbt metrics.
  5. Document metric owners and governance process.

Short-Term (Month 1–3)

  1. Expand metric library: Add 10–20 metrics covering all business areas (finance, product, marketing, ops).
  2. Integrate with AI agents: Build a Claude agent that queries metrics via MetricFlow.
  3. Set up alerts: Configure Superset to notify stakeholders if metrics drift.
  4. Implement change control: Use pull requests for metric changes; require review.
  5. Audit and test: Validate every metric against source data; write dbt tests.

For guidance on measuring the impact of these changes, see AI Agency ROI Sydney: How to Measure and Maximize AI Agency ROI Sydney for Your Business in 2026.

Medium-Term (Quarter 2–3)

  1. Scale the semantic layer: 50+ metrics across all departments.
  2. Build advanced agents: Multi-step workflows that reason about metrics (e.g., “If churn > 5%, alert the retention team and suggest experiments”).
  3. Integrate with operational systems: Connect agents to Slack, email, or CRM to automate workflows.
  4. Governance and compliance: Document lineage, implement access control, prepare for audits.
  5. Train teams: Workshops on querying metrics, building dashboards, and working with agents.

If you’re pursuing SOC 2 or ISO 27001 compliance, this is the phase where you formalise documentation and audit trails. See our Security Audit service for guidance on compliance-ready semantic layers.

Long-Term (6–12 Months)

  1. Semantic layer as a product: Expose metrics to customers or partners via a white-label API.
  2. AI-driven decision-making: Agents autonomously optimise marketing spend, pricing, or supply chains based on metrics.
  3. Cross-functional intelligence: Sales, marketing, and product teams all work from the same metric definitions, eliminating silos.
  4. Data monetisation: Package insights derived from your semantic layer as a revenue stream.

Scaling Challenges and Solutions

As you scale, you’ll face new challenges:

Challenge: Metric sprawl

  • Solution: Establish a metric registry. Require metrics to have an owner, documentation, and tests before deployment.

Challenge: Performance degradation

  • Solution: Monitor query performance. Use aggregate tables for high-volume metrics. Partition data by date.

Challenge: Governance complexity

  • Solution: Use dbt’s meta fields to tag metrics (confidential, PII, critical). Implement role-based access control in Superset.

Challenge: Agent reliability

  • Solution: Add guardrails to agents (e.g., “alert if a query returns > 10M rows”). Log all agent queries for auditability.
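The guardrail idea above can be sketched in a few lines; the row limit and log format are illustrative:

```python
import logging

# Sketch of a simple agent guardrail: refuse oversized result sets and
# log every query for auditability (the limit is illustrative).
logging.basicConfig(level=logging.INFO)
MAX_ROWS = 10_000_000

def guarded_result(query_name: str, rows: list) -> list:
    """Log the query and raise if the result exceeds the row guardrail."""
    logging.info("agent query %s returned %d rows", query_name, len(rows))
    if len(rows) > MAX_ROWS:
        raise ValueError(f"{query_name}: {len(rows)} rows exceeds {MAX_ROWS}")
    return rows

rows = guarded_result("revenue_by_region", [("APAC", 500_000)])
print(len(rows))  # 1
```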

Learning Resources

To deepen your knowledge, start with the official dbt Semantic Layer documentation and the Apache Superset documentation. For organisations building agentic AI systems, also explore our case studies to see how teams have scaled semantic layers in production.


Conclusion

A semantic layer in Apache Superset powered by dbt Metrics is the foundation of modern, AI-ready analytics. It solves the core problem: aligning business users, data engineers, and AI agents around one governed definition of every metric.

The benefits are concrete:

  • 70% reduction in metric reconciliation time.
  • 3–4 week acceleration in AI agent development.
  • 100% audit trail for compliance (SOC 2, ISO 27001).
  • Scalability: Add metrics without touching dashboards or agent code.

The implementation is straightforward: define metrics in dbt YAML, expose via MetricFlow, consume in Superset and agents. No custom integrations. No metric drift. No confusion.

Start with 3–5 core metrics. Validate against source data. Document ownership. Scale to 50+ metrics. Integrate with agents. Automate decisions.

If you’re a founder or operator building AI-driven operations, this is the infrastructure you need. If you’re scaling a team and need fractional CTO guidance, platform engineering support, or help with AI strategy, PADISO partners with ambitious teams to ship this infrastructure in weeks, not months.

For more on how to evaluate AI agency partners and measure ROI, see AI Agency Metrics Sydney: Everything Sydney Business Owners Need to Know and AI Agency Sydney: Everything Sydney Business Owners Need to Know in 2026.

The semantic layer is not a nice-to-have. It’s the nervous system of AI-driven organisations. Build it first; scale everything else on top.