From Notebooks to Dashboards: Promoting dbt Models into Apache Superset
Turn exploratory analyst notebooks into governed Superset dashboards. Learn the workflow for promoting dbt models with lineage, metadata, and audit-readiness.
Table of Contents
- The Problem: Notebooks Don’t Scale
- Why dbt + Superset Is the Right Stack
- The Workflow: From Exploration to Governance
- Setting Up Your dbt Foundation
- Connecting dbt Models to Superset
- Building Governed Dashboards
- Maintaining Lineage and Metadata
- Scaling Without Losing Agility
- Security, Compliance, and Audit-Readiness
- Next Steps and Implementation
The Problem: Notebooks Don’t Scale
Your analysts are brilliant. They spin up Jupyter notebooks, Databricks notebooks, or SQL scripts. They explore data, find patterns, build models, and generate insights. The work is exploratory, iterative, and fast. That’s the strength of notebooks.
But then the business asks: “Can we turn this into a dashboard?” Or worse: “Who owns this analysis? What data went into it? Is it still accurate?” Suddenly, that beautiful exploratory work becomes a liability. Notebooks are:
- Not version-controlled in any meaningful way (git diffs are painful, collaboration is messy)
- Not reusable (each analyst rewrites the same transformations)
- Not governed (no metadata, no lineage, no audit trail)
- Not scalable (running 50 notebooks by hand isn’t sustainable)
- Not compliant (if you’re pursuing SOC 2 or ISO 27001 audit-readiness, notebooks are a red flag)
You need a way to capture that exploratory energy but move it into a system that’s production-ready, auditable, and reusable. That’s where dbt and Apache Superset come in.
Why dbt + Superset Is the Right Stack
If you’re building AI-driven products or modernising your data infrastructure, you’ve probably heard of dbt. If not, here’s the one-liner: dbt (data build tool) is a framework for transforming data in your warehouse using SQL, with version control, testing, and documentation built in.
Apache Superset is the open-source BI layer. It’s lightweight, flexible, and designed to work with any SQL database. Together, they form a clean separation of concerns:
- dbt handles the “T” in ELT: transformations, testing, lineage, and metadata
- Superset handles the “V” in visualization: dashboards, charts, and user-facing analytics
This stack is especially valuable if you’re working with startups or mid-market teams who need to move fast. Unlike Tableau or Looker, you’re not buying expensive enterprise licenses. Unlike a custom dashboard built from scratch, you’re not reinventing the wheel. You get governance, auditability, and speed.
For teams pursuing SOC 2 or ISO 27001 compliance via Vanta, this approach is also cleaner. dbt models live in version control. Superset dashboards can be defined as code. Audit trails are explicit. Change management is traceable.
The Workflow: From Exploration to Governance
Let’s map out the journey from notebook to dashboard.
Stage 1: Exploration (Notebooks)
Your analyst writes exploratory SQL in a notebook. They’re asking questions:
- “What’s the churn rate by cohort?”
- “How do customer acquisition costs trend over time?”
- “Which features correlate with retention?”
They iterate fast. They write throwaway queries. They refine until they find something interesting.
Stage 2: Codification (dbt)
Once the analysis is solid, it moves into dbt. The analyst writes a dbt model (a .sql file with a SELECT statement). They add tests to ensure data quality. They document the model and its columns. They commit to git.
At this stage, the transformation is:
- Reproducible (same input, same output, every time)
- Testable (row counts, uniqueness, referential integrity)
- Documented (descriptions, owner, business logic)
- Versioned (git history, rollback capability)
Stage 3: Exposure (Superset)
Superset connects to your warehouse and reads the dbt models. The analyst (or a BI engineer) builds a dashboard on top of these models. The dashboard is:
- Governed (row-level security, column-level access)
- Reusable (other analysts can build on the same models)
- Auditable (who viewed what, when)
- Shareable (embedded, scheduled, exported)
Stage 4: Maintenance (dbt + Superset)
When the underlying data changes (a column is renamed, a calculation is updated), you update the dbt model once. All dependent dashboards inherit the change. You don’t have 50 notebooks to update.
This workflow is not theoretical. It’s how leading data teams at high-growth startups structure their analytics. And if you’re working with a venture studio or fractional CTO, this is the pattern they’ll recommend.
Setting Up Your dbt Foundation
Before you can promote models into Superset, you need a solid dbt project. Here’s what that looks like.
Project Structure
Start with a standard dbt project layout:
my-analytics/
├── dbt_project.yml            # Project config
├── models/
│   ├── staging/               # Raw data, lightly transformed
│   │   ├── stg_customers.sql
│   │   ├── stg_orders.sql
│   │   └── stg_events.sql
│   ├── marts/                 # Business logic, ready for BI
│   │   ├── fct_orders.sql
│   │   ├── dim_customers.sql
│   │   └── dim_time.sql
│   ├── intermediate/          # Reusable building blocks
│   │   └── int_customer_metrics.sql
│   └── schema.yml             # Docs and tests (dbt reads .yml files from models/)
├── tests/                     # Singular data quality tests
│   └── assert_*.sql
└── macros/                    # Reusable SQL snippets
This structure matters. Staging models are close to raw data. Marts are business-ready. Intermediate models are reusable components. When you expose these in Superset, you’ll want to expose only the marts (and maybe intermediate models). Staging models are implementation details.
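One way to enforce the marts-only boundary is in dbt_project.yml itself: materialise staging models as views in a separate schema and marts as tables in the schema Superset connects to. A minimal sketch (the project name and schema names are assumptions):

```yaml
# dbt_project.yml (fragment) -- project and schema names are illustrative
models:
  my_analytics:
    staging:
      +materialized: view
      +schema: staging          # kept out of the BI schema
    intermediate:
      +materialized: ephemeral  # inlined into downstream models, never built
    marts:
      +materialized: table
      +schema: analytics        # the only schema exposed to Superset
```

With this layout, pointing Superset at the analytics schema hides staging and intermediate models by construction rather than by convention.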
Writing Testable Models
Every dbt model should have tests. Tests catch data quality issues before they reach a dashboard. Common tests include:
- Not null: column must have a value
- Unique: column must be unique (e.g., customer_id)
- Accepted values: column must be one of a set of values (e.g., status IN ('active', 'inactive'))
- Relationships: foreign key integrity (e.g., order.customer_id references customer.id)
Here’s an example schema.yml file that documents and tests your models:
version: 2

models:
  - name: fct_orders
    description: "One row per order. Core fact table."
    columns:
      - name: order_id
        description: "Primary key."
        tests:
          - unique
          - not_null
      - name: customer_id
        description: "Foreign key to dim_customers."
        tests:
          - relationships:
              to: ref('dim_customers')
              field: customer_id
      - name: order_amount
        description: "Total order value in USD."
        tests:
          - not_null
      - name: order_status
        description: "Order state."
        tests:
          - accepted_values:
              values: ['pending', 'confirmed', 'shipped', 'delivered', 'cancelled']
  - name: dim_customers
    description: "One row per customer. Slowly changing dimension."
    columns:
      - name: customer_id
        description: "Primary key."
        tests:
          - unique
          - not_null
When you run dbt test, these checks execute. If they fail, the build stops. Bad data never reaches Superset.
Documentation and Metadata
The schema.yml file is also where you document your data. Each model and column gets a description. These descriptions flow into Superset’s metadata layer, so your BI users understand what they’re looking at.
You can also add custom metadata using dbt properties. For example:
models:
  - name: fct_orders
    meta:
      owner: "analytics"
      pii: false
      refresh_frequency: "daily"
      business_unit: "finance"
This metadata is queryable via dbt’s manifest.json and can be used to enforce governance rules in Superset.
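Because manifest.json is plain JSON, pulling this metadata out takes only a few lines of Python. A sketch (hedged: the location of meta has shifted between dbt versions, so both the node-level and config-level spots are checked):

```python
import json


def models_with_meta(manifest_path):
    """Yield (model_name, meta) pairs from a dbt manifest.json.

    The exact location of `meta` varies by dbt version (node-level
    `meta` vs. `config.meta`), so this checks both.
    """
    with open(manifest_path) as f:
        manifest = json.load(f)
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue  # skip seeds, snapshots, tests, etc.
        meta = node.get("meta") or node.get("config", {}).get("meta", {})
        yield node["name"], meta


def pii_models(manifest_path):
    """Return names of models flagged with meta.pii: true."""
    return [name for name, meta in models_with_meta(manifest_path)
            if meta.get("pii") is True]
```

A governance script built on this could, for example, fail CI if any PII-flagged model appears in the schema Superset reads from.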
Running dbt in Production
Once your models are solid, you need a way to run them on a schedule. Common approaches:
- dbt Cloud (managed service, built by the dbt team)
- Airflow (orchestration, full control)
- GitHub Actions (lightweight, git-native)
- Cron jobs (simple, but less visibility)
For a startup or small team, dbt Cloud is the easiest. For larger teams with complex dependencies, Airflow is more flexible. The key is: models run on a schedule (e.g., daily), and Superset always reads the latest results.
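If you take the GitHub Actions route, the job can be as small as a checkout, an adapter install, and a dbt build. A hedged sketch (the adapter, schedule, and secret names are assumptions for your setup):

```yaml
# .github/workflows/dbt.yml -- illustrative only
name: dbt-daily
on:
  schedule:
    - cron: "0 6 * * *"      # daily at 06:00 UTC
  workflow_dispatch: {}      # allow manual runs
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install dbt-postgres   # swap for your warehouse adapter
      - run: dbt deps
      - run: dbt build --fail-fast      # runs models and tests in dependency order
        env:
          DBT_PROFILES_DIR: .
          WAREHOUSE_PASSWORD: ${{ secrets.WAREHOUSE_PASSWORD }}
```

dbt build is handy here because it interleaves runs and tests, so a failing test stops downstream models from building at all.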
Connecting dbt Models to Superset
Now that your dbt models are production-ready, let’s get them into Superset.
Option 1: Direct Database Connection
The simplest approach: point Superset at your data warehouse (Postgres, BigQuery, Snowflake, etc.) and let it discover your dbt models.
- Add a database connection in Superset:
  - Go to Settings → Data Connections → + Database
  - Choose your warehouse type
  - Enter connection details (host, username, password, database name)
  - Test the connection
- Refresh the schema:
  - Go to Data → Datasets
  - Superset auto-discovers tables and views
  - Your dbt models appear as tables (if they’re built as tables) or views (if they’re built as views)
- Expose only the marts:
  - Register only the mart tables as Superset datasets; leave the staging and intermediate models out
  - Only the business-ready marts are visible to BI users
This approach works, but it’s not optimal. Superset doesn’t know about dbt metadata (descriptions, tests, lineage). Dashboards are decoupled from dbt.
Option 2: Preset Cloud + dbt Integration
Preset is Superset’s official managed cloud service. It has native dbt integration. When you connect Preset to your dbt Cloud account, something magical happens:
- dbt model descriptions automatically populate Superset dataset columns
- dbt exposures (a dbt feature for documenting downstream usage) become Superset dashboard metadata
- Model lineage flows into Superset’s UI
- You can define metrics in dbt and expose them in Superset
This is the gold standard. Your dbt project becomes the single source of truth for both transformations and analytics. Here’s the workflow:
- Define dbt exposures in your schema.yml:
exposures:
  - name: customer_health_dashboard
    type: dashboard
    maturity: high   # dbt accepts high / medium / low
    owner:
      name: "Analytics Team"
      email: "analytics@company.com"
    description: "Monitor customer churn and LTV by cohort."
    depends_on:
      - ref('fct_orders')
      - ref('dim_customers')
    url: "https://superset.company.com/dashboard/customer-health"
- Sync to Preset:
  - Preset reads your dbt Cloud metadata
  - Model descriptions and lineage populate automatically
  - Exposures appear as documented dashboard references
- Build dashboards in Superset:
  - Charts are built on top of dbt models
  - Metadata is inherited from dbt
  - Lineage is tracked end-to-end
If you’re working with a Sydney-based AI agency or venture studio helping you scale, they’ll likely recommend Preset for exactly this reason: it collapses the gap between data engineering and analytics.
Option 3: dbt Artifacts + Custom Sync
If you’re self-hosting Superset and want dbt integration without Preset, you can use a Python package like dbt-superset-lineage to sync dbt metadata into Superset’s metadata store.
The workflow:
- dbt runs and generates a manifest.json (contains all model metadata)
- A scheduled Python script reads the manifest
- The script updates Superset’s dataset columns with dbt descriptions
- Lineage is tracked in Superset’s database
This approach requires more engineering but gives you full control. It’s useful if you have a custom data stack or need to integrate dbt with other tools (like a data catalog).
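As a rough sketch of the payload-building half of such a script (the field names follow Superset's dataset update endpoint, PUT /api/v1/dataset/{id}, but verify them against your Superset version before relying on them):

```python
import json


def column_descriptions(manifest_path, model_name):
    """Map column name -> description for one dbt model, from manifest.json."""
    with open(manifest_path) as f:
        nodes = json.load(f)["nodes"]
    for node in nodes.values():
        if node.get("resource_type") == "model" and node.get("name") == model_name:
            return {col: info.get("description", "")
                    for col, info in node.get("columns", {}).items()}
    raise KeyError(f"model not found in manifest: {model_name}")


def dataset_update_payload(manifest_path, model_name):
    """Request body for updating a Superset dataset's column descriptions.

    Field names mirror Superset's dataset API ("columns", "column_name",
    "description"); treat them as assumptions to check against your version.
    """
    cols = column_descriptions(manifest_path, model_name)
    return {"columns": [{"column_name": c, "description": d}
                        for c, d in cols.items()]}
```

The network half (authenticating via /api/v1/security/login and PUTting the payload) is omitted here; packages like dbt-superset-lineage wrap that plumbing for you.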
Building Governed Dashboards
Once your dbt models are in Superset, it’s time to build dashboards. But not all dashboards are created equal.
Governance-First Design
When you’re promoting notebooks into production, governance matters. You need to ensure:
- Row-level security (RLS): analysts only see data they’re authorised to see
- Column-level access: sensitive columns (PII, revenue) are hidden from unauthorised users
- Audit trails: who viewed what, when
- Change tracking: who modified the dashboard, what changed
Superset supports RLS through row-level security filters: predicate clauses attached to a dataset that Superset appends to every query’s WHERE clause for the roles you specify. Filters can use Superset’s Jinja macros, such as current_username(). For example (the user_regions mapping table is an assumption):

-- Only show orders for the user's region
region = (SELECT region FROM user_regions WHERE username = '{{ current_username() }}')
You can also use Superset’s native RBAC (role-based access control) to restrict dashboard and dataset access by role.
Dashboard as Code
One of Superset’s underrated features: you can export dashboards as JSON and version-control them. This is powerful because:
- Dashboards are reproducible: you can recreate them in a new Superset instance
- Changes are tracked: git history shows who changed what
- Dashboards can be templated: you can build a dashboard once and clone it for multiple teams
To export a dashboard:
- Go to the dashboard
- Click Actions → Download as JSON
- Commit the JSON to git
To import:
- Go to Dashboards → Import Dashboard
- Upload the JSON file
- Superset rebuilds the dashboard
This pattern is especially useful if you’re working with platform engineering or custom software development teams who want to codify your entire analytics layer.
Building Reusable Charts
Instead of building one-off charts for each dashboard, build reusable chart components. For example:
- A “Daily Active Users” chart that takes a table name as a parameter
- A “Cohort Retention” chart that works across multiple cohort definitions
- A “Revenue Waterfall” chart that aggregates by any dimension
Superset supports parameterised charts through native filters. You define a filter (e.g., “Select a region”), and charts update dynamically. This reduces duplication and makes dashboards easier to maintain.
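The same idea extends to virtual datasets: with Jinja templating enabled in your Superset config, a dataset’s SQL can respond to whichever native filter value the viewer selects via the documented filter_values macro. A sketch (table and filter names are assumptions):

```sql
-- Virtual dataset SQL; "region" matches a native filter bound to that column
SELECT order_date, region, SUM(order_amount) AS revenue
FROM fct_orders
WHERE region IN {{ filter_values('region') | where_in }}
GROUP BY order_date, region
```

One parameterised dataset like this can back several charts, which is usually preferable to cloning near-identical per-region copies.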
Maintaining Lineage and Metadata
Here’s where the dbt + Superset stack really shines: lineage and metadata are explicit and auditable.
dbt Lineage
Every dbt model knows its dependencies. If fct_orders depends on stg_orders, which depends on the raw orders table, dbt tracks this. You can visualise this in dbt Cloud or generate a DAG (directed acyclic graph).
When you make a change to stg_orders, you immediately know which downstream models are affected. This is invaluable when you’re debugging data issues or planning migrations.
Superset Lineage
Superset has a lineage feature (in recent versions) that shows which dashboards depend on which datasets. Combined with dbt lineage, you get end-to-end visibility:
raw.orders table
↓
stg_orders (dbt model)
↓
fct_orders (dbt model)
↓
fct_orders dataset (Superset)
↓
Revenue Dashboard (Superset)
If the raw data changes, you know exactly which dashboards are affected.
Metadata Governance
Metadata is the glue. It includes:
- Ownership: who owns this model or dashboard
- Freshness: when was it last updated
- Quality: how many tests does it have, do they pass
- PII flags: does this model contain sensitive data
- Business context: what business question does this answer
Store this in dbt’s meta properties:
models:
  - name: fct_orders
    meta:
      owner: "finance-team"
      pii: false
      sla: "daily"
      business_unit: "revenue"
    tags: ["core", "production"]
Then, use this metadata to enforce governance rules:
- Models with pii: true require additional access controls
- Models with sla: daily trigger alerts if they’re stale
- Models tagged core require additional testing
This is especially important if you’re pursuing SOC 2 or ISO 27001 compliance. Auditors want to see that you track data ownership, freshness, and quality. dbt + Superset gives you that visibility.
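Rules like these can run as a CI check against manifest.json. A sketch, assuming the "core" tag and the two-test threshold are your own policy choices (they are not dbt built-ins):

```python
import json


def undertested_core_models(manifest_path, min_tests=2):
    """Return names of core-tagged models with fewer than `min_tests` tests.

    Counts dbt test nodes whose depends_on includes the model. The tag
    name and threshold are illustrative policy, not part of dbt itself.
    """
    with open(manifest_path) as f:
        nodes = json.load(f)["nodes"]

    # Tally how many test nodes point at each model's unique_id.
    test_counts = {}
    for node in nodes.values():
        if node.get("resource_type") == "test":
            for dep in node.get("depends_on", {}).get("nodes", []):
                test_counts[dep] = test_counts.get(dep, 0) + 1

    return [node["name"] for uid, node in nodes.items()
            if node.get("resource_type") == "model"
            and "core" in node.get("tags", [])
            and test_counts.get(uid, 0) < min_tests]
```

Failing the pipeline when this returns a non-empty list turns the governance rule into an enforced gate rather than a convention.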
Scaling Without Losing Agility
As your analytics layer grows, you’ll have 50+ models and 100+ dashboards. How do you maintain agility?
Modular Model Design
The key is modularity. Build your dbt project in layers:
- Staging models (stg_*): lightly transformed raw data, one model per source table
- Intermediate models (int_*): reusable building blocks, not exposed to BI users
- Marts (fct_ and dim_): business-ready, exposed to BI users
This structure means:
- Analysts can reuse intermediate models without duplicating logic
- Changes to raw data are isolated to staging models
- Business logic is centralised in marts
- Dashboards only depend on marts, not raw data
Metrics Layer
dbt ships a metrics layer (the original metrics spec is shown below; in dbt 1.6+ this evolved into the MetricFlow-based semantic layer). Instead of hardcoding calculations in dashboards, you define metrics in dbt:
metrics:
  - name: revenue
    label: "Total Revenue"
    model: ref('fct_orders')
    description: "Sum of all order amounts."
    calculation_method: sum
    expression: order_amount
    timestamp: order_date
    dimensions:
      - customer_id
      - region
      - product_category
Then, in Superset, you can build charts directly on these metrics. If the revenue calculation changes, you update it once in dbt, and all dashboards inherit the change.
This is how modern data teams at scale structure their analytics. It’s also how leading AI agencies help portfolio companies avoid the “dashboard sprawl” problem.
Documentation as a Living System
As your project grows, documentation becomes critical. In dbt, documentation is code:
models:
  - name: fct_orders
    description: |
      # Order Facts
      One row per order. Updated daily.

      ## Grain
      - order_id (unique)

      ## Key Relationships
      - customer_id → dim_customers
      - product_id → dim_products

      ## Freshness
      - Updated: {{ run_started_at }}
      - SLA: 24 hours

      ## Ownership
      - Team: Finance
      - Slack: #analytics-finance
When you run dbt docs generate, this becomes an interactive website. Analysts can explore your data model without leaving their browser. This is invaluable for onboarding new team members.
Security, Compliance, and Audit-Readiness
If you’re working with a startup or mid-market company that’s pursuing SOC 2 or ISO 27001 compliance via Vanta, this section is critical.
Version Control and Change Management
All dbt models live in git. Every change is tracked:
- Who made the change (git author)
- When (commit timestamp)
- Why (commit message)
- What changed (git diff)
For auditors, this is gold. You have a complete audit trail of all data transformations. You can answer questions like:
- “When was the revenue calculation changed?”
- “Who has access to the customer table?”
- “What changed in the churn model between January and February?”
Superset dashboards can also be version-controlled (as JSON exports). Combined with dbt, you have end-to-end change tracking.
Access Control and PII Handling
With dbt + Superset, you can implement fine-grained access control:
- Data warehouse level: use warehouse-native RLS (Snowflake, BigQuery, Postgres all support this)
- dbt level: tag models with meta.pii: true and enforce stricter governance
- Superset level: use RBAC and SQL filters to restrict dashboard access
For example, if you have a customers table with PII (email, phone, SSN), you might:
- Create a dbt staging model stg_customers that includes PII (for internal use only)
- Create a dbt mart model dim_customers_safe that excludes PII (for BI users)
- Expose only dim_customers_safe in Superset
- Tag stg_customers with meta.pii: true to flag it for auditors
Testing and Data Quality
Data quality is a compliance requirement. Every dbt model should have tests. Tests are:
- Automated: run on every dbt build
- Auditable: test results are logged
- Enforceable: failed tests block deployment
Common test patterns:
models:
  - name: fct_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: customer_id
        tests:
          - not_null
          - relationships:
              to: ref('dim_customers')
              field: customer_id
      - name: order_amount
        tests:
          - not_null
          - dbt_utils.expression_is_true:
              expression: ">= 0"   # column-level: evaluated as order_amount >= 0
When you run dbt test, these checks execute. If any fail, the build stops. Bad data never reaches Superset.
For auditors, this demonstrates that you have:
- Automated data quality checks
- Documented business rules
- Enforcement mechanisms
Freshness and SLAs
dbt supports freshness checks. You can define SLAs for each model:
sources:
  - name: raw
    database: raw_db
    tables:
      - name: orders
        loaded_at_field: created_at
        freshness:
          warn_after: {count: 12, period: hour}
          error_after: {count: 24, period: hour}
If the raw data hasn’t been updated in 24 hours, dbt flags it as stale. You can integrate this with monitoring tools (Datadog, New Relic, etc.) to alert when SLAs are breached.
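The wiring to a monitoring tool can start from the sources.json artifact that dbt source freshness writes. A sketch (hedged: the artifact's field names can vary by dbt version, so verify against yours):

```python
import json


def stale_sources(sources_path):
    """Return unique_ids of sources whose freshness check warned or errored.

    Parses the sources.json artifact written by `dbt source freshness`.
    Field names ("results", "unique_id", "status") should be verified
    against your dbt version's artifact schema.
    """
    with open(sources_path) as f:
        results = json.load(f).get("results", [])
    return [r["unique_id"] for r in results
            if r.get("status") in ("warn", "error")]
```

A small wrapper can then forward the returned list to whatever alerting channel you use (Slack webhook, Datadog event, PagerDuty, etc.).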
For auditors, this shows that you have:
- Defined SLAs for critical data
- Automated monitoring
- Alerting mechanisms
Next Steps and Implementation
You now understand the workflow for turning notebooks into governed dashboards. Here’s how to get started.
Phase 1: Foundation (Weeks 1-4)
- Audit your existing analyses
  - List all analyst notebooks and scripts
  - Identify the top 5-10 analyses that drive business decisions
  - Document the logic and data sources
- Set up dbt
  - Create a dbt project (use dbt Cloud for simplicity)
  - Connect to your data warehouse
  - Build staging models for your core data sources
- Write tests and documentation
  - Add tests to every model
  - Write descriptions for every column
  - Document business logic and ownership
- Set up Superset
  - Deploy Superset (or use Preset Cloud)
  - Connect to your data warehouse
  - Expose dbt models as datasets
Phase 2: Migration (Weeks 5-12)
- Codify your top analyses
  - Convert the top 5-10 notebook analyses into dbt models
  - Rewrite as marts (business-ready models)
  - Add tests and documentation
- Build initial dashboards
  - Create Superset dashboards on top of dbt marts
  - Add row-level security and access controls
  - Document ownership and SLAs
- Set up automation
  - Configure dbt to run on a schedule (daily or more frequent)
  - Set up monitoring and alerting for failed runs
  - Document runbooks for common issues
- Train your team
  - Teach analysts how to write dbt models
  - Teach BI users how to build dashboards
  - Document best practices and patterns
Phase 3: Scaling (Weeks 13+)
- Expand coverage
  - Migrate remaining analyses from notebooks to dbt
  - Build more dashboards and reports
  - Expand to new data sources
- Introduce the metrics layer
  - Define core business metrics in dbt
  - Expose metrics in Superset
  - Standardise how metrics are calculated across the org
- Optimise and govern
  - Review model dependencies and optimise slow queries
  - Implement fine-grained access control
  - Set up monitoring for data quality and freshness
- Compliance and audit-readiness
  - Document all models and dashboards
  - Implement change tracking and approval workflows
  - Prepare for SOC 2 / ISO 27001 audits
Getting Help
If you’re a startup or mid-market company, this is a lot to do in-house. Consider partnering with a fractional CTO or AI automation agency to accelerate the process. A good partner will:
- Help you design your dbt project architecture
- Set up Superset and configure governance
- Migrate your existing analyses
- Train your team
- Stay on as a fractional resource to guide ongoing development
If you’re in Sydney or Australia, PADISO offers CTO as a Service and custom software development including data stack modernisation. We’ve helped startups and portfolio companies move from ad-hoc analysis to governed analytics in 8-12 weeks.
For larger enterprises, AI strategy and readiness consulting is critical. You need to think about how dbt and Superset fit into your broader AI and ML integration strategy, your security posture, and your platform engineering roadmap.
Conclusion
The shift from notebooks to dashboards is more than a tool change. It’s a shift from exploratory analysis to production analytics. It requires discipline, documentation, and governance.
But the payoff is massive. Once you’ve made this move, your team moves faster. Analysts reuse models instead of rewriting logic. Dashboards stay fresh because they’re built on tested, documented data. Auditors are happy because everything is tracked and auditable.
The dbt + Superset stack is the modern way to do this. It’s used by high-growth startups, scale-ups, and enterprises. It’s open-source, flexible, and designed for teams that want to move fast without sacrificing quality.
Start with your top 5-10 analyses. Get them into dbt. Build dashboards on top. Document and test everything. Then scale from there. In 3-4 months, you’ll have a governance-first analytics layer that your entire organisation trusts.
That’s the goal. That’s what we help teams achieve.