From Notebooks to Dashboards: Promoting dbt Models into Apache Superset
Turn exploratory analyst notebooks into governed Superset dashboards. Learn the workflow for promoting dbt models with lineage, metadata, and audit-readiness.
Table of Contents
- The Problem: Notebooks Don’t Scale
- Why dbt + Superset Is the Right Stack
- The Workflow: From Exploration to Governance
- Setting Up Your dbt Foundation
- Connecting dbt Models to Superset
- Building Governed Dashboards
- Maintaining Lineage and Metadata
- Scaling Without Losing Agility
- Security, Compliance, and Audit-Readiness
- Next Steps and Implementation
The Problem: Notebooks Don’t Scale
Your analysts are brilliant. They spin up Jupyter notebooks, Databricks notebooks, or SQL scripts. They explore data, find patterns, build models, and generate insights. The work is exploratory, iterative, and fast. That’s the strength of notebooks.
But then the business asks: “Can we turn this into a dashboard?” Or worse: “Who owns this analysis? What data went into it? Is it still accurate?” Suddenly, that beautiful exploratory work becomes a liability. Notebooks are:
- Not version-controlled in any meaningful way (git diffs are painful, collaboration is messy)
- Not reusable (each analyst rewrites the same transformations)
- Not governed (no metadata, no lineage, no audit trail)
- Not scalable (running 50 notebooks by hand isn’t sustainable)
- Not compliant (if you’re pursuing SOC 2 or ISO 27001 audit-readiness, notebooks are a red flag)
You need a way to capture that exploratory energy but move it into a system that’s production-ready, auditable, and reusable. That’s where dbt and Apache Superset come in.
Why dbt + Superset Is the Right Stack
If you’re building AI-driven products or modernising your data infrastructure, you’ve probably heard of dbt. If not, here’s the one-liner: dbt (data build tool) is a framework for transforming data in your warehouse using SQL, with version control, testing, and documentation built in.
Apache Superset is the open-source BI layer. It’s lightweight, flexible, and designed to work with any SQL database. Together, they form a clean separation of concerns:
- dbt handles the “T” in ELT: transformations, testing, lineage, and metadata
- Superset handles the “V” in visualization: dashboards, charts, and user-facing analytics
This stack is especially valuable if you’re working with startups or mid-market teams who need to move fast. Unlike Tableau or Looker, you’re not buying expensive enterprise licenses. Unlike a custom dashboard built from scratch, you’re not reinventing the wheel. You get governance, auditability, and speed.
For teams pursuing SOC 2 or ISO 27001 compliance via Vanta, this approach is also cleaner. dbt models live in version control. Superset dashboards can be defined as code. Audit trails are explicit. Change management is traceable.
The Workflow: From Exploration to Governance
Let’s map out the journey from notebook to dashboard.
Stage 1: Exploration (Notebooks)
Your analyst writes exploratory SQL in a notebook. They’re asking questions:
- “What’s the churn rate by cohort?”
- “How do customer acquisition costs trend over time?”
- “Which features correlate with retention?”
They iterate fast. They write throwaway queries. They refine until they find something interesting.
Stage 2: Codification (dbt)
Once the analysis is solid, it moves into dbt. The analyst writes a dbt model (a .sql file with a SELECT statement). They add tests to ensure data quality. They document the model and its columns. They commit to git.
At this stage, the transformation is:
- Reproducible (same input, same output, every time)
- Testable (row counts, uniqueness, referential integrity)
- Documented (descriptions, owner, business logic)
- Versioned (git history, rollback capability)
Stage 3: Exposure (Superset)
Superset connects to your warehouse and reads the dbt models. The analyst (or a BI engineer) builds a dashboard on top of these models. The dashboard is:
- Governed (row-level security, column-level access)
- Reusable (other analysts can build on the same models)
- Auditable (who viewed what, when)
- Shareable (embedded, scheduled, exported)
Stage 4: Maintenance (dbt + Superset)
When the underlying data changes (a column is renamed, a calculation is updated), you update the dbt model once. All dependent dashboards inherit the change. You don’t have 50 notebooks to update.
This workflow is not theoretical. It’s how leading data teams at high-growth startups structure their analytics. And if you’re working with a venture studio or fractional CTO, this is the pattern they’ll recommend.
Setting Up Your dbt Foundation
Before you can promote models into Superset, you need a solid dbt project. Here’s what that looks like.
Project Structure
Start with a standard dbt project layout:
my-analytics/
├── dbt_project.yml            # Project config
├── models/
│   ├── staging/               # Raw data, lightly transformed
│   │   ├── stg_customers.sql
│   │   ├── stg_orders.sql
│   │   └── stg_events.sql
│   ├── marts/                 # Business logic, ready for BI
│   │   ├── fct_orders.sql
│   │   ├── dim_customers.sql
│   │   └── dim_time.sql
│   ├── intermediate/          # Reusable building blocks
│   │   └── int_customer_metrics.sql
│   └── schema.yml             # Docs and tests (dbt reads .yml files from models/)
├── tests/                     # Singular data quality tests
│   └── assert_*.sql
└── macros/                    # Reusable SQL snippets
This structure matters. Staging models are close to raw data. Marts are business-ready. Intermediate models are reusable components. When you expose these in Superset, you’ll want to expose only the marts (and maybe intermediate models). Staging models are implementation details.
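One way to enforce the marts-only boundary is in dbt_project.yml itself: materialise staging models as views in a separate schema and marts as tables in the schema Superset connects to. A minimal sketch (the project name and schema names are assumptions):

```yaml
# dbt_project.yml (fragment) -- project and schema names are illustrative
models:
  my_analytics:
    staging:
      +materialized: view
      +schema: staging          # kept out of the BI schema
    intermediate:
      +materialized: ephemeral  # inlined into downstream models, never built
    marts:
      +materialized: table
      +schema: analytics        # the only schema exposed to Superset
```

With this layout, pointing Superset at the analytics schema hides staging and intermediate models by construction rather than by convention.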
Writing Testable Models
Every dbt model should have tests. Tests catch data quality issues before they reach a dashboard. Common tests include:
- Not null: column must have a value
- Unique: column must be unique (e.g., customer_id)
- Accepted values: column must be one of a set of values (e.g., status IN ('active', 'inactive'))
- Relationships: foreign key integrity (e.g., order.customer_id references customer.id)
Here’s an example schema.yml file that documents and tests your models:
version: 2

models:
  - name: fct_orders
    description: "One row per order. Core fact table."
    columns:
      - name: order_id
        description: "Primary key."
        tests:
          - unique
          - not_null
      - name: customer_id
        description: "Foreign key to dim_customers."
        tests:
          - relationships:
              to: ref('dim_customers')
              field: customer_id
      - name: order_amount
        description: "Total order value in USD."
        tests:
          - not_null
      - name: order_status
        description: "Order state."
        tests:
          - accepted_values:
              values: ['pending', 'confirmed', 'shipped', 'delivered', 'cancelled']
  - name: dim_customers
    description: "One row per customer. Slowly changing dimension."
    columns:
      - name: customer_id
        description: "Primary key."
        tests:
          - unique
          - not_null
When you run dbt test, these checks execute. If they fail, the build stops. Bad data never reaches Superset.
Documentation and Metadata
The schema.yml file is also where you document your data. Each model and column gets a description. These descriptions flow into Superset’s metadata layer, so your BI users understand what they’re looking at.
You can also add custom metadata using dbt properties. For example:
models:
  - name: fct_orders
    meta:
      owner: "analytics"
      pii: false
      refresh_frequency: "daily"
      business_unit: "finance"
This metadata is queryable via dbt’s manifest.json and can be used to enforce governance rules in Superset.
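Because manifest.json is plain JSON, pulling this metadata out takes only a few lines of Python. A sketch (hedged: the location of meta has shifted between dbt versions, so both the node-level and config-level spots are checked):

```python
import json


def models_with_meta(manifest_path):
    """Yield (model_name, meta) pairs from a dbt manifest.json.

    The exact location of `meta` varies by dbt version (node-level
    `meta` vs. `config.meta`), so this checks both.
    """
    with open(manifest_path) as f:
        manifest = json.load(f)
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue  # skip seeds, snapshots, tests, etc.
        meta = node.get("meta") or node.get("config", {}).get("meta", {})
        yield node["name"], meta


def pii_models(manifest_path):
    """Return names of models flagged with meta.pii: true."""
    return [name for name, meta in models_with_meta(manifest_path)
            if meta.get("pii") is True]
```

A governance script built on this could, for example, fail CI if any PII-flagged model appears in the schema Superset reads from.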
Running dbt in Production
Once your models are solid, you need a way to run them on a schedule. Common approaches:
- dbt Cloud (managed service, built by the dbt team)
- Airflow (orchestration, full control)
- GitHub Actions (lightweight, git-native)
- Cron jobs (simple, but less visibility)
For a startup or small team, dbt Cloud is the easiest. For larger teams with complex dependencies, Airflow is more flexible. The key is: models run on a schedule (e.g., daily), and Superset always reads the latest results.
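If you take the GitHub Actions route, the job can be as small as a checkout, an adapter install, and a dbt build. A hedged sketch (the adapter, schedule, and secret names are assumptions for your setup):

```yaml
# .github/workflows/dbt.yml -- illustrative only
name: dbt-daily
on:
  schedule:
    - cron: "0 6 * * *"      # daily at 06:00 UTC
  workflow_dispatch: {}      # allow manual runs
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install dbt-postgres   # swap for your warehouse adapter
      - run: dbt deps
      - run: dbt build --fail-fast      # runs models and tests in dependency order
        env:
          DBT_PROFILES_DIR: .
          WAREHOUSE_PASSWORD: ${{ secrets.WAREHOUSE_PASSWORD }}
```

dbt build is handy here because it interleaves runs and tests, so a failing test stops downstream models from building at all.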
Connecting dbt Models to Superset
Now that your dbt models are production-ready, let’s get them into Superset.
Option 1: Direct Database Connection
The simplest approach: point Superset at your data warehouse (Postgres, BigQuery, Snowflake, etc.) and let it discover your dbt models.
- Add a database connection in Superset:
  - Go to Settings → Data Connections → + Database
  - Choose your warehouse type
  - Enter connection details (host, username, password, database name)
  - Test the connection
- Refresh the schema:
  - Go to Data → Datasets
  - Superset auto-discovers tables and views
  - Your dbt models appear as tables (if they’re built as tables) or views (if they’re built as views)
- Expose only the marts:
  - Register only the mart tables as Superset datasets; leave the staging and intermediate models out
  - Only the business-ready marts are visible to BI users
This approach works, but it’s not optimal. Superset doesn’t know about dbt metadata (descriptions, tests, lineage). Dashboards are decoupled from dbt.
Option 2: Preset Cloud + dbt Integration
Preset is Superset’s official managed cloud service. It has native dbt integration. When you connect Preset to your dbt Cloud account, something magical happens:
- dbt model descriptions automatically populate Superset dataset columns
- dbt exposures (a dbt feature for documenting downstream usage) become Superset dashboard metadata
- Model lineage flows into Superset’s UI
- You can define metrics in dbt and expose them in Superset
This is the gold standard. Your dbt project becomes the single source of truth for both transformations and analytics. Here’s the workflow:
- Define dbt exposures in your schema.yml:
exposures:
  - name: customer_health_dashboard
    type: dashboard
    maturity: high   # dbt accepts high / medium / low
    owner:
      name: "Analytics Team"
      email: "analytics@company.com"
    description: "Monitor customer churn and LTV by cohort."
    depends_on:
      - ref('fct_orders')
      - ref('dim_customers')
    url: "https://superset.company.com/dashboard/customer-health"
- Sync to Preset:
  - Preset reads your dbt Cloud metadata
  - Model descriptions and lineage populate automatically
  - Exposures appear as documented dashboard references
- Build dashboards in Superset:
  - Charts are built on top of dbt models
  - Metadata is inherited from dbt
  - Lineage is tracked end-to-end
If you’re working with a Sydney-based AI agency or venture studio helping you scale, they’ll likely recommend Preset for exactly this reason: it collapses the gap between data engineering and analytics.
Option 3: dbt Artifacts + Custom Sync
If you’re self-hosting Superset and want dbt integration without Preset, you can use a Python package like dbt-superset-lineage to sync dbt metadata into Superset’s metadata store.
The workflow:
- dbt runs and generates a manifest.json (contains all model metadata)
- A scheduled Python script reads the manifest
- The script updates Superset’s dataset columns with dbt descriptions
- Lineage is tracked in Superset’s database
This approach requires more engineering but gives you full control. It’s useful if you have a custom data stack or need to integrate dbt with other tools (like a data catalog).
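As a rough sketch of the payload-building half of such a script (the field names follow Superset's dataset update endpoint, PUT /api/v1/dataset/{id}, but verify them against your Superset version before relying on them):

```python
import json


def column_descriptions(manifest_path, model_name):
    """Map column name -> description for one dbt model, from manifest.json."""
    with open(manifest_path) as f:
        nodes = json.load(f)["nodes"]
    for node in nodes.values():
        if node.get("resource_type") == "model" and node.get("name") == model_name:
            return {col: info.get("description", "")
                    for col, info in node.get("columns", {}).items()}
    raise KeyError(f"model not found in manifest: {model_name}")


def dataset_update_payload(manifest_path, model_name):
    """Request body for updating a Superset dataset's column descriptions.

    Field names mirror Superset's dataset API ("columns", "column_name",
    "description"); treat them as assumptions to check against your version.
    """
    cols = column_descriptions(manifest_path, model_name)
    return {"columns": [{"column_name": c, "description": d}
                        for c, d in cols.items()]}
```

The network half (authenticating via /api/v1/security/login and PUTting the payload) is omitted here; packages like dbt-superset-lineage wrap that plumbing for you.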
Building Governed Dashboards
Once your dbt models are in Superset, it’s time to build dashboards. But not all dashboards are created equal.
Governance-First Design
When you’re promoting notebooks into production, governance matters. You need to ensure:
- Row-level security (RLS): analysts only see data they’re authorised to see
- Column-level access: sensitive columns (PII, revenue) are hidden from unauthorised users
- Audit trails: who viewed what, when
- Change tracking: who modified the dashboard, what changed
Superset supports RLS through row-level security filters: predicate clauses attached to a dataset that Superset appends to every query’s WHERE clause for the roles you specify. Filters can use Superset’s Jinja macros, such as current_username(). For example (the user_regions mapping table is an assumption):

-- Only show orders for the user's region
region = (SELECT region FROM user_regions WHERE username = '{{ current_username() }}')
You can also use Superset’s native RBAC (role-based access control) to restrict dashboard and dataset access by role.
Dashboard as Code
One of Superset’s underrated features: you can export dashboards as JSON and version-control them. This is powerful because:
- Dashboards are reproducible: you can recreate them in a new Superset instance
- Changes are tracked: git history shows who changed what
- Dashboards can be templated: you can build a dashboard once and clone it for multiple teams
To export a dashboard:
- Go to the dashboard
- Click Actions → Download as JSON
- Commit the JSON to git
To import:
- Go to Dashboards → Import Dashboard
- Upload the JSON file
- Superset rebuilds the dashboard
This pattern is especially useful if you’re working with platform engineering or custom software development teams who want to codify your entire analytics layer.
Building Reusable Charts
Instead of building one-off charts for each dashboard, build reusable chart components. For example:
- A “Daily Active Users” chart that takes a table name as a parameter
- A “Cohort Retention” chart that works across multiple cohort definitions
- A “Revenue Waterfall” chart that aggregates by any dimension
Superset supports parameterised charts through native filters. You define a filter (e.g., “Select a region”), and charts update dynamically. This reduces duplication and makes dashboards easier to maintain.
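The same idea extends to virtual datasets: with Jinja templating enabled in your Superset config, a dataset’s SQL can respond to whichever native filter value the viewer selects via the documented filter_values macro. A sketch (table and filter names are assumptions):

```sql
-- Virtual dataset SQL; "region" matches a native filter bound to that column
SELECT order_date, region, SUM(order_amount) AS revenue
FROM fct_orders
WHERE region IN {{ filter_values('region') | where_in }}
GROUP BY order_date, region
```

One parameterised dataset like this can back several charts, which is usually preferable to cloning near-identical per-region copies.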
Maintaining Lineage and Metadata
Here’s where the dbt + Superset stack really shines: lineage and metadata are explicit and auditable.
dbt Lineage
Every dbt model knows its dependencies. If fct_orders depends on stg_orders, which depends on the raw orders table, dbt tracks this. You can visualise this in dbt Cloud or generate a DAG (directed acyclic graph).
When you make a change to stg_orders, you immediately know which downstream models are affected. This is invaluable when you’re debugging data issues or planning migrations.
Superset Lineage
Superset has a lineage feature (in recent versions) that shows which dashboards depend on which datasets. Combined with dbt lineage, you get end-to-end visibility:
raw.orders table
↓
stg_orders (dbt model)
↓
fct_orders (dbt model)
↓
fct_orders dataset (Superset)
↓
Revenue Dashboard (Superset)
If the raw data changes, you know exactly which dashboards are affected.
Metadata Governance
Metadata is the glue. It includes:
- Ownership: who owns this model or dashboard
- Freshness: when was it last updated
- Quality: how many tests does it have, do they pass
- PII flags: does this model contain sensitive data
- Business context: what business question does this answer
Store this in dbt’s meta properties:
models:
  - name: fct_orders
    meta:
      owner: "finance-team"
      pii: false
      sla: "daily"
      business_unit: "revenue"
    tags: ["core", "production"]
Then, use this metadata to enforce governance rules:
- Models with pii: true require additional access controls
- Models with sla: daily trigger alerts if they’re stale
- Models tagged core require additional testing
This is especially important if you’re pursuing SOC 2 or ISO 27001 compliance. Auditors want to see that you track data ownership, freshness, and quality. dbt + Superset gives you that visibility.
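Rules like these can run as a CI check against manifest.json. A sketch, assuming the "core" tag and the two-test threshold are your own policy choices (they are not dbt built-ins):

```python
import json


def undertested_core_models(manifest_path, min_tests=2):
    """Return names of core-tagged models with fewer than `min_tests` tests.

    Counts dbt test nodes whose depends_on includes the model. The tag
    name and threshold are illustrative policy, not part of dbt itself.
    """
    with open(manifest_path) as f:
        nodes = json.load(f)["nodes"]

    # Tally how many test nodes point at each model's unique_id.
    test_counts = {}
    for node in nodes.values():
        if node.get("resource_type") == "test":
            for dep in node.get("depends_on", {}).get("nodes", []):
                test_counts[dep] = test_counts.get(dep, 0) + 1

    return [node["name"] for uid, node in nodes.items()
            if node.get("resource_type") == "model"
            and "core" in node.get("tags", [])
            and test_counts.get(uid, 0) < min_tests]
```

Failing the pipeline when this returns a non-empty list turns the governance rule into an enforced gate rather than a convention.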
Scaling Without Losing Agility
As your analytics layer grows, you’ll have 50+ models and 100+ dashboards. How do you maintain agility?
Modular Model Design
The key is modularity. Build your dbt project in layers:
- Staging models (stg_*): lightly transformed raw data, one model per source table
- Intermediate models (int_*): reusable building blocks, not exposed to BI users
- Marts (fct_ and dim_): business-ready, exposed to BI users
This structure means:
- Analysts can reuse intermediate models without duplicating logic
- Changes to raw data are isolated to staging models
- Business logic is centralised in marts
- Dashboards only depend on marts, not raw data
Metrics Layer
dbt ships a metrics layer (the original metrics spec is shown below; in dbt 1.6+ this evolved into the MetricFlow-based semantic layer). Instead of hardcoding calculations in dashboards, you define metrics in dbt:
metrics:
  - name: revenue
    label: "Total Revenue"
    model: ref('fct_orders')
    description: "Sum of all order amounts."
    calculation_method: sum
    expression: order_amount
    timestamp: order_date
    dimensions:
      - customer_id
      - region
      - product_category
Then, in Superset, you can build charts directly on these metrics. If the revenue calculation changes, you update it once in dbt, and all dashboards inherit the change.
This is how modern data teams at scale structure their analytics. It’s also how leading AI agencies help portfolio companies avoid the “dashboard sprawl” problem.
Documentation as a Living System
As your project grows, documentation becomes critical. In dbt, documentation is code:
models:
  - name: fct_orders
    description: |
      # Order Facts
      One row per order. Updated daily.

      ## Grain
      - order_id (unique)

      ## Key Relationships
      - customer_id → dim_customers
      - product_id → dim_products

      ## Freshness
      - Updated: {{ run_started_at }}
      - SLA: 24 hours

      ## Ownership
      - Team: Finance
      - Slack: #analytics-finance
When you run dbt docs generate, this becomes an interactive website. Analysts can explore your data model without leaving their browser. This is invaluable for onboarding new team members.
Security, Compliance, and Audit-Readiness
If you’re working with a startup or mid-market company that’s pursuing SOC 2 or ISO 27001 compliance via Vanta, this section is critical.
Version Control and Change Management
All dbt models live in git. Every change is tracked:
- Who made the change (git author)
- When (commit timestamp)
- Why (commit message)
- What changed (git diff)
For auditors, this is gold. You have a complete audit trail of all data transformations. You can answer questions like:
- “When was the revenue calculation changed?”
- “Who has access to the customer table?”
- “What changed in the churn model between January and February?”
Superset dashboards can also be version-controlled (as JSON exports). Combined with dbt, you have end-to-end change tracking.
Access Control and PII Handling
With dbt + Superset, you can implement fine-grained access control:
- Data warehouse level: use warehouse-native RLS (Snowflake, BigQuery, Postgres all support this)
- dbt level: tag models with meta.pii: true and enforce stricter governance
- Superset level: use RBAC and SQL filters to restrict dashboard access
For example, if you have a customers table with PII (email, phone, SSN), you might:
- Create a dbt staging model stg_customers that includes PII (for internal use only)
- Create a dbt mart model dim_customers_safe that excludes PII (for BI users)
- Expose only dim_customers_safe in Superset
- Tag stg_customers with meta.pii: true to flag it for auditors
Testing and Data Quality
Data quality is a compliance requirement. Every dbt model should have tests. Tests are:
- Automated: run on every dbt build
- Auditable: test results are logged
- Enforceable: failed tests block deployment
Common test patterns:
models:
  - name: fct_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: customer_id
        tests:
          - not_null
          - relationships:
              to: ref('dim_customers')
              field: customer_id
      - name: order_amount
        tests:
          - not_null
          - dbt_utils.expression_is_true:
              expression: ">= 0"   # column-level: evaluated as order_amount >= 0
When you run dbt test, these checks execute. If any fail, the build stops. Bad data never reaches Superset.
For auditors, this demonstrates that you have:
- Automated data quality checks
- Documented business rules
- Enforcement mechanisms
Freshness and SLAs
dbt supports freshness checks. You can define SLAs for each model:
sources:
  - name: raw
    database: raw_db
    tables:
      - name: orders
        loaded_at_field: created_at
        freshness:
          warn_after: {count: 12, period: hour}
          error_after: {count: 24, period: hour}
If the raw data hasn’t been updated in 24 hours, dbt flags it as stale. You can integrate this with monitoring tools (Datadog, New Relic, etc.) to alert when SLAs are breached.
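The wiring to a monitoring tool can start from the sources.json artifact that dbt source freshness writes. A sketch (hedged: the artifact's field names can vary by dbt version, so verify against yours):

```python
import json


def stale_sources(sources_path):
    """Return unique_ids of sources whose freshness check warned or errored.

    Parses the sources.json artifact written by `dbt source freshness`.
    Field names ("results", "unique_id", "status") should be verified
    against your dbt version's artifact schema.
    """
    with open(sources_path) as f:
        results = json.load(f).get("results", [])
    return [r["unique_id"] for r in results
            if r.get("status") in ("warn", "error")]
```

A small wrapper can then forward the returned list to whatever alerting channel you use (Slack webhook, Datadog event, PagerDuty, etc.).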
For auditors, this shows that you have:
- Defined SLAs for critical data
- Automated monitoring
- Alerting mechanisms
Next Steps and Implementation
You now understand the workflow for turning notebooks into governed dashboards. Here’s how to get started.
Phase 1: Foundation (Weeks 1-4)
- Audit your existing analyses
  - List all analyst notebooks and scripts
  - Identify the top 5-10 analyses that drive business decisions
  - Document the logic and data sources
- Set up dbt
  - Create a dbt project (use dbt Cloud for simplicity)
  - Connect to your data warehouse
  - Build staging models for your core data sources
- Write tests and documentation
  - Add tests to every model
  - Write descriptions for every column
  - Document business logic and ownership
- Set up Superset
  - Deploy Superset (or use Preset Cloud)
  - Connect to your data warehouse
  - Expose dbt models as datasets
Phase 2: Migration (Weeks 5-12)
- Codify your top analyses
  - Convert the top 5-10 notebook analyses into dbt models
  - Rewrite as marts (business-ready models)
  - Add tests and documentation
- Build initial dashboards
  - Create Superset dashboards on top of dbt marts
  - Add row-level security and access controls
  - Document ownership and SLAs
- Set up automation
  - Configure dbt to run on a schedule (daily or more frequent)
  - Set up monitoring and alerting for failed runs
  - Document runbooks for common issues
- Train your team
  - Teach analysts how to write dbt models
  - Teach BI users how to build dashboards
  - Document best practices and patterns
Phase 3: Scaling (Weeks 13+)
- Expand coverage
  - Migrate remaining analyses from notebooks to dbt
  - Build more dashboards and reports
  - Expand to new data sources
- Introduce the metrics layer
  - Define core business metrics in dbt
  - Expose metrics in Superset
  - Standardise how metrics are calculated across the org
- Optimise and govern
  - Review model dependencies and optimise slow queries
  - Implement fine-grained access control
  - Set up monitoring for data quality and freshness
- Compliance and audit-readiness
  - Document all models and dashboards
  - Implement change tracking and approval workflows
  - Prepare for SOC 2 / ISO 27001 audits
Getting Help
If you’re a startup or mid-market company, this is a lot to do in-house. Consider partnering with a fractional CTO or AI automation agency to accelerate the process. A good partner will:
- Help you design your dbt project architecture
- Set up Superset and configure governance
- Migrate your existing analyses
- Train your team
- Stay on as a fractional resource to guide ongoing development
If you’re in Sydney or Australia, PADISO offers CTO as a Service and custom software development including data stack modernisation. We’ve helped startups and portfolio companies move from ad-hoc analysis to governed analytics in 8-12 weeks.
For larger enterprises, AI strategy and readiness consulting is critical. You need to think about how dbt and Superset fit into your broader AI and ML integration strategy, your security posture, and your platform engineering roadmap.
Conclusion
The shift from notebooks to dashboards is more than a tool change. It’s a shift from exploratory analysis to production analytics. It requires discipline, documentation, and governance.
But the payoff is massive. Once you’ve made this move, your team moves faster. Analysts reuse models instead of rewriting logic. Dashboards stay fresh because they’re built on tested, documented data. Auditors are happy because everything is tracked and auditable.
The dbt + Superset stack is the modern way to do this. It’s used by high-growth startups, scale-ups, and enterprises. It’s open-source, flexible, and designed for teams that want to move fast without sacrificing quality.
Start with your top 5-10 analyses. Get them into dbt. Build dashboards on top. Document and test everything. Then scale from there. In 3-4 months, you’ll have a governance-first analytics layer that your entire organisation trusts.
That’s the goal. That’s what we help teams achieve.