
Apache Superset + dbt: Performance Tuning

Master Apache Superset + dbt performance tuning. Configuration patterns, benchmarks, and operational habits for fast dashboards at scale.

The PADISO Team ·2026-06-11

Why Performance Matters in Superset + dbt
Understanding the Superset + dbt Architecture
Database-Level Optimisation
dbt Model Design for Performance
Superset Configuration and Caching
Query Optimisation and Indexing
Monitoring and Observability
Real-World Performance Benchmarks
Operational Habits for Long-Term Performance
Next Steps and Implementation

Why Performance Matters in Superset + dbt {#why-performance-matters}

When you’re running Apache Superset on top of dbt-transformed data, performance isn’t a nice-to-have—it’s the difference between a dashboard that people use and one that gets abandoned. A 10-second load time becomes a 30-second wait. A dashboard that refreshes in 2 minutes becomes a 15-minute slog. Users stop trusting the data. Teams revert to exporting CSVs and building Excel models.

We’ve seen this pattern across startups and mid-market teams running Platform Development in Sydney and across the region. The moment your analytics infrastructure starts to slow, it cascades: analysts spend time debugging queries instead of answering questions, stakeholders lose confidence in real-time metrics, and cost balloons as you scale compute to compensate for inefficient queries.

The good news: most performance problems are predictable and fixable. They live at three layers: your data warehouse configuration, your dbt model design, and your Superset setup. This guide walks through each, with concrete patterns and numbers you can use today.

Understanding the Superset + dbt Architecture {#architecture}

Before you tune anything, you need to understand what you’re tuning. The stack looks like this:

Data Source → dbt Transformations → Data Warehouse → Superset Queries → User Dashboard

Each layer matters. A slow query in dbt doesn’t matter if Superset doesn’t hit it. A fast dbt model doesn’t help if Superset runs an inefficient query on top of it. A perfectly optimised warehouse query doesn’t matter if Superset’s caching is misconfigured and you’re running the same query 50 times a day.

The Apache Superset Documentation covers the basics of architecture and deployment, but it doesn’t tell you how to operate it at scale. That’s what this guide is for.

When you’re working with a venture studio or fractional CTO team, whether you’re based in Melbourne or Austin, you need to understand this stack end-to-end. You can’t delegate performance tuning to one person; it requires coordination between your data engineer (dbt), your warehouse admin, and whoever manages Superset.

Database-Level Optimisation {#database-optimisation}

Your data warehouse is the foundation. No amount of Superset tuning will fix a slow warehouse.

Partitioning and Clustering

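If you’re running BigQuery, Snowflake, or Redshift, physical data layout is non-negotiable, though each warehouse exposes it differently. BigQuery supports explicit partitioning: partition your fact tables by date (usually by day, depending on query patterns). Snowflake micro-partitions tables automatically, so you influence pruning with clustering keys. Redshift has no native table partitioning; put a sort key on the date column instead. For large dimension tables (>100M rows), cluster or sort on the natural key.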

For example, if you have an events table with 2 billion rows, partition by event_date. When Superset runs a query for “last 7 days,” the warehouse scans only 7 daily partitions instead of all 2 billion rows.
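To make the pruning arithmetic concrete, here is a minimal sketch that models a date-partitioned table as a dictionary of daily row counts. The function names and the per-day row count are illustrative assumptions, chosen so that two years of data adds up to roughly 2 billion rows; no real warehouse is involved.

```python
from datetime import date, timedelta


def build_partitions(end: date, days: int, rows_per_day: int) -> dict[date, int]:
    """Map each daily partition key to its row count (illustrative numbers)."""
    return {end - timedelta(days=i): rows_per_day for i in range(days)}


def scan_with_pruning(
    partitions: dict[date, int], start: date, end: date
) -> tuple[int, int]:
    """Return (partitions_scanned, rows_scanned) for an inclusive date-range filter."""
    hit = [rows for day, rows in partitions.items() if start <= day <= end]
    return len(hit), sum(hit)


today = date(2024, 6, 30)
# Two years of daily partitions, about 2 billion rows in total (assumed volume).
partitions = build_partitions(today, days=730, rows_per_day=2_740_000)
scanned, rows = scan_with_pruning(partitions, today - timedelta(days=6), today)
total_rows = sum(partitions.values())
print(f"{scanned} of {len(partitions)} partitions, {rows:,} of {total_rows:,} rows")
```

With a "last 7 days" filter, only the partitions whose key falls in the range are touched, which is the effect a warehouse achieves when the filter column matches the partition column.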

BigQuery’s clustering works similarly. If you’re querying by user_id and event_type most often, cluster on those columns. Redshift’s distribution keys matter too—distribute large tables on a column that’s used in joins (usually a foreign key).

Partitioning alone can cut query time by 50–80% for date-filtered dashboards. We’ve seen this consistently across teams running Platform Development in Canberra and government-regulated environments where audit-readiness and performance are equally critical.

Indexing Strategy

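Traditional indexes are mostly absent from cloud warehouses. Snowflake doesn’t support them on standard tables; it relies on clustering and materialized views. BigQuery uses clustering and partitioning. Redshift has no B-tree indexes either; sort keys and distribution keys do that job. If your dashboards sit on PostgreSQL or another row-store database, conventional indexes do apply.

The rule: make the columns that are frequently filtered in Superset dashboards cheap to look up. If your dashboard has a filter on customer_segment, index that column where indexes exist, or add it to the clustering or sort key where they don’t. If you’re filtering by created_date, partition instead.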

Don’t over-index. Each index adds write overhead and storage cost. For a table that’s written once a day and read 100 times, one index on the most common filter is worth it. For a table that’s updated every minute, indexes become expensive.

Materialized Views and Pre-Aggregation

This is where dbt and your warehouse converge. Instead of having Superset run a query that aggregates 500M rows every time a user opens a dashboard, pre-aggregate in dbt and materialise the result.

If your dashboard shows “revenue by customer segment by day,” create a dbt model that does exactly that aggregation once per day. Store the result in a materialised view. Superset queries the pre-aggregated table instead of the raw data.

This pattern cuts query time from 30–60 seconds to 2–5 seconds. The trade-off: your dashboard is now a few hours behind real-time instead of live. For most business dashboards, that’s a fair trade.

The dbt Models - dbt Documentation explains how to structure these efficiently. Use materialized='table' for pre-aggregations, materialized='view' for logical groupings, and materialized='incremental' for large append-only tables (events, logs).

dbt Model Design for Performance {#dbt-models}

Your dbt models are the bridge between raw data and Superset dashboards. Design them wrong, and Superset will be slow no matter what you do.

Staging Models and Incremental Builds

The classic dbt pattern is: raw data → staging models → intermediate models → marts. This layering is good for maintainability and performance.

Staging models clean and standardise raw data. They’re usually materialised as views or as ephemeral models (no table created; an ephemeral model is inlined as a CTE). Intermediate models join and enrich data. Marts are the final tables that Superset queries.

For large tables (>500M rows), use incremental models. Instead of rebuilding the entire table every day, append only new rows. This cuts dbt run time from 30 minutes to 5 minutes.

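-- Example incremental model
{{ config(
  materialized='incremental',
  unique_key='event_id'
) }}

select
  event_id,
  user_id,
  event_type,
  created_at,
  -- generate_surrogate_key replaces the deprecated surrogate_key in dbt_utils 1.0+
  {{ dbt_utils.generate_surrogate_key(['user_id', 'event_id']) }} as event_key
from {{ source('raw', 'events') }}

{% if is_incremental() %}
  -- Applied only when the target table already exists and --full-refresh is not set
  where created_at >= (select max(created_at) from {{ this }})
{% endif %}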

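On incremental runs, this model processes only rows created at or after the latest created_at already in the table; the unique_key ensures that rows on the boundary are updated rather than duplicated. Superset queries the full table, but dbt’s transformation is fast.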
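The incremental pattern can be mimicked in plain Python to show what happens on each run. This is a sketch of the idea only, not of dbt's internals: the target table is a dictionary keyed by event_id, and the function name incremental_merge is hypothetical.

```python
from datetime import datetime


def incremental_merge(target: dict[str, dict], source: list[dict]) -> int:
    """Upsert source rows at or after the target's high-water mark.

    Returns the number of source rows processed in this run.
    """
    if target:
        high_water = max(row["created_at"] for row in target.values())
        new_rows = [r for r in source if r["created_at"] >= high_water]
    else:
        new_rows = list(source)  # first run: nothing to compare against, full build
    for row in new_rows:
        # Keyed on the unique key, so a reprocessed row replaces its old copy.
        target[row["event_id"]] = row
    return len(new_rows)


source = [
    {"event_id": "e1", "created_at": datetime(2024, 1, 1)},
    {"event_id": "e2", "created_at": datetime(2024, 1, 2)},
]
target: dict[str, dict] = {}
first_run = incremental_merge(target, source)

source.append({"event_id": "e3", "created_at": datetime(2024, 1, 3)})
second_run = incremental_merge(target, source)
print(first_run, second_run, len(target))
```

The second run touches the boundary row again because the comparison is inclusive, but the key-based upsert keeps the table free of duplicates. That is the role a unique key plays alongside a "greater than or equal" filter.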

Denormalisation for Speed

Normalised data is good for data integrity. Denormalised data is good for query speed. For Superset dashboards, denormalisation usually wins.

If you have a users table and an orders table, and Superset always queries them together, create a denormalised orders_with_user_details mart that includes user_name, user_segment, user_cohort, etc. in the orders table.

This adds storage cost and complicates updates, but it cuts join overhead and makes Superset queries simpler and faster.

Aggregation Tables

For dashboards that show time-series data (revenue over time, user growth, event counts), create aggregation tables.

Instead of Superset running:

select
  date_trunc('day', created_at) as day,
  customer_segment,
  count(*) as order_count,
  sum(amount) as revenue
from orders
where created_at >= now() - interval '365 days'
group by 1, 2

every time a user opens the dashboard, create a dbt model that does this aggregation once per day:

{{ config(materialized='table') }}

select
  cast(created_at as date) as order_date,
  customer_segment,
  count(*) as order_count,
  sum(amount) as revenue
from {{ ref('orders') }}
where created_at >= now() - interval '365 days'
group by 1, 2

Superset queries this pre-aggregated table. Query time drops from 20+ seconds to <1 second.
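The equivalence between the live query and the pre-aggregated table can be checked on a toy dataset. The sketch below uses Python's built-in sqlite3 module as a stand-in for the warehouse; the table name daily_revenue and the sample rows are assumptions for illustration, and SQLite's date() replaces the warehouse-specific date functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table orders (created_at text, customer_segment text, amount real)"
)
conn.executemany(
    "insert into orders values (?, ?, ?)",
    [
        ("2024-01-01 09:00:00", "smb", 100.0),
        ("2024-01-01 15:30:00", "smb", 50.0),
        ("2024-01-01 11:00:00", "enterprise", 900.0),
        ("2024-01-02 10:00:00", "smb", 75.0),
    ],
)

AGGREGATE_SQL = """
    select date(created_at) as order_date,
           customer_segment,
           count(*) as order_count,
           sum(amount) as revenue
    from orders
    group by date(created_at), customer_segment
    order by order_date, customer_segment
"""

# What the dashboard would otherwise compute on every load.
live_result = conn.execute(AGGREGATE_SQL).fetchall()

# What a scheduled transformation run would materialise once.
conn.execute(f"create table daily_revenue as {AGGREGATE_SQL}")
cached_result = conn.execute(
    "select * from daily_revenue order by order_date, customer_segment"
).fetchall()

print(cached_result)
```

Both result sets are identical; the difference is that the second one is a plain scan of a few rows, while the first repeats the aggregation each time it is requested.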

The How Preset Integrates dbt with Apache Superset to Deliver on Headless BI and Surface Metrics resource covers how to structure these relationships and manage them as assets.

Filtering and Pushdown

In dbt, filter early. If a model is only needed for a specific date range or customer segment, filter it in the source CTE, not in downstream models.

-- Bad: filter downstream
with raw_events as (
  select * from {{ source('raw', 'events') }}
),
filtered_events as (
  select * from raw_events where created_at >= '2024-01-01'
)
select * from filtered_events

-- Good: filter in source
with raw_events as (
  select * from {{ source('raw', 'events') }}
  where created_at >= '2024-01-01'
)
select * from raw_events

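The second approach applies the filter at the source, reducing data scanned and intermediate result sets. Many warehouse optimisers push the predicate down in both cases, but filtering at the source makes the intent explicit, and it matters whenever an upstream model is materialised as a table.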

Superset Configuration and Caching {#superset-config}

Superset’s caching layer is where most performance gains happen. Misconfigure it, and you’re running the same expensive query 50 times a day.

Query Caching

Enable query caching in Superset. Chart query results are cached through DATA_CACHE_CONFIG in your superset_config.py, and CACHE_DEFAULT_TIMEOUT = 3600 (1 hour) sets the fallback timeout in seconds.

This caches query results for 1 hour. If two users open the same dashboard within that window, the second user gets cached results instead of hitting the warehouse.

For dashboards that refresh every 5 minutes (common in operations teams), set CACHE_DEFAULT_TIMEOUT = 300. For dashboards that are queried once per day, set CACHE_DEFAULT_TIMEOUT = 86400.
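The effect of the timeout can be sketched with a small in-process cache. This is not how Superset implements caching; TTLQueryCache is a hypothetical class that only illustrates why a second request inside the window never reaches the warehouse, and how a hit rate falls out of the counters.

```python
import time
from typing import Callable


class TTLQueryCache:
    """Tiny stand-in for a result cache keyed by SQL text (illustrative only)."""

    def __init__(
        self, timeout_seconds: int, clock: Callable[[], float] = time.monotonic
    ):
        self.timeout = timeout_seconds
        self.clock = clock
        self._store: dict[str, tuple[float, object]] = {}
        self.hits = 0
        self.misses = 0

    def get_or_run(self, sql: str, run_query: Callable[[str], object]) -> object:
        entry = self._store.get(sql)
        if entry is not None and self.clock() - entry[0] < self.timeout:
            self.hits += 1
            return entry[1]
        self.misses += 1
        result = run_query(sql)  # only a miss reaches the warehouse
        self._store[sql] = (self.clock(), result)
        return result

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Passing the clock in as a parameter keeps the sketch testable: a fake clock can jump past the timeout without sleeping.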

Chart-Level Caching

Different charts on the same dashboard may need different cache windows. A revenue chart might be cached for 1 hour. A real-time events chart might be cached for 5 minutes.

In Superset, you can set the cache timeout per chart. Open the chart’s properties and set Cache Timeout to a value in seconds (the exact menu location varies between Superset versions).
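When timeouts are set at several levels, the most specific one should win. The sketch below assumes a chart, then dataset, then database, then global default order, which matches how the Superset documentation describes the fallback; treat the order as an assumption and confirm it against your version. The function name is hypothetical.

```python
from typing import Optional


def effective_cache_timeout(
    chart: Optional[int] = None,
    dataset: Optional[int] = None,
    database: Optional[int] = None,
    default: int = 3600,
) -> int:
    """Return the first timeout that is explicitly set, most specific first."""
    for candidate in (chart, dataset, database):
        if candidate is not None:  # 0 is a valid explicit value, so test for None
            return candidate
    return default


print(effective_cache_timeout(chart=300, dataset=3600))
print(effective_cache_timeout(dataset=600))
print(effective_cache_timeout())
```

Checking for None rather than truthiness matters here, because an explicit timeout of 0 should not silently fall through to a longer default.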

Metadata Caching

Superset caches table and column metadata. If you add a new column to a dbt model, Superset might not see it until the cache expires or you refresh the dataset’s columns from the source. This is usually fine, but if you’re doing frequent schema changes, shorten the metadata cache timeouts on the database connection (at the cost of a slightly slower UI).

Redis Configuration

If you’re running Superset at scale (>20 dashboards, >100 concurrent users), you need Redis for caching. Superset’s in-memory cache is fine for small deployments, but it doesn’t scale.

Configure Redis in superset_config.py:

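CACHE_CONFIG = {
    'CACHE_TYPE': 'RedisCache',  # Flask-Caching backend name
    'CACHE_REDIS_URL': 'redis://localhost:6379/0',
    'CACHE_DEFAULT_TIMEOUT': 3600,
    'CACHE_KEY_PREFIX': 'superset_',
}
# Chart query results use a separate cache; reuse the same Redis settings
DATA_CACHE_CONFIG = {**CACHE_CONFIG, 'CACHE_KEY_PREFIX': 'superset_data_'}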

Monitor Redis memory usage. If it grows unbounded, you have too many unique queries or your cache timeout is too long.

Query Optimisation and Indexing {#query-optimisation}

Not all queries are created equal. Some Superset queries are inherently slow, and no amount of caching fixes that.

Use SQL Lab to Profile Queries

Superset’s SQL Lab tool lets you write and profile raw SQL. Use it to understand which queries are slow.

Open SQL Lab, write a query that matches what Superset is running, and look at the execution plan. In most warehouses, you can see this with EXPLAIN or EXPLAIN ANALYZE.

For example, in Snowflake:

explain select
  user_id,
  count(*) as event_count,
  sum(amount) as total_amount
from events
where created_at >= '2024-01-01'
group by user_id;

Look for full table scans. If you’re scanning 2 billion rows to answer a query that should scan 100M, you have a partitioning or filtering problem.

Avoid SELECT *

Always select only the columns you need. SELECT * forces the warehouse to read and transfer every column, even if Superset only displays 3 of them.

In Superset, when you build a chart, the generated SQL should include only the columns in the chart. If it doesn’t, you’re using an older version or a misconfigured dataset.

Limit Joins

Each join adds cost. A query that joins 5 tables gives the planner more work, and more ways to pick a bad plan, than one that joins 2. This is where denormalised dbt models help.

If you must join multiple tables, do it in dbt (once, during transformation) rather than in Superset (every time the dashboard loads).

Avoid Subqueries in Superset

When you build a Superset chart, avoid writing a subquery in the SQL editor. Instead, create a dbt model that does the subquery, and query the model.

Subqueries are harder for the warehouse to optimise. dbt models are materialised tables that the warehouse can plan around.

The Optimize Your Database for Dashboard Performance - Preset guide covers these patterns in detail, with specific examples for different warehouse types.

Monitoring and Observability {#monitoring}

You can’t optimise what you don’t measure. Set up monitoring to track Superset performance over time.

Query Execution Time

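Enable query logging in Superset. Setting SQLALCHEMY_ECHO = True in superset_config.py echoes SQL to stdout, but it covers Superset’s own metadata database; to capture the queries sent to your warehouse, define a QUERY_LOGGER function in the same file. Pipe the output to a logging service (CloudWatch, Datadog, etc.).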
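Superset's configuration also exposes a QUERY_LOGGER hook that is called with each query sent to an analytics database. The sketch below is one way to use it from superset_config.py. The logger name and the JSON fields are assumptions, and the hook's keyword arguments have changed between Superset versions, so extra arguments are absorbed with **kwargs. The hook receives the SQL text, not the execution time, so timings still need to come from the warehouse or from Superset's own query history.

```python
import json
import logging

logger = logging.getLogger("superset.query_audit")  # hypothetical logger name


def QUERY_LOGGER(database, query, schema=None, client=None, **kwargs):
    """Emit one JSON log line per query sent to an analytics database."""
    logger.info(
        json.dumps(
            {
                # Superset passes a Database model; fall back to str() for anything else.
                "database": getattr(database, "database_name", str(database)),
                "schema": schema,
                "client": client,
                "sql": query,
            }
        )
    )
```

One JSON object per line is easy for most log shippers to parse, which makes it straightforward to count queries per chart or per database downstream.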

Track:

Query execution time (P50, P95, P99)
Number of queries per chart
Number of rows scanned per query
Cache hit rate

If P95 query time is >10 seconds, you have a performance problem. If P99 is >30 seconds, it’s critical.
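The thresholds above can be applied to a list of logged query durations with a few lines of Python. This sketch uses the nearest-rank definition of a percentile; monitoring tools may interpolate differently, so small differences from your dashboards are expected. The function names are illustrative.

```python
import math


def percentile(durations: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not durations:
        raise ValueError("no samples")
    ordered = sorted(durations)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def classify(durations: list[float]) -> str:
    """Apply the thresholds from the text: P99 > 30s is critical, P95 > 10s is a problem."""
    if percentile(durations, 99) > 30:
        return "critical"
    if percentile(durations, 95) > 10:
        return "problem"
    return "ok"


# 100 sample durations in seconds (made-up data).
samples = [1.0] * 94 + [12.0] * 5 + [20.0]
print(percentile(samples, 50), percentile(samples, 95), classify(samples))
```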

Dashboard Load Time

Superset dashboards load multiple charts in parallel. The dashboard load time is the slowest chart’s execution time (plus some overhead for rendering).

If a dashboard has 10 charts and 1 takes 20 seconds, the dashboard takes 20+ seconds to load. Optimise that one chart, and the whole dashboard is faster.
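As a rough model, dashboard load time is the maximum of the chart times plus a rendering allowance. The sketch below is idealised: it assumes every chart really does load in parallel, and the 0.5 second overhead is an arbitrary placeholder, not a measured figure.

```python
def dashboard_load_seconds(
    chart_seconds: list[float], render_overhead: float = 0.5
) -> float:
    """Charts load in parallel, so the slowest chart sets the floor."""
    return max(chart_seconds, default=0.0) + render_overhead


def slowest_charts(chart_seconds: dict[str, float], top: int = 3) -> list[str]:
    """Return chart names ordered from slowest to fastest, limited to `top`."""
    return sorted(chart_seconds, key=chart_seconds.get, reverse=True)[:top]


timings = {"revenue": 1.2, "events": 20.0, "signups": 3.4}
print(dashboard_load_seconds(list(timings.values())))
print(slowest_charts(timings, top=2))
```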

Use browser DevTools to measure dashboard load time. Open the Network tab, reload the dashboard, and look at the waterfall. Identify the slowest API calls (usually requests to /api/v1/chart/data).

Warehouse Metrics

Monitor your warehouse’s own metrics:

Query queue length (if you’re hitting concurrency limits)
Compute cost per query
Storage cost per table
Partition pruning efficiency (% of partitions scanned vs. total)

If your warehouse is queuing queries, you’re hitting concurrency limits. Either reduce query load (via caching) or increase warehouse size.

If compute cost is high, you have inefficient queries or too many queries. Fix the queries first, then consider increasing cache timeout.

Real-World Performance Benchmarks {#benchmarks}

Here’s what good looks like, based on real deployments:

Small Deployment (1–10 dashboards, <50 users)

Query execution time: <2 seconds (P95)
Dashboard load time: <5 seconds
Cache hit rate: 40–60%
Warehouse cost: <$100/month (if using serverless)

At this scale, you don’t need advanced optimisation. Partitioning, basic indexing, and Superset’s default caching are enough.

Medium Deployment (10–50 dashboards, 50–500 users)

Query execution time: <5 seconds (P95)
Dashboard load time: <10 seconds
Cache hit rate: 60–75%
Warehouse cost: $500–$2,000/month

Here, you need pre-aggregation tables, denormalised marts, and careful cache configuration. Most of the optimisation happens in dbt.

Large Deployment (50+ dashboards, 500+ users)

Query execution time: <10 seconds (P95)
Dashboard load time: <15 seconds
Cache hit rate: 75–85%
Warehouse cost: $2,000+/month

At this scale, you need everything: pre-aggregation, denormalisation, advanced caching, query profiling, and dedicated observability. You might also consider a separate analytics warehouse (ClickHouse, Druid) instead of your OLTP database.

Many teams at this scale work with Platform Development in Canada or Platform Development in United States partners who specialise in scaling analytics infrastructure. The complexity is real, and it’s worth getting expert help.

Operational Habits for Long-Term Performance {#operational-habits}

Performance isn’t a one-time fix. It degrades over time as data grows and usage patterns change. Build these habits into your team’s workflow.

Weekly Performance Reviews

Every Monday, check your monitoring dashboards. Look at:

Slowest queries (top 10)
Slowest dashboards (top 10)
Cache hit rate
Warehouse cost

If any query is slower than last week, investigate why. Did data volume grow? Did a new filter get added? Did a dbt model change?

This takes 30 minutes and prevents small problems from becoming big ones.
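The week-over-week comparison is easy to automate once P95 figures are exported per query. The sketch below assumes two dictionaries mapping a query identifier to its P95 in seconds; the 20% tolerance is an arbitrary starting point, and the function name is hypothetical.

```python
def find_regressions(
    last_week: dict[str, float],
    this_week: dict[str, float],
    tolerance: float = 0.2,
) -> list[tuple[str, float, float]]:
    """Return (query_id, old_p95, new_p95) where P95 grew by more than `tolerance`.

    Results are ordered by relative slowdown, worst first.
    Queries with no baseline from last week are skipped.
    """
    regressions = []
    for query_id, new_p95 in this_week.items():
        old_p95 = last_week.get(query_id)
        if old_p95 and new_p95 > old_p95 * (1 + tolerance):
            regressions.append((query_id, old_p95, new_p95))
    return sorted(regressions, key=lambda r: r[2] / r[1], reverse=True)


last_week = {"revenue": 2.0, "users": 4.0, "events": 1.0}
this_week = {"revenue": 2.1, "users": 9.0, "events": 1.5, "new_chart": 3.0}
print(find_regressions(last_week, this_week))
```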

Quarterly Data Audits

Every quarter, audit your dbt models and Superset datasets:

Which models are actually used by dashboards?
Which dashboards haven’t been opened in 90 days?
Which charts are slow but not used?
Which tables have grown >50% in size?

Delete unused models and dashboards. They’re dead weight that slow down dbt runs and confuse your team.
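Two of the audit questions reduce to simple comparisons once the inputs are collected. The sketch below assumes you already have last-opened dates per dashboard and table sizes from two points in time; how you collect them depends on your Superset logs and warehouse, and the function names are illustrative.

```python
from datetime import date, timedelta


def stale_dashboards(
    last_opened: dict[str, date], today: date, max_idle_days: int = 90
) -> list[str]:
    """Dashboards not opened within the last `max_idle_days` days."""
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(name for name, opened in last_opened.items() if opened < cutoff)


def fast_growing_tables(
    size_then_gb: dict[str, float],
    size_now_gb: dict[str, float],
    threshold: float = 0.5,
) -> list[str]:
    """Tables whose size grew by more than `threshold` (0.5 means 50%)."""
    return sorted(
        table
        for table, now in size_now_gb.items()
        if size_then_gb.get(table, 0) > 0
        and (now - size_then_gb[table]) / size_then_gb[table] > threshold
    )


today = date(2024, 6, 30)
opened = {"exec_summary": date(2024, 6, 1), "legacy_kpis": date(2024, 1, 15)}
print(stale_dashboards(opened, today))
print(fast_growing_tables({"events": 10, "users": 4}, {"events": 100, "users": 5}))
```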

For tables that have grown significantly, revisit partitioning and indexing. A table that was 10GB and is now 100GB might need new partitions.

dbt Testing and Governance

As your dbt project grows, add tests to catch performance regressions:

Row count tests (alert if a model has 10x more rows than expected)
Freshness tests (alert if a model hasn’t been refreshed in 24 hours)
Custom tests for data quality

These tests catch problems early, before they hit Superset dashboards.

Use dbt’s meta fields to document which models are used by which dashboards. When you change a model, you know immediately which dashboards might be affected.
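If the dashboards are recorded under a meta key, the mapping can be read back from dbt's manifest. The sketch below takes a manifest-shaped dictionary (the structure of target/manifest.json after it has been loaded with json.load) and assumes a hypothetical meta key named dashboards that holds a list of dashboard names. dbt does not define that key; it is a convention you would choose.

```python
from collections import defaultdict


def dashboards_by_model(
    manifest: dict, meta_key: str = "dashboards"
) -> dict[str, list[str]]:
    """Map each dbt model name to the dashboards listed under its meta[meta_key]."""
    mapping: dict[str, list[str]] = defaultdict(list)
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue
        # meta can appear on the node itself or under its config, depending on dbt version.
        meta = node.get("meta") or node.get("config", {}).get("meta") or {}
        for dashboard in meta.get(meta_key, []):
            mapping[node["name"]].append(dashboard)
    return dict(mapping)


example_manifest = {
    "nodes": {
        "model.shop.daily_revenue": {
            "resource_type": "model",
            "name": "daily_revenue",
            "meta": {"dashboards": ["Revenue Overview", "Exec Summary"]},
        },
        "model.shop.stg_orders": {
            "resource_type": "model",
            "name": "stg_orders",
            "config": {"meta": {}},
        },
        "test.shop.not_null_orders_id": {"resource_type": "test", "name": "not_null"},
    }
}
print(dashboards_by_model(example_manifest))
```

Before changing a model, look it up in the mapping to see which dashboards to re-check.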

Version Control for Superset

Superset dashboards are usually managed through the UI, which makes version control hard. But you can export dashboards through the REST API and commit the result to Git. Recent Superset versions return a ZIP archive of YAML files; older versions exported JSON.

Every quarter, export your dashboards and commit them. This gives you a history of changes and makes it easier to recover from accidental deletions.

# Export dashboard 1; q is a Rison-encoded list of dashboard IDs
curl -X GET 'http://localhost:8088/api/v1/dashboard/export/?q=!(1)' \
  -H "Authorization: Bearer $SUPERSET_TOKEN" -o dashboard_1.zip

Unzip the archive and commit the files to Git. If a dashboard breaks, you can see exactly what changed.

Cost Optimisation

As data volume grows, warehouse cost grows. But cost shouldn’t grow faster than your data. If your data volume grows 50% but your cost grows 100%, you have a problem.

Common causes:

Too many queries (fix with caching)
Inefficient queries (fix with indexing and pre-aggregation)
Unnecessary data (delete old partitions)
Over-provisioned compute (reduce warehouse size)

Most cloud warehouses let you set cost alerts. Set an alert for 10% above your expected monthly cost. When it fires, investigate.
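Both checks in this section are simple ratios. The sketch below compares cost growth with data growth and applies the 10% alert margin; the function names and sample figures are illustrative, and real numbers would come from your warehouse's billing export.

```python
def growth(previous: float, current: float) -> float:
    """Relative growth, e.g. 0.5 for a 50% increase."""
    return (current - previous) / previous


def cost_outpacing_data(
    data_prev: float, data_now: float, cost_prev: float, cost_now: float
) -> bool:
    """True when cost has grown faster than data volume over the same period."""
    return growth(cost_prev, cost_now) > growth(data_prev, data_now)


def budget_alert(
    expected_monthly: float, actual_monthly: float, headroom: float = 0.10
) -> bool:
    """True when actual spend exceeds the expected cost by more than `headroom`."""
    return actual_monthly > expected_monthly * (1 + headroom)


# Data grew 50% (100 GB to 150 GB) while cost doubled (1,000 to 2,000).
print(cost_outpacing_data(100, 150, 1000, 2000))
print(budget_alert(expected_monthly=1000, actual_monthly=1150))
```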

Next Steps and Implementation {#next-steps}

Performance tuning is a process, not a destination. Here’s how to get started:

Week 1: Baseline and Monitoring

Enable query logging in Superset
Set up monitoring (CloudWatch, Datadog, or similar)
Profile your slowest 10 queries using SQL Lab
Document current performance metrics (P50, P95, P99 query time)

Week 2–3: Quick Wins

Enable query caching in Superset (set CACHE_DEFAULT_TIMEOUT = 3600)
Add partitioning to your largest tables (by date, usually)
Create 2–3 pre-aggregation tables for your most-used dashboards
Set chart-level cache timeouts for high-traffic dashboards

Week 4+: Structural Changes

Refactor your dbt project to use staging/intermediate/mart layers
Create denormalised marts for your dashboards
Add incremental builds to large tables
Set up weekly performance reviews

When to Get Help

If your dashboards are still slow after these steps, or if you don’t have the in-house expertise to implement them, it’s time to bring in specialists.

Teams running Platform Development in New York, Platform Development in San Francisco, or Platform Development in Seattle often work with platform engineering teams who specialise in analytics infrastructure. The same applies if you’re based in Australia—Platform Development in Australia teams can help you scale from a basic setup to a production-grade analytics platform.

If you’re a founder or CEO running a startup, consider working with a CTO as a Service partner who can help you build this infrastructure from the ground up, rather than trying to fix it after it’s broken.

For teams pursuing SOC 2 or ISO 27001 compliance, analytics infrastructure is part of your audit. Superset and dbt should be integrated into your security and change control processes. This is where a partner who understands both Platform Design & Engineering and compliance becomes valuable.

Resources

For deeper dives into specific topics:

The Apache Superset Documentation covers deployment, configuration, and advanced features.
The dbt Models - dbt Documentation explains model materialisation and best practices.
How Preset Integrates dbt with Apache Superset to Deliver on Headless BI and Surface Metrics shows how to structure dbt and Superset together at scale.
The Optimize Your Database for Dashboard Performance - Preset guide covers database-specific optimisation patterns.
For a broader view of analytics architecture, Apache Superset: A hidden gem of BI tools? – II. - Hiflylabs walks through how Superset fits into a modern stack.

Summary

Apache Superset + dbt is a powerful combination, but only if you tune it right. Performance comes from three layers: your data warehouse (partitioning, indexing, pre-aggregation), your dbt models (staging, incremental builds, denormalisation), and your Superset configuration (caching, query optimisation).

Start with the quick wins: enable caching, partition your largest tables, create 2–3 pre-aggregation models. Then move to structural changes: refactor your dbt project, denormalise for speed, set up monitoring.

Most importantly, build operational habits. Weekly performance reviews, quarterly audits, and cost monitoring keep your system fast and cost-effective as data grows.

If you’re building analytics infrastructure for a startup or scaling it for a mid-market team, the complexity is real. Getting expert help—whether from a CTO as a Service partner, a venture studio, or a platform engineering team—is often worth the investment. The alternative is a slow, expensive, unmaintainable system that nobody trusts.

Start this week. Profile your slowest queries. Enable caching. Measure the impact. Build from there.
