Guide 20 mins

Apache Superset + dbt: Caching Strategy

Master Superset + dbt caching: configuration patterns, Redis setup, warm-up strategies, and operational habits for sub-second dashboard load times.

The PADISO Team · 2026-06-02

Why Caching Matters in Superset + dbt Stacks
Understanding the Caching Layers
Redis Configuration for Superset
dbt Integration and Query Pre-Computation
Warm-Up and Cache Priming Strategies
Monitoring and Tuning Cache Hit Rates
Operational Habits and Runbooks
Common Pitfalls and How to Avoid Them
Summary and Next Steps

Why Caching Matters in Superset + dbt Stacks

Apache Superset dashboards run against data models built and maintained by dbt. Without caching, every dashboard refresh triggers fresh SQL queries against your data warehouse. For teams running dozens of dashboards over large tables, or running analytics at scale in regulated industries, this approach hurts user experience and inflates compute costs.

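The math is straightforward. Take a single Superset dashboard with eight charts, each querying a dbt model with a 10-second execution time. Every uncached load costs 80 seconds of warehouse query time. If the charts run one after another, that is also how long the user waits, and even with parallel execution no chart appears in under 10 seconds. Five users loading the dashboard once each burn 400 seconds of warehouse compute. Over a month, that’s measurable spend and visible lag.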
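The arithmetic above can be checked in a few lines of Python. This is a back-of-the-envelope sketch: the function name and the `hit_rate` parameter are illustrative, and it assumes every uncached chart query runs in full on each dashboard load.

```python
def warehouse_seconds(charts: int, seconds_per_query: float, loads: int,
                      hit_rate: float = 0.0) -> float:
    """Total warehouse query time for a number of dashboard loads.

    hit_rate is the fraction of chart queries served from cache
    (0.0 means no caching at all).
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    total_queries = charts * loads
    return total_queries * (1.0 - hit_rate) * seconds_per_query


# Eight charts, 10 seconds each: one load, then five loads, then five loads
# with 90% of chart queries answered from cache.
print(warehouse_seconds(8, 10, loads=1))
print(warehouse_seconds(8, 10, loads=5))
print(warehouse_seconds(8, 10, loads=5, hit_rate=0.9))
```

Plugging in your own chart counts and query times gives a rough ceiling on what caching can save before you measure anything.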

Caching flips this dynamic. When configured correctly, caching can cut chart response times from seconds to under 100 milliseconds on a cache hit. Users see data almost instantly. For read-heavy workloads, warehouse compute can drop by 70–90%. And because dbt maintains your data models as code, you control exactly what gets cached and when it refreshes.

This guide walks through the concrete patterns, configuration, and operational discipline needed to run a production Superset + dbt stack with reliable, maintainable caching. We’ll cover Redis setup, dbt-aware cache invalidation, warm-up strategies, and the monitoring habits that keep caches healthy.

Understanding the Caching Layers

Superset’s caching architecture sits across three distinct layers. Understanding each layer is essential before you configure any of them.

Query Result Caching (Redis)

The most important layer for performance is query result caching. When a user runs a chart or dashboard, Superset executes the underlying SQL query and stores the result set in Redis. Subsequent requests for the same query—from the same user or different users—retrieve the cached result instead of hitting the warehouse again.

This layer is where you’ll see the most dramatic performance gains. A 10-second query becomes a 50-millisecond cache hit. The tradeoff is staleness: cached results are only as fresh as your cache TTL (time-to-live) allows. For a 1-hour TTL, results are guaranteed to be no older than 1 hour.

The TTL for chart query results comes from the CACHE_DEFAULT_TIMEOUT value in the data cache configuration (DATA_CACHE_CONFIG in superset_config.py) and can be overridden per chart, per dataset, or per database. You’ll also configure the cache backend; Redis is the standard choice for production workloads.
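To make the override idea concrete, here is a minimal sketch of how an effective TTL could be resolved. The precedence shown (chart, then dataset, then database, then the global default) is our assumption about how the levels stack; the function and parameter names are hypothetical and are not Superset's own code.

```python
from typing import Optional


def effective_timeout(chart: Optional[int], dataset: Optional[int],
                      database: Optional[int], default: int) -> int:
    """Return the first timeout that is set, most specific level first."""
    for value in (chart, dataset, database):
        if value is not None:
            return value
    return default


# A chart-level override wins over everything else.
print(effective_timeout(chart=300, dataset=3600, database=None, default=86400))
# With no overrides, the global default applies.
print(effective_timeout(chart=None, dataset=None, database=None, default=86400))
```

Writing the precedence down like this is useful when documenting TTL decisions, because it shows which setting actually governs a given chart.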

Dashboard and Chart Metadata Caching

Superset also caches dashboard and chart metadata: the definitions of charts, their filters, their relationships to datasets, and computed column logic. This layer is lighter-weight than query result caching but still important for dashboards with many charts or complex filter logic.

Metadata caching is typically handled by Superset’s internal cache layer and is less often tuned than query result caching. However, it’s worth understanding because it can create unexpected staleness if you update a chart definition and the metadata cache hasn’t refreshed.

Data Source and Database Connection Caching

Superset caches metadata about your databases and data sources—table schemas, column names, data types. This layer is essential for the UI to be responsive when users build charts. Without it, every time someone opens the chart builder, Superset would query your warehouse for the full schema.

This layer is less frequently a performance bottleneck but can cause problems if your dbt models change (new columns added, types modified) and Superset’s schema cache doesn’t refresh in time.

Interaction Between Layers

These three layers interact in important ways. When you update a dbt model—adding a column, changing a metric calculation—the query result cache doesn’t automatically invalidate. A user might see old results until the cache TTL expires. This is why warm-up and cache invalidation strategy matters: you need explicit processes to refresh caches when your dbt models change.
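The reason a dbt run does not invalidate anything is that a cache key is derived from the query, not from the data. The sketch below illustrates the idea with a simple hash of the query definition. Superset's real key derivation differs in detail, and the `cache_key` helper and the sample query fields are hypothetical.

```python
import hashlib
import json


def cache_key(query: dict, prefix: str = "superset_") -> str:
    """Derive a deterministic key from the query definition only."""
    canonical = json.dumps(query, sort_keys=True, separators=(",", ":"))
    return prefix + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


query = {
    "datasource": "42__table",
    "metrics": ["sum__order_amount"],
    "time_range": "Last 30 days",
}

key_before_dbt_run = cache_key(query)
key_after_dbt_run = cache_key(query)  # the data changed, the query did not

# Same key, so the old cached result is still served until it expires
# or is explicitly invalidated.
print(key_before_dbt_run == key_after_dbt_run)
```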

Redis Configuration for Superset

Redis is the recommended cache backend for production Superset deployments. It’s fast, supports expiration, and integrates seamlessly with Superset’s caching layer.

Setting Up Redis

First, deploy a Redis instance. For production workloads, use a managed Redis service (AWS ElastiCache, Azure Cache for Redis, or equivalent). For smaller teams, a single Redis instance with 4–8 GB of memory is a reasonable starting point.

Configure Superset to use Redis as the cache backend. In your superset_config.py file:

CACHE_CONFIG = {
    'CACHE_TYPE': 'RedisCache',  # Flask-Caching backend class name
    'CACHE_REDIS_URL': 'redis://<host>:<port>/0',
    'CACHE_DEFAULT_TIMEOUT': 3600,  # 1 hour in seconds
}

Also configure the data cache, which stores chart query results separately:

DATA_CACHE_CONFIG = {
    'CACHE_TYPE': 'RedisCache',
    'CACHE_REDIS_URL': 'redis://<host>:<port>/1',
    'CACHE_DEFAULT_TIMEOUT': 86400,  # 24 hours
}

Using separate Redis databases (0 for metadata, 1 for results) keeps concerns separated and makes it easier to flush one cache without affecting the other.

Memory Management and Eviction Policy

Redis stores caches in memory. When Redis reaches its memory limit, it evicts old entries according to the eviction policy. The default policy (noeviction) will cause writes to fail once memory is exhausted—not ideal for a production cache.

Set an explicit eviction policy:

maxmemory-policy allkeys-lru

This tells Redis to evict the least-recently-used keys when memory is full. This ensures your cache keeps serving requests even under memory pressure, though you lose older results.
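To see what least-recently-used eviction means in practice, here is a small in-memory model of the policy. It is only an illustration: Redis approximates LRU by sampling keys rather than tracking exact order, and the `LruCache` class is a hypothetical stand-in, not part of Redis or Superset.

```python
from collections import OrderedDict
from typing import Optional


class LruCache:
    """Tiny exact-LRU cache used to illustrate the eviction order."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def set(self, key: str, value: bytes) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used


cache = LruCache(capacity=2)
cache.set("chart:1", b"a")
cache.set("chart:2", b"b")
cache.get("chart:1")         # chart:1 is now the most recently used
cache.set("chart:3", b"c")   # over capacity, so chart:2 is evicted
print(cache.get("chart:2"))  # None
print(cache.get("chart:1"))  # b'a'
```

The practical consequence is that rarely viewed dashboards lose their cached results first, which is usually what you want under memory pressure.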

Monitor Redis memory usage continuously. If you’re consistently hitting 80%+ utilisation, increase your Redis instance size or reduce cache TTLs.
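One way to track that threshold is to compute utilisation from the output of redis-cli INFO memory. The sketch below parses the used_memory and maxmemory fields from that text. The helper names and the sample output are illustrative, and it assumes a maxmemory limit has been configured (Redis reports 0 when there is none).

```python
from typing import Optional


def parse_info(text: str) -> dict:
    """Parse the key:value lines of a Redis INFO section into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def memory_utilisation(info_memory: str) -> Optional[float]:
    """Return used_memory / maxmemory, or None when no limit is set."""
    fields = parse_info(info_memory)
    used = int(fields["used_memory"])
    limit = int(fields.get("maxmemory", "0"))
    if limit == 0:
        return None
    return used / limit


# Hypothetical INFO memory output, trimmed to the two fields we need.
sample = """# Memory
used_memory:3435973837
maxmemory:4294967296
"""

utilisation = memory_utilisation(sample)
if utilisation is not None and utilisation >= 0.80:
    print(f"Redis memory at {utilisation:.0%}: resize or shorten TTLs")
```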

Connection Pooling

Superset runs multiple worker processes, and each worker holds its own connections to Redis. The Redis client behind Superset’s cache manages a connection pool per process, so there is no pooling switch to set in the cache config itself. What is worth setting is a key prefix, so Superset’s keys stay identifiable if other applications share the instance:

CACHE_CONFIG = {
    'CACHE_TYPE': 'RedisCache',
    'CACHE_REDIS_URL': 'redis://<host>:<port>/0',
    'CACHE_KEY_PREFIX': 'superset',
    'CACHE_DEFAULT_TIMEOUT': 3600,
}

If you’re running many workers (e.g., 16+ Gunicorn workers), monitor the Redis connection count (connected_clients in redis-cli INFO clients) to make sure you’re not approaching the server’s maxclients limit.

Compression and Serialisation

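Large result sets are expensive to cache. Superset serialises cached payloads before writing them to Redis (pickle is the default for Flask-Caching backends), so a chart that returns millions of rows produces a very large object. The data cache configuration has no compression switch of its own; the block below is the same baseline as before:

DATA_CACHE_CONFIG = {
    'CACHE_TYPE': 'RedisCache',
    'CACHE_REDIS_URL': 'redis://<host>:<port>/1',
    'CACHE_DEFAULT_TIMEOUT': 86400,
}

To keep entries small, aggregate in dbt and apply row limits in charts so Superset caches summaries rather than raw rows. Note: compression and serialisation options vary between Superset versions and are limited in older ones. Check the documentation for your version and test thoroughly before enabling anything in production.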

dbt Integration and Query Pre-Computation

The real power of caching emerges when you align it with dbt’s data model lifecycle. dbt runs on a schedule (typically daily or more frequently), materialising models and computing metrics. Superset queries these models. By understanding this pipeline, you can cache intelligently.

Staging and Mart Models

dbt typically organises models into staging (raw, lightly transformed) and mart (business-ready, aggregated) layers. For Superset caching, always query mart models, not staging models. Mart models are slower to compute (more transformations) but are the source of truth for analytics. Caching mart query results gives the biggest performance win.

In Superset, create datasets that point to dbt mart models, not staging. This ensures your cached results are stable and business-aligned.

Materialisations and Cache Strategy

dbt offers several materialisation strategies, including views, tables, incremental models, and materialized views (dynamic tables on Snowflake). Each affects caching differently.

Views are computed on the fly, so every cache miss pays the full cost of the view’s SQL. That is fine for small views but problematic for large ones: a view over 50 million rows takes time to compute, so caching the result is essential, and a chart that returns unaggregated rows also produces a large cache entry.

Tables are materialised once and queried directly. Querying a table is faster than querying a view, so caching is less critical—but still valuable. A table with a 2-second query time cached becomes a 50-millisecond hit.

Incremental models add only new rows each run, making them ideal for time-series data. Superset can cache these results with a short TTL (e.g., 1 hour) to reflect new data frequently without re-computing the entire model.

For most Superset + dbt stacks, use table materialisations for mart models. Tables are fast to query and cache hits are dramatic.

Pre-Computed Metrics and dbt Metrics

dbt has supported metrics defined as code since version 1.0. The YAML below uses the legacy metrics spec from dbt 1.3 to 1.5; from dbt 1.6 onward, metrics are defined through MetricFlow semantic models instead. Superset has no native dbt metrics integration, so the usual pattern is to materialise metric results into a mart table, point a Superset dataset at it, and let Superset cache the query results.

Example dbt metric:

metrics:
  - name: monthly_revenue
    label: Monthly Revenue
    model: ref('fct_orders')
    calculation_method: sum
    expression: order_amount
    timestamp: order_date
    time_grains: [day, week, month, quarter, year]

When Superset queries the resulting table, it gets a pre-defined, governed calculation. Caching the result ensures consistency and performance.

Explicit Cache Invalidation with dbt

The challenge: when dbt runs and refreshes a model, Superset’s cache doesn’t know the underlying data has changed. Users might see stale results until the cache TTL expires.

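Solve this with explicit cache invalidation. After dbt finishes a run, clear the cached results that depend on the refreshed models. The bluntest option is to flush the Redis database that holds the query result cache (database 1 in the configuration above):

redis-cli -h <host> -p <port> -n 1 FLUSHDB

Or, more granularly, invalidate caches for specific datasets through Superset’s REST API. Recent Superset versions expose a cache-key invalidation endpoint. In the versions we have seen, it relies on Superset recording cache keys in its metadata database (the STORE_CACHE_KEYS_IN_METADATA_DB setting), so confirm the path, payload, and prerequisites against the Swagger docs of your instance (/swagger/v1) before relying on it:

curl -X POST https://superset.example.com/api/v1/cachekey/invalidate \
  -H "Authorization: Bearer $SUPERSET_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"datasource_uids": ["{dataset_id}__table"]}'

Integrate this into your orchestration tool (Airflow, Dagster, etc.) as a step that runs after dbt. dbt’s own post-hooks and on-run-end hooks execute SQL in the warehouse, so they can’t call an HTTP API directly. After dbt finishes, call the Superset API to clear caches for affected datasets.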
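To work out which datasets are affected, an orchestration step can read dbt's run_results.json artifact and map successful models to Superset dataset IDs. The sketch below builds a request body under two assumptions you should verify: that you maintain a model-to-dataset mapping yourself (`MODEL_TO_DATASET_ID` is hypothetical), and that your Superset version accepts dataset UIDs of the form `<id>__table` in a `datasource_uids` list.

```python
import json

# Hypothetical mapping from dbt model name to Superset dataset ID.
MODEL_TO_DATASET_ID = {"fct_orders": 42, "dim_customers": 43}


def invalidation_payload(run_results: dict, model_to_dataset: dict) -> dict:
    """Build an invalidation body from the models that ran successfully."""
    uids = set()
    for result in run_results.get("results", []):
        if result.get("status") != "success":
            continue
        # unique_id looks like "model.<project>.<model_name>"
        resource_type, _, rest = result.get("unique_id", "").partition(".")
        if resource_type != "model":
            continue
        model_name = rest.rpartition(".")[2]
        dataset_id = model_to_dataset.get(model_name)
        if dataset_id is not None:
            uids.add(f"{dataset_id}__table")
    return {"datasource_uids": sorted(uids)}


# Trimmed example of what dbt writes to target/run_results.json.
sample_run = {
    "results": [
        {"unique_id": "model.analytics.fct_orders", "status": "success"},
        {"unique_id": "model.analytics.dim_customers", "status": "error"},
        {"unique_id": "test.analytics.not_null_orders_id", "status": "pass"},
    ]
}

print(json.dumps(invalidation_payload(sample_run, MODEL_TO_DATASET_ID)))
```

Skipping failed models is a deliberate choice here: their tables were not rebuilt, so the cached results for them are still consistent with the warehouse.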

Warm-Up and Cache Priming Strategies

Cache invalidation is half the battle. The other half is ensuring caches are warm when users need them. A cold cache means the first user to load a dashboard waits for a full query execution. Warm-up strategies pre-populate caches so users always hit fast results.

Dashboard Warm-Up

The simplest warm-up strategy is to pre-load dashboards. After your dbt run completes and you’ve invalidated old caches, immediately load each dashboard in Superset. This forces all underlying queries to execute and populate the cache.

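You can automate this with a script that warms each dashboard via the Superset API or via headless browser automation (Selenium, Playwright). Fetching a dashboard’s definition alone does not run its chart queries, so the script has to trigger each chart. Here’s a Python example that assumes the chart warm-up endpoint available in recent Superset releases (check /swagger/v1 on your instance):

import os
import time

import requests

SUPERSET_URL = "https://superset.example.com"
DASHBOARD_IDS = [1, 2, 3, 4, 5]  # Your dashboard IDs
TOKEN = os.environ["SUPERSET_TOKEN"]  # Read the API token from the environment

headers = {"Authorization": f"Bearer {TOKEN}"}

for dashboard_id in DASHBOARD_IDS:
    # List the charts that belong to this dashboard
    charts_url = f"{SUPERSET_URL}/api/v1/dashboard/{dashboard_id}/charts"
    response = requests.get(charts_url, headers=headers, timeout=60)
    response.raise_for_status()
    for chart in response.json()["result"]:
        # Ask Superset to run the chart's query and cache the result
        warm = requests.put(
            f"{SUPERSET_URL}/api/v1/chart/warm_up_cache",
            json={"chart_id": chart["id"], "dashboard_id": dashboard_id},
            headers=headers,
            timeout=600,
        )
        warm.raise_for_status()
        time.sleep(2)  # Rate limit to avoid overwhelming the warehouse
    print(f"Warmed dashboard {dashboard_id}")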

This approach is simple but has limits: it only warms the default view of each dashboard. If users apply custom filters, those results aren’t cached.

Query-Level Warm-Up

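For more control, warm up specific queries. Identify the most-used charts and their common filter combinations. Running the SQL directly against the warehouse does not populate Superset’s cache, because cache entries are keyed by the query Superset itself builds. Instead, write a script that asks Superset to run a chart with the filters you care about. The sketch below assumes the chart warm-up endpoint accepts extra_filters as a JSON-encoded string alongside a dashboard ID; verify the payload for your version:

import json
import os

import requests

SUPERSET_URL = "https://superset.example.com"
CHART_ID = 42      # Chart to warm
DASHBOARD_ID = 1   # Dashboard the chart appears on
TOKEN = os.environ["SUPERSET_TOKEN"]

headers = {"Authorization": f"Bearer {TOKEN}"}

# Same intent as: WHERE order_date >= CURRENT_DATE - 30
extra_filters = [{"col": "__time_range", "op": "==", "val": "Last 30 days"}]

payload = {
    "chart_id": CHART_ID,
    "dashboard_id": DASHBOARD_ID,
    "extra_filters": json.dumps(extra_filters),
}

response = requests.put(
    f"{SUPERSET_URL}/api/v1/chart/warm_up_cache",
    json=payload,
    headers=headers,
    timeout=600,
)
response.raise_for_status()
print(response.json())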

This warms the cache for specific queries. Run this after your dbt pipeline completes.

Scheduled Warm-Up Jobs

For dashboards viewed frequently during business hours, schedule warm-up jobs to run before peak usage times. If your team starts work at 8 AM, run warm-up at 7:55 AM. This ensures caches are hot when users arrive.

Use your orchestration tool (Airflow, Dagster, GitHub Actions, or even cron) to schedule warm-up jobs:

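# Example Airflow DAG
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def warm_up_dashboards():
    """Placeholder: call your dashboard warm-up script here."""
    ...


default_args = {
    'start_date': datetime(2024, 1, 1),
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}

# 07:55 Monday to Friday, in the scheduler's timezone.
# On Airflow 2.4+ the argument is named `schedule` instead of `schedule_interval`.
with DAG('superset_warmup', default_args=default_args,
         schedule_interval='55 7 * * MON-FRI', catchup=False) as dag:
    warm_up_task = PythonOperator(
        task_id='warm_up_dashboards',
        python_callable=warm_up_dashboards,
    )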

Cache Warm-Up for dbt Incremental Models

For incremental dbt models (e.g., daily event tables), warm-up is especially valuable. Incremental models add new data daily. The full historical query might take 30 seconds, but querying just today’s data takes 2 seconds. Cache both:

Full historical results (cached for 24 hours)
Today’s data (cached for 1 hour, refreshed hourly)

This way, users querying the full dashboard see cached historical data, and users drilling into today’s metrics see fresh data.

Monitoring and Tuning Cache Hit Rates

You can’t improve what you don’t measure. Monitoring cache hit rates and query performance is essential to tuning your caching strategy.

Redis Metrics

Monitor Redis directly for cache health:

redis-cli INFO stats

Key metrics:

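keyspace_hits: Number of successful key lookups
keyspace_misses: Number of failed key lookups (for the query result cache, each miss usually means a query went to the warehouse)
Hit rate: Not reported by Redis directly; compute it as keyspace_hits / (keyspace_hits + keyspace_misses)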

A healthy production cache should have a hit rate of 70%+ for read-heavy dashboards. If hit rates are below 50%, your TTLs might be too short or your warm-up strategy needs improvement.
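Because Redis reports only the raw counters, the hit rate has to be derived. Here is a minimal sketch that computes it from the text of redis-cli INFO stats. The function name and sample output are illustrative. Keep in mind that the counters are server-wide and cumulative since the last restart or CONFIG RESETSTAT, so they cover every key on the instance, not just Superset's.

```python
from typing import Optional


def hit_rate(info_stats: str) -> Optional[float]:
    """Compute hits / (hits + misses) from INFO stats output."""
    counters = {}
    for line in info_stats.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key in ("keyspace_hits", "keyspace_misses"):
            counters[key] = int(value)
    hits = counters.get("keyspace_hits", 0)
    misses = counters.get("keyspace_misses", 0)
    total = hits + misses
    return hits / total if total else None  # None until there is traffic


# Hypothetical INFO stats output, trimmed to the relevant counters.
sample_stats = """# Stats
total_connections_received:120
keyspace_hits:7000
keyspace_misses:3000
"""

print(hit_rate(sample_stats))  # 0.7
```

To track the rate over a time window rather than since startup, sample the counters periodically and compute the rate from the differences between samples.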

redis-cli INFO memory

Watch memory usage. If Redis is consistently above 80% capacity, either increase instance size or reduce cache TTLs.

Superset Query Performance Metrics

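Superset logs query execution times. Enable detailed logging in superset_config.py (the logger below covers SQL Lab execution; chart queries are logged by other superset.* loggers):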

import logging
logging.getLogger('superset.sql_lab').setLevel(logging.DEBUG)

Parse logs to identify slow queries. Any query taking more than 10 seconds is a candidate for caching or optimisation.

Dashboard Load Time Monitoring

Monitor end-to-end dashboard load times from the user’s perspective. Use browser performance APIs or tools like Datadog or New Relic to track:

Time to first chart (TTFC)
Time to interactive (TTI)
Total page load time

Track these metrics over time. After implementing caching, you should see a 50–70% reduction in load times.

Identifying Cache Misses

When a cache miss occurs, Superset executes a warehouse query. These are expensive and slow. Identify patterns in cache misses:

Are users applying filters not covered by warm-up?
Are dashboards being accessed at unexpected times?
Is your TTL too short?

Use Superset’s query logs to identify the most common cache misses. Then adjust your warm-up strategy or TTLs to cover these cases.
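Once you have query log records in a workable shape, ranking the misses is a short aggregation. The sketch below assumes you have already extracted records with `chart_id`, `filters`, and `cache_hit` fields. That record shape is hypothetical, so adapt it to whatever your logs actually contain.

```python
from collections import Counter


def top_misses(records: list, n: int = 20) -> list:
    """Rank (chart_id, filters) combinations by number of cache misses."""
    counts = Counter(
        (record["chart_id"], tuple(sorted(record["filters"].items())))
        for record in records
        if not record["cache_hit"]
    )
    return counts.most_common(n)


# Hypothetical records extracted from query logs.
records = [
    {"chart_id": 7, "filters": {"region": "APAC"}, "cache_hit": False},
    {"chart_id": 7, "filters": {"region": "APAC"}, "cache_hit": False},
    {"chart_id": 7, "filters": {"region": "EMEA"}, "cache_hit": False},
    {"chart_id": 9, "filters": {}, "cache_hit": True},
]

for (chart_id, filters), misses in top_misses(records, n=5):
    print(chart_id, dict(filters), misses)
```

The combinations at the top of this list are the natural candidates to add to a warm-up job.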

Alerting on Cache Degradation

Set up alerts for cache health degradation:

Alert if hit rate drops below 60%
Alert if Redis memory usage exceeds 85%
Alert if average query time increases by 50%

These alerts signal that your caching strategy needs tuning.
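The three thresholds above translate directly into a small check that a monitoring job could run. This is a sketch rather than a full alerting setup: the function name and inputs are hypothetical, and in practice you would wire the same conditions into whatever alerting tool you already use.

```python
def cache_alerts(hit_rate: float, memory_utilisation: float,
                 avg_query_seconds: float, baseline_query_seconds: float) -> list:
    """Return a message for every threshold that is breached."""
    alerts = []
    if hit_rate < 0.60:
        alerts.append(f"Cache hit rate {hit_rate:.0%} is below 60%")
    if memory_utilisation > 0.85:
        alerts.append(f"Redis memory {memory_utilisation:.0%} exceeds 85%")
    if baseline_query_seconds > 0 and avg_query_seconds > 1.5 * baseline_query_seconds:
        alerts.append(
            f"Average query time {avg_query_seconds:.1f}s is more than 50% "
            f"above the {baseline_query_seconds:.1f}s baseline"
        )
    return alerts


for message in cache_alerts(hit_rate=0.55, memory_utilisation=0.90,
                            avg_query_seconds=4.0, baseline_query_seconds=2.0):
    print(message)
```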

Operational Habits and Runbooks

Caching is not a set-it-and-forget-it system. Production caching requires operational discipline and clear runbooks.

Daily Monitoring Checklist

Every morning, check:

Redis health: redis-cli ping should return PONG. Check memory usage.
Cache hit rate: Should be above 70% for normal workloads.
Dashboard load times: Sample a few dashboards. Should load in under 2 seconds.
Recent dbt runs: Did the latest dbt run complete successfully? Did cache invalidation trigger?
User complaints: Any reports of slow dashboards or stale data?

Cache Invalidation Runbook

When dbt models change (schema updates, metric changes, etc.), follow this runbook:

Identify affected datasets: Which Superset datasets depend on the changed dbt models?
Invalidate caches: Call the Superset API to clear caches for affected datasets.
Warm up caches: Re-load dashboards or run warm-up queries to pre-populate caches.
Verify: Sample dashboards to ensure results are fresh and queries are fast.
Document: Log the change and the invalidation action for audit purposes.

Handling Cache Corruption

Occasionally, caches become corrupted (stale data that doesn’t match the warehouse, inconsistent results). When this happens:

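Flush the affected cache: redis-cli -n 1 FLUSHDB clears the query result cache in database 1 (be careful: FLUSHDB wipes the whole selected database, and without -n it targets database 0; FLUSHALL clears every database)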
Restart Superset workers: Ensures clean state
Run warm-up: Re-populate caches from scratch
Monitor hit rates: Ensure caches rebuild correctly

This is a nuclear option but sometimes necessary. Document why corruption occurred to prevent future incidents.

Scaling Cache Infrastructure

As your Superset usage grows, caching needs scale too. Watch for these signals:

Redis memory consistently above 80%
Cache hit rate declining despite stable usage
Query timeouts increasing

When you see these signals, consider:

Increasing Redis instance size: More memory = more cache capacity
Reducing cache TTLs: Shorter TTLs = smaller cache footprint but more frequent refreshes
Implementing a cache hierarchy: Use a local in-process cache for frequently accessed data, Redis for distributed caching
Sharding Redis: For very large deployments, shard Redis across multiple instances

Documentation and Change Control

Document your caching strategy:

What data is cached? How long?
When are caches invalidated?
How are caches warmed up?
Who is responsible for monitoring?

Maintain a change log of cache configuration updates. This helps with debugging and audit trails.

Common Pitfalls and How to Avoid Them

Pitfall 1: TTLs Too Long, Users See Stale Data

Problem: You set cache TTL to 24 hours to maximise cache hits. Users complain that dashboards show yesterday’s data.

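Solution: Match TTL to data freshness requirements. For dashboards that must track near-real-time data, use TTLs of a few minutes or skip caching for those charts. For data that dbt refreshes hourly, 1-hour TTLs work. For daily summaries, 24-hour TTLs are fine. Document TTL decisions per dashboard.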

Pitfall 2: Cache Invalidation Not Triggered After dbt Runs

Problem: dbt updates a model, but Superset’s cache doesn’t refresh. Users see stale results.

Solution: Integrate cache invalidation into your dbt orchestration. After dbt finishes, call the Superset API to invalidate affected caches. Automate this—don’t rely on manual steps.

Pitfall 3: Redis Runs Out of Memory, Caches Evict Randomly

Problem: Redis hits its memory limit. Caches are evicted, hit rates plummet, dashboards slow down.

Solution: Monitor Redis memory continuously. Set alerts at 80% capacity. Increase instance size before you hit limits. Use an appropriate eviction policy (allkeys-lru is usually best).

Pitfall 4: Warm-Up Strategy Doesn’t Cover User Queries

Problem: Warm-up loads default dashboard views, but users apply custom filters. These filtered queries aren’t cached, so they’re slow.

Solution: Analyse actual user queries. Identify the top 20 filter combinations. Add these to your warm-up strategy. Alternatively, accept that some queries will be slower and focus on caching the most common cases.

Pitfall 5: Caching Hides Performance Problems

Problem: A poorly written dbt model takes 30 seconds to compute. You cache the result for 24 hours. The cache hits 95% of the time, so the problem is hidden.

Solution: Don’t use caching to mask performance issues. Optimise underlying queries first, then add caching for additional speed. A 2-second query cached is better than a 30-second query cached.

Pitfall 6: No Visibility Into Cache Performance

Problem: You don’t monitor cache hit rates or query times. You don’t know if caching is working.

Solution: Instrument everything. Log cache hits and misses. Track query execution times. Set up dashboards to visualise cache health. Make cache performance visible to the team.

Advanced Caching Patterns

Semantic Layer Caching

For teams using a semantic layer (Cube, Looker, or similar), caching strategy changes. The semantic layer sits between Superset and your warehouse, pre-aggregating metrics and dimensions. When Superset queries the semantic layer, results are already optimised. Caching on top of this is still valuable but the performance gains are smaller.

When using a semantic layer, focus caching on the semantic layer itself, not on Superset. This ensures all downstream tools (Superset, Tableau, etc.) benefit from the cache.

For more details on this pattern, see how Superset integrates with semantic layers like Cube.

Multi-Tenant Caching

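If you’re running Superset for multiple tenants (SaaS product with per-customer dashboards), cache isolation is critical. One tenant’s queries shouldn’t collide with or evict another tenant’s cache. Where each tenant has its own Superset deployment, give each deployment its own cache key prefix inside its cache config (tenant_id here stands for however the deployment learns its tenant):

CACHE_KEY_PREFIX = f"superset_{tenant_id}"

A prefix keeps each tenant’s keys in a separate namespace, but tenants sharing one Redis instance still share its memory limit and eviction policy. If one tenant must never evict another’s entries, give tenants separate Redis instances. Monitor per-tenant cache hit rates separately.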
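As an illustration of per-tenant prefixes, the helper below builds a cache config dict for one tenant. It is a sketch under stated assumptions: each tenant runs its own Superset deployment that loads this config at startup, `tenant_cache_config` is a hypothetical helper, and the tenant ID format is restricted so that prefixes cannot collide or contain unexpected characters.

```python
import re

# Lowercase letters, digits and underscores only (an illustrative rule).
_TENANT_ID = re.compile(r"[a-z0-9_]+")


def tenant_cache_config(tenant_id: str, redis_url: str, timeout: int = 3600) -> dict:
    """Return a Flask-Caching style config namespaced to one tenant."""
    if not _TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return {
        "CACHE_TYPE": "RedisCache",
        "CACHE_REDIS_URL": redis_url,
        # Trailing underscore keeps "acme" and "acme2" keys from overlapping.
        "CACHE_KEY_PREFIX": f"superset_{tenant_id}_",
        "CACHE_DEFAULT_TIMEOUT": timeout,
    }


acme = tenant_cache_config("acme", "redis://localhost:6379/0")
print(acme["CACHE_KEY_PREFIX"])  # superset_acme_
```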

Hybrid Caching (Redis + In-Process)

For very high-traffic dashboards, use a two-level cache:

In-process cache: Fast, in-memory cache within each Superset worker. TTL: 5–10 minutes.
Redis cache: Distributed cache shared across all workers. TTL: 1–24 hours.

On a cache miss at the in-process level, check Redis. On a Redis miss, query the warehouse. This reduces Redis load and improves hit times for frequently accessed data.

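Superset does not ship a ready-made two-level cache. Its cache settings are passed to Flask-Caching, which accepts a custom backend class as CACHE_TYPE, so a hybrid cache means writing and maintaining such a backend yourself. Consult the official Superset and Flask-Caching documentation for details.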
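The lookup order described above (in-process first, then the shared cache, then the warehouse) can be sketched in plain Python. This is not a Superset or Flask-Caching backend: `TwoLevelCache` is a hypothetical class, a dict stands in for Redis, and the `loader` callable stands in for the warehouse query.

```python
import time
from typing import Any, Callable


class TwoLevelCache:
    """Short-lived per-process cache in front of a shared cache."""

    def __init__(self, shared: dict, local_ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._shared = shared        # stand-in for Redis
        self._local = {}             # key -> (expires_at, value)
        self._local_ttl = local_ttl
        self._clock = clock

    def get(self, key: str, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._local.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                      # level 1 hit
        if key in self._shared:
            value = self._shared[key]            # level 2 hit
        else:
            value = loader()                     # miss: run the query
            self._shared[key] = value
        self._local[key] = (now + self._local_ttl, value)
        return value


shared_store = {}
worker_cache = TwoLevelCache(shared_store, local_ttl=300.0)
rows = worker_cache.get("chart:1", loader=lambda: ["row-1", "row-2"])
print(rows)
```

The short local TTL is the main design choice: it bounds how long one worker can keep serving a value after the shared entry has been invalidated.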

Integration with Platform Engineering

For teams building or modernising data platforms, caching is a critical component of the architecture. If you’re running Superset + dbt on your platform, you need caching strategy baked into your platform design.

At PADISO, we work with teams across Australia and globally building data platforms that scale. Whether you’re in Sydney, Melbourne, or Canberra, caching strategy is part of the platform engineering conversation. For teams in the US, we support similar work across New York, Washington, DC, Chicago, Austin, and Dallas.

When designing a data platform with Superset and dbt, consider:

Cache infrastructure: Is Redis managed or self-hosted? What’s your failover strategy?
dbt integration: How does dbt orchestration trigger cache invalidation?
Monitoring: What metrics matter? How do you alert on cache degradation?
Governance: Who owns caching strategy? How are TTL decisions made?
Cost: Does caching reduce warehouse compute costs? By how much?

These questions are fundamental to platform engineering. Get them right, and you build a data platform that scales reliably. Get them wrong, and you’ll be firefighting slow dashboards and high compute bills.

For teams modernising regulated platforms or building multi-tenant SaaS, caching strategy also intersects with compliance. If you’re pursuing SOC 2 or ISO 27001 compliance, cache security matters: what data is in Redis? How is it encrypted? Who can access it? These are audit questions, not just performance questions.

Summary and Next Steps

Apache Superset + dbt caching is a learnable, measurable discipline. Here’s what you now know:

Caching layers: Query results, metadata, schema. Each serves a purpose.
Redis setup: Use managed Redis, configure appropriate TTLs, monitor memory and hit rates.
dbt alignment: Cache mart models, not staging. Materialise as tables. Invalidate caches after dbt runs.
Warm-up strategy: Pre-load dashboards and queries before users arrive. Automate this.
Monitoring: Track hit rates, query times, Redis memory. Alert on degradation.
Operational discipline: Follow runbooks. Document decisions. Iterate based on data.

Immediate Actions

Baseline current performance: Load a dashboard. How long does it take? Check Redis stats—what’s your current hit rate?
Deploy Redis: If you don’t have one, set up a managed Redis instance. Configure Superset to use it.
Set initial TTLs: Start conservative (1–4 hours). Monitor hit rates. Adjust based on data.
Automate cache invalidation: After your dbt pipeline, call the Superset API to clear caches for changed models.
Implement warm-up: Write a script that loads your top 10 dashboards after dbt finishes.
Monitor: Set up dashboards to track cache hit rate, Redis memory, query times. Review daily.

30-Day Plan

Week 1: Deploy Redis, configure Superset, establish baseline metrics.
Week 2: Implement cache invalidation and warm-up automation.
Week 3: Analyse cache hit rates. Identify slow queries. Optimise dbt models if needed.
Week 4: Refine TTLs and warm-up strategy based on real usage patterns. Document your strategy.

When to Seek Help

Caching strategy is straightforward but requires discipline. If you’re building a data platform from scratch or modernising an existing one, consider engaging a platform engineering partner. At PADISO, we’ve built Superset + dbt stacks for teams across Australia, the US, Canada, and New Zealand. We know the patterns that work, the pitfalls to avoid, and how to scale caching as your data platform grows.

If you’re running Superset + dbt and need help with caching strategy, architecture review, or platform modernisation, get in touch. We work with founders, operators, and engineering teams to ship reliable, scalable data platforms.
