PADISO.ai: AI Agent Orchestration Platform - Launching May 2026
Back to Blog
Guide 18 mins

Apache Superset + ClickHouse: Caching Strategy

Master Superset + ClickHouse caching: query cache, result caching, configuration patterns, benchmarks and operational habits for fast dashboards.

The PADISO Team ·2026-06-10

Apache Superset + ClickHouse: Caching Strategy

Table of Contents


Why Caching Matters

When you run Apache Superset on ClickHouse, you’re operating at the intersection of two powerful systems: Superset’s flexible SQL-to-dashboard layer and ClickHouse’s columnar query engine. Without a deliberate caching strategy, you’ll hit a ceiling fast.

Here’s the real problem: dashboards that refresh every 30 seconds, users hammering the same filters, and ad-hoc queries running identical SQL five times in a morning. Each query hits ClickHouse cold, burning CPU and memory. Across 50 concurrent users, that’s not a performance problem—it’s a cost problem. It’s a reliability problem. It’s a “why did the dashboard take 8 seconds to load?” problem.

Caching solves this by intercepting queries before they reach ClickHouse, serving pre-computed results from memory, and invalidating only when data actually changes. Done right, you’ll see dashboard load times drop from 3–5 seconds to sub-500ms, query costs fall 40–60%, and your infrastructure team stop getting paged at 2 AM.

This guide walks through the patterns, configurations, and operational habits we’ve deployed across our platform development work in Sydney, Melbourne, and across Australia for fintech, retail, and media teams scaling Superset + ClickHouse analytics stacks.


Understanding the Caching Layers

When a user opens a Superset dashboard or runs a query, the request flows through multiple layers. Understanding where caching happens—and where it should happen—is the foundation of any strategy.

The Query Path

A typical flow looks like this:

  1. Superset dashboard load → user opens a dashboard or applies a filter
  2. SQL generation → Superset builds SQL from the chart definition and filter state
  3. Superset result cache check → does the result already exist in cache?
  4. ClickHouse query cache check → if not in Superset cache, does ClickHouse have it cached?
  5. ClickHouse execution → if no cache hit, ClickHouse runs the query against disk/memory
  6. Result return and storage → ClickHouse returns results; Superset caches them

Each layer has different TTLs (time-to-live), invalidation rules, and trade-offs. Most teams miss layer 4 entirely, then wonder why their ClickHouse cluster is burning 80% CPU on repeat queries.

Three Caching Tiers

Tier 1: ClickHouse Query Cache. ClickHouse itself can cache query results in memory. This is the fastest layer—no network round-trip, no deserialization. A cache hit here costs microseconds. The official ClickHouse query cache documentation explains how to enable and tune this for exact-match queries.

Tier 2: Superset Result Cache. Superset can cache query results in Redis, Memcached, or a database backend. This layer survives ClickHouse restarts and can be shared across Superset instances. A cache hit here costs milliseconds (network latency + deserialization).

Tier 3: Dashboard/Browser Cache. Superset’s front-end can cache rendered charts. This is the fastest for repeated dashboard views, but it’s brittle—users often don’t realise they’re looking at stale data.

A robust strategy uses all three, with clear invalidation rules for each.


ClickHouse Query Cache Configuration

ClickHouse’s query cache is simple but powerful. When enabled, it stores exact query results in memory and returns them instantly on repeat queries—even from different users.

Enabling Query Cache

Query cache is disabled by default. Enable it in your ClickHouse config:

<query_cache>
  <max_size_in_bytes>1073741824</max_size_in_bytes>  <!-- 1 GB -->
  <max_entries>1024</max_entries>
  <min_query_duration_ms>100</min_query_duration_ms>
</query_cache>

max_size_in_bytes: Total memory reserved for cached results. Start at 1 GB for a 50-user Superset deployment; scale to 5–10 GB if you have 200+ concurrent users or very large result sets.

max_entries: Maximum number of distinct queries to cache. Each entry is a separate query fingerprint. 1024 is reasonable for most dashboards (typically 20–100 distinct queries per dashboard × 5–10 dashboards).

min_query_duration_ms: Only cache queries that take longer than this threshold. Set this to 100 ms to avoid caching tiny queries that are already fast. This saves memory and keeps the cache focused on expensive queries.

Query Cache Invalidation

ClickHouse’s query cache invalidates automatically when:

  • The underlying table is modified (INSERT, UPDATE, DELETE, ALTER)
  • A table is dropped or renamed
  • A database is dropped

You cannot manually invalidate a single query from cache—this is intentional. If you need fine-grained control, you’ll rely on Superset’s result cache instead.

Query Cache and Superset

Superset doesn’t know about ClickHouse’s query cache. It will still cache results in its own layer. This is fine—it means you get two bites at the apple. A Superset cache miss will still hit ClickHouse’s query cache before hitting disk.

However, there’s a gotcha: ClickHouse’s query cache is exact-match only. If your Superset SQL includes timestamps or random UUIDs (some versions do), the cache will never hit. Check your generated SQL carefully. Superset’s official ClickHouse documentation shows how to verify your connection and inspect generated SQL.


Superset Result Caching

Superset’s result cache is where most teams focus, and for good reason: it’s under your control, it’s transparent to ClickHouse, and it survives infrastructure restarts.

Cache Backend Options

Superset supports several backends. For production Superset + ClickHouse deployments, use one of these:

Redis. Fastest, most reliable. Recommended for teams with 50+ concurrent users or dashboards refreshing more than once per minute. Redis caching patterns documentation covers cache-aside and other patterns Superset uses.

Memcached. Simpler than Redis, no persistence. Good for stateless Superset clusters where cache loss is acceptable. Memcached is lightweight and battle-tested.

Database backend (SQLite, PostgreSQL). Simplest to set up, slowest. Use only for development or very small deployments (under 10 users).

For most production deployments, we recommend Redis. It’s fast, reliable, and integrates well with Superset’s cache invalidation logic.

Configuring Superset Cache

Set these environment variables or in superset_config.py:

# Redis backend
CACHE_CONFIG = {
    'CACHE_TYPE': 'redis',
    'CACHE_REDIS_URL': 'redis://localhost:6379/0',
    'CACHE_DEFAULT_TIMEOUT': 3600,  # 1 hour
}

# Query result cache (separate from general cache)
RESULT_CACHE_CONFIG = {
    'CACHE_TYPE': 'redis',
    'CACHE_REDIS_URL': 'redis://localhost:6379/1',
    'CACHE_DEFAULT_TIMEOUT': 3600,  # 1 hour
}

Note: Use separate Redis databases (0 and 1 above) for general cache and result cache. This prevents eviction of one from affecting the other.

Result Cache TTL Strategy

TTL (time-to-live) is the most important lever. Too short, and you get cache misses and wasted queries. Too long, and users see stale data.

For Superset + ClickHouse, we typically use:

  • Real-time dashboards (financial data, operational metrics): 60–300 seconds (1–5 minutes)
  • Daily reporting dashboards: 3600 seconds (1 hour)
  • Weekly or monthly reports: 86400 seconds (1 day)

Set TTL at the dataset level in Superset, not globally. Different datasets have different freshness requirements.

# In Superset UI: Dataset → Edit → Advanced → Cache Timeout
# Or via API:
PATCH /api/v1/dataset/{id}
{
  "cache_timeout": 300  # seconds
}

Cache Key Strategy

Superset generates cache keys from the query SQL plus filter state. Two identical queries with the same filters will share a cache entry. However, if your SQL includes user-specific data (e.g., user ID in a WHERE clause), each user gets a separate cache entry.

This is usually fine, but it can fragment the cache. If you have 100 users viewing the same dashboard with user-specific filters, you’ll have 100 cache entries for essentially the same query structure.

To reduce fragmentation, consider:

  1. Parameterised queries: Use Superset’s “Native query” feature to write SQL with named parameters. Superset will cache based on parameter values, not the full SQL string.
  2. Shared aggregations: Pre-compute common aggregations in ClickHouse (materialized views, dictionary tables) so Superset queries are simpler and more cacheable.
  3. Dashboard-level caching: Cache at the dashboard level rather than individual charts, if your BI tool supports it.

Cache-Aside and Write-Through Patterns

Two main patterns emerge when building caching strategies: cache-aside and write-through. Understanding the trade-offs helps you choose the right pattern for your Superset + ClickHouse deployment.

Cache-Aside Pattern

Cache-aside (also called lazy loading) works like this:

  1. User requests a query
  2. Superset checks cache; if hit, return result
  3. If miss, query ClickHouse
  4. Store result in cache
  5. Return to user

This is Superset’s default behaviour. It’s simple and safe—cache misses are handled gracefully, and stale data is impossible (results are only cached after a successful query).

Downside: cold cache means slow first load. If your dashboard has 20 charts and zero cache entries, the first user to open it waits for all 20 queries to run (potentially 20–30 seconds).

Write-Through Pattern

Write-through pre-populates the cache:

  1. Data changes in ClickHouse
  2. A background job detects the change
  3. The job re-runs relevant queries and updates cache
  4. When users request, cache is always warm

This requires more infrastructure—a change-detection system, background jobs, and careful invalidation logic. But it guarantees fast dashboard loads and zero cold-cache latency.

For Superset + ClickHouse, write-through is practical if:

  • Your data updates follow a predictable schedule (e.g., daily ETL)
  • You have a small number of critical dashboards (under 10)
  • You can afford the complexity of background jobs

We’ve deployed write-through patterns for platform development work across the United States where financial services teams need guaranteed sub-100ms dashboard loads. The trade-off: a job scheduler (Airflow, Prefect, or even cron) that refreshes cache on a schedule.

Hybrid Approach

Most production deployments use a hybrid:

  • Cache-aside for ad-hoc queries. Users run custom SQL or apply unusual filters—these hit ClickHouse, then cache for subsequent identical queries.
  • Write-through for critical dashboards. A background job refreshes cache every 5 minutes for the 5–10 most-used dashboards.

This gives you fast loads for common queries and graceful degradation for edge cases.


Benchmarking and Measurement

You can’t optimise what you don’t measure. Before and after implementing caching, measure these metrics.

Key Metrics

Query latency (p50, p95, p99). Measure from user click to dashboard render. Before caching, expect 2–5 seconds for a 10-chart dashboard. After, target sub-500ms for cache hits.

Cache hit rate. Percentage of queries served from cache. Target 70–85% for typical dashboards. Below 50% suggests your TTL is too short or your cache is too small.

ClickHouse CPU and memory. Monitor before and after. A well-cached Superset deployment should reduce ClickHouse CPU by 40–60%.

Cache memory usage. Monitor Redis or Memcached memory. If you’re hitting eviction limits, increase cache size or reduce TTL.

Measurement Tools

Superset logs. Enable debug logging to see cache hits/misses:

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'console': {'class': 'logging.StreamHandler'},
    },
    'loggers': {
        'superset': {'handlers': ['console'], 'level': 'DEBUG'},
    },
}

Look for log messages like [CACHE HIT] or [CACHE MISS].

ClickHouse query log. Enable the query log to see which queries are running:

SET log_queries = 1;
SET log_query_threads = 1;

-- Query from system.query_log
SELECT query, query_duration_ms, read_rows, read_bytes
FROM system.query_log
WHERE type = 'QueryFinish'
ORDER BY event_time DESC
LIMIT 100;

Redis/Memcached stats. Use redis-cli or memcached-tool to inspect cache:

redis-cli INFO stats
# Look for: hits, misses, evicted_keys

redis-cli DBSIZE  # Total keys in cache
redis-cli MEMORY STATS  # Memory breakdown

Benchmarking Checklist

  1. Baseline (no cache). Measure query latency and ClickHouse CPU with cache disabled.
  2. Enable ClickHouse query cache. Re-measure. Expect 10–20% improvement.
  3. Enable Superset result cache (short TTL, 5 min). Re-measure. Expect 40–60% improvement.
  4. Increase TTL to 1 hour. Re-measure. Expect 60–80% improvement, with trade-off of slightly stale data.
  5. Add write-through for top 3 dashboards. Re-measure. Expect p50 latency under 200ms.

Document before/after numbers. This is how you justify infrastructure spend to finance and how you track progress over time.


Operational Habits for Cache Management

Caching is not “set and forget.” Successful deployments require ongoing operational discipline.

Monitoring and Alerting

Set up alerts for:

  • Cache hit rate drops below 50%. Suggests cache is too small, TTL too short, or workload has changed.
  • Redis memory usage above 80%. Indicates eviction is happening; increase cache size or reduce TTL.
  • ClickHouse CPU spikes. May indicate cache is not working; check ClickHouse query logs.
  • Dashboard load time exceeds 2 seconds. Indicates cache misses or query performance regression.

Cache Invalidation Strategy

Invalidation is the hard part. Superset automatically invalidates cache when:

  • A dataset’s SQL definition changes
  • A table’s schema changes (in ClickHouse)
  • You manually clear cache (via Superset UI)

But you often need custom invalidation. For example:

  • Data refresh cycles. If your ETL runs at 2 AM, invalidate cache at 2:05 AM to avoid stale morning dashboards.
  • Data corrections. If you discover bad data and re-run ETL, manually invalidate affected cache entries.
  • Schema migrations. Before deploying schema changes, plan cache invalidation.

Implement a simple invalidation policy:

# Example: Invalidate cache for a specific dataset after ETL
from superset.models.core import Database
from superset import db

def invalidate_cache_for_dataset(dataset_id):
    # Clear Superset result cache for this dataset
    cache.delete_memoized(get_query_result, dataset_id)
    
    # Log the invalidation
    logger.info(f"Invalidated cache for dataset {dataset_id}")

Call this function from your ETL pipeline (Airflow, dbt, etc.) after data loads complete.

Cache Warming

For critical dashboards, pre-warm cache on deployment or after invalidation:

# Example: Warm cache by running all charts in a dashboard
from superset.models.dashboard import Dashboard
from superset.models.slice import Slice

def warm_cache_for_dashboard(dashboard_id):
    dashboard = db.session.query(Dashboard).filter_by(id=dashboard_id).first()
    for chart in dashboard.slices:
        # Force execution (will be cached)
        chart.get_json()
    logger.info(f"Warmed cache for dashboard {dashboard_id}")

Run this after deployment or on a schedule (e.g., 8 AM daily before users arrive).

Documentation and Runbooks

Document your caching strategy in a runbook:

  1. Cache configuration. What’s the TTL for each dataset? What’s the cache backend?
  2. Invalidation triggers. When does cache get cleared? (Data refresh? Schema change? Manual?)
  3. Troubleshooting. If dashboards are slow, what’s the first thing to check? (Cache hit rate? ClickHouse CPU? Network latency?)
  4. Scaling limits. At what point does the current cache setup break? (100 concurrent users? 1000 queries/min?)

Share this with your team. Caching is only effective if everyone understands the trade-offs.


Multi-Tenant and Security Considerations

If you’re running embedded Superset analytics for multiple customers (SaaS), caching becomes more complex.

Cache Isolation

Ensure cache entries don’t leak between tenants. Superset handles this via row-level security (RLS) and dataset permissions, but it’s worth verifying.

Check: If user A and user B are in different tenants but query the same table, do they see the same cached result? They shouldn’t.

Superset’s default behaviour: cache keys include the user’s permissions, so different users get different cache entries. But verify this in your deployment.

-- In ClickHouse, use RLS to ensure row-level filtering
ALTER TABLE my_table MODIFY ROW POLICY my_policy FOR SELECT USING tenant_id = currentUser();

Cache Size and Eviction

In a multi-tenant deployment, cache can fill quickly. If you have 50 tenants × 20 dashboards × 10 charts = 10,000 possible cache entries, and your cache is only 1 GB, you’ll hit eviction fast.

Strategy:

  1. Size cache per tenant count. 1 GB per 10 tenants is a reasonable starting point.
  2. Use separate Redis databases per tenant. This prevents one tenant’s queries from evicting another’s.
  3. Monitor eviction per tenant. If tenant A is evicting tenant B’s cache, investigate why tenant A’s queries are so expensive.

Compliance and Audit

Caching can complicate compliance. If you’re subject to SOC 2 or similar audits (many SaaS teams are), document:

  • What data is cached? (PII? Financial data?)
  • Where is it cached? (Memory? Redis? Encrypted?)
  • How long is it retained? (TTL?)
  • Who can access it? (Only authenticated users? Row-level security?)

For teams pursuing SOC 2 compliance via Vanta, caching adds a control point. Make sure your Vanta configuration includes:

  • Cache encryption in transit and at rest
  • Access logging for cache reads/writes
  • Automated TTL enforcement

Troubleshooting Common Cache Issues

When caching goes wrong, it’s often subtle. Here’s how to diagnose and fix common issues.

Problem: Cache Hit Rate is 0%

Cause 1: Cache is disabled. Check your Superset config:

# If this is missing or set to 'null', cache is disabled
CACHE_CONFIG = {'CACHE_TYPE': 'redis', ...}

Cause 2: TTL is too short. If TTL is 10 seconds and users open dashboards every 30 seconds, hit rate will be low.

Solution: Increase TTL. Start with 5 minutes, measure hit rate, adjust.

Cause 3: Queries are non-deterministic. If your SQL includes NOW() or RAND(), every execution is different and cache will never hit.

Solution: Inspect generated SQL in Superset logs. Remove non-deterministic functions.

Problem: Dashboards are Still Slow Despite Caching

Cause 1: Network latency. Cache hit takes 10 ms (network round-trip). If your Redis is in a different region, latency adds up.

Solution: Co-locate Redis and Superset. Use a local Redis instance or a fast Redis cluster in the same VPC.

Cause 2: Superset is slow to render. Cache is fast, but rendering 20 charts in the browser takes time.

Solution: Reduce chart count per dashboard. Use Superset’s “lazy loading” to defer off-screen chart rendering.

Cause 3: ClickHouse query is expensive. Even with cache, the first query (cache miss) is slow.

Solution: Optimise the ClickHouse query. Add indexes, pre-aggregate data, or use materialized views. See Preset’s guide to fast Superset dashboards for detailed optimisation tips.

Problem: Cache Memory Usage is Growing Unbounded

Cause: Cache eviction policy is not working, or cache size is too small.

Solution: Check Redis eviction policy:

redis-cli CONFIG GET maxmemory-policy
# Should return something like 'allkeys-lru' or 'volatile-lru'

If it’s ‘noeviction’, Redis will reject new writes when full. Change it:

redis-cli CONFIG SET maxmemory-policy allkeys-lru

Also, check if you’re caching unnecessarily large results. Superset has a config to limit cache size:

RESULT_CACHE_CONFIG = {
    'CACHE_TYPE': 'redis',
    'CACHE_REDIS_URL': 'redis://localhost:6379/1',
    'CACHE_DEFAULT_TIMEOUT': 3600,
    'CACHE_MAX_SIZE_MB': 1024,  # Limit cache to 1 GB
}

Problem: Users are Seeing Stale Data

Cause 1: TTL is too long. Data refreshes every hour, but cache TTL is 8 hours.

Solution: Align TTL with data refresh cycle. If data refreshes hourly, set TTL to 55 minutes.

Cause 2: Cache is not being invalidated. Data changed, but Superset doesn’t know.

Solution: Manually invalidate cache in Superset UI (Dataset → Actions → Clear Cache) or implement custom invalidation in your ETL.

Cause 3: ClickHouse query cache is returning old data. This is rare, but possible if ClickHouse’s query cache TTL is longer than your data’s freshness window.

Solution: Check ClickHouse query cache settings. Ensure max_query_cache_ttl_seconds is aligned with your data freshness requirements.


Multi-Region and Scaling Considerations

As your Superset + ClickHouse deployment grows, caching strategy must evolve.

Single Region (Most Common)

One Superset cluster, one ClickHouse cluster, one Redis cache. Simple and effective for teams under 100 concurrent users. This is typical for platform development work in New York or Toronto where latency between services is sub-10ms.

Multi-Region

If you have Superset or ClickHouse instances in multiple regions (e.g., Sydney and Melbourne), you need a distributed cache strategy.

Option 1: Local Redis per region. Each region has its own Redis. Trade-off: cache is not shared across regions. If a user in Sydney queries, then a user in Melbourne queries the same thing, both hit ClickHouse.

Option 2: Shared Redis cluster. One Redis cluster serving all regions. Trade-off: cross-region latency adds 50–100 ms per cache hit. Faster than querying ClickHouse, but slower than local cache.

For most deployments, Option 1 (local Redis per region) is better. The cache miss rate from not sharing across regions is typically offset by lower latency.

Scaling Beyond 500 Concurrent Users

At this scale, single-instance caching breaks down. Consider:

  1. Redis Cluster. Horizontal scaling of Redis via clustering. Adds complexity but handles 10x+ throughput.
  2. ClickHouse Replication. Multiple ClickHouse replicas to distribute query load. Caching becomes less critical because queries are faster anyway.
  3. Query result pre-aggregation. Move from caching raw query results to caching pre-aggregated tables in ClickHouse. Faster and more scalable.

For teams at this scale, we typically recommend platform engineering to redesign the architecture rather than just scaling caching.


Next Steps

Implementing a caching strategy for Superset + ClickHouse is a phased effort. Here’s a practical roadmap:

Week 1: Baseline and Planning

  1. Measure current state. Dashboard load times, ClickHouse CPU, query latency. Document these.
  2. Audit existing cache config. Is caching already enabled? What’s the TTL? What’s the backend?
  3. Identify bottleneck dashboards. Which 5 dashboards are slowest or most-used?
  4. Plan cache strategy. Which pattern (cache-aside, write-through, hybrid) fits your workload?

Week 2–3: Implementation

  1. Enable ClickHouse query cache. Deploy config changes, monitor for issues.
  2. Configure Superset result cache. Set up Redis or Memcached, configure TTL per dataset.
  3. Implement invalidation logic. Add cache clearing to your ETL pipeline.
  4. Warm cache for critical dashboards. Set up a job to pre-populate cache.

Week 4+: Optimisation and Monitoring

  1. Measure improvement. Compare new latency and CPU to baseline.
  2. Adjust TTL and cache size. Fine-tune based on hit rate and eviction.
  3. Document runbook. Write down your caching strategy, invalidation rules, and troubleshooting steps.
  4. Train team. Ensure engineers understand caching trade-offs and how to troubleshoot.

If you’re building embedded Superset + ClickHouse analytics for a SaaS product, or scaling a platform across regions, caching strategy is critical to unit economics and user experience. The patterns in this guide apply whether you’re in Sydney, the US, or Canada.

For teams needing hands-on support, PADISO provides platform engineering and CTO guidance across Australia and globally. We’ve built and scaled Superset + ClickHouse deployments for fintech, retail, and media companies, and we can help you design and implement a caching strategy tailored to your workload, compliance requirements, and growth trajectory.

Start with the basics: measure, configure, monitor, iterate. Caching is a force multiplier—done right, it buys you months of runway before you need to scale infrastructure.

Want to talk through your situation?

Book a 30-minute call with Kevin (Founder/CEO). No pitch — direct advice on what to do next.

Book a 30-min call