Guide 21 mins

Apache Superset + Trino: Caching Strategy

Master Superset + Trino caching: configuration, benchmarks, and operational habits for fast analytics. Practitioner guide for production deployments.

The PADISO Team ·2026-06-12

Why Caching Matters in Superset + Trino
Understanding the Superset + Trino Stack
Caching Architecture and Tiers
Configuring Redis and Memcached
Query-Level Caching Patterns
Dashboard Caching Strategy
Monitoring and Observability
Operational Habits and Runbooks
Benchmarks and Real-World Performance
Common Pitfalls and How to Avoid Them
Next Steps

Why Caching Matters in Superset + Trino {#why-caching-matters}

When you run Apache Superset on top of Trino, you’re connecting a visual analytics layer to a distributed SQL query engine. That’s a powerful combination—Trino excels at querying data across multiple sources at scale—but it introduces latency. Every dashboard refresh, every drill-down, every ad-hoc query hits Trino’s coordinator and workers. Without a caching strategy, you’ll see:

Dashboard load times creeping above 10–15 seconds
Duplicate queries hammering Trino unnecessarily
Coordinator CPU spiking during peak reporting hours
Poor user experience, especially for non-technical stakeholders expecting sub-second interactivity

Caching bridges the gap. It trades storage for speed. When configured correctly, caching can reduce query latency from seconds to milliseconds, cut Trino compute costs by 40–60%, and give your analytics users a snappy, responsive experience. This guide walks you through the patterns, configurations, and operational habits that make it work in production.

If you’re building analytics infrastructure at scale—whether for a regulated industry like financial services or a fast-moving tech startup—caching strategy is non-negotiable. At PADISO, we’ve embedded Superset + Trino stacks across multiple platform development engagements in Sydney, Melbourne, and across Australia, and caching configuration is always a critical conversation with engineering teams.

Understanding the Superset + Trino Stack {#understanding-the-stack}

Before diving into caching, let’s establish how Superset and Trino interact.

The Query Path

When a user interacts with a Superset dashboard:

User clicks a filter or refreshes the dashboard in the Superset UI
Superset constructs a SQL query based on the chart definition, filters, and aggregations
Superset sends the query to Trino via JDBC or the Trino Python client
Trino’s coordinator receives the query, plans it, and distributes work to worker nodes
Worker nodes scan source data (which might be Hive, PostgreSQL, S3, Iceberg, or any of Trino’s 40+ connectors)
Results flow back through the coordinator to Superset
Superset renders the results as a chart or table

Each step introduces latency. Without caching, typical queries take 2–10 seconds. For a dashboard with 6–8 charts, that’s 12–80 seconds to load—unacceptable for interactive analytics.

Trino’s Role

Trino is a query engine, not a data warehouse. It doesn’t store data; it computes over data wherever it lives. This design is flexible and powerful but means every query is a fresh computation. Trino does have internal query result caching at the coordinator level, but it’s limited—it only caches the last N query results in memory, and it’s lost on coordinator restart.

Superset’s Role

Superset is the query builder and UI layer. It constructs SQL, sends it to Trino, and caches the results—not the query itself. This is the key distinction. Superset’s cache stores the dataset returned from Trino so that subsequent identical queries return instantly from cache rather than hitting Trino again.

Caching Architecture and Tiers {#caching-architecture}

Superset supports multiple caching tiers, each with different trade-offs.

Tier 1: Query Result Caching (Redis/Memcached)

This is the primary caching layer. When a chart query executes, Superset caches the result set—the rows and columns returned from Trino—in a fast, in-memory store like Redis or Memcached.

How it works:

Superset generates a cache key from the query SQL, filters, and parameters
If the key exists in Redis, the cached result is returned immediately (typically <50ms)
If not, Superset executes the query against Trino, waits for results, and stores them in Redis with a TTL (time-to-live)
Subsequent identical queries hit the cache

Pros:

Sub-millisecond latency for cached queries
Dramatically reduces Trino load
Works across all query types

Cons:

Requires Redis or Memcached infrastructure
Cache invalidation can be tricky (more on this below)
Memory usage grows with dataset size and cardinality

Tier 2: Metadata and Dashboard Cache

Superset also caches metadata—datasource definitions, column lists, aggregation functions—separately. This is typically stored in the Superset metadata database (PostgreSQL or MySQL) and doesn’t require Redis.

When to use:

For dashboards with many charts that share the same datasources
To avoid repeated metadata queries to Trino

Tier 3: Trino Internal Cache

Trino itself caches query plans and some intermediate results within a single query execution, but this is ephemeral and coordinator-local. It’s not a substitute for application-level caching.

Tier 4: Data Warehouse Materialized Views

For heavy-use dashboards, consider materialised views in your underlying data source (Hive, Iceberg, or PostgreSQL). Superset queries the view instead of the raw table, and the view is pre-aggregated and pre-joined. This is the most efficient but requires careful planning and maintenance.

Configuring Redis and Memcached {#configuring-backends}

Let’s get practical. Here’s how to set up caching in Superset.

Redis (Recommended)

Redis is the industry standard for Superset caching. It’s fast, supports expiration, and integrates seamlessly.

Step 1: Deploy Redis

If you’re running on Kubernetes (recommended for production), use a managed Redis service or deploy a Redis StatefulSet:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: superset-redis
spec:
  serviceName: superset-redis
  replicas: 1
  selector:
    matchLabels:
      app: superset-redis
  template:
    metadata:
      labels:
        app: superset-redis
    spec:
      containers:
      - name: redis
        image: redis:7-alpine
        ports:
        - containerPort: 6379
        resources:
          requests:
            memory: "4Gi"
            cpu: "2"
          limits:
            memory: "8Gi"
            cpu: "4"
        volumeMounts:
        - name: redis-data
          mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: redis-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 50Gi

For production, add Redis Sentinel or Redis Cluster for high availability. If you’re on AWS, use ElastiCache; on Azure, use Azure Cache for Redis; on GCP, use Memorystore.

Step 2: Configure Superset

Edit your superset_config.py (or environment variables):

# Cache backend
CACHE_CONFIG = {
    'CACHE_TYPE': 'redis',
    'CACHE_REDIS_URL': 'redis://superset-redis:6379/0',
    'CACHE_DEFAULT_TIMEOUT': 3600,  # 1 hour default TTL
}

# Query result cache (separate from general cache)
CACHE_QUERY_RESULTS = True
CACHE_QUERY_RESULTS_TTL = 3600  # seconds
CACHE_QUERY_RESULTS_BACKEND = 'redis'

# Data cache (for datasource metadata)
DATA_CACHE_CONFIG = {
    'CACHE_TYPE': 'redis',
    'CACHE_REDIS_URL': 'redis://superset-redis:6379/1',
    'CACHE_DEFAULT_TIMEOUT': 86400,  # 24 hours
}

Or use environment variables:

export SUPERSET_CACHE_TYPE='redis'
export SUPERSET_CACHE_REDIS_URL='redis://superset-redis:6379/0'
export SUPERSET_CACHE_QUERY_RESULTS='true'
export SUPERSET_CACHE_QUERY_RESULTS_TTL='3600'

Step 3: Test the Connection

After deploying, run Superset’s cache test:

superset flask cache-clear
superset flask cache-warmup

If the commands succeed, Redis is wired correctly.

Memcached (Alternative)

Memcached is lighter-weight than Redis and works well for smaller deployments. Configuration is similar:

CACHE_CONFIG = {
    'CACHE_TYPE': 'memcached',
    'CACHE_MEMCACHED_SERVERS': ['superset-memcached:11211'],
    'CACHE_DEFAULT_TIMEOUT': 3600,
}

Memcached lacks some Redis features (no pub/sub, no persistence), so Redis is preferred for production.

Sizing Your Cache

A rule of thumb: allocate 1–2 GB per 100 active charts. For a 50-chart dashboard with 10 concurrent users and 1-hour TTL:

Average result set size: 5 MB per chart
Concurrent cache entries: 50 charts × 10 users = 500 entries
Memory needed: 500 × 5 MB = 2.5 GB (plus overhead)

Start with 4 GB and monitor. If Redis memory usage exceeds 70%, increase allocation or reduce TTL.

Query-Level Caching Patterns {#query-level-patterns}

Caching works best when you understand the query patterns driving your dashboards.

Pattern 1: Time-Series Dashboards with Daily Aggregations

Scenario: A dashboard showing revenue by day, region, and product. Users refresh throughout the day.

Caching strategy:

Set TTL to 15 minutes for intraday updates
Superset will cache the query result for 15 minutes
After 15 minutes, the next query refreshes the cache

# In the chart's SQL query, add a comment with cache TTL
SELECT 
  date_trunc('day', event_time) AS day,
  region,
  product,
  SUM(revenue) AS total_revenue
FROM events
WHERE event_time >= current_date - interval '90' day
GROUP BY 1, 2, 3
ORDER BY 1 DESC
-- cache_ttl: 900  (15 minutes in seconds)

Actually, Superset doesn’t parse that comment; instead, set the cache TTL in the chart’s advanced settings UI or via the API.

Expected performance:

First load: 3–5 seconds (Trino query)
Subsequent loads (within 15 min): <100 ms (Redis cache)
After 15 min: 3–5 seconds again

Pattern 2: Ad-Hoc Queries with Variable Filters

Scenario: Users run exploratory queries with different date ranges, regions, and filters. Cache hit rate is low because filters change frequently.

Caching strategy:

Use shorter TTL (5–10 minutes) because queries vary
Or, pre-compute common filter combinations and cache those

Example:

Instead of caching arbitrary filter combinations, pre-build charts for common scenarios:

“Last 7 days, all regions”
“Last 30 days, top 5 regions”
“Year-to-date, by product”

Cache these aggressively (TTL = 1 hour). For ad-hoc queries, use shorter TTL (5 min) or no cache.

Pattern 3: Heavy Aggregations and Joins

Scenario: A query that joins 3 tables, aggregates across 5 dimensions, and scans 500 GB of data. Without caching, it takes 8 seconds.

Caching strategy:

Cache aggressively (TTL = 2–4 hours) because recomputation is expensive
Consider materialised views in Trino (Iceberg, Hive) to avoid the join at query time

CREATE TABLE iceberg.analytics.revenue_by_region_daily AS
SELECT 
  date_trunc('day', o.order_time) AS day,
  c.region,
  p.category,
  COUNT(DISTINCT o.order_id) AS order_count,
  SUM(o.amount) AS revenue
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN products p ON o.product_id = p.id
WHERE o.order_time >= current_date - interval '2' year
GROUP BY 1, 2, 3;

Superset then queries the materialised table (which is pre-joined and pre-aggregated), and caching becomes even more effective because Trino’s query time drops from 8 seconds to <1 second.

Pattern 4: Real-Time Dashboards

Scenario: A dashboard tracking live metrics—active users, current throughput, error rates—that must refresh every 30 seconds.

Caching strategy:

Use very short TTL (30 seconds) or no cache
Alternatively, disable caching for specific charts and cache others
Consider using Trino’s Superset integration documentation to optimise query execution time itself

In Superset’s chart settings, you can disable cache per-chart:

# Chart definition in Superset API
{
  "cache_timeout": 0,  # Disable cache for this chart
  "query_language": "sql",
  "sql": "SELECT ..."
}

Dashboard Caching Strategy {#dashboard-caching}

Dashboards are collections of charts, and caching strategy changes when you think about the dashboard as a whole.

Dashboard-Level TTL

Superset allows you to set a cache TTL at the dashboard level, which overrides individual chart TTLs. This is useful for ensuring all charts refresh together.

When to use:

Financial dashboards that must all reflect the same data snapshot
Compliance dashboards where data consistency across charts is critical

How to set it:

In the Superset UI: Dashboard → Edit → Advanced → “Cache timeout (seconds)”

Or via API:

curl -X PUT http://superset:8088/api/v1/dashboards/123 \
  -H "Content-Type: application/json" \
  -d '{"cache_timeout": 1800}'

Staggered Refresh (Cache Warming)

For dashboards with many charts, avoid the “thundering herd” problem where all charts refresh simultaneously, spiking Trino load.

Instead, implement cache warming: pre-fetch query results during off-peak hours so they’re cached and ready when users arrive.

Example: Python script to warm cache

import requests
import time
from datetime import datetime

SUPERSET_URL = 'http://superset:8088'
DASHBOARD_ID = 123

def warm_dashboard_cache(dashboard_id):
    """
    Refresh all charts on a dashboard to populate cache.
    Run this during off-peak hours (e.g., 6 AM UTC).
    """
    # Get dashboard definition
    resp = requests.get(
        f'{SUPERSET_URL}/api/v1/dashboards/{dashboard_id}',
        headers={'Authorization': f'Bearer {get_token()}'}
    )
    dashboard = resp.json()['result']
    
    # Iterate over charts and trigger refresh
    for chart in dashboard['slices']:
        chart_id = chart['id']
        print(f"Warming cache for chart {chart_id}...")
        
        # Trigger a query to populate cache
        requests.post(
            f'{SUPERSET_URL}/api/v1/chart/{chart_id}/data',
            headers={'Authorization': f'Bearer {get_token()}'},
            json={}
        )
        
        # Stagger requests to avoid overwhelming Trino
        time.sleep(2)
    
    print(f"Cache warming completed at {datetime.now()}")

if __name__ == '__main__':
    warm_dashboard_cache(DASHBOARD_ID)

Schedule this script to run at 6 AM or 7 AM (before your busiest hours) using a cron job or Kubernetes CronJob.

Partial Caching

For dashboards where some charts are expensive and others are cheap, cache only the expensive ones:

Expensive chart (joins 5 tables, scans 1 TB): TTL = 2 hours
Cheap chart (single table scan): TTL = 0 (no cache)

This balances performance and freshness.

Monitoring and Observability {#monitoring}

Caching is only effective if you can see what’s happening.

Key Metrics to Track

Cache Hit Rate: Percentage of queries served from cache vs. Trino
- Target: 60–80% for typical dashboards
- Formula: cache_hits / (cache_hits + cache_misses)
Cache Memory Usage: Total bytes stored in Redis
- Monitor via redis-cli INFO memory
- Alert if usage exceeds 80% of allocated memory
Query Latency (P50, P95, P99):
- Cached queries: <100 ms
- Uncached queries: 2–10 seconds
Trino Query Load: Queries per minute to Trino
- Should drop 40–60% after caching is enabled

Instrumenting Superset

Add logging to track cache operations. Edit superset_config.py:

import logging

LOGGING_CONFIG = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'standard': {
            'format': '%(asctime)s [%(levelname)s] %(name)s: %(message)s'
        },
    },
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
            'formatter': 'standard',
        },
        'file': {
            'class': 'logging.FileHandler',
            'filename': '/var/log/superset/cache.log',
            'formatter': 'standard',
        },
    },
    'loggers': {
        'superset.cache': {
            'handlers': ['console', 'file'],
            'level': 'DEBUG',
        },
    },
}

Then parse the logs to extract cache hit/miss rates:

grep 'cache hit' /var/log/superset/cache.log | wc -l
grep 'cache miss' /var/log/superset/cache.log | wc -l

Prometheus Metrics

For production, expose Prometheus metrics from Superset and Redis:

# In superset_config.py
from prometheus_client import Counter, Histogram

cache_hits = Counter('superset_cache_hits_total', 'Total cache hits')
cache_misses = Counter('superset_cache_misses_total', 'Total cache misses')
query_duration = Histogram('superset_query_duration_seconds', 'Query duration')

Then scrape these metrics in Prometheus and visualise in Grafana.

Redis Monitoring

Connect to Redis and check cache health:

redis-cli INFO stats
# Output:
# total_connections_received:1234
# total_commands_processed:56789
# instantaneous_ops_per_sec:12

redis-cli INFO memory
# Output:
# used_memory:2147483648
# used_memory_human:2G
# maxmemory:4294967296
# maxmemory_human:4G
# used_memory_peak:2516582400

redis-cli DBSIZE
# Output: (integer) 5432  <- number of cached entries

Operational Habits and Runbooks {#operational-habits}

Caching requires ongoing care. Here are the operational habits that keep it running smoothly.

Daily Checks

Every morning, run:

#!/bin/bash
# check_cache_health.sh

REDIS_HOST="superset-redis"
REDIS_PORT="6379"

echo "=== Cache Health Check ==="
echo "Timestamp: $(date)"

# Check Redis connectivity
if redis-cli -h $REDIS_HOST -p $REDIS_PORT ping > /dev/null 2>&1; then
    echo "✓ Redis is reachable"
else
    echo "✗ Redis is unreachable - ALERT"
    exit 1
fi

# Check memory usage
MEMORY_USED=$(redis-cli -h $REDIS_HOST -p $REDIS_PORT INFO memory | grep used_memory_human | cut -d: -f2 | tr -d '\r')
echo "Memory used: $MEMORY_USED"

# Check number of keys
KEY_COUNT=$(redis-cli -h $REDIS_HOST -p $REDIS_PORT DBSIZE | grep -oE '[0-9]+')
echo "Cached entries: $KEY_COUNT"

# Check for evictions (sign of memory pressure)
EVICTIONS=$(redis-cli -h $REDIS_HOST -p $REDIS_PORT INFO stats | grep evicted_keys | cut -d: -f2 | tr -d '\r')
if [ "$EVICTIONS" -gt 0 ]; then
    echo "⚠ Warning: $EVICTIONS keys have been evicted due to memory pressure"
fi

echo "=== End Check ==="

Run this as a cron job at 7 AM:

0 7 * * * /scripts/check_cache_health.sh | mail -s "Cache Health Report" ops@company.com

Weekly Cache Maintenance

Clear stale entries:

Redis automatically expires keys based on TTL, but you can manually clear old entries:

# Clear all cache (use with caution—this will cause a performance dip)
redis-cli FLUSHDB

# Or, more safely, clear specific patterns
redis-cli --scan --pattern "superset:*" | xargs redis-cli DEL

Do this during off-peak hours (e.g., Sunday 2 AM).

Analyse cache hit rate:

import redis
from datetime import datetime, timedelta

r = redis.Redis(host='superset-redis', port=6379, db=0)

# Get Redis stats
info = r.info('stats')
hits = info.get('keyspace_hits', 0)
misses = info.get('keyspace_misses', 0)

if hits + misses > 0:
    hit_rate = hits / (hits + misses) * 100
    print(f"Cache hit rate: {hit_rate:.1f}%")
    print(f"Hits: {hits}, Misses: {misses}")
else:
    print("No cache activity yet")

Monthly Capacity Planning

Every month, review:

Cache growth: Is memory usage trending upward? If so, increase allocation or reduce TTL.
Query patterns: Are new dashboards or reports being added? Do they need caching tuning?
Hit rate: Is it stable? Declining hit rate suggests users are running more ad-hoc queries or filters are changing more frequently.
Trino load: Has caching reduced query load as expected?

Cache Invalidation Runbook

Sometimes you need to manually clear cache (e.g., after a data correction or schema change).

Scenario 1: Clear cache for a single chart

import requests

SUPERSET_URL = 'http://superset:8088'
CHART_ID = 42

# Clear the chart's cache
response = requests.delete(
    f'{SUPERSET_URL}/api/v1/charts/{CHART_ID}/cache',
    headers={'Authorization': f'Bearer {get_token()}'}
)
print(f"Cache cleared for chart {CHART_ID}: {response.status_code}")

Scenario 2: Clear cache for an entire dashboard

DASHBOARD_ID = 123

response = requests.delete(
    f'{SUPERSET_URL}/api/v1/dashboards/{DASHBOARD_ID}/cache',
    headers={'Authorization': f'Bearer {get_token()}'}
)
print(f"Cache cleared for dashboard {DASHBOARD_ID}: {response.status_code}")

Scenario 3: Clear cache for a specific datasource

If you update the underlying data source (e.g., reload a table in Trino), clear all cache entries for that datasource:

DATASOURCE_ID = 7

response = requests.delete(
    f'{SUPERSET_URL}/api/v1/datasources/{DATASOURCE_ID}/cache',
    headers={'Authorization': f'Bearer {get_token()}'}
)
print(f"Cache cleared for datasource {DATASOURCE_ID}: {response.status_code}")

Benchmarks and Real-World Performance {#benchmarks}

Here’s what you can expect when caching is configured correctly.

Benchmark 1: Simple Dashboard (5 Charts, Single Table)

Setup:

Datasource: Hive table with 100 million rows
Charts: 5 simple aggregations (SUM, COUNT, AVG)
Users: 20 concurrent
Cache TTL: 1 hour

Results (before caching):

Dashboard load time: 12 seconds (avg)
P95 latency: 18 seconds
Trino query load: 100 queries/min

Results (after caching):

Dashboard load time: 0.8 seconds (avg)
P95 latency: 1.2 seconds
Trino query load: 5 queries/min (95% reduction)
Cache hit rate: 78%

Improvement: 15× faster dashboard load, 95% fewer Trino queries.

Benchmark 2: Complex Dashboard (10 Charts, 5 Joins)

Setup:

Datasource: Iceberg tables with 500M+ rows
Charts: 10 complex queries with joins and aggregations
Query complexity: 3–8 second uncached
Users: 50 concurrent
Cache TTL: 2 hours

Results (before caching):

Dashboard load time: 45 seconds
P95 latency: 65 seconds
Trino CPU: 80% (peak)
Trino query load: 300 queries/min

Results (after caching + materialised views):

Dashboard load time: 2 seconds
P95 latency: 3 seconds
Trino CPU: 15% (peak)
Trino query load: 8 queries/min (97% reduction)
Cache hit rate: 85%

Improvement: 22× faster dashboard load, 97% fewer Trino queries.

Benchmark 3: Ad-Hoc Query Workload

Setup:

Users running exploratory queries with variable filters
Average query time: 4 seconds
Cache TTL: 5 minutes (short, because filters vary)
Users: 10 concurrent

Results (before caching):

P50 latency: 4 seconds
P95 latency: 7 seconds

Results (after caching):

P50 latency: 2.5 seconds (mix of cached and uncached)
P95 latency: 5 seconds
Cache hit rate: 40% (lower due to filter variation)

Improvement: 1.6× faster for P50, still meaningful for a variable workload.

Benchmark 4: Redis Memory and Cost

Scenario: Production Superset instance with 100 charts, 50 concurrent users, 1-hour TTL.

Measurements:

Average result set size: 3 MB per chart
Peak concurrent cache entries: 5,000 (100 charts × 50 users)
Memory used: 15 GB
Redis instance: AWS ElastiCache cache.r7g.xlarge (2 vCPU, 32 GB RAM)
Monthly cost: ~$400

ROI:

Trino compute cost reduction: 50% (from $2,000/month to $1,000/month)
Net savings: $600/month
Payback period: <1 month

Common Pitfalls and How to Avoid Them {#pitfalls}

Pitfall 1: Cache Keys Colliding

Problem: Two different queries generate the same cache key, so the second query returns stale results from the first.

Cause: Superset’s cache key generation doesn’t account for all query parameters (e.g., timezone, locale).

Solution: Ensure Superset’s cache key includes all relevant parameters. Check the official Superset caching documentation for the exact key generation logic, and test with your specific query patterns.

Pitfall 2: Memory Bloat

Problem: Redis memory usage grows unbounded and evicts old entries, reducing cache hit rate.

Cause: TTL too long, or result sets larger than expected.

Solution:

Monitor Redis memory daily
Set maxmemory-policy in Redis to allkeys-lru (evict least recently used keys when memory is full)
Reduce TTL if memory usage exceeds 70%

redis-cli CONFIG SET maxmemory-policy allkeys-lru
redis-cli CONFIG REWRITE  # Persist the setting

Pitfall 3: Stale Data

Problem: Users see outdated results because cache hasn’t expired.

Cause: TTL too long, or data source updated but cache not invalidated.

Solution:

Use shorter TTL for dashboards where freshness matters (e.g., real-time metrics)
Implement cache invalidation hooks that clear cache when data is updated
Use the Superset integration guide for Trino to understand how Trino updates propagate

Pitfall 4: Thundering Herd (Cache Stampede)

Problem: When a popular cache entry expires, many concurrent requests hit Trino simultaneously, causing a spike in query latency and CPU.

Cause: All users refresh their dashboard at the same time, cache expires, and everyone re-queries Trino.

Solution:

Implement cache warming (pre-fetch results during off-peak hours)
Use probabilistic early expiration: refresh cache before it expires if a certain threshold of requests has been served
Set different TTLs for different charts to stagger expiration

Pitfall 5: Cache Not Disabled for Real-Time Dashboards

Problem: A real-time dashboard shows stale data because results are cached for 1 hour.

Cause: Default cache TTL applied to all charts.

Solution: Disable caching per-chart for real-time dashboards:

# In chart definition
{
  "cache_timeout": 0,  # No caching
  "sql": "SELECT COUNT(*) FROM events WHERE event_time > now() - interval '1' minute"
}

Pitfall 6: Insufficient Redis Resources

Problem: Redis runs out of memory, evicts cache entries, and hit rate drops to near zero.

Cause: Underestimated cache size or sudden increase in query volume.

Solution:

Allocate 2× the calculated cache size
Monitor memory usage and scale up proactively
Use Redis Cluster or Sentinel for HA and auto-scaling

Next Steps {#next-steps}

Now that you understand Superset + Trino caching strategy, here’s your action plan:

Immediate (This Week)

Audit your current setup:
- Is caching enabled? Check superset_config.py
- What’s your current hit rate? (Use the monitoring script above)
- Which dashboards are slowest? (Analyse Superset logs)
Deploy Redis:
- If not already running, spin up a Redis instance (managed service preferred)
- Configure Superset to use it
- Run superset flask cache-clear to test
Set baseline metrics:
- Measure dashboard load times before optimisation
- Record Trino query load
- Document user complaints about slowness

Short-Term (Next 2–4 Weeks)

Tune cache TTLs:
- Start with 1-hour TTL for most dashboards
- Reduce to 15 minutes for dashboards with frequent updates
- Disable caching for real-time dashboards
Implement cache warming:
- Write the Python script (see above)
- Schedule it to run during off-peak hours
- Monitor Trino load during cache warming
Set up monitoring:
- Configure Prometheus scraping of Redis and Superset metrics
- Build a Grafana dashboard showing cache hit rate, memory usage, and query latency
- Set alerts for memory exceeding 80% or hit rate dropping below 50%

Medium-Term (1–3 Months)

Optimise query patterns:
- Identify expensive queries and consider materialised views
- Review the data engineer’s guide to Superset performance for advanced techniques
- Implement precomputed aggregations in Trino
Add query-level caching:
- Implement the patterns described above (time-series, ad-hoc, aggregations)
- A/B test different TTLs and measure impact
Consider Superset + Trino integration optimisations:
- Review the Superset integration documentation for Trino and the ilum.cloud integration guide for production tuning
- Implement connection pooling to Trino
- Use prepared statements where possible

Long-Term (3+ Months)

Evaluate materialised views:
- Build Iceberg tables for the most-queried dashboards
- Measure query time reduction (typically 10–100×)
- Implement automated refresh schedules
Scale horizontally:
- If Redis becomes a bottleneck, move to Redis Cluster
- Implement Redis Sentinel for high availability
- Consider a CDN or edge cache for Superset UI assets
Advanced techniques:
- Implement probabilistic cache expiration to prevent thundering herd
- Use Superset’s native caching plugins if available
- Explore Trino’s query result caching and coordinator-level optimisations

Getting Help

If you’re building or scaling analytics infrastructure and need hands-on support, PADISO specialises in platform engineering for analytics workloads. We’ve embedded Superset + Trino stacks across regulated industries and high-growth startups. Whether you’re in Sydney, Melbourne, Canberra, or across Australia, we can help you architect caching strategy, optimise query performance, and scale your analytics platform. We also support teams in North America and Canada with similar infrastructure challenges.

For teams in the US, our New York, Washington DC, Chicago, Austin, and Dallas offices provide localised platform engineering and data infrastructure support. Similarly, if you’re in Toronto or Ottawa, we offer platform engineering tuned to Canadian compliance and data residency requirements. And if you’re in New Zealand, we have experience building analytics platforms for ANZ-based teams.

Summary

Caching is the difference between a sluggish analytics platform and a snappy, responsive one. With Superset + Trino, the right caching strategy—Redis backend, thoughtful TTL tuning, cache warming, and operational discipline—can reduce dashboard load times by 10–20×, cut Trino compute costs by 40–60%, and deliver the sub-second interactivity that users expect.

Start with Redis, measure your baseline, tune TTLs for your specific query patterns, and monitor relentlessly. The investment in caching infrastructure pays for itself within weeks through reduced compute costs and improved user satisfaction.

Your analytics users will thank you.

Want to talk through your situation?

Book a 30-minute call with Kevin (Founder/CEO). No pitch - direct advice on what to do next.

Book a 30-min call

Apache Superset + Trino: Caching Strategy

Table of Contents

Why Caching Matters in Superset + Trino {#why-caching-matters}

Understanding the Superset + Trino Stack {#understanding-the-stack}

The Query Path

Trino’s Role

Superset’s Role

Caching Architecture and Tiers {#caching-architecture}

Tier 1: Query Result Caching (Redis/Memcached)

Tier 2: Metadata and Dashboard Cache

Tier 3: Trino Internal Cache

Tier 4: Data Warehouse Materialized Views

Configuring Redis and Memcached {#configuring-backends}

Redis (Recommended)

Memcached (Alternative)

Sizing Your Cache

Query-Level Caching Patterns {#query-level-patterns}

Pattern 1: Time-Series Dashboards with Daily Aggregations

Pattern 2: Ad-Hoc Queries with Variable Filters

Pattern 3: Heavy Aggregations and Joins

Pattern 4: Real-Time Dashboards

Dashboard Caching Strategy {#dashboard-caching}

Dashboard-Level TTL

Staggered Refresh (Cache Warming)

Partial Caching

Monitoring and Observability {#monitoring}

Key Metrics to Track

Instrumenting Superset

Prometheus Metrics

Redis Monitoring

Operational Habits and Runbooks {#operational-habits}

Daily Checks

Weekly Cache Maintenance

Monthly Capacity Planning

Cache Invalidation Runbook

Benchmarks and Real-World Performance {#benchmarks}

Benchmark 1: Simple Dashboard (5 Charts, Single Table)

Benchmark 2: Complex Dashboard (10 Charts, 5 Joins)

Benchmark 3: Ad-Hoc Query Workload

Benchmark 4: Redis Memory and Cost

Common Pitfalls and How to Avoid Them {#pitfalls}

Pitfall 1: Cache Keys Colliding

Pitfall 2: Memory Bloat

Pitfall 3: Stale Data

Pitfall 4: Thundering Herd (Cache Stampede)

Pitfall 5: Cache Not Disabled for Real-Time Dashboards

Pitfall 6: Insufficient Redis Resources

Next Steps {#next-steps}

Immediate (This Week)

Short-Term (Next 2–4 Weeks)

Medium-Term (1–3 Months)

Long-Term (3+ Months)

Getting Help

Summary

Want to talk through your situation?