PADISO.ai: AI Agent Orchestration Platform - Launching May 2026
Back to Blog
Guide 17 mins

Apache Superset + Snowflake: Caching Strategy

Master Superset + Snowflake caching: configuration patterns, benchmarks, and operational habits to ship analytics faster and cut query latency.

The PADISO Team ·2026-06-12

Table of Contents

  1. Why Caching Matters on Superset + Snowflake
  2. Understanding the Caching Layers
  3. Configuring Superset Redis Cache
  4. Snowflake Query Result Caching
  5. Warm-Up and Pre-Population Strategies
  6. Benchmarking and Monitoring
  7. Operational Habits and Maintenance
  8. Common Pitfalls and How to Avoid Them
  9. Summary and Next Steps

Why Caching Matters on Superset + Snowflake {#why-caching-matters}

When you run Apache Superset on top of Snowflake, you’re operating at the intersection of two powerful but expensive systems. Snowflake charges per-credit for compute—every query that runs burns through your budget. Superset, meanwhile, needs to serve interactive dashboards with sub-second latency to keep users engaged. Without a deliberate caching strategy, you’ll face two problems simultaneously: runaway Snowflake costs and sluggish dashboards.

The good news is that caching, done right, solves both. A well-tuned cache layer can reduce Snowflake query volume by 70–90%, cut dashboard load times from 8–12 seconds to under 2 seconds, and let your analytics team iterate faster without waiting for long-running queries. We’ve seen teams using Platform Development in Sydney implement these patterns and drop per-user BI licensing costs by 40% whilst improving query responsiveness.

But caching isn’t a set-and-forget feature. It requires understanding three distinct caching layers, knowing when each one applies, and building operational habits to keep cached data fresh without creating stale-data nightmares.


Understanding the Caching Layers {#understanding-layers}

Superset + Snowflake caching works across three independent but complementary layers. Each has different trade-offs, latency characteristics, and operational costs.

Layer 1: Superset Query Cache (Redis)

Superset’s native query cache is a Redis-backed system that stores the raw result sets from your Snowflake queries. When a user runs a dashboard or explores data, Superset checks Redis first. If the query hash matches a cached result and the cache hasn’t expired, Superset returns the cached data instantly—no Snowflake query runs at all.

This is the fastest layer and the most cost-effective because it completely bypasses Snowflake compute. However, Redis is in-memory and ephemeral: if your Redis instance restarts, the cache is lost. Redis is also finite—you can’t cache every possible query indefinitely.

Redis works best for frequently-run queries: your core KPI dashboards, standard reports, and the exploratory queries your team runs repeatedly. For one-off analytical questions or slow-moving datasets, Redis caching is less valuable because the query won’t hit the cache twice.

Layer 2: Snowflake Query Result Caching

Snowflake itself maintains a query result cache at the warehouse level. When you run a query, Snowflake checks whether an identical query has run in the past 24 hours. If it has, and the underlying data hasn’t changed, Snowflake returns the cached result without spinning up compute. This is transparent—you don’t configure it—but it’s powerful.

According to Snowflake’s official documentation on query result caching, this cache is governed by strict rules: the query text must match exactly (including whitespace and case), the user’s role and permissions must be identical, and the underlying tables must not have been modified. Snowflake charges zero credits for a cache hit.

The catch: Snowflake’s query cache is automatic but opaque. You can’t directly control which queries are cached or for how long. It’s useful as a safety net but shouldn’t be your primary caching strategy.

Layer 3: Materialized Views and Persisted Results

For your heaviest workloads—dashboards that run 50+ times per day or reports that power critical business decisions—consider materialised views or Snowflake’s persisted results feature. These pre-compute and store query results in Snowflake tables, eliminating the need to re-run expensive transformations.

Materialised views are ideal when your underlying data refreshes on a known schedule (e.g., hourly or daily). Persisted results work well when you have a smaller set of critical queries that need guaranteed performance. Both require you to manage refresh schedules, but the trade-off is dramatic: queries that normally take 20 seconds can run in under 100ms.


Configuring Superset Redis Cache {#configuring-redis}

Redis caching in Superset requires two configuration steps: setting up a Redis instance and tuning Superset’s cache settings.

Setting Up Redis

You can run Redis in several ways:

  • Managed Redis (AWS ElastiCache, Azure Cache for Redis, Google Memorystore): Simplest for production. You get automatic failover, monitoring, and backups. Cost is typically £50–300/month depending on instance size.
  • Self-hosted Redis: Lower cost but requires you to manage replication, persistence, and backups. Use Redis Sentinel or Redis Cluster if you need high availability.
  • Docker-based development: For testing, run docker run -d -p 6379:6379 redis:latest.

For a typical Superset deployment serving 20–100 concurrent users, a Redis instance with 2–4 GB of memory is sufficient. Start there and monitor hit rates; if you’re seeing 60%+ cache hits, you’re well-tuned.

Superset Configuration

Superset’s cache behaviour is controlled in superset_config.py. Here’s a production-grade configuration:

CACHE_CONFIG = {
    'CACHE_TYPE': 'redis',
    'CACHE_REDIS_URL': 'redis://localhost:6379/0',
    'CACHE_DEFAULT_TIMEOUT': 3600,  # 1 hour default TTL
}

# Query cache settings
SUPERSET_CACHE_VALUES_TIMEOUT = 3600
DATA_CACHE_CONFIG = {
    'CACHE_TYPE': 'redis',
    'CACHE_REDIS_URL': 'redis://localhost:6379/1',
    'CACHE_DEFAULT_TIMEOUT': 1800,  # 30 minutes for data
}

Key parameters:

  • CACHE_DEFAULT_TIMEOUT: How long (in seconds) a cached query result is valid. 3600 seconds (1 hour) is a reasonable default for most dashboards. Shorter TTLs (300–600 seconds) work better for fast-moving data; longer TTLs (7200+ seconds) suit slower-moving reference data.
  • CACHE_REDIS_URL: The connection string to your Redis instance. Use a managed service URL in production.
  • Separate databases: Use Redis database 0 for query caches and database 1 for data caches. This prevents one type of cache from evicting the other under memory pressure.

After updating superset_config.py, restart Superset:

superset db upgrade
superset init
superset run -p 8088

Enabling Cache on Dashboards and Charts

Once Redis is configured, enable caching per-dashboard and per-chart:

  1. Open a dashboard in edit mode.
  2. Click the chart you want to cache.
  3. In the chart settings, find Cache Configuration.
  4. Set Cache Timeout to your desired TTL (e.g., 3600 seconds).
  5. Save the dashboard.

You can also set cache timeouts at the dataset level, which applies to all charts built on that dataset. This is useful for reference data that changes infrequently.

Monitoring Cache Performance

Superset doesn’t expose cache hit rates in the UI by default, but you can query Redis directly:

redis-cli INFO stats | grep hits
redis-cli INFO stats | grep misses

A healthy cache should see 60–80% hit rates on production dashboards. If hit rates are below 40%, your TTLs are too short or your query patterns are too diverse. If hit rates exceed 90%, you may be able to extend TTLs and reduce Snowflake load further.

For teams building analytics platforms at scale, Platform Development in Melbourne provides fractional CTO and platform engineering support to optimise these configurations and integrate them with your wider data architecture.


Snowflake Query Result Caching {#snowflake-caching}

While Superset’s Redis cache is under your control, Snowflake’s query result cache is automatic and transparent. Understanding how it works helps you design queries that play nicely with it.

How Snowflake Query Caching Works

When you execute a query in Snowflake, the system checks whether an identical query has run within the past 24 hours. “Identical” means:

  • Exact same SQL text (whitespace, case, comments all matter).
  • Same user role and permissions.
  • Underlying tables have not been modified (INSERT, UPDATE, DELETE, or schema changes invalidate the cache).

If all conditions are met, Snowflake returns the cached result in under 100ms, charging zero credits. If the query is new or the cache has expired, Snowflake executes it normally and caches the result for the next 24 hours.

According to Snowflake’s official guide on persisted results, you can also explicitly persist query results to a table using the RESULT_SCAN() function or by creating a materialised view. This gives you finer control over which results are cached and for how long.

Designing Queries for Cache Reuse

To maximise Snowflake cache hits:

  1. Normalise query text: Use consistent formatting. Superset does this automatically, but if you’re writing raw SQL, ensure queries are identical across runs.
  2. Avoid non-deterministic functions: Don’t use CURRENT_DATE(), CURRENT_TIMESTAMP(), or RAND() in queries you want cached. These functions change on every run, causing cache misses.
  3. Use parameterised queries carefully: Superset parameterises queries (e.g., WHERE date >= '2024-01-01'), which means the same logical query with different date ranges will miss the cache. This is correct behaviour but limits cache effectiveness for exploratory queries.
  4. Consolidate similar queries: If your dashboards run 10 variations of the same query with different filters, consider building a single query with all filters and caching the result in a materialised view.

Persisted Results vs. Materialised Views

For critical queries, use Snowflake’s persisted results feature:

CREATE OR REPLACE TABLE my_persisted_results AS
SELECT
    date_trunc('day', event_timestamp) AS event_date,
    user_id,
    COUNT(*) AS event_count,
    SUM(revenue) AS daily_revenue
FROM events
WHERE event_timestamp >= CURRENT_DATE() - 30
GROUP BY 1, 2;

Then, refresh this table on a schedule (e.g., every hour):

ALTER TABLE my_persisted_results SET CHANGE_TRACKING = ON;

Queries against persisted results run in milliseconds. The trade-off is that the data is only as fresh as your last refresh.


Warm-Up and Pre-Population Strategies {#warmup-strategies}

Redis and Snowflake caches only help if queries hit them. If your dashboards are accessed sporadically or at unpredictable times, users will frequently encounter cache misses and slow queries. Warm-up caching solves this by pre-executing queries before users request them.

Cache Warm-Up with Celery

Superset integrates with Celery, a distributed task queue, to schedule periodic query execution. This keeps your cache warm without user interaction.

First, configure Celery in superset_config.py:

CELERY_BROKER_URL = 'redis://localhost:6379/0'
CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'
CELERY_BEAT_SCHEDULE = {
    'cache-warm-up': {
        'task': 'superset.tasks.cache_warmup_task',
        'schedule': crontab(minute=0),  # Run every hour
    },
}

Then, create a warm-up task that executes your critical queries:

from celery import shared_task
from superset import db
from superset.models.dashboard import Dashboard

@shared_task
def cache_warmup_task():
    dashboards = db.session.query(Dashboard).filter(
        Dashboard.id.in_([1, 2, 3])  # Your critical dashboard IDs
    ).all()
    for dashboard in dashboards:
        for chart in dashboard.slices:
            chart.get_data()

This task runs every hour (or whatever schedule you choose) and executes all charts in your critical dashboards. The results are cached in Redis, so when users access the dashboards, they get instant responses.

According to the Celery documentation on periodic tasks, you can schedule tasks using cron expressions or fixed intervals. For dashboards that see traffic spikes at specific times (e.g., morning standup at 9 AM), schedule warm-up tasks 5 minutes before those times.

Community Patterns for Warm-Up Caching

The Superset community has discussed implementing warm-up caching using Redis, Celery, and scheduled tasks. Key patterns include:

  1. Dashboard-level warm-up: Execute all charts in a dashboard on a fixed schedule.
  2. Query-level warm-up: Execute specific expensive queries (e.g., slow aggregations) and cache the results.
  3. Time-based warm-up: Warm up different dashboards at different times to avoid thundering herd problems (all queries running simultaneously).

Avoiding Thundering Herd

When you warm up caches, ensure queries don’t all run at the same time. Stagger them:

CELERY_BEAT_SCHEDULE = {
    'cache-warm-up-dashboard-1': {
        'task': 'superset.tasks.cache_warmup_dashboard',
        'schedule': crontab(minute=0, hour='9'),  # 9:00 AM
        'args': (1,),
    },
    'cache-warm-up-dashboard-2': {
        'task': 'superset.tasks.cache_warmup_dashboard',
        'schedule': crontab(minute=5, hour='9'),  # 9:05 AM
        'args': (2,),
    },
}

This spreads the load across Snowflake and Redis, preventing spikes that could overwhelm your infrastructure.


Benchmarking and Monitoring {#benchmarking}

Caching only works if you measure it. Without benchmarking, you won’t know whether your strategy is effective or where to optimise next.

Metrics to Track

  1. Cache hit rate: Percentage of queries served from cache (target: 70–85%).
  2. Query latency: Time from request to response (target: <2 seconds for cached queries, <10 seconds for Snowflake queries).
  3. Snowflake compute cost: Total credits consumed per week (should decrease 50–70% after caching).
  4. Dashboard load time: Time for a dashboard to render all charts (target: <3 seconds).
  5. Cache eviction rate: How often Redis evicts old entries due to memory pressure (target: <5% of total entries).

Setting Up Monitoring

Superset logs query execution times in its application logs. Extract these to a monitoring system:

import logging
from superset.models.core import Log

logger = logging.getLogger(__name__)

def log_query_metrics(query_id, duration_ms, cache_hit):
    logger.info(f'query_id={query_id} duration_ms={duration_ms} cache_hit={cache_hit}')

Parse these logs into Datadog, Prometheus, or CloudWatch to build dashboards showing cache performance over time.

For Snowflake, query the QUERY_HISTORY view to see credit consumption:

SELECT
    query_id,
    query_text,
    execution_time,
    credits_used,
    result_from_cache
FROM snowflake.account_usage.query_history
WHERE start_time >= CURRENT_DATE() - 7
ORDER BY credits_used DESC
LIMIT 100;

This shows your most expensive queries. Focus caching efforts on these first.

Benchmarking Before and After

Before implementing caching, measure baseline performance:

  • Run your critical dashboards 10 times and record average load time (e.g., 8.2 seconds).
  • Check Snowflake’s QUERY_HISTORY and calculate daily credit consumption (e.g., 150 credits/day).
  • Note the current Redis memory usage (baseline).

After implementing caching, remeasure after 2 weeks:

  • Dashboard load time should drop to 2–3 seconds (70% improvement).
  • Snowflake credits should fall to 30–50 credits/day (70–80% reduction).
  • Redis memory should stabilise at 2–4 GB.

If results don’t match these targets, review your cache TTLs, warm-up schedules, and query patterns. Teams working with Platform Development in Canberra or other PADISO locations often benchmark against these targets to validate their analytics infrastructure.


Operational Habits and Maintenance {#operational-habits}

Caching is not a one-time setup. It requires ongoing operational discipline to keep caches fresh, prevent data staleness, and adapt as your usage patterns change.

Cache Invalidation Strategy

The hardest problem in caching is invalidation. When underlying data changes, cached results become stale. You need a strategy to invalidate caches at the right time.

Option 1: Time-based invalidation (TTL)

Set a cache timeout and let Redis evict entries automatically. This is simple but imprecise: data might be stale for up to the full TTL.

CACHE_DEFAULT_TIMEOUT = 1800  # 30 minutes

Option 2: Event-based invalidation

When data is loaded into Snowflake, trigger a cache invalidation:

from superset.extensions import cache

def on_data_refresh():
    # Invalidate caches for affected datasets
    cache.delete_memoized(get_dataset_data, dataset_id=123)

Option 3: Hybrid approach (recommended)

Use short TTLs (30 minutes) for frequently-changing data and longer TTLs (4 hours) for slow-moving reference data. For critical dashboards, implement event-based invalidation on top of TTLs.

Monitoring Cache Staleness

Implement alerts to detect when cached data is older than expected:

def check_cache_age(cache_key, max_age_seconds=3600):
    cache_metadata = redis_client.hgetall(f'{cache_key}:metadata')
    cache_time = int(cache_metadata.get('created_at', 0))
    age = time.time() - cache_time
    if age > max_age_seconds:
        logger.warning(f'Cache {cache_key} is {age}s old, exceeds {max_age_seconds}s')

Alert your team if caches are older than expected, indicating a problem with refresh schedules.

Scaling Redis

As your Superset deployment grows, Redis memory pressure increases. Monitor Redis memory usage and plan to scale:

redis-cli INFO memory | grep used_memory_human

If memory usage approaches 80% of your instance size, either:

  1. Increase instance size (simplest, but costs more).
  2. Reduce cache TTLs (saves memory but increases Snowflake queries).
  3. Implement cache eviction policies (e.g., LRU: least recently used).
  4. Use Redis Cluster for distributed caching across multiple nodes.

For production deployments, we recommend managed Redis (AWS ElastiCache, Azure Cache for Redis) which auto-scales and handles replication. For teams in Australia, Platform Development in Gold Coast and other PADISO locations provide fractional CTO support to design and operate these systems.

Regular Cache Audits

Every quarter, audit your caching strategy:

  1. Review cache hit rates: Are they stable at 70%+? If not, investigate why.
  2. Check Snowflake query costs: Are they trending down? If not, identify expensive queries and add caching.
  3. Validate data freshness: Sample cached dashboards and verify data is current. If staleness is detected, reduce TTLs.
  4. Assess warm-up effectiveness: Are scheduled warm-up tasks reducing user-facing latency? Adjust schedules if needed.
  5. Plan for growth: As query volume grows, will your current Redis instance handle it? Plan upgrades 3 months ahead.

Common Pitfalls and How to Avoid Them {#pitfalls}

Pitfall 1: Caching Non-Deterministic Queries

Problem: You cache a query that includes CURRENT_DATE() or RAND(). The cached result is stale within seconds.

Solution: Use parameterised queries instead. Replace WHERE date >= CURRENT_DATE() with WHERE date >= :start_date and pass the date as a parameter. This allows the same query to run with different dates without cache misses.

Pitfall 2: Ignoring Snowflake’s Query Cache

Problem: You implement Redis caching but don’t realise Snowflake is also caching queries. You end up with duplicate caching logic and wasted effort.

Solution: Understand both layers. Snowflake’s cache is a safety net; Superset’s Redis cache is your primary control point. Design queries to benefit from both without redundancy.

Pitfall 3: Cache Thundering Herd

Problem: All caches expire at the same time (e.g., at the top of the hour). Thousands of queries hit Snowflake simultaneously, causing spikes in latency and cost.

Solution: Stagger cache expirations. Set different TTLs for different dashboards:

# Dashboard 1: cache for 1 hour (expires at :00)
SUPERSET_CACHE_VALUES_TIMEOUT = 3600

# Dashboard 2: cache for 1 hour 5 minutes (expires at :05)
SUPERSET_CACHE_VALUES_TIMEOUT = 3900

Or use warm-up tasks to refresh caches before expiration.

Pitfall 4: Over-Caching Exploratory Queries

Problem: You cache every query, including one-off exploratory queries that run once and never again. This wastes Redis memory.

Solution: Cache selectively. Use short TTLs (5 minutes) for exploratory queries and longer TTLs (1 hour+) for dashboards. Better yet, only cache queries that are part of dashboards; let exploratory queries bypass the cache.

Pitfall 5: Not Monitoring Cache Performance

Problem: You implement caching but don’t measure it. Months later, you realise your caches are misconfigured and you’re not saving money.

Solution: Implement monitoring from day one. Track hit rates, latency, and Snowflake costs. Set up alerts for anomalies (e.g., sudden drop in hit rates).

Pitfall 6: Ignoring Redis Persistence

Problem: Your Redis instance crashes and loses all cached data. Dashboards become slow until caches warm up again.

Solution: Enable Redis persistence (RDB snapshots or AOF logging). Use managed Redis services that handle persistence automatically. For critical deployments, use Redis Sentinel or Cluster for high availability.


Summary and Next Steps {#summary}

Caching is the single most effective way to improve Superset + Snowflake performance and reduce costs. A well-tuned caching strategy can cut Snowflake credits by 70–80% and improve dashboard latency by 75%.

Here’s your implementation roadmap:

Week 1: Foundation

  1. Set up a Redis instance (managed service recommended).
  2. Configure Superset to use Redis for query caching.
  3. Enable caching on your 5 most-used dashboards.
  4. Measure baseline latency and Snowflake costs.

Week 2–3: Optimisation

  1. Benchmark cache hit rates (target: 70%+).
  2. Adjust cache TTLs based on data freshness requirements.
  3. Implement warm-up tasks for critical dashboards.
  4. Set up monitoring for cache performance.

Week 4+: Scaling and Maintenance

  1. Expand caching to all dashboards.
  2. Implement event-based cache invalidation for fast-moving data.
  3. Establish quarterly cache audits.
  4. Plan for Redis scaling as usage grows.

For teams in Australia, including Sydney, Melbourne, and other cities, Platform Development in Australia provides fractional CTO and platform engineering support to implement these patterns at scale. Similarly, teams in the US can work with Platform Development in United States, and Canadian teams with Platform Development in Canada. We also support specific locations like Platform Development in New York for financial services, Platform Development in Washington, D.C. for government and defence, and Platform Development in Chicago for trading and logistics teams.

For detailed configuration guidance, refer to the official Superset documentation for Snowflake connections, which covers authentication, query execution, and performance tuning. Understanding Snowflake’s query result caching is equally critical—it’s the layer beneath Superset that determines whether queries hit Snowflake at all.

If you’re implementing warm-up caching with Celery, the Celery documentation on periodic tasks provides comprehensive guidance on scheduling and error handling. For Redis-specific patterns, the Redis caching documentation covers eviction policies, persistence, and cluster design.

Start with the foundation, measure relentlessly, and iterate. Caching is a continuous process, not a destination. The teams that succeed are those that treat it as a core operational discipline, monitor it weekly, and adjust as their business and data patterns evolve.

Ready to ship? Get started with a Redis instance today, configure Superset, and measure your baseline. You’ll see results within days.

Want to talk through your situation?

Book a 30-minute call with Kevin (Founder/CEO). No pitch — direct advice on what to do next.

Book a 30-min call