Table of Contents
- Understanding Memory Pressure in Superset
- Diagnosis: Identifying Memory Bottlenecks
- Gunicorn and Worker Configuration
- Query Caching and Redis Strategy
- Async Query Execution
- Database Backend Tuning
- Dashboard and Chart Optimisation
- Monitoring and Alerting
- Real-World Configuration Patterns
- Next Steps and Scaling
Understanding Memory Pressure in Superset {#understanding-memory-pressure}
Apache Superset is a powerful open-source data visualisation and business intelligence platform, but it demands careful operational attention when deployed at scale. Memory pressure—the condition where available RAM becomes a bottleneck—is one of the most common production challenges we encounter across our customer engagements at PADISO. Unlike CPU saturation, which tends to degrade gracefully, memory pressure can cause cascading failures: OOM (Out of Memory) kills, eviction of critical cache entries, and sudden dashboard timeouts that frustrate users and damage confidence in your analytics platform.
When we work with teams modernising their data platforms or embedding analytics into SaaS products, we often inherit Superset deployments that were tuned for development or small pilot cohorts, then hit a wall as query volume or data size grows. The root causes are usually predictable:
- Large result sets: Queries that return millions of rows, held in memory while the dashboard renders.
- Concurrent dashboard loads: Multiple users opening the same heavy dashboard simultaneously, each spawning fresh query workers.
- Unoptimised caching: Cache misses on frequently-accessed charts, forcing repeated expensive computation.
- Bloated chart metadata: Charts with hundreds of fields or overly complex filter logic.
- Undersized Gunicorn workers: Too many processes competing for RAM, or workers that aren’t recycled frequently enough.
This guide reflects real configuration patterns and operational habits we apply on customer engagements. We’ll walk through diagnosis, configuration tuning, caching strategy, async query execution, and monitoring—with concrete numbers and examples throughout.
Diagnosis: Identifying Memory Bottlenecks {#diagnosis}
Observing OOM and Process Restarts
The first sign of memory pressure is usually a Superset pod or process restarting unexpectedly. In Kubernetes, you’ll see OOMKilled in the pod status. In traditional VMs, the system log will show kernel OOM killer events. Check your application logs for entries like:
MemoryError: Unable to allocate X.XX GiB for an array
FATAL: out of memory
Detected OOM event in pod superset-web-1234
These errors often spike after a user opens a particularly heavy dashboard or runs a long-running query.
Baseline Memory Profiling
Before tuning, establish a baseline. Use standard Linux tools to observe memory usage under realistic load:
# On the Superset container/VM:
watch -n 1 'free -h && echo "---" && ps aux | grep gunicorn | head -5'
Note the resident set size (RSS) of each Gunicorn worker process. If you’re running 8 workers and each is consuming 500 MB, you’re at 4 GB just for the web tier—before caching, database connections, and OS overhead.
Identifying Expensive Queries
Enable slow query logging on your database backend (PostgreSQL, MySQL, Snowflake, etc.) to identify which charts or dashboards are driving the heaviest workloads. In PostgreSQL, set log_min_duration_statement to catch queries taking longer than 1 second:
ALTER SYSTEM SET log_min_duration_statement = 1000; -- 1 second in milliseconds
SELECT pg_reload_conf();
Then analyse the slow query log to find patterns: queries returning 10M+ rows, queries with expensive joins or window functions, or queries executed on every page load without caching.
Memory Profiling Tools
For deeper analysis, use Python profiling tools. Add a temporary memory profiler to your Superset instance:
pip install memory-profiler
Then decorate expensive functions in your custom Superset code (or use a middleware wrapper) to track memory allocation. This helps identify whether memory is being consumed by query result sets, chart rendering, or metadata operations.
Gunicorn and Worker Configuration {#gunicorn-workers}
Gunicorn is the WSGI application server that runs Superset. Its worker configuration is the first lever to pull when tackling memory pressure. The default configuration is often too aggressive for memory-constrained environments.
Worker Count and Process Model
Superset’s default Gunicorn configuration spawns workers based on CPU count, but memory is the real constraint. A common pattern is:
workers = (2 * CPU_count) + 1
This rule works well for I/O-bound services with small memory footprints. Superset, however, can allocate significant memory per worker because each worker maintains database connection pools, caches, and query result buffers.
Recommendation: Start with a lower worker count and scale up only if CPU utilisation remains below 60%. For a 4-core machine with 8 GB RAM, we typically recommend 2–3 workers rather than the default 9.
# In your docker-compose.yml or Kubernetes deployment:
gunicorn \
--bind 0.0.0.0:8088 \
--workers 3 \
--worker-class gevent \
--worker-connections 1000 \
--timeout 300 \
--max-requests 1000 \
--max-requests-jitter 100 \
superset.app:create_app()
Worker Recycling and Memory Leaks
Even well-written code can leak memory over time. Python’s garbage collector is generally reliable, but long-lived processes can accumulate objects that aren’t properly freed. Gunicorn’s --max-requests and --max-requests-jitter settings gracefully recycle workers after handling a certain number of requests:
--max-requests 1000: Recycle each worker after 1000 requests.--max-requests-jitter 100: Add a random jitter (0–100) to avoid all workers restarting simultaneously.
This means each worker will live 1000–1100 requests before being replaced with a fresh process. The jitter prevents a thundering herd of process restarts.
Connection Pooling
Each Gunicorn worker maintains a pool of database connections. If you have 3 workers and each opens 5 connections, that’s 15 active connections to your database. This is usually fine, but verify that your database can handle the load:
# In superset_config.py:
SQLALCHEMY_ENGINE_OPTIONS = {
'pool_size': 5,
'max_overflow': 10,
'pool_recycle': 3600, # Recycle connections every hour
'pool_pre_ping': True, # Verify connection health before use
}
pool_pre_ping adds a small overhead but prevents stale connection errors, which can cause worker crashes.
Worker Class Selection
Gunicorn supports multiple worker classes. The default is sync, which handles one request per worker at a time. For I/O-bound workloads (waiting on database queries), gevent is more efficient:
gunicorn --worker-class gevent --worker-connections 1000 ...
Gevent uses coroutines to multiplex many concurrent requests across fewer OS threads, reducing memory overhead per concurrent user. However, gevent requires monkey-patching and isn’t compatible with all libraries. Test thoroughly before deploying to production.
Query Caching and Redis Strategy {#redis-caching}
Caching is the single most effective lever for reducing memory pressure in Superset. Without caching, every dashboard load triggers a fresh query to your database, consuming memory for the result set and CPU for computation. With intelligent caching, repeated dashboard views hit Redis instead of your database.
Understanding Superset’s Cache Layers
Superset uses multiple cache layers:
- Query cache: Results of SQL queries, keyed by query hash + parameters.
- Chart cache: Rendered chart data (JSON), keyed by chart ID + filter state.
- Metadata cache: Table and column definitions, keyed by datasource ID.
Each layer can be backed by Redis, Memcached, or the filesystem (not recommended for production).
Redis Configuration for Superset
Configure Redis as your cache backend in superset_config.py:
CACHE_CONFIG = {
'CACHE_TYPE': 'redis',
'CACHE_REDIS_URL': 'redis://redis-host:6379/0',
'CACHE_DEFAULT_TIMEOUT': 3600, # 1 hour default
'CACHE_KEY_PREFIX': 'superset_',
}
DATA_CACHE_CONFIG = {
'CACHE_TYPE': 'redis',
'CACHE_REDIS_URL': 'redis://redis-host:6379/1',
'CACHE_DEFAULT_TIMEOUT': 600, # 10 minutes for query results
'CACHE_KEY_PREFIX': 'superset_data_',
}
Use separate Redis databases (0, 1, 2) for different cache types to isolate eviction policies and monitoring.
Cache TTL (Time-To-Live) Strategy
The default 3600-second (1 hour) TTL is often too long for fast-moving data and too short for expensive queries. Tailor TTL per datasource or dashboard:
- Real-time operational dashboards (order status, live metrics): 30–60 seconds.
- Daily business dashboards (sales, inventory): 300–600 seconds (5–10 minutes).
- Weekly or monthly reports: 3600+ seconds (1+ hour).
- Metadata and dimension tables: 86400 seconds (24 hours).
Set cache TTL at the chart level in Superset’s UI, or programmatically:
# In a custom Superset plugin or extension:
chart.cache_timeout = 300 # 5 minutes for this chart
db.session.commit()
Memory-Aware Redis Configuration
Redis itself can become a memory bottleneck if not configured carefully. Set a maximum memory policy:
maxmemory 4gb
maxmemory-policy allkeys-lru
This tells Redis to evict least-recently-used keys when it hits 4 GB, preventing OOM kills. Monitor Redis memory usage:
redis-cli INFO memory
Look for used_memory_rss and evicted_keys. If evictions are high, either increase Redis capacity or reduce cache TTLs.
Cache Warming and Preloading
For critical dashboards, preload the cache during off-peak hours. Create a Celery task that runs at 2 AM and loads all charts on your executive dashboard:
from celery import shared_task
from superset.models.dashboard import Dashboard
@shared_task
def warm_dashboard_cache(dashboard_id):
dashboard = db.session.query(Dashboard).get(dashboard_id)
for slc in dashboard.slices:
# Trigger chart query to populate cache
from superset.utils.cache import get_chart_cache_key
cache_key = get_chart_cache_key(slc, {})
if not cache.get(cache_key):
# Run the query
slc.get_json(force=True)
Schedule this task in your Celery beat configuration to run before business hours.
Async Query Execution {#async-queries}
Large queries that take 10+ seconds to execute can block Gunicorn workers, starving other users’ requests. Superset’s async query feature offloads expensive queries to a background worker pool, returning control to the user immediately while the query runs asynchronously.
Enabling Async Queries
Async queries require Celery and a result backend (Redis or RabbitMQ). Configure in superset_config.py:
from celery.schedules import crontab
# Enable async queries
SUPERSET_ASYNC_QUERY_JOB_TIMEOUT = 3600 # 1 hour max query time
SUPERSET_ASYNC_QUERIES_ENABLED = True
# Celery configuration
CELERY_BROKER_URL = 'redis://redis-host:6379/2'
CELERY_RESULT_BACKEND = 'redis://redis-host:6379/3'
CELERY_TASK_TRACK_STARTED = True
CELERY_TASK_TIME_LIMIT = 3600 * 1000 # 1 hour in milliseconds
Then start Celery workers:
celery -A superset.tasks worker --loglevel=info --concurrency=4
Query Timeout and Memory Limits
Async queries can still consume memory on the worker, especially if they return large result sets. Set per-query timeout and memory limits:
# In superset_config.py:
SUPERSET_SQLLAB_ASYNC_TIME_LIMIT_SEC = 300 # 5 minutes
SUPERSET_SQLLAB_TIMEOUT = 300
For Kubernetes deployments, set resource limits on Celery worker pods:
apiVersion: v1
kind: Pod
metadata:
name: superset-celery-worker
spec:
containers:
- name: celery
image: superset:latest
resources:
requests:
memory: "1Gi"
cpu: "500m"
limits:
memory: "2Gi" # Kill the pod if it exceeds 2GB
cpu: "1000m"
Celery Worker Scaling
Start with 2–4 Celery workers and monitor queue depth. If tasks are backing up, add more workers. Each worker should have its own memory budget (1–2 GB) to avoid contention.
# Monitor Celery queue
celery -A superset.tasks inspect active
celery -A superset.tasks inspect reserved
If the reserved queue grows faster than tasks complete, you need more workers or longer timeouts.
Database Backend Tuning {#database-tuning}
Superset’s memory pressure is often amplified by inefficient database queries or undersized database servers. Tuning your database backend can dramatically reduce memory consumption in Superset itself.
PostgreSQL Memory Settings
If you’re using PostgreSQL as your data warehouse, configure memory-related settings. According to the PostgreSQL documentation on resource consumption, the most important settings are:
-- In postgresql.conf or via ALTER SYSTEM:
ALTER SYSTEM SET shared_buffers = '2GB'; -- 25% of total RAM
ALTER SYSTEM SET effective_cache_size = '6GB'; -- 50–75% of total RAM
ALTER SYSTEM SET work_mem = '50MB'; -- Per-operation memory for sorts, hash tables
ALTER SYSTEM SET maintenance_work_mem = '512MB'; -- For ANALYZE, CREATE INDEX
ALTER SYSTEM SET max_parallel_workers_per_gather = 4; -- Parallel query execution
SELECT pg_reload_conf();
These settings allow PostgreSQL to cache frequently-accessed data in memory, reducing disk I/O and thus reducing the time Superset workers spend waiting (and holding memory).
Query Plan Analysis
Use EXPLAIN ANALYZE to understand why queries are expensive:
EXPLAIN ANALYZE
SELECT user_id, COUNT(*) as order_count
FROM orders
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY user_id;
Look for sequential scans on large tables (should be index scans) or high memory estimates. Add indexes on frequently-filtered columns:
CREATE INDEX idx_orders_created_at ON orders(created_at);
Materialized Views and Pre-aggregation
For dashboards that repeatedly aggregate the same data, use PostgreSQL materialized views to pre-compute results:
CREATE MATERIALIZED VIEW daily_sales AS
SELECT DATE(created_at) as sale_date, SUM(amount) as total_sales
FROM orders
GROUP BY DATE(created_at);
CREATE INDEX idx_daily_sales_date ON daily_sales(sale_date);
Then refresh the view during off-peak hours:
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales;
Dashboard and Chart Optimisation {#dashboard-optimisation}
Many memory issues originate not in Superset’s infrastructure but in how dashboards and charts are designed. Overly complex dashboards can trigger dozens of queries simultaneously, each holding memory.
Reduce Chart Count Per Dashboard
A dashboard with 50 charts might seem useful, but it forces Superset to execute 50 queries on page load. If each query returns 100K rows and holds 10 MB in memory, that’s 500 MB consumed just for that one user’s dashboard load.
Recommendation: Limit dashboards to 8–12 charts. If you need more visualisations, split them across multiple dashboards or use tabs.
Chart-Level Caching
Set cache TTL at the chart level based on data freshness requirements. In Superset’s UI, go to Chart Settings → Caching → Cache Timeout. Set values like:
- KPI cards (revenue, user count): 60 seconds.
- Trend charts (sales over time): 300 seconds.
- Dimension tables (product list, category breakdown): 3600 seconds.
Filter Optimisation
Filters on dashboards are powerful but can amplify memory pressure if not designed carefully. When a user changes a filter, Superset re-executes all affected charts. If you have 20 charts all listening to the same date filter, that’s 20 new queries.
Use native filters instead of SQL filters where possible. Native filters are evaluated by Superset, not the database, reducing query overhead:
# In dashboard JSON:
"native_filters": [
{
"id": "FILTER_YEAR",
"name": "Year",
"type": "filter_select",
"configuration": {
"source": "DATASET",
"dataset": { "id": 1 },
"column": "year"
}
}
]
Row Limit and Pagination
Set row limits on charts to prevent loading millions of rows. In chart settings:
# In chart JSON:
"query_options": {
"row_limit": 10000, # Max rows returned
}
For tables that need to show more rows, implement server-side pagination:
# In a custom Superset extension:
class PaginatedTableChart(BaseChart):
def get_data(self, query, offset=0, limit=1000):
return query.offset(offset).limit(limit).all()
Monitoring and Alerting {#monitoring}
Production memory issues are easiest to prevent if you catch them early. Implement comprehensive monitoring and alerting.
Key Metrics to Track
- Gunicorn worker memory (RSS): Alert if any worker exceeds 1 GB.
- Redis memory usage: Alert if evictions exceed 100/minute.
- Celery queue depth: Alert if reserved queue grows unbounded.
- Database connection pool exhaustion: Alert if active connections > 80% of pool size.
- Query execution time: Alert if 95th percentile query time exceeds 30 seconds.
- Cache hit ratio: Track cache hits vs. misses; aim for >70% hit ratio on query cache.
Prometheus Metrics
If you’re using Prometheus, export Superset metrics:
# In superset_config.py:
from prometheus_client import Counter, Histogram, Gauge
query_duration = Histogram('superset_query_duration_seconds', 'Query execution time')
cache_hits = Counter('superset_cache_hits_total', 'Cache hits')
cache_misses = Counter('superset_cache_misses_total', 'Cache misses')
worker_memory = Gauge('superset_worker_memory_bytes', 'Worker RSS memory', ['worker_id'])
Then scrape these metrics in your Prometheus config and visualise in Grafana.
Alerting Rules
Create Prometheus alert rules:
groups:
- name: superset
rules:
- alert: SupsetWorkerOOM
expr: superset_worker_memory_bytes > 1e9 # 1 GB
for: 5m
annotations:
summary: "Superset worker {{ $labels.worker_id }} approaching OOM"
- alert: SupsetCacheMissRate
expr: |
rate(superset_cache_misses_total[5m]) /
(rate(superset_cache_hits_total[5m]) + rate(superset_cache_misses_total[5m])) > 0.3
for: 10m
annotations:
summary: "Superset cache miss rate >30%, check cache configuration"
Log Aggregation
Use a centralized log aggregator (ELK, Datadog, Splunk) to track OOM events and memory-related errors:
# Search for memory errors across all Superset logs
index=superset ("MemoryError" OR "OOMKilled" OR "out of memory")
Set up alerts to notify on-call engineers immediately when OOM events occur.
Real-World Configuration Patterns {#real-world-patterns}
Here are concrete configurations we apply on customer engagements, tailored to different deployment scales.
Small Deployment (1–10 concurrent users)
Infrastructure: Single 4-core, 8 GB VM.
# docker-compose.yml
services:
superset:
image: apache/superset:latest
environment:
SUPERSET_LOAD_EXAMPLES: 'no'
PYTHONUNBUFFERED: '1'
command: |
gunicorn \
--bind 0.0.0.0:8088 \
--workers 2 \
--worker-class gevent \
--worker-connections 500 \
--timeout 120 \
--max-requests 500 \
--max-requests-jitter 50 \
superset.app:create_app()
ports:
- "8088:8088"
depends_on:
- postgres
- redis
redis:
image: redis:7-alpine
command: redis-server --maxmemory 1gb --maxmemory-policy allkeys-lru
postgres:
image: postgres:15-alpine
environment:
POSTGRES_PASSWORD: superset
POSTGRES_DB: superset
volumes:
- postgres_data:/var/lib/postgresql/data
volumes:
postgres_data:
superset_config.py:
SUPERSET_ASYNC_QUERIES_ENABLED = False # Skip async for small deployments
CACHE_CONFIG = {
'CACHE_TYPE': 'redis',
'CACHE_REDIS_URL': 'redis://redis:6379/0',
'CACHE_DEFAULT_TIMEOUT': 600,
}
SQLALCHEMY_ENGINE_OPTIONS = {
'pool_size': 3,
'max_overflow': 5,
}
Medium Deployment (10–100 concurrent users)
Infrastructure: 3 Superset pods (2 CPU, 2 GB each), 1 Redis pod (4 GB), 1 PostgreSQL pod (8 GB).
# Kubernetes deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: superset-web
spec:
replicas: 3
selector:
matchLabels:
app: superset-web
template:
metadata:
labels:
app: superset-web
spec:
containers:
- name: superset
image: apache/superset:latest
ports:
- containerPort: 8088
env:
- name: SUPERSET_LOAD_EXAMPLES
value: 'no'
- name: REDIS_HOST
value: superset-redis
- name: POSTGRES_HOST
value: superset-postgres
resources:
requests:
memory: "1.5Gi"
cpu: "1000m"
limits:
memory: "2Gi"
cpu: "2000m"
livenessProbe:
httpGet:
path: /health
port: 8088
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /health
port: 8088
initialDelaySeconds: 10
periodSeconds: 5
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: superset-celery-worker
spec:
replicas: 2
selector:
matchLabels:
app: superset-celery
template:
metadata:
labels:
app: superset-celery
spec:
containers:
- name: celery
image: apache/superset:latest
command: ["celery", "-A", "superset.tasks", "worker", "--loglevel=info", "--concurrency=2"]
env:
- name: CELERY_BROKER_URL
value: redis://superset-redis:6379/2
- name: CELERY_RESULT_BACKEND
value: redis://superset-redis:6379/3
resources:
requests:
memory: "1Gi"
cpu: "500m"
limits:
memory: "1.5Gi"
cpu: "1000m"
superset_config.py:
SUPERSET_ASYNC_QUERIES_ENABLED = True
SUPERSET_ASYNC_QUERY_JOB_TIMEOUT = 600 # 10 minutes
CACHE_CONFIG = {
'CACHE_TYPE': 'redis',
'CACHE_REDIS_URL': 'redis://superset-redis:6379/0',
'CACHE_DEFAULT_TIMEOUT': 600,
'CACHE_KEY_PREFIX': 'superset_',
}
DATA_CACHE_CONFIG = {
'CACHE_TYPE': 'redis',
'CACHE_REDIS_URL': 'redis://superset-redis:6379/1',
'CACHE_DEFAULT_TIMEOUT': 300,
'CACHE_KEY_PREFIX': 'superset_data_',
}
CELERY_BROKER_URL = 'redis://superset-redis:6379/2'
CELERY_RESULT_BACKEND = 'redis://superset-redis:6379/3'
SQLALCHEMY_ENGINE_OPTIONS = {
'pool_size': 5,
'max_overflow': 10,
'pool_recycle': 3600,
'pool_pre_ping': True,
}
Large Deployment (100+ concurrent users)
At this scale, consider a data warehouse like ClickHouse or Snowflake instead of PostgreSQL. Superset integrates well with both, and they’re designed for analytical workloads. We often embed Superset analytics into SaaS platforms for customers across our Platform Development in Sydney, Platform Development in Melbourne, and Platform Development in Australia engagements, where ClickHouse provides the necessary performance and scalability.
Infrastructure: 5+ Superset pods (2 CPU, 2 GB each), 2 Redis pods (8 GB each, with replication), 1 ClickHouse cluster (3 nodes, 16 GB each).
# superset_config.py for large scale
SUPERSET_ASYNC_QUERIES_ENABLED = True
SUPERSET_ASYNC_QUERY_JOB_TIMEOUT = 1800 # 30 minutes
SUPERSET_SQLLAB_ASYNC_TIME_LIMIT_SEC = 1800
# Multiple Redis nodes for HA
CACHE_CONFIG = {
'CACHE_TYPE': 'redis',
'CACHE_REDIS_URL': 'redis-sentinel://sentinel-1:26379,sentinel-2:26379,sentinel-3:26379/mymaster/0',
'CACHE_DEFAULT_TIMEOUT': 600,
'CACHE_KEY_PREFIX': 'superset_',
}
DATA_CACHE_CONFIG = {
'CACHE_TYPE': 'redis',
'CACHE_REDIS_URL': 'redis-sentinel://sentinel-1:26379,sentinel-2:26379,sentinel-3:26379/mymaster/1',
'CACHE_DEFAULT_TIMEOUT': 300,
'CACHE_KEY_PREFIX': 'superset_data_',
}
# ClickHouse as data warehouse
DATABASES = {
'clickhouse': {
'engine': 'clickhouse',
'sqlalchemy_uri': 'clickhouse+http://clickhouse-1:8123/default',
'cache_timeout': 3600,
}
}
SQLALCHEMY_ENGINE_OPTIONS = {
'pool_size': 10,
'max_overflow': 20,
'pool_recycle': 3600,
'pool_pre_ping': True,
}
For teams building enterprise data platforms with Superset at their core, we often handle the full architecture across Platform Development in New York, Platform Development in Washington, D.C., and other regions, ensuring SOC 2 readiness and high-availability design from day one.
Scaling Across Regions and Compliance
As your Superset deployment grows, you may need to scale across multiple regions or ensure compliance with standards like SOC 2 or ISO 27001. At PADISO, we embed Superset and analytics into larger platform engineering efforts. Whether you’re working in Platform Development in Toronto, Platform Development in Ottawa, or internationally across Platform Development in Chicago, Platform Development in Austin, or Platform Development in Dallas, the same memory tuning principles apply—but with additional considerations for data residency, latency, and audit logging.
For government and defence teams, we’ve deployed Superset on sovereign cloud platforms in Platform Development in Canberra and Platform Development in Wellington, ensuring IRAP/PROTECTED and Privacy Act alignment. On the US side, we support Platform Development in Washington, D.C. teams with FedRAMP-aware architecture and ATO support.
For mid-market and enterprise teams modernising their analytics stack, we’ve also deployed Superset on the Gold Coast and across Australia, replacing expensive per-seat BI tools. See Platform Development in Gold Coast for examples.
Async Query Execution in Practice
When you enable async queries, Superset’s behaviour changes significantly. Instead of blocking a Gunicorn worker while a query runs, the query is submitted to a Celery task queue. The user’s browser polls for results, and when the query completes, the result is stored in Redis (or another result backend) for the user to retrieve.
This architecture is essential for large deployments because it decouples query execution from the web tier. You can scale Celery workers independently of Gunicorn workers, and long-running queries no longer starve other users’ requests.
According to Apache Superset’s official documentation on async queries, you can configure per-query timeouts, result storage, and notification strategies. We often pair async queries with smart caching to ensure that repeated queries don’t even hit Celery—they’re served from Redis cache instead.
Advanced Caching Strategies
Beyond basic TTL-based caching, consider more sophisticated strategies. Research on why caching is key for fast BI dashboards shows that intelligent cache invalidation—rather than time-based expiry—can improve both performance and data freshness.
For example, when a user updates a record in your operational database, you can invalidate the relevant Superset cache entries immediately, rather than waiting for the TTL to expire. This requires event-driven cache invalidation, typically via a message queue:
# In your operational application:
from redis import Redis
def update_order(order_id, data):
# Update order in database
order = Order.query.get(order_id)
order.update(data)
db.commit()
# Invalidate Superset cache for affected dashboards
redis_client = Redis(host='redis', port=6379)
redis_client.delete('superset_data_orders_dashboard_*')
redis_client.delete('superset_data_order_detail_*')
This pattern, combined with async queries and multi-layer caching, can reduce memory pressure by 60–80% compared to naive configurations.
Monitoring Memory Pressure Over Time
Memory pressure isn’t a one-time problem; it evolves as your data and user base grow. Implement continuous monitoring to catch degradation early.
For Spotify’s approach to optimising data dashboards for scale, they track not just memory usage but also user-perceived latency (time to first chart render, time to full dashboard load). These metrics correlate with memory pressure: if memory is tight, garbage collection pauses increase, and user-facing latency spikes.
Track these metrics in Prometheus:
from prometheus_client import Histogram
dashboard_load_time = Histogram(
'superset_dashboard_load_seconds',
'Time to load dashboard',
buckets=[0.5, 1.0, 2.0, 5.0, 10.0]
)
with dashboard_load_time.time():
# Load dashboard and render charts
dashboard = get_dashboard(dashboard_id)
render_charts(dashboard)
Plot these metrics alongside memory usage to correlate spikes. If dashboard load time suddenly increases when memory usage crosses 70%, you’ve identified a memory pressure threshold.
Next Steps and Scaling {#next-steps}
Memory tuning is an iterative process. Start with baseline monitoring, apply the configuration changes in this guide, and measure the impact. Most deployments see a 30–50% reduction in peak memory usage after tuning Gunicorn workers, enabling caching, and implementing async queries.
Immediate Actions (Week 1)
- Establish baseline: Measure current memory usage under realistic load using
free,ps, and Prometheus. - Enable Redis caching: Deploy Redis and configure Superset to use it as the cache backend.
- Reduce Gunicorn workers: Lower
--workersfrom the default to 2–4 and monitor CPU utilisation. - Set chart-level cache TTLs: Audit your dashboards and set appropriate cache timeouts per chart.
Medium-Term Actions (Weeks 2–4)
- Enable async queries: Deploy Celery workers and configure async query execution.
- Optimise expensive queries: Use
EXPLAIN ANALYZEto identify slow queries and add indexes. - Implement monitoring: Set up Prometheus scraping, Grafana dashboards, and alerts for memory thresholds.
- Audit dashboard design: Reduce chart count per dashboard, implement pagination for large tables.
Long-Term Actions (Months 2+)
- Consider a data warehouse: If you have 100+ concurrent users, migrate from PostgreSQL to ClickHouse or Snowflake.
- Implement event-driven cache invalidation: Replace time-based TTLs with event-triggered invalidation for improved freshness.
- Establish SLOs: Define service-level objectives for dashboard load time and query latency, and track compliance.
- Plan for multi-region deployment: If users are geographically distributed, consider deploying Superset and caching layers closer to users.
Getting Help
If you’re building analytics into a larger platform modernisation effort, or if you need hands-on support tuning Superset for production, PADISO specialises in exactly this work. We’ve tuned Superset deployments across dozens of customer engagements, from startups embedding analytics into their SaaS products to enterprises consolidating legacy BI tools.
Our Platform Development in Australia team has deep experience with Superset, ClickHouse, and the full modern data stack. We work with founders and operators who need fractional CTO leadership and hands-on co-build support. Whether you’re in Sydney, Melbourne, Canberra, or across Australia, we can help you design, deploy, and tune Superset for production.
If you’re in the US or Canada, we have teams in Platform Development in New York, Platform Development in Washington, D.C., Platform Development in Chicago, Platform Development in Austin, Platform Development in Dallas, Platform Development in Toronto, and Platform Development in Ottawa ready to support your analytics modernisation.
Summary
Memory pressure in Apache Superset is manageable with systematic diagnosis, thoughtful configuration, and operational discipline. The core levers are:
- Right-size Gunicorn workers based on available memory, not CPU count.
- Implement Redis caching with appropriate TTLs per chart and datasource.
- Enable async queries to decouple query execution from the web tier.
- Tune your database backend to reduce query time and memory consumption.
- Optimise dashboard design to reduce concurrent query load.
- Monitor continuously to catch memory degradation early.
Apply these patterns iteratively, measure impact, and scale your infrastructure as your data and user base grow. Most teams see dramatic improvements within weeks, and the operational habits established here will serve you well as Superset becomes a critical part of your data platform.