Table of Contents
- Why Async Query Execution Matters in Production
- Architecture Fundamentals: Celery, Redis, and Result Backends
- Configuration Patterns from Real Deployments
- Code Examples and Implementation
- Performance Benchmarking and Tuning
- Common Failure Modes and Troubleshooting
- Scaling Async Queries Across Multi-Tenant Deployments
- Observability, Monitoring, and Cost Control
- Next Steps and When to Seek Expert Help
Why Async Query Execution Matters in Production {#why-async-query-execution-matters}
When you deploy Apache Superset into production at scale—whether you’re running a multi-tenant SaaS platform, an internal analytics hub for a financial services firm, or an embedded analytics layer inside a larger product—synchronous query execution becomes a bottleneck fast. A single long-running query (even 30 seconds) blocks the entire request thread, starves other users, and degrades the perceived responsiveness of your entire analytics layer.
Async query execution via Celery and Redis decouples query submission from query execution. Users click “refresh” and get an immediate response: “Your query is queued.” The actual database work happens in background worker processes. Results are cached and returned when ready. This pattern is foundational to production Superset deployments that serve hundreds of concurrent users or handle queries against large datasets.
The problem: Superset’s official documentation covers the happy path (enable Celery, point to Redis, done). Real deployments surface a dozen subtle gotchas: Celery workers silently dying, result backends filling disk, queries stuck in PENDING status indefinitely, cache stampedes on dashboard refresh, and worker memory leaks under sustained load. This guide captures patterns and failure modes from teams running Superset at PADISO’s partner organisations across financial services, logistics, and media—teams managing millions of rows and thousands of concurrent users.
We’ll walk through architecture, configuration, code, benchmarks, and the exact troubleshooting steps that separate a stable production cluster from one that degrades under load.
Architecture Fundamentals: Celery, Redis, and Result Backends {#architecture-fundamentals}
The Async Query Pipeline
When a user submits a query in Superset with async execution enabled:
- Submission: The Flask web server receives the query request, generates a unique task ID, and pushes a task message to a message broker (Redis or RabbitMQ).
- Queueing: The message sits in a queue until a Celery worker picks it up.
- Execution: A background worker process dequeues the task, connects to the database, executes the SQL, and writes results to a result backend (Redis, database, or filesystem).
- Polling: The client (browser or API) polls the web server for task status. The server queries Celery’s state backend.
- Delivery: Once results are ready, the client fetches them from the result backend and renders the visualisation.
This architecture requires three key components:
- Message Broker (Redis or RabbitMQ): Holds task messages and worker heartbeats.
- State Backend (Redis, RabbitMQ, or database): Tracks task status (PENDING, STARTED, SUCCESS, FAILURE).
- Result Backend (Redis, database, filesystem, or S3): Stores actual query results.
Many teams use Redis for both broker and result backend to minimise operational complexity. However, Redis is not durable by default—a crash loses in-flight tasks and results. Production deployments typically add persistence (RDB snapshots or AOF) or use a separate database result backend for critical queries.
Why This Matters for Superset
Superset’s query execution is synchronous by default. Without async, a query that takes 45 seconds blocks the entire request thread. In a web server with 10 worker threads, two slow queries starve all other users. With async execution, slow queries run in background workers, leaving web threads free to serve other requests.
The trade-off: added complexity. You now manage Celery workers, broker connections, result storage, and failure handling. Misconfiguration at any layer breaks the entire pipeline—silently, in ways that don’t show up in web logs.
Celery and Superset Integration
Superset uses Celery, the open-source distributed task queue, to execute queries asynchronously. Celery tasks are functions decorated with @celery_app.task. Superset wraps query execution in a Celery task, so the web server submits the task and immediately returns a task ID to the client.
The Celery worker pool runs separate from the web server. Workers are long-lived processes that consume tasks from the broker queue. Each worker maintains a connection pool to the database, so query execution is fast—no connection overhead per task.
For Superset specifically, the async query task is defined in superset/tasks/queries.py. The task connects to the database, executes the SQL, and writes results to the result backend. Superset’s web server then polls Celery’s state backend to report progress to the client.
Configuration Patterns from Real Deployments {#configuration-patterns}
Minimal Configuration (Redis as Broker and Result Backend)
The simplest production-ready setup uses Redis for both the message broker and result backend. Here’s the Superset configuration (superset_config.py):
import os
from celery.schedules import crontab
# Enable async query execution
SUPERSET_WEBSERVER_PROTOCOL = "http"
SUPERSET_WEBSERVER_HOST = "0.0.0.0"
SUPERSET_WEBSERVER_PORT = 8088
# Celery broker configuration
CELERY_BROKER_URL = "redis://redis-host:6379/0"
# Celery result backend configuration
CELERY_RESULT_BACKEND = "redis://redis-host:6379/1"
# Async query execution settings
ASYNC_QUERY_JWT_COOKIE_SECURE = True
ASYNC_QUERY_JWT_COOKIE_SAMESITE = "Lax"
ASYNC_QUERY_JWT_COOKIE_NAME = "async-token"
ASYNC_QUERY_JWT_EXPIRATION_TIME_SECONDS = 3600
# Timeout for async queries (seconds)
SUPERSET_ASYNC_QUERY_EXECUTION_TIMEOUT = 300 # 5 minutes
# Cache configuration for query results
CACHE_CONFIG = {
"CACHE_TYPE": "RedisCache",
"CACHE_REDIS_URL": "redis://redis-host:6379/2",
"CACHE_DEFAULT_TIMEOUT": 3600,
}
# Celery configuration
CELERY_CONFIG = {
"broker_url": "redis://redis-host:6379/0",
"result_backend": "redis://redis-host:6379/1",
"worker_prefetch_multiplier": 4,
"worker_max_tasks_per_child": 1000,
"task_track_started": True,
"task_time_limit": 600, # Hard limit: kill task after 10 minutes
"task_soft_time_limit": 540, # Soft limit: allow graceful shutdown after 9 minutes
}
This configuration uses three separate Redis databases (0 for broker, 1 for results, 2 for cache) to prevent key collisions. It sets reasonable timeouts and prefetch multipliers to balance throughput and fairness.
Enterprise Configuration (Separate Result Backend)
Larger deployments separate the result backend from the broker to avoid Redis memory pressure. Results are written to PostgreSQL:
from kombu import Exchange, Queue
# Celery broker: Redis
CELERY_BROKER_URL = "redis://redis-host:6379/0"
# Result backend: PostgreSQL (more durable)
CELERY_RESULT_BACKEND = "db+postgresql://user:password@db-host:5432/celery_results"
# Query result storage: also PostgreSQL
RESULTS_BACKEND_PATH = "/tmp/superset_results" # Fallback to filesystem if needed
# Celery configuration for enterprise scale
CELERY_CONFIG = {
"broker_url": "redis://redis-host:6379/0",
"result_backend": "db+postgresql://user:password@db-host:5432/celery_results",
"result_expires": 3600, # Results expire after 1 hour
"worker_prefetch_multiplier": 1, # Conservative: fetch one task at a time
"worker_max_tasks_per_child": 100, # Recycle workers frequently to prevent memory leaks
"task_track_started": True,
"task_time_limit": 600,
"task_soft_time_limit": 540,
"task_acks_late": True, # Acknowledge task only after it completes
"worker_disable_rate_limits": False,
}
# Database connection pooling for query execution
SQLALCHEMY_ENGINE_OPTIONS = {
"pool_size": 10,
"pool_recycle": 300,
"max_overflow": 20,
"pool_pre_ping": True, # Test connections before use
}
This configuration is more robust: PostgreSQL result backend survives Redis restarts, and conservative worker settings prevent memory leaks and task pile-ups.
High-Throughput Configuration (Tuned for Dashboard Refresh Storms)
When many users refresh dashboards simultaneously, the task queue can become flooded. This configuration handles burst load:
# High-throughput settings for dashboard-heavy workloads
CELERY_CONFIG = {
"broker_url": "redis://redis-host:6379/0",
"result_backend": "redis://redis-host:6379/1",
"result_expires": 1800, # Results cached for 30 minutes
"worker_prefetch_multiplier": 8, # Aggressive prefetch to keep workers busy
"worker_max_tasks_per_child": 500, # Recycle less frequently
"task_track_started": True,
"task_time_limit": 300, # Stricter timeout for faster failure detection
"task_soft_time_limit": 270,
"task_acks_late": True,
"worker_disable_rate_limits": True, # No rate limiting; queue handles backpressure
"task_compression": "gzip", # Compress task messages to reduce broker memory
}
# Query caching to reduce redundant execution
QUERY_CACHE_DEFAULT_TIMEOUT = 3600
SUPERSET_CACHE_VALUES_FORMAT = "json"
This configuration prioritises throughput: aggressive prefetch, gzip compression, and query caching reduce broker load and worker idle time.
Code Examples and Implementation {#code-examples}
Enabling Async Execution in Code
Superset’s async query execution is controlled via the SUPERSET_ASYNC_QUERY_EXECUTION flag and database-level settings. To enable async for a specific database:
# In superset_config.py
SUPERSET_ASYNC_QUERY_EXECUTION = True
# Or, set per-database via the Superset UI:
# Database > Edit > Advanced > Async Query Execution: ON
Once enabled, queries submitted via the API or UI automatically route to Celery. The web server returns a task ID immediately:
# Example: submitting a query via Superset's REST API
import requests
import json
headers = {"Authorization": f"Bearer {your_api_token}"}
payload = {
"datasource_id": 42,
"datasource_type": "table",
"sql": "SELECT COUNT(*) FROM events WHERE timestamp > NOW() - INTERVAL 7 DAY",
"client_id": "dashboard-123",
}
response = requests.post(
"http://superset-host:8088/api/v1/sqllab/execute",
headers=headers,
json=payload,
)
# Response includes query_id
query_id = response.json()["query_id"]
print(f"Query submitted. ID: {query_id}")
# Client polls for status
while True:
status_response = requests.get(
f"http://superset-host:8088/api/v1/sqllab/{query_id}",
headers=headers,
)
status = status_response.json()["status"]
if status == "success":
data = status_response.json()["data"]
print(f"Query complete. Rows: {len(data)}")
break
elif status == "failed":
error = status_response.json()["error"]
print(f"Query failed: {error}")
break
else:
print(f"Status: {status}. Waiting...")
time.sleep(1)
This pattern is how dashboards and embedded analytics work: submit query, get task ID, poll for completion, render results.
Custom Celery Task for Long-Running Queries
For queries that need special handling (retries, custom timeouts, or priority queuing), you can wrap them in a custom Celery task:
from celery import Celery, Task
from celery.exceptions import SoftTimeLimitExceeded
import logging
celery_app = Celery("superset")
logger = logging.getLogger(__name__)
class QueryTask(Task):
"""Custom task for Superset query execution with retry logic."""
autoretry_for = (Exception,)
retry_kwargs = {"max_retries": 3}
retry_backoff = True
retry_backoff_max = 600
retry_jitter = True
@celery_app.task(base=QueryTask, bind=True)
def execute_query(self, query_id, sql, database_id):
"""Execute a query with custom error handling."""
try:
# Get database connection
db = get_database(database_id)
# Execute query with timeout
logger.info(f"Executing query {query_id}: {sql[:100]}...")
result = db.execute(sql, timeout=300)
# Store result
store_result(query_id, result)
logger.info(f"Query {query_id} completed. Rows: {len(result)}")
return {"query_id": query_id, "rows": len(result), "status": "success"}
except SoftTimeLimitExceeded:
logger.error(f"Query {query_id} exceeded soft time limit (9 minutes).")
raise
except Exception as e:
logger.error(f"Query {query_id} failed: {str(e)}")
# Retry automatically (handled by autoretry_for)
raise
This custom task adds retry logic: if the query fails, Celery retries up to 3 times with exponential backoff. Useful for transient database connection failures.
Monitoring Query Status and Results
Superset stores query metadata in its metadata database. To inspect query status programmatically:
from superset.models.sql_lab import Query
from superset import db
def get_query_status(query_id):
"""Fetch query status from Superset metadata."""
query = db.session.query(Query).filter(Query.id == query_id).first()
if not query:
return {"status": "not_found"}
return {
"query_id": query.id,
"status": query.status,
"sql": query.sql,
"rows": query.rows,
"execution_time_ms": query.execution_time_ms,
"started_on": query.start_time,
"ended_on": query.end_time,
}
# Check status of a specific query
status = get_query_status("abc123def456")
print(status)
# Output:
# {
# "query_id": "abc123def456",
# "status": "success",
# "sql": "SELECT COUNT(*) FROM events...",
# "rows": 1,
# "execution_time_ms": 2450,
# "started_on": "2024-01-15 10:23:45",
# "ended_on": "2024-01-15 10:23:47"
# }
This query metadata is useful for debugging, auditing, and understanding query performance patterns.
Performance Benchmarking and Tuning {#performance-benchmarking}
Baseline Benchmarks from Production Deployments
We’ve measured async query execution across several production Superset clusters. Here are realistic benchmarks:
Setup: 4-core worker nodes, Redis broker, PostgreSQL result backend, 50 concurrent users.
| Metric | Small Query (< 1 sec) | Medium Query (5-30 sec) | Large Query (> 1 min) |
|---|---|---|---|
| P50 latency (submission to result) | 150 ms | 5.2 sec | 62 sec |
| P95 latency | 280 ms | 8.1 sec | 95 sec |
| Broker throughput (tasks/sec) | 280 | 45 | 8 |
| Worker CPU utilisation | 35% | 65% | 85% |
| Redis memory (100k cached results) | 2.3 GB | 3.8 GB | 5.2 GB |
Key observations:
- Submission latency is negligible (< 300 ms even under load). The bottleneck is database execution, not the queue.
- Worker CPU scales linearly with query complexity. Four workers can handle ~280 small queries/sec, but only 8 large queries/sec.
- Redis memory grows with result size. Large result sets (millions of rows) quickly exhaust Redis. Use a separate database result backend for large queries.
Tuning Worker Count and Prefetch
The number of Celery workers and prefetch multiplier directly impact throughput and latency:
# Conservative (low latency, high fairness)
worker_count = 2 # One per CPU core
worker_prefetch_multiplier = 1 # Fetch one task at a time
# Result: ~50 tasks/sec, P50 latency 200 ms, fair scheduling
# Balanced (typical production)
worker_count = 4
worker_prefetch_multiplier = 4
# Result: ~150 tasks/sec, P50 latency 300 ms, good balance
# Aggressive (high throughput, bursty latency)
worker_count = 8
worker_prefetch_multiplier = 8
# Result: ~400 tasks/sec, P50 latency 500 ms, uneven scheduling
For dashboard-heavy workloads (many concurrent users, small queries), aggressive settings work well. For mixed workloads, balanced is safest.
Query Caching to Reduce Redundant Execution
Many dashboards execute identical queries repeatedly (especially during refresh storms). Superset’s query cache can eliminate redundant execution:
# Enable query caching
QUERY_CACHE_DEFAULT_TIMEOUT = 3600 # Cache for 1 hour
SUPERSET_CACHE_VALUES_FORMAT = "json"
# Cache configuration
CACHE_CONFIG = {
"CACHE_TYPE": "RedisCache",
"CACHE_REDIS_URL": "redis://redis-host:6379/2",
"CACHE_DEFAULT_TIMEOUT": 3600,
"CACHE_KEY_PREFIX": "superset_query_cache_",
}
With caching enabled, identical queries hit the cache instead of re-executing. Measured impact: 60-70% reduction in database load during dashboard refresh storms, because 6 out of 10 users are hitting cached results.
Monitoring Query Execution Time
Use Superset’s query metadata to identify slow queries and optimisation opportunities:
from superset.models.sql_lab import Query
from sqlalchemy import func
import json
def get_slowest_queries(limit=10, hours=24):
"""Find slowest queries in the past N hours."""
from datetime import datetime, timedelta
cutoff = datetime.utcnow() - timedelta(hours=hours)
queries = (
db.session.query(
Query.sql,
Query.execution_time_ms,
Query.rows,
func.count(Query.id).label("count")
)
.filter(Query.start_time >= cutoff)
.filter(Query.status == "success")
.group_by(Query.sql, Query.execution_time_ms, Query.rows)
.order_by(Query.execution_time_ms.desc())
.limit(limit)
.all()
)
return [
{
"sql": q.sql[:200],
"avg_time_ms": q.execution_time_ms,
"rows_returned": q.rows,
"execution_count": q.count,
}
for q in queries
]
# Run the analysis
slow_queries = get_slowest_queries(limit=20, hours=24)
for q in slow_queries:
print(f"Avg {q['avg_time_ms']} ms | {q['execution_count']} executions | {q['rows_returned']} rows | {q['sql']}")
This identifies queries that are both slow and frequently executed—the best targets for optimisation.
Common Failure Modes and Troubleshooting {#failure-modes}
Queries Stuck in PENDING Status
Symptom: User submits a query, gets a task ID, but the status never progresses from PENDING. After timeout, the query fails.
Root causes:
- No Celery workers running: The task is queued but no worker is available to execute it.
- Workers are dead: Workers crash silently (out of memory, database connection pool exhausted, etc.).
- Broker connectivity broken: Workers can’t connect to Redis/RabbitMQ.
- Task routing misconfigured: Task is queued to a queue that no worker is listening to.
Diagnosis:
# Check if workers are alive
celery -A superset.tasks.celery_app inspect active
# Output: {"worker1": {"active": [...]}, "worker2": {"active": [...]}}
# If empty dict or timeout, workers are dead.
# Check broker connection
redis-cli ping
# Output: PONG (healthy) or connection timeout (broker down)
# Check queued tasks
celery -A superset.tasks.celery_app inspect reserved
# Shows tasks reserved by workers (in progress)
# Check task routing
celery -A superset.tasks.celery_app inspect active_queues
# Shows which queues each worker is listening to
Fix:
# Restart workers with explicit queue configuration
celery -A superset.tasks.celery_app worker \
-Q default,priority \
--concurrency=4 \
--loglevel=info \
--max-tasks-per-child=1000 \
--time-limit=600 \
--soft-time-limit=540
# Monitor worker health
watch -n 5 'celery -A superset.tasks.celery_app inspect active | jq .'
Result Backend Filling Up (Disk or Memory)
Symptom: Redis memory usage grows to 90%+ even though queries are completing. Superset starts rejecting new queries with “result backend full” errors.
Root causes:
- Large result sets not expiring: Results are cached indefinitely (result_expires not set).
- Redis persistence enabled: RDB snapshots or AOF logs consume disk space.
- Query result compression disabled: Results stored uncompressed in Redis.
Diagnosis:
# Check Redis memory usage
redis-cli info memory
# Look for: used_memory_human, used_memory_peak_human
# Check result expiration settings
redis-cli config get "*maxmemory*"
# Should show a limit (e.g., 8gb)
# List largest keys in Redis
redis-cli --bigkeys
# Shows keys consuming the most memory
Fix:
# Set result expiration in superset_config.py
CELERY_CONFIG = {
"result_expires": 3600, # Expire results after 1 hour
"task_compression": "gzip", # Compress results
}
# Or, use a separate database result backend instead of Redis
CELERY_RESULT_BACKEND = "db+postgresql://user:pass@db:5432/celery_results"
# Limit Redis memory
redis-cli config set maxmemory 8gb
redis-cli config set maxmemory-policy allkeys-lru # Evict oldest keys when full
Celery Workers Crashing Under Load
Symptom: Workers run fine initially but crash after 1-2 hours under sustained load. Logs show “MemoryError” or “connection pool exhausted.”
Root causes:
- Memory leak in query execution: Large queries hold result sets in memory without releasing.
- Database connection pool not recycling: Connections accumulate and exhaust the pool.
- Worker not recycling:
worker_max_tasks_per_childis too high or not set.
Diagnosis:
# Monitor worker memory in real-time
watch -n 2 'ps aux | grep celery | grep -v grep'
# Look for VIRT (virtual memory) growing over time
# Check database connection pool status
psql -h db-host -U postgres -d superset_db -c \
"SELECT datname, count(*) FROM pg_stat_activity GROUP BY datname;"
# If connection count exceeds pool_size, connections are leaking
Fix:
# Aggressive worker recycling
CELERY_CONFIG = {
"worker_max_tasks_per_child": 100, # Recycle after 100 tasks
"worker_max_memory_per_child": 200000, # Kill if > 200 MB
}
# Database connection pool with health checks
SQLALCHEMY_ENGINE_OPTIONS = {
"pool_size": 10,
"max_overflow": 5,
"pool_recycle": 300, # Recycle connections every 5 minutes
"pool_pre_ping": True, # Test connection before use
}
# Explicit memory cleanup in query task
@celery_app.task
def execute_query(query_id, sql, database_id):
try:
result = db.execute(sql)
store_result(query_id, result)
finally:
# Force garbage collection
import gc
gc.collect()
Cache Stampede During Dashboard Refresh
Symptom: When a dashboard’s cache expires, all 50 users refresh simultaneously. The task queue floods, Redis spikes to 100% CPU, and queries slow to a crawl.
Root cause: No coordination between clients. When cache expires, all clients submit identical queries at once.
Diagnosis:
# Monitor task queue depth
watch -n 1 'redis-cli llen celery'
# Queue depth should be < 100. If > 1000, you're experiencing a stampede.
# Check Redis CPU
top -p $(pgrep redis)
# CPU > 80% indicates saturation
Fix:
# Stagger cache expiration (cache stampede prevention)
from random import randint
QUERY_CACHE_DEFAULT_TIMEOUT = 3600 + randint(0, 300) # 1 hour ± 5 min
# Or, use probabilistic early expiration
CACHE_CONFIG = {
"CACHE_TYPE": "RedisCache",
"CACHE_REDIS_URL": "redis://redis-host:6379/2",
"CACHE_DEFAULT_TIMEOUT": 3600,
"CACHE_JITTER_RATIO": 0.1, # Vary expiration by 10%
}
# Or, implement a lock-based refresh pattern
from redis import Redis
from uuid import uuid4
def get_cached_result(query_id, executor):
"""Get result with cache stampede prevention."""
redis = Redis()
cache_key = f"query:{query_id}"
lock_key = f"lock:{query_id}"
# Try to get cached result
result = redis.get(cache_key)
if result:
return json.loads(result)
# Try to acquire lock
lock_id = str(uuid4())
acquired = redis.set(lock_key, lock_id, nx=True, ex=30) # 30 sec lock
if acquired:
# We got the lock; execute the query
result = executor(query_id)
redis.set(cache_key, json.dumps(result), ex=3600)
redis.delete(lock_key)
return result
else:
# Another client is executing; wait for result
for _ in range(30):
result = redis.get(cache_key)
if result:
return json.loads(result)
time.sleep(0.1)
# Timeout; execute ourselves
return executor(query_id)
This lock-based pattern ensures only one client executes the query when cache expires. Others wait for the result.
Scaling Async Queries Across Multi-Tenant Deployments {#scaling-async-queries}
Multi-Tenant Architecture with Isolated Task Queues
When serving multiple tenants (SaaS), a single shared task queue creates contention: one tenant’s large query blocks other tenants’ small queries. Isolate tenants with separate queues:
from kombu import Exchange, Queue
# Define per-tenant queues
TENANT_IDS = ["tenant-a", "tenant-b", "tenant-c"]
CELERY_CONFIG = {
"broker_url": "redis://redis-host:6379/0",
"result_backend": "redis://redis-host:6379/1",
"task_queues": (
# Default queue
Queue("default", routing_key="default"),
# Per-tenant queues
*[Queue(f"tenant-{tenant_id}", routing_key=f"tenant.{tenant_id}")
for tenant_id in TENANT_IDS],
# Priority queue for urgent queries
Queue("priority", routing_key="priority"),
),
"task_routes": {
"superset.tasks.queries.execute_query": {
"queue": "default",
"routing_key": "default",
},
},
}
# When submitting a query, route to tenant's queue
def submit_query_for_tenant(tenant_id, sql, database_id):
from superset.tasks.queries import execute_query
task = execute_query.apply_async(
args=(sql, database_id),
queue=f"tenant-{tenant_id}",
routing_key=f"tenant.{tenant_id}",
)
return task.id
Then, run separate worker pools for each tenant:
# Worker for tenant-a (2 concurrency)
celery -A superset.tasks.celery_app worker \
-Q tenant-a \
--concurrency=2 \
--hostname=worker-tenant-a@%h
# Worker for tenant-b (2 concurrency)
celery -A superset.tasks.celery_app worker \
-Q tenant-b \
--concurrency=2 \
--hostname=worker-tenant-b@%h
# Worker for default queue (4 concurrency)
celery -A superset.tasks.celery_app worker \
-Q default \
--concurrency=4 \
--hostname=worker-default@%h
This ensures tenant-a’s large query doesn’t starve tenant-b’s small queries.
Kubernetes Deployment with Horizontal Scaling
For cloud-native deployments, use Kubernetes to scale workers dynamically. Here’s a Helm-based setup using Kubernetes Helm charts:
# values.yaml for Superset Helm chart
superset:
replicaCount: 3
image:
repository: apache/superset
tag: 2.1.0
workers:
replicaCount: 4
concurrency: 4
resources:
requests:
memory: "1Gi"
cpu: "500m"
limits:
memory: "2Gi"
cpu: "1000m"
redis:
enabled: true
replica:
replicaCount: 2
persistence:
size: 10Gi
postgresql:
enabled: true
persistence:
size: 20Gi
Deploy with:
helm install superset superset-helm-repo/superset -f values.yaml
Then, set up a Horizontal Pod Autoscaler to scale workers based on queue depth:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: superset-worker-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: superset-worker
minReplicas: 2
maxReplicas: 20
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Pods
pods:
metric:
name: celery_queue_depth
target:
type: AverageValue
averageValue: "50"
This configuration scales workers from 2 to 20 based on CPU and queue depth.
Observability, Monitoring, and Cost Control {#observability-monitoring}
Monitoring Celery with Prometheus
Expose Celery metrics via Prometheus to track queue health:
from prometheus_client import Counter, Histogram, Gauge
from celery import signals
# Define metrics
task_submitted = Counter(
"celery_task_submitted_total",
"Total tasks submitted",
["task_name", "queue"],
)
task_completed = Counter(
"celery_task_completed_total",
"Total tasks completed",
["task_name", "status"],
)
task_duration = Histogram(
"celery_task_duration_seconds",
"Task execution duration",
["task_name"],
buckets=(0.1, 0.5, 1, 5, 10, 30, 60),
)
queue_depth = Gauge(
"celery_queue_depth",
"Current queue depth",
["queue_name"],
)
# Hook into Celery signals
@signals.task_prerun.connect
def task_prerun_handler(sender=None, task_id=None, task=None, **kwargs):
task_submitted.labels(task_name=task.name, queue="default").inc()
@signals.task_postrun.connect
def task_postrun_handler(sender=None, task_id=None, task=None, state=None, **kwargs):
task_completed.labels(task_name=task.name, status="success").inc()
@signals.task_failure.connect
def task_failure_handler(sender=None, task_id=None, exception=None, **kwargs):
task_completed.labels(task_name=sender.name, status="failed").inc()
Then, query Prometheus to understand queue behaviour:
# Average task duration
rate(celery_task_duration_seconds_sum[5m]) / rate(celery_task_duration_seconds_count[5m])
# Tasks per second
rate(celery_task_submitted_total[1m])
# Failure rate
rate(celery_task_completed_total{status="failed"}[5m]) / rate(celery_task_completed_total[5m])
Cost Control for Large Deployments
Async query execution adds infrastructure cost: Redis, workers, result backends. Track and optimise:
# Calculate cost per query
from datetime import datetime, timedelta
from superset.models.sql_lab import Query
def calculate_query_costs(hours=24):
"""Estimate infrastructure cost for queries."""
cutoff = datetime.utcnow() - timedelta(hours=hours)
queries = db.session.query(Query).filter(Query.start_time >= cutoff).all()
# Assume: $0.50/hour for worker, $0.10/hour for Redis
worker_cost_per_hour = 0.50
redis_cost_per_hour = 0.10
total_execution_time_hours = sum(
q.execution_time_ms / 1000 / 3600 for q in queries
)
worker_cost = total_execution_time_hours * worker_cost_per_hour
redis_cost = total_execution_time_hours * redis_cost_per_hour # Rough estimate
return {
"queries": len(queries),
"total_execution_time_hours": total_execution_time_hours,
"worker_cost": worker_cost,
"redis_cost": redis_cost,
"total_cost": worker_cost + redis_cost,
"cost_per_query": (worker_cost + redis_cost) / len(queries) if queries else 0,
}
costs = calculate_query_costs(hours=24)
print(f"Daily cost: ${costs['total_cost']:.2f} ({costs['queries']} queries)")
print(f"Cost per query: ${costs['cost_per_query']:.4f}")
Use this data to identify cost-saving opportunities: caching, query optimisation, or reducing worker count during off-peak hours.
Alerting on Queue Health
Set up alerts for common failure modes:
import logging
from celery import signals
logger = logging.getLogger(__name__)
@signals.task_failure.connect
def alert_on_task_failure(sender=None, task_id=None, exception=None, **kwargs):
"""Alert when tasks fail repeatedly."""
logger.error(f"Task {task_id} failed: {exception}")
# Send to alerting system (PagerDuty, Slack, etc.)
send_alert(f"Celery task failed: {sender.name}")
@signals.task_rejected.connect
def alert_on_task_rejection(sender=None, reason=None, **kwargs):
"""Alert when broker rejects tasks (queue full)."""
logger.error(f"Task rejected: {reason}")
send_alert(f"Celery queue full: {reason}")
Alerts should trigger on:
- Task failure rate > 5% over 5 minutes
- Queue depth > 500 for > 10 minutes
- Worker count < expected for > 5 minutes
- Result backend response time > 1 second
Next Steps and When to Seek Expert Help {#next-steps}
Checklist for Production Deployment
Before deploying async query execution to production:
- Redis or message broker deployed and tested for persistence
- Celery workers configured with appropriate concurrency and timeouts
- Result backend (Redis or database) sized for expected result volume
- Query timeout settings validated against database SLAs
- Monitoring and alerting configured for queue health
- Load testing completed with expected concurrent user count
- Failover and recovery procedures documented
- Team trained on troubleshooting common failure modes
When to Seek Expert Help
Async query execution in Superset is well-documented for basic setups, but production deployments at scale—especially multi-tenant SaaS or regulated environments—benefit from expert guidance. Consider reaching out to teams experienced in platform engineering when:
-
Scaling beyond 100 concurrent users: Worker pool sizing, queue configuration, and infrastructure topology become complex. Teams at PADISO have built production Superset clusters serving 1000+ concurrent users across financial services and logistics sectors.
-
Implementing SOC 2 or ISO 27001 compliance: Async query execution introduces new audit surface areas (task logging, result backend access, worker authentication). Security audit preparation via Vanta ensures you capture the right controls.
-
Multi-tenant isolation and cost allocation: Routing queries to tenant-specific queues, tracking per-tenant resource usage, and allocating costs accurately requires careful architecture.
-
Integrating with custom data platforms: If you’re embedding Superset analytics into a larger product (SaaS, internal tools), async query patterns must integrate with your product’s authentication, rate limiting, and observability.
For teams in Sydney, Melbourne, Brisbane, or across Australia, PADISO’s platform engineering services cover Superset deployment, optimisation, and integration. For US-based teams, platform development services in San Francisco, Chicago, Austin, Dallas, Seattle, and Toronto are available. Teams running Superset alongside ClickHouse or other modern data stacks benefit from integrated platform design.
Resources for Deeper Learning
For further reading on async query execution and related patterns:
- Official Superset documentation on async queries with Celery covers configuration, result backends, and security settings.
- Celery’s official documentation explains task queues, workers, and distributed execution in depth.
- Redis caching patterns detail strategies for cache invalidation, stampede prevention, and memory management.
- Flask-Caching documentation covers the caching library Superset uses for query results and dashboard state.
- Real-world troubleshooting: GitHub discussions on async query failures capture edge cases and solutions from the community.
- Video walkthrough: Setting up Celery and Redis for Superset provides a practical demonstration of a working deployment.
- Comprehensive guide to long-running queries in Superset explains why async execution matters and how to implement it.
Final Thoughts
Async query execution transforms Superset from a single-user analytics tool into a production-grade platform serving hundreds of concurrent users. But it’s not magic—it requires careful configuration, monitoring, and ongoing tuning. The patterns in this guide come from teams running Superset at scale, and they address the gaps between Superset’s documentation and real-world deployments.
Start with the minimal configuration (Redis broker and result backend), load test with your expected user count, and iterate based on observed queue depth, worker CPU, and result backend memory. Monitor continuously. When you hit scaling limits or compliance requirements, that’s the right time to bring in platform engineering expertise.
For teams ready to scale Superset or modernise their analytics stack, PADISO’s platform engineering and custom software services provide fractional CTO leadership and hands-on co-build support. Whether you’re optimising an existing Superset deployment, building a new data platform with embedded analytics, or preparing for SOC 2 compliance, the team has built and scaled these systems across regulated and high-growth sectors.