
Apache Superset on Kubernetes: The D23.io Reference Architecture

Production-grade Apache Superset on Kubernetes with Helm charts, autoscaling workers, Redis caching, and Celery beat. Complete reference architecture guide.

Padiso Team · 2026-04-17


Table of Contents

  1. Why Kubernetes for Apache Superset?
  2. The D23.io Reference Architecture Overview
  3. Prerequisites and Core Dependencies
  4. Helm Chart Deployment Strategy
  5. Autoscaling Workers Configuration
  6. Redis Caching Layer Implementation
  7. Celery Beat for Scheduled Reports
  8. Production Readiness and Testing
  9. Security and Compliance Considerations
  10. Monitoring and Observability
  11. Troubleshooting and Optimisation
  12. Next Steps and Scaling Strategy

Why Kubernetes for Apache Superset?

Apache Superset is a powerful, open-source data visualisation and business intelligence platform that has become the de facto standard for organisations seeking flexible, self-serve analytics. However, deploying Superset at scale—especially across mid-market and enterprise environments—requires more than a basic containerised setup. This is where Kubernetes becomes essential.

Kubernetes provides the orchestration, resilience, and elasticity that production Superset deployments demand. Unlike traditional VM-based or single-container deployments, Kubernetes manages containerised workloads and services across a cluster of machines, enabling you to scale workers independently, maintain high availability, and recover gracefully from failures.

For organisations running Superset at mid-market scale—handling hundreds of concurrent users, thousands of dashboards, and complex ETL workflows—Kubernetes eliminates the operational burden of manual scaling and health management. The D23.io reference architecture builds on this foundation by adding production-grade patterns: Helm-based deployment, Redis caching for query performance, Celery workers for asynchronous task processing, and Celery Beat for scheduled report generation.

This approach is particularly valuable for teams that already have Kubernetes infrastructure in place or are modernising their data stack. If you’re exploring how to architect your data platform with resilience and cost efficiency in mind, this guide walks you through the proven patterns that have shipped successfully at scale.

The D23.io Reference Architecture Overview

The D23.io reference architecture represents a battle-tested, production-grade deployment model for Apache Superset on Kubernetes. It combines several key components into a cohesive system designed to handle real-world analytics workloads without operational friction.

Architecture Components

At its core, the architecture consists of:

  • Superset Web Pods: Stateless Flask application instances handling user requests, dashboard rendering, and query execution.
  • Celery Worker Pods: Background task processors that handle long-running operations like chart generation, data exports, and asynchronous query execution.
  • Celery Beat Pod: A scheduler that triggers periodic tasks, such as refreshing cached data and generating scheduled reports.
  • Redis Cache: An in-memory data store providing both task queue management for Celery and query result caching for Superset.
  • PostgreSQL Database: The persistent metadata store for dashboards, users, datasets, and configuration.
  • Persistent Volumes: Kubernetes storage for database data and uploaded files.

This separation of concerns—web serving, background processing, and caching—ensures that no single component becomes a bottleneck. When query volume spikes, you scale Celery workers independently. When dashboard rendering lags, you increase web pod replicas. When caching needs grow, Redis scales horizontally.

Why This Architecture Matters

Mid-market organisations typically face a critical problem: their BI platform becomes a victim of its own success. As more teams discover self-serve analytics, query volume increases exponentially. Without proper architecture, Superset slows to a crawl, and teams revert to manual reporting or expensive third-party tools.

The D23.io model solves this by design. Stateless web pods mean you can scale horizontally without session affinity issues. Celery workers mean long-running queries don’t block user-facing requests. Redis caching means repeated queries execute in milliseconds instead of minutes. Celery Beat ensures scheduled reports run reliably without cron job management overhead.

For teams at PADISO—who work with startups and enterprises across Sydney on data modernisation projects—this architecture has become the standard recommendation when Superset is part of the analytics stack. It bridges the gap between development simplicity and production reliability.

Prerequisites and Core Dependencies

Before deploying Superset on Kubernetes using this reference architecture, you need several foundational components in place.

Kubernetes Cluster Requirements

You’ll need an operational Kubernetes cluster, version 1.20 or later. This can be:

  • Amazon EKS (Elastic Kubernetes Service)
  • Azure AKS (Azure Kubernetes Service)
  • Google GKE (Google Kubernetes Engine)
  • Self-managed Kubernetes (on-premises or cloud VMs)

For mid-market workloads, we recommend a minimum of 3 worker nodes with 4 CPU cores and 8GB RAM per node. This provides enough capacity for Superset web pods, Celery workers, and supporting services whilst maintaining redundancy.
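As a sanity check on this sizing, you can total the per-pod resource requests used later in this guide against the cluster's raw capacity. This is a rough sketch only: the Redis request figures are assumptions, and it ignores system pods and kube-reserved overhead.

```python
# Rough capacity check: do the requested pods fit on the recommended
# 3-node cluster (4 CPU / 8 GiB per node)? Figures are illustrative.
NODES = 3
CPU_PER_NODE = 4   # cores per node
MEM_PER_NODE = 8   # GiB per node

# Per-pod requests (cpu cores, memory GiB) and replica counts, matching
# the reference values used later in this guide. Redis requests are assumed.
workloads = {
    "superset-web":  {"replicas": 3, "cpu": 1.0, "mem": 1.0},
    "celery-worker": {"replicas": 4, "cpu": 1.0, "mem": 1.0},
    "celery-beat":   {"replicas": 1, "cpu": 0.5, "mem": 0.5},
    "redis":         {"replicas": 3, "cpu": 0.5, "mem": 1.0},  # 1 master + 2 replicas
}

total_cpu = sum(w["replicas"] * w["cpu"] for w in workloads.values())
total_mem = sum(w["replicas"] * w["mem"] for w in workloads.values())

print(f"requested: {total_cpu} cores / {total_mem} GiB")
print(f"capacity:  {NODES * CPU_PER_NODE} cores / {NODES * MEM_PER_NODE} GiB")
assert total_cpu <= NODES * CPU_PER_NODE, "add nodes or reduce requests"
assert total_mem <= NODES * MEM_PER_NODE, "add nodes or reduce requests"
```

With these figures the steady-state requests (9 cores, 10.5 GiB) fit comfortably, leaving headroom for autoscaling bursts and node failure.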

Helm Package Manager

Helm is the Kubernetes package manager, and it’s essential for managing the Superset deployment. Helm allows you to define, install, and upgrade Kubernetes applications using templated manifests called charts. The official Apache Superset Helm chart is maintained in the project’s GitHub repository, providing a community-supported deployment method that abstracts away much of the YAML complexity.

Install Helm on your local machine or CI/CD pipeline:

curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

Verify the installation:

helm version

External Dependencies

The architecture requires three external services:

  1. PostgreSQL Database: Superset stores all metadata (dashboards, users, datasets, permissions) in PostgreSQL. For production, use a managed service like AWS RDS, Azure Database for PostgreSQL, or Google Cloud SQL. Minimum version 12, with automated backups and high availability enabled.

  2. Redis Cache: Used for Celery task queuing and Superset query caching. Deploy Redis 6.0+ with persistence enabled. For production, use managed Redis services like AWS ElastiCache, Azure Cache for Redis, or Google Cloud Memorystore.

  3. Object Storage: Optional but recommended for storing chart exports, uploaded datasets, and backup files. Use S3-compatible storage (AWS S3, MinIO, or equivalent).

Kubectl and Cluster Access

Ensure you have kubectl installed and configured to access your Kubernetes cluster. Verify access:

kubectl cluster-info

You should see your cluster’s API server and DNS information.

Helm Chart Deployment Strategy

The Helm chart deployment is the foundation of this reference architecture. Rather than manually writing Kubernetes manifests, Helm abstracts complexity and provides sensible defaults whilst allowing customisation.

Adding the Superset Helm Repository

The Apache Superset Helm chart can be added from multiple sources. The chart maintained in the project's own repository is the recommended option:

helm repo add superset https://apache.github.io/superset
helm repo update

Alternatively, Bitnami maintains a packaged Helm chart available on Artifact Hub, which offers additional features and support. Choose based on your organisation’s preference for community-maintained versus commercially-supported options.

Creating a Custom Values File

The Helm chart is configured via a values.yaml file. This file specifies resource allocation, replica counts, database connections, and component-specific settings. For the D23.io reference architecture, create a custom values file that enables all necessary components:

image:
  repository: apache/superset
  tag: "latest-dev"  # pin a specific release tag for production
  pullPolicy: IfNotPresent

replicaCount: 3

resources:
  limits:
    cpu: 2
    memory: 2Gi
  requests:
    cpu: 1
    memory: 1Gi

postgresql:
  enabled: false  # Use external PostgreSQL
  
externalPostgresql:
  host: "postgres.example.com"
  port: 5432
  database: "superset"
  username: "superset"
  password: "${DB_PASSWORD}"  # Use secrets

redis:
  enabled: true
  replica:
    replicaCount: 2
  master:
    persistence:
      enabled: true
      size: 10Gi

celery:
  enabled: true
  worker:
    replicas: 4
    resources:
      limits:
        cpu: 2
        memory: 2Gi
      requests:
        cpu: 1
        memory: 1Gi
  beat:
    enabled: true
    replicas: 1
    resources:
      limits:
        cpu: 1
        memory: 1Gi
      requests:
        cpu: 500m
        memory: 512Mi

ingress:
  enabled: true
  ingressClassName: nginx
  hosts:
    - host: superset.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: superset-tls
      hosts:
        - superset.example.com

This configuration:

  • Disables the bundled PostgreSQL (you’ll use an external managed database)
  • Sets 3 replicas for the web service for high availability
  • Allocates 1-2 CPU cores and 1-2GB RAM per pod
  • Enables Redis with 2 replicas for caching and task queuing
  • Configures 4 Celery worker replicas for background task processing
  • Enables Celery Beat for scheduled task execution
  • Configures ingress for external access

Deploying with Helm

Create a Kubernetes namespace for Superset:

kubectl create namespace superset

Store sensitive values (database passwords, secret keys) in Kubernetes secrets:

kubectl create secret generic superset-db-credentials \
  --from-literal=password='your-secure-password' \
  -n superset

Deploy the Helm chart:

helm install superset superset/superset \
  -f values.yaml \
  -n superset

Monitor the deployment:

kubectl get pods -n superset
kubectl logs -n superset -l app=superset --tail=50

Wait for all pods to reach the Running state before proceeding. This typically takes 2-3 minutes.

Autoscaling Workers Configuration

One of the key advantages of Kubernetes is the ability to scale components independently based on demand. The D23.io architecture implements autoscaling for both web pods and Celery workers, ensuring your Superset deployment grows with query volume.

Horizontal Pod Autoscaler (HPA) for Web Pods

The Horizontal Pod Autoscaler automatically increases or decreases the number of web pod replicas based on CPU or memory usage. Add this configuration to your values file:

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80

This means:

  • Minimum of 3 pods always running (for availability)
  • Maximum of 10 pods (cost control)
  • Scale up when CPU exceeds 70%
  • Scale up when memory exceeds 80%
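The HPA's scaling decision follows a simple rule: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), clamped to the min/max bounds. A quick sketch of that arithmetic:

```python
import math

def hpa_desired(current_replicas, current_util, target_util, min_r=3, max_r=10):
    """Core HPA scaling rule: ceil(current * metric / target),
    clamped to minReplicas/maxReplicas."""
    desired = math.ceil(current_replicas * current_util / target_util)
    return max(min_r, min(max_r, desired))

# 3 pods averaging 95% CPU against a 70% target -> scale out
print(hpa_desired(3, 95, 70))   # 5
# 6 pods idling at 20% -> scale in, but never below minReplicas
print(hpa_desired(6, 20, 70))   # 3
```

The real controller also applies a tolerance band and stabilisation windows, so small metric fluctuations don't cause replica counts to flap.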

Deploy the HPA:

kubectl apply -f - <<EOF
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: superset-hpa
  namespace: superset
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: superset
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
EOF

Monitor HPA activity:

kubectl get hpa -n superset --watch

Celery Worker Autoscaling

Celery workers handle background tasks. Autoscaling workers based on queue depth ensures long-running operations don’t block user-facing requests. Install the KEDA (Kubernetes Event-driven Autoscaling) operator to enable queue-based scaling:

helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda --namespace keda --create-namespace

Create a ScaledObject to scale Celery workers based on Redis queue length:

kubectl apply -f - <<EOF
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: superset-celery-scaler
  namespace: superset
spec:
  scaleTargetRef:
    name: superset-celery-worker
  minReplicaCount: 2
  maxReplicaCount: 20
  triggers:
  - type: redis
    metadata:
      address: superset-redis:6379
      listName: celery
      listLength: "30"
      databaseIndex: "0"
EOF

This configuration:

  • Maintains a minimum of 2 worker pods
  • Scales up to 20 pods when needed
  • Adds a new worker when the Redis queue exceeds 30 pending tasks
  • Scales down when queue depth decreases
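KEDA's Redis list trigger sizes the deployment in much the same way as the HPA: roughly one replica per listLength pending tasks, clamped to the configured bounds. A sketch of the arithmetic:

```python
import math

def keda_replicas(queue_len, list_length=30, min_r=2, max_r=20):
    """Approximate KEDA sizing for a Redis list trigger: one replica per
    `listLength` pending tasks, clamped to min/maxReplicaCount."""
    desired = math.ceil(queue_len / list_length)
    return max(min_r, min(max_r, desired))

print(keda_replicas(10))    # 2  (below one trigger's worth of work)
print(keda_replicas(95))    # 4  (ceil(95/30) = 4)
print(keda_replicas(900))   # 20 (capped at maxReplicaCount)
```

Here minReplicaCount keeps two workers warm at all times, so short bursts are absorbed without waiting for a scale-up.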

Monitor worker scaling:

kubectl get scaledobjects -n superset
kubectl describe scaledobject superset-celery-scaler -n superset

Vertical Pod Autoscaler (VPA) for Right-Sizing

Whilst HPA scales the number of pods, the Vertical Pod Autoscaler (VPA) optimises resource requests and limits based on actual usage. Install VPA:

git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

Create a VPA policy for Superset:

kubectl apply -f - <<EOF
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: superset-vpa
  namespace: superset
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: superset
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: superset
      minAllowed:
        cpu: 500m
        memory: 512Mi
      maxAllowed:
        cpu: 4
        memory: 4Gi
EOF

VPA monitors actual resource consumption and recommends adjustments, helping you avoid both under-provisioning (performance issues) and over-provisioning (wasted cost).
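The effect of the containerPolicies bounds is simple clamping: VPA's usage-based recommendation is raised to minAllowed or capped at maxAllowed before it is applied. Illustratively (values in millicores, matching the policy above):

```python
def clamp_recommendation(recommended, min_allowed, max_allowed):
    """Sketch of VPA's clamping step: the usage-based recommendation
    is constrained by the containerPolicies bounds before being applied."""
    return max(min_allowed, min(max_allowed, recommended))

# CPU in millicores, per the policy above: minAllowed 500m, maxAllowed 4000m
print(clamp_recommendation(250, 500, 4000))    # 500  (raised to minAllowed)
print(clamp_recommendation(1800, 500, 4000))   # 1800 (within bounds)
print(clamp_recommendation(6000, 500, 4000))   # 4000 (capped at maxAllowed)
```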

Redis Caching Layer Implementation

Redis is critical to Superset performance at scale. It serves two purposes: as the Celery task broker (managing background job queues) and as Superset’s query result cache (storing query outputs for rapid retrieval).

Redis Architecture for High Availability

For production deployments, Redis must be highly available. The recommended topology is Redis Sentinel or Redis Cluster. Using the Helm chart, enable Redis with replication:

redis:
  enabled: true
  architecture: replication
  auth:
    enabled: true
    password: "${REDIS_PASSWORD}"
  master:
    persistence:
      enabled: true
      size: 20Gi
      storageClass: "fast-ssd"  # Use fast storage for Redis
  replica:
    replicaCount: 2
    persistence:
      enabled: true
      size: 20Gi
  sentinel:
    enabled: true
    quorum: 2
    downAfterMilliseconds: 5000
    failoverTimeout: 10000

This configuration:

  • Enables Redis replication with 1 master and 2 replicas
  • Activates Redis Sentinel for automatic failover
  • Persists data to fast SSD storage
  • Requires quorum of 2 sentinels to declare master failure
  • Fails over within 10 seconds of master unavailability

Configuring Superset to Use Redis

Superset needs to know how to connect to Redis. Add these environment variables to your Superset deployment:

env:
  - name: REDIS_HOST
    value: "superset-redis-master"
  - name: REDIS_PORT
    value: "6379"
  - name: REDIS_PASSWORD
    valueFrom:
      secretKeyRef:
        name: superset-redis-password
        key: password
  - name: REDIS_DB
    value: "0"
  - name: CACHE_REDIS_URL
    value: "redis://:$(REDIS_PASSWORD)@$(REDIS_HOST):$(REDIS_PORT)/1"
  - name: CELERY_BROKER_URL
    value: "redis://:$(REDIS_PASSWORD)@$(REDIS_HOST):$(REDIS_PORT)/2"

Superset uses separate Redis databases for caching (DB 1) and Celery task queuing (DB 2), preventing interference between workloads.
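Kubernetes performs the $(VAR) substitution above at pod startup; the same URLs can equally be composed inside superset_config.py. A minimal sketch (the placeholder password and the defaults are illustrative only):

```python
import os

# Defaults mirror the env block above; the password is a placeholder,
# not a real credential.
os.environ.setdefault("REDIS_HOST", "superset-redis-master")
os.environ.setdefault("REDIS_PORT", "6379")
os.environ.setdefault("REDIS_PASSWORD", "example-password")

def redis_url(db: int) -> str:
    """Compose a redis:// URL for a given logical database index."""
    return (
        f"redis://:{os.environ['REDIS_PASSWORD']}@"
        f"{os.environ['REDIS_HOST']}:{os.environ['REDIS_PORT']}/{db}"
    )

CACHE_REDIS_URL = redis_url(1)    # query/result cache
CELERY_BROKER_URL = redis_url(2)  # Celery task queue

# Separate logical databases keep cache evictions away from the task queue.
assert CACHE_REDIS_URL != CELERY_BROKER_URL
print(CACHE_REDIS_URL)
```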

Query Result Caching Strategy

Caching is where Superset truly shines. Without caching, repeated queries execute against the database every time. With caching, results are served from Redis in milliseconds. Configure caching in your Superset configuration:

CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_REDIS_URL": os.environ.get("CACHE_REDIS_URL"),
    "CACHE_DEFAULT_TIMEOUT": 300,  # 5 minutes
    "CACHE_KEY_PREFIX": "superset_query_cache_",
}

DATA_CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_REDIS_URL": os.environ.get("CACHE_REDIS_URL"),
    "CACHE_DEFAULT_TIMEOUT": 3600,  # 1 hour for data cache
}

For mid-market deployments handling thousands of queries daily, query result caching typically reduces database load by 60-80%. Users see results instantly for repeated queries, whilst fresh queries run asynchronously in Celery workers.

Monitoring Redis Performance

Monitor Redis metrics to ensure the cache layer isn’t becoming a bottleneck:

kubectl exec -it superset-redis-master-0 -n superset -- redis-cli
> INFO stats
> INFO memory
> DBSIZE

Key metrics to watch:

  • Hit Rate: Percentage of cache hits vs. misses (aim for >70%)
  • Memory Usage: Ensure you’re not evicting cached data due to memory pressure
  • Connected Clients: Number of clients connected to Redis
  • Slow Log: Commands exceeding the slowlog-log-slower-than threshold (10ms by default) — investigate and optimise
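Redis doesn't report the hit rate directly; it is derived from the keyspace_hits and keyspace_misses counters in INFO stats. A small helper to compute it (sample values are illustrative):

```python
def cache_hit_rate(info_stats: str) -> float:
    """Compute the cache hit rate from `redis-cli INFO stats` output."""
    stats = dict(
        line.split(":", 1)
        for line in info_stats.splitlines()
        if ":" in line and not line.startswith("#")
    )
    hits = int(stats["keyspace_hits"])
    misses = int(stats["keyspace_misses"])
    return hits / (hits + misses) if hits + misses else 0.0

# Sample INFO stats fragment (values illustrative)
sample = """# Stats
total_connections_received:1523
keyspace_hits:42000
keyspace_misses:9800
"""
print(f"hit rate: {cache_hit_rate(sample):.1%}")  # hit rate: 81.1%
```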

Celery Beat for Scheduled Reports

Celery Beat is Superset’s task scheduler. It triggers periodic jobs like refreshing cached data, generating scheduled reports, and running data quality checks. Unlike cron jobs, Celery Beat is distributed and fault-tolerant—if a Beat pod fails, another takes over.

Enabling and Configuring Celery Beat

Enable Celery Beat in your Helm values:

celery:
  beat:
    enabled: true
    replicas: 1
    resources:
      limits:
        cpu: 1
        memory: 1Gi
      requests:
        cpu: 500m
        memory: 512Mi
    persistence:
      enabled: true
      size: 1Gi

Celery Beat requires persistent storage to maintain its schedule database. This ensures that if the pod restarts, it picks up where it left off without missing scheduled tasks.

Defining Scheduled Tasks

Scheduled tasks are defined in your Superset configuration. Create a celerybeat_config.py file:

from celery.schedules import crontab
from datetime import timedelta

CELERY_BEAT_SCHEDULE = {
    "refresh-dashboards": {
        "task": "superset.tasks.refresh_dashboard_cache",
        "schedule": timedelta(minutes=30),
        "args": (),
    },
    "generate-daily-reports": {
        "task": "superset.tasks.generate_report",
        "schedule": crontab(hour=6, minute=0),  # 6 AM daily
        "args": (),
    },
    "cleanup-old-cache": {
        "task": "superset.tasks.cleanup_cache",
        "schedule": crontab(hour=2, minute=0),  # 2 AM daily
        "args": (),
    },
    "data-quality-checks": {
        "task": "superset.tasks.run_data_quality_checks",
        "schedule": timedelta(hours=1),
        "args": (),
    },
}

Mount this configuration into your Superset pods:

configMap:
  SUPERSET_CONFIG_PATH: /app/pythonpath/superset_config.py

volumes:
  - name: config
    configMap:
      name: superset-beat-config

volumeMounts:
  - name: config
    mountPath: /app/pythonpath
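A malformed schedule entry is easy to miss, so a quick structural check of the schedule dict before mounting is cheap insurance. A minimal sketch (the task path follows the schedule defined above and is itself an assumption, not a documented Superset task name):

```python
from datetime import timedelta

# Single-entry example mirroring the schedule above; the task path
# is illustrative.
CELERY_BEAT_SCHEDULE = {
    "refresh-dashboards": {
        "task": "superset.tasks.refresh_dashboard_cache",
        "schedule": timedelta(minutes=30),
    },
}

def validate_schedule(schedule: dict) -> list[str]:
    """Return a list of problems: every entry needs a 'task' path and a
    'schedule' (timedelta or crontab)."""
    errors = []
    for name, entry in schedule.items():
        if "task" not in entry:
            errors.append(f"{name}: missing 'task'")
        if "schedule" not in entry:
            errors.append(f"{name}: missing 'schedule'")
    return errors

assert validate_schedule(CELERY_BEAT_SCHEDULE) == []
print("schedule entries valid")
```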

Monitoring Scheduled Tasks

View active Celery Beat tasks:

kubectl logs -n superset -l app=superset,component=beat --tail=100

Monitor task execution in Redis:

kubectl exec -it superset-redis-master-0 -n superset -- redis-cli
> KEYS celery-task-meta-*
> GET celery-task-meta-{task-id}

For production visibility, integrate Celery with Prometheus and Grafana to track task success rates, execution times, and failure patterns. This ensures scheduled reports run reliably and you’re alerted to failures immediately.

Production Readiness and Testing

Before deploying this architecture to production, rigorous testing at mid-market scale is essential. The D23.io reference architecture has been validated against realistic workloads, but your specific use case may have unique characteristics.

Load Testing

Use Apache JMeter or Locust to simulate realistic user and query loads:

from locust import HttpUser, task, between

class SupersetUser(HttpUser):
    wait_time = between(1, 5)
    
    @task(3)
    def view_dashboard(self):
        self.client.get("/api/v1/dashboard/1/")
    
    @task(1)
    def run_query(self):
        self.client.post("/api/v1/chart/data", json={
            "query_context": {"datasource": {"id": 1, "type": "table"}}
        })
    
    @task(1)
    def export_chart(self):
        self.client.get("/api/v1/chart/1/data/")

Run load tests with gradually increasing concurrency:

locust -f locustfile.py -u 100 -r 10 -t 10m --headless -H https://superset.example.com

Monitor system behaviour:

  • Pod CPU and memory usage
  • Query response times
  • Celery worker queue depth
  • Redis hit rate and latency
  • Database connection pool status

Database Performance Validation

Superset’s performance is ultimately limited by database performance. Before production deployment:

  1. Index Key Columns: Ensure all columns used in WHERE clauses, JOIN conditions, and GROUP BY statements are indexed.

  2. Analyse Query Plans: Use EXPLAIN ANALYSE to understand how queries execute:

EXPLAIN ANALYSE SELECT * FROM large_table WHERE indexed_column = 'value';

  3. Test with Production Data Volume: Load tests should use realistic data volumes. A query that performs well on 1M rows may struggle with 1B rows.

  4. Monitor Connection Pool: PostgreSQL has a limited number of connections. Ensure Superset and Celery workers don’t exhaust the pool:

SELECT datname, count(*) FROM pg_stat_activity GROUP BY datname;
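A back-of-envelope budget makes the exhaustion risk concrete: multiply pods by their SQLAlchemy pool size and compare against max_connections. The pool sizes below are assumptions — check SQLALCHEMY_ENGINE_OPTIONS in your superset_config.py:

```python
# Worst-case connection budget against the metadata database.
# Pool sizes are assumptions; verify them in your Superset config.
WEB_PODS, WEB_POOL = 3, 10       # web replicas x pool_size per pod
WORKER_PODS, WORKER_POOL = 4, 5  # Celery workers x pool_size per pod
HEADROOM = 10                    # superuser / maintenance connections

needed = WEB_PODS * WEB_POOL + WORKER_PODS * WORKER_POOL + HEADROOM
max_connections = 100            # PostgreSQL's default setting

print(f"worst case: {needed} of {max_connections} connections")
assert needed <= max_connections, "raise max_connections or add PgBouncer"
```

Remember that autoscaling multiplies these numbers: at 10 web pods and 20 workers the same pools would need 210 connections, which is exactly where PgBouncer earns its keep.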

Chaos Engineering

Test failure scenarios to ensure the architecture is resilient:

  1. Pod Failures: Kill a random pod and verify the system recovers:

kubectl delete -n superset $(kubectl get pods -n superset -l app=superset -o name | shuf -n 1)

  2. Network Partitions: Simulate network latency or failures using network policies.

  3. Resource Exhaustion: Reduce available memory and verify graceful degradation.

  4. Database Failover: Test behaviour when the primary database becomes unavailable.

For teams implementing data modernisation projects—as discussed in PADISO’s enterprise AI services for Sydney organisations—chaos testing ensures that critical analytics infrastructure remains available during incidents.

Security and Compliance Considerations

Superset handles sensitive business data. Security must be built in from the start, not retrofitted later.

Network Security

Implement network policies to restrict traffic:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: superset-network-policy
  namespace: superset
spec:
  podSelector:
    matchLabels:
      app: superset
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: nginx-ingress
    ports:
    - protocol: TCP
      port: 8088
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: superset-redis
    ports:
    - protocol: TCP
      port: 6379
  - to:
    - podSelector:
        matchLabels:
          app: superset-postgres
    ports:
    - protocol: TCP
      port: 5432
  - to:
    - namespaceSelector:
        matchLabels:
          name: kube-system
    ports:
    - protocol: TCP
      port: 53  # DNS

RBAC and Authentication

Superset supports multiple authentication backends. For enterprise deployments, integrate with LDAP or SAML:

AUTHENTICATION_BACKENDS = [
    "superset.security.SupersetSecurityManager",
]

AUTH_LDAP_SERVER = "ldap://ldap.example.com"
AUTH_LDAP_USE_SSL = True
AUTH_LDAP_BIND_USER = "cn=admin,dc=example,dc=com"
AUTH_LDAP_BIND_PASSWORD = os.environ.get("LDAP_BIND_PASSWORD")
AUTH_LDAP_SEARCH = "ou=users,dc=example,dc=com"
AUTH_LDAP_UID_FIELD = "uid"

For Kubernetes-native authentication, use OpenID Connect (OIDC) with providers like Keycloak or Auth0.

Data Encryption

Enable encryption in transit and at rest:

  1. TLS for Ingress: Configure TLS certificates for HTTPS access:

ingress:
  tls:
  - secretName: superset-tls
    hosts:
    - superset.example.com

  2. Database Encryption: Enable encryption at rest in your managed PostgreSQL service.

  3. Redis Encryption: Enable TLS for Redis connections:

redis:
  tls:
    enabled: true
    certSecret: redis-tls-cert

Secrets Management

Never store secrets in Helm values or ConfigMaps. Use Kubernetes Secrets or external secret management:

kubectl create secret generic superset-secrets \
  --from-literal=SECRET_KEY='your-secret-key' \
  --from-literal=REDIS_PASSWORD='your-redis-password' \
  --from-literal=DB_PASSWORD='your-db-password' \
  -n superset

Reference secrets in your deployment:

env:
- name: SECRET_KEY
  valueFrom:
    secretKeyRef:
      name: superset-secrets
      key: SECRET_KEY

For organisations pursuing SOC 2 or ISO 27001 compliance—as outlined in PADISO’s security audit services—proper secrets management is a foundational control. Audit logs should track all secret access, and rotation policies should be enforced.

Audit Logging

Enable audit logging to track who accessed what data and when:

LOG_FORMAT = "%(asctime)s:%(name)s:%(levelname)s:%(message)s"
LOG_LEVEL = "INFO"

LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "audit": {
            "class": "logging.handlers.RotatingFileHandler",
            "filename": "/var/log/superset/audit.log",
            "maxBytes": 10485760,  # 10MB
            "backupCount": 10,
        },
    },
    "loggers": {
        "superset.security": {
            "handlers": ["audit"],
            "level": "INFO",
        },
    },
}

Forward logs to a centralised logging system (ELK, Splunk, CloudWatch) for retention and analysis.

Monitoring and Observability

You can’t operate what you can’t see. Comprehensive monitoring is essential for maintaining Superset at scale.

Prometheus Metrics

Deploy Prometheus to scrape metrics from Superset, Celery, and Kubernetes:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack -n monitoring --create-namespace

Superset does not expose Prometheus metrics natively; the common pattern is to emit StatsD metrics and scrape them through a statsd-exporter sidecar. Configure the stats logger in superset_config.py:

from superset.stats_logger import StatsdStatsLogger

STATS_LOGGER = StatsdStatsLogger(host="localhost", port=8125, prefix="superset")

Create a ServiceMonitor to scrape Superset metrics:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: superset-monitor
  namespace: superset
spec:
  selector:
    matchLabels:
      app: superset
  endpoints:
  - port: metrics
    interval: 30s

Grafana Dashboards

Create Grafana dashboards to visualise key metrics. The kube-prometheus-stack chart installed earlier already bundles Grafana; to run a standalone instance instead, add the Grafana chart repository first:

helm repo add grafana https://grafana.github.io/helm-charts
helm install grafana grafana/grafana -n monitoring
kubectl port-forward -n monitoring svc/grafana 3000:80

Key metrics to dashboard:

  • Pod Health: CPU, memory, restart count
  • Request Latency: p50, p95, p99 response times
  • Query Performance: Query count, average execution time, cache hit rate
  • Celery Tasks: Queue depth, task success rate, average execution time
  • Database Connections: Active connections, connection pool utilisation
  • Redis: Memory usage, hit rate, eviction rate

Alerting Rules

Define alert rules to notify on-call engineers of issues:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: superset-alerts
  namespace: superset
spec:
  groups:
  - name: superset.rules
    interval: 30s
    rules:
    - alert: SupersetHighErrorRate
      expr: rate(superset_request_errors_total[5m]) > 0.05
      for: 5m
      annotations:
        summary: "High error rate on Superset"
    - alert: CeleryQueueDepthHigh
      expr: superset_celery_queue_depth > 1000
      for: 10m
      annotations:
        summary: "Celery queue depth exceeds threshold"
    - alert: RedisMemoryHigh
      expr: redis_memory_used_bytes / redis_memory_max_bytes > 0.9
      for: 5m
      annotations:
        summary: "Redis memory usage above 90%"

Route alerts to PagerDuty, Slack, or your incident management system.

Distributed Tracing

For complex request flows spanning multiple services, implement distributed tracing with Jaeger:

helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
helm install jaeger jaegertracing/jaeger -n monitoring

Configure Superset to send traces:

from jaeger_client import Config

config = Config(
    config={
        "sampler": {"type": "const", "param": 1},
        "logging": True,
    },
    service_name="superset",
)
jaeger_tracer = config.initialize_tracer()

Traces help identify performance bottlenecks in query execution, API calls, and background task processing.

Troubleshooting and Optimisation

Even with careful planning, issues arise in production. Here’s how to diagnose and resolve common problems.

Slow Query Execution

Symptom: Dashboards load slowly, users report timeouts.

Diagnosis:

  1. Check Celery worker logs:

kubectl logs -n superset -l app=superset,component=worker --tail=100

  2. Enable slow query logging in PostgreSQL:

ALTER SYSTEM SET log_min_duration_statement = 1000;  -- Log queries >1 second
SELECT pg_reload_conf();

  3. Check the Redis cache hit rate:

kubectl exec -it superset-redis-master-0 -n superset -- redis-cli INFO stats

Solutions:

  • Add database indexes on frequently filtered columns
  • Increase Celery worker replicas to process queries in parallel
  • Increase Redis cache TTL for stable datasets
  • Pre-aggregate data in a data warehouse (Snowflake, BigQuery, Redshift)

Memory Leaks

Symptom: Pod memory usage grows over time, pods restart frequently.

Diagnosis:

kubectl top pods -n superset
kubectl describe pods -n superset -l app=superset | grep -A 5 "Last State"

Solutions:

  • Update Superset to the latest version (memory leaks are often fixed in patches)
  • Set explicit memory limits so a leaking pod is restarted by the kubelet before it degrades the whole node
  • Schedule periodic rolling restarts (for example, a CronJob invoking kubectl rollout restart), and add a preStop hook so terminating pods drain in-flight requests gracefully:

lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "sleep 15"]

High Database Connection Count

Symptom: Database connection pool exhausted, new queries fail.

Diagnosis:

SELECT datname, count(*) FROM pg_stat_activity GROUP BY datname;
SHOW max_connections;

Solutions:

  • Increase database max_connections parameter
  • Reduce Superset pod replicas (fewer pods = fewer connections)
  • Implement connection pooling with PgBouncer between Superset and PostgreSQL (community Helm charts and container images are available; a plain Deployment also works)

Celery Tasks Not Executing

Symptom: Scheduled reports don’t run, background tasks queue up indefinitely.

Diagnosis:

kubectl logs -n superset -l app=superset,component=beat
kubectl logs -n superset -l app=superset,component=worker

Check Redis queue:

kubectl exec -it superset-redis-master-0 -n superset -- redis-cli
> LLEN celery
> LRANGE celery 0 -1

Solutions:

  • Restart the Celery Beat pod to reload the task schedule:

kubectl rollout restart deployment/superset-celery-beat -n superset

  • Increase Celery worker replicas
  • Check worker logs for task exceptions
  • Verify Redis connectivity from worker pods

Network Connectivity Issues

Symptom: Pods can’t reach external databases or services.

Diagnosis:

kubectl exec -it deploy/superset -n superset -- nc -zv postgres.example.com 5432
kubectl exec -it deploy/superset -n superset -- nc -zv superset-redis 6379

Solutions:

  • Check network policies aren’t blocking traffic:

kubectl get networkpolicies -n superset
kubectl describe networkpolicy superset-network-policy -n superset

  • Verify DNS resolution:

kubectl exec -it deploy/superset -n superset -- nslookup postgres.example.com

  • Check security group / firewall rules if databases are external

Next Steps and Scaling Strategy

Once you’ve deployed the D23.io reference architecture and validated it at mid-market scale, the next phase is optimising for your specific workloads and planning for future growth.

Immediate Optimisations

  1. Baseline Your Metrics: Establish performance baselines (query latency, cache hit rate, database load) against which to measure improvements.

  2. Implement Cost Monitoring: Use Kubernetes cost analysis tools (Kubecost, CloudZero) to understand spending by component. Optimise resource requests to match actual usage.

  3. Establish On-Call Procedures: Document runbooks for common issues. Assign on-call rotation to ensure 24/7 coverage for critical incidents.

  4. Automate Backups: Implement automated daily backups of PostgreSQL and configuration:

kubectl apply -f - <<EOF
apiVersion: batch/v1
kind: CronJob
metadata:
  name: superset-backup
  namespace: superset
spec:
  schedule: "0 2 * * *"  # 2 AM daily
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure  # required for Job pod templates
          containers:
          - name: backup
            image: postgres:16  # pin a major version rather than :latest
            env:
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: superset-db-credentials  # assumes a secret holding the DB password
                  key: password
            command:
            - /bin/sh
            - -c
            - pg_dump -h postgres.example.com -U superset superset | gzip > /backup/superset-$(date +%Y%m%d).sql.gz
            volumeMounts:
            - name: backup-storage
              mountPath: /backup
          volumes:
          - name: backup-storage
            persistentVolumeClaim:
              claimName: superset-backup-pvc  # assumes an existing PVC; without a volume, backups vanish with the pod
EOF

Medium-Term Scaling (3-6 months)

  1. Multi-Region Deployment: For global teams, deploy Superset across multiple regions with read replicas, using pod anti-affinity to spread replicas across failure domains:

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - superset
      topologyKey: topology.kubernetes.io/zone  # use topology.kubernetes.io/region for cross-region spread

  2. Advanced Caching: Implement multi-layer caching (Redis + CDN) for dashboards accessed by many users.

  3. Data Warehouse Integration: Move heavy aggregations to a dedicated data warehouse (Snowflake, BigQuery, Redshift) to reduce database load.

  4. Superset Plugins: Develop custom visualisations and data sources tailored to your domain.
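As a first step toward the multi-layer caching above, Superset's own cache backends can be pointed at Redis in superset_config.py. A hedged sketch — the Redis URLs, database indices, and TTLs are assumptions to adapt:

```python
# superset_config.py — illustrative Redis-backed cache layers
CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_DEFAULT_TIMEOUT": 300,   # 5-minute TTL for metadata cache
    "CACHE_KEY_PREFIX": "superset_",
    "CACHE_REDIS_URL": "redis://superset-redis:6379/1",
}
DATA_CACHE_CONFIG = {
    **CACHE_CONFIG,
    "CACHE_DEFAULT_TIMEOUT": 3600,  # cache chart data longer for hot dashboards
    "CACHE_KEY_PREFIX": "superset_data_",
}
```

Separate key prefixes keep metadata and chart-data entries distinguishable in Redis, which makes cache hit rates easier to measure per layer when you baseline metrics.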

Long-Term Architecture Evolution (6+ months)

  1. Kubernetes Operator: As your deployment matures, consider building a custom Kubernetes Operator for Superset to automate upgrades, scaling, and disaster recovery, or watch the Apache Superset community for progress toward a native operator.

  2. Federated Deployments: Deploy Superset instances per business unit or geography, with a central metadata repository for consistency.

  3. Agentic AI Integration: Combine Superset with agentic AI systems that automatically generate insights, alerts, and recommendations. This is particularly relevant for organisations exploring AI strategy and readiness as part of broader digital transformation.

Partnering for Expertise

The D23.io reference architecture is battle-tested, but implementing it requires deep Kubernetes expertise, database optimisation knowledge, and understanding of analytics workflows. For organisations in Sydney and Australia seeking fractional CTO leadership and hands-on engineering support, PADISO’s venture studio and CTO as a Service model provides the expertise to ship production-grade analytics infrastructure without hiring full-time specialists.

Teams implementing platform engineering and custom software development projects often pair Superset with agentic AI systems, creating self-serve analytics platforms that scale to enterprise complexity. Whether you’re a seed-stage startup building your first analytics stack or a mid-market company modernising legacy BI infrastructure, the principles in this guide apply.

Testing at Scale

Before declaring your deployment production-ready, stress-test it against realistic scenarios. Deploy Apache Superset on Amazon EKS using the patterns outlined in AWS documentation if you’re on AWS, or follow equivalent guides for Azure AKS or Google GKE.

The reference architecture has been validated at mid-market scale (hundreds of concurrent users, thousands of dashboards, millions of data points). Your specific workload may have unique characteristics—test thoroughly before going live.

Conclusion

The D23.io reference architecture for Apache Superset on Kubernetes represents a mature, production-grade approach to deploying analytics infrastructure at scale. By combining Helm-based deployment, autoscaling workers, Redis caching, and Celery Beat scheduling, you create a resilient, cost-efficient platform that grows with demand.

Key takeaways:

  • Kubernetes provides the orchestration and elasticity that modern analytics platforms require
  • Helm charts abstract complexity whilst maintaining flexibility for customisation
  • Autoscaling ensures your platform grows with query volume without manual intervention
  • Redis caching is transformative for query performance, often reducing database load by 60-80% on read-heavy dashboards
  • Celery workers decouple user-facing requests from background processing, preventing slowdowns
  • Security and compliance must be built in, not retrofitted
  • Comprehensive monitoring provides visibility into system health and performance

Implementing this architecture requires expertise in Kubernetes, PostgreSQL, Redis, and analytics platforms. For organisations seeking hands-on support, PADISO’s platform engineering and custom software development services provide fractional CTO leadership and co-build partnerships to ship production-grade infrastructure.

Start with the Helm chart deployment, validate at scale with load testing, then progressively implement autoscaling, caching, and monitoring. Once you’ve mastered the fundamentals, evolve toward advanced patterns like multi-region deployment and agentic AI integration.

Your analytics infrastructure should enable self-serve insights, not become a bottleneck. The D23.io reference architecture removes that constraint.