Apache Superset on Helm Chart: Reference Deployment Pattern

Step-by-step Superset on Helm deployment guide covering networking, storage, secrets, autoscaling, and operational habits for production readiness.

The PADISO Team · 2026-06-09

Why Helm for Superset
Pre-deployment Architecture Decisions
Cluster Prerequisites and Namespace Setup
Helm Chart Installation and Values Override
Networking and Service Exposure
Persistent Storage Configuration
Secrets Management and Environment Variables
Database Backend and Metadata Store
Autoscaling and Resource Management
Production Operational Habits
Monitoring, Logging, and Observability
Troubleshooting Common Deployment Issues
Next Steps and Ongoing Support

Why Helm for Superset

Apache Superset is a modern, open-source data visualisation and business intelligence platform that powers analytics across thousands of organisations globally. When deployed on Kubernetes, Superset becomes a scalable, multi-tenant analytics engine capable of serving hundreds of concurrent users and queries. However, deploying Superset directly using raw Kubernetes manifests introduces complexity: you must manually orchestrate ConfigMaps, Secrets, StatefulSets, Services, and Persistent Volumes whilst managing interdependencies between the web application, worker processes, and backing databases.

Helm, the Kubernetes package manager, solves this by providing a templated, versioned deployment pattern. The official Apache Superset Helm chart abstracts away boilerplate, enforces best practices, and allows you to customise your deployment through a single values.yaml file rather than maintaining dozens of individual manifests.

For teams at PADISO working with startups and enterprises modernising their data platforms, Helm-based Superset deployments are the standard. They reduce time-to-production, improve reproducibility, and align with platform engineering disciplines that underpin SOC 2 and ISO 27001 compliance.

This guide walks through a production-ready Superset deployment on Helm, covering the decisions, configurations, and operational habits that keep analytics infrastructure healthy at scale. Whether you’re embedding Superset into a multi-tenant SaaS, building a standalone BI platform, or replacing per-seat BI tools, this reference pattern applies across financial services, insurance, retail, and government sectors.

Pre-deployment Architecture Decisions

Before installing Superset on Helm, you must make four foundational architecture decisions that shape your entire deployment.

Single-Tenant vs. Multi-Tenant Superset

Single-tenant Superset runs one isolated instance per customer or business unit. Multi-tenant Superset runs a single shared instance where databases, dashboards, and users are logically isolated via Superset’s RBAC and row-level security (RLS) features.

Single-tenant deployments are simpler operationally but expensive at scale: each instance consumes compute, storage, and database resources. Multi-tenant deployments are cost-efficient but require careful configuration of Superset’s authentication (LDAP, OAuth2, SAML), database connections, and RLS rules to prevent data leakage.

For platform development in Sydney and across Australia’s financial services and retail sectors, we typically recommend multi-tenant deployments with strong isolation guarantees. This reduces operational overhead and aligns with modern SaaS architecture.

Metadata Database: PostgreSQL or MySQL

Superset stores dashboards, users, permissions, and query metadata in a relational database called the metadata store. The Helm chart defaults to PostgreSQL, which is the recommended choice for production. PostgreSQL offers superior performance, better support for Superset’s ORM queries, and more robust backup and recovery tooling.

MySQL is supported but introduces subtle compatibility issues with Superset’s SQLAlchemy ORM, particularly around transaction isolation and JSON column handling. For production deployments, always use PostgreSQL.

Your metadata database must be highly available. For AWS deployments, use Amazon RDS with Multi-AZ failover. For on-premises or Kubernetes-native setups, consider a managed PostgreSQL service or a StatefulSet-based PostgreSQL operator like CloudNativePG.

Message Queue: Redis or RabbitMQ

Superset uses Celery to run asynchronous tasks such as query execution, cache warming, and report generation, and Celery needs a message broker. Celery supports both Redis and RabbitMQ, but Redis is the de facto standard for Superset because the Helm chart can bundle it, it can double as the cache, and it requires less operational overhead.

Redis must be highly available. Use a managed Redis service (AWS ElastiCache, Azure Cache for Redis) or a Kubernetes-native Redis operator. Single-node Redis in production is a single point of failure and will cause cascading outages when it restarts.

Results Backend: Redis, S3, or Filesystem

Superset stores the results of asynchronous SQL Lab queries in a results backend so that any web pod can fetch them once a worker finishes; this is separate from the cache used for chart and dashboard data. For distributed deployments, the results backend must be shared across all Superset pods. Redis is the simplest choice, but for large result sets or long-term archival, S3-compatible object storage (AWS S3, MinIO, DigitalOcean Spaces) is preferred.

Filesystem-based results backends are only acceptable for single-node or development deployments. In Kubernetes, pods are ephemeral; storing results on local disk guarantees data loss when pods restart.

Cluster Prerequisites and Namespace Setup

Before deploying Superset, ensure your Kubernetes cluster meets baseline requirements.

Cluster Sizing

Superset is not lightweight. A production deployment requires:

Control plane: Standard Kubernetes control plane (AWS EKS, Azure AKS, Google GKE all provide this).
Worker nodes: At least 3 nodes with 4 CPU and 8 GB RAM per node. For high-concurrency deployments (100+ simultaneous users), scale to 6+ nodes with 8 CPU and 16 GB RAM each.
Persistent storage: A StorageClass that supports ReadWriteOnce (RWO) volumes for the metadata database and ReadWriteMany (RWX) for shared caches if using NFS.
Networking: An Ingress controller for external access and a CNI plugin that enforces NetworkPolicy for pod-to-pod communication.

Namespace and RBAC

Create a dedicated namespace for Superset to isolate it from other workloads and simplify RBAC:

kubectl create namespace superset
kubectl label namespace superset environment=production

Create a ServiceAccount with minimal permissions:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: superset
  namespace: superset
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: superset
  namespace: superset
rules:
- apiGroups: [""]
  resources: ["configmaps", "secrets"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: superset
  namespace: superset
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: superset
subjects:
- kind: ServiceAccount
  name: superset
  namespace: superset

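This Role grants read-only access to ConfigMaps and Secrets through the Kubernetes API and cannot modify cluster resources. Volume and environment-variable mounts are resolved by the kubelet and need no API permissions, so drop the Role entirely unless something in the pod reads these objects through the API.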

StorageClass Configuration

Verify that your cluster has a default StorageClass:

kubectl get storageclass

If none exists, create one. For AWS EKS:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000"
  throughput: "125"
allowVolumeExpansion: true

For platform development across Canada and the United States, cloud-native storage classes (EBS, Azure Disk, GCE Persistent Disk) are standard. For on-premises deployments, use a CSI driver that integrates with your storage array (NetApp Trident, Pure FlashBlade, Portworx).

Helm Chart Installation and Values Override

The Apache Superset Helm chart is maintained in the official Apache Superset repository and listed on Artifact Hub; the apache/superset container image it deploys is published on Docker Hub.

Adding the Helm Repository

helm repo add superset https://apache.github.io/superset
helm repo update

Verify the chart is available:

helm search repo superset

You should see output like:

NAME            CHART VERSION   APP VERSION
superset/superset   0.X.X         2.1.X

Creating a Custom values.yaml

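The canonical values.yaml file is extensive. Rather than modifying it directly, create a custom override file that contains only the settings you need to change. Key names differ between chart versions, so compare this sketch against the output of helm show values superset/superset before applying it: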

# custom-values.yaml

image:
  repository: apache/superset
  tag: "2.1.3"
  pullPolicy: IfNotPresent

replicaCount: 3

resources:
  limits:
    cpu: 2
    memory: 4Gi
  requests:
    cpu: 1
    memory: 2Gi

postgresql:
  enabled: false  # Use external PostgreSQL

externalDatabase:
  type: postgresql
  host: superset-metadata.c9akciq32.us-east-1.rds.amazonaws.com  # RDS endpoint, not an ElastiCache one
  port: 5432
  user: superset
  password: "${SUPERSET_DB_PASSWORD}"  # Placeholder: Helm does not expand ${...}; supply the real value from a Secret
  database: superset

redis:
  enabled: false  # Use external Redis

externalRedis:
  host: superset-cache.c9akciq32.ng.0001.use1.cache.amazonaws.com
  port: 6379
  password: "${SUPERSET_REDIS_PASSWORD}"

ingress:
  enabled: true
  ingressClassName: nginx
  hosts:
    - host: superset.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: superset-tls
      hosts:
        - superset.example.com

env:
  SUPERSET_ENV: production
  SUPERSET_LOAD_EXAMPLES: "false"
  SUPERSET_SECRET_KEY: "${SUPERSET_SECRET_KEY}"  # Injected via Secret

This configuration:

Uses external PostgreSQL and Redis (managed services or separate Kubernetes deployments).
Sets 3 Superset web replicas for high availability.
Allocates 1–2 CPU and 2–4 GB RAM per pod (adjust based on your workload).
Configures HTTPS ingress with a TLS certificate.
Disables example data to reduce initial setup time.

Installing the Chart

Before installing, create a Secret containing sensitive values:

kubectl create secret generic superset-secrets \
  --from-literal=db-password='your-secure-password' \
  --from-literal=redis-password='your-redis-password' \
  --from-literal=secret-key='your-secret-key' \
  -n superset

Then install the chart:

helm install superset superset/superset \
  -f custom-values.yaml \
  --namespace superset \
  --create-namespace

Monitor the rollout:

kubectl rollout status deployment/superset -n superset
kubectl get pods -n superset

Once all pods are running, verify the Superset web UI is accessible via your ingress hostname.

Networking and Service Exposure

Superset must be accessible to end users and to internal services (worker pods, monitoring agents). The Helm chart exposes Superset via a Kubernetes Service and an Ingress resource.

Service Type Configuration

The chart creates a ClusterIP Service by default, which is correct for most deployments. The Kubernetes documentation on Services explains the trade-offs:

ClusterIP: Accessible only within the cluster. Use an Ingress controller to expose Superset to the internet.
NodePort: Exposes Superset on a high-numbered port on each node. Suitable for development, not production.
LoadBalancer: Provisions a cloud load balancer (AWS ELB, Azure LB, GCP LB). Use this if you do not have an Ingress controller.

For production, ClusterIP + Ingress is the standard pattern:

service:
  type: ClusterIP
  port: 8088
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8088"
    prometheus.io/path: "/metrics"

Ingress Configuration for HTTPS

Expose Superset over HTTPS with a TLS certificate:

ingress:
  enabled: true
  ingressClassName: nginx  # Or "alb" for AWS ALB, "gce" for GCP
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/limit-rps: "100"  # Requests per second per client IP
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
  hosts:
    - host: superset.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: superset-tls
      hosts:
        - superset.example.com

If using cert-manager for automatic TLS certificate provisioning:

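helm repo add jetstack https://charts.jetstack.io
helm install cert-manager jetstack/cert-manager --namespace cert-manager --create-namespace \
  --set crds.enabled=true  # Installs the cert-manager CRDs; older chart versions use installCRDs=true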

Then create a ClusterIssuer:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: your-email@example.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: nginx

Network Policies for Pod-to-Pod Communication

In a production cluster, implement network policies to restrict traffic between pods:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: superset-allow-internal
  namespace: superset
spec:
  podSelector:
    matchLabels:
      app: superset
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: superset
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx  # Label that Kubernetes sets on every namespace
    ports:
    - protocol: TCP
      port: 8088
  egress:
  - to:
    - ipBlock:
        cidr: 10.0.0.0/16  # Assumption: network range of the external Redis and PostgreSQL endpoints; replace with yours
    ports:
    - protocol: TCP
      port: 6379  # Redis
    - protocol: TCP
      port: 5432  # PostgreSQL
  - to:
    - namespaceSelector: {}
    ports:
    - protocol: TCP
      port: 53  # DNS
    - protocol: UDP
      port: 53

This policy allows ingress traffic from the NGINX Ingress controller and permits egress to Redis, PostgreSQL, and DNS services.

Persistent Storage Configuration

Superset requires persistent storage for user uploads (CSV files, database drivers), cache data, and metadata. The Helm chart manages this via PersistentVolumeClaims (PVCs).

ConfigMap and Secret Volumes

Superset reads configuration from environment variables and mounted files. Store sensitive configuration in Kubernetes Secrets:

apiVersion: v1
kind: Secret
metadata:
  name: superset-config
  namespace: superset
type: Opaque
stringData:
  superset_config.py: |
    import os
    
    SECRET_KEY = os.environ.get('SUPERSET_SECRET_KEY')
    SQLALCHEMY_DATABASE_URI = os.environ.get('DATABASE_URL')
    REDIS_HOST = os.environ.get('REDIS_HOST')
    REDIS_PORT = int(os.environ.get('REDIS_PORT', 6379))
    REDIS_PASSWORD = os.environ.get('REDIS_PASSWORD')
    
    CACHE_CONFIG = {
        'CACHE_TYPE': 'RedisCache',
        'CACHE_REDIS_URL': f'redis://:{REDIS_PASSWORD}@{REDIS_HOST}:{REDIS_PORT}/0',
        'CACHE_DEFAULT_TIMEOUT': 300,
    }
    
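    # RESULTS_BACKEND must be a cachelib cache object, not a plain dict.
    from cachelib.redis import RedisCache

    RESULTS_BACKEND_USE_MSGPACK = True
    RESULTS_BACKEND = RedisCache(
        host=REDIS_HOST,
        port=REDIS_PORT,
        password=REDIS_PASSWORD,
        db=1,
        key_prefix='superset_results',
    )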
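The cache URL above is built by interpolating the Redis password directly into a redis:// URL, which breaks when the password contains characters such as @ or /. A minimal sketch of a safer helper, assuming you are free to add a small function to superset_config.py (the name build_redis_url is hypothetical, not part of Superset):

```python
from urllib.parse import quote


def build_redis_url(password: str, host: str, port: int, db: int) -> str:
    """Return a redis:// URL with the password percent-encoded."""
    # safe="" forces delimiter characters such as "/" and "@" to be encoded too.
    return f"redis://:{quote(password, safe='')}@{host}:{port}/{db}"


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(build_redis_url("p@ss/word", "redis-host", 6379, 0))
```

With a helper like this, CACHE_REDIS_URL could be set to build_redis_url(REDIS_PASSWORD, REDIS_HOST, REDIS_PORT, 0) instead of the raw f-string.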

Mount this Secret as a volume in the Superset deployment:

volumes:
  - name: superset-config
    secret:
      secretName: superset-config
      items:
        - key: superset_config.py
          path: superset_config.py

volumeMounts:
  - name: superset-config
    mountPath: /app/pythonpath/superset_config.py  # /app/pythonpath is on PYTHONPATH in the official image
    subPath: superset_config.py

PersistentVolume for Uploads and Cache

Superset stores user-uploaded files in /app/superset_home. For multi-pod deployments, this directory must be shared across all pods via a ReadWriteMany (RWX) PersistentVolume:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: superset-home
  namespace: superset
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs  # Or your RWX storage class
  resources:
    requests:
      storage: 10Gi

Then mount it in the Superset pods:

volumeMounts:
  - name: superset-home
    mountPath: /app/superset_home

volumes:
  - name: superset-home
    persistentVolumeClaim:
      claimName: superset-home

For AWS EKS, use EFS (Elastic File System) with the EFS CSI driver. For Azure AKS, use Azure Files. For GKE, use Filestore.

Database Backup and Recovery

Your PostgreSQL metadata database is the source of truth for all Superset configuration. Implement automated backups:

AWS RDS: Enable automated backups with a 30-day retention window and cross-region snapshots.
Azure Database for PostgreSQL: Enable geo-redundant backups.
On-premises: Use pg_dump in a CronJob to back up the database to S3 or object storage daily.

Example CronJob for PostgreSQL backup:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: superset-db-backup
  namespace: superset
spec:
  schedule: "0 2 * * *"  # 2 AM daily
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: superset
          containers:
          - name: backup
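            image: postgres:15  # Assumes a derived image that also contains the AWS CLI; the stock image does not
            command:
            - /bin/bash
            - -c
            - |
              # Fail the job if pg_dump fails, not only the last command in the pipe
              set -o pipefail
              pg_dump -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" | gzip | \
              aws s3 cp - "s3://superset-backups/$(date +%Y%m%d-%H%M%S).sql.gz" --sse AES256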
            env:
            - name: DB_HOST
              value: superset-metadata.c9akciq32.us-east-1.rds.amazonaws.com
            - name: DB_USER
              value: superset  # The superset-secrets Secret created at install time has no db-user key
            - name: DB_NAME
              value: superset
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: superset-secrets
                  key: db-password
          restartPolicy: OnFailure
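Daily dumps accumulate, so the bucket needs a retention rule. An S3 lifecycle policy is the usual way to do this; where that is not available, the selection logic can be sketched as below. The sketch assumes object names follow the date +%Y%m%d-%H%M%S pattern used by the CronJob and a 30-day retention window; the function name and the window are illustrative assumptions, and actually deleting the objects is left to your storage client.

```python
from datetime import datetime, timedelta

NAME_FORMAT = "%Y%m%d-%H%M%S"  # Matches `date +%Y%m%d-%H%M%S` in the backup job
SUFFIX = ".sql.gz"


def expired_backups(keys, now, retention_days=30):
    """Return the backup object names that are older than the retention window."""
    cutoff = now - timedelta(days=retention_days)
    expired = []
    for key in keys:
        if not key.endswith(SUFFIX):
            continue
        stem = key[: -len(SUFFIX)]
        try:
            taken_at = datetime.strptime(stem, NAME_FORMAT)
        except ValueError:
            # Skip names that do not carry a timestamp, e.g. "latest.sql.gz".
            continue
        if taken_at < cutoff:
            expired.append(key)
    return expired


if __name__ == "__main__":
    sample = ["20260101-020000.sql.gz", "20260601-020000.sql.gz", "latest.sql.gz"]
    print(expired_backups(sample, now=datetime(2026, 6, 9)))
```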

Secrets Management and Environment Variables

Superset requires sensitive configuration: database passwords, Redis credentials, API keys, and encryption keys. Never commit these to version control.

Using Kubernetes Secrets

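Store all sensitive data in Kubernetes Secrets. If superset-secrets already exists from the installation step, delete it first or add these keys to it, because kubectl create secret fails when the name is taken: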

kubectl create secret generic superset-secrets \
  --from-literal=DATABASE_URL='postgresql://superset:password@db-host:5432/superset' \
  --from-literal=REDIS_URL='redis://:password@redis-host:6379/0' \
  --from-literal=SUPERSET_SECRET_KEY='your-256-bit-secret-key' \
  --from-literal=MAPBOX_API_KEY='pk.xxxxx' \
  -n superset
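The SUPERSET_SECRET_KEY placeholder should be replaced with a long random value. A minimal sketch using Python's standard library; the 42-byte length is an assumption chosen to give a comfortably long key, not a Superset requirement:

```python
import secrets


def generate_secret_key(num_bytes: int = 42) -> str:
    """Return a URL-safe random string built from num_bytes of entropy."""
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Print once and store the value in your secret manager; never commit it.
    print(generate_secret_key())
```

Changing this key later invalidates sessions and the encryption of stored database credentials, so generate it once per environment and keep it backed up.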

Inject these into the Superset deployment:

env:
  - name: DATABASE_URL
    valueFrom:
      secretKeyRef:
        name: superset-secrets
        key: DATABASE_URL
  - name: REDIS_URL
    valueFrom:
      secretKeyRef:
        name: superset-secrets
        key: REDIS_URL
  - name: SUPERSET_SECRET_KEY
    valueFrom:
      secretKeyRef:
        name: superset-secrets
        key: SUPERSET_SECRET_KEY
  - name: MAPBOX_API_KEY
    valueFrom:
      secretKeyRef:
        name: superset-secrets
        key: MAPBOX_API_KEY

Rotating Secrets

Secrets should be rotated periodically (every 90 days for passwords, immediately if compromised). Use a secret management tool like HashiCorp Vault or AWS Secrets Manager to automate rotation:

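# Update a secret in Kubernetes (the value must be the complete URL, base64-encoded on one line)
NEW_URL='postgresql://superset:new-password@db-host:5432/superset'
kubectl patch secret superset-secrets -n superset -p \
  "{\"data\":{\"DATABASE_URL\":\"$(printf '%s' "$NEW_URL" | base64 | tr -d '\n')\"}}"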

# Restart Superset pods to pick up the new secret
kubectl rollout restart deployment/superset -n superset
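Values under a Secret's data field must be the base64 encoding of the complete value, here the whole connection URL rather than just the password. As a sketch of how a rotation script might build the patch body (the function name secret_patch and the sample URL are hypothetical):

```python
import base64
import json


def secret_patch(key: str, value: str) -> str:
    """Build a JSON merge-patch body that sets one key of a Kubernetes Secret."""
    encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
    return json.dumps({"data": {key: encoded}})


if __name__ == "__main__":
    new_url = "postgresql://superset:new-password@db-host:5432/superset"
    # Pass the printed string to: kubectl patch secret superset-secrets -n superset -p '<body>'
    print(secret_patch("DATABASE_URL", new_url))
```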

External Secret Management with Sealed Secrets

For GitOps workflows, encrypt secrets using Sealed Secrets:

helm repo add sealed-secrets https://bitnami-labs.github.io/sealed-secrets
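helm install sealed-secrets sealed-secrets/sealed-secrets -n kube-system \
  --set-string fullnameOverride=sealed-secrets-controller  # The controller name kubeseal looks for by default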

Then create and seal a secret:

echo -n 'my-password' | kubectl create secret generic superset-secrets \
  --namespace superset \
  --dry-run=client --from-file=db-password=/dev/stdin \
  -o yaml | kubeseal -o yaml > sealed-secret.yaml

kubectl apply -f sealed-secret.yaml

Now the sealed secret can be safely committed to Git and automatically decrypted by the Sealed Secrets controller.

Database Backend and Metadata Store

Superset’s metadata store is the backbone of the system. It must be highly available, backed up regularly, and monitored for performance.

PostgreSQL Configuration

For production, use a managed PostgreSQL service:

AWS RDS: Multi-AZ deployment with automated failover, automated backups, and read replicas.
Azure Database for PostgreSQL: Flexible Server with zone-redundant high availability.
Google Cloud SQL: High-availability configuration with automatic failover.
On-premises: PostgreSQL 13+ with streaming replication and automated failover via Patroni, which relies on a consensus store such as etcd.

Configure Superset to connect to PostgreSQL:

externalDatabase:
  type: postgresql
  host: superset-metadata.c9akciq32.us-east-1.rds.amazonaws.com  # RDS endpoint, not an ElastiCache one
  port: 5432
  user: superset
  password: "${SUPERSET_DB_PASSWORD}"
  database: superset
  sslMode: require
  sslRootCertificate: /etc/ssl/certs/ca-bundle.crt

Enable SSL/TLS for database connections to encrypt credentials in transit.

Database Initialisation

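The metadata database must be initialised before Superset can serve traffic. The official chart runs an init job on install and upgrade that applies the schema migrations. To run them by hand against a running pod (for example after restoring a backup):

kubectl exec -it deployment/superset -n superset -- \
  superset db upgrade

This creates or migrates all required tables and indexes. Run it once per install or version upgrade, not on every pod restart, and follow it with superset init so that default roles and permissions are synchronised.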

Connection Pooling

Superset uses SQLAlchemy to connect to PostgreSQL. Configure connection pooling to avoid exhausting database connections. Superset does not read pool settings from environment variables on its own, so set them in superset_config.py through Flask-SQLAlchemy's SQLALCHEMY_ENGINE_OPTIONS:

SQLALCHEMY_ENGINE_OPTIONS = {
    "pool_size": 20,
    "max_overflow": 10,
    "pool_recycle": 3600,
    "pool_pre_ping": True,
}

These settings:

Keep up to 20 persistent connections in each worker process's pool.
Allow up to 10 additional connections per process under load.
Recycle connections after 1 hour to prevent stale connections.
Test connections before reusing them (pre-ping).
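Pool limits apply per process, so the database sees the sum across every worker process in every pod. A rough sizing sketch, assuming 4 worker processes per pod (an illustrative figure; use your own Gunicorn and Celery settings) and the HPA ceiling of 10 web pods:

```python
def max_db_connections(pods: int, workers_per_pod: int, pool_size: int, max_overflow: int) -> int:
    """Upper bound on connections opened against the metadata database."""
    return pods * workers_per_pod * (pool_size + max_overflow)


if __name__ == "__main__":
    # 10 pods x 4 processes x (20 pooled + 10 overflow) connections each
    print(max_db_connections(pods=10, workers_per_pod=4, pool_size=20, max_overflow=10))
```

If the result exceeds PostgreSQL's max_connections, either lower pool_size or place a connection pooler such as PgBouncer in front of the database.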

Monitoring Database Performance

Monitor PostgreSQL for slow queries, connection count, and disk usage:

-- Slow queries
SELECT query, mean_exec_time, calls FROM pg_stat_statements 
ORDER BY mean_exec_time DESC LIMIT 10;

-- Connection count
SELECT datname, count(*) FROM pg_stat_activity GROUP BY datname;

-- Disk usage
SELECT schemaname, tablename, pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) 
FROM pg_tables WHERE schemaname NOT IN ('pg_catalog', 'information_schema') 
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

Set up alerts in your monitoring system (Prometheus, DataDog, New Relic) for:

Connections > 80% of max_connections.
Query duration > 5 seconds.
Disk usage > 80% of available space.

Autoscaling and Resource Management

Superset workloads are unpredictable: query load spikes during reporting periods, cache misses trigger expensive database queries, and user onboarding can double concurrent users overnight. Autoscaling ensures the system remains responsive without over-provisioning.

Horizontal Pod Autoscaler (HPA)

Autoscale Superset web pods based on CPU and memory usage:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: superset-hpa
  namespace: superset
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: superset
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 30

This HPA:

Maintains 3–10 Superset pods.
Scales up aggressively (double the replicas every 30 seconds) when CPU exceeds 70% or memory exceeds 80%.
Scales down conservatively (reduce by 50% every 60 seconds) after a 5-minute stability window.
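The replica count the HPA aims for follows the formula in the Kubernetes documentation: desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), skipped when the ratio is within a tolerance of 1 (0.1 by default). The sketch below reproduces only that core calculation, clamped to minReplicas and maxReplicas; it does not model the scaleUp and scaleDown rate policies or stabilisation windows.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_utilization: float,
    target_utilization: float,
    min_replicas: int,
    max_replicas: int,
    tolerance: float = 0.1,
) -> int:
    """Core HPA calculation for a single resource metric."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: the HPA leaves the replica count alone.
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 3 pods averaging 140% of requested CPU against a 70% target
    print(desired_replicas(3, 140, 70, min_replicas=3, max_replicas=10))
```

When several metrics are configured, as with CPU and memory above, the HPA evaluates each one and uses the largest resulting replica count.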

Worker Pod Autoscaling

Superset workers execute asynchronous tasks (query caching, report generation). They also need autoscaling:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: superset-worker-hpa
  namespace: superset
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: superset-worker
  minReplicas: 2
  maxReplicas: 8
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 75

Vertical Pod Autoscaler (VPA)

VPA automatically adjusts CPU and memory requests based on actual usage. This is useful for right-sizing resource requests:

helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install vpa fairwinds-stable/vpa --namespace kube-system

Then create a VPA recommendation:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: superset-vpa
  namespace: superset
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: superset
  updatePolicy:
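    updateMode: "Off"  # Recommendation-only; "Auto" lets VPA evict pods to apply changes
  resourcePolicy:
    containerPolicies:
    - containerName: "superset"
      minAllowed:
        cpu: 100m
        memory: 256Mi
      maxAllowed:
        cpu: 4
        memory: 8Gi

In "Off" mode VPA only publishes recommendations, which you can read with kubectl describe vpa superset-vpa -n superset and copy into values.yaml. In "Auto" mode it evicts and restarts pods with updated requests; avoid that on a Deployment whose HPA scales on CPU or memory, because the two controllers react to the same signal.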

Node Autoscaling

Ensure your cluster can scale worker nodes to accommodate pod scaling. Enable cluster autoscaler:

AWS EKS:

helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --set autoDiscovery.clusterName=my-cluster \
  --set awsRegion=us-east-1

Azure AKS:

az aks update --enable-cluster-autoscaler \
  --min-count 3 --max-count 10 \
  --resource-group my-rg --name my-cluster

Production Operational Habits

Deploying Superset is the beginning, not the end. Production systems require discipline, monitoring, and incident response.

Daily Health Checks

Every morning, verify:

Pod status: kubectl get pods -n superset — all pods should be Running.
Ingress accessibility: Curl the Superset URL and verify the login page loads.
Database connectivity: Run a test query via the Superset UI.
Redis health: Check Redis is reachable and not evicting keys.
Disk usage: Verify persistent volumes are not near capacity.

Automate these checks with a Kubernetes CronJob:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: superset-health-check
  namespace: superset
spec:
  schedule: "0 8 * * *"  # 8 AM daily
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: superset
          containers:
          - name: health-check
            image: curlimages/curl:latest
            command:
            - /bin/sh
            - -c
            - |
              curl -f https://superset.example.com/health || exit 1  # /health is Superset's built-in liveness endpoint
              echo "Superset health check passed"
          restartPolicy: OnFailure
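A single curl can fail on a transient blip. If you prefer a check that retries before raising an alarm, the logic can be sketched as below. The function names, retry count, and delay are illustrative assumptions, and the URL is whatever health endpoint your deployment exposes.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if the request fails outright."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except (urllib.error.URLError, OSError):
        return 0


def is_healthy(
    url: str,
    attempts: int = 3,
    delay_seconds: float = 2.0,
    fetch: Callable[[str], int] = fetch_status,
) -> bool:
    """Return True as soon as one attempt answers with HTTP 200."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False


if __name__ == "__main__":
    ok = is_healthy("https://superset.example.com/health")
    raise SystemExit(0 if ok else 1)
```

The fetch parameter exists so the retry logic can be exercised without a network connection.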

Update and Patch Management

Superset ships new releases frequently. Plan updates for low-traffic periods:

Test in staging: Deploy the new Superset version to a staging cluster first.
Run database migrations: superset db upgrade may add new tables or columns.
Protect availability: Use a PodDisruptionBudget so that node drains and other voluntary disruptions never take the web tier below two pods:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: superset-pdb
  namespace: superset
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: superset

Perform a rolling update: Helm will automatically perform a rolling update:

helm upgrade superset superset/superset \
  -f custom-values.yaml \
  --namespace superset \
  --wait

Verify the update: Check that all pods are running and the UI is responsive.

Capacity Planning

Track usage over time and plan for growth:

Concurrent users: Monitor active sessions via Superset’s admin panel.
Query volume: Track queries per minute and average query duration.
Data volume: Monitor the size of cached results and metadata database.
Storage growth: Project when persistent volumes will reach 80% capacity.

For platform development in Canberra and San Francisco, we typically plan for 50% annual growth in query volume and data size.
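The 80% threshold and the 50% annual growth figure can be combined into a simple projection. This sketch assumes compound growth at a constant annual rate, which is a simplification; the function name and sample sizes are illustrative.

```python
import math


def months_until_threshold(
    used_gib: float,
    capacity_gib: float,
    annual_growth: float = 0.5,
    threshold: float = 0.8,
) -> float:
    """Months until usage reaches threshold * capacity under compound growth."""
    if used_gib <= 0 or annual_growth <= 0:
        raise ValueError("used_gib and annual_growth must be positive")
    limit = capacity_gib * threshold
    if used_gib >= limit:
        return 0.0
    years = math.log(limit / used_gib) / math.log(1.0 + annual_growth)
    return years * 12.0


if __name__ == "__main__":
    # A 10 GiB volume that currently holds 4 GiB
    print(round(months_until_threshold(4, 10), 1))
```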

Disaster Recovery

Prepare for failures:

Metadata database backup: Automate daily backups to S3 or object storage (see Database Backup and Recovery above).
Restore procedure: Document how to restore from backup and test monthly.
Disaster recovery drill: Simulate a complete Superset failure (delete the namespace) and verify you can restore within 1 hour.
RTO and RPO targets: Define Recovery Time Objective (RTO, e.g., 4 hours) and Recovery Point Objective (RPO, e.g., 24 hours of data loss).

Example disaster recovery procedure:

# 1. Delete the failed Superset namespace
kubectl delete namespace superset

# 2. Restore the PostgreSQL database from backup
aws s3 cp s3://superset-backups/latest.sql.gz - | gunzip | psql -h new-db-host -U superset -d superset

# 3. Reinstall Superset
helm install superset superset/superset \
  -f custom-values.yaml \
  --namespace superset \
  --create-namespace

# 4. Verify the UI is accessible and dashboards are restored
curl https://superset.example.com

Monitoring, Logging, and Observability

You cannot operate what you cannot see. Implement comprehensive monitoring, logging, and tracing.

Prometheus Metrics

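Superset does not serve a native Prometheus endpoint; it emits StatsD metrics through its STATS_LOGGER setting, which are usually converted by a StatsD exporter. Assuming such an exporter publishes /metrics on a Service port named metrics, scrape it with a Prometheus Operator ServiceMonitor: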

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: superset
  namespace: superset
spec:
  selector:
    matchLabels:
      app: superset
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics

Key metrics to alert on (the superset_ names are illustrative and depend on how your exporter maps StatsD metrics):

superset_request_duration_seconds: Histogram of request latency.
superset_database_query_duration_seconds: Time spent executing database queries.
superset_cache_hits: Count of cache hits and misses.
process_resident_memory_bytes: Memory usage per pod.
process_cpu_seconds_total: CPU usage per pod.

Create PrometheusRule alerts:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: superset-alerts
  namespace: superset
spec:
  groups:
  - name: superset
    interval: 30s
    rules:
    - alert: SupersetHighErrorRate
      expr: rate(superset_request_errors_total[5m]) > 0.05
      for: 5m
      annotations:
        summary: "Superset error rate > 0.05 errors/second"
    - alert: SupersetHighMemory
      expr: process_resident_memory_bytes > 3.5e9
      for: 5m
      annotations:
        summary: "Superset pod memory > 3.5 GB"
    - alert: SupersetSlowQueries
      expr: histogram_quantile(0.95, sum(rate(superset_database_query_duration_seconds_bucket[5m])) by (le)) > 10
      for: 5m
      annotations:
        summary: "95th percentile query duration > 10 seconds"

Logging with ELK or Loki

Collect Superset logs in a centralised logging system:

Using Loki (lightweight, Kubernetes-native):

helm repo add grafana https://grafana.github.io/helm-charts
helm install loki grafana/loki-stack \
  --namespace monitoring \
  --create-namespace

Configure Superset to log to stdout (the Helm chart does this by default), and Loki will scrape logs from pod output.

Using ELK (Elasticsearch, Logstash, Kibana):

helm repo add elastic https://helm.elastic.co
helm install elasticsearch elastic/elasticsearch --namespace elk --create-namespace
helm install kibana elastic/kibana --namespace elk

Then configure Logstash to parse Superset logs and ship them to Elasticsearch.

Distributed Tracing with Jaeger

Trace requests across Superset, Redis, and PostgreSQL:

helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
helm install jaeger jaegertracing/jaeger --namespace monitoring

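Superset has no built-in tracing toggle. One approach is to instrument the Flask app with OpenTelemetry from superset_config.py (for example through FLASK_APP_MUTATOR) and pass the collector address in through environment variables. The variable names below are illustrative and take effect only if your configuration code reads them: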

env:
  SUPERSET_TRACING_ENABLED: "true"
  SUPERSET_TRACING_JAEGER_HOST: jaeger-agent.monitoring.svc.cluster.local
  SUPERSET_TRACING_JAEGER_PORT: "6831"

Then query Jaeger UI to see end-to-end latency breakdowns.

Troubleshooting Common Deployment Issues

Even well-architected deployments encounter issues. Here are the most common problems and solutions.

Pod Stuck in CrashLoopBackOff

Symptom: Superset pods restart repeatedly.

Diagnosis:

kubectl logs -f deployment/superset -n superset
kubectl describe pod <pod-name> -n superset

Common causes and fixes:

Database connection failed: Verify DATABASE_URL is correct and PostgreSQL is reachable.

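# Single quotes make the variable expand inside the pod, not in your local shell.
# Requires psql in the container image.
kubectl exec -it deployment/superset -n superset -- \
  sh -c 'psql "$DATABASE_URL" -c "SELECT 1"'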

Redis unavailable: Verify Redis is running and password is correct.

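# Single quotes make the variables expand inside the pod, not in your local shell.
# Requires redis-cli in the container image.
kubectl exec -it deployment/superset -n superset -- \
  sh -c 'redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" ping'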

Insufficient memory: Increase memory request/limit in values.yaml.
Secret missing: Verify all required Secrets exist:
```
kubectl get secrets -n superset
```
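
The Superset image may not include psql or redis-cli, in which case the exec checks above fail for the wrong reason. As a fallback, here is a minimal sketch of a TCP reachability probe that uses only the Python standard library, which the Superset container does have. The function name `can_connect` is hypothetical. A successful connection shows that the network path is open, not that the credentials are valid.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

From a `python` shell inside the pod, a call such as `can_connect(os.environ["REDIS_HOST"], int(os.environ["REDIS_PORT"]))` (after `import os`) separates a network or DNS problem from a wrong password.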

Slow Queries and Timeouts

Symptom: Dashboards load slowly or time out.

Diagnosis:

Check Superset logs for slow query warnings:

kubectl logs deployment/superset -n superset | grep -i "slow"

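Query PostgreSQL directly to identify slow queries (this requires the pg_stat_statements extension; mean_exec_time is the column name from PostgreSQL 13 onward):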

SELECT query, mean_exec_time, calls FROM pg_stat_statements 
ORDER BY mean_exec_time DESC LIMIT 5;

Check Redis for evictions:

kubectl exec -it <redis-pod> -n superset -- redis-cli INFO stats | grep evicted

Fixes:

Add database indexes: Superset’s ORM queries often benefit from indexes on dashboard, datasource, and user tables.
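Increase query timeout: Set SQLLAB_TIMEOUT = 300 (seconds) in superset_config.py, and raise SUPERSET_WEBSERVER_TIMEOUT to at least the same value so the web server does not cut the request off first.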
Enable query caching: Ensure Redis is configured and cache keys are not being evicted.
Reduce query scope: Encourage users to filter by date range or use materialized views.
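
As a starting point for the timeout and caching fixes, the following is a sketch of a superset_config.py fragment. It uses the setting names that Superset's config module defines as far as we are aware (SQLLAB_TIMEOUT, SUPERSET_WEBSERVER_TIMEOUT, CACHE_CONFIG, DATA_CACHE_CONFIG) and assumes REDIS_HOST and REDIS_PORT are present in the pod environment. The fallback host name, the Redis database number, the key prefixes, the cache lifetimes, and the `redis_cache` helper are all placeholders to adapt.

```python
import os

# Redis location comes from the pod environment; the fallbacks are placeholders.
REDIS_HOST = os.environ.get("REDIS_HOST", "superset-redis-headless")
REDIS_PORT = os.environ.get("REDIS_PORT", "6379")
REDIS_CACHE_URL = f"redis://{REDIS_HOST}:{REDIS_PORT}/1"

# Let SQL Lab queries run for up to five minutes.
SQLLAB_TIMEOUT = 300
# Keep the web server timeout at least as long, or requests are cut off first.
SUPERSET_WEBSERVER_TIMEOUT = 300


def redis_cache(prefix: str, ttl_seconds: int) -> dict:
    """Build a Flask-Caching config dict for one Redis-backed cache."""
    return {
        "CACHE_TYPE": "RedisCache",
        "CACHE_DEFAULT_TIMEOUT": ttl_seconds,
        "CACHE_KEY_PREFIX": prefix,
        "CACHE_REDIS_URL": REDIS_CACHE_URL,
    }


# Metadata cache.
CACHE_CONFIG = redis_cache("superset_meta_", 300)
# Chart data cache (results of the queries behind dashboards).
DATA_CACHE_CONFIG = redis_cache("superset_data_", 3600)
```

Distinct key prefixes keep the two caches from colliding in the same Redis database. If Redis requires a password, it has to be included in the URL.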

Out of Memory (OOM) Kills

Symptom: Superset pods are killed with OOMKilled status.

Diagnosis:

kubectl describe pod <pod-name> -n superset | grep OOMKilled
kubectl top pods -n superset  # Current memory usage
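
When comparing kubectl top output against the configured limit, it helps to convert both quantities to bytes. A rough sketch with hypothetical helper names follows. It handles only the binary and decimal suffixes commonly seen in memory values and does not cover the full Kubernetes quantity grammar.

```python
_UNITS = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def memory_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '850Mi' or '2Gi' to bytes."""
    # Try two-letter suffixes first so 'Mi' is not mistaken for 'M'.
    for suffix in sorted(_UNITS, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * _UNITS[suffix])
    return int(quantity)  # plain byte count


def memory_headroom(usage: str, limit: str) -> float:
    """Fraction of the limit still free; values near 0 mean an OOM kill is close."""
    return 1 - memory_bytes(usage) / memory_bytes(limit)
```

For instance, `memory_headroom("512Mi", "1Gi")` evaluates to 0.5, meaning half of the limit is still free.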

Fixes:

Increase memory limit: Update resources.limits.memory in values.yaml.
Reduce result set size: Set SQL_MAX_ROW = 10000 in superset_config.py to cap the number of rows any query returns, so massive result sets are never loaded into memory.
Offload large results: Run SQL Lab queries asynchronously on the Celery workers with a results backend configured, so large results are stored in Redis or S3 rather than held in the web pods' memory.
Use VPA: Let VPA recommend appropriate memory limits based on actual usage.
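
The row-limit fix can be sketched as a superset_config.py fragment. SQL_MAX_ROW, DEFAULT_SQLLAB_LIMIT, and ROW_LIMIT are the setting names in Superset's config module as far as we are aware. The values are illustrative, and the final consistency check is our own addition rather than anything Superset requires.

```python
# Maximum number of rows returned for any analytical database query.
SQL_MAX_ROW = 10_000
# Limit applied in SQL Lab when the user does not choose one.
DEFAULT_SQLLAB_LIMIT = 1_000
# Default row limit for chart queries.
ROW_LIMIT = 5_000

# Illustrative sanity check: defaults should not exceed the hard ceiling.
if max(DEFAULT_SQLLAB_LIMIT, ROW_LIMIT) > SQL_MAX_ROW:
    raise ValueError("default row limits must not exceed SQL_MAX_ROW")
```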

Persistent Volume Claims Stuck in Pending

Symptom: PVCs are not being provisioned.

Diagnosis:

kubectl describe pvc superset-home -n superset
kubectl get storageclass

Fixes:

Verify StorageClass exists: Create one if missing (see Cluster Prerequisites section).
Check node capacity: Ensure nodes have available disk space.
Verify CSI driver: For EFS, EBS, or Azure Files, ensure the CSI driver is installed.

Ingress Not Accessible

Symptom: Cannot reach Superset via the ingress hostname.

Diagnosis:

kubectl get ingress -n superset
kubectl describe ingress superset -n superset
kubectl logs -n ingress-nginx deployment/ingress-nginx-controller

Fixes:

Verify DNS resolution: nslookup superset.example.com should resolve to the ingress IP.
Check TLS certificate: Verify the TLS secret exists and is valid.
Verify backend service: Ensure the Service is pointing to running pods.
```
kubectl get endpoints superset -n superset
```
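
To make the certificate check concrete, here is a small sketch using Python's standard ssl module. It fetches the certificate the ingress actually serves and reports how many days remain before it expires. The helper names `days_until_expiry` and `served_cert_not_after` are hypothetical, and superset.example.com is the placeholder hostname used above.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a certificate notAfter string such as
    'Jan  1 00:00:00 2030 GMT'. A negative value means already expired."""
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def served_cert_not_after(host: str, port: int = 443) -> str:
    """Return the notAfter field of the certificate served at host:port.

    Default verification is used, so an untrusted or mismatched certificate
    raises ssl.SSLCertVerificationError instead of returning a date.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

A call such as `days_until_expiry(served_cert_not_after("superset.example.com"), datetime.now(timezone.utc))` gives the remaining lifetime. A verification error at this step usually points at the TLS secret or the ingress host rule rather than at Superset.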

Next Steps and Ongoing Support

You now have a production-ready Superset deployment on Helm. The next phase is optimisation and integration.

Integration with Data Platforms

Superset is most powerful when connected to robust data warehouses. Consider integrating with:

ClickHouse: For time-series analytics at scale. Superset has native ClickHouse support.
Snowflake: For cloud-native data warehousing with independently scalable compute and storage.
BigQuery: For serverless analytics on Google Cloud.
Redshift: For AWS-native data warehousing.
Postgres: For self-hosted analytics (as in this deployment).

At PADISO, we often recommend a Superset + ClickHouse architecture for cost-efficient analytics at scale. This combination can replace per-seat BI tools and can reduce infrastructure costs by 60–80%.

Security Hardening

Before going to production, implement additional security measures:

Enable RBAC: Configure Superset’s role-based access control to restrict dashboard and datasource access by user role.
Enable row-level security (RLS): Prevent users from seeing data outside their scope.
Audit logging: Enable audit logs to track who accessed which dashboards and when.
API authentication: Require API keys for programmatic access.
SOC 2 / ISO 27001 compliance: If required, implement audit-readiness via Vanta or similar tools.

For teams pursuing compliance, PADISO offers a fixed-fee AI Quickstart Audit (AU$10K, 2 weeks) to assess your current state and define a roadmap to SOC 2 or ISO 27001 certification.

Scaling Beyond a Single Instance

As your analytics workload grows, consider:

Multi-region deployments: Deploy Superset in multiple regions for low-latency access and disaster recovery.
Federated query execution: Superset runs each chart against a single database, so to join across sources, place a federation engine such as Trino in front of them and connect Superset to it.
Embedding analytics: Embed Superset dashboards in your product using Superset’s embedded dashboard feature.
Custom plugins: Develop custom Superset plugins to add domain-specific visualisations.

For enterprises modernising their analytics stacks, PADISO provides platform engineering services across Australia, the United States, Canada, and New Zealand. We specialise in Superset + ClickHouse deployments that replace legacy BI tools and unlock real-time analytics.

Getting Help

If you encounter issues beyond this guide:

Apache Superset documentation: https://apache.github.io/superset/ — the authoritative reference.
Kubernetes documentation: https://kubernetes.io/docs/ — for cluster-level issues.
Helm documentation: https://helm.sh/docs/ — for chart management.
Community support: The Apache Superset Slack and GitHub discussions are active and helpful.
Professional support: PADISO offers fractional CTO and platform engineering services to teams deploying Superset at scale. Book a call to discuss your analytics architecture.

Summary

Apache Superset on Helm is a powerful, scalable analytics platform when deployed with discipline. This guide covered:

Architecture decisions: Single vs. multi-tenant, database choice, message queue, and results backend.
Cluster setup: Namespace, RBAC, and StorageClass configuration.
Helm installation: Custom values, chart installation, and rollout verification.
Networking: Service exposure, Ingress configuration, and network policies.
Storage: ConfigMaps, Secrets, PVCs, and backup strategies.
Secrets management: Kubernetes Secrets, rotation, and sealed secrets for GitOps.
Database backend: PostgreSQL configuration, connection pooling, and monitoring.
Autoscaling: HPA, VPA, and node autoscaling for dynamic workloads.
Operations: Health checks, updates, capacity planning, and disaster recovery.
Observability: Prometheus metrics, logging, and distributed tracing.
Troubleshooting: Common issues and solutions.

With this foundation, you can deploy Superset confidently, scale it reliably, and operate it securely. For teams seeking expert guidance, PADISO’s platform engineering services and CTO as a Service offerings provide the fractional leadership and co-build support to move fast and build right.

Want to talk through your situation?

Book a 30-minute call with Kevin (Founder/CEO). No pitch - direct advice on what to do next.

