Table of Contents
- Why Helm for Superset
- Pre-deployment Architecture Decisions
- Cluster Prerequisites and Namespace Setup
- Helm Chart Installation and Values Override
- Networking and Service Exposure
- Persistent Storage Configuration
- Secrets Management and Environment Variables
- Database Backend and Metadata Store
- Autoscaling and Resource Management
- Production Operational Habits
- Monitoring, Logging, and Observability
- Troubleshooting Common Deployment Issues
- Next Steps and Ongoing Support
Why Helm for Superset
Apache Superset is a modern, open-source data visualisation and business intelligence platform that powers analytics across thousands of organisations globally. When deployed on Kubernetes, Superset becomes a scalable, multi-tenant analytics engine capable of serving hundreds of concurrent users and queries. However, deploying Superset directly using raw Kubernetes manifests introduces complexity: you must manually orchestrate ConfigMaps, Secrets, StatefulSets, Services, and Persistent Volumes whilst managing interdependencies between the web application, worker processes, and backing databases.
Helm, the Kubernetes package manager, solves this by providing a templated, versioned deployment pattern. The official Apache Superset Helm chart abstracts away boilerplate, enforces best practices, and allows you to customise your deployment through a single values.yaml file rather than maintaining dozens of individual manifests.
For teams at PADISO working with startups and enterprises modernising their data platforms, Helm-based Superset deployments are the standard. They reduce time-to-production, improve reproducibility, and align with platform engineering disciplines that underpin SOC 2 and ISO 27001 compliance.
This guide walks through a production-ready Superset deployment on Helm, covering the decisions, configurations, and operational habits that keep analytics infrastructure healthy at scale. Whether you’re embedding Superset into a multi-tenant SaaS, building a standalone BI platform, or replacing per-seat BI tools, this reference pattern applies across financial services, insurance, retail, and government sectors.
Pre-deployment Architecture Decisions
Before installing Superset on Helm, you must make four foundational architecture decisions that shape your entire deployment.
Single-Tenant vs. Multi-Tenant Superset
Single-tenant Superset runs one isolated instance per customer or business unit. Multi-tenant Superset runs a single shared instance where databases, dashboards, and users are logically isolated via Superset’s RBAC and row-level security (RLS) features.
Single-tenant deployments are simpler operationally but expensive at scale: each instance consumes compute, storage, and database resources. Multi-tenant deployments are cost-efficient but require careful configuration of Superset’s authentication (LDAP, OAuth2, SAML), database connections, and RLS rules to prevent data leakage.
For platform development in Sydney and across Australia’s financial services and retail sectors, we typically recommend multi-tenant deployments with strong isolation guarantees. This reduces operational overhead and aligns with modern SaaS architecture.
Metadata Database: PostgreSQL or MySQL
Superset stores dashboards, users, permissions, and query metadata in a relational database called the metadata store. The Helm chart defaults to PostgreSQL, which is the recommended choice for production. PostgreSQL offers superior performance, better support for Superset’s ORM queries, and more robust backup and recovery tooling.
MySQL is supported but introduces subtle compatibility issues with Superset’s SQLAlchemy ORM, particularly around transaction isolation and JSON column handling. For production deployments, always use PostgreSQL.
Your metadata database must be highly available. For AWS deployments, use Amazon RDS with Multi-AZ failover. For on-premises or Kubernetes-native setups, consider a managed PostgreSQL service or a StatefulSet-based PostgreSQL operator like CloudNativePG.
Message Queue: Redis or RabbitMQ
Superset uses a message queue to distribute asynchronous tasks: query execution, cache warming, and report generation. The Helm chart supports both Redis and RabbitMQ, but Redis is the de facto standard because it is simpler, faster, and requires less operational overhead.
Redis must be highly available. Use a managed Redis service (AWS ElastiCache, Azure Cache for Redis) or a Kubernetes-native Redis operator. Single-node Redis in production is a single point of failure and will cause cascading outages when it restarts.
Results Backend: Redis, S3, or Filesystem
Superset caches query results in a results backend to avoid re-executing identical queries. For distributed deployments, the results backend must be shared across all Superset pods. Redis is the simplest choice, but for large result sets or long-term archival, S3-compatible object storage (AWS S3, MinIO, DigitalOcean Spaces) is preferred.
Filesystem-based results backends are only acceptable for single-node or development deployments. In Kubernetes, pods are ephemeral; storing results on local disk guarantees data loss when pods restart.
Cluster Prerequisites and Namespace Setup
Before deploying Superset, ensure your Kubernetes cluster meets baseline requirements.
Cluster Sizing
Superset is not lightweight. A production deployment requires:
- Control plane: Standard Kubernetes control plane (AWS EKS, Azure AKS, Google GKE all provide this).
- Worker nodes: At least 3 nodes with 4 CPU and 8 GB RAM per node. For high-concurrency deployments (100+ simultaneous users), scale to 6+ nodes with 8 CPU and 16 GB RAM each.
- Persistent storage: A StorageClass that supports ReadWriteOnce (RWO) volumes for the metadata database and ReadWriteMany (RWX) for shared caches if using NFS.
- Networking: An Ingress controller for external access, and a CNI plugin that enforces NetworkPolicy for pod-to-pod communication.
Namespace and RBAC
Create a dedicated namespace for Superset to isolate it from other workloads and simplify RBAC:
kubectl create namespace superset
kubectl label namespace superset environment=production
Create a ServiceAccount with minimal permissions:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: superset
  namespace: superset
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: superset
  namespace: superset
rules:
  - apiGroups: [""]
    resources: ["configmaps", "secrets"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: superset
  namespace: superset
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: superset
subjects:
  - kind: ServiceAccount
    name: superset
    namespace: superset
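This ServiceAccount can read ConfigMaps and Secrets in the superset namespace through the Kubernetes API but cannot modify cluster resources. Mounting ConfigMaps and Secrets as volumes or environment variables does not require these permissions, so if nothing in the pod reads them through the API, the Role can be reduced further.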
StorageClass Configuration
Verify that your cluster has a default StorageClass:
kubectl get storageclass
If none exists, create one. For AWS EKS:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000"
  throughput: "125"
allowVolumeExpansion: true
For platform development across Canada and the United States, cloud-native storage classes (EBS, Azure Disk, GCE Persistent Disk) are standard. For on-premises deployments, use a CSI driver that integrates with your storage array (NetApp Trident, Pure FlashBlade, Portworx).
Helm Chart Installation and Values Override
The Apache Superset Helm chart is maintained in the official Apache Superset repository. The chart is also listed on Artifact Hub, and the container image it deploys is published on Docker Hub.
Adding the Helm Repository
helm repo add superset https://apache.github.io/superset
helm repo update
Verify the chart is available:
helm search repo superset
You should see output like:
NAME CHART VERSION APP VERSION
superset/superset 0.X.X 2.1.X
Creating a Custom values.yaml
The canonical values.yaml file is extensive. Rather than modifying it directly, create a custom override file that contains only the settings you need to change:
# custom-values.yaml
image:
  repository: apache/superset
  tag: "2.1.3"
  pullPolicy: IfNotPresent
replicaCount: 3
resources:
  limits:
    cpu: 2
    memory: 4Gi
  requests:
    cpu: 1
    memory: 2Gi
postgresql:
  enabled: false # Use external PostgreSQL
externalDatabase:
  type: postgresql
  host: superset-metadata.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com # Placeholder RDS endpoint
  port: 5432
  user: superset
  password: "${SUPERSET_DB_PASSWORD}" # Placeholder: Helm does not expand ${...}; supply the value from a Secret
  database: superset
redis:
  enabled: false # Use external Redis
externalRedis:
  host: superset-cache.c9akciq32.ng.0001.use1.cache.amazonaws.com
  port: 6379
  password: "${SUPERSET_REDIS_PASSWORD}" # Placeholder, as above
ingress:
  enabled: true
  ingressClassName: nginx
  hosts:
    - host: superset.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: superset-tls
      hosts:
        - superset.example.com
env:
  SUPERSET_ENV: production
  SUPERSET_LOAD_EXAMPLES: "false"
  SUPERSET_SECRET_KEY: "${SUPERSET_SECRET_KEY}" # Placeholder: supply the value from a Secret
This configuration:
- Uses external PostgreSQL and Redis (managed services or separate Kubernetes deployments).
- Sets 3 Superset web replicas for high availability.
- Allocates 1–2 CPU and 2–4 GB RAM per pod (adjust based on your workload).
- Configures HTTPS ingress with a TLS certificate.
- Disables example data to reduce initial setup time.
Installing the Chart
Before installing, create a Secret containing sensitive values:
kubectl create secret generic superset-secrets \
--from-literal=db-password='your-secure-password' \
--from-literal=redis-password='your-redis-password' \
--from-literal=secret-key='your-secret-key' \
-n superset
Then install the chart:
helm install superset superset/superset \
-f custom-values.yaml \
--namespace superset \
--create-namespace
Monitor the rollout:
kubectl rollout status deployment/superset -n superset
kubectl get pods -n superset
Once all pods are running, verify the Superset web UI is accessible via your ingress hostname.
Networking and Service Exposure
Superset must be accessible to end users and to internal services (worker pods, monitoring agents). The Helm chart exposes Superset via a Kubernetes Service and an Ingress resource.
Service Type Configuration
The chart creates a ClusterIP Service by default, which is correct for most deployments. The Kubernetes documentation on Services explains the trade-offs:
- ClusterIP: Accessible only within the cluster. Use an Ingress controller to expose Superset to the internet.
- NodePort: Exposes Superset on a high-numbered port on each node. Suitable for development, not production.
- LoadBalancer: Provisions a cloud load balancer (AWS ELB, Azure LB, GCP LB). Use this if you do not have an Ingress controller.
For production, ClusterIP + Ingress is the standard pattern:
service:
  type: ClusterIP
  port: 8088
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8088"
    prometheus.io/path: "/metrics"
Ingress Configuration for HTTPS
Expose Superset over HTTPS with a TLS certificate:
ingress:
  enabled: true
  ingressClassName: nginx # Or "alb" for AWS ALB, "gce" for GCP
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/limit-rps: "100" # Requests per second per client IP
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
  hosts:
    - host: superset.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: superset-tls
      hosts:
        - superset.example.com
If using cert-manager for automatic TLS certificate provisioning:
helm repo add jetstack https://charts.jetstack.io
helm install cert-manager jetstack/cert-manager --namespace cert-manager --create-namespace
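Then, once cert-manager's CRDs are installed (the chart installs them only when its CRD option is enabled), create a ClusterIssuer: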
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: your-email@example.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
      - http01:
          ingress:
            class: nginx
Network Policies for Pod-to-Pod Communication
In a production cluster, implement network policies to restrict traffic between pods:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: superset-allow-internal
  namespace: superset
spec:
  podSelector:
    matchLabels:
      app: superset
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: superset
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx # Label set automatically on every namespace
      ports:
        - protocol: TCP
          port: 8088
  egress:
    # Matches in-cluster Redis/PostgreSQL pods labelled app: superset;
    # for managed services outside the cluster, use an ipBlock rule instead.
    - to:
        - podSelector:
            matchLabels:
              app: superset
      ports:
        - protocol: TCP
          port: 6379 # Redis
        - protocol: TCP
          port: 5432 # PostgreSQL
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: TCP
          port: 53 # DNS
        - protocol: UDP
          port: 53
This policy allows ingress traffic from the NGINX Ingress controller and permits egress to Redis, PostgreSQL, and DNS services.
Persistent Storage Configuration
Superset requires persistent storage for user uploads (CSV files, database drivers), cache data, and metadata. The Helm chart manages this via PersistentVolumeClaims (PVCs).
ConfigMap and Secret Volumes
Superset reads configuration from environment variables and mounted files. Store sensitive configuration in Kubernetes Secrets:
apiVersion: v1
kind: Secret
metadata:
  name: superset-config
  namespace: superset
type: Opaque
stringData:
  superset_config.py: |
    import os
    from cachelib.redis import RedisCache

    SECRET_KEY = os.environ.get('SUPERSET_SECRET_KEY')
    SQLALCHEMY_DATABASE_URI = os.environ.get('DATABASE_URL')
    REDIS_HOST = os.environ.get('REDIS_HOST')
    REDIS_PORT = int(os.environ.get('REDIS_PORT', 6379))
    REDIS_PASSWORD = os.environ.get('REDIS_PASSWORD')
    CACHE_CONFIG = {
        'CACHE_TYPE': 'RedisCache',
        'CACHE_REDIS_URL': f'redis://:{REDIS_PASSWORD}@{REDIS_HOST}:{REDIS_PORT}/0',
        'CACHE_DEFAULT_TIMEOUT': 300,
    }
    RESULTS_BACKEND_USE_MSGPACK = True
    # RESULTS_BACKEND expects a cache object rather than a dict of settings
    RESULTS_BACKEND = RedisCache(
        host=REDIS_HOST,
        port=REDIS_PORT,
        password=REDIS_PASSWORD,
        db=1,
        key_prefix='superset_results',
    )
Mount this Secret as a volume in the Superset deployment:
volumes:
  - name: superset-config
    secret:
      secretName: superset-config
      items:
        - key: superset_config.py
          path: superset_config.py
volumeMounts:
  - name: superset-config
    # /app/pythonpath is on PYTHONPATH in the official image, so the file is picked up from there
    mountPath: /app/pythonpath/superset_config.py
    subPath: superset_config.py
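The Redis URLs in superset_config.py above are built with plain f-strings, which produce an unparseable URL if the password contains characters such as @, / or :. A minimal sketch of a safer helper follows, assuming the same REDIS_HOST, REDIS_PORT and REDIS_PASSWORD variables; the function name build_redis_url is hypothetical and not part of Superset.

```python
from typing import Optional
from urllib.parse import quote


def build_redis_url(host: str, port: int, password: Optional[str], db: int) -> str:
    """Build a redis:// URL, percent-encoding the password when one is set.

    build_redis_url is an illustrative helper, not a Superset API.
    """
    # Omit the auth segment entirely when no password is configured,
    # rather than embedding the literal string "None".
    auth = f":{quote(password, safe='')}@" if password else ""
    return f"redis://{auth}{host}:{port}/{db}"


# Example use inside superset_config.py (values come from the environment):
#   CACHE_CONFIG["CACHE_REDIS_URL"] = build_redis_url(REDIS_HOST, REDIS_PORT, REDIS_PASSWORD, 0)
```

Encoding the password keeps the URL valid for any credential that a secret manager generates, which matters once rotation is automated.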
PersistentVolume for Uploads and Cache
Superset stores user-uploaded files in /app/superset_home. For multi-pod deployments, this directory must be shared across all pods via a ReadWriteMany (RWX) PersistentVolume:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: superset-home
  namespace: superset
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs # Or your RWX storage class
  resources:
    requests:
      storage: 10Gi
Then mount it in the Superset pods:
volumeMounts:
  - name: superset-home
    mountPath: /app/superset_home
volumes:
  - name: superset-home
    persistentVolumeClaim:
      claimName: superset-home
For AWS EKS, use EFS (Elastic File System) with the EFS CSI driver. For Azure AKS, use Azure Files. For GKE, use Filestore.
Database Backup and Recovery
Your PostgreSQL metadata database is the source of truth for all Superset configuration. Implement automated backups:
- AWS RDS: Enable automated backups with a 30-day retention window and cross-region snapshots.
- Azure Database for PostgreSQL: Enable geo-redundant backups.
- On-premises: Use pg_dump in a CronJob to back up the database to S3 or object storage daily.
Example CronJob for PostgreSQL backup:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: superset-db-backup
  namespace: superset
spec:
  schedule: "0 2 * * *" # 2 AM daily
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: superset
          containers:
            - name: backup
              # The job needs both pg_dump and the AWS CLI; the stock postgres image
              # ships without the AWS CLI, so substitute an image that bundles both.
              image: postgres:15
              command:
                - /bin/bash
                - -c
                - |
                  set -o pipefail  # Fail the job if pg_dump fails, not only the upload
                  pg_dump -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" | gzip | \
                    aws s3 cp - "s3://superset-backups/$(date +%Y%m%d-%H%M%S).sql.gz" --sse AES256
              env:
                - name: DB_HOST
                  value: superset-metadata.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com # Placeholder RDS endpoint
                - name: DB_USER
                  value: superset # superset-secrets as created earlier has no db-user key
                - name: DB_NAME
                  value: superset
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: superset-secrets
                      key: db-password
          restartPolicy: OnFailure
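Because the CronJob names each dump with a zero-padded timestamp (date +%Y%m%d-%H%M%S), sorting the object keys as strings also sorts them chronologically. The sketch below illustrates that property; backup_key and latest_backup are hypothetical helper names, and the list of keys is assumed to come from whatever S3 listing tool you already use.

```python
from datetime import datetime
from typing import Iterable, Optional


def backup_key(now: datetime) -> str:
    """Return the object key the backup job would write at time `now`."""
    return now.strftime("%Y%m%d-%H%M%S") + ".sql.gz"


def latest_backup(keys: Iterable[str]) -> Optional[str]:
    """Pick the most recent backup from a list of object keys.

    Fixed-width, zero-padded timestamps mean the lexicographic maximum
    is also the newest backup. Returns None if no backup key is present.
    """
    candidates = [key for key in keys if key.endswith(".sql.gz")]
    return max(candidates) if candidates else None
```

A restore script can use the selected key in place of a fixed object name, so it always picks up the newest dump in the bucket.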
Secrets Management and Environment Variables
Superset requires sensitive configuration: database passwords, Redis credentials, API keys, and encryption keys. Never commit these to version control.
Using Kubernetes Secrets
Store all sensitive data in Kubernetes Secrets:
kubectl create secret generic superset-secrets \
--from-literal=DATABASE_URL='postgresql://superset:password@db-host:5432/superset' \
--from-literal=REDIS_URL='redis://:password@redis-host:6379/0' \
--from-literal=SUPERSET_SECRET_KEY='your-256-bit-secret-key' \
--from-literal=MAPBOX_API_KEY='pk.xxxxx' \
-n superset
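The SUPERSET_SECRET_KEY placeholder above should be replaced with a long random value. One way to generate it, sketched here with Python's standard library, is to base64-encode cryptographically secure random bytes; generate_secret_key is a hypothetical helper name, and the 42-byte default (336 bits, above the 256-bit minimum mentioned in the command) is an assumption rather than a Superset requirement.

```python
import base64
import secrets


def generate_secret_key(num_bytes: int = 42) -> str:
    """Return a base64-encoded random key suitable for SUPERSET_SECRET_KEY.

    Uses the operating system's CSPRNG via the `secrets` module.
    """
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


# Example: print a fresh key to paste into `kubectl create secret`.
#   print(generate_secret_key())
```

Generate the key once and store it in your secret manager; changing it later invalidates existing sessions and affects values Superset has encrypted with the old key.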
Inject these into the Superset deployment:
env:
  - name: DATABASE_URL
    valueFrom:
      secretKeyRef:
        name: superset-secrets
        key: DATABASE_URL
  - name: REDIS_URL
    valueFrom:
      secretKeyRef:
        name: superset-secrets
        key: REDIS_URL
  - name: SUPERSET_SECRET_KEY
    valueFrom:
      secretKeyRef:
        name: superset-secrets
        key: SUPERSET_SECRET_KEY
  - name: MAPBOX_API_KEY
    valueFrom:
      secretKeyRef:
        name: superset-secrets
        key: MAPBOX_API_KEY
Rotating Secrets
Secrets should be rotated periodically (every 90 days for passwords, immediately if compromised). Use a secret management tool like HashiCorp Vault or AWS Secrets Manager to automate rotation:
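# Update a secret in Kubernetes (the value must be the full connection URL, base64-encoded on one line)
kubectl patch secret superset-secrets -p \
  '{"data":{"DATABASE_URL":"'"$(echo -n 'postgresql://superset:new-password@db-host:5432/superset' | base64 | tr -d '\n')"'"}}' \
  -n superset
# Restart Superset pods to pick up the new secret
kubectl rollout restart deployment/superset -n superset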
External Secret Management with Sealed Secrets
For GitOps workflows, encrypt secrets using Sealed Secrets:
helm repo add sealed-secrets https://bitnami-labs.github.io/sealed-secrets
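# fullnameOverride gives the controller the name that kubeseal looks for by default
helm install sealed-secrets sealed-secrets/sealed-secrets -n kube-system \
  --set-string fullnameOverride=sealed-secrets-controller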
Then create and seal a secret:
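# Sealed secrets are bound to a namespace by default, so set it when generating the manifest
echo -n 'my-password' | kubectl create secret generic superset-secrets \
  --namespace superset \
  --dry-run=client --from-file=db-password=/dev/stdin \
  -o yaml | kubeseal -o yaml > sealed-secret.yaml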
kubectl apply -f sealed-secret.yaml
Now the sealed secret can be safely committed to Git and automatically decrypted by the Sealed Secrets controller.
Database Backend and Metadata Store
Superset’s metadata store is the backbone of the system. It must be highly available, backed up regularly, and monitored for performance.
PostgreSQL Configuration
For production, use a managed PostgreSQL service:
- AWS RDS: Multi-AZ deployment with automated failover, automated backups, and read replicas.
- Azure Database for PostgreSQL: Flexible Server with zone-redundant high availability.
- Google Cloud SQL: High-availability configuration with automatic failover.
- On-premises: PostgreSQL 13+ with streaming replication and automated failover via Patroni (which relies on a consensus store such as etcd).
Configure Superset to connect to PostgreSQL:
externalDatabase:
  type: postgresql
  host: superset-metadata.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com # Placeholder RDS endpoint
  port: 5432
  user: superset
  password: "${SUPERSET_DB_PASSWORD}" # Placeholder: supply the value from a Secret
  database: superset
  sslMode: require
  sslRootCertificate: /etc/ssl/certs/ca-bundle.crt
Enable SSL/TLS for database connections to encrypt credentials in transit.
Database Initialisation
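Before serving traffic, initialise the metadata database. The chart's init job normally handles this; to run the steps manually from a running pod:
kubectl exec -it deployment/superset -n superset -- \
  superset db upgrade
kubectl exec -it deployment/superset -n superset -- \
  superset init
The first command creates or migrates all required tables and indexes, and the second creates the default roles and permissions. For production deployments, run them once during initial setup and again after each version upgrade, not on every pod restart.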
Connection Pooling
Superset uses SQLAlchemy to connect to PostgreSQL. Configure connection pooling to avoid exhausting database connections. Pool settings are not read from standalone environment variables; set them in superset_config.py through SQLALCHEMY_ENGINE_OPTIONS:
SQLALCHEMY_ENGINE_OPTIONS = {
    "pool_size": 20,
    "max_overflow": 10,
    "pool_recycle": 3600,
    "pool_pre_ping": True,
}
These settings:
- Keep up to 20 pooled connections open per Superset process.
- Allow up to 10 additional connections under load.
- Recycle connections after 1 hour to prevent stale connections.
- Test connections before reusing them (pre-ping).
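SQLAlchemy pools are created per process, so the database sees the pool limits multiplied by every Superset process in the cluster. The arithmetic below is a rough sketch for checking that budget against PostgreSQL's max_connections; the function names and the processes_per_pod figure (for example, the number of web server worker processes) are assumptions to replace with your own values.

```python
def max_db_connections(replicas: int, processes_per_pod: int,
                       pool_size: int, max_overflow: int) -> int:
    """Worst-case connections opened by all Superset processes combined."""
    return replicas * processes_per_pod * (pool_size + max_overflow)


def within_budget(total: int, max_connections: int, headroom: float = 0.8) -> bool:
    """True if `total` stays under the given fraction of max_connections."""
    return total <= max_connections * headroom


# Example: 3 web pods, 4 processes each (assumed), pool of 20 plus 10 overflow.
#   max_db_connections(3, 4, 20, 10) -> 360
```

If the worst case exceeds the budget, lower pool_size, reduce processes per pod, or place a connection pooler such as PgBouncer in front of the database.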
Monitoring Database Performance
Monitor PostgreSQL for slow queries, connection count, and disk usage:
-- Slow queries
SELECT query, mean_exec_time, calls FROM pg_stat_statements
ORDER BY mean_exec_time DESC LIMIT 10;
-- Connection count
SELECT datname, count(*) FROM pg_stat_activity GROUP BY datname;
-- Disk usage
SELECT schemaname, tablename, pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename))
FROM pg_tables WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
Set up alerts in your monitoring system (Prometheus, DataDog, New Relic) for:
- Connections > 80% of max_connections.
- Query duration > 5 seconds.
- Disk usage > 80% of available space.
Autoscaling and Resource Management
Superset workloads are unpredictable: query load spikes during reporting periods, cache misses trigger expensive database queries, and user onboarding can double concurrent users overnight. Autoscaling ensures the system remains responsive without over-provisioning.
Horizontal Pod Autoscaler (HPA)
Autoscale Superset web pods based on CPU and memory usage:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: superset-hpa
  namespace: superset
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: superset
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 30
This HPA:
- Maintains 3–10 Superset pods.
- Scales up aggressively (double the replicas every 30 seconds) when CPU exceeds 70% or memory exceeds 80%.
- Scales down conservatively (reduce by 50% every 60 seconds) after a 5-minute stability window.
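The replica count the HPA aims for follows a simple rule: for each metric it computes ceil(currentReplicas × currentUtilization / targetUtilization), takes the largest proposal, and clamps it to the min/max bounds. The sketch below reproduces that core calculation for the targets above; it deliberately ignores the controller's tolerance band, stabilisation windows, and scaling policies, and desired_replicas is an illustrative name rather than a Kubernetes API.

```python
import math
from typing import Dict


def desired_replicas(current: int,
                     utilization: Dict[str, float],
                     targets: Dict[str, float],
                     min_replicas: int = 3,
                     max_replicas: int = 10) -> int:
    """Approximate the HPA's target replica count for utilisation metrics.

    `utilization` and `targets` map metric names (e.g. "cpu", "memory")
    to average utilisation percentages.
    """
    proposals = [
        math.ceil(current * utilization[name] / target)
        for name, target in targets.items()
    ]
    # The HPA acts on the metric that asks for the most replicas.
    return max(min_replicas, min(max_replicas, max(proposals)))


# Example: 3 pods at 90% CPU and 60% memory against 70% / 80% targets.
#   desired_replicas(3, {"cpu": 90, "memory": 60}, {"cpu": 70, "memory": 80}) -> 4
```

Running a few scenarios through this calculation helps when choosing minReplicas, maxReplicas, and targets before committing them to the manifest.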
Worker Pod Autoscaling
Superset workers execute asynchronous tasks (query caching, report generation). They also need autoscaling:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: superset-worker-hpa
  namespace: superset
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: superset-worker
  minReplicas: 2
  maxReplicas: 8
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75
Vertical Pod Autoscaler (VPA)
VPA automatically adjusts CPU and memory requests based on actual usage. This is useful for right-sizing resource requests:
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install vpa fairwinds-stable/vpa --namespace kube-system
Then create a VPA recommendation:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: superset-vpa
  namespace: superset
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: superset
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: "superset"
        minAllowed:
          cpu: 100m
          memory: 256Mi
        maxAllowed:
          cpu: 4
          memory: 8Gi
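VPA will recommend resource adjustments and, in "Auto" mode, evict and recreate pods with updated resource requests. Do not let VPA act on CPU or memory for a Deployment that an HPA already scales on those same metrics, because the two controllers work against each other; in that case set updateMode to "Off" and apply the recommendations manually.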
Node Autoscaling
Ensure your cluster can scale worker nodes to accommodate pod scaling. Enable cluster autoscaler:
AWS EKS:
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
--namespace kube-system \
--set autoDiscovery.clusterName=my-cluster \
--set awsRegion=us-east-1
Azure AKS:
az aks update --enable-cluster-autoscaler \
--min-count 3 --max-count 10 \
--resource-group my-rg --name my-cluster
Production Operational Habits
Deploying Superset is the beginning, not the end. Production systems require discipline, monitoring, and incident response.
Daily Health Checks
Every morning, verify:
- Pod status: Run kubectl get pods -n superset and confirm all pods are Running.
- Ingress accessibility: Curl the Superset URL and verify the login page loads.
- Database connectivity: Run a test query via the Superset UI.
- Redis health: Check Redis is reachable and not evicting keys.
- Disk usage: Verify persistent volumes are not near capacity.
Automate these checks with a Kubernetes CronJob:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: superset-health-check
  namespace: superset
spec:
  schedule: "0 8 * * *" # 8 AM daily
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: superset
          containers:
            - name: health-check
              image: curlimages/curl:latest
              command:
                - /bin/sh
                - -c
                - |
                  # /health is Superset's lightweight liveness endpoint
                  curl -f https://superset.example.com/health || exit 1
                  echo "Superset health check passed"
          restartPolicy: OnFailure
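If the daily check grows beyond a single curl call, the same probe can be written in Python so that it is easy to extend and unit test. This is a minimal sketch using only the standard library; check_health is a hypothetical name, and the opener parameter exists purely so that tests can substitute a fake for urllib.request.urlopen.

```python
import urllib.error
import urllib.request
from typing import Callable


def check_health(url: str,
                 opener: Callable = urllib.request.urlopen,
                 timeout: float = 5.0) -> bool:
    """Return True if `url` answers with HTTP 200 within `timeout` seconds."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, DNS failures, refused connections and timeouts.
        return False


# Example (exits non-zero on failure, like `curl -f`):
#   import sys
#   sys.exit(0 if check_health("https://superset.example.com/health") else 1)
```

Keeping the network call injectable means the pass/fail logic can be tested in CI without reaching a live Superset instance.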
Update and Patch Management
Superset releases updates monthly. Plan updates for low-traffic periods:
- Test in staging: Deploy the new Superset version to a staging cluster first.
- Run database migrations: superset db upgrade may add new tables or columns.
- Drain connections: Use pod disruption budgets to keep a minimum number of pods serving during node drains and other voluntary disruptions:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: superset-pdb
  namespace: superset
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: superset
- Perform a rolling update: Helm will automatically perform a rolling update:
helm upgrade superset superset/superset \
-f custom-values.yaml \
--namespace superset \
--wait
- Verify the update: Check that all pods are running and the UI is responsive.
Capacity Planning
Track usage over time and plan for growth:
- Concurrent users: Monitor active sessions via Superset’s admin panel.
- Query volume: Track queries per minute and average query duration.
- Data volume: Monitor the size of cached results and metadata database.
- Storage growth: Project when persistent volumes will reach 80% capacity.
For platform development in Canberra and San Francisco, we typically plan for 50% annual growth in query volume and data size.
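To turn a growth assumption into a date for the storage bullet above, compound growth can be solved for time: used × (1 + g)^t = threshold × capacity, so t = ln(threshold × capacity / used) / ln(1 + g). The sketch below applies that formula with the 50% annual growth and 80% threshold mentioned in this section as defaults; years_until_threshold is an illustrative name.

```python
import math


def years_until_threshold(used_gib: float, capacity_gib: float,
                          annual_growth: float = 0.5,
                          threshold: float = 0.8) -> float:
    """Years until usage reaches `threshold` of capacity under compound growth.

    Returns 0.0 if the threshold is already reached, and math.inf if usage
    is zero or not growing.
    """
    limit = capacity_gib * threshold
    if used_gib >= limit:
        return 0.0
    if used_gib <= 0 or annual_growth <= 0:
        return math.inf
    return math.log(limit / used_gib) / math.log(1 + annual_growth)


# Example: 4 GiB used on a 10 GiB volume, growing 50% per year.
#   years_until_threshold(4, 10) is roughly 1.7 years
```

Re-running the projection each quarter with measured usage shows whether the growth assumption still holds.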
Disaster Recovery
Prepare for failures:
- Metadata database backup: Automate daily backups to S3 or object storage (see Database Backup and Recovery above).
- Restore procedure: Document how to restore from backup and test monthly.
- Disaster recovery drill: Simulate a complete Superset failure (delete the namespace) and verify you can restore within 1 hour.
- RTO and RPO targets: Define Recovery Time Objective (RTO, e.g., 4 hours) and Recovery Point Objective (RPO, e.g., 24 hours of data loss).
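An RPO target is only meaningful if something checks it. Assuming backups are named with the timestamp format used by the backup CronJob (YYYYMMDD-HHMMSS.sql.gz) and that timestamps and the current time share one timezone, a small check like the following could run alongside the daily health checks; the function names are hypothetical.

```python
from datetime import datetime, timedelta


def backup_time(key: str) -> datetime:
    """Parse the timestamp out of a key such as '20240305-020000.sql.gz'."""
    return datetime.strptime(key.split(".")[0], "%Y%m%d-%H%M%S")


def rpo_breached(latest_key: str, now: datetime,
                 rpo: timedelta = timedelta(hours=24)) -> bool:
    """True if the newest backup is older than the RPO allows."""
    return now - backup_time(latest_key) > rpo


# Example: a backup from 02:00 yesterday checked at 03:00 today breaches a 24-hour RPO.
#   rpo_breached("20240305-020000.sql.gz", datetime(2024, 3, 6, 3, 0, 0)) -> True
```

Alerting on this condition catches a silently failing backup job before a restore is ever needed.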
Example disaster recovery procedure:
# 1. Delete the failed Superset namespace (this also deletes the Secrets stored in it)
kubectl delete namespace superset
# 2. Restore the PostgreSQL database from backup
#    (backups are named by timestamp; substitute the key of the dump you want for latest.sql.gz)
aws s3 cp s3://superset-backups/latest.sql.gz - | gunzip | psql -h new-db-host -U superset -d superset
# 3. Recreate the namespace Secrets, then reinstall Superset
helm install superset superset/superset \
  -f custom-values.yaml \
  --namespace superset \
  --create-namespace
# 4. Verify the UI is accessible and dashboards are restored
curl -f https://superset.example.com
Monitoring, Logging, and Observability
You cannot operate what you cannot see. Implement comprehensive monitoring, logging, and tracing.
Prometheus Metrics
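Superset does not serve a Prometheus /metrics endpoint out of the box; it emits StatsD metrics through its STATS_LOGGER setting, which are typically bridged to Prometheus with an exporter such as statsd-exporter. Assuming an exporter serves /metrics behind a Service port named metrics, scrape it with a ServiceMonitor (this requires the Prometheus Operator):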
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: superset
  namespace: superset
spec:
  selector:
    matchLabels:
      app: superset
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
Key metrics to alert on (the superset_* names are illustrative and depend on how your exporter maps Superset's metrics):
- superset_request_duration_seconds: Histogram of request latency.
- superset_database_query_duration_seconds: Time spent executing database queries.
- superset_cache_hits: Count of cache hits and misses.
- process_resident_memory_bytes: Memory usage per pod.
- process_cpu_seconds_total: CPU usage per pod.
Create PrometheusRule alerts:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: superset-alerts
  namespace: superset
spec:
  groups:
    - name: superset
      interval: 30s
      rules:
        - alert: SupersetHighErrorRate
          expr: rate(superset_request_errors_total[5m]) > 0.05
          for: 5m
          annotations:
            summary: "Superset error rate > 0.05 errors per second"
        - alert: SupersetHighMemory
          expr: process_resident_memory_bytes > 3.5e9
          for: 5m
          annotations:
            summary: "Superset pod memory > 3.5 GB"
        - alert: SupersetSlowQueries
          # histogram_quantile operates on the rate of the _bucket series, aggregated by le
          expr: histogram_quantile(0.95, sum(rate(superset_database_query_duration_seconds_bucket[5m])) by (le)) > 10
          for: 5m
          annotations:
            summary: "95th percentile query duration > 10 seconds"
Logging with ELK or Loki
Collect Superset logs in a centralised logging system:
Using Loki (lightweight, Kubernetes-native):
helm repo add grafana https://grafana.github.io/helm-charts
helm install loki grafana/loki-stack \
--namespace monitoring \
--create-namespace
Configure Superset to log to stdout (the Helm chart does this by default), and Loki will scrape logs from pod output.
Using ELK (Elasticsearch, Logstash, Kibana):
helm repo add elastic https://helm.elastic.co
helm install elasticsearch elastic/elasticsearch --namespace elk --create-namespace
helm install kibana elastic/kibana --namespace elk
Then configure Logstash to parse Superset logs and ship them to Elasticsearch.
Distributed Tracing with Jaeger
Trace requests across Superset, Redis, and PostgreSQL:
helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
helm install jaeger jaegertracing/jaeger --namespace monitoring
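Enable tracing in Superset. Superset has no built-in Jaeger switch, so the variable names below are illustrative and take effect only if your superset_config.py reads them and initialises a tracing client:
env:
  SUPERSET_TRACING_ENABLED: "true"
  SUPERSET_TRACING_JAEGER_HOST: jaeger-agent.monitoring.svc.cluster.local
  SUPERSET_TRACING_JAEGER_PORT: "6831"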
Then query Jaeger UI to see end-to-end latency breakdowns.
Troubleshooting Common Deployment Issues
Even well-architected deployments encounter issues. Here are the most common problems and solutions.
Pod Stuck in CrashLoopBackOff
Symptom: Superset pods restart repeatedly.
Diagnosis:
kubectl logs -f deployment/superset -n superset
kubectl describe pod <pod-name> -n superset
Common causes and fixes:
- Database connection failed: Verify DATABASE_URL is correct and PostgreSQL is reachable (requires the psql client in the image; sh -c makes the variable expand inside the pod rather than in your local shell).
kubectl exec -it deployment/superset -n superset -- \
  sh -c 'psql "$DATABASE_URL" -c "SELECT 1"'
- Redis unavailable: Verify Redis is running and password is correct (requires redis-cli in the image).
kubectl exec -it deployment/superset -n superset -- \
  sh -c 'redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" ping'
- Insufficient memory: Increase memory request/limit in values.yaml.
- Secret missing: Verify all required Secrets exist:
kubectl get secrets -n superset
Slow Queries and Timeouts
Symptom: Dashboards load slowly or time out.
Diagnosis:
- Check Superset logs for slow query warnings:
kubectl logs deployment/superset -n superset | grep "slow"
- Query PostgreSQL directly to identify slow queries:
SELECT query, mean_exec_time, calls FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 5;
- Check Redis for evictions:
kubectl exec -it <redis-pod> -- redis-cli INFO stats | grep evicted
Fixes:
- Add database indexes: Superset’s ORM queries often benefit from indexes on dashboard, datasource, and user tables.
- Increase query timeout: Set SQLLAB_TIMEOUT = 300 (seconds) in superset_config.py.
- Enable query caching: Ensure Redis is configured and cache keys are not being evicted.
- Reduce query scope: Encourage users to filter by date range or use materialized views.
Out of Memory (OOM) Kills
Symptom: Superset pods are killed with OOMKilled status.
Diagnosis:
kubectl describe pod <pod-name> -n superset | grep OOMKilled
kubectl top pods -n superset # Current memory usage
Fixes:
- Increase memory limit: Update resources.limits.memory in values.yaml.
- Reduce result set size: Set SQL_MAX_ROW = 10000 in superset_config.py to prevent loading massive result sets into memory.
- Enable result backend pagination: Configure Superset to stream large results rather than buffering them.
- Use VPA: Let VPA recommend appropriate memory limits based on actual usage.
Persistent Volume Claims Stuck in Pending
Symptom: PVCs are not being provisioned.
Diagnosis:
kubectl describe pvc superset-home -n superset
kubectl get storageclass
Fixes:
- Verify StorageClass exists: Create one if missing (see Cluster Prerequisites section).
- Check node capacity: Ensure nodes have available disk space.
- Verify CSI driver: For EFS, EBS, or Azure Files, ensure the CSI driver is installed.
Ingress Not Accessible
Symptom: Cannot reach Superset via the ingress hostname.
Diagnosis:
kubectl get ingress -n superset
kubectl describe ingress superset -n superset
kubectl logs -n ingress-nginx deployment/ingress-nginx-controller
Fixes:
- Verify DNS resolution: nslookup superset.example.com should resolve to the ingress IP.
- Check TLS certificate: Verify the TLS secret exists and is valid.
- Verify backend service: Ensure the Service is pointing to running pods.
kubectl get endpoints superset -n superset
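The endpoints check can be automated by inspecting the JSON form of the same object. The sketch below assumes its input is the output of kubectl get endpoints superset -n superset -o json and counts ready versus not-ready pod addresses; the function name endpoint_summary is hypothetical.

```python
import json


def endpoint_summary(endpoints_json: str) -> dict:
    """Count ready and not-ready pod addresses behind a Service."""
    # "subsets" is absent or null when no pods match the Service selector.
    subsets = json.loads(endpoints_json).get("subsets") or []
    ready = sum(len(s.get("addresses") or []) for s in subsets)
    not_ready = sum(len(s.get("notReadyAddresses") or []) for s in subsets)
    return {"ready": ready, "not_ready": not_ready, "healthy": ready > 0}
```

Zero ready addresses with some not-ready ones points at failing readiness probes, while zero of both usually means the Service selector does not match any pod labels.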
Next Steps and Ongoing Support
You now have a production-ready Superset deployment on Helm. The next phase is optimisation and integration.
Integration with Data Platforms
Superset is most powerful when connected to robust data warehouses. Consider integrating with:
- ClickHouse: For time-series analytics at scale. Superset has native ClickHouse support.
- Snowflake: For cloud-native data warehousing with unlimited scalability.
- BigQuery: For serverless analytics on Google Cloud.
- Redshift: For AWS-native data warehousing.
- Postgres: For self-hosted analytics (as in this deployment).
For teams at PADISO, we often recommend a Superset + ClickHouse architecture for cost-efficient analytics at scale. This combination replaces per-seat BI tools and reduces infrastructure costs by 60–80%.
Security Hardening
Before going to production, implement additional security measures:
- Enable RBAC: Configure Superset’s role-based access control to restrict dashboard and datasource access by user role.
- Enable row-level security (RLS): Prevent users from seeing data outside their scope.
- Audit logging: Enable audit logs to track who accessed which dashboards and when.
- API authentication: Require API keys for programmatic access.
- SOC 2 / ISO 27001 compliance: If required, implement audit-readiness via Vanta or similar tools.
For teams pursuing compliance, PADISO offers a fixed-fee AI Quickstart Audit (AU$10K, 2 weeks) to assess your current state and define a roadmap to SOC 2 or ISO 27001 certification.
Scaling Beyond a Single Instance
As your analytics workload grows, consider:
- Multi-region deployments: Deploy Superset in multiple regions for low-latency access and disaster recovery.
- Federated query execution: Superset queries one database per dataset, so place a federation engine such as Trino or Presto behind it to combine multiple data sources in a single dashboard query.
- Embedding analytics: Embed Superset dashboards in your product using Superset’s embedded dashboard feature.
- Custom plugins: Develop custom Superset plugins to add domain-specific visualisations.
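For the embedding option, the host application's backend usually requests a short-lived guest token from Superset and hands it to the browser. The sketch below only builds the request body, assuming the guest token endpoint (POST /api/v1/security/guest_token/) and its user / resources / rls fields as used by the embedded SDK; the function name, the example username, and the dashboard ID are placeholders, and the authenticated HTTP call itself is left out.

```python
import json


def guest_token_payload(username: str, dashboard_id: str, rls_clauses=None) -> dict:
    """Build the JSON body for a Superset guest token request."""
    return {
        "user": {"username": username, "first_name": "", "last_name": ""},
        # One entry per dashboard the guest is allowed to view.
        "resources": [{"type": "dashboard", "id": dashboard_id}],
        # Optional row-level security clauses applied to the guest's queries.
        "rls": [{"clause": clause} for clause in (rls_clauses or [])],
    }


# Placeholder values for illustration only.
body = json.dumps(guest_token_payload("tenant-42-viewer", "<dashboard-uuid>", ["tenant_id = 42"]))
```

Passing a row-level security clause per tenant keeps a shared dashboard multi-tenant without duplicating it for each customer.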
For enterprises modernising their analytics stacks, PADISO provides platform engineering services across Australia, the United States, Canada, and New Zealand. We specialise in Superset + ClickHouse deployments that replace legacy BI tools and unlock real-time analytics.
Getting Help
If you encounter issues beyond this guide:
- Apache Superset documentation: https://apache.github.io/superset/ — the authoritative reference.
- Kubernetes documentation: https://kubernetes.io/docs/ — for cluster-level issues.
- Helm documentation: https://helm.sh/docs/ — for chart management.
- Community support: The Apache Superset Slack and GitHub discussions are active and helpful.
- Professional support: PADISO offers fractional CTO and platform engineering services to teams deploying Superset at scale. Book a call to discuss your analytics architecture.
Summary
Apache Superset on Helm is a powerful, scalable analytics platform when deployed with discipline. This guide covered:
- Architecture decisions: Single vs. multi-tenant, database choice, message queue, and results backend.
- Cluster setup: Namespace, RBAC, and StorageClass configuration.
- Helm installation: Custom values, chart installation, and rollout verification.
- Networking: Service exposure, Ingress configuration, and network policies.
- Storage: ConfigMaps, Secrets, PVCs, and backup strategies.
- Secrets management: Kubernetes Secrets, rotation, and sealed secrets for GitOps.
- Database backend: PostgreSQL configuration, connection pooling, and monitoring.
- Autoscaling: HPA, VPA, and node autoscaling for dynamic workloads.
- Operations: Health checks, updates, capacity planning, and disaster recovery.
- Observability: Prometheus metrics, logging, and distributed tracing.
- Troubleshooting: Common issues and solutions.
With this foundation, you can deploy Superset confidently, scale it reliably, and operate it securely. For teams seeking expert guidance, PADISO’s platform engineering services and CTO as a Service offerings provide the fractional leadership and co-build support to move fast and build right.