Apache Superset on Fly.io: Reference Deployment Pattern

Step-by-step production deployment of Apache Superset on Fly.io. Covers networking, storage, secrets, autoscaling, and operational best practices.

The PADISO Team ·2026-06-03

Why Superset on Fly.io
Architecture Overview
Prerequisites and Setup
Building the Superset Container
Configuring Fly.io for Production
Networking and Database Configuration
Secrets Management and Environment Variables
Persistent Storage and Volume Management
Autoscaling and Performance Tuning
Monitoring, Logging, and Health Checks
Operational Habits for Production Stability
Troubleshooting Common Issues
Next Steps and Scaling Considerations

Why Superset on Fly.io

Apache Superset is a modern, open-source data visualisation and business intelligence platform. Fly.io is a container-native platform that lets you deploy applications globally with minimal infrastructure overhead. Together, they form a compelling foundation for teams that need embedded analytics, dashboard-driven insights, or a lightweight BI layer without the operational burden of managing Kubernetes or traditional cloud infrastructure.

Why this combination works: Superset’s web tier is stateless by design. Dashboard definitions, users, and saved queries live in the metadata database, and cached query results live in Redis, so any instance can serve any request. Fly.io runs Docker images as Firecracker microVMs (Fly Machines), which suits stateless workloads well. You get fast Machine starts, rolling deploys gated on health checks, and a pricing model that scales with actual usage rather than reserved capacity.

At PADISO, we’ve deployed Superset at scale across financial services, retail, and government teams in Australia and the US. We use Superset as the embedded analytics layer in multi-tenant SaaS platforms, as the BI backbone for data-driven operations, and as a cost-efficient replacement for per-seat BI tools. This guide captures the reference pattern we’ve refined across 50+ production deployments.

The stakes are real: a misconfigured Superset instance can leak data, expose query credentials, or become a bottleneck during peak reporting hours. This guide walks you through the exact steps to avoid those traps.

Architecture Overview

Before you write a single line of configuration, understand the shape of the system you’re building.

Components and Data Flow

Superset comprises several moving parts:

Web Server: The Flask application that serves dashboards, handles user sessions, and manages the UI.
Metadata Database: Stores dashboard definitions, user accounts, data source configurations, and SQL Lab query history. PostgreSQL is the production standard.
Message Broker and Workers: Celery runs background tasks (email alerts, scheduled reports, asynchronous SQL Lab queries) in separate worker processes, with Redis (or RabbitMQ) as the message broker.
Query Cache: Redis, used to cache query results and reduce load on data sources.
Data Sources: The external databases (PostgreSQL, MySQL, Snowflake, BigQuery, ClickHouse, etc.) that Superset queries.

On Fly.io, you’ll run:

Superset app container (one or more instances, auto-scaled)
Metadata database (managed PostgreSQL or self-hosted on Fly Volumes)
Redis instance (for caching and message brokering)
Data source connectivity (direct outbound connections or private networking)

The diagram is simple:

User Browser → Fly Proxy → Superset App Instances ↔ Redis (cache + Celery broker)
                                     │
                                     ├──→ PostgreSQL Metadata DB
                                     └──→ Data Source (e.g., ClickHouse)

Each Superset instance is stateless. If one crashes, Fly.io restarts it. If traffic spikes, the Fly Proxy can start additional Machines you have already created (see Autoscaling below). The metadata database and Redis are the only stateful components, and both must be backed up and monitored.

Why This Pattern Works at Scale

We’ve used this pattern to support dashboards serving 1000+ concurrent users, query latencies under 500ms, and zero-downtime deployments. The pattern holds because:

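Superset is horizontally scalable: Add more app instances, and throughput grows roughly linearly until the metadata database, Redis, or the data sources become the bottleneck.
Fly.io’s private networking is fast: Latency between Machines and databases in the same region is low (typically a millisecond or two at most).
Redis is a proven cache layer: In our deployments, caching query results for 5–10 minutes has cut data-source load by 80–90% for dashboards with repeated queries.
Fly Volumes persist across restarts: If you need persistent state on the app Machine (e.g., for file uploads), a Volume survives restarts and redeploys. It lives on a single host and is not replicated, so it is not a substitute for backups.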

This is the same pattern our platform development teams in Sydney, Melbourne, and across Australia use to replace per-seat BI tools with embedded Superset + ClickHouse analytics.

Prerequisites and Setup

You’ll need:

A Fly.io account (free tier available, but production deployments need a paid account).
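The Fly CLI (flyctl) installed on your machine. On macOS and Linux, curl -L https://fly.io/install.sh | sh installs it; see Fly’s installation docs for other platforms.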
Docker installed locally for building and testing the container image.
A PostgreSQL database for Superset metadata. You can use Fly Postgres or an external managed service (AWS RDS, Azure Database, etc.).
A Redis instance for caching and message brokering. Again, Fly Redis or external.
A data source (e.g., ClickHouse, PostgreSQL, Snowflake) that Superset will query.
A domain name and SSL certificate (Fly.io provides free certificates via Let’s Encrypt).

Initial Fly.io Setup

fly auth login

Create a new Fly app:

fly apps create superset-prod

This reserves the app name. Unlike fly launch, fly apps create does not write a fly.toml, so create that file in your project root yourself; you’ll edit it extensively in the sections that follow.

Database and Cache Infrastructure

For a production deployment, you have two paths:

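Path A: Fly-provisioned services (simplest for most teams)

Use Fly Postgres for the metadata database and Upstash Redis, which fly redis create provisions inside your Fly organisation. Both are reachable over Fly’s private network and need little configuration. Be aware that Fly Postgres is a Postgres cluster deployed as an app you operate yourself (Fly’s docs describe it as unmanaged); if you want failover, upgrades, and backups handled for you, use an external managed service (Path B).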

fly postgres create --name superset-db --initial-cluster-size 3
fly redis create --name superset-cache

Fly will output connection strings. Save these—you’ll need them in your secrets.

Path B: External Managed Services

If you already have PostgreSQL on AWS RDS and Redis on ElastiCache, you can connect directly. This works well if you want to centralise infrastructure or use region-specific services. The connection strings go into Fly secrets (covered below).

For data sources, Superset connects outbound. If your data source is in a private VPC, you’ll need to either:

Expose it to the internet (with strong authentication).
Use Fly’s private networking to connect to another Fly app.
Set up a VPN or bastion host.

In our platform development work with government and defence teams in Canberra, we often use IRAP-aligned private networking and bastion hosts to keep data sources off the public internet.

Building the Superset Container

Superset ships as a Docker image, but you’ll want to customise it for your environment: add drivers for your data sources, bake in configuration, and optimise for production.

Dockerfile Strategy

Here’s a production-grade Dockerfile:

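# Pin a release tag you have tested; latest-dev is a development build
FROM apache/superset:4.1.1

# Install additional database drivers as root, then switch back to the
# unprivileged superset user. Superset 4.1+ images keep Python packages in
# the /app/.venv virtualenv and ship uv; older tags use plain pip install.
USER root
RUN . /app/.venv/bin/activate && \
    uv pip install --no-cache \
    psycopg2-binary \
    pymysql \
    clickhouse-connect \
    snowflake-sqlalchemy \
    sqlalchemy-bigquery
USER superset

# Copy custom configuration
COPY superset_config.py /app/superset_config.py

# Set environment variables for production
ENV SUPERSET_CONFIG_PATH=/app/superset_config.py
ENV PYTHONUNBUFFERED=1

# Expose port
EXPOSE 8088

# Health check for local docker run; Fly.io ignores HEALTHCHECK and uses
# the checks defined in fly.toml instead
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
  CMD curl -f http://localhost:8088/health || exit 1

# --timeout 120 raises Gunicorn's 30-second default so slow dashboard
# queries aren't killed mid-request
CMD ["gunicorn", "--workers", "4", "--worker-class", "gthread", "--threads", "2", "--timeout", "120", "--bind", "0.0.0.0:8088", "superset.app:create_app()"]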

Key decisions:

Database drivers: Install only the drivers you need. Each adds size and attack surface.
Custom config file: Bake in sensible defaults for production (secret key, cache configuration, feature flags).
Gunicorn with threading: Use 4 workers with 2 threads each. This balances memory and concurrency. Adjust based on Fly machine size.
Health check: The HEALTHCHECK instruction helps when you run the image locally with Docker. Fly.io ignores it and relies on the checks defined in fly.toml.

The official Apache Superset guide, Installing Superset Using Docker Compose, is a good starting point, but production deployments need the hardening shown above.

Superset Configuration File

Create superset_config.py in your project root:

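import os
from datetime import timedelta

# Flask app config: fail fast if the secret is missing instead of running
# with a guessable default
SECRET_KEY = os.environ['SECRET_KEY']
WTF_CSRF_ENABLED = True

# The Fly Proxy terminates TLS; trust its X-Forwarded-* headers so
# Superset generates https URLs
ENABLE_PROXY_FIX = True

# Metadata database
SQLALCHEMY_DATABASE_URI = os.environ['DATABASE_URL']
SQLALCHEMY_TRACK_MODIFICATIONS = False
SQLALCHEMY_ECHO = False
# Pool settings go in SQLALCHEMY_ENGINE_OPTIONS (the SQLALCHEMY_POOL_* keys
# are deprecated in Flask-SQLAlchemy). Each Gunicorn worker has its own pool.
SQLALCHEMY_ENGINE_OPTIONS = {
    'pool_size': 20,
    'pool_recycle': 3600,
    'pool_timeout': 30,
    'pool_pre_ping': True,  # discard dead connections before use
}

REDIS_URL = os.environ.get('REDIS_URL', 'redis://localhost:6379/0')

# Cache: CACHE_CONFIG holds Superset metadata, DATA_CACHE_CONFIG holds
# chart query results
CACHE_DEFAULT_TIMEOUT = 300  # 5 minutes
CACHE_CONFIG = {
    'CACHE_TYPE': 'RedisCache',
    'CACHE_REDIS_URL': REDIS_URL,
    'CACHE_DEFAULT_TIMEOUT': 300,
    'CACHE_KEY_PREFIX': 'superset_meta_',
}
DATA_CACHE_CONFIG = {
    'CACHE_TYPE': 'RedisCache',
    'CACHE_REDIS_URL': REDIS_URL,
    'CACHE_DEFAULT_TIMEOUT': 300,
    'CACHE_KEY_PREFIX': 'superset_data_',
}


# Celery (background tasks): Superset reads these settings from CELERY_CONFIG
class CeleryConfig:
    broker_url = REDIS_URL
    result_backend = REDIS_URL
    imports = ('superset.sql_lab', 'superset.tasks.scheduler')
    task_track_started = True
    task_time_limit = 30 * 60  # 30 minutes
    task_soft_time_limit = 25 * 60  # 25 minutes


CELERY_CONFIG = CeleryConfig

# Security
SUPERSET_WEBSERVER_PROTOCOL = 'https'
SUPERSET_WEBSERVER_PORT = 8088
ALLOWED_EXTENSIONS = {'csv', 'xlsx', 'parquet'}
MAX_CONTENT_LENGTH = 50 * 1024 * 1024  # 50 MB

# CORS is only needed for embedding and requires the flask-cors package.
# Empty entries are dropped, so an unset variable means "no origins".
CORS_ORIGINS = [o for o in os.environ.get('CORS_ORIGINS', '').split(',') if o]
ENABLE_CORS = bool(CORS_ORIGINS)
CORS_OPTIONS = {'supports_credentials': True, 'origins': CORS_ORIGINS}

# Feature flags
FEATURE_FLAGS = {
    'ALERT_REPORTS': True,
    'DASHBOARD_RBAC': True,
    'ENABLE_TEMPLATE_PROCESSING': True,
    'VERSIONED_EXPORT': True,
}

# Logging
LOG_FORMAT = '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
LOG_LEVEL = 'INFO'

# Session
SESSION_COOKIE_SECURE = True
SESSION_COOKIE_HTTPONLY = True
SESSION_COOKIE_SAMESITE = 'Lax'
PERMANENT_SESSION_LIFETIME = timedelta(hours=24)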

This configuration:

Reads secrets from environment variables (covered below).
Configures PostgreSQL with connection pooling (essential for production).
Points to Redis for caching and Celery message brokering.
Sets security options and hardened session cookies (Secure, HttpOnly, SameSite).
Enables critical feature flags (RBAC, alerts, versioned exports).

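The pool settings are crucial, and the pool is per process: every Gunicorn worker keeps its own. With a pool size of 20 plus SQLAlchemy’s default max_overflow of 10, one instance running 4 workers can open up to 4 × 30 = 120 connections to the metadata database, and 3 instances up to 360. That is far above PostgreSQL’s default max_connections of 100, so size the pool for your worker count (with 2 threads per worker, a pool of 5 is usually plenty) or put PgBouncer in front of the database.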
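A quick way to check this arithmetic before you scale is a small helper like the one below. It is our own sketch, not part of Superset; it assumes SQLAlchemy’s default max_overflow of 10 and that every Gunicorn worker (and every Celery worker process, if you count those too) holds its own pool.

```python
def max_metadata_connections(
    machines: int,
    workers_per_machine: int,
    pool_size: int,
    max_overflow: int = 10,  # SQLAlchemy's QueuePool default
) -> int:
    """Worst case: every worker process on every Machine fills its own pool."""
    return machines * workers_per_machine * (pool_size + max_overflow)


def fits_in_postgres(connections: int, max_connections: int, reserved: int = 3) -> bool:
    """PostgreSQL holds back superuser_reserved_connections (default 3) slots."""
    return connections <= max_connections - reserved


# 3 Machines x 4 Gunicorn workers x (20 + 10) = 360 possible connections
budget = max_metadata_connections(machines=3, workers_per_machine=4, pool_size=20)
print(budget, fits_in_postgres(budget, max_connections=100))  # 360 False
```

Dropping pool_size to 5 with max_overflow=0 brings the same fleet down to 60 connections, which fits comfortably under the default limit.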

Configuring Fly.io for Production

Now you’ll configure fly.toml and deploy.

fly.toml Essentials

Here’s a production-grade fly.toml:

app = "superset-prod"
primary_region = "syd"

[build]
  dockerfile = "Dockerfile"

# Non-secret settings only: fly.toml is plain text and usually committed
[env]
  SUPERSET_ENV = "production"
  LOG_LEVEL = "INFO"
  WORKERS = "4"

# Process groups: app serves HTTP; worker and beat run Celery for alerts,
# reports, and async queries. Run exactly one beat Machine
# (fly scale count beat=1).
[processes]
  app = "gunicorn --workers 4 --worker-class gthread --threads 2 --timeout 120 --bind 0.0.0.0:8088 'superset.app:create_app()'"
  worker = "celery --app=superset.tasks.celery_app:app worker --pool=prefork -O fair -c 4"
  beat = "celery --app=superset.tasks.celery_app:app beat --schedule /tmp/celerybeat-schedule"

[http_service]
  internal_port = 8088
  force_https = true
  auto_stop_machines = "stop"
  auto_start_machines = true
  min_machines_running = 2
  processes = ["app"]

  # The Fly Proxy only routes requests to Machines whose checks pass
  [[http_service.checks]]
    interval = "30s"
    timeout = "5s"
    grace_period = "40s"
    method = "GET"
    path = "/health"

[[vm]]
  size = "shared-cpu-2x"
  memory_mb = 1024
  processes = ["app"]

[[vm]]
  size = "shared-cpu-1x"
  memory_mb = 1024
  processes = ["worker", "beat"]

Key settings:

primary_region: Set to your closest Fly region (e.g., syd for Sydney, sfo for San Francisco). This is where new instances start.
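min_machines_running: Set to 2 for high availability. Fly keeps at least 2 Machines running in the primary region, even with auto-stop enabled.
auto_start_machines: Lets the Fly Proxy start stopped Machines when traffic arrives. It only starts Machines that already exist, so create them first with fly scale count.
vm.size: shared-cpu-2x (2 shared vCPUs; memory_mb raises RAM to 1 GB) is a good starting point. Each Superset Gunicorn worker typically uses a few hundred MB, so watch memory with 4 workers. For heavy workloads, use performance-1x or larger.
health check: The Fly Proxy probes /health every 30 seconds and stops routing traffic to a Machine whose check is failing. A failing check does not replace or restart the Machine; Fly restarts Machines whose process exits.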

Fly’s official guide, Deploy an app, covers the basics; this configuration adds production hardening.
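To catch drift in these settings, you could lint fly.toml in CI. The sketch below is our own (the function names are hypothetical, not a Fly tool); it reads the file with Python’s tomllib, which needs Python 3.11 or later, and flags the settings discussed above.

```python
def lint_fly_config(cfg: dict) -> list[str]:
    """Return warnings for production settings missing from a parsed fly.toml."""
    problems: list[str] = []
    http = cfg.get("http_service", {})
    if http.get("internal_port") != 8088:
        problems.append("http_service.internal_port should be 8088 (Gunicorn's bind port)")
    if not http.get("force_https", False):
        problems.append("http_service.force_https should be true")
    if http.get("min_machines_running", 0) < 2:
        problems.append("http_service.min_machines_running < 2: no redundancy")
    if not http.get("checks"):
        problems.append("no [[http_service.checks]]: the proxy cannot route around unhealthy Machines")
    if "services" in cfg and "http_service" in cfg:
        problems.append("both [[services]] and [http_service] are defined; keep one")
    return problems


def lint_fly_toml(path: str = "fly.toml") -> list[str]:
    """Parse fly.toml with tomllib (stdlib since Python 3.11) and lint it."""
    import tomllib

    with open(path, "rb") as fh:
        return lint_fly_config(tomllib.load(fh))


# Usage, e.g. in CI:
#   for problem in lint_fly_toml():
#       print("WARN:", problem)
```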

Networking and Database Configuration

Superset needs to reach two databases: the metadata database (PostgreSQL) and your data sources (ClickHouse, Snowflake, etc.).

Internal Networking with Fly Postgres

If you created Fly Postgres and Upstash Redis above, both are reachable over your organisation’s private network. Fly Postgres answers at <app-name>.internal (here superset-db.internal:5432), and fly redis status shows the private URL for the Redis instance. Don’t put these connection strings in fly.toml’s [env] section: it is plain text, usually committed to git, and the URLs embed passwords. Store them as secrets instead (covered below). fly postgres attach creates a database user and sets the DATABASE_URL secret on your app for you:

fly postgres attach superset-db --app superset-prod
fly redis status superset-cache

Private networking keeps traffic off the public internet and reduces latency.

External Data Source Connectivity

For data sources outside Fly (e.g., a ClickHouse cluster on AWS), Superset connects outbound over the public internet. This is fine as long as:

The data source requires authentication (username, password, API key).
The connection is encrypted (TLS/SSL).
Network access is restricted (firewall rules, IP allowlisting).

In Superset, you’ll create a data source connection with:

Host: The public hostname of your data source (e.g., clickhouse.example.com).
Port: Typically 8443 for ClickHouse over HTTPS.
Username and Password: Stored encrypted in the Superset metadata database.
SSL Mode: Enabled.

For sensitive deployments (government, finance), consider:

VPN tunnels: Route data source traffic through a VPN appliance.
Bastion hosts: Use an SSH tunnel or HTTP proxy to reach private data sources.
Fly private networking: If your data source is also on Fly, use internal DNS.

Our platform development teams in San Francisco and Seattle often use bastion hosts for multi-tenant SaaS platforms where data isolation is critical.

Connection Pooling and Timeouts

Superset can overwhelm a data source with too many concurrent queries. Protect yourself:

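Set query timeouts: Long-running queries should fail cleanly instead of tying up a Gunicorn thread. SQLLAB_TIMEOUT bounds synchronous SQL Lab queries, SUPERSET_WEBSERVER_TIMEOUT bounds chart and dashboard requests (keep it no higher than Gunicorn’s --timeout), and SQLLAB_ASYNC_TIME_LIMIT_SEC bounds queries run asynchronously on Celery workers. In superset_config.py, set:

SQLLAB_TIMEOUT = 120  # seconds, synchronous SQL Lab queries
SUPERSET_WEBSERVER_TIMEOUT = 120  # seconds, chart and dashboard requests
SQLLAB_ASYNC_TIME_LIMIT_SEC = 600  # seconds, async queries on Celery

Run heavy SQL Lab queries asynchronously: enable "Asynchronous query execution" in the database connection’s advanced settings. This requires running Celery workers. Note that none of these settings caps how many queries run at once.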

Use connection pooling on the data source: ClickHouse, PostgreSQL, and most databases support connection pooling. Configure max connections to 100–200.

For data warehouses (Snowflake, BigQuery), use their native query queuing and cost controls. Superset doesn’t rate-limit by default—you need to enforce it at the data source level.

Secrets Management and Environment Variables

Superset needs secrets: database passwords, API keys, secret keys for session signing. Fly.io provides a secrets store.

Setting Secrets

Generate a strong secret key:

python3 -c "import secrets; print(secrets.token_hex(32))"

Set it as a Fly secret:

fly secrets set SECRET_KEY=<your-generated-key>

Set other secrets:

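fly secrets set DATABASE_URL="postgresql://superset:password@superset-db.internal:5432/superset"
fly secrets set REDIS_URL="<private URL from fly redis status>"
fly secrets set ADMIN_USERNAME="admin"
fly secrets set ADMIN_PASSWORD="<strong-password>"

Fly encrypts these at rest and injects them as environment variables when each Machine starts; they never appear in fly.toml. Each fly secrets set call restarts your Machines, so pass several NAME=value pairs in one command to restart only once. Superset itself doesn’t read ADMIN_USERNAME or ADMIN_PASSWORD; they are only useful to your own bootstrap step, such as a script that runs superset fab create-admin.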

Environment Variable Naming

In superset_config.py, read secrets as environment variables:

import os

SECRET_KEY = os.environ['SECRET_KEY']  # KeyError at startup if the secret is missing
SQLALCHEMY_DATABASE_URI = os.environ['DATABASE_URL']
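One wrinkle with DATABASE_URL: connection strings generated by some tools use the legacy postgres:// scheme, which SQLAlchemy 1.4 and later reject. A small normalising helper (our own; the name is hypothetical) avoids a startup failure if that happens:

```python
def sqlalchemy_database_uri(raw: str) -> str:
    """Rewrite the legacy 'postgres://' scheme to 'postgresql://'.

    SQLAlchemy 1.4+ no longer recognises 'postgres' as a dialect name.
    Any other URI, including explicit drivers such as
    'postgresql+psycopg2://', is returned unchanged.
    """
    legacy = "postgres://"
    if raw.startswith(legacy):
        return "postgresql://" + raw[len(legacy):]
    return raw


# In superset_config.py:
#   SQLALCHEMY_DATABASE_URI = sqlalchemy_database_uri(os.environ['DATABASE_URL'])
```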

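Superset does not map arbitrary SUPERSET_-prefixed environment variables onto config keys; the few variables it reads itself, such as SUPERSET_CONFIG_PATH, are documented individually. Read each setting explicitly in superset_config.py, and keep non-secret values in fly.toml’s [env] section rather than in secrets:

[env]
  SQLLAB_TIMEOUT = "120"

# superset_config.py
SQLLAB_TIMEOUT = int(os.environ.get('SQLLAB_TIMEOUT', '120'))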
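Because every environment value arrives as a string, small parsing helpers keep superset_config.py tidy and fail loudly on typos. A sketch; the helper names are ours, not Superset’s:

```python
import os


def env_bool(name: str, default: bool = False) -> bool:
    """Parse common true/false spellings; raise on anything else instead of guessing."""
    raw = os.environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    value = raw.strip().lower()
    if value in {"1", "true", "yes", "on"}:
        return True
    if value in {"0", "false", "no", "off"}:
        return False
    raise ValueError(f"{name}={raw!r} is not a boolean")


def env_int(name: str, default: int) -> int:
    """Parse an integer setting, falling back to default when unset or empty."""
    raw = os.environ.get(name)
    return default if raw is None or raw.strip() == "" else int(raw)


# Example wiring in superset_config.py
SQLLAB_TIMEOUT = env_int("SQLLAB_TIMEOUT", 120)
SUPERSET_WEBSERVER_TIMEOUT = env_int("SUPERSET_WEBSERVER_TIMEOUT", 120)
ENABLE_PROXY_FIX = env_bool("ENABLE_PROXY_FIX", default=True)
```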

Rotating Secrets

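To rotate a secret (e.g., the metadata database password):

Change the password in your database or service. Existing pooled connections stay open, but new connections from running Machines fail until the next step completes, so update the secret straight away. For zero downtime, create a second login role with the same grants and point the secret at that instead.

Update the Fly secret:

fly secrets set DATABASE_URL="postgresql://superset:new-password@superset-db.internal:5432/superset"

Fly restarts your Machines so they pick up the new value; no manual restart is needed.

Once every Machine is running with the new secret, revoke the old credential.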

Persistent Storage and Volume Management

Superset is stateless: it doesn’t store anything on disk that can’t be recreated. However, you may want persistent storage for:

Uploaded CSV files (for data source creation).
Custom plugins or extensions.
SSL certificates (though Fly.io handles these automatically).

Fly Volumes

Fly Volumes are block storage attached to machines. Create one:

fly volumes create superset_data --size 10 --region syd

Mount it in fly.toml:

[[mounts]]
  source = "superset_data"
  destination = "/data"
  processes = ["app"]

In superset_config.py, configure Superset to use the volume:

import os

# UPLOAD_FOLDER is a standard Superset setting. The superset user needs
# write access to the mounted volume for this to succeed.
UPLOAD_FOLDER = '/data/uploads/'
os.makedirs(UPLOAD_FOLDER, exist_ok=True)  # no-op if it already exists

Important: A Fly Volume can be mounted by only one Machine, and each Machine that uses [[mounts]] needs its own volume, so Machines never see each other’s files. For multi-instance deployments, don’t rely on local disk for anything users expect to persist. Superset’s CSV, Excel, and columnar uploads are loaded into a database you choose, so they don’t need shared disk at all; anything else that must be shared (logos, exported files, cached results) belongs in object storage such as AWS S3 or Google Cloud Storage.

For most teams, object storage is the right choice. It’s cheaper, more durable, and scales horizontally. Volumes are useful only for single-instance deployments or as temporary scratch space.

Our platform development teams in Denver and Atlanta typically use S3 for shared files and cached results, which centralises storage and keeps per-instance state to a minimum.

Autoscaling and Performance Tuning

As traffic grows, Fly.io can automatically add instances. But you need to configure it correctly.

Autoscaling Policy

In fly.toml:

[http_service]
  internal_port = 8088
  force_https = true
  auto_stop_machines = "stop"
  auto_start_machines = true
  min_machines_running = 2
  processes = ["app"]

[[vm]]
  size = "shared-cpu-2x"
  memory_mb = 1024
  processes = ["app"]

This tells Fly:

Keep at least 2 instances running.
Stop unused instances (to save money).
Start stopped Machines again when traffic increases.
Each instance has 2 vCPU and 1 GB RAM.

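Fly’s auto-stop and auto-start are driven by load at the Fly Proxy, not by CPU or memory. You set a concurrency limit per Machine; when the running Machines are above their soft_limit, the proxy starts a stopped Machine, and when there is spare capacity it stops one (never going below min_machines_running). Because auto_start_machines only starts Machines that already exist, create the maximum you want up front:

fly scale count app=4

Then set limits that match Gunicorn’s capacity (4 workers × 2 threads = 8 concurrent requests):

[http_service.concurrency]
  type = "requests"
  soft_limit = 8
  hard_limit = 16

Fly does not add or remove Machines based on CPU thresholds by itself. If you need metric-driven scaling, Fly’s open-source fly-autoscaler can adjust Machine counts from Prometheus metrics.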

Gunicorn Tuning

Gunicorn is the WSGI server that runs Superset. In your Dockerfile, configure it:

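# --timeout 120 raises Gunicorn's 30-second default so slow dashboard
# queries aren't killed mid-request
CMD ["gunicorn", \
     "--workers", "4", \
     "--worker-class", "gthread", \
     "--threads", "2", \
     "--timeout", "120", \
     "--worker-tmp-dir", "/dev/shm", \
     "--max-requests", "1000", \
     "--max-requests-jitter", "100", \
     "--bind", "0.0.0.0:8088", \
     "superset.app:create_app()"]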

Explanation:

workers=4: 4 worker processes. Gunicorn’s rule of thumb is (2 × CPU count) + 1, which gives 5 on a 2 vCPU machine; we use 4 to leave memory headroom on a 1 GB Machine.
worker-class=gthread: Use threaded workers (good for I/O-bound workloads like Superset).
threads=2: Each worker has 2 threads. Total concurrency: 4 workers × 2 threads = 8 concurrent requests.
worker-tmp-dir=/dev/shm: Keep Gunicorn’s worker heartbeat files in shared memory, so a slow disk can’t stall them and get healthy workers killed as unresponsive.
max-requests=1000: Restart workers after 1000 requests to prevent memory leaks.
max-requests-jitter=100: Randomise the restart count (±100) to avoid all workers restarting simultaneously.
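If you’d rather not hard-code these numbers in CMD, Gunicorn also reads a Python config file, gunicorn.conf.py, from its working directory. Here is a sketch that derives the worker count from the CPUs the Machine exposes, with the WORKERS variable from fly.toml’s [env] as an override. That override, and the THREADS variable, are our own wiring, not something Gunicorn does by itself.

```python
# gunicorn.conf.py
import os


def recommended_workers(cpus: int) -> int:
    """Gunicorn's rule of thumb: (2 x CPUs) + 1."""
    return 2 * max(cpus, 1) + 1


# WORKERS / THREADS are hypothetical overrides read from the environment
workers = int(os.environ.get("WORKERS", recommended_workers(os.cpu_count() or 1)))
worker_class = "gthread"
threads = int(os.environ.get("THREADS", "2"))
timeout = 120
worker_tmp_dir = "/dev/shm"
max_requests = 1000
max_requests_jitter = 100
bind = "0.0.0.0:8088"
```

With the file in the image’s working directory (or passed with -c), CMD can shrink to ["gunicorn", "superset.app:create_app()"]; options given on the command line still override the file. On shared-cpu-2x the formula gives 5 workers, so keep WORKERS=4 if 1 GB of memory is tight.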

For a performance-2x Machine (2 dedicated vCPUs, 4 GB RAM by default), use:

CMD ["gunicorn", "--workers", "5", "--threads", "4", ...]

For a performance-4x Machine (4 dedicated vCPUs, 8 GB RAM by default), use:

CMD ["gunicorn", "--workers", "9", "--threads", "4", ...]

Database Connection Pooling

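As Superset scales, each Gunicorn worker on each instance opens its own pool of connections to the metadata database. With 5 instances running 4 workers each and a pool of 20 (plus the default overflow of 10), the steady-state pools alone hold 400 connections and the worst case is 600. PostgreSQL’s default max_connections is 100, so you will hit the limit long before that.

Increase PostgreSQL’s max connections:

fly postgres config update --max-connections 200 --app superset-db

Or run a connection pooler (PgBouncer) between Superset and PostgreSQL. Fly Postgres doesn’t put PgBouncer in front of the database for you, so deploy it as its own Fly app (or use a Postgres service with built-in pooling) and point DATABASE_URL at it.

In transaction pooling mode, PgBouncer returns the server connection to its pool at the end of each transaction, so many idle client connections share a small number of real database connections.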

Caching Strategy

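Redis caching is critical for performance. Superset has separate caches: CACHE_CONFIG for metadata and DATA_CACHE_CONFIG for chart query results, which is where most dashboard speed comes from. Results of asynchronous SQL Lab queries go to RESULTS_BACKEND. Configure them aggressively:

In superset_config.py:

import os

from cachelib.redis import RedisCache
from redis import Redis

REDIS_URL = os.environ['REDIS_URL']

DATA_CACHE_CONFIG = {
    'CACHE_TYPE': 'RedisCache',
    'CACHE_REDIS_URL': REDIS_URL,
    'CACHE_DEFAULT_TIMEOUT': 600,  # 10 minutes
    'CACHE_KEY_PREFIX': 'superset_data_',
}

# Results of async SQL Lab queries (requires the Celery workers)
RESULTS_BACKEND = RedisCache(host=Redis.from_url(REDIS_URL), key_prefix='superset_results_')
RESULTS_BACKEND_USE_MSGPACK = True  # serialise results with MessagePack (the default)

# Query cost estimation is controlled by the ESTIMATE_QUERY_COST feature flag
# and is off by default, so there is nothing to disable here.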

Set cache TTL based on your data freshness requirements:

Real-time dashboards: 60 seconds.
Operational dashboards: 300 seconds (5 minutes).
Reporting dashboards: 3600 seconds (1 hour).

Higher TTL = lower database load, but stale data. Strike a balance.

Monitoring, Logging, and Health Checks

A production Superset deployment needs visibility into what’s happening.

Health Checks

Fly.io pings the /health endpoint every 30 seconds. Superset exposes this by default, but you should verify it:

fly ssh console -s
curl http://localhost:8088/health

You should see:

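OK

This endpoint only confirms that the web process is up; it doesn’t touch the database or Redis. If the check fails, the Fly Proxy stops sending requests to that Machine until it passes again. Fly restarts a Machine automatically when its process exits, not because a check failed, so a hung-but-running process needs your attention (fly machine restart).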
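For smoke tests after a deploy, or from an external uptime monitor, a small standard-library probe is enough. This is our own sketch; it expects the plain-text OK that Superset’s /health endpoint returns.

```python
import urllib.request


def superset_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """True if GET <base_url>/health returns HTTP 200 with the body 'OK'."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace").strip()
            return resp.status == 200 and body == "OK"
    except OSError:
        # URLError, HTTPError (non-2xx), refused connections and timeouts
        # are all OSError subclasses
        return False


# Example (the fly.dev hostname follows Fly's default <app>.fly.dev pattern):
#   superset_healthy("https://superset-prod.fly.dev")
```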

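You can also add a deeper check. Superset calls the FLASK_APP_MUTATOR function from superset_config.py with the Flask app at startup, so you can register an extra route there (the /health/deep path and function names are our own choice):

from flask import Flask, jsonify
from sqlalchemy import text


def FLASK_APP_MUTATOR(app: Flask) -> None:
    @app.route('/health/deep')
    def deep_health():
        # Imported lazily: Superset's extensions are initialised by the time
        # a request arrives, but not when this config file is loaded
        from superset.extensions import cache_manager, db

        try:
            # Check metadata database connectivity
            db.session.execute(text('SELECT 1'))
            # Check Redis connectivity (should raise if Redis is unreachable)
            cache_manager.cache.get('healthcheck')
            return jsonify({'status': 'ok'}), 200
        except Exception:
            # Log details server-side; don't leak hostnames in the response
            app.logger.exception('deep health check failed')
            return jsonify({'status': 'error'}), 500

This verifies that both the metadata database and the cache are reachable. Use it for monitoring and alerting rather than as Fly’s routing check: during a brief Redis outage every Machine would fail at once, leaving the proxy nowhere to send traffic.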

Application Logging

Fly.io captures stdout and stderr. Configure Superset to log to stdout:

In superset_config.py:

import logging
import sys

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    stream=sys.stdout
)

View logs in real-time:

fly logs -a superset-prod

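For production, ship logs to a centralised system (e.g., Datadog, New Relic, CloudWatch). Fly’s fly-log-shipper app can forward your app’s log stream to these services. With the python-json-logger package added to the image (install it alongside the drivers in the Dockerfile), format lines as JSON:

import logging
import sys

from pythonjsonlogger import jsonlogger

logHandler = logging.StreamHandler(sys.stdout)
logHandler.setFormatter(jsonlogger.JsonFormatter())
logger = logging.getLogger()
logger.handlers = [logHandler]  # replace existing handlers to avoid duplicate plain-text lines

JSON-formatted logs are easier to parse and search.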
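If you’d rather not add the python-json-logger dependency, a small formatter built on the standard library does the same basic job. This is a swapped-in alternative of our own, not Superset’s logging setup. Superset may adjust logging during startup through its LOGGING_CONFIGURATOR hook, so confirm the output format in fly logs after deploying.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Minimal stdlib JSON formatter: one JSON object per log line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record, "%Y-%m-%dT%H:%M:%S%z"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def install_json_logging(level: int = logging.INFO) -> None:
    """Route the root logger to stdout as JSON, replacing existing handlers."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers = [handler]
    root.setLevel(level)
```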

Metrics and Observability

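Superset does not serve a Prometheus /metrics endpoint. It reports counters and timings through its STATS_LOGGER hook, and the default (DummyStatsLogger) doesn’t export anything. To collect them, point the hook at a StatsD server, such as Prometheus’s statsd_exporter, in superset_config.py (requires the statsd package):

from superset.stats_logger import StatsdStatsLogger

STATS_LOGGER = StatsdStatsLogger(host='localhost', port=8125, prefix='superset')

Key signals to monitor:

Query execution time: How long data-source queries take.
Cache hit rate: The share of chart requests served from Redis.
Metadata database connection errors: Connectivity problems or pool exhaustion.
HTTP request latency: From Superset’s stats or the Fly Proxy’s metrics.
HTTP request volume: Request counts by endpoint.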

Set up alerts:

High query latency (>5 seconds): Investigate slow queries or data source issues.
Low cache hit rate (<50%): Consider increasing cache TTL or improving cache strategy.
Database connection errors: Check database availability and connection pool settings.
High error rate (>5% of requests): Check logs for application errors.

Fly.io Metrics

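Fly.io provides status and metrics out of the box:

fly status -a superset-prod

This shows:

Each Machine’s process group and region.
Machine state (started, stopped).
Health-check status.
Last update time.

CPU, memory, network I/O, and HTTP response metrics live in Fly’s managed Prometheus, which you can browse on the hosted Grafana dashboards at fly-metrics.net or query from your own Prometheus-compatible tooling.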

Operational Habits for Production Stability

Technical configuration is only half the battle. Operational discipline keeps systems stable.

Deployment Checklist

Before deploying to production:

Test locally: Build the Docker image and run it against local Postgres and Redis (.env.local holds SECRET_KEY, DATABASE_URL, and REDIS_URL).

docker build -t superset-local .
docker run -p 8088:8088 --env-file .env.local superset-local

Test in staging: Deploy to a staging Fly app with production-like data.
```
fly deploy -a superset-staging
```
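Run migrations: Superset metadata database schema changes require migrations, and superset init then syncs roles and permissions.
```
fly ssh console -a superset-prod
superset db upgrade
superset init
```
To automate the upgrade, set release_command = "superset db upgrade" under [deploy] in fly.toml; Fly runs it in a temporary Machine before updating the app and aborts the deploy if it fails.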
Verify health checks: Confirm the app is healthy before routing traffic.
```
fly status -a superset-prod
```
Monitor logs: Watch for errors during the first 10 minutes.
```
fly logs -a superset-prod
```

Backup and Recovery

Your metadata database is irreplaceable. Back it up:

fly postgres backup create --app superset-db

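Fly takes daily snapshots of each volume, including your Postgres volumes, and keeps them for 5 days by default (the retention is configurable). For longer retention, export to S3. The Postgres image doesn’t include the AWS CLI, so proxy the database to your workstation and dump from there:

fly proxy 15432:5432 -a superset-db
# in a second terminal
pg_dump "postgresql://postgres:<password>@localhost:15432/superset" | gzip | aws s3 cp - s3://my-bucket/superset-backups/$(date +%Y%m%d).sql.gz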

Test restores quarterly to ensure backups are valid.
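The dated S3 dumps from the export above also need pruning. S3 lifecycle expiration rules are the simpler option; if you can’t use them, a selection helper like this works. It is a sketch that assumes the YYYYMMDD.sql.gz naming shown above and leaves listing and deleting objects to your S3 client.

```python
from datetime import date, timedelta


def backups_to_delete(keys: list[str], today: date, keep_days: int = 90) -> list[str]:
    """Return keys named YYYYMMDD.sql.gz that are older than keep_days.

    Keys that don't match the naming scheme (or hold an invalid date) are
    never selected, so unrelated objects in the prefix are left alone.
    """
    cutoff = today - timedelta(days=keep_days)
    stale = []
    for key in keys:
        name = key.rsplit("/", 1)[-1]
        stamp = name.removesuffix(".sql.gz")
        if stamp == name or len(stamp) != 8 or not stamp.isdigit():
            continue
        try:
            taken = date(int(stamp[:4]), int(stamp[4:6]), int(stamp[6:]))
        except ValueError:
            continue
        if taken < cutoff:
            stale.append(key)
    return stale
```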

Scaling Decisions

Monitor these metrics to decide when to scale:

CPU utilisation: If consistently >70%, upgrade machine size or add instances.
Memory utilisation: If >80%, upgrade machine size or reduce Gunicorn workers.
Database connections: If approaching max, increase PostgreSQL max_connections or use PgBouncer.
Query latency: If >2 seconds, investigate slow queries, add cache, or optimise data source indexes.

Scale proactively, not reactively. If you know traffic spikes at month-end, increase min_machines_running a day before.
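The thresholds above are easy to encode as a check that runs against whatever metrics source you use. A sketch with hypothetical names; treating "approaching max" as 90% of max_connections and using p95 query latency are our assumptions, not rules from Fly or Superset.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    cpu_pct: float           # sustained CPU utilisation across app Machines
    mem_pct: float           # memory utilisation per Machine
    db_connections: int      # current metadata-DB connections
    db_max_connections: int  # PostgreSQL max_connections
    p95_query_seconds: float # 95th-percentile dashboard query latency


def scaling_actions(s: Snapshot, headroom: float = 0.9) -> list[str]:
    """Apply the scaling thresholds listed above; return suggested actions."""
    actions: list[str] = []
    if s.cpu_pct > 70:
        actions.append("CPU > 70%: upgrade machine size or add instances")
    if s.mem_pct > 80:
        actions.append("memory > 80%: upgrade machine size or reduce Gunicorn workers")
    if s.db_connections >= headroom * s.db_max_connections:
        actions.append("DB connections near max: raise max_connections or use PgBouncer")
    if s.p95_query_seconds > 2:
        actions.append("query latency > 2s: check slow queries, caching, indexes")
    return actions
```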

Security Hygiene

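Rotate secrets: Change database passwords and API keys every 90 days. Rotating SECRET_KEY is different: Superset uses it to encrypt stored data-source passwords, so set PREVIOUS_SECRET_KEY to the old value and run superset re-encrypt-secrets, or saved connections will stop working.
Update dependencies: Bump the pinned Superset base image and driver versions regularly (monthly is a reasonable cadence), rebuild the Docker image, and test in staging before promoting.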
Review access: Audit who has Superset admin access. Remove inactive users.
Monitor failed logins: Configure Superset to log failed authentication attempts.
Enforce HTTPS: Fly.io does this by default (force_https = true).

Incident Response

When things break:

Check health: fly status -a superset-prod.
Review logs: fly logs -a superset-prod --no-tail.
Check data sources: Can Superset reach the database? fly ssh console and test connectivity.
Rollback if needed: fly releases -a superset-prod --image lists recent releases with their image references. Redeploy a known-good one with fly deploy --image <image-ref>.
Post-mortem: After resolution, document the root cause and prevention steps.

For critical issues, have a runbook:

Database unavailable: Check database status. Trigger a failover if using managed PostgreSQL.
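Cache unavailable: Charts may still load from the data sources directly (more slowly), but Celery tasks (alerts, reports, async queries) stop until Redis is back. Check the logs for cache connection errors and restore Redis first.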
App crashes: Fly.io auto-restarts. If crashes persist, check logs for memory leaks or configuration errors.
Slow queries: Kill long-running queries in the data source. Increase cache TTL. Optimise dashboard queries.

Troubleshooting Common Issues

Issue: Superset Won’t Start

Symptoms: App restarts repeatedly, health check fails.

Diagnosis:

fly logs -a superset-prod --no-tail

Common causes:

Missing or invalid SECRET_KEY: Verify it’s set with fly secrets list.
Database unreachable: Check DATABASE_URL and network connectivity.
Migration failed: Run fly ssh console and manually run superset db upgrade.

Issue: Slow Queries

Symptoms: Dashboards take >5 seconds to load.

Diagnosis:

Check cache hit rate: fly ssh console and query Redis.
Check database query time: Enable slow query logging on your data source.
Check Superset logs for query execution time.
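To put a number on the cache hit rate, Redis’s INFO stats section reports keyspace_hits and keyspace_misses. A small helper (ours) turns them into a rate. The counters cover every client of that Redis instance, Celery included, and some hosted Redis services may not report them.

```python
from typing import Optional


def cache_hit_rate(stats: dict) -> Optional[float]:
    """Hit rate from a Redis INFO 'stats' mapping; None before any lookups."""
    hits = int(stats.get("keyspace_hits", 0))
    misses = int(stats.get("keyspace_misses", 0))
    total = hits + misses
    return None if total == 0 else hits / total


# With redis-py (a Superset dependency), inside fly ssh console:
#   import os, redis
#   stats = redis.Redis.from_url(os.environ["REDIS_URL"]).info("stats")
#   print(cache_hit_rate(stats))
```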

Fix:

Increase cache TTL in superset_config.py.
Optimise the underlying database query (add indexes, denormalise).
Use materialized views or data warehouse aggregations.

Issue: High Memory Usage

Symptoms: Instances crash with OOM (out of memory) errors.

Diagnosis:

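fly logs -a superset-prod --no-tail

Look for out-of-memory kill messages, and check memory usage on the Fly Grafana dashboards (fly-metrics.net); sustained usage above 90% means you are close to the limit.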

Fix:

Reduce Gunicorn workers (fewer concurrent requests).
Upgrade to a larger machine size (e.g., performance-1x).
Limit result set size: ROW_LIMIT = 10000 in superset_config.py.

Issue: Database Connection Pool Exhausted

Symptoms: “too many connections” errors in logs.

Diagnosis:

fly postgres connect -a superset-db
SELECT count(*) FROM pg_stat_activity;
SHOW max_connections;

If the count is near max_connections, the pool is exhausted.

Fix:

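Increase PostgreSQL max_connections: fly postgres config update --max-connections 300 --app superset-db.
Use PgBouncer in transaction pooling mode between Superset and PostgreSQL, deployed as its own Fly app (Fly Postgres doesn’t include it).
Reduce the metadata pool size (pool_size) in superset_config.py, remembering that each Gunicorn worker has its own pool.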

Issue: Data Source Connection Fails

Symptoms: “Could not connect to data source” error when creating a data source.

Diagnosis:

Verify the hostname and port are correct.
Check firewall rules on the data source (is it allowing connections from Fly.io?).

Test connectivity from the Superset container:

fly ssh console -a superset-prod
nc -zv clickhouse.example.com 8443
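The Superset image may not include nc, but Python is always there. A standard-library probe like this one (our own sketch) checks the TCP connection and, optionally, the TLS handshake; paste it into python3 inside fly ssh console.

```python
import socket
import ssl


def probe(host: str, port: int, use_tls: bool = True, timeout: float = 5.0) -> str:
    """Report whether a TCP connection (and optionally TLS) to host:port works."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            if not use_tls:
                return "tcp ok"
            ctx = ssl.create_default_context()  # verifies the certificate and hostname
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                return f"tls ok ({tls.version()})"
    except ssl.SSLError as exc:
        # Must come before OSError: SSLError is an OSError subclass
        return f"tcp ok, tls failed: {exc}"
    except OSError as exc:
        return f"tcp failed: {exc}"


# Example: print(probe("clickhouse.example.com", 8443))
```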

Fix:

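Allowlist Superset’s outbound address on the data source’s firewall. Fly Machines don’t have a fixed outbound IP by default, so check Fly’s documentation on static egress IPs, or route through a bastion host or VPN that has a stable address.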
Use a bastion host or VPN if the data source is private.
For Fly Postgres, use internal DNS: superset-db.internal.

Next Steps and Scaling Considerations

You’ve deployed Superset on Fly.io. What’s next?

Immediate Actions

Create dashboards: Start with one dashboard to validate the setup.
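Set up alerts: Configure email or Slack alerts and scheduled reports through Superset’s Alerts & Reports (this needs the ALERT_REPORTS feature flag and running Celery worker and beat processes).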
Document runbooks: Write down how to deploy, scale, and recover from failures.
Schedule backups: Ensure database backups are automated and tested.

Advanced Patterns

Multi-Region Deployment

Fly.io supports multi-region deployments. Add Machines in more regions for lower latency and higher availability; regions are set by where Machines run, not in fly.toml:

fly scale count app=2 --region syd
fly scale count app=1 --region sfo

The Fly Proxy routes users in Sydney to the Sydney Machines and users in San Francisco to the San Francisco Machine. All regions share the single metadata database in the primary region, so every metadata query from sfo crosses the Pacific; measure dashboard load times from the remote region before relying on it, and lean on Redis caching to offset the round trips.

Custom Plugins and Extensions

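Superset is extensible. Backend customisations such as a custom security manager, Jinja macros, or authentication integrations are Python modules you can bake into the image and wire up in superset_config.py (custom_extensions below is a placeholder name). Custom visualisation plugins are different: they require rebuilding Superset’s frontend.

COPY custom_extensions/ /app/pythonpath/custom_extensions/

The official image puts /app/pythonpath on PYTHONPATH, so modules copied there are importable from superset_config.py. Don’t run superset load-examples at build time: it writes example dashboards into the metadata database, which isn’t reachable during docker build.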

Integration with Superset + ClickHouse

For our platform development teams in the United States and elsewhere, Superset + ClickHouse is a powerful combination. ClickHouse is a columnar database optimised for analytics. Superset provides the UI. Together, they replace per-seat BI tools at a fraction of the cost.

Configure Superset to query ClickHouse:

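In Superset, go to Settings > Database Connections and add a new database.
Select "ClickHouse Connect" as the database type (this requires the clickhouse-connect driver in the image).
Enter the ClickHouse hostname, port (8443 for HTTPS), username, and password.
Register datasets from the existing ClickHouse tables you want to chart.
Build dashboards.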
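If you prefer to paste a SQLAlchemy URI instead of filling in the form fields, the clickhouse-connect dialect uses the clickhousedb:// scheme. This helper is our own sketch: it percent-encodes credentials so characters like @ or / in a password don’t break parsing. The secure=true query parameter for TLS is an assumption to check against the clickhouse-connect documentation for your version.

```python
from urllib.parse import quote


def clickhouse_connect_uri(
    host: str,
    user: str,
    password: str,
    database: str = "default",
    port: int = 8443,
    secure: bool = True,
) -> str:
    """Build a clickhousedb:// SQLAlchemy URI with percent-encoded credentials."""
    auth = f"{quote(user, safe='')}:{quote(password, safe='')}"
    uri = f"clickhousedb://{auth}@{host}:{port}/{quote(database, safe='')}"
    return uri + ("?secure=true" if secure else "")


# Example: clickhouse_connect_uri("clickhouse.example.com", "superset", "s3cret")
```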

For background, Docker’s "What is a container?" overview explains containerisation, and The Twelve-Factor App describes the stateless application design principles that underpin this deployment pattern.

Operational Maturity

As your Superset deployment grows, invest in:

Observability: Ship metrics to Datadog or New Relic. Set up dashboards and alerts.
Incident management: Use PagerDuty or similar for on-call rotation.
Change management: Document all configuration changes. Use git for version control.
Capacity planning: Model growth and plan upgrades 3–6 months in advance.

When to Engage Professional Support

If you’re running Superset at scale (1000+ concurrent users, multiple data sources, complex dashboards), consider engaging a platform engineering partner. PADISO’s services and platform development teams in Australia provide fractional CTO and platform engineering support for exactly this scenario.

We help with:

Architecture review: Ensure your Superset deployment is optimised for your workload.
Performance tuning: Optimise queries, caching, and database configuration.
Security hardening: Prepare for SOC 2 and ISO 27001 compliance through PADISO’s Security Audit service.
Scaling strategy: Plan for growth and multi-region deployment.

For a rapid assessment, book PADISO’s AI Quickstart Audit: a fixed-fee 2-week diagnostic that tells you where you are, what to ship first, and what 90 days could unlock.

Summary

Apache Superset on Fly.io is a production-grade, cost-effective platform for embedded analytics and BI. This guide covers:

Why this combination works: Stateless Superset + container-native Fly.io = fast, scalable, reliable.
Architecture: Web app, metadata database, Redis cache, data sources.
Deployment: Dockerfile, fly.toml, secrets, health checks.
Networking: Internal DNS for managed services, outbound connections for external data sources.
Autoscaling: Gunicorn tuning, connection pooling, caching strategy.
Monitoring: Logs, metrics, health checks, incident response.
Operations: Backups, security, scaling decisions, troubleshooting.

The pattern scales from 10 to 10,000 concurrent users. Start small, monitor closely, and scale as needed.

If you’re building a multi-tenant SaaS platform, a data-driven operations centre, or replacing per-seat BI tools, Superset on Fly.io is a proven, cost-effective choice. For teams in Australia, the Sydney region keeps latency low for local users and data sources.

Deploy with confidence. Monitor relentlessly. Scale deliberately.

Want to talk through your situation?

Book a 30-minute call with Kevin (Founder/CEO). No pitch - direct advice on what to do next.
