Table of Contents
- Why Superset on Fly.io
- Architecture Overview
- Prerequisites and Setup
- Building the Superset Container
- Configuring Fly.io for Production
- Networking and Database Configuration
- Secrets Management and Environment Variables
- Persistent Storage and Volume Management
- Autoscaling and Performance Tuning
- Monitoring, Logging, and Health Checks
- Operational Habits for Production Stability
- Troubleshooting Common Issues
- Next Steps and Scaling Considerations
Why Superset on Fly.io
Apache Superset is a modern, open-source data visualisation and business intelligence platform. Fly.io is a container-native platform that lets you deploy applications globally with minimal infrastructure overhead. Together, they form a compelling foundation for teams that need embedded analytics, dashboard-driven insights, or a lightweight BI layer without the operational burden of managing Kubernetes or traditional cloud infrastructure.
Why this combination works: Superset's web tier is stateless by design. It keeps metadata in a database and cached results in Redis, so any instance can serve any request. Fly.io runs Docker (OCI) images as Firecracker microVMs, which suits stateless workloads well. You get fast deploys, straightforward rollbacks, and a pricing model that scales with actual usage rather than reserved capacity.
At PADISO, we’ve deployed Superset at scale across financial services, retail, and government teams in Australia and the US. We use Superset as the embedded analytics layer in multi-tenant SaaS platforms, as the BI backbone for data-driven operations, and as a cost-efficient replacement for per-seat BI tools. This guide captures the reference pattern we’ve refined across 50+ production deployments.
The stakes are real: a misconfigured Superset instance can leak data, expose query credentials, or become a bottleneck during peak reporting hours. This guide walks you through the exact steps to avoid those traps.
Architecture Overview
Before you write a single line of configuration, understand the shape of the system you’re building.
Components and Data Flow
Superset comprises several moving parts:
- Web Server: The Flask application that serves dashboards, handles user sessions, and manages the UI.
- Metadata Database: Stores dashboard definitions, user accounts, data source configurations, and query history. PostgreSQL is the production standard.
- Message Broker: Redis (or RabbitMQ) acting as the broker for Celery workers, which run background tasks (email alerts, scheduled reports, asynchronous query execution).
- Query Cache: Redis, used to cache query results and reduce load on data sources.
- Data Sources: The external databases (PostgreSQL, MySQL, Snowflake, BigQuery, ClickHouse, etc.) that Superset queries.
On Fly.io, you’ll run:
- Superset app container (one or more instances, auto-scaled)
- Metadata database (managed PostgreSQL or self-hosted on Fly Volumes)
- Redis instance (for caching and message brokering)
- Data source connectivity (direct outbound connections or private networking)
The diagram is simple:
User Browser → Fly Proxy → Superset App Instances ↔ Redis Cache
↓
PostgreSQL Metadata DB
↓
Data Source (e.g., ClickHouse)
Each Superset instance is stateless. If one crashes, Fly.io starts a replacement. If traffic spikes, Fly.io scales horizontally. The metadata database and Redis are the only stateful components, and both must be backed up and monitored.
Why This Pattern Works at Scale
We’ve used this pattern to support dashboards serving 1000+ concurrent users, query latencies under 500ms, and zero-downtime deployments. The pattern holds because:
- Superset is horizontally scalable: Add more app instances, and throughput increases roughly linearly until the metadata database or the data source becomes the bottleneck.
- Fly.io’s networking is fast: Sub-millisecond latency between instances and to managed databases.
- Redis is a proven cache layer: Query results cached for 5–10 minutes reduce database load by 80–90%.
- Fly Volumes provide durability: If you need persistent state on the app container (e.g., for file uploads), Volumes ensure data survives restarts.
This is the same pattern used by teams at Platform Development in Sydney, Platform Development in Melbourne, and Platform Development in Australia to replace per-seat BI tools with embedded Superset + ClickHouse analytics.
Prerequisites and Setup
You’ll need:
- A Fly.io account (free tier available, but production deployments need a paid account).
- The Fly CLI (flyctl) installed on your machine. Installation instructions are in the Fly.io documentation.
- Docker installed locally for building and testing the container image.
- A PostgreSQL database for Superset metadata. You can use Fly Postgres or an external managed service (AWS RDS, Azure Database, etc.).
- A Redis instance for caching and message brokering. Again, Fly Redis or external.
- A data source (e.g., ClickHouse, PostgreSQL, Snowflake) that Superset will query.
- A domain name and SSL certificate (Fly.io provides free certificates via Let’s Encrypt).
Initial Fly.io Setup
Log in to Fly.io:
fly auth login
Create a new Fly app:
fly apps create superset-prod
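This reserves the app name. It does not generate a fly.toml on its own: either write the file by hand in your project root (an example follows below) or run fly launch --no-deploy to scaffold one. You'll edit this file extensively.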
Database and Cache Infrastructure
For a production deployment, you have two paths:
Path A: Managed Services (Recommended for most teams)
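Use Fly Postgres and Fly Redis (provided through Upstash). They're integrated with the platform and require minimal configuration. Note that the original Fly Postgres offering runs as an ordinary Fly app that you operate yourself, so check Fly's current documentation for which Postgres product matches the level of management you expect: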
fly postgres create --name superset-db --initial-cluster-size 3
fly redis create --name superset-cache
Fly will output connection strings. Save these—you’ll need them in your secrets.
Path B: External Managed Services
If you already have PostgreSQL on AWS RDS and Redis on ElastiCache, you can connect directly. This works well if you want to centralise infrastructure or use region-specific services. The connection strings go into Fly secrets (covered below).
For data sources, Superset connects outbound. If your data source is in a private VPC, you’ll need to either:
- Expose it to the internet (with strong authentication).
- Use Fly’s private networking to connect to another Fly app.
- Set up a VPN or bastion host.
For teams at Platform Development in Canberra working with government and defence, we often use IRAP-aligned private networking and bastion hosts to keep data sources off the public internet.
Building the Superset Container
Superset ships as a Docker image, but you’ll want to customise it for your environment: add drivers for your data sources, bake in configuration, and optimise for production.
Dockerfile Strategy
Here’s a production-grade Dockerfile:
# Pin a specific release tag in production; latest-dev moves with every upstream build
FROM apache/superset:latest-dev
# The base image runs as a non-root user, so switch to root to install packages
USER root
# Install additional database drivers (SQLAlchemy dialects that Superset can load)
RUN pip install --no-cache-dir \
    psycopg2-binary \
    pymysql \
    clickhouse-connect \
    snowflake-sqlalchemy \
    sqlalchemy-bigquery \
    pyarrow
# Copy custom configuration
COPY superset_config.py /app/superset_config.py
# Set environment variables for production
ENV SUPERSET_CONFIG_PATH=/app/superset_config.py
ENV FLASK_ENV=production
ENV PYTHONUNBUFFERED=1
# Drop back to the unprivileged user
USER superset
# Expose port
EXPOSE 8088
# Health check (honoured by Docker; Fly.io checks are configured in fly.toml)
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
    CMD curl -f http://localhost:8088/health || exit 1
CMD ["gunicorn", "--workers", "4", "--worker-class", "gthread", "--threads", "2", "--bind", "0.0.0.0:8088", "superset.app:create_app()"]
Key decisions:
- Database drivers: Install only the drivers you need. Each adds size and attack surface.
- Custom config file: Bake in sensible defaults for production (secret key, cache configuration, feature flags).
- Gunicorn with threading: Use 4 workers with 2 threads each. This balances memory and concurrency. Adjust based on Fly machine size.
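- Health check: Docker uses this when you run the image locally. Fly.io does not act on the Dockerfile HEALTHCHECK instruction, so the same /health probe is declared again in fly.toml.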
The official Apache Superset guide to installing with Docker Compose is a good starting point, but production deployments need the hardening shown above.
Superset Configuration File
Create superset_config.py in your project root:
import os
from datetime import timedelta
# Flask app config
SECRET_KEY = os.environ['SECRET_KEY']  # Fail fast if unset instead of falling back to a guessable key
WTF_CSRF_ENABLED = True
WTF_CSRF_EXEMPT_LIST = []
# Database
SQLALCHEMY_DATABASE_URI = os.environ.get(
    'DATABASE_URL',
    'postgresql://user:password@localhost:5432/superset'
)
SQLALCHEMY_TRACK_MODIFICATIONS = False
SQLALCHEMY_ECHO = False
SQLALCHEMY_POOL_SIZE = 20
SQLALCHEMY_POOL_RECYCLE = 3600
SQLALCHEMY_POOL_TIMEOUT = 30
# Cache
CACHE_DEFAULT_TIMEOUT = 300 # 5 minutes
CACHE_CONFIG = {
    'CACHE_TYPE': 'RedisCache',
    'CACHE_REDIS_URL': os.environ.get('REDIS_URL', 'redis://localhost:6379/0'),
    'CACHE_DEFAULT_TIMEOUT': 300,
}
# Celery (background tasks). Superset reads Celery settings from a CELERY_CONFIG class.
class CeleryConfig:
    broker_url = os.environ.get('REDIS_URL', 'redis://localhost:6379/1')
    result_backend = os.environ.get('REDIS_URL', 'redis://localhost:6379/2')
    imports = ('superset.sql_lab', 'superset.tasks.scheduler')
    task_track_started = True
    task_time_limit = 30 * 60  # 30 minutes
    task_soft_time_limit = 25 * 60  # 25 minutes
CELERY_CONFIG = CeleryConfig
# Security
SUPERSET_WEBSERVER_PROTOCOL = 'https'
SUPERSET_WEBSERVER_PORT = 8088
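# Superset reads ENABLE_CORS and CORS_OPTIONS; CORS_ORIGINS is only a local helper
CORS_ORIGINS = [o.strip() for o in os.environ.get('CORS_ORIGINS', '').split(',') if o.strip()]
ENABLE_CORS = bool(CORS_ORIGINS)
CORS_OPTIONS = {'origins': CORS_ORIGINS}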
ALLOWED_EXTENSIONS = {'csv', 'json', 'xlsx'}
MAX_CONTENT_LENGTH = 50 * 1024 * 1024 # 50 MB
# Feature flags
FEATURE_FLAGS = {
    'ALERT_REPORTS': True,
    'DASHBOARD_RBAC': True,
    'ENABLE_TEMPLATE_PROCESSING': True,
    'VERSIONED_EXPORT': True,
}
# Logging
LOG_FORMAT = '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
LOG_LEVEL = 'INFO'
# Session
SESSION_COOKIE_SECURE = True
SESSION_COOKIE_HTTPONLY = True
SESSION_COOKIE_SAMESITE = 'Lax'
PERMANENT_SESSION_LIFETIME = timedelta(hours=24)
This configuration:
- Reads secrets from environment variables (covered below).
- Configures PostgreSQL with connection pooling (essential for production).
- Points to Redis for caching and Celery message brokering.
- Sets security headers and session cookies.
- Enables critical feature flags (RBAC, alerts, versioned exports).
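One detail worth guarding against when the database URL comes from the environment: some platforms hand out connection strings with the legacy postgres:// scheme, which SQLAlchemy 1.4 and later reject. The sketch below shows one way to normalise the value inside superset_config.py. The helper name normalise_database_url is hypothetical, and whether your DATABASE_URL actually uses the legacy scheme is an assumption to verify.

```python
import os


def normalise_database_url(url: str) -> str:
    """Rewrite the legacy postgres:// scheme to postgresql://.

    Hypothetical helper for superset_config.py; other schemes pass through.
    """
    legacy_prefix = "postgres://"
    if url.startswith(legacy_prefix):
        return "postgresql://" + url[len(legacy_prefix):]
    return url


# Same default as the configuration above, normalised before SQLAlchemy sees it.
SQLALCHEMY_DATABASE_URI = normalise_database_url(
    os.environ.get("DATABASE_URL", "postgresql://user:password@localhost:5432/superset")
)
```

URLs that already use postgresql:// (or another dialect entirely) are returned unchanged, so the helper is safe to leave in place.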
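The pool settings are crucial: SQLALCHEMY_POOL_SIZE=20 means each connection pool keeps up to 20 connections to the metadata database open. With 3 instances, that's at least 60 concurrent connections, and because every Gunicorn worker process creates its own pool, the worst case with 4 workers per instance is 3 × 4 × 20 = 240. Adjust based on your database's max_connections setting.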
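The connection arithmetic can be sketched as a small budget check. This is an illustration rather than Superset code: the function name is ours, and it assumes each Gunicorn worker process holds its own pool and that SQLAlchemy's default max_overflow of 10 applies. Set workers_per_instance to 1 if you want the per-instance figure only.

```python
def connection_budget(instances: int, workers_per_instance: int,
                      pool_size: int, max_overflow: int = 10) -> dict:
    """Estimate metadata-database connections for a Superset fleet.

    Assumes one SQLAlchemy pool per Gunicorn worker process (an assumption
    to verify for your deployment).
    """
    processes = instances * workers_per_instance
    return {
        # Connections held once every pool is full
        "steady_state": processes * pool_size,
        # Pools may temporarily open max_overflow extra connections each
        "worst_case": processes * (pool_size + max_overflow),
    }


budget = connection_budget(instances=3, workers_per_instance=4, pool_size=20)
print(budget)  # {'steady_state': 240, 'worst_case': 360}
```

Comparing worst_case against the database's max_connections before a scale-up is a cheap way to catch pool exhaustion ahead of time.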
Configuring Fly.io for Production
Now you’ll configure fly.toml and deploy.
fly.toml Essentials
Here’s a production-grade fly.toml:
app = "superset-prod"
primary_region = "syd"

# Build from the local Dockerfile (setting image as well would bypass the build)
[build]
  dockerfile = "Dockerfile"

[env]
  SUPERSET_ENV = "production"
  LOG_LEVEL = "INFO"
  WORKERS = "4"

# Process groups are a table of name = command
[processes]
  app = "gunicorn --workers 4 --worker-class gthread --threads 2 --bind 0.0.0.0:8088 superset.app:create_app()"

# http_service covers ports 80 and 443, so no separate [[services]] block is needed
[http_service]
  internal_port = 8088
  force_https = true
  auto_stop_machines = "stop"
  auto_start_machines = true
  min_machines_running = 2
  processes = ["app"]

  [[http_service.checks]]
    interval = "30s"
    timeout = "5s"
    grace_period = "40s"
    method = "GET"
    path = "/health"

[[vm]]
  size = "shared-cpu-2x"
  memory_mb = 1024
  processes = ["app"]
Key settings:
- primary_region: Set to your closest Fly region (e.g., syd for Sydney, sjc for San Jose). This is where new instances start.
- min_machines_running: Set to 2 for high availability. Fly keeps at least 2 instances running in the primary region.
- auto_start_machines: Allows Fly to start stopped instances during traffic spikes.
- vm.size: shared-cpu-2x is a good starting point (2 shared vCPUs, with memory set to 1 GB here). For heavy workloads, use performance-1x or larger.
- health check: Fly requests /health every 30 seconds. An instance that fails the check stops receiving traffic until it passes again.
The official Fly.io guide to deploying an app covers the basics; this configuration adds production hardening on top.
Networking and Database Configuration
Superset needs to reach two databases: the metadata database (PostgreSQL) and your data sources (ClickHouse, Snowflake, etc.).
Internal Networking with Fly Postgres
If you created Fly Postgres above, it is discoverable through Fly's internal DNS at superset-db.internal. The connection strings take the shape shown below. Because they contain credentials, set them as Fly secrets (covered in the next section) rather than committing them to the [env] block of fly.toml:
DATABASE_URL = "postgresql://superset:password@superset-db.internal:5432/superset"
REDIS_URL = "redis://superset-cache.internal:6379"
Fly's internal DNS resolves <app-name>.internal automatically within your Fly organisation's private network. This keeps traffic off the public internet and reduces latency. For Fly Redis, use the exact connection URL that fly redis create printed, since its hostname may differ from the one shown here.
External Data Source Connectivity
For data sources outside Fly (e.g., a ClickHouse cluster on AWS), Superset connects outbound over the public internet. This is fine as long as:
- The data source requires authentication (username, password, API key).
- The connection is encrypted (TLS/SSL).
- Network access is restricted (firewall rules, IP allowlisting).
In Superset, you’ll create a data source connection with:
- Host: The public hostname of your data source (e.g., clickhouse.example.com).
- Port: Typically 8443 for ClickHouse over HTTPS.
- Username and Password: Stored encrypted in the Superset metadata database.
- SSL Mode: Enabled.
For sensitive deployments (government, finance), consider:
- VPN tunnels: Route data source traffic through a VPN appliance.
- Bastion hosts: Use an SSH tunnel or HTTP proxy to reach private data sources.
- Fly private networking: If your data source is also on Fly, use internal DNS.
Teams at Platform Development in San Francisco and Platform Development in Seattle often use bastion hosts for multi-tenant SaaS platforms where data isolation is critical.
Connection Pooling and Timeouts
Superset can overwhelm a data source with too many concurrent queries. Protect yourself:
- Set query timeouts: In Superset's data source settings, set Query Timeout to 120 seconds. Long-running queries fail gracefully instead of hanging.
- Limit long-running queries: In superset_config.py, set:
  SQLLAB_TIMEOUT = 120  # 2 minutes for synchronous SQL Lab queries
  SQLLAB_ASYNC_TIME_LIMIT_SEC = 120  # Same ceiling for asynchronous (Celery) queries
  Asynchronous execution itself is switched on per database connection in the Superset UI and needs Celery workers plus a results backend.
- Use connection pooling on the data source: ClickHouse, PostgreSQL, and most databases support connection pooling. Configure max connections to 100–200.
For data warehouses (Snowflake, BigQuery), use their native query queuing and cost controls. Superset doesn’t rate-limit by default—you need to enforce it at the data source level.
Secrets Management and Environment Variables
Superset needs secrets: database passwords, API keys, secret keys for session signing. Fly.io provides a secrets store.
Setting Secrets
Generate a strong secret key:
python3 -c "import secrets; print(secrets.token_hex(32))"
Set it as a Fly secret:
fly secrets set SECRET_KEY="<your-generated-key>"
Set other secrets:
fly secrets set DATABASE_URL="postgresql://superset:password@superset-db.internal:5432/superset"
fly secrets set REDIS_URL="redis://superset-cache.internal:6379/0"
fly secrets set ADMIN_USERNAME="admin"
fly secrets set ADMIN_PASSWORD="<strong-password>"
Fly encrypts these at rest and injects them as environment variables when the container starts. They’re never logged or exposed in fly.toml.
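Because secrets arrive as plain environment variables, a missing one only surfaces when Superset first needs it. A startup guard in superset_config.py can make that failure immediate and readable. The sketch below is one possible shape; the helper names and the list of required variables are assumptions based on the secrets set above.

```python
import os
from typing import List, Mapping, Sequence

# Variables this guide expects to be injected by `fly secrets set` (assumed list)
REQUIRED_SECRETS = ("SECRET_KEY", "DATABASE_URL", "REDIS_URL")


def missing_secrets(env: Mapping[str, str],
                    required: Sequence[str] = REQUIRED_SECRETS) -> List[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in required if not env.get(name, "").strip()]


def require_secrets(env: Mapping[str, str] = os.environ) -> None:
    """Raise a clear error at import time instead of failing later at runtime."""
    missing = missing_secrets(env)
    if missing:
        raise RuntimeError("Missing required secrets: " + ", ".join(missing))
```

Calling require_secrets() near the top of superset_config.py turns a misconfigured deploy into an obvious crash loop with a one-line explanation in fly logs.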
Environment Variable Naming
In superset_config.py, read secrets as environment variables:
SECRET_KEY = os.environ.get('SECRET_KEY')
SQLALCHEMY_DATABASE_URI = os.environ.get('DATABASE_URL')
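For Superset-specific configuration, a common prefix such as SUPERSET_ keeps related settings easy to spot:
fly secrets set SUPERSET_SQLLAB_ASYNC=true
fly secrets set SUPERSET_RESULTS_BACKEND_USE_MSGPACK=true
As far as we are aware, Superset does not map arbitrary environment variables onto configuration keys by itself, so read each one explicitly in superset_config.py, the same way SECRET_KEY is read above. Verify the behaviour against your Superset version.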
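If you prefer not to rely on implicit loading, flags like these can be read explicitly in superset_config.py. Environment variables are always strings, so "false" would be truthy if used directly. The sketch below parses them into booleans; env_flag is a hypothetical helper, and the set of accepted spellings is our choice.

```python
import os
from typing import Mapping

_TRUE_VALUES = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool = False,
             env: Mapping[str, str] = os.environ) -> bool:
    """Interpret an environment variable as a boolean flag."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUE_VALUES


# Explicit mapping from the prefixed variable to the real Superset setting
RESULTS_BACKEND_USE_MSGPACK = env_flag("SUPERSET_RESULTS_BACKEND_USE_MSGPACK", default=True)
```

Keeping the mapping explicit also documents, in one place, which environment variables the deployment actually honours.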
Rotating Secrets
To rotate a secret (e.g., database password):
- Change the password in your database or service.
- Update the Fly secret:
  fly secrets set DATABASE_URL="postgresql://superset:new-password@superset-db.internal:5432/superset"
- Fly restarts the app's instances so they pick up the new secret.
This is seamless: no manual restarts needed.
Persistent Storage and Volume Management
Superset is stateless: it doesn’t store anything on disk that can’t be recreated. However, you may want persistent storage for:
- Uploaded CSV files (for data source creation).
- Custom plugins or extensions.
- SSL certificates (though Fly.io handles these automatically).
Fly Volumes
Fly Volumes are block storage attached to machines. Create one:
fly volumes create superset_data --size 10 --region syd
Mount it in fly.toml:
[[mounts]]
source = "superset_data"
destination = "/data"
processes = ["app"]
In superset_config.py, configure Superset to use the volume:
import os
UPLOAD_FOLDER = '/data/uploads'
# Create the upload directory on the mounted volume if it does not exist yet
os.makedirs(UPLOAD_FOLDER, exist_ok=True)
FILE_UPLOAD_FOLDER = UPLOAD_FOLDER
IMPORT_ALLOWED_DATA_TYPES = ['csv', 'json', 'parquet']
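Important: A volume is attached to exactly one machine, and Fly does not replicate volumes between machines. If you scale to multiple instances, each needs its own volume, and files written on one instance are invisible to the others. For multi-instance deployments, use object storage (AWS S3, Google Cloud Storage) instead. The setting below is illustrative; confirm that your Superset version and upload path accept a remote location before relying on it: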
FILE_UPLOAD_FOLDER = 's3://my-bucket/superset-uploads'
For most teams, object storage is the right choice. It’s cheaper, more reliable, and scales horizontally. Volumes are useful only for single-instance deployments or as temporary scratch space.
Teams at Platform Development in Denver and Platform Development in Atlanta often use S3 for file uploads and Superset caching, centralising storage and reducing per-instance complexity.
Autoscaling and Performance Tuning
As traffic grows, Fly.io can automatically add instances. But you need to configure it correctly.
Autoscaling Policy
In fly.toml:
[http_service]
internal_port = 8088
force_https = true
auto_stop_machines = "stop"
auto_start_machines = true
min_machines_running = 2
processes = ["app"]
[[vm]]
size = "shared-cpu-2x"
memory_mb = 1024
processes = ["app"]
This tells Fly:
- Keep at least 2 instances running.
- Stop unused instances (to save money).
- Start new instances when traffic increases.
- Each instance has 2 vCPU and 1 GB RAM.
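Fly's proxy makes these decisions from traffic, not from CPU or memory. When the running instances exceed their soft concurrency limit, the proxy starts a stopped instance; when there is spare capacity, it stops one. It only starts machines that already exist, so create the maximum number you want up front with fly scale count.
You can tune the thresholds with explicit concurrency limits:
[http_service.concurrency]
  type = "requests"
  soft_limit = 8
  hard_limit = 16
With 4 workers × 2 threads, each instance serves about 8 requests at once, so a soft limit of 8 brings another instance online as soon as one is saturated, and the hard limit of 16 caps how many requests a single instance is sent.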
Gunicorn Tuning
Gunicorn is the WSGI server that runs Superset. In your Dockerfile, configure it:
CMD ["gunicorn", \
     "--workers", "4", \
     "--worker-class", "gthread", \
     "--threads", "2", \
     "--worker-tmp-dir", "/dev/shm", \
     "--max-requests", "1000", \
     "--max-requests-jitter", "100", \
     "--bind", "0.0.0.0:8088", \
     "superset.app:create_app()"]
Explanation:
- workers=4: 4 worker processes. For a 2 vCPU machine, 4 is a good default. Formula: (2 * CPU_count) + 1 = 5, rounded down to 4.
- worker-class=gthread: Use threaded workers (good for I/O-bound workloads like Superset).
- threads=2: Each worker has 2 threads. Total concurrency: 4 workers × 2 threads = 8 concurrent requests.
- worker-tmp-dir=/dev/shm: Keep Gunicorn's worker heartbeat files in memory-backed storage, so workers never stall on slow disk I/O.
- max-requests=1000: Restart workers after 1000 requests to contain memory leaks.
- max-requests-jitter=100: Add a random 0 to 100 requests to each worker's limit so that workers do not all restart at the same moment.
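The sizing rule in the list above can be written down as a few lines of arithmetic. This is a sketch, not part of Gunicorn: the function name and the optional max_workers cap (used here to model the "rounded down to 4" memory-driven choice) are our own.

```python
from typing import Optional


def gunicorn_sizing(cpu_count: int, threads: int = 2,
                    max_workers: Optional[int] = None) -> dict:
    """Apply the (2 * CPU) + 1 rule and report total request concurrency."""
    workers = 2 * cpu_count + 1
    if max_workers is not None:
        # Cap the worker count when memory, not CPU, is the constraint
        workers = min(workers, max_workers)
    return {"workers": workers, "threads": threads, "concurrency": workers * threads}


print(gunicorn_sizing(cpu_count=2, threads=2, max_workers=4))
# {'workers': 4, 'threads': 2, 'concurrency': 8}
print(gunicorn_sizing(cpu_count=4, threads=4))
# {'workers': 9, 'threads': 4, 'concurrency': 36}
```

The concurrency figure is also a reasonable starting point for the per-instance concurrency limit you give Fly's proxy.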
For a performance-2x machine (2 vCPU, 4 GB RAM), use:
CMD ["gunicorn", "--workers", "5", "--threads", "4", ...]
For a performance-4x machine (4 vCPU, 8 GB RAM), use:
CMD ["gunicorn", "--workers", "9", "--threads", "4", ...]
Database Connection Pooling
As Superset scales, each instance opens connections to the metadata database. With 5 instances, each holding 20 connections, that's 100 concurrent connections, and more if every Gunicorn worker keeps its own pool. PostgreSQL's default max_connections is 100, so you'll hit the limit.
Increase PostgreSQL’s max connections:
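fly postgres config update --max-connections 200 --app superset-db
Or put a connection pooler (PgBouncer) between Superset and PostgreSQL. We are not aware of a fly postgres config flag that enables pooling, so plan to run PgBouncer as its own component, or use a managed PostgreSQL service that bundles one, and set its pool mode to transaction:
pool_mode = transaction
Transaction mode means PgBouncer returns the connection to the pool after each transaction, reducing the total connections needed.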
Caching Strategy
Redis caching is critical for performance. Configure it aggressively:
In superset_config.py:
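import redis
from cachelib.redis import RedisCache
CACHE_CONFIG = {
    'CACHE_TYPE': 'RedisCache',
    'CACHE_REDIS_URL': os.environ.get('REDIS_URL'),
    'CACHE_DEFAULT_TIMEOUT': 600,  # 10 minutes
    'CACHE_KEY_PREFIX': 'superset_',
}
# Chart query results are cached through DATA_CACHE_CONFIG, not CACHE_CONFIG
DATA_CACHE_CONFIG = {**CACHE_CONFIG, 'CACHE_KEY_PREFIX': 'superset_data_'}
# SQL Lab results backend: must be a cache object, not a string
RESULTS_BACKEND = RedisCache(
    host=redis.Redis.from_url(os.environ['REDIS_URL']),  # cachelib accepts a redis-py client here
    key_prefix='superset_results_',
)
RESULTS_BACKEND_USE_MSGPACK = True  # Serialise stored results with MessagePack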
Set cache TTL based on your data freshness requirements:
- Real-time dashboards: 60 seconds.
- Operational dashboards: 300 seconds (5 minutes).
- Reporting dashboards: 3600 seconds (1 hour).
Higher TTL = lower database load, but stale data. Strike a balance.
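To make the trade-off concrete, the sketch below estimates how many times one chart's query can reach the data source per hour for a given TTL. It is a simplification under stated assumptions: a single cached query, a half-open one-hour window, and no concurrent cache misses. The tier names and the function are ours, mirroring the three TTLs listed above.

```python
import math

# TTL tiers from the list above, in seconds
CACHE_TTL_SECONDS = {"real_time": 60, "operational": 300, "reporting": 3600}


def max_source_queries_per_hour(views_per_hour: int, ttl_seconds: int) -> int:
    """Upper bound on data-source hits per hour for one cached query."""
    # The cache can expire at most this many times within the hour
    refreshes = math.ceil(3600 / ttl_seconds)
    # A query only runs when someone views the chart
    return min(views_per_hour, refreshes)


for tier, ttl in CACHE_TTL_SECONDS.items():
    print(tier, max_source_queries_per_hour(views_per_hour=1000, ttl_seconds=ttl))
```

For a chart viewed 1000 times an hour, moving from a 60-second to a 300-second TTL cuts the bound from 60 source queries to 12.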
Monitoring, Logging, and Health Checks
A production Superset deployment needs visibility into what’s happening.
Health Checks
Fly.io pings the /health endpoint every 30 seconds. Superset exposes this by default, but you should verify it:
fly ssh console -s
curl http://localhost:8088/health
You should see:
OK
Fly.io stops routing traffic to an instance whose health check is failing and resumes once it passes again. The check alone does not replace the instance; a crashed process is restarted according to the machine's restart policy.
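You can also add a deeper health check endpoint. One way, sketched here, is to register it through the FLASK_APP_MUTATOR hook in superset_config.py, on a separate path because Superset already serves /health:
from flask import jsonify
from sqlalchemy import text

def FLASK_APP_MUTATOR(app):
    # Import lazily so the config module loads before Superset initialises
    from superset.extensions import cache_manager, db

    @app.route('/healthz')
    def deep_health():
        try:
            # Check metadata database connectivity
            db.session.execute(text('SELECT 1'))
            # Check Redis connectivity
            cache_manager.cache.get('healthz-probe')
            return jsonify({'status': 'ok'}), 200
        except Exception as e:
            return jsonify({'status': 'error', 'message': str(e)}), 500
This health check verifies that both the database and cache are reachable. If either fails, the instance is marked unhealthy and taken out of rotation. Point the check path in fly.toml at /healthz to use it.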
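For smoke tests after a deploy, the same endpoint can be probed from a script. The sketch below uses only the standard library; probe_health and its opener parameter are hypothetical names, and the opener is injectable purely so the function can be exercised without a live server.

```python
import urllib.error
import urllib.request


def probe_health(url: str, timeout: float = 5.0, opener=None) -> bool:
    """Return True only if the endpoint answers with HTTP 200."""
    if opener is None:
        # Bypass any proxy settings so localhost probes are not redirected
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers connection failures, timeouts, and non-2xx responses
        return False


if __name__ == "__main__":
    print(probe_health("http://localhost:8088/health"))
```

Run it from inside the container (for example through fly ssh console) to confirm that the app answers before you rely on Fly's own checks.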
Application Logging
Fly.io captures stdout and stderr. Configure Superset to log to stdout:
In superset_config.py:
import logging
import sys
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    stream=sys.stdout
)
View logs in real-time:
fly logs -a superset-prod
For production, ship logs to a centralised system (e.g., Datadog, New Relic, CloudWatch):
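import logging
import sys
from pythonjsonlogger import jsonlogger
logger = logging.getLogger()
logger.setLevel(logging.INFO)
logHandler = logging.StreamHandler(sys.stdout)
# Emit timestamp, logger name, level, and message as JSON fields
formatter = jsonlogger.JsonFormatter('%(asctime)s %(name)s %(levelname)s %(message)s')
logHandler.setFormatter(formatter)
logger.addHandler(logHandler)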
JSON-formatted logs are easier to parse and search.
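If you would rather not add the python-json-logger dependency, a standard-library formatter gives similar output. This is a minimal sketch: the class name, the function name, and the choice of fields are ours, and you would extend the payload with whatever your log pipeline expects.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "logger": record.name,
            "level": record.levelname,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Keep tracebacks inside the JSON object instead of on extra lines
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def configure_json_logging(level: int = logging.INFO) -> None:
    """Send root-logger output to stdout as JSON, replacing other handlers."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers = [handler]
    root.setLevel(level)
```

Calling configure_json_logging() from superset_config.py keeps every line that Fly.io captures machine-parseable.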
Metrics and Observability
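Superset does not serve Prometheus metrics out of the box. It emits StatsD-style counters and timers through the STATS_LOGGER setting in superset_config.py, which you can bridge to Prometheus with a StatsD exporter or a Flask metrics extension. Once an exporter is serving /metrics, check it with:
curl http://localhost:8088/metrics
Key signals to monitor (the metric names are illustrative and depend on the exporter you choose):
- superset_query_execution_time_seconds: How long queries take.
- superset_cache_hit_rate: Percentage of queries served from cache.
- superset_database_connection_errors_total: Database connectivity issues.
- flask_http_request_duration_seconds: HTTP request latency.
- flask_http_request_total: Request volume by endpoint.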
Set up alerts:
- High query latency (>5 seconds): Investigate slow queries or data source issues.
- Low cache hit rate (<50%): Consider increasing cache TTL or improving cache strategy.
- Database connection errors: Check database availability and connection pool settings.
- High error rate (>5% of requests): Check logs for application errors.
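The alert rules above translate directly into a small evaluation function, which is handy for unit-testing thresholds before wiring them into a monitoring tool. The data class, field names, and alert labels below are hypothetical; only the thresholds come from the list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MetricsSnapshot:
    query_latency_seconds: float   # typical or p95 query latency
    cache_hit_rate: float          # fraction between 0 and 1
    db_connection_errors: int      # errors seen in the sampling window
    error_rate: float              # failed requests / total requests


def evaluate_alerts(snapshot: MetricsSnapshot) -> List[str]:
    """Return the alerts that the snapshot triggers."""
    alerts = []
    if snapshot.query_latency_seconds > 5:
        alerts.append("high_query_latency")
    if snapshot.cache_hit_rate < 0.50:
        alerts.append("low_cache_hit_rate")
    if snapshot.db_connection_errors > 0:
        alerts.append("database_connection_errors")
    if snapshot.error_rate > 0.05:
        alerts.append("high_error_rate")
    return alerts


print(evaluate_alerts(MetricsSnapshot(6.2, 0.42, 0, 0.01)))
# ['high_query_latency', 'low_cache_hit_rate']
```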
Fly.io Metrics
Fly.io provides built-in status and metrics:
fly status -a superset-prod
This shows, for each instance:
- State (started, stopped, failed).
- Region and deployed version.
- Health check results.
CPU, memory, and network I/O are collected separately. For those, use Fly's hosted Grafana dashboard or export to Prometheus.
Operational Habits for Production Stability
Technical configuration is only half the battle. Operational discipline keeps systems stable.
Deployment Checklist
Before deploying to production:
- Test locally: Build the Docker image and run it locally.
  docker build -t superset-local .
  docker run -p 8088:8088 -e SECRET_KEY=test superset-local
- Test in staging: Deploy to a staging Fly app with production-like data.
  fly deploy -a superset-staging
- Run migrations: Superset metadata database schema changes require migrations.
  fly ssh console -a superset-prod -C "superset db upgrade"
- Verify health checks: Confirm the app is healthy before routing traffic.
  fly status -a superset-prod
- Monitor logs: Watch for errors during the first 10 minutes (fly logs streams continuously by default).
  fly logs -a superset-prod
Backup and Recovery
Your metadata database is irreplaceable. Back it up:
fly postgres backup create --app superset-db
Fly takes daily snapshots of the Postgres volumes and keeps them for a limited window (5 days by default at the time of writing; check your volume's retention setting). For longer retention, export to S3:
fly ssh console -a superset-db
pg_dump -U postgres superset | gzip | aws s3 cp - s3://my-bucket/superset-backups/$(date +%Y%m%d).sql.gz
Test restores quarterly to ensure backups are valid.
Scaling Decisions
Monitor these metrics to decide when to scale:
- CPU utilisation: If consistently >70%, upgrade machine size or add instances.
- Memory utilisation: If >80%, upgrade machine size or reduce Gunicorn workers.
- Database connections: If approaching max, increase PostgreSQL max_connections or use PgBouncer.
- Query latency: If >2 seconds, investigate slow queries, add cache, or optimise data source indexes.
Scale proactively, not reactively. If you know traffic spikes at month-end, increase min_machines_running a day before.
Security Hygiene
- Rotate secrets: Change database passwords and API keys every 90 days.
- Update dependencies: Run pip install --upgrade monthly and rebuild the Docker image.
- Review access: Audit who has Superset admin access. Remove inactive users.
- Monitor failed logins: Configure Superset to log failed authentication attempts.
- Enforce HTTPS: Fly.io does this by default (force_https = true).
Incident Response
When things break:
- Check health: fly status -a superset-prod.
- Review logs: fly logs -a superset-prod --no-tail.
- Check data sources: Can Superset reach the database? Open fly ssh console and test connectivity.
- Rollback if needed: fly releases --image lists recent deployments with their image references; fly deploy --image <image-ref> redeploys a previous version.
- Post-mortem: After resolution, document the root cause and prevention steps.
For critical issues, have a runbook:
- Database unavailable: Check database status. Trigger a failover if using managed PostgreSQL.
- Cache unavailable: Superset degrades gracefully—queries hit the database directly (slower).
- App crashes: Fly.io auto-restarts. If crashes persist, check logs for memory leaks or configuration errors.
- Slow queries: Kill long-running queries in the data source. Increase cache TTL. Optimise dashboard queries.
Troubleshooting Common Issues
Issue: Superset Won’t Start
Symptoms: App restarts repeatedly, health check fails.
Diagnosis:
fly logs -a superset-prod --no-tail
Common causes:
- Bad SECRET_KEY: Verify it's set: fly secrets list.
- Database unreachable: Check DATABASE_URL and network connectivity.
- Migration failed: Run fly ssh console and manually run superset db upgrade.
Issue: Slow Queries
Symptoms: Dashboards take >5 seconds to load.
Diagnosis:
- Check cache hit rate: Run fly ssh console and query Redis.
- Check database query time: Enable slow query logging on your data source.
- Check Superset logs for query execution time.
Fix:
- Increase cache TTL in superset_config.py.
- Optimise the underlying database query (add indexes, denormalise).
- Use materialized views or data warehouse aggregations.
Issue: High Memory Usage
Symptoms: Instances crash with OOM (out of memory) errors.
Diagnosis:
fly status -a superset-prod
Look for memory usage >90%.
Fix:
- Reduce Gunicorn workers (fewer concurrent requests).
- Upgrade to a larger machine size (e.g., performance-1x).
- Limit result set size: ROW_LIMIT = 10000 in superset_config.py.
Issue: Database Connection Pool Exhausted
Symptoms: “too many connections” errors in logs.
Diagnosis:
fly ssh console -a superset-prod
psql $DATABASE_URL -c "SELECT count(*) FROM pg_stat_activity;"
If the count is near PostgreSQL’s max_connections, the pool is exhausted.
Fix:
- Increase PostgreSQL max_connections: fly postgres config update --max-connections 300 --app superset-db.
- Use PgBouncer: Run it in transaction pooling mode between Superset and PostgreSQL.
- Reduce SQLALCHEMY_POOL_SIZE in superset_config.py.
Issue: Data Source Connection Fails
Symptoms: “Could not connect to data source” error when creating a data source.
Diagnosis:
- Verify the hostname and port are correct.
- Check firewall rules on the data source (is it allowing connections from Fly.io?).
- Test connectivity from the Superset container:
  fly ssh console -a superset-prod
  nc -zv clickhouse.example.com 8443
Fix:
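- Allowlist your app's outbound IP addresses on the data source's firewall. Outbound addresses are not guaranteed to stay fixed, so confirm the current options in Fly's documentation before relying on an allowlist.
- Use a bastion host or VPN if the data source is private.
- For Fly Postgres, use internal DNS: superset-db.internal.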
Next Steps and Scaling Considerations
You’ve deployed Superset on Fly.io. What’s next?
Immediate Actions
- Create dashboards: Start with one dashboard to validate the setup.
- Set up alerts: Configure email or Slack notifications for dashboard changes.
- Document runbooks: Write down how to deploy, scale, and recover from failures.
- Schedule backups: Ensure database backups are automated and tested.
Advanced Patterns
Multi-Region Deployment
Fly.io supports multi-region deployments. Place instances in multiple regions for lower latency and higher availability. Region placement is set with fly scale rather than in fly.toml:
fly scale count 2 --region syd -a superset-prod
fly scale count 1 --region sjc -a superset-prod
Users in Sydney connect to the Sydney instances (low latency). Users on the US west coast connect to the San Jose instance. All instances share one PostgreSQL metadata database, so instances outside its region pay a cross-region round trip on every metadata query.
Custom Plugins and Extensions
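Superset is extensible. Add custom visualisations, data sources, or authentication by baking them into the image. The destination path below is illustrative, and custom visualisation plugins also require rebuilding the Superset frontend. Avoid running superset load-examples during the build: it loads sample dashboards and needs a live metadata database.
COPY custom_plugins/ /app/superset/extensions/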
Integration with Superset + ClickHouse
For teams at Platform Development in United States and elsewhere, Superset + ClickHouse is a powerful combination. ClickHouse is a columnar database optimised for analytics. Superset provides the UI. Together, they replace per-seat BI tools at a fraction of the cost.
Configure Superset to query ClickHouse:
- In Superset, create a new data source.
- Select “ClickHouse” as the database type.
- Enter the ClickHouse hostname, port, username, and password.
- Create tables and datasets.
- Build dashboards.
For background reading, Docker's "What is a Container?" explains containerisation, and The Twelve-Factor App covers the stateless application design principles that underpin this deployment pattern.
Operational Maturity
As your Superset deployment grows, invest in:
- Observability: Ship metrics to Datadog or New Relic. Set up dashboards and alerts.
- Incident management: Use PagerDuty or similar for on-call rotation.
- Change management: Document all configuration changes. Use git for version control.
- Capacity planning: Model growth and plan upgrades 3–6 months in advance.
When to Engage Professional Support
If you're running Superset at scale (1000+ concurrent users, multiple data sources, complex dashboards), consider engaging a platform engineering partner. PADISO's services and Platform Development teams in Australia provide fractional CTO and platform engineering support for exactly this scenario.
We help with:
- Architecture review: Ensure your Superset deployment is optimised for your workload.
- Performance tuning: Optimise queries, caching, and database configuration.
- Security hardening: Implement SOC 2 and ISO 27001 compliance through PADISO's Security Audit.
- Scaling strategy: Plan for growth and multi-region deployment.
For a rapid assessment, book PADISO's AI Quickstart Audit, a fixed-fee 2-week diagnostic that tells you where you are, what to ship first, and what 90 days could unlock.
Summary
Apache Superset on Fly.io is a production-grade, cost-effective platform for embedded analytics and BI. This guide covers:
- Why this combination works: Stateless Superset + container-native Fly.io = fast, scalable, reliable.
- Architecture: Web app, metadata database, Redis cache, data sources.
- Deployment: Dockerfile, fly.toml, secrets, health checks.
- Networking: Internal DNS for managed services, outbound connections for external data sources.
- Autoscaling: Gunicorn tuning, connection pooling, caching strategy.
- Monitoring: Logs, metrics, health checks, incident response.
- Operations: Backups, security, scaling decisions, troubleshooting.
The pattern scales from 10 to 10,000 concurrent users. Start small, monitor closely, and scale as needed.
If you're building a multi-tenant SaaS platform, a data-driven operations centre, or replacing per-seat BI tools, Superset on Fly.io is a proven, cost-effective choice. For teams in Australia, the Sydney region keeps latency low to local data sources and users.
Deploy with confidence. Monitor relentlessly. Scale deliberately.