Table of Contents
- Why Superset on Pulumi Matters
- Pre-Deployment Architecture Decisions
- Networking and Security Foundation
- Storage, Secrets, and State Management
- Building the Superset Stack
- Autoscaling and Load Balancing
- Observability and Operational Habits
- Disaster Recovery and Backup Strategy
- Cost Optimisation and Governance
- Common Pitfalls and How to Avoid Them
- Next Steps and Scaling
Why Superset on Pulumi Matters
Apache Superset is a modern, open-source data visualisation and business intelligence platform. Deployed as a Pulumi stack, it becomes a repeatable, version-controlled, infrastructure-as-code asset that your team can ship, audit, and scale without manual configuration drift.
Pulumi lets you define cloud infrastructure using Python, TypeScript, Go, or C#. Unlike declarative tools that require learning domain-specific languages, Pulumi treats infrastructure as code in a real programming language. This means you can use loops, conditionals, functions, and version control the same way you would for application code.
For organisations running Superset across multiple environments—development, staging, production—or across multiple cloud providers, this approach saves weeks of manual deployment work and eliminates the human error that comes with point-and-click cloud consoles.
At PADISO, we’ve deployed Superset on Pulumi for teams modernising their analytics infrastructure. The pattern we’ve refined here is production-tested across insurance, retail, government, and financial services clients. Whether you’re a startup building your first analytics layer or an enterprise consolidating BI tools, this guide walks you through the complete pattern.
Pre-Deployment Architecture Decisions
Before writing a single line of Pulumi code, you need to make five critical architectural decisions. These choices shape everything downstream—cost, performance, security, and operational burden.
Cloud Provider Selection
Pulumi supports AWS, Azure, Google Cloud, Kubernetes, and others. For this guide, we’ll focus on AWS, the most common choice for analytics workloads in Australia and globally.
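If you’re running government or defence workloads in Australia, confirm that your chosen region and services meet your IRAP/PROTECTED requirements before you commit. AWS GovCloud regions are located in the US, so they do not suit data that must stay onshore. PADISO’s Platform Development in Canberra practice specialises in IRAP/PROTECTED-aligned architecture.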
Compute Model: Containers or Serverless
Superset runs as a Python application. You have two main paths:
Path 1: Container-based (ECS Fargate or Kubernetes)
- You control resource allocation, scaling policies, and cost.
- Superset runs in a container, orchestrated by AWS ECS Fargate or self-managed Kubernetes.
- More operational overhead; more control.
- Better for teams with existing container orchestration experience.
Path 2: Serverless (AWS Lambda with API Gateway)
- Simpler operational model; AWS manages scaling.
- Superset’s long-running web server doesn’t fit Lambda’s execution model well.
- Not recommended for production Superset deployments.
We recommend Path 1: Container-based on ECS Fargate. Fargate removes the need to manage EC2 instances while keeping the flexibility you need for a stateful application like Superset.
Database Backend
Superset requires a metadata database (to store dashboards, users, and configuration) and typically connects to one or more data warehouses (to query your actual analytics data).
Metadata Database Options:
- PostgreSQL on RDS: Managed, highly available, easy to back up. Standard choice. Costs $20–100/month depending on instance size.
- MySQL on RDS: Similar to PostgreSQL. Slightly cheaper in some regions.
- Aurora PostgreSQL: Higher availability, auto-scaling storage. Better for large deployments; ~$50–200/month.
For most teams, PostgreSQL on RDS is the right balance of cost and reliability. The PostgreSQL Runtime Configuration documentation provides tuning guidance if you need to optimise for Superset’s workload.
Caching Layer
Superset benefits from a caching layer to reduce database load and improve dashboard load times. Redis is the standard choice.
Options:
- ElastiCache Redis: AWS-managed, highly available, supports encryption in transit. ~$15–50/month for a small instance.
- Self-managed Redis on EC2: Cheaper but requires operational overhead.
We recommend ElastiCache Redis. The Redis Cache Documentation covers operational best practices; for Superset, you’ll use Redis for caching query results and session storage.
Data Warehouse Connection
Superset queries external data warehouses. Common options:
- Amazon Redshift: AWS-native, excellent for analytics. Costs scale with cluster size.
- Snowflake: Cloud-agnostic, pay-per-query. Popular in Australia for financial services and retail.
- BigQuery: Google Cloud; good if your data is already there.
- ClickHouse: Open-source, fast columnar database. Increasingly popular for cost-conscious teams.
In our Platform Development work in Melbourne and Sydney, we’ve seen Superset + ClickHouse replace expensive per-seat BI tools, cutting costs by 60–70% while improving query speed.
Networking and Security Foundation
Superset handles sensitive data and user credentials. Your network and security posture must be locked down before you deploy.
VPC and Subnet Design
Create a VPC with public and private subnets. Superset runs in private subnets; only the load balancer sits in public subnets.
# Pulumi code snippet (Python)
import pulumi
import pulumi_aws as aws
config = pulumi.Config()
environment = config.require('environment')
# VPC
vpc = aws.ec2.Vpc(f'{environment}-superset-vpc',
cidr_block='10.0.0.0/16',
enable_dns_hostnames=True,
enable_dns_support=True,
tags={'Environment': environment})
# Public subnet for ALB
public_subnet = aws.ec2.Subnet(f'{environment}-public-subnet',
vpc_id=vpc.id,
cidr_block='10.0.1.0/24',
availability_zone='ap-southeast-2a',
map_public_ip_on_launch=True)
# Private subnet for Superset and RDS
private_subnet = aws.ec2.Subnet(f'{environment}-private-subnet',
vpc_id=vpc.id,
cidr_block='10.0.2.0/24',
availability_zone='ap-southeast-2a')
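This pattern isolates Superset from the internet. Traffic flows: User → ALB (public) → Superset (private) → RDS (private). The snippet is deliberately minimal. A working deployment also needs an internet gateway and route table for the public subnet, a NAT gateway or VPC endpoints so that Fargate tasks in the private subnet can pull images and reach Secrets Manager, and a second subnet of each kind in another availability zone, because both the ALB and the RDS subnet group require subnets in at least two AZs.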
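Before adding more subnets, it helps to plan the address space so nothing overlaps. The sketch below uses only Python's standard `ipaddress` module to carve consecutive /24 blocks out of the 10.0.0.0/16 VPC range. The helper `allocate_subnets` and the subnet labels are illustrative names, not part of Pulumi or AWS.

```python
import ipaddress
from itertools import islice


def allocate_subnets(vpc_cidr, names, new_prefix=24, skip=1):
    """Assign consecutive blocks of the VPC range to the given subnet labels.

    skip=1 leaves 10.0.0.0/24 unused so the numbering starts at 10.0.1.0/24,
    matching the CIDR blocks used in the snippet above.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = list(islice(vpc.subnets(new_prefix=new_prefix), skip, skip + len(names)))
    if len(blocks) < len(names):
        raise ValueError("VPC range is too small for the requested subnets")
    return {name: str(block) for name, block in zip(names, blocks)}


# Hypothetical labels: one public and one private subnet in each of two AZs.
plan = allocate_subnets(
    "10.0.0.0/16",
    ["public-2a", "private-2a", "private-2b", "public-2b"],
)
for name, cidr in plan.items():
    print(f"{name}: {cidr}")
```

The resulting values can be passed as `cidr_block` arguments when you declare the additional `aws.ec2.Subnet` resources.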
Security Groups
Define security groups with least-privilege rules.
# ALB security group: allow HTTPS from internet
alb_sg = aws.ec2.SecurityGroup(f'{environment}-alb-sg',
vpc_id=vpc.id,
ingress=[
aws.ec2.SecurityGroupIngressArgs(
protocol='tcp',
from_port=443,
to_port=443,
cidr_blocks=['0.0.0.0/0'], # HTTPS from anywhere
),
aws.ec2.SecurityGroupIngressArgs(
protocol='tcp',
from_port=80,
to_port=80,
cidr_blocks=['0.0.0.0/0'], # HTTP (redirect to HTTPS)
),
],
egress=[
aws.ec2.SecurityGroupEgressArgs(
protocol='-1',
from_port=0,
to_port=0,
cidr_blocks=['0.0.0.0/0'],
),
])
# Superset security group: allow traffic from ALB only
superset_sg = aws.ec2.SecurityGroup(f'{environment}-superset-sg',
vpc_id=vpc.id,
ingress=[
aws.ec2.SecurityGroupIngressArgs(
protocol='tcp',
from_port=8088, # Superset default port
to_port=8088,
security_groups=[alb_sg.id],
),
],
egress=[
aws.ec2.SecurityGroupEgressArgs(
protocol='-1',
from_port=0,
to_port=0,
cidr_blocks=['0.0.0.0/0'],
),
])
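# RDS security group: allow traffic from Superset only
rds_sg = aws.ec2.SecurityGroup(f'{environment}-rds-sg',
    vpc_id=vpc.id,
    ingress=[
        aws.ec2.SecurityGroupIngressArgs(
            protocol='tcp',
            from_port=5432,  # PostgreSQL
            to_port=5432,
            security_groups=[superset_sg.id],
        ),
    ])
# Redis security group: allow traffic from Superset only
# (the ElastiCache cluster later references this as redis_sg)
redis_sg = aws.ec2.SecurityGroup(f'{environment}-redis-sg',
    vpc_id=vpc.id,
    ingress=[
        aws.ec2.SecurityGroupIngressArgs(
            protocol='tcp',
            from_port=6379,  # Redis
            to_port=6379,
            security_groups=[superset_sg.id],
        ),
    ])
This ensures Superset can only be reached through the load balancer, and that RDS and Redis can only be reached from Superset.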
TLS/SSL Certificates
Superset should always run over HTTPS. Use AWS Certificate Manager (ACM) for free, auto-renewing certificates.
# Request a certificate for your domain
cert = aws.acm.Certificate(f'{environment}-superset-cert',
domain_name='analytics.yourcompany.com',
validation_method='DNS',
tags={'Environment': environment})
If you’re running in Australia and need compliance audit readiness, PADISO’s AI Quickstart Audit is a fixed-fee, two-week diagnostic that includes an infrastructure security assessment.
Storage, Secrets, and State Management
Secrets Management
Superset needs credentials for:
- PostgreSQL (metadata database)
- Redis (cache)
- Data warehouse connections (Redshift, Snowflake, etc.)
- SMTP (for email alerts)
- OAuth/SAML (for SSO)
Store these in AWS Secrets Manager, not in code or in plaintext environment variables.
# These secrets reference rds_instance and redis_cluster, which are defined in
# "Building the Superset Stack" below; in a real program, declare those first.
# Create a secret for PostgreSQL
db_secret = aws.secretsmanager.Secret(f'{environment}-superset-db-secret',
    description='PostgreSQL credentials for Superset metadata database',
    tags={'Environment': environment})
db_secret_version = aws.secretsmanager.SecretVersion(
    f'{environment}-superset-db-secret-version',
    secret_id=db_secret.id,
    # Output.json_dumps resolves the nested Outputs before serialising
    secret_string=pulumi.Output.secret(pulumi.Output.json_dumps({
        'username': 'superset_user',
        'password': config.require_secret('db_password'),
        'engine': 'postgresql',
        'host': rds_instance.address,  # Hostname only; .endpoint is host:port
        'port': 5432,
        'dbname': 'superset_metadata',
    })))
# Create a secret for Redis
redis_secret = aws.secretsmanager.Secret(f'{environment}-superset-redis-secret',
    description='Redis connection string for Superset caching',
    tags={'Environment': environment})
redis_secret_version = aws.secretsmanager.SecretVersion(
    f'{environment}-superset-redis-secret-version',
    secret_id=redis_secret.id,
    # rediss:// because the cluster enables in-transit encryption
    secret_string=pulumi.Output.secret(pulumi.Output.concat(
        'rediss://', redis_cluster.cache_nodes[0].address, ':6379/0')))
When Superset’s ECS task starts, ECS fetches these secrets from Secrets Manager and injects them into the container. This keeps sensitive data out of container images and task definitions. The values still pass through Pulumi state, where Pulumi encrypts anything marked as a secret.
Pulumi State Backend
Pulumi stores the state of your infrastructure (resource IDs, outputs, etc.) in a state backend. For production, use AWS S3 with encryption and versioning enabled.
# Configure Pulumi to use S3 backend
pulumi login s3://your-pulumi-state-bucket
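Enable S3 bucket versioning and encryption. The state bucket has to exist before pulumi login can point at it, so create it once from a separate bootstrap stack (or by hand) rather than from the stack whose state it stores:
state_bucket = aws.s3.Bucket(f'{environment}-pulumi-state',
    versioning=aws.s3.BucketVersioningArgs(
        enabled=True,
    ),
    server_side_encryption_configuration=aws.s3.BucketServerSideEncryptionConfigurationArgs(
        rule=aws.s3.BucketServerSideEncryptionConfigurationRuleArgs(
            apply_server_side_encryption_by_default=aws.s3.BucketServerSideEncryptionConfigurationRuleApplyServerSideEncryptionByDefaultArgs(
                sse_algorithm='AES256',
            ),
        ),
    ),
    tags={'Environment': environment})
# Public access settings live on a separate resource, not on the bucket itself
state_bucket_public_access = aws.s3.BucketPublicAccessBlock(
    f'{environment}-pulumi-state-public-access',
    bucket=state_bucket.id,
    block_public_acls=True,
    block_public_policy=True,
    ignore_public_acls=True,
    restrict_public_buckets=True)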
Persistent Storage for Uploads
Superset allows users to upload CSV files for analysis. Store these in S3, not in the container.
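superset_uploads_bucket = aws.s3.Bucket(f'{environment}-superset-uploads',
    versioning=aws.s3.BucketVersioningArgs(enabled=True),
    server_side_encryption_configuration=aws.s3.BucketServerSideEncryptionConfigurationArgs(
        rule=aws.s3.BucketServerSideEncryptionConfigurationRuleArgs(
            apply_server_side_encryption_by_default=aws.s3.BucketServerSideEncryptionConfigurationRuleApplyServerSideEncryptionByDefaultArgs(
                sse_algorithm='AES256',
            ),
        ),
    ),
    tags={'Environment': environment})
# Public access settings live on a separate resource, not on the bucket itself
uploads_public_access = aws.s3.BucketPublicAccessBlock(
    f'{environment}-superset-uploads-public-access',
    bucket=superset_uploads_bucket.id,
    block_public_acls=True,
    block_public_policy=True,
    ignore_public_acls=True,
    restrict_public_buckets=True)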
# IAM role for ECS task to access S3
superset_task_role = aws.iam.Role(f'{environment}-superset-task-role',
assume_role_policy=pulumi.Output.json_dumps({ # The Python SDK has no pulumi.json module
'Version': '2012-10-17',
'Statement': [{
'Action': 'sts:AssumeRole',
'Effect': 'Allow',
'Principal': {'Service': 'ecs-tasks.amazonaws.com'},
}],
}))
# Policy to read/write to S3 uploads bucket
s3_policy = aws.iam.RolePolicy(f'{environment}-superset-s3-policy',
role=superset_task_role.id,
policy=pulumi.Output.json_dumps({ # Resolves the bucket ARN Output before serialising
'Version': '2012-10-17',
'Statement': [{
'Effect': 'Allow',
'Action': ['s3:GetObject', 's3:PutObject', 's3:DeleteObject'],
'Resource': pulumi.Output.concat(superset_uploads_bucket.arn, '/*'),
}],
}))
Building the Superset Stack
RDS PostgreSQL Instance
Create a managed PostgreSQL instance for Superset’s metadata database.
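# Create a DB subnet group (required for RDS in a VPC).
# RDS needs subnets in at least two AZs; private_subnet_2 is the second
# private subnet shown under "Pitfall 4" and must be declared before this.
db_subnet_group = aws.rds.SubnetGroup(f'{environment}-superset-db-subnet',
    subnet_ids=[private_subnet.id, private_subnet_2.id],
    tags={'Environment': environment})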
# Create the RDS instance
rds_instance = aws.rds.Instance(f'{environment}-superset-db',
allocated_storage=20,
storage_type='gp3',
engine='postgres',
engine_version='15.3',
instance_class='db.t3.micro', # Start small; scale up as needed
db_name='superset_metadata',
username='superset_user',
password=config.require_secret('db_password'),
db_subnet_group_name=db_subnet_group.name,
vpc_security_group_ids=[rds_sg.id],
skip_final_snapshot=False, # Always snapshot before deletion
final_snapshot_identifier=f'{environment}-superset-db-final', # Static name; a timestamp here would change on every run
backup_retention_period=7, # Keep 7 days of backups
multi_az=True, # High availability
storage_encrypted=True,
tags={'Environment': environment})
pulumi.export('rds_endpoint', rds_instance.endpoint)
The multi_az=True setting ensures your metadata database is highly available. If the primary instance fails, RDS automatically promotes the standby replica.
ElastiCache Redis Instance
Create a managed Redis instance for caching.
# Create a cache subnet group
cache_subnet_group = aws.elasticache.SubnetGroup(f'{environment}-superset-cache-subnet',
subnet_ids=[private_subnet.id],
tags={'Environment': environment})
# Create the Redis cluster (single node)
redis_cluster = aws.elasticache.Cluster(f'{environment}-superset-redis',
    engine='redis',
    engine_version='7.0',
    node_type='cache.t3.micro',
    num_cache_nodes=1,
    parameter_group_name='default.redis7',
    port=6379,
    subnet_group_name=cache_subnet_group.name,
    security_group_ids=[redis_sg.id],
    transit_encryption_enabled=True,
    # Encryption at rest, transit_encryption_mode and automatic failover are
    # ReplicationGroup settings, not Cluster arguments, so they are omitted here.
    tags={'Environment': environment})
pulumi.export('redis_endpoint', redis_cluster.cache_nodes[0].address)
For production deployments with higher availability requirements, use a Redis replication group instead of a single cluster node.
ECS Cluster and Task Definition
Create an ECS cluster and define a task to run Superset.
# Create ECS cluster
ecs_cluster = aws.ecs.Cluster(f'{environment}-superset-cluster',
settings=[aws.ecs.ClusterSettingArgs(
name='containerInsights',
value='enabled',
)],
tags={'Environment': environment})
# CloudWatch log group for Superset
log_group = aws.cloudwatch.LogGroup(f'{environment}-superset-logs',
retention_in_days=7,
tags={'Environment': environment})
# ECS task execution role (allows ECS to pull image and access Secrets Manager)
task_execution_role = aws.iam.Role(f'{environment}-superset-task-execution-role',
assume_role_policy=pulumi.Output.json_dumps({ # The Python SDK has no pulumi.json module
'Version': '2012-10-17',
'Statement': [{
'Action': 'sts:AssumeRole',
'Effect': 'Allow',
'Principal': {'Service': 'ecs-tasks.amazonaws.com'},
}],
}))
# Attach the standard ECS task execution policy
task_execution_policy_attachment = aws.iam.RolePolicyAttachment(
f'{environment}-superset-task-execution-policy',
role=task_execution_role.name,
policy_arn='arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy')
# Allow task execution role to access Secrets Manager
secrets_policy = aws.iam.RolePolicy(f'{environment}-superset-secrets-policy',
role=task_execution_role.id,
policy=pulumi.Output.json_dumps({ # Resolves the secret ARN Outputs before serialising
'Version': '2012-10-17',
'Statement': [{
'Effect': 'Allow',
'Action': ['secretsmanager:GetSecretValue'],
'Resource': [db_secret.arn, redis_secret.arn],
}],
}))
# ECS task definition
task_definition = aws.ecs.TaskDefinition(f'{environment}-superset-task',
family=f'{environment}-superset',
network_mode='awsvpc',
requires_compatibilities=['FARGATE'],
cpu='512',
memory='1024',
execution_role_arn=task_execution_role.arn,
task_role_arn=superset_task_role.arn,
    # Output.json_dumps resolves every nested Output (log group name, ARNs,
    # bucket ID, secret key) before serialising the container definition.
    container_definitions=pulumi.Output.json_dumps([{
        'name': 'superset',
        'image': 'apache/superset:latest-dev',  # Use a pinned version in production
        'portMappings': [{
            'containerPort': 8088,
            'hostPort': 8088,
            'protocol': 'tcp',
        }],
        'logConfiguration': {
            'logDriver': 'awslogs',
            'options': {
                'awslogs-group': log_group.name,
                'awslogs-region': 'ap-southeast-2',
                'awslogs-stream-prefix': 'ecs',
            },
        },
        'secrets': [
            # Receives the whole JSON secret; superset_config.py must parse it
            {'name': 'SUPERSET_DATABASE_URL', 'valueFrom': db_secret.arn},
            {'name': 'REDIS_URL', 'valueFrom': redis_secret.arn},
        ],
        'environment': [
            {'name': 'SUPERSET_LOAD_EXAMPLES', 'value': 'false'},
            # Environment values are stored in plaintext in the task definition;
            # move this key into Secrets Manager and the 'secrets' list for production.
            {'name': 'SUPERSET_SECRET_KEY', 'value': config.require_secret('superset_secret_key')},
            # Superset has no built-in S3 upload backend, so this value only
            # takes effect if your superset_config.py handles it.
            {'name': 'SUPERSET_UPLOADS_FOLDER',
             'value': pulumi.Output.concat('s3://', superset_uploads_bucket.id, '/uploads')},
        ],
        'essential': True,
    }]),
tags={'Environment': environment})
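This task definition pulls the official Apache Superset Docker image, configures logging to CloudWatch, and injects secrets at runtime. The variable names SUPERSET_DATABASE_URL and REDIS_URL are this guide’s own convention, so your superset_config.py has to map them onto Superset settings such as SQLALCHEMY_DATABASE_URI. For production, pin the image to a specific version (e.g., apache/superset:2.1.0) rather than latest-dev.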
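Because the database secret is stored as a JSON document, the container receives JSON rather than a ready-made connection string. The following is a minimal sketch of how a `superset_config.py` might turn that JSON into `SQLALCHEMY_DATABASE_URI`. The helper name `build_sqlalchemy_uri` is hypothetical, and the `postgresql+psycopg2` driver prefix assumes the psycopg2 driver is installed in the image.

```python
import json
import os
from urllib.parse import quote_plus


def build_sqlalchemy_uri(secret_json: str) -> str:
    """Build a SQLAlchemy URI from the JSON secret created for the metadata DB."""
    secret = json.loads(secret_json)
    # URL-encode credentials so characters such as '@' or '/' do not break the URI.
    user = quote_plus(secret["username"])
    password = quote_plus(secret["password"])
    return (
        f"postgresql+psycopg2://{user}:{password}"
        f"@{secret['host']}:{secret['port']}/{secret['dbname']}"
    )


# In superset_config.py: only set the URI when the secret was injected,
# so the module still imports in local development.
_raw_secret = os.environ.get("SUPERSET_DATABASE_URL")
if _raw_secret:
    SQLALCHEMY_DATABASE_URI = build_sqlalchemy_uri(_raw_secret)
```

Encoding the credentials matters in practice: generated passwords often contain characters that would otherwise be read as URI delimiters.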
Application Load Balancer
Create an ALB to distribute traffic to Superset tasks.
# Create target group
target_group = aws.lb.TargetGroup(f'{environment}-superset-tg',
port=8088,
protocol='HTTP',
target_type='ip',
vpc_id=vpc.id,
health_check=aws.lb.TargetGroupHealthCheckArgs(
healthy_threshold=2,
unhealthy_threshold=2,
timeout=5,
interval=30,
path='/health',
matcher='200',
),
tags={'Environment': environment})
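# ALBs need subnets in at least two AZs, so add a second public subnet
public_subnet_2 = aws.ec2.Subnet(f'{environment}-public-subnet-2',
    vpc_id=vpc.id,
    cidr_block='10.0.4.0/24',
    availability_zone='ap-southeast-2b',
    map_public_ip_on_launch=True)
# Create ALB
alb = aws.lb.LoadBalancer(f'{environment}-superset-alb',
    internal=False,
    load_balancer_type='application',
    security_groups=[alb_sg.id],
    subnets=[public_subnet.id, public_subnet_2.id],
    tags={'Environment': environment})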
# HTTPS listener
https_listener = aws.lb.Listener(f'{environment}-superset-https',
load_balancer_arn=alb.arn,
port=443,
protocol='HTTPS',
ssl_policy='ELBSecurityPolicy-TLS-1-2-2017-01',
certificate_arn=cert.arn,
default_actions=[aws.lb.ListenerDefaultActionArgs(
type='forward',
target_group_arn=target_group.arn,
)])
# HTTP listener (redirect to HTTPS)
http_listener = aws.lb.Listener(f'{environment}-superset-http',
load_balancer_arn=alb.arn,
port=80,
protocol='HTTP',
default_actions=[aws.lb.ListenerDefaultActionArgs(
type='redirect',
redirect=aws.lb.ListenerDefaultActionRedirectArgs(
port='443',
protocol='HTTPS',
status_code='HTTP_301',
),
)])
pulumi.export('alb_dns_name', alb.dns_name)
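To see what the health-check thresholds on the target group mean in practice, the sketch below models them in plain Python. It is a simplified illustration of consecutive-check counting, not AWS's implementation, and the `TargetHealth` class is a hypothetical name. With `healthy_threshold=2` and `unhealthy_threshold=2`, a target changes state only after two consecutive results that disagree with its current state.

```python
from dataclasses import dataclass


@dataclass
class TargetHealth:
    """Simplified model of a load-balancer target's health state."""
    healthy_threshold: int = 2
    unhealthy_threshold: int = 2
    healthy: bool = False  # New targets start out unhealthy
    streak: int = 0        # Consecutive results that disagree with the current state

    def observe(self, status_code: int, matcher: int = 200) -> bool:
        passed = status_code == matcher
        if passed == self.healthy:
            # Result agrees with the current state: reset the counter.
            self.streak = 0
            return self.healthy
        self.streak += 1
        needed = self.healthy_threshold if passed else self.unhealthy_threshold
        if self.streak >= needed:
            self.healthy = passed
            self.streak = 0
        return self.healthy


def replay(codes):
    """Feed a sequence of health-check status codes to a fresh target."""
    target = TargetHealth()
    return [target.observe(code) for code in codes]


history = replay([200, 200, 502, 200, 502, 502])
print(history)
```

With a 30-second interval, two consecutive failures mean a failing task keeps receiving traffic for roughly a minute before it is taken out of rotation. A single failed check followed by a success does not change the state.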
ECS Service
Finally, create an ECS service to run Superset tasks.
service = aws.ecs.Service(f'{environment}-superset-service',
cluster=ecs_cluster.arn,
task_definition=task_definition.arn,
desired_count=2, # Run 2 tasks for high availability
launch_type='FARGATE',
network_configuration=aws.ecs.ServiceNetworkConfigurationArgs(
subnets=[private_subnet.id],
security_groups=[superset_sg.id],
assign_public_ip=False,
),
load_balancers=[aws.ecs.ServiceLoadBalancerArgs(
target_group_arn=target_group.arn,
container_name='superset',
container_port=8088,
)],
tags={'Environment': environment},
opts=pulumi.ResourceOptions(depends_on=[https_listener])) # depends_on is a resource option, not a Service argument
Autoscaling and Load Balancing
ECS Service Autoscaling
Configure the ECS service to scale based on CPU and memory utilisation.
# Create autoscaling target
autoscaling_target = aws.appautoscaling.Target(f'{environment}-superset-autoscaling-target',
max_capacity=5,
min_capacity=2,
resource_id=pulumi.Output.concat('service/', ecs_cluster.name, '/', service.name),
scalable_dimension='ecs:service:DesiredCount',
service_namespace='ecs')
# Target tracking: add or remove tasks to keep average CPU near 70%
cpu_scaling_policy = aws.appautoscaling.Policy(f'{environment}-superset-cpu-scaling',
policy_type='TargetTrackingScaling',
resource_id=autoscaling_target.resource_id,
scalable_dimension=autoscaling_target.scalable_dimension,
service_namespace=autoscaling_target.service_namespace,
target_tracking_scaling_policy_configuration=aws.appautoscaling.TargetTrackingScalingPolicyConfigurationArgs(
target_value=70.0,
predefined_metric_specification=aws.appautoscaling.TargetTrackingScalingPolicyConfigurationPredefinedMetricSpecificationArgs(
predefined_metric_type='ECSServiceAverageCPUUtilization',
),
scale_out_cooldown=60,
scale_in_cooldown=300,
))
# Target tracking: add or remove tasks to keep average memory near 80%
memory_scaling_policy = aws.appautoscaling.Policy(f'{environment}-superset-memory-scaling',
policy_type='TargetTrackingScaling',
resource_id=autoscaling_target.resource_id,
scalable_dimension=autoscaling_target.scalable_dimension,
service_namespace=autoscaling_target.service_namespace,
target_tracking_scaling_policy_configuration=aws.appautoscaling.TargetTrackingScalingPolicyConfigurationArgs(
target_value=80.0,
predefined_metric_specification=aws.appautoscaling.TargetTrackingScalingPolicyConfigurationPredefinedMetricSpecificationArgs(
predefined_metric_type='ECSServiceAverageMemoryUtilization',
),
scale_out_cooldown=60,
scale_in_cooldown=300,
))
With these policies, your Superset deployment will automatically scale from 2 to 5 tasks as demand increases, and scale back down during quiet periods.
Database Connection Pooling
Each Superset worker process keeps its own pool of connections to the metadata database, so the total connection count grows with every task you scale out to. Configure Superset’s SQLALCHEMY_ENGINE_OPTIONS to bound each pool.
# In your Superset configuration (superset_config.py or via environment variable)
SQLALCHEMY_ENGINE_OPTIONS = {
'pool_size': 10,
'pool_recycle': 3600,
'pool_pre_ping': True,
'max_overflow': 20,
}
These settings ensure:
- pool_size=10: Maintain 10 persistent connections to the database.
- pool_recycle=3600: Recycle connections every hour (prevents stale connections).
- pool_pre_ping=True: Test connections before reusing them.
- max_overflow=20: Allow up to 20 additional temporary connections if the pool is exhausted.
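Pool settings apply per process, so it is worth checking the worst case against the database's connection limit before you raise the autoscaling ceiling. The sketch below does that arithmetic. The function name and the assumption of one worker process per task are illustrative; substitute your own worker count.

```python
def max_db_connections(tasks: int, workers_per_task: int,
                       pool_size: int = 10, max_overflow: int = 20) -> int:
    """Worst-case connections if every pool is full, including overflow."""
    return tasks * workers_per_task * (pool_size + max_overflow)


# Assumption: one Superset worker process per ECS task.
baseline = max_db_connections(tasks=2, workers_per_task=1)
peak = max_db_connections(tasks=5, workers_per_task=1)

# 80 is the threshold used by the connection-count alarm in "Pitfall 1".
ALARM_THRESHOLD = 80
print(f"baseline={baseline}, peak={peak}, exceeds alarm at peak: {peak > ALARM_THRESHOLD}")
```

Under these assumptions the two-task baseline can open up to 60 connections, and the five-task maximum up to 150. If the peak exceeds what your instance class allows, lower `pool_size` and `max_overflow` or move to a larger instance.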
Observability and Operational Habits
CloudWatch Monitoring
Set up CloudWatch dashboards and alarms to monitor Superset’s health.
# Create a CloudWatch dashboard
dashboard = aws.cloudwatch.Dashboard(f'{environment}-superset-dashboard',
    dashboard_name=f'{environment}-superset',
    # Output.json_dumps resolves the nested Outputs before serialising
    dashboard_body=pulumi.Output.json_dumps({
        'widgets': [
            {
                'type': 'metric',
                'properties': {
                    # Dimensions scope each metric to this service
                    'metrics': [
                        ['AWS/ECS', 'CPUUtilization', 'ClusterName', ecs_cluster.name,
                         'ServiceName', service.name, {'stat': 'Average'}],
                        ['AWS/ECS', 'MemoryUtilization', 'ClusterName', ecs_cluster.name,
                         'ServiceName', service.name, {'stat': 'Average'}],
                    ],
                    'period': 300,
                    'stat': 'Average',
                    'region': 'ap-southeast-2',
                    'title': 'ECS Task CPU and Memory',
                },
            },
            {
                'type': 'metric',
                'properties': {
                    'metrics': [
                        ['AWS/ApplicationELB', 'TargetResponseTime', 'LoadBalancer', alb.arn_suffix, {'stat': 'Average'}],
                        ['AWS/ApplicationELB', 'RequestCount', 'LoadBalancer', alb.arn_suffix, {'stat': 'Sum'}],
                        ['AWS/ApplicationELB', 'HTTPCode_Target_5XX_Count', 'LoadBalancer', alb.arn_suffix, {'stat': 'Sum'}],
                    ],
                    'period': 60,
                    'stat': 'Average',
                    'region': 'ap-southeast-2',
                    'title': 'ALB Performance',
                },
            },
        ],
    }))
# Alarm: ECS task CPU > 85% for 2 minutes
cpu_alarm = aws.cloudwatch.MetricAlarm(f'{environment}-superset-cpu-alarm',
comparison_operator='GreaterThanThreshold',
evaluation_periods=2,
metric_name='CPUUtilization',
namespace='AWS/ECS',
period=60,
statistic='Average',
threshold=85,
alarm_description='Alert when Superset ECS task CPU exceeds 85%',
alarm_actions=[sns_topic.arn], # sns_topic: an aws.sns.Topic you define for alert delivery
dimensions={ # pulumi_aws takes dimensions as a plain mapping
'ClusterName': ecs_cluster.name,
'ServiceName': service.name,
})
# Alarm: ALB target health
target_health_alarm = aws.cloudwatch.MetricAlarm(f'{environment}-superset-target-health-alarm',
comparison_operator='LessThanThreshold',
evaluation_periods=1,
metric_name='HealthyHostCount',
namespace='AWS/ApplicationELB',
period=60,
statistic='Average',
threshold=1,
alarm_description='Alert when fewer than 1 healthy Superset target',
alarm_actions=[sns_topic.arn],
dimensions={ # pulumi_aws takes dimensions as a plain mapping
'TargetGroup': target_group.arn_suffix,
'LoadBalancer': alb.arn_suffix,
})
Application Logging and Log Insights
Superset logs to stdout, which ECS captures and sends to CloudWatch. Query logs with CloudWatch Logs Insights.
# Find errors in Superset logs
fields @timestamp, @message
| filter @message like /ERROR/
| stats count() by @message
For more sophisticated observability, integrate with Datadog, New Relic, or Prometheus.
Operational Runbook
Document operational procedures for your team:
- Deploying a new version: Update the task definition image, push to production.
- Scaling up: Modify desired_count in the ECS service or let autoscaling handle it.
- Database maintenance: Use AWS RDS console to create snapshots, modify parameter groups.
- Secrets rotation: Update secrets in AWS Secrets Manager; ECS tasks will pick up changes on next restart.
- Debugging a failed task: Check CloudWatch logs, ECS task details, and security group rules.
Disaster Recovery and Backup Strategy
RDS Automated Backups
RDS automatically backs up your metadata database. Configure retention and testing.
# Already configured in the RDS instance definition above:
# backup_retention_period=7 # Keep 7 days of backups
# multi_az=True # Synchronous standby replica
To restore from a backup:
aws rds restore-db-instance-from-db-snapshot \
--db-instance-identifier superset-restored \
--db-snapshot-identifier superset-2024-01-15-03-00
Snapshots and Point-in-Time Recovery
Create manual snapshots before major changes.
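# Manual snapshot via Pulumi (run before deployments)
# snapshot_label is a hypothetical stack config value, e.g. a release tag
snapshot_label = config.get('snapshot_label') or 'manual'
# The metadata database is an RDS instance, so use Snapshot, not ClusterSnapshot
manual_snapshot = aws.rds.Snapshot(f'{environment}-superset-snapshot-pre-deploy',
    db_instance_identifier=rds_instance.identifier,
    db_snapshot_identifier=f'superset-pre-deploy-{snapshot_label}')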
Redis Persistence
ElastiCache can back up Redis with daily RDB snapshots, but only when a retention period is set. With the default snapshot_retention_limit of 0, backups are turned off.
# Add to the Redis cluster definition above:
# snapshot_retention_limit=7 # Keep 7 days of daily snapshots
For critical workloads, use a Redis replication group with Multi-AZ automatic failover.
S3 Uploads Bucket Versioning
Enable versioning and lifecycle policies on the uploads bucket.
# Already configured above:
# versioning=aws.s3.BucketVersioningArgs(enabled=True)
# Add lifecycle rule to archive old versions
lifecycle_rule = aws.s3.BucketLifecycleConfigurationV2(f'{environment}-superset-uploads-lifecycle',
bucket=superset_uploads_bucket.id,
rules=[
aws.s3.BucketLifecycleConfigurationV2RuleArgs(
id='archive-old-versions',
status='Enabled',
noncurrent_version_transitions=[
aws.s3.BucketLifecycleConfigurationV2RuleNoncurrentVersionTransitionArgs(
storage_class='GLACIER',
noncurrent_days=30, # Noncurrent-version rules use noncurrent_days, not days
),
],
noncurrent_version_expiration=aws.s3.BucketLifecycleConfigurationV2RuleNoncurrentVersionExpirationArgs(
noncurrent_days=90,
),
),
])
Disaster Recovery Testing
Monthly, test your recovery procedures:
- Restore RDS from snapshot to a test instance.
- Verify Superset can connect and query the restored database.
- Document any issues and update runbooks.
Cost Optimisation and Governance
Right-Sizing Compute
Start with small instance types and scale based on actual usage. For a pilot deployment:
- ECS Fargate: 512 CPU units (0.5 vCPU), 1 GB memory per task; 2 tasks = ~$15/month.
- RDS: db.t3.micro = ~$20/month.
- ElastiCache: cache.t3.micro = ~$10/month.
- ALB: ~$15/month.
- NAT Gateway: ~$30/month (if using one).
Total pilot cost: ~$90/month. These are rough estimates, so check them against current pricing for your region. As you scale, costs grow with task count and instance size.
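The total is simply the sum of the line items above. A small sketch like the following makes it easy to re-run the estimate when one figure changes. The dollar values are this guide's rough estimates, not quoted AWS prices, and the dictionary keys are illustrative.

```python
# Rough monthly estimates in USD, taken from the list above.
PILOT_MONTHLY_USD = {
    "ecs_fargate": 15,
    "rds": 20,
    "elasticache": 10,
    "alb": 15,
    "nat_gateway": 30,
}


def monthly_total(items: dict, include_nat: bool = True) -> int:
    """Sum the line items, optionally leaving out the NAT gateway."""
    return sum(
        cost for name, cost in items.items()
        if include_nat or name != "nat_gateway"
    )


total_with_nat = monthly_total(PILOT_MONTHLY_USD)
total_without_nat = monthly_total(PILOT_MONTHLY_USD, include_nat=False)
print(f"with NAT: ~${total_with_nat}/month, without NAT: ~${total_without_nat}/month")
```

The NAT gateway is the largest single item in this estimate, which is why replacing it with VPC endpoints is a common first optimisation for small deployments.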
Reserved Instances and Savings Plans
Once you’ve stabilised your workload, purchase Reserved Instances or Savings Plans for 30–40% discounts.
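# Example: Purchase a 1-year RDS reserved instance
# The offering ID below is a placeholder; look up a real one with
# aws rds describe-reserved-db-instances-offerings.
rds_reservation = aws.rds.ReservedInstance(f'{environment}-superset-db-ri',
    offering_id='12345678-1234-1234-1234-123456789012',
    reservation_id=f'{environment}-superset-db-reservation')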
Cost Allocation Tags
Tag all resources consistently for cost tracking.
common_tags = {
'Environment': environment,
'Project': 'Superset',
'CostCenter': 'Analytics',
'Owner': 'Data Platform Team',
}
Use AWS Cost Explorer to filter costs by tag and identify optimisation opportunities.
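Tags only help with cost tracking if every resource carries the same base set. One way to enforce that is a small helper that merges the common tags with any resource-specific ones. The helper name `with_common_tags` and the `'dev'` environment value are assumptions for illustration.

```python
from typing import Dict, Optional

environment = "dev"  # Stand-in for config.require('environment')

common_tags = {
    "Environment": environment,
    "Project": "Superset",
    "CostCenter": "Analytics",
    "Owner": "Data Platform Team",
}


def with_common_tags(extra: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return the common tags plus any resource-specific tags.

    Resource-specific values win on key conflicts, and the shared
    dictionary is never mutated.
    """
    merged = dict(common_tags)
    merged.update(extra or {})
    return merged


db_tags = with_common_tags({"Component": "metadata-db"})
print(db_tags)
```

Each resource can then take `tags=with_common_tags({...})` in place of the repeated `{'Environment': environment}` literal.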
Unused Resource Cleanup
Regularly audit your deployment for unused resources:
# List all Superset-related resources
aws resourcegroupstaggingapi get-resources \
--tag-filters Key=Project,Values=Superset
Common Pitfalls and How to Avoid Them
Pitfall 1: Database Connection Exhaustion
Problem: Superset tasks can’t connect to RDS because the connection pool is exhausted.
Symptom: Errors like FATAL: remaining connection slots are reserved for non-replication superuser connections.
Solution: Configure connection pooling and monitor connection count.
# Monitor RDS connections
connection_count_alarm = aws.cloudwatch.MetricAlarm(f'{environment}-rds-connections-alarm',
comparison_operator='GreaterThanThreshold',
evaluation_periods=2,
metric_name='DatabaseConnections',
namespace='AWS/RDS',
period=60,
statistic='Average',
threshold=80, # Alert if > 80 connections
alarm_description='Alert when RDS connection count exceeds 80',
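alarm_actions=[sns_topic.arn],
dimensions={'DBInstanceIdentifier': rds_instance.identifier}) # Scope the alarm to the Superset database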
Pitfall 2: Unencrypted Secrets in Task Definition
Problem: Secrets are logged in plaintext in ECS task definition history.
Solution: Use AWS Secrets Manager (as shown above) instead of hardcoded environment variables.
Pitfall 3: No Health Checks on ALB
Problem: Unhealthy Superset tasks continue to receive traffic, causing 502 errors.
Solution: Configure ALB health checks with appropriate thresholds (as shown above).
Pitfall 4: Single-AZ Deployment
Problem: If an availability zone goes down, Superset becomes unavailable.
Solution: Deploy Superset tasks across multiple AZs.
# Create a second private subnet in a different AZ
private_subnet_2 = aws.ec2.Subnet(f'{environment}-private-subnet-2',
vpc_id=vpc.id,
cidr_block='10.0.3.0/24',
availability_zone='ap-southeast-2b') # Different AZ
# Update ECS service to use both subnets
service = aws.ecs.Service(...,
network_configuration=aws.ecs.ServiceNetworkConfigurationArgs(
subnets=[private_subnet.id, private_subnet_2.id], # Both subnets
...
),
...)
Pitfall 5: No Log Retention
Problem: CloudWatch logs grow unbounded, increasing costs.
Solution: Set retention policies (as shown above).
Next Steps and Scaling
Beyond the Baseline
Once your baseline Superset deployment is stable, consider:
- Multi-region deployment: Deploy Superset in multiple AWS regions for global availability. Use Route 53 for DNS failover.
- Kubernetes migration: If you’re running other workloads on Kubernetes, migrate Superset to your existing cluster for operational simplicity.
- Advanced caching: Implement query result caching with TTLs to reduce database load.
- Custom plugins: Build Superset plugins for domain-specific visualisations or data connectors.
Integration with Data Platforms
Superset is most powerful when connected to a well-designed data platform. Consider:
- Data warehouse: Use Platform Development in Australia | PADISO to design a ClickHouse or Snowflake data warehouse optimised for Superset queries.
- ETL/ELT pipelines: Orchestrate data ingestion with Apache Airflow or Prefect.
- Data governance: Implement lineage tracking and metadata management.
For teams in specific regions, PADISO offers specialised platform engineering:
- Platform Development in Sydney | PADISO for financial services and retail.
- Platform Development in Melbourne | PADISO for insurance and health.
- Platform Development in Canberra | PADISO for government and defence.
- Platform Development in San Francisco | PADISO for AI and SaaS.
- Platform Development in Boston | PADISO for biotech and pharma.
- Platform Development in Seattle | PADISO for cloud-native and aerospace.
- Platform Development in Atlanta | PADISO for fintech and logistics.
- Platform Development in Denver | PADISO for energy and aerospace.
Compliance and Auditing
If your organisation requires SOC 2 or ISO 27001 compliance, use Vanta to automate compliance evidence collection. Your Pulumi-managed infrastructure integrates seamlessly with Vanta’s continuous compliance monitoring.
PADISO specialises in Security Audit (SOC 2 / ISO 27001) readiness. If you need guidance on audit preparation, book a consultation.
Operational Excellence
As your Superset deployment matures, invest in:
- Infrastructure-as-Code maturity: Modularise your Pulumi code into reusable stacks and components.
- CI/CD pipelines: Automate Superset deployments with GitHub Actions or GitLab CI.
- Cost optimisation: Use AWS Compute Optimizer and Cost Anomaly Detection to identify savings.
- Disaster recovery drills: Test your backup and recovery procedures on a fixed schedule, at least quarterly.
Getting Help
If you need hands-on support building or scaling Superset on Pulumi, PADISO offers fractional CTO and platform engineering services. Visit PADISO: AI Solutions & Strategic Leadership — AIR Bootcamps | SOC2 & ISO27001 via Vanta to learn more.
For a rapid assessment of your current analytics stack and Superset readiness, book an AI Quickstart Audit | PADISO — Fixed-fee 2-week diagnostic. We’ll tell you where you are, what to ship first, and what 90 days could unlock.
Summary
Deploying Apache Superset as a Pulumi stack is a proven pattern for shipping repeatable, auditable analytics infrastructure. This guide covers:
- Architecture decisions: Cloud provider, compute model, databases, caching, and data warehouse selection.
- Networking and security: VPC design, security groups, TLS, and secrets management.
- Storage and state: Persistent uploads, Pulumi state backend, and database backups.
- Core infrastructure: RDS PostgreSQL, ElastiCache Redis, ECS Fargate, and Application Load Balancer.
- Autoscaling: ECS service scaling policies and database connection pooling.
- Observability: CloudWatch dashboards, alarms, and operational runbooks.
- Disaster recovery: Automated backups, snapshots, and recovery testing.
- Cost optimisation: Right-sizing, reserved instances, and cost allocation.
- Common pitfalls: Connection exhaustion, unencrypted secrets, missing health checks, single-AZ risk, and unbounded logs.
With this pattern, you can ship Superset to production in days, not weeks. Your infrastructure is version-controlled, auditable, and ready to scale. The operational habits described here—monitoring, backup testing, cost tracking—keep your deployment healthy and cost-effective.
Start small (pilot deployment, ~$90/month), validate your use case, then scale confidently. If you need support, PADISO’s platform engineering teams are experienced with Superset deployments across Australia and the US.