
Apache Superset on Terraform Module: Reference Deployment Pattern

Step-by-step guide to deploying Apache Superset on AWS with Terraform. Covers networking, storage, secrets, autoscaling, and operational habits for production.

The PADISO Team ·2026-06-18

Why Terraform for Superset
Architecture Overview
Networking and Security Foundation
Storage and Database Configuration
Secrets Management
Compute and Autoscaling
Deployment Walkthrough
Operational Habits
Monitoring and Observability
Next Steps

Why Terraform for Superset

Apache Superset is a powerful, open-source data visualisation and business intelligence platform. When you’re deploying it to production, especially at scale, you need infrastructure as code (IaC) to make it repeatable, auditable, and safe. Terraform is one of the most widely adopted tools for this work.

A Terraform module for Superset gives you several concrete wins:

Reproducibility: Ship the same Superset stack across dev, staging, and production without manual configuration drift.
Auditability: Every change is tracked in version control. This matters for SOC 2 and ISO 27001 compliance audits—you can prove what was deployed, when, and by whom.
Speed: Spin up a new environment in minutes instead of hours. Useful for testing, disaster recovery, and scaling.
Cost control: Terraform lets you version-control infrastructure decisions, making it easy to right-size compute and storage.

At PADISO, we’ve deployed Superset across regulated industries—financial services, government, healthcare—where audit trails and reproducibility are non-negotiable. This guide reflects that production-hardened approach.

Architecture Overview

A production Superset deployment typically includes:

Web tier: Gunicorn + Nginx running Superset itself, behind a load balancer.
Database tier: PostgreSQL (or compatible) for Superset metadata and user data.
Cache layer: Redis for session management, query caching, and task queues.
Task queue: Celery workers for long-running queries and report generation.
Storage: Object storage (S3, Azure Blob) for backups and exported dashboards.
Networking: VPC with public and private subnets, security groups, and NAT gateways.
Secrets: Encrypted key-value store (AWS Secrets Manager, HashiCorp Vault) for database credentials, API keys, and encryption keys.

Our reference module assumes AWS, but the pattern translates to Azure, GCP, or on-premises Kubernetes clusters with minor adjustments.

Module Structure

A well-organised Terraform module for Superset follows this structure:

superset-module/
├── main.tf              # Core resource definitions
├── variables.tf         # Input variables
├── outputs.tf           # Output values
├── networking.tf        # VPC, subnets, security groups
├── database.tf          # RDS PostgreSQL
├── cache.tf             # ElastiCache Redis
├── compute.tf           # ECS/EC2 for Superset
├── secrets.tf           # AWS Secrets Manager
├── storage.tf           # S3 buckets
├── iam.tf               # IAM roles and policies
└── versions.tf          # Provider versions

This modular approach keeps concerns separated and makes it easy to understand the dependency graph. When you need to adjust networking, you edit networking.tf. When you need to scale compute, you edit compute.tf.

Networking and Security Foundation

Networking is where most production issues originate. Get this wrong, and you’ll spend weeks debugging connectivity problems. Get it right, and your Superset cluster is isolated, resilient, and audit-ready.

VPC and Subnets

Define a VPC with public and private subnets across at least two availability zones (AZs):

resource "aws_vpc" "superset" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "${var.environment}-superset-vpc"
  }
}

resource "aws_subnet" "public" {
  count                   = 2
  vpc_id                  = aws_vpc.superset.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index)
  availability_zone       = data.aws_availability_zones.available.names[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name = "${var.environment}-public-${count.index + 1}"
  }
}

resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = aws_vpc.superset.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, count.index + 100)
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = {
    Name = "${var.environment}-private-${count.index + 1}"
  }
}

Public subnets host the NAT gateway and load balancer. Private subnets host Superset, the database, and Redis. This segmentation ensures only the load balancer is exposed to the internet.
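The subnet resources above reference data.aws_availability_zones.available and rely on routing that this guide does not show. A minimal sketch of those missing pieces follows, assuming AWS provider v5 (for the domain argument on aws_eip) and a single NAT gateway; the resource names nat, public, and private are illustrative rather than part of the original module. With vpc_cidr set to 10.0.0.0/16, the cidrsubnet calls yield 10.0.0.0/24 and 10.0.1.0/24 for the public subnets and 10.0.100.0/24 and 10.0.101.0/24 for the private ones.

```hcl
# networking.tf (sketch): resource names below are illustrative assumptions.
data "aws_availability_zones" "available" {
  state = "available"
}

resource "aws_internet_gateway" "superset" {
  vpc_id = aws_vpc.superset.id
}

resource "aws_eip" "nat" {
  domain = "vpc" # AWS provider v5 syntax
}

resource "aws_nat_gateway" "superset" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public[0].id
  depends_on    = [aws_internet_gateway.superset]
}

# Public subnets route to the internet gateway.
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.superset.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.superset.id
  }
}

# Private subnets reach the internet only through the NAT gateway.
resource "aws_route_table" "private" {
  vpc_id = aws_vpc.superset.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.superset.id
  }
}

resource "aws_route_table_association" "public" {
  count          = 2
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table_association" "private" {
  count          = 2
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private.id
}
```

A single NAT gateway keeps cost down but makes private-subnet egress depend on one AZ; one NAT gateway per AZ is the more resilient option.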

Security Groups

Define granular security groups to control traffic:

resource "aws_security_group" "alb" {
  name        = "${var.environment}-superset-alb"
  description = "Allow inbound HTTP/HTTPS"
  vpc_id      = aws_vpc.superset.id

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "${var.environment}-superset-alb-sg"
  }
}

resource "aws_security_group" "superset" {
  name        = "${var.environment}-superset-app"
  description = "Allow traffic from ALB to Superset"
  vpc_id      = aws_vpc.superset.id

  ingress {
    from_port       = 8088
    to_port         = 8088
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "${var.environment}-superset-app-sg"
  }
}

resource "aws_security_group" "rds" {
  name        = "${var.environment}-superset-rds"
  description = "Allow traffic from Superset to RDS"
  vpc_id      = aws_vpc.superset.id

  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.superset.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "${var.environment}-superset-rds-sg"
  }
}

resource "aws_security_group" "redis" {
  name        = "${var.environment}-superset-redis"
  description = "Allow traffic from Superset to Redis"
  vpc_id      = aws_vpc.superset.id

  ingress {
    from_port       = 6379
    to_port         = 6379
    protocol        = "tcp"
    security_groups = [aws_security_group.superset.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "${var.environment}-superset-redis-sg"
  }
}

This setup ensures:

Only the ALB accepts inbound traffic from the internet.
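Superset accepts inbound traffic only from the ALB. Its egress is unrestricted so it can reach RDS, Redis, and external data sources; tighten the egress rule if your data sources sit in known CIDR ranges.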
RDS and Redis only accept traffic from Superset.

For teams in Australia working on regulated projects, this segmentation is essential for SOC 2 and ISO 27001 audits. When you’re working with PADISO on platform development in Sydney, Melbourne, or Canberra, compliance auditors will ask to see your network diagram and security group rules. Terraform makes this trivial to document.

Storage and Database Configuration

Superset stores metadata (dashboards, datasets, user permissions) in a relational database. It also needs object storage for backups and exports. Get this layer right, and you have a reliable, scalable foundation.

RDS PostgreSQL

PostgreSQL is the recommended backend for Superset. Use RDS for managed backups, multi-AZ failover, and automated patching:

resource "aws_db_subnet_group" "superset" {
  name       = "${var.environment}-superset"
  subnet_ids = aws_subnet.private[*].id

  tags = {
    Name = "${var.environment}-superset-db-subnet"
  }
}

resource "aws_rds_cluster" "superset" {
  cluster_identifier              = "${var.environment}-superset"
  engine                          = "aurora-postgresql"
  engine_version                  = "15.2"
  database_name                   = var.db_name
  master_username                 = var.db_username
  master_password                 = random_password.db_password.result
  backup_retention_period         = 30
  preferred_backup_window         = "03:00-04:00"
  preferred_maintenance_window    = "mon:04:00-mon:05:00"
  enabled_cloudwatch_logs_exports = ["postgresql"]
  skip_final_snapshot             = false
  # timestamp() changes on every run; the lifecycle block below stops it producing a diff on each plan.
  final_snapshot_identifier       = "${var.environment}-superset-final-${formatdate("YYYY-MM-DD-hhmm", timestamp())}"
  db_subnet_group_name            = aws_db_subnet_group.superset.name
  db_cluster_parameter_group_name = aws_rds_cluster_parameter_group.superset.name
  vpc_security_group_ids          = [aws_security_group.rds.id]
  storage_encrypted               = true
  kms_key_id                      = aws_kms_key.rds.arn
  enable_iam_database_authentication = true

  lifecycle {
    ignore_changes = [final_snapshot_identifier]
  }

  tags = {
    Name = "${var.environment}-superset-cluster"
  }
}

resource "aws_rds_cluster_instance" "superset" {
  count              = var.db_instance_count
  identifier         = "${var.environment}-superset-${count.index + 1}"
  cluster_identifier = aws_rds_cluster.superset.id
  instance_class     = var.db_instance_class
  engine             = aws_rds_cluster.superset.engine
  engine_version     = aws_rds_cluster.superset.engine_version
  publicly_accessible = false
  monitoring_interval = 60
  monitoring_role_arn = aws_iam_role.rds_monitoring.arn

  tags = {
    Name = "${var.environment}-superset-${count.index + 1}"
  }
}

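Use Aurora PostgreSQL for better performance and automatic failover; failover needs at least two instances, so set db_instance_count to 2 or more. storage_encrypted with a KMS key covers encryption at rest. Encryption in transit is not enforced by the cluster resource itself and requires rds.force_ssl in the cluster parameter group. A backup_retention_period of 30 days is a common baseline (Aurora allows up to 35), but confirm the figure against your own compliance requirements. CloudWatch Logs exports let you audit database activity.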
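The cluster above references aws_kms_key.rds and aws_rds_cluster_parameter_group.superset, neither of which is defined in this guide. One way they might look is sketched below; the descriptions, the parameter group name, and the 30-day deletion window are assumptions. The rds.force_ssl parameter is what rejects unencrypted client connections.

```hcl
# database.tf (sketch): supporting resources for the Aurora cluster.
resource "aws_kms_key" "rds" {
  description             = "${var.environment} Superset RDS encryption key"
  enable_key_rotation     = true
  deletion_window_in_days = 30 # assumed value
}

resource "aws_rds_cluster_parameter_group" "superset" {
  name   = "${var.environment}-superset-pg15" # assumed name
  family = "aurora-postgresql15"              # matches engine_version 15.x

  # Reject connections that do not use TLS.
  parameter {
    name  = "rds.force_ssl"
    value = "1"
  }
}
```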

S3 for Backups and Exports

Store Superset backups and exported dashboards in S3:

resource "aws_s3_bucket" "superset_backups" {
  bucket = "${var.environment}-superset-backups-${data.aws_caller_identity.current.account_id}"

  tags = {
    Name = "${var.environment}-superset-backups"
  }
}

resource "aws_s3_bucket_versioning" "superset_backups" {
  bucket = aws_s3_bucket.superset_backups.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "superset_backups" {
  bucket = aws_s3_bucket.superset_backups.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.s3.arn
    }
  }
}

resource "aws_s3_bucket_public_access_block" "superset_backups" {
  bucket = aws_s3_bucket.superset_backups.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_s3_bucket_lifecycle_configuration" "superset_backups" {
  bucket = aws_s3_bucket.superset_backups.id

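  rule {
    id     = "archive-old-backups"
    status = "Enabled"

    # An empty filter applies the rule to every object in the bucket.
    filter {}

    transition {
      days          = 90
      storage_class = "GLACIER"
    }

    expiration {
      days = 365
    }

    # With versioning enabled, expiration only adds a delete marker; this removes
    # the superseded versions as well (30 days is an illustrative value).
    noncurrent_version_expiration {
      noncurrent_days = 30
    }
  }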
}

Enable versioning so you can recover from accidental deletes. Use KMS encryption and block public access. Set up lifecycle policies to archive old backups to Glacier (cheaper long-term storage) and expire them after a year. This approach keeps costs down while maintaining audit trails.
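The bucket configuration above depends on data.aws_caller_identity.current and aws_kms_key.s3, which are not shown in this guide. A minimal sketch of both follows; the key description and deletion window are assumptions.

```hcl
# storage.tf (sketch): supporting definitions for the backup bucket.

# Supplies the account ID used to make the bucket name globally unique.
data "aws_caller_identity" "current" {}

resource "aws_kms_key" "s3" {
  description             = "${var.environment} Superset backup bucket key"
  enable_key_rotation     = true
  deletion_window_in_days = 30 # assumed value
}
```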

Secrets Management

Superset needs database credentials, API keys, and encryption keys. Never hardcode these in Terraform code or container images. Use AWS Secrets Manager:

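resource "random_password" "db_password" {
  length  = 32
  special = true
  # RDS rejects '/', '@', '"' and spaces in the master password, so restrict the symbol set.
  override_special = "!#$%&*()-_=+[]{}<>:?"
}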

resource "aws_secretsmanager_secret" "db_password" {
  name                    = "${var.environment}/superset/db-password"
  recovery_window_in_days = 7

  tags = {
    Name = "${var.environment}-superset-db-password"
  }
}

resource "aws_secretsmanager_secret_version" "db_password" {
  secret_id       = aws_secretsmanager_secret.db_password.id
  secret_string   = random_password.db_password.result
}

resource "random_password" "superset_secret_key" {
  length  = 64
  special = true
}

resource "aws_secretsmanager_secret" "superset_secret_key" {
  name                    = "${var.environment}/superset/secret-key"
  recovery_window_in_days = 7

  tags = {
    Name = "${var.environment}-superset-secret-key"
  }
}

resource "aws_secretsmanager_secret_version" "superset_secret_key" {
  secret_id     = aws_secretsmanager_secret.superset_secret_key.id
  secret_string = random_password.superset_secret_key.result
}

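When deploying Superset containers, use IAM roles to grant access to these secrets. ECS retrieves them at task start using the task execution role and injects them into the container as environment variables, so the values never appear in the task definition or in terraform.tfvars. They are still written to the Terraform state (random_password stores its result there), so keep state in an encrypted, access-controlled backend. This pattern is audit-friendly: CloudTrail logs every secret access, which is exactly what SOC 2 and ISO 27001 auditors want to see.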
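For ECS to fetch these secrets, the task execution role needs secretsmanager:GetSecretValue on both of them. The sketch below shows one way to define that role; the role name matches the aws_iam_role.ecs_task_execution_role reference used elsewhere in this guide, while the policy and data source names are illustrative.

```hcl
# iam.tf (sketch): execution role allowed to read only the two Superset secrets.
data "aws_iam_policy_document" "ecs_tasks_assume" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ecs-tasks.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "ecs_task_execution_role" {
  name               = "${var.environment}-superset-exec"
  assume_role_policy = data.aws_iam_policy_document.ecs_tasks_assume.json
}

# AWS-managed policy covering image pulls and CloudWatch Logs writes.
resource "aws_iam_role_policy_attachment" "ecs_execution_managed" {
  role       = aws_iam_role.ecs_task_execution_role.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

data "aws_iam_policy_document" "read_superset_secrets" {
  statement {
    actions = ["secretsmanager:GetSecretValue"]
    resources = [
      aws_secretsmanager_secret.db_password.arn,
      aws_secretsmanager_secret.superset_secret_key.arn,
    ]
  }
}

resource "aws_iam_role_policy" "read_superset_secrets" {
  name   = "read-superset-secrets"
  role   = aws_iam_role.ecs_task_execution_role.id
  policy = data.aws_iam_policy_document.read_superset_secrets.json
}
```

Scoping the policy to the two secret ARNs, rather than a wildcard, keeps the role from reading unrelated secrets in the account.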

Compute and Autoscaling

Superset runs on Gunicorn, a Python application server. Deploy it on ECS (Elastic Container Service) with autoscaling to handle variable load.

ECS Task Definition

Define a Superset task that runs in a container:

resource "aws_ecs_cluster" "superset" {
  name = "${var.environment}-superset"

  setting {
    name  = "containerInsights"
    value = "enabled"
  }

  tags = {
    Name = "${var.environment}-superset-cluster"
  }
}

resource "aws_ecs_task_definition" "superset" {
  family                   = "${var.environment}-superset"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = var.task_cpu
  memory                   = var.task_memory
  execution_role_arn       = aws_iam_role.ecs_task_execution_role.arn
  task_role_arn            = aws_iam_role.ecs_task_role.arn

  container_definitions = jsonencode([{
    name      = "superset"
    image     = var.superset_image
    essential = true
    portMappings = [{
      containerPort = 8088
      hostPort      = 8088
      protocol      = "tcp"
    }]
    logConfiguration = {
      logDriver = "awslogs"
      options = {
        "awslogs-group"         = aws_cloudwatch_log_group.superset.name
        "awslogs-region"        = var.aws_region
        "awslogs-stream-prefix" = "ecs"
      }
    }
    environment = [
      {
        name  = "SUPERSET_ENV"
        value = var.environment
      },
      {
        name  = "DATABASE_HOST"
        value = aws_rds_cluster.superset.endpoint
      },
      {
        name  = "DATABASE_PORT"
        value = "5432"
      },
      {
        name  = "DATABASE_DB"
        value = var.db_name
      },
      {
        name  = "DATABASE_USER"
        value = var.db_username
      },
      {
        name  = "REDIS_HOST"
        value = aws_elasticache_cluster.superset.cache_nodes[0].address
      },
      {
        name  = "REDIS_PORT"
        value = "6379"
      },
      {
        name  = "SUPERSET_LOAD_EXAMPLES"
        value = var.load_examples ? "yes" : "no"
      }
    ]
    secrets = [
      {
        name      = "DATABASE_PASSWORD"
        valueFrom = aws_secretsmanager_secret.db_password.arn
      },
      {
        name      = "SUPERSET_SECRET_KEY"
        valueFrom = aws_secretsmanager_secret.superset_secret_key.arn
      }
    ]
  }])

  tags = {
    Name = "${var.environment}-superset-task"
  }
}

Use Fargate for serverless container management, with no EC2 instances to patch. Pass sensitive values via the secrets block rather than the plaintext environment block. CloudWatch Logs capture all container output, making it easy to debug issues.
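The task definition reads REDIS_HOST from aws_elasticache_cluster.superset, which this guide does not define. A minimal single-node sketch follows; var.redis_node_type is a hypothetical variable introduced here, and its default is only an example size.

```hcl
# cache.tf (sketch): single-node Redis in the private subnets.
variable "redis_node_type" {
  description = "ElastiCache node type (hypothetical variable for this sketch)."
  type        = string
  default     = "cache.t4g.small" # example size, not a recommendation
}

resource "aws_elasticache_subnet_group" "superset" {
  name       = "${var.environment}-superset"
  subnet_ids = aws_subnet.private[*].id
}

resource "aws_elasticache_cluster" "superset" {
  cluster_id         = "${var.environment}-superset"
  engine             = "redis"
  node_type          = var.redis_node_type
  num_cache_nodes    = 1
  port               = 6379
  subnet_group_name  = aws_elasticache_subnet_group.superset.name
  security_group_ids = [aws_security_group.redis.id]
}
```

A single node has no failover. If Redis also backs the Celery queue, an aws_elasticache_replication_group with automatic failover is probably the better fit for production.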

ECS Service and Load Balancer

Run the task definition as a service behind an Application Load Balancer (ALB):

resource "aws_lb" "superset" {
  name               = "${var.environment}-superset-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = aws_subnet.public[*].id

  enable_deletion_protection = var.environment == "production"

  tags = {
    Name = "${var.environment}-superset-alb"
  }
}

resource "aws_lb_target_group" "superset" {
  name        = "${var.environment}-superset"
  port        = 8088
  protocol    = "HTTP"
  vpc_id      = aws_vpc.superset.id
  target_type = "ip"

  health_check {
    healthy_threshold   = 2
    unhealthy_threshold = 3
    timeout             = 5
    interval            = 30
    path                = "/health"
    matcher             = "200"
  }

  tags = {
    Name = "${var.environment}-superset-tg"
  }
}

resource "aws_lb_listener" "superset_http" {
  load_balancer_arn = aws_lb.superset.arn
  port              = "80"
  protocol          = "HTTP"

  default_action {
    type = "redirect"

    redirect {
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }
}

resource "aws_lb_listener" "superset_https" {
  load_balancer_arn = aws_lb.superset.arn
  port              = "443"
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = var.ssl_certificate_arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.superset.arn
  }
}

resource "aws_ecs_service" "superset" {
  name            = "${var.environment}-superset"
  cluster         = aws_ecs_cluster.superset.id
  task_definition = aws_ecs_task_definition.superset.arn
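  desired_count   = var.desired_count
  launch_type     = "FARGATE"

  # Required for the `aws ecs execute-command` sessions used later in this guide.
  enable_execute_command = true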

  network_configuration {
    subnets          = aws_subnet.private[*].id
    security_groups  = [aws_security_group.superset.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.superset.arn
    container_name   = "superset"
    container_port   = 8088
  }

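  depends_on = [
    aws_lb_listener.superset_https,
    aws_rds_cluster_instance.superset,
    aws_elasticache_cluster.superset
  ]

  # Autoscaling adjusts desired_count at runtime; ignore it so applies don't reset the task count.
  lifecycle {
    ignore_changes = [desired_count]
  }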

  tags = {
    Name = "${var.environment}-superset-service"
  }
}

Force HTTP-to-HTTPS redirects. Use a valid SSL certificate (from ACM or your certificate provider). The ALB health check hits /health, so make sure your Superset container responds to that endpoint.

Autoscaling

Scale Superset based on CPU and memory utilisation:

resource "aws_appautoscaling_target" "superset" {
  max_capacity       = var.max_capacity
  min_capacity       = var.min_capacity
  resource_id        = "service/${aws_ecs_cluster.superset.name}/${aws_ecs_service.superset.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

resource "aws_appautoscaling_policy" "superset_cpu" {
  policy_name               = "${var.environment}-superset-cpu"
  policy_type               = "TargetTrackingScaling"
  resource_id               = aws_appautoscaling_target.superset.resource_id
  scalable_dimension        = aws_appautoscaling_target.superset.scalable_dimension
  service_namespace         = aws_appautoscaling_target.superset.service_namespace
  target_tracking_scaling_policy_configuration {
    target_value = 70.0

    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }

    scale_out_cooldown  = 60
    scale_in_cooldown   = 300
  }
}

resource "aws_appautoscaling_policy" "superset_memory" {
  policy_name               = "${var.environment}-superset-memory"
  policy_type               = "TargetTrackingScaling"
  resource_id               = aws_appautoscaling_target.superset.resource_id
  scalable_dimension        = aws_appautoscaling_target.superset.scalable_dimension
  service_namespace         = aws_appautoscaling_target.superset.service_namespace
  target_tracking_scaling_policy_configuration {
    target_value = 80.0

    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageMemoryUtilization"
    }

    scale_out_cooldown  = 60
    scale_in_cooldown   = 300
  }
}

Set conservative thresholds (70% CPU, 80% memory). This means Superset scales up before hitting resource limits, avoiding user-facing slowdowns. Scale-in cooldown is longer (300 seconds) to avoid thrashing.

Deployment Walkthrough

Now that you have the module, here’s how to deploy it step by step.

Step 1: Initialise Terraform

terraform init

This downloads the AWS provider and initialises the working directory.
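What terraform init downloads is governed by versions.tf. A sketch of that file is shown below; the Terraform constraint matches the ">= 1.0" prerequisite stated in the README section of this guide, while the provider version constraints are assumptions. The random provider is included because the module uses random_password.

```hcl
# versions.tf (sketch): provider version constraints are assumed values.
terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.0"
    }
  }
}

provider "aws" {
  region = var.aws_region
}
```

Commit the generated .terraform.lock.hcl file alongside it so every environment resolves the same provider builds.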

Step 2: Create a Variables File

Create terraform.tfvars with your deployment settings:

aws_region           = "ap-southeast-2"
environment          = "production"
vpc_cidr             = "10.0.0.0/16"
db_name              = "superset"
db_username          = "superset_admin"
db_instance_class    = "db.r6g.large"
db_instance_count    = 2
task_cpu             = "512"
task_memory          = "1024"
superset_image       = "apache/superset:latest" # Pin a specific tag or digest for production
desired_count        = 3
min_capacity         = 2
max_capacity         = 10
load_examples        = false
ssl_certificate_arn  = "arn:aws:acm:ap-southeast-2:123456789012:certificate/abc123"

For Sydney-based teams using PADISO’s platform development services, we typically set aws_region to ap-southeast-2 and tune db_instance_class and task_cpu based on your expected query load.
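Each key in terraform.tfvars needs a matching declaration in variables.tf. The sketch below declares a representative subset with types and validation; the allowed environment names, the defaults, and the validation rules are assumptions rather than the module's published interface, and the remaining variables would follow the same pattern.

```hcl
# variables.tf (sketch): a representative subset of the module's inputs.
variable "environment" {
  description = "Deployment environment, used as a name prefix."
  type        = string

  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "environment must be one of: dev, staging, production."
  }
}

variable "vpc_cidr" {
  description = "CIDR block for the Superset VPC."
  type        = string

  validation {
    # cidrhost() fails on malformed input, so can() turns that into false.
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "vpc_cidr must be a valid CIDR block."
  }
}

variable "db_instance_count" {
  description = "Number of Aurora instances; two or more allows failover."
  type        = number
  default     = 2
}

variable "min_capacity" {
  description = "Minimum number of Superset tasks."
  type        = number
  default     = 2
}

variable "max_capacity" {
  description = "Maximum number of Superset tasks."
  type        = number
  default     = 10
}

variable "load_examples" {
  description = "Whether to load Superset's sample dashboards."
  type        = bool
  default     = false
}
```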

Step 3: Plan the Deployment

terraform plan -out=tfplan

Review the output to ensure all resources are being created as expected. Look for:

VPC and subnets
RDS cluster with multiple instances
ElastiCache Redis cluster
ECS cluster, task definition, and service
ALB and target group
Security groups and IAM roles

Step 4: Apply the Configuration

terraform apply tfplan

Terraform will create all resources. This typically takes 10–15 minutes for RDS and the ALB to be fully operational.

Step 5: Initialise the Superset Database

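Once the ECS service is running, initialise the Superset metadata database. The command below uses ECS Exec, which only works if the service has enable_execute_command = true and the task role allows the SSM messaging (ssmmessages) actions: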

aws ecs execute-command \
  --cluster production-superset \
  --task <task-id> \
  --container superset \
  --interactive \
  --command "/bin/bash"

Inside the container:

superset db upgrade
superset fab create-admin --username admin --password <secure-password> --email admin@example.com
superset init  # Create default roles and permissions
superset load-examples  # Optional: load sample dashboards

Then exit and check the ALB’s DNS name:

terraform output alb_dns_name

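Open https:// followed by that DNS name in your browser, log in with the admin credentials, and verify Superset is running. Expect a certificate warning until a DNS record matching your certificate points at the ALB.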
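The terraform output command only works if the module declares the output. A sketch of outputs.tf follows, using the three output names listed in the README section of this guide; the descriptions are taken from that list.

```hcl
# outputs.tf (sketch)
output "alb_dns_name" {
  description = "DNS name of the Application Load Balancer."
  value       = aws_lb.superset.dns_name
}

output "rds_endpoint" {
  description = "RDS cluster endpoint."
  value       = aws_rds_cluster.superset.endpoint
}

output "redis_endpoint" {
  description = "ElastiCache Redis endpoint."
  value       = aws_elasticache_cluster.superset.cache_nodes[0].address
}
```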

Operational Habits

Deploying Superset is one thing. Keeping it healthy in production is another. Here are the operational habits that separate a working deployment from a reliable one.

Database Maintenance

Superset’s metadata database grows over time. Run maintenance tasks weekly:

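# Vacuum and analyse the database (assumes psql is installed in the container image).
# The escaped \$ stops your local shell expanding the variables; sh -c expands them inside the container.
aws ecs execute-command \
  --cluster production-superset \
  --task <task-id> \
  --container superset \
  --interactive \
  --command "sh -c 'PGPASSWORD=\$DATABASE_PASSWORD psql -h \$DATABASE_HOST -U \$DATABASE_USER -d \$DATABASE_DB -c \"VACUUM ANALYZE\"'"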

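Schedule this with an EventBridge (formerly CloudWatch Events) rule. The rule below only defines the schedule, in UTC; it needs a target, such as an ECS task, before it runs anything: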

resource "aws_cloudwatch_event_rule" "db_maintenance" {
  name                = "${var.environment}-superset-db-maintenance"
  description         = "Run database maintenance weekly"
  schedule_expression = "cron(0 2 ? * SUN *)"

  tags = {
    Name = "${var.environment}-superset-db-maintenance"
  }
}
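A schedule rule does nothing until it has a target. One possible target is sketched below: it launches a Fargate task each time the rule fires. Both aws_iam_role.events_run_task and aws_ecs_task_definition.db_maintenance are hypothetical resources that you would need to define; the role must allow ecs:RunTask and iam:PassRole, and the task definition's command would run the VACUUM ANALYZE statement.

```hcl
# Sketch: run a maintenance task on the weekly schedule.
resource "aws_cloudwatch_event_target" "db_maintenance" {
  rule     = aws_cloudwatch_event_rule.db_maintenance.name
  arn      = aws_ecs_cluster.superset.arn
  role_arn = aws_iam_role.events_run_task.arn # hypothetical role

  ecs_target {
    launch_type         = "FARGATE"
    task_count          = 1
    task_definition_arn = aws_ecs_task_definition.db_maintenance.arn # hypothetical task definition

    # Same private subnets and security group as the Superset service,
    # so the task can reach RDS on port 5432.
    network_configuration {
      subnets          = aws_subnet.private[*].id
      security_groups  = [aws_security_group.superset.id]
      assign_public_ip = false
    }
  }
}
```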

Cache Eviction

Redis can fill up if query results aren’t expired. Configure Superset to evict old cache entries:

In your Superset config, set:

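import os

# Both caches share one Redis instance but use separate logical databases (1 and 0).
REDIS_HOST = os.environ["REDIS_HOST"]
REDIS_PORT = os.environ.get("REDIS_PORT", "6379")

# Metadata cache
CACHE_CONFIG = {
    'CACHE_TYPE': 'RedisCache',
    'CACHE_REDIS_URL': f'redis://{REDIS_HOST}:{REDIS_PORT}/1',
    'CACHE_DEFAULT_TIMEOUT': 86400,  # 24 hours
}

# Chart query results (Superset reads this from DATA_CACHE_CONFIG)
DATA_CACHE_CONFIG = {
    'CACHE_TYPE': 'RedisCache',
    'CACHE_REDIS_URL': f'redis://{REDIS_HOST}:{REDIS_PORT}/0',
    'CACHE_DEFAULT_TIMEOUT': 3600,  # 1 hour
}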

Monitor Redis memory usage in CloudWatch. If it’s consistently above 80%, increase the ElastiCache node type or reduce CACHE_DEFAULT_TIMEOUT.
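The 80% guideline can be turned into an alarm so nobody has to watch the graph. The sketch below assumes the aws_elasticache_cluster.superset and aws_sns_topic.alerts resources referenced elsewhere in this guide; the alarm name and the three-period evaluation window are illustrative choices.

```hcl
# Sketch: alert when Redis memory stays above 80% for 15 minutes.
resource "aws_cloudwatch_metric_alarm" "redis_memory" {
  alarm_name          = "${var.environment}-superset-redis-memory-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 3
  metric_name         = "DatabaseMemoryUsagePercentage"
  namespace           = "AWS/ElastiCache"
  period              = 300
  statistic           = "Average"
  threshold           = 80
  alarm_description   = "Alert when Redis memory usage exceeds 80% for 15 minutes"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    CacheClusterId = aws_elasticache_cluster.superset.cluster_id
  }
}
```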

Log Aggregation

ECS sends logs to CloudWatch. Aggregate them centrally for easier troubleshooting:

resource "aws_cloudwatch_log_group" "superset" {
  name              = "/ecs/${var.environment}-superset"
  retention_in_days = 30

  tags = {
    Name = "${var.environment}-superset-logs"
  }
}

resource "aws_cloudwatch_log_stream" "superset" {
  name           = "superset"
  log_group_name = aws_cloudwatch_log_group.superset.name
}

Use CloudWatch Logs Insights queries to find errors:

fields @timestamp, @message
| filter @message like /ERROR/
| stats count() as error_count by @message

Backup and Recovery Testing

RDS handles backups automatically, but test recovery monthly. Create a snapshot and restore it to a separate environment:

aws rds create-db-cluster-snapshot \
  --db-cluster-identifier production-superset \
  --db-cluster-snapshot-identifier production-superset-test-$(date +%Y-%m-%d)

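Restore it, passing the engine (required) plus the subnet group and security group so the copy lands in the same VPC:

aws rds restore-db-cluster-from-snapshot \
  --db-cluster-identifier production-superset-restore \
  --snapshot-identifier production-superset-test-2024-01-15 \
  --engine aurora-postgresql \
  --db-subnet-group-name production-superset \
  --vpc-security-group-ids <rds-security-group-id>

A cluster restored from an Aurora snapshot has no instances, so add one with aws rds create-db-instance, then verify Superset can connect to the restored database. This proves your backups work and gives you confidence in disaster recovery.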

Monitoring and Observability

You can’t operate what you can’t see. Set up comprehensive monitoring from day one.

CloudWatch Dashboards

Create a dashboard that shows the health of your Superset deployment:

resource "aws_cloudwatch_dashboard" "superset" {
  dashboard_name = "${var.environment}-superset"

  dashboard_body = jsonencode({
    widgets = [
      {
        type = "metric"
        properties = {
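          # Each metric needs its dimensions; without them the widget shows no data.
          # "." repeats the value in the same position of the previous row.
          metrics = [
            ["AWS/ECS", "CPUUtilization", "ClusterName", aws_ecs_cluster.superset.name, "ServiceName", aws_ecs_service.superset.name],
            [".", "MemoryUtilization", ".", ".", ".", "."],
            ["AWS/RDS", "DatabaseConnections", "DBClusterIdentifier", aws_rds_cluster.superset.cluster_identifier],
            [".", "CPUUtilization", ".", "."],
            ["AWS/ElastiCache", "EngineCPUUtilization", "CacheClusterId", aws_elasticache_cluster.superset.cluster_id],
            [".", "DatabaseMemoryUsagePercentage", ".", "."]
          ]
          period = 300
          stat   = "Average"
          region = var.aws_region
          title  = "Superset Infrastructure Health"
        }
      },
      {
        type = "log"
        properties = {
          # Log widgets need a SOURCE clause naming the log group to query.
          query  = "SOURCE '${aws_cloudwatch_log_group.superset.name}' | fields @timestamp, @message | filter @message like /ERROR/ | stats count() as error_count"
          region = var.aws_region
          title  = "Error Count (Last Hour)"
        }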
      }
    ]
  })
}

Visit the CloudWatch dashboard daily. Look for:

ECS CPU and memory trending upward (sign of load growth).
RDS connections stable (sign of healthy connection pooling).
Redis memory usage under 80%.
Error logs increasing (sign of application issues).

Alarms

Set up alarms for critical conditions:

resource "aws_cloudwatch_metric_alarm" "ecs_cpu" {
  alarm_name          = "${var.environment}-superset-ecs-cpu-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/ECS"
  period              = 300
  statistic           = "Average"
  threshold           = 85
  alarm_description   = "Alert when ECS CPU exceeds 85% for 10 minutes"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    ClusterName = aws_ecs_cluster.superset.name
    ServiceName = aws_ecs_service.superset.name
  }
}

resource "aws_cloudwatch_metric_alarm" "rds_cpu" {
  alarm_name          = "${var.environment}-superset-rds-cpu-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/RDS"
  period              = 300
  statistic           = "Average"
  threshold           = 80
  alarm_description   = "Alert when RDS CPU exceeds 80% for 10 minutes"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    DBClusterIdentifier = aws_rds_cluster.superset.cluster_identifier
  }
}

resource "aws_cloudwatch_metric_alarm" "alb_unhealthy" {
  alarm_name          = "${var.environment}-superset-alb-unhealthy"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = "UnHealthyHostCount"
  namespace           = "AWS/ApplicationELB"
  period              = 60
  statistic           = "Maximum"
  threshold           = 0
  alarm_description   = "Alert when ALB has unhealthy targets"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    LoadBalancer = aws_lb.superset.arn_suffix
    TargetGroup  = aws_lb_target_group.superset.arn_suffix
  }
}

resource "aws_sns_topic" "alerts" {
  name = "${var.environment}-superset-alerts"

  tags = {
    Name = "${var.environment}-superset-alerts"
  }
}

resource "aws_sns_topic_subscription" "alerts_email" {
  topic_arn = aws_sns_topic.alerts.arn
  protocol  = "email"
  endpoint  = var.alert_email
}

Subscribe your ops team to the SNS topic. Email subscriptions must be confirmed from the message AWS sends to each address; once confirmed, recipients are emailed whenever an alarm fires.

Application-Level Monitoring

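Superset does not expose a Prometheus endpoint out of the box; it emits StatsD metrics through the STATS_LOGGER setting. Forward these to a monitoring tool such as Datadog or New Relic, or to Prometheus via a StatsD exporter. Look for: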

Query latency (p50, p95, p99).
Cache hit rate.
Celery task queue depth.
Database connection pool utilisation.

These metrics tell you if your deployment is healthy or struggling.

Next Steps

You now have a production-ready Terraform module for Apache Superset. Here’s what to do next:

1. Customise for Your Environment

Edit variables.tf to match your organisation’s naming conventions, CIDR blocks, and resource sizes. For teams in Australia working on regulated systems, ensure your variable defaults align with your compliance requirements. If you’re working with PADISO on platform development across Australia, we can help you tune these settings for your specific workload.

2. Add Data Source Connections

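Superset needs to connect to your data warehouses (Snowflake, BigQuery, Redshift, ClickHouse, etc.). Add those connections in the Superset UI or via the API. To manage them in Terraform you need a third-party Superset provider; treat the resource type and arguments below as illustrative and check them against the provider you choose. Supply the password from a secret rather than writing it into the URI:

resource "superset_database" "snowflake" {
  database_name    = "snowflake_warehouse"
  sqlalchemy_uri   = "snowflake://<user>:<password>@<account_id>.us-east-1/<database>/<schema>"
  expose_in_sqllab = true
}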

3. Implement Security Hardening

Beyond the networking and secrets we’ve covered, add:

SAML or OAuth authentication (instead of local user management).
Row-level security (RLS) to restrict data access by user.
API key management for programmatic access.
Regular vulnerability scanning of container images and dependencies using tools like Snyk or Trivy.

For teams pursuing SOC 2 or ISO 27001 compliance, these controls are essential. The PADISO Security Audit service can help you identify gaps and prioritise fixes.

4. Version Control Your Module

Commit your Terraform code to a Git repository (GitHub, GitLab, Bitbucket). Use branch protection rules to require code review before merging. This creates an audit trail and prevents accidental changes.

git init
git add .
git commit -m "Initial Superset Terraform module"
git branch -M main
git remote add origin <repository-url>
git push -u origin main

5. Set Up CI/CD

Automate Terraform plan and apply using GitHub Actions, GitLab CI, or AWS CodePipeline:

name: Terraform

on:
  push:
    branches:
      - main
  pull_request:

jobs:
  terraform:
    runs-on: ubuntu-latest
    steps:
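      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      # AWS credentials must be configured before these steps (for example via OIDC).
      - run: terraform init -input=false
      - run: terraform plan -input=false -out=tfplan
      # Apply only after a merge to main; pull requests stop at plan.
      - if: github.event_name == 'push' && github.ref == 'refs/heads/main'
        run: terraform apply -input=false tfplan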

This ensures every change is reviewed and tested before it reaches production.
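A CI runner starts from a clean checkout each time, so the pipeline needs remote state to know what already exists. A sketch of an S3 backend follows; the bucket and table names are hypothetical placeholders, and the key layout is just one possible convention.

```hcl
# backend.tf (sketch): bucket and table names are hypothetical placeholders.
terraform {
  backend "s3" {
    bucket         = "example-terraform-state"
    key            = "superset/production/terraform.tfstate"
    region         = "ap-southeast-2"
    encrypt        = true                      # state contains generated passwords
    dynamodb_table = "example-terraform-locks" # state locking
  }
}
```

Because the state holds the generated database password and secret key, restrict access to the state bucket as tightly as access to the secrets themselves.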

6. Document Your Deployment

Create a README.md in your module directory:

# Superset Terraform Module

Production-ready Apache Superset deployment on AWS.

## Prerequisites

- AWS account with appropriate permissions.
- Terraform >= 1.0.
- SSL certificate in ACM.

## Usage

```bash
terraform init
terraform plan -out=tfplan
terraform apply tfplan
```

## Outputs

- alb_dns_name: DNS name of the Application Load Balancer.
- rds_endpoint: RDS cluster endpoint.
- redis_endpoint: ElastiCache Redis endpoint.

## Monitoring

View the CloudWatch dashboard:

    aws cloudwatch get-dashboard --dashboard-name production-superset

## Troubleshooting

…


Documentation is invaluable when onboarding new team members or returning to the code months later.

7. Consider Professional Support

If you're building a complex data platform or need help with compliance, consider working with experts. [PADISO offers platform engineering services](https://www.padiso.co/services/) across Australia, the US, Canada, and New Zealand. Whether you're in [San Francisco](https://www.padiso.co/platform-development-san-francisco/), [Boston](https://www.padiso.co/platform-development-boston/), [Seattle](https://www.padiso.co/platform-development-seattle/), or elsewhere, we can help you architect, deploy, and operate Superset at scale. We also provide fractional CTO leadership and co-build support for startups and enterprises modernising their data infrastructure.

Summary

Apache Superset is a powerful open-source BI tool, but deploying it reliably requires careful infrastructure planning. This Terraform module gives you a repeatable, auditable, scalable foundation.

Key takeaways:

Networking: Segment your infrastructure with VPCs, subnets, and security groups. Only expose the ALB to the internet.
Storage: Use RDS for metadata, S3 for backups, and ElastiCache for caching. Enable encryption, versioning, and lifecycle policies.
Secrets: Never hardcode credentials. Use AWS Secrets Manager and IAM roles.
Compute: Run Superset on ECS Fargate with autoscaling. Let Terraform manage the scaling policies.
Operations: Automate backups, cache maintenance, and log aggregation. Set up alarms for critical conditions.
Compliance: Use Terraform to document your infrastructure. Every change is auditable, which is what SOC 2 and ISO 27001 auditors want to see.

With this module, you can deploy a production-grade Superset cluster in 15 minutes and operate it with confidence. Version-control your infrastructure, automate your deployments, and focus on delivering insights to your users.

For teams in Australia, the US, or elsewhere who need expert guidance, [PADISO](https://www.padiso.co/) specialises in platform engineering, AI automation, and compliance-ready infrastructure. [Book a call](https://www.padiso.co/platform-development-sydney/) to discuss your Superset deployment or broader data platform strategy.
