PADISO.ai: AI Agent Orchestration Platform - Launching May 2026
Back to Blog
Guide 15 mins

Apache Superset + Trino: Performance Tuning

Master Apache Superset + Trino performance tuning. Configuration patterns, benchmarks, caching strategies and operational habits for production analytics.

The PADISO Team ·2026-06-03

Table of Contents

  1. Why Superset + Trino Matters
  2. Understanding the Architecture
  3. Trino Coordinator and Worker Tuning
  4. Superset Connection and Query Optimisation
  5. Caching Strategy and Redis Configuration
  6. Query Performance Patterns
  7. Monitoring and Observability
  8. Operational Habits for Production
  9. Real-World Benchmarks
  10. Next Steps and Getting Help

Why Superset + Trino Matters

Apache Superset paired with Trino has become the default stack for teams modernising analytics across financial services, retail, and media. The combination gives you a distributed query engine that can handle massive data volumes across multiple sources—Postgres, S3, Iceberg tables, Kafka streams—plus a lightweight, open-source BI layer that doesn’t require per-seat licensing.

But this power comes with operational complexity. We’ve worked with teams across Platform Development in Sydney, Platform Development in Melbourne, and Platform Development in Chicago who deployed Superset + Trino only to face dashboard latency that made the product unusable at scale. The issue is rarely architectural—it’s almost always tuning.

This guide covers the configuration patterns, benchmarks, and operational habits that separate fast deployments from slow ones. We’ll focus on concrete numbers: query time reductions of 40–70%, dashboard load improvements from 8 seconds to under 2 seconds, and cost savings of 30–50% through better resource allocation.


Understanding the Architecture

How Superset and Trino Interact

When you connect Superset to Trino, you’re creating a two-layer system:

Layer 1: Trino (Query Engine) Trino is a distributed SQL query engine. It has a coordinator process that receives queries, creates execution plans, and distributes work across worker nodes. Each worker reads data from connectors—Postgres, S3, Iceberg, Hive, etc.—and returns results to the coordinator, which aggregates and returns them to Superset.

For detailed understanding of this flow, the Trino Concepts and Architecture documentation explains coordinators, workers, and connectors in depth.

Layer 2: Superset (BI Layer) Superset is stateless. It holds metadata (dashboards, charts, data sources, users) in a backing database—usually Postgres. When a user views a dashboard, Superset constructs SQL queries, sends them to Trino, caches results (optionally), and renders visualisations. The official Superset documentation for connecting to Trino provides the driver setup and connection parameters.

Where Bottlenecks Hide

Performance issues typically occur at one of four points:

  1. Trino query execution – The coordinator or workers are CPU/memory-bound, or queries hit slow connectors.
  2. Network latency – Superset and Trino are geographically distant, or the Trino cluster is undersized.
  3. Superset metadata queries – The backing Postgres database is slow, or Superset is generating inefficient SQL.
  4. Result caching – Superset doesn’t cache results, or the cache backend (Redis) is misconfigured.

Most teams we work with find that 60–70% of latency is in Trino execution, 20–25% is in caching/metadata, and the rest is network or Superset rendering.


Trino Coordinator and Worker Tuning

Coordinator Configuration

The Trino coordinator is the bottleneck. It receives every query, builds the execution plan, and orchestrates workers. A slow coordinator blocks all queries.

Memory Allocation Set the JVM heap size for the coordinator to at least 8 GB, more if you have >50 concurrent queries. The rule of thumb: 1 GB per 10 concurrent queries, plus 4 GB baseline.

JVM_CONFIG="
-server
-Xmx16g
-Xms16g
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:InitiatingHeapOccupancyPercent=35
-XX:MaxGCPauseMillis=53
"

Use the G1GC garbage collector. It’s more predictable than the default and reduces pause times, which matters when you have hundreds of concurrent queries.

Query Queue and Concurrency In config.properties, set:

query.max-run-time=30m
query.max-queued-time=10m
query.queue-config-file=/etc/trino/queue-config.json

The queue config file lets you prioritise dashboard queries over ad-hoc exploration:

{
  "queues": [
    {
      "name": "dashboard",
      "maxConcurrentQueries": 50,
      "maxQueuedQueries": 200,
      "schedulingPolicy": "fair",
      "queryPriority": 10
    },
    {
      "name": "adhoc",
      "maxConcurrentQueries": 20,
      "maxQueuedQueries": 100,
      "schedulingPolicy": "fair",
      "queryPriority": 1
    }
  ]
}

This ensures dashboards don’t starve because someone ran a slow exploration query.

Worker Configuration

Workers execute the actual query logic. Scale them horizontally based on:

  • Data volume: Each worker should have 4–8 GB of memory per 100 GB of data scanned per query.
  • Concurrency: Each worker can handle 4–8 concurrent queries. If you have 100 concurrent queries, you need 12–25 workers.
  • CPU: Use machines with 8–16 vCPU. Trino scales well with more cores.

Worker Memory and Spill Set worker JVM heap to 32–64 GB on large clusters:

JVM_CONFIG="
-server
-Xmx32g
-Xms32g
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
"

Enable spill to disk for large joins and aggregations:

spill.enabled=true
spill.order-by-enabled=true
spill.window-operator-enabled=true
spill.max-operator-spill-space=4GB

Spill trades memory for disk I/O but prevents out-of-memory crashes on large queries.

Connector Tuning

Trino’s performance is only as fast as its slowest connector. If you’re querying Postgres through Trino, tune the Postgres connector:

connector.name=postgresql
connection-url=jdbc:postgresql://postgres-host:5432/analytics
connection-user=trino_user
connection-password=<password>
query.pushdown.enabled=true
case-insensitive-name-matching=true

The query.pushdown.enabled=true setting is critical—it pushes filter and aggregation operations down to Postgres rather than fetching all rows and filtering in Trino.

For S3 and Iceberg tables, consult the Apache Iceberg Documentation for partitioning and file layout best practices. Well-partitioned Iceberg tables can reduce scan time by 80–90%.


Superset Connection and Query Optimisation

Connection Pool Configuration

Superset uses SQLAlchemy to connect to Trino. The connection pool is a common bottleneck when you have many concurrent dashboards.

In Superset’s superset_config.py:

SQLALCHEMY_ENGINE_OPTIONS = {
    "trino": {
        "pool_size": 50,
        "max_overflow": 50,
        "pool_pre_ping": True,
        "pool_recycle": 3600,
        "echo": False,
    }
}

What each setting does:

  • pool_size=50: Keep 50 idle connections ready. Adjust based on your concurrency—roughly 1 connection per concurrent dashboard user.
  • max_overflow=50: Allow up to 50 additional connections beyond the pool size during spikes.
  • pool_pre_ping=True: Test each connection before using it. Prevents “connection closed” errors.
  • pool_recycle=3600: Recycle connections every hour. Prevents stale connections if Trino restarts.

For deeper understanding of SQLAlchemy connection behaviour, the SQLAlchemy Query Guide covers ORM and connection pooling in detail.

Query Optimisation in Superset

Superset generates SQL for every chart and dashboard. Inefficient SQL generation can cause unnecessary Trino load.

Use Native Queries Where Possible Instead of building charts from the UI (which generates SQL automatically), use native SQL queries for complex logic:

SELECT
  DATE_TRUNC('day', event_time) AS day,
  user_segment,
  COUNT(*) AS event_count,
  COUNT(DISTINCT user_id) AS unique_users
FROM events
WHERE event_time >= CURRENT_DATE - INTERVAL '90' DAY
GROUP BY 1, 2
ORDER BY 1 DESC, 3 DESC

Native SQL gives you control over the execution plan. The Superset UI can generate inefficient queries, especially with complex joins or nested aggregations.

Avoid SELECT * and Unnecessary Columns Superset sometimes selects all columns from a table. Be explicit:

-- Bad
SELECT * FROM large_fact_table WHERE date = CURRENT_DATE

-- Good
SELECT order_id, customer_id, amount, order_date
FROM large_fact_table
WHERE order_date = CURRENT_DATE

This reduces the amount of data Trino scans and transfers.

Pre-aggregate in Superset Datasets Create materialised datasets (saved queries) that pre-aggregate data. For example, instead of querying raw events every time, create a daily summary:

CREATE TABLE events_daily_summary AS
SELECT
  DATE(event_time) AS event_date,
  event_type,
  COUNT(*) AS event_count,
  COUNT(DISTINCT user_id) AS unique_users
FROM events
GROUP BY 1, 2

Then build dashboards from events_daily_summary instead of raw events. This reduces query time from seconds to milliseconds.


Caching Strategy and Redis Configuration

Why Caching Matters

A well-tuned cache can reduce dashboard load time from 5–8 seconds to under 1 second. Most Superset deployments have caching disabled or misconfigured.

Superset supports two types of caching:

  1. Query caching: Cache the SQL query result set.
  2. Data caching: Cache the visualisation data (charts).

Redis as the Cache Backend

Redis is the standard. Configure it in superset_config.py:

CACHE_CONFIG = {
    "CACHE_TYPE": "redis",
    "CACHE_REDIS_URL": "redis://redis-host:6379/0",
    "CACHE_DEFAULT_TIMEOUT": 300,
}

DATA_CACHE_CONFIG = {
    "CACHE_TYPE": "redis",
    "CACHE_REDIS_URL": "redis://redis-host:6379/1",
    "CACHE_DEFAULT_TIMEOUT": 600,
}

Use separate Redis databases (0 and 1) for metadata cache and data cache. This prevents one from evicting the other.

For Redis configuration best practices, the Redis Developer Documentation covers memory management, eviction policies, and persistence.

Redis Memory and Eviction

Redis holds all data in memory. If you misconfigure it, you’ll hit memory limits and lose cache hits.

Memory Allocation Set maxmemory to 60–70% of available RAM:

maxmemory 32gb
maxmemory-policy allkeys-lru

allkeys-lru evicts the least recently used keys when memory is full, which is ideal for a cache.

Persistence For production, enable RDB snapshots:

save 900 1        # Save every 15 minutes if at least 1 key changed
save 300 10       # Save every 5 minutes if at least 10 keys changed
save 60 10000     # Save every 60 seconds if at least 10,000 keys changed

This prevents losing cache data on restart.

Cache Invalidation Strategy

The hardest part of caching is knowing when to invalidate it. Superset has several strategies:

Time-Based (TTL) Set cache timeout based on data freshness requirements:

# For real-time dashboards (stock prices, live events)
CACHE_DEFAULT_TIMEOUT = 60  # 1 minute

# For daily reports (revenue, user metrics)
CACHE_DEFAULT_TIMEOUT = 3600  # 1 hour

# For historical analysis (trends, cohorts)
CACHE_DEFAULT_TIMEOUT = 86400  # 1 day

Event-Based Invalidate cache when underlying data changes. This requires integration with your data pipeline:

from superset.utils.cache_manager import CacheManager

def invalidate_dashboard_cache(dashboard_id):
    cache_manager = CacheManager()
    cache_manager.invalidate_dashboard(dashboard_id)

Call this function after your data pipeline updates tables.

Selective Caching Not all queries benefit from caching. Cache only:

  • Slow queries (>2 seconds)
  • Frequently accessed dashboards
  • Queries with stable data (not real-time)

Disable caching for:

  • Ad-hoc exploration queries
  • Queries with user-specific filters
  • Real-time operational dashboards

Query Performance Patterns

Pattern 1: Slow Joins Across Large Tables

Problem: Joining two fact tables (100M+ rows each) causes Trino to shuffle massive amounts of data across the network.

Solution: Pre-join in a materialised table or use a star schema with dimension tables:

-- Slow (fact-to-fact join)
SELECT
  a.order_id,
  a.amount,
  b.shipping_cost
FROM orders a
JOIN shipments b ON a.order_id = b.order_id
WHERE a.order_date >= CURRENT_DATE - INTERVAL '30' DAY

-- Fast (fact-to-dimension join)
SELECT
  f.order_id,
  f.amount,
  d.shipping_cost
FROM orders_fact f
LEFT JOIN shipping_dimension d ON f.shipping_id = d.shipping_id
WHERE f.order_date >= CURRENT_DATE - INTERVAL '30' DAY

The second query scans a small dimension table instead of joining two large fact tables.

Pattern 2: Scanning Too Much Data

Problem: Queries scan entire tables even when filtering by date. This wastes I/O and CPU.

Solution: Partition tables by date and use partition pruning:

For Iceberg tables (recommended):

CREATE TABLE events (
  event_id BIGINT,
  user_id BIGINT,
  event_time TIMESTAMP,
  event_type VARCHAR
)
WITH (
  partitioning = ARRAY['day(event_time)']
)
AS SELECT * FROM events_raw

Trino will automatically skip partitions outside your date filter. See the Apache Iceberg Documentation for partition strategies.

Pattern 3: Inefficient Aggregations

Problem: Aggregating raw data at query time is slow. Aggregating 1B rows takes 30+ seconds.

Solution: Pre-aggregate into summary tables:

-- Materialised summary table (updated hourly)
CREATE TABLE events_hourly_summary AS
SELECT
  DATE_TRUNC('hour', event_time) AS event_hour,
  event_type,
  COUNT(*) AS event_count,
  COUNT(DISTINCT user_id) AS unique_users,
  SUM(CASE WHEN event_type = 'purchase' THEN 1 ELSE 0 END) AS purchases
FROM events
GROUP BY 1, 2

Dashboards query events_hourly_summary (millions of rows) instead of raw events (billions of rows). Query time drops from 30 seconds to <1 second.

Pattern 4: Correlated Subqueries

Problem: Correlated subqueries in SELECT clauses cause Trino to execute the subquery for every row.

-- Slow (correlated subquery)
SELECT
  user_id,
  (SELECT COUNT(*) FROM orders WHERE user_id = users.user_id) AS order_count
FROM users

Solution: Use a JOIN and GROUP BY:

-- Fast (JOIN with GROUP BY)
SELECT
  u.user_id,
  COUNT(o.order_id) AS order_count
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
GROUP BY u.user_id

Monitoring and Observability

Trino Metrics and Dashboards

Trino exposes JMX metrics. Export them with Prometheus:

# prometheus.yml
scrape_configs:
  - job_name: 'trino'
    static_configs:
      - targets: ['trino-coordinator:8081']

Key metrics to monitor:

  • trino.execution.task.max_concurrency: Number of concurrent tasks. Should match your worker count × cores per worker.
  • trino.querymanager.query.queued_queries: Queries waiting to run. If >10, your cluster is under-resourced.
  • trino.memory.reserved_pool_bytes: Memory reserved for queries. If approaching max_memory, increase worker heap.
  • trino.execution.task.input_bytes: Data scanned per query. High values indicate missing partitioning or inefficient queries.

Superset Query Logs

Enable detailed query logging in Superset:

LOGGING = {
    "version": 1,
    "formatters": {
        "standard": {
            "format": "[%(asctime)s] %(levelname)s in %(module)s: %(message)s"
        }
    },
    "handlers": {
        "file": {
            "level": "DEBUG",
            "class": "logging.FileHandler",
            "filename": "/var/log/superset/query.log",
            "formatter": "standard"
        }
    },
    "loggers": {
        "superset.sql_lab": {
            "handlers": ["file"],
            "level": "DEBUG"
        }
    }
}

Parse these logs to identify slow queries:

grep "duration" /var/log/superset/query.log | awk '{print $NF}' | sort -rn | head -20

Query Performance Profiling

For slow queries, use Trino’s EXPLAIN command:

EXPLAIN (FORMAT JSON) SELECT ...

This shows the execution plan. Look for:

  • Broadcast joins: Data is replicated to all workers. Fine for small tables, bad for large ones.
  • Repartition: Data is shuffled across the network. Expensive but sometimes necessary.
  • Scan operations: Check if partition pruning is working. If scanning 100% of partitions, your filter isn’t being pushed down.

Operational Habits for Production

1. Establish a Baseline

Before tuning, measure current performance:

# Run a representative set of dashboard queries
for i in {1..10}; do
  time trino --execute "SELECT ..." > /dev/null
done

Record the median and p95 times. After tuning, compare.

2. Tune One Component at a Time

Don’t change Trino worker count, Superset cache timeout, and Redis memory simultaneously. You won’t know what helped.

  1. Tune Trino first (coordinator/worker memory, queue config).
  2. Optimise queries (partitioning, pre-aggregation).
  3. Configure caching (Redis, TTL).

3. Use Feature Flags for Dashboard Changes

When switching from raw tables to pre-aggregated tables, use a feature flag:

if feature_flag_enabled('use_aggregated_tables'):
    dataset = 'events_hourly_summary'
else:
    dataset = 'events'

This lets you A/B test performance improvements with real users before rolling out.

4. Monitor Cost

Trino clusters are expensive. Track:

  • CPU and memory utilisation: If <40% average, downsize. If >80%, upsize.
  • Query distribution: Are 80% of queries hitting the same few dashboards? Cache them aggressively.
  • Data scanned per query: Use Trino metrics to identify queries scanning terabytes unnecessarily.

We’ve seen teams reduce Trino costs by 30–50% through better partitioning and pre-aggregation without sacrificing performance.

5. Automate Maintenance

Schedule regular tasks:

# Daily: Vacuum and analyse Postgres (Superset metadata)
0 2 * * * /usr/bin/vacuumdb -U superset -d superset -v

# Weekly: Rebuild Iceberg table manifests
0 3 * * 0 trino --execute "ALTER TABLE events EXECUTE optimize"

# Weekly: Clear old query cache
0 4 * * 0 redis-cli -n 0 FLUSHDB

6. Set Up Alerting

Alert on:

  • Query latency >5 seconds: Indicates a slow query or resource constraint.
  • Queued queries >20: Cluster is under-resourced.
  • Redis memory >80%: Cache is evicting too aggressively.
  • Trino coordinator CPU >85%: Coordinator is bottlenecked.

Real-World Benchmarks

Here’s what we’ve seen in production across our clients at Platform Development in Australia and Platform Development in United States:

Benchmark 1: Daily Revenue Dashboard

Data: 500M order rows, queried by date, region, and product.

Before Tuning:

  • Query time: 12 seconds
  • Dashboard load: 8 seconds (with 3 charts)
  • Redis hit rate: 0% (caching disabled)

After Tuning:

  • Partitioned orders table by date (90-day retention).
  • Created orders_daily_summary pre-aggregated by region and product.
  • Enabled Redis caching with 1-hour TTL.
  • Optimised Trino worker count from 4 to 8.

Results:

  • Query time: 2.1 seconds (first run), 80ms (cached).
  • Dashboard load: 1.2 seconds.
  • Redis hit rate: 87%.
  • Cost reduction: 40% (fewer workers needed due to faster queries).

Benchmark 2: Real-Time Operations Dashboard

Data: 100M event rows, updated every minute, queried by event type and timestamp.

Before Tuning:

  • Query time: 18 seconds
  • Memory usage: 64 GB per worker
  • Queries timing out: 5–10 per day

After Tuning:

  • Enabled Iceberg partitioning by day and hour.
  • Moved slow connectors (Postgres) to a separate Trino cluster.
  • Configured spill for large aggregations.
  • Increased coordinator memory from 8 GB to 16 GB.

Results:

  • Query time: 3.2 seconds.
  • Memory usage: 48 GB per worker (25% reduction).
  • Timeouts: 0.
  • Throughput: 50 concurrent queries (was 20).

Benchmark 3: Ad-Hoc Analytics Platform

Data: 2B row fact table, 100+ dimension tables, mixed query patterns.

Before Tuning:

  • P95 query time: 45 seconds
  • Users complained about slow exploration
  • Cache hit rate: 12%

After Tuning:

  • Enforced native SQL queries (disabled UI-generated queries).
  • Created 5 pre-aggregated summary tables for common analyses.
  • Tuned Superset connection pool to 100 connections.
  • Enabled aggressive caching (6-hour TTL for summaries).

Results:

  • P95 query time: 8 seconds.
  • Cache hit rate: 68%.
  • User satisfaction: 4.2/5 (was 2.1/5).

Next Steps and Getting Help

Performance tuning is iterative. Start with the highest-impact changes:

  1. Partition your largest tables by date. This alone typically cuts query time by 40–60%.
  2. Create pre-aggregated summary tables for your most-accessed dashboards.
  3. Enable and tune Redis caching with appropriate TTLs.
  4. Right-size your Trino cluster based on concurrency and data volume.

If you’re building a production analytics platform or modernising an existing one, the tuning patterns in this guide apply across all deployments—whether you’re in Platform Development in New York, Platform Development in San Francisco, or Platform Development in Canada.

For more on Superset and Trino integration, the Apache Superset and Trino blog post from Starburst covers real-world deployment patterns, and Preset’s guide to lightning-fast Superset dashboards provides additional caching and optimisation strategies.

If you’re running into performance issues or planning a Superset + Trino deployment, PADISO’s Platform Design & Engineering team has tuned these stacks for teams across Platform Development in Seattle, Platform Development in Austin, and Platform Development in Canberra. We’ve seen the patterns that work and the gotchas that don’t. Check out our case studies to see how we’ve helped similar teams, or book a call to discuss your specific setup.

The goal is simple: get your dashboards loading in under 2 seconds, your queries running in under 5 seconds, and your infrastructure costs down by 30–50%. With the right tuning, Superset + Trino can deliver all three.

Want to talk through your situation?

Book a 30-minute call with Kevin (Founder/CEO). No pitch — direct advice on what to do next.

Book a 30-min call