Table of Contents
- Why Superset + Trino Matters
- Understanding the Architecture
- Trino Coordinator and Worker Tuning
- Superset Connection and Query Optimisation
- Caching Strategy and Redis Configuration
- Query Performance Patterns
- Monitoring and Observability
- Operational Habits for Production
- Real-World Benchmarks
- Next Steps and Getting Help
Why Superset + Trino Matters
Apache Superset paired with Trino has become the default stack for teams modernising analytics across financial services, retail, and media. The combination gives you a distributed query engine that can handle massive data volumes across multiple sources—Postgres, S3, Iceberg tables, Kafka streams—plus a lightweight, open-source BI layer that doesn’t require per-seat licensing.
But this power comes with operational complexity. We’ve worked with teams across Platform Development in Sydney, Platform Development in Melbourne, and Platform Development in Chicago who deployed Superset + Trino only to face dashboard latency that made the product unusable at scale. The issue is rarely architectural—it’s almost always tuning.
This guide covers the configuration patterns, benchmarks, and operational habits that separate fast deployments from slow ones. We’ll focus on concrete numbers: query time reductions of 40–70%, dashboard load improvements from 8 seconds to under 2 seconds, and cost savings of 30–50% through better resource allocation.
Understanding the Architecture
How Superset and Trino Interact
When you connect Superset to Trino, you’re creating a two-layer system:
Layer 1: Trino (Query Engine) Trino is a distributed SQL query engine. It has a coordinator process that receives queries, creates execution plans, and distributes work across worker nodes. Each worker reads data from connectors—Postgres, S3, Iceberg, Hive, etc.—and returns results to the coordinator, which aggregates and returns them to Superset.
For detailed understanding of this flow, the Trino Concepts and Architecture documentation explains coordinators, workers, and connectors in depth.
Layer 2: Superset (BI Layer) Superset is stateless. It holds metadata (dashboards, charts, data sources, users) in a backing database—usually Postgres. When a user views a dashboard, Superset constructs SQL queries, sends them to Trino, caches results (optionally), and renders visualisations. The official Superset documentation for connecting to Trino provides the driver setup and connection parameters.
Where Bottlenecks Hide
Performance issues typically occur at one of four points:
- Trino query execution – The coordinator or workers are CPU/memory-bound, or queries hit slow connectors.
- Network latency – Superset and Trino are geographically distant, or the Trino cluster is undersized.
- Superset metadata queries – The backing Postgres database is slow, or Superset is generating inefficient SQL.
- Result caching – Superset doesn’t cache results, or the cache backend (Redis) is misconfigured.
Most teams we work with find that 60–70% of latency is in Trino execution, 20–25% is in caching/metadata, and the rest is network or Superset rendering.
Trino Coordinator and Worker Tuning
Coordinator Configuration
The Trino coordinator is the bottleneck. It receives every query, builds the execution plan, and orchestrates workers. A slow coordinator blocks all queries.
Memory Allocation Set the JVM heap size for the coordinator to at least 8 GB, more if you have >50 concurrent queries. The rule of thumb: 1 GB per 10 concurrent queries, plus 4 GB baseline.
JVM_CONFIG="
-server
-Xmx16g
-Xms16g
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:InitiatingHeapOccupancyPercent=35
-XX:MaxGCPauseMillis=53
"
Use the G1GC garbage collector. It’s more predictable than the default and reduces pause times, which matters when you have hundreds of concurrent queries.
Query Queue and Concurrency
In config.properties, set:
query.max-run-time=30m
query.max-queued-time=10m
query.queue-config-file=/etc/trino/queue-config.json
The queue config file lets you prioritise dashboard queries over ad-hoc exploration:
{
"queues": [
{
"name": "dashboard",
"maxConcurrentQueries": 50,
"maxQueuedQueries": 200,
"schedulingPolicy": "fair",
"queryPriority": 10
},
{
"name": "adhoc",
"maxConcurrentQueries": 20,
"maxQueuedQueries": 100,
"schedulingPolicy": "fair",
"queryPriority": 1
}
]
}
This ensures dashboards don’t starve because someone ran a slow exploration query.
Worker Configuration
Workers execute the actual query logic. Scale them horizontally based on:
- Data volume: Each worker should have 4–8 GB of memory per 100 GB of data scanned per query.
- Concurrency: Each worker can handle 4–8 concurrent queries. If you have 100 concurrent queries, you need 12–25 workers.
- CPU: Use machines with 8–16 vCPU. Trino scales well with more cores.
Worker Memory and Spill Set worker JVM heap to 32–64 GB on large clusters:
JVM_CONFIG="
-server
-Xmx32g
-Xms32g
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
"
Enable spill to disk for large joins and aggregations:
spill.enabled=true
spill.order-by-enabled=true
spill.window-operator-enabled=true
spill.max-operator-spill-space=4GB
Spill trades memory for disk I/O but prevents out-of-memory crashes on large queries.
Connector Tuning
Trino’s performance is only as fast as its slowest connector. If you’re querying Postgres through Trino, tune the Postgres connector:
connector.name=postgresql
connection-url=jdbc:postgresql://postgres-host:5432/analytics
connection-user=trino_user
connection-password=<password>
query.pushdown.enabled=true
case-insensitive-name-matching=true
The query.pushdown.enabled=true setting is critical—it pushes filter and aggregation operations down to Postgres rather than fetching all rows and filtering in Trino.
For S3 and Iceberg tables, consult the Apache Iceberg Documentation for partitioning and file layout best practices. Well-partitioned Iceberg tables can reduce scan time by 80–90%.
Superset Connection and Query Optimisation
Connection Pool Configuration
Superset uses SQLAlchemy to connect to Trino. The connection pool is a common bottleneck when you have many concurrent dashboards.
In Superset’s superset_config.py:
SQLALCHEMY_ENGINE_OPTIONS = {
"trino": {
"pool_size": 50,
"max_overflow": 50,
"pool_pre_ping": True,
"pool_recycle": 3600,
"echo": False,
}
}
What each setting does:
pool_size=50: Keep 50 idle connections ready. Adjust based on your concurrency—roughly 1 connection per concurrent dashboard user.max_overflow=50: Allow up to 50 additional connections beyond the pool size during spikes.pool_pre_ping=True: Test each connection before using it. Prevents “connection closed” errors.pool_recycle=3600: Recycle connections every hour. Prevents stale connections if Trino restarts.
For deeper understanding of SQLAlchemy connection behaviour, the SQLAlchemy Query Guide covers ORM and connection pooling in detail.
Query Optimisation in Superset
Superset generates SQL for every chart and dashboard. Inefficient SQL generation can cause unnecessary Trino load.
Use Native Queries Where Possible Instead of building charts from the UI (which generates SQL automatically), use native SQL queries for complex logic:
SELECT
DATE_TRUNC('day', event_time) AS day,
user_segment,
COUNT(*) AS event_count,
COUNT(DISTINCT user_id) AS unique_users
FROM events
WHERE event_time >= CURRENT_DATE - INTERVAL '90' DAY
GROUP BY 1, 2
ORDER BY 1 DESC, 3 DESC
Native SQL gives you control over the execution plan. The Superset UI can generate inefficient queries, especially with complex joins or nested aggregations.
Avoid SELECT * and Unnecessary Columns Superset sometimes selects all columns from a table. Be explicit:
-- Bad
SELECT * FROM large_fact_table WHERE date = CURRENT_DATE
-- Good
SELECT order_id, customer_id, amount, order_date
FROM large_fact_table
WHERE order_date = CURRENT_DATE
This reduces the amount of data Trino scans and transfers.
Pre-aggregate in Superset Datasets Create materialised datasets (saved queries) that pre-aggregate data. For example, instead of querying raw events every time, create a daily summary:
CREATE TABLE events_daily_summary AS
SELECT
DATE(event_time) AS event_date,
event_type,
COUNT(*) AS event_count,
COUNT(DISTINCT user_id) AS unique_users
FROM events
GROUP BY 1, 2
Then build dashboards from events_daily_summary instead of raw events. This reduces query time from seconds to milliseconds.
Caching Strategy and Redis Configuration
Why Caching Matters
A well-tuned cache can reduce dashboard load time from 5–8 seconds to under 1 second. Most Superset deployments have caching disabled or misconfigured.
Superset supports two types of caching:
- Query caching: Cache the SQL query result set.
- Data caching: Cache the visualisation data (charts).
Redis as the Cache Backend
Redis is the standard. Configure it in superset_config.py:
CACHE_CONFIG = {
"CACHE_TYPE": "redis",
"CACHE_REDIS_URL": "redis://redis-host:6379/0",
"CACHE_DEFAULT_TIMEOUT": 300,
}
DATA_CACHE_CONFIG = {
"CACHE_TYPE": "redis",
"CACHE_REDIS_URL": "redis://redis-host:6379/1",
"CACHE_DEFAULT_TIMEOUT": 600,
}
Use separate Redis databases (0 and 1) for metadata cache and data cache. This prevents one from evicting the other.
For Redis configuration best practices, the Redis Developer Documentation covers memory management, eviction policies, and persistence.
Redis Memory and Eviction
Redis holds all data in memory. If you misconfigure it, you’ll hit memory limits and lose cache hits.
Memory Allocation
Set maxmemory to 60–70% of available RAM:
maxmemory 32gb
maxmemory-policy allkeys-lru
allkeys-lru evicts the least recently used keys when memory is full, which is ideal for a cache.
Persistence For production, enable RDB snapshots:
save 900 1 # Save every 15 minutes if at least 1 key changed
save 300 10 # Save every 5 minutes if at least 10 keys changed
save 60 10000 # Save every 60 seconds if at least 10,000 keys changed
This prevents losing cache data on restart.
Cache Invalidation Strategy
The hardest part of caching is knowing when to invalidate it. Superset has several strategies:
Time-Based (TTL) Set cache timeout based on data freshness requirements:
# For real-time dashboards (stock prices, live events)
CACHE_DEFAULT_TIMEOUT = 60 # 1 minute
# For daily reports (revenue, user metrics)
CACHE_DEFAULT_TIMEOUT = 3600 # 1 hour
# For historical analysis (trends, cohorts)
CACHE_DEFAULT_TIMEOUT = 86400 # 1 day
Event-Based Invalidate cache when underlying data changes. This requires integration with your data pipeline:
from superset.utils.cache_manager import CacheManager
def invalidate_dashboard_cache(dashboard_id):
cache_manager = CacheManager()
cache_manager.invalidate_dashboard(dashboard_id)
Call this function after your data pipeline updates tables.
Selective Caching Not all queries benefit from caching. Cache only:
- Slow queries (>2 seconds)
- Frequently accessed dashboards
- Queries with stable data (not real-time)
Disable caching for:
- Ad-hoc exploration queries
- Queries with user-specific filters
- Real-time operational dashboards
Query Performance Patterns
Pattern 1: Slow Joins Across Large Tables
Problem: Joining two fact tables (100M+ rows each) causes Trino to shuffle massive amounts of data across the network.
Solution: Pre-join in a materialised table or use a star schema with dimension tables:
-- Slow (fact-to-fact join)
SELECT
a.order_id,
a.amount,
b.shipping_cost
FROM orders a
JOIN shipments b ON a.order_id = b.order_id
WHERE a.order_date >= CURRENT_DATE - INTERVAL '30' DAY
-- Fast (fact-to-dimension join)
SELECT
f.order_id,
f.amount,
d.shipping_cost
FROM orders_fact f
LEFT JOIN shipping_dimension d ON f.shipping_id = d.shipping_id
WHERE f.order_date >= CURRENT_DATE - INTERVAL '30' DAY
The second query scans a small dimension table instead of joining two large fact tables.
Pattern 2: Scanning Too Much Data
Problem: Queries scan entire tables even when filtering by date. This wastes I/O and CPU.
Solution: Partition tables by date and use partition pruning:
For Iceberg tables (recommended):
CREATE TABLE events (
event_id BIGINT,
user_id BIGINT,
event_time TIMESTAMP,
event_type VARCHAR
)
WITH (
partitioning = ARRAY['day(event_time)']
)
AS SELECT * FROM events_raw
Trino will automatically skip partitions outside your date filter. See the Apache Iceberg Documentation for partition strategies.
Pattern 3: Inefficient Aggregations
Problem: Aggregating raw data at query time is slow. Aggregating 1B rows takes 30+ seconds.
Solution: Pre-aggregate into summary tables:
-- Materialised summary table (updated hourly)
CREATE TABLE events_hourly_summary AS
SELECT
DATE_TRUNC('hour', event_time) AS event_hour,
event_type,
COUNT(*) AS event_count,
COUNT(DISTINCT user_id) AS unique_users,
SUM(CASE WHEN event_type = 'purchase' THEN 1 ELSE 0 END) AS purchases
FROM events
GROUP BY 1, 2
Dashboards query events_hourly_summary (millions of rows) instead of raw events (billions of rows). Query time drops from 30 seconds to <1 second.
Pattern 4: Correlated Subqueries
Problem: Correlated subqueries in SELECT clauses cause Trino to execute the subquery for every row.
-- Slow (correlated subquery)
SELECT
user_id,
(SELECT COUNT(*) FROM orders WHERE user_id = users.user_id) AS order_count
FROM users
Solution: Use a JOIN and GROUP BY:
-- Fast (JOIN with GROUP BY)
SELECT
u.user_id,
COUNT(o.order_id) AS order_count
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
GROUP BY u.user_id
Monitoring and Observability
Trino Metrics and Dashboards
Trino exposes JMX metrics. Export them with Prometheus:
# prometheus.yml
scrape_configs:
- job_name: 'trino'
static_configs:
- targets: ['trino-coordinator:8081']
Key metrics to monitor:
trino.execution.task.max_concurrency: Number of concurrent tasks. Should match your worker count × cores per worker.trino.querymanager.query.queued_queries: Queries waiting to run. If >10, your cluster is under-resourced.trino.memory.reserved_pool_bytes: Memory reserved for queries. If approachingmax_memory, increase worker heap.trino.execution.task.input_bytes: Data scanned per query. High values indicate missing partitioning or inefficient queries.
Superset Query Logs
Enable detailed query logging in Superset:
LOGGING = {
"version": 1,
"formatters": {
"standard": {
"format": "[%(asctime)s] %(levelname)s in %(module)s: %(message)s"
}
},
"handlers": {
"file": {
"level": "DEBUG",
"class": "logging.FileHandler",
"filename": "/var/log/superset/query.log",
"formatter": "standard"
}
},
"loggers": {
"superset.sql_lab": {
"handlers": ["file"],
"level": "DEBUG"
}
}
}
Parse these logs to identify slow queries:
grep "duration" /var/log/superset/query.log | awk '{print $NF}' | sort -rn | head -20
Query Performance Profiling
For slow queries, use Trino’s EXPLAIN command:
EXPLAIN (FORMAT JSON) SELECT ...
This shows the execution plan. Look for:
- Broadcast joins: Data is replicated to all workers. Fine for small tables, bad for large ones.
- Repartition: Data is shuffled across the network. Expensive but sometimes necessary.
- Scan operations: Check if partition pruning is working. If scanning 100% of partitions, your filter isn’t being pushed down.
Operational Habits for Production
1. Establish a Baseline
Before tuning, measure current performance:
# Run a representative set of dashboard queries
for i in {1..10}; do
time trino --execute "SELECT ..." > /dev/null
done
Record the median and p95 times. After tuning, compare.
2. Tune One Component at a Time
Don’t change Trino worker count, Superset cache timeout, and Redis memory simultaneously. You won’t know what helped.
- Tune Trino first (coordinator/worker memory, queue config).
- Optimise queries (partitioning, pre-aggregation).
- Configure caching (Redis, TTL).
3. Use Feature Flags for Dashboard Changes
When switching from raw tables to pre-aggregated tables, use a feature flag:
if feature_flag_enabled('use_aggregated_tables'):
dataset = 'events_hourly_summary'
else:
dataset = 'events'
This lets you A/B test performance improvements with real users before rolling out.
4. Monitor Cost
Trino clusters are expensive. Track:
- CPU and memory utilisation: If <40% average, downsize. If >80%, upsize.
- Query distribution: Are 80% of queries hitting the same few dashboards? Cache them aggressively.
- Data scanned per query: Use Trino metrics to identify queries scanning terabytes unnecessarily.
We’ve seen teams reduce Trino costs by 30–50% through better partitioning and pre-aggregation without sacrificing performance.
5. Automate Maintenance
Schedule regular tasks:
# Daily: Vacuum and analyse Postgres (Superset metadata)
0 2 * * * /usr/bin/vacuumdb -U superset -d superset -v
# Weekly: Rebuild Iceberg table manifests
0 3 * * 0 trino --execute "ALTER TABLE events EXECUTE optimize"
# Weekly: Clear old query cache
0 4 * * 0 redis-cli -n 0 FLUSHDB
6. Set Up Alerting
Alert on:
- Query latency >5 seconds: Indicates a slow query or resource constraint.
- Queued queries >20: Cluster is under-resourced.
- Redis memory >80%: Cache is evicting too aggressively.
- Trino coordinator CPU >85%: Coordinator is bottlenecked.
Real-World Benchmarks
Here’s what we’ve seen in production across our clients at Platform Development in Australia and Platform Development in United States:
Benchmark 1: Daily Revenue Dashboard
Data: 500M order rows, queried by date, region, and product.
Before Tuning:
- Query time: 12 seconds
- Dashboard load: 8 seconds (with 3 charts)
- Redis hit rate: 0% (caching disabled)
After Tuning:
- Partitioned orders table by date (90-day retention).
- Created
orders_daily_summarypre-aggregated by region and product. - Enabled Redis caching with 1-hour TTL.
- Optimised Trino worker count from 4 to 8.
Results:
- Query time: 2.1 seconds (first run), 80ms (cached).
- Dashboard load: 1.2 seconds.
- Redis hit rate: 87%.
- Cost reduction: 40% (fewer workers needed due to faster queries).
Benchmark 2: Real-Time Operations Dashboard
Data: 100M event rows, updated every minute, queried by event type and timestamp.
Before Tuning:
- Query time: 18 seconds
- Memory usage: 64 GB per worker
- Queries timing out: 5–10 per day
After Tuning:
- Enabled Iceberg partitioning by day and hour.
- Moved slow connectors (Postgres) to a separate Trino cluster.
- Configured spill for large aggregations.
- Increased coordinator memory from 8 GB to 16 GB.
Results:
- Query time: 3.2 seconds.
- Memory usage: 48 GB per worker (25% reduction).
- Timeouts: 0.
- Throughput: 50 concurrent queries (was 20).
Benchmark 3: Ad-Hoc Analytics Platform
Data: 2B row fact table, 100+ dimension tables, mixed query patterns.
Before Tuning:
- P95 query time: 45 seconds
- Users complained about slow exploration
- Cache hit rate: 12%
After Tuning:
- Enforced native SQL queries (disabled UI-generated queries).
- Created 5 pre-aggregated summary tables for common analyses.
- Tuned Superset connection pool to 100 connections.
- Enabled aggressive caching (6-hour TTL for summaries).
Results:
- P95 query time: 8 seconds.
- Cache hit rate: 68%.
- User satisfaction: 4.2/5 (was 2.1/5).
Next Steps and Getting Help
Performance tuning is iterative. Start with the highest-impact changes:
- Partition your largest tables by date. This alone typically cuts query time by 40–60%.
- Create pre-aggregated summary tables for your most-accessed dashboards.
- Enable and tune Redis caching with appropriate TTLs.
- Right-size your Trino cluster based on concurrency and data volume.
If you’re building a production analytics platform or modernising an existing one, the tuning patterns in this guide apply across all deployments—whether you’re in Platform Development in New York, Platform Development in San Francisco, or Platform Development in Canada.
For more on Superset and Trino integration, the Apache Superset and Trino blog post from Starburst covers real-world deployment patterns, and Preset’s guide to lightning-fast Superset dashboards provides additional caching and optimisation strategies.
If you’re running into performance issues or planning a Superset + Trino deployment, PADISO’s Platform Design & Engineering team has tuned these stacks for teams across Platform Development in Seattle, Platform Development in Austin, and Platform Development in Canberra. We’ve seen the patterns that work and the gotchas that don’t. Check out our case studies to see how we’ve helped similar teams, or book a call to discuss your specific setup.
The goal is simple: get your dashboards loading in under 2 seconds, your queries running in under 5 seconds, and your infrastructure costs down by 30–50%. With the right tuning, Superset + Trino can deliver all three.