Table of Contents
- Why Apache Superset for Logistics Tracking
- Core Data Model for Logistics Operations
- Essential Metrics and KPIs
- Dashboard Architecture and Layout
- Building Drilldown Patterns
- Schema Patterns That Survive Scale
- Performance Optimisation for Real-Time Tracking
- Security and Access Control
- Deployment and Maintenance
- Next Steps: From Dashboard to Operational Intelligence
Why Apache Superset for Logistics Tracking
Logistics operations generate enormous volumes of structured data: shipment events, vehicle locations, delivery attempts, warehouse movements, and customer handoffs. Without visibility into this data, you’re flying blind—missing SLAs, unable to diagnose delays, and losing money to inefficiency.
Apache Superset is well suited to this challenge. It connects directly to your operational databases and data warehouses, turns raw events into interactive dashboards, and gives your team the visibility they need to act. Unlike traditional BI tools that charge per seat and can take months to roll out, Superset is open-source and embeddable, and a first deployment can be running in weeks.
For logistics operations at scale, the difference matters. A freight company tracking 10,000 shipments daily needs dashboards that respond within a second or two, not minutes. A logistics operator managing multiple warehouses needs to drill from regional summaries into individual orders in a single click. A fleet manager needs real-time visibility into vehicle status, delivery windows, and route efficiency.
This guide provides a production-ready reference dashboard set for logistics tracking. You’ll get the data model, the metrics, the SQL patterns, and the schema design decisions that survive growth from 100 shipments per day to 100,000.
Core Data Model for Logistics Operations
The Shipment Fact Table
Every logistics dashboard starts with a single source of truth: the shipment. A shipment is a unit of work—a package, pallet, or container moving from origin to destination. Your fact table should capture:
shipments (fact table)
- shipment_id (primary key): Unique identifier
- shipment_date (timestamp): When the shipment was created
- origin_location_id (foreign key): Where it started
- destination_location_id (foreign key): Where it’s going
- carrier_id (foreign key): Who’s moving it
- vehicle_id (foreign key, optional): Which vehicle
- status_id (foreign key): Current status (in transit, delivered, delayed, etc.)
- planned_delivery_date (date): Expected arrival
- actual_delivery_date (date, nullable): When it actually arrived
- weight_kg (decimal): Physical weight
- volume_m3 (decimal): Physical volume
- declared_value (decimal): Declared value for insurance
- revenue_amount (decimal): Revenue from this shipment
- created_at (timestamp): Record creation time
- updated_at (timestamp): Last update time
This structure is intentionally flat: one row per shipment. You could split it further (shipment header, shipment lines), but for a reference dashboard, this single fact table handles 80% of use cases.
Dimension Tables
Dimensions provide context. Build these as slowly-changing dimensions (SCD Type 2 if location names or carrier details change over time):
locations (dimension)
- location_id (primary key)
- location_name (text): Warehouse, depot, or customer name
- location_type (text): warehouse, depot, customer, port
- city (text)
- state_province (text)
- country (text)
- latitude (decimal)
- longitude (decimal)
- timezone (text): Critical for time-based reporting
- valid_from (date)
- valid_to (date)
carriers (dimension)
- carrier_id (primary key)
- carrier_name (text)
- carrier_type (text): road, air, rail, sea
- service_level (text): express, standard, economy
- cost_per_kg (decimal)
- cost_per_m3 (decimal)
- sla_hours (integer): Service level agreement in hours
statuses (dimension)
- status_id (primary key)
- status_name (text): pickup_scheduled, in_transit, out_for_delivery, delivered, exception
- status_category (text): active, completed, failed
- sort_order (integer): For ordered visualisations
vehicles (dimension, optional)
- vehicle_id (primary key)
- vehicle_registration (text)
- vehicle_type (text): van, truck, trailer
- capacity_kg (decimal)
- capacity_m3 (decimal)
- carrier_id (foreign key)
Events Table (Optional but Powerful)
For real-time tracking, build a separate events table that captures every state change:
shipment_events (fact table)
- event_id (primary key)
- shipment_id (foreign key)
- event_timestamp (timestamp): Precise time of event
- event_type (text): pickup_started, in_transit, delivery_attempted, delivered, exception
- location_id (foreign key, nullable): Where the event occurred
- vehicle_id (foreign key, nullable): Which vehicle
- notes (text, nullable): Exception details
- created_at (timestamp)
This table grows quickly (10–100 rows per shipment), but it enables precise tracking and exception detection. While the shipment fact table updates in place, the events table is immutable and append-only. This matters for audit trails and root-cause analysis.
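Because the events table is append-only, the current state of a shipment can be derived from its most recent event rather than stored separately. The sketch below illustrates that idea in plain Python; the tuple layout and the `current_status` helper are assumptions for illustration, not part of the schema above.

```python
from datetime import datetime

def current_status(events):
    """Return {shipment_id: event_type} using the latest event per shipment.

    events: iterable of (shipment_id, event_timestamp, event_type) tuples
    (a hypothetical in-memory stand-in for shipment_events rows).
    """
    latest = {}
    for shipment_id, event_timestamp, event_type in events:
        seen = latest.get(shipment_id)
        if seen is None or event_timestamp > seen[0]:
            latest[shipment_id] = (event_timestamp, event_type)
    return {sid: etype for sid, (_, etype) in latest.items()}

events = [
    (1, datetime(2024, 3, 1, 8, 0), "pickup_started"),
    (1, datetime(2024, 3, 1, 12, 30), "in_transit"),
    (2, datetime(2024, 3, 1, 9, 15), "pickup_started"),
    (1, datetime(2024, 3, 2, 10, 0), "delivered"),
]
print(current_status(events))
```

The same "latest row per shipment" logic would normally live in a database view; the Python version only makes the rule explicit.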
Essential Metrics and KPIs
On-Time Delivery
Definition: Percentage of shipments delivered on or before the planned delivery date.
SELECT
ROUND(100.0 * SUM(CASE WHEN actual_delivery_date <= planned_delivery_date THEN 1 ELSE 0 END) / COUNT(*), 2) AS on_time_pct
FROM shipments
WHERE actual_delivery_date IS NOT NULL
AND shipment_date >= DATE_TRUNC('month', CURRENT_DATE)
This is your primary SLA metric. Track it by carrier, by destination region, and by service level. When it drops below target, you need to act.
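To make the breakdown by carrier concrete, here is a minimal sketch of the same calculation in Python. The tuple layout and the function name `on_time_pct_by_carrier` are assumptions for illustration; undelivered shipments are skipped, mirroring the `actual_delivery_date IS NOT NULL` filter.

```python
from collections import defaultdict
from datetime import date

def on_time_pct_by_carrier(shipments):
    """shipments: iterable of (carrier_id, planned_date, actual_date or None)."""
    delivered = defaultdict(int)
    on_time = defaultdict(int)
    for carrier_id, planned, actual in shipments:
        if actual is None:  # still in transit: excluded from the metric
            continue
        delivered[carrier_id] += 1
        if actual <= planned:
            on_time[carrier_id] += 1
    return {c: round(100.0 * on_time[c] / n, 2) for c, n in delivered.items()}

sample = [
    (1, date(2024, 3, 5), date(2024, 3, 5)),  # on time
    (1, date(2024, 3, 5), date(2024, 3, 7)),  # late
    (2, date(2024, 3, 6), date(2024, 3, 4)),  # early
    (2, date(2024, 3, 6), None),              # not yet delivered
]
print(on_time_pct_by_carrier(sample))
```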
Average Delivery Time
Definition: Days from shipment creation to delivery.
SELECT
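ROUND((AVG(EXTRACT(EPOCH FROM (actual_delivery_date - shipment_date)) / 86400.0))::numeric, 2) AS avg_days -- EPOCH keeps fractional days; EXTRACT(DAY ...) would truncate to whole days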
FROM shipments
WHERE actual_delivery_date IS NOT NULL
Track this by carrier and route (origin–destination pair). It reveals systematic delays and inefficiencies.
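One detail worth checking in any implementation is whether delivery time is measured in whole or fractional days. The sketch below measures elapsed time in fractional days; it assumes both values are available as timestamps, and the helper name `avg_delivery_days` is hypothetical.

```python
from datetime import datetime

def avg_delivery_days(shipments):
    """shipments: iterable of (created_at, delivered_at or None) datetimes."""
    durations = [
        (delivered - created).total_seconds() / 86400
        for created, delivered in shipments
        if delivered is not None
    ]
    # No delivered shipments means there is nothing to average.
    return round(sum(durations) / len(durations), 2) if durations else None

sample = [
    (datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 2, 21, 0)),  # 1.5 days
    (datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 4, 9, 0)),   # 3.0 days
    (datetime(2024, 3, 2, 9, 0), None),                         # not delivered
]
print(avg_delivery_days(sample))
```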
Shipments in Transit
Definition: Count of active shipments (not yet delivered).
SELECT COUNT(*) AS shipments_in_transit
FROM shipments
WHERE actual_delivery_date IS NULL
AND status_id NOT IN (SELECT status_id FROM statuses WHERE status_category = 'failed')
This is a real-time metric. Use it to monitor network load and capacity utilisation.
Revenue per Shipment
Definition: Total revenue divided by shipment count.
SELECT
ROUND(SUM(revenue_amount) / COUNT(*), 2) AS revenue_per_shipment,
ROUND(SUM(revenue_amount), 0) AS total_revenue
FROM shipments
WHERE shipment_date >= DATE_TRUNC('month', CURRENT_DATE)
Break this down by carrier and service level. It shows which routes and services are most profitable.
Utilisation Rate
Definition: Percentage of vehicle capacity used (by weight or volume).
SELECT
v.vehicle_id,
v.vehicle_registration,
ROUND(100.0 * SUM(s.weight_kg) / v.capacity_kg, 2) AS weight_utilisation_pct,
ROUND(100.0 * SUM(s.volume_m3) / v.capacity_m3, 2) AS volume_utilisation_pct
FROM shipments s
JOIN vehicles v ON s.vehicle_id = v.vehicle_id
WHERE s.shipment_date >= DATE_TRUNC('day', CURRENT_DATE)
GROUP BY v.vehicle_id, v.vehicle_registration, v.capacity_kg, v.capacity_m3
Low utilisation signals inefficient routing. High utilisation (>85%) signals capacity risk.
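A small sketch of the utilisation calculation with a guard for vehicles whose capacity is missing or zero, which would otherwise cause a division error. The function names are hypothetical; the 85% threshold comes from the text above.

```python
def utilisation_pct(load, capacity):
    """Percentage of capacity used; None when capacity is missing or zero."""
    if not capacity:
        return None
    return round(100.0 * load / capacity, 2)

def utilisation_flag(pct, high=85.0):
    """Return 'capacity_risk' above the high threshold, otherwise 'ok'."""
    if pct is None:
        return "unknown"
    return "capacity_risk" if pct > high else "ok"

pct = utilisation_pct(load=900, capacity=1000)
print(pct, utilisation_flag(pct))
```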
Exception Rate
Definition: Percentage of shipments with exceptions (delays, failed delivery attempts, damage).
SELECT
ROUND(100.0 * SUM(CASE WHEN status_category = 'failed' THEN 1 ELSE 0 END) / COUNT(*), 2) AS exception_rate_pct
FROM shipments s
JOIN statuses st ON s.status_id = st.status_id
WHERE s.shipment_date >= DATE_TRUNC('month', CURRENT_DATE)
Track this by exception type (failed delivery, damaged, lost, returned). Each type requires different action.
Dashboard Architecture and Layout
Dashboard 1: Operations Overview (Real-Time)
This is your command centre. Update every 5 minutes.
Top Row (KPIs)
- Shipments in transit (big number)
- On-time delivery % this month (big number, green/red)
- Average delivery time (big number)
- Active exceptions (big number, red if >0)
Middle Row (Trends)
- Shipments by status (stacked bar, last 7 days)
- On-time % by carrier (line chart, last 30 days)
- Revenue trend (area chart, last 30 days)
Bottom Row (Drill-Down)
- Shipments by destination (map or table, top 20 locations)
- Utilisation by vehicle (table, sortable)
- Exceptions by type (horizontal bar chart)
This dashboard should load in <2 seconds. Use Superset’s native filters to enable filtering by date range, carrier, and location without rebuilding the entire dashboard.
Dashboard 2: Carrier Performance
Compare carriers across multiple dimensions.
Metrics
- On-time delivery % by carrier
- Average delivery time by carrier
- Cost per shipment by carrier
- Exception rate by carrier
- Volume handled by carrier (shipment count and weight)
Breakdown
- Performance by service level (express vs. standard)
- Performance by route (origin–destination pairs)
- Trend over time (month-on-month)
This dashboard drives contract negotiations and carrier selection. Make it sortable and exportable.
Dashboard 3: Route and Regional Analysis
Understand performance by geography.
Structure
- Origin region (dropdown filter)
- Destination region (dropdown filter)
- Metrics: on-time %, average days, exception rate, revenue
- Heatmap: origin × destination matrix showing on-time %
- Time series: trend for selected route
This dashboard reveals bottlenecks. If Sydney-to-Melbourne has 85% on-time but Sydney-to-Perth has 60%, you have a routing problem.
Dashboard 4: Warehouse and Location Health
Track performance at each node in your network.
Metrics per Location
- Shipments processed (inbound and outbound)
- Average dwell time (time spent at location)
- On-time % for shipments originating here
- Exception rate
- Utilisation (inbound vs. capacity)
Visualisation
- Table with all locations, sortable
- Map showing locations colour-coded by on-time %
- Trend for selected location
Building Drilldown Patterns
Pattern 1: Summary to Detail
Start with a high-level metric, drill into the data behind it.
- Dashboard Layer 1: Regional on-time % (single number: 87%)
- Click to Layer 2: On-time % by carrier in that region (5 rows)
- Click to Layer 3: Shipments from that carrier in that region (list of 50–200 shipments)
- Click to Layer 4: Details of a single shipment (events timeline, exception notes)
Implement this using Superset’s cross-filter feature. When you click a carrier name in Layer 2, it filters all downstream visualisations in Layer 3.
Pattern 2: Exception Triage
Quickly identify and investigate problems.
- Dashboard 1: Exception rate by type (bar chart: 12% delayed, 4% failed delivery, 2% damaged)
- Click to Dashboard 2: List of delayed shipments (with age, current location, planned vs. actual delivery)
- Click to Details: Single shipment with full event history
Use colour coding: red for >24 hours late, yellow for 12–24 hours, green for on-track. This guides operator attention.
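The colour rule can be expressed as a small function, for example to populate a column that a table chart then formats conditionally. This is a sketch: the name `lateness_colour` is hypothetical, and treating shipments less than 12 hours late as green is an assumption, since the text only defines red and yellow bands.

```python
from datetime import datetime, timedelta

def lateness_colour(planned, now):
    """Map how far past the planned delivery time we are to a colour."""
    late_by = now - planned
    if late_by > timedelta(hours=24):
        return "red"
    if late_by >= timedelta(hours=12):
        return "yellow"
    return "green"  # on track, or under 12 hours late (assumed)

planned = datetime(2024, 3, 1, 12, 0)
print(lateness_colour(planned, planned + timedelta(hours=30)))
```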
Pattern 3: Performance Comparison
Compare two entities (carriers, routes, time periods) side by side.
Dashboard: Carrier comparison
- Filter by two carriers (dropdown)
- Show metrics side by side: on-time %, avg days, cost per shipment, exception rate
- Show trend lines for both carriers
- Show shipment count (to weight results by volume)
This pattern requires calculated fields. In Superset, use Custom SQL or Calculated Columns to compute metrics on the fly.
Schema Patterns That Survive Scale
Pattern 1: Fact Table Partitioning
As your shipment table grows to millions of rows, query performance degrades unless you partition strategically.
Partition by shipment_date (month)
Most queries filter on date range. Partitioning by month ensures that a query for “last 7 days” only scans relevant partitions.
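-- PostgreSQL example (declarative partitioning)
CREATE TABLE shipments (
    shipment_id BIGINT,
    shipment_date TIMESTAMP,
    ...
) PARTITION BY RANGE (shipment_date);

-- Each month is its own table attached to the parent
CREATE TABLE shipments_2024_01 PARTITION OF shipments
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE shipments_2024_02 PARTITION OF shipments
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
...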
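On a large table (100M rows, say), partition pruning of this kind can cut a date-range query from tens of seconds to a few seconds; the exact gain depends on your data and hardware.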
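Monthly partitions have to exist before rows arrive for that month, so teams often generate the DDL ahead of time. The sketch below produces PostgreSQL `PARTITION OF` statements for consecutive months; the function name and the `<table>_<year>_<month>` naming scheme are assumptions.

```python
from datetime import date

def monthly_partition_ddl(table, start, months):
    """Yield PostgreSQL DDL for consecutive monthly range partitions."""
    year, month = start.year, start.month
    for _ in range(months):
        # Roll over to January of the next year after December.
        next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
        lower = date(year, month, 1)
        upper = date(next_year, next_month, 1)  # upper bound is exclusive
        yield (
            f"CREATE TABLE IF NOT EXISTS {table}_{year}_{month:02d} "
            f"PARTITION OF {table} "
            f"FOR VALUES FROM ('{lower}') TO ('{upper}');"
        )
        year, month = next_year, next_month

ddl = list(monthly_partition_ddl("shipments", date(2024, 11, 1), 3))
for stmt in ddl:
    print(stmt)
```

Running this from a scheduled job a month or two ahead is one way to avoid inserts failing for lack of a matching partition.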
Pattern 2: Denormalised Metrics Columns
Instead of calculating metrics on every query, pre-calculate and store them.
-- Add these columns to the shipments table (PostgreSQL syntax)
ALTER TABLE shipments
    ADD COLUMN days_to_delivery INT, -- actual_delivery_date - shipment_date, in days
    ADD COLUMN is_on_time BOOLEAN,   -- actual_delivery_date <= planned_delivery_date
    ADD COLUMN is_exception BOOLEAN, -- status_category = 'failed'
    ADD COLUMN hours_late INT;       -- total hours past planned delivery (EXTRACT(HOUR ...) returns only the 0-23 hour field, not the total)
Populate these via an ETL job after each shipment update. Queries then read a stored flag instead of recomputing it:
SELECT
ROUND(100.0 * SUM(CASE WHEN is_on_time THEN 1 ELSE 0 END) / COUNT(*), 2) AS on_time_pct
FROM shipments
WHERE shipment_date >= DATE_TRUNC('month', CURRENT_DATE);
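The ETL step that fills these columns can be sketched as a pure function over one shipment row. This assumes the planned and actual delivery values are available as timestamps (the data model above stores them as dates), that undelivered shipments get NULLs, and that `hours_late` is clamped at zero for early deliveries; `derive_metrics` is a hypothetical name.

```python
from datetime import datetime

def derive_metrics(shipment_date, planned_delivery, actual_delivery, status_category):
    """Return the pre-computed columns for one shipment row."""
    is_exception = status_category == "failed"
    if actual_delivery is None:
        # Not delivered yet: delivery-based metrics stay NULL.
        return {"days_to_delivery": None, "is_on_time": None,
                "is_exception": is_exception, "hours_late": None}
    late_seconds = (actual_delivery - planned_delivery).total_seconds()
    return {
        "days_to_delivery": (actual_delivery - shipment_date).days,
        "is_on_time": actual_delivery <= planned_delivery,
        "is_exception": is_exception,
        "hours_late": max(0, int(late_seconds // 3600)),  # total hours, not the hour field
    }

row = derive_metrics(
    shipment_date=datetime(2024, 3, 1, 9, 0),
    planned_delivery=datetime(2024, 3, 3, 17, 0),
    actual_delivery=datetime(2024, 3, 4, 20, 0),
    status_category="completed",
)
print(row)
```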
Pattern 3: Slowly Changing Dimension (SCD Type 2)
When carrier names or location details change, you need historical accuracy.
CREATE TABLE carriers_scd (
carrier_id INT,
carrier_name VARCHAR,
carrier_type VARCHAR,
cost_per_kg DECIMAL,
valid_from DATE,
valid_to DATE,
is_current BOOLEAN
);
-- When carrier details change:
UPDATE carriers_scd SET valid_to = CURRENT_DATE - 1, is_current = FALSE WHERE carrier_id = 5 AND is_current; -- close only the current version, not historical rows
INSERT INTO carriers_scd VALUES (5, 'New Name', 'road', 0.50, CURRENT_DATE, '9999-12-31', TRUE);
Join on both carrier_id and valid_from/valid_to to get the correct carrier details for each shipment.
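The effect of that date-ranged join can be illustrated with a small lookup: given a shipment date, pick the carrier version whose validity window contains it. The sample rows and the `carrier_as_of` helper are hypothetical; the window is treated as inclusive on both ends, matching the `valid_to = CURRENT_DATE - 1` convention above.

```python
from datetime import date

# (carrier_id, carrier_name, valid_from, valid_to): illustrative rows only
carriers_scd = [
    (5, "Old Name", date(2020, 1, 1), date(2024, 5, 31)),
    (5, "New Name", date(2024, 6, 1), date(9999, 12, 31)),
]

def carrier_as_of(carrier_id, on_date, versions=carriers_scd):
    """Return the carrier name that was valid on on_date, or None."""
    for cid, name, valid_from, valid_to in versions:
        if cid == carrier_id and valid_from <= on_date <= valid_to:
            return name
    return None

print(carrier_as_of(5, date(2024, 5, 31)))
print(carrier_as_of(5, date(2024, 6, 1)))
```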
Pattern 4: Aggregate Tables (Pre-Aggregation)
For dashboards querying millions of rows, pre-aggregate to a summary table.
CREATE TABLE shipments_daily_summary AS
SELECT
    CAST(shipment_date AS DATE) AS shipment_day, -- shipment_date is a timestamp; group by calendar day
    carrier_id,
    origin_location_id,
    destination_location_id,
    COUNT(*) AS shipment_count,
    ROUND(AVG(days_to_delivery), 2) AS avg_days,
    ROUND(100.0 * SUM(CASE WHEN is_on_time THEN 1 ELSE 0 END) / COUNT(*), 2) AS on_time_pct,
    SUM(revenue_amount) AS revenue_total,
    SUM(CASE WHEN is_exception THEN 1 ELSE 0 END) AS exception_count
FROM shipments
GROUP BY CAST(shipment_date AS DATE), carrier_id, origin_location_id, destination_location_id;
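Query this summary table instead of the raw fact table. A dashboard query that scanned millions of fact rows now reads a far smaller set of summary rows, which can bring a multi-second query down to around 100ms.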
Refresh this table nightly via a scheduled ETL job (or hourly for real-time dashboards).
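One design choice to consider when pre-aggregating: percentages cannot be correctly re-aggregated across days, because averaging daily percentages weights a quiet day the same as a busy one. The sketch below stores counts per day and carrier and derives the percentage at query time. The column names `delivered_count` and `on_time_count` are assumptions, not part of the summary table above.

```python
from collections import defaultdict
from datetime import datetime

def daily_rollup(shipments):
    """shipments: iterable of (shipment_date, carrier_id, is_on_time or None)."""
    summary = defaultdict(
        lambda: {"shipment_count": 0, "delivered_count": 0, "on_time_count": 0}
    )
    for shipment_date, carrier_id, is_on_time in shipments:
        row = summary[(shipment_date.date(), carrier_id)]
        row["shipment_count"] += 1
        if is_on_time is not None:  # None means not delivered yet
            row["delivered_count"] += 1
            row["on_time_count"] += int(is_on_time)
    return dict(summary)

def on_time_pct(rows):
    """Re-aggregate any set of summary rows into one percentage."""
    delivered = sum(r["delivered_count"] for r in rows)
    on_time = sum(r["on_time_count"] for r in rows)
    return round(100.0 * on_time / delivered, 2) if delivered else None

sample = [
    (datetime(2024, 3, 1, 8, 0), 1, True),
    (datetime(2024, 3, 1, 15, 0), 1, False),
    (datetime(2024, 3, 2, 9, 0), 1, True),
    (datetime(2024, 3, 2, 10, 0), 1, True),
    (datetime(2024, 3, 2, 11, 0), 1, True),
    (datetime(2024, 3, 2, 12, 0), 1, None),
]
rollup = daily_rollup(sample)
print(on_time_pct(rollup.values()))
```

In this sample the daily figures are 50% and 100%, whose plain average is 75%, while the count-based result is 80%.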
Pattern 5: Columnar Storage
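If your database supports it, use columnar storage for the fact table: a columnar warehouse, or Parquet/ORC files behind a SQL engine.
Superset does not read these files itself; it queries them through a SQL engine such as Trino or Presto. Queries that scan a few columns of a wide table can run far faster (10–100x is commonly cited) than on row-oriented storage.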
Performance Optimisation for Real-Time Tracking
Caching Strategy
Superset caches query results. Configure cache timeouts based on data freshness requirements:
- Real-time metrics (shipments in transit, active exceptions): 5-minute cache
- Daily metrics (on-time %, revenue): 1-hour cache
- Historical trends (30-day on-time %, monthly revenue): 24-hour cache
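In Superset, set this per chart via the cache timeout field in the chart’s properties; dataset-level and database-level cache timeouts act as fallbacks. The exact menu location varies between versions.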
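A global default can also be set in superset_config.py so that charts without an explicit timeout still get cached. The fragment below is a sketch: `DATA_CACHE_CONFIG` is Superset’s chart-data cache setting and follows Flask-Caching conventions, while the `CACHE_TIMEOUTS` dict, the Redis URL, and the key prefix are assumptions to adapt to your environment.

```python
# superset_config.py (fragment)

# Hypothetical helper: the three freshness tiers from the list above, in seconds.
CACHE_TIMEOUTS = {
    "real_time": 5 * 60,         # shipments in transit, active exceptions
    "daily": 60 * 60,            # on-time %, revenue
    "historical": 24 * 60 * 60,  # 30-day trends, monthly revenue
}

# Default cache for chart data; per-chart timeouts override the default.
DATA_CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_DEFAULT_TIMEOUT": CACHE_TIMEOUTS["daily"],
    "CACHE_KEY_PREFIX": "superset_data_",           # assumed prefix
    "CACHE_REDIS_URL": "redis://localhost:6379/1",  # assumed Redis location
}
```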
Index Strategy
Create indexes on columns used in filters and joins:
CREATE INDEX idx_shipments_date ON shipments(shipment_date);
CREATE INDEX idx_shipments_carrier ON shipments(carrier_id);
CREATE INDEX idx_shipments_location ON shipments(origin_location_id, destination_location_id);
CREATE INDEX idx_shipments_status ON shipments(status_id);
CREATE INDEX idx_shipments_vehicle ON shipments(vehicle_id);
For the events table (which grows quickly):
CREATE INDEX idx_events_shipment_time ON shipment_events(shipment_id, event_timestamp);
CREATE INDEX idx_events_type ON shipment_events(event_type);
Database Connection Pooling
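Don’t open a new database connection for each dashboard query. Use connection pooling.
For PostgreSQL, keep the sqlalchemy_uri free of pool settings:
postgresql://user:password@host:5432/logistics?sslmode=require
pool_size and max_overflow are SQLAlchemy engine arguments, not PostgreSQL connection options, so appending them to the URI as query parameters will typically make the driver reject the connection string. For Superset’s metadata database, set them through SQLALCHEMY_ENGINE_OPTIONS in superset_config.py (for example pool_size 20 and max_overflow 40, which allows up to 20 pooled connections plus 40 overflow connections). For the analytical database, a pooler such as PgBouncer in front of PostgreSQL is a common choice. Adjust the limits based on expected concurrent dashboard users.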
Query Optimisation
For slow queries, use EXPLAIN ANALYZE to understand the execution plan:
EXPLAIN ANALYZE
SELECT
DATE_TRUNC('day', shipment_date) AS day,
carrier_id,
COUNT(*) AS shipment_count,
ROUND(100.0 * SUM(CASE WHEN is_on_time THEN 1 ELSE 0 END) / COUNT(*), 2) AS on_time_pct
FROM shipments
WHERE shipment_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY DATE_TRUNC('day', shipment_date), carrier_id
ORDER BY day DESC;
Look for sequential scans on large tables. If you see them, ensure indexes exist and are being used.
Security and Access Control
Row-Level Security
Logistics operators often need different views of data. A carrier should only see their own shipments. A regional manager should only see their region.
Implement this with a permissions table combined with Superset’s row-level security (RLS) feature:
- Create a user_permissions table mapping users to allowed carriers/regions:
CREATE TABLE user_permissions (
user_id INT,
carrier_id INT,
region_id INT,
created_at TIMESTAMP
);
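- In Superset, create a virtual dataset that filters on the logged-in user with the current_user_id() Jinja macro (this requires the ENABLE_TEMPLATE_PROCESSING feature flag):
SELECT s.* FROM shipments s
WHERE EXISTS ( -- EXISTS avoids duplicate shipments when a user has several permission rows
    SELECT 1 FROM user_permissions up
    WHERE up.user_id = {{ current_user_id() }}
    AND (up.carrier_id = s.carrier_id OR up.carrier_id IS NULL)
)
- Every chart built on this dataset is then filtered per user. Alternatively, attach a SQL filter clause to a role and dataset in Superset’s Row Level Security settings.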
Audit Logging
Log all dashboard access and data exports for compliance:
CREATE TABLE superset_audit_log (
log_id BIGINT PRIMARY KEY,
user_id INT,
action VARCHAR, -- 'view_dashboard', 'export_data', 'edit_chart'
resource_id INT,
resource_type VARCHAR, -- 'dashboard', 'chart', 'dataset'
timestamp TIMESTAMP,
ip_address VARCHAR
);
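Superset records user actions in its own metadata database by default, and a custom EVENT_LOGGER in superset_config.py can route them to a table like this one. Access logs of this kind are typically expected for SOC 2 compliance.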
API Authentication
If you embed Superset dashboards in external applications, use API authentication tokens:
curl -X POST 'https://your-superset.com/api/v1/security/login' \
-H 'Content-Type: application/json' \
-d '{"username": "user", "password": "pass", "provider": "db", "refresh": true}'
Store tokens securely (never in client-side code) and rotate them regularly.
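The same login call can be made from a backend service. The sketch below only builds the request so it can be inspected without a running Superset; it assumes database authentication (`"provider": "db"`) and the placeholder host from the curl example, and `build_login_request` is a hypothetical helper. Check the payload fields against the API documentation for your Superset version.

```python
import json
import urllib.request

def build_login_request(base_url, username, password):
    """Build (but do not send) a POST to Superset's login endpoint."""
    payload = {
        "username": username,
        "password": password,
        "provider": "db",  # assumes database auth; LDAP setups differ
        "refresh": True,   # also request a refresh token
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/security/login",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_login_request("https://your-superset.com", "user", "pass")
print(req.get_method(), req.full_url)
# To send it from server-side code:
#   with urllib.request.urlopen(req) as resp:
#       token = json.load(resp)["access_token"]
```

Keeping this call on the server side means the credentials and the resulting token never reach the browser.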
Deployment and Maintenance
Local Development Setup
For development, use Docker:
git clone https://github.com/apache/superset.git
cd superset
docker-compose -f docker-compose.yml up
Superset runs on http://localhost:8088. Create a test database connection to your staging database.
Production Deployment
For production, use a managed Superset service or deploy on Kubernetes:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: superset
spec:
  replicas: 3
  selector:
    matchLabels:
      app: superset
  template:
    metadata:
      labels:
        app: superset
    spec:
      containers:
        - name: superset
          image: apache/superset:latest
          ports:
            - containerPort: 8088
          env:
            - name: SUPERSET_SECRET_KEY
              valueFrom:
                secretKeyRef:
                  name: superset-secrets
                  key: secret-key
            - name: SQLALCHEMY_DATABASE_URI
              valueFrom:
                secretKeyRef:
                  name: superset-secrets
                  key: db-uri
Use a managed PostgreSQL instance for Superset’s metadata store (not SQLite). Use Redis for caching and async task queues.
Backup and Disaster Recovery
Regularly export dashboard definitions:
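# Export dashboards 1 and 2 (with their charts and datasets) as a ZIP bundle.
# The plain list endpoint /api/v1/dashboard/ returns metadata only, not a restorable export.
curl -G 'https://your-superset.com/api/v1/dashboard/export/' \
--data-urlencode 'q=!(1,2)' \
-H 'Authorization: Bearer YOUR_TOKEN' -o dashboards_backup.zip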
Store this in version control. If Superset is compromised, you can rebuild from the backup.
Monitoring and Alerting
Monitor Superset health and query performance:
-- Slow query log (superset_query_log is an illustrative table; point this at wherever you record query timings)
SELECT
user_id,
query_text,
execution_time_ms,
timestamp
FROM superset_query_log
WHERE execution_time_ms > 5000 -- Queries slower than 5 seconds
ORDER BY timestamp DESC
LIMIT 100;
Set up alerts if:
- Superset is unreachable (HTTP 503)
- Query execution time exceeds 10 seconds (indicates database issues)
- Cache hit rate drops below 70% (indicates insufficient caching)
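The three conditions above can be checked by a small function that a monitoring job calls on each poll. This is a sketch: how you collect the HTTP status, slowest query time, and cache hit rate is left open, and the function name and alert labels are hypothetical.

```python
def health_alerts(http_status, slowest_query_ms, cache_hit_rate):
    """Return the list of alert names triggered by the current readings."""
    alerts = []
    if http_status == 503:
        alerts.append("superset_unreachable")
    if slowest_query_ms > 10_000:   # 10 seconds
        alerts.append("slow_queries")
    if cache_hit_rate < 0.70:       # expressed as a fraction, 0.0 to 1.0
        alerts.append("low_cache_hit_rate")
    return alerts

print(health_alerts(http_status=200, slowest_query_ms=12_500, cache_hit_rate=0.82))
```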
Next Steps: From Dashboard to Operational Intelligence
A dashboard is a starting point, not an endpoint. Once you have visibility, the next step is action.
Automated Alerts
Instead of waiting for operators to check the dashboard, push alerts to them:
-- Alert if on-time % drops below 80% for any carrier
SELECT
carrier_id,
ROUND(100.0 * SUM(CASE WHEN is_on_time THEN 1 ELSE 0 END) / COUNT(*), 2) AS on_time_pct
FROM shipments
WHERE shipment_date >= CURRENT_DATE - INTERVAL '1 day'
GROUP BY carrier_id
HAVING ROUND(100.0 * SUM(CASE WHEN is_on_time THEN 1 ELSE 0 END) / COUNT(*), 2) < 80;
When this query returns rows, trigger a Slack message or email. Operators respond immediately instead of discovering problems hours later.
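Turning the query result into a message might look like the sketch below. It assumes the rows arrive as (carrier_id, on_time_pct) tuples and builds a JSON body with a single `text` field, the shape Slack incoming webhooks accept; the function name `build_alert` and the message wording are hypothetical, and posting to the webhook URL is left to your scheduler.

```python
import json

def build_alert(rows, target=80.0):
    """rows: (carrier_id, on_time_pct) tuples. Returns a JSON string or None."""
    breaches = [(c, p) for c, p in rows if p < target]
    if not breaches:
        return None  # nothing to send
    # Worst performers first so they are read first.
    lines = [
        f"Carrier {c}: {p:.2f}% on time"
        for c, p in sorted(breaches, key=lambda r: r[1])
    ]
    text = f"On-time delivery below {target:.0f}%\n" + "\n".join(lines)
    return json.dumps({"text": text})

message = build_alert([(1, 75.0), (2, 61.25), (3, 93.1)])
print(message)
```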
Predictive Analytics
Use historical data to predict future delays:
-- Statistical baseline (not a trained model): expected delivery time by origin, destination, and carrier
SELECT
origin_location_id,
destination_location_id,
carrier_id,
ROUND(AVG(days_to_delivery), 1) AS predicted_days,
ROUND(STDDEV(days_to_delivery), 1) AS std_dev,
COUNT(*) AS sample_size
FROM shipments
WHERE shipment_date >= CURRENT_DATE - INTERVAL '90 days'
AND actual_delivery_date IS NOT NULL
GROUP BY origin_location_id, destination_location_id, carrier_id;
While a shipment is in transit, compare its elapsed time to the predicted range. If it’s trending late, flag it for intervention.
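A minimal sketch of that check, using the `predicted_days` and `std_dev` values from the query above. The function name and the choice of two standard deviations as the cut-off (`k=2.0`) are assumptions; tune the cut-off to how many false alarms your team can tolerate.

```python
from datetime import datetime

def is_trending_late(shipment_date, now, predicted_days, std_dev, k=2.0):
    """True when time in transit exceeds predicted_days + k * std_dev."""
    elapsed_days = (now - shipment_date).total_seconds() / 86400
    return elapsed_days > predicted_days + k * std_dev

created = datetime(2024, 3, 1, 0, 0)
# Route baseline: 3.0 days on average, 0.5 days standard deviation -> cut-off at 4.0 days
print(is_trending_late(created, datetime(2024, 3, 6, 0, 0), 3.0, 0.5))
```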
Optimisation Opportunities
Use dashboards to identify quick wins:
- Carrier consolidation: If Carrier A and Carrier B serve the same routes but A is cheaper and faster, switch volume to A.
- Route optimisation: If Sydney-to-Melbourne is congested, test a detour via Canberra.
- Capacity planning: If utilisation is trending up, invest in additional vehicles before you hit 100%.
- Warehouse placement: If a regional warehouse has high dwell time, consider splitting it into two smaller hubs.
Each of these decisions should be backed by data from your dashboards.
Fractional CTO Support
Building and maintaining production dashboards requires technical expertise. If you’re a logistics operator without an in-house data team, consider engaging fractional CTO support.
For operations in Australia, PADISO offers platform development and fractional CTO advisory across key logistics hubs. In Brisbane, where logistics and resources services are concentrated, platform engineering teams specialise in fleet and telematics data platforms. In Chicago, where trading and logistics are critical, platform engineering focuses on low-latency data platforms and embedded Superset analytics. Dallas teams handle enterprise data consolidation and Superset replacing per-seat BI. For remote operations in Darwin with intermittent connectivity, edge and sovereign AU hosting are available.
A fractional CTO can architect your data model, optimise queries, and ensure your dashboards scale as your business grows. They can also guide SOC 2 compliance (critical if you handle customer data) and integrate agentic AI for exception handling and route optimisation.
Real-Time Tracking and AI Orchestration
Once you have the dashboard foundation, the next step is automating responses to exceptions. Instead of operators manually investigating delays, use AI agents to orchestrate responses:
- Route optimisation agent: When a shipment is trending late, automatically suggest alternative routes or carriers.
- Customer notification agent: When a delivery is delayed, automatically notify the customer with a revised ETA.
- Exception triage agent: When a shipment has an exception, automatically escalate to the right team (carrier, warehouse, customer service).
These agents run against the same operational databases that feed your Superset dashboards. They see the same data as your operators but can respond in seconds rather than minutes.
Summary
Apache Superset is a powerful, open-source tool for logistics tracking. This guide provides:
- A production-ready data model (shipments, locations, carriers, statuses, events)
- Essential KPIs (on-time %, delivery time, utilisation, exceptions)
- Dashboard architecture (operations overview, carrier performance, route analysis, location health)
- Drilldown patterns (summary to detail, exception triage, performance comparison)
- Schema patterns that scale (partitioning, denormalisation, SCD, pre-aggregation, columnar storage)
- Performance optimisation (caching, indexing, connection pooling, query tuning)
- Security and compliance (row-level security, audit logging, API authentication)
- Deployment guidance (local development, production Kubernetes, backup and recovery)
Start with the operations overview dashboard. Get your team comfortable with the data. Then add carrier performance, route analysis, and location health dashboards. Once you have visibility, layer in alerts, predictions, and AI-driven automation.
The reference dashboard set in this guide handles 100 to 100,000 shipments per day. If you scale beyond that, apply the schema patterns (partitioning, pre-aggregation, columnar storage) to maintain sub-second query performance.
For technical support building and scaling these dashboards, PADISO’s platform engineering teams across Australia and the United States specialise in logistics data platforms and embedded analytics. In Atlanta, where payments and logistics are critical, teams focus on real-time fraud and risk pipelines alongside logistics tracking. Calgary teams handle operational and historian data platforms for energy and logistics. For international operations, Hamilton and Tauranga teams support agritech, logistics, and supply-chain data platforms across New Zealand.
Your logistics dashboards are only as good as the data behind them. Invest in the schema, the ETL, and the infrastructure. The returns—faster decisions, happier customers, lower costs—compound quickly.
Start building today. Your operations team will thank you.