Apache Superset for Australian Hospitals: A D23.io Reference Deployment

Deploy Apache Superset across Australian hospital groups. Reference architecture for clinical KPIs, ED wait times, bed management, and theatre utilisation.

The PADISO Team · 2026-04-17

Table of Contents

  1. Why Apache Superset for Australian Hospitals
  2. Reference Architecture Overview
  3. Data Ingestion: HL7 v2 and FHIR Patterns
  4. Clinical KPI Dashboards
  5. ED Wait Times and Bed Management
  6. Theatre Utilisation and Surgical Scheduling
  7. Security, Compliance, and Data Governance
  8. Implementation Timeline and Cost
  9. Deployment and Scaling Across Hospital Groups
  10. Next Steps: Getting Started

Why Apache Superset for Australian Hospitals

Australian hospital groups face a critical challenge: scattered clinical data across legacy systems, insufficient visibility into operational metrics, and limited tools for real-time decision-making. Emergency departments overflow, theatre schedules slip, bed occupancy becomes opaque, and clinical teams lack actionable dashboards. The result is delayed patient care, wasted resources, and compliance risk.

Apache Superset addresses this directly. It’s an open-source business intelligence and data visualisation platform that connects to most SQL-speaking databases and data warehouses, whether that’s a reporting copy of your existing health information system, a data lake query engine, or a modern cloud platform. Unlike proprietary BI vendors charging per-user licensing fees, Superset runs on your infrastructure, scales horizontally, and carries no licence fee, so the cost is hosting and staffing rather than seats.

For Australian hospital groups—especially those deploying D23.io’s managed Superset offering—the value is concrete: clinical teams see ED wait times in real time, bed managers optimise occupancy across wards, theatre schedulers predict utilisation bottlenecks, and executives track quality and safety KPIs without manual reporting.

The official Apache Superset documentation provides the technical foundation, whilst SolDevelo’s “Apache Superset Implementation for Healthcare” demonstrates proven patterns for healthcare deployments. Community discussions in the Healthcare Users Discussion on the Apache Superset GitHub show Australian hospital teams already running production Superset instances across multiple sites.

This guide walks you through a reference architecture built for Australian hospitals: how to ingest HL7 v2 and FHIR data, build clinical dashboards, monitor ED operations, track theatre utilisation, and maintain compliance with data governance frameworks.


Reference Architecture Overview

A production Superset deployment for a hospital group spans three layers: data ingestion, semantic layer, and presentation.

Layer 1: Data Sources and Ingestion

Most Australian hospitals run multiple systems: an electronic health record (EHR), pathology lab systems, imaging systems, and bed management platforms. These systems rarely talk to each other natively. The first step is centralisation.

Your data ingestion layer sits between source systems and Superset. It collects data via:

  • HL7 v2 feeds from legacy EHR systems (the standard for Australian hospital messaging)
  • FHIR APIs from newer cloud-based health systems
  • Direct database connections to pathology and imaging platforms
  • CSV or parquet exports from bed management and theatre scheduling systems

This data lands in a central data warehouse—typically a PostgreSQL database, Snowflake, or BigQuery instance. The warehouse normalises data into a consistent schema: patient demographics, admissions, encounters, clinical observations, lab results, imaging orders, and theatre bookings.

D23.io’s managed Superset service handles this plumbing. You provide source system credentials and data requirements; D23.io builds the ingestion pipeline, manages schema evolution, and ensures data freshness (typically hourly or real-time for critical metrics).

Layer 2: Semantic Layer and Data Models

Raw warehouse data isn’t actionable. A semantic layer translates it into business logic that clinical and operational teams understand.

In Superset, this means defining physical datasets (warehouse tables or views) and virtual datasets (saved SQL queries). A dataset for “ED Presentations” might combine patient demographics, triage data, and disposition records. A dataset for “Theatre Utilisation” might calculate room occupancy, turnover time, and scheduled versus actual procedure duration.

The semantic layer also includes:

  • Calculated fields: Length of stay, time-to-treatment, theatre utilisation percentage
  • Filters and permissions: Ensuring clinicians see only their ward or department data
  • Aggregations: Pre-computed metrics for performance (e.g., daily ED arrivals by triage category)

Layer 3: Dashboards and User Interfaces

Superset’s front-end renders dashboards—interactive, filterable visualisations that clinical and operational teams use daily.

For a hospital group, typical dashboards include:

  • ED Operations Dashboard: Real-time wait times, triage queue, disposition by hour
  • Bed Management Dashboard: Occupancy by ward, discharge forecast, bed turnover
  • Theatre Utilisation Dashboard: Room occupancy, turnover metrics, scheduled versus actual start times
  • Clinical Quality Dashboard: Infection rates, readmission rates, length of stay by diagnosis
  • Executive Dashboard: Aggregate KPIs across the group, trend analysis, benchmarking

Users interact with dashboards via filters (e.g., “show me ED data for the last 7 days”), drill-downs (e.g., “click on a triage category to see individual presentations”), and exports (e.g., “download this week’s theatre utilisation data”).

For non-technical users, “Agentic AI + Apache Superset: Letting Claude Query Your Dashboards” shows how to layer AI agents on top of Superset, allowing clinicians to ask natural-language questions like “What was our median ED wait time last week?” without touching the dashboard directly.


Data Ingestion: HL7 v2 and FHIR Patterns

HL7 v2 and FHIR are the lingua franca of healthcare data exchange. Most Australian hospitals emit HL7 v2 messages; newer systems also support FHIR APIs. A robust ingestion pipeline handles both.

HL7 v2 Ingestion

HL7 v2 is a text-based standard for clinical messaging. A typical flow:

  1. EHR System emits HL7 v2 messages (e.g., ADT—admission, discharge, transfer messages) to a message broker or SFTP endpoint.
  2. Ingestion Service (running on your infrastructure or D23.io-managed) listens for messages, parses them, and validates structure.
  3. Transformation converts HL7 segments into relational tables. For example, an ADT message becomes rows in admissions, patients, and encounters tables.
  4. Data Warehouse receives transformed data, applies business logic (e.g., calculate length of stay), and indexes for query performance.
  5. Superset connects to the warehouse and renders dashboards.

Key HL7 v2 segments for hospital operations:

  • PID (Patient Identification): Demographics, MRN, date of birth
  • PV1 (Patient Visit): Admission date, ward, bed, discharge date
  • OBX (Observation/Result): Lab results, vital signs, clinical observations
  • ORC (Order Common): Lab orders, imaging orders, medication orders
  • RXA (Pharmacy Administration): Medication administration records

D23.io’s ingestion service includes HL7 v2 parsers for all common Australian hospital message types. If your EHR emits non-standard variants, D23.io customises the parser (a 1–2 week task).
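As a rough illustration of the parse-and-transform step, the sketch below splits an ADT message into segments and pulls a few PID and PV1 fields into a flat row. The sample message, the output column names, and the helper names are all made up for this example; a production parser also needs to handle escape sequences, repeating fields, and MLLP framing, which this does not.

```python
# Minimal HL7 v2 sketch: split a message into segments, then map PID/PV1 fields to a row.
# Sample data and output column names are illustrative assumptions.

pv1_fields = ["PV1", "1", "I", "WARD4^12^B"] + [""] * 41  # indexes 0..44
pv1_fields[19] = "V0001"            # PV1-19: visit number
pv1_fields[44] = "202604170815"     # PV1-44: admit date/time

SAMPLE_ADT = "\r".join([
    "MSH|^~\\&|EHR|HOSP_A|WAREHOUSE|GROUP|202604170830||ADT^A01|MSG0001|P|2.4",
    "PID|1||MRN12345^^^HOSP_A^MR||CITIZEN^JANE||19800214|F",
    "|".join(pv1_fields),
])

def parse_segments(message: str) -> dict[str, list[str]]:
    """Return {segment_id: fields}, keeping the first occurrence of each segment."""
    segments: dict[str, list[str]] = {}
    for line in message.split("\r"):   # HL7 v2 segments end with a carriage return
        if line:
            fields = line.split("|")
            segments.setdefault(fields[0], fields)
    return segments

def adt_to_row(message: str) -> dict[str, str]:
    seg = parse_segments(message)
    pid, pv1 = seg["PID"], seg["PV1"]
    ward, _room, bed = (pv1[3].split("^") + ["", "", ""])[:3]
    return {
        "mrn": pid[3].split("^")[0],     # PID-3: patient identifier
        "date_of_birth": pid[7],         # PID-7
        "ward": ward,                    # PV1-3, first component (point of care)
        "bed": bed,                      # PV1-3, third component
        "visit_number": pv1[19],         # PV1-19
        "admit_datetime": pv1[44],       # PV1-44
    }

row = adt_to_row(SAMPLE_ADT)
```

Rows like this would feed the admissions, patients, and encounters tables described above.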

FHIR API Ingestion

FHIR (Fast Healthcare Interoperability Resources) is a modern REST API standard. Newer Australian hospital systems—especially those running cloud platforms—expose FHIR endpoints.

A FHIR ingestion pattern:

  1. Ingestion Service calls FHIR API endpoints (e.g., /Patient, /Encounter, /Observation). Superset itself only queries SQL databases, so FHIR data has to land in the warehouse first.
  2. Pagination and Incremental Sync ensures you fetch new and updated records without re-downloading the entire dataset.
  3. FHIR-to-Relational Mapping converts FHIR JSON resources into warehouse tables. For example, a FHIR Encounter resource becomes rows in an encounters table with columns for patient ID, admission date, discharge date, and ward.
  4. Incremental Loads run hourly or on-demand, ensuring dashboards reflect current data.
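The FHIR-to-relational mapping in step 3 might look something like the sketch below, which flattens one Encounter resource into a single row. The sample resource, the target column names, and the choice to treat the last location entry as the current ward are assumptions for illustration only.

```python
# Sketch: flatten a FHIR R4 Encounter resource into one warehouse row.
# The sample resource and target column names are illustrative assumptions.

SAMPLE_ENCOUNTER = {
    "resourceType": "Encounter",
    "id": "enc-001",
    "status": "finished",
    "class": {"code": "IMP"},
    "subject": {"reference": "Patient/pat-123"},
    "period": {"start": "2026-04-10T08:15:00+10:00", "end": "2026-04-14T11:00:00+10:00"},
    "location": [{"location": {"display": "Ward 4"}}],
}

def encounter_to_row(resource: dict) -> dict:
    if resource.get("resourceType") != "Encounter":
        raise ValueError("expected an Encounter resource")
    period = resource.get("period", {})
    locations = resource.get("location", [])
    # Assumption: the last location entry is the patient's current ward.
    ward = locations[-1]["location"].get("display") if locations else None
    return {
        "encounter_id": resource["id"],
        "patient_id": resource["subject"]["reference"].split("/")[-1],
        "admission_date": period.get("start"),
        "discharge_date": period.get("end"),  # None while the patient is still admitted
        "ward": ward,
    }

row = encounter_to_row(SAMPLE_ENCOUNTER)
```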

FHIR advantages over HL7 v2:

  • More uniform: Resources follow published, versioned schemas, so there is less custom parsing and mappings are more portable across systems.
  • API-first: Easier to integrate with modern cloud platforms.
  • Richer data models: FHIR resources include structured codes, references, and extensions.

For Australian hospitals following the Australian Digital Health Agency’s “Hospitals” guidance, FHIR is increasingly the preferred standard for new integrations.

Hybrid Ingestion Strategy

Most hospital groups run a mix: legacy systems emit HL7 v2; newer systems expose FHIR APIs. A production ingestion pipeline handles both:

  • HL7 v2 listener on TCP port 2575 (the IANA-registered HL7 port), typically using MLLP framing, receives messages from the EHR.
  • FHIR API client polls cloud-based systems hourly.
  • Unified transformation layer normalises both into a common warehouse schema.
  • Idempotency and deduplication ensure the same patient or encounter isn’t loaded twice.
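One common way to get idempotent loads is an upsert keyed on the source system and encounter identifier, so replayed or late messages cannot create duplicates. The sketch below uses SQLite purely so it runs anywhere; the table, columns, and sample events are hypothetical, and a real warehouse would use its own MERGE or upsert syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE encounters (
        source_system TEXT NOT NULL,
        encounter_id  TEXT NOT NULL,
        ward          TEXT,
        updated_at    TEXT,
        PRIMARY KEY (source_system, encounter_id)
    )
""")

def upsert_encounter(row: dict) -> None:
    """Insert a new encounter, or update it only if the incoming message is not older."""
    conn.execute("""
        INSERT INTO encounters (source_system, encounter_id, ward, updated_at)
        VALUES (:source_system, :encounter_id, :ward, :updated_at)
        ON CONFLICT (source_system, encounter_id) DO UPDATE SET
            ward = excluded.ward,
            updated_at = excluded.updated_at
        WHERE excluded.updated_at >= encounters.updated_at
    """, row)

events = [
    {"source_system": "hl7", "encounter_id": "V0001", "ward": "Ward 4", "updated_at": "2026-04-17T08:15"},
    {"source_system": "hl7", "encounter_id": "V0001", "ward": "Ward 4", "updated_at": "2026-04-17T08:15"},  # replay
    {"source_system": "hl7", "encounter_id": "V0001", "ward": "Ward 6", "updated_at": "2026-04-17T10:00"},  # transfer
    {"source_system": "hl7", "encounter_id": "V0001", "ward": "Ward 4", "updated_at": "2026-04-17T08:15"},  # late duplicate
]
for event in events:
    upsert_encounter(event)

rows = conn.execute("SELECT encounter_id, ward FROM encounters").fetchall()
```

After all four events, the table holds a single row for the encounter, reflecting the most recent transfer.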

D23.io’s managed service includes this hybrid approach out of the box. You specify your data sources; D23.io builds and operates the pipeline.


Clinical KPI Dashboards

Clinical KPIs are the lifeblood of hospital operations. A Superset dashboard makes them visible and actionable.

Core Clinical Metrics

Length of Stay (LOS) is a fundamental measure. It reflects efficiency, resource utilisation, and clinical outcomes. A typical LOS dashboard shows:

  • Average LOS by ward: Identify wards with longer stays (e.g., ICU, surgical wards).
  • Average LOS by diagnosis: Benchmark against clinical guidelines (e.g., uncomplicated pneumonia should average 4–5 days).
  • LOS trend: Week-on-week change; identify if recent admissions are staying longer.
  • LOS by discharge disposition: Patients discharged home, transferred, or deceased have different LOS profiles.

Calculation in Superset:

LOS = DATEDIFF(day, admission_date, discharge_date)
Average LOS = AVG(LOS) grouped by ward, diagnosis, or time period
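To make the calculation concrete, here is a small Python sketch of the same logic on made-up admissions. It skips patients who have not yet been discharged, which is an assumption about how open stays should be treated.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Made-up sample admissions.
admissions = [
    {"ward": "Surgical", "admission_date": date(2026, 4, 1), "discharge_date": date(2026, 4, 5)},
    {"ward": "Surgical", "admission_date": date(2026, 4, 2), "discharge_date": date(2026, 4, 4)},
    {"ward": "ICU", "admission_date": date(2026, 4, 1), "discharge_date": date(2026, 4, 10)},
    {"ward": "ICU", "admission_date": date(2026, 4, 8), "discharge_date": None},  # still admitted
]

def average_los_by_ward(rows):
    """Average length of stay in days per ward, ignoring stays without a discharge date."""
    los_by_ward = defaultdict(list)
    for r in rows:
        if r["discharge_date"] is None:
            continue
        los_by_ward[r["ward"]].append((r["discharge_date"] - r["admission_date"]).days)
    return {ward: mean(values) for ward, values in los_by_ward.items()}

avg_los = average_los_by_ward(admissions)
```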

Readmission Rate is a quality and safety metric. A readmission within 28 days of discharge signals potential clinical or discharge-planning issues.

Readmissions = COUNT(DISTINCT patient_id) WHERE readmission_date <= discharge_date + 28 days
Readmission Rate = Readmissions / Total Discharges
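A hedged Python sketch of the 28-day logic follows. Note one deliberate difference from the formula above: it counts index discharges that were followed by a readmission, rather than distinct patients, so a patient who bounces back twice counts twice. Which definition applies is a local reporting decision. The sample stays are invented.

```python
from datetime import date, timedelta

# (patient_id, admission_date, discharge_date); sample data is made up.
stays = [
    ("P1", date(2026, 1, 1), date(2026, 1, 5)),
    ("P1", date(2026, 1, 20), date(2026, 1, 22)),  # back 15 days after discharge
    ("P2", date(2026, 1, 3), date(2026, 1, 6)),
    ("P2", date(2026, 3, 1), date(2026, 3, 2)),    # back after 54 days: outside the window
    ("P3", date(2026, 1, 10), date(2026, 1, 12)),
]

def readmission_rate(stays, window_days=28):
    """Share of discharges followed by the same patient's next admission within the window."""
    by_patient = {}
    for patient_id, admitted, discharged in sorted(stays, key=lambda s: (s[0], s[1])):
        by_patient.setdefault(patient_id, []).append((admitted, discharged))
    discharges = readmitted = 0
    for episodes in by_patient.values():
        for i, (_, discharged) in enumerate(episodes):
            discharges += 1
            if i + 1 < len(episodes):
                next_admission = episodes[i + 1][0]
                if next_admission - discharged <= timedelta(days=window_days):
                    readmitted += 1
    return readmitted / discharges

rate = readmission_rate(stays)
```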

A readmission dashboard shows:

  • Readmission rate by ward or service: Identify problem areas.
  • Readmission rate by diagnosis: Some diagnoses naturally have higher readmission rates; others warrant investigation.
  • Readmission rate trend: Monthly trend to detect deterioration.
  • Readmission reasons: If captured in your EHR, drill down to see why patients returned.

Mortality Rate is a critical safety metric. The hospital standardised mortality ratio (HSMR) compares observed deaths to expected deaths adjusted for case mix. It is conventionally reported as the ratio multiplied by 100, so a value of 100 means deaths match expectation.

Observed Deaths = COUNT(*) WHERE discharge_disposition = 'Deceased'
Expected Deaths = SUM(risk_of_death_score) for all admissions
HSMR = (Observed Deaths / Expected Deaths) * 100
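A small worked example may help. The risk scores below are invented, and in practice they come from a case-mix adjustment model that is outside the scope of this guide. The sketch reports both the raw ratio and the ratio scaled by 100, which is how HSMR is usually published.

```python
# (died, risk_of_death_score) per admission; values are made up for illustration.
admissions = [(0, 0.02), (1, 0.40), (0, 0.10), (0, 0.05), (1, 0.23), (0, 0.20)]

observed_deaths = sum(died for died, _ in admissions)
expected_deaths = sum(risk for _, risk in admissions)

ratio = observed_deaths / expected_deaths   # above 1 means more deaths than expected
hsmr = 100 * ratio                          # the same figure on the conventional 100 scale
```

Here two deaths were observed against roughly one expected, so the ratio is about 2 (an HSMR of about 200), which on real data would warrant investigation.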

HSMR dashboards show:

  • HSMR by ward or service: Identify outliers.
  • HSMR by diagnosis or procedure: Benchmark against national data.
  • HSMR trend: Monthly or quarterly tracking.

Infection Rates (e.g., hospital-acquired infection, HAI) are tracked by many Australian hospitals. A typical metric:

HAI Cases = COUNT(DISTINCT patient_id) WHERE infection_onset_date > admission_date + 2 days
HAI Rate = HAI Cases / Total Admissions * 1000 (per 1000 admissions)
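The same rule can be sketched in Python. The two-day threshold mirrors the formula above, and the sample admissions are made up.

```python
from datetime import date, timedelta

admissions = [
    {"patient_id": "P1", "admission_date": date(2026, 4, 1), "infection_onset_date": date(2026, 4, 5)},
    {"patient_id": "P2", "admission_date": date(2026, 4, 1), "infection_onset_date": date(2026, 4, 2)},  # too early to count
    {"patient_id": "P3", "admission_date": date(2026, 4, 2), "infection_onset_date": None},
    {"patient_id": "P4", "admission_date": date(2026, 4, 3), "infection_onset_date": None},
]

def hai_rate_per_1000(rows, threshold_days=2):
    """Hospital-acquired infection cases per 1000 admissions."""
    cases = {
        r["patient_id"]
        for r in rows
        if r["infection_onset_date"] is not None
        and r["infection_onset_date"] > r["admission_date"] + timedelta(days=threshold_days)
    }
    return 1000 * len(cases) / len(rows)

hai_rate = hai_rate_per_1000(admissions)
```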

Building the KPI Dashboard in Superset

In Superset, you’d create a dataset combining admissions, discharges, diagnoses, and outcomes:

SELECT
  patient_id,
  admission_date,
  discharge_date,
  ward,
  primary_diagnosis,
  discharge_disposition,
  DATEDIFF(day, admission_date, discharge_date) AS los,
  CASE WHEN discharge_disposition = 'Deceased' THEN 1 ELSE 0 END AS died,
  CASE WHEN readmission_date IS NOT NULL AND readmission_date <= discharge_date + 28 THEN 1 ELSE 0 END AS readmitted
FROM admissions
JOIN patients USING (patient_id)
LEFT JOIN readmissions USING (patient_id, discharge_date)

Then build charts:

  • Table: Average LOS by ward (sortable, filterable).
  • Time Series: LOS trend over 12 months, with a trend line.
  • Pie Chart: Discharge dispositions (home, transfer, deceased).
  • Heatmap: Readmission rate by ward and age group.

For detailed guidance on building these dashboards, “The $50K D23.io Consulting Engagement: What’s Inside” walks through a real hospital group’s dashboard build, including semantic layer design and performance optimisation.


ED Wait Times and Bed Management

Emergency departments are the busiest, most visible part of a hospital. Wait times are a public metric and a source of patient frustration. Bed management—knowing which beds are available, where patients are located, and when they’ll discharge—is operational gold.

ED Wait Time Metrics

Wait time has several definitions, depending on the point in the patient journey:

  • Triage Wait: Time from arrival to triage assessment.
  • Treatment Wait: Time from triage to first clinical contact (physician or nurse).
  • Disposition Wait: Time from decision to admit/discharge to actual bed assignment or discharge.
  • Total ED Length of Stay: Time from arrival to discharge or admission.

Australian hospitals typically track these using the framework described in “Hospitals and Health Services” (Australian Government Department of Health). A Superset dashboard for ED operations might show:

Triage Wait (minutes) = DATEDIFF(minute, arrival_time, triage_time)
Treatment Wait (minutes) = DATEDIFF(minute, triage_time, first_clinical_contact_time)
Total ED LOS (minutes) = DATEDIFF(minute, arrival_time, departure_time)
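The sketch below computes triage waits from timestamps and then the median and 90th percentile that a dashboard would typically display. The presentations are invented, and the "inclusive" percentile method is one reasonable choice among several; your reporting definitions may specify another.

```python
from datetime import datetime, timedelta
from statistics import median, quantiles

def minutes_between(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 60

arrival = datetime(2026, 4, 17, 8, 0)
# Made-up presentations: each arrives at 08:00 and is triaged after the given wait.
presentations = [
    {"arrival_time": arrival, "triage_time": arrival + timedelta(minutes=w)}
    for w in (5, 10, 15, 20, 60)
]

triage_waits = [minutes_between(p["arrival_time"], p["triage_time"]) for p in presentations]
median_wait = median(triage_waits)
# quantiles(..., n=10) returns the nine decile cut points; the last one is the 90th percentile.
p90_wait = quantiles(triage_waits, n=10, method="inclusive")[-1]
```

The single 60-minute wait barely moves the median but pulls the 90th percentile well above it, which is why dashboards usually show both.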

Real-Time ED Dashboard

A real-time ED dashboard in Superset displays:

  1. Current ED Census: Number of patients in ED right now, broken down by triage category (resuscitation, emergency, urgent, semi-urgent, non-urgent).
  2. Current Wait Times: Median and 90th percentile wait times for each triage category.
  3. Triage Queue: Number of patients waiting for triage assessment.
  4. Bed Availability: Number of treatment spaces available, by type (resus, acute, waiting area).
  5. Disposition Queue: Patients ready for admission but waiting for a bed; patients ready for discharge but waiting for transport.
  6. Ambulance Arrivals (Last Hour): Number of ambulance arrivals in the last hour; helps predict future demand.
  7. Trends: 24-hour view of ED census, showing peak times and trends.

Superset refreshes this data every 5–15 minutes, depending on your EHR’s data export frequency. For truly real-time visibility, “Data Analytics in Healthcare” (Australian Digital Health Agency) recommends direct database connections or event-streaming architectures (e.g., Kafka).

Bed Management and Occupancy

Bed management is about knowing where every patient is, predicting discharges, and optimising occupancy.

Key metrics:

  • Occupancy Rate: (Occupied Beds / Total Beds) * 100. Targets vary by ward: ICU typically runs 80–90%, general wards 85–95%.
  • Bed Turnover Time: Time between discharge and next admission. Shorter is better (faster cleaning, faster next patient).
  • Average Occupancy by Ward: Identify underutilised or overcrowded wards.
  • Discharge Forecast: Predict which patients will discharge in the next 24–48 hours, freeing up beds.
  • Admission Forecast: Predict incoming admissions (e.g., from ED, scheduled surgery), helping bed managers allocate capacity.

A bed management dashboard in Superset:

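SELECT
  b.ward,
  COUNT(DISTINCT b.bed_id) AS total_beds,
  -- A bed counts as occupied when it has an open admission (no discharge_time yet)
  COUNT(DISTINCT a.bed_id) AS occupied_beds,
  ROUND(100.0 * COUNT(DISTINCT a.bed_id) / COUNT(DISTINCT b.bed_id), 1) AS occupancy_pct
FROM beds b
LEFT JOIN admissions a
  ON a.bed_id = b.bed_id
 AND a.discharge_time IS NULL  -- current patients only, not historic stays
GROUP BY b.ward
ORDER BY occupancy_pct DESC
-- Turnover (discharge_time to next_admission_time) is measured per bed and per discharge,
-- so compute it in its own query and average it by ward rather than selecting it here.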

This query shows occupancy by ward, sorted by occupancy percentage. Bed managers can see at a glance which wards are full and which have capacity.
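To check occupancy logic before pointing a dashboard at it, it can help to run the query against a tiny throwaway dataset. The sketch below does that with SQLite. The table layout (a ward column on beds, and a discharge_time that stays NULL while a patient is admitted) is an assumption about your warehouse schema, and only current admissions are joined so historic stays are not counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE beds (bed_id TEXT PRIMARY KEY, ward TEXT);
    CREATE TABLE admissions (patient_id TEXT, bed_id TEXT, discharge_time TEXT);
    INSERT INTO beds VALUES ('I1','ICU'), ('I2','ICU'),
                            ('A1','Ward 4'), ('A2','Ward 4'), ('A3','Ward 4'), ('A4','Ward 4');
    INSERT INTO admissions VALUES
        ('P9','I1',NULL), ('P1','A1',NULL), ('P2','A2',NULL), ('P3','A3',NULL),
        ('P0','A4','2026-04-16T10:00'),   -- already discharged
        ('P7','A1','2026-04-10T09:00');   -- historic stay in a bed that is now reoccupied
""")

OCCUPANCY_SQL = """
    SELECT b.ward,
           COUNT(DISTINCT b.bed_id) AS total_beds,
           COUNT(DISTINCT a.bed_id) AS occupied_beds,
           ROUND(100.0 * COUNT(DISTINCT a.bed_id) / COUNT(DISTINCT b.bed_id), 1) AS occupancy_pct
    FROM beds b
    LEFT JOIN admissions a
      ON a.bed_id = b.bed_id AND a.discharge_time IS NULL
    GROUP BY b.ward
    ORDER BY occupancy_pct DESC
"""

occupancy = conn.execute(OCCUPANCY_SQL).fetchall()
```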

For discharge forecasting, add a calculated field:

Days to Discharge = DATEDIFF(day, CURRENT_DATE, expected_discharge_date)

Filter for patients with “Days to Discharge” < 2 to see who’s leaving soon, freeing up beds.


Theatre Utilisation and Surgical Scheduling

Theatres (operating rooms) are high-cost, high-value assets. Optimising their utilisation—ensuring they’re scheduled, staffed, and supplied efficiently—directly impacts hospital revenue and patient access to surgery.

Theatre Utilisation Metrics

Utilisation Percentage is the primary metric:

Scheduled Minutes = SUM(scheduled_duration) for all procedures in a theatre, per day
Actual Minutes = SUM(actual_duration) for all procedures in a theatre, per day
Utilisation % = (Actual Minutes / Scheduled Minutes) * 100

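A typical target is 75–85% utilisation. Below 75% suggests under-scheduling or inefficiency; above 85% risks staff burnout and safety issues. Because the denominator here is scheduled time rather than available session time, the figure exceeds 100% whenever lists overrun, so read it alongside the overrun and on-time start measures.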

Turnover Time is the interval between end of one procedure and start of the next:

Turnover Time (minutes) = DATEDIFF(minute, end_time_procedure_1, start_time_procedure_2)

Australian theatre best practice targets 20–30 minutes for standard turnover. Longer times suggest cleaning, restocking, or scheduling delays.
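As a worked example of both metrics, the sketch below computes utilisation and turnover for one theatre on one day. The procedure times are invented.

```python
from datetime import datetime

def t(hour: int, minute: int) -> datetime:
    return datetime(2026, 4, 17, hour, minute)

def minutes(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 60

# One theatre, one day; times are made up.
procedures = [
    {"scheduled_start": t(8, 0), "scheduled_end": t(9, 30), "actual_start": t(8, 10), "actual_end": t(9, 50)},
    {"scheduled_start": t(10, 0), "scheduled_end": t(11, 0), "actual_start": t(10, 20), "actual_end": t(11, 10)},
    {"scheduled_start": t(11, 30), "scheduled_end": t(13, 0), "actual_start": t(11, 35), "actual_end": t(12, 55)},
]
procedures.sort(key=lambda p: p["actual_start"])

scheduled_minutes = sum(minutes(p["scheduled_start"], p["scheduled_end"]) for p in procedures)
actual_minutes = sum(minutes(p["actual_start"], p["actual_end"]) for p in procedures)
utilisation_pct = round(100 * actual_minutes / scheduled_minutes, 1)

# Turnover: gap between the end of one procedure and the start of the next.
turnovers = [
    minutes(prev["actual_end"], nxt["actual_start"])
    for prev, nxt in zip(procedures, procedures[1:])
]
avg_turnover = sum(turnovers) / len(turnovers)
```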

On-Time Start Rate tracks how often procedures start on schedule:

On-Time Starts = COUNT(*) WHERE actual_start_time <= scheduled_start_time + 15 minutes
On-Time Start Rate = On-Time Starts / Total Procedures * 100

Targets are typically 80–90%. Delays cascade, affecting subsequent procedures and staff schedules.

Theatre Availability is the percentage of time a theatre is available for scheduling (not blocked for maintenance, deep clean, or emergency cases):

Available Minutes = Total Minutes - Maintenance Minutes - Emergency Block Minutes
Availability % = (Available Minutes / Total Minutes) * 100
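The on-time start and availability formulas above can be sketched in a few lines of Python. The 15-minute grace period comes from the on-time start definition in this section; the sample start times and block minutes are made up.

```python
from datetime import datetime, timedelta

GRACE = timedelta(minutes=15)

def on_time_start_rate(starts, grace=GRACE):
    """Percentage of procedures starting no later than scheduled start plus the grace period."""
    on_time = sum(1 for scheduled, actual in starts if actual <= scheduled + grace)
    return 100 * on_time / len(starts)

def availability_pct(total_minutes, maintenance_minutes, emergency_block_minutes):
    """Percentage of theatre time left for elective scheduling."""
    available = total_minutes - maintenance_minutes - emergency_block_minutes
    return 100 * available / total_minutes

scheduled = datetime(2026, 4, 17, 8, 0)
# (scheduled_start, actual_start) pairs with delays of 0, 15, 16 and 40 minutes.
starts = [(scheduled, scheduled + timedelta(minutes=d)) for d in (0, 15, 16, 40)]

on_time_rate = on_time_start_rate(starts)
availability = availability_pct(total_minutes=600, maintenance_minutes=60, emergency_block_minutes=90)
```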

Theatre Utilisation Dashboard

A Superset dashboard for theatre managers shows:

  1. Daily Utilisation by Theatre: A table or heatmap showing each theatre’s utilisation percentage for the current week.
  2. Utilisation Trend: A line chart tracking utilisation over 12 weeks, with a target line (e.g., 80%).
  3. Turnover Time Distribution: A histogram showing turnover times; identify theatres with long turnovers.
  4. On-Time Start Rate: A gauge or progress bar showing the percentage of procedures starting on time.
  5. Procedure Duration vs. Scheduled: A scatter plot comparing actual versus scheduled duration; identify procedures that consistently overrun.
  6. Theatre Availability: A table showing each theatre’s availability percentage, highlighting maintenance blocks.
  7. Utilisation by Surgical Specialty: Break down utilisation by specialty (e.g., orthopaedics, cardiothoracic); identify which specialties are over- or under-utilising theatres.

Example SQL for the utilisation table:

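WITH ordered AS (
  SELECT
    theatre_id,
    DATE(procedure_date) AS date,
    actual_start_time, actual_end_time, scheduled_start_time, scheduled_end_time,
    -- Start of the next procedure in the same theatre on the same day (NULL for the last case)
    LEAD(actual_start_time) OVER (
      PARTITION BY theatre_id, DATE(procedure_date) ORDER BY actual_start_time
    ) AS next_start_time
  FROM procedures
)
SELECT
  theatre_id,
  date,
  COUNT(*) AS num_procedures,
  SUM(DATEDIFF(minute, actual_start_time, actual_end_time)) AS actual_minutes,
  SUM(DATEDIFF(minute, scheduled_start_time, scheduled_end_time)) AS scheduled_minutes,
  -- NULLIF avoids a divide-by-zero when nothing was scheduled
  ROUND(100.0 * SUM(DATEDIFF(minute, actual_start_time, actual_end_time)) / NULLIF(SUM(DATEDIFF(minute, scheduled_start_time, scheduled_end_time)), 0), 1) AS utilisation_pct,
  ROUND(AVG(DATEDIFF(minute, actual_end_time, next_start_time)), 0) AS avg_turnover_minutes
FROM ordered
GROUP BY theatre_id, date
ORDER BY date DESC, utilisation_pct DESC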

This query shows daily utilisation and turnover for each theatre. Theatre managers can spot patterns—e.g., “Theatre 3 always has long turnovers on Tuesdays”—and investigate root causes.

Predictive Analytics for Theatre Scheduling

Beyond dashboards, machine learning can predict procedure duration and optimise scheduling. For example:

  • Duration Prediction: Given a procedure type, patient age, and surgeon, predict how long the procedure will take. Use this to schedule procedures more accurately and reduce overruns.
  • Demand Forecasting: Predict the number of emergency procedures in the next week, helping theatre managers reserve capacity.
  • Staff Scheduling: Predict which procedures will need additional staff (e.g., complex cases), ensuring adequate staffing.
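Before investing in a machine-learning model, it is worth having a simple baseline to beat. The sketch below is not a trained model: it predicts duration as the median of past cases for the same procedure type and surgeon, falling back to the procedure type alone. The history records and names are invented.

```python
from collections import defaultdict
from statistics import median

# (procedure_type, surgeon, actual_minutes); made-up history.
history = [
    ("hip_replacement", "Dr A", 95), ("hip_replacement", "Dr A", 105),
    ("hip_replacement", "Dr B", 120),
    ("cataract", "Dr C", 25), ("cataract", "Dr C", 35),
]

def build_baseline(history):
    """Return a predict(procedure_type, surgeon) function based on historical medians."""
    by_pair, by_type = defaultdict(list), defaultdict(list)
    for procedure_type, surgeon, duration in history:
        by_pair[(procedure_type, surgeon)].append(duration)
        by_type[procedure_type].append(duration)

    def predict(procedure_type, surgeon):
        if (procedure_type, surgeon) in by_pair:
            return median(by_pair[(procedure_type, surgeon)])
        if procedure_type in by_type:          # surgeon not seen before for this procedure
            return median(by_type[procedure_type])
        return None                            # no history at all: leave it to the scheduler

    return predict

predict = build_baseline(history)
known_pair = predict("hip_replacement", "Dr A")
new_surgeon = predict("hip_replacement", "Dr Z")
unknown = predict("appendicectomy", "Dr A")
```

Any model that adds patient age or other features should be judged by how much it improves on a baseline like this.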

D23.io’s AI & Agents Automation service can build these models on top of your Superset data, delivering predictions directly into your dashboards.


Security, Compliance, and Data Governance

Hospital data is highly sensitive. Patient records, clinical observations, and diagnoses are health information, which the Privacy Act 1988 (Cth) treats as sensitive information, and they are further regulated by state and territory health records laws.

Data Security in Superset

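Superset doesn’t hold the primary copy of patient data; it queries your data warehouse. Cached query results and user exports can still contain patient-level rows, so treat the cache and export paths as in scope for security. Superset must also authenticate users, authorise access, and encrypt connections.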

Authentication: Superset supports multiple authentication methods:

  • LDAP/Active Directory: Integrate with your hospital’s directory service; users log in with their hospital credentials.
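  • OAuth/OpenID Connect: Integrate with enterprise SSO providers (e.g., Okta, Azure AD). SAML-only identity providers usually need a custom security manager or an authenticating proxy in front of Superset.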
  • Local Database: Superset manages user accounts directly (less common for hospitals; requires password management).

D23.io’s managed Superset service includes LDAP or SAML setup, ensuring users authenticate via your existing hospital identity provider.

Row-Level Security (RLS): Superset can restrict which rows of data a user sees. For example:

  • A ward nurse sees only patients on their ward.
  • A clinical director sees all patients in their department.
  • An executive sees aggregate data across the hospital.

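RLS rules are defined in Superset’s Row Level Security settings and attached to datasets and roles. Each rule is a SQL predicate, written without the WHERE keyword, that Superset appends to queries on those datasets. For a hypothetical Ward 4 nursing role:

ward = 'Ward 4'

With Jinja template processing enabled, a single rule can instead look up the logged-in user, e.g. ward IN (SELECT ward FROM user_ward_assignments WHERE username = '{{ current_username() }}'), where user_ward_assignments is a mapping table you would maintain. Roles that should see everything, such as executives, are simply not given a restricting rule.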
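To illustrate the effect of row-level filtering, the sketch below mimics the idea outside Superset: each role maps to an optional predicate that is appended to every query, and unknown roles are denied. This is not Superset’s implementation, and the role names, table, and rule format are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ed_presentations (patient_id TEXT, ward TEXT)")
conn.executemany(
    "INSERT INTO ed_presentations VALUES (?, ?)",
    [("P1", "ED"), ("P2", "Ward 4"), ("P3", "Ward 4")],
)

# Hypothetical role -> (predicate, parameters); None means the role is unrestricted.
RLS_RULES = {
    "ward_4_nurse": ("ward = ?", ("Ward 4",)),
    "executive": None,
}

def rows_visible_to(role: str) -> list[str]:
    if role not in RLS_RULES:
        raise PermissionError(f"no access rule for role {role!r}")  # deny by default
    sql, params = "SELECT patient_id FROM ed_presentations", ()
    rule = RLS_RULES[role]
    if rule is not None:
        predicate, params = rule
        sql += f" WHERE {predicate}"   # predicate comes from trusted config, values are bound
    sql += " ORDER BY patient_id"
    return [row[0] for row in conn.execute(sql, params)]

nurse_rows = rows_visible_to("ward_4_nurse")
executive_rows = rows_visible_to("executive")
```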

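Column-Level Security: Superset has no built-in per-column permission, so sensitive columns are usually hidden by exposing separate datasets or warehouse views that omit them and granting each role access only to the appropriate dataset. For example, hide patient names from non-clinical staff, or hide cost data from clinical users.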

Compliance Frameworks

Australian hospitals must comply with:

  • Privacy Act 1988 (Cth): Governs collection, use, and disclosure of personal information, including health information.
  • Australian Consumer Law: Protects consumers’ rights regarding goods and services.
  • State and Territory Health Legislation: Each jurisdiction has its own health privacy legislation (e.g., the Health Records and Information Privacy Act 2002 in NSW, the Health Records Act 2001 in Victoria).
  • NHMRC Guidelines: The National Health and Medical Research Council publishes guidelines on data governance and research ethics.

For data analytics platforms like Superset, key compliance requirements are:

  • Data Minimisation: Collect and retain only data necessary for the analysis.
  • Purpose Limitation: Use data only for the stated purpose (e.g., operational dashboards, not research without consent).
  • Access Control: Restrict access to authorised users; log all access.
  • Data Retention: Delete data when no longer needed; follow your hospital’s retention policy.
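  • Breach Notification: Under the Notifiable Data Breaches scheme (Privacy Act), assess a suspected breach within 30 days and, if it is an eligible data breach, notify the OAIC and affected individuals as soon as practicable.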

D23.io’s managed Superset service includes audit logging, access control configuration, and compliance documentation, helping you meet these requirements.

Audit Logging and Compliance Reporting

Superset logs all user actions: who logged in, which dashboards they viewed, what filters they applied, what data they exported. This audit trail is essential for compliance investigations and breach response.

Superset’s audit log includes:

  • User: Who performed the action.
  • Timestamp: When the action occurred.
  • Action: What they did (e.g., “viewed dashboard”, “exported data”, “ran query”).
  • Resource: Which dashboard, chart, or dataset they accessed.
  • Details: Any parameters (e.g., filters applied).
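For SIEM export, one JSON object per line is a widely accepted shape. The sketch below shows a record with the five fields listed above. It is an illustrative format, not Superset’s actual log schema, and the class and function names are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

@dataclass
class AuditEvent:
    user: str
    action: str
    resource: str
    details: dict = field(default_factory=dict)
    # Timestamps in UTC keep events from different sites comparable.
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

def to_siem_line(event: AuditEvent) -> str:
    """Serialise one event as a single JSON line with stable key order."""
    return json.dumps(asdict(event), sort_keys=True)

event = AuditEvent(
    user="jsmith",
    action="exported data",
    resource="ED Operations Dashboard",
    details={"filters": {"ward": "ED", "period": "last 7 days"}},
)
line = to_siem_line(event)
```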

You can export audit logs to your security information and event management (SIEM) system for centralised monitoring.

Data Governance and Metadata

Data governance ensures data quality, consistency, and trustworthiness. In Superset, this means:

  • Data Dictionary: Document every field in every dataset. What does it mean? How is it calculated? Who owns it?
  • Data Lineage: Track where data comes from. For example, “LOS is calculated from admission_date and discharge_date, which come from the EHR admissions table.”
  • Data Quality Checks: Validate data regularly. For example, “discharge_date should be >= admission_date”; flag violations.
  • Change Management: When you update a dataset or calculation, document the change and notify users who depend on it.
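Data quality checks like the one in the list above are straightforward to automate. The following sketch validates admission rows and returns a list of problems per row; the rule set and field names are assumptions and would be driven by your own data dictionary.

```python
from datetime import date

def check_admission(row: dict) -> list[str]:
    """Return a list of data quality problems for one admission row (empty if clean)."""
    problems = []
    admitted, discharged = row.get("admission_date"), row.get("discharge_date")
    if admitted is None:
        problems.append("missing admission_date")
    elif discharged is not None and discharged < admitted:
        problems.append("discharge_date before admission_date")
    if not row.get("ward"):
        problems.append("missing ward")
    return problems

rows = [
    {"patient_id": "P1", "ward": "Ward 4", "admission_date": date(2026, 4, 1), "discharge_date": date(2026, 4, 3)},
    {"patient_id": "P2", "ward": "", "admission_date": date(2026, 4, 5), "discharge_date": date(2026, 4, 2)},
]
violations = {r["patient_id"]: check_admission(r) for r in rows if check_admission(r)}
```

Flagged rows can be written to a quality table and surfaced on their own dashboard, so data owners see problems before clinicians do.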

Superset includes a metadata panel for each dataset, where you can add descriptions, tags, and owner information. Combine this with a data dictionary (e.g., a shared spreadsheet or wiki) to create a comprehensive data governance framework.


Implementation Timeline and Cost

A typical Apache Superset deployment for a hospital group follows this timeline:

Phase 1: Discovery and Planning (Weeks 1–2)

  • Data Source Audit: Identify all systems that will feed Superset (EHR, bed management, theatre scheduling, etc.).
  • Requirements Gathering: Interview clinical and operational teams to understand what dashboards they need.
  • Architecture Design: Plan the data warehouse schema, ingestion pipeline, and Superset configuration.
  • Cost Estimation: Estimate infrastructure costs, D23.io service fees, and internal resource requirements.

Deliverables: Architecture document, dashboard requirements list, implementation plan.

Phase 2: Infrastructure Setup (Weeks 2–4)

  • Data Warehouse Provisioning: Set up PostgreSQL, Snowflake, or BigQuery instance; configure backups and monitoring.
  • Ingestion Pipeline Development: Build HL7 v2 and FHIR parsers; set up data transformations.
  • Superset Installation: Deploy Superset on your infrastructure (on-premises or cloud); configure authentication (LDAP/SAML).
  • Security Configuration: Set up row-level security, audit logging, and encryption.

Deliverables: Data warehouse schema, ingestion pipeline code, Superset instance, security documentation.

Phase 3: Data Integration and Semantic Layer (Weeks 4–6)

  • Data Source Connections: Connect Superset to the data warehouse; test data freshness.
  • Dataset Definition: Create Superset datasets for ED, bed management, theatre, and clinical KPIs.
  • Calculated Fields: Define LOS, readmission rate, theatre utilisation, and other metrics.
  • Performance Tuning: Index the data warehouse; optimise queries for dashboard performance.

Deliverables: Superset datasets, calculated fields, query performance reports.

Phase 4: Dashboard Development (Weeks 6–8)

  • ED Operations Dashboard: Build real-time ED wait time and census visualisations.
  • Bed Management Dashboard: Build occupancy and discharge forecast visualisations.
  • Theatre Utilisation Dashboard: Build utilisation, turnover, and on-time start visualisations.
  • Clinical KPI Dashboard: Build LOS, readmission, mortality, and infection rate visualisations.
  • Executive Dashboard: Build aggregate KPIs and trend analysis.

Deliverables: Five production dashboards, user acceptance testing (UAT).

Phase 5: Training and Rollout (Weeks 8–10)

  • User Training: Train ED nurses, bed managers, theatre schedulers, and executives on dashboard use.
  • Documentation: Create user guides, FAQs, and troubleshooting documentation.
  • Pilot Rollout: Deploy dashboards to a pilot ward or department; gather feedback.
  • Full Rollout: Deploy dashboards hospital-wide; monitor usage and address issues.

Deliverables: Training materials, user documentation, rollout plan.

Total Timeline: 10 Weeks

This is a typical timeline for a single hospital. For a hospital group with multiple sites, add 2–4 weeks for:

  • Federated identity management (single sign-on across sites).
  • Data integration from multiple EHR instances.
  • Customisation for site-specific workflows.

Cost Breakdown

For a hospital group (e.g., 5 hospitals, 1000+ beds):

  • D23.io Managed Superset Service: $50,000–$100,000 for the 10-week implementation, including architecture, ingestion pipeline, dashboard development, and training. (See The $50K D23.io Consulting Engagement: What’s Inside for a detailed breakdown of a typical engagement.)
  • Infrastructure: $5,000–$15,000 per year for data warehouse hosting (PostgreSQL on AWS, Snowflake, or BigQuery).
  • Superset Hosting: $2,000–$5,000 per year for Superset application server (on-premises or cloud).
  • Internal Resources: 1–2 FTE from your IT team for ongoing maintenance, data quality, and user support.

Total First-Year Cost: $57,000–$120,000 (implementation + infrastructure + hosting, excluding internal staff). Ongoing Annual Cost: $7,000–$20,000 (infrastructure + hosting, excluding internal staff).

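Compare this to proprietary BI vendors (Tableau, Power BI, Qlik), which charge per-user licensing fees (assumed here at $50–$200 per user per month; actual list prices vary by vendor and tier). For a hospital group with 500 dashboard users, that works out to $300,000–$1.2M per year, many times the ongoing infrastructure cost estimated above, although that estimate excludes internal staffing.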
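The per-user licence arithmetic is easy to rerun with your own user count and quoted prices. The figures below are the ones used in this section; the comparison divides by the $20,000 upper bound quoted for ongoing infrastructure cost and ignores internal staffing on both sides.

```python
def annual_licence_cost(users: int, price_per_user_per_month: float) -> float:
    return users * price_per_user_per_month * 12

USERS = 500
licence_low = annual_licence_cost(USERS, 50)     # 500 * 50 * 12
licence_high = annual_licence_cost(USERS, 200)   # 500 * 200 * 12

ONGOING_UPPER = 20_000  # upper ongoing infrastructure figure quoted in this section
multiple_low = licence_low / ONGOING_UPPER
multiple_high = licence_high / ONGOING_UPPER
```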


Deployment and Scaling Across Hospital Groups

A single-hospital Superset deployment is straightforward. Scaling to a hospital group—multiple sites, multiple EHR systems, federated identity—requires careful architecture.

Multi-Site Architecture

For a hospital group with 5–10 sites, consider:

  1. Centralised Data Warehouse: All sites feed data to a single, centralised data warehouse. This enables group-wide dashboards and cross-site benchmarking.
  2. Local Data Feeds: Each site’s EHR connects to the central warehouse via a secure, encrypted connection (VPN or AWS PrivateLink).
  3. Single Superset Instance: One Superset instance serves all users across all sites. Users authenticate via federated identity (LDAP or SAML), and row-level security restricts access by site.
  4. Backup and Disaster Recovery: The central data warehouse is replicated to a secondary region for disaster recovery.

Federated Identity and Access Control

With multiple sites, you need a single identity provider (IdP) that all users authenticate against. Options:

  • Active Directory (AD): If your hospital group runs Windows and AD, integrate Superset with AD via LDAP.
  • Azure AD: If you’re using Microsoft 365, Azure AD is a natural choice. Integrate Superset via SAML or OAuth.
  • Okta: A third-party IdP; works with any organisation; more flexible than AD but requires a separate subscription.

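Once users are authenticated, row-level security restricts which data they see. For example, a rule clause using Superset’s Jinja macros (this requires template processing to be enabled, and user_site_assignments is a mapping table you maintain):

site_id IN (SELECT site_id FROM user_site_assignments WHERE user_id = {{ current_user_id() }})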

This ensures a nurse at Site A sees only Site A data, whilst a group executive sees all sites.

Data Integration Challenges

Multiple sites often have different EHR systems or different versions of the same system. Data integration becomes complex:

  • Schema Differences: Site A’s EHR has a ward field; Site B’s has a unit field. The ingestion pipeline must map both to a common ward field in the warehouse.
  • Data Quality Differences: Site A captures admission time; Site B captures only admission date. The warehouse must handle both.
  • Timing Differences: Site A’s EHR updates in real-time; Site B’s updates hourly. The warehouse must reconcile these differences.
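A per-site field map is one simple way to handle schema differences such as ward versus unit. The sketch below renames source fields to a common schema and flags date-only admission values so time-of-day metrics can exclude them. The site names, field names, and the flag column are hypothetical.

```python
# Hypothetical per-site mappings: source field name -> common warehouse column.
SITE_FIELD_MAPS = {
    "site_a": {"ward": "ward", "admission_time": "admitted_at"},
    "site_b": {"unit": "ward", "admission_date": "admitted_at"},
}

def normalise(site: str, record: dict) -> dict:
    """Map one source record onto the common schema for the given site."""
    mapping = SITE_FIELD_MAPS[site]
    row = {target: record[source] for source, target in mapping.items() if source in record}
    row["site_id"] = site
    # ISO 8601 timestamps contain a "T"; date-only values do not.
    row["admitted_at_has_time"] = "T" in row.get("admitted_at", "")
    return row

row_a = normalise("site_a", {"ward": "Ward 4", "admission_time": "2026-04-17T08:15:00"})
row_b = normalise("site_b", {"unit": "4 West", "admission_date": "2026-04-17"})
```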

D23.io’s data integration team handles these challenges. They audit each site’s EHR, design a unified warehouse schema, and build site-specific ingestion mappings.

Performance and Scalability

As the hospital group grows—more sites, more data, more users—Superset must scale:

  • Query Caching: Superset caches query results; frequently-accessed dashboards load instantly.
  • Data Warehouse Scaling: Modern data warehouses (Snowflake, BigQuery) scale horizontally; add compute as needed.
  • Superset Scaling: Run multiple Superset instances behind a load balancer; each instance serves a subset of users.
  • Query Optimisation: Index the data warehouse; pre-compute aggregations for heavy dashboards.
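
As one possible starting point for the query caching item above, the fragment below sketches a Redis-backed cache in superset_config.py. The Redis URL, key prefixes, and timeouts are placeholder assumptions, and the exact option names should be checked against the configuration reference for your Superset version.

```python
# superset_config.py (fragment)
# Assumed values: the Redis URL, key prefixes, and timeouts are placeholders.
REDIS_URL = "redis://localhost:6379/0"

# Cache for Superset's own metadata.
CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_DEFAULT_TIMEOUT": 300,  # seconds
    "CACHE_KEY_PREFIX": "superset_meta_",
    "CACHE_REDIS_URL": REDIS_URL,
}

# Cache for chart query results. One hour suits hourly-refreshed metrics;
# near-real-time dashboards would need a much shorter timeout.
DATA_CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_DEFAULT_TIMEOUT": 3600,  # seconds
    "CACHE_KEY_PREFIX": "superset_data_",
    "CACHE_REDIS_URL": REDIS_URL,
}
```

The trade-off is between warehouse load and data freshness: a longer timeout means fewer warehouse queries but staler charts.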

For a hospital group with 500+ dashboard users and 100+ dashboards, expect:

  • Query Response Time: < 5 seconds for most dashboards (after caching).
  • Concurrent Users: 50–100 simultaneous users without performance degradation.
  • Data Freshness: Hourly or real-time, depending on the metric.

Next Steps: Getting Started

If you’re a hospital group or health service considering Apache Superset, here’s how to begin:

Step 1: Assess Your Current State

  • Data Audit: List all systems that could feed Superset (EHR, bed management, theatre scheduling, pathology, imaging).
  • Stakeholder Interviews: Talk to ED nurses, bed managers, theatre schedulers, and executives. What dashboards would help them?
  • Technical Assessment: Do you have a data warehouse? Can your EHR emit HL7 v2 or FHIR? What’s your current BI platform (if any)?

Step 2: Define Your MVP

Don’t try to build 20 dashboards at once. Start with your highest-impact use case:

  • ED Operations: Real-time wait times and bed availability.
  • Bed Management: Occupancy and discharge forecast.
  • Theatre Utilisation: Daily utilisation and turnover metrics.
  • Clinical KPIs: LOS, readmission, mortality trends.

Pick one; build it well; then expand.

Step 3: Engage a Partner

Apache Superset is open-source and free, but implementing it requires expertise in:

  • Healthcare Data Standards (HL7 v2, FHIR).
  • Data Warehousing (schema design, ETL).
  • Business Intelligence (dashboard design, user experience).
  • Healthcare Compliance (privacy, audit logging, data governance).

PADISO is a Sydney-based venture studio and AI digital agency that partners with ambitious teams. We’ve helped hospital groups across Australia deploy Superset and other data platforms. Our AI Agency Services Sydney offering includes data strategy, platform engineering, and operational support. We can also provide AI Automation Agency Sydney services to layer AI agents on top of your dashboards, allowing clinicians to query data in natural language.

For a detailed breakdown of what a typical engagement looks like, see The $50K D23.io Consulting Engagement: What’s Inside. We’ve also published case studies of hospital deployments in AI Agency Case Studies Sydney.

Step 4: Plan Your Rollout

Once you’ve chosen a partner, plan the implementation:

  • Phase 1: Discovery, architecture design, cost estimation (2 weeks).
  • Phase 2: Infrastructure setup, ingestion pipeline, Superset deployment (2–3 weeks).
  • Phase 3: Data integration, semantic layer, performance tuning (2 weeks).
  • Phase 4: Dashboard development, UAT, refinement (2–3 weeks).
  • Phase 5: Training, documentation, rollout (2 weeks).

Total: 10–12 weeks for a single hospital; 14–16 weeks for a hospital group.
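
As a quick check of the single-hospital total, the phase ranges listed above can be summed directly. This short sketch uses only the durations stated in the plan; the 14–16 week group figure is not derived here.

```python
# Phase durations in weeks as (minimum, maximum), taken from the plan above.
PHASES = {
    "Phase 1": (2, 2),
    "Phase 2": (2, 3),
    "Phase 3": (2, 2),
    "Phase 4": (2, 3),
    "Phase 5": (2, 2),
}

min_weeks = sum(low for low, _ in PHASES.values())
max_weeks = sum(high for _, high in PHASES.values())
print(f"Single hospital: {min_weeks}-{max_weeks} weeks")
```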

Step 5: Sustain and Evolve

After rollout, your work isn’t done. You need:

  • Ongoing Support: Bug fixes, performance tuning, user training.
  • Data Quality: Regular audits; fix data issues as they arise.
  • Governance: Update your data dictionary; manage access as staff change.
  • Evolution: Add new dashboards as new needs emerge; integrate new data sources.

PADISO provides AI Agency Support Sydney and AI Agency Maintenance Sydney to handle this ongoing work. We can also help with AI Agency Project Management Sydney to keep your Superset programme on track.


Conclusion

Apache Superset is a powerful, cost-effective platform for hospital data analytics. Combined with D23.io’s managed service and PADISO’s partnership approach, it delivers concrete outcomes: ED wait times visible in real time, bed managers optimising occupancy, theatre schedulers tracking utilisation, and executives monitoring clinical quality.

The reference architecture outlined in this guide—data ingestion from HL7 v2 and FHIR sources, semantic layer with calculated metrics, dashboards for ED, bed management, theatre, and clinical KPIs—is proven across Australian hospital groups.

If you’re ready to deploy Superset, start with a clear MVP, engage a partner with healthcare expertise, and plan for 10–12 weeks to roll out at a single hospital, or 14–16 weeks across a hospital group. The result: actionable data, faster decisions, and better patient care.

Ready to start? Contact PADISO to discuss your hospital’s data strategy. We’re based in Sydney and work with hospital groups across Australia. Let’s talk about how Superset can transform your operations.