Guide 26 mins

Apache Superset for Insurance Claims: A Reference Dashboard Set

Pre-built Superset dashboards for insurance claims: data model, KPIs, drilldown patterns, and schemas that scale. Reference architecture inside.

The PADISO Team ·2026-06-10

Why Apache Superset for Insurance Claims
The Core Data Model for Claims
Reference Dashboard Architecture
Dashboard 1: Claims Intake & Triage
Dashboard 2: Claims Processing Performance
Dashboard 3: Reserves & Financial Impact
Dashboard 4: Claims Fraud Detection & Risk
Dashboard 5: Claimant Experience & Outcomes
Schema Patterns That Scale
Implementation Roadmap
Security, Audit & Compliance
Next Steps

Why Apache Superset for Insurance Claims

Insurance claims operations generate hundreds of thousands of data points daily: intake timestamps, claimant details, coverage limits, damage assessments, repair vendor quotes, settlement amounts, and resolution timelines. Most claims teams rely on ad-hoc spreadsheets, legacy BI tools locked behind per-seat licensing, or fragmented Tableau instances that no one maintains.

Apache Superset offers a different path. It’s open-source, embeddable, and built for modern data stacks. Unlike traditional BI platforms, Superset works seamlessly with cloud data warehouses (Snowflake, BigQuery, Redshift) and doesn’t require expensive seat licenses. For insurance claims teams, this means:

Real-time claims visibility across intake, processing, and settlement without waiting for nightly batch refreshes.
Embedded analytics that live inside your claims management system, so adjusters and managers see data in context, not in a separate tool.
Scalable schema design that survives 10x growth without collapsing under query latency or storage bloat.
Audit-ready dashboards that track SLA compliance, fraud signals, and reserve adequacy for regulators.

This guide provides a pre-built reference architecture: five production-ready dashboards, the underlying data model, and the schema patterns that have survived scale in live insurance operations.

The Core Data Model for Claims

Before building dashboards, you need a clean, normalised data model. Most insurance claims data lives in relational tables (PostgreSQL, Oracle) or semi-structured event logs (Kafka, S3). The reference model here assumes you’ve already loaded claims data into a cloud data warehouse (Snowflake, BigQuery, or ClickHouse) and are now building analytics on top.

The Star Schema Foundation

The most robust architecture for claims analytics is a star schema: a central fact table (claims events) surrounded by dimension tables (claimants, policies, adjusters, vendors, coverage types). This design keeps queries fast and dashboards responsive, even at scale.

Core Fact Table: fact_claims_events

Each row represents a state change in a claim’s lifecycle:

claim_id (PK)
event_timestamp
event_type (intake, assignment, inspection, quote_received, settlement_approved, closed)
claimant_id (FK)
policy_id (FK)
adjuster_id (FK)
vendor_id (FK)
coverage_type_id (FK)
reserve_amount_usd
incurred_amount_usd
settlement_amount_usd
days_to_event
fraud_score
fraud_flag
data_loaded_at

This structure allows you to:

Track how long claims spend in each state (intake → assignment → inspection → settlement).
Correlate fraud signals with claimant history and vendor patterns.
Aggregate reserves and incurred amounts at any level (adjuster, region, policy line).
Build SLA dashboards (e.g., ”% of claims assigned within 24 hours”).

Dimension Tables

dim_claimants: Claimant demographics, claim history, fraud flags, geographic region.
dim_policies: Policy number, coverage type, limit, deductible, underwriting date, policy holder.
dim_adjusters: Adjuster name, team, region, experience level, caseload.
dim_vendors: Repair shop, medical provider, appraiser; quality ratings, historical settlement variance.
dim_coverage: Line of business (auto, property, workers’ comp), coverage code, limit rules.
dim_dates: Standard date dimension (day, week, month, quarter, year, fiscal period).

This normalised design ensures that when you drill into a dashboard, you’re not duplicating data or creating ambiguity. A single claimant record stays in one place; the fact table references it by ID.

Incremental Load Strategy

Insurance claims data arrives in waves:

Intake events (new claims, usually within minutes of notification).
Processing events (assignments, inspections, quotes, within hours to days).
Settlement events (approvals, closures, within weeks to months).
Reopens and adjustments (amendments, appeals, can happen years later).

Your data pipeline should capture all of these as events. Use a timestamp-based incremental load (e.g., “load all events where event_timestamp > last_load_time”) rather than a full refresh. This keeps your warehouse fresh without re-processing historical data.

For claims that span years (especially workers’ comp and property claims), consider a slowly changing dimension strategy for dim_claimants and dim_adjusters. Track effective dates so that if a claimant’s fraud flag changes or an adjuster changes teams, you can still report historical accuracy.

Reference Dashboard Architecture

The five dashboards below follow a hierarchy:

Intake & Triage (operations view): Are claims flowing in? Are they being assigned promptly?
Processing Performance (management view): How fast are we closing claims? Where are bottlenecks?
Reserves & Financial Impact (finance view): What’s our reserve adequacy? Are we over/under-reserving?
Fraud Detection & Risk (compliance view): Which claims are high-risk? Which vendors/claimants show patterns?
Claimant Experience & Outcomes (customer view): How long do claimants wait? What’s our settlement ratio?

Each dashboard is designed to:

Load in under 3 seconds (via pre-aggregated tables or ClickHouse, not raw scans).
Drill down without leaving the dashboard (click a region, see that region’s claims; click a vendor, see all claims assigned to that vendor).
Export to CSV or PDF for board reports, regulatory filings, and audits.
Embed in your claims system via Superset’s REST API, so adjusters see data without leaving their workflow.

For organisations pursuing SOC 2 or ISO 27001 compliance, these dashboards also serve as audit trails: you can log who viewed what, when, and what filters they applied. Superset integrates with Vanta to automate compliance evidence collection.

Dashboard 1: Claims Intake & Triage

Purpose: Operations teams need to see claims flowing in, understand intake velocity, and spot bottlenecks in early assignment.

Key Metrics:

New claims today/this week (count, trend vs. last week).
Average time to assignment (from intake to adjuster assignment).
Claims by coverage type (auto, property, workers’ comp, etc.).
Claims by triage category (routine, urgent, complex, fraud-suspected).
Assignment queue depth (unassigned claims waiting for an adjuster).

Charts & Drilldowns:

Intake Velocity (Time Series)
- X-axis: Date (hourly or daily granularity).
- Y-axis: Count of new claims.
- Overlay: 7-day and 30-day moving average.
- Drill: Click a date range to see claims by coverage type or region.
Time to Assignment (Histogram)
- X-axis: Hours from intake to assignment (0–24, 24–48, 48–72, etc.).
- Y-axis: Count of claims.
- Target line: SLA (e.g., 24 hours).
- Drill: Click a bucket to see individual claims and why they delayed.
Intake by Coverage Type (Stacked Bar)
- X-axis: Week or month.
- Y-axis: Count of claims.
- Stack: Auto, Property, Workers’ Comp, Other.
- Drill: Click a segment to see that coverage type’s intake detail (region, claimant type, etc.).
Assignment Queue (Big Number + Gauge)
- Display: Current count of unassigned claims.
- Gauge: % of target (e.g., “15 unassigned / 50 target = 30%”).
- Trend: Change from yesterday.
- Drill: Click to see queue by adjuster capacity (how many can each adjuster take?).
Triage Breakdown (Donut)
- Segments: Routine, Urgent, Complex, Fraud-Suspected.
- Drill: Click a segment to see those claims in a detail table (claimant, coverage, assigned adjuster).

Filters (top of dashboard):

Date range (intake date).
Coverage type (multi-select).
Region (multi-select).
Triage category (multi-select).
Assigned/Unassigned toggle.

Underlying Query Pattern:

SELECT
  DATE(event_timestamp) AS intake_date,
  coverage_type,
  triage_category,
  COUNT(DISTINCT claim_id) AS new_claims,
  PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY days_to_assignment) AS median_days_to_assignment,
  COUNT(CASE WHEN days_to_assignment <= 1 THEN 1 END) / COUNT(*) AS pct_assigned_within_24h
FROM fact_claims_events
WHERE event_type = 'intake'
  AND event_timestamp >= DATE_TRUNC('month', NOW())
GROUP BY 1, 2, 3
ORDER BY intake_date DESC;

This query is fast because it aggregates at the intake event level only, not scanning the entire fact table.

Dashboard 2: Claims Processing Performance

Purpose: Management and finance teams need to understand claims cycle time, resolution rates, and where bottlenecks occur.

Key Metrics:

Average days to close (from intake to settlement).
% claims closed within SLA (e.g., 30, 60, 90 days).
Claims by processing stage (intake, assignment, inspection, quote, settlement, closed).
Settlement rate (% of claims that reach settlement vs. denial/withdrawal).
Reopens (claims reopened after closure, % of closed claims).

Charts & Drilldowns:

Claims Cycle Time (Funnel)
- Stages: Intake → Assignment → Inspection → Quote Received → Settlement Approved → Closed.
- Width of each stage: Count of claims in that stage.
- Attrition: Show drop-off at each stage (e.g., “1000 intakes → 950 assignments → 920 inspections”).
- Drill: Click a stage to see claims stuck there (why aren’t they moving forward?).
Days to Close (Box Plot or Violin Plot)
- X-axis: Coverage type or region.
- Y-axis: Distribution of days to close (median, quartiles, outliers).
- Target line: SLA (e.g., 45 days).
- Drill: Click a box to see individual claims and their timeline.
SLA Compliance (Stacked Bar)
- X-axis: Month.
- Y-axis: Count of claims.
- Stack: ≤30 days (green), 31–60 days (yellow), 61–90 days (orange), >90 days (red).
- Trend: % on-time month-over-month.
- Drill: Click a month to see claims by region or adjuster.
Settlement Rate (KPI + Trend)
- Big number: % of claims that settled (vs. denied, withdrawn, litigated).
- Trend: Change from last month.
- Breakdown: Pie chart of outcomes (Settled, Denied, Withdrawn, Litigated, Pending).
- Drill: Click each outcome to see those claims.
Reopens (Time Series)
- X-axis: Month of closure.
- Y-axis: Count of reopened claims.
- Overlay: % of closed claims that reopened (reopen rate).
- Drill: Click a month to see reopened claims by reason (appeal, new injury, etc.).

Filters:

Date range (closure date or intake date).
Coverage type.
Region.
Adjuster.
Settlement outcome.

Underlying Query Pattern:

WITH claim_timeline AS (
  SELECT
    claim_id,
    MIN(CASE WHEN event_type = 'intake' THEN event_timestamp END) AS intake_date,
    MIN(CASE WHEN event_type = 'assignment' THEN event_timestamp END) AS assignment_date,
    MIN(CASE WHEN event_type = 'inspection' THEN event_timestamp END) AS inspection_date,
    MIN(CASE WHEN event_type = 'settlement_approved' THEN event_timestamp END) AS settlement_date,
    MAX(CASE WHEN event_type = 'closed' THEN event_timestamp END) AS close_date,
    DATEDIFF(day, MIN(CASE WHEN event_type = 'intake' THEN event_timestamp END), MAX(CASE WHEN event_type = 'closed' THEN event_timestamp END)) AS days_to_close,
    coverage_type,
    region
  FROM fact_claims_events
  GROUP BY claim_id, coverage_type, region
)
SELECT
  DATE_TRUNC('month', close_date) AS close_month,
  coverage_type,
  region,
  COUNT(*) AS claims_closed,
  AVG(days_to_close) AS avg_days_to_close,
  PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY days_to_close) AS median_days_to_close,
  COUNT(CASE WHEN days_to_close <= 30 THEN 1 END) / COUNT(*) AS pct_closed_within_30d
FROM claim_timeline
WHERE close_date IS NOT NULL
GROUP BY 1, 2, 3
ORDER BY close_month DESC;

This pre-aggregates by month, coverage, and region, so dashboard filters don’t require full table scans.

Dashboard 3: Reserves & Financial Impact

Purpose: Finance and actuarial teams need to track reserve adequacy, understand settlement patterns, and forecast financial exposure.

Key Metrics:

Total reserves outstanding (sum of all open claims’ reserve amounts).
Incurred but not reported (IBNR) (estimated reserves for claims not yet reported).
Average settlement vs. reserve (settlement amount / reserve amount).
Reserve accuracy (actual settlement / initial reserve; >1 means under-reserved).
Outstanding exposure by coverage type (total reserves by line of business).

Charts & Drilldowns:

Reserves Trend (Time Series)
- X-axis: Month.
- Y-axis: Total outstanding reserves (USD).
- Overlay: Total incurred amount (reserves + settled).
- Overlay: IBNR estimate (actuarial calculation).
- Drill: Click a month to see reserves by coverage type or region.
Reserve Accuracy (Scatter Plot)
- X-axis: Initial reserve (USD).
- Y-axis: Actual settlement (USD).
- Diagonal line: Perfect reserve (settlement = reserve).
- Points above line: Under-reserved claims (settled > reserved).
- Points below line: Over-reserved claims (settled < reserved).
- Drill: Click a point to see that claim’s timeline and notes.
Outstanding Exposure by Coverage (Stacked Bar)
- X-axis: Month.
- Y-axis: Total reserves (USD).
- Stack: Auto, Property, Workers’ Comp, Other.
- Drill: Click a segment to see that coverage type’s claims by severity (large, medium, small).
Settlement Distribution (Histogram)
- X-axis: Settlement amount (USD, binned: $0–$5K, $5K–$10K, $10K–$50K, $50K–$100K, $100K+).
- Y-axis: Count of claims.
- Overlay: Average reserve for each bin (shows reserve accuracy by severity).
- Drill: Click a bin to see those claims.
Reserve Runoff (Waterfall)
- Start: Total reserves at start of period.
- Flows: New claims added, settlements paid, reserve adjustments.
- End: Total reserves at end of period.
- Drill: Click each flow to see detail.

Filters:

Date range (reserve date).
Coverage type.
Region.
Claim severity (by reserve amount).
Status (open, pending, settled).

Underlying Query Pattern:

SELECT
  DATE_TRUNC('month', event_timestamp) AS reserve_month,
  coverage_type,
  region,
  SUM(CASE WHEN event_type IN ('intake', 'assignment', 'inspection') THEN reserve_amount_usd ELSE 0 END) AS total_open_reserves,
  SUM(CASE WHEN event_type = 'settlement_approved' THEN settlement_amount_usd ELSE 0 END) AS total_settled,
  SUM(incurred_amount_usd) AS total_incurred,
  AVG(CASE WHEN event_type = 'closed' THEN settlement_amount_usd / NULLIF(reserve_amount_usd, 0) ELSE NULL END) AS avg_settlement_to_reserve_ratio
FROM fact_claims_events
GROUP BY 1, 2, 3
ORDER BY reserve_month DESC;

For IBNR estimation, you’d typically use an actuarial model (chain-ladder, Bornhuetter-Ferguson) and load results into a separate table; the dashboard then joins it for comparison.

Dashboard 4: Claims Fraud Detection & Risk

Purpose: Compliance and fraud investigation teams need to identify high-risk claims, spot patterns, and prioritise investigations.

Key Metrics:

High-risk claims (count and % of total).
Fraud score distribution (histogram of fraud_score values).
Claimant fraud history (repeat claimants, previous fraud flags).
Vendor risk patterns (vendors with high settlement variance, frequent high-risk claims).
Geographic fraud concentration (regions with elevated fraud rates).

Charts & Drilldowns:

High-Risk Claims Alert (Big Number + Gauge)
- Display: Count of claims with fraud_flag = true or fraud_score > threshold (e.g., >70).
- Gauge: % of total claims.
- Trend: Change from last week (are we catching more or fewer?).
- Drill: Click to see those claims in a detail table (claimant, adjuster, coverage, reason for flag).
Fraud Score Distribution (Histogram)
- X-axis: Fraud score (0–100).
- Y-axis: Count of claims.
- Highlight: Scores >70 (high-risk zone).
- Drill: Click a bucket to see those claims.
Repeat Claimants (Bar Chart)
- X-axis: Claimant (or claimant ID).
- Y-axis: Count of claims filed by that claimant.
- Color: Fraud flag (red if any claim flagged, green if none).
- Drill: Click a claimant to see their full history (all claims, fraud flags, outcomes).
Vendor Risk Analysis (Bubble Chart)
- X-axis: Count of claims assigned to vendor.
- Y-axis: Average settlement / reserve ratio (high ratio = vendor tends to inflate claims).
- Bubble size: Average settlement amount.
- Color: % of claims from that vendor that are flagged as high-risk.
- Drill: Click a vendor to see all their claims.
Geographic Fraud Heatmap
- Map of regions (state or postcode level).
- Color intensity: Fraud rate (% of claims flagged as high-risk).
- Drill: Click a region to see claims in that region.

Filters:

Date range (intake date).
Fraud score threshold.
Coverage type.
Claimant history (first-time, repeat).
Vendor (multi-select).

Underlying Query Pattern:

WITH claimant_history AS (
  SELECT
    claimant_id,
    COUNT(DISTINCT claim_id) AS total_claims_filed,
    COUNT(CASE WHEN fraud_flag = true THEN 1 END) AS fraud_flagged_claims,
    MAX(fraud_score) AS max_fraud_score
  FROM fact_claims_events
  GROUP BY claimant_id
)
SELECT
  fce.claim_id,
  fce.claimant_id,
  fce.coverage_type,
  fce.fraud_score,
  fce.fraud_flag,
  ch.total_claims_filed,
  ch.fraud_flagged_claims,
  CASE WHEN ch.total_claims_filed >= 3 AND ch.fraud_flagged_claims >= 1 THEN 'High-Risk Repeat' 
       WHEN fce.fraud_score > 70 THEN 'High-Risk Single' 
       ELSE 'Standard' END AS risk_category
FROM fact_claims_events fce
JOIN claimant_history ch ON fce.claimant_id = ch.claimant_id
WHERE fce.event_type = 'intake'
ORDER BY fce.fraud_score DESC;

This query enriches each claim with claimant history, enabling pattern detection.

Dashboard 5: Claimant Experience & Outcomes

Purpose: Customer service and executive teams need to understand claimant satisfaction, resolution speed, and fairness of outcomes.

Key Metrics:

Claimant satisfaction score (if available from surveys; otherwise use proxy: settlement amount / claim value).
Average time to first contact (from intake to first adjuster contact).
Time to settlement (from intake to settlement approval).
Appeal rate (% of settled claims that are appealed).
Claimant communication frequency (number of status updates per claim).

Charts & Drilldowns:

Claimant Satisfaction Trend (Line Chart)
- X-axis: Month.
- Y-axis: Average satisfaction score (or proxy metric).
- Target line: Target satisfaction (e.g., 4.0 / 5.0).
- Trend: Is satisfaction improving or declining?
- Drill: Click a month to see satisfaction by coverage type or region.
Time to First Contact (Histogram)
- X-axis: Hours from intake to first adjuster contact (0–4, 4–8, 8–24, 24–48, 48+).
- Y-axis: Count of claims.
- Target line: SLA (e.g., 24 hours).
- Drill: Click a bucket to see those claims and why they delayed.
Time to Settlement by Coverage (Box Plot)
- X-axis: Coverage type (auto, property, workers’ comp, etc.).
- Y-axis: Days from intake to settlement.
- Boxes show: Median, quartiles, outliers.
- Drill: Click a box to see claims in that coverage type.
Appeal Rate (KPI + Trend)
- Big number: % of settled claims that are appealed.
- Trend: Change from last month.
- Breakdown: Pie chart of appeal outcomes (Upheld, Overturned, Pending).
- Drill: Click each outcome to see those appeals.
Communication Frequency (Scatter Plot)
- X-axis: Days to settlement.
- Y-axis: Number of status updates sent to claimant.
- Color: Satisfaction score (if available) or appeal status.
- Drill: Click a point to see that claim’s communication log.

Filters:

Date range (settlement date).
Coverage type.
Region.
Claimant demographics (age, claim value, etc.).
Outcome (settled, denied, appealed).

Underlying Query Pattern:

WITH claim_communication AS (
  SELECT
    claim_id,
    COUNT(*) AS total_events,
    COUNT(CASE WHEN event_type IN ('contact_log', 'status_update') THEN 1 END) AS communication_count,
    MIN(CASE WHEN event_type = 'assignment' THEN event_timestamp END) AS first_contact_date,
    MAX(CASE WHEN event_type = 'settlement_approved' THEN event_timestamp END) AS settlement_date
  FROM fact_claims_events
  GROUP BY claim_id
)
SELECT
  fce.claim_id,
  fce.claimant_id,
  fce.coverage_type,
  fce.region,
  DATEDIFF(hour, MIN(CASE WHEN fce.event_type = 'intake' THEN fce.event_timestamp END), cc.first_contact_date) AS hours_to_first_contact,
  DATEDIFF(day, MIN(CASE WHEN fce.event_type = 'intake' THEN fce.event_timestamp END), cc.settlement_date) AS days_to_settlement,
  cc.communication_count,
  CASE WHEN fce.event_type = 'appeal' THEN 1 ELSE 0 END AS appeal_flag
FROM fact_claims_events fce
JOIN claim_communication cc ON fce.claim_id = cc.claim_id
WHERE fce.event_type IN ('intake', 'settlement_approved', 'appeal')
GROUP BY fce.claim_id, fce.claimant_id, fce.coverage_type, fce.region, cc.first_contact_date, cc.settlement_date, cc.communication_count
ORDER BY fce.claim_id;

This query links intake, contact, and settlement events to compute experience metrics.

Schema Patterns That Scale

Once you’ve built dashboards on a few thousand claims, you’ll hit scale challenges: queries slow down, storage grows, and incremental loads take longer. Here are patterns that survive 10x, 100x, and 1000x growth.

Pattern 1: Event-Based Fact Table with Partitioning

Don’t store claims as rows; store events. Each event is a state change:

CREATE TABLE fact_claims_events (
  claim_id STRING NOT NULL,
  event_id STRING NOT NULL,  -- unique identifier for this event
  event_timestamp TIMESTAMP NOT NULL,
  event_type STRING NOT NULL,  -- intake, assignment, inspection, etc.
  claimant_id STRING NOT NULL,
  policy_id STRING NOT NULL,
  adjuster_id STRING,
  vendor_id STRING,
  coverage_type_id INT,
  region STRING,
  reserve_amount_usd DECIMAL(12, 2),
  incurred_amount_usd DECIMAL(12, 2),
  settlement_amount_usd DECIMAL(12, 2),
  fraud_score INT,
  fraud_flag BOOLEAN,
  data_loaded_at TIMESTAMP NOT NULL
)
PARTITION BY DATE(event_timestamp)
CLUSTER BY claim_id, claimant_id, region
;

Why this works:

Partitioning by date means queries that filter on event_timestamp >= DATE_TRUNC('month', NOW()) only scan recent partitions, not the entire table.
Clustering by claim_id, claimant_id, region means related rows are physically stored together, speeding up joins and aggregations.
Event-based structure means you can replay history (e.g., “show me all claims as they were on 2024-01-15”) without storing snapshots.

At scale (billions of events), this pattern keeps query latency under 3 seconds even without pre-aggregation.

Pattern 2: Incremental Aggregation Tables

For dashboards that need sub-second response, pre-aggregate high-cardinality dimensions:

CREATE TABLE agg_claims_daily (
  event_date DATE NOT NULL,
  coverage_type STRING NOT NULL,
  region STRING NOT NULL,
  adjuster_id STRING,
  new_claims_count INT,
  avg_days_to_assignment DECIMAL(5, 2),
  pct_assigned_within_24h DECIMAL(5, 2),
  total_reserve_usd DECIMAL(15, 2),
  total_incurred_usd DECIMAL(15, 2),
  data_loaded_at TIMESTAMP NOT NULL
)
PARTITION BY event_date
;

Load this table nightly from fact_claims_events using a simple GROUP BY:

INSERT INTO agg_claims_daily
SELECT
  DATE(event_timestamp) AS event_date,
  coverage_type,
  region,
  adjuster_id,
  COUNT(DISTINCT CASE WHEN event_type = 'intake' THEN claim_id END) AS new_claims_count,
  AVG(CASE WHEN event_type = 'assignment' THEN days_to_assignment END) AS avg_days_to_assignment,
  COUNT(CASE WHEN event_type = 'assignment' AND days_to_assignment <= 1 THEN 1 END) / NULLIF(COUNT(CASE WHEN event_type = 'assignment' THEN 1 END), 0) AS pct_assigned_within_24h,
  SUM(CASE WHEN event_type IN ('intake', 'assignment') THEN reserve_amount_usd ELSE 0 END) AS total_reserve_usd,
  SUM(incurred_amount_usd) AS total_incurred_usd,
  NOW() AS data_loaded_at
FROM fact_claims_events
WHERE DATE(event_timestamp) = CURRENT_DATE - 1
GROUP BY 1, 2, 3, 4
;

Dashboard queries then hit agg_claims_daily (much smaller) instead of fact_claims_events. Filters like “coverage_type = ‘auto’ AND region = ‘NSW’” now scan a tiny table and return in milliseconds.

Pattern 3: Slowly Changing Dimensions

Adjusters, claimants, and vendors change attributes over time (adjuster changes team, claimant’s fraud flag updates, vendor’s quality rating changes). Use Type 2 SCD:

CREATE TABLE dim_adjusters (
  adjuster_id STRING NOT NULL,
  adjuster_name STRING,
  team_id STRING,
  team_name STRING,
  region STRING,
  experience_level STRING,
  valid_from DATE NOT NULL,
  valid_to DATE,  -- NULL = currently active
  is_current BOOLEAN
);

When an adjuster changes teams, insert a new row with valid_from = TODAY() and update the old row’s valid_to = YESTERDAY(). This preserves history: you can report “which adjuster handled this claim?” accurately even if they’ve changed teams since.

Pattern 4: Surrogate Keys for Fact Tables

Instead of storing dimension IDs directly in the fact table, use surrogate keys:

CREATE TABLE fact_claims_events (
  claim_event_sk INT,  -- surrogate key
  claim_id STRING,
  event_timestamp TIMESTAMP,
  adjuster_sk INT,  -- FK to dim_adjusters
  vendor_sk INT,    -- FK to dim_vendors
  coverage_type_sk INT,  -- FK to dim_coverage
  ...
);

Surrogate keys are small integers, much faster to join than string IDs. They also decouple the fact table from dimension changes (if a dimension ID format changes, the fact table is unaffected).

Pattern 5: Deduplication & Late-Arriving Facts

Insurance systems often receive duplicate events (a claim intake event arrives twice, 5 minutes apart). Use a deduplication layer:

WITH raw_events AS (
  SELECT * FROM fact_claims_events_staging
),
deduped AS (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY claim_id, event_type, event_timestamp ORDER BY data_loaded_at DESC) AS rn
  FROM raw_events
)
SELECT * FROM deduped WHERE rn = 1;

For late-arriving facts (e.g., a settlement amount arrives 3 days after the settlement event), use a merge strategy:

MERGE INTO fact_claims_events fce
USING fact_claims_events_staging stg
ON fce.claim_id = stg.claim_id AND fce.event_type = stg.event_type
WHEN MATCHED AND stg.settlement_amount_usd IS NOT NULL THEN
  UPDATE SET settlement_amount_usd = stg.settlement_amount_usd
WHEN NOT MATCHED THEN
  INSERT (claim_id, event_type, event_timestamp, ...) VALUES (stg.claim_id, stg.event_type, stg.event_timestamp, ...);

This ensures your fact table is always correct, even if data arrives out of order.

Implementation Roadmap

Building this from scratch takes 8–12 weeks. Here’s a realistic timeline:

Phase 1: Data Foundation (Weeks 1–3)

Extract claims data from your claims management system (usually a relational database: Oracle, SQL Server, or Salesforce).
Load into cloud warehouse (Snowflake, BigQuery, or ClickHouse). Use a managed ETL tool like Fivetran, dbt, or Stitch.
Build the core fact table (fact_claims_events). Start with a simple schema: claim_id, event_timestamp, event_type, reserve_amount, settlement_amount.
Build dimension tables (claimants, adjusters, policies, vendors). Start minimal: just IDs and names. Enrich later.
Test data quality: Run reconciliation queries to ensure fact table totals match source system.

Deliverable: A clean fact table with 6–12 months of historical claims data.

Phase 2: Superset Setup & Dashboard 1 (Weeks 4–6)

Install Superset (open-source, on Kubernetes or a single VM).
Connect to your warehouse (add a database connection in Superset UI).
Build Dashboard 1: Intake & Triage (3–4 days).
- Create charts for intake velocity, time to assignment, coverage breakdown.
- Add filters (date range, coverage type, region).
- Test with live data.
Embed in your claims system (optional, but high-value):
- Use Superset’s REST API to embed dashboards in iframes.
- Configure row-level security (RLS) so adjusters only see their own claims.
Set up scheduled refreshes (e.g., refresh every 4 hours).

Deliverable: Dashboard 1 live, showing real-time intake data. Adjusters and ops teams can see queue depth without leaving their system.

Phase 3: Dashboards 2–5 (Weeks 7–10)

Dashboard 2: Claims Processing Performance (2–3 days).
- Funnel, cycle time distribution, SLA compliance.
Dashboard 3: Reserves & Financial Impact (2–3 days).
- Reserves trend, settlement accuracy, exposure by coverage.
Dashboard 4: Fraud Detection & Risk (2–3 days).
- High-risk claims, fraud score distribution, repeat claimants, vendor risk.
- Integrate fraud scoring model (if you have one) or use simple rules (e.g., fraud_score = settlement_amount / reserve_amount * 100).
Dashboard 5: Claimant Experience & Outcomes (2–3 days).
- Satisfaction proxy, time to first contact, appeal rate.

Deliverable: Five dashboards, each with 5–6 charts, live and interactive.

Phase 4: Optimisation & Scale (Weeks 11–12)

Monitor query performance. Which dashboards are slow? Use Superset’s query logs to identify bottlenecks.
Build aggregation tables (agg_claims_daily, agg_claims_monthly) for slow dashboards.
Implement caching (Superset has built-in caching; configure TTL based on data freshness SLA).
Set up alerting: Create Superset alerts (e.g., “if queue depth > 100, notify ops team”).
Document dashboards: Add descriptions, filter guidance, and data lineage so users understand what they’re looking at.

Deliverable: Dashboards load in <3 seconds. No manual query tuning needed.

Security, Audit & Compliance

Insurance claims data is sensitive: personally identifiable information (claimants), financial data (reserves, settlements), and potentially fraud investigation details. Superset must be secured.

Authentication & Authorisation

Enable SSO (SAML, OAuth, LDAP) so users log in with their corporate identity.
Implement row-level security (RLS): Adjusters only see their own claims; managers see their team’s claims; finance sees all. Use Superset’s RLS filters:

FILTER: claim_id IN (SELECT claim_id FROM claims WHERE adjuster_id = {{ current_user_id }})

Audit access: Enable Superset’s audit logs. Log who viewed which dashboard, when, and what filters they applied. Export logs to your SIEM (Splunk, ELK) for compliance.

Data Encryption

Encrypt in transit: Use TLS 1.2+ for all connections (Superset to warehouse, browser to Superset).
Encrypt at rest: Enable encryption on your warehouse (Snowflake, BigQuery, ClickHouse all support it).
Mask sensitive fields: In Superset, use column masking to hide full claimant names or phone numbers from non-compliance users. Show only “Claimant ID” or “[REDACTED]”.

Audit Readiness

If you’re pursuing SOC 2 or ISO 27001 compliance via Vanta, Superset integrates well:

Audit logs: Superset logs all dashboard views, edits, and SQL queries. Export to your audit trail.
Access controls: Document who has access to what (dashboard, data, database connection). Superset’s RLS and role-based access control (RBAC) provide the evidence.
Change tracking: Use version control (Git) for your Superset dashboard definitions (JSON). Track who changed what, when.

Compliance with Regulatory Frameworks

For Australian insurers, consider:

APRA CPS 234 (Information Security): Superset dashboards should not expose unencrypted PII. Use masking.
ASIC RG 271 (Breach reporting): Ensure audit logs capture access to sensitive claims data. If there’s a breach, you can show regulators who accessed what.
Privacy Act 1988: Claimant data is personal information. Ensure Superset access is logged and justified.

For guidance on AI and compliance in Australian insurance, see PADISO’s AI for Insurance Sydney service, which covers APRA and LIF compliance.

Next Steps

You now have a reference architecture for insurance claims analytics. Here’s how to move forward:

1. Assess Your Current Data

Do you have claims data in a central warehouse (Snowflake, BigQuery, ClickHouse)?
If not, start with a cloud data warehouse. Snowflake and BigQuery are industry-standard for insurance.
Estimate data volume: How many claims per month? How many years of history do you need?

2. Pilot with Dashboard 1

Build the Intake & Triage dashboard first. It’s the simplest and highest-value for operations teams.
Use this to validate your data quality and Superset setup.
If this works, expand to the other dashboards.

3. Embed in Your Claims System

Don’t build Superset as a separate tool. Embed dashboards inside your claims management system (Salesforce, Guidewire, or custom system).
This keeps adjusters in their workflow and dramatically increases adoption.
Use Superset’s REST API and iframe embedding.

4. Automate Data Pipelines

Set up nightly or hourly incremental loads from your source system to the warehouse.
Use dbt or Airflow to orchestrate: extract → transform → load → refresh Superset cache.
Monitor pipeline health (alert if a load fails).

5. Build Fraud Scoring

Dashboard 4 (Fraud Detection) is most valuable when backed by a fraud scoring model.
Start simple: fraud_score = (settlement_amount / reserve_amount) * 100. Flag scores >120% as high-risk.
Evolve to machine learning: train a model on historical fraud flags to predict high-risk claims.

6. Measure Impact

Track SLA compliance before and after (are claims closing faster?).
Track reserve accuracy (are actual settlements closer to reserves?).
Track fraud detection (are high-risk claims being caught earlier?).
Use these metrics to justify continued investment in analytics.

7. Consider Managed Services

If you lack in-house data engineering or Superset expertise, consider a partner. PADISO’s platform development team has built Superset dashboards for insurance, health, and financial services in Sydney and across Australia. We can:

Design and build your data model (fact and dimension tables).
Build and optimise Superset dashboards.
Implement RLS and audit logging for compliance.
Integrate with your existing claims system.
Provide fractional CTO support to guide your data strategy.

For Australian insurers, PADISO’s AI for Insurance Sydney team also covers claims automation, fraud detection models, and APRA compliance.

Conclusion

Apache Superset is a powerful, open-source platform for insurance claims analytics. With the reference architecture in this guide—five pre-built dashboards, a scalable data model, and schema patterns that survive 10x growth—you can build a claims analytics system in 8–12 weeks.

The key to success is:

Start with a clean fact table (events, not snapshots).
Build dashboards incrementally, validating data quality at each step.
Embed in your claims system so users see data in context.
Optimise for scale using partitioning, aggregation tables, and caching.
Secure and audit to meet APRA, ASIC, and privacy requirements.

If you’re building this in Sydney or Australia and need technical leadership, platform engineering, or compliance guidance, PADISO can help. We’ve shipped Superset analytics for insurance, health, and financial services. We also provide fractional CTO services for scale-ups and PE-backed companies modernising their tech stacks.

Start with Dashboard 1. Ship it in 4 weeks. Measure the impact. Then build the rest.

Want to talk through your situation?

Book a 30-minute call with Kevin (Founder/CEO). No pitch - direct advice on what to do next.

Book a 30-min call

Apache Superset for Insurance Claims: A Reference Dashboard Set

Table of Contents

Why Apache Superset for Insurance Claims

The Core Data Model for Claims

The Star Schema Foundation

Incremental Load Strategy

Reference Dashboard Architecture

Dashboard 1: Claims Intake & Triage

Dashboard 2: Claims Processing Performance

Dashboard 3: Reserves & Financial Impact

Dashboard 4: Claims Fraud Detection & Risk

Dashboard 5: Claimant Experience & Outcomes

Schema Patterns That Scale

Pattern 1: Event-Based Fact Table with Partitioning

Pattern 2: Incremental Aggregation Tables

Pattern 3: Slowly Changing Dimensions

Pattern 4: Surrogate Keys for Fact Tables

Pattern 5: Deduplication & Late-Arriving Facts

Implementation Roadmap

Phase 1: Data Foundation (Weeks 1–3)

Phase 2: Superset Setup & Dashboard 1 (Weeks 4–6)

Phase 3: Dashboards 2–5 (Weeks 7–10)

Phase 4: Optimisation & Scale (Weeks 11–12)

Security, Audit & Compliance

Authentication & Authorisation

Data Encryption

Audit Readiness

Compliance with Regulatory Frameworks

Next Steps

1. Assess Your Current Data

2. Pilot with Dashboard 1

3. Embed in Your Claims System

4. Automate Data Pipelines

5. Build Fraud Scoring

6. Measure Impact

7. Consider Managed Services

Conclusion

Want to talk through your situation?