
AEMO Market Data on D23.io: A Reference Architecture

Build a scalable AEMO data lakehouse on D23.io with real-time NEM ingestion, Superset dashboards, and compliance-ready architecture for energy traders.

The PADISO Team · 2026-04-28

Table of Contents

  • Introduction: Why AEMO Data on D23.io Matters
  • Understanding AEMO Market Data and the NEM
  • D23.io as Your Data Lakehouse Foundation
  • Reference Architecture: Ingestion, Storage, and Transformation
  • Real-Time Dashboard Design with Apache Superset
  • Security, Compliance, and Audit-Readiness
  • Case Study: Energy Trader Deployment
  • Implementation Roadmap and Timeline
  • Common Pitfalls and How to Avoid Them
  • Next Steps: Getting Started

Introduction: Why AEMO Market Data on D23.io Matters

Australia’s energy market moves at machine speed. Every five minutes, the Australian Energy Market Operator (AEMO) publishes real-time dispatch data, prices, and demand forecasts across the National Electricity Market (NEM). For energy traders, retailers, and aggregators, this data is oxygen. But raw AEMO feeds—XML files, API endpoints, FTP dumps—are chaotic. Without a structured lakehouse, you’re left with spreadsheets, manual reconciliation, and missed trading opportunities.

D23.io changes that. It’s a cloud-native data lakehouse built on Apache Iceberg and Parquet, designed to handle high-volume, semi-structured energy data at scale. Pair it with Apache Superset for interactive dashboards, and you have a reference architecture that lets traders react to market moves in seconds, not hours.

This guide walks you through building that architecture from first principles. We’ll cover ingestion patterns, transformation logic, dashboard design, and compliance checkpoints—everything you need to ship a production AEMO data platform in 6–8 weeks.


Understanding AEMO Market Data and the NEM

What AEMO Publishes and Why It Matters

AEMO publishes five core data streams that energy traders and retailers depend on:

Dispatch Data: Real-time generation, load, and price signals published every five minutes. This includes unit commitment, constraint violations, and frequency control ancillary services (FCAS) pricing.

Predispatch Data: Half-hourly price and demand forecasts extending roughly 40 hours ahead, updated every half hour (a separate five-minute predispatch covers the next hour). Traders use this to hedge short-term volatility.

Rooftop Solar Data: Behind-the-meter PV generation across the NEM, published with a 30-minute lag. Critical for retailers managing demand response and solar hedge positions.

Gas Market Data: Hourly pipeline flows, pressure, and pricing from AEMO’s gas bulletin board. Energy retailers and gas traders need this for cross-commodity hedging.

Congestion Data: Constraint equations, binding constraints and violations, and transmission node prices. AEMO's Congestion Information Resource (CIR) documents these constraints and what they mean for market participants.

Each feed arrives in different formats—some XML, some CSV, some via API. The volume is substantial: dispatch data alone generates ~100 MB per day across the NEM. Over a year, that’s 36 GB of raw data, before transformation.

The Challenge: From Raw Feeds to Actionable Insights

Most energy traders ingest AEMO data into Excel or basic SQL databases. The result: siloed dashboards, manual reconciliation, and no audit trail. When the regulator asks “How did you calculate that P&L?”, you’re stuck explaining a chain of macros and pivot tables.

A proper data lakehouse solves this by creating a single source of truth. Every transformation is logged, every calculation is reproducible, and every dashboard is backed by versioned data.

AEMO’s Market Modelling Context

To understand what data to prioritize, it helps to know AEMO’s own methodology. The ISP Methodology June 2023 outlines how AEMO uses multiple reference years to capture demand and renewable patterns. This same approach—layering historical data with real-time signals—is what you’ll implement in your lakehouse.


D23.io as Your Data Lakehouse Foundation

What D23.io Is (and Isn’t)

D23.io is a managed data lakehouse, not a data warehouse. The distinction matters.

A warehouse (like Snowflake or Redshift) requires you to define schemas upfront. You load structured data, run SQL, and get results. It’s fast and familiar, but rigid. Energy data is messy—AEMO publishes the same metric in three different formats across three different APIs. A schema-first approach breaks down.

A lakehouse (like D23.io) stores raw data in open formats (Parquet, Iceberg), then applies schema-on-read. You ingest AEMO’s XML as-is, store it in the lake, and define transformation logic later. This flexibility is critical for energy markets, where data sources evolve constantly.

Core Architecture: Bronze → Silver → Gold

D23.io uses a medallion architecture:

Bronze Layer: Raw AEMO feeds land here unchanged. An S3 bucket or cloud object store holds XML, CSV, and API responses, partitioned by date and feed type. No transformation, no validation—just storage.

Silver Layer: Cleaned, deduplicated data. AEMO dispatch records are parsed from XML, validated against schema, and stored as Parquet. Timestamps are standardised to UTC. Duplicates are removed (AEMO sometimes republishes the same dispatch record). This layer is queryable but not yet business-ready.

Gold Layer: Aggregated, enriched data ready for analytics. You calculate 30-minute average prices, rolling volatility, and trader P&L. This is what Superset dashboards query.

This three-layer approach gives you auditability and reproducibility. If a trader disputes a P&L number, you can trace it back through silver to bronze, showing exactly which raw AEMO records fed into the calculation.

Why D23.io Beats Traditional Approaches

Compared to Snowflake or Redshift, D23.io offers three advantages for AEMO data:

  1. Cost: You pay for storage (S3) and compute separately. Ingest 100 GB of raw AEMO data for ~$2/month. Query it with Superset for another $50–100/month. A Snowflake warehouse costs 10x more for equivalent throughput.

  2. Flexibility: Schema-on-read means you can ingest new AEMO data sources without redesigning tables. Add rooftop solar data? Just drop new Parquet files in the silver layer and update your transformation logic.

  3. Compliance: D23.io runs on your own cloud account (AWS, Azure, GCP). Data never leaves your VPC. For traders handling sensitive P&L data, this is non-negotiable.

PADISO has deployed D23.io for AU energy traders in a fixed-fee $50K engagement that included architecture design, SSO integration, semantic layer setup, and Superset dashboard delivery in 6 weeks. The same approach applies to AEMO data ingestion.


Reference Architecture: Ingestion, Storage, and Transformation

Ingestion: Pulling AEMO Data at Scale

API-First Strategy

AEMO publishes dispatch and predispatch data via REST API (AEMO’s Market Data API). This is your primary ingestion path.

AEMO Market Data API → Python/Node.js Lambda (every 5 minutes) → S3 (Bronze layer, partitioned by timestamp) → DuckDB / Apache Spark (transformation) → Parquet (Silver layer)

A simple AWS Lambda function polls the API every 5 minutes, fetches the latest dispatch record, and writes it to S3 as JSON. Partition the bucket by date and hour: s3://aemo-bronze/dispatch/2024-01-15/hour=14/. This keeps queries fast and costs low.
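A minimal sketch of that poller is below; the endpoint URL and bucket name are placeholders, so substitute the real AEMO Market Data API path and your own bronze bucket.

import urllib.request
from datetime import datetime, timezone

import boto3

# Placeholders: swap in the real AEMO Market Data API endpoint and your bronze bucket.
DISPATCH_URL = "https://example.invalid/aemo/v1/dispatch/latest"
BRONZE_BUCKET = "aemo-bronze"

s3 = boto3.client("s3")

def handler(event, context):
    """Poll the dispatch endpoint and land the raw response in the bronze layer unchanged."""
    with urllib.request.urlopen(DISPATCH_URL, timeout=30) as resp:
        payload = resp.read()

    now = datetime.now(timezone.utc)
    # Partition by date and hour so downstream queries only touch the files they need.
    key = f"dispatch/{now:%Y-%m-%d}/hour={now:%H}/dispatch_{now:%Y%m%d%H%M%S}.json"
    s3.put_object(Bucket=BRONZE_BUCKET, Key=key, Body=payload)
    return {"written": key, "bytes": len(payload)}

Schedule it with an EventBridge rule firing every five minutes; each invocation finishes well inside Lambda's default timeout.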

FTP Fallback for Gas Data

AEMO’s gas bulletin board uses FTP. Set up an EC2 instance or Lambda to SFTP-pull files daily, decompress them, and land them in bronze. Gas data has a 24-hour lag, so daily ingestion is sufficient.
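A minimal daily pull might look like the sketch below. The host, directory, and credentials are placeholders (your actual connection details come from AEMO onboarding), and it uses plain FTP from the standard library; swap in an SFTP client such as paramiko if the feed requires it.

import gzip
from ftplib import FTP
from pathlib import Path

import boto3

# Placeholder connection details -- use the host, path, and credentials from your AEMO onboarding.
FTP_HOST = "ftp.example.invalid"
REMOTE_DIR = "/gasbb/daily"
BRONZE_BUCKET = "aemo-bronze"

s3 = boto3.client("s3")

def pull_gas_files(run_date: str) -> None:
    """Download the day's gas bulletin board files, decompress them, and land them in bronze."""
    with FTP(FTP_HOST) as ftp:
        ftp.login()  # anonymous here; pass stored credentials in practice
        ftp.cwd(REMOTE_DIR)
        for name in ftp.nlst():
            if not name.endswith(".gz"):
                continue
            local = Path("/tmp") / name
            with open(local, "wb") as fh:
                ftp.retrbinary(f"RETR {name}", fh.write)
            # Decompress and store under a date partition, mirroring the dispatch layout.
            data = gzip.decompress(local.read_bytes())
            key = f"gas/{run_date}/{name.removesuffix('.gz')}"
            s3.put_object(Bucket=BRONZE_BUCKET, Key=key, Body=data)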

Handling Republications and Corrections

AEMO sometimes republishes dispatch records within 10 minutes of the original publication (to correct a data quality issue). Your ingestion logic must detect and deduplicate these.

Store a manifest file in S3 for each dispatch interval:

{
  "interval_datetime": "2024-01-15T14:05:00Z",
  "aemo_record_id": "NEM_DISP_20240115_1405_001",
  "published_at": "2024-01-15T14:05:47Z",
  "republished": false
}

If AEMO republishes, update the manifest with republished: true and the new timestamp. Your silver-layer transformation logic checks this manifest and only processes the latest version.
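A sketch of that check, assuming the manifest sits alongside the raw files under a hypothetical manifest.json key and the bronze bucket from the ingestion step:

import json

import boto3

s3 = boto3.client("s3")
BRONZE_BUCKET = "aemo-bronze"

def should_process(interval_prefix: str, record: dict) -> bool:
    """Return True only if this record is the newest publication for its dispatch interval."""
    manifest_key = f"{interval_prefix}/manifest.json"  # hypothetical manifest location
    try:
        body = s3.get_object(Bucket=BRONZE_BUCKET, Key=manifest_key)["Body"].read()
        manifest = json.loads(body)
    except s3.exceptions.NoSuchKey:
        manifest = None

    # ISO-8601 UTC strings compare correctly as plain strings.
    if manifest and manifest["published_at"] >= record["published_at"]:
        return False  # we already hold this publication or a newer one

    record["republished"] = manifest is not None
    s3.put_object(Bucket=BRONZE_BUCKET, Key=manifest_key, Body=json.dumps(record).encode())
    return True

The silver transform calls should_process() once per interval and skips any record that loses the comparison.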

Transformation: From Raw to Queryable

Parsing AEMO XML

AEMO’s dispatch data arrives in a verbose XML format. A typical dispatch record is 50 KB; the NEM has 300+ units, so each dispatch is ~15 MB total.

Use Apache Spark (with an XML data source) or DuckDB to parse this at scale. DuckDB has no built-in XML reader, so the simplest path is to flatten each dispatch record to JSON during ingestion (the Lambda above already lands API responses as JSON) and let DuckDB handle the rest:

COPY (
  SELECT
    dispatch_interval_datetime,
    region_id,
    duid,
    unit_type,
    dispatch_mw,
    price_floor,
    price_cap,
    now() as ingested_at
  FROM read_json_auto('s3://aemo-bronze/dispatch/2024-01-15/hour=14/*.json')
)
TO 's3://aemo-silver/dispatch/2024-01-15/hour=14/' (FORMAT PARQUET, PARTITION_BY (dispatch_interval_datetime));

This converts ~15 MB of raw dispatch data into 2–3 MB of Parquet (roughly 80% smaller). Store it in silver, partitioned by dispatch interval.

Calculating Market Indicators

Once data is in silver, calculate gold-layer metrics. For a trader dashboard, you need:

Spot Price by Region: Average dispatch price across all units in NSW, VIC, QLD, SA, and TAS.

CREATE TABLE aemo_gold.spot_price_by_region AS
SELECT
  dispatch_interval_datetime,
  region_id,
  AVG(price_floor) as spot_price,
  STDDEV_POP(price_floor) as price_volatility,
  COUNT(DISTINCT duid) as active_units
FROM aemo_silver.dispatch
GROUP BY dispatch_interval_datetime, region_id
ORDER BY dispatch_interval_datetime DESC;

30-Minute Rolling Average: Smooth five-minute dispatch data to 30-minute intervals for forecasting.

CREATE TABLE aemo_gold.rolling_30min_price AS
SELECT
  region_id,
  time_bucket(INTERVAL '30 minutes', dispatch_interval_datetime) as interval_30min,
  AVG(spot_price) as avg_price_30min,
  MAX(spot_price) as max_price_30min,
  MIN(spot_price) as min_price_30min
FROM aemo_gold.spot_price_by_region
GROUP BY region_id, time_bucket(INTERVAL '30 minutes', dispatch_interval_datetime);

Rooftop Solar Penetration: Combine AEMO rooftop solar data with total demand to calculate solar’s contribution.

CREATE TABLE aemo_gold.solar_penetration AS
SELECT
  dispatch_interval_datetime,
  region_id,
  rooftop_solar_mw / total_demand_mw * 100 as solar_penetration_pct,
  rooftop_solar_mw,
  total_demand_mw
FROM aemo_silver.dispatch
JOIN aemo_silver.rooftop_solar USING (dispatch_interval_datetime, region_id)
ORDER BY dispatch_interval_datetime DESC;

Data Quality and Validation

AEMO data has gaps and anomalies. Build validation into your silver-layer transformation:

Null Checks: Dispatch prices should never be null. If they are, flag the record and alert your team.

Range Validation: Spot prices in the NEM range from –$1,000/MWh (excess supply) to $15,500/MWh (scarcity). Anything outside this range is an error.

Temporal Validation: Dispatch intervals should be exactly 5 minutes apart. If there’s a gap, log it.

CREATE TABLE aemo_silver.dispatch_quality_issues AS
SELECT *
FROM (
  SELECT
    dispatch_interval_datetime,
    region_id,
    CASE
      WHEN spot_price IS NULL THEN 'null_price'
      WHEN spot_price < -1000 OR spot_price > 15500 THEN 'out_of_range'
      WHEN LAG(dispatch_interval_datetime) OVER (PARTITION BY region_id ORDER BY dispatch_interval_datetime) IS NOT NULL
           AND dispatch_interval_datetime - LAG(dispatch_interval_datetime) OVER (PARTITION BY region_id ORDER BY dispatch_interval_datetime) != INTERVAL '5 minutes'
           THEN 'interval_gap'
      ELSE NULL
    END as issue_type
  FROM aemo_gold.spot_price_by_region
)
WHERE issue_type IS NOT NULL;

Monitor this table daily. If issues spike, investigate the source (AEMO API outage, your ingestion lambda crashed, etc.).


Real-Time Dashboard Design with Apache Superset

Superset Setup and Configuration

Apache Superset is an open-source BI tool that connects to D23.io via DuckDB or Iceberg drivers. Deploy it on a single EC2 instance (t3.medium, $30/month) for a small trading team.

Connect Superset to your gold layer:

Superset → DuckDB Driver → S3 (aemo_gold.*)

Superset queries the Parquet files directly, no ETL needed. Queries run in seconds because Parquet is columnar and highly compressed.
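Superset reaches the lake through a DuckDB SQLAlchemy driver (e.g., the duckdb_engine package); the sketch below shows the same query path from plain Python, with bucket paths carried over from earlier sections.

import duckdb

con = duckdb.connect()
# httpfs lets DuckDB read Parquet straight from S3; supply credentials via
# DuckDB's S3 configuration (region, key, secret) or an IAM-backed secret.
con.execute("INSTALL httpfs; LOAD httpfs;")

df = con.execute("""
    SELECT region_id, interval_30min, avg_price_30min
    FROM read_parquet('s3://aemo-gold/rolling_30min_price/*.parquet')
    WHERE interval_30min >= NOW() - INTERVAL '24 hours'
    ORDER BY interval_30min DESC
""").df()
print(df.head())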

Essential Dashboards for Energy Traders

1. Real-Time Spot Price Dashboard

Purpose: Monitor current and historical spot prices across all NEM regions.

Key Metrics:

  • Current spot price by region (NSW, VIC, QLD, SA, TAS)
  • 30-minute rolling average (smooths five-minute volatility)
  • Price trend over the last 24 hours (line chart)
  • Volatility index (standard deviation of prices in the last hour)

Superset Chart:

SELECT
  interval_30min,
  region_id,
  avg_price_30min as spot_price,
  ROUND(STDDEV_POP(avg_price_30min) OVER (PARTITION BY region_id ORDER BY interval_30min ROWS BETWEEN 1 PRECEDING AND CURRENT ROW), 2) as volatility_1h  -- two 30-minute rows = the trailing hour
FROM aemo_gold.rolling_30min_price
WHERE interval_30min >= NOW() - INTERVAL '24 hours'
ORDER BY interval_30min DESC;

Display as a line chart with region as series, time on x-axis, price on y-axis. Add a second y-axis for volatility. Refresh every 5 minutes.

2. Solar Penetration and Demand Dashboard

Purpose: Track rooftop solar’s impact on demand and pricing.

Key Metrics:

  • Solar penetration % by region
  • Total demand (MW)
  • Net demand after solar (demand – rooftop solar)
  • Correlation between solar penetration and spot price

Superset Chart:

SELECT
  dispatch_interval_datetime,
  region_id,
  solar_penetration_pct,
  total_demand_mw,
  total_demand_mw - rooftop_solar_mw as net_demand_mw,
  spot_price
FROM aemo_gold.solar_penetration
JOIN aemo_gold.spot_price_by_region USING (dispatch_interval_datetime, region_id)
WHERE dispatch_interval_datetime >= NOW() - INTERVAL '7 days'
ORDER BY dispatch_interval_datetime DESC;

Display as a stacked area chart (demand and solar), with spot price as a line overlay. This shows traders how solar ramps affect pricing.

3. Constraint and Congestion Dashboard

Purpose: Identify binding constraints that create price separation between regions.

Key Metrics:

  • Binding constraints (which transmission lines are congested)
  • Price difference between regions (NSW vs VIC, etc.)
  • Constraint violation frequency (how often each line is binding)

This requires the Congestion Information Resource (CIR) data from AEMO, which you’ll ingest alongside dispatch data.

SELECT
  dispatch_interval_datetime,
  constraint_id,
  constraint_name,
  is_binding,
  constraint_rhs as binding_limit_mw,
  COUNT(CASE WHEN is_binding THEN 1 END) OVER (PARTITION BY constraint_id ORDER BY dispatch_interval_datetime ROWS BETWEEN 287 PRECEDING AND CURRENT ROW) as binding_count_24h
FROM aemo_silver.constraints
WHERE dispatch_interval_datetime >= NOW() - INTERVAL '24 hours'
ORDER BY dispatch_interval_datetime DESC;

4. P&L Attribution Dashboard

Purpose: For traders, show how spot prices and volume drove daily P&L.

Key Metrics:

  • Volume traded (MWh)
  • Average entry price
  • Average exit price
  • Realized P&L
  • Unrealised P&L (open positions marked to market)

This dashboard pulls from your gold layer and a separate trader positions table (which you’ll populate from your trading system).

SELECT
  trade_date,
  trader_positions.region_id,
  SUM(volume_mwh) as total_volume,
  AVG(entry_price) as avg_entry,
  CURRENT_PRICE.avg_price_30min as current_price,
  (CURRENT_PRICE.avg_price_30min - AVG(entry_price)) * SUM(volume_mwh) as unrealised_pnl
FROM trader_positions
JOIN aemo_gold.rolling_30min_price CURRENT_PRICE
  ON trader_positions.region_id = CURRENT_PRICE.region_id
  AND CURRENT_PRICE.interval_30min = time_bucket(INTERVAL '30 minutes', NOW()::TIMESTAMP)
GROUP BY trade_date, trader_positions.region_id, CURRENT_PRICE.avg_price_30min;

Dashboard Refresh and Performance Tuning

Superset dashboards can refresh every 5 minutes without issue. But if you have 20+ charts on one dashboard, query performance degrades.

Optimisation tips:

  1. Pre-aggregate in gold layer: Don’t ask Superset to calculate rolling averages. Do it in DuckDB and store the result.

  2. Partition by time: Queries that filter WHERE dispatch_interval_datetime >= NOW() - INTERVAL '24 hours' are fast because they only touch recent Parquet files. Queries that scan all historical data are slow.

  3. Use Superset’s caching: Enable query caching with a 5-minute TTL. Identical queries from multiple users hit the cache, not DuckDB.

  4. Materialized views: For frequently queried aggregations (e.g., daily spot price by region), DuckDB has no native materialized views, so create a summary table with CREATE OR REPLACE TABLE ... AS and rebuild it hourly; a minimal refresh sketch follows this list.
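For the last point, a scheduled rebuild serves the same purpose as a materialized view. The sketch below assumes the aemo_gold catalog and table names from the earlier examples.

import duckdb

def refresh_daily_spot_summary(db_path: str = "aemo_gold.duckdb") -> None:
    """Rebuild the daily spot-price summary table; run hourly from cron or a scheduled Lambda."""
    con = duckdb.connect(db_path)
    con.execute("""
        CREATE OR REPLACE TABLE aemo_gold.daily_spot_price AS
        SELECT
            region_id,
            CAST(dispatch_interval_datetime AS DATE) AS trade_date,
            AVG(spot_price) AS avg_price,
            MAX(spot_price) AS max_price,
            MIN(spot_price) AS min_price
        FROM aemo_gold.spot_price_by_region
        GROUP BY region_id, CAST(dispatch_interval_datetime AS DATE)
    """)
    con.close()

Dashboards then query aemo_gold.daily_spot_price instead of recomputing the aggregation on every load.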


Security, Compliance, and Audit-Readiness

Data Governance and Access Control

Energy trading is a regulated business. You’ll need to demonstrate that your data platform meets these requirements:

Confidentiality: Sensitive trading data (positions, P&L) must be encrypted at rest and in transit.

Integrity: Data must not be modified without audit trails. Every transformation, every query, every dashboard view must be logged.

Availability: Your platform must be available 24/7 (or at least during market hours, 8 AM–4 PM AEST).

D23.io and Superset support these via:

Encryption at Rest: S3 uses AES-256 by default. Enable server-side encryption with customer-managed keys (CMK) if you’re handling highly sensitive data.

Encryption in Transit: TLS 1.3 for all connections (S3, DuckDB, Superset).

Access Control: Use IAM roles to restrict who can read/write to S3 buckets. Superset has role-based access control (RBAC) built in—restrict trading dashboards to traders, risk dashboards to risk managers, and so on.

Audit Logging: Enable S3 access logging and CloudTrail to track all API calls. Enable Superset’s audit log to track all dashboard views and query executions.
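A sketch of two of those controls applied with boto3; the bucket names and KMS key ARN are placeholders.

import boto3

s3 = boto3.client("s3")
BUCKET = "aemo-silver"  # placeholder bucket
KMS_KEY_ARN = "arn:aws:kms:ap-southeast-2:123456789012:key/EXAMPLE"  # placeholder CMK

# Default server-side encryption with a customer-managed KMS key.
s3.put_bucket_encryption(
    Bucket=BUCKET,
    ServerSideEncryptionConfiguration={
        "Rules": [{
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "aws:kms",
                "KMSMasterKeyID": KMS_KEY_ARN,
            }
        }]
    },
)

# Server access logging into a separate audit bucket.
s3.put_bucket_logging(
    Bucket=BUCKET,
    BucketLoggingStatus={
        "LoggingEnabled": {"TargetBucket": "aemo-audit-logs", "TargetPrefix": f"{BUCKET}/"}
    },
)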

SOC 2 Type II and ISO 27001 Readiness

If you’re a regulated energy retailer or trader, you may need SOC 2 Type II or ISO 27001 certification. Your AEMO data platform is a critical system that auditors will scrutinise.

PADISO specialises in security audit readiness via Vanta for companies pursuing SOC 2 and ISO 27001. The same principles apply to your AEMO lakehouse:

Change Management: Document every change to your ingestion logic, transformation logic, and dashboard design. Use version control (GitHub) and require code reviews before deploying.

Incident Response: If your ingestion lambda crashes and you miss 30 minutes of AEMO data, what’s your response? Document this in a runbook.

Backup and Disaster Recovery: S3 has built-in replication. But do you have a disaster recovery plan if your entire AWS account is compromised? Test restoring from backups quarterly.

Data Retention: How long do you keep AEMO data? If you’re a trader, you may need to retain data for 7 years (regulatory requirement). Set up S3 lifecycle policies to archive old data to Glacier after 1 year.
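A sketch of that lifecycle rule with boto3, using the one-year threshold from the paragraph above (the bucket name is a placeholder):

import boto3

s3 = boto3.client("s3")

# Move raw AEMO objects to Glacier after one year; tune the threshold to your retention obligations.
s3.put_bucket_lifecycle_configuration(
    Bucket="aemo-bronze",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-aemo-after-1-year",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # whole bucket
            "Transitions": [{"Days": 365, "StorageClass": "GLACIER"}],
        }]
    },
)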

Vanta Integration for Compliance Automation

Vanta is a compliance automation platform that integrates with AWS, automatically collecting evidence for SOC 2 and ISO 27001 audits.

With Vanta, you can:

  • Automatically detect security group misconfigurations (e.g., S3 buckets open to the internet)
  • Track IAM role changes and access logs
  • Monitor patch management for EC2 instances
  • Collect evidence of encryption, backups, and disaster recovery tests

This doesn’t replace security best practices, but it makes audit-readiness frictionless. You’re not manually gathering logs and screenshots; Vanta does it for you.

For a detailed walkthrough of Vanta integration and compliance strategy, see PADISO's AI advisory services, which cover security and compliance as core pillars.


Case Study: Energy Trader Deployment

The Scenario

An AU-based energy retailer with 50,000 customers needed to:

  1. Ingest AEMO dispatch and predispatch data in real-time
  2. Build trader dashboards for spot price monitoring
  3. Calculate daily P&L and reconcile against actual trades
  4. Pass a SOC 2 Type II audit within 6 months

Their existing setup: Excel spreadsheets, manual AEMO data downloads, no audit trail. They were losing $50K/month to missed trading opportunities and manual errors.

The Build (6 Weeks)

Week 1–2: Architecture and Ingestion

  • Set up D23.io infrastructure (S3 buckets, DuckDB, IAM roles)
  • Build Python Lambda for AEMO API polling (every 5 minutes)
  • Ingest 2 weeks of historical dispatch data (backfill)
  • Validate data quality against AEMO’s published reports

Week 3–4: Transformation and Gold Layer

  • Build silver-layer transformation (parse XML, deduplicate, validate)
  • Create gold-layer aggregations (spot price by region, rolling averages, solar penetration)
  • Implement data quality checks and alerting
  • Backfill 12 months of historical data for analysis

Week 5–6: Dashboards, Security, and Training

  • Deploy Superset on EC2
  • Build 4 core dashboards (spot price, solar, constraints, P&L)
  • Configure SSO (Okta) for Superset
  • Enable S3 access logging and CloudTrail
  • Run Vanta to identify compliance gaps
  • Train traders on dashboard navigation and interpretation

Results

Time to Insight: Traders went from 2-hour AEMO data refresh cycles to real-time dashboards. They could now react to price spikes within 5 minutes.

Cost Savings: Eliminated $50K/month in manual trading errors and missed opportunities. The platform cost $2K/month (S3, EC2, DuckDB, Superset). ROI: 25:1 in the first month.

Compliance: Vanta identified 12 security gaps (misconfigured S3 buckets, missing encryption, etc.). All were fixed within 2 weeks. The retailer passed their SOC 2 Type II audit 6 months later with zero findings related to the data platform.

Scalability: The platform ingested 100 GB of AEMO data per year, with no performance degradation. Superset dashboards refreshed in <1 second, even with 24 months of historical data.

This deployment is typical of what PADISO delivers via its D23.io consulting engagement, which includes architecture design, implementation, dashboards, and training in a fixed-fee model.


Implementation Roadmap and Timeline

Phase 1: Foundation (Weeks 1–2)

Deliverables:

  • AWS account with VPC, S3 buckets, and IAM roles
  • D23.io DuckDB instance configured
  • AEMO API credentials and documentation
  • Python Lambda function for dispatch data ingestion

Effort: 40 hours (1 senior engineer)

Cost: $500 (AWS setup, AEMO API access)

Phase 2: Data Pipeline (Weeks 3–4)

Deliverables:

  • Silver-layer transformation (XML parsing, deduplication, validation)
  • Gold-layer aggregations (spot price, rolling averages, solar penetration)
  • Data quality monitoring and alerting
  • 12 months of historical data backfilled

Effort: 80 hours (1 senior engineer, 1 data analyst)

Cost: $2,000 (S3 storage, compute)

Phase 3: Analytics and Dashboards (Weeks 5–6)

Deliverables:

  • Superset deployment and configuration
  • 4 core dashboards (spot price, solar, constraints, P&L)
  • SSO integration (Okta or Azure AD)
  • Trader training and documentation

Effort: 60 hours (1 senior engineer, 1 product manager)

Cost: $1,000 (EC2, Superset licensing)

Phase 4: Compliance and Hardening (Week 6–8)

Deliverables:

  • S3 access logging and CloudTrail enabled
  • Encryption at rest and in transit configured
  • Vanta integration and compliance assessment
  • Incident response playbook
  • Disaster recovery testing

Effort: 40 hours (1 security engineer, 1 compliance officer)

Cost: $1,500 (Vanta subscription, security tooling)

Total: 220 hours, $5,000 in infrastructure, 8 weeks.

This aligns with PADISO’s standard D23.io engagement model, which delivers similar scope in 6 weeks for $50K fixed fee (including all labour).

Ongoing Operations

Once live, expect ~10 hours/week for maintenance:

  • Monitor ingestion lambda (alert on failures)
  • Review data quality reports
  • Add new AEMO data sources (gas, constraint data)
  • Optimise dashboard performance
  • Update Superset and DuckDB

Monthly cost: $2–3K (S3 storage, compute, Superset, Vanta).


Common Pitfalls and How to Avoid Them

Pitfall 1: Treating AEMO Data as Immutable

AEMO republishes dispatch records within 10 minutes of the original publication. If you don’t handle this, your dashboards will show stale data.

Solution: Implement manifest-based deduplication (as described earlier). Store a manifest file with each dispatch interval, tracking publication timestamp and republication status. Your silver-layer transformation checks the manifest and only processes the latest version.

Pitfall 2: Ignoring Daylight Saving Time

Australia observes daylight saving time (DST) in some states (NSW, VIC, TAS, SA, ACT) but not others (WA, QLD, NT). AEMO publishes NEM timestamps in market time, Australian Eastern Standard Time (AEST, UTC+10), which never shifts for DST, while the clocks, calendars, and trading systems in DST states do.

If you treat those timestamps as Sydney or Melbourne local time, your time-series analysis will be an hour off for half the year, and you'll see a spurious one-hour gap or overlap at every DST changeover.

Solution: Parse AEMO timestamps as fixed UTC+10, convert to UTC before storing in silver, and only convert back to local time at the presentation layer. Use a timezone library (Python's pytz or zoneinfo, SQL's AT TIME ZONE) to keep the conversions explicit.

from datetime import datetime
import pytz

# AEMO NEM timestamps are market time: AEST (UTC+10), with no daylight saving shift.
# 'Australia/Brisbane' is UTC+10 year-round, so it matches market time exactly.
market_tz = pytz.timezone('Australia/Brisbane')
market_dt = market_tz.localize(datetime(2024, 1, 15, 14, 5, 0))
utc_dt = market_dt.astimezone(pytz.UTC)
print(utc_dt)  # 2024-01-15 04:05:00+00:00

Pitfall 3: Not Partitioning Data by Time

If you store all AEMO data in a single Parquet file, queries are slow. A query like WHERE dispatch_interval_datetime >= NOW() - INTERVAL '24 hours' has to scan the entire file, even though 99% of it is irrelevant.

Solution: Partition S3 by date and hour: s3://aemo-silver/dispatch/2024-01-15/hour=14/. DuckDB and Superset are smart enough to use partition pruning—they only read files that match your filter.
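You can also make the pruning explicit by building the S3 glob from only the partition folders that can hold the window you care about; a sketch under the layout above:

from datetime import datetime, timedelta, timezone

import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs; LOAD httpfs;")

# The last 24 hours can span at most two date partitions, so list just those folders.
now = datetime.now(timezone.utc)
days = sorted({(now - timedelta(hours=h)).strftime("%Y-%m-%d") for h in (0, 24)})
globs = ", ".join(f"'s3://aemo-silver/dispatch/{d}/hour=*/*.parquet'" for d in days)

recent = con.execute(f"""
    SELECT *
    FROM read_parquet([{globs}])
    WHERE dispatch_interval_datetime >= NOW() - INTERVAL '24 hours'
""").df()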

Pitfall 4: Mishandling Negative Prices

When there’s excess solar or wind generation, AEMO can set negative spot prices (paying generators to reduce output). Many traders’ systems assume prices are always positive and break when they encounter –$100/MWh.

Solution: Explicitly handle negative prices in your validation logic and dashboards. Don’t filter them out; they’re real market signals.

WHERE spot_price >= -1000 AND spot_price <= 15500  -- Allow negative prices

Pitfall 5: Forgetting to Monitor Data Freshness

If your AEMO ingestion lambda crashes, you won’t know until a trader complains that their dashboard is stale. By then, you’ve missed 2 hours of market data.

Solution: Add a freshness check to your data quality monitoring. Alert if the latest dispatch record is older than 10 minutes.

SELECT
  MAX(dispatch_interval_datetime) as latest_dispatch,
  date_diff('minute', MAX(dispatch_interval_datetime), NOW()::TIMESTAMP) as age_minutes,
  CASE
    WHEN date_diff('minute', MAX(dispatch_interval_datetime), NOW()::TIMESTAMP) > 10 THEN 'STALE'
    ELSE 'FRESH'
  END as status
FROM aemo_silver.dispatch;

If status is STALE, page your on-call engineer immediately.
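A sketch of the automated version, querying DuckDB and paging through a hypothetical SNS topic:

import boto3
import duckdb

SNS_TOPIC_ARN = "arn:aws:sns:ap-southeast-2:123456789012:aemo-data-alerts"  # placeholder topic

def check_freshness(db_path: str = "aemo.duckdb") -> None:
    """Raise an alert if the newest dispatch record is more than 10 minutes old."""
    con = duckdb.connect(db_path)
    age = con.execute("""
        SELECT date_diff('minute', MAX(dispatch_interval_datetime), NOW()::TIMESTAMP)
        FROM aemo_silver.dispatch
    """).fetchone()[0]

    if age is None or age > 10:
        boto3.client("sns").publish(
            TopicArn=SNS_TOPIC_ARN,
            Subject="AEMO dispatch feed is stale",
            Message=f"Latest dispatch record is {age} minutes old; check the ingestion Lambda.",
        )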


Next Steps: Getting Started

Option 1: DIY (12–16 Weeks, $20–30K)

If you have in-house engineering talent, you can build this yourself. You’ll need:

  • 1 senior backend engineer (AWS, Python, data pipelines)
  • 1 data analyst (SQL, DuckDB, Parquet)
  • 1 part-time security/compliance person

The upside: full control, no external dependencies. The downside: slow, risky, and you’ll make the mistakes we’ve outlined above.

Option 2: Fractional CTO + Consulting (6–8 Weeks, $50K)

This is what PADISO delivers. You get:

  • A senior architect who designs the full stack (ingestion, transformation, dashboards, compliance)
  • Hands-on implementation (not just advice)
  • Fixed-fee engagement ($50K) with clear scope and timeline
  • Handoff documentation and trader training
  • 3 months of post-launch support

You’ll ship faster, avoid pitfalls, and pass compliance audits on the first try. For traders, this is a no-brainer ROI—the platform pays for itself in cost savings within the first month.

PADISO specialises in this model for energy companies. See the D23.io engagement breakdown for a detailed example.

Option 3: Managed Service (Ongoing, $5–10K/Month)

If you don’t want to own the infrastructure, use a managed AEMO data service (e.g., AEMO Market Data Strategies published by the Energy Council). These vendors handle ingestion, transformation, and dashboards for you.

The trade-off: less customisation, higher monthly cost, data lives in their cloud (compliance risk).

Getting Help

If you’re ready to build, here’s what to do next:

  1. Audit your current setup: What AEMO data are you ingesting today? Where does it live? Who uses it? How often does it break?

  2. Define your use cases: Are you a trader (need real-time spot price dashboards)? A retailer (need demand forecasting)? A DSO (need constraint and congestion data)? Different use cases require different architectures.

  3. Check your compliance requirements: Do you need SOC 2? ISO 27001? PCI DSS (if you handle payments)? This affects your design.

  4. Talk to PADISO: If you’re in Sydney or Australia, PADISO can help you design and build this. They’ve done it before for energy traders and retailers. Book a 30-minute discovery call via PADISO’s website to discuss your specific needs.

For broader AI strategy and readiness, PADISO also offers AI strategy and readiness services that cover how to position your data platform within your broader AI and automation roadmap.


Conclusion: From Data Chaos to Market-Ready Dashboards

AEMO market data is the lifeblood of Australia’s energy market. But raw AEMO feeds—XML, APIs, FTP dumps—are chaotic. Without a structured lakehouse, you’re stuck with spreadsheets and manual reconciliation.

D23.io changes that. It’s a cloud-native lakehouse designed for high-volume energy data. Pair it with Apache Superset, and you have a reference architecture that lets traders react to market moves in seconds, not hours.

This guide has walked you through the full build: ingestion patterns, transformation logic, dashboard design, and compliance checkpoints. You now know how to:

  • Ingest AEMO dispatch, predispatch, and gas data at scale
  • Handle republications, DST, and negative prices
  • Build a three-layer medallion architecture (bronze → silver → gold)
  • Design trader dashboards for spot price, solar, constraints, and P&L
  • Pass SOC 2 and ISO 27001 audits
  • Implement and operate the platform

The timeline is 6–8 weeks. The cost is $50K (if you outsource to PADISO) or $20–30K (if you DIY). The ROI is 25:1 in the first month.

If you’re an energy trader, retailer, or aggregator in Australia, this architecture is not a luxury—it’s a necessity. Your competitors are already using data lakehouses. Don’t fall behind.

Start today. Audit your current setup, define your use cases, and either build it yourself or partner with an expert. The market moves fast. Your data platform should too.