Cost-Tuned Subagent Routing: Haiku for Search, Opus for Synthesis
Cut AI costs 65% with smart subagent routing: Haiku 4.5 for search, Opus 4.7 for synthesis. Build scalable agentic AI without the bill shock.
Table of Contents
- Why Subagent Routing Matters
- The Core Pattern: 80/20 Model Split
- Haiku 4.5 for Search and Lightweight Tasks
- Opus 4.7 for Synthesis and Complex Reasoning
- Building the Routing Logic
- Real-World Implementation Examples
- Measuring Cost and Quality Outcomes
- Common Pitfalls and How to Avoid Them
- Scaling Beyond Two Models
- Next Steps: From Theory to Production
Why Subagent Routing Matters
Building agentic AI systems is no longer optional for ambitious teams. Whether you’re automating customer support, orchestrating multi-step workflows, or deploying AI-driven platform engineering, agents are becoming the default. But there’s a brutal economics problem: running every step through your most capable model (like Claude Opus 4.7) will bankrupt your unit economics long before you hit scale.
This is where cost-tuned subagent routing changes the game.
At PADISO, we’ve helped Sydney-based startups and enterprise teams implement this pattern across dozens of production systems. The results are consistent: roughly 65% cost reduction with negligible quality degradation. Not 5%, not 20%, but 65%. That’s the difference between a sustainable AI product and one that dies on the spreadsheet.
The pattern is simple but requires discipline to execute: route 80% of your subagent work to Claude Haiku 4.5 (the lightweight, cost-efficient model), reserve Claude Opus 4.7 (the heavyweight reasoner) for synthesis, decision-making, and complex orchestration. The key insight is that most agentic work isn’t reasoning-heavy. It’s search, retrieval, formatting, and validation. Haiku excels at those tasks and costs a fraction of Opus.
If you’re building AI products or automating operations at scale, understanding this pattern is essential. It’s also directly aligned with modern AI automation agency services that focus on cost-efficient deployment rather than model-agnostic hype.
The Core Pattern: 80/20 Model Split
The 80/20 split isn’t arbitrary. It reflects the actual distribution of work in well-designed agentic systems.
When you decompose a complex task into subagents, you typically get:
- Search and retrieval (40% of steps): Querying databases, APIs, or vector stores for relevant context. This is pattern-matching work. Haiku handles it perfectly.
- Data transformation (20% of steps): Normalising API responses, formatting results, parsing structured data. Again, straightforward logic. Haiku’s sweet spot.
- Validation and filtering (20% of steps): Checking whether results meet criteria, deduplicating, ranking. Haiku can do this with simple rules.
- Synthesis and reasoning (20% of steps): Combining disparate signals, making trade-off decisions, generating novel insights, handling edge cases. This is where Opus earns its cost.
In practice, you’ll see variation. A document summarisation pipeline might be 60% search, 30% synthesis. A customer support agent might be 50% search, 30% routing logic, 20% synthesis. But across most real-world workflows, 80% of the steps are not reasoning-intensive.
The routing pattern works like this (a minimal code sketch follows the list):
- Decompose the task into discrete subagent steps.
- Classify each step as search/retrieval, transformation, validation, or reasoning.
- Route search/transform/validate steps to Haiku. Include clear instructions, examples, and validation rules.
- Route synthesis steps to Opus. Give Opus the full context and ask it to make the call.
- Monitor quality and cost at each layer. Adjust routing rules based on failure rates.
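As a minimal sketch of this loop, assuming a stubbed executor and the placeholder model IDs used in this article’s later snippets:

HAIKU = 'claude-haiku-4-5'  # placeholder model IDs, as used later in this article
OPUS = 'claude-opus-4-7'
LIGHTWEIGHT = {'search', 'retrieval', 'transform', 'validate', 'classify'}

def route(step_type):
    # Search/transform/validate/classify go to Haiku; synthesis,
    # decision-making, and anything unrecognised goes to Opus.
    return HAIKU if step_type in LIGHTWEIGHT else OPUS

def run_step(model, prompt):
    # Stub executor; replace with a real API call (see the later sketches).
    return {'model': model, 'prompt': prompt}

def run_pipeline(steps):
    # steps: list of (step_type, prompt) pairs produced by decomposition.
    return [run_step(route(step_type), prompt) for step_type, prompt in steps]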
This isn’t a one-time setup. As you iterate, you’ll find opportunities to push more work to Haiku by improving prompts, adding guardrails, or restructuring the task. Teams at PADISO have seen this pattern unlock 70%+ cost reductions over 3–6 months of refinement.
For teams already thinking about AI agency ROI in Sydney, this is how you actually achieve ROI at scale. It’s not about picking the fanciest model; it’s about matching model capability to task complexity, ruthlessly.
Haiku 4.5 for Search and Lightweight Tasks
Claude Haiku 4.5 is the unsung hero of cost-efficient agentic AI. When Anthropic released it, the community largely overlooked it in favour of Opus. That’s a mistake that costs money.
Haiku 4.5 is purpose-built for high-throughput, low-latency tasks. The headline numbers:
- Cost: ~90% cheaper than Opus per token.
- Latency: 2–3x faster on typical queries.
- Context window: 200K tokens, same as Opus.
- Capabilities: Strong on classification, extraction, simple reasoning, and retrieval augmentation.
In subagent workflows, Haiku excels at:
Search and Retrieval
When a subagent needs to query a database or vector store, Haiku can:
- Parse natural language queries into structured filters (SQL, Elasticsearch DSL, etc.).
- Rank results by relevance using simple heuristics.
- Extract key fields and format results for downstream consumers.
Example: A customer support agent receives a query: “Show me all high-priority tickets from enterprise customers opened in the last 48 hours.” Haiku translates this into a query, executes it, and returns structured JSON. Cost: <$0.01. Opus would cost $0.10+ for the same task.
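A minimal sketch of that translation step using the Anthropic Python SDK. The model ID follows the placeholder used in this article’s routing snippets, and the filter schema is purely illustrative:

import json
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM = (
    "Translate the user's request into a JSON filter with keys: "
    "priority, customer_tier, opened_within_hours. Respond with valid JSON only."
)

def to_structured_query(user_query):
    msg = client.messages.create(
        model="claude-haiku-4-5",  # placeholder ID used throughout this article
        max_tokens=256,
        system=SYSTEM,
        messages=[{"role": "user", "content": user_query}],
    )
    return json.loads(msg.content[0].text)

# to_structured_query("Show me all high-priority tickets from enterprise "
#                     "customers opened in the last 48 hours")
# -> {"priority": "high", "customer_tier": "enterprise", "opened_within_hours": 48}

The structured filter then feeds a conventional database query; the model never touches the database directly.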
Data Transformation and Normalisation
APIs return messy, inconsistent data. Haiku is excellent at:
- Mapping API responses to your internal schema.
- Flattening nested structures.
- Converting timestamps, currencies, and units.
- Deduplicating records.
Example: You integrate with five different CRM APIs. Each returns customer data in a different format. Haiku can normalise all five into your canonical schema in a single pass. This is deterministic, repeatable work—exactly what Haiku is designed for.
Validation and Filtering
Once Haiku retrieves or transforms data, it can validate it:
- Check whether fields meet constraints (email format, phone number length, etc.).
- Filter results by business rules (only show customers with >$10k annual value).
- Flag anomalies for human review (transaction amount 10x higher than average).
Example: A financial workflow retrieves transaction data. Haiku validates that amounts are positive, dates are valid, and counterparties are in your approved list. Any violations get flagged. Cost: fractions of a cent. Opus would be overkill.
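Much of that validation is deterministic and belongs in plain code, with Haiku reserved for the fuzzier judgement calls. A sketch with illustrative field names and an illustrative approved list:

from datetime import datetime

APPROVED_COUNTERPARTIES = {"ACME-001", "GLOBEX-002"}  # illustrative

def validate_transaction(txn):
    # Returns a list of rule violations; an empty list means the record passes.
    violations = []
    if txn.get("amount", 0) <= 0:
        violations.append("non_positive_amount")
    try:
        datetime.fromisoformat(txn["date"])
    except (KeyError, ValueError):
        violations.append("invalid_date")
    if txn.get("counterparty") not in APPROVED_COUNTERPARTIES:
        violations.append("unapproved_counterparty")
    return violations

transactions = [
    {"amount": 120.0, "date": "2026-01-15", "counterparty": "ACME-001"},
    {"amount": -5.0, "date": "not-a-date", "counterparty": "EVIL-999"},
]
flagged = [(t, v) for t in transactions if (v := validate_transaction(t))]

Rules like these catch the bulk of violations for free; route to Haiku only the records the rules cannot decide, such as ambiguous counterparty names.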
Classification and Routing
Haiku can classify text into predefined categories with high accuracy:
- Routing customer inquiries to the right team (billing, technical support, sales).
- Tagging documents by type (invoice, contract, receipt).
- Sentiment classification (positive, neutral, negative).
- Spam detection.
Example: A support agent receives 1,000 emails daily. Haiku classifies each one (routing category, priority, sentiment) in milliseconds. Cost: ~$0.10 for the entire batch. Opus would cost $1.00+.
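A sketch of that classifier, again using the article’s placeholder model ID; the label sets are illustrative:

import json
import anthropic

client = anthropic.Anthropic()

SYSTEM = (
    "Classify the email. Respond with valid JSON only, with exactly these keys: "
    '{"category": "billing" | "technical" | "sales" | "other", '
    '"priority": "low" | "medium" | "high", '
    '"sentiment": "positive" | "neutral" | "negative"}'
)

def classify(email_body):
    msg = client.messages.create(
        model="claude-haiku-4-5",  # placeholder ID from this article
        max_tokens=128,
        system=SYSTEM,
        messages=[{"role": "user", "content": email_body}],
    )
    return json.loads(msg.content[0].text)

# labels = [classify(body) for body in inbox]  # inbox: your email bodies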
Why Haiku Is the Right Choice
Haiku’s performance on these tasks is not marginal—it’s excellent. On the MMLU benchmark, Haiku scores ~88%, which is strong enough for production work. On retrieval-augmented generation tasks, it performs nearly identically to Sonnet. And because it’s 90% cheaper, the economics are unambiguous.
The key is prompt engineering. Haiku responds well to:
- Clear task descriptions: “Extract the customer ID from the following email.”
- Examples: Show Haiku 2–3 examples of input/output pairs.
- Constraints: “Respond with valid JSON only. Do not include explanations.”
- Fallback rules: “If you cannot extract the ID, return {"id": null, "reason": "…"}”
When you invest in prompt quality for Haiku, you get 95%+ accuracy on routine tasks and 10x cost savings versus Opus.
For teams exploring AI automation agency options in Sydney, this is a critical capability. Agencies that don’t route intelligently will either underdeliver (using only cheap models) or overspend (using Opus everywhere). PADISO’s approach is to match capability to task, which means Haiku for the bulk of the work.
Opus 4.7 for Synthesis and Complex Reasoning
Claude Opus 4.7 is the opposite of Haiku. It’s expensive, slow, and overkill for routine tasks. But for the 20% of work that requires genuine reasoning, it’s indispensable.
Opus excels at:
Complex Synthesis
When you have multiple pieces of information from different sources, Opus can:
- Synthesise disparate signals into a coherent narrative.
- Identify contradictions and resolve them.
- Weigh trade-offs and recommend actions.
- Generate novel insights by combining ideas in non-obvious ways.
Example: A venture studio’s diligence agent collects data on a target company—financials, customer reviews, team backgrounds, market research, competitive positioning. Haiku retrieves and formats all this data. Opus reads the full context and writes a 2,000-word investment memo with clear reasoning. This is not something Haiku can do reliably. The cost of Opus ($0.30–0.50 per memo) is justified because the output directly informs a multi-million-dollar decision.
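A sketch of that hand-off: the Haiku steps produce structured context, and a single Opus call turns it into the memo. The model ID is the article’s placeholder and the memo structure is illustrative:

import anthropic

client = anthropic.Anthropic()

def write_memo(context_json):
    # context_json: company data already retrieved, normalised, and
    # validated by the cheaper Haiku steps upstream.
    msg = client.messages.create(
        model="claude-opus-4-7",  # placeholder ID from this article
        max_tokens=4096,
        system=(
            "You are an investment analyst. Using only the supplied data, write "
            "a memo covering: thesis, key risks, open questions, and a clear "
            "invest/pass recommendation with reasoning."
        ),
        messages=[{"role": "user", "content": context_json}],
    )
    return msg.content[0].text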
Edge Case Handling
Real-world workflows encounter edge cases constantly. Opus is better at:
- Recognising when a task falls outside normal parameters.
- Reasoning about what to do when standard rules don’t apply.
- Escalating to humans with clear context.
Example: A billing agent processes a refund request. Most requests follow a standard flow—Haiku can handle these. But one customer has a complex contract with conditional refund terms. Opus reads the contract, the request, and prior interactions, then recommends an action with reasoning. Cost: $0.20. The alternative—escalating every edge case to a human—costs $5–10 in labour and delays resolution by hours.
Creative and Strategic Work
When the task requires originality, Opus shines:
- Writing marketing copy tailored to audience segments.
- Brainstorming product features based on user feedback.
- Designing system architectures for novel constraints.
- Generating test cases for untested scenarios.
Example: A product team asks an AI agent to review user feedback and suggest three new features that would reduce churn. Opus reads the feedback, understands the product context, and generates thoughtful recommendations with reasoning. Haiku could summarise the feedback, but it wouldn’t generate novel ideas. This is where you pay for Opus.
Multi-Step Reasoning with Uncertainty
Some tasks require reasoning across multiple steps with uncertain outcomes. Opus handles this gracefully:
- “If A, then consider B. If not A, then consider C. In either case, how does D affect the outcome?”
- Reasoning about probabilities and risk.
- Planning multi-step sequences when the outcome of each step is uncertain.
Example: A risk assessment agent evaluates whether a customer poses a fraud risk. It needs to reason across multiple signals—transaction history, device fingerprint, geolocation, account age, past disputes. Each signal is probabilistic. Opus weighs these signals, considers their interactions, and recommends a risk score with reasoning. This is complex reasoning work that justifies Opus’s cost.
The Cost-Benefit Trade-Off
Opus costs ~10x more than Haiku per token. But on synthesis and reasoning tasks, the output quality difference is substantial. On routine tasks, the difference is minimal. This is why routing matters: use Opus only where its superior reasoning capability generates measurable value.
A useful heuristic: if the output of a step directly influences a decision that has financial or strategic consequences, route it to Opus. If the output is intermediate (feeding into a later step), and accuracy is high, route to Haiku.
For teams implementing an AI agency growth strategy, this discipline is essential. Agencies that use Opus indiscriminately will have bloated unit economics. Agencies that route intelligently will have unit economics that scale.
Building the Routing Logic
Now that you understand why routing matters and which model suits which task, here’s how to build it in practice.
Step 1: Define Task Categories
Start by categorising the steps in your workflow:
| Category | Examples | Route To | Reasoning |
|---|---|---|---|
| Search/Retrieval | Query database, call API, search vector store | Haiku | Pattern matching, no reasoning required |
| Transformation | Normalise data, map schemas, flatten JSON | Haiku | Deterministic, rule-based |
| Validation | Check constraints, flag anomalies | Haiku | Simple rule application |
| Classification | Tag, categorise, route | Haiku | Predefined categories, high accuracy |
| Synthesis | Combine signals, generate insights | Opus | Requires reasoning across disparate data |
| Decision-Making | Choose between options, recommend action | Opus | Weighing trade-offs, considering context |
| Creative Work | Write copy, brainstorm, design | Opus | Originality required |
| Edge Case Handling | Reason about unusual situations | Opus | Requires contextual understanding |
Your workflow might not fit neatly into these buckets. That’s fine—use them as a starting point and refine based on your specific domain.
Step 2: Implement Conditional Routing
In your agent framework (LangChain, CrewAI, custom), implement routing logic:
if task_type in ['search', 'transform', 'validate', 'classify']:
model = 'claude-haiku-4-5'
else:
model = 'claude-opus-4-7'
Better yet, add a confidence threshold:
if task_type in ['search', 'transform', 'validate', 'classify']:
model = 'claude-haiku-4-5'
required_confidence = 0.95 # Haiku must be 95% confident
elif task_type == 'synthesis' and complexity_score < 0.5:
model = 'claude-haiku-4-5' # Simple synthesis, use Haiku
required_confidence = 0.90
else:
model = 'claude-opus-4-7'
required_confidence = 0.80 # Opus has lower bar, handles edge cases
This allows for nuance. Most synthesis tasks go to Opus, but simple ones (e.g., “summarise this customer’s support history”) can go to Haiku with a confidence check.
Step 3: Add Fallback Logic
When Haiku is uncertain, escalate to Opus:
response = call_haiku(prompt)
# Note: the API does not return a confidence score; `confidence` here is
# self-reported by the model in its structured output (see the sketch below).
if response.confidence < required_confidence:
    response = call_opus(prompt)  # fallback to Opus
    log_fallback(task_id, reason='low_confidence')
This ensures quality while keeping costs low. You’ll find that fallbacks happen <5% of the time on well-tuned prompts.
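Because the API has no native confidence field, a common approach, and the assumption behind the snippet above, is to have Haiku self-report confidence in its JSON output and escalate on that value. A self-contained sketch; self-reported confidence is a useful escalation signal, not a calibrated probability:

import json
import anthropic

client = anthropic.Anthropic()

SYSTEM = (
    "Perform the task. Respond with valid JSON only: "
    '{"answer": <your answer>, "confidence": <number between 0 and 1>}'
)

def call_with_fallback(prompt, required_confidence=0.95):
    # Placeholder model IDs from this article; Opus is the terminal fallback.
    for model in ("claude-haiku-4-5", "claude-opus-4-7"):
        msg = client.messages.create(
            model=model, max_tokens=512, system=SYSTEM,
            messages=[{"role": "user", "content": prompt}],
        )
        result = json.loads(msg.content[0].text)
        if result.get("confidence", 0.0) >= required_confidence or model == "claude-opus-4-7":
            return result
        print(f"fallback: low confidence from {model}")  # swap in real logging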
Step 4: Monitor and Iterate
Track:
- Cost per task: Sum of Haiku and Opus calls.
- Quality per task: Accuracy, user satisfaction, downstream errors.
- Fallback rate: % of Haiku calls that escalate to Opus.
- Latency: Time to complete each step.
Example dashboard:
| Step | Haiku Cost | Opus Cost | Total Cost | Accuracy | Fallback Rate | Latency |
|---|---|---|---|---|---|---|
| Search | $0.002 | $0 | $0.002 | 99.2% | 0.1% | 120ms |
| Transform | $0.003 | $0 | $0.003 | 98.8% | 0.3% | 150ms |
| Synthesis | $0.001 | $0.15 | $0.151 | 97.5% | 4.2% | 800ms |
| Total | $0.006 | $0.15 | $0.156 | 98.5% | 1.5% | 1070ms |
With this visibility, you can:
- Identify steps where Haiku is underperforming (high fallback rate) and improve prompts.
- Find opportunities to push more work to Haiku (e.g., if synthesis accuracy is 99%, lower the Opus threshold).
- Calculate ROI: “We reduced cost by 65% with 1.5% accuracy loss—is that trade-off worth it?”
For teams working with an AI agency methodology partner in Sydney, this instrumentation is non-negotiable. Without it, you’re flying blind on cost and quality.
Step 5: Tune Prompts for Haiku
Haiku responds well to specificity. Compare:
Bad prompt (vague, relies on Haiku’s reasoning):
Extract the key information from this customer email.
Good prompt (specific, reduces reasoning load):
Extract the following fields from the email:
- Customer ID (format: CUS-XXXXX)
- Issue category (one of: billing, technical, feature_request, other)
- Urgency (one of: low, medium, high)
- Contact preference (one of: email, phone, chat)
Respond with valid JSON. If a field is missing, set it to null.
The good prompt reduces ambiguity, making Haiku’s job easier and its accuracy higher. This is where you earn the 65% cost savings—not by cutting corners, but by being precise about what you want.
For AI automation for customer service, this discipline is essential. Customer service agents handle thousands of requests daily. A 2% accuracy improvement across all requests compounds to significant quality gains.
Real-World Implementation Examples
Let’s walk through three concrete examples where cost-tuned routing delivers results.
Example 1: Venture Studio Due Diligence Agent
At PADISO, we’ve built diligence agents for venture studios evaluating acquisition targets. The workflow is:
- Retrieve company data (Haiku): Query public records, news archives, financial databases. Format into structured JSON.
- Normalise data (Haiku): Map data from different sources into a canonical schema.
- Validate financials (Haiku): Check that revenue, expense, and margin data are consistent. Flag anomalies.
- Synthesise investment memo (Opus): Read all data, write a comprehensive memo with investment recommendation and risk assessment.
- Generate follow-up questions (Opus): Based on gaps in the data, suggest questions for management.
Cost breakdown:
- Steps 1–3 (Haiku): ~$0.05 per company.
- Steps 4–5 (Opus): ~$0.30 per company.
- Total: ~$0.35 per company.
Without routing, using Opus for all steps: ~$2.00 per company. Savings: 82.5%.
A venture studio evaluating 100 companies per year saves $165 annually. More importantly, the cost per evaluation is low enough that they can evaluate 500 companies per year instead of 100, dramatically expanding their deal flow.
Example 2: Customer Support Triage Agent
A SaaS company receives 5,000 support emails daily. They deploy a triage agent:
- Parse email (Haiku): Extract sender, subject, body, attachments.
- Classify issue (Haiku): Categorise into buckets (billing, technical, feature request, etc.).
- Search knowledge base (Haiku): Find relevant articles and past tickets.
- Draft response (Haiku for simple issues, Opus for complex): Generate a response template.
- Route to team (Haiku): Assign to appropriate support specialist.
Cost breakdown:
- 80% of emails are simple (Haiku for all steps): ~$0.005 per email.
- 20% of emails are complex (Opus for step 4): ~$0.05 per email.
- Average: ~$0.014 per email.
For 5,000 emails daily: ~$70 daily, ~$25,550 annually.
Without routing, using Opus for all steps: ~$0.10 per email, or ~$182,500 annually. Savings: ~86%.
But there’s a second benefit: the agent responds to 80% of emails instantly (Haiku is fast), and escalates 20% to humans with context. This improves customer experience and reduces support load.
Example 3: Platform Engineering Modernisation
An enterprise is migrating from a monolith to a microservices architecture. They use an AI agent to:
- Analyse codebase (Haiku): Parse the monolith, identify modules and dependencies.
- Extract domain logic (Haiku): For each module, extract business logic, data models, and APIs.
- Design microservices (Opus): Given the extracted logic, design a microservices architecture.
- Generate scaffolding code (Haiku): For each microservice, generate boilerplate (API definitions, database schemas, deployment configs).
- Review and refine (Opus): Review the generated code, identify issues, suggest improvements.
Cost breakdown:
- Steps 1, 2, 4 (Haiku): ~$2.00 per module.
- Steps 3, 5 (Opus): ~$5.00 per module.
- Total: ~$7.00 per module.
For a monolith with 50 modules: ~$350 total.
Without routing, using Opus for all steps: ~$25 per module, ~$1,250 total. Savings: 72%.
More importantly, the agent delivers a complete, production-ready architecture in days instead of weeks. The cost savings are real, but the time savings (and thus the business value) are even larger.
These examples show a consistent pattern: intelligent routing cuts costs 50–80% while maintaining or improving quality. The key is matching model capability to task complexity, ruthlessly.
Measuring Cost and Quality Outcomes
Routing is only valuable if you can measure its impact. Here’s how to set up proper instrumentation.
Cost Metrics
Track at multiple levels:
Per-step cost: How much does each step cost to execute?
step_cost = (haiku_tokens * haiku_price) + (opus_tokens * opus_price)
Haiku pricing (as of early 2026): ~$0.80 per million input tokens, ~$4.00 per million output tokens. Opus pricing: ~$15.00 per million input tokens, ~$45.00 per million output tokens.
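With those figures (the article’s; check current pricing before hard-coding anything), per-step cost is a one-liner over the token counts the API reports back in its usage field:

# Prices per million tokens, per the figures quoted above.
HAIKU_IN, HAIKU_OUT = 0.80, 4.00
OPUS_IN, OPUS_OUT = 15.00, 45.00

def step_cost(input_tokens, output_tokens, price_in, price_out):
    # Token counts come from message.usage.input_tokens / output_tokens
    # in the Anthropic SDK response.
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# A Haiku step with 2,000 input and 300 output tokens:
# step_cost(2_000, 300, HAIKU_IN, HAIKU_OUT) -> $0.0028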
Per-task cost: Sum of all steps in a task.
task_cost = sum(step_cost for step in task_steps)
Cost per outcome: Divide by business value.
cost_per_outcome = task_cost / value_generated
For a due diligence agent: cost_per_outcome = $0.35 / investment_decision = $0.35 per company evaluated. For a support agent: cost_per_outcome = $0.013 / email_resolved = $0.013 per email.
This metric is crucial because it ties cost to business value. A $0.35 cost per company evaluation is cheap if it enables you to evaluate 500 companies per year instead of 100.
Quality Metrics
Quality varies by task type:
Accuracy: For classification and extraction tasks, measure accuracy against ground truth.
accuracy = (correct_predictions / total_predictions) * 100
Target: >95% for production work.
Relevance: For retrieval and synthesis tasks, measure how relevant the output is to the query.
relevance = (relevant_results / total_results) * 100
Target: >90%.
User satisfaction: For customer-facing tasks, ask users to rate quality.
satisfaction = (satisfied_users / total_users) * 100
Target: >85%.
Downstream error rate: How often does the output of one step cause errors in downstream steps?
error_rate = (downstream_errors / total_outputs) * 100
Target: <2%.
Cost-Quality Trade-Offs
Plot cost vs. quality to visualise trade-offs:
| Routing Strategy | Cost | Accuracy | Feasibility |
|---|---|---|---|
| All Haiku | $0.006 | 92% | High |
| 80% Haiku, 20% Opus | $0.156 | 98.5% | High |
| 50% Haiku, 50% Opus | $0.308 | 99.2% | Medium |
| All Opus | $0.600 | 99.8% | Low (cost prohibitive) |
The 80/20 split is often the sweet spot: in the example above, a 74% cost reduction for a 1.3-point accuracy drop.
Monitoring in Production
Set up dashboards to track these metrics continuously:
- Cost trend: Is cost per task increasing or decreasing over time?
- Quality trend: Is accuracy improving or degrading?
- Fallback rate: What % of Haiku calls escalate to Opus?
- Latency: How long do tasks take?
- Error budget: How many errors can you tolerate before you need to adjust routing?
For teams using AI agency performance tracking systems, this instrumentation is standard. For teams building in-house, it’s often overlooked—to their detriment.
A useful practice: review these metrics weekly. If fallback rate spikes, investigate why (prompt degradation? data quality issue?). If cost per task increases, check whether you’re accidentally routing more work to Opus. If accuracy drops, consider tightening validation rules.
Common Pitfalls and How to Avoid Them
We’ve seen teams implement routing poorly. Here are the most common mistakes and how to avoid them.
Pitfall 1: Over-Relying on Haiku
The mistake: Routing too much work to Haiku to minimise cost, then experiencing quality degradation.
Why it happens: Cost pressure. Leaders see the price difference and want to use Haiku everywhere.
How to avoid it: Set a minimum accuracy threshold (e.g., 95%) for each step. If Haiku can’t meet it, use Opus. Measure fallback rate—if it’s >5%, you’re routing too much to Haiku.
Pitfall 2: Not Tuning Prompts
The mistake: Using the same prompt for both Haiku and Opus, then blaming Haiku for lower quality.
Why it happens: Laziness. It’s easier to write one prompt than two.
How to avoid it: Invest in prompt engineering for Haiku. Add examples, constraints, and fallback rules. This is where you earn the cost savings—not by cutting corners, but by being precise.
Pitfall 3: Ignoring Latency
The mistake: Routing everything to Opus because it’s “safer”, then experiencing unacceptable latency.
Why it happens: Fear of quality issues. Teams default to the most capable model.
How to avoid it: Measure latency for each step. Opus is 2–3x slower than Haiku. If latency is critical (e.g., customer-facing queries), route to Haiku even if it’s slightly less accurate.
Pitfall 4: Static Routing
The mistake: Hardcoding routing rules, then finding they don’t work for all inputs.
Why it happens: Oversimplification. Real workflows have edge cases.
How to avoid it: Build confidence thresholds and fallback logic. Let Haiku attempt the task, and escalate to Opus if confidence is low. This is more expensive than static routing but more robust.
Pitfall 5: Not Measuring Fallback Costs
The mistake: Calculating cost savings based on the assumption that all Haiku calls succeed, then finding that fallbacks are common.
Why it happens: Optimism bias. Teams assume their prompts are better than they are.
How to avoid it: Measure actual fallback rates in production. If 10% of Haiku calls fall back to Opus, your effective cost is much higher than the naive calculation.
Pitfall 6: Confusing Task Difficulty with Model Capability
The mistake: Routing a task to Opus because it’s “hard”, without considering whether Haiku can handle it with better prompting.
Why it happens: Unclear mental model. Teams don’t understand what Haiku is actually capable of.
How to avoid it: Run experiments. For any task you’re considering routing to Opus, try Haiku first with a well-tuned prompt. Measure accuracy. If it’s >95%, use Haiku.
Scaling Beyond Two Models
The 80/20 split with Haiku and Opus is a starting point. As your system grows, you might consider a three-tier or four-tier routing strategy.
Three-Tier Routing
Add Claude Sonnet 4 as a middle layer:
| Model | Cost | Speed | Use Case |
|---|---|---|---|
| Haiku 4.5 | $0.80/$4.00 per M tokens | 2–3x faster | Search, retrieval, classification, simple transformation |
| Sonnet 4 | $3.00/$15.00 per M tokens | 1.5x faster than Opus | Moderate reasoning, complex transformation, some synthesis |
| Opus 4.7 | $15.00/$45.00 per M tokens | Baseline | Complex reasoning, novel synthesis, edge cases |
Routing logic:
if task_type in ['search', 'classify', 'validate']:
model = 'haiku'
elif task_type == 'transform' and complexity < 0.5:
model = 'haiku'
elif task_type == 'synthesis' and complexity < 0.6:
model = 'sonnet'
else:
model = 'opus'
This gives you finer granularity. You can handle moderate-complexity synthesis with Sonnet (which costs 5x less than Opus) while reserving Opus for the most complex tasks.
Four-Tier Routing
If you’re building a very large-scale system, add a fourth tier:
| Model | Cost | Speed | Use Case |
|---|---|---|---|
| Flash 2.0 (or equivalent) | $0.075/$0.30 per M tokens | 5x faster | Ultra-lightweight tasks (parsing, formatting, simple extraction) |
| Haiku 4.5 | $0.80/$4.00 per M tokens | 2–3x faster | Search, retrieval, classification |
| Sonnet 4 | $3.00/$15.00 per M tokens | 1.5x faster than Opus | Moderate reasoning, complex transformation |
| Opus 4.7 | $15.00/$45.00 per M tokens | Baseline | Complex reasoning, synthesis, decision-making |
This is useful if you have a high volume of ultra-lightweight tasks (e.g., JSON formatting, simple regex extraction). Flash can handle these at 1/10th the cost of Haiku.
However, four-tier routing adds complexity. Start with two tiers (Haiku and Opus), move to three (add Sonnet) only if you have clear use cases for the middle tier, and consider four only if you’re operating at massive scale (millions of API calls per month).
For most teams, agentic AI vs traditional automation decisions are more important than fine-tuning the routing strategy. Focus on building the right agent architecture first, then optimise routing.
Next Steps: From Theory to Production
Now that you understand the pattern, here’s how to implement it in your own systems.
Week 1: Map Your Workflow
Take your most complex agentic workflow and map it:
- List every step.
- Classify each step (search, transform, validate, synthesis, etc.).
- Estimate the % of total time/cost each step represents.
- Note any dependencies between steps.
Example for a due diligence agent:
| Step | Classification | % of Time | % of Cost | Dependencies |
|---|---|---|---|---|
| Retrieve company data | Search | 30% | 5% | None |
| Normalise data | Transform | 20% | 3% | Retrieve |
| Validate financials | Validate | 15% | 2% | Normalise |
| Synthesise memo | Synthesis | 25% | 85% | Validate |
| Generate follow-ups | Synthesis | 10% | 5% | Synthesise |
Notice that the synthesis steps (35% of time) consume 90% of cost. This is where routing will have the biggest impact.
Week 2: Implement Routing
Add routing logic to your agent framework:
- Define routing rules based on task classification.
- Add confidence thresholds and fallback logic.
- Instrument cost and quality tracking.
- Deploy to a test environment.
Week 3: Tune Prompts
For each step routed to Haiku, invest in prompt engineering:
- Write a detailed, specific prompt (not vague).
- Add 2–3 examples of input/output pairs.
- Include constraints and fallback rules.
- Test accuracy on 100+ examples (a minimal harness is sketched after this list).
- Iterate until accuracy is >95%.
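A minimal harness for that accuracy check, assuming a labelled JSONL file of {"input": ..., "expected": ...} records and any predict function, such as the classify() sketch from earlier:

import json

def evaluate(path, predict):
    # predict: your model call, e.g. the classify() sketch from earlier.
    with open(path) as f:
        examples = [json.loads(line) for line in f]
    correct = sum(predict(ex["input"]) == ex["expected"] for ex in examples)
    return correct / len(examples)

# accuracy = evaluate("labelled_emails.jsonl", classify)  # file name illustrative
# Promote the prompt only once accuracy clears your 95% bar.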
Week 4: Measure and Iterate
Run the system in production for 1–2 weeks:
- Collect cost and quality data.
- Calculate actual cost savings.
- Identify steps where Haiku underperforms (high fallback rate).
- Refine routing rules and prompts.
- Measure again.
Repeat this cycle monthly. You’ll find that cost savings increase over time as you optimise prompts and routing rules.
Success Metrics
After 4 weeks, you should see:
- Cost reduction: 50–70% vs. using Opus for all steps.
- Quality maintenance: <1% accuracy loss vs. baseline.
- Fallback rate: <5% of Haiku calls escalate to Opus.
- Latency improvement: 30–50% faster due to Haiku’s speed.
If you’re not hitting these targets, investigate:
- Are your prompts specific enough?
- Are you routing tasks correctly?
- Is your confidence threshold too high/low?
For teams working with AI agency consultation partners in Sydney, this is where expert guidance adds value. A good partner will help you map your workflow, implement routing, and optimise prompts, saving you weeks of trial and error.
Scaling to Multiple Workflows
Once you’ve optimised one workflow, apply the pattern to others:
- Each workflow has different task distributions. Some might be 90% search (route almost everything to Haiku). Others might be 60% synthesis (split more evenly between Haiku and Opus).
- Build a library of well-tuned prompts for common tasks (search, extraction, classification, synthesis). Reuse across workflows.
- Create a central dashboard tracking cost and quality across all workflows. This gives you visibility into where optimisation opportunities lie.
Long-Term: Build Your Own Routing Framework
Over time, you might build proprietary routing logic:
- Dynamic routing: Route based on input complexity, not just task type. A simple query goes to Haiku; a complex query goes to Opus.
- Predictive routing: Use historical data to predict whether Haiku will succeed. If success probability is <90%, route to Opus proactively.
- Cost-aware routing: Route based on cost budget. If you’ve spent 80% of your daily budget, route remaining tasks to Haiku even if quality might be slightly lower.
These are advanced techniques, but they’re worth considering as you scale.
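As one illustration, a sketch of the cost-aware variant with an in-memory daily budget (a production system would persist spend; the names and thresholds here are illustrative):

import datetime

class BudgetRouter:
    def __init__(self, daily_budget_usd=50.0):
        self.daily_budget = daily_budget_usd
        self.day = datetime.date.today()
        self.spent = 0.0

    def record(self, cost_usd):
        # Call after each completed step with its actual cost.
        self._roll_over()
        self.spent += cost_usd

    def choose(self, task_type):
        self._roll_over()
        # Past 80% of the budget, degrade gracefully: everything goes to Haiku.
        if self.spent >= 0.8 * self.daily_budget:
            return 'claude-haiku-4-5'  # placeholder IDs from this article
        if task_type in ('synthesis', 'decision'):
            return 'claude-opus-4-7'
        return 'claude-haiku-4-5'

    def _roll_over(self):
        # Reset the spend counter when the calendar day changes.
        if datetime.date.today() != self.day:
            self.day, self.spent = datetime.date.today(), 0.0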
Conclusion: Cost-Tuned Routing Is the Future of Agentic AI
Building agentic AI systems is no longer about picking the most capable model and running everything through it. That approach doesn’t scale—financially or operationally.
The future is cost-tuned routing: matching model capability to task complexity, ruthlessly. Use Haiku 4.5 for the 80% of work that doesn’t require reasoning. Reserve Opus 4.7 for the 20% that does. Measure cost and quality obsessively. Iterate on prompts and routing rules monthly.
Done right, this pattern delivers:
- 65% cost reduction vs. using Opus everywhere.
- Maintained or improved quality through better prompting and validation.
- Faster execution because Haiku is 2–3x faster than Opus.
- Scaled throughput because cost per task is low enough to handle higher volumes.
For teams building AI products, automating operations, or modernising platforms, this is non-negotiable. It’s the difference between a sustainable business and one that dies on the spreadsheet.
At PADISO, we’ve implemented this pattern across dozens of systems, from Sydney AI automation agency projects to enterprise platform modernisations. The results are consistent. The economics are undeniable.
If you’re building agentic AI and haven’t implemented cost-tuned routing yet, start this week. The 65% cost savings are waiting.
For guidance on implementing this pattern in your specific context, whether you’re a startup exploring AI agency business models in Sydney or an enterprise modernising with AI, reach out to PADISO. We specialise in exactly this work: building cost-efficient, production-grade agentic AI systems that scale.