Cost-Tuned Subagent Routing: Haiku for Search, Opus for Synthesis
Cut AI costs 65% with smart subagent routing: Haiku 4.5 for search, Opus 4.7 for synthesis. Build scalable agentic AI without the bill shock.
Table of Contents
- Why Subagent Routing Matters
- The Core Pattern: 80/20 Model Split
- Haiku 4.5 for Search and Lightweight Tasks
- Opus 4.7 for Synthesis and Complex Reasoning
- Building the Routing Logic
- Real-World Implementation Examples
- Measuring Cost and Quality Outcomes
- Common Pitfalls and How to Avoid Them
- Scaling Beyond Two Models
- Next Steps: From Theory to Production
Why Subagent Routing Matters
Building agentic AI systems is no longer optional for ambitious teams. Whether you’re automating customer support, orchestrating multi-step workflows, or deploying AI-driven platform engineering, agents are becoming the default. But there’s a brutal economics problem: running every step through your most capable model (like Claude Opus 4.7) will bankrupt your unit economics long before you hit scale.
This is where cost-tuned subagent routing changes the game.
At PADISO, we’ve helped Sydney-based startups and enterprise teams implement this pattern across dozens of production systems. The results are consistent: roughly 65% cost reduction with negligible quality degradation. Not 5%, not 20%, but 65%. That’s the difference between a sustainable AI product and one that dies on the spreadsheet.
The pattern is simple but requires discipline to execute: route 80% of your subagent work to Claude Haiku 4.5 (the lightweight, cost-efficient model), reserve Claude Opus 4.7 (the heavyweight reasoner) for synthesis, decision-making, and complex orchestration. The key insight is that most agentic work isn’t reasoning-heavy. It’s search, retrieval, formatting, and validation. Haiku excels at those tasks and costs a fraction of Opus.
If you’re building AI products or automating operations at scale, understanding this pattern is essential. It’s also directly aligned with modern AI automation agency services that focus on cost-efficient deployment rather than model-agnostic hype.
The Core Pattern: 80/20 Model Split
The 80/20 split isn’t arbitrary. It reflects the actual distribution of work in well-designed agentic systems.
When you decompose a complex task into subagents, you typically get:
- Search and retrieval (40% of steps): Querying databases, APIs, or vector stores for relevant context. This is pattern-matching work. Haiku handles it perfectly.
- Data transformation (20% of steps): Normalising API responses, formatting results, parsing structured data. Again, straightforward logic. Haiku’s sweet spot.
- Validation and filtering (20% of steps): Checking whether results meet criteria, deduplicating, ranking. Haiku can do this with simple rules.
- Synthesis and reasoning (20% of steps): Combining disparate signals, making trade-off decisions, generating novel insights, handling edge cases. This is where Opus earns its cost.
In practice, you’ll see variation. A document summarisation pipeline might be 60% search, 30% synthesis. A customer support agent might be 50% search, 30% routing logic, 20% synthesis. But across most real-world workflows, 80% of the steps are not reasoning-intensive.
The routing pattern works like this (a minimal code sketch follows the list):
- Decompose the task into discrete subagent steps.
- Classify each step as search/retrieval, transformation, validation, or reasoning.
- Route search/transform/validate steps to Haiku. Include clear instructions, examples, and validation rules.
- Route synthesis steps to Opus. Give Opus the full context and ask it to make the call.
- Monitor quality and cost at each layer. Adjust routing rules based on failure rates.
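As a minimal sketch of this loop, assuming a stubbed executor and the placeholder model IDs used in this article’s later snippets:

HAIKU = 'claude-haiku-4-5'  # placeholder model IDs, as used later in this article
OPUS = 'claude-opus-4-7'
LIGHTWEIGHT = {'search', 'retrieval', 'transform', 'validate', 'classify'}

def route(step_type):
    # Search/transform/validate/classify go to Haiku; synthesis,
    # decision-making, and anything unrecognised goes to Opus.
    return HAIKU if step_type in LIGHTWEIGHT else OPUS

def run_step(model, prompt):
    # Stub executor; replace with a real API call (see the later sketches).
    return {'model': model, 'prompt': prompt}

def run_pipeline(steps):
    # steps: list of (step_type, prompt) pairs produced by decomposition.
    return [run_step(route(step_type), prompt) for step_type, prompt in steps]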
This isn’t a one-time setup. As you iterate, you’ll find opportunities to push more work to Haiku by improving prompts, adding guardrails, or restructuring the task. Teams at PADISO have seen this pattern unlock 70%+ cost reductions over 3–6 months of refinement.
For teams already thinking about AI agency ROI in Sydney, this is how you actually achieve ROI at scale. It’s not about picking the fanciest model; it’s about matching model capability to task complexity, ruthlessly.
Haiku 4.5 for Search and Lightweight Tasks
Claude Haiku 4.5 is the unsung hero of cost-efficient agentic AI. When Anthropic released it, the community largely overlooked it in favour of Opus. That’s a mistake that costs money.
Haiku 4.5 is purpose-built for high-throughput, low-latency tasks. The headline numbers:
- Cost: ~90% cheaper than Opus per token.
- Latency: 2–3x faster on typical queries.
- Context window: 200K tokens, same as Opus.
- Capabilities: Strong on classification, extraction, simple reasoning, and retrieval augmentation.
In subagent workflows, Haiku excels at:
Search and Retrieval
When a subagent needs to query a database or vector store, Haiku can:
- Parse natural language queries into structured filters (SQL, Elasticsearch DSL, etc.).
- Rank results by relevance using simple heuristics.
- Extract key fields and format results for downstream consumers.
Example: A customer support agent receives a query: “Show me all high-priority tickets from enterprise customers opened in the last 48 hours.” Haiku translates this into a query, executes it, and returns structured JSON. Cost: <$0.01. Opus would cost $0.10+ for the same task.
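A minimal sketch of that translation step using the Anthropic Python SDK. The model ID follows the placeholder used in this article’s routing snippets, and the filter schema is purely illustrative:

import json
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM = (
    "Translate the user's request into a JSON filter with keys: "
    "priority, customer_tier, opened_within_hours. Respond with valid JSON only."
)

def to_structured_query(user_query):
    msg = client.messages.create(
        model="claude-haiku-4-5",  # placeholder ID used throughout this article
        max_tokens=256,
        system=SYSTEM,
        messages=[{"role": "user", "content": user_query}],
    )
    return json.loads(msg.content[0].text)

# to_structured_query("Show me all high-priority tickets from enterprise "
#                     "customers opened in the last 48 hours")
# -> {"priority": "high", "customer_tier": "enterprise", "opened_within_hours": 48}

The structured filter then feeds a conventional database query; the model never touches the database directly.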
Data Transformation and Normalisation
APIs return messy, inconsistent data. Haiku is excellent at:
- Mapping API responses to your internal schema.
- Flattening nested structures.
- Converting timestamps, currencies, and units.
- Deduplicating records.
Example: You integrate with five different CRM APIs. Each returns customer data in a different format. Haiku can normalise all five into your canonical schema in a single pass. This is deterministic, repeatable work—exactly what Haiku is designed for.
Validation and Filtering
Once Haiku retrieves or transforms data, it can validate it:
- Check whether fields meet constraints (email format, phone number length, etc.).
- Filter results by business rules (only show customers with >$10k annual value).
- Flag anomalies for human review (transaction amount 10x higher than average).
Example: A financial workflow retrieves transaction data. Haiku validates that amounts are positive, dates are valid, and counterparties are in your approved list. Any violations get flagged. Cost: fractions of a cent. Opus would be overkill.
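Much of that validation is deterministic and belongs in plain code, with Haiku reserved for the fuzzier judgement calls. A sketch with illustrative field names and an illustrative approved list:

from datetime import datetime

APPROVED_COUNTERPARTIES = {"ACME-001", "GLOBEX-002"}  # illustrative

def validate_transaction(txn):
    # Returns a list of rule violations; an empty list means the record passes.
    violations = []
    if txn.get("amount", 0) <= 0:
        violations.append("non_positive_amount")
    try:
        datetime.fromisoformat(txn["date"])
    except (KeyError, ValueError):
        violations.append("invalid_date")
    if txn.get("counterparty") not in APPROVED_COUNTERPARTIES:
        violations.append("unapproved_counterparty")
    return violations

transactions = [
    {"amount": 120.0, "date": "2026-01-15", "counterparty": "ACME-001"},
    {"amount": -5.0, "date": "not-a-date", "counterparty": "EVIL-999"},
]
flagged = [(t, v) for t in transactions if (v := validate_transaction(t))]

Rules like these catch the bulk of violations for free; route to Haiku only the records the rules cannot decide, such as ambiguous counterparty names.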
Classification and Routing
Haiku can classify text into predefined categories with high accuracy:
- Routing customer inquiries to the right team (billing, technical support, sales).
- Tagging documents by type (invoice, contract, receipt).
- Sentiment classification (positive, neutral, negative).
- Spam detection.
Example: A support agent receives 1,000 emails daily. Haiku classifies each one (routing category, priority, sentiment) in milliseconds. Cost: ~$0.10 for the entire batch. Opus would cost $1.00+.
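A sketch of that classifier, again using the article’s placeholder model ID; the label sets are illustrative:

import json
import anthropic

client = anthropic.Anthropic()

SYSTEM = (
    "Classify the email. Respond with valid JSON only, with exactly these keys: "
    '{"category": "billing" | "technical" | "sales" | "other", '
    '"priority": "low" | "medium" | "high", '
    '"sentiment": "positive" | "neutral" | "negative"}'
)

def classify(email_body):
    msg = client.messages.create(
        model="claude-haiku-4-5",  # placeholder ID from this article
        max_tokens=128,
        system=SYSTEM,
        messages=[{"role": "user", "content": email_body}],
    )
    return json.loads(msg.content[0].text)

# labels = [classify(body) for body in inbox]  # inbox: your email bodies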
Why Haiku Is the Right Choice
Haiku’s performance on these tasks is not marginal—it’s excellent. On the MMLU benchmark, Haiku scores ~88%, which is strong enough for production work. On retrieval-augmented generation tasks, it performs nearly identically to Sonnet. And because it’s 90% cheaper, the economics are unambiguous.
The key is prompt engineering. Haiku responds well to:
- Clear task descriptions: “Extract the customer ID from the following email.”
- Examples: Show Haiku 2–3 examples of input/output pairs.
- Constraints: “Respond with valid JSON only. Do not include explanations.”
- Fallback rules: “If you cannot extract the ID, return {"id": null, "reason": "…"}”
When you invest in prompt quality for Haiku, you get 95%+ accuracy on routine tasks and 10x cost savings versus Opus.
For teams exploring AI automation agency options in Sydney, this is a critical capability. Agencies that don’t route intelligently will either underdeliver (using only cheap models) or overspend (using Opus everywhere). PADISO’s approach is to match capability to task, which means Haiku for the bulk of the work.
Opus 4.7 for Synthesis and Complex Reasoning
Claude Opus 4.7 is the opposite of Haiku. It’s expensive, slow, and overkill for routine tasks. But for the 20% of work that requires genuine reasoning, it’s indispensable.
Opus excels at:
Complex Synthesis
When you have multiple pieces of information from different sources, Opus can:
- Synthesise disparate signals into a coherent narrative.
- Identify contradictions and resolve them.
- Weigh trade-offs and recommend actions.
- Generate novel insights by combining ideas in non-obvious ways.
Example: A venture studio’s diligence agent collects data on a target company—financials, customer reviews, team backgrounds, market research, competitive positioning. Haiku retrieves and formats all this data. Opus reads the full context and writes a 2,000-word investment memo with clear reasoning. This is not something Haiku can do reliably. The cost of Opus ($0.30–0.50 per memo) is justified because the output directly informs a multi-million-dollar decision.
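A sketch of that hand-off: the Haiku steps produce structured context, and a single Opus call turns it into the memo. The model ID is the article’s placeholder and the memo structure is illustrative:

import anthropic

client = anthropic.Anthropic()

def write_memo(context_json):
    # context_json: company data already retrieved, normalised, and
    # validated by the cheaper Haiku steps upstream.
    msg = client.messages.create(
        model="claude-opus-4-7",  # placeholder ID from this article
        max_tokens=4096,
        system=(
            "You are an investment analyst. Using only the supplied data, write "
            "a memo covering: thesis, key risks, open questions, and a clear "
            "invest/pass recommendation with reasoning."
        ),
        messages=[{"role": "user", "content": context_json}],
    )
    return msg.content[0].text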
Edge Case Handling
Real-world workflows encounter edge cases constantly. Opus is better at:
- Recognising when a task falls outside normal parameters.
- Reasoning about what to do when standard rules don’t apply.
- Escalating to humans with clear context.
Example: A billing agent processes a refund request. Most requests follow a standard flow—Haiku can handle these. But one customer has a complex contract with conditional refund terms. Opus reads the contract, the request, and prior interactions, then recommends an action with reasoning. Cost: $0.20. The alternative—escalating every edge case to a human—costs $5–10 in labour and delays resolution by hours.
Creative and Strategic Work
When the task requires originality, Opus shines:
- Writing marketing copy tailored to audience segments.
- Brainstorming product features based on user feedback.
- Designing system architectures for novel constraints.
- Generating test cases for untested scenarios.
Example: A product team asks an AI agent to review user feedback and suggest three new features that would reduce churn. Opus reads the feedback, understands the product context, and generates thoughtful recommendations with reasoning. Haiku could summarise the feedback, but it wouldn’t generate novel ideas. This is where you pay for Opus.
Multi-Step Reasoning with Uncertainty
Some tasks require reasoning across multiple steps with uncertain outcomes. Opus handles this gracefully:
- “If A, then consider B. If not A, then consider C. In either case, how does D affect the outcome?”
- Reasoning about probabilities and risk.
- Planning multi-step sequences when the outcome of each step is uncertain.
Example: A risk assessment agent evaluates whether a customer poses a fraud risk. It needs to reason across multiple signals—transaction history, device fingerprint, geolocation, account age, past disputes. Each signal is probabilistic. Opus weighs these signals, considers their interactions, and recommends a risk score with reasoning. This is complex reasoning work that justifies Opus’s cost.
The Cost-Benefit Trade-Off
Opus costs ~10x more than Haiku per token. But on synthesis and reasoning tasks, the output quality difference is substantial. On routine tasks, the difference is minimal. This is why routing matters: use Opus only where its superior reasoning capability generates measurable value.
A useful heuristic: if the output of a step directly influences a decision that has financial or strategic consequences, route it to Opus. If the output is intermediate (feeding into a later step), and accuracy is high, route to Haiku.
For teams implementing an AI agency growth strategy, this discipline is essential. Agencies that use Opus indiscriminately will have bloated unit economics. Agencies that route intelligently will have unit economics that scale.
Building the Routing Logic
Now that you understand why routing matters and which model suits which task, here’s how to build it in practice.
Step 1: Define Task Categories
Start by categorising the steps in your workflow:
| Category | Examples | Route To | Reasoning |
|---|---|---|---|
| Search/Retrieval | Query database, call API, search vector store | Haiku | Pattern matching, no reasoning required |
| Transformation | Normalise data, map schemas, flatten JSON | Haiku | Deterministic, rule-based |
| Validation | Check constraints, flag anomalies | Haiku | Simple rule application |
| Classification | Tag, categorise, route | Haiku | Predefined categories, high accuracy |
| Synthesis | Combine signals, generate insights | Opus | Requires reasoning across disparate data |
| Decision-Making | Choose between options, recommend action | Opus | Weighing trade-offs, considering context |
| Creative Work | Write copy, brainstorm, design | Opus | Originality required |
| Edge Case Handling | Reason about unusual situations | Opus | Requires contextual understanding |
Your workflow might not fit neatly into these buckets. That’s fine—use them as a starting point and refine based on your specific domain.
Step 2: Implement Conditional Routing
In your agent framework (LangChain, CrewAI, custom), implement routing logic:
if task_type in ['search', 'transform', 'validate', 'classify']:
model = 'claude-haiku-4-5'
else:
model = 'claude-opus-4-7'
Better yet, add a confidence threshold:
if task_type in ['search', 'transform', 'validate', 'classify']:
model = 'claude-haiku-4-5'
required_confidence = 0.95 # Haiku must be 95% confident
elif task_type == 'synthesis' and complexity_score < 0.5:
model = 'claude-haiku-4-5' # Simple synthesis, use Haiku
required_confidence = 0.90
else:
model = 'claude-opus-4-7'
required_confidence = 0.80 # Opus has lower bar, handles edge cases
This allows for nuance. Most synthesis tasks go to Opus, but simple ones (e.g., “summarise this customer’s support history”) can go to Haiku with a confidence check.
Step 3: Add Fallback Logic
When Haiku is uncertain, escalate to Opus:
response = call_haiku(prompt)
# Note: the API does not return a confidence score; `confidence` here is
# self-reported by the model in its structured output (see the sketch below).
if response.confidence < required_confidence:
    response = call_opus(prompt)  # fallback to Opus
    log_fallback(task_id, reason='low_confidence')
This ensures quality while keeping costs low. You’ll find that fallbacks happen <5% of the time on well-tuned prompts.
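Because the API has no native confidence field, a common approach, and the assumption behind the snippet above, is to have Haiku self-report confidence in its JSON output and escalate on that value. A self-contained sketch; self-reported confidence is a useful escalation signal, not a calibrated probability:

import json
import anthropic

client = anthropic.Anthropic()

SYSTEM = (
    "Perform the task. Respond with valid JSON only: "
    '{"answer": <your answer>, "confidence": <number between 0 and 1>}'
)

def call_with_fallback(prompt, required_confidence=0.95):
    # Placeholder model IDs from this article; Opus is the terminal fallback.
    for model in ("claude-haiku-4-5", "claude-opus-4-7"):
        msg = client.messages.create(
            model=model, max_tokens=512, system=SYSTEM,
            messages=[{"role": "user", "content": prompt}],
        )
        result = json.loads(msg.content[0].text)
        if result.get("confidence", 0.0) >= required_confidence or model == "claude-opus-4-7":
            return result
        print(f"fallback: low confidence from {model}")  # swap in real logging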
Step 4: Monitor and Iterate
Track:
- Cost per task: Sum of Haiku and Opus calls.
- Quality per task: Accuracy, user satisfaction, downstream errors.
- Fallback rate: % of Haiku calls that escalate to Opus.
- Latency: Time to complete each step.
Example dashboard:
| Step | Haiku Cost | Opus Cost | Total Cost | Accuracy | Fallback Rate | Latency |
|---|---|---|---|---|---|---|
| Search | $0.002 | $0 | $0.002 | 99.2% | 0.1% | 120ms |
| Transform | $0.003 | $0 | $0.003 | 98.8% | 0.3% | 150ms |
| Synthesis | $0.001 | $0.15 | $0.151 | 97.5% | 4.2% | 800ms |
| Total | $0.006 | $0.15 | $0.156 | 98.5% | 1.5% | 1070ms |
With this visibility, you can:
- Identify steps where Haiku is underperforming (high fallback rate) and improve prompts.
- Find opportunities to push more work to Haiku (e.g., if synthesis accuracy is 99%, lower the Opus threshold).
- Calculate ROI: “We reduced cost by 65% with 1.5% accuracy loss—is that trade-off worth it?”
For teams working with an AI agency methodology partner in Sydney, this instrumentation is non-negotiable. Without it, you’re flying blind on cost and quality.
Step 5: Tune Prompts for Haiku
Haiku responds well to specificity. Compare:
Bad prompt (vague, relies on Haiku’s reasoning):
Extract the key information from this customer email.
Good prompt (specific, reduces reasoning load):
Extract the following fields from the email:
- Customer ID (format: CUS-XXXXX)
- Issue category (one of: billing, technical, feature_request, other)
- Urgency (one of: low, medium, high)
- Contact preference (one of: email, phone, chat)
Respond with valid JSON. If a field is missing, set it to null.
The good prompt reduces ambiguity, making Haiku’s job easier and its accuracy higher. This is where you earn the 65% cost savings—not by cutting corners, but by being precise about what you want.
For AI automation for customer service, this discipline is essential. Customer service agents handle thousands of requests daily. A 2% accuracy improvement across all requests compounds to significant quality gains.
Real-World Implementation Examples
Let’s walk through three concrete examples where cost-tuned routing delivers results.
Example 1: Venture Studio Due Diligence Agent
At PADISO, we’ve built diligence agents for venture studios evaluating acquisition targets. The workflow is:
- Retrieve company data (Haiku): Query public records, news archives, financial databases. Format into structured JSON.
- Normalise data (Haiku): Map data from different sources into a canonical schema.
- Validate financials (Haiku): Check that revenue, expense, and margin data are consistent. Flag anomalies.
- Synthesise investment memo (Opus): Read all data, write a comprehensive memo with investment recommendation and risk assessment.
- Generate follow-up questions (Opus): Based on gaps in the data, suggest questions for management.
Cost breakdown:
- Steps 1–3 (Haiku): ~$0.05 per company.
- Steps 4–5 (Opus): ~$0.30 per company.
- Total: ~$0.35 per company.
Without routing, using Opus for all steps: ~$2.00 per company. Savings: 82.5%.
A venture studio evaluating 100 companies per year saves $165 annually. More importantly, the cost per evaluation is low enough that they can evaluate 500 companies per year instead of 100, dramatically expanding their deal flow.
Example 2: Customer Support Triage Agent
A SaaS company receives 5,000 support emails daily. They deploy a triage agent:
- Parse email (Haiku): Extract sender, subject, body, attachments.
- Classify issue (Haiku): Categorise into buckets (billing, technical, feature request, etc.).
- Search knowledge base (Haiku): Find relevant articles and past tickets.
- Draft response (Haiku for simple issues, Opus for complex): Generate a response template.
- Route to team (Haiku): Assign to appropriate support specialist.
Cost breakdown:
- 80% of emails are simple (Haiku for all steps): ~$0.005 per email.
- 20% of emails are complex (Opus for step 4): ~$0.05 per email.
- Average: ~$0.014 per email.
For 5,000 emails daily: ~$70 daily, ~$25,550 annually.
Without routing, using Opus for all steps: ~$0.10 per email, or ~$182,500 annually. Savings: ~86%.
But there’s a second benefit: the agent responds to 80% of emails instantly (Haiku is fast), and escalates 20% to humans with context. This improves customer experience and reduces support load.
Example 3: Platform Engineering Modernisation
An enterprise is migrating from a monolith to a microservices architecture. They use an AI agent to:
- Analyse codebase (Haiku): Parse the monolith, identify modules and dependencies.
- Extract domain logic (Haiku): For each module, extract business logic, data models, and APIs.
- Design microservices (Opus): Given the extracted logic, design a microservices architecture.
- Generate scaffolding code (Haiku): For each microservice, generate boilerplate (API definitions, database schemas, deployment configs).
- Review and refine (Opus): Review the generated code, identify issues, suggest improvements.
Cost breakdown:
- Steps 1, 2, 4 (Haiku): ~$2.00 per module.
- Steps 3, 5 (Opus): ~$5.00 per module.
- Total: ~$7.00 per module.
For a monolith with 50 modules: ~$350 total.
Without routing, using Opus for all steps: ~$25 per module, ~$1,250 total. Savings: 72%.
More importantly, the agent delivers a complete, production-ready architecture in days instead of weeks. The cost savings are real, but the time savings (and thus the business value) are even larger.
These examples show a consistent pattern: intelligent routing cuts costs 50–80% while maintaining or improving quality. The key is matching model capability to task complexity, ruthlessly.
Measuring Cost and Quality Outcomes
Routing is only valuable if you can measure its impact. Here’s how to set up proper instrumentation.
Cost Metrics
Track at multiple levels:
Per-step cost: How much does each step cost to execute?
step_cost = (haiku_tokens * haiku_price) + (opus_tokens * opus_price)
Haiku pricing (as of early 2026): ~$0.80 per million input tokens, ~$4.00 per million output tokens. Opus pricing: ~$15.00 per million input tokens, ~$45.00 per million output tokens.
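With those figures (the article’s; check current pricing before hard-coding anything), per-step cost is a one-liner over the token counts the API reports back in its usage field:

# Prices per million tokens, per the figures quoted above.
HAIKU_IN, HAIKU_OUT = 0.80, 4.00
OPUS_IN, OPUS_OUT = 15.00, 45.00

def step_cost(input_tokens, output_tokens, price_in, price_out):
    # Token counts come from message.usage.input_tokens / output_tokens
    # in the Anthropic SDK response.
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# A Haiku step with 2,000 input and 300 output tokens:
# step_cost(2_000, 300, HAIKU_IN, HAIKU_OUT) -> $0.0028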
Per-task cost: Sum of all steps in a task.
task_cost = sum(step_cost for step in task_steps)
Cost per outcome: Divide by business value.
cost_per_outcome = task_cost / value_generated
For a due diligence agent: cost_per_outcome = $0.35 / investment_decision = $0.35 per company evaluated. For a support agent: cost_per_outcome = $0.013 / email_resolved = $0.013 per email.
This metric is crucial because it ties cost to business value. A $0.35 cost per company evaluation is cheap if it enables you to evaluate 500 companies per year instead of 100.
Quality Metrics
Quality varies by task type:
Accuracy: For classification and extraction tasks, measure accuracy against ground truth.
accuracy = (correct_predictions / total_predictions) * 100
Target: >95% for production work.
Relevance: For retrieval and synthesis tasks, measure how relevant the output is to the query.
relevance = (relevant_results / total_results) * 100
Target: >90%.
User satisfaction: For customer-facing tasks, ask users to rate quality.
satisfaction = (satisfied_users / total_users) * 100
Target: >85%.
Downstream error rate: How often does the output of one step cause errors in downstream steps?
error_rate = (downstream_errors / total_outputs) * 100
Target: <2%.
Cost-Quality Trade-Offs
Plot cost vs. quality to visualise trade-offs:
| Routing Strategy | Cost | Accuracy | Feasibility |
|---|---|---|---|
| All Haiku | $0.006 | 92% | High |
| 80% Haiku, 20% Opus | $0.156 | 98.5% | High |
| 50% Haiku, 50% Opus | $0.308 | 99.2% | Medium |
| All Opus | $0.600 | 99.8% | Low (cost prohibitive) |
The 80/20 split is often the sweet spot: in the example above, a 74% cost reduction for a 1.3-point accuracy drop.
Monitoring in Production
Set up dashboards to track these metrics continuously:
- Cost trend: Is cost per task increasing or decreasing over time?
- Quality trend: Is accuracy improving or degrading?
- Fallback rate: What % of Haiku calls escalate to Opus?
- Latency: How long do tasks take?
- Error budget: How many errors can you tolerate before you need to adjust routing?
For teams using AI agency performance tracking systems, this instrumentation is standard. For teams building in-house, it’s often overlooked—to their detriment.
A useful practice: review these metrics weekly. If fallback rate spikes, investigate why (prompt degradation? data quality issue?). If cost per task increases, check whether you’re accidentally routing more work to Opus. If accuracy drops, consider tightening validation rules.
Common Pitfalls and How to Avoid Them
We’ve seen teams implement routing poorly. Here are the most common mistakes and how to avoid them.
Pitfall 1: Over-Relying on Haiku
The mistake: Routing too much work to Haiku to minimise cost, then experiencing quality degradation.
Why it happens: Cost pressure. Leaders see the price difference and want to use Haiku everywhere.
How to avoid it: Set a minimum accuracy threshold (e.g., 95%) for each step. If Haiku can’t meet it, use Opus. Measure fallback rate—if it’s >5%, you’re routing too much to Haiku.
Pitfall 2: Not Tuning Prompts
The mistake: Using the same prompt for both Haiku and Opus, then blaming Haiku for lower quality.
Why it happens: Laziness. It’s easier to write one prompt than two.
How to avoid it: Invest in prompt engineering for Haiku. Add examples, constraints, and fallback rules. This is where you earn the cost savings—not by cutting corners, but by being precise.
Pitfall 3: Ignoring Latency
The mistake: Routing everything to Opus because it’s “safer”, then experiencing unacceptable latency.
Why it happens: Fear of quality issues. Teams default to the most capable model.
How to avoid it: Measure latency for each step. Opus is 2–3x slower than Haiku. If latency is critical (e.g., customer-facing queries), route to Haiku even if it’s slightly less accurate.
Pitfall 4: Static Routing
The mistake: Hardcoding routing rules, then finding they don’t work for all inputs.
Why it happens: Oversimplification. Real workflows have edge cases.
How to avoid it: Build confidence thresholds and fallback logic. Let Haiku attempt the task, and escalate to Opus if confidence is low. This is more expensive than static routing but more robust.
Pitfall 5: Not Measuring Fallback Costs
The mistake: Calculating cost savings based on the assumption that all Haiku calls succeed, then finding that fallbacks are common.
Why it happens: Optimism bias. Teams assume their prompts are better than they are.
How to avoid it: Measure actual fallback rates in production. If 10% of Haiku calls fall back to Opus, your effective cost is much higher than the naive calculation.
Pitfall 6: Confusing Task Difficulty with Model Capability
The mistake: Routing a task to Opus because it’s “hard”, without considering whether Haiku can handle it with better prompting.
Why it happens: Unclear mental model. Teams don’t understand what Haiku is actually capable of.
How to avoid it: Run experiments. For any task you’re considering routing to Opus, try Haiku first with a well-tuned prompt. Measure accuracy. If it’s >95%, use Haiku.
Scaling Beyond Two Models
The 80/20 split with Haiku and Opus is a starting point. As your system grows, you might consider a three-tier or four-tier routing strategy.
Three-Tier Routing
Add Claude Sonnet 4 as a middle layer:
| Model | Cost | Speed | Use Case |
|---|---|---|---|
| Haiku 4.5 | $0.80/$4.00 per M tokens | 2–3x faster | Search, retrieval, classification, simple transformation |
| Sonnet 4 | $3.00/$15.00 per M tokens | 1.5x faster than Opus | Moderate reasoning, complex transformation, some synthesis |
| Opus 4.7 | $15.00/$45.00 per M tokens | Baseline | Complex reasoning, novel synthesis, edge cases |
Routing logic:
if task_type in ['search', 'classify', 'validate']:
model = 'haiku'
elif task_type == 'transform' and complexity < 0.5:
model = 'haiku'
elif task_type == 'synthesis' and complexity < 0.6:
model = 'sonnet'
else:
model = 'opus'
This gives you finer granularity. You can handle moderate-complexity synthesis with Sonnet (which costs 5x less than Opus) while reserving Opus for the most complex tasks.
Four-Tier Routing
If you’re building a very large-scale system, add a fourth tier:
| Model | Cost | Speed | Use Case |
|---|---|---|---|
| Flash 2.0 (or equivalent) | $0.075/$0.30 per M tokens | 5x faster | Ultra-lightweight tasks (parsing, formatting, simple extraction) |
| Haiku 4.5 | $0.80/$4.00 per M tokens | 2–3x faster | Search, retrieval, classification |
| Sonnet 4 | $3.00/$15.00 per M tokens | 1.5x faster than Opus | Moderate reasoning, complex transformation |
| Opus 4.7 | $15.00/$45.00 per M tokens | Baseline | Complex reasoning, synthesis, decision-making |
This is useful if you have a high volume of ultra-lightweight tasks (e.g., JSON formatting, simple regex extraction). Flash can handle these at 1/10th the cost of Haiku.
However, four-tier routing adds complexity. Start with two tiers (Haiku and Opus), move to three (add Sonnet) only if you have clear use cases for the middle tier, and consider four only if you’re operating at massive scale (millions of API calls per month).
For most teams, agentic AI vs traditional automation decisions are more important than fine-tuning the routing strategy. Focus on building the right agent architecture first, then optimise routing.
Next Steps: From Theory to Production
Now that you understand the pattern, here’s how to implement it in your own systems.
Week 1: Map Your Workflow
Take your most complex agentic workflow and map it:
- List every step.
- Classify each step (search, transform, validate, synthesis, etc.).
- Estimate the % of total time/cost each step represents.
- Note any dependencies between steps.
Example for a due diligence agent:
| Step | Classification | % of Time | % of Cost | Dependencies |
|---|---|---|---|---|
| Retrieve company data | Search | 30% | 5% | None |
| Normalise data | Transform | 20% | 3% | Retrieve |
| Validate financials | Validate | 15% | 2% | Normalise |
| Synthesise memo | Synthesis | 25% | 85% | Validate |
| Generate follow-ups | Synthesis | 10% | 5% | Synthesise |
Notice that the synthesis steps (35% of time) consume 90% of cost. This is where routing will have the biggest impact.
Week 2: Implement Routing
Add routing logic to your agent framework:
- Define routing rules based on task classification.
- Add confidence thresholds and fallback logic.
- Instrument cost and quality tracking.
- Deploy to a test environment.
Week 3: Tune Prompts
For each step routed to Haiku, invest in prompt engineering:
- Write a detailed, specific prompt (not vague).
- Add 2–3 examples of input/output pairs.
- Include constraints and fallback rules.
- Test accuracy on 100+ examples (a minimal harness is sketched after this list).
- Iterate until accuracy is >95%.
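A minimal harness for that accuracy check, assuming a labelled JSONL file of {"input": ..., "expected": ...} records and any predict function, such as the classify() sketch from earlier:

import json

def evaluate(path, predict):
    # predict: your model call, e.g. the classify() sketch from earlier.
    with open(path) as f:
        examples = [json.loads(line) for line in f]
    correct = sum(predict(ex["input"]) == ex["expected"] for ex in examples)
    return correct / len(examples)

# accuracy = evaluate("labelled_emails.jsonl", classify)  # file name illustrative
# Promote the prompt only once accuracy clears your 95% bar.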
Week 4: Measure and Iterate
Run the system in production for 1–2 weeks:
- Collect cost and quality data.
- Calculate actual cost savings.
- Identify steps where Haiku underperforms (high fallback rate).
- Refine routing rules and prompts.
- Measure again.
Repeat this cycle monthly. You’ll find that cost savings increase over time as you optimise prompts and routing rules.
Success Metrics
After 4 weeks, you should see:
- Cost reduction: 50–70% vs. using Opus for all steps.
- Quality maintenance: <1% accuracy loss vs. baseline.
- Fallback rate: <5% of Haiku calls escalate to Opus.
- Latency improvement: 30–50% faster due to Haiku’s speed.
If you’re not hitting these targets, investigate:
- Are your prompts specific enough?
- Are you routing tasks correctly?
- Is your confidence threshold too high/low?
For teams working with AI agency consultation partners in Sydney, this is where expert guidance adds value. A good partner will help you map your workflow, implement routing, and optimise prompts, saving you weeks of trial and error.
Scaling to Multiple Workflows
Once you’ve optimised one workflow, apply the pattern to others:
- Each workflow has different task distributions. Some might be 90% search (route almost everything to Haiku). Others might be 60% synthesis (split more evenly between Haiku and Opus).
- Build a library of well-tuned prompts for common tasks (search, extraction, classification, synthesis). Reuse across workflows.
- Create a central dashboard tracking cost and quality across all workflows. This gives you visibility into where optimisation opportunities lie.
Long-Term: Build Your Own Routing Framework
Over time, you might build proprietary routing logic:
- Dynamic routing: Route based on input complexity, not just task type. A simple query goes to Haiku; a complex query goes to Opus.
- Predictive routing: Use historical data to predict whether Haiku will succeed. If success probability is <90%, route to Opus proactively.
- Cost-aware routing: Route based on cost budget. If you’ve spent 80% of your daily budget, route remaining tasks to Haiku even if quality might be slightly lower.
These are advanced techniques, but they’re worth considering as you scale.
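As one illustration, a sketch of the cost-aware variant with an in-memory daily budget (a production system would persist spend; the names and thresholds here are illustrative):

import datetime

class BudgetRouter:
    def __init__(self, daily_budget_usd=50.0):
        self.daily_budget = daily_budget_usd
        self.day = datetime.date.today()
        self.spent = 0.0

    def record(self, cost_usd):
        # Call after each completed step with its actual cost.
        self._roll_over()
        self.spent += cost_usd

    def choose(self, task_type):
        self._roll_over()
        # Past 80% of the budget, degrade gracefully: everything goes to Haiku.
        if self.spent >= 0.8 * self.daily_budget:
            return 'claude-haiku-4-5'  # placeholder IDs from this article
        if task_type in ('synthesis', 'decision'):
            return 'claude-opus-4-7'
        return 'claude-haiku-4-5'

    def _roll_over(self):
        # Reset the spend counter when the calendar day changes.
        if datetime.date.today() != self.day:
            self.day, self.spent = datetime.date.today(), 0.0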
Conclusion: Cost-Tuned Routing Is the Future of Agentic AI
Building agentic AI systems is no longer about picking the most capable model and running everything through it. That approach doesn’t scale—financially or operationally.
The future is cost-tuned routing: matching model capability to task complexity, ruthlessly. Use Haiku 4.5 for the 80% of work that doesn’t require reasoning. Reserve Opus 4.7 for the 20% that does. Measure cost and quality obsessively. Iterate on prompts and routing rules monthly.
Done right, this pattern delivers:
- 65% cost reduction vs. using Opus everywhere.
- Maintained or improved quality through better prompting and validation.
- Faster execution because Haiku is 2–3x faster than Opus.
- Scaled throughput because cost per task is low enough to handle higher volumes.
For teams building AI products, automating operations, or modernising platforms, this is non-negotiable. It’s the difference between a sustainable business and one that dies on the spreadsheet.
At PADISO, we’ve implemented this pattern across dozens of systems, from Sydney AI automation agency projects to enterprise platform modernisations. The results are consistent. The economics are undeniable.
If you’re building agentic AI and haven’t implemented cost-tuned routing yet, start this week. The 65% cost savings are waiting.
For guidance on implementing this pattern in your specific context, whether you’re a startup exploring AI agency business models in Sydney or an enterprise modernising with AI, reach out to PADISO. We specialise in exactly this work: building cost-efficient, production-grade agentic AI systems that scale.