The Claude Agent SDK: Building Production AI Agents in 2026
Deep dive into Claude Agent SDK architecture, agentic loop primitives, and when to use it vs custom orchestration for production AI agents.
Table of Contents
- Why the Claude Agent SDK Matters in 2026
- Understanding the Architecture
- Core Agentic Loop Primitives
- SDK vs Custom Orchestration: When to Choose Each
- Getting Started with the Claude Agent SDK
- Building Your First Production Agent
- Advanced Patterns and Real-World Implementation
- Security, Compliance, and Deployment
- Performance Optimisation and Scaling
- Next Steps: From Prototype to Production
Why the Claude Agent SDK Matters in 2026
In 2026, the distinction between traditional software and agentic software has become operational reality, not theoretical debate. The Claude Agent SDK represents a fundamental shift in how teams build autonomous systems—moving from hand-coded orchestration loops to purpose-built primitives designed by Anthropic specifically for production workloads.
The market context is clear. Enterprise teams are no longer experimenting with whether agentic AI works; they’re racing to deploy it at scale. The agentic AI vs traditional automation conversation has matured beyond proof-of-concepts. Real companies are replacing legacy RPA platforms, manual workflows, and brittle rule-based systems with autonomous agents that can reason, adapt, and self-correct.
What makes the Claude Agent SDK different from rolling your own orchestration? Three things: speed to production, reliability under uncertainty, and native integration with Claude Code—Anthropic’s agentic coding capability that lets agents reason about and modify codebases in real time.
For Sydney-based founders and operators modernising their tech stack, this matters acutely. Whether you’re a seed-stage startup needing fractional CTO leadership to ship your first AI product, or a mid-market company automating 50+ workflows, the Claude Agent SDK eliminates months of custom plumbing. You focus on business logic. The SDK handles the hard parts: tool calling, context management, error recovery, and state tracking.
This guide walks you through the architecture, explains when to use the SDK versus building custom orchestration, and shows you exactly how to ship production agents that actually work.
Understanding the Architecture
The Claude Agent SDK Design Philosophy
Anthropic’s Claude Agent SDK is not a wrapper around Claude’s API. It’s a purpose-built framework for agentic loops, with three core design principles:
First: Minimal Abstraction Over Maximum Control. The SDK provides opinionated defaults for the 80% of agentic patterns that are common across use cases—tool calling, iteration, state management, error handling. But it doesn’t force you into a box. You can extend, override, and customise every layer. This is critical for production work where edge cases matter.
Second: Native Integration with Claude Code. Unlike generic LLM orchestration frameworks, the Claude Agent SDK is built with Claude Code as a first-class capability. This means agents can reason about your codebase, propose changes, and execute them—all within a single agentic loop. This is transformative for engineering workflows, platform modernisation, and complex automation tasks.
Third: Observability and Auditability from Day One. The SDK is designed for regulated environments. Every agent decision, tool invocation, and state transition is logged and traceable. This matters for SOC 2 compliance and audit readiness—critical requirements for enterprise deployments.
The Agent SDK Foundations course explains this well: the SDK is a programmatic alternative to CLI-based approaches, giving you fine-grained control over agent behaviour while maintaining simplicity in the common case.
The Four-Layer Architecture
The Claude Agent SDK operates across four distinct layers:
Layer 1: The LLM Core. This is Claude 3.5 Sonnet (or newer), tuned specifically for agentic reasoning. The model is optimised for multi-step reasoning, tool use, and self-correction—not just raw instruction-following. The difference matters: agentic Claude has been trained to think in loops, to recognise when it needs to call a tool, to evaluate whether the result was what it expected, and to adapt if not.
Layer 2: Tool Binding and Execution. The SDK provides a declarative interface for defining tools—functions the agent can call. You define the tool’s signature (inputs, outputs, description), and the SDK handles everything else: presenting it to Claude, parsing Claude’s tool-use requests, executing the tool safely, and feeding results back into the agentic loop. This is where the SDK saves you months of work. Tool calling in production is not trivial—you need timeout handling, retry logic, error recovery, and context management. The SDK does this out of the box.
Layer 3: State and Context Management. Agents need memory. The SDK provides structured state management: conversation history, tool results, intermediate reasoning steps, and custom state variables. This layer handles the complexity of maintaining context across dozens or hundreds of iterations while keeping tokens and latency under control. It’s not magic, but it’s non-trivial to get right.
Layer 4: Orchestration and Control Flow. This is where the agentic loop lives. The SDK implements the core loop: send the current state to Claude, receive a response (which may include tool calls), execute tools, update state, and iterate until the agent reaches a terminal condition (success, failure, or max iterations). You can hook into every step of this loop—run custom validation, inject context, modify decisions—but the happy path is simple and fast.
The Role of Claude Code
Claude Code is the secret weapon. It’s Anthropic’s agentic coding tool—an agent that can read files, understand codebases, propose code changes, and execute them. When you integrate Claude Code with the Claude Agent SDK, you get agents that can actually modify your systems.
This is not theoretical. Real teams are using this to:
- Automate refactoring across large codebases (50,000+ lines, 200+ files)
- Generate and execute database migrations with validation
- Build and deploy infrastructure changes
- Debug production issues by reading logs, tracing code, and proposing fixes
The Claude Code overview shows how this integrates into the SDK. Claude Code is a tool that agents can call, just like any other—but it’s a tool that understands software and can reason about it.
Core Agentic Loop Primitives
The Basic Loop: Observe → Decide → Act → Learn
Every agent, whether built with the Claude Agent SDK or custom orchestration, implements this loop:
- Observe: Gather the current state (user input, tool results, context, constraints).
- Decide: Send state to the LLM; receive a decision (which tool to call, or a final response).
- Act: Execute the decision (call the tool, or return the response).
- Learn: Update state based on the outcome; prepare for the next iteration.
The Claude Agent SDK provides primitives for each step. But understanding the primitives separately is crucial—it’s the difference between using a framework and understanding what the framework does.
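To make the loop concrete, here is a minimal sketch in plain Python. The `decide` stub stands in for a real Claude call and the tool registry stands in for the SDK’s bindings — the names are illustrative, not SDK APIs. It shows the control flow only, not the SDK’s actual internals.

```python
# A minimal observe -> decide -> act -> learn loop. The `decide` stub
# stands in for a real Claude call; names here are illustrative.

def add_numbers(a: int, b: int) -> int:
    """A trivial tool the agent can call."""
    return a + b

TOOLS = {"add_numbers": add_numbers}

def decide(state: dict) -> dict:
    """Stand-in for the LLM: pick a tool call or a final answer."""
    if "sum" not in state:
        return {"action": "tool", "tool": "add_numbers", "input": {"a": 2, "b": 2}}
    return {"action": "finish", "response": f"The answer is {state['sum']}"}

def run_agent(max_iterations: int = 10) -> str:
    state: dict = {"history": []}                  # observe: initial state
    for _ in range(max_iterations):
        decision = decide(state)                   # decide
        if decision["action"] == "finish":
            return decision["response"]
        tool = TOOLS[decision["tool"]]
        result = tool(**decision["input"])         # act: execute the tool
        state["sum"] = result                      # learn: update state
        state["history"].append((decision["tool"], result))
    return "Stopped: max iterations reached"

print(run_agent())  # The answer is 4
```

Everything the SDK adds — schema validation, retries, context trimming, logging — hangs off this skeleton.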
Primitive 1: Tool Definitions and Binding
Tools are the agent’s interface to the world. A tool is a function the agent can call. When you define a tool, you specify:
- Name: How the agent refers to it (e.g., execute_sql_query, send_email, read_file).
- Description: What the tool does, in plain language. Claude uses this to decide when to call the tool.
- Input Schema: What parameters the tool accepts (JSON Schema format).
- Implementation: The actual code that runs when the agent calls the tool.
The SDK handles the binding: it presents the tool definitions to Claude, parses Claude’s tool-use requests, validates inputs against the schema, executes the tool, and returns results. This is where you save significant engineering effort. In custom orchestration, you’re writing all of this yourself.
Example: if you’re building an agent to automate customer support, you might define tools like:
- lookup_customer_by_email: Queries the customer database
- get_support_tickets: Retrieves open tickets for a customer
- update_ticket_status: Closes or escalates a ticket
- send_email_to_customer: Sends a response email
Each tool is a Python function (or JavaScript, or Rust—the SDK supports multiple languages). The SDK wraps it, presents it to Claude, and handles the plumbing.
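As a sketch of what one of these definitions might look like — the dict layout mirrors the name/description/schema/implementation split described above, but it is illustrative, not the SDK’s exact signature:

```python
# Illustrative tool definition with a JSON-Schema-style input check.
# The `TOOL_DEF` shape and `call_tool` helper are assumptions for the
# sake of the example, not the SDK's actual API.

def lookup_customer_by_email(email: str) -> dict:
    """Queries a (mock) customer database."""
    customers = {"alice@example.com": {"id": "cust_001", "name": "Alice"}}
    record = customers.get(email)
    return {"found": record is not None, "customer": record}

TOOL_DEF = {
    "name": "lookup_customer_by_email",
    "description": "Look up a customer record by their email address.",
    "input_schema": {
        "type": "object",
        "properties": {"email": {"type": "string"}},
        "required": ["email"],
    },
    "function": lookup_customer_by_email,
}

def call_tool(tool_def: dict, inputs: dict) -> dict:
    """Validate inputs against the schema, then run the implementation."""
    for field in tool_def["input_schema"]["required"]:
        if field not in inputs:
            return {"error": f"Missing required field: {field}"}
    return tool_def["function"](**inputs)

print(call_tool(TOOL_DEF, {"email": "alice@example.com"}))
```

The validation step is the part that matters: a malformed request from the model is caught before your implementation ever runs.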
Primitive 2: Iteration and Termination
Agents don’t usually solve problems in one step. They iterate. They call a tool, examine the result, call another tool, reason about the combined results, and so on. The SDK manages this iteration with built-in termination conditions:
- Max Iterations: Stop after N steps (e.g., 10 iterations) to prevent runaway loops.
- Tool Success: Stop if the agent explicitly signals success (e.g., by calling a finish tool).
- No More Tools: Stop if Claude decides no more tools are needed and returns a final response.
- Error Threshold: Stop if too many tools fail in succession.
You can define custom termination conditions. For example, if you’re building an agent to manage infrastructure, you might terminate when the agent has successfully deployed the application and run health checks. The SDK gives you hooks to implement this logic.
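A custom termination check for that infrastructure example might look like the following. The hook shape is an assumption for illustration; the SDK exposes similar extension points, but the names here are not its API.

```python
# Sketch of a custom termination condition for an infrastructure agent.
# The state keys ("deployed", "health_checks_passed") are illustrative.

def should_terminate(state: dict, iteration: int, max_iterations: int = 10) -> bool:
    """Stop when the deploy succeeded and health checks passed,
    or when the iteration budget is exhausted."""
    if iteration >= max_iterations:
        return True
    return bool(state.get("deployed") and state.get("health_checks_passed", False))

print(should_terminate({"deployed": True, "health_checks_passed": True}, iteration=3))  # True
print(should_terminate({"deployed": True}, iteration=3))                                # False
```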
Primitive 3: Error Handling and Recovery
Production agents fail. Tools time out, APIs return errors, data is missing, and Claude makes mistakes. The SDK provides structured error handling:
- Tool Execution Errors: If a tool fails (exception, timeout, invalid input), the SDK catches the error, formats it, and feeds it back to Claude as part of the next iteration. Claude can then decide to retry, try a different tool, or escalate.
- Validation Errors: If Claude’s tool request violates the schema (wrong parameter type, missing required field), the SDK rejects it and asks Claude to correct it.
- Context Errors: If the agent’s state becomes inconsistent (e.g., a required variable is missing), the SDK can halt or inject a recovery action.
This is crucial for reliability. In custom orchestration, you’re writing all of this error handling yourself, and it’s easy to miss edge cases. The SDK makes error recovery a first-class concern.
Primitive 4: Context and State Management
Agents need memory. The SDK provides:
- Conversation History: The full back-and-forth between the agent and Claude, including tool calls and results.
- Tool Result Caching: Recent tool results are cached to avoid redundant API calls.
- Custom State Variables: You can store arbitrary state (e.g., user_id, workflow_status, decision_log) that persists across iterations.
- Context Injection: You can inject context at any point (e.g., “the user is a premium customer” or “the API is currently degraded”).
The SDK is smart about context. It doesn’t send the entire conversation history to Claude every time—that would be expensive and slow. Instead, it uses intelligent summarisation and truncation to keep the context window manageable while preserving the information Claude needs to make good decisions.
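The simplest form of this idea is trimming: keep the system prompt plus the most recent turns. The SDK’s summarisation is more sophisticated than this, but a naive sketch shows the shape of the problem:

```python
# Naive context trimming: keep the system message and the most recent
# turns. Illustrative only; the SDK also summarises older turns.

def trim_history(messages: list[dict], keep_recent: int = 4) -> list[dict]:
    """Keep the system message plus the last `keep_recent` turns."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_recent:]

history = [{"role": "system", "content": "You are a support agent."}]
history += [{"role": "user", "content": f"turn {i}"} for i in range(10)]

trimmed = trim_history(history)
print(len(trimmed))  # 5: the system prompt plus the last 4 turns
```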
Primitive 5: Tool Result Interpretation
When a tool returns a result, the agent needs to understand it. The SDK provides structured interpretation:
- Success vs Failure: Tools return a status (success or failure) along with the result.
- Structured Results: Tools return data in a format Claude can reason about (JSON, not free-form text).
- Result Validation: The SDK can validate that tool results match expected schemas, catching bugs early.
This matters for reliability. If a tool returns unexpected data, the SDK can catch it and either retry the tool or escalate to human intervention.
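A result-shape check can be as simple as the following sketch. The expected fields here are hypothetical examples for a ticket-lookup tool, not a schema defined by the SDK:

```python
# Sketch of validating a tool result against an expected shape before
# feeding it back to the model. Field names are illustrative.

EXPECTED_FIELDS = {"status": str, "tickets": list}

def validate_result(result: dict) -> tuple[bool, str]:
    """Return (ok, message) for a tool result against EXPECTED_FIELDS."""
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in result:
            return False, f"Missing field: {field}"
        if not isinstance(result[field], expected_type):
            return False, f"Field {field} should be {expected_type.__name__}"
    return True, "ok"

print(validate_result({"status": "success", "tickets": []}))     # (True, 'ok')
print(validate_result({"status": "success", "tickets": "none"}))
```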
SDK vs Custom Orchestration: When to Choose Each
When to Use the Claude Agent SDK
Use the SDK if:
1. You need to ship fast. The SDK eliminates weeks of orchestration plumbing. If you’re a seed-stage startup building your first AI product, or a mid-market company modernising operations with agentic AI automation, the time savings are massive. Real numbers: teams report 4–6 weeks faster to production compared to custom orchestration.
2. Your agent needs to be reliable and auditable. The SDK is built for production. It has built-in error handling, logging, and observability. If you’re building agents for regulated industries (finance, healthcare, legal), or if you need to pass compliance audits, the SDK’s audit trail is invaluable. Every decision, every tool call, every error is logged and traceable.
3. You’re building agents that interact with code. Claude Code is a first-class citizen in the SDK. If your agent needs to read files, understand codebases, propose changes, or execute code, the SDK makes this seamless. This is transformative for engineering automation, platform modernisation, and infrastructure management.
4. You want to avoid common pitfalls. Custom orchestration is full of subtle bugs: context window overflow, infinite loops, tool timeouts, inconsistent state, token waste. The SDK is designed to avoid these. It’s been battle-tested by Anthropic and used in production by hundreds of teams.
5. You’re scaling from one agent to many. The SDK is designed for scaling. Once you’ve built one agent, building the second is much faster. The patterns are consistent, the infrastructure is shared, and you can reuse tools across agents.
When to Build Custom Orchestration
Build custom orchestration if:
1. Your agent needs non-standard control flow. If your agent’s decision logic doesn’t fit the standard observe-decide-act-learn loop, custom orchestration might be simpler. For example, if you’re building a multi-agent system where agents coordinate with each other, or if you need agents to run in parallel with complex synchronisation, the SDK might feel constraining.
2. You need deep integration with legacy systems. If your agent needs to integrate tightly with legacy systems that have non-standard APIs or protocols, custom orchestration gives you more flexibility. You can build the integration layer exactly as you need it.
3. You’re optimising for extreme latency or cost. The SDK adds a small overhead—it’s not much, but it’s there. If you’re building agents that need sub-100ms latency or you’re operating at massive scale (millions of agent runs per day), custom orchestration might be more efficient. That said, the SDK is quite lean, and this is rarely the bottleneck.
4. You’re experimenting with novel agent architectures. If you’re doing research or building something that doesn’t fit standard agentic patterns, custom orchestration gives you the freedom to experiment. Once you’ve validated the architecture, you can often move to the SDK.
The Hybrid Approach: SDK + Custom Layers
In practice, many teams use a hybrid approach. They use the Claude Agent SDK for the core agentic loop, but layer custom code on top for domain-specific logic.
For example, a team building a customer support agent might:
- Use the SDK for the core agent loop (observe → decide → act → learn).
- Build custom middleware to inject company-specific context (customer tier, account status, recent interactions).
- Implement custom tools that wrap legacy CRM and ticketing systems.
- Add a custom decision layer that checks company policies before the agent sends responses.
This gives you the best of both worlds: the SDK’s reliability and speed, plus the flexibility to handle domain-specific requirements.
At PADISO, we’ve found that this hybrid approach works well for enterprise teams. The SDK handles the heavy lifting, and custom layers handle the specifics. This is where AI & Agents Automation expertise comes in—knowing where to use the SDK and where to build custom logic.
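As a small sketch of that custom decision layer — the banned-phrase list and escalation response are illustrative company policy, not SDK behaviour:

```python
# Sketch of a policy-check layer wrapped around the agent's draft
# response. BANNED_PHRASES and the escalation text are illustrative.

BANNED_PHRASES = ["guarantee a refund", "legal advice"]

def policy_check(draft_response: str) -> str:
    """Block responses that violate policy; escalate instead."""
    lowered = draft_response.lower()
    for phrase in BANNED_PHRASES:
        if phrase in lowered:
            return "This request needs review by a human agent."
    return draft_response

print(policy_check("Happy to help with your billing question."))
print(policy_check("We guarantee a refund within 30 days."))  # escalated
```

Because the check runs outside the model, it holds even when the agent’s reasoning goes wrong.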
Getting Started with the Claude Agent SDK
Installation and Prerequisites
The Claude Agent SDK is available in multiple languages. The official GitHub repository hosts the Python version; there are also JavaScript (via npm) and Rust (via crates.io, documented on Docs.rs) packages, among others.
For Python, installation is straightforward:
pip install anthropic-agent-sdk
You’ll need:
- An Anthropic API key (get one at console.anthropic.com)
- Python 3.10 or later
- Basic familiarity with async/await (the SDK is async-first)
For JavaScript, the npm package is:
npm install @anthropic-ai/claude-agent-sdk
For Rust, add to your Cargo.toml:
[dependencies]
claude-agent-sdk = "0.1"
Setting Up Your First Agent
The simplest agent is just a few lines of code. Here’s a Python example:
from anthropic_agent_sdk import Agent
agent = Agent(
name="HelloAgent",
model="claude-3-5-sonnet-20241022",
instructions="You are a helpful assistant.",
)
response = agent.run("What is 2 + 2?")
print(response)
This creates an agent, gives it instructions, and runs it with a simple query. The agent processes the query and returns a response.
But this is the trivial case. Real agents need tools. Here’s an agent with a tool:
from anthropic_agent_sdk import Agent, Tool
import json
def get_weather(location: str) -> dict:
"""Get the current weather for a location."""
# In reality, this would call a weather API
return {"location": location, "temperature": 22, "condition": "sunny"}
agent = Agent(
name="WeatherAgent",
model="claude-3-5-sonnet-20241022",
instructions="You are a helpful weather assistant. Use the get_weather tool to answer questions about weather.",
tools=[
Tool(
name="get_weather",
description="Get the current weather for a location",
function=get_weather,
)
],
)
response = agent.run("What's the weather in Sydney?")
print(response)
Now the agent has a tool. When you ask it about the weather, it will:
- Recognise that it needs weather information
- Call the get_weather tool with “Sydney” as the location
- Receive the result
- Formulate a response based on the result
This is where the SDK shines. You define the tool function, and the SDK handles everything else: presenting it to Claude, parsing the tool call, executing the function, and feeding the result back.
Configuration and Customisation
The Agent class accepts many configuration options:
- name: The agent’s name (used in logs and traces)
- model: Which Claude model to use (3.5 Sonnet is recommended for agents)
- instructions: The system prompt that guides the agent’s behaviour
- tools: List of tools the agent can use
- max_iterations: Maximum number of agentic loop iterations (default: 10)
- timeout: How long to wait for each iteration (default: 60 seconds)
- temperature: Sampling temperature (0.0 for deterministic, 1.0 for creative)
- context_window: How much history to keep in context
You can also provide custom hooks:
- on_iteration: Called after each iteration (useful for logging or custom logic)
- on_tool_call: Called before each tool execution (useful for validation or rate limiting)
- on_error: Called when an error occurs (useful for custom error handling)
These hooks are where you layer custom logic on top of the SDK.
Understanding the Response Object
When you call agent.run(), you get back a response object that contains:
- message: The final response text
- iterations: How many iterations the agent ran (useful for understanding complexity)
- tool_calls: A list of all tools the agent called, in order
- state: The final state of the agent (custom variables, context, etc.)
- tokens_used: How many tokens were consumed (useful for cost tracking)
This is valuable for observability. You can log these metrics, track agent performance, and debug issues.
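For example, the tokens_used field makes per-run cost tracking trivial. A sketch — the per-million-token rates below are placeholder assumptions, not current Anthropic pricing:

```python
# Sketch of cost tracking from response token counts. The rates are
# placeholder assumptions, not real Anthropic pricing.

INPUT_COST_PER_MTOK = 3.00    # assumed USD per million input tokens
OUTPUT_COST_PER_MTOK = 15.00  # assumed USD per million output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Rough USD cost of one agent run, to six decimal places."""
    return round(
        input_tokens / 1_000_000 * INPUT_COST_PER_MTOK
        + output_tokens / 1_000_000 * OUTPUT_COST_PER_MTOK,
        6,
    )

# e.g. a run that consumed 12,000 input and 800 output tokens
print(estimate_cost(12_000, 800))  # 0.048
```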
Building Your First Production Agent
A Realistic Example: Customer Support Automation
Let’s build a customer support agent that can:
- Look up customers by email
- Retrieve their support tickets
- Answer common questions
- Escalate complex issues
Here’s the full implementation:
from anthropic_agent_sdk import Agent, Tool
import json
from datetime import datetime
# Mock database
CUSTOMERS = {
"alice@example.com": {"id": "cust_001", "name": "Alice", "tier": "premium"},
"bob@example.com": {"id": "cust_002", "name": "Bob", "tier": "standard"},
}
TICKETS = {
"cust_001": [
{"id": "tick_001", "status": "open", "subject": "Billing issue", "created": "2025-01-15"},
{"id": "tick_002", "status": "closed", "subject": "Feature request", "created": "2025-01-10"},
],
"cust_002": [
{"id": "tick_003", "status": "open", "subject": "Login problem", "created": "2025-01-18"},
],
}
# Tool definitions
def lookup_customer(email: str) -> dict:
"""Look up a customer by email address."""
if email in CUSTOMERS:
return {"success": True, "customer": CUSTOMERS[email]}
return {"success": False, "error": f"Customer {email} not found"}
def get_tickets(customer_id: str) -> dict:
"""Get support tickets for a customer."""
tickets = TICKETS.get(customer_id, [])
return {"success": True, "tickets": tickets}
def update_ticket(ticket_id: str, status: str) -> dict:
"""Update a ticket's status."""
# In reality, this would update a database
return {"success": True, "message": f"Ticket {ticket_id} updated to {status}"}
def escalate_to_human(reason: str) -> dict:
"""Escalate an issue to a human agent."""
return {
"success": True,
"message": f"Issue escalated to human agent. Reason: {reason}",
"ticket_id": f"escalated_{datetime.now().timestamp()}"
}
# Create the agent
agent = Agent(
name="CustomerSupportAgent",
model="claude-3-5-sonnet-20241022",
instructions="""You are a customer support agent. Your job is to help customers with their issues.
1. Always start by looking up the customer by email
2. Retrieve their open tickets
3. Try to answer their question or resolve their issue
4. If the issue is complex or requires human judgment, escalate to a human agent
5. Be polite, professional, and empathetic
You have access to customer lookup, ticket retrieval, ticket updates, and escalation tools.""",
tools=[
Tool(
name="lookup_customer",
description="Look up a customer by email address",
function=lookup_customer,
),
Tool(
name="get_tickets",
description="Get all support tickets for a customer",
function=get_tickets,
),
Tool(
name="update_ticket",
description="Update a ticket's status (e.g., 'resolved', 'closed')",
function=update_ticket,
),
Tool(
name="escalate_to_human",
description="Escalate an issue to a human agent",
function=escalate_to_human,
),
],
max_iterations=5,
)
# Run the agent
if __name__ == "__main__":
# Example: customer inquiry
response = agent.run(
"Hi, I'm alice@example.com. I have a billing issue that needs to be resolved."
)
print(f"Agent response: {response.message}")
print(f"Iterations: {response.iterations}")
print(f"Tools called: {[call['tool'] for call in response.tool_calls]}")
print(f"Tokens used: {response.tokens_used}")
Walk through what happens:
- The agent receives the customer’s message
- It recognises that it needs to look up the customer, so it calls lookup_customer with “alice@example.com”
- It receives the customer object (Alice, premium tier)
- It calls get_tickets to see what issues are open
- It sees that Alice has a billing issue (ticket_001)
- It reasons about the issue and either resolves it or escalates it
- It returns a response to the customer
This is a realistic agent. It handles multiple tools, maintains state across iterations, and can escalate when needed. And it’s only about 100 lines of code—most of which is tool definitions.
Adding Observability
For production, you want to see what’s happening inside the agent. The SDK provides hooks:
def log_iteration(iteration_data):
print(f"Iteration {iteration_data['number']}: {iteration_data['status']}")
if iteration_data['tool_calls']:
for call in iteration_data['tool_calls']:
print(f" → Called {call['tool']} with {call['input']}")
agent = Agent(
name="CustomerSupportAgent",
# ... other config ...
on_iteration=log_iteration,
)
Now every iteration is logged. You can see exactly what tools the agent called, in what order, and why.
Advanced Patterns and Real-World Implementation
Pattern 1: Multi-Step Workflows
Many real-world problems require multi-step workflows. For example, processing a refund request requires:
- Verify the customer’s identity
- Look up the original order
- Check if the refund is eligible
- Process the refund
- Send a confirmation email
The SDK handles this naturally. Each step is a tool call. The agent iterates through them, and the SDK manages the state and context.
def verify_identity(email: str, verification_code: str) -> dict:
"""Verify customer identity with a code sent to their email."""
# In reality, this would check against a database
return {"success": True, "verified": True}
def lookup_order(customer_id: str, order_id: str) -> dict:
"""Look up an order."""
return {
"success": True,
"order": {
"id": order_id,
"amount": 99.99,
"date": "2025-01-01",
"status": "delivered"
}
}
def check_refund_eligibility(order_id: str) -> dict:
"""Check if an order is eligible for refund."""
# 30-day refund window
return {"success": True, "eligible": True, "reason": "Within 30-day window"}
def process_refund(order_id: str, amount: float) -> dict:
"""Process a refund."""
return {"success": True, "refund_id": f"ref_{order_id}", "amount": amount}
def send_confirmation_email(email: str, refund_id: str) -> dict:
"""Send a confirmation email."""
return {"success": True, "message": f"Email sent to {email}"}
# Agent with all these tools
agent = Agent(
name="RefundAgent",
model="claude-3-5-sonnet-20241022",
instructions="""You are a refund processing agent. Follow these steps:
1. Verify the customer's identity
2. Look up their order
3. Check if they're eligible for a refund
4. If eligible, process the refund
5. Send a confirmation email
Be thorough and follow company policy.""",
tools=[
Tool(name="verify_identity", description="Verify customer identity", function=verify_identity),
Tool(name="lookup_order", description="Look up an order", function=lookup_order),
Tool(name="check_refund_eligibility", description="Check refund eligibility", function=check_refund_eligibility),
Tool(name="process_refund", description="Process a refund", function=process_refund),
Tool(name="send_confirmation_email", description="Send confirmation email", function=send_confirmation_email),
],
max_iterations=10,
)
The agent will work through these steps in order, handling errors and edge cases as they arise.
Pattern 2: Conditional Logic and Branching
Some workflows have conditional branches. For example, a support agent might:
- If the customer is premium tier: resolve immediately
- If the customer is standard tier: offer a discount
- If the issue is complex: escalate to a specialist
You implement this by giving the agent tools that represent each branch, and letting Claude decide which branch to take based on the context.
def resolve_immediately(ticket_id: str) -> dict:
"""For premium customers, resolve immediately."""
return {"success": True, "action": "resolved", "ticket_id": ticket_id}
def offer_discount(customer_id: str, discount_percent: int) -> dict:
"""For standard customers, offer a discount."""
return {"success": True, "action": "discount_offered", "discount_percent": discount_percent}
def escalate_to_specialist(ticket_id: str, reason: str) -> dict:
"""For complex issues, escalate to a specialist."""
return {"success": True, "action": "escalated", "specialist_assigned": True}
# Agent with branching logic
agent = Agent(
name="SmartSupportAgent",
model="claude-3-5-sonnet-20241022",
instructions="""You are a support agent. Based on the customer's tier and issue complexity:
- Premium customers: resolve immediately
- Standard customers: offer a discount
- Complex issues: escalate to a specialist
Use the appropriate tool based on the situation.""",
tools=[
Tool(name="resolve_immediately", description="...", function=resolve_immediately),
Tool(name="offer_discount", description="...", function=offer_discount),
Tool(name="escalate_to_specialist", description="...", function=escalate_to_specialist),
],
)
Claude will automatically choose the right tool based on the customer’s tier and the issue complexity.
Pattern 3: Integration with Claude Code
This is where things get powerful. If your agent needs to interact with code—read files, understand codebases, propose changes—you can use Claude Code.
The Claude Code overview explains the basics, but here’s how it integrates with the SDK:
Claude Code is a tool. You give the agent access to it, and it can call it like any other tool. The agent can:
- Read files from your codebase
- Understand the code structure
- Propose changes
- Execute changes
Example: an agent that automates database migrations
def read_schema_file(path: str) -> dict:
"""Read a database schema file."""
# Returns the schema
return {"success": True, "content": "...schema content..."}
def propose_migration(schema_changes: str) -> dict:
"""Propose a database migration based on schema changes."""
# Claude Code would be called here to generate the migration
return {"success": True, "migration": "...migration SQL..."}
def execute_migration(migration_sql: str) -> dict:
"""Execute a migration against the database."""
return {"success": True, "message": "Migration executed successfully"}
agent = Agent(
name="MigrationAgent",
model="claude-3-5-sonnet-20241022",
instructions="""You are a database migration agent. When asked to migrate a schema:
1. Read the current schema
2. Propose a migration
3. Execute it
Be careful to preserve data and follow best practices.""",
tools=[
Tool(name="read_schema_file", description="...", function=read_schema_file),
Tool(name="propose_migration", description="...", function=propose_migration),
Tool(name="execute_migration", description="...", function=execute_migration),
],
)
This is where the Claude Agent SDK becomes genuinely transformative. Agents that can understand and modify code are agents that can automate engineering work.
Security, Compliance, and Deployment
Security Considerations
Agents are powerful, which means they can be dangerous. Key security considerations:
1. Tool Permissions. Not every agent should have access to every tool. A customer support agent shouldn’t have access to payment processing or user deletion. Implement role-based access control (RBAC) for tools.
def get_tools_for_role(role: str) -> list:
"""Return tools appropriate for a given role."""
if role == "support":
return [lookup_customer, get_tickets, update_ticket]
elif role == "admin":
return [lookup_customer, get_tickets, update_ticket, delete_user, process_refund]
else:
return []
agent = Agent(
name="RoleBasedAgent",
model="claude-3-5-sonnet-20241022",
instructions="...",
tools=get_tools_for_role(current_user_role),
)
2. Input Validation. Always validate tool inputs. Don’t trust Claude to provide valid inputs—validate them server-side.
def delete_user(user_id: str) -> dict:
"""Delete a user (with validation)."""
# Validate user_id format
if not user_id.startswith("user_"):
return {"success": False, "error": "Invalid user ID format"}
# Check if user exists
if user_id not in USERS:
return {"success": False, "error": "User not found"}
# Proceed with deletion
del USERS[user_id]
return {"success": True, "message": f"User {user_id} deleted"}
3. Audit Logging. Every agent decision, every tool call, every error should be logged. This is non-negotiable for regulated environments.
import logging
logger = logging.getLogger(__name__)
def audit_log(event_type: str, details: dict):
"""Log an event for audit purposes."""
logger.info(f"AUDIT: {event_type} - {details}")
def on_tool_call(tool_name: str, input_data: dict, user_id: str):
"""Hook called before each tool execution."""
audit_log("TOOL_CALL", {
"tool": tool_name,
"user_id": user_id,
"timestamp": datetime.now().isoformat(),
})
agent = Agent(
name="AuditedAgent",
model="claude-3-5-sonnet-20241022",
instructions="...",
tools=[...],
on_tool_call=on_tool_call,
)
4. Rate Limiting. Prevent abuse by rate-limiting agent runs and tool calls.
from collections import defaultdict
from datetime import datetime, timedelta

rate_limits = defaultdict(list)

def check_rate_limit(user_id: str, max_calls_per_minute: int = 10) -> bool:
    """Check if a user has exceeded their rate limit."""
    now = datetime.now()
    minute_ago = now - timedelta(minutes=1)
    # Remove calls older than one minute
    rate_limits[user_id] = [t for t in rate_limits[user_id] if t > minute_ago]
    # Check limit
    if len(rate_limits[user_id]) >= max_calls_per_minute:
        return False
    # Record this call
    rate_limits[user_id].append(now)
    return True

def run_agent_with_rate_limit(user_id: str, query: str):
    if not check_rate_limit(user_id):
        return {"error": "Rate limit exceeded. Try again in a minute."}
    return agent.run(query)
Compliance and Audit Readiness
For regulated environments (finance, healthcare, legal), compliance is critical. The Claude Agent SDK is designed with compliance in mind.
SOC 2 Readiness: The SDK provides:
- Complete audit trails (every decision logged)
- Error handling and recovery (no silent failures)
- Access controls (role-based tool permissions)
- Encryption in transit (via HTTPS)
- Regular security updates (Anthropic maintains the SDK)
For SOC 2 compliance, you’ll also need to implement:
- Data retention policies
- Incident response procedures
- Regular security audits
- Access logging and monitoring
ISO 27001 Readiness: Similar requirements, with additional focus on:
- Information security policies
- Risk assessment and management
- Supplier management (Anthropic’s security practices)
- Incident management procedures
If you’re pursuing ISO 27001 compliance, the SDK makes this easier—it’s already designed with security and auditability in mind. But you still need to implement the organisational practices around it.
Deployment Patterns
Pattern 1: Synchronous API. The simplest pattern—expose the agent as an API endpoint.
from fastapi import FastAPI

app = FastAPI()

@app.post("/agent/run")
async def run_agent(query: str, user_id: str):
    if not check_rate_limit(user_id):
        return {"error": "Rate limit exceeded"}
    response = agent.run(query)
    audit_log("AGENT_RUN", {"user_id": user_id, "query": query, "iterations": response.iterations})
    return {"response": response.message, "iterations": response.iterations}
Pattern 2: Asynchronous Job Queue. For long-running agents, use a job queue.
from celery import Celery
import requests

celery_app = Celery()  # configure a broker (e.g., Redis) in production

@celery_app.task
def run_agent_async(query: str, user_id: str, callback_url: str):
    response = agent.run(query)
    # Send the result to the callback URL
    requests.post(callback_url, json={"response": response.message, "user_id": user_id})
Pattern 3: Scheduled Agents. For recurring tasks, schedule agents to run at specific times.
from apscheduler.schedulers.background import BackgroundScheduler

scheduler = BackgroundScheduler()

@scheduler.scheduled_job('cron', hour=2, minute=0)  # Run daily at 2 AM
def daily_cleanup_agent():
    response = agent.run("Clean up old support tickets")
    audit_log("SCHEDULED_AGENT", {"task": "daily_cleanup", "iterations": response.iterations})

scheduler.start()
Performance Optimisation and Scaling
Token Efficiency
Tokens are money. Each API call to Claude costs tokens. Optimise:
1. Context Window Management. The SDK intelligently manages context, but you can help:
agent = Agent(
    name="EfficientAgent",
    model="claude-3-5-sonnet-20241022",
    instructions="...",
    tools=[...],
    context_window=2000,  # Limit context to 2000 tokens
)
Limiting context forces the agent to focus on the most recent and relevant information. This saves tokens and often improves decision-making.
2. Tool Result Caching. The SDK caches recent tool results. If the agent calls the same tool twice with the same inputs, it uses the cached result. This saves tokens and improves latency.
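As a mental model, the dedupe behaviour described above can be sketched as a cache keyed on the tool name plus its serialised inputs. `ToolResultCache` below is a hypothetical illustration, not the SDK's internal implementation:

```python
import hashlib
import json

class ToolResultCache:
    """Hypothetical in-memory cache keyed on tool name + serialised inputs."""

    def __init__(self):
        self._store = {}

    def _key(self, tool_name: str, inputs: dict) -> str:
        # Sort keys so {"a": 1, "b": 2} and {"b": 2, "a": 1} produce the same key
        payload = json.dumps(inputs, sort_keys=True)
        return hashlib.sha256(f"{tool_name}:{payload}".encode()).hexdigest()

    def get_or_call(self, tool_name: str, inputs: dict, fn):
        key = self._key(tool_name, inputs)
        if key not in self._store:
            # Cache miss: execute the tool and store the result
            self._store[key] = fn(**inputs)
        return self._store[key]

cache = ToolResultCache()
result = cache.get_or_call("lookup_customer", {"email": "alice@example.com"},
                           lambda email: {"email": email, "plan": "pro"})
```

A second call with identical inputs returns the stored result without re-executing the tool, which is exactly the token and latency saving the SDK's caching aims for.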
3. Summarisation. For long conversations, summarise old context to save tokens.
import anthropic

claude = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def summarise_old_context(history: list) -> str:
    """Summarise old conversation history."""
    # Use Claude to summarise
    summary_prompt = f"Summarise this conversation in 1-2 sentences: {history}"
    summary = claude.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=200,
        messages=[{"role": "user", "content": summary_prompt}],
    )
    return summary.content[0].text
Latency Optimisation
Agents need to be fast. Typical latency breakdown:
- Network latency to Anthropic API: 100–300ms
- Claude processing: 500–2000ms (depends on complexity)
- Tool execution: Variable (could be 100ms to 10s)
- SDK overhead: 10–50ms
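To see where your own runs spend time against this breakdown, a small timing helper can record each stage. This is a generic stdlib sketch, not an SDK feature:

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def timed(stage: str):
    """Record elapsed wall-clock time (in ms) for a named stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = (time.perf_counter() - start) * 1000

# Wrap each stage of an agent run to see where the milliseconds go
with timed("tool_execution"):
    time.sleep(0.01)  # stand-in for a real tool call

print(f"tool_execution: {timings['tool_execution']:.1f}ms")
```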
Optimise by:
1. Parallel Tool Calls. If the agent needs to call multiple tools, call them in parallel.
import asyncio

async def parallel_tool_calls():
    # lookup_customer and get_tickets are synchronous functions, so run
    # them in worker threads rather than awaiting them directly
    results = await asyncio.gather(
        asyncio.to_thread(lookup_customer, "alice@example.com"),
        asyncio.to_thread(get_tickets, "cust_001"),
    )
    return results
2. Tool Caching. Cache tool results aggressively.
from functools import lru_cache

@lru_cache(maxsize=1000)
def lookup_customer(email: str) -> dict:
    # Results are cached per email address
    return CUSTOMERS.get(email)
3. Model Selection. Claude 3.5 Sonnet is the recommended model for agents—it’s fast and capable. Avoid larger models unless necessary.
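If you serve mixed traffic, a simple router can send short, tool-light requests to a smaller, faster model and reserve Sonnet for complex runs. The model IDs and thresholds below are illustrative assumptions, not SDK defaults:

```python
def pick_model(query: str, tool_count: int) -> str:
    """Route simple requests to a smaller model, complex ones to Sonnet.

    Thresholds here are illustrative; tune them against your own traffic.
    """
    if len(query) < 200 and tool_count <= 3:
        return "claude-3-5-haiku-20241022"
    return "claude-3-5-sonnet-20241022"
```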
Scaling to Many Agents
Once you’ve built one agent, scaling to many is about infrastructure:
1. Agent Registry. Keep track of all agents and their configurations.
agents = {
    "support": Agent(name="SupportAgent", ...),
    "billing": Agent(name="BillingAgent", ...),
    "technical": Agent(name="TechnicalAgent", ...),
}

def get_agent(agent_name: str) -> Agent:
    return agents.get(agent_name)
2. Shared Tool Library. Build a library of reusable tools that multiple agents can use.
common_tools = [
    Tool(name="lookup_customer", description="...", function=lookup_customer),
    Tool(name="get_tickets", description="...", function=get_tickets),
    Tool(name="send_email", description="...", function=send_email),
]

# Each agent can use these tools
agent1 = Agent(name="Agent1", tools=common_tools + [...])
agent2 = Agent(name="Agent2", tools=common_tools + [...])
3. Load Balancing. Distribute agent runs across multiple instances.
from itertools import cycle

# Round-robin over a pool of agent instances
agent_instances = cycle([Agent(...) for _ in range(10)])

def run_agent_load_balanced(query: str):
    agent = next(agent_instances)
    return agent.run(query)
Next Steps: From Prototype to Production
The Path to Production
Building a prototype agent is quick—days or weeks. Shipping to production is more involved. Here’s the path:
Week 1–2: Prototype. Build a simple agent with 3–5 tools. Test it manually. Validate that the basic loop works.
Week 3–4: Hardening. Add error handling, logging, and validation. Test edge cases. Set up monitoring.
Week 5–6: Integration. Integrate with your production systems. Set up databases, APIs, and external services. Conduct security review.
Week 7–8: Testing. Run load tests, stress tests, and security tests. Fix bugs. Optimise performance.
Week 9+: Deployment. Deploy to production. Monitor closely. Iterate based on real-world usage.
This is a realistic timeline for a single agent. Scaling to multiple agents takes longer, but the patterns are consistent.
Building a Team Around Agents
Agents aren’t a solo effort. You need:
1. An AI Engineer. Someone who understands the Claude Agent SDK, can build agents, and can optimise them for production. This is a specialist role.
2. A Platform Engineer. Someone who can build the infrastructure around agents—APIs, databases, monitoring, deployment pipelines.
3. A Domain Expert. Someone who understands the business logic—what the agent should do, what edge cases matter, what compliance requirements apply.
At PADISO, we provide fractional CTO leadership to help teams build and scale agent systems. We’ve worked with dozens of teams shipping agents to production, and we know the common pitfalls and how to avoid them.
If you’re building agents at scale, or if you’re uncertain about the path forward, talk to us. We can help you move from prototype to production quickly and reliably.
Monitoring and Observability
Once your agent is in production, you need to see what’s happening. Key metrics:
1. Success Rate. What percentage of agent runs complete successfully?
2. Iteration Count. How many iterations does each run take on average? More iterations = more tokens = higher cost.
3. Tool Call Distribution. Which tools are called most often? Which tools fail most often?
4. Latency. How long does each run take? Are there bottlenecks?
5. Error Rate. What percentage of tool calls fail? What are the most common errors?
6. Token Usage. How many tokens does each run consume? Is this in line with your cost budget?
Set up dashboards to track these metrics. Use them to identify bottlenecks and optimisation opportunities.
from datetime import datetime

def log_metrics(agent_name: str, response: object):
    metrics = {
        "agent": agent_name,
        "timestamp": datetime.now().isoformat(),
        "success": response.success,
        "iterations": response.iterations,
        "tokens_used": response.tokens_used,
        "latency_ms": response.latency * 1000,
        "tool_calls": len(response.tool_calls),
    }
    # Send to your monitoring system (e.g., Datadog, New Relic, CloudWatch)
    monitoring_client.send_metrics(metrics)
Continuous Improvement
Agents aren’t static. As you learn more about your use cases, you’ll want to improve them:
1. A/B Testing. Test different agent configurations (different instructions, different tools, different models) and measure which performs better.
2. User Feedback. Collect feedback from users. Which agent responses are helpful? Which are unhelpful? Use this to improve instructions and tools.
3. Error Analysis. When agents fail, understand why. Was it a tool failure? A reasoning error? An edge case? Use these insights to improve.
4. Tool Optimisation. As you learn more about what agents need, optimise your tools. Add new tools, remove unused ones, improve tool descriptions.
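For A/B testing in particular, a deterministic hash of the user ID keeps each user in the same variant across sessions, so their experience stays consistent. This is a generic bucketing sketch; the config contents are placeholders:

```python
import hashlib

def assign_variant(user_id: str, variants=("control", "treatment")) -> str:
    """Deterministically bucket a user so they always get the same agent config."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# Hypothetical configs under test; swap in your real instructions/tools
agent_configs = {
    "control": {"instructions": "Current system prompt..."},
    "treatment": {"instructions": "Revised system prompt..."},
}

variant = assign_variant("user_123")
config = agent_configs[variant]
```

Pair this with the metrics logging above (tag each run with its variant) and compare success rate, iteration count, and token usage per bucket.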
Conclusion
The Claude Agent SDK is a transformative tool for building production AI agents. It eliminates months of custom orchestration work, provides built-in reliability and auditability, and integrates seamlessly with Claude Code for engineering automation.
The choice between the SDK and custom orchestration is clear for most teams: use the SDK unless you have a specific reason not to. The SDK is mature, well-documented, and battle-tested in production. Custom orchestration should only be your choice if you have unusual requirements that the SDK doesn’t support.
If you’re building agents in 2026, the Claude Agent SDK is the right foundation. Start with a simple prototype. Validate that the basic loop works. Then harden it for production—add error handling, logging, security, and monitoring. Deploy carefully, monitor closely, and iterate based on real-world usage.
For teams looking for expert guidance on building production agents, or for help with AI & Agents Automation at scale, PADISO provides fractional CTO leadership and co-build support. We’ve shipped dozens of agents to production and we know the path forward. If you’re ready to move from prototype to production, let’s talk.