The Claude Agent SDK: Building Production AI Agents in 2026
Deep dive into Claude Agent SDK architecture, agentic loop primitives, and when to use it vs custom orchestration for production AI agents.
Table of Contents
- Why the Claude Agent SDK Matters in 2026
- Understanding the Architecture
- Core Agentic Loop Primitives
- SDK vs Custom Orchestration: When to Choose Each
- Getting Started with the Claude Agent SDK
- Building Your First Production Agent
- Advanced Patterns and Real-World Implementation
- Security, Compliance, and Deployment
- Performance Optimisation and Scaling
- Next Steps: From Prototype to Production
Why the Claude Agent SDK Matters in 2026
In 2026, the distinction between traditional software and agentic software has become operational reality, not theoretical debate. The Claude Agent SDK represents a fundamental shift in how teams build autonomous systems—moving from hand-coded orchestration loops to purpose-built primitives designed by Anthropic specifically for production workloads.
The market context is clear. Enterprise teams are no longer experimenting with whether agentic AI works; they’re racing to deploy it at scale. The agentic AI vs traditional automation conversation has matured beyond proof-of-concepts. Real companies are replacing legacy RPA platforms, manual workflows, and brittle rule-based systems with autonomous agents that can reason, adapt, and self-correct.
What makes the Claude Agent SDK different from rolling your own orchestration? Three things: speed to production, reliability under uncertainty, and native integration with Claude Code—Anthropic’s agentic coding capability that lets agents reason about and modify codebases in real time.
For Sydney-based founders and operators modernising their tech stack, this matters acutely. Whether you’re a seed-stage startup needing fractional CTO leadership to ship your first AI product, or a mid-market company automating 50+ workflows, the Claude Agent SDK eliminates months of custom plumbing. You focus on business logic. The SDK handles the hard parts: tool calling, context management, error recovery, and state tracking.
This guide walks you through the architecture, explains when to use the SDK versus building custom orchestration, and shows you exactly how to ship production agents that actually work.
Understanding the Architecture
The Claude Agent SDK Design Philosophy
Anthropic’s Claude Agent SDK is not a wrapper around Claude’s API. It’s a purpose-built framework for agentic loops, with three core design principles:
First: Minimal Abstraction Over Maximum Control. The SDK provides opinionated defaults for the 80% of agentic patterns that are common across use cases—tool calling, iteration, state management, error handling. But it doesn’t force you into a box. You can extend, override, and customise every layer. This is critical for production work where edge cases matter.
Second: Native Integration with Claude Code. Unlike generic LLM orchestration frameworks, the Claude Agent SDK is built with Claude Code as a first-class capability. This means agents can reason about your codebase, propose changes, and execute them—all within a single agentic loop. This is transformative for engineering workflows, platform modernisation, and complex automation tasks.
Third: Observability and Auditability from Day One. The SDK is designed for regulated environments. Every agent decision, tool invocation, and state transition is logged and traceable. This matters for SOC 2 compliance and audit readiness—critical requirements for enterprise deployments.
The Agent SDK Foundations course explains this well: the SDK is a programmatic alternative to CLI-based approaches, giving you fine-grained control over agent behaviour while maintaining simplicity in the common case.
The Four-Layer Architecture
The Claude Agent SDK operates across four distinct layers:
Layer 1: The LLM Core. This is Claude 3.5 Sonnet (or newer), tuned specifically for agentic reasoning. The model is optimised for multi-step reasoning, tool use, and self-correction—not just raw instruction-following. The difference matters: agentic Claude has been trained to think in loops, to recognise when it needs to call a tool, to evaluate whether the result was what it expected, and to adapt if not.
Layer 2: Tool Binding and Execution. The SDK provides a declarative interface for defining tools—functions the agent can call. You define the tool’s signature (inputs, outputs, description), and the SDK handles everything else: presenting it to Claude, parsing Claude’s tool-use requests, executing the tool safely, and feeding results back into the agentic loop. This is where the SDK saves you months of work. Tool calling in production is not trivial—you need timeout handling, retry logic, error recovery, and context management. The SDK does this out of the box.
Layer 3: State and Context Management. Agents need memory. The SDK provides structured state management: conversation history, tool results, intermediate reasoning steps, and custom state variables. This layer handles the complexity of maintaining context across dozens or hundreds of iterations while keeping tokens and latency under control. It’s not magic, but it’s non-trivial to get right.
Layer 4: Orchestration and Control Flow. This is where the agentic loop lives. The SDK implements the core loop: send the current state to Claude, receive a response (which may include tool calls), execute tools, update state, and iterate until the agent reaches a terminal condition (success, failure, or max iterations). You can hook into every step of this loop—run custom validation, inject context, modify decisions—but the happy path is simple and fast.
The Role of Claude Code
Claude Code is the secret weapon. It’s Anthropic’s agentic coding tool—an agent that can read files, understand codebases, propose code changes, and execute them. When you integrate Claude Code with the Claude Agent SDK, you get agents that can actually modify your systems.
This is not theoretical. Real teams are using this to:
- Automate refactoring across large codebases (50,000+ lines, 200+ files)
- Generate and execute database migrations with validation
- Build and deploy infrastructure changes
- Debug production issues by reading logs, tracing code, and proposing fixes
The Claude Code overview shows how this integrates into the SDK. Claude Code is a tool that agents can call, just like any other—but it’s a tool that understands software and can reason about it.
Core Agentic Loop Primitives
The Basic Loop: Observe → Decide → Act → Learn
Every agent, whether built with the Claude Agent SDK or custom orchestration, implements this loop:
- Observe: Gather the current state (user input, tool results, context, constraints).
- Decide: Send state to the LLM; receive a decision (which tool to call, or a final response).
- Act: Execute the decision (call the tool, or return the response).
- Learn: Update state based on the outcome; prepare for the next iteration.
The Claude Agent SDK provides primitives for each step. But understanding the primitives separately is crucial—it’s the difference between using a framework and understanding what the framework does.
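To make the loop concrete, here is a minimal sketch in plain Python. The `decide` stub stands in for a real Claude call and the tool registry stands in for the SDK’s bindings — the names are illustrative, not SDK APIs. It shows the control flow only, not the SDK’s actual internals.

```python
# A minimal observe -> decide -> act -> learn loop. The `decide` stub
# stands in for a real Claude call; names here are illustrative.

def add_numbers(a: int, b: int) -> int:
    """A trivial tool the agent can call."""
    return a + b

TOOLS = {"add_numbers": add_numbers}

def decide(state: dict) -> dict:
    """Stand-in for the LLM: pick a tool call or a final answer."""
    if "sum" not in state:
        return {"action": "tool", "tool": "add_numbers", "input": {"a": 2, "b": 2}}
    return {"action": "finish", "response": f"The answer is {state['sum']}"}

def run_agent(max_iterations: int = 10) -> str:
    state: dict = {"history": []}                  # observe: initial state
    for _ in range(max_iterations):
        decision = decide(state)                   # decide
        if decision["action"] == "finish":
            return decision["response"]
        tool = TOOLS[decision["tool"]]
        result = tool(**decision["input"])         # act: execute the tool
        state["sum"] = result                      # learn: update state
        state["history"].append((decision["tool"], result))
    return "Stopped: max iterations reached"

print(run_agent())  # The answer is 4
```

Everything the SDK adds — schema validation, retries, context trimming, logging — hangs off this skeleton.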
Primitive 1: Tool Definitions and Binding
Tools are the agent’s interface to the world. A tool is a function the agent can call. When you define a tool, you specify:
- Name: How the agent refers to it (e.g., execute_sql_query, send_email, read_file).
- Description: What the tool does, in plain language. Claude uses this to decide when to call the tool.
- Input Schema: What parameters the tool accepts (JSON Schema format).
- Implementation: The actual code that runs when the agent calls the tool.
The SDK handles the binding: it presents the tool definitions to Claude, parses Claude’s tool-use requests, validates inputs against the schema, executes the tool, and returns results. This is where you save significant engineering effort. In custom orchestration, you’re writing all of this yourself.
Example: if you’re building an agent to automate customer support, you might define tools like:
- lookup_customer_by_email: Queries the customer database
- get_support_tickets: Retrieves open tickets for a customer
- update_ticket_status: Closes or escalates a ticket
- send_email_to_customer: Sends a response email
Each tool is a Python function (or JavaScript, or Rust—the SDK supports multiple languages). The SDK wraps it, presents it to Claude, and handles the plumbing.
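As a sketch of what one of these definitions might look like — the dict layout mirrors the name/description/schema/implementation split described above, but it is illustrative, not the SDK’s exact signature:

```python
# Illustrative tool definition with a JSON-Schema-style input check.
# The `TOOL_DEF` shape and `call_tool` helper are assumptions for the
# sake of the example, not the SDK's actual API.

def lookup_customer_by_email(email: str) -> dict:
    """Queries a (mock) customer database."""
    customers = {"alice@example.com": {"id": "cust_001", "name": "Alice"}}
    record = customers.get(email)
    return {"found": record is not None, "customer": record}

TOOL_DEF = {
    "name": "lookup_customer_by_email",
    "description": "Look up a customer record by their email address.",
    "input_schema": {
        "type": "object",
        "properties": {"email": {"type": "string"}},
        "required": ["email"],
    },
    "function": lookup_customer_by_email,
}

def call_tool(tool_def: dict, inputs: dict) -> dict:
    """Validate inputs against the schema, then run the implementation."""
    for field in tool_def["input_schema"]["required"]:
        if field not in inputs:
            return {"error": f"Missing required field: {field}"}
    return tool_def["function"](**inputs)

print(call_tool(TOOL_DEF, {"email": "alice@example.com"}))
```

The validation step is the part that matters: a malformed request from the model is caught before your implementation ever runs.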
Primitive 2: Iteration and Termination
Agents don’t usually solve problems in one step. They iterate. They call a tool, examine the result, call another tool, reason about the combined results, and so on. The SDK manages this iteration with built-in termination conditions:
- Max Iterations: Stop after N steps (e.g., 10 iterations) to prevent runaway loops.
- Tool Success: Stop if the agent explicitly signals success (e.g., by calling a finish tool).
- No More Tools: Stop if Claude decides no more tools are needed and returns a final response.
- Error Threshold: Stop if too many tools fail in succession.
You can define custom termination conditions. For example, if you’re building an agent to manage infrastructure, you might terminate when the agent has successfully deployed the application and run health checks. The SDK gives you hooks to implement this logic.
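A custom termination check for that infrastructure example might look like the following. The hook shape is an assumption for illustration; the SDK exposes similar extension points, but the names here are not its API.

```python
# Sketch of a custom termination condition for an infrastructure agent.
# The state keys ("deployed", "health_checks_passed") are illustrative.

def should_terminate(state: dict, iteration: int, max_iterations: int = 10) -> bool:
    """Stop when the deploy succeeded and health checks passed,
    or when the iteration budget is exhausted."""
    if iteration >= max_iterations:
        return True
    return bool(state.get("deployed") and state.get("health_checks_passed", False))

print(should_terminate({"deployed": True, "health_checks_passed": True}, iteration=3))  # True
print(should_terminate({"deployed": True}, iteration=3))                                # False
```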
Primitive 3: Error Handling and Recovery
Production agents fail. Tools time out, APIs return errors, data is missing, and Claude makes mistakes. The SDK provides structured error handling:
- Tool Execution Errors: If a tool fails (exception, timeout, invalid input), the SDK catches the error, formats it, and feeds it back to Claude as part of the next iteration. Claude can then decide to retry, try a different tool, or escalate.
- Validation Errors: If Claude’s tool request violates the schema (wrong parameter type, missing required field), the SDK rejects it and asks Claude to correct it.
- Context Errors: If the agent’s state becomes inconsistent (e.g., a required variable is missing), the SDK can halt or inject a recovery action.
This is crucial for reliability. In custom orchestration, you’re writing all of this error handling yourself, and it’s easy to miss edge cases. The SDK makes error recovery a first-class concern.
Primitive 4: Context and State Management
Agents need memory. The SDK provides:
- Conversation History: The full back-and-forth between the agent and Claude, including tool calls and results.
- Tool Result Caching: Recent tool results are cached to avoid redundant API calls.
- Custom State Variables: You can store arbitrary state (e.g., user_id, workflow_status, decision_log) that persists across iterations.
- Context Injection: You can inject context at any point (e.g., “the user is a premium customer” or “the API is currently degraded”).
The SDK is smart about context. It doesn’t send the entire conversation history to Claude every time—that would be expensive and slow. Instead, it uses intelligent summarisation and truncation to keep the context window manageable while preserving the information Claude needs to make good decisions.
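The simplest form of this idea is trimming: keep the system prompt plus the most recent turns. The SDK’s summarisation is more sophisticated than this, but a naive sketch shows the shape of the problem:

```python
# Naive context trimming: keep the system message and the most recent
# turns. Illustrative only; the SDK also summarises older turns.

def trim_history(messages: list[dict], keep_recent: int = 4) -> list[dict]:
    """Keep the system message plus the last `keep_recent` turns."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_recent:]

history = [{"role": "system", "content": "You are a support agent."}]
history += [{"role": "user", "content": f"turn {i}"} for i in range(10)]

trimmed = trim_history(history)
print(len(trimmed))  # 5: the system prompt plus the last 4 turns
```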
Primitive 5: Tool Result Interpretation
When a tool returns a result, the agent needs to understand it. The SDK provides structured interpretation:
- Success vs Failure: Tools return a status (success or failure) along with the result.
- Structured Results: Tools return data in a format Claude can reason about (JSON, not free-form text).
- Result Validation: The SDK can validate that tool results match expected schemas, catching bugs early.
This matters for reliability. If a tool returns unexpected data, the SDK can catch it and either retry the tool or escalate to human intervention.
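A result-shape check can be as simple as the following sketch. The expected fields here are hypothetical examples for a ticket-lookup tool, not a schema defined by the SDK:

```python
# Sketch of validating a tool result against an expected shape before
# feeding it back to the model. Field names are illustrative.

EXPECTED_FIELDS = {"status": str, "tickets": list}

def validate_result(result: dict) -> tuple[bool, str]:
    """Return (ok, message) for a tool result against EXPECTED_FIELDS."""
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in result:
            return False, f"Missing field: {field}"
        if not isinstance(result[field], expected_type):
            return False, f"Field {field} should be {expected_type.__name__}"
    return True, "ok"

print(validate_result({"status": "success", "tickets": []}))     # (True, 'ok')
print(validate_result({"status": "success", "tickets": "none"}))
```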
SDK vs Custom Orchestration: When to Choose Each
When to Use the Claude Agent SDK
Use the SDK if:
1. You need to ship fast. The SDK eliminates weeks of orchestration plumbing. If you’re a seed-stage startup building your first AI product, or a mid-market company modernising operations with agentic AI automation, the time savings are massive. Real numbers: teams report 4–6 weeks faster to production compared to custom orchestration.
2. Your agent needs to be reliable and auditable. The SDK is built for production. It has built-in error handling, logging, and observability. If you’re building agents for regulated industries (finance, healthcare, legal), or if you need to pass compliance audits, the SDK’s audit trail is invaluable. Every decision, every tool call, every error is logged and traceable.
3. You’re building agents that interact with code. Claude Code is a first-class citizen in the SDK. If your agent needs to read files, understand codebases, propose changes, or execute code, the SDK makes this seamless. This is transformative for engineering automation, platform modernisation, and infrastructure management.
4. You want to avoid common pitfalls. Custom orchestration is full of subtle bugs: context window overflow, infinite loops, tool timeouts, inconsistent state, token waste. The SDK is designed to avoid these. It’s been battle-tested by Anthropic and used in production by hundreds of teams.
5. You’re scaling from one agent to many. The SDK is designed for scaling. Once you’ve built one agent, building the second is much faster. The patterns are consistent, the infrastructure is shared, and you can reuse tools across agents.
When to Build Custom Orchestration
Build custom orchestration if:
1. Your agent needs non-standard control flow. If your agent’s decision logic doesn’t fit the standard observe-decide-act-learn loop, custom orchestration might be simpler. For example, if you’re building a multi-agent system where agents coordinate with each other, or if you need agents to run in parallel with complex synchronisation, the SDK might feel constraining.
2. You need deep integration with legacy systems. If your agent needs to integrate tightly with legacy systems that have non-standard APIs or protocols, custom orchestration gives you more flexibility. You can build the integration layer exactly as you need it.
3. You’re optimising for extreme latency or cost. The SDK adds a small overhead—it’s not much, but it’s there. If you’re building agents that need sub-100ms latency or you’re operating at massive scale (millions of agent runs per day), custom orchestration might be more efficient. That said, the SDK is quite lean, and this is rarely the bottleneck.
4. You’re experimenting with novel agent architectures. If you’re doing research or building something that doesn’t fit standard agentic patterns, custom orchestration gives you the freedom to experiment. Once you’ve validated the architecture, you can often move to the SDK.
The Hybrid Approach: SDK + Custom Layers
In practice, many teams use a hybrid approach. They use the Claude Agent SDK for the core agentic loop, but layer custom code on top for domain-specific logic.
For example, a team building a customer support agent might:
- Use the SDK for the core agent loop (observe → decide → act → learn).
- Build custom middleware to inject company-specific context (customer tier, account status, recent interactions).
- Implement custom tools that wrap legacy CRM and ticketing systems.
- Add a custom decision layer that checks company policies before the agent sends responses.
This gives you the best of both worlds: the SDK’s reliability and speed, plus the flexibility to handle domain-specific requirements.
At PADISO, we’ve found that this hybrid approach works well for enterprise teams. The SDK handles the heavy lifting, and custom layers handle the specifics. This is where AI & Agents Automation expertise comes in—knowing where to use the SDK and where to build custom logic.
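As a small sketch of that custom decision layer — the banned-phrase list and escalation response are illustrative company policy, not SDK behaviour:

```python
# Sketch of a policy-check layer wrapped around the agent's draft
# response. BANNED_PHRASES and the escalation text are illustrative.

BANNED_PHRASES = ["guarantee a refund", "legal advice"]

def policy_check(draft_response: str) -> str:
    """Block responses that violate policy; escalate instead."""
    lowered = draft_response.lower()
    for phrase in BANNED_PHRASES:
        if phrase in lowered:
            return "This request needs review by a human agent."
    return draft_response

print(policy_check("Happy to help with your billing question."))
print(policy_check("We guarantee a refund within 30 days."))  # escalated
```

Because the check runs outside the model, it holds even when the agent’s reasoning goes wrong.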
Getting Started with the Claude Agent SDK
Installation and Prerequisites
The Claude Agent SDK is available in multiple languages. The official GitHub repository hosts the Python version; there are also JavaScript (via npm) and Rust (via crates.io, documented on Docs.rs) packages, among others.
For Python, installation is straightforward:
pip install anthropic-agent-sdk
You’ll need:
- An Anthropic API key (get one at console.anthropic.com)
- Python 3.10 or later
- Basic familiarity with async/await (the SDK is async-first)
For JavaScript, the npm package is:
npm install @anthropic-ai/claude-agent-sdk
For Rust, add to your Cargo.toml:
[dependencies]
claude-agent-sdk = "0.1"
Setting Up Your First Agent
The simplest agent is just a few lines of code. Here’s a Python example:
from anthropic_agent_sdk import Agent
agent = Agent(
name="HelloAgent",
model="claude-3-5-sonnet-20241022",
instructions="You are a helpful assistant.",
)
response = agent.run("What is 2 + 2?")
print(response)
This creates an agent, gives it instructions, and runs it with a simple query. The agent processes the query and returns a response.
But this is the trivial case. Real agents need tools. Here’s an agent with a tool:
from anthropic_agent_sdk import Agent, Tool
import json
def get_weather(location: str) -> dict:
"""Get the current weather for a location."""
# In reality, this would call a weather API
return {"location": location, "temperature": 22, "condition": "sunny"}
agent = Agent(
name="WeatherAgent",
model="claude-3-5-sonnet-20241022",
instructions="You are a helpful weather assistant. Use the get_weather tool to answer questions about weather.",
tools=[
Tool(
name="get_weather",
description="Get the current weather for a location",
function=get_weather,
)
],
)
response = agent.run("What's the weather in Sydney?")
print(response)
Now the agent has a tool. When you ask it about the weather, it will:
- Recognise that it needs weather information
- Call the get_weather tool with “Sydney” as the location
- Receive the result
- Formulate a response based on the result
This is where the SDK shines. You define the tool function, and the SDK handles everything else: presenting it to Claude, parsing the tool call, executing the function, and feeding the result back.
Configuration and Customisation
The Agent class accepts many configuration options:
- name: The agent’s name (used in logs and traces)
- model: Which Claude model to use (3.5 Sonnet is recommended for agents)
- instructions: The system prompt that guides the agent’s behaviour
- tools: List of tools the agent can use
- max_iterations: Maximum number of agentic loop iterations (default: 10)
- timeout: How long to wait for each iteration (default: 60 seconds)
- temperature: Sampling temperature (0.0 for deterministic, 1.0 for creative)
- context_window: How much history to keep in context
You can also provide custom hooks:
- on_iteration: Called after each iteration (useful for logging or custom logic)
- on_tool_call: Called before each tool execution (useful for validation or rate limiting)
- on_error: Called when an error occurs (useful for custom error handling)
These hooks are where you layer custom logic on top of the SDK.
Understanding the Response Object
When you call agent.run(), you get back a response object that contains:
- message: The final response text
- iterations: How many iterations the agent ran (useful for understanding complexity)
- tool_calls: A list of all tools the agent called, in order
- state: The final state of the agent (custom variables, context, etc.)
- tokens_used: How many tokens were consumed (useful for cost tracking)
This is valuable for observability. You can log these metrics, track agent performance, and debug issues.
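For example, the tokens_used field makes per-run cost tracking trivial. A sketch — the per-million-token rates below are placeholder assumptions, not current Anthropic pricing:

```python
# Sketch of cost tracking from response token counts. The rates are
# placeholder assumptions, not real Anthropic pricing.

INPUT_COST_PER_MTOK = 3.00    # assumed USD per million input tokens
OUTPUT_COST_PER_MTOK = 15.00  # assumed USD per million output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Rough USD cost of one agent run, to six decimal places."""
    return round(
        input_tokens / 1_000_000 * INPUT_COST_PER_MTOK
        + output_tokens / 1_000_000 * OUTPUT_COST_PER_MTOK,
        6,
    )

# e.g. a run that consumed 12,000 input and 800 output tokens
print(estimate_cost(12_000, 800))  # 0.048
```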
Building Your First Production Agent
A Realistic Example: Customer Support Automation
Let’s build a customer support agent that can:
- Look up customers by email
- Retrieve their support tickets
- Answer common questions
- Escalate complex issues
Here’s the full implementation:
from anthropic_agent_sdk import Agent, Tool
import json
from datetime import datetime
# Mock database
CUSTOMERS = {
"alice@example.com": {"id": "cust_001", "name": "Alice", "tier": "premium"},
"bob@example.com": {"id": "cust_002", "name": "Bob", "tier": "standard"},
}
TICKETS = {
"cust_001": [
{"id": "tick_001", "status": "open", "subject": "Billing issue", "created": "2025-01-15"},
{"id": "tick_002", "status": "closed", "subject": "Feature request", "created": "2025-01-10"},
],
"cust_002": [
{"id": "tick_003", "status": "open", "subject": "Login problem", "created": "2025-01-18"},
],
}
# Tool definitions
def lookup_customer(email: str) -> dict:
"""Look up a customer by email address."""
if email in CUSTOMERS:
return {"success": True, "customer": CUSTOMERS[email]}
return {"success": False, "error": f"Customer {email} not found"}
def get_tickets(customer_id: str) -> dict:
"""Get support tickets for a customer."""
tickets = TICKETS.get(customer_id, [])
return {"success": True, "tickets": tickets}
def update_ticket(ticket_id: str, status: str) -> dict:
"""Update a ticket's status."""
# In reality, this would update a database
return {"success": True, "message": f"Ticket {ticket_id} updated to {status}"}
def escalate_to_human(reason: str) -> dict:
"""Escalate an issue to a human agent."""
return {
"success": True,
"message": f"Issue escalated to human agent. Reason: {reason}",
"ticket_id": f"escalated_{datetime.now().timestamp()}"
}
# Create the agent
agent = Agent(
name="CustomerSupportAgent",
model="claude-3-5-sonnet-20241022",
instructions="""You are a customer support agent. Your job is to help customers with their issues.
1. Always start by looking up the customer by email
2. Retrieve their open tickets
3. Try to answer their question or resolve their issue
4. If the issue is complex or requires human judgment, escalate to a human agent
5. Be polite, professional, and empathetic
You have access to customer lookup, ticket retrieval, ticket updates, and escalation tools.""",
tools=[
Tool(
name="lookup_customer",
description="Look up a customer by email address",
function=lookup_customer,
),
Tool(
name="get_tickets",
description="Get all support tickets for a customer",
function=get_tickets,
),
Tool(
name="update_ticket",
description="Update a ticket's status (e.g., 'resolved', 'closed')",
function=update_ticket,
),
Tool(
name="escalate_to_human",
description="Escalate an issue to a human agent",
function=escalate_to_human,
),
],
max_iterations=5,
)
# Run the agent
if __name__ == "__main__":
# Example: customer inquiry
response = agent.run(
"Hi, I'm alice@example.com. I have a billing issue that needs to be resolved."
)
print(f"Agent response: {response.message}")
print(f"Iterations: {response.iterations}")
print(f"Tools called: {[call['tool'] for call in response.tool_calls]}")
print(f"Tokens used: {response.tokens_used}")
Walk through what happens:
- The agent receives the customer’s message
- It recognises that it needs to look up the customer, so it calls lookup_customer with “alice@example.com”
- It receives the customer object (Alice, premium tier)
- It calls get_tickets to see what issues are open
- It sees that Alice has a billing issue (ticket_001)
- It reasons about the issue and either resolves it or escalates it
- It returns a response to the customer
This is a realistic agent. It handles multiple tools, maintains state across iterations, and can escalate when needed. And it’s only about 100 lines of code—most of which is tool definitions.
Adding Observability
For production, you want to see what’s happening inside the agent. The SDK provides hooks:
def log_iteration(iteration_data):
print(f"Iteration {iteration_data['number']}: {iteration_data['status']}")
if iteration_data['tool_calls']:
for call in iteration_data['tool_calls']:
print(f" → Called {call['tool']} with {call['input']}")
agent = Agent(
name="CustomerSupportAgent",
# ... other config ...
on_iteration=log_iteration,
)
Now every iteration is logged. You can see exactly what tools the agent called, in what order, and why.
Advanced Patterns and Real-World Implementation
Pattern 1: Multi-Step Workflows
Many real-world problems require multi-step workflows. For example, processing a refund request requires:
- Verify the customer’s identity
- Look up the original order
- Check if the refund is eligible
- Process the refund
- Send a confirmation email
The SDK handles this naturally. Each step is a tool call. The agent iterates through them, and the SDK manages the state and context.
def verify_identity(email: str, verification_code: str) -> dict:
"""Verify customer identity with a code sent to their email."""
# In reality, this would check against a database
return {"success": True, "verified": True}
def lookup_order(customer_id: str, order_id: str) -> dict:
"""Look up an order."""
return {
"success": True,
"order": {
"id": order_id,
"amount": 99.99,
"date": "2025-01-01",
"status": "delivered"
}
}
def check_refund_eligibility(order_id: str) -> dict:
"""Check if an order is eligible for refund."""
# 30-day refund window
return {"success": True, "eligible": True, "reason": "Within 30-day window"}
def process_refund(order_id: str, amount: float) -> dict:
"""Process a refund."""
return {"success": True, "refund_id": f"ref_{order_id}", "amount": amount}
def send_confirmation_email(email: str, refund_id: str) -> dict:
"""Send a confirmation email."""
return {"success": True, "message": f"Email sent to {email}"}
# Agent with all these tools
agent = Agent(
name="RefundAgent",
model="claude-3-5-sonnet-20241022",
instructions="""You are a refund processing agent. Follow these steps:
1. Verify the customer's identity
2. Look up their order
3. Check if they're eligible for a refund
4. If eligible, process the refund
5. Send a confirmation email
Be thorough and follow company policy.""",
tools=[
Tool(name="verify_identity", description="Verify customer identity", function=verify_identity),
Tool(name="lookup_order", description="Look up an order", function=lookup_order),
Tool(name="check_refund_eligibility", description="Check refund eligibility", function=check_refund_eligibility),
Tool(name="process_refund", description="Process a refund", function=process_refund),
Tool(name="send_confirmation_email", description="Send confirmation email", function=send_confirmation_email),
],
max_iterations=10,
)
The agent will work through these steps in order, handling errors and edge cases as they arise.
Pattern 2: Conditional Logic and Branching
Some workflows have conditional branches. For example, a support agent might:
- If the customer is premium tier: resolve immediately
- If the customer is standard tier: offer a discount
- If the issue is complex: escalate to a specialist
You implement this by giving the agent tools that represent each branch, and letting Claude decide which branch to take based on the context.
def resolve_immediately(ticket_id: str) -> dict:
"""For premium customers, resolve immediately."""
return {"success": True, "action": "resolved", "ticket_id": ticket_id}
def offer_discount(customer_id: str, discount_percent: int) -> dict:
"""For standard customers, offer a discount."""
return {"success": True, "action": "discount_offered", "discount_percent": discount_percent}
def escalate_to_specialist(ticket_id: str, reason: str) -> dict:
"""For complex issues, escalate to a specialist."""
return {"success": True, "action": "escalated", "specialist_assigned": True}
# Agent with branching logic
agent = Agent(
name="SmartSupportAgent",
model="claude-3-5-sonnet-20241022",
instructions="""You are a support agent. Based on the customer's tier and issue complexity:
- Premium customers: resolve immediately
- Standard customers: offer a discount
- Complex issues: escalate to a specialist
Use the appropriate tool based on the situation.""",
tools=[
Tool(name="resolve_immediately", description="...", function=resolve_immediately),
Tool(name="offer_discount", description="...", function=offer_discount),
Tool(name="escalate_to_specialist", description="...", function=escalate_to_specialist),
],
)
Claude will automatically choose the right tool based on the customer’s tier and the issue complexity.
Pattern 3: Integration with Claude Code
This is where things get powerful. If your agent needs to interact with code—read files, understand codebases, propose changes—you can use Claude Code.
The Claude Code overview explains the basics, but here’s how it integrates with the SDK:
Claude Code is a tool. You give the agent access to it, and it can call it like any other tool. The agent can:
- Read files from your codebase
- Understand the code structure
- Propose changes
- Execute changes
Example: an agent that automates database migrations
def read_schema_file(path: str) -> dict:
"""Read a database schema file."""
# Returns the schema
return {"success": True, "content": "...schema content..."}
def propose_migration(schema_changes: str) -> dict:
"""Propose a database migration based on schema changes."""
# Claude Code would be called here to generate the migration
return {"success": True, "migration": "...migration SQL..."}
def execute_migration(migration_sql: str) -> dict:
"""Execute a migration against the database."""
return {"success": True, "message": "Migration executed successfully"}
agent = Agent(
name="MigrationAgent",
model="claude-3-5-sonnet-20241022",
instructions="""You are a database migration agent. When asked to migrate a schema:
1. Read the current schema
2. Propose a migration
3. Execute it
Be careful to preserve data and follow best practices.""",
tools=[
Tool(name="read_schema_file", description="...", function=read_schema_file),
Tool(name="propose_migration", description="...", function=propose_migration),
Tool(name="execute_migration", description="...", function=execute_migration),
],
)
This is where the Claude Agent SDK becomes genuinely transformative. Agents that can understand and modify code are agents that can automate engineering work.
Security, Compliance, and Deployment
Security Considerations
Agents are powerful, which means they can be dangerous. Key security considerations:
1. Tool Permissions. Not every agent should have access to every tool. A customer support agent shouldn’t have access to payment processing or user deletion. Implement role-based access control (RBAC) for tools.
def get_tools_for_role(role: str) -> list:
"""Return tools appropriate for a given role."""
if role == "support":
return [lookup_customer, get_tickets, update_ticket]
elif role == "admin":
return [lookup_customer, get_tickets, update_ticket, delete_user, process_refund]
else:
return []
agent = Agent(
name="RoleBasedAgent",
model="claude-3-5-sonnet-20241022",
instructions="...",
tools=get_tools_for_role(current_user_role),
)
2. Input Validation. Always validate tool inputs. Don’t trust Claude to provide valid inputs—validate them server-side.
def delete_user(user_id: str) -> dict:
"""Delete a user (with validation)."""
# Validate user_id format
if not user_id.startswith("user_"):
return {"success": False, "error": "Invalid user ID format"}
# Check if user exists
if user_id not in USERS:
return {"success": False, "error": "User not found"}
# Proceed with deletion
del USERS[user_id]
return {"success": True, "message": f"User {user_id} deleted"}
3. Audit Logging. Every agent decision, every tool call, every error should be logged. This is non-negotiable for regulated environments.
import logging
logger = logging.getLogger(__name__)
def audit_log(event_type: str, details: dict):
"""Log an event for audit purposes."""
logger.info(f"AUDIT: {event_type} - {details}")
def on_tool_call(tool_name: str, input_data: dict, user_id: str):
"""Hook called before each tool execution."""
audit_log("TOOL_CALL", {
"tool": tool_name,
"user_id": user_id,
"timestamp": datetime.now().isoformat(),
})
agent = Agent(
name="AuditedAgent",
model="claude-3-5-sonnet-20241022",
instructions="...",
tools=[...],
on_tool_call=on_tool_call,
)
4. Rate Limiting. Prevent abuse by rate-limiting agent runs and tool calls.
from collections import defaultdict
from datetime import datetime, timedelta

rate_limits = defaultdict(list)

def check_rate_limit(user_id: str, max_calls_per_minute: int = 10) -> bool:
    """Check if a user has exceeded their rate limit."""
    now = datetime.now()
    minute_ago = now - timedelta(minutes=1)
    # Remove calls older than one minute
    rate_limits[user_id] = [t for t in rate_limits[user_id] if t > minute_ago]
    # Check limit
    if len(rate_limits[user_id]) >= max_calls_per_minute:
        return False
    # Record this call
    rate_limits[user_id].append(now)
    return True

def run_agent_with_rate_limit(user_id: str, query: str):
    if not check_rate_limit(user_id):
        return {"error": "Rate limit exceeded. Try again in a minute."}
    return agent.run(query)
Compliance and Audit Readiness
For regulated environments (finance, healthcare, legal), compliance is critical. The Claude Agent SDK is designed with compliance in mind.
SOC 2 Readiness: The SDK provides:
- Complete audit trails (every decision logged)
- Error handling and recovery (no silent failures)
- Access controls (role-based tool permissions)
- Encryption in transit (via HTTPS)
- Regular security updates (Anthropic maintains the SDK)
For SOC 2 compliance, you’ll also need to implement:
- Data retention policies
- Incident response procedures
- Regular security audits
- Access logging and monitoring
ISO 27001 Readiness: Similar requirements, with additional focus on:
- Information security policies
- Risk assessment and management
- Supplier management (Anthropic’s security practices)
- Incident management procedures
If you’re pursuing ISO 27001 compliance, the SDK makes this easier—it’s already designed with security and auditability in mind. But you still need to implement the organisational practices around it.
Deployment Patterns
Pattern 1: Synchronous API. The simplest pattern—expose the agent as an API endpoint.
from fastapi import FastAPI

app = FastAPI()

@app.post("/agent/run")
async def run_agent(query: str, user_id: str):
    if not check_rate_limit(user_id):
        return {"error": "Rate limit exceeded"}
    response = agent.run(query)
    audit_log("AGENT_RUN", {"user_id": user_id, "query": query, "iterations": response.iterations})
    return {"response": response.message, "iterations": response.iterations}
Pattern 2: Asynchronous Job Queue. For long-running agents, use a job queue.
from celery import Celery
import requests

celery_app = Celery()  # configure a broker (e.g., Redis) in production

@celery_app.task
def run_agent_async(query: str, user_id: str, callback_url: str):
    response = agent.run(query)
    # Send the result to the callback URL
    requests.post(callback_url, json={"response": response.message, "user_id": user_id})
Pattern 3: Scheduled Agents. For recurring tasks, schedule agents to run at specific times.
from apscheduler.schedulers.background import BackgroundScheduler

scheduler = BackgroundScheduler()

@scheduler.scheduled_job('cron', hour=2, minute=0)  # Run daily at 2 AM
def daily_cleanup_agent():
    response = agent.run("Clean up old support tickets")
    audit_log("SCHEDULED_AGENT", {"task": "daily_cleanup", "iterations": response.iterations})

scheduler.start()
Performance Optimisation and Scaling
Token Efficiency
Tokens are money. Each API call to Claude costs tokens. Optimise:
1. Context Window Management. The SDK intelligently manages context, but you can help:
agent = Agent(
    name="EfficientAgent",
    model="claude-3-5-sonnet-20241022",
    instructions="...",
    tools=[...],
    context_window=2000,  # Limit context to 2000 tokens
)
Limiting context forces the agent to focus on the most recent and relevant information. This saves tokens and often improves decision-making.
2. Tool Result Caching. The SDK caches recent tool results. If the agent calls the same tool twice with the same inputs, it uses the cached result. This saves tokens and improves latency.
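As a mental model, the dedupe behaviour described above can be sketched as a cache keyed on the tool name plus its serialised inputs. `ToolResultCache` below is a hypothetical illustration, not the SDK's internal implementation:

```python
import hashlib
import json

class ToolResultCache:
    """Hypothetical in-memory cache keyed on tool name + serialised inputs."""

    def __init__(self):
        self._store = {}

    def _key(self, tool_name: str, inputs: dict) -> str:
        # Sort keys so {"a": 1, "b": 2} and {"b": 2, "a": 1} produce the same key
        payload = json.dumps(inputs, sort_keys=True)
        return hashlib.sha256(f"{tool_name}:{payload}".encode()).hexdigest()

    def get_or_call(self, tool_name: str, inputs: dict, fn):
        key = self._key(tool_name, inputs)
        if key not in self._store:
            # Cache miss: execute the tool and store the result
            self._store[key] = fn(**inputs)
        return self._store[key]

cache = ToolResultCache()
result = cache.get_or_call("lookup_customer", {"email": "alice@example.com"},
                           lambda email: {"email": email, "plan": "pro"})
```

A second call with identical inputs returns the stored result without re-executing the tool, which is exactly the token and latency saving the SDK's caching aims for.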
3. Summarisation. For long conversations, summarise old context to save tokens.
import anthropic

claude = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def summarise_old_context(history: list) -> str:
    """Summarise old conversation history."""
    # Use Claude to summarise
    summary_prompt = f"Summarise this conversation in 1-2 sentences: {history}"
    summary = claude.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=200,
        messages=[{"role": "user", "content": summary_prompt}],
    )
    return summary.content[0].text
Latency Optimisation
Agents need to be fast. Typical latency breakdown:
- Network latency to Anthropic API: 100–300ms
- Claude processing: 500–2000ms (depends on complexity)
- Tool execution: Variable (could be 100ms to 10s)
- SDK overhead: 10–50ms
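To see where your own runs spend time against this breakdown, a small timing helper can record each stage. This is a generic stdlib sketch, not an SDK feature:

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def timed(stage: str):
    """Record elapsed wall-clock time (in ms) for a named stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = (time.perf_counter() - start) * 1000

# Wrap each stage of an agent run to see where the milliseconds go
with timed("tool_execution"):
    time.sleep(0.01)  # stand-in for a real tool call

print(f"tool_execution: {timings['tool_execution']:.1f}ms")
```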
Optimise by:
1. Parallel Tool Calls. If the agent needs to call multiple tools, call them in parallel.
import asyncio

async def parallel_tool_calls():
    # lookup_customer and get_tickets are synchronous functions, so run
    # them in worker threads rather than awaiting them directly
    results = await asyncio.gather(
        asyncio.to_thread(lookup_customer, "alice@example.com"),
        asyncio.to_thread(get_tickets, "cust_001"),
    )
    return results
2. Tool Caching. Cache tool results aggressively.
from functools import lru_cache

@lru_cache(maxsize=1000)
def lookup_customer(email: str) -> dict:
    # Results are cached per email address
    return CUSTOMERS.get(email)
3. Model Selection. Claude 3.5 Sonnet is the recommended model for agents—it’s fast and capable. Avoid larger models unless necessary.
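If you serve mixed traffic, a simple router can send short, tool-light requests to a smaller, faster model and reserve Sonnet for complex runs. The model IDs and thresholds below are illustrative assumptions, not SDK defaults:

```python
def pick_model(query: str, tool_count: int) -> str:
    """Route simple requests to a smaller model, complex ones to Sonnet.

    Thresholds here are illustrative; tune them against your own traffic.
    """
    if len(query) < 200 and tool_count <= 3:
        return "claude-3-5-haiku-20241022"
    return "claude-3-5-sonnet-20241022"
```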
Scaling to Many Agents
Once you’ve built one agent, scaling to many is about infrastructure:
1. Agent Registry. Keep track of all agents and their configurations.
agents = {
    "support": Agent(name="SupportAgent", ...),
    "billing": Agent(name="BillingAgent", ...),
    "technical": Agent(name="TechnicalAgent", ...),
}

def get_agent(agent_name: str) -> Agent:
    return agents.get(agent_name)
2. Shared Tool Library. Build a library of reusable tools that multiple agents can use.
common_tools = [
    Tool(name="lookup_customer", description="...", function=lookup_customer),
    Tool(name="get_tickets", description="...", function=get_tickets),
    Tool(name="send_email", description="...", function=send_email),
]

# Each agent can use these tools
agent1 = Agent(name="Agent1", tools=common_tools + [...])
agent2 = Agent(name="Agent2", tools=common_tools + [...])
3. Load Balancing. Distribute agent runs across multiple instances.
from itertools import cycle

# Round-robin over a pool of agent instances
agent_instances = cycle([Agent(...) for _ in range(10)])

def run_agent_load_balanced(query: str):
    agent = next(agent_instances)
    return agent.run(query)
Next Steps: From Prototype to Production
The Path to Production
Building a prototype agent is quick—days or weeks. Shipping to production is more involved. Here’s the path:
Week 1–2: Prototype. Build a simple agent with 3–5 tools. Test it manually. Validate that the basic loop works.
Week 3–4: Hardening. Add error handling, logging, and validation. Test edge cases. Set up monitoring.
Week 5–6: Integration. Integrate with your production systems. Set up databases, APIs, and external services. Conduct security review.
Week 7–8: Testing. Run load tests, stress tests, and security tests. Fix bugs. Optimise performance.
Week 9+: Deployment. Deploy to production. Monitor closely. Iterate based on real-world usage.
This is a realistic timeline for a single agent. Scaling to multiple agents takes longer, but the patterns are consistent.
Building a Team Around Agents
Agents aren’t a solo effort. You need:
1. An AI Engineer. Someone who understands the Claude Agent SDK, can build agents, and can optimise them for production. This is a specialist role.
2. A Platform Engineer. Someone who can build the infrastructure around agents—APIs, databases, monitoring, deployment pipelines.
3. A Domain Expert. Someone who understands the business logic—what the agent should do, what edge cases matter, what compliance requirements apply.
At PADISO, we provide fractional CTO leadership to help teams build and scale agent systems. We’ve worked with dozens of teams shipping agents to production, and we know the common pitfalls and how to avoid them.
If you’re building agents at scale, or if you’re uncertain about the path forward, talk to us. We can help you move from prototype to production quickly and reliably.
Monitoring and Observability
Once your agent is in production, you need to see what’s happening. Key metrics:
1. Success Rate. What percentage of agent runs complete successfully?
2. Iteration Count. How many iterations does each run take on average? More iterations = more tokens = higher cost.
3. Tool Call Distribution. Which tools are called most often? Which tools fail most often?
4. Latency. How long does each run take? Are there bottlenecks?
5. Error Rate. What percentage of tool calls fail? What are the most common errors?
6. Token Usage. How many tokens does each run consume? Is this in line with your cost budget?
Set up dashboards to track these metrics. Use them to identify bottlenecks and optimisation opportunities.
from datetime import datetime

def log_metrics(agent_name: str, response: object):
    metrics = {
        "agent": agent_name,
        "timestamp": datetime.now().isoformat(),
        "success": response.success,
        "iterations": response.iterations,
        "tokens_used": response.tokens_used,
        "latency_ms": response.latency * 1000,
        "tool_calls": len(response.tool_calls),
    }
    # Send to your monitoring system (e.g., Datadog, New Relic, CloudWatch)
    monitoring_client.send_metrics(metrics)
Continuous Improvement
Agents aren’t static. As you learn more about your use cases, you’ll want to improve them:
1. A/B Testing. Test different agent configurations (different instructions, different tools, different models) and measure which performs better.
2. User Feedback. Collect feedback from users. Which agent responses are helpful? Which are unhelpful? Use this to improve instructions and tools.
3. Error Analysis. When agents fail, understand why. Was it a tool failure? A reasoning error? An edge case? Use these insights to improve.
4. Tool Optimisation. As you learn more about what agents need, optimise your tools. Add new tools, remove unused ones, improve tool descriptions.
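For A/B testing in particular, a deterministic hash of the user ID keeps each user in the same variant across sessions, so their experience stays consistent. This is a generic bucketing sketch; the config contents are placeholders:

```python
import hashlib

def assign_variant(user_id: str, variants=("control", "treatment")) -> str:
    """Deterministically bucket a user so they always get the same agent config."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# Hypothetical configs under test; swap in your real instructions/tools
agent_configs = {
    "control": {"instructions": "Current system prompt..."},
    "treatment": {"instructions": "Revised system prompt..."},
}

variant = assign_variant("user_123")
config = agent_configs[variant]
```

Pair this with the metrics logging above (tag each run with its variant) and compare success rate, iteration count, and token usage per bucket.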
Conclusion
The Claude Agent SDK is a transformative tool for building production AI agents. It eliminates months of custom orchestration work, provides built-in reliability and auditability, and integrates seamlessly with Claude Code for engineering automation.
The choice between the SDK and custom orchestration is clear for most teams: use the SDK unless you have a specific reason not to. The SDK is mature, well-documented, and battle-tested in production. Custom orchestration should only be your choice if you have unusual requirements that the SDK doesn’t support.
If you’re building agents in 2026, the Claude Agent SDK is the right foundation. Start with a simple prototype. Validate that the basic loop works. Then harden it for production—add error handling, logging, security, and monitoring. Deploy carefully, monitor closely, and iterate based on real-world usage.
For teams looking for expert guidance on building production agents, or for help with AI & Agents Automation at scale, PADISO provides fractional CTO leadership and co-build support. We’ve shipped dozens of agents to production and we know the path forward. If you’re ready to move from prototype to production, let’s talk.