Using Sonnet 4.5 for Legal Contract Review: Patterns and Pitfalls
Legal contract review is one of the highest-ROI use cases for large language models. A single contract review cycle can cost £3,000–£15,000 in legal fees. Multiply that across a portfolio company, a PE roll-up, or a fast-moving fintech, and the cost of contract review becomes a material line item.
Sonnet 4.5 changes the maths. It’s fast enough to handle real-time requests, capable enough to parse dense legal prose, and cost-effective enough to run at scale. But deploying it in production requires discipline.
This guide covers the patterns that work, the failure modes you’ll hit, and the hard-won lessons from teams who’ve shipped contract-review systems on Sonnet 4.5. We’ll walk through prompt design, output validation, cost optimisation, and the integration patterns that survive contact with real legal workflows.
Table of Contents
- Why Sonnet 4.5 for Legal Contract Review
- Understanding Sonnet 4.5’s Capabilities and Limits
- Prompt Design for Contract Analysis
- Building Reliable Output Validation
- Cost Optimisation and Token Management
- Common Failure Modes and How to Engineer Around Them
- Integration Patterns for Production Workflows
- Security, Compliance, and Legal Considerations
- Measuring Success and Iterating
- Next Steps: Building Your Contract Review System
Why Sonnet 4.5 for Legal Contract Review
Legal contract review is fundamentally a document-understanding problem. You need to:
- Extract key commercial terms (payment schedules, renewal dates, termination clauses)
- Identify risk flags (indemnification scope, liability caps, dispute resolution)
- Spot missing standard protections (IP assignment, confidentiality, limitation of liability)
- Summarise obligations and rights for non-lawyers
- Compare against a template or prior version
Traditional approaches—manual review, rule-based extraction, or older LLMs—fail at scale. Manual review is slow and expensive. Rule-based systems are brittle and miss context. Older models hallucinate or miss nuance in dense legal language.
Sonnet 4.5 bridges the gap. According to the official model announcement from Anthropic, Sonnet 4.5 offers significantly improved reasoning and instruction-following compared to its predecessor, with a 200K context window—enough to handle most contracts, schedules, and related documents in a single request. It’s also 3–5× faster than Claude 3 Opus, which matters when you’re processing hundreds of contracts monthly.
For teams at seed-to-Series-B startups looking to automate contract workflows, or for operators at mid-market and enterprise companies modernising their legal operations, Sonnet 4.5 is the first model where the cost-per-contract and speed-per-contract make legal automation economically viable.
The Business Case
If your team processes 50+ contracts per quarter, Sonnet 4.5 can save 10–20 hours per quarter in manual review. At £150/hour fully loaded, that’s £1,500–£3,000 per quarter. Over a year, you’re looking at £6,000–£12,000 in labour cost savings, plus the intangible benefit of faster deal cycles and fewer missed red flags.
For portfolio companies undergoing platform consolidation or modernisation, legal automation is part of the value-creation engineering story. You’re not just building a system—you’re demonstrating operational leverage to your investors.
Understanding Sonnet 4.5’s Capabilities and Limits
Before you build, you need to know what Sonnet 4.5 can and cannot do.
What Sonnet 4.5 Does Well
Document understanding at scale. Sonnet 4.5 can ingest a 50-page contract, schedules, exhibits, and prior versions in a single request. It understands cross-references, picks up on implicit obligations, and reasons about how one clause affects another.
Instruction-following. You can ask Sonnet 4.5 to output structured JSON, markdown tables, or free-form summaries. It respects output formatting instructions consistently—important for downstream processing.
Nuance and context. Unlike rule-based systems, Sonnet 4.5 understands that “the parties may terminate for convenience” is different from “either party may terminate for convenience without cause.” It picks up on tone, intent, and implied obligations.
Speed. According to the Anthropic documentation on models, Sonnet 4.5 processes tokens 3–5× faster than Opus. A 50-page contract typically completes in 5–15 seconds, not 30–60 seconds.
What Sonnet 4.5 Cannot Do
Provide legal advice. Sonnet 4.5 can extract and summarise terms, but it cannot tell you whether a contract is “good” or “bad” in a legal sense. That requires a qualified lawyer.
Guarantee accuracy. LLMs hallucinate. Sonnet 4.5 is better than earlier models, but it will occasionally misread a clause, miss a negation, or invent a term that doesn’t exist. Your validation layer must catch these errors.
Handle ambiguous or handwritten contracts. If a contract is OCR’d from a scan, contains handwritten annotations, or uses non-standard formatting, Sonnet 4.5’s accuracy drops. You need to preprocess or flag these cases.
Understand jurisdiction-specific nuance. Sonnet 4.5 was trained on English-language legal documents, but it may not understand the specific implications of Australian contract law, English common law, or US state-specific provisions. Use it as a first-pass filter, not as a substitute for jurisdiction-aware legal review.
Context Window and Token Budgets
Sonnet 4.5 has a 200K context window. That’s roughly:
- 50–100 pages of a typical contract (with exhibits)
- 5–10 prior versions for comparison
- A detailed instruction prompt (500–1,000 tokens)
- Structured output templates
You can fit most contract-review tasks in a single request. But if you’re processing a complex M&A agreement with 20+ exhibits, you may need to split the request or use a two-pass approach (first pass: extract key terms; second pass: deep dive on specific clauses).
Prompt Design for Contract Analysis
Your prompt is the difference between a system that works and one that fails at scale. Here’s how to structure it.
The Anatomy of a Production Contract-Review Prompt
A production prompt has five layers:
- Role and context. Tell Sonnet 4.5 what it is and why it exists.
- Task definition. Be explicit about what you want extracted or analysed.
- Output format. Specify JSON, markdown, or a custom structure.
- Constraints and guardrails. Tell it what to flag, what to skip, and how to handle edge cases.
- Examples. Show it what good output looks like.
Example Prompt Structure
You are a contract-analysis assistant. Your job is to extract key commercial
terms, identify risk flags, and summarise obligations from legal contracts.
You will receive a contract document. Extract the following in JSON format:
{
"contract_metadata": {
"parties": ["Party A", "Party B"],
"effective_date": "YYYY-MM-DD",
"term_years": 3,
"renewal_mechanism": "auto-renews unless terminated 90 days prior"
},
"commercial_terms": {
"payment_schedule": "...",
"price_adjustment": "...",
"termination_rights": "..."
},
"risk_flags": [
{"severity": "high", "flag": "...", "clause_reference": "Section X.Y"},
{"severity": "medium", "flag": "...", "clause_reference": "Section X.Y"}
],
"missing_protections": ["IP assignment", "limitation of liability"],
"summary": "In 2–3 sentences, summarise the contract."
}
Constraints:
- Only extract terms explicitly stated in the contract. Do not infer.
- Flag any ambiguous language or missing definitions.
- If a clause contradicts another, note both interpretations.
- If the contract is incomplete or corrupted, set the relevant field to null and explain in a note.
Example output:
[Provide a realistic example of a completed extraction]
This structure does several things:
- Anchors the model. It knows its role and what success looks like.
- Specifies output format. JSON is machine-readable and easy to validate downstream.
- Includes guardrails. “Only extract terms explicitly stated” prevents hallucination. “Flag ambiguous language” catches edge cases.
- Shows examples. Examples are worth 100 words of instruction.
Prompt Patterns That Work
Comparison prompts. If you’re comparing a new contract against a template or prior version, structure your prompt to highlight deltas:
Compare this contract against the attached template. For each section,
identify:
- Terms that match the template
- Terms that deviate (and how)
- New clauses not in the template
- Deleted clauses from the template
Focus on commercial terms and risk allocation. Ignore formatting differences.
Multi-step prompts. For complex contracts, break the task into steps. First pass: extract metadata and commercial terms. Second pass (if needed): deep dive on specific risk areas.
Step 1: Extract contract metadata and commercial terms (payment, term, renewal).
Output JSON.
Step 2: Based on your extraction, identify the top 3 legal risks for a
[buyer/seller/licensor/licensee]. Explain why each is a risk and suggest
a mitigation.
Constraint-based prompts. If you’re reviewing contracts in a specific domain (e.g., SaaS, M&A, employment), add domain-specific constraints:
You are reviewing a SaaS software license agreement. Pay special attention to:
- Data protection and GDPR/privacy obligations
- Intellectual property ownership and licensing scope
- Service-level agreements and uptime commitments
- Limitation of liability and indemnification
- Termination and data deletion obligations
Flag any terms that deviate from standard SaaS market practice.
According to the Anthropic best practices guide, specificity and constraint-based instructions improve reliability. The more constraints you add, the more consistent your output.
Common Prompt Pitfalls
Vague instructions. “Summarise the contract” is too open-ended. Sonnet 4.5 will produce a rambling summary. Instead: “In 3 sentences, summarise the key obligations of [Party A]. Use active voice.”
Asking for legal advice. “Is this a good contract?” is a legal question. Sonnet 4.5 will hallucinate an answer. Instead: “List the top 3 risk factors in this contract and explain why they matter.”
Mixing multiple tasks. “Extract terms, summarise, identify risks, and compare against the template” in a single prompt often produces incomplete or inconsistent output. Break it into multiple prompts or use a two-pass approach.
Ignoring context. If you’re reviewing a contract in a specific industry or jurisdiction, tell Sonnet 4.5. “This is an Australian SaaS agreement. Flag any terms that conflict with Australian Consumer Law or APRA requirements.” Context improves accuracy.
Building Reliable Output Validation
Sonnet 4.5 is good, but it’s not perfect. Your validation layer is what separates a prototype from a production system.
Validation Strategy
Validation happens at three levels:
- Structural validation. Does the output match the schema you requested?
- Semantic validation. Does the extracted content make sense?
- Spot-check validation. Does the extraction match the source document?
Structural Validation
If you ask for JSON, validate that the response is valid JSON and contains all required fields.
import json
from jsonschema import validate, ValidationError
schema = {
"type": "object",
"properties": {
"contract_metadata": {"type": "object"},
"commercial_terms": {"type": "object"},
"risk_flags": {"type": "array"},
"summary": {"type": "string"}
},
"required": ["contract_metadata", "commercial_terms", "risk_flags", "summary"]
}
try:
output = json.loads(sonnet_response)
validate(instance=output, schema=schema)
except (json.JSONDecodeError, ValidationError) as e:
# Log and retry or flag for manual review
log_validation_error(contract_id, e)
Semantic Validation
Does the extracted content make sense? Common checks:
- Date consistency. Is the effective date before the termination date? Is the renewal date after the term end?
- Numeric consistency. Is the payment amount positive? Is the discount percentage between 0 and 100?
- Cross-field consistency. If the contract is “auto-renewing,” is there a renewal mechanism defined?
- Presence of critical fields. For SaaS contracts, is there an SLA? For employment agreements, is there a compensation structure?
def validate_semantics(extraction):
errors = []
# Check date consistency
if extraction['effective_date'] > extraction['term_end_date']:
errors.append("Effective date is after term end date")
# Check numeric consistency
if extraction['payment_amount'] <= 0:
errors.append("Payment amount is not positive")
# Check cross-field consistency
if extraction['renewal_mechanism'] == 'auto-renews' and not extraction.get('renewal_notice_period'):
errors.append("Auto-renewal specified but no notice period defined")
return errors
Spot-Check Validation
For high-stakes contracts (M&A, major partnerships), spot-check key extractions against the source document. This is labour-intensive, but necessary for critical deals.
Use Sonnet 4.5 itself to perform spot checks:
I extracted the following from the contract:
- Payment schedule: £100,000 per year, paid quarterly in advance
- Term: 3 years from January 1, 2024
- Renewal: Auto-renews for 1-year periods unless terminated 90 days prior
Please verify these extractions against the attached contract.
If any extraction is inaccurate or incomplete, provide the correct text
from the contract.
This creates a feedback loop: extraction → validation → correction.
Handling Validation Failures
When validation fails, you have three options:
- Retry with a refined prompt. If the output is structurally invalid (malformed JSON), retry with a clearer instruction.
- Flag for manual review. If semantic validation fails, flag the contract for a human to review.
- Partial acceptance. If 80% of the extraction is valid and 20% is uncertain, accept the 80% and flag the 20% for manual review.
def process_contract(contract_text):
extraction = call_sonnet(contract_text)
# Structural validation
if not is_valid_json(extraction):
return retry_with_refined_prompt(contract_text)
# Semantic validation
semantic_errors = validate_semantics(extraction)
if semantic_errors:
if len(semantic_errors) > 3:
# Too many errors; flag for manual review
return flag_for_manual_review(contract_text, semantic_errors)
else:
# Minor errors; log and accept
log_warnings(semantic_errors)
return extraction
return extraction
Cost Optimisation and Token Management
Sonnet 4.5 is cheap, but not free. At scale, token costs add up. Here’s how to optimise.
Understanding Token Costs
As of late 2024, Sonnet 4.5 costs:
- Input: £0.003 per 1K tokens
- Output: £0.015 per 1K tokens
A typical 50-page contract:
- Input tokens: 15,000–25,000 (the contract itself)
- Output tokens: 2,000–5,000 (the extraction)
- Total cost: £0.05–£0.10 per contract
At 100 contracts per month, you’re looking at £5–£10/month in API costs. But if you’re processing 1,000 contracts per month, that’s £50–£100/month. And if you’re running multiple passes (extraction, validation, comparison), costs double or triple.
Token-Saving Strategies
Compress the contract. If a contract has a 20-page schedule of exhibits that aren’t relevant to your extraction, remove it. You save tokens without losing information.
def compress_contract(text):
# Remove exhibits, schedules, or appendices not relevant to extraction
# Remove boilerplate language (e.g., "This agreement is governed by the laws of...")
# Remove formatting, extra whitespace, and page breaks
return compressed_text
Use two-pass extraction. First pass: extract metadata and key commercial terms (cheap, fast). Second pass (conditional): if the first pass flags high-risk areas, do a deeper analysis.
def two_pass_extraction(contract_text):
# Pass 1: Extract metadata and commercial terms
extraction = call_sonnet(
contract_text,
prompt="Extract metadata and commercial terms only. Output JSON."
)
# Pass 2: Conditional deep dive
if extraction['risk_level'] == 'high':
deep_analysis = call_sonnet(
contract_text,
prompt="Deep dive on risk factors. Explain each risk and suggest mitigations."
)
extraction['risk_analysis'] = deep_analysis
return extraction
This approach saves 30–40% of tokens because you only do deep analysis on high-risk contracts.
Batch processing with caching. If you’re comparing multiple versions of the same contract, use prompt caching to avoid re-processing the base contract.
With Anthropic’s prompt caching feature, you can cache the first 1,000 tokens of a prompt. If you’re processing 10 versions of the same contract, you save 9,000 tokens (90% of the input cost for that section).
Reuse extractions. Don’t re-extract the same contract twice. Store extractions in a database and retrieve them for subsequent analysis.
Cost Monitoring
Track token usage and cost per contract:
def log_token_usage(contract_id, input_tokens, output_tokens):
total_cost = (input_tokens * 0.003 + output_tokens * 0.015) / 1000
log_to_database({
'contract_id': contract_id,
'input_tokens': input_tokens,
'output_tokens': output_tokens,
'cost': total_cost
})
Monitor trends. If your average cost per contract is rising, investigate why (longer contracts, more complex prompts, more validation passes).
Common Failure Modes and How to Engineer Around Them
Teams who’ve shipped contract-review systems on Sonnet 4.5 hit predictable failure modes. Here’s how to avoid them.
Failure Mode 1: Hallucinated Terms
What happens: Sonnet 4.5 invents a clause that doesn’t exist in the contract. For example, it might claim the contract includes an “automatic price adjustment clause” when it doesn’t.
Why it happens: LLMs are pattern-matching machines. If most contracts in the training data have price-adjustment clauses, Sonnet 4.5 might infer one even if it’s not explicitly stated.
How to prevent it:
- Constraint-based prompts. Tell Sonnet 4.5: “Only extract terms explicitly stated in the contract. Do not infer or assume.”
- Spot-check validation. For critical extractions, ask Sonnet 4.5 to cite the specific clause: “Quote the exact text from the contract that supports this extraction.”
- Negative examples. In your prompt, show examples of what NOT to do: “Do not infer that the contract includes an SLA if it’s not explicitly mentioned.”
Failure Mode 2: Missed Negations
What happens: Sonnet 4.5 misses a “not” or “except” clause. For example, it might extract “The licensor grants the licensee the right to modify the software” when the contract actually says “The licensor does NOT grant the licensee the right to modify the software.”
Why it happens: Negations are syntactically subtle. A single word can flip the meaning of a clause.
How to prevent it:
- Explicit negation checking. Ask Sonnet 4.5 to flag negations: “For each right or obligation, explicitly state whether it is granted or denied.”
- Double-pass validation. If a clause grants a significant right, ask Sonnet 4.5 in a second pass: “Does the contract explicitly grant [right]? Quote the exact text.”
- Structured output for negations. Use a schema that forces explicit true/false values:
{
"licensee_may_modify_software": false,
"source_clause": "Section 3.2: The licensor does not grant the licensee the right to modify, reverse-engineer, or create derivative works."
}
Failure Mode 3: Context Collapse on Long Contracts
What happens: On contracts longer than 100 pages, Sonnet 4.5’s accuracy drops. It might miss clauses in the middle or confuse terms from different sections.
Why it happens: Even with a 200K context window, very long documents can overwhelm the model’s attention. It’s not a hard failure, but accuracy degrades.
How to prevent it:
- Chunking and re-assembly. For contracts longer than 80 pages, split them into sections (e.g., “Terms and Conditions,” “Schedules,” “Exhibits”). Extract from each section separately, then re-assemble the extraction.
- Section-level extraction. Ask Sonnet 4.5 to first identify the major sections of the contract, then extract from each section:
Step 1: List the major sections of this contract (e.g., "Definitions,"
"Scope of Work," "Payment Terms," etc.).
Step 2: For each section, extract key terms and obligations.
- Prioritisation. Extract the most important sections first (commercial terms, risk allocation). Less critical sections (boilerplate, definitions) can be spot-checked later.
Failure Mode 4: Jurisdiction-Specific Misunderstanding
What happens: Sonnet 4.5 doesn’t understand the implications of a clause in a specific jurisdiction. For example, it might not flag that a limitation-of-liability clause is unenforceable under Australian Consumer Law or that a non-compete clause violates English common law.
Why it happens: Sonnet 4.5 was trained on general English-language legal documents. It doesn’t have deep knowledge of every jurisdiction’s specific requirements.
How to prevent it:
- Jurisdiction-aware prompts. Tell Sonnet 4.5 the relevant jurisdiction: “This is an Australian SaaS agreement. Flag any terms that may conflict with Australian Consumer Law, Privacy Act, or APRA requirements.”
- Post-processing with jurisdiction-specific rules. After Sonnet 4.5 extracts terms, apply jurisdiction-specific validation rules. For example, in Australia, check for Consumer Law compliance; in the US, check for state-specific non-compete enforceability.
- Escalation to human lawyers. For contracts in regulated industries (financial services, insurance, healthcare), always escalate to a qualified lawyer for final review. Sonnet 4.5 is a first-pass filter, not a substitute for legal counsel.
Failure Mode 5: Inconsistent Output Format
What happens: Sonnet 4.5 returns JSON that doesn’t match your schema. For example, it might use “payment_schedule” in one response and “payment_terms” in another, or it might return an array when you expected a string.
Why it happens: Despite detailed instructions, models occasionally deviate from the specified format, especially under time pressure or with ambiguous instructions.
How to prevent it:
- Strict schema validation. Validate every response against your schema before processing it downstream.
- Retry logic. If validation fails, retry with a more explicit instruction: “Return ONLY valid JSON matching this schema: [schema]. Do not include any text outside the JSON.”
- Output format examples. In your prompt, provide multiple examples of correctly formatted output. More examples = more consistency.
Integration Patterns for Production Workflows
Once you’ve built a reliable contract-review system, you need to integrate it into your actual workflows. Here’s how.
Synchronous vs. Asynchronous Processing
Synchronous processing: User uploads a contract, waits for extraction, gets results immediately. Good for small batches or interactive workflows.
from flask import Flask, request, jsonify
app = Flask(__name__)
@app.route('/review-contract', methods=['POST'])
def review_contract():
contract_file = request.files['contract']
contract_text = extract_text_from_pdf(contract_file)
extraction = call_sonnet(contract_text, prompt)
validation_errors = validate_semantics(extraction)
return jsonify({
'extraction': extraction,
'validation_errors': validation_errors,
'status': 'complete'
})
Asynchronous processing: User uploads a contract, gets a job ID, receives results via email or webhook when ready. Good for large batches or long-running analyses.
from celery import Celery
celery_app = Celery('contract_review')
@celery_app.task
def review_contract_async(contract_id, contract_text):
extraction = call_sonnet(contract_text, prompt)
validation_errors = validate_semantics(extraction)
# Store results in database
save_extraction(contract_id, extraction, validation_errors)
# Notify user
send_email(user_email, f"Contract {contract_id} review complete")
For most production systems, asynchronous processing is better. It scales better, handles rate limiting gracefully, and lets you batch requests to save on API costs.
Integrating with Contract Management Systems
If you’re using a contract management platform (e.g., Ironclad, Airtable, Notion), integrate Sonnet 4.5 as a data-enrichment layer.
def enrich_contract_in_airtable(contract_record):
contract_text = fetch_contract_from_storage(contract_record['file_id'])
extraction = call_sonnet(contract_text, prompt)
# Write extraction back to Airtable
update_airtable_record(
contract_record['id'],
{
'extracted_payment_terms': extraction['commercial_terms']['payment_schedule'],
'risk_flags': extraction['risk_flags'],
'summary': extraction['summary'],
'extraction_status': 'complete'
}
)
This pattern keeps your contract metadata in one place and lets you query and report on extractions.
Building a Contract Review Dashboard
Create a dashboard that shows:
- Extraction status. How many contracts have been processed, how many are pending, how many failed.
- Risk summary. Aggregate risk flags across all contracts (e.g., “50 contracts have missing IP assignment clauses”).
- Cost tracking. Total API costs, cost per contract, trends over time.
- Manual review queue. Contracts that failed validation and need human review.
def get_dashboard_metrics():
return {
'total_contracts': count_contracts(),
'processed': count_contracts(status='complete'),
'pending': count_contracts(status='pending'),
'failed_validation': count_contracts(status='failed'),
'high_risk_count': count_risk_flags(severity='high'),
'total_api_cost': sum_api_costs(),
'avg_cost_per_contract': avg_api_cost(),
'manual_review_queue': get_failed_contracts()
}
Security, Compliance, and Legal Considerations
Legal contracts contain sensitive information. You need to handle them securely.
Data Security
Encryption in transit. Use HTTPS for all API calls to Anthropic. Encrypt contract text before sending to the API.
Encryption at rest. Store extracted data and contract text in encrypted databases. Use AES-256 encryption for sensitive fields.
Access control. Limit who can view contract extractions. Use role-based access control (RBAC) to ensure only authorised users can see sensitive terms.
If you’re working with companies pursuing SOC 2 or ISO 27001 compliance, contract review systems are often part of the audit scope. Ensure your system is audit-ready from the start. PADISO’s Security Audit service can help you design and implement controls that pass SOC 2 and ISO 27001 audits via Vanta, ensuring your contract-review system meets enterprise security standards.
Compliance Considerations
GDPR and privacy. If you’re processing contracts that contain personal data (e.g., employee names, email addresses), you need GDPR compliance. Minimise the personal data you extract, and ensure you have a lawful basis for processing.
Privileged information. Contracts often contain attorney-client privileged information or trade secrets. Be careful not to expose these when sharing extractions. Consider anonymising sensitive information in summaries.
Audit trail. Keep a record of who accessed which contracts, when extractions were performed, and what changes were made. This is essential for compliance audits.
Legal Liability
Disclaimer. Make it clear that Sonnet 4.5 extractions are not legal advice and should not be relied upon as a substitute for qualified legal counsel. Include a disclaimer in your UI and in any reports you generate.
Responsibility. If a contract-review system misses a critical clause and causes financial loss, who is responsible? Define liability clearly in your terms of service.
Insurance. Consider errors-and-omissions insurance if you’re offering contract-review services to external clients.
For teams in regulated industries (financial services, insurance, healthcare), work with compliance and legal teams to define how Sonnet 4.5 fits into your legal review process. It’s a tool to accelerate review, not to replace it.
Measuring Success and Iterating
Once you’ve launched, measure what matters.
Key Metrics
Accuracy. What percentage of extractions are correct? Measure this by spot-checking a random sample of extractions against the source document.
def measure_accuracy(sample_size=100):
sample = get_random_contracts(sample_size)
correct = 0
for contract in sample:
extraction = get_extraction(contract['id'])
# Manual review: is extraction correct?
if is_extraction_correct(contract, extraction):
correct += 1
return correct / sample_size
Speed. How long does it take to extract a contract from upload to completion? Track this and compare against manual review time.
Cost. What’s the cost per contract? Track this monthly and look for trends.
Adoption. Are teams using the system? Track the number of contracts processed per week and per user.
False-positive rate. How many risk flags are actually risks vs. false positives? If your false-positive rate is high, you’ll train users to ignore flags.
Iteration Cycles
Weekly. Review failed validations and extraction errors. Identify patterns (e.g., “extraction fails on contracts with non-standard formatting”). Update your prompt or validation logic.
Monthly. Measure accuracy, speed, and cost. Compare against baseline. Identify the contracts that are hardest to process and plan improvements.
Quarterly. Conduct user research. Are teams satisfied with the system? What features would increase adoption? What are the biggest pain points?
Feedback Loops
Build a feedback mechanism into your system. Let users flag incorrect extractions. Use this feedback to retrain or refine your prompts.
@app.route('/flag-extraction-error', methods=['POST'])
def flag_extraction_error():
contract_id = request.json['contract_id']
error_description = request.json['error']
log_to_database({
'contract_id': contract_id,
'error': error_description,
'timestamp': datetime.now()
})
# Periodically review flagged errors and update prompts
Over time, this feedback will tell you which types of contracts are hard for Sonnet 4.5 to process and where you need to add extra validation or human review.
Next Steps: Building Your Contract Review System
If you’re considering building a contract-review system, here’s a practical roadmap.
Phase 1: Proof of Concept (Weeks 1–2)
- Define your use case. What contracts do you want to review? What information do you want to extract? What’s the ROI?
- Build a simple prompt. Start with a basic extraction prompt (metadata, commercial terms, risk flags).
- Test on 10 contracts. Run your prompt against 10 real contracts from your pipeline. Measure accuracy manually.
- Iterate on the prompt. If accuracy is below 80%, refine your prompt and test again.
Phase 2: MVP (Weeks 3–6)
- Build basic validation. Add structural and semantic validation. Aim for 95%+ structural validity.
- Integrate with your workflow. Connect to your contract management system or build a simple web interface.
- Test on 100 contracts. Process 100 real contracts. Measure accuracy, speed, and cost.
- Set up monitoring. Track token usage, API costs, and validation failures.
Phase 3: Production (Weeks 7–12)
- Add security controls. Encrypt data in transit and at rest. Implement access control and audit logging.
- Build a dashboard. Show extraction status, risk summary, and cost tracking.
- Scale to your full volume. Process all contracts from your pipeline. Measure adoption and ROI.
- Establish feedback loops. Let users flag extraction errors. Use this feedback to improve your prompts.
For teams that need hands-on support building and scaling contract-review systems, PADISO’s AI & Agents Automation service and AI Strategy & Readiness programme can help you design the architecture, build the validation layer, and integrate with your existing workflows. If you’re also pursuing SOC 2 or ISO 27001 compliance as part of your modernisation, PADISO’s Security Audit service ensures your system is audit-ready from the start.
Building for Scale
Once you’ve validated the concept, think about scale:
- Batch processing. Use asynchronous job queues (Celery, AWS SQS) to process contracts in batches.
- Caching. Use prompt caching to avoid re-processing the same contract or template multiple times.
- Multi-tenant architecture. If you’re offering this as a service to multiple teams or companies, design for multi-tenancy from the start.
- Monitoring and alerting. Set up alerts for API failures, validation failures, and cost anomalies.
For enterprises running platform modernisation or M&A roll-ups, contract review is often part of a larger technology consolidation. PADISO’s Platform Design & Engineering service and CTO as a Service can help you integrate contract review into a broader AI and automation strategy.
Understanding the Broader Context
Contract review is one use case for Sonnet 4.5. As you build, you’ll discover other opportunities to automate legal workflows:
- Due diligence. Automating document review for M&A and fundraising.
- Compliance monitoring. Flagging contracts that violate company policy or regulatory requirements.
- Contract lifecycle management. Tracking renewal dates, obligations, and key milestones.
- Negotiation support. Suggesting redlines based on your company’s standard terms.
For teams exploring broader AI transformation, PADISO’s AI Advisory Services can help you identify high-ROI use cases, design your AI strategy, and execute on your roadmap. If you’re in financial services, insurance, or healthcare, PADISO has industry-specific expertise in financial services AI and insurance AI that includes compliance-aware design from the start.
Research and Further Learning
As you build, stay informed on the latest developments in AI and contract review. The NIST AI Risk Management Framework provides guidance on managing AI risks, including reliability and transparency concerns relevant to legal use cases. Academic research on contract review and LLMs is advancing rapidly; check arXiv and SSRN for recent papers on LLM-assisted contract analysis.
For practical guidance on prompt design and reliability, refer to the Anthropic best practices documentation and the official models overview. The American Bar Association’s Business Law Today and the American Bankruptcy Institute Journal publish practitioner-focused articles on legal technology and contract workflows.
Summary
Sonnet 4.5 is the first LLM where legal contract review makes economic sense at scale. With a 200K context window, strong reasoning, and fast inference, it can extract commercial terms, identify risks, and summarise obligations in seconds—not hours.
But production-grade contract review requires discipline:
- Design prompts for specificity. Tell Sonnet 4.5 exactly what you want extracted and how to format it.
- Validate ruthlessly. Structural validation catches malformed output. Semantic validation catches hallucinations. Spot-check validation catches edge cases.
- Optimise for cost. Compress contracts, use two-pass extraction, and batch process to keep costs low.
- Engineer around failure modes. Hallucinations, missed negations, context collapse, and jurisdiction-specific misunderstandings are predictable. Build controls to prevent them.
- Integrate carefully. Design your system for your workflow, not the other way around.
- Measure and iterate. Track accuracy, speed, cost, and adoption. Use feedback to improve your prompts and validation logic.
If you’re building a contract-review system, start with a proof of concept on 10 contracts. Measure accuracy manually. Iterate on your prompt until you hit 80%+ accuracy. Then scale to your full volume, add validation and monitoring, and measure ROI.
For teams needing strategic guidance on AI deployment, PADISO can help. Whether you’re a founder building a contract-review startup, an operator modernising legal workflows, or a PE firm running technology due diligence, PADISO’s fractional CTO and AI advisory services can help you design, build, and scale AI systems that work. Get in touch to discuss your contract-review roadmap.