Table of Contents
- Why Sonnet 4.5 for Meeting Summarisation
- Core Architecture and Integration Points
- Prompt Design Patterns That Actually Work
- Output Validation and Quality Gates
- Cost Optimisation Without Cutting Corners
- Failure Modes and How to Dodge Them
- Real-World Implementation: A Case Study
- Security, Compliance, and Data Handling
- Next Steps and Scaling Your Summarisation Pipeline
Why Sonnet 4.5 for Meeting Summarisation
Meeting note summarisation is one of the highest-ROI use cases for large language models in the enterprise. A 60-minute meeting generates 5,000–8,000 words of transcript. Asking humans to distil that into three actionable bullets takes 15–20 minutes of focused work. Multiply that across 50+ meetings per week in a mid-sized organisation, and you’re burning 12–15 hours weekly on busy work that adds no strategic value.
Sonnet 4.5 is purpose-built for this problem. Unlike earlier Claude models, Sonnet 4.5 strikes the critical balance between speed, cost, and reasoning depth. It processes meeting transcripts in 2–3 seconds, costs roughly 60% less than GPT-4 Turbo per token, and—crucially—it understands context deeply enough to distinguish between throwaway comments and actual decisions.
We’ve deployed Sonnet 4.5 for meeting summarisation across 15+ client workflows at PADISO. The pattern holds: teams ship production summaries in 4 weeks, cut meeting-admin overhead by 35–40%, and recover 8–12 hours per person per month. The catch? The gap between “working prototype” and “production-grade pipeline” is wider than most engineering teams expect.
This guide walks you through the patterns we’ve battle-tested, the pitfalls we’ve hit, and the validation gates that separate a toy demo from a system your team will actually trust.
Core Architecture and Integration Points
Where Sonnet 4.5 Sits in Your Meeting Workflow
Meeting summarisation doesn’t exist in isolation. It’s a node in a larger workflow: capture → transcription → summarisation → distribution → archival. Where you place Sonnet 4.5 in that chain, and how you connect it, determines whether it ships in four weeks or four months.
The simplest pattern is synchronous, single-call summarisation. A meeting ends. The audio or transcript is sent to Sonnet 4.5 via the Anthropic API. A summary comes back within seconds. It lands in Slack, email, or a shared doc. This works well for teams with <20 meetings per week and low latency requirements.
The more resilient pattern is asynchronous batch processing with retry logic. Transcripts queue in a job system (AWS SQS, GCP Pub/Sub, or even a simple PostgreSQL queue). A worker pool calls Sonnet 4.5 with exponential backoff. Summaries are stored in a database, tagged with metadata (attendees, duration, topics), and indexed for search. This pattern scales to 500+ meetings per week and tolerates transient API failures.
Most teams we work with start synchronous and migrate to async within 3–6 months, once they realise they need audit trails, retry logic, and the ability to re-summarise meetings when prompts improve.
Integration with Transcription Services
Sonnet 4.5 doesn’t transcribe audio. It needs a transcript first. The transcription layer matters more than most teams realise.
Automatic Speech Recognition (ASR) quality is your ceiling. If the transcript has 15% error rate, Sonnet 4.5 can’t magic that into a coherent summary. It will hallucinate context, miss key decisions, or produce nonsense.
We recommend:
- Zoom, Google Meet, Microsoft Teams native transcription for internal meetings. These are 90%+ accurate for clear speech, baked into your existing tooling, and require zero additional infrastructure.
- Deepgram, AssemblyAI, or Rev for external meetings or poor audio. These services offer 95%+ accuracy and speaker diarisation (who said what). The cost is $0.10–0.30 per minute, which is negligible compared to the value of accurate summaries.
- Avoid generic AWS Transcribe or Google Cloud Speech-to-Text unless you have a specific reason. They’re cheaper ($0.02–0.04 per minute) but 5–10% less accurate on natural speech, which compounds errors downstream.
Once you have a clean transcript, Sonnet 4.5 needs it in a structured format. Plain text works, but speaker-labelled text (“Alice: …”, “Bob: …”) is better. Timestamps are a bonus—they let you link summaries back to video for verification.
API Integration Patterns
The Claude models overview documents Sonnet 4.5’s exact specs: 200K token context window, 4,096 token output limit, and pricing of $3 per 1M input tokens and $15 per 1M output tokens.
For a typical 60-minute meeting transcript (5,000–8,000 words = 1,500–2,400 tokens), the API call costs $0.005–0.01 and returns a 200–400 token summary ($0.003–0.006). Total: $0.008–0.016 per meeting.
Integration is straightforward. Call the API with the transcript as the user message and your summarisation prompt as the system prompt. Handle rate limits (100 requests per minute for most accounts; request a higher tier if you’re doing 500+ meetings/week). Cache the transcript if you’re re-summarising the same meeting multiple times (e.g., refining the prompt)—Sonnet 4.5 supports prompt caching, which reduces input token cost by 90% on cache hits.
Prompt Design Patterns That Actually Work
The Anatomy of a Production Summarisation Prompt
This is where most teams stumble. A prompt that works for a demo doesn’t work in production. Here’s why: demos use one or two “nice” transcripts—clear speech, logical flow, few tangents. Production gets everything: rambling meetings, cross-talk, off-topic jokes, people joining late, people dropping off.
A production prompt must be explicit about structure, tone, and failure modes. Here’s the pattern we use:
You are a meeting summarisation assistant. Your job is to extract the essential information from a meeting transcript and present it in a structured, scannable format.
IMPORTANT CONSTRAINTS:
1. Focus on decisions, action items, and key insights. Ignore pleasantries, off-topic tangents, and meta-discussion about the meeting itself.
2. If a decision or action item is unclear, flag it as "[UNCLEAR]" rather than guessing.
3. If no decisions or action items exist, say so explicitly.
4. Use Australian English spelling (colour, organisation, realise, etc.).
5. Attribute action items to specific people where named. If unnamed, use "TBD".
6. Keep the summary to 300 words maximum. Prioritise decision clarity over comprehensiveness.
OUTPUT FORMAT:
## Summary
[2–3 sentences on the meeting's purpose and main outcome]
## Decisions
- [Decision 1]
- [Decision 2]
[etc.]
## Action Items
- [Owner]: [Task] (due [date if mentioned])
[etc.]
## Risks or Blockers
- [Issue 1]
[etc.]
## Next Meeting
[Date/time if scheduled, else "TBD"]
Now, summarise this transcript:
---
[TRANSCRIPT]
---
Let’s break down why this works:
Explicit constraints. “Ignore pleasantries” tells Sonnet 4.5 not to waste tokens on “Thanks for joining” or “Great to see everyone.” “Flag unclear items as [UNCLEAR]” is critical—it’s better to ask for clarification than to hallucinate a decision.
Structured output. A fixed format (Summary, Decisions, Action Items, Risks) makes it easy to parse, validate, and feed downstream. Your team knows where to look. Your parsing logic doesn’t have to guess.
Australian English. If you’re in Australia or have Australian clients, specify the spelling convention. Sonnet 4.5 respects this.
Attribution. “Attribute to specific people” prevents summaries that say “We should hire a senior engineer” without saying who’s actually doing the hiring.
Word limit. 300 words forces prioritisation. A 1,500-word summary is useless; no one reads it. A 300-word summary that captures the three decisions and five action items gets reviewed, acted on, and trusted.
Prompt Variations for Different Meeting Types
One prompt doesn’t fit all meetings. A board meeting needs different focus than a sprint planning session.
Board/executive meetings: Add “Include financial impact, strategic implications, and stakeholder concerns.” Emphasise decisions over action items.
Engineering standups: Add “Prioritise blockers and dependencies. Ignore routine status updates unless they affect other teams.” Emphasise action items over decisions.
Sales/customer calls: Add “Note customer pain points, objections, and next steps. Include deal stage if mentioned.” Emphasise customer context and follow-ups.
All-hands meetings: Add “Summarise announcements, policy changes, and Q&A themes. Note any personnel or org changes.” Emphasise broadcast information.
The core prompt stays the same; you layer meeting-type-specific guidance on top. Sonnet 4.5 handles this gracefully—it doesn’t get confused by role-specific instructions.
Handling Edge Cases in Prompts
Production transcripts are messy. People interrupt. People go off-topic. People say “um” and “like” every other word. Your prompt needs to handle these gracefully.
For cross-talk and interruptions: Add “If multiple people speak simultaneously, treat the first speaker’s intent as primary and note the interruption only if it changes the decision.” This prevents Sonnet 4.5 from trying to synthesise contradictory statements.
For off-topic tangents: Add “If a topic is discussed for <2 minutes and not revisited, treat it as tangential and exclude it unless it directly affects a decision.” This lets Sonnet 4.5 distinguish between a 10-minute debate on a real issue and a 30-second joke.
For incomplete information: Add “If a decision is made but the implementation details are unclear, summarise the decision and flag the missing details as [UNCLEAR: implementation timeline].” This prevents the summary from being either vague or hallucinated.
For very long meetings (90+ minutes): Split the transcript into 30-minute chunks, summarise each, then ask Sonnet 4.5 to synthesise the chunk summaries into a master summary. This avoids context-window issues and improves quality—Sonnet 4.5 is better at synthesising summaries than at summarising very long transcripts in one pass.
Output Validation and Quality Gates
Why Manual Spot-Checks Aren’t Enough
Sonnet 4.5 is good, but it’s not perfect. It hallucinates action items that weren’t in the transcript. It misses decisions buried in long explanations. It sometimes attributes decisions to the wrong person.
If you deploy without validation gates, you’ll catch these errors in production: someone acts on a hallucinated action item, wastes a week, and stops trusting the system. The system dies.
Validation gates catch errors before they propagate. They also give you data to improve prompts and retrain your team on how to run meetings better.
Automated Validation Rules
Start with these checks:
1. Structure validation. Does the output have all required sections (Summary, Decisions, Action Items, Risks)? Parse the output as structured text. If a section is missing, flag it as a validation failure and re-summarise with a stricter prompt.
2. Action item attribution. Every action item should have an owner. Use regex to check for “TBD” or unnamed owners. If >20% of action items are unattributed, flag the summary and ask the meeting organiser to clarify in a follow-up.
3. Hallucination detection. Search the original transcript for key phrases from the summary. If a decision appears in the summary but not in the transcript (fuzzy match, allowing for paraphrasing), flag it. This is imperfect—some inference is valid—but it catches gross hallucinations.
4. Length validation. If the summary is >400 words, it’s too long and probably includes unnecessary detail. If it’s <50 words, it’s too short and probably misses key points. Flag both.
5. Urgency detection. Search for keywords like “urgent”, “ASAP”, “blocker”, “critical”. If these appear in the transcript but not in the Risks section, flag it.
These checks are deterministic and fast. They catch 60–70% of obvious errors without human review.
Human Review Workflows
Automated checks catch obvious failures. Human review catches subtle ones.
Tier 1: Meeting organiser spot-check. The person who ran the meeting reviews the summary within 24 hours. They’re the ground truth. If they say the summary is wrong, you retrain the prompt or escalate to the team.
This takes 2–3 minutes per summary. It’s fast enough that it doesn’t create a bottleneck. And it builds trust: people see their feedback improve the system.
Tier 2: Random sampling. Every week, pull 5–10 random summaries and have someone (not the organiser) review them blind. This catches systematic errors that the organiser might miss because they’re too close to the meeting.
Tier 3: Escalation on disagreement. If the organiser says a summary is wrong, don’t just delete it. Log the error, compare the original transcript to the summary, and figure out why Sonnet 4.5 got it wrong. Was it a prompt issue? A transcription error? A genuinely ambiguous decision? Use this data to improve the prompt or the transcription service.
Feedback Loops and Continuous Improvement
Validation isn’t a one-time gate. It’s a feedback loop.
Every time someone corrects a summary, log it. After 50–100 corrections, analyse the patterns. Are action items consistently misattributed? Is Sonnet 4.5 missing decisions about specific topics (e.g., hiring, budget)? Are certain meeting types consistently worse?
Use this data to refine your prompt. Add meeting-type-specific guidance. Add examples of tricky decisions and how to handle them. Re-test on your corpus of corrected summaries. Measure improvement.
We’ve seen teams improve summary quality by 30% within 8 weeks just by iterating on the prompt based on real feedback. The key is treating validation as a learning signal, not a compliance checkbox.
Cost Optimisation Without Cutting Corners
The Math: Where Your Money Actually Goes
At $3 per 1M input tokens and $15 per 1M output tokens, a 60-minute meeting costs $0.01–0.02 to summarise. For 50 meetings per week, that’s $25–50 per week, or $1,300–2,600 per year.
That’s cheap. But if you’re processing 500+ meetings per week (large enterprise), you’re spending $13,000–26,000 per year. If you’re processing 5,000+ meetings per week (very large enterprise or SaaS with embedded summaries), you’re looking at $130,000–260,000 per year.
At that scale, cost optimisation matters.
Prompt Caching: The 90% Cost Reduction
Sonnet 4.5 supports prompt caching. If you send the same system prompt and prefix multiple times, the API caches it and charges 90% less for cached tokens on subsequent calls.
For meeting summarisation, this is powerful. Your system prompt (the summarisation instructions) is the same for every call. If you process 100 meetings per week, you’re sending the same 500–600 token prompt 100 times. With caching, you pay full price once and 10% price for the other 99.
Savings: roughly 45% on total input token cost, or $6–12 per week per 50 meetings.
To enable caching, add "cache_control": {"type": "ephemeral"} to your API request. The cache lasts 5 minutes, which is fine for batch processing. For longer caches (hours or days), use "type": "static" and manage cache invalidation yourself.
Batch Processing and Rate Limiting
If you’re processing 100+ meetings per week, don’t call the API synchronously for each one. Queue them, batch them, and process asynchronously.
Benefits:
- Rate limit headroom. The API has a rate limit (100 requests/minute for most accounts). Batching lets you stay under the limit without upgrading your tier.
- Retry logic. If an API call fails, you can retry without losing the summary. Synchronous calls fail and disappear.
- Cost tracking. Batch processing makes it easy to log each call, track cost per meeting, and spot anomalies (e.g., a 10,000-word transcript that costs 10x the average).
- Parallelism. You can process 20 meetings in parallel (respecting rate limits) instead of one at a time.
Use a job queue: AWS SQS, GCP Pub/Sub, or a simple PostgreSQL queue with a worker process. Queue transcripts as they arrive. Workers pick up jobs, call Sonnet 4.5, store results, and mark jobs complete. If a worker crashes, the job re-enters the queue.
This is standard infrastructure. Most teams already have it for other async tasks. Reuse it.
Token Counting and Budgeting
You can’t optimise what you don’t measure. Implement token counting on every call.
Before calling the API, count tokens in the transcript and prompt using the Anthropic tokeniser (available in the SDK). Log the count. After the call, log the output tokens. Over time, you’ll see patterns:
- Average input tokens per meeting type
- Average output tokens per summary
- Outliers (meetings that are 5x the average token count)
Use this to budget. If you’re processing 100 meetings/week with an average of 1,500 input tokens and 300 output tokens per meeting:
- Input cost: (100 × 1,500) × ($3 / 1M) = $0.45/week
- Output cost: (100 × 300) × ($15 / 1M) = $0.45/week
- Total: $0.90/week, or $47/year
Small. But if you’re processing 5,000 meetings/week:
- Input cost: (5,000 × 1,500) × ($3 / 1M) = $22.50/week
- Output cost: (5,000 × 300) × ($15 / 1M) = $22.50/week
- Total: $45/week, or $2,340/year
Now cost matters. Optimising the prompt to reduce output tokens by 20% saves $468/year. Enabling caching saves another $1,000+/year. These add up.
When to Use Cheaper Models (and When Not To)
Sonnet 4.5 is fast and cheap, but it’s not the cheapest. Haiku is cheaper. GPT-4o mini is cheaper. Should you use them?
Use Sonnet 4.5 if:
- You need high-quality summaries (decisions must be accurate, action items must be clear)
- Your meetings are complex or ambiguous (strategy, customer calls, board meetings)
- You’re willing to pay $0.01–0.02 per meeting for reliability
Use Haiku if:
- Your meetings are routine and predictable (standups, status updates)
- You can tolerate occasional errors (you have human review anyway)
- You’re processing 1,000+ meetings/week and cost is a constraint
Haiku costs 50% less than Sonnet 4.5 but produces lower-quality summaries on complex meetings. We’ve tested both on the same corpus: Sonnet 4.5 gets decisions right 92% of the time; Haiku gets them right 78% of the time. That 14% difference matters if you’re relying on summaries to run the business.
For most teams, Sonnet 4.5 is the right choice. Use it.
Failure Modes and How to Dodge Them
Hallucination: The Silent Killer
Sonnet 4.5 sometimes invents action items, decisions, or details that weren’t in the transcript. This is rare (we see it in <5% of summaries) but catastrophic when it happens.
Example: A meeting discusses hiring a senior engineer “sometime this year.” Sonnet 4.5 summarises it as “Action: John to post job description by Friday.” John never said he’d do it by Friday. He’s now accountable for something he didn’t commit to.
How to dodge it:
-
Validate against the transcript. Every summary should be spot-checked against the original. Use fuzzy matching to find phrases in the summary that don’t appear in the transcript. Flag them.
-
Use [UNCLEAR] liberally. Your prompt should tell Sonnet 4.5 to flag ambiguous decisions as [UNCLEAR] rather than guessing. Review [UNCLEAR] flags with the meeting organiser.
-
Keep summaries short. Long summaries have more room for hallucination. Force yourself to 300 words max. This makes hallucinations obvious.
-
Use examples in the prompt. Show Sonnet 4.5 examples of good summaries (with no hallucination) and bad ones (with hallucination). It learns from examples.
Missed Decisions: The Opposite Problem
Sonnet 4.5 sometimes misses important decisions because they’re buried in long explanations or stated implicitly.
Example: Someone says, “We’ve been thinking about moving to a new vendor. I talked to the team, and everyone agrees it makes sense. The cost is lower, the support is better, and we can migrate in Q2. Let’s do it.” Sonnet 4.5 might summarise this as “Discussed vendor options” without capturing the actual decision to migrate.
How to dodge it:
-
Train your team to state decisions explicitly. “We’ve decided to migrate to Vendor X in Q2” is clearer than “Let’s do it.” This is a one-time investment in meeting discipline that pays off forever.
-
Add decision-detection examples to your prompt. Show Sonnet 4.5 examples of implicit decisions and how to capture them.
-
Use a second pass for long meetings. Summarise the transcript once, then ask Sonnet 4.5 to review its own summary and answer: “What are the 3 most important decisions in this meeting?” It often catches things it missed the first time.
Transcript Quality Issues
Garbage in, garbage out. If the transcript has 20% error rate, the summary will be incoherent.
How to dodge it:
-
Use a good transcription service. Zoom/Teams/Meet native transcription is fine for internal meetings. For external meetings or noisy environments, use Deepgram or AssemblyAI.
-
Pre-process the transcript. Remove filler words (um, uh, like) and fix obvious errors (“Jone” → “John”). This takes 1–2 minutes per transcript and improves summary quality by 10–15%.
-
Detect and flag low-quality transcripts. If the transcript has unusual patterns (very short, lots of [inaudible], high error rate), flag it before sending to Sonnet 4.5. Ask the organiser to re-record or manually clean up.
Context Window and Token Limits
Sonnet 4.5 has a 200K token context window. A 60-minute meeting is 1,500–2,400 tokens. You’re nowhere near the limit.
But if you’re processing a 4-hour meeting (rare, but it happens), you might hit the limit. And if you’re sending multiple transcripts or adding examples to your prompt, token count adds up.
How to dodge it:
-
Count tokens before calling the API. Use the Anthropic tokeniser. If you’re over 180K tokens, split the transcript.
-
Split long meetings into chunks. For a 4-hour meeting, split it into four 1-hour chunks, summarise each, then synthesise. This is actually better for quality—Sonnet 4.5 is better at synthesising summaries than at summarising very long transcripts.
-
Don’t include the full transcript in your prompt. Only include the transcript itself, not metadata, examples, or other context. Keep the prompt lean.
Rate Limiting and API Failures
The Anthropic API has rate limits. If you hit them, your requests get queued or rejected. If you don’t handle this gracefully, your summaries disappear.
How to dodge it:
-
Implement exponential backoff. If an API call fails, wait 1 second, then try again. If it fails again, wait 2 seconds, then try again. Keep doubling until you hit a max (e.g., 60 seconds).
-
Queue requests. Don’t call the API synchronously. Queue requests and process them asynchronously with a worker pool that respects rate limits.
-
Monitor rate limit headers. The API returns headers telling you how many requests you have left. Log these. If you’re consistently hitting the limit, request a higher tier.
-
Have a fallback. If Sonnet 4.5 is unavailable, can you fall back to Haiku or another model? Can you serve a cached summary from a previous meeting? Think about graceful degradation.
Real-World Implementation: A Case Study
The Setup
A mid-market financial services firm (50 employees, $10M ARR) was drowning in meeting notes. They had 80–100 meetings per week across sales, engineering, operations, and executive teams. Each meeting generated 5–10 pages of notes, often incomplete or unclear. Action items got lost. Decisions got misremembered. Meetings were re-run to clarify what was actually decided.
They estimated they were spending 15–20 hours per week on meeting admin: taking notes, cleaning them up, distributing them, chasing down clarifications.
They came to PADISO to build a meeting summarisation system. Their brief: ship in 4 weeks, integrate with Zoom and Google Meet, produce summaries that the team would actually use, and keep cost under $500/month.
Week 1: Architecture and MVP
We designed a simple architecture:
- Transcription: Zoom and Google Meet native transcription (already available in their account).
- Summarisation: Sonnet 4.5 via the Anthropic API.
- Storage: PostgreSQL for transcripts and summaries.
- Distribution: Slack integration (summary posted to a #meetings channel within 30 minutes of meeting end).
We built an MVP in 3 days:
- A webhook listener for Zoom/Meet meeting end events
- A transcript fetcher (Zoom and Meet APIs)
- A Sonnet 4.5 summariser with a basic prompt
- A Slack poster
The basic prompt was:
Summarise this meeting in 3 bullet points: key decision, action items, and risks.
We tested it on 10 real meeting transcripts from their archive. Results were… rough. Summaries were vague, action items were unattributed, and some decisions were missed.
Week 2: Prompt Refinement
We spent the week iterating on the prompt. We added:
- Explicit structure (Summary, Decisions, Action Items, Risks)
- Constraints (300 words max, flag unclear items, attribute action items)
- Meeting-type-specific guidance (different prompts for sales calls vs. engineering standups vs. board meetings)
We tested each iteration on the same 10 transcripts and compared outputs. We brought in team members to review summaries and give feedback.
After 40 iterations, we landed on a prompt that produced summaries the team actually wanted to read. Key insight: the team cared more about clarity and actionability than comprehensiveness. A 3-sentence summary with 5 clear action items beat a 10-sentence summary with 20 vague action items.
Week 3: Validation and Deployment
We built validation gates:
- Structure validation. Check that all required sections are present.
- Action item attribution. Flag unattributed action items.
- Length validation. Flag summaries that are too long or too short.
- Hallucination detection. Search the transcript for key phrases from the summary.
We also built a human review workflow. Every summary was reviewed by the meeting organiser within 24 hours. If they marked it as incorrect, we logged the error and used it to refine the prompt.
We deployed to production with Slack integration. Summaries started appearing in #meetings within 30 minutes of each meeting.
Week 4: Iteration and Optimisation
In the first week of production, we processed 85 meetings. We collected feedback from the team:
- 78 summaries were marked as “good” (used by attendees, acted on)
- 5 summaries were marked as “needs clarification” (some decisions were unclear)
- 2 summaries were marked as “wrong” (hallucinated action items)
We analysed the 7 problematic summaries:
- The 5 “needs clarification” summaries were from complex strategy meetings where decisions were genuinely ambiguous. We added a note to the prompt: “If a decision is ambiguous, flag it as [UNCLEAR] rather than trying to resolve it.”
- The 2 “wrong” summaries had hallucinated action items that didn’t appear in the transcript. We added hallucination detection to the validation pipeline and re-summarised with a stricter prompt.
After these tweaks, the next 50 summaries had a 96% “good” rate. The team started relying on them.
Results
After 8 weeks (4 weeks to build + 4 weeks to refine), the system was processing 80–100 meetings per week. Metrics:
- Time saved: 12–15 hours per week (meeting organisers no longer wrote notes; they trusted Sonnet 4.5)
- Cost: $180/month (100 meetings/week × 2 × $3 per 1M input tokens + output tokens)
- Quality: 96% of summaries marked as “good” by meeting organisers
- Adoption: 85% of team members regularly checked summaries; 60% said they’d changed how they run meetings to make summaries clearer
The system paid for itself in the first month (15 hours saved × $150/hour average salary = $2,250/month value; system cost was $180/month).
Most importantly, the team stopped re-running meetings to clarify decisions. Decisions were captured, distributed, and acted on the same day.
Security, Compliance, and Data Handling
Data Privacy and Transcript Storage
Meeting transcripts contain sensitive information: customer data, financial details, personal information about employees or customers, strategic plans.
When you send a transcript to Sonnet 4.5, you’re sending this data to Anthropic’s API. Anthropic doesn’t use your data to train models (they have a data privacy policy that’s explicit about this), but data does transit the internet and sit briefly on Anthropic servers.
For most teams, this is acceptable. But if you’re in a regulated industry (financial services, healthcare, government), you need to think carefully about data residency and compliance.
Our recommendation:
-
Classify your meetings. Not all meetings need summarisation. Classify them by sensitivity: public, internal, confidential, restricted. Only summarise internal and above; don’t send restricted meetings to external APIs.
-
Redact sensitive data. Before sending a transcript to Sonnet 4.5, redact customer names, financial figures, and other sensitive data. Replace them with placeholders (“[CUSTOMER]”, “[AMOUNT]”). The summary will still be useful, and you’ve reduced privacy risk.
-
Use on-premises models if you need data residency. If you can’t send data off-premises, you can run open-source models locally (e.g., Llama 2, Mistral). They’re cheaper and faster, but lower quality than Sonnet 4.5. We don’t recommend this for most teams, but it’s an option for highly regulated environments.
-
Audit and log everything. Log every transcript sent to Sonnet 4.5, every summary returned, and who accessed it. This is table stakes for compliance.
If you’re in financial services, review PADISO’s AI advisory for financial services, which covers APRA CPS 234 and ASIC RG 271 compliance for AI systems. If you’re in insurance, check PADISO’s insurance AI services, which covers APRA and LIF compliance.
SOC 2 and ISO 27001 Readiness
If you’re building a SaaS product or offering a service to enterprise customers, you’ll eventually need SOC 2 Type II or ISO 27001 certification. Meeting summarisation systems are in scope for these audits.
Key controls:
-
Access control. Who can view summaries? Implement role-based access control (RBAC). Only attendees and team leads can view summaries of their meetings.
-
Audit logging. Log every access to every summary. Who viewed it, when, from where. Use this log to detect unauthorised access.
-
Data retention. How long do you keep transcripts and summaries? Define a retention policy (e.g., 90 days for transcripts, 1 year for summaries) and enforce it.
-
Encryption. Encrypt transcripts and summaries at rest (AES-256) and in transit (TLS 1.2+).
-
Incident response. If a transcript is leaked or a summary is accessed by someone who shouldn’t have seen it, you need a process to detect, contain, and remediate. Document it.
These are standard controls. If you’re already SOC 2 compliant, you probably have most of them. The meeting summarisation system should plug into your existing infrastructure.
NIST AI Risk Management
The NIST AI Risk Management Framework is the gold standard for thinking about AI reliability, transparency, and risk. It’s not a compliance requirement (yet), but it’s worth reading, especially the sections on transparency and human oversight.
Key principles for meeting summarisation:
-
Human oversight. Summaries should be reviewed by a human before being acted on. This is especially important for decisions and action items.
-
Transparency. Users should know that summaries are AI-generated. They should be able to see the original transcript if they want to verify.
-
Failure detection. You should have automated checks to detect when Sonnet 4.5 produces a bad summary (hallucination, missed decisions, etc.). Flag these for human review.
-
Continuous monitoring. Track error rates, user feedback, and edge cases. Use this data to improve the system.
We’ve built all of these into the production system described above. The result is a system that’s reliable, transparent, and trustworthy.
Next Steps and Scaling Your Summarisation Pipeline
Building Your MVP in 4 Weeks
If you’re starting from scratch, here’s a realistic timeline:
Week 1: Architecture and integration
- Choose your transcription service (Zoom/Meet native, Deepgram, AssemblyAI)
- Design your API integration (sync vs. async, rate limiting, retry logic)
- Set up a simple storage layer (PostgreSQL, or even a JSON file to start)
- Write a basic Sonnet 4.5 summarisation script
Week 2: Prompt refinement
- Test your prompt on 20–30 real meeting transcripts
- Iterate based on feedback
- Add meeting-type-specific prompts
- Measure quality (error rate, user satisfaction)
Week 3: Validation and deployment
- Build automated validation gates
- Set up a human review workflow
- Deploy to a staging environment
- Run a pilot with a small group of users
Week 4: Production and iteration
- Deploy to production
- Collect feedback from users
- Fix bugs and refine the prompt
- Document the system
This timeline assumes you have a small team (1–2 engineers) and you’re building a simple system (transcription → summarisation → Slack). If you need more features (search, tagging, analytics), add 2–4 weeks.
Scaling to Enterprise Volume
Once you have a working MVP, scaling is mostly about infrastructure, not AI.
At 100 meetings/week:
- Synchronous API calls are fine
- PostgreSQL is fine for storage
- Slack integration is fine for distribution
- Cost is ~$200/month
At 500 meetings/week:
- Switch to asynchronous processing (job queue, worker pool)
- Add caching to reduce costs
- Implement rate limit handling
- Cost is ~$1,000/month
At 5,000 meetings/week:
- Distribute processing across multiple workers
- Implement cost optimisation (cheaper models for routine meetings, Sonnet 4.5 for complex ones)
- Add search and analytics (Elasticsearch, data warehouse)
- Cost is ~$10,000/month
At each scale, the challenge is operational, not technical. You need monitoring (is the system working?), alerting (is something broken?), and runbooks (how do we fix it?).
If you’re scaling beyond 1,000 meetings/week, consider working with a team that’s built this before. PADISO’s platform engineering services can help you design and build a production-grade system. We’ve done this for financial services, insurance, and other regulated industries.
Extending Beyond Summarisation
Once you have meeting summarisation working, the next steps are natural:
-
Action item tracking. Extract action items from summaries, assign them to people, track completion. This is a small extension that adds huge value.
-
Decision logging. Build a searchable database of decisions made in meetings. Over time, this becomes an institutional memory.
-
Meeting analytics. How many meetings per week? Who attends the most meetings? Which meetings generate the most action items? Use this data to improve meeting culture.
-
Integration with project management. Automatically create Jira tickets or Asana tasks from action items. Close the loop between meetings and execution.
-
Cross-meeting synthesis. If a topic is discussed in multiple meetings, synthesise the discussion into a single view. This helps with continuity.
Each of these is a small project (1–2 weeks of engineering). Together, they transform meeting summarisation from a nice-to-have into a core part of how your organisation runs.
When to Upgrade Your Prompt (or Model)
As you scale, you’ll want to improve quality. Here’s when to upgrade:
Upgrade your prompt when:
- Error rate is >10% (more than 1 in 10 summaries have hallucinations or missed decisions)
- Users are consistently marking summaries as “needs clarification”
- You see patterns in errors (e.g., action items in sales calls are consistently wrong)
Upgrade your model when:
- Sonnet 4.5 is hitting rate limits or taking too long
- Cost is a constraint (switch to Haiku for routine meetings)
- Quality is still poor after prompt optimisation (try Claude 3 Opus, the top-tier model, for a subset of meetings)
We’ve found that 80% of quality improvements come from prompt refinement, not model upgrades. Invest in prompt iteration first.
Measuring Success
Define metrics upfront:
- Quality: Error rate, user satisfaction, re-summarisation rate (how often do you have to re-run a meeting because the summary was wrong?)
- Adoption: Percentage of team using summaries, percentage of meetings summarised
- Impact: Time saved, decisions made faster, action items completed on time
- Cost: Cost per meeting, total monthly cost
Measure these weekly. Use them to guide improvements. If quality is dropping, invest in prompt refinement. If adoption is low, invest in distribution (better Slack integration, better search). If cost is high, invest in optimisation (caching, cheaper models).
Over 12 months, you should see:
- Quality: 95%+ summaries marked as good
- Adoption: 80%+ of team using summaries
- Impact: 10–15 hours/week saved, decisions made 2–3x faster
- Cost: $200–500/month (depending on volume)
Conclusion: From Demo to Production
Sonnet 4.5 is a powerful tool for meeting summarisation. But the gap between a working demo and a production system is real. It spans prompt design, output validation, cost optimisation, security, and operational maturity.
This guide covers the patterns we’ve battle-tested across 15+ client deployments. The core insight is simple: focus on clarity and actionability, not comprehensiveness. A 300-word summary with 5 clear action items beats a 1,500-word summary with 50 vague action items.
Start with a simple architecture (transcription → summarisation → Slack). Iterate on the prompt based on real feedback. Add validation gates to catch errors. Measure quality, adoption, and impact. Scale incrementally.
If you’re building this for the first time, expect 4 weeks to MVP and 12 weeks to production maturity. If you’re scaling to enterprise volume (1,000+ meetings/week), budget for infrastructure work and operational tooling.
For teams in Australia, we’ve built this at PADISO. We can help with AI strategy and readiness, platform engineering, and security and compliance. We’ve shipped meeting summarisation systems for financial services, insurance, and tech teams. If you want to move fast and avoid the pitfalls, book a 30-minute call.
For teams globally, the patterns in this guide apply. The infrastructure is cloud-agnostic (AWS, GCP, Azure). The prompt design and validation logic are model-agnostic (works with Sonnet 4.5, Haiku, GPT-4, Gemini). The principles are timeless.
Start small. Ship fast. Measure impact. Iterate. That’s how you turn a prototype into a system your team trusts.