Cache Hit Rate Telemetry: What to Watch in PostHog
Master cache hit rate telemetry in PostHog. Learn the four metrics that matter: hit rate, write rate, age, and cost-per-call with actionable dashboards.
If you’re running Claude workloads at scale, you’re almost certainly caching prompts, embeddings, or API responses. And if you’re caching, you need to know whether that cache is actually working.
Most teams measure cache success with a single metric: hit rate. That’s a start. But hit rate alone tells you almost nothing about whether your cache is delivering real value—or burning money for no reason.
This guide covers the four cache metrics that actually matter for production Claude workloads: hit rate, write rate, age, and effective cost-per-call. We’ll show you how to instrument each one in PostHog, build a working dashboard, and interpret what the numbers mean in terms of revenue, latency, and cost.
By the end, you’ll have a concrete dashboard JSON you can drop into your PostHog instance and start tracking cache performance like an operator, not a consultant.
Table of Contents
- Why Standard Cache Hit Rate Metrics Miss the Mark
- The Four Cache Metrics That Actually Matter
- Instrumenting Cache Hit Rate in PostHog
- Building Your Cache Telemetry Dashboard
- Interpreting Cache Performance Data
- Cost Impact: How Cache Performance Affects Your Bottom Line
- Real-World Cache Degradation Scenarios
- Optimising Cache Strategy for Claude Workloads
- Monitoring and Alerting Best Practices
- Next Steps: From Telemetry to Action
Why Standard Cache Hit Rate Metrics Miss the Mark
Most teams track cache hit rate as a percentage: “We’re hitting 85% of the time, so the cache is working.” That’s dangerously incomplete.
A 95% hit rate sounds excellent. But if you’re only writing to cache once per hour, and each cache entry lives for 12 hours, you’re not actually validating whether the cache is solving a real problem. You’re measuring a vanity metric.
Worse: a high hit rate can mask catastrophic cost problems. If you cache expensive API calls but the cache entries are stale (age > TTL), you’re still paying for fresh calls while reporting inflated hit rates. If your write rate is too low, you’re caching too little. If your write rate is too high, you’re thrashing the cache and evicting useful entries.
The industry standard—what Datadog and Cloudflare recommend—is to track hit rate alongside write rate, eviction rate, and memory pressure. But for Claude production workloads, you need one more dimension: cost-per-call, which ties cache performance directly to your API spend.
PostHog is built for this. Unlike generic observability tools, PostHog lets you capture business metrics (cost, latency, token count) alongside infrastructure metrics (hit rate, memory). That’s where real insight lives.
The Four Cache Metrics That Actually Matter
1. Cache Hit Rate (Percentage)
Definition: The percentage of cache lookups that return a valid, unexpired entry.
Formula: (Total Hits / Total Requests) × 100
Why it matters: Hit rate tells you whether your cache is being used at all. A 0% hit rate means your cache is dead—either misconfigured, not being written to, or entries are expiring too fast.
What to watch for:
- Rates below 50% suggest either insufficient cache population or TTLs that are too short.
- Rates above 80% are healthy for most Claude workloads.
- Sudden drops (from 85% to 60% overnight) signal either cache eviction, TTL misconfiguration, or a change in traffic patterns.
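The formula and the sudden-drop check are simple enough to sketch in a few lines of Python (function names and the 20-point threshold are illustrative):

```python
def hit_rate(hits: int, misses: int) -> float:
    """Cache hit rate as a percentage: (hits / total requests) * 100."""
    total = hits + misses
    if total == 0:
        return 0.0  # no lookups yet: report 0% rather than divide by zero
    return hits * 100 / total

def sudden_drop(previous_pct: float, current_pct: float,
                threshold_pts: float = 20.0) -> bool:
    """Flag a fall of more than `threshold_pts` percentage points (e.g. 85% -> 60%)."""
    return previous_pct - current_pct > threshold_pts

print(hit_rate(850, 150))       # 85.0
print(sudden_drop(85.0, 60.0))  # True: a 25-point overnight drop
```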
Sydney operators: cache efficiency belongs among your core operational KPIs. Teams at PADISO clients typically target 75–90% hit rates for production Claude agents.
2. Write Rate (Entries Per Minute)
Definition: The number of new cache entries written per unit time.
Why it matters: Write rate tells you whether you’re actually populating the cache. A high write rate with a low hit rate means you’re writing entries that immediately become stale or irrelevant. A low write rate with a high hit rate means you’re caching the same few prompts over and over—which might be fine, but it’s a sign that your cache strategy is narrow.
What to watch for:
- Write rate should be proportional to your traffic volume. If traffic doubles but write rate stays flat, something is wrong.
- A sudden drop in write rate often precedes a hit rate collapse (the cache is emptying).
- Compare write rate to cache eviction rate. If evictions exceed writes, you’re losing useful entries.
Healthy ratio: For Claude workloads, aim for a write-to-eviction ratio of at least 2:1. If you’re writing 100 entries per minute and evicting 60, your ratio is only about 1.7:1 and you’re evicting 60% of what you write, which is not sustainable.
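As a sketch, the ratio check looks like this (the function name is illustrative):

```python
def write_to_eviction_ratio(writes_per_min: float, evictions_per_min: float) -> float:
    """Writes divided by evictions; for Claude workloads aim for at least 2:1."""
    if evictions_per_min == 0:
        return float("inf")  # nothing evicted: the cache is only growing
    return writes_per_min / evictions_per_min

ratio = write_to_eviction_ratio(100, 60)
print(round(ratio, 2))   # 1.67: below the 2:1 target, so this cache is losing ground
print(ratio >= 2.0)      # False
```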
3. Cache Entry Age (Median, P95, P99)
Definition: How long a cache entry has been in memory since it was written.
Why it matters: Age tells you whether your cache entries are fresh or stale. A high median age with a low TTL is a red flag: entries are getting old and will soon expire, and you’ll start seeing hit rate drops.
Formula: Current Time - Entry Write Time
What to watch for:
- Median age should be 30–60% of your TTL. If your TTL is 1 hour and median age is 50 minutes, you’re about to hit a cliff.
- P95 and P99 ages tell you about tail behaviour. If P99 age is at or near your TTL, you’re about to evict a lot of useful entries.
- Sudden drops in median age (from 30 minutes to 5 minutes) suggest either a cache flush or a change in traffic patterns.
Practical example: You cache Claude responses with a 2-hour TTL. If median age is 1 hour 50 minutes and you’re getting a 90% hit rate, that hit rate is misleading: you’re about to lose almost everything in the next 10 minutes.
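This cliff check can be sketched with the standard library (the function name, the 90% threshold, and the sample ages are illustrative):

```python
from statistics import median, quantiles

def age_health(ages_seconds: list, ttl_seconds: float) -> dict:
    """Summarise entry age relative to TTL. A healthy median sits at 30-60% of TTL;
    a median above ~90% means most of the cache is about to expire at once."""
    cuts = quantiles(ages_seconds, n=100)  # 99 percentile cut points
    med = median(ages_seconds)
    return {
        "median_pct_of_ttl": med / ttl_seconds * 100,
        "p95_age_seconds": cuts[94],
        "p99_age_seconds": cuts[98],
        "near_cliff": med / ttl_seconds > 0.9,
    }

# 2-hour TTL, median age around 1 h 50 min: the 90% hit rate looks fine,
# but almost everything expires within the next 10 minutes.
report = age_health([6600, 6500, 6700, 6800, 6400], ttl_seconds=7200)
print(report["near_cliff"])  # True
```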
4. Effective Cost-Per-Call
Definition: The average cost of an API call, accounting for cache hits (which avoid the LLM API charge but may incur a small retrieval cost) and misses (which incur the full API cost).
Formula:
(Total API Cost + Cache Retrieval Cost) / Total Calls
Why it matters: This is the metric that connects cache performance to your bottom line. It’s the only metric that matters to finance and to your burn rate.
What to watch for:
- Effective cost-per-call should drop as hit rate increases. If hit rate goes up but cost-per-call stays flat, your cache isn’t actually saving money (investigate why).
- Compare effective cost-per-call to your baseline (cost without caching). A well-tuned cache should reduce cost-per-call by 40–70%.
- Track this by workload type: summarisation workloads, code generation, retrieval-augmented generation (RAG). Different workloads have different cache value.
Real numbers: A Sydney fintech client of PADISO cached transaction summaries for Claude analysis. Hit rate: 82%. Baseline cost-per-call: $0.12. With cache: $0.036. Savings: 70%. Over 100,000 calls per month, that’s $8,400 in monthly savings—or $100,800 per year.
Instrumenting Cache Hit Rate in PostHog
PostHog’s event capture system is flexible enough to handle cache telemetry. You don’t need a separate observability tool.
Step 1: Define Your Cache Events
Create four event types in PostHog:
- cache_hit: Fired when a cache lookup succeeds.
- cache_miss: Fired when a cache lookup fails.
- cache_write: Fired when a new entry is written to cache.
- cache_eviction: Fired when an entry is removed (either due to TTL expiry or memory pressure).
Each event should include:
{
"event": "cache_hit",
"properties": {
"cache_key_hash": "abc123...",
"entry_age_seconds": 1245,
"ttl_seconds": 7200,
"api_cost_saved": 0.08,
"workload_type": "summarisation",
"model": "claude-3-5-sonnet",
"region": "us-east-1",
"timestamp": 1707234567
}
}
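As a sketch, the payload above might be built in application code before being handed to a PostHog client such as posthog-python’s capture(). The helper name and the key-hashing scheme are illustrative:

```python
import hashlib
import time

def cache_hit_event(cache_key: str, ttl_seconds: int, entry_written_at: float,
                    api_cost_saved: float, workload_type: str,
                    model: str, region: str) -> dict:
    """Build a cache_hit payload matching the schema above; send the result
    with whatever PostHog client your stack uses."""
    now = time.time()
    return {
        "event": "cache_hit",
        "properties": {
            # hash the key so raw prompts never leave your infrastructure
            "cache_key_hash": hashlib.sha256(cache_key.encode()).hexdigest()[:12],
            "entry_age_seconds": int(now - entry_written_at),
            "ttl_seconds": ttl_seconds,
            "api_cost_saved": api_cost_saved,
            "workload_type": workload_type,
            "model": model,
            "region": region,
            "timestamp": int(now),
        },
    }

event = cache_hit_event("summarise Q3 transactions", 7200, time.time() - 1245,
                        0.08, "summarisation", "claude-3-5-sonnet", "us-east-1")
print(event["properties"]["entry_age_seconds"])  # ~1245
```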
Step 2: Capture Cache Writes
When you write to cache, fire a cache_write event:
{
"event": "cache_write",
"properties": {
"cache_key_hash": "abc123...",
"entry_size_bytes": 4096,
"ttl_seconds": 7200,
"api_cost_incurred": 0.08,
"workload_type": "summarisation",
"model": "claude-3-5-sonnet",
"input_tokens": 1000,
"output_tokens": 250
}
}
Step 3: Capture Cache Evictions
When the cache evicts an entry (either due to TTL or memory pressure), fire a cache_eviction event:
{
"event": "cache_eviction",
"properties": {
"cache_key_hash": "abc123...",
"eviction_reason": "ttl_expired",
"entry_age_seconds": 7199,
"ttl_seconds": 7200,
"times_hit": 12,
"workload_type": "summarisation"
}
}
PostHog will automatically timestamp each event and allow you to query across them.
Step 4: Instrument Cost Tracking
For each cache event, include the API cost. This is critical:
- On a cache_hit: api_cost_saved = the cost of the API call you avoided.
- On a cache_miss: api_cost_incurred = the cost of the API call you made.
- On a cache_write: api_cost_incurred = the cost of the call that populated the cache.
PostHog’s event properties allow arbitrary JSON, so you can include:
{
"event": "cache_hit",
"properties": {
"cache_key_hash": "abc123...",
"entry_age_seconds": 1245,
"ttl_seconds": 7200,
"api_cost_saved": 0.08,
"api_cost_baseline": 0.08,
"workload_type": "summarisation",
"model": "claude-3-5-sonnet",
"input_tokens": 1000,
"output_tokens": 250,
"cache_retrieval_cost": 0.001,
"net_savings": 0.079
}
}
This level of granularity lets you slice and dice cache performance by workload, model, region, and more.
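One useful slice is net savings over an event stream; a minimal sketch, assuming events shaped like the payloads above (the helper name is illustrative):

```python
def net_savings(events: list) -> float:
    """Total net savings across cache_hit events: API cost avoided minus retrieval cost."""
    total = 0.0
    for e in events:
        if e.get("event") != "cache_hit":
            continue
        p = e["properties"]
        total += p.get("api_cost_saved", 0.0) - p.get("cache_retrieval_cost", 0.0)
    return round(total, 4)

events = [
    {"event": "cache_hit",
     "properties": {"api_cost_saved": 0.08, "cache_retrieval_cost": 0.001}},
    {"event": "cache_miss", "properties": {"api_cost_incurred": 0.08}},
    {"event": "cache_hit",
     "properties": {"api_cost_saved": 0.08, "cache_retrieval_cost": 0.001}},
]
print(net_savings(events))  # 0.158
```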
Building Your Cache Telemetry Dashboard
PostHog’s dashboard builder lets you visualise cache metrics in real time. Here’s a working dashboard JSON you can import directly:
{
"name": "Cache Hit Rate Telemetry",
"description": "Real-time cache performance for Claude workloads",
"tiles": [
{
"id": "cache_hit_rate",
"name": "Cache Hit Rate (%)",
"type": "number",
"query": {
"kind": "EventsQuery",
"select": [
"sum(if(event == 'cache_hit', 1, 0)) / count() * 100"
],
"where": [
"event IN ('cache_hit', 'cache_miss')",
"timestamp > now() - interval 1 hour"
]
},
"layout": { "x": 0, "y": 0, "w": 6, "h": 4 }
},
{
"id": "write_rate",
"name": "Cache Write Rate (per minute)",
"type": "number",
"query": {
"kind": "EventsQuery",
"select": [
"count() / dateDiff('minute', min(timestamp), max(timestamp))"
],
"where": [
"event == 'cache_write'",
"timestamp > now() - interval 1 hour"
]
},
"layout": { "x": 6, "y": 0, "w": 6, "h": 4 }
},
{
"id": "median_entry_age",
"name": "Median Cache Entry Age (seconds)",
"type": "number",
"query": {
"kind": "EventsQuery",
"select": [
"percentile(entry_age_seconds, 0.5)"
],
"where": [
"event == 'cache_hit'",
"timestamp > now() - interval 1 hour"
]
},
"layout": { "x": 0, "y": 4, "w": 6, "h": 4 }
},
{
"id": "effective_cost_per_call",
"name": "Effective Cost Per Call ($)",
"type": "number",
"query": {
"kind": "EventsQuery",
"select": [
"(sum(if(event == 'cache_write', api_cost_incurred, 0)) + sum(if(event == 'cache_hit', cache_retrieval_cost, 0))) / sum(if(event IN ('cache_hit', 'cache_miss'), 1, 0))"
],
"where": [
"event IN ('cache_hit', 'cache_miss', 'cache_write')",
"timestamp > now() - interval 1 hour"
]
},
"layout": { "x": 6, "y": 4, "w": 6, "h": 4 }
},
{
"id": "hit_rate_trend",
"name": "Hit Rate Trend (24h)",
"type": "line_chart",
"query": {
"kind": "EventsQuery",
"select": [
"sum(if(event == 'cache_hit', 1, 0)) / count() * 100 as hit_rate"
],
"where": [
"event IN ('cache_hit', 'cache_miss')",
"timestamp > now() - interval 24 hours"
],
"groupBy": [
"date_trunc('hour', timestamp)"
],
"orderBy": [
"date_trunc('hour', timestamp) ASC"
]
},
"layout": { "x": 0, "y": 8, "w": 12, "h": 4 }
},
{
"id": "cost_savings_trend",
"name": "Daily Cost Savings ($)",
"type": "bar_chart",
"query": {
"kind": "EventsQuery",
"select": [
"sum(if(event == 'cache_hit', api_cost_saved, 0)) as daily_savings"
],
"where": [
"event == 'cache_hit'",
"timestamp > now() - interval 30 days"
],
"groupBy": [
"date_trunc('day', timestamp)"
],
"orderBy": [
"date_trunc('day', timestamp) ASC"
]
},
"layout": { "x": 0, "y": 12, "w": 12, "h": 4 }
},
{
"id": "hit_rate_by_workload",
"name": "Hit Rate by Workload Type",
"type": "bar_chart",
"query": {
"kind": "EventsQuery",
"select": [
"sum(if(event == 'cache_hit', 1, 0)) / count() * 100 as hit_rate"
],
"where": [
"event IN ('cache_hit', 'cache_miss')",
"timestamp > now() - interval 24 hours"
],
"groupBy": [
"workload_type"
]
},
"layout": { "x": 0, "y": 16, "w": 6, "h": 4 }
},
{
"id": "eviction_vs_write",
"name": "Write vs Eviction Rate (1h)",
"type": "line_chart",
"query": {
"kind": "EventsQuery",
"select": [
"sum(if(event == 'cache_write', 1, 0)) as writes",
"sum(if(event == 'cache_eviction', 1, 0)) as evictions"
],
"where": [
"event IN ('cache_write', 'cache_eviction')",
"timestamp > now() - interval 24 hours"
],
"groupBy": [
"date_trunc('hour', timestamp)"
]
},
"layout": { "x": 6, "y": 16, "w": 6, "h": 4 }
}
]
}
Import this into PostHog by navigating to Dashboards > Create Dashboard > Import JSON and pasting the above.
Each tile queries your cache events and displays real-time metrics. The dashboard updates as new events stream in.
Interpreting Cache Performance Data
Now that you have the dashboard, what do the numbers actually mean?
Healthy Cache Signals
- Hit rate 75–90%: Your cache is working. You’re serving roughly 8 or 9 out of every 10 calls from cache.
- Write rate proportional to traffic: If traffic is steady, write rate should be steady. Spikes in write rate often precede hit rate improvements (new cache population).
- Median entry age 30–60% of TTL: Entries are fresh but not brand new. You’re getting good reuse without excessive churn.
- Cost-per-call 40–70% lower than baseline: Your cache is delivering real financial value. For a $0.10 baseline, expect $0.03–0.06 with a good cache.
- Eviction rate < write rate: You’re losing fewer entries than you’re creating. Healthy ratio is 0.3–0.5 (evict 30–50% of what you write).
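The signals above can be rolled into a single health check; a sketch, with thresholds taken from the bands listed (the function name is illustrative):

```python
def cache_health(hit_rate_pct: float, median_age_pct_of_ttl: float,
                 eviction_to_write_ratio: float) -> list:
    """Evaluate the healthy-cache signals above; returns a list of warnings."""
    warnings = []
    if hit_rate_pct < 75:
        warnings.append("hit rate below 75%")
    if median_age_pct_of_ttl > 60:
        warnings.append("median age above 60% of TTL: expiry cliff approaching")
    elif median_age_pct_of_ttl < 30:
        warnings.append("median age below 30% of TTL: heavy churn")
    if eviction_to_write_ratio > 0.5:
        warnings.append("evicting more than half of what you write")
    return warnings

print(cache_health(82, 45, 0.4))  # []: all signals healthy
print(cache_health(62, 92, 0.8))  # three warnings
```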
Warning Signs
- Hit rate drops from 85% to 60% overnight: Something broke. Check for cache flushes, TTL misconfigurations, or traffic pattern changes.
- Write rate drops but hit rate stays high: False positive. You’re not populating new cache entries, so hit rate will collapse in a few hours when old entries expire.
- Median entry age approaching TTL: You’re about to lose everything. Increase the TTL or stagger writes so entries don’t all expire at once.
- Cost-per-call not decreasing with hit rate: Cache isn’t saving money. Investigate: are cache retrieval costs too high? Is the cache missing on expensive calls?
- Eviction rate > write rate: You’re losing entries faster than you create them. Cache is thrashing. Reduce TTL or increase cache size.
Workload-Specific Interpretation
Different workloads have different cache profiles. Use the “Hit Rate by Workload Type” tile to compare:
- Summarisation: Typically 80–95% hit rate (same documents summarised repeatedly).
- Code generation: Typically 50–70% hit rate (prompts vary more).
- RAG (retrieval-augmented generation): Typically 60–80% hit rate (depends on query similarity).
- Classification: Typically 75–90% hit rate (same categories, similar inputs).
If summarisation is at 50%, something is wrong. If code generation is at 95%, you’re either caching too aggressively or your workload is more repetitive than you thought.
Cost Impact: How Cache Performance Affects Your Bottom Line
Cache telemetry only matters if it connects to money. Let’s do the math.
Baseline Cost Calculation
Assume you’re using Claude 3.5 Sonnet via the API:
- Input cost: $3 per 1M tokens
- Output cost: $15 per 1M tokens
- Average input per call: 1,000 tokens
- Average output per call: 250 tokens
Cost per call = (1,000 × $3 / 1M) + (250 × $15 / 1M) = $0.003 + $0.00375 = $0.00675 ≈ $0.007 per call
For 100,000 calls per month: $700 baseline cost.
Cache Impact
With an 80% hit rate:
- 80,000 calls hit cache (API cost: $0; retrieval cost applies)
- 20,000 calls miss cache (cost: $0.007 each = $140)
- Cache retrieval cost: ~$0.001 per hit (varies by backend)
- Total cache retrieval cost: 80,000 × $0.001 = $80
Total cost with cache: $140 + $80 = $220
Savings: $700 − $220 = $480 per month, or $5,760 per year.
That’s a 69% reduction in API cost. The savings scale linearly: at 1 million calls per month, you’re saving $4,800 per month, or $57,600 per year.
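The arithmetic above can be reproduced in a few lines (the function name is illustrative):

```python
def cache_cost_impact(total_calls: int, hit_rate: float,
                      cost_per_call: float, retrieval_cost_per_hit: float) -> dict:
    """Misses pay the full API price; hits pay only the cache retrieval cost."""
    hits = int(total_calls * hit_rate)
    misses = total_calls - hits
    baseline = total_calls * cost_per_call
    with_cache = misses * cost_per_call + hits * retrieval_cost_per_hit
    return {
        "baseline": round(baseline, 2),
        "with_cache": round(with_cache, 2),
        "savings": round(baseline - with_cache, 2),
        "reduction_pct": round((baseline - with_cache) / baseline * 100, 1),
    }

impact = cache_cost_impact(100_000, 0.80, 0.007, 0.001)
print(impact)  # {'baseline': 700.0, 'with_cache': 220.0, 'savings': 480.0, 'reduction_pct': 68.6}
```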
Real-World Example: Sydney Fintech Client
A PADISO client in Sydney was running transaction analysis with Claude. They processed 500,000 API calls per month. Baseline cost: $3,500.
They implemented prompt caching with a 2-hour TTL. Hit rate: 82%. Effective cost-per-call dropped from $0.007 to $0.0021.
New monthly cost: 500,000 × $0.0021 = $1,050
Savings: $3,500 − $1,050 = $2,450 per month, or $29,400 per year.
They also saw roughly a 3x reduction in average latency (cache hits return in <100ms vs. 2–3s for API calls), which improved user experience and reduced infrastructure load.
This is why understanding AI agency metrics matters for Sydney operators. Cache performance isn’t just about operational efficiency—it’s about unit economics.
Real-World Cache Degradation Scenarios
Understanding what goes wrong helps you monitor for it.
Scenario 1: Memory Pressure and Eviction
You’re running Claude workloads at 10,000 requests per hour. Cache is set to 10GB. Hit rate is steady at 85%.
Then, traffic spikes to 15,000 requests per hour. Write rate increases by 50%. Within 2 hours, memory usage hits the limit. The cache starts evicting entries to make room.
What you see in PostHog:
- Hit rate drops from 85% to 65% (evictions are removing useful entries).
- Eviction rate spikes above write rate (you’re losing more than you create).
- Median entry age drops (old entries are being evicted before they expire naturally).
- Cost-per-call increases (more misses = more API calls).
Fix: Increase cache size, reduce TTL (evict old entries before memory pressure hits), or implement a smarter eviction policy (LRU instead of FIFO).
Scenario 2: TTL Misconfiguration
You set cache TTL to 30 minutes. Your workload has a 90-minute cycle (same prompts repeat every 90 minutes). Hit rate looks good initially (85%), but then crashes.
What you see:
- Median entry age peaks at 29 minutes, then resets to 0 (entries expire).
- Hit rate drops to 0% for 60 minutes (cache is empty), then recovers to 85% as new entries populate.
- This pattern repeats every 90 minutes.
Fix: Increase TTL to 2 hours or longer. Monitor entry age relative to TTL to catch this pattern early.
Scenario 3: Cache Bypass in Production
You deploy a code change that accidentally bypasses the cache for certain request types. Hit rate was 85%, now it’s 60%. You don’t know why.
What you see:
- Hit rate drops suddenly (not gradually).
- Write rate stays the same or increases (you’re writing to cache but not reading from it).
- Cost-per-call spikes (more API calls).
- Eviction rate drops (fewer cache reads = less cache thrashing).
Fix: Check your deployment logs. Look for code changes that affect cache lookup logic. Add a cache bypass metric to your instrumentation so you can catch this faster next time.
Scenario 4: Feature Flags and Cache Degradation
PostHog’s own infrastructure team documented a cache degradation incident where feature flags cache workers experienced memory pressure and degraded cache update reliability. Their hit rate looked fine, but cache updates were silently failing.
Lesson: Track cache write success rate separately from write rate. A high write rate with a low write success rate means your cache is being updated, but the updates aren’t sticking.
Add a cache_write_success metric:
{
"event": "cache_write",
"properties": {
"cache_key_hash": "abc123...",
"write_success": true,
"write_latency_ms": 12,
"error_message": null
}
}
Then track write success rate in PostHog:
sum(if(write_success == true, 1, 0)) / count() * 100
If write success rate drops below 95%, you have a problem.
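The same check in application code, as a sketch over cache_write events shaped like the payload above (the helper name is illustrative):

```python
def write_success_rate(write_events: list) -> float:
    """Percentage of cache_write events whose write actually stuck."""
    if not write_events:
        return 100.0  # no writes to judge
    ok = sum(1 for e in write_events if e["properties"].get("write_success"))
    return ok * 100 / len(write_events)

writes = [{"event": "cache_write", "properties": {"write_success": s}}
          for s in [True] * 93 + [False] * 7]
rate = write_success_rate(writes)
print(rate, rate < 95)  # 93.0 True: below the 95% threshold, investigate
```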
Optimising Cache Strategy for Claude Workloads
Now that you’re measuring cache performance, how do you improve it?
Strategy 1: Segment by Workload
Different workloads need different cache strategies. Use the “Hit Rate by Workload Type” dashboard tile to identify which workloads have low hit rates.
- Summarisation (target: 85%+): Cache aggressively. TTL = 4–24 hours. Prompts are stable.
- Code generation (target: 65%+): Cache moderately. TTL = 1–2 hours. Prompts vary by context.
- RAG (target: 70%+): Cache by query embedding similarity, not exact match. TTL = 2–4 hours.
- Classification (target: 80%+): Cache by input hash. TTL = 2–8 hours.
For each workload, measure hit rate separately and set TTL based on your traffic cycle. If summarisation requests repeat every 2 hours, set TTL to 4 hours (2x the cycle). If code generation requests repeat every 30 minutes, set TTL to 1 hour.
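The TTL-from-cycle rule is trivial but worth encoding so it’s applied consistently (the function name and the 2x default are from the rule above):

```python
def ttl_for_cycle(cycle_minutes: float, multiplier: float = 2.0) -> float:
    """TTL as a multiple of the observed repeat cycle (2x, per the rule above)."""
    return cycle_minutes * multiplier

print(ttl_for_cycle(120))  # 240.0: summarisation repeating every 2 h gets a 4-hour TTL
print(ttl_for_cycle(30))   # 60.0: code generation repeating every 30 min gets a 1-hour TTL
```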
Strategy 2: Implement Tiered Caching
Use multiple cache layers with different TTLs:
- L1 Cache (in-process, 5 minutes): Extremely fast, but limited size. Cache the last 100 requests.
- L2 Cache (Redis, 2 hours): Larger, slightly slower. Cache frequently used prompts.
- L3 Cache (persistent, 7 days): Largest, slowest. Archive cache for trend analysis.
PostHog tracks all layers. Monitor hit rate at each layer:
{
"event": "cache_hit",
"properties": {
"cache_layer": "L2",
"cache_key_hash": "abc123...",
"hit_latency_ms": 15
}
}
L1 hit rate should be 30–50% (small, fast). L2 hit rate should be 70–85% (larger). L3 hit rate should be 90%+ (archive).
Strategy 3: Dynamic TTL Based on Entry Age
Instead of a fixed TTL, set TTL dynamically based on how old entries are:
- If median entry age is under ~10% of TTL: entries are young; keep the full TTL (e.g. 2 hours).
- If median entry age is 50–70% of TTL: the population is ageing; tighten the TTL (e.g. to 1 hour) so expiry is staggered.
- If median entry age is over ~90% of TTL: entries are about to expire together; drop to a short TTL (e.g. 30 minutes) while the cache repopulates.
This prevents the “cliff” effect where all entries expire at once.
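A sketch of that policy, with thresholds and TTL values that are illustrative rather than prescriptive:

```python
def dynamic_ttl(median_age_seconds: float, current_ttl_seconds: float) -> int:
    """Pick the next TTL (in seconds) from the median-age bands above."""
    ratio = median_age_seconds / current_ttl_seconds
    if ratio > 0.9:
        return 30 * 60        # about to expire together: short TTL while repopulating
    if ratio >= 0.5:
        return 60 * 60        # ageing population: tighten to stagger expiry
    return 2 * 60 * 60        # young entries: keep the full 2-hour TTL

print(dynamic_ttl(600, 7200))   # 7200: young cache, keep 2 hours
print(dynamic_ttl(4000, 7200))  # 3600: ageing, tighten to 1 hour
print(dynamic_ttl(7000, 7200))  # 1800: expiry cliff, drop to 30 minutes
```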
Strategy 4: Cost-Aware Caching
Not all API calls are worth caching. A cheap classification call ($0.001) isn’t worth the cache infrastructure cost if the hit rate is only 50%. An expensive code generation call ($0.05) is worth caching even at 40% hit rate.
Add a cache_worthiness metric:
cache_worthiness = api_cost_per_call × hit_rate × calls_per_month
If cache_worthiness < $10/month, don’t cache. If cache_worthiness > $100/month, cache aggressively.
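The formula and thresholds above as code (the function name and the grey-zone wording are illustrative):

```python
def cache_worthiness(api_cost_per_call: float, hit_rate: float,
                     calls_per_month: int) -> tuple:
    """Monthly dollar value of caching a workload, bucketed per the thresholds above."""
    worthiness = api_cost_per_call * hit_rate * calls_per_month
    if worthiness < 10:
        verdict = "don't cache"
    elif worthiness > 100:
        verdict = "cache aggressively"
    else:
        verdict = "grey zone: judge case by case"
    return round(worthiness, 2), verdict

print(cache_worthiness(0.001, 0.5, 10_000))  # cheap classification: $5/month, don't cache
print(cache_worthiness(0.05, 0.4, 100_000))  # pricey codegen: $2000/month, cache aggressively
```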
For Claude workloads, this usually means:
- Cache summarisation (high cost, high volume, high hit rate): worthiness $500+/month.
- Cache RAG (medium cost, high volume, medium hit rate): worthiness $200+/month.
- Classification (low cost, high volume, high hit rate): worthiness around $50/month sits in the grey zone between the two thresholds; usually not worth the infrastructure.
Monitoring and Alerting Best Practices
Cache metrics are only useful if you notice problems before they cascade.
Alert 1: Hit Rate Drop
Trigger: Hit rate drops below 70% or drops by >20% in 1 hour.
Action: Page the on-call engineer. Likely causes: memory pressure, TTL misconfiguration, code change, traffic pattern change.
Alert 2: Write Rate Collapse
Trigger: Write rate drops to 0 for >10 minutes.
Action: Critical. Cache is not being populated. Check cache client logs for errors.
Alert 3: Eviction Rate Exceeds Write Rate
Trigger: Eviction rate > 1.5× write rate for >30 minutes.
Action: Memory pressure. Increase cache size or reduce TTL.
Alert 4: Cost-Per-Call Increase
Trigger: Effective cost-per-call increases by >20% compared to 7-day average.
Action: Hit rate is degrading or cache retrieval cost is increasing. Investigate.
Alert 5: Write Success Rate Drop
Trigger: Write success rate < 95%.
Action: Cache writes are failing. Check for network issues, permission errors, or memory pressure.
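The five triggers can be evaluated together from whatever pipeline exposes your current metrics; a sketch, with dictionary keys that are illustrative:

```python
def evaluate_alerts(m: dict) -> list:
    """Check current metrics against the five triggers above; returns fired alerts."""
    fired = []
    if m["hit_rate"] < 70 or m["hit_rate_1h_ago"] - m["hit_rate"] > 20:
        fired.append("hit_rate_drop")
    if m["write_rate"] == 0 and m["minutes_write_rate_zero"] > 10:
        fired.append("write_rate_collapse")
    if m["eviction_rate"] > 1.5 * m["write_rate"] and m["minutes_evictions_high"] > 30:
        fired.append("eviction_exceeds_write")
    if m["cost_per_call"] > 1.2 * m["cost_per_call_7d_avg"]:
        fired.append("cost_per_call_increase")
    if m["write_success_rate"] < 95:
        fired.append("write_success_drop")
    return fired

print(evaluate_alerts({
    "hit_rate": 62, "hit_rate_1h_ago": 85,
    "write_rate": 100, "minutes_write_rate_zero": 0,
    "eviction_rate": 40, "minutes_evictions_high": 0,
    "cost_per_call": 0.0045, "cost_per_call_7d_avg": 0.0035,
    "write_success_rate": 98,
}))  # ['hit_rate_drop', 'cost_per_call_increase']
```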
In PostHog, set these up under Alerts > Create Alert. Use the queries from the dashboard JSON above.
Next Steps: From Telemetry to Action
You now have four cache metrics, a PostHog dashboard, and a set of alerts. Here’s how to operationalise this.
Week 1: Baseline
Run the dashboard for a full week. Document:
- Baseline hit rate, write rate, entry age, cost-per-call.
- Workload-specific hit rates.
- Daily cost savings.
Share these numbers with your team. This is your starting point.
Week 2–4: Optimise
Based on the baseline, optimise one workload at a time:
- Identify the lowest-hit-rate workload (use the “Hit Rate by Workload Type” tile).
- Increase TTL by 2x. Monitor hit rate for 3 days.
- If hit rate improves, keep the new TTL. If it stays flat, revert and investigate why.
- Move to the next workload.
Expect hit rate to improve by 10–20% per workload optimised.
Month 2: Scale
Once you’ve optimised individual workloads, scale the cache infrastructure:
- Increase cache size by 50% if eviction rate is high.
- Implement tiered caching (L1, L2, L3) if you have budget.
- Add cost-aware caching rules (don’t cache cheap calls).
Ongoing: Monitor and Alert
Set up the alerts above. Review cache metrics weekly. When hit rate drops, investigate immediately.
When to Engage External Help
If you’re running Claude workloads at scale and cache performance is critical to your unit economics, consider working with a partner. PADISO’s AI & Agents Automation service includes cache strategy and optimisation as part of our platform engineering work. We’ve built production caching systems for Sydney fintech, e-commerce, and healthcare clients.
For context on how this fits into broader AI operations, see our guides on AI agency performance tracking and AI agency reporting Sydney. We also cover compliance and security—if your cache stores sensitive data, you’ll need SOC 2 audit readiness and proper data handling.
For teams modernising their entire AI infrastructure, our Platform Design & Engineering service covers caching, observability, and cost optimisation end-to-end.
Conclusion: Cache Telemetry as a Competitive Advantage
Most teams track cache hit rate and call it done. You now know better.
Hit rate, write rate, entry age, and cost-per-call tell you whether your cache is actually working. Together, they reveal whether you’re saving money, improving latency, or just burning infrastructure cost for vanity metrics.
The PostHog dashboard JSON in this guide gives you a starting point. Instrument your cache events, import the dashboard, set up alerts, and monitor weekly.
Within 4 weeks, you should see:
- Hit rate stabilised at 75–90% (workload-dependent).
- Cost-per-call reduced by 40–70% compared to baseline.
- Eviction rate under control (<50% of write rate).
- Clear visibility into which workloads are cache-friendly and which aren’t.
For Claude production workloads, this level of visibility is the difference between a cache that pays for itself and a cache that burns money. Track these four metrics. Act on the data. Scale from there.
If you need help building cache infrastructure, optimising for cost, or instrumenting observability across your Claude workloads, PADISO’s Sydney-based team specialises in exactly this. We work with seed-to-Series-B startups and mid-market operators modernising with agentic AI. We’ve built AI automation for customer service, e-commerce personalisation, and financial services fraud detection—all with production-grade caching and observability baked in.
Start tracking. Start optimising. Your unit economics will thank you.