skills:cache

Differences

This shows you the differences between two versions of the page.

skills:cache [2026/05/26 06:49] – created phong2018
skills:cache [2026/05/26 06:58] (current) – [Sequence Diagram] phong2018
Line 3: Line 3:
 ===== Overview ===== ===== Overview =====
  
-Cache invalidation is the process of keeping cached data consistent with the database when data changes.+Cache invalidation ensures cached data stays consistent with the database in distributed systems.
  
-In real production systems, the main challenge is not caching itself, but:+In real production systems, the hardest problem is:
  
-Ensuring consistency between cache and database at scale in distributed systems.+Handling concurrency and distributed timing while keeping cache consistent with DB.
  
-Most real systems use multiple patterns together, not just one.+Most systems use a combination of patterns, not a single approach.
  
 ===== 1. Cache-Aside (Most Common Pattern) ===== ===== 1. Cache-Aside (Most Common Pattern) =====
Line 15: Line 15:
 ==== Concept ==== ==== Concept ====
  
-Application manages cache manually. +The application manages the cache manually. The cache is a read optimization layer; the DB is the source of truth.
- +
-Cache is treated as a read-through layer, and DB remains the source of truth.+
  
 ==== Read Flow ==== ==== Read Flow ====
Line 26: Line 24:
  
 <code> Client → Application → Database update → delete cache key </code> <code> Client → Application → Database update → delete cache key </code>
- 
-==== Key Idea ==== 
- 
-Cache is disposable. If wrong → just delete it. 
  
 ==== Where it is used ==== ==== Where it is used ====
  
-E-commerce systems +Microservices systems 
-Social platforms +E-commerce platforms 
-Microservices architectures +Social applications
-Most production APIs+
  
 ==== Pros ==== ==== Pros ====
Line 43: Line 36:
 Safe Safe
 Easy to debug Easy to debug
-Works well in distributed systems+Flexible
  
 ==== Cons ==== ==== Cons ====
  
-First read after cache miss is slow +Cache miss causes DB hit 
-Possible stale window (very short)+Small stale window possible
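
The cache-aside read and write flows in this section can be sketched as follows. This is a minimal illustration, not a production implementation: the class name `CacheAside`, the `db_reads` counter, and the use of plain dicts in place of a real database and a real cache such as Redis are all assumptions made for the example.

```python
class CacheAside:
    """Cache-aside: the application, not the cache, talks to the database."""

    def __init__(self, db, cache):
        self.db = db          # source of truth (a dict stands in for a database)
        self.cache = cache    # disposable copy (a dict stands in for Redis)
        self.db_reads = 0     # counts how often a read had to hit the database

    def read(self, key):
        if key in self.cache:            # cache hit: return immediately
            return self.cache[key]
        self.db_reads += 1               # cache miss: fall back to the database
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value      # populate the cache for later reads
        return value

    def write(self, key, value):
        self.db[key] = value             # 1. update the database
        self.cache.pop(key, None)        # 2. delete (do not update) the cache key


store = CacheAside(db={"user:123": "Ann"}, cache={})
first = store.read("user:123")     # miss, loads from the database
second = store.read("user:123")    # hit, served from the cache
store.write("user:123", "Bob")     # invalidates the cached entry
third = store.read("user:123")     # miss again, sees the new value
```

The two misses in this run are the "cache miss causes DB hit" cost listed under Cons.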
  
 ===== 2. Write-Through Cache ===== ===== 2. Write-Through Cache =====
Line 54: Line 47:
 ==== Concept ==== ==== Concept ====
  
-Every write goes through cache first, then DB.+Each write goes through the cache, which synchronously writes it to the DB before acknowledging.
  
 ==== Flow ==== ==== Flow ====
  
 <code> Client → Cache → Database </code> <code> Client → Cache → Database </code>
- 
-==== Where it is used ==== 
- 
-Strong consistency systems 
-Limited financial systems 
-Some controlled caching layers 
  
 ==== Pros ==== ==== Pros ====
  
-Cache always consistent +Strong consistency 
-No stale data+No stale cache
  
 ==== Cons ==== ==== Cons ====
  
 Slower writes Slower writes
-Higher write cost+Higher cost
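
A write-through cache can be sketched as below, assuming a hypothetical `WriteThroughCache` class and a dict standing in for the database. The client only calls the cache; the cache persists to the database inside the same call, which is why writes are slower but the cached entry matches the database when the call returns.

```python
class WriteThroughCache:
    """Write-through: every write passes through the cache into the database."""

    def __init__(self, db):
        self.db = db         # a dict stands in for the database
        self.entries = {}    # cached values

    def put(self, key, value):
        self.db[key] = value        # synchronous database write
        self.entries[key] = value   # cache entry stored in the same call

    def get(self, key):
        # Load from the database only if the key was never written through.
        if key not in self.entries and key in self.db:
            self.entries[key] = self.db[key]
        return self.entries.get(key)


db = {}
wt = WriteThroughCache(db)
wt.put("user:123", "Ann")
cached = wt.get("user:123")
```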
  
-===== 3. Write-Behind (Write-Back) Cache =====+===== 3. Write-Behind Cache =====
  
 ==== Concept ==== ==== Concept ====
  
-Write goes to cache first, DB update happens asynchronously.+Write goes to cache first, DB updated asynchronously.
  
 ==== Flow ==== ==== Flow ====
  
-<code> Client → Cache (ACK immediately) ↓ async Database </code> +<code> Client → Cache (ACK) ↓ async Database </code>
- +
-==== Where it is used ==== +
- +
-Gaming leaderboards +
-Analytics systems +
-High-throughput systems+
  
 ==== Pros ==== ==== Pros ====
  
 Very fast writes Very fast writes
-High performance+High throughput
  
 ==== Cons ==== ==== Cons ====
  
 Risk of data loss Risk of data loss
-Complex recovery logic+Complex recovery
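
The write-behind flow can be sketched as below. The names `WriteBehindCache`, `pending`, and `flush` are assumptions for illustration, and a real system would run the flush in a background worker rather than calling it by hand.

```python
from collections import deque


class WriteBehindCache:
    """Write-behind: acknowledge from the cache, persist to the database later."""

    def __init__(self, db):
        self.db = db             # a dict stands in for the database
        self.entries = {}        # cached values
        self.pending = deque()   # writes acknowledged but not yet persisted

    def put(self, key, value):
        self.entries[key] = value
        self.pending.append((key, value))
        return "ACK"             # returned before the database is touched

    def flush(self):
        """Persist queued writes in order; returns how many were written."""
        flushed = 0
        while self.pending:
            key, value = self.pending.popleft()
            self.db[key] = value
            flushed += 1
        return flushed


db = {}
wb = WriteBehindCache(db)
ack = wb.put("score:1", 10)
persisted_before_flush = "score:1" in db   # still only in the cache
flushed = wb.flush()
```

Anything still in `pending` when the process dies is lost, which is the data-loss risk listed under Cons.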
  
 ===== 4. TTL-Based Invalidation ===== ===== 4. TTL-Based Invalidation =====
Line 106: Line 87:
 ==== Concept ==== ==== Concept ====
  
-Cache automatically expires after a fixed time.+Cache entries expire automatically after a fixed time.
  
 ==== Example ==== ==== Example ====
  
-<code> user:123 → expires in 5 minutes </code> +<code> user:123 → TTL = 5 minutes </code>
- +
-==== Where it is used ==== +
- +
-Everywhere as fallback safety +
-Session caching +
-API response caching+
  
 ==== Pros ==== ==== Pros ====
  
-Very simple +Simple 
-No coordination needed+Safe fallback
  
 ==== Cons ==== ==== Cons ====
  
-Data may be stale until expiration+Stale data until expiration
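
TTL expiry can be sketched with a small in-memory cache. The `TTLCache` class and its injectable `clock` parameter are assumptions made so the example runs without waiting; a cache such as Redis handles expiry itself.

```python
import time


class TTLCache:
    """Entries become invisible once their time-to-live has elapsed."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock    # injectable so expiry can be simulated
        self.entries = {}     # key -> (value, expires_at)

    def set(self, key, value):
        self.entries[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        item = self.entries.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self.clock() >= expires_at:   # expired: drop it and report a miss
            del self.entries[key]
            return None
        return value


now = [0.0]                                         # fake clock, in seconds
cache = TTLCache(ttl_seconds=300, clock=lambda: now[0])
cache.set("user:123", "Ann")                        # TTL = 5 minutes
now[0] = 299.0
before_expiry = cache.get("user:123")               # still cached
now[0] = 300.0
after_expiry = cache.get("user:123")                # expired, treated as a miss
```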
  
 ===== 5. Event-Driven Invalidation ===== ===== 5. Event-Driven Invalidation =====
Line 131: Line 106:
 ==== Concept ==== ==== Concept ====
  
-When database changes, an event is published to notify cache systems to invalidate or update data.+Database changes emit events to invalidate cache.
  
 ==== Flow ==== ==== Flow ====
  
-<code> Service → Database update → Event (UserUpdated) ↓ Cache Service → delete/update cache </code> +<code> Service → Database update → Event (UserUpdated) ↓ Cache Service → delete/update cache </code>
- +
-==== Technologies ==== +
- +
-Kafka +
-RabbitMQ +
-AWS SNS/SQS +
-Redis Pub/Sub +
- +
-==== Where it is used ==== +
- +
-Uber-like systems +
-Airbnb-like systems +
-Large-scale microservices+
  
 ==== Pros ==== ==== Pros ====
  
-Decoupled architecture +Decoupled systems 
-Scales well +Scalable
-Works across services+
  
 ==== Cons ==== ==== Cons ====
  
-Eventual consistency only +Eventual consistency 
-Possible event delay or loss+Event delay or loss possible
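
The event flow above can be sketched with an in-process bus. `EventBus`, `on_user_updated`, and `update_user` are hypothetical names, and delivery here is synchronous; with a real message broker the handler runs later, which is where the eventual consistency and delay come from.

```python
from collections import defaultdict


class EventBus:
    """Minimal in-process publish/subscribe, standing in for a message broker."""

    def __init__(self):
        self.handlers = defaultdict(list)

    def subscribe(self, event_name, handler):
        self.handlers[event_name].append(handler)

    def publish(self, event_name, payload):
        for handler in self.handlers[event_name]:
            handler(payload)


db = {"user:123": "Ann"}
cache = {"user:123": "Ann"}
bus = EventBus()


def on_user_updated(event):
    # Deleting is idempotent, so a duplicated event does no harm.
    cache.pop(event["key"], None)


bus.subscribe("UserUpdated", on_user_updated)


def update_user(key, value):
    db[key] = value                            # database update
    bus.publish("UserUpdated", {"key": key})   # notify cache owners


update_user("user:123", "Bob")
```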
  
-==== Production fixes ====+===== 6. Update-on-write vs Invalidate-on-write =====
  
-Retry mechanisms +==== 🟥 6.1 Update-on-write (RISKY under concurrency) ====
-Idempotent consumers +
-Dead Letter Queue (DLQ) +
-Event versioning+
  
-===== 6. Hybrid Pattern (Real Production Standard) =====+===== Problem =====
  
-==== Concept ====+Concurrent updates can cause race conditions between DB and cache.
  
-Real systems combine multiple patterns together.+===== Scenario =====
  
-==== Typical architecture ====+Request A updates value "A" 
 +Request B updates value "B"
  
-<code> Write Path: Client → Service → Database → publish event → delete cache 
  
-Read Path: +===== Problem =====
-Client → Service → Cache +
-↓ miss +
-Database → Cache+
  
-Safety Net: +If the two requests' DB writes and cache writes are applied in different orders:
-TTL expiration always enabled +
-</code>+
  
-==== Why hybrid is used ====+DB = B
 +Cache = A ❌
  
-Each pattern solves a different problem:+===== Conclusion =====
  
-Cache-aside → simplicity +Update-on-write is unsafe due to:
-Event-driven → consistency +
-TTL → safety fallback+
  
-===== 7. Outbox Pattern (Reliability Layer) =====+Race conditions 
 +Out-of-order execution 
 +Distributed timing issues
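
The race can be reproduced deterministically by replaying one possible interleaving by hand. The `steps` list and the single key `"key"` are assumptions for illustration; each request does two things, a DB write followed by a cache SET, and request A's cache SET is delayed.

```python
db = {}
cache = {}

# One possible interleaving of two update-on-write requests.
steps = [
    ("A", "db"),      # request A writes "A" to the database
    ("B", "db"),      # request B writes "B" to the database
    ("B", "cache"),   # request B sets the cache to "B"
    ("A", "cache"),   # request A, delayed, sets the cache to "A"
]

for value, target in steps:
    if target == "db":
        db["key"] = value
    else:
        cache["key"] = value

# The database ends with the last write, the cache with the stale one.
mismatch = db["key"] != cache["key"]
```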
  
-==== Concept ====+==== 🟩 6.2 Invalidate-on-write (SAFE pattern) ====
  
-Ensures database updates and events are consistent.+===== Concept =====
  
-==== Flow ====+Do NOT update cache. Only delete it.
  
-<code> Service → Database transaction → Outbox table insert 
  
-Worker → reads outbox → publishes event +===== Why it is safe =====
-</code>+
  
-==== Why it is used ====+Only DELETE operations on cache 
 +Writers never SET the cache, so a write cannot store a stale value
  
-Prevents lost events +===== Read Flow =====
-Ensures reliable event delivery+
  
-==== Where it is used ====+<code> Client → Redis (miss) → Database → Set cache </code>
  
-Large microservices systems +Each miss reloads the current DB value, so the cache converges to the DB.
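
The same kind of out-of-order interleaving is harmless when writers only delete. This sketch replays it by hand; the `steps` list, the `read` helper, and the single key `"key"` are assumptions for illustration.

```python
db = {}
cache = {"key": "old"}


def read(key):
    # Cache-aside read: on a miss, load from the database and set the cache.
    if key not in cache:
        cache[key] = db[key]
    return cache[key]


# Two invalidate-on-write requests whose cache steps run out of order.
steps = [
    ("A", "db"),           # request A writes "A" to the database
    ("B", "db"),           # request B writes "B" to the database
    ("B", "invalidate"),   # request B deletes the cache key
    ("A", "invalidate"),   # request A, delayed, deletes it again (no effect)
]

for value, target in steps:
    if target == "db":
        db["key"] = value
    else:
        cache.pop("key", None)

result = read("key")   # miss, reloads the latest database value
```

A reader that loaded the old value before the write and sets the cache after the delete can still leave a short stale window, which a TTL bounds.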
-Financial systems +
-E-commerce platforms+
  
-===== 8. Common Production Problems =====+===== 7. Comparison Table =====
  
-==== 1. Cache Stampede ====+^ Aspect ^ Update-on-write ^ Invalidate-on-write ^ 
 +| Cache operation | SET | DELETE | 
 +| Race condition risk | ❌ High | ✅ Low | 
 +| Out-of-order updates | ❌ Dangerous | ✅ Harmless | 
 +| Cache correctness | ❌ Fragile | ✅ Reliable | 
 +| Debug complexity | High | Low |
  
-Many requests miss cache at the same time.+===== 8. Key Insight =====
  
-Fix:+Cache is not the source of truth.
  
-Mutex lock +DB = source of truth 
-Single-flight +Cache = disposable optimization layer
-Request coalescing+
  
-==== 2. Race Condition (Stale Cache Overwrite) ====+Invalidate-on-write works because:
  
-Old data overwrites new cache value.+Even if a cached value is wrong, the next write deletes it and the next read reloads it from the DB.
  
-Fix:+===== 9. Production Pattern =====
  
-Versioning +Most systems use:
-Timestamp checks +
-Event ordering+
  
-==== 3. Event Loss ====+<code> WRITE: 1. Update Database 2. Delete Cache
  
-Fix:+READ: 
 +Cache hit → return 
 +Cache miss → DB → set cache 
 +</code>
  
-Kafka persistence +===== 10. Why Update-on-write still exists =====
-Retry + DLQ +
-Outbox pattern+
  
-===== 9. Summary =====+Used only in limited cases:
  
-In real production systems:+Simple data 
 +Low concurrency 
 +Ultra-fast read-after-write requirement
  
-Cache-aside is the base +Examples:
-Event-driven invalidation handles updates +
-TTL provides safety net +
-Outbox ensures reliability+
  
-==== Final Mental Model ====+Session storage 
 +Feature flags 
 +Simple counters
  
-<code> Event-driven invalidation ↓ Cache-aside ←→ Database ↓ TTL fallback ↓ Outbox pattern (reliability) </code>+===== 11. Bonus Topics ===== 
 + 
 +Cache stampede problem 
 +Mutex / singleflight 
 +Request coalescing 
 +Kafka-based cache invalidation 
 +Outbox pattern for reliability
skills/cache.1779778156.txt.gz · Last modified: by phong2018