Differences

This shows you the differences between two versions of the page.

--- skills:interview [2026/06/13 06:14] – [Large Scale Databases] phong2018
+++ skills:interview [2026/06/15 10:07] (current) – [Transactions] phong2018
@@ Line 62: / Line 62: @@
 . Explain the lifecycle of an HTTP request in Go.
+Go's net/http server accepts a connection, serves it on its own goroutine, parses each request, runs middleware and the matched handler, writes the response, and then either reuses the connection (keep-alive) or closes it.
 . What is a goroutine?
+A goroutine is a lightweight thread managed by the Go runtime that enables concurrent execution.
 . Goroutine vs Thread?
+Goroutines are much lighter than OS threads and are multiplexed onto threads by Go's scheduler.
 . What is a channel?
+Channels allow goroutines to safely communicate and synchronize without shared memory.
 . Buffered vs Unbuffered channels?
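+A send on an unbuffered channel blocks until a receiver is ready, so sender and receiver synchronize on every value. A buffered channel accepts sends without a waiting receiver until its buffer is full, which allows limited asynchronous communication.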
 . What is a select statement?
+Select lets a goroutine wait on multiple channel operations and proceeds with one that is ready. If several are ready it picks one at random, and a default case makes the select non-blocking.
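+A minimal sketch of the buffered vs unbuffered difference, using a select with a default case so that a send that would block is skipped instead. The function name sendsWithoutReceiver is hypothetical and exists only for this illustration.

```go
package main

import "fmt"

// sendsWithoutReceiver attempts n non-blocking sends on a channel of the
// given capacity while nothing is receiving, and returns how many succeeded.
func sendsWithoutReceiver(capacity, n int) int {
	ch := make(chan int, capacity)
	sent := 0
	for i := 0; i < n; i++ {
		select {
		case ch <- i:
			sent++ // the buffer had room, so the send completed immediately
		default:
			// The send would block: no receiver is waiting and the buffer
			// is full (or the channel has no buffer at all).
		}
	}
	return sent
}

func main() {
	fmt.Println("unbuffered:", sendsWithoutReceiver(0, 3))
	fmt.Println("buffered(2):", sendsWithoutReceiver(2, 3))
}
```

+With no receiver, the unbuffered channel accepts no sends and the channel with capacity 2 accepts exactly two.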
 . What are common goroutine leaks and how do you prevent them?
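+Goroutine leaks occur when a goroutine blocks forever, typically on a channel send or receive that nothing will ever complete. I prevent them using context cancellation, proper channel management (a clear owner that closes the channel), timeouts, and cleanup logic.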
 . What is context.Context and why is it important?
+Context enables cancellation, deadlines, and request-scoped metadata across API calls and goroutines.
 . How does cancellation propagate through contexts?
+Contexts form a tree. When a parent context is cancelled, all derived child contexts are automatically cancelled.
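+A small sketch of that propagation, using only the standard context package. The two helper names are hypothetical. It shows that cancelling a parent cancels its child, while cancelling a child leaves the parent active.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// childCancelledByParent cancels only the parent and reports whether the
// derived child context observed the cancellation.
func childCancelledByParent() bool {
	parent, cancelParent := context.WithCancel(context.Background())
	child, cancelChild := context.WithCancel(parent)
	defer cancelChild()

	cancelParent()
	<-child.Done() // closed because the parent was cancelled
	return errors.Is(child.Err(), context.Canceled)
}

// parentSurvivesChild cancels only the child and reports whether the parent
// is still active afterwards.
func parentSurvivesChild() bool {
	parent, cancelParent := context.WithCancel(context.Background())
	defer cancelParent()

	_, cancelChild := context.WithCancel(parent)
	cancelChild()
	return parent.Err() == nil
}

func main() {
	fmt.Println("child cancelled by parent:", childCancelledByParent())
	fmt.Println("parent survives child:", parentSurvivesChild())
}
```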
 . What is an interface in Go?
+An interface specifies a set of methods. Any type that implements those methods satisfies the interface implicitly, with no implements declaration.
 . What is interface segregation in Go?
+Go encourages small interfaces that represent a single responsibility rather than large general-purpose interfaces.
 . What are type assertions and type switches?
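+A type assertion extracts a concrete type from an interface value. The single-value form panics on a mismatch, while the comma-ok form (v, ok := x.(T)) reports failure instead. A type switch handles multiple possible types safely.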
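+A short sketch of both constructs. The helper names describe and asString are hypothetical, and the code assumes Go 1.18 or later for the any alias.

```go
package main

import "fmt"

// describe uses a type switch to branch on the dynamic type of v.
func describe(v any) string {
	switch x := v.(type) {
	case int:
		return fmt.Sprintf("int %d", x)
	case string:
		return fmt.Sprintf("string %q", x)
	case error:
		return "error " + x.Error()
	default:
		return fmt.Sprintf("unknown %T", x)
	}
}

// asString uses the comma-ok type assertion, which never panics.
func asString(v any) (string, bool) {
	s, ok := v.(string)
	return s, ok
}

func main() {
	fmt.Println(describe(42))
	fmt.Println(describe("hi"))
	fmt.Println(describe(3.5))
	fmt.Println(asString(7))
}
```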
 . How does dependency injection work in Go?
+Dependency injection in Go is typically done through constructors and interfaces rather than frameworks.
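+A minimal sketch of constructor injection through a small interface. Mailer, SignupService, and fakeMailer are hypothetical names chosen for illustration, not part of any library.

```go
package main

import "fmt"

// Mailer is a small, single-purpose interface.
type Mailer interface {
	Send(to, body string) error
}

// SignupService depends on the Mailer interface, not on a concrete client.
type SignupService struct {
	mailer Mailer
}

// NewSignupService injects the dependency through the constructor.
func NewSignupService(m Mailer) *SignupService {
	return &SignupService{mailer: m}
}

// Register sends a welcome message through whatever Mailer was injected.
func (s *SignupService) Register(email string) error {
	return s.mailer.Send(email, "welcome")
}

// fakeMailer satisfies Mailer implicitly and records recipients for tests.
type fakeMailer struct{ sent []string }

func (f *fakeMailer) Send(to, body string) error {
	f.sent = append(f.sent, to)
	return nil
}

func main() {
	fake := &fakeMailer{}
	svc := NewSignupService(fake)
	if err := svc.Register("a@example.com"); err != nil {
		fmt.Println("register failed:", err)
	}
	fmt.Println(fake.sent)
}
```

+In production code the same constructor would receive a real mail client, and tests swap in the fake without any framework.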
 . How does Go's scheduler work?
+Go's scheduler maps many goroutines onto a smaller number of OS threads using an M:N scheduling model.
 . Explain GOMAXPROCS.
+GOMAXPROCS limits the number of OS threads that can execute user-level Go code simultaneously. Threads blocked in system calls on behalf of Go code do not count against the limit.
 . How does garbage collection work in Go?
+Go uses a concurrent mark-and-sweep garbage collector designed to minimize pause times while reclaiming unused memory.
 . How do you gracefully shut down a Go service?
+Graceful shutdown stops new traffic, allows in-flight requests to complete, cleans up resources, and then terminates the service.
 . How do you handle SIGTERM and SIGINT?
+I listen for SIGTERM and SIGINT (for example with signal.NotifyContext), trigger graceful shutdown, wait for cleanup, and then exit safely.
 . During server shutdown, how do you finish remaining requests safely?
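+During shutdown I stop accepting new requests and allow existing requests to complete within a configurable timeout. With net/http this is http.Server.Shutdown(ctx), which closes the listeners, closes idle connections, and waits for active connections to finish until the context expires.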
 . What are worker pools and when would you use them?
+A worker pool limits the number of concurrent goroutines processing jobs. It is useful for CPU-intensive tasks, background jobs, and protecting external systems from overload.
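+A minimal worker pool sketch, assuming the job is a simple squaring of integers. The function name squareAll and the job itself are hypothetical. A fixed number of goroutines read indexes from a shared channel, so concurrency is bounded by the workers argument.

```go
package main

import (
	"fmt"
	"sync"
)

// squareAll squares every input using at most `workers` goroutines and
// returns the results in input order.
func squareAll(inputs []int, workers int) []int {
	if workers < 1 {
		workers = 1 // avoid blocking forever with no consumers
	}
	jobs := make(chan int)
	results := make([]int, len(inputs))

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is handed to exactly one worker, so writes
				// to results never overlap.
				results[i] = inputs[i] * inputs[i]
			}
		}()
	}

	for i := range inputs {
		jobs <- i
	}
	close(jobs) // lets the workers' range loops finish
	wg.Wait()
	return results
}

func main() {
	fmt.Println(squareAll([]int{1, 2, 3, 4}, 2))
}
```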
 ===== Authentication & Authorization =====
@@ Line 220: / Line 260: @@
 . What are deadlocks?
+<code>
+- Deadlock: A situation where two or more transactions hold locks and wait for each other indefinitely.
+- How it occurs: Transactions access the same resources in different orders. Long-running transactions hold locks too long. Missing indexes cause unnecessary locking.
+- How to detect it: The database detects deadlocks automatically. MySQL (InnoDB) returns: ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction. Investigate with SHOW ENGINE INNODB STATUS (see the LATEST DETECTED DEADLOCK section).
+- How to prevent it: Keep transactions short. Always access tables/rows in a consistent order. Add proper indexes. Avoid large batch operations in a single transaction.
+- How to recover: The database rolls back one transaction automatically. Catch the error in the application and retry the transaction with backoff.
+</code>
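+A hedged sketch of the recovery step: retry the whole transaction with exponential backoff when it was chosen as the deadlock victim. The sentinel errDeadlock is a hypothetical stand-in for however the database driver surfaces MySQL error 1213; real code would inspect the driver's error type instead.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errDeadlock is a hypothetical sentinel standing in for MySQL error 1213.
var errDeadlock = errors.New("deadlock found when trying to get lock")

// retryOnDeadlock runs txn up to maxAttempts times, sleeping with
// exponential backoff between deadlock failures. It returns the number of
// attempts made and the final error.
func retryOnDeadlock(maxAttempts int, baseDelay time.Duration, txn func() error) (int, error) {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = txn()
		if !errors.Is(err, errDeadlock) {
			return attempt, err // success, or an error that is not retryable
		}
		if attempt < maxAttempts {
			time.Sleep(baseDelay << (attempt - 1)) // base, 2x base, 4x base, ...
		}
	}
	return maxAttempts, err
}

func main() {
	calls := 0
	attempts, err := retryOnDeadlock(5, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errDeadlock // simulate being the deadlock victim twice
		}
		return nil
	})
	fmt.Println(attempts, err)
}
```

+The retried function must contain the entire transaction, because the database has already rolled back everything the victim did.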
 ==== Large Scale Databases ====
@@ Line 228: / Line 274: @@
 . What migration risks should be considered?
+Before running a migration on a table with billions of rows, I would assess locking behavior, replication impact, rollback strategy, disk usage, application compatibility, and database load. For schema changes, I typically use the Expand → Backfill → Contract pattern and batch updates to achieve zero or near-zero downtime. I would also monitor replication lag, query latency, error rates, and storage utilization throughout the migration.
 . How would you backfill data safely?
+I backfill data in small batches, avoid long transactions, monitor database load, and verify correctness before enabling new constraints.
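+One way to get small batches is to walk the primary key in fixed-size ranges and run each range in its own short transaction. The sketch below only computes the ranges; idRange and batchRanges are hypothetical names, and it assumes a numeric, roughly dense primary key.

```go
package main

import "fmt"

// idRange is an inclusive primary-key range for one backfill batch.
type idRange struct {
	From, To int64
}

// batchRanges splits [minID, maxID] into consecutive ranges of at most
// batchSize ids each.
func batchRanges(minID, maxID, batchSize int64) []idRange {
	var ranges []idRange
	if batchSize < 1 || minID > maxID {
		return ranges
	}
	for from := minID; from <= maxID; from += batchSize {
		to := from + batchSize - 1
		if to > maxID {
			to = maxID
		}
		ranges = append(ranges, idRange{From: from, To: to})
	}
	return ranges
}

func main() {
	for _, r := range batchRanges(1, 10, 4) {
		// Each range would become one short UPDATE ... WHERE id BETWEEN ? AND ?
		fmt.Printf("backfill ids %d..%d\n", r.From, r.To)
	}
}
```

+Between batches, the caller can check replication lag and database load and pause when either climbs.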
 . How do online schema migrations work?
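+Online schema migration tools such as gh-ost and pt-online-schema-change create a shadow table with the new schema, copy data incrementally while syncing live changes (via the binlog or triggers, respectively), and then perform a quick cutover to minimize downtime.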
 . How do you avoid downtime during migrations?
+I avoid downtime using backward-compatible schema changes, phased deployments, batch backfills, and the Expand → Backfill → Contract pattern (also called Expand-Migrate-Contract).
 . How would you design monthly/yearly statistics tables?
+I would keep raw transactional data separate from reporting tables and maintain monthly/yearly aggregate tables that are updated incrementally. Reports read from aggregates instead of scanning the full dataset.
 . How would you generate reports with billions of rows?
+For billions of rows, I would move reporting workloads to a data warehouse and use pre-aggregation instead of scanning the transactional database.
 . Realtime reporting vs batch reporting?
+Use realtime when business decisions depend on current data. Use batch when slight delays are acceptable and cost efficiency is important.
 . When should you use materialized views?
+I use materialized views for expensive aggregations that are read frequently but don't require perfectly realtime data.
 . How would you implement pre-aggregation?
+I would maintain summary tables and incrementally update them rather than repeatedly aggregating billions of rows.
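+A minimal in-memory sketch of an incremental update, where a map stands in for the summary table. The types order and monthlyStats and the function applyOrder are hypothetical. In a real system the same step would be an upsert into the aggregate table.

```go
package main

import "fmt"

// order is a simplified raw transactional row.
type order struct {
	Month       string // e.g. "2026-06"
	AmountCents int64
}

// monthlyStats is one row of the summary table.
type monthlyStats struct {
	Orders       int64
	RevenueCents int64
}

// applyOrder folds a single new order into the monthly aggregate instead of
// re-scanning all raw rows.
func applyOrder(agg map[string]monthlyStats, o order) {
	s := agg[o.Month] // zero value if the month has no row yet
	s.Orders++
	s.RevenueCents += o.AmountCents
	agg[o.Month] = s
}

func main() {
	agg := map[string]monthlyStats{}
	applyOrder(agg, order{Month: "2026-05", AmountCents: 1000})
	applyOrder(agg, order{Month: "2026-06", AmountCents: 2500})
	applyOrder(agg, order{Month: "2026-06", AmountCents: 500})
	fmt.Println(agg["2026-05"], agg["2026-06"])
}
```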
 . How would you partition very large tables?
+For large datasets I usually partition by date because most queries filter by time ranges and old data can be archived easily.
 . Sharding vs Partitioning?
+Partitioning splits a table into smaller pieces within a single database, while sharding distributes data across multiple database servers to achieve horizontal scalability. Partitioning is usually tried before sharding because it is much simpler to operate.
+Example:
+<code>
+Partitioning:
+DB1
+ ├─ orders_2025
+ ├─ orders_2026
+ └─ orders_2027
+Sharding:
+DB1 → Users A-F
+DB2 → Users G-M
+DB3 → Users N-Z
+</code>
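+A sketch of how an application might route to a shard. shardByInitial mirrors the A-F / G-M / N-Z ranges in the diagram above and assumes ASCII usernames. shardByHash is an alternative that is not in the notes, using an FNV hash to spread keys evenly. Both function names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"hash/fnv"
	"unicode"
)

// shardByInitial implements range-based sharding on the first letter.
func shardByInitial(username string) (string, error) {
	if username == "" {
		return "", errors.New("empty username")
	}
	c := unicode.ToUpper(rune(username[0]))
	switch {
	case c >= 'A' && c <= 'F':
		return "DB1", nil
	case c >= 'G' && c <= 'M':
		return "DB2", nil
	case c >= 'N' && c <= 'Z':
		return "DB3", nil
	}
	return "", fmt.Errorf("no shard for initial %q", c)
}

// shardByHash implements hash-based sharding. It assumes shards > 0.
func shardByHash(key string, shards int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(shards))
}

func main() {
	db, err := shardByInitial("gina")
	fmt.Println(db, err)
	fmt.Println(shardByHash("gina", 3))
}
```

+Range-based routing keeps related keys together but can create hot shards, while hash-based routing spreads load at the cost of range queries across shards.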
 ===== REST API =====
@@ Line 416: / Line 497: @@
 . What backend technology trends are you currently following?
+====== System Design Interview Questions ======
+===== Interview Framework =====
+Always follow this order:
+  - Clarify requirements
+  - Estimate scale
+  - Define APIs
+  - Design data model
+  - Draw high-level architecture
+  - Explain data flow
+  - Identify bottlenecks
+  - Discuss scaling
+  - Discuss trade-offs
+  - Explain failure handling
+===== Questions =====
+==== Beginner to Intermediate ====
+  - Design a URL shortener
+  - Design a file storage service
+  - Design a chat application
+  - Design a notification system
+  - Design a rate limiter
+  - Design a distributed cache
+  - Design a search autocomplete service
+  - Design an API gateway
+  - Design a job queue system
+  - Design a payment processing system
+==== Intermediate to Advanced ====
+  - Design a ride-sharing platform
+  - Design a food delivery system
+  - Design a social media news feed
+  - Design a video streaming platform
+  - Design an e-commerce platform
+  - Design a real-time collaboration tool
+  - Design a monitoring and logging platform
+  - Design a recommendation engine
+  - Design a distributed lock service
+  - Design a multi-tenant SaaS platform
+==== Senior Backend Engineer Topics ====
+  - Design an order management system
+  - Design an inventory system that prevents overselling
+  - Design a coupon and promotion engine
+  - Design a loyalty points system
+  - Design an invoice generation system
+  - Design a webhook processing platform
+  - Design an OAuth2 / SSO authentication system
+  - Design an event-driven microservices architecture
+  - Design a distributed scheduler service
+  - Design a monolith-to-microservices migration strategy
+===== Drawing Template =====
+For every question, draw the following components:
+<code>
+Users / Mobile App / Web Browser
+                |
+                v
+        CDN / Load Balancer
+                |
+                v
+            API Gateway
+                |
+                v
+         Application Services
+          /        |        \
+         /         |         \
+        v          v          v
+     Cache      Database    Queue
+      |             |          |
+      |             |          |
+      v             v          v
+    Redis      MySQL/NoSQL   Workers
+                |
+                v
+      Monitoring / Logging
+</code>
+===== What Interviewers Expect =====
+For each component, explain:
+  * Why it exists
+  * How it scales
+  * Single points of failure
+  * Data consistency requirements
+  * Availability requirements
+  * Security considerations
+  * Monitoring and alerting
+===== Non-Functional Requirements Checklist =====
+  * Availability (99.9%, 99.99%, etc.)
+  * Scalability
+  * Reliability
+  * Latency
+  * Throughput
+  * Durability
+  * Consistency
+  * Security
+  * Cost
+===== Estimation Checklist =====
+Estimate before designing:
+  * Daily active users
+  * Requests per second (RPS)
+  * Peak traffic
+  * Storage requirements
+  * Read/write ratio
+  * Bandwidth requirements
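+The arithmetic behind these estimates can be sketched as below. The input numbers in main are made-up assumptions for illustration, and the function names are hypothetical.

```go
package main

import "fmt"

const secondsPerDay = 86400

// averageRPS converts daily usage into average requests per second.
func averageRPS(dailyActiveUsers, requestsPerUserPerDay float64) float64 {
	return dailyActiveUsers * requestsPerUserPerDay / secondsPerDay
}

// peakRPS scales the average by an assumed peak-to-average factor.
func peakRPS(avgRPS, peakFactor float64) float64 {
	return avgRPS * peakFactor
}

// storagePerYearGB estimates yearly storage growth in gigabytes (10^9 bytes).
func storagePerYearGB(writesPerDay, bytesPerRecord float64) float64 {
	return writesPerDay * bytesPerRecord * 365 / 1e9
}

func main() {
	// Assumed inputs: 1M daily active users, 10 requests each per day,
	// peak traffic at 3x the average, 1M writes per day at 500 bytes each.
	avg := averageRPS(1_000_000, 10)
	fmt.Printf("average RPS: %.1f\n", avg)
	fmt.Printf("peak RPS:    %.1f\n", peakRPS(avg, 3))
	fmt.Printf("storage/yr:  %.1f GB\n", storagePerYearGB(1_000_000, 500))
}
```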
+===== Deep Dive Topics =====
+Discuss when prompted:
+  * Database sharding
+  * Caching strategy
+  * Queue design
+  * Event-driven architecture
+  * Replication
+  * Multi-region deployment
+  * Disaster recovery
+  * Rate limiting
+  * Idempotency
+  * Distributed locking
+  * CAP theorem
+  * Eventual consistency
+===== Common Trade-offs =====
+^ Choice ^ Pros ^ Cons ^
+| SQL | Strong consistency (ACID transactions) | Harder to scale horizontally |
+| NoSQL | High scalability | Often eventual consistency |
+| Sync communication | Simple | Tight coupling |
+| Async communication | Resilient | Increased complexity |
+| Cache-aside | Simple | Stale data risk |
+| Write-through cache | Consistent cache | Higher write latency |
+===== Example: URL Shortener =====
+==== Requirements ====
+  * Shorten long URLs
+  * Redirect users quickly
+  * High read traffic
+  * Custom aliases (optional)
+  * Analytics (optional)
+==== APIs ====
+<code>
+POST /api/v1/shorten
+Request:
+{
+  "url": "https://example.com/very/long/url"
+}
+Response:
+{
+  "shortUrl": "https://short.ly/abc123"
+}
+</code>
+<code>
+GET /abc123
+Response:
+301 or 302 redirect with Location set to the long URL
+</code>
+==== High-Level Design ====
+<code>
+Client
+  |
+  v
+Load Balancer
+  |
+  v
+URL Service
+  |       \
+  |        \
+  v         v
+Redis     MySQL
+</code>
+==== Data Model ====
+<code>
+urls
+----
+id
+short_code
+long_url
+created_at
+expires_at
+</code>
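+The notes do not say how short_code is generated. One common option, shown here purely as an assumption, is to base62-encode the numeric id. The alphabet order and function names are hypothetical, and decodeBase62 does not guard against overflow on very long codes.

```go
package main

import (
	"fmt"
	"strings"
)

const base62Alphabet = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

// encodeBase62 turns a numeric id into a short code.
func encodeBase62(id uint64) string {
	if id == 0 {
		return "0"
	}
	var buf []byte
	for id > 0 {
		buf = append(buf, base62Alphabet[id%62]) // least significant digit first
		id /= 62
	}
	for i, j := 0, len(buf)-1; i < j; i, j = i+1, j-1 {
		buf[i], buf[j] = buf[j], buf[i] // reverse into most significant first
	}
	return string(buf)
}

// decodeBase62 is the inverse of encodeBase62. It reports false for an
// empty code or a character outside the alphabet.
func decodeBase62(code string) (uint64, bool) {
	if code == "" {
		return 0, false
	}
	var id uint64
	for i := 0; i < len(code); i++ {
		idx := strings.IndexByte(base62Alphabet, code[i])
		if idx < 0 {
			return 0, false
		}
		id = id*62 + uint64(idx)
	}
	return id, true
}

func main() {
	code := encodeBase62(125)
	id, ok := decodeBase62(code)
	fmt.Println(code, id, ok)
}
```

+Sequential ids make codes guessable, so a design that needs unpredictable links would use random codes with a uniqueness check instead.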
+==== Scaling ====
+  * Cache popular URLs in Redis
+  * Read replicas for MySQL
+  * Shard by short_code
+  * CDN for global traffic
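+A minimal cache-aside sketch for the redirect path. Two in-memory maps stand in for Redis and MySQL, and urlStore and resolve are hypothetical names. The type is not safe for concurrent use and has no expiry, which a real cache would need.

```go
package main

import "fmt"

// urlStore pairs a cache with the source-of-truth database.
type urlStore struct {
	cache   map[string]string // stands in for Redis
	db      map[string]string // stands in for MySQL
	dbReads int               // counts how often the database was hit
}

// resolve looks in the cache first, falls back to the database on a miss,
// and populates the cache so the next lookup is served from it.
func (s *urlStore) resolve(code string) (string, bool) {
	if longURL, ok := s.cache[code]; ok {
		return longURL, true
	}
	s.dbReads++
	longURL, ok := s.db[code]
	if !ok {
		return "", false
	}
	s.cache[code] = longURL
	return longURL, true
}

func main() {
	s := &urlStore{
		cache: map[string]string{},
		db:    map[string]string{"abc123": "https://example.com/very/long/url"},
	}
	fmt.Println(s.resolve("abc123")) // miss, then filled from the database
	fmt.Println(s.resolve("abc123")) // served from the cache
	fmt.Println("database reads:", s.dbReads)
}
```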
+==== Failure Handling ====
+  * Retry failed writes
+  * Circuit breaker for dependencies
+  * Database replication
+  * Multi-region backup
+===== Whiteboard Tips =====
+  * Start simple
+  * Draw before explaining
+  * Label every component
+  * State assumptions clearly
+  * Ask clarifying questions
+  * Explain trade-offs
+  * Think out loud
+  * Optimize only after the basic design works
+===== Golden Rule =====
+Do not jump directly into technology choices.
+Always follow:
+Requirements -> Scale -> APIs -> Data Model -> Architecture -> Bottlenecks -> Trade-offs
 ====== More detailed versions ======
@@ Line 436: / Line 760: @@
 . What is Swoole or RoadRunner and how do they differ from the traditional PHP-FPM request model?
 . What do the three numbers in Semantic Versioning (MAJOR.MINOR.PATCH) mean, and when should each be incremented?
 ===== Laravel & Symfony =====