====== Kafka Offset ======

===== Theory =====

An ''offset'' is a sequential number that identifies the position of a message inside a partition.

Important:

  * One partition contains many messages.
  * One message has exactly one offset.
  * Offsets are unique only within a partition.
  * Offsets increase monotonically.
  * Offsets never change after assignment.

Relationship:

  Topic
   └── Partition
        └── Offset
             └── Message

Formula:

  Message ID = (topic, partition, offset)

An offset alone is not globally unique. Example:

  Partition 0, Offset 5
  Partition 1, Offset 5

These are two different messages. Within a topic, you must use both values to uniquely identify a message:

  (partition, offset)

Across topics, the topic name is needed as well, as in the formula above.

===== Storage Model =====

Kafka stores messages as append-only logs. When a producer sends a new message:

  Producer --> Topic --> Partition --> Append to end

Example:

  Topic: orders
  Partition 0
    Offset 0 --> ORD-1001
    Offset 1 --> ORD-1002
    Offset 2 --> ORD-1003

New message:

  ORD-1004

Kafka appends it:

  Topic: orders
  Partition 0
    Offset 0 --> ORD-1001
    Offset 1 --> ORD-1002
    Offset 2 --> ORD-1003
    Offset 3 --> ORD-1004

===== Consumer Theory =====

Consumers do not remove messages. Instead, each consumer group stores its progress.

Formula:

  (group.id, partition) --> committed offset

Example:

  Group: email-service
  Partition 0 --> Offset 2

Meaning:

  email-service has processed messages up to and including offset 2

Next message:

  Offset 3

Note: this page treats the committed offset as the last processed offset. The Kafka consumer API itself commits the offset of the next message to read (last processed + 1), so a real client in this state would commit 3.

===== Internal Offset Storage =====

Kafka stores committed offsets in an internal topic:

  __consumer_offsets

Example:

  Group: email-service
    orders-P0 --> 2
    orders-P1 --> 5
  Group: analytics-service
    orders-P0 --> 10
    orders-P1 --> 12

Each consumer group has independent offsets.
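The append-only storage model and the per-group progress records described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a Kafka client: ''PartitionLog'', ''Topic'', ''send'' and the ''committed'' dictionary are hypothetical names introduced here, and the committed values follow this page's convention of storing the last processed offset.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Append-only log for one partition; the list index is the offset."""
    messages: list = field(default_factory=list)

    def append(self, message):
        """Append a message and return the offset assigned to it."""
        self.messages.append(message)
        return len(self.messages) - 1


@dataclass
class Topic:
    """A topic with a fixed list of partitions (hypothetical helper)."""
    name: str
    partitions: list

    def send(self, partition, message):
        """Append to one partition and return (topic, partition, offset)."""
        offset = self.partitions[partition].append(message)
        return (self.name, partition, offset)


orders = Topic("orders", [PartitionLog(), PartitionLog()])
for order in ("ORD-1001", "ORD-1002", "ORD-1003"):
    orders.send(0, order)

# A new message is appended at the end of partition 0 and gets offset 3.
new_id = orders.send(0, "ORD-1004")

# The same offset number can exist in another partition: a different message.
other_id = orders.send(1, "ORD-2001")

# Progress is kept per (group.id, topic, partition), in the spirit of the
# entries stored in __consumer_offsets. Page convention (an assumption of
# this sketch): the stored value is the last processed offset.
committed = {
    ("email-service", "orders", 0): 2,
    ("analytics-service", "orders", 0): 3,
}
```

Because each group has its own key, advancing ''email-service'' never affects the position of ''analytics-service'' on the same partition.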
===== Complete Example =====

Topic: orders

Partitions:

  Partition 0
    Offset 0 --> ORD-1001
    Offset 1 --> ORD-1002
    Offset 2 --> ORD-1003
  Partition 1
    Offset 0 --> ORD-1004
    Offset 1 --> ORD-1005

Consumer group: email-service

Workers:

  worker-1
  worker-2

Partition assignment:

  worker-1 --> Partition 0
  worker-2 --> Partition 1

Committed offsets:

  email-service
    Partition 0 --> 1
    Partition 1 --> 0

Worker-1 calls:

  poll()

Kafka logic:

  1. Find assigned partitions:
     worker-1 --> Partition 0
  2. Find committed offset:
     Partition 0 --> 1
  3. Calculate next offset:
     1 + 1 = 2
  4. Read message:
     Partition 0, Offset 2

Kafka returns:

  ORD-1003

After processing:

  commit(offset=2)

Kafka updates:

  email-service
    Partition 0 --> 2

===== Batch Consumption =====

One offset always represents one message.

Example:

  Partition 0
    Offset 10 --> M1
    Offset 11 --> M2
    Offset 12 --> M3

A single poll request may return multiple messages:

  poll()
  [
    (P0, 10, M1),
    (P0, 11, M2),
    (P0, 12, M3)
  ]

But the rule remains:

  1 offset = 1 message

Batching is only a performance optimization.

===== Summary =====

^ Concept ^ Description ^
| Topic | Logical stream of messages |
| Partition | Physical shard of a topic |
| Offset | Position of a message in a partition |
| Consumer Group | Logical application consuming messages |
| Consumer | Worker process inside a group |
| Committed Offset | Last processed offset for a group and partition |

Key formulas:

  Message = topic + partition + offset
  Progress = group.id + partition --> committed offset
  Next message offset = committed offset + 1

These formulas follow this page's convention that the committed offset is the last processed offset. Kafka clients actually commit the offset of the next message to read (last processed + 1), in which case the next message offset equals the committed offset.

Golden rules:

  1 partition --> many offsets
  1 offset --> 1 message
  1 consumer group --> 1 committed offset per partition
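The poll and commit steps from the complete example, together with batch consumption, can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: ''logs'', ''assignment'', ''committed'', ''poll'' and ''commit'' are hypothetical names rather than the Kafka client API, the batch limit is applied per partition, and the committed value is the last processed offset, as elsewhere on this page.

```python
# Partition logs from the complete example: the list index is the offset.
logs = {
    0: ["ORD-1001", "ORD-1002", "ORD-1003"],
    1: ["ORD-1004", "ORD-1005"],
}

# Which worker reads which partitions.
assignment = {"worker-1": [0], "worker-2": [1]}

# Page convention (assumption): committed offset = last processed offset.
committed = {("email-service", 0): 1, ("email-service", 1): 0}


def poll(group, worker, max_records=10):
    """Return (partition, offset, message) tuples after the committed offset.

    At most max_records are returned per assigned partition.
    """
    records = []
    for partition in assignment[worker]:
        log = logs[partition]
        next_offset = committed[(group, partition)] + 1
        end = min(len(log), next_offset + max_records)
        for offset in range(next_offset, end):
            records.append((partition, offset, log[offset]))
    return records


def commit(group, partition, offset):
    """Record the last processed offset for this group and partition."""
    committed[(group, partition)] = offset


# worker-1: committed offset is 1, so the next offset is 1 + 1 = 2.
batch_1 = poll("email-service", "worker-1")
for partition, offset, message in batch_1:
    # Processing of the message would happen here.
    commit("email-service", partition, offset)

# A second poll finds nothing new on partition 0.
batch_2 = poll("email-service", "worker-1")

# A batch is still one offset per message: append two orders and poll again.
logs[0].extend(["ORD-1006", "ORD-1007"])
batch_3 = poll("email-service", "worker-1")
```

Each tuple in a batch carries its own offset, so a consumer can commit after every message or once after the last message of the batch; the sketch commits after every message for simplicity.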