Kafka Offset
Theory
An offset is a sequential number that identifies the position of a message inside a partition.
Important:
- One partition contains many messages.
- One message has exactly one offset.
- Offsets are unique only within a partition.
- Offsets increase monotonically.
- Offsets never change after assignment.
Relationship:
Topic
└── Partition
    └── Offset
        └── Message
Formula:
Message ID = (topic, partition, offset)
Offset alone is not globally unique.
Example:
Partition 0, Offset 5
Partition 1, Offset 5
These are two different messages.
Within a topic, you must use both:
(partition, offset)
to uniquely identify a message.
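The identity rule above can be sketched in a few lines of Python. `MessageId` is a hypothetical type invented for this illustration and is not part of any Kafka client library.

```python
from typing import NamedTuple


class MessageId(NamedTuple):
    """Hypothetical identifier: (topic, partition, offset)."""
    topic: str
    partition: int
    offset: int


a = MessageId("orders", 0, 5)
b = MessageId("orders", 1, 5)

# Same offset, different partition: two distinct messages.
same_offset = a.offset == b.offset
same_message = a == b
print(same_offset, same_message)  # True False
```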
Storage Model
Kafka stores messages as append-only logs.
When a producer sends a new message:
Producer --> Topic --> Partition --> Append to end
Example:
Topic: orders
Partition 0
  Offset 0 --> ORD-1001
  Offset 1 --> ORD-1002
  Offset 2 --> ORD-1003
New message:
ORD-1004
Kafka appends it:
Topic: orders
Partition 0
  Offset 0 --> ORD-1001
  Offset 1 --> ORD-1002
  Offset 2 --> ORD-1003
  Offset 3 --> ORD-1004
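A minimal sketch of the append-only idea, assuming a single partition held in memory. `PartitionLog` is a made-up class for illustration; a real broker persists records to segment files on disk.

```python
class PartitionLog:
    """Toy in-memory append-only log for one partition (illustrative only)."""

    def __init__(self):
        self._records = []

    def append(self, value):
        """Append a record at the end and return the offset it was assigned."""
        offset = len(self._records)
        self._records.append(value)
        return offset

    def read(self, offset):
        """Return the record stored at the given offset."""
        return self._records[offset]


log = PartitionLog()
for order in ("ORD-1001", "ORD-1002", "ORD-1003"):
    log.append(order)

# The new message always lands at the end and gets the next offset.
new_offset = log.append("ORD-1004")
print(new_offset, log.read(new_offset))  # 3 ORD-1004
```

Existing records are never rewritten here: `append` only adds at the end, which is why an offset stays valid once it has been assigned.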
Consumer Theory
Consumers do not remove messages.
Instead, each consumer group stores its progress.
Formula:
(group.id, partition) --> committed offset
Example:
Group: email-service
Partition 0 --> Offset 2
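Meaning:
email-service has processed messages up to and including offset 2
Next message:
Offset 3
Note: this document treats the committed offset as the last processed offset. Kafka's consumer API itself expects the committed value to be the offset of the next record to read (last processed + 1), so a real client that has finished offset 2 commits 3.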
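The mapping and the "next message" calculation can be sketched as follows. This is a toy model that follows this document's convention (the stored value is the last processed offset). The `-1` default for a partition with no commit is an assumption made for the sketch; in a real consumer the starting position is governed by the `auto.offset.reset` setting.

```python
# Toy model of committed progress, keyed by (group.id, partition).
# Convention (this document's): the value is the last processed offset.
committed = {("email-service", 0): 2}


def next_offset(group_id, partition):
    """Return the offset the group should read next under that convention."""
    # Assumption for this sketch: no commit yet means "start from offset 0".
    last_processed = committed.get((group_id, partition), -1)
    return last_processed + 1


print(next_offset("email-service", 0))  # 3
print(next_offset("email-service", 1))  # 0
```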
Internal Offset Storage
Kafka stores committed offsets in an internal topic:
__consumer_offsets
Example:
Group: email-service
  orders-P0 --> 2
  orders-P1 --> 5
Group: analytics-service
  orders-P0 --> 10
  orders-P1 --> 12
Each consumer group has independent offsets.
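To illustrate that independence, here is a hedged sketch with a nested dictionary standing in for the offset store. The `offsets` structure and the `commit` helper are hypothetical and do not reflect the actual record format of `__consumer_offsets`.

```python
from collections import defaultdict

# Hypothetical stand-in for the committed-offset store:
# group.id -> {(topic, partition): committed offset}
offsets = defaultdict(dict)


def commit(group_id, topic, partition, offset):
    """Record the committed offset for one group and one partition."""
    offsets[group_id][(topic, partition)] = offset


commit("email-service", "orders", 0, 2)
commit("email-service", "orders", 1, 5)
commit("analytics-service", "orders", 0, 10)
commit("analytics-service", "orders", 1, 12)

# Advancing one group leaves the other group's progress untouched.
commit("email-service", "orders", 0, 3)
print(offsets["email-service"][("orders", 0)])      # 3
print(offsets["analytics-service"][("orders", 0)])  # 10
```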
Complete Example
Topic:
orders
Partitions:
Partition 0
  Offset 0 --> ORD-1001
  Offset 1 --> ORD-1002
  Offset 2 --> ORD-1003
Partition 1
  Offset 0 --> ORD-1004
  Offset 1 --> ORD-1005
Consumer group:
email-service
Workers:
worker-1
worker-2
Partition assignment:
worker-1 --> Partition 0
worker-2 --> Partition 1
Committed offsets:
email-service
  Partition 0 --> 1
  Partition 1 --> 0
Worker-1 calls:
poll()
Kafka logic:
1. Find assigned partitions: worker-1 --> Partition 0
2. Find committed offset: Partition 0 --> 1
3. Calculate next offset: 1 + 1 = 2
4. Read message: Partition 0, Offset 2
Kafka returns:
ORD-1003
After processing:
commit(offset=2)
Kafka updates:
email-service Partition 0 --> 2
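The walkthrough above can be replayed with a small in-memory simulation. It is only a sketch under this document's convention that the committed offset is the last processed one. `poll_one` and `commit` are made-up helpers, not Kafka client calls, and `poll_one` returns a single record to keep the steps visible.

```python
# Toy in-memory simulation of the walkthrough; all names are illustrative.
partitions = {
    0: ["ORD-1001", "ORD-1002", "ORD-1003"],
    1: ["ORD-1004", "ORD-1005"],
}
assignment = {"worker-1": [0], "worker-2": [1]}
committed = {0: 1, 1: 0}  # email-service: last processed offset per partition


def poll_one(worker):
    """Return (partition, offset, value) for the next unread record, or None."""
    for p in assignment[worker]:          # 1. find assigned partitions
        nxt = committed[p] + 1            # 2-3. committed offset, then next offset
        if nxt < len(partitions[p]):      # 4. read the message if it exists
            return p, nxt, partitions[p][nxt]
    return None


def commit(partition, offset):
    """Store the last processed offset for the partition."""
    committed[partition] = offset


record = poll_one("worker-1")
print(record)                # (0, 2, 'ORD-1003')
commit(record[0], record[1])
print(committed[0])          # 2
print(poll_one("worker-1"))  # None: nothing after offset 2 yet
```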
Batch Consumption
One offset always represents one message.
Example:
Partition 0
  Offset 10 --> M1
  Offset 11 --> M2
  Offset 12 --> M3
A single poll request may return multiple messages:
poll()
[
  (P0, 10, M1),
  (P0, 11, M2),
  (P0, 12, M3)
]
But the rule remains:
1 offset = 1 message
Batching is only a performance optimization: every message in the batch still carries its own offset.
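A rough sketch of a batched read over a list-backed partition. The `poll` function below is a hypothetical helper, and its `max_records` parameter is only loosely analogous to the consumer's `max.poll.records` setting.

```python
def poll(log, start_offset, max_records):
    """Return up to max_records (offset, value) pairs starting at start_offset."""
    end = min(start_offset + max_records, len(log))
    return [(offset, log[offset]) for offset in range(start_offset, end)]


# Partition 0, padded with filler so that M1..M3 sit at offsets 10..12.
log = [f"pad-{i}" for i in range(10)] + ["M1", "M2", "M3"]

batch = poll(log, start_offset=10, max_records=500)
print(batch)  # [(10, 'M1'), (11, 'M2'), (12, 'M3')]
```

One call returned three records, yet each pair keeps its own offset, so the one-to-one rule is unaffected.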
Summary
| Concept | Description |
|---|---|
| Topic | Logical stream of messages |
| Partition | Physical shard of a topic |
| Offset | Position of a message in a partition |
| Consumer Group | Logical application consuming messages |
| Consumer | Worker process inside a group |
| Committed Offset | Last processed offset for a group and partition |
Key formulas:
Message = topic + partition + offset
Progress = group.id + partition --> committed offset
Next message offset = committed offset + 1
Golden rules:
1 partition --> many offsets
1 offset --> 1 message
1 consumer group --> 1 committed offset per partition
