====== Kafka Offset ======
===== Theory =====
An ''offset'' is a sequential number that identifies the position of a message inside a partition.
Important:
* One partition contains many messages.
* One message has exactly one offset.
* Offsets are unique only within a partition.
* Offsets increase monotonically.
* Offsets never change after assignment.
Relationship:
  Topic
  └── Partition
      └── Offset
          └── Message
Formula:
Message ID = (topic, partition, offset)
Offset alone is not globally unique.
Example:
Partition 0, Offset 5
Partition 1, Offset 5
These are two different messages.
Within a single topic, the pair
(partition, offset)
uniquely identifies a message. Across topics, include the topic name as well, as in the formula above.
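The identity rule can be sketched as a small value type. This is only an illustration: ''MessageId'' is a hypothetical name introduced here, not part of any Kafka client library.

```python
from typing import NamedTuple


class MessageId(NamedTuple):
    """Hypothetical identifier: the (topic, partition, offset) triple."""
    topic: str
    partition: int
    offset: int


a = MessageId("orders", 0, 5)
b = MessageId("orders", 1, 5)

# Same offset number, different partitions: two distinct messages.
same_offset = a.offset == b.offset
same_message = a == b
```

Comparing only the ''offset'' field would wrongly treat the two records as the same message; comparing the whole tuple does not.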
===== Storage Model =====
Kafka stores messages as append-only logs.
When a producer sends a new message:
Producer --> Topic --> Partition --> Append to end
Example:
Topic: orders
Partition 0
Offset 0 --> ORD-1001
Offset 1 --> ORD-1002
Offset 2 --> ORD-1003
New message:
ORD-1004
Kafka appends it:
Topic: orders
Partition 0
Offset 0 --> ORD-1001
Offset 1 --> ORD-1002
Offset 2 --> ORD-1003
Offset 3 --> ORD-1004
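A minimal sketch of the append-only idea, assuming a toy in-memory log. ''PartitionLog'' is a hypothetical class for illustration; a real broker writes to segment files on disk.

```python
class PartitionLog:
    """Toy append-only log: the list index plays the role of the offset."""

    def __init__(self):
        self._records = []

    def append(self, value):
        # The new record always goes to the end; its offset is the old length.
        offset = len(self._records)
        self._records.append(value)
        return offset

    def read(self, offset):
        return self._records[offset]


log = PartitionLog()
for order in ["ORD-1001", "ORD-1002", "ORD-1003"]:
    log.append(order)

new_offset = log.append("ORD-1004")  # appended after the three existing records
```

Because records are only ever appended, an offset that has been handed out keeps pointing at the same record.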
===== Consumer Theory =====
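Consuming a message does not remove it from the log; messages are deleted only by the topic's retention or compaction settings.
Instead, each consumer group records how far it has read in each partition.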
Formula:
(group.id, topic, partition) --> committed offset
Example:
Group: email-service
Partition 0 --> Offset 2
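Meaning:
email-service has processed messages up to and including offset 2
Next message:
Offset 3
Note: this page records the //last processed// offset to keep the arithmetic simple. The Kafka consumer API itself commits the offset of the //next// message to read (last processed + 1), so a real client would commit 3 here.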
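The progress mapping can be sketched as a plain dictionary. This is a toy model using this page's convention (the stored value is the last processed offset). The functions ''commit'' and ''next_offset'' are hypothetical helpers, the topic is included in the key because partition numbers repeat across topics, and starting at offset 0 when nothing has been committed is an assumption of the sketch.

```python
# (group_id, topic, partition) -> last processed offset (this page's convention)
committed = {}


def commit(group_id, topic, partition, offset):
    committed[(group_id, topic, partition)] = offset


def next_offset(group_id, topic, partition):
    last = committed.get((group_id, topic, partition))
    # Assumption of this sketch: a group with no commit starts at offset 0.
    return 0 if last is None else last + 1


commit("email-service", "orders", 0, 2)
resume_at = next_offset("email-service", "orders", 0)
```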
===== Internal Offset Storage =====
Kafka stores committed offsets in an internal topic:
__consumer_offsets
Example:
Group: email-service
orders-P0 --> 2
orders-P1 --> 5
Group: analytics-service
orders-P0 --> 10
orders-P1 --> 12
Each consumer group has independent offsets.
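One way to picture the internal topic is as a stream of commit records in which the latest record per key wins (''%%__consumer_offsets%%'' is a compacted topic). The sketch below is a simplified model, not the real record format; ''commit_log'' and ''latest_offsets'' are hypothetical names.

```python
# Each record: ((group, topic, partition), committed offset), in arrival order.
commit_log = [
    (("email-service", "orders", 0), 1),
    (("email-service", "orders", 1), 5),
    (("analytics-service", "orders", 0), 10),
    (("analytics-service", "orders", 1), 12),
    (("email-service", "orders", 0), 2),  # a later commit for the same key
]


def latest_offsets(records):
    """Fold the commit stream into the current offset per key."""
    latest = {}
    for key, offset in records:
        latest[key] = offset  # later records overwrite earlier ones
    return latest


offsets = latest_offsets(commit_log)
```

The group name is part of the key, so a commit by ''email-service'' never touches the entries of ''analytics-service''.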
===== Complete Example =====
Topic:
orders
Partitions:
Partition 0
Offset 0 --> ORD-1001
Offset 1 --> ORD-1002
Offset 2 --> ORD-1003
Partition 1
Offset 0 --> ORD-1004
Offset 1 --> ORD-1005
Consumer group:
email-service
Workers:
worker-1
worker-2
Partition assignment:
worker-1 --> Partition 0
worker-2 --> Partition 1
Committed offsets:
email-service
Partition 0 --> 1
Partition 1 --> 0
Worker-1 calls:
poll()
Kafka logic:
1. Find assigned partitions:
worker-1 --> Partition 0
2. Find committed offset:
Partition 0 --> 1
3. Calculate next offset:
1 + 1 = 2
4. Read message:
Partition 0, Offset 2
Kafka returns:
ORD-1003
After processing:
commit(offset=2)
Kafka updates:
email-service
Partition 0 --> 2
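The walk-through above can be replayed with a small in-memory simulation. It is a sketch under this page's convention (committed offset means last processed offset), not the real client protocol. ''poll_one'' is a hypothetical helper, and the topic is left out of the key because the example has a single topic.

```python
logs = {
    0: ["ORD-1001", "ORD-1002", "ORD-1003"],
    1: ["ORD-1004", "ORD-1005"],
}
assignment = {"worker-1": [0], "worker-2": [1]}
committed = {("email-service", 0): 1, ("email-service", 1): 0}


def poll_one(group_id, worker):
    """Return (partition, offset, value) for the next unread record, or None."""
    for partition in assignment[worker]:           # 1. assigned partitions
        last = committed[(group_id, partition)]    # 2. committed offset
        nxt = last + 1                             # 3. next offset
        if nxt < len(logs[partition]):             # 4. read, if it exists
            return partition, nxt, logs[partition][nxt]
    return None


record = poll_one("email-service", "worker-1")
partition, offset, value = record

# After processing, record the progress for this group and partition.
committed[("email-service", partition)] = offset
```

After the commit, another ''poll_one'' call for ''worker-1'' returns ''None'' in this model, because Partition 0 has no record at offset 3 yet.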
===== Batch Consumption =====
One offset always represents one message.
Example:
Partition 0
Offset 10 --> M1
Offset 11 --> M2
Offset 12 --> M3
A single poll request may return multiple messages:
poll()
[
(P0, 10, M1),
(P0, 11, M2),
(P0, 12, M3)
]
But the rule remains:
1 offset = 1 message
Batching is only a performance optimization.
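A batch is simply a consecutive run of offsets returned together. The sketch below assumes a toy list-backed partition; the ''poll'' function and its ''max_records'' parameter are hypothetical, loosely modelled on the consumer setting ''max.poll.records''.

```python
def poll(log, partition, start_offset, max_records=500):
    """Return up to max_records (partition, offset, value) tuples."""
    end = min(start_offset + max_records, len(log))
    return [(partition, off, log[off]) for off in range(start_offset, end)]


# Offsets 0..9 hold filler records so that M1..M3 land on offsets 10..12.
log = [f"filler-{i}" for i in range(10)] + ["M1", "M2", "M3"]

batch = poll(log, partition=0, start_offset=10)
offsets_in_batch = [off for _, off, _ in batch]
```

Every tuple in the batch still carries its own offset, so the consumer can track progress per message even when several arrive in a single call.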
===== Summary =====
^ Concept ^ Description ^
| Topic | Logical stream of messages |
| Partition | Physical shard of a topic |
| Offset | Position of a message in a partition |
| Consumer Group | Logical application consuming messages |
| Consumer | Worker process inside a group |
| Committed Offset | Last processed offset for a group and partition |
Key formulas:
Message ID = (topic, partition, offset)
Progress = (group.id, topic, partition) --> committed offset
Next message offset = committed offset + 1
Golden rules:
1 partition --> many offsets
1 offset --> 1 message
1 consumer group --> 1 committed offset per partition