Kafka Offset

Theory

An offset is a sequential number that identifies the position of a message inside a partition.

Important:

  • One partition contains many messages.
  • One message has exactly one offset.
  • Offsets are unique only within a partition.
  • Offsets increase monotonically.
  • Offsets never change after assignment.

Relationship:

Topic
  └── Partition
        └── Offset
              └── Message

Formula:

Message ID = (topic, partition, offset)

Offset alone is not globally unique.

Example:

Partition 0, Offset 5
Partition 1, Offset 5

These are two different messages.

Within a topic, you must use both:

(partition, offset)

to uniquely identify a message. Across topics, add the topic name as well.
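
The identity rule can be sketched in Python. `MessageId` is a hypothetical helper made up for this page, not part of any Kafka client; it only shows that two records with the same offset in different partitions compare as different.

```python
from typing import NamedTuple


class MessageId(NamedTuple):
    """Hypothetical identifier: (topic, partition, offset)."""
    topic: str
    partition: int
    offset: int


a = MessageId("orders", 0, 5)
b = MessageId("orders", 1, 5)

# Same offset, different partition: two different messages.
same_offset = a.offset == b.offset
same_message = a == b
print(same_offset, same_message)  # True False
```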

Storage Model

Kafka stores messages as append-only logs.

When a producer sends a new message:

Producer --> Topic --> Partition --> Append to end

Example:

Topic: orders

Partition 0

Offset 0 --> ORD-1001
Offset 1 --> ORD-1002
Offset 2 --> ORD-1003

New message:

ORD-1004

Kafka appends it:

Topic: orders

Partition 0

Offset 0 --> ORD-1001
Offset 1 --> ORD-1002
Offset 2 --> ORD-1003
Offset 3 --> ORD-1004
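
A toy model of the append step, assuming a plain Python list stands in for the partition log. `PartitionLog` and its methods are illustrative names, not a Kafka API; the point is only that the offset of a new message equals the number of messages already in the partition.

```python
class PartitionLog:
    """Toy append-only log for one partition (illustration only)."""

    def __init__(self):
        self._records = []  # list index doubles as the offset

    def append(self, value):
        """Append to the end and return the offset assigned to the record."""
        self._records.append(value)
        return len(self._records) - 1

    def read(self, offset):
        """Return the message stored at the given offset."""
        return self._records[offset]

    def end_offset(self):
        """Offset that the next appended record will receive."""
        return len(self._records)


orders_p0 = PartitionLog()
for order in ("ORD-1001", "ORD-1002", "ORD-1003"):
    orders_p0.append(order)

new_offset = orders_p0.append("ORD-1004")
print(new_offset, orders_p0.read(new_offset))  # 3 ORD-1004
```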

Consumer Theory

Consumers do not remove messages.

Instead, each consumer group stores its progress.

Formula:

(group.id, topic, partition) --> committed offset

Example:

Group: email-service

Partition 0 --> Offset 2

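Meaning:

email-service has processed messages up to and including offset 2

Next message:

Offset 3

Note: this page treats the committed offset as the last processed offset. Kafka's consumer clients actually commit the offset of the next message to read (last processed + 1), so a real client in this state would store 3.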

Internal Offset Storage

Kafka stores committed offsets in an internal topic:

__consumer_offsets

Example:

Group: email-service

orders-P0 --> 2
orders-P1 --> 5

Group: analytics-service

orders-P0 --> 10
orders-P1 --> 12

Each consumer group has independent offsets.
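
The independence of groups can be modelled with a dictionary. This is a sketch only: the `committed` dict is a hypothetical in-memory stand-in for `__consumer_offsets`, keyed by group, topic and partition, and the values are the ones from the example above.

```python
# Hypothetical in-memory stand-in for __consumer_offsets:
# key = (group id, topic, partition), value = committed offset.
committed = {
    ("email-service", "orders", 0): 2,
    ("email-service", "orders", 1): 5,
    ("analytics-service", "orders", 0): 10,
    ("analytics-service", "orders", 1): 12,
}

# A commit by one group touches only that group's entry.
committed[("email-service", "orders", 0)] = 3

email_p0 = committed[("email-service", "orders", 0)]
analytics_p0 = committed[("analytics-service", "orders", 0)]
print(email_p0, analytics_p0)  # 3 10
```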

Complete Example

Topic:

orders

Partitions:

Partition 0

Offset 0 --> ORD-1001
Offset 1 --> ORD-1002
Offset 2 --> ORD-1003

Partition 1

Offset 0 --> ORD-1004
Offset 1 --> ORD-1005

Consumer group:

email-service

Workers:

worker-1
worker-2

Partition assignment:

worker-1 --> Partition 0
worker-2 --> Partition 1

Committed offsets:

email-service

Partition 0 --> 1
Partition 1 --> 0

Worker-1 calls:

poll()

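Kafka logic (simplified; a running consumer tracks its position in memory and reads the committed offset only when it starts or is reassigned a partition):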

1. Find assigned partitions:
   worker-1 --> Partition 0

2. Find committed offset:
   Partition 0 --> 1

3. Calculate next offset:
   1 + 1 = 2

4. Read message:
   Partition 0, Offset 2

Kafka returns:

ORD-1003

After processing:

commit(offset=2)

Kafka updates:

email-service

Partition 0 --> 2
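
The four steps and the commit can be replayed in a small simulation. It uses plain dicts and lists rather than a Kafka client, and `poll_one` and `commit` are hypothetical helpers. It follows this page's convention that the committed offset is the last processed offset; Kafka's own clients commit the offset of the next message to read, so treat the arithmetic as a model rather than the client's behaviour.

```python
# Toy data copied from the example; plain dicts and lists, no Kafka client.
logs = {
    0: ["ORD-1001", "ORD-1002", "ORD-1003"],
    1: ["ORD-1004", "ORD-1005"],
}
assignment = {"worker-1": [0], "worker-2": [1]}
# This page's convention: committed offset = last processed offset.
committed = {0: 1, 1: 0}


def poll_one(worker):
    """Return (partition, offset, message) for the next unread message, or None."""
    for partition in assignment[worker]:          # step 1: assigned partitions
        last_done = committed[partition]          # step 2: committed offset
        next_offset = last_done + 1               # step 3: next offset
        if next_offset < len(logs[partition]):    # step 4: read if it exists
            return partition, next_offset, logs[partition][next_offset]
    return None


def commit(partition, offset):
    """Record the offset of the message that was just processed."""
    committed[partition] = offset


record = poll_one("worker-1")
print(record)  # (0, 2, 'ORD-1003')
commit(record[0], record[1])
print(committed[0])  # 2
```

Calling `poll_one("worker-1")` again after the commit returns `None`, because Partition 0 has no message at offset 3 yet.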

Batch Consumption

One offset always represents one message.

Example:

Partition 0

Offset 10 --> M1
Offset 11 --> M2
Offset 12 --> M3

A single poll request may return multiple messages:

poll()

[
  (P0, 10, M1),
  (P0, 11, M2),
  (P0, 12, M3)
]

But the rule remains:

1 offset = 1 message

Batching is only a performance optimization.
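
A sketch of a batched read over the three messages above. `poll_batch` and its `max_records` parameter are hypothetical names for this page (the parameter loosely mirrors the consumer setting `max.poll.records`); the check at the end confirms that every record in the batch still carries its own offset.

```python
partition_0 = {10: "M1", 11: "M2", 12: "M3"}


def poll_batch(log, start_offset, max_records):
    """Return up to max_records (partition, offset, message) tuples from P0."""
    batch = []
    offset = start_offset
    while offset in log and len(batch) < max_records:
        batch.append(("P0", offset, log[offset]))
        offset += 1
    return batch


batch = poll_batch(partition_0, start_offset=10, max_records=500)
print(batch)
# [('P0', 10, 'M1'), ('P0', 11, 'M2'), ('P0', 12, 'M3')]

# One offset per message, even inside a batch.
offsets = [offset for _, offset, _ in batch]
print(len(set(offsets)) == len(batch))  # True
```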

Summary

Concept            Description
Topic              Logical stream of messages
Partition          Physical shard of a topic
Offset             Position of a message in a partition
Consumer Group     Logical application consuming messages
Consumer           Worker process inside a group
Committed Offset   Last processed offset for a group and partition

Key formulas:

Message = topic + partition + offset
Progress = group.id + topic + partition --> committed offset
Next message offset = committed offset + 1

Golden rules:

1 partition --> many offsets

1 offset --> 1 message

1 consumer group --> 1 committed offset per partition
kafka/offset.txt · Last modified: by phong2018