exactly once
Some notes I took while watching the Introducing Exactly Once Semantics talk, presented by Apurva Mehta of Confluent.
1) Idempotent Producer
producer.send will always lead to exactly one copy in the log - not in the consumer - you can still consume a message multiple times
Done with configuration:
enable.idempotence = true
max.in.flight.requests.per.connection = 1
acks = all
retries = MAX_INT
How is this done? Metadata on each message.
- producer ID (assigned by the broker)
- sequence ID for the message - producer and partition leader agree on it. Valid for the producer session only.
- this metadata is kept in the log (enabling resilience around changing leaders for example)
- If the producer doesn't get an ack, it resends the message. If the broker has already processed the message, it just sends an ack without writing it to the log
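The de-duplication described in these bullets can be sketched as a toy model. This is only an illustration under stated assumptions, not Kafka's actual broker code: the class `IdempotentLogSketch` and its methods are hypothetical names, the log is a single in-memory partition, and the rejection of out-of-order sequence numbers is my own simplification.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of broker-side de-duplication; hypothetical, not Kafka's implementation. */
final class IdempotentLogSketch {
    // Messages appended to this single, in-memory partition, in offset order.
    private final List<String> log = new ArrayList<>();
    // Highest sequence ID already written, keyed by producer ID.
    private final Map<Long, Integer> lastSequence = new HashMap<>();

    /** Handles one produce request; returns true when the broker acknowledges it. */
    boolean append(long producerId, int sequence, String value) {
        int last = lastSequence.getOrDefault(producerId, -1);
        if (sequence <= last) {
            return true;  // duplicate from a retried send: ack, but do not write again
        }
        if (sequence != last + 1) {
            return false; // gap in the sequence (assumed behaviour): reject the request
        }
        log.add(value);
        lastSequence.put(producerId, sequence);
        return true;
    }

    /** Returns a copy of the log contents in offset order. */
    List<String> contents() {
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        IdempotentLogSketch partition = new IdempotentLogSketch();
        partition.append(7L, 0, "a");
        partition.append(7L, 1, "b");
        partition.append(7L, 1, "b"); // retry after a lost ack
        System.out.println(partition.contents()); // prints [a, b]
    }
}
```

The retried send is acknowledged like the first one, so the producer cannot tell the difference, but the log still holds a single copy.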
2) Transaction API
New components in the Kafka ecosystem
- Transaction coordinator - maintains transaction state on a per-producer basis. Runs within broker.
- Transaction log - persists the state
Producer side:
producer.initTransactions()
// register transactional.id with the transaction coordinator component
// resolve outstanding transactions before accepting new messages
// done once when producer comes online
producer.beginTransaction()
// a producer can be part of only one transaction at a time
producer.send()
// partitions are registered with the Transaction coordinator
// data is written to the log as usual
producer.commitTransaction()
// two phase commit - 1) producer sends a commit RPC to the Transaction coordinator
// 2) Transaction coordinator writes transaction "commit markers" to the data logs
Consumers can read_committed or read_uncommitted, which is set with the isolation.level configuration.
Messages are still in offset order. Transaction order can differ but is invisible to consumers.
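To make the interplay of commit markers, isolation level and offset order concrete, here is a minimal sketch of a single in-memory partition. Everything in it is an assumption for illustration: `TransactionLogSketch`, `send`, `commit` and `read` are hypothetical names, aborts are not modelled, and the rule that a read_committed consumer stops at the first record of a still-open transaction is my understanding of Kafka's consumer documentation rather than something stated in the talk notes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of commit markers in a data log; hypothetical, not Kafka's implementation. */
final class TransactionLogSketch {
    private static final class Entry {
        final String transactionalId;
        final String value; // null means this entry is a commit marker, not a data record

        Entry(String transactionalId, String value) {
            this.transactionalId = transactionalId;
            this.value = value;
        }
    }

    private final List<Entry> log = new ArrayList<>();

    /** Data is written to the log as usual, before the transaction is committed. */
    void send(String transactionalId, String value) {
        log.add(new Entry(transactionalId, value));
    }

    /** Stands in for the coordinator writing a commit marker to the data log. */
    void commit(String transactionalId) {
        log.add(new Entry(transactionalId, null));
    }

    /** Offset of the first record of the earliest open transaction, or the log end. */
    private int firstOpenOffset() {
        Map<String, Integer> open = new HashMap<>();
        for (int offset = 0; offset < log.size(); offset++) {
            Entry entry = log.get(offset);
            if (entry.value == null) {
                open.remove(entry.transactionalId);
            } else {
                open.putIfAbsent(entry.transactionalId, offset);
            }
        }
        int limit = log.size();
        for (int offset : open.values()) {
            limit = Math.min(limit, offset);
        }
        return limit;
    }

    /** Returns the values a consumer would see, always in offset order. */
    List<String> read(boolean readCommitted) {
        int limit = readCommitted ? firstOpenOffset() : log.size();
        List<String> visible = new ArrayList<>();
        for (int offset = 0; offset < limit; offset++) {
            String value = log.get(offset).value;
            if (value != null) {
                visible.add(value);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        TransactionLogSketch partition = new TransactionLogSketch();
        partition.send("A", "a1");
        partition.send("B", "b1");
        partition.commit("B");
        partition.send("A", "a2");
        System.out.println(partition.read(false)); // prints [a1, b1, a2]
        System.out.println(partition.read(true));  // prints []
        partition.commit("A");
        System.out.println(partition.read(true));  // prints [a1, b1, a2]
    }
}
```

In the demo, B commits before A, yet the read_committed view comes back as a1, b1, a2: the order of commits never shows up in what the consumer reads.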