Some notes I took while watching Introducing Exactly Once Semantics talk, presented by Apurva Mehta of Confluent.

1) Idempotent Producer

  • producer.send will always lead to one copy in the log
  • not in the consumer - you can still consume a message multiple times

Done with configuration:

enable.idempotence = true
max.inflight.requests.per.connection = 1
acks = all
retries = MAX_INT

How is this done? Metadata on each message.

  • producer ID (assigned by the broker)
  • sequence ID for the message - producer and topic leader agree. valid for the producer session only.
  • this metadata is kept in the log (enabling resilience around changing leaders for example)
  • If producer doesn’t get an ack, it resends the message. If broker has already processed the message, it just sends an ack without writing it to the log

2) Transaction API

New components in the Kafka ecosystem

  • Transaction coordinator - maintains transaction state on a per-producer basis. Runs within broker.
  • Transaction log - persists the state

Producer side:

producer.initTransactions()
  // register transactional.id with the transaction coordinator component
  // resolve outstanding transactions before accepting new messages
  // done once when producer comes online
producer.beginTransaction()
  // a producer can be part of only one transaction at a time   
producer.send()
  // partitions are registered with the Transaction coordinator
  // data is written to the log as usual
producer.commitTransaction()
  // two phase commit - 1) producer sends a commit RPC to the Transaction coordinator  
  // 2) Transaction coordinator writes transaction "commit markers" to the data logs

Consumers can read_committed or read_uncommitted which is a configuration

Messages are still in offset order. Transaction order can differ but is invisible to consumers.