Here’s a docker-compose file to get up and running quickly with Confluent Platform 4.0, which is the latest version as of this writing.
Copy the following content into a file called docker-compose.yml.
Then execute the following command from the same directory: docker-compose up -d
*Note: this docker-compose file uses some nifty settings that enable traffic to Kafka from both inside and outside the Docker network.*
Now let’s create a compacted topic. Here I am fiddling with values such as delete.retention.ms and min.cleanable.dirty.ratio just to prove the point for the example. For production these values would have to be tuned.
Execute this from the terminal:
Now let’s send a few messages to the topic, using the same key for each message.
Read more here about how a Kafka background process decides which records with duplicate keys to remove from the log:
Log compaction is handled by the log cleaner, a pool of background threads that recopy log segment files, removing records whose key appears in the head of the log.
Sending keyed messages requires us to pass special properties using the --property flag of the Kafka Console Producer:
parse.key=true
key.separator=:
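To illustrate what these two properties do, here is a small Python sketch of the splitting step. It is not the console producer's code: the function name `parse_keyed_line` and the sample line are made up for illustration, and it assumes each line is split at the first occurrence of the separator.

```python
def parse_keyed_line(line, key_separator=":"):
    """Split one input line into (key, value) at the first separator.

    Hypothetical helper that mimics parse.key=true with key.separator=":".
    """
    key, found, value = line.partition(key_separator)
    if not found:
        # Assumption: a line without the separator is treated as an error.
        raise ValueError(f"no key separator {key_separator!r} in line {line!r}")
    return key, value


# Sample line in the assumed "key:value" shape.
print(parse_keyed_line("key1:message 9"))  # ('key1', 'message 9')
```

Because only the first separator counts in this sketch, a value such as a timestamp with colons in it would stay intact.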
Now let’s check out the results:
I see:
Our bash for loop ran 10 times but only entries 9 and 10 are present on the topic. Messages 1 - 8 have been cleaned up by the log cleaner.
I was expecting to see ONLY the latest message (not the last two).
Well, this is only the “hello world” of Log Compaction and the settings would need tuning based on more research and testing to ensure a reliable cache.
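One plausible reading of the result above, offered as a toy model and not as Kafka's actual implementation: the cleaner only recopies closed segments, while the segment currently being written (the active segment) is left alone. The Python sketch below assumes, hypothetically, that messages 1 through 9 had landed in closed segments and message 10 was still in the active segment when the cleaner ran. The function name `compact` and the key `key1` are invented for illustration.

```python
def compact(closed, active):
    """Toy model of a single cleaner pass over (key, value) records.

    closed: records in closed segments, oldest first (eligible for cleaning).
    active: records in the active segment (assumed to be left untouched).
    """
    # Last position at which each key appears within the closed segments only.
    latest = {key: pos for pos, (key, _value) in enumerate(closed)}
    # Recopy the closed segments, keeping only the newest record per key.
    kept = [rec for pos, rec in enumerate(closed) if latest[rec[0]] == pos]
    # The active segment is appended unchanged, so it can repeat a key.
    return kept + list(active)


# Ten messages with the same key; the 9/1 split between closed and
# active segments is an assumption, not something measured.
messages = [("key1", f"message {i}") for i in range(1, 11)]
survivors = compact(messages[:9], messages[9:])
print(survivors)  # [('key1', 'message 9'), ('key1', 'message 10')]
```

Under that assumption, one record per key survives in the cleaned portion and another copy can still sit in the active segment, which would account for seeing two entries instead of one.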
But my reaction for now: pause and consider that any application using a compacted Kafka topic as a cache may read the topic and see the same key twice (this is what happened in the example above).
But that is topic-tuning and some unit tests away. Very cool to see the potential of using Kafka as a distributed systems cache.
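On the reading side, one way to tolerate a repeated key is to replay the topic into a map so that later records overwrite earlier ones. The Python sketch below is a minimal illustration working on plain (key, value) tuples in offset order, not a real consumer. The name `materialize` is hypothetical, and treating a `None` value as a delete marker (tombstone) is an assumption about how the application wants to handle deletes.

```python
def materialize(records):
    """Build a key -> latest value view from records in offset order."""
    cache = {}
    for key, value in records:
        if value is None:
            # Assumption: a record with no value marks the key as deleted.
            cache.pop(key, None)
        else:
            # A later record for the same key replaces the earlier one.
            cache[key] = value
    return cache


# The two surviving records from the example, both under an assumed key.
print(materialize([("key1", "message 9"), ("key1", "message 10")]))
# {'key1': 'message 10'}
```

With this approach the reader gets the same view whether or not the cleaner has already removed the older record.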
Unanswered: ZooKeeper provides reentrant locking on znodes, preventing cache-update race conditions. How would the group arrive at a consensus with no guarantee of resource locking?