How to create an idempotent Kafka consumer in Spring

When working with Apache Kafka, idempotent message processing is crucial for maintaining data consistency and preventing subtle bugs. Even if a message is processed multiple times, the outcome must remain the same. This is particularly vital in distributed systems, where duplicate messages can occur.

Before we jump into the implementation details, we need to cover a few terms that belong in every solid Kafka developer's toolkit.

Idempotency is used in mathematics to describe a function that produces the same result if it is applied to itself, i.e. f(x) = f(f(x)). In Messaging this concept translates into a message that has the same effect whether it is received once or multiple times. This means that a message can safely be resent without causing any problems even if the receiver receives duplicates of the same message.

https://www.enterpriseintegrationpatterns.com/patt...

Consider a concrete example.

Assume you enter an elevator and want to select the floor you want to stop at. You press the button for the 5th floor.

  1. The first press of the 5th-floor button causes the elevator to start moving to the 5th floor.
  2. The second press of the 5th-floor button does nothing – the elevator is still going to the 5th floor, as it's already planned to stop there.
  3. The third press – again, nothing changes, the elevator continues to go to the 5th floor.

You can see that pressing the 5th-floor button multiple times does not change the outcome: the elevator will still stop at the 5th floor. The operation is idempotent because, no matter how many times the button is pressed, the elevator will always stop at the same floor.
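The elevator can be modelled in a few lines of Java. This is only an illustrative sketch: the `ElevatorPanel` class and its method names are hypothetical and exist solely to mirror the example above.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the elevator panel from the example above.
class ElevatorPanel {
    // Floors the elevator is already scheduled to stop at.
    private final Set<Integer> scheduledStops = new TreeSet<>();

    // Pressing a button is idempotent: adding a floor that is already
    // scheduled leaves the set, and therefore the outcome, unchanged.
    void press(int floor) {
        scheduledStops.add(floor);
    }

    // Read-only view of the planned stops.
    Set<Integer> scheduledStops() {
        return Collections.unmodifiableSet(scheduledStops);
    }
}
```

Calling `press(5)` once or three times leaves `scheduledStops()` with the single entry 5, which is exactly the property we want from a message handler.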

Kafka message delivery semantics define how many times each message stored in a Kafka topic will be delivered to your application. In an ideal world, each message would be delivered exactly once, and you may have heard about Apache Kafka's exactly-once processing guarantee, but let's focus on the more common scenarios:
  • At least once - Messages are never lost but may be redelivered.
  • At most once - Messages may be lost but are never redelivered.

Since most of the applications we work with cannot afford to lose any data, we can safely disregard the at-most-once guarantee and concentrate on at-least-once delivery.

https://kafka.apache.org/08/do...
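One way to see the difference between the two guarantees is to look at where a consumer commits its offset relative to processing. The following is a simplified simulation, not Kafka client code: the `DeliverySemanticsDemo` class, the `Mode` enum, and the single simulated crash are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical simulation of a consumer that crashes once.
class DeliverySemanticsDemo {
    enum Mode { AT_MOST_ONCE, AT_LEAST_ONCE }

    // Replays the log through a consumer that crashes exactly once, between
    // its two steps (commit and process), while handling offset crashIndex.
    // Returns the messages that were actually processed, in order.
    static List<String> consume(List<String> log, Mode mode, int crashIndex) {
        List<String> processed = new ArrayList<>();
        int committed = 0;        // next offset to read after a restart
        boolean crashed = false;  // the crash happens only once
        while (committed < log.size()) {
            int offset = committed;
            boolean crashNow = !crashed && offset == crashIndex;
            if (mode == Mode.AT_MOST_ONCE) {
                committed = offset + 1;           // commit first
                if (crashNow) {                   // crash before processing:
                    crashed = true;               // the message is lost
                    continue;
                }
                processed.add(log.get(offset));
            } else {
                processed.add(log.get(offset));   // process first
                if (crashNow) {                   // crash before commit:
                    crashed = true;               // the message is redelivered
                    continue;
                }
                committed = offset + 1;
            }
        }
        return processed;
    }
}
```

For a log of a, b, c with the crash at offset 1, the at-most-once run processes a, c (b is lost), while the at-least-once run processes a, b, b, c (b is redelivered).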

And finally, it's time to connect the dots:

We know that every Kafka message may be redelivered, so we need to make our business processes idempotent.

How do we do it? Each time we process a Kafka message, we check some persistent storage to see whether the message has already been processed. If not, we process the message and then save its ID. This ensures that, in case of a redelivery, the message will not be processed twice.
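The check-process-save flow described above can be sketched in plain Java. This is a minimal illustration under stated assumptions: `IdempotentProcessor` is a hypothetical name, the in-memory `Set` stands in for persistent storage, and messages are assumed to carry a unique string ID.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical wrapper that makes a piece of business logic idempotent.
class IdempotentProcessor {
    // Stand-in for persistent storage; a real consumer would use a database
    // so that processed IDs survive restarts.
    private final Set<String> processedIds = new HashSet<>();
    private final Consumer<String> businessLogic;

    IdempotentProcessor(Consumer<String> businessLogic) {
        this.businessLogic = businessLogic;
    }

    // Returns true if the payload was processed, false if it was a duplicate.
    boolean handle(String messageId, String payload) {
        if (processedIds.contains(messageId)) {
            return false;                 // redelivery: skip the business logic
        }
        businessLogic.accept(payload);    // process the message
        processedIds.add(messageId);      // then remember its ID
        return true;
    }
}
```

Calling `handle("42", payload)` twice runs the business logic only once; the second call returns false.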

Implementation

To begin, let's create a database table specifically designed to store all events the application has processed.
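As a rough sketch of what such a table and its access code might look like with plain JDBC: the table name `processed_events`, its columns, and the `ProcessedEventsTable` helper are assumptions for illustration, not the schema from the linked repository. The primary key on the event ID means the same event cannot be recorded twice.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical JDBC helper around an assumed processed_events table.
class ProcessedEventsTable {
    // The primary key prevents the same event ID from being stored twice.
    static final String CREATE_TABLE =
            "CREATE TABLE processed_events ("
            + "event_id VARCHAR(255) PRIMARY KEY, "
            + "processed_at TIMESTAMP NOT NULL)";
    static final String INSERT_EVENT =
            "INSERT INTO processed_events (event_id, processed_at) "
            + "VALUES (?, CURRENT_TIMESTAMP)";
    static final String FIND_EVENT =
            "SELECT 1 FROM processed_events WHERE event_id = ?";

    static void createTable(Connection connection) throws SQLException {
        try (Statement statement = connection.createStatement()) {
            statement.executeUpdate(CREATE_TABLE);
        }
    }

    // Records that the event with the given ID has been processed.
    static void markProcessed(Connection connection, String eventId) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(INSERT_EVENT)) {
            statement.setString(1, eventId);
            statement.executeUpdate();
        }
    }

    // Returns true if the event ID is already present in the table.
    static boolean isProcessed(Connection connection, String eventId) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(FIND_EVENT);
        ) {
            statement.setString(1, eventId);
            try (ResultSet resultSet = statement.executeQuery()) {
                return resultSet.next();
            }
        }
    }
}
```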

Next, implement RecordFilterStrategy to query the database and filter out events that have already been processed.
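The shape of such a filter is sketched below. To keep the example runnable without Spring on the classpath, it declares a small stand-in interface, `RecordFilter`, that mimics Spring Kafka's `RecordFilterStrategy`: a single `filter` method that returns true when the record should be discarded. `EventRecord`, `ProcessedEventStore`, and `AlreadyProcessedFilter` are likewise hypothetical names; in a real application the filter would receive a Kafka `ConsumerRecord`, and where the event ID lives (key, header, or payload) is an application-specific choice.

```java
// Stand-in for Spring Kafka's RecordFilterStrategy: returning true
// means "discard this record", false means "pass it to the listener".
interface RecordFilter<R> {
    boolean filter(R record);
}

// Hypothetical minimal record carrying a unique event ID and a payload.
class EventRecord {
    final String eventId;
    final String payload;

    EventRecord(String eventId, String payload) {
        this.eventId = eventId;
        this.payload = payload;
    }
}

// Hypothetical lookup, assumed to be backed by the processed-events table.
interface ProcessedEventStore {
    boolean contains(String eventId);
}

// Discards every record whose event ID is already in the store.
class AlreadyProcessedFilter implements RecordFilter<EventRecord> {
    private final ProcessedEventStore store;

    AlreadyProcessedFilter(ProcessedEventStore store) {
        this.store = store;
    }

    @Override
    public boolean filter(EventRecord event) {
        return store.contains(event.eventId); // true = already processed, discard
    }
}
```

Note the inverted reading of the return value: the filter answers "should this record be dropped?", so an unseen event yields false and reaches the business logic.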

Finally, consume Kafka records with the filter configured and execute the business logic.

The full source code is available at github.com/pszymczyk/spring-kafka-idempotent-consumer