By   December 9, 2017

Purging Data in a Kafka Topic

If you have been working with Kafka for some time, you have probably needed to purge the data in a particular topic. In a dev environment, you might have worked around it by simply publishing to and consuming from a different topic. But there are cases where you really do want to purge the data from a topic. Some of them are as follows:

  • You just released a new topic, and it already contains messages generated by buggy code. Wrong calculations have produced incorrect data that needs to be wiped out.
  • It’s a microservice-based environment and you have no control over the consumers. They have been updated and released but haven’t subscribed yet. Meanwhile, you have published some data that you are not so proud of and want the topic cleaned up.
  • You are fed up with creating a new topic every time you need to consume, or you just believe that every time you create a new topic, a kangaroo dies in Australia.

Fortunately, Kafka doesn’t have to keep data forever. In fact, by default it automatically purges data after a retention period of 7 days. But we can always change that. To purge a topic, we can temporarily lower its retention (the topic-level retention.ms setting) to 1000 milliseconds.
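As a rough sketch of the retention change described above, the snippet below only builds the kafka-configs.sh command line and prints it. The topic name "my-topic", the broker address localhost:9092, and the helper name retention_override_command are assumptions made for illustration. Older Kafka releases expect --zookeeper with a ZooKeeper address instead of --bootstrap-server, so adjust it to your installation.

```python
def retention_override_command(topic, retention_ms,
                               bootstrap_server="localhost:9092",
                               script="kafka-configs.sh"):
    """Build the kafka-configs.sh call that sets retention.ms on one topic.

    The defaults (broker address, script name on PATH) are assumptions.
    """
    return [
        script,
        "--bootstrap-server", bootstrap_server,
        "--alter",
        "--entity-type", "topics",
        "--entity-name", topic,
        "--add-config", f"retention.ms={retention_ms}",
    ]


if __name__ == "__main__":
    # "my-topic" is a placeholder; use the topic you want to purge.
    cmd = retention_override_command("my-topic", 1000)
    # Only print the command; nothing is sent to a broker here.
    print(" ".join(cmd))
```

To actually apply the change, the printed command can be pasted into a shell, or the list can be handed to subprocess.run(cmd, check=True) on a machine that has the Kafka scripts on its PATH.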

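Now just try to consume the data from the topic. I have noticed that Kafka doesn’t purge the data the moment the 1000 ms have passed; at that point the data merely becomes eligible for purging. The broker only checks for expired data periodically (every 5 minutes by default, controlled by log.retention.check.interval.ms), so the data is actually purged within a few minutes.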
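One way to check whether the purge has happened yet is to read the topic from the beginning with the console consumer and let it give up after a while. The sketch below just assembles that command. The topic name, broker address, 10-second timeout, and helper name drain_check_command are all assumptions for illustration.

```python
def drain_check_command(topic, timeout_ms=10000,
                        bootstrap_server="localhost:9092",
                        script="kafka-console-consumer.sh"):
    """Build a console-consumer call that reads a topic from its start.

    --timeout-ms makes the consumer exit when no message arrives within
    the given interval instead of waiting forever.
    """
    return [
        script,
        "--bootstrap-server", bootstrap_server,
        "--topic", topic,
        "--from-beginning",
        "--timeout-ms", str(timeout_ms),
    ]


if __name__ == "__main__":
    # "my-topic" is a placeholder topic name.
    print(" ".join(drain_check_command("my-topic")))
```

If the command prints no messages before it times out, the old data is gone. If messages still show up, wait a few minutes and try again.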

If you start a consumer that reads from the beginning of the topic and it keeps waiting without receiving any messages, the topic is empty. Once it is, update the configuration again to set a larger retention value. Or you can just use Kafka Topics UI.
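Putting the retention back can be sketched the same way. There are two options: set retention.ms to an explicit value again (7 days works out to 604,800,000 ms), or delete the topic-level override so the topic falls back to the broker default. The topic name, broker address, and helper names below are assumptions for illustration. Older Kafka releases take --zookeeper instead of --bootstrap-server.

```python
# 7 days expressed in milliseconds: days * hours * minutes * seconds * ms
SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000


def _base_command(topic, bootstrap_server, script):
    """Shared kafka-configs.sh arguments for altering one topic."""
    return [
        script,
        "--bootstrap-server", bootstrap_server,
        "--alter",
        "--entity-type", "topics",
        "--entity-name", topic,
    ]


def restore_retention_command(topic, retention_ms=SEVEN_DAYS_MS,
                              bootstrap_server="localhost:9092",
                              script="kafka-configs.sh"):
    """Set retention.ms back to an explicit, larger value."""
    return _base_command(topic, bootstrap_server, script) + [
        "--add-config", f"retention.ms={retention_ms}",
    ]


def drop_override_command(topic, bootstrap_server="localhost:9092",
                          script="kafka-configs.sh"):
    """Remove the topic-level override so the broker default applies."""
    return _base_command(topic, bootstrap_server, script) + [
        "--delete-config", "retention.ms",
    ]


if __name__ == "__main__":
    # "my-topic" is a placeholder topic name.
    print(" ".join(restore_retention_command("my-topic")))
    print(" ".join(drop_override_command("my-topic")))
```

Deleting the override is usually the tidier choice if the topic had no custom retention before the purge, since it leaves no leftover per-topic setting behind.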