To stop duplicate video edits in our pipeline, we give every job a unique key and track its status in a database. We also use Kafka's idempotent producer. This setup has almost completely eliminated accidental re-processing. If you're new to streaming, using consistent event keys and built-in idempotence features is a great first step.
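The job-key approach above can be sketched in a few lines. This is a minimal illustration, assuming a SQLite status table (the table and function names are made up for the example, not their actual schema):

```python
import sqlite3

# One row per job, keyed by the unique job key. A retried submission
# hits the primary-key constraint instead of creating a second job.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE jobs (job_key TEXT PRIMARY KEY, status TEXT)")

def submit_job(job_key):
    """Insert the job once; a retry with the same key is a no-op."""
    try:
        with db:
            db.execute("INSERT INTO jobs VALUES (?, 'pending')", (job_key,))
        return True   # first time we've seen this key: go process it
    except sqlite3.IntegrityError:
        return False  # duplicate submission: skip

print(submit_job("edit-video-42"))  # True
print(submit_job("edit-video-42"))  # False
```

The database's uniqueness constraint does the dedup work, so it stays correct even with concurrent submitters.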
Here's what we did for exactly-once processing in Kafka at CLDY. We combined a transactional outbox pattern with strong key deduplication. Every event gets a unique ID, which makes retries safe, and the outbox writes the business change and the event record in the same atomic transaction. Now message duplication is rare, even when systems crash. I'd pair this with good monitoring so you can catch any weird issues quickly.
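A rough sketch of that outbox write, with SQLite standing in for the real database (the `create_invoice` name and schema are illustrative, not CLDY's actual code):

```python
import json
import sqlite3
import uuid

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE invoices (id TEXT PRIMARY KEY, amount REAL);
CREATE TABLE outbox (event_id TEXT PRIMARY KEY, payload TEXT);
""")

def create_invoice(invoice_id, amount):
    """Write the invoice and its outbound event in one atomic transaction."""
    event_id = str(uuid.uuid4())  # unique event ID makes downstream retries safe
    with db:  # both inserts commit together, or neither does
        db.execute("INSERT INTO invoices VALUES (?, ?)", (invoice_id, amount))
        db.execute(
            "INSERT INTO outbox VALUES (?, ?)",
            (event_id, json.dumps({"invoice_id": invoice_id, "amount": amount})),
        )
    return event_id

create_invoice("inv-1", 99.0)
```

A separate relay process then drains the outbox table into Kafka; if the business insert fails, no event row exists, so nothing phantom is ever published.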
Here's how I handle exactly-once processing in Kafka. I use a transactional outbox plus deduplication by unique event keys. The outbox guarantees the event is recorded atomically with the state change, and the dedup keys make retries safe, so even if the network hiccups you don't double-process downstream. We started doing this for our graph update workflows at Superpencil and the double-processing problem basically vanished. Honestly, just focus on clear key schemas and turn on Kafka's idempotent producer feature. It makes things way easier.
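Turning on the idempotent producer is mostly a config change. Here's roughly what it looks like with the confluent-kafka Python client (the broker address is a placeholder; check your client's docs for the exact knobs it supports):

```python
# Idempotent-producer config sketch for confluent-kafka (librdkafka).
# With idempotence on, the broker de-duplicates retried batches, so
# aggressive retries can't produce duplicate writes to the log.
producer_config = {
    "bootstrap.servers": "localhost:9092",  # placeholder broker address
    "enable.idempotence": True,
    "acks": "all",        # required for idempotence to hold
    "retries": 2147483647  # retries are now safe to max out
}
# confluent_kafka.Producer(producer_config) would then be constructed as usual.
```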
At Apps Plus, we had to stop duplicate user events from messing up our Kafka streams. We started giving every action a unique client-side ID, then kept a short-term cache of those IDs to filter out retries. That was it. No more double-counting usage or sending repeat notifications. It made our user tracking way more accurate without making things complicated.
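The client-ID cache above can be sketched like this. The TTL value is illustrative (theirs would be tuned to the client's retry window), and the function names are made up for the example:

```python
import time
import uuid

SEEN_TTL_SECONDS = 300.0
_seen = {}  # event_id -> expiry time (a short-term in-process cache)

def is_duplicate(event_id, now=None):
    """Return True if this client-side ID was already seen within the TTL."""
    now = time.monotonic() if now is None else now
    # evict expired entries so the cache stays short-term and bounded
    for key in [k for k, expiry in _seen.items() if expiry <= now]:
        del _seen[key]
    if event_id in _seen:
        return True
    _seen[event_id] = now + SEEN_TTL_SECONDS
    return False

event_id = str(uuid.uuid4())  # assigned client-side, reused on retry
print(is_duplicate(event_id))  # False: first delivery, count it
print(is_duplicate(event_id))  # True: a retry, filter it out
```

Because the client reuses the same ID on retry, the server-side filter needs no coordination with the client beyond that one field.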
One pattern we rely on is combining a transactional outbox on the producer side with idempotent consumers. The principle is that the change to business state and the creation of the outbound event occur inside the same atomic database transaction, so an event is only queued for delivery if the core business operation (creating an invoice, say) actually succeeded. On the consumer side, every message carries a unique identifier, and the consumer maintains a durable list of the message IDs it has already seen. Before running its logic, it checks whether the incoming ID is in that list; if so, it simply acknowledges and discards the message. This two-part mechanism keeps the producer's delivery guarantee decoupled from the consumer's process-once guarantee, and it gracefully handles the realities of distributed systems (service restarts, inevitable network retries) without them cascading into duplicate business transactions.
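The idempotent-consumer half of that can be sketched as follows, with SQLite standing in for the durable seen-ID store (the `handle` and `apply_business_logic` names are illustrative):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed (message_id TEXT PRIMARY KEY)")

def apply_business_logic(payload):
    print("processed", payload)  # placeholder for the real handler

def handle(message_id, payload):
    """Process a message at most once; duplicates are acked and dropped."""
    try:
        with db:
            # Recording the ID and running the business logic share one
            # transaction: if the logic throws, the ID record rolls back
            # too, so a redelivery will be processed rather than skipped.
            db.execute("INSERT INTO processed VALUES (?)", (message_id,))
            apply_business_logic(payload)
        return True
    except sqlite3.IntegrityError:
        return False  # already seen this ID: acknowledge and discard
```

The key design point is that the seen-ID check and the business write commit together; a dedup cache that lives outside the transaction can still let a crash-and-retry slip through.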