Apache Spark 2: Data Processing and Real-Time Analytics

Idempotent sinks prevent data duplication

Another key to an end-to-end exactly-once delivery guarantee is an idempotent sink. Such a sink keeps track of which write operations have already succeeded. After a failure it can therefore re-request the data that has not yet been written, and it can drop data that is sent to it more than once.
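
To make the idea concrete, here is a minimal sketch of an idempotent sink in plain Python. It is not Spark's API: the class `IdempotentSink`, its methods, and the use of a batch id as the deduplication key are all assumptions made for illustration. The sink remembers which batches it has applied, ignores a batch that is delivered again, and reports the last committed batch so a caller would know where to resume after a failure.

```python
class IdempotentSink:
    """Toy in-memory sink (hypothetical) that applies each batch at most once."""

    def __init__(self):
        self.rows = []          # data that has been "written"
        self.committed = set()  # ids of batches already applied

    def write(self, batch_id, rows):
        """Apply a batch; return False if it had already been applied."""
        if batch_id in self.committed:
            return False        # duplicate delivery: drop it
        self.rows.extend(rows)
        self.committed.add(batch_id)
        return True

    def last_committed(self):
        """Highest batch id applied so far, or None if nothing was written."""
        return max(self.committed) if self.committed else None


sink = IdempotentSink()
sink.write(0, ["a", "b"])
sink.write(1, ["c"])
sink.write(1, ["c"])  # replay after a simulated failure: ignored
print(sink.rows)      # ['a', 'b', 'c']
```

In a real sink, the data and the record of the committed batch would presumably have to be stored atomically (for example in one transaction), otherwise a crash between the two steps could still produce a duplicate or a lost write.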