Apache Spark 2：Data Processing and Real-Time Analytics

上QQ阅读APP看书，第一时间看更新

State versioning guarantees consistent results after reruns

What about the state? Imagine that a machine learning algorithm maintains a count variable on all the workers. If you replay the exact same data twice, you will end up counting the data multiple times. Therefore, the query planner also maintains a versioned key-value map within the workers, which are persisting their state in turn to HDFS--which is by design fault tolerant.

So, in case of a failure, if data has to be replaced, the planner makes sure that the correct version of the key-value map is used by the workers.

本周热推：

C语言程序设计教程（第5版）Python 3.8从零开始学写给程序员的Python教程产品架构评估原理与方法剑指Java：核心原理与应用实践