
State versioning guarantees consistent results after reruns
What about the state? Imagine that a machine learning algorithm maintains a count variable on all the workers. If you replay the exact same data twice, the same records are counted twice and the count ends up too high. Therefore, the query planner also maintains a versioned key-value map within the workers. The workers in turn persist their state to HDFS, which is fault tolerant by design.
So, in case of a failure, if data has to be replayed, the planner makes sure that the workers use the correct version of the key-value map.
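
To make the idea concrete, here is a minimal sketch in Python of how a versioned key-value map keeps a count correct across a rerun. It is not the actual implementation: the names `VersionedStateStore` and `process_batch` are hypothetical, and an in-memory dictionary stands in for the state that the workers would persist to HDFS. The only assumption is that each batch has an ID and always starts from the state version committed by the previous batch.

```python
import copy


class VersionedStateStore:
    """Toy versioned key-value map: one snapshot per committed batch.

    Hypothetical illustration only; a dict stands in for HDFS.
    """

    def __init__(self):
        # Version 0 is the empty state before any batch has been processed.
        self._snapshots = {0: {}}

    def load(self, version):
        # Return a private copy so an unfinished batch cannot corrupt a snapshot.
        return copy.deepcopy(self._snapshots[version])

    def commit(self, version, state):
        # Committing the same version again simply overwrites it,
        # which is what makes a rerun harmless.
        self._snapshots[version] = copy.deepcopy(state)


def process_batch(store, batch_id, records):
    """Count records per key, starting from the previous batch's state."""
    state = store.load(batch_id - 1)
    for key in records:
        state[key] = state.get(key, 0) + 1
    store.commit(batch_id, state)
    return state


store = VersionedStateStore()
process_batch(store, 1, ["a", "b", "a"])
first = process_batch(store, 2, ["a", "c"])

# Simulated failure: batch 2 is replayed. It starts again from the
# version-1 snapshot, so nothing is counted twice.
replayed = process_batch(store, 2, ["a", "c"])
```

Because the replayed batch loads version 1 rather than the latest state, `first` and `replayed` hold identical counts. Without versioning, the second run would add its counts on top of the first.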