
True unification - same code, same engine
A continuous application could also be implemented on top of RDDs and DStreams, but that would require two different APIs. In Apache Spark Structured Streaming, the APIs are unified. This unification is achieved by treating a structured stream as an unbounded relational table to which new data is continuously appended at the bottom. In batch processing with the relational DataFrame API or SQL, intermediate DataFrames are created; because stream and batch computing are unified on top of the Apache Spark SQL engine, working with structured streams creates intermediate unbounded relational tables in the same way.
It is important to note that static and incremental data can be mixed (joined) within the same query. Such a query forms a continuous application: an application that takes both static and dynamic data into account and never stops, producing output all the time, or at least whenever new data arrives. A continuous application does not necessarily need access to static data; it can also process streaming data alone. One example of using static data on a stream is receiving GPS locations as a stream and matching those locations against addresses stored in persistent storage. The output of such an operation is a stream of addresses.