
True unification - same code, same engine
A continuous application could also be implemented on top of RDDs and DStreams, but that would require two different APIs. In Apache Spark Structured Streaming, the APIs are unified. This unification is achieved by treating a structured stream as an unbounded relational table to which new data is continuously appended at the bottom. In batch processing with the relational DataFrame API or SQL, intermediate DataFrames are created; because stream and batch computing are unified on top of the Apache Spark SQL engine, working with structured streams creates intermediate unbounded relational tables in the same way.
It is important to note that static and incremental data can be mixed (joined) within the same query. Such a query forms a continuous application: an application that takes both static and dynamic data into account and never stops, producing output all the time, or at least whenever new data arrives. A continuous application does not necessarily need access to static data; it can also process streaming data alone. One example of using static data on a stream is receiving GPS locations as a stream and matching those locations against addresses stored in persistent storage. The output of such an operation is a stream of addresses.