The last year has seen phenomenal progress in the R APIs for Spark. RStudio’s sparkapi package and its associated dplyr backend, sparklyr, expose a far richer set of features for calling Spark packages from R than was possible with the original SparkR package. Microsoft has also released a new Spark API called RxSpark, which provides a highly performant set of machine learning algorithms and data processing functions for use in a Spark application, and which can be used in conjunction with sparklyr and other Spark APIs.
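As a rough illustration of the sparklyr workflow mentioned above, the sketch below connects to a local Spark instance, copies an R data frame into Spark, and runs dplyr verbs that are translated to Spark SQL (it assumes a local Spark installation is available; the table name is illustrative):

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")   # connect to a local Spark instance
mtcars_tbl <- copy_to(sc, mtcars)       # copy an R data frame into Spark

# dplyr verbs on a Spark table are translated to Spark SQL
# and executed inside Spark, not in R
mtcars_tbl %>%
  filter(cyl == 8) %>%
  summarise(avg_mpg = mean(mpg))

spark_disconnect(sc)
```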
In this talk, Ali will share lessons learned from using R with Spark. In particular, he’ll show how to write reproducible documents with Spark that take advantage of R Markdown’s caching mechanism and RxSpark’s `persistentRun` feature, as well as how to develop Shiny applications that work reactively with Spark DataFrames and RxSpark algorithms.
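To give a flavor of the combination described above: knitr chunks in an R Markdown document can be marked with the `cache=TRUE` chunk option so expensive Spark computations are not re-run on every knit, while the `persistentRun` option of the RxSpark compute context keeps the Spark application alive between RevoScaleR calls. A minimal sketch, assuming Microsoft R Server's RevoScaleR package and a configured Spark cluster:

```r
library(RevoScaleR)

# Sketch: create an RxSpark compute context that persists the Spark
# application across calls, avoiding startup cost on each computation
# (cluster connection details are assumed/omitted here)
cc <- RxSpark(persistentRun = TRUE)
rxSetComputeContext(cc)
```

Inside an R Markdown document, the chunk running this code would carry `cache=TRUE` in its header, so the cached results and the persistent Spark application work together to make re-knits fast.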