Part 4 - Understanding sparklyr deployment modes
August 30, 2017
RStudio recently announced sparklyr, a new open-source package that connects R to Spark through a full-fledged dplyr backend and supports the entirety of Spark’s MLlib. Because Spark can interact with distributed data with little latency, it is becoming an attractive tool for working with large datasets interactively. Beyond data handling, Spark also includes tools for stream processing, graph computation, and distributed machine learning. Some of these tools are available to R programmers through the sparklyr package.
In this four-part series, we’ll discuss how to leverage Spark’s capabilities in a modern R environment. The sparklyr Series:
Edgar Ruiz is a solutions engineer at RStudio with a background in deploying enterprise reporting and business intelligence solutions. He is the author of multiple articles and blog posts sharing analytics insights and server infrastructure for data science. Edgar is the author and administrator of the https://db.rstudio.com web site and the current administrator of the sparklyr web site, https://spark.rstudio.com. He is also a co-author of the dbplyr package and the creator of the dbplot, tidypredict, and modeldb packages.