Organizations of all sizes use cloud services for data science to mitigate challenges such as:

  • Long delays and high startup costs for new projects and new data science teams
  • Obstacles to collaboration between organizations or groups
  • High costs of computing infrastructure, including hardware, software and manpower
  • Difficulty scaling to meet variable demand
  • Excessive time and cost spent moving data to the analysis

Choose your Cloud Strategy based on your Data Science needs

Depending on the circumstances of your organization and what specific challenges you are trying to address, there are multiple cloud options to consider:

  • Hosted and Software as a Service (SaaS) offerings:
    A fully hosted service, such as RStudio Cloud, can minimize the cost and time required to start up a new project, workshop or class.
  • Deployment to a Virtual Private Cloud (VPC) provider:
    Deploying software on a major cloud platform such as Amazon Web Services (AWS) or Azure can provide the full flexibility and customization of on-premises software.
  • Cloud marketplace offerings:
    Pre-built applications offered on services such as the AWS and Azure Marketplaces make it easy to get started at a pay-as-you-go hourly cost, but require careful management to ensure the software is available and running only when needed.
  • Fully managed services:
    These offerings, such as RStudio on Amazon SageMaker, provide the convenience and scalability of the cloud, while offloading the maintenance and administration to the cloud provider or a third party.
  • Data science in your data lake:
    Embedding your data science tools in your existing data platform lets you run computations close to the data, minimize overhead, and tie easily into your data pipeline.


RStudio supports your Data Science Cloud Strategy

Regardless of which approach you choose, RStudio provides multiple options to support your cloud journey:

Simplify and reduce startup costs with a SaaS solution:

Promote collaboration and instruction between organizations and groups

Mitigate high costs of computing infrastructure

Scale to meet variable demand. In addition to the above options (marketplaces, fully managed services, VPCs, Docker and cloud storage), RStudio's professional products provide specific functionality to help:

  • RStudio Workbench's Launcher integrates with Kubernetes, an industry-standard container orchestration system that allows efficient scaling
  • RStudio provides Helm charts to help you manage your Kubernetes configurations
  • RStudio Connect provides many options to scale and tune performance, including running as part of an autoscaling group. These options allow Connect to deliver dashboards, Shiny applications, and other types of content to large numbers of users

Minimize data movement. By running your computations close to the data, you can minimize overhead and tie your data science directly into your data pipelines:

  • Run your data science tools on your cloud provider, whether through marketplaces, fully managed services or VPCs as listed above, to help minimize data movement
  • Native R interface to Spark: sparklyr lets you easily filter and aggregate Spark datasets and streams, then bring them into R for analysis and visualization. You can also use it to train models at scale and productionize machine learning pipelines in Spark. Learn more at spark
  • Connect to cloud-based data storage, such as Snowflake, Redshift or S3, using RStudio Professional Drivers
  • Use Amazon EFS (Elastic File System) as your shared file system for RStudio Team

More resources:

  • Read the blog post, "Where does RStudio fit into your Cloud Journey"
  • Watch the webinar, "Why Data Science in the Cloud?", co-presented with RStudio partner ProCogia
  • Learn more about RStudio Cloud, where you can get started for free, or check out our Available Plans. If you are interested in using RStudio Cloud for teaching, watch the webinar, "Teaching R online with RStudio Cloud"

RStudio Cloud stories from your peers