Who should I reach out to?

Who exactly will manage the RStudio Pro Products varies by organization. Often the administrators are part of the internal DevOps or hosting team. Some large organizations have roles specific to managing data science and analytics tools, or to managing Linux servers.

In large organizations, the different functions are split out and there will often be separate subteams for the actual server hosting and product install, authentication, database access configuration, networking, and security.

Here are a few examples of IT roles that other data scientists have worked with:

  • Linux Admin
  • DevOps Engineer
  • Cloud Engineer
  • Software Engineering/Infrastructure

We can help you identify the team to work with

Our sales team can help organize a call with one of our solutions engineers to talk with IT about best practices for managing RStudio Team. Depending on how you’re managing your server, your team will also need to think about how your data science workflows will integrate into existing IT workflows.

How do I phrase what they will need to do?

RStudio’s Professional Products are Linux-based server software products. To install, configure, and manage them successfully, you’ll need to provision servers (on-premises or in the cloud) and carry out a variety of Linux system administration tasks. Here are the RStudio professional product requirements.

Offering your IT contact an architecture review with one of RStudio’s solutions engineers is a helpful way to highlight any requirements and how they will fit into your current environment. Contact our team for an initial discussion and for help setting that up.

Depending on the complexity of your configuration, you may need to do additional tasks like:

  • Deploying multiple servers with load balancing
  • Deploying with Docker and Kubernetes
  • Configuring authentication against corporate SSO using LDAP/AD, SAML, or OAuth
  • Configuring database connections
  • Setting up proxies or reverse proxies, e.g. using Nginx or Apache
  • Automated, scripted deployment, e.g. using Chef, Puppet or Ansible

In some organizations these tasks may be spread across several teams.

How do I talk about open-source technologies?

An incredible variety of the world’s computation runs on top of open-source software. Open source means that the code, including the code for programming languages such as R and Python, is developed in public and is available for public review. This does not mean that the code is ill-maintained or unloved.

R and Python have been around since the early 1990s and have millions of users every year. Both R and Python are complete programming languages that are able to do a wide array of complex statistical calculations, machine learning tasks, dashboarding and reporting, and more.

They form the foundation of data science practices at many different kinds of organizations, including government agencies, major pharmaceutical companies, banks, and other financial institutions.

The reality is that most organizations already support open-source software. According to the 2021 State of Enterprise Open Source report, 90% of the 1,250 IT leaders surveyed are using enterprise open source today.

It can be helpful to highlight other companies in your industry that are using R/Python today and speaking publicly about it.

Is it secure? Do we need to do a security review?

Many organizations have stringent security requirements that were written with Software-as-a-Service products in mind, where you share your data with the vendor to use their tools. All RStudio products (Pro and Open Source) run entirely in your environment. Your data never leaves servers you control and configure. For many organizations, the security review starts and ends with that fact.

On top of that, RStudio’s Pro Products support industry-standard security and authentication tooling, including the latest SSL/TLS standards and authentication against LDAP/AD or corporate SSO. They can also be deployed in any networking configuration, including completely offline or air-gapped environments.

If you do have a security review process or form to complete, we’re happy to have a conversation about requirements as well.

How can we trust R packages?

This is a fascinating question that deserves a detailed response. Many in the R community are actively working on this challenging question, just as people in other open-source ecosystems tackle these challenges.

While not exhaustive, we offer these four considerations for users or admins wondering about package security:

  1. RStudio Package Manager allows you to control exactly what packages are brought into your organization through curated sources.
  2. RStudio provides R packages to RStudio Package Manager through an upstream RStudio service designed specifically for this task. The connection between this service and RStudio Package Manager is encrypted. Daily updates to CRAN are reviewed by our team before they are made available through this service. The review process checks for consistent package metadata and also updates the package checksum file, used by the R client to ensure downloaded package files are correct. We highly recommend that the connection between your R clients and RStudio Package Manager be encrypted by hosting your RStudio Package Manager instance over HTTPS.
  3. CRAN requires all submitted R packages to pass a series of checks prior to accepting them into the CRAN repository. These checks include installing the package alongside other CRAN packages and running package unit tests. While these tests do not specifically target malicious code, the tests provide a significant hurdle to uploading malicious packages to CRAN.
  4. R code is almost always executed as a non-privileged user. The majority of R code, especially code run in RStudio Server Pro or RStudio Connect, is executed under restricted service or user accounts. RStudio Server Pro, for example, runs under an AppArmor profile that is inherited by the R processes it invokes on behalf of non-privileged users. Similarly, RStudio Connect provides an extensive sandboxing process to run user code in an isolated environment. Additionally, while RStudio Package Manager provides a means for users to download packages originating on the internet, most R code is executed in offline environments, often dedicated analytic sandboxes. These measures not only limit what malicious code can do, but also keep analysts from accidentally interfering with one another.

Learn more about RStudio’s security policy, common security FAQs, and RStudio Package Manager.
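
For readers who want to show IT what a checksum check looks like in practice, the idea in consideration 2 can be sketched in a few lines of Python. This is only an illustration, not how RStudio Package Manager or the R client is implemented: the function names `file_md5` and `verify_download` are hypothetical, and the sketch assumes the published checksum is an MD5 hex digest, as in the `MD5sum` field of a CRAN-style package index.

```python
import hashlib
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large package archives are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_md5: str) -> bool:
    """Compare a downloaded file's digest with the published one.

    `expected_md5` is assumed to be a hex string taken from the
    repository's package index (hypothetical input for this sketch).
    """
    return file_md5(path) == expected_md5.strip().lower()
```

A mismatch means the file on disk is not the file the repository published, whether because of corruption in transit or tampering. A check like this complements, rather than replaces, serving the repository over HTTPS.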

What about review boards?

If you are part of a large organization, your IT department probably has a review board (for example: Architecture Review Board, Decision Review Board) whose purpose is to review and make decisions about new tools.

The review board is responsible for:

  • Reviewing new software initiatives and approving expenditures. Does this tool increase or decrease costs? What line items will this go under? What is the long-term cost projected to be? What is the cost of support?
  • Supporting the organization’s strategic vision. Does the tool help satisfy a customer's needs? Does it help us remain competitive? Can it help us attract better talent? Does it make existing systems more efficient and agile?
  • Complying with existing systems architectures. Does the tool integrate with other supported tools? Will it be used in development and/or production? Does it duplicate the capabilities of other supported tools?
  • Managing risk and ensuring security. Does the tool comply with our formal security policies? Do the software licenses meet our legal requirements?
  • Defining roles and responsibilities for support. What groups own the tool? What support is offered with the tool? What internal resources will be required to maintain it? Who will provide training?

If your organization is already friendly toward data science tools but has not made them an official part of the organization, a formal review process is still valuable. The review process gives IT a formal stake in the ground when it comes to supporting R for the long term. It also makes future decisions about growth and investment much easier.

How will RStudio fit into our existing architecture?

As part of the enterprise product licensing cycle, our sales team will schedule a call with a member of our Solutions Engineering team. The call is specifically for your IT leads to talk about how to deploy our products in accordance with your IT setup, and how to integrate them with your user authentication and scaling strategies.

Questions that we'll cover with your team include:

  • Where are the servers (on-premises or in the cloud)?
  • Which flavor of Linux will be used on the servers?
  • What is the scaling strategy?
  • What type of user authentication will be configured?
  • What is your package management strategy?

From the discussion, we will create an architecture diagram together on the call that reflects your environment, along with the relevant admin guide information needed to complete the installation.

Making the next move

Get in touch with RStudio

If you are interested in implementing RStudio’s professional products, please don’t hesitate to reach out to us. We are happy to answer any questions you or your team may have.

Schedule a call

Ask others who have done this before

We host a Data Science Hangout every Thursday at 12 pm ET. It's a low-barrier get-together on Zoom for aspiring and current data science leaders. It's very casual, so there's no need to register or RSVP to attend. Each week, host Rachael Dempsey invites a data science leader to discuss their experience and answer questions from the audience.