On the RStudio Developer Blog we’ve recently written a series on interoperability and R, including why enterprises should embrace workflows that are open to diverse toolsets.
The designers of R, from its very beginnings, have dealt directly with how best to tap into other tools. Statisticians, analysts, and data scientists have long been challenged to bring together all the statistical methods and technologies required to perform the analysis the situation calls for—and this challenge has grown as more tools, libraries, and frameworks become available.
John Chambers, writing on the design philosophy behind the S programming language, the predecessor to R, put it this way:
“[W]e wanted to be able to begin in an interactive environment, where they did not consciously think of themselves as programming. Then as their needs become clearer and their sophistication increased, they should be able to slide gradually into programming, when the language and system aspects would become more important.”
Part of this design philosophy is minimizing the effort and overhead required to get analytics work done. Not every data scientist programs all day or comes from a computer science background, yet they still need to use some of the most sophisticated tools programmers rely on.
The ecosystem around R has striven to strike the right balance between a domain-specific environment optimized for data science workflows and output, and a general programming environment. For example, CRAN, Bioconductor, rOpenSci, and GitHub provide collections of packages written with data science in mind, which extend core R's functionality, letting you tap into (and share) statistical methods and field-specific tools — when and only when you need them.
Many of the most popular packages offer interfaces to tools in other languages. For example, most tidyverse packages include compiled (C/C++) code. Interestingly, core R itself connects you to tooling mostly written in other programming languages. As of R 4.0.2, over 75% of the lines in core R's codebase are written in C or Fortran (C 43%, Fortran 33%, and R 23.9%).
Our mission at RStudio is to create free and open source software for data science, scientific research, and technical communication. R is a wonderful environment for data analysis, and we've focused on making it easier to use. We do this through our IDE and open source packages, such as the tidyverse. We also do this by making data science easier to learn through RStudio Cloud and our support for data science education. And we help make R easier to manage and scale out across an organization through our professional products, supporting best practices for data science in the enterprise through our solutions team.
As part of this effort, we have focused heavily on enabling and supporting interoperability between R and other tools. A recent blog post outlined how the RStudio IDE allows you to embed many different languages in R Markdown documents, including:
rstan for Bayesian modeling,
And we work with the community to support:
dbplyr for database access and wrangling.
This list goes on and on and grows by the week.
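As a sketch of what this multi-language embedding looks like in practice, a single R Markdown document can mix chunks in different language engines; the chunk contents below are illustrative, not from any particular post:

````markdown
```{r}
# An ordinary R chunk
summary(mtcars$mpg)
```

```{python}
# A Python chunk in the same document; knitr runs it via the
# reticulate package, so objects can be shared between languages
import statistics
statistics.mean([21.0, 22.8, 21.4])
```
````

Each `{language}` header tells knitr which engine to use when the document is rendered, so one analysis can draw on R, Python, SQL, and other tools side by side.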
R with RStudio is a wonderful environment for anyone who seeks understanding through the analysis of data. It strikes a balance between a domain-specific environment and a general programming language: an environment optimized for analytics workflows and output. At the fulcrum of this balance are extensive interoperability, the ability to pull in interfaces to other technologies as they are needed, and a vibrant community sustaining them. This has been the goal for R since its initial design, through the extensive work shared by the R community, and through significant continued investment by RStudio.