Fighting Childhood Cancer with RStudio Server Pro
Prior to COVID-19, the Childhood Cancer Data Lab (CCDL) regularly held short in-person workshops to bolster analytical skills in the pediatric cancer research community. We teach workshop participants to analyze their own experimental data on their own laptops so they walk out the door ready to solve real problems. This goal presents a challenge: if twenty participants attend a workshop, a small team of instructors needs to set up twenty laptops with the same software environment in a short period of time. Our solution was to Dockerize RStudio and all of its software dependencies, using base images from the Rocker Project. Installing Docker on participant laptops meant that instructors had to manage one moderately difficult installation instead of dozens of dependency installations whose difficulty varied by operating system.
When pandemic-related university closures shuttered pediatric cancer labs, researchers turned to rapidly expanding their data analysis skill sets to keep their research moving forward. Demand for our workshops was high, but we knew that supporting Docker installations remotely was infeasible. We turned to RStudio’s professional products to pivot quickly to virtual workshops and continue helping the community we serve.
RStudio Server Pro allowed us to set up a self-hosted server on Amazon Web Services and deliver a high-quality workshop experience in less than a month. The RStudio Team AWS CloudFormation template gave our small team a great starting point to rapidly get from product documentation to deployment and start working on what mattered most.
We use a single server running RStudio Server Pro, which we scale up to a large size during workshops and scale back down in between so that participants can return to the material on their own time. For each workshop, we create a new disk for the participants to use. This lets us keep their access available for a few months before removing that disk. RStudio Server Pro relies on the standard Linux login process, so managing users is the same as administering any Linux server. Once the server was configured, one of the scientists involved in the training, rather than one of our engineers, could monitor the server, create users, and distribute passwords.
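Because participants are ordinary Linux users, a workshop cohort can be set up with standard tools. The following is a minimal Python sketch of how that step could be scripted, not the procedure we actually used. It assumes a hypothetical naming scheme (a prefix plus a zero-padded number, such as `ws01`) and generates random passwords in the `user:password` line format that the Linux `chpasswd` utility reads on standard input.

```python
import secrets
import string

# Hypothetical password alphabet: letters and digits only, so passwords
# are easy to read aloud or paste during a virtual workshop.
ALPHABET = string.ascii_letters + string.digits


def make_credentials(prefix: str, count: int, length: int = 12) -> list[tuple[str, str]]:
    """Return (username, password) pairs such as ("ws01", "aB3...")."""
    width = len(str(count))
    creds = []
    for i in range(1, count + 1):
        # Zero-padded numbering keeps usernames sortable: ws01, ws02, ...
        user = f"{prefix}{i:0{width}d}"
        # secrets.choice draws from a cryptographically secure source.
        password = "".join(secrets.choice(ALPHABET) for _ in range(length))
        creds.append((user, password))
    return creds


def chpasswd_input(creds: list[tuple[str, str]]) -> str:
    """Format pairs as 'user:password' lines, the format chpasswd reads on stdin."""
    return "".join(f"{user}:{password}\n" for user, password in creds)


if __name__ == "__main__":
    # Hypothetical cohort of twenty participants.
    print(chpasswd_input(make_credentials("ws", 20)), end="")
```

Assuming the accounts already exist (for example, created with `useradd -m`), the output could be saved to a file, applied with `sudo chpasswd < creds.txt`, and then used to hand out passwords; that file should be treated as a secret and deleted once credentials are distributed.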
Every workshop participant had access to an identical computing environment. We preinstalled all of the R packages they would need, along with external software tools that we wanted to demonstrate. We could easily populate their home directories with training materials before the workshop started, and we saved disk space by providing read-only access to common files on the server instead of giving every participant a copy.
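Populating home directories and sharing read-only files are ordinary filesystem operations on a Linux server. The sketch below is a hypothetical illustration rather than a description of our setup: the template directory, the `training-modules` folder name, and the assumption that each user has a primary group with the same name as the account are all placeholders. Changing ownership requires running as root, so it is off by default.

```python
import os
import shutil
from pathlib import Path


def populate_homes(template: Path, home_root: Path, users: list[str],
                   dest_name: str = "training-modules",
                   set_owner: bool = False) -> list[Path]:
    """Copy the template directory into each user's home directory."""
    copied = []
    for user in users:
        dest = home_root / user / dest_name
        # dirs_exist_ok=True (Python 3.8+) lets the script be re-run to push
        # updated materials without failing on existing directories.
        shutil.copytree(template, dest, dirs_exist_ok=True)
        if set_owner:
            # Requires root. Assumes a primary group named after the user.
            for path in [dest, *dest.rglob("*")]:
                shutil.chown(path, user=user, group=user)
        copied.append(dest)
    return copied


def make_read_only(shared: Path) -> None:
    """Let group and others read, but not modify, a shared directory tree.

    Assumes the tree is owned by an administrator account, so participants
    fall under 'group' or 'other' and get read-only access.
    """
    for path in [shared, *shared.rglob("*")]:
        # Directories need the execute bit so participants can traverse them.
        os.chmod(path, 0o755 if path.is_dir() else 0o644)
```

Keeping large shared inputs in one read-only location, rather than copying them into every home directory, is what saves the disk space.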
Photo taken prior to COVID-19.
As a language, R is powerful for pediatric cancer research because of its rich ecosystem of general data science packages, such as the tidyverse, and of domain-specific packages available from Bioconductor. RStudio is an incredible tool for instruction. By using R Notebooks in our workshops, we model essential skills for robust computational research: writing code instead of using point-and-click programs so that results are reproducible, documenting the analysis along the way, and visualizing the underlying data to better understand it. Participants also have terminal access, so they can run command-line bioinformatics tools without leaving the RStudio IDE.
While the RStudio IDE fits our needs for instruction, the Professional Edition of RStudio Server has key features that make it well suited to administering and teaching a remote workshop with a team of our size. Our engineers did not need to build a monitoring system for the server; instead, we hit the ground running with RStudio Server Pro’s administrative dashboard. From a single dashboard, instructors can monitor active sessions and RAM and CPU utilization, and can take control of or terminate sessions as necessary. The dashboard is helpful not only during workshops but also for understanding compute requirements while we develop materials.
An important part of our workshops is time set aside for participants to work on exercises or to process and analyze data from their own experiments. Because RStudio Server Pro lets participants run multiple R sessions, they can process their own data “in the background” while starting a new session for the day’s instruction. We also run multiple workshops a year, so we expect to update the R and package versions we use while past participants still have access to the server. Given our emphasis on reproducibility, we are excited about the additional control over R versions and environments that RStudio Server Pro affords.
Our virtual training workshops have been a great experience for participants and CCDL staff alike. As one participant put it, “It's been the best class I've taken since this quarantine began and I could not speak highly enough of the instructors and the topics.” Without RStudio Server Pro, there would have been no workshop to speak highly of. Participants keep access to the server for a few months after training. Seamless access to compute resources allows beginners to keep practicing their skills without running into obstacles specific to their own institutions.
Even if we return to in-person workshops, our self-hosted RStudio Server Pro is a much better solution than our Dockerized RStudio installation. There’s no waiting for installation; we can immediately start covering R basics with little to no friction for participants. We’re also more confident helping others host workshops with our training material, because they can use our server instead of navigating system administration support at their own institutions. RStudio’s products will be key to our efforts to scale training and, in turn, level up the analytical capacity of the pediatric cancer field.
Photo taken prior to COVID-19.
Alex’s Lemonade Stand Foundation (ALSF) is one of the leading funders of pediatric cancer research in the United States and Canada. Since its inception in 2005, ALSF has funded more than 1,000 projects at nearly 150 institutions across both countries. The Childhood Cancer Data Lab, an initiative of ALSF, was founded in August 2017 with the mission of empowering pediatric cancer experts poised for the next big discovery with the knowledge, data, and tools to reach it.
RStudio provides open source and enterprise-ready professional software for data science teams using R and Python.