e-poster presentations at rstudio::conf 2018!
e-posters will be shown during the opening reception on Thursday evening February 1st. We’ll have a big screen, power, and internet; you’ll get to interact with an innovative display or demo and its creator!
|Dorris Scott||University of Georgia||Food for Thought: Analyzing Public Opinion on the Supplemental Nutrition Assistance Program||This project explores public opinion on the Supplemental Nutrition Assistance Program (SNAP) in news and social media outlets and tracks elected representatives’ voting records on issues relating to SNAP and food insecurity. This project was conducted in conjunction with the Atlanta Community Food Bank (ACFB) and the Data Science for Social Good program at Georgia Institute of Technology. We used machine learning, sentiment analysis, and text mining to analyze national and state-level coverage of SNAP to gauge perceptions of the program over time across these outlets. Results indicate that the majority of news coverage is negative, that more partisan news outlets have more extreme sentiment, and that negative reporting on SNAP clusters in the Midwest. Using a variety of R packages, such as plotly, leaflet, and tidycensus, we created a set of Shiny apps to display our final results. These Shiny apps were bundled into a single app that the ACFB Advocacy Team can use to better communicate with stakeholders about SNAP.|
|Viola Glenn||Elevate Labs||Buffy vs. Rachel: Can tidytext finally settle the score?||Girls Who Code (GWC) is a nonprofit that seeks to close the gap between high school girls and computer science, usually by way of building websites and apps. In July, I reached out to my local chapters to see if they had girls who had ever heard of data science, and was astounded and delighted to find out there are girls out there who not only know data science, but want more. Since then I’ve been working on projects to pitch them a short intro to data science and would love to share my most promising lead: a tidytext analysis of the representation of women in Buffy the Vampire Slayer, paired with a Shiny interface to communicate the findings.
This analysis leverages tidytext and Shiny to a) demystify simple text analytics for my local GWC chapters while giving them the tools to digest media in an informed and critical way and b) answer some of the geekiest research questions out there.
Question 1: How are women represented in popular television today? Following work from Julia Silge and Dave Robinson on verb use and gender in Jane Austen, I use tidytext to identify verbs and adjectives that are uniquely associated with men and women. Utilizing transcripts, I extend this to include differentiation between when characters are speaking and when they are being described. Further, I explore differences in these patterns when same-gender pairs are describing one another vs. opposite-gender pairs. Finally, I explore the relationship between whether an episode passes the Alison Bechdel test and viewer ratings to connect representation to societal expectations.
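As a rough sketch of the gendered-language approach described above (the data here is invented; real transcripts and column names would differ), tidytext bigrams can surface words that follow “he” versus “she”:

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Hypothetical transcript data: one row per line of dialogue or description
transcripts <- tibble::tibble(
  text = c("she slays the vampire", "he worries about the prophecy")
)

# Split into bigrams, then keep pairs where a gendered pronoun precedes
# a word -- a rough proxy for words uniquely associated with each gender
gendered_words <- transcripts %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  separate(bigram, into = c("word1", "word2"), sep = " ") %>%
  filter(word1 %in% c("he", "she")) %>%
  count(word1, word2, sort = TRUE)
```

From here, comparing relative frequencies of `word2` across the two pronouns (as Silge and Robinson did for Jane Austen) identifies the distinctively “he” and “she” words.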
Question 2: Are there shows that are different? Several shows boast that they portray women in a fundamentally different way. I compare two of these (Buffy the Vampire Slayer from the ’90s and today’s Wynonna Earp) to their more popular contemporaries (Friends for the ’90s, Game of Thrones today) to analytically show why we might (or might not!) want to have our girls spend more time with the former.
The results are an approachable and interesting analysis that teaches not just how to be tidy, but how to be a thoughtful consumer of media. I would love the chance to share it with the RStudio audience, but will see you at the conference no matter what!
|Nicholas Tierney||Monash University||Tidy principles for missing data exploration and analysis.||Missing values are ubiquitous in data and need to be explored and handled in the initial stages of analysis. There is a large body of research describing methods for missing data imputation and for exploring missing data dependence. However, there is comparatively little research on how to make the process of exploring missing values easy. This talk describes two R packages, naniar and visdat, designed to make it easier to explore and analyse missing data within the tidyverse. Using real-world data, this talk describes tidyverse workflows to answer the questions:
(i) Where are the missing values?|
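A minimal sketch of answering that question with the two packages, using R’s built-in `airquality` data:

```r
library(naniar)
library(visdat)

# Where are the missing values? A tabular summary of missingness per variable
miss_var_summary(airquality)

# A visual overview of missingness across the whole data frame
vis_miss(airquality)

# Missing counts by variable as a ggplot
gg_miss_var(airquality)
```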
|Tyler Morgan-Wall||Institute for Defense Analyses||skpr: Generate and Evaluate Optimal Experimental Designs in R||Generating high quality, replicable experimental results is the goal of every scientist. Achieving that goal starts well before the first datum is collected, with planning the experimental design. A well planned experiment will have an adequate sample size to test the hypotheses under investigation, but not so many runs that money, time, and effort are needlessly wasted. skpr is an R Design of Experiments package developed with Shiny, Rcpp, and RcppArmadillo to flexibly generate and evaluate optimal experimental designs, which maximize the amount of information for a given set of factors and runs. This talk/poster will demonstrate how an R user can use skpr and the included Shiny GUI to quickly generate, evaluate, and plan experimental designs, and use the GUI to springboard to more complex analyses in code.|
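A minimal sketch of the generate-then-evaluate workflow described above (the factors and run count are invented for illustration):

```r
library(skpr)

# Candidate set: all combinations of the factors under study
candidates <- expand.grid(
  temperature = c(100, 120, 140),
  pressure    = c("low", "high")
)

# Generate a D-optimal design with 12 runs for a main-effects model
design <- gen_design(candidateset = candidates,
                     model = ~ temperature + pressure,
                     trials = 12)

# Evaluate the design's statistical power at significance level 0.05
eval_design(design, model = ~ temperature + pressure, alpha = 0.05)
```

The same workflow is available interactively through the package’s Shiny GUI.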
|Jessica Higgins||Nuventra Pharma Sciences||R for clinical trial data manipulation and cleanup||Traditionally, the pharma industry has used SAS for all data management, organization, and analysis; however, this is not required by the FDA. We have developed a validation plan for working with clinical trial data in a regulated environment and providing our clients with FDA-submission-ready datasets and beautiful tables, listings, and figures (TLFs) that support their trial data and findings.|
|Reina Chau||NCCPA||Elegant, dynamic, and interactive reports with the Shiny and knitr packages in R||Interactive web applications and dynamic report generation, developed using Shiny and knitr in R, have gained extensive popularity among R users. Shiny is a free, open-source R package that provides a powerful framework for building interactive web applications. knitr is also an open-source R package; it creates elegant, dynamic, and reproducible reports by embedding R code into LaTeX documents. Both packages are essential R tools for gathering, displaying, and communicating insights from data, and their applications have the potential to grow in many areas, within and outside of the R user community.
In this proposal, I will illustrate three Shiny applications that were designed and implemented as effective tools to solve daily operations and research problems without the need to acquire a BI tool.
As a data analyst, my key roles are to perform quality assurance on our exams, analyze data, and generate summary reports to meet psychometric needs. Generating effective reports can be painful and time-consuming, especially when it requires spreadsheets for data visualization and manually compiled reports. To eliminate these repetitive tasks and reduce the time and manual labor needed to generate reports, the first Shiny application was built to automate these daily operations. The app uses easy-to-use Shiny widgets such as buttons and text boxes, and allows any user to easily update a spreadsheet and dynamically generate a summary report with the click of a button.
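The report-at-the-click-of-a-button pattern can be sketched as follows (a minimal example, not the actual app; `summary.Rmd` stands in for a hypothetical parameterized R Markdown template):

```r
library(shiny)

ui <- fluidPage(
  fileInput("data", "Upload spreadsheet (CSV)"),
  downloadButton("report", "Generate summary report")
)

server <- function(input, output) {
  output$report <- downloadHandler(
    filename = "summary.html",
    content = function(file) {
      req(input$data)
      # Pass the uploaded data to a parameterized R Markdown template
      # (the template must declare a `data` entry in its params header)
      rmarkdown::render("summary.Rmd",
                        output_file = file,
                        params = list(data = read.csv(input$data$datapath)))
    }
  )
}

shinyApp(ui, server)
```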
Nowadays, most companies publish an annual report to provide an overview of how their businesses, services, and products are performing. Usually, these reports are pre-designed with specific layouts and unique colors to meet a company’s reporting style. In the second Shiny application, I will illustrate how our research team has used the powerful typesetting system of LaTeX with knitr to create an elegant, dynamic, and interactive annual report.
Lastly, Shiny applications can be used to facilitate meetings and showcase research and model simulations at conferences. The final application was developed by our psychometric team as a software interface to support the standard-setting process, whose goal is to determine a recommended cut score for an examination. Content experts are invited to the meetings and use the app to provide expertise and ratings on where they think the cut score of their examination should be. The Shiny application then aggregates these ratings and produces plots and visualizations showing how each panelist’s ratings compare to the others’. This application has helped the psychometric team run the meetings in a timely and effective manner, and it eliminates the cost of acquiring a new BI tool.
|Alex Albright||Harvard University||Fueling FoRays with Curious Questions||How can new users learn to work with data in R? While there are many incredible courses and resources for learning R, nothing is as effective as finding an intriguing question and challenging oneself to use R to answer it. In particular, generating visual answers can be astonishingly satisfying for new users and can drive deep dives into R workflow, libraries, and community resources.
I use this presentation to show how curiosity can fuel forays (‘foRays’) into R data exploration and visualization. I present snippets of scripts and notebooks written to visually answer the following questions, running the gamut from quirky and light to serious and weighty: How often do female athletes grace the cover of Sports Illustrated? How have Pixar movie ratings changed over time? How did US Senators vote on health care amendments? How do demographics differ across Uber and Taxi drivers? Which characters are closest on the TV show ‘Friends’? Which state is the best at the New Yorker caption contest? How have demographics changed in the sciences?
The scope of questions that can fuel engagement with R is limitless. Channeling creativity into projects is an unrivaled approach for growing comfortable with R.
|John Blischak||University of Chicago||The workflowr R package: a framework for reproducible and collaborative data science||Analyzing complex data requires organization and planning, which is especially challenging when juggling multiple projects. For example, it can be difficult to recall which version of the data or code was used to produce a particular figure. The workflowr R package helps data scientists organize their projects by combining several existing packages for open and reproducible research. Specifically, it combines the literate programming paradigm (built upon knitr and rmarkdown) and the principles of version control (built upon git2r). Further, workflowr is designed to work seamlessly with the RStudio IDE through the rstudioapi package. Any data scientist familiar with R can quickly start using workflowr for their project. workflowr includes four key features to promote effective project management: (1) workflowr automatically creates a directory structure for organizing data, code, and results; (2) workflowr uses the version control system Git to track different versions of the code and results without the user needing to understand the complexities of Git; (3) to support reproducibility, workflowr provides a custom R Markdown template that automatically displays the version of the code that produced the results; and (4) workflowr facilitates hosting webpages online using GitHub Pages to share results. As a proof of concept, we are currently using the workflowr framework to manage a large genetics research project; the results can be viewed at https://jdblischak.github.io/singlecell-qtl. Our goal is that any data scientist using workflowr will find it easier to communicate their reproducible research results. Documentation and source code for workflowr are available at https://github.com/jdblischak/workflowr.|
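The core workflow boils down to three calls (a minimal sketch; "myproject" is a placeholder name):

```r
library(workflowr)

# Create the standard project skeleton (analysis/, data/, code/, docs/)
# and initialize a Git repository
wflow_start("myproject")

# Knit the R Markdown files in analysis/ into the docs/ website
wflow_build()

# Commit the source files and rebuilt results together, so every page
# records the exact version of the code that produced it
wflow_publish("analysis/*.Rmd", message = "Initial analysis")
```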
|Matthew Avery||Institute for Defense Analyses||ciTools: An R Package to Ease Model Inference||ciTools is a uniform, model-invariant, data-first R package that makes working with prediction uncertainties easy and intuitive. Existing tools in R tend to focus on parameter inference, leaving users who are more concerned with prediction inference (including confidence intervals, prediction intervals, quantile estimates, and probability estimates) to write custom code. Worse, different modeling tools (lm, glm, lmer, etc.) have unique interfaces, meaning users must learn different syntax for performing the same inference. ciTools addresses these issues by using generics inspired by add_predictions() from the modelr package. add_ci() produces confidence intervals, add_pi() produces prediction intervals, add_probs() produces probability estimates, and add_quantile() estimates quantiles. When simple, out-of-the-box solutions already exist in R (such as confidence intervals for lm objects), ciTools provides wrappers that integrate these solutions with the tidyverse. When such solutions do not exist, ciTools provides novel implementations, including parametric and bootstrap approaches when appropriate. By using generics, ciTools allows users to learn one function for each form of inference, regardless of the type of model they’ve fit. Options with conservative defaults allow more advanced users to customize their output. ciTools uses a data-first approach, allowing users familiar with the tidyverse to seamlessly integrate it into their workflow for both inference and plotting. Finally, the use of generic functions means model objects not supported in the current build can be added easily. This presentation illustrates the use of ciTools through real-data examples, discusses the approach used for constructing the package, and shows how ciTools fits in with the tidyverse.|
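The data-first generics described above can be sketched on R’s built-in `cars` data (a minimal example; arguments beyond the basics are left at their defaults):

```r
library(ciTools)

fit <- lm(dist ~ speed, data = cars)

# Append confidence intervals for the expected response to the data frame
with_ci <- add_ci(cars, fit, alpha = 0.05)

# Append prediction intervals for new observations
with_pi <- add_pi(cars, fit, alpha = 0.05)

# Append Pr(dist > 50) for each row
with_probs <- add_probs(cars, fit, q = 50, comparison = ">")
```

Because each call returns the original data frame with new columns appended, the results feed directly into dplyr pipelines and ggplot2 plots.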
|Earo Wang||Monash University||When time series meets tibble, it’s tsibble!||Time series data can be frustrating to work with in R. Real-world data arrives in various forms:
However, none of these input formats is accepted by the traditional time series objects, such as `ts`, `zoo`, and `xts`. The common data structure underlying these objects is the vector or matrix, which is model-centric rather than data-centric. For wrangling, analyzing, and visualizing temporal data, the wide “matrix” format is like fitting a square peg into a round hole.
A new data class, `tbl_ts` (or “tsibble”), has been implemented to store and manage temporal data following the “tidy data” principles. It aims not only to make time series analysis easier but also to provide a uniformly accepted data format for forecasting.
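A minimal sketch of coercing a tidy data frame to a tsibble (the data is invented, and the exact arguments may differ across tsibble versions):

```r
library(tsibble)

weather <- data.frame(
  date    = as.Date("2018-01-01") + 0:2,
  station = "melbourne",
  temp    = c(24.1, 26.0, 25.3)
)

# Declare the index (time variable) and key (identifying variable);
# the result otherwise behaves like a familiar tidy data frame
weather_tsbl <- as_tsibble(weather, index = date, key = station)
```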