Data Science in Meatspace

January 31, 2020

The data science community is dominated by folks doing amazing work with data that starts in cyberspace and never leaves it. This talk is about best practices and playbooks for doing data science that involves meatspace (the opposite of cyberspace), and why R is such a great language for working with data that originated in the physical world. While the concrete examples in this talk will mostly come from manufacturing, where I have the most experience, I believe the themes are relevant to many meatspace workflows. We'll talk through effective playbooks that can help you navigate common tasks throughout the life cycle of a project. We'll also weave in how R's glorious package ecosystem, including the tidyverse, can be combined to great effect with other languages like Python and with enterprise products like RStudio Connect. Specifically, we'll discuss practices in these areas:

•  best practices for data collection in meatspace
•  the importance of quantifying measurement system error
•  collecting the correct data for training computer vision models
•  the rarely discussed cost of maintaining models in production