Data scientists are trailblazers. They look for value inside data, ask the right questions, and share insights with their stakeholders. In the world of business intelligence, those “on the ground” need more than static reports. They need clear, reproducible insights for exploration, feedback, and action, in the right place at the right time. This may seem daunting, but BI and analytics teams have been tackling these challenges for some time. Embedded analytics integrates data analysis into the workflows, applications, and processes that people use every day, helping move the point of discovery to the point of decision.
In this post, we are going to dive into how the data scientist can integrate insights, increase adoption, and effectively empower end-users to make better decisions. With a code-first approach, data science is perfectly suited to rapidly integrate organizational insights with everyday systems. Our last post covered practical ways that BI and Data Science collaborate with data handoffs. Now let’s look further at how analysts, decision-makers, and end-users can benefit from “tightly tying the rope” between embedded analytics and data science in a secure, scalable, and flexible way.
For an enterprise, data security is regularly a top concern across the entire organization. Security must be front and center as you plan your path forward and coordinate sharing across stakeholders. Not everyone will require (or should have) access to the same data. This is where a system that customizes security and permissions for predetermined roles, often at the data and row level, becomes critical. You need to define which of your stakeholders can view and collaborate on each data product. For example, will only internal users have access, or will outside stakeholders and customers also consume information as a service? Will you need to integrate with existing services?
No matter the answer to these questions, considerable work will be involved to ensure that proper security is enforced and organizational standards are met. This is one of the major reasons teams consider RStudio Connect: it simplifies the deployment of data products for multiple users and integrates directly with existing security protocols like LDAP/Active Directory, OAuth, PAM, SAML, and more.
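To make the idea of role-based, row-level permissions concrete, here is a minimal Python sketch. It is not how RStudio Connect implements access control; the role names, the `ROLE_REGIONS` policy table, and the `Sale` records are all hypothetical and only illustrate filtering rows by role before anything is shown to a user.

```python
from dataclasses import dataclass

# Hypothetical policy: the regions each role may see.
# None means "no row restriction"; an empty set means "no rows".
ROLE_REGIONS = {
    "executive": None,
    "emea_manager": {"EMEA"},
    "external_customer": set(),
}


@dataclass(frozen=True)
class Sale:
    region: str
    customer: str
    revenue: float


def visible_rows(rows, role):
    """Return only the rows the given role is allowed to view."""
    if role not in ROLE_REGIONS:
        # Unknown roles are rejected rather than silently given access.
        raise PermissionError(f"unknown role: {role}")
    allowed = ROLE_REGIONS[role]
    if allowed is None:
        return list(rows)
    return [row for row in rows if row.region in allowed]


# Illustrative data only.
SALES = [
    Sale("EMEA", "Acme", 1200.0),
    Sale("APAC", "Globex", 800.0),
    Sale("EMEA", "Initech", 450.0),
]
```

Calling `visible_rows(SALES, "emea_manager")` would return only the two EMEA records, while an unrecognized role raises `PermissionError`. Denying by default is a deliberate choice in this sketch: a missing policy entry should never widen access.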
As your user base grows, effective communication of results often requires access to the right tools for scheduling and alerts. Your team will likely need automated systems for updates and emails at critical times. No one wants to constantly monitor dashboards or receive irrelevant alerts. A system that helps you administer alerts and scheduling will not only make your life easier but will also make working and communicating across multiple teams and stakeholders more effective over the long run. Learn about how RStudio Connect makes this easy in our “Avoid Dashboard Fatigue” webinar here.
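As a rough illustration of sending alerts only when they are relevant, the sketch below checks the latest metric values against per-rule thresholds and produces messages only for breaches. The `AlertRule` structure, metric names, and addresses are assumptions made for this example; a real deployment would hand the resulting messages to whatever scheduler and mail system your platform provides.

```python
from dataclasses import dataclass, field


@dataclass
class AlertRule:
    metric: str        # name of the monitored metric (hypothetical)
    threshold: float   # alert only when the value exceeds this
    recipients: list = field(default_factory=list)


def due_alerts(rules, latest_values):
    """Return (recipient, message) pairs for rules whose metric exceeds its threshold."""
    messages = []
    for rule in rules:
        value = latest_values.get(rule.metric)
        # Skip metrics with no fresh value and metrics within bounds,
        # so recipients only hear about what needs attention.
        if value is None or value <= rule.threshold:
            continue
        for recipient in rule.recipients:
            messages.append(
                (recipient, f"{rule.metric} is {value}, above {rule.threshold}")
            )
    return messages


# Illustrative rules only.
RULES = [
    AlertRule("churn_rate", 0.05, ["retention@example.com"]),
    AlertRule("error_rate", 0.01, ["ops@example.com", "lead@example.com"]),
]
```

Run on a schedule, for example after each data refresh, a function like this keeps quiet unless a threshold is crossed.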
Embedded analytics typically runs on scalable platforms, often delivered as software as a service (SaaS), to manage cost and capacity over time. As a data scientist, you can plug into these platforms so that end-users can put models to work, which increases adoption. APIs built with Plumber (an R package) and Flask (a Python framework) work alongside each other on RStudio Connect, combining organizational access with integration. Users get access to results without needing to know R or Python. In addition, you can integrate work from both languages, giving data science teams a clear point of collaboration. This suits data science as a service (DSaaS), where models may need to be deployed and reused by multiple customers and across different data sets. You can learn more about how Plumber APIs let data science be used by a wide range of tools and technologies here in this webinar.
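The sketch below shows the general shape of exposing a model behind an HTTP endpoint so that other tools can call it without knowing R or Python. To stay dependency-free it is written as a plain WSGI application using only the Python standard library; with Flask, the routing and JSON handling would be taken care of by the framework. The `/predict` route, the feature names, and the linear `predict` function are hypothetical stand-ins for a real trained model.

```python
import json


def predict(features):
    """Stand-in for a trained model: a hypothetical linear score."""
    return 0.5 * features["tenure_months"] - 2.0 * features["support_tickets"]


def _json_response(start_response, status, payload):
    """Send a JSON payload with the given HTTP status line."""
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def app(environ, start_response):
    """WSGI application exposing POST /predict with a JSON request body."""
    if (environ.get("PATH_INFO") != "/predict"
            or environ.get("REQUEST_METHOD") != "POST"):
        return _json_response(start_response, "404 Not Found",
                              {"error": "not found"})
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        features = json.loads(environ["wsgi.input"].read(length))
        score = predict(features)
    except (ValueError, KeyError, TypeError):
        # Malformed JSON or missing/invalid feature fields.
        return _json_response(start_response, "400 Bad Request",
                              {"error": "invalid input"})
    return _json_response(start_response, "200 OK", {"score": score})
```

For local experimentation, `app` could be served with `wsgiref.simple_server.make_server` from the standard library. A BI tool or another service would then send a JSON body such as `{"tenure_months": 10, "support_tickets": 2}` to `/predict` and read the score from the response.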
Insights from data science have huge potential, but they’re only as good as the runway given for exploration, visualization, and changes on the fly. Analysts and BI teams need access to fast, flexible, and performant updates, relevant to the question at hand.
As a product manager for embedded analytics, I’ve spoken to thousands of customers and carefully analyzed their top needs. Even when reporting was part of the equation, it almost always boiled down to one major requirement: customizable self-service analytics that could be reproduced and deployed quickly. On one end of the spectrum, this may simply mean providing a variety of input controls and the ability to import and export data, so that users have the flexibility they need. On the other end, it may mean going the extra mile to connect data science results to BI systems (with APIs like those built with Plumber) so that end-users have full access to results directly inside a pre-built BI tool for full-service data analytics. This means putting the keys to the kingdom (with the right access) into the hands of managers, decision-makers, and final users, communicating results that are meant to be explored as changes happen in real time.
Also important to consider when scaling out new systems is the time that will be required from concept to release. How often do new data and insights need to be put into different applications, visualizations, or sets of controls? What type of performance is expected and how many users will ultimately need to be supported? How much customization will be required with the final visualization of results?
Luckily, data scientists are already accustomed to working with code and have full access to a range of open-source packages tailored for building interactive applications, input controls, and custom visualizations. Shiny is one such package for R, combining computational power with interactivity for the modern web. Bokeh and Dash are similar packages for Python. All three are fully supported for easy deployment on RStudio Connect.
Organizations and their stakeholders depend on data scientists to forge a path forward through data. Like trailblazers, data scientists overcome unique challenges and obstacles to carve a path that others can follow with BI tools. Many organizations are struck by the sheer speed at which insights can be deployed using the tools and languages that are native to their data science teams today. RStudio supports the direct creation and integration of open-source data science and stands ready to help companies and organizations expand with enterprise-level tooling and deployment.
Curious to learn more about our approach? Check out this previous post, where we explore the importance of agility and durability to Serious Data Science, along with the key aspect of credibility: having the right access and tools to find relevant insights builds trust and a path forward to understanding.