Creating APIs for Data Science With plumber


Photo by Khara Woods on Unsplash

Whether it’s pulling data from Twitter, accessing the most recent weather information, or tracking where a particular plane is going, application programming interfaces (APIs) are often part of our data science pipeline. But why would you want to create an API? And how difficult is it to do?

APIs make it easy to scale the reach of your work. They allow your data science results to be responsive, accessible, and automated. And thanks to the plumber package, you can convert your R functions into API endpoints using just a few special comments.

What is an API?

APIs are messenger systems that allow applications to communicate with one another. You send a request to the API. The API takes your request to the server and receives a response. Then, the API delivers the response back to you.

You may already use APIs to retrieve data as part of your data science pipeline. For example, the rtweet package allows R users to interact with Twitter’s API. You request data through the package and then receive the API’s data as a response.

Graphic of a laptop sending a request to an API, the API getting information from a server, then responding to the request back to the computer

APIs communicate via “endpoints”. The endpoint receives a request to take an action. For example, when you run usrs <- search_users("#rstats", n = 1000) from rtweet, you are interacting with an endpoint that returns a list of users.

Since APIs allow different systems to interact when they wouldn’t be able to otherwise, they are incredibly powerful tools to increase interactivity and reach.

Why would a data scientist want to create an API?

At some point, you may want to share your R output with others. If the other person is not an R user, they may not be able to use your work without translating it into their language of choice.

If your results are available in the form of an API, then anybody can import your results without this difficult translation step. API responses are readable across platforms and applications. Just as you use R to interact with the Twitter API, others can access the Twitter API with other tools.

Let’s say you are working with a website developer who uses JavaScript. You just developed a model in R and you’d like to share the results. You can give the developer access to an API so that they can display the results on a website without reconstructing your model in another language. The website can show updated results because it communicates with your API in real time. You do not have to manually rerun your code each time there’s a change in the data. For example, RStudio’s pricing calculator uses an API created from a backend R model to feed the results into our website!

Making your data science work available through an API reduces the handoff between R and other tools or technologies. More people can access your results and use them to make data-driven decisions.

For more on how APIs increase the impact of your analyses, we recommend reading James Blair's post, RStudio and APIs.

Creating an API with plumber

The plumber package allows you to create APIs from your R code. It does this through special comments that give instructions on how to turn the functions in your script into API endpoints. With this package, your R code becomes accessible from other tools and frameworks without being rewritten.

Here’s an example plumber script. Notice how familiar it looks:

Screenshot of an R script decorated with plumber comments

Let’s walk through how to convert this R function into an API.

1. Write standard R code

Let’s say we want to randomly choose 100 numbers and create a histogram. We write out a function in R:

function() {
  rand <- rnorm(100)
  hist(rand)
}

Notice that the function is not assigned to an object; that is the form we will annotate for plumber. To test it, we can temporarily assign it to a name and call it:

test <- function() {
  rand <- rnorm(100)
  hist(rand)
}

test()

2. Add special comments

Now, we instruct plumber on how to turn the function into an API endpoint. Plumber parses your script to identify special comments that begin with #*, along with the @ tags inside them. It uses them to convert your script into an API.

Let’s give our function a description using #*. Here, we’re telling plumber to call this function “Plot a histogram”:

#* Plot a histogram

Now, let’s tell plumber that when it receives a GET request at the /plot path, it should execute this function and return the plot:

#* @get /plot

By default, plumber will turn your response into JSON format. You can adjust the type of response if that is not the output you would like. For example, our function outputs an image. It doesn’t make sense to return an image in JSON format. We can “serialize” our result so that the API returns a PNG rather than JSON.

#* @serializer png

This is just one example of what an API can do. To learn more, check out the plumber documentation on rendering output.
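To make the idea of serializing to PNG more concrete, here is a rough base R sketch of what it means to turn a plot into the bytes of a PNG file. This is an illustration only, not plumber's internal implementation; the names plot_histogram and render_png_bytes are ours.

```r
# Illustration only: this is not how plumber is implemented internally.
# plot_histogram and render_png_bytes are hypothetical names.

plot_histogram <- function() {
  rand <- rnorm(100)
  hist(rand)
}

render_png_bytes <- function(plot_fn) {
  path <- tempfile(fileext = ".png")
  on.exit(unlink(path), add = TRUE)

  png(path)                                  # open a PNG graphics device
  tryCatch(plot_fn(), finally = dev.off())   # draw, then always close the device

  # Read the finished file back as a raw vector of bytes
  readBin(path, what = "raw", n = file.info(path)$size)
}

bytes <- render_png_bytes(plot_histogram)

# Every PNG file starts with the byte 0x89 followed by the letters "PNG"
rawToChar(bytes[2:4])
```

An API that returns an image sends bytes like these in the response body, with a content type that tells the client to treat them as a PNG rather than as JSON text.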

Now, our script looks like this:

# plumber.R
library(plumber)

#* Plot a histogram
#* @serializer png
#* @get /plot
function() {
  rand <- rnorm(100)
  hist(rand)
}

Congratulations! We wrote an API using R.
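For comparison, the same endpoint could also be built programmatically instead of through comments. The sketch below assumes plumber 1.0 or later, where pr(), pr_get(), and serializer_png() are available; the names plot_histogram and api are ours, chosen for illustration.

```r
library(plumber)

# The same function as in plumber.R, given a name (our choice) for reuse
plot_histogram <- function() {
  rand <- rnorm(100)
  hist(rand)
}

# Start from an empty router, then register a GET endpoint at /plot
# whose result is returned as a PNG image
api <- pr()
api <- pr_get(api, "/plot", plot_histogram, serializer = serializer_png())
```

The comment-based script is usually shorter to read, while the programmatic form can be convenient when endpoints are assembled in code.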

3. Plumb it

Now that we’ve created an API, it’s time to “plumb” (run) it!

After we write our plumber script in the RStudio IDE, a special button appears that allows us to “Run API”:

Screenshot of R console highlighting the button where it says we can run the API

Running the API generates an interface for our API.

 
Plumbing an API in RStudio

The interface provides a way to interact with our API’s endpoints. We can test out different calls to make sure that everything runs as expected.

Button to get request from the API interface generated by plumber

Endpoint in our code and the interface

Click ‘Try it out’ and then ‘Execute’ to see what the API returns (in our case, an image of a histogram):

 
Testing out our API through the interface

Notice that you never left RStudio to create, run, and test your API!
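If you prefer the R console to the button, the API can also be started with plumber's plumb() function. The sketch below writes the example script to a temporary file only so that it is self-contained; in practice you would point plumb() at your own plumber.R. Port 8000 is an arbitrary choice on our part.

```r
library(plumber)

# Write the example script to a temporary file so this sketch runs on its own.
# With a real project, use the path to your plumber.R instead.
script <- tempfile(fileext = ".R")
writeLines(c(
  "#* Plot a histogram",
  "#* @serializer png",
  "#* @get /plot",
  "function() {",
  "  rand <- rnorm(100)",
  "  hist(rand)",
  "}"
), script)

# Parse the annotated script into a router object
api <- plumb(script)

# run() blocks until interrupted, so only start the server interactively.
# Port 8000 is an assumption; any free port works.
if (interactive()) {
  api$run(port = 8000)
}
```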

4. Deploy the API

We can develop and test an API on our laptop, but how do we share it with others (for example, the website developer we mentioned previously)? We do not want our laptop to be serving the requests for a variety of reasons, including maintenance and security concerns.

RStudio Connect is an enterprise publishing platform that deploys APIs created by plumber with versioning, dependency management, and authentication. RStudio Connect also supports the deployment of many other data product formats, including Python APIs developed using frameworks such as Flask, FastAPI, Quart, Falcon, and Sanic. See the RStudio Connect Python Updates blog post for more info on deploying Python APIs on Connect.

 
Editing access settings in RStudio Connect

RStudio Connect also ensures that you are not consuming more system resources than necessary. It automatically manages the processes needed to handle the current load and balances incoming traffic across all available processes. It also shuts down idle processes when they are not in use.

Learn more about hosting Plumber APIs.

Now that our API is hosted, anybody can use it in their application! Access it on RStudio Connect: https://colorado.rstudio.com/rsc/plumber-histogram-example/.
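As a rough sketch of what using the API from another application can look like, here is a small R client that downloads the histogram from the /plot endpoint. The base URL is a placeholder that assumes the API is running locally on port 8000; substitute the address where your API is hosted. The helper name fetch_plot is ours.

```r
# Placeholder address: assumes the API is running locally on port 8000.
# Replace it with the URL where your API is actually hosted.
base_url <- "http://127.0.0.1:8000"
plot_endpoint <- paste0(base_url, "/plot")

# Hypothetical helper: request the endpoint and save the PNG response to disk
fetch_plot <- function(url, dest = tempfile(fileext = ".png")) {
  download.file(url, destfile = dest, mode = "wb", quiet = TRUE)
  dest
}

# Only make the request interactively, since it needs a running API
if (interactive()) {
  png_path <- fetch_plot(plot_endpoint)
  message("Saved histogram to ", png_path)
}
```

A client in any other language would do the same thing: send a GET request to the endpoint and handle the image that comes back.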

Learn More

APIs increase the impact of your data science work by making your code accessible to a larger audience. Thanks to plumber, you can create them by providing a few special comments in your R code.
