Jun 2

Announcing pins for Python

We’re excited to announce the release of pins for Python!

pins removes the hassle of managing data across projects, colleagues, and teams by providing a central place for people to store, version and retrieve data. If you’ve ever chased a CSV through a series of email exchanges, or had to decide between data-final.csv and data-final-final.csv, then pins is for you.

pins stores data on a board, which can be a local folder, RStudio Connect, or a cloud provider like Amazon S3. Each individual object (such as a dataframe, model, or another pickle-able Python object), together with some metadata, is called a pin.

The Python pins library works with its R counterpart, so that teams working across R and Python have a unified strategy for sharing data. This work emerged as part of RStudio’s investment in Python open source, in order to support bilingual data science teams.

Getting Started

The first step to using pins is installing it from PyPI.

python -m pip install pins

In the examples below, I’ll walk through the basics of pins using a temporary directory for a board, with board_temp(). A temporary board is deleted when you close Python, so it is not ideal for collaboration! You can use other boards, like board_rsconnect(), board_folder(), and board_s3(), in more realistic settings.

import pins
from pins.data import mtcars

board = pins.board_temp()

You can “pin” (save) data to a board with the .pin_write() method. It requires three arguments: an object, a name, and a pin type:

board.pin_write(mtcars.head(), "mtcars", type="csv")
#> Meta(title='mtcars: a pinned 5 x 11 DataFrame', description=None, created='20220601T175057Z', pin_hash='120a54f7e0818041', file='mtcars.csv', file_size=249, type='csv', api_version=1, version=Version(created=datetime.datetime(2022, 6, 1, 17, 50, 57, 80318), hash='120a54f7e0818041'), name='mtcars', user={})
#> Writing to pin 'mtcars'

Above, we saved the data as a CSV, but depending on what you’re saving and who else you want to read it, you might use the type argument to instead save it as a feather, parquet, or joblib file.

You can later retrieve the pinned data with .pin_read():

board.pin_read("mtcars")
#>    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> 0 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#> 1 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#> 2 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#> 3 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#> 4 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2

You can search for data using .pin_search() and .pin_list().

# prints out a list of all pins
# board.pin_list()

# searches for pins containing "cars"
board.pin_search("cars")
#>      name type  ... file_size                                               meta
#> 0  mtcars  csv  ...       249  Meta(title='mtcars: a pinned 5 x 11 DataFrame'...
#> [1 rows x 6 columns]

Two more pieces of important functionality exist:

  • .pin_write() doesn’t delete existing data; instead, it saves a new version of the pin.
  • .pin_read() caches your data, so subsequent reads are much faster.

See getting started in the pins documentation for more information.

Interoperability with R pins

Pins stored with Python can be read with R, and vice-versa.

For example, here is R code that reads the mtcars pin we wrote to the board above. Note that TEMP_PATH refers to the temporary directory we created in this blog post for our Python board.


library(pins)

board <- board_folder(TEMP_PATH)
board %>% pin_read("mtcars")
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2

This is especially useful when colleagues prefer one language over the other. For real collaborative work like this, you would use a board like board_rsconnect() or board_s3().

Going further

The real power of pins comes when you share a board with multiple people. To get started, you can use board_folder() with a directory on a shared drive or in Dropbox, or if you use RStudio Connect you can use board_rsconnect():

board = pins.board_rsconnect()
board.pin_write(tidy_sales_data, "michael/sales-summary", type="csv")

Then, someone else (or an automated report) can read and use your pin:

board = pins.board_rsconnect()
board.pin_read("michael/sales-summary")

The pins package also includes boards that allow you to share data on services like Amazon’s S3 (board_s3()), with plans to support other backends such as Google Cloud Storage and Azure’s blob storage.

Get in touch

We are so happy about releasing pins for Python, and we want to make sure it supports your workflow. Join our discussion on RStudio Community to let us know what you’re working on, and how pins could help!
