This document describes some explorations of Databrary using a set of R functions that interact with the Databrary API.
Let’s say that I want to examine the demographic distribution of a shared dataset. The databrary_summarize_volume()
command does this for me. It downloads the spreadsheet for a given volume – the default is volume 4:
Adolph, K. (2013). Crawling and walking infants see the world differently. Databrary. Retrieved March 14, 2018 from
Then it creates a boxplot of the age distribution by race or ethnicity.
Here, it appears that the spreadsheet contains some records where the gender was unspecified. These appear to be “Materials” sessions. A future version of the software should eliminate these from the spreadsheet.
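The kind of summary described above can be sketched in base R. This is only an illustration on made-up data, not the implementation of databrary_summarize_volume(); the data frame and its column names (age_days, race, gender) are hypothetical. It also shows one way to drop rows that lack participant information before plotting.

```r
# Toy data standing in for a downloaded volume spreadsheet.
# All values and column names here are invented for illustration.
sessions <- data.frame(
  age_days = c(380, 402, 395, NA, 410, 388),
  race     = c("White", "Asian", "White", NA,
               "Black or African American", "Asian"),
  gender   = c("Female", "Male", "Male", NA, "Female", "Female"),
  stringsAsFactors = FALSE
)

# Drop rows with no participant information (e.g. materials-only rows).
participants <- sessions[!is.na(sessions$gender), ]

# Compute boxplot statistics of age by race. With plot = FALSE nothing is
# drawn; remove that argument to draw the figure itself.
stats <- boxplot(age_days ~ race, data = participants, plot = FALSE)
stats$names  # group labels
stats$n      # number of observations per group
```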
Let’s try another example from a different dataset.
Tamis-LeMonda, C. (2013). Language, cognitive, and socio-emotional skills from 9 months until their transition to first grade in U.S. children from African-American, Dominican, Mexican, and Chinese backgrounds. Databrary. Retrieved March 14, 2018 from
I’ve commented this out because it seems to time out during the knit process. The command does work when run on its own.
# databrary_summarize_volume(volume = 8)
Again, there appear to be some non-session rows in the spreadsheet.
Let’s imagine that I want to list all of the video files in a given session of a specific volume. First, I need to identify the session IDs for a given volume. We’ll use the PLAY pilot volume as an illustration.
Adolph, K., Tamis-LeMonda, C. & Gilmore, R.O. (2017). PLAY Pilot Data Collections. Databrary. Retrieved March 14, 2018 from
this.vol <- 444
vol.444 <- databrary_download_containers_records(volume = this.vol)
str(vol.444)
## List of 10
## $ id : int 444
## $ name : chr "PLAY Pilot Data Collections"
## $ body : chr "Pilot data collections of 12-24 month olds playing with their caregiver in their home."
## $ creation : chr "2017-06-30T19:11:09.360464Z"
## $ owners :'data.frame': 3 obs. of 2 variables:
## ..$ name: chr [1:3] "Adolph, Karen" "Tamis-LeMonda, Catherine" "Gilmore, Rick O."
## ..$ id : int [1:3] 5 11 6
## $ permission : int 5
## $ publicsharefull: NULL
## $ publicaccess : chr "none"
## $ containers :'data.frame': 21 obs. of 4 variables:
## ..$ id : int [1:21] 18695 18801 18803 18805 18806 18807 18808 18810 18811 18813 ...
## ..$ top : logi [1:21] TRUE NA NA NA NA NA ...
## ..$ date : chr [1:21] NA "2016-05-23" "2016-07-06" "2016-07-09" ...
## ..$ release: int [1:21] NA 2 2 0 2 2 2 2 2 2 ...
## $ records :'data.frame': 38 obs. of 3 variables:
## ..$ id : int [1:38] 10820 11143 10826 11142 11141 11140 11139 11138 10814 11137 ...
## ..$ category: int [1:38] 1 6 1 6 6 6 6 6 1 6 ...
## ..$ measures:'data.frame': 38 obs. of 15 variables:
## .. ..$ 1 : chr [1:38] "05" NA "09" NA ...
## .. ..$ 4 : chr [1:38] "2014-08-17" NA "2015-03-09" NA ...
## .. ..$ 5 : chr [1:38] "Male" NA "Male" NA ...
## .. ..$ 6 : chr [1:38] "White" NA "White" NA ...
## .. ..$ 7 : chr [1:38] "Not Hispanic or Latino" NA "Not Hispanic or Latino" NA ...
## .. ..$ 8 : chr [1:38] "40" NA "41" NA ...
## .. ..$ 10: chr [1:38] "8.375" NA "6.375" NA ...
## .. ..$ 11: chr [1:38] "typical" NA "typical" NA ...
## .. ..$ 12: chr [1:38] "English" NA "English, German" NA ...
## .. ..$ 29: chr [1:38] NA "Dyadic Play" NA "Body Dimensions" ...
## .. ..$ 33: chr [1:38] NA NA NA NA ...
## .. ..$ 34: chr [1:38] NA NA NA NA ...
## .. ..$ 35: chr [1:38] NA NA NA NA ...
## .. ..$ 36: chr [1:38] NA NA NA NA ...
## .. ..$ 26: chr [1:38] NA NA NA NA ...
There’s a lot of information here. The containers field of this list holds the session ID numbers, which can be accessed via vol.444$containers$id
## [1] 18695 18801 18803 18805 18806 18807 18808 18810 18811 18813 18814
## [12] 18815 18817 18818 18819 18821 18822 18823 18824 18825 18826
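The containers data frame printed earlier also has top and date columns, so session rows could be selected by filtering rather than by position. A minimal sketch follows, using a toy data frame copied from the first four rows of the structure above. The rule itself (not flagged as top-level, and has a date) is an assumption about how sessions can be told apart, not something the package documents.

```r
# First four container rows, copied from the str() output above.
containers <- data.frame(
  id      = c(18695L, 18801L, 18803L, 18805L),
  top     = c(TRUE, NA, NA, NA),
  date    = c(NA, "2016-05-23", "2016-07-06", "2016-07-09"),
  release = c(NA, 2L, 2L, 0L),
  stringsAsFactors = FALSE
)

# Assumed rule: a real session is not flagged as top-level and has a date.
# `%in% TRUE` treats NA as "not top-level" instead of propagating NA.
is_session <- !(containers$top %in% TRUE) & !is.na(containers$date)
sessions <- containers[is_session, ]
sessions$id
```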
Based on the containers$date field, it looks like the first entry is not a real session, so let’s list the videos in the second entry, the one with the test date equal to 2016-05-23.
this.slot <- vol.444$containers$id[2]
databrary_list_assets(volume = 444, slot = this.slot)
## $id
## [1] 18801
## $date
## [1] "2016-05-23"
## $release
## [1] 2
## $assets
## id format duration segment name permission size
## 1 84604 -800 3904000 0, 3904000 5 2820412092
It looks like there’s one movie file there. A quick visit to volume 444, slot 18801 on Databrary shows that this looks right.
I could download this video, but it’s big. So, let’s do that in the next vignette with a smaller video file.
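To put "big" in numbers, the size and duration from the asset listing can be converted to friendlier units. The sketch below assumes that size is reported in bytes and duration in milliseconds; the output above does not state the units, so treat both as assumptions.

```r
# Values copied from the asset listing for slot 18801.
size_bytes  <- 2820412092
duration_ms <- 3904000  # assumption: durations are in milliseconds

# Convert bytes to gibibytes and milliseconds to minutes.
size_gib     <- size_bytes / 1024^3
duration_min <- duration_ms / 1000 / 60

round(size_gib, 2)
round(duration_min, 1)
```

Under those assumptions the file is a bit over two and a half gibibytes and runs for roughly an hour.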
Let’s download the stimulus samples used in this study:
Gilmore, R.O. (2014). Four-month-olds' discrimination of optic flow patterns depicting different directions of observer motion. Databrary. Retrieved March 14, 2018 from
Let’s see how many sessions there are.
this.vol <- 31
vol.31 <- databrary_download_containers_records(volume = this.vol)
vol.31$containers$id
## [1] 6437 9803
In volume 444, the first container was not a real session, so let’s look inside the second container/session here.
this.slot <- vol.31$containers$id[2]
(these.assets <- databrary_list_assets(volume = this.vol, slot = this.slot))
## $id
## [1] 9803
## $top
## [1] TRUE
## $name
## [1] "Top-level materials"
## $assets
## id format classification duration
## 1 11173 -800 3 667
## 2 11171 -800 3 667
## 3 11175 -800 3 667
## 4 11178 -800 3 4000
## 5 11179 -800 3 4000
## name
## 1 Movie depicting 180 degrees (backward) motion along a ground plane similar to that used in Experiments 1 and 2. In the actual experiment, this animation looped continuously in time.
## 2 Movie depicting 0 degrees (forward) motion along a ground plane similar to that used in Experiments 1 and 2. In the actual experiment, this animation looped continuously in time.
## 3 Movie depicting 16 degree motion along a ground plane similar to that used in Experiment 2. In the actual experiment, this animation looped continuously in time.
## 4 Movie with paired optic flow patterns depicting motion along the anterior/posterior axis on one side and optic flow that alternated between anterior/posterior motion and movement 32 degrees from that axis on the other side. This type of display was used in Experiment 3.
## 5 Movie with paired optic flow patterns depicting motion along the anterior/posterior axis on one side and optic flow that alternated between anterior/posterior motion and movement 32 degrees from that axis on the other side. This type of display was used in Experiment 3.
## permission size
## 1 5 23094
## 2 5 23325
## 3 5 24009
## 4 5 235741
## 5 5 251002
It looks like there are five videos here. Let’s download the first one.
videos <- these.assets$assets$id
databrary_download_asset(slot = this.slot, asset = videos[1])
That seems to work. Let’s see if we can embed it to check.
(fl <- list.files(path = ".", pattern = "\\.mp4$"))
## [1] "9803-11173-2018-03-14-1654-02.mp4" "9803-11173-2018-03-14-1716-16.mp4"
Yep, there’s a video called ‘9803-11173-2018-03-14-1654-02.mp4’ there. Here it is embedded:
Let’s see if we can download a set of videos.
lapply(videos, databrary_download_asset, slot = this.slot)
## [[1]]
## [1] 0
## [[2]]
## [1] 0
## [[3]]
## [1] 0
## [[4]]
## [1] 0
## [[5]]
## [1] 0
That is pretty satisfying.
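The zeros returned above presumably signal success, by analogy with base R's download.file(), which returns 0 when a download succeeds. That reading is an assumption about databrary_download_asset(). Under it, a batch download could be checked as in the sketch below, where fake_download() is a hypothetical stand-in so the example runs without network access.

```r
# Hypothetical stand-in for a download function: returns 0 on success and
# raises an error for one asset ID so the failure path is exercised.
fake_download <- function(asset, slot) {
  if (asset == 11175) stop("simulated network failure")
  0L
}

asset_ids <- c(11173, 11171, 11175, 11178, 11179)  # IDs from the listing above

# vapply() enforces one integer status per asset. Errors are converted to NA
# so that a single failure does not abort the whole batch.
status <- vapply(
  asset_ids,
  function(a) {
    tryCatch(fake_download(a, slot = 9803),
             error = function(e) NA_integer_)
  },
  integer(1)
)

# Assets whose download errored or returned a non-zero status.
failed <- asset_ids[is.na(status) | status != 0L]
failed
```

Swapping fake_download() for the real download function would give a list of assets worth retrying.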