This document describes some explorations of Databrary using a set of R functions that interact with the Databrary API.
Let’s say that I want to examine the demographic distribution of a shared dataset. The databrary_summarize_volume() command does this for me. It downloads the spreadsheet for a given volume (the default is volume 4):
Adolph, K. (2013). Crawling and walking infants see the world differently. Databrary. Retrieved March 14, 2018 from http://doi.org/10.17910/B7RP4H
Then it creates a boxplot of the age distribution by race or ethnicity.
source("databrary_summarize_volume.R")
databrary_summarize_volume()
Here, it appears that the spreadsheet contains some records where the gender was unspecified. These appear to be “Materials” sessions. A future version of the software should eliminate these from the spreadsheet.
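Until the software handles this, the non-participant rows could be dropped by hand before plotting. Here is a minimal sketch on a toy data frame. The column names (session_name, gender, age_days) and the values are made up for illustration and are not the real spreadsheet headers.

```r
# Toy stand-in for a downloaded volume spreadsheet.
# ASSUMPTION: column names and values are hypothetical.
sessions <- data.frame(
  session_name = c("Top-level materials", "S#1", "S#2"),
  gender       = c(NA, "Female", "Male"),
  age_days     = c(NA, 390, 402),
  stringsAsFactors = FALSE
)

# Keep only rows with a recorded (non-missing, non-empty) gender,
# which should exclude "Materials" rows that describe no participant.
has_gender <- !is.na(sessions$gender) & nzchar(sessions$gender)
participant_rows <- sessions[has_gender, ]

print(participant_rows)
```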
Let’s try another example from a different dataset.
Tamis-LeMonda, C. (2013). Language, cognitive, and socio-emotional skills from 9 months until their transition to first grade in U.S. children from African-American, Dominican, Mexican, and Chinese backgrounds. Databrary. Retrieved March 14, 2018 from http://doi.org/10.17910/B7CC74.
I’ve disabled this command in the knitted document because it seems to time out during the knit process. The command does work when run on its own.
databrary_summarize_volume(volume = 8)
Again, there appear to be some non-session rows in the spreadsheet.
Let’s imagine that I want to list all of the video files in a given session of a specific volume. First, I need to identify the session IDs for a given volume. We’ll use the PLAY pilot volume as an illustration.
Adolph, K., Tamis-LeMonda, C. & Gilmore, R.O. (2017). PLAY Pilot Data Collections. Databrary. Retrieved March 14, 2018 from https://nyu.databrary.org/volume/444
source("databrary_download_containers_records.R")
this.vol <- 444
vol.444 <- databrary_download_containers_records(volume = this.vol)
str(vol.444)
## List of 10
## $ id : int 444
## $ name : chr "PLAY Pilot Data Collections"
## $ body : chr "Pilot data collections of 12-24 month olds playing with their caregiver in their home."
## $ creation : chr "2017-06-30T19:11:09.360464Z"
## $ owners :'data.frame': 3 obs. of 2 variables:
## ..$ name: chr [1:3] "Adolph, Karen" "Tamis-LeMonda, Catherine" "Gilmore, Rick O."
## ..$ id : int [1:3] 5 11 6
## $ permission : int 5
## $ publicsharefull: NULL
## $ publicaccess : chr "none"
## $ containers :'data.frame': 21 obs. of 4 variables:
## ..$ id : int [1:21] 18695 18801 18803 18805 18806 18807 18808 18810 18811 18813 ...
## ..$ top : logi [1:21] TRUE NA NA NA NA NA ...
## ..$ date : chr [1:21] NA "2016-05-23" "2016-07-06" "2016-07-09" ...
## ..$ release: int [1:21] NA 2 2 0 2 2 2 2 2 2 ...
## $ records :'data.frame': 38 obs. of 3 variables:
## ..$ id : int [1:38] 10820 11143 10826 11142 11141 11140 11139 11138 10814 11137 ...
## ..$ category: int [1:38] 1 6 1 6 6 6 6 6 1 6 ...
## ..$ measures:'data.frame': 38 obs. of 15 variables:
## .. ..$ 1 : chr [1:38] "05" NA "09" NA ...
## .. ..$ 4 : chr [1:38] "2014-08-17" NA "2015-03-09" NA ...
## .. ..$ 5 : chr [1:38] "Male" NA "Male" NA ...
## .. ..$ 6 : chr [1:38] "White" NA "White" NA ...
## .. ..$ 7 : chr [1:38] "Not Hispanic or Latino" NA "Not Hispanic or Latino" NA ...
## .. ..$ 8 : chr [1:38] "40" NA "41" NA ...
## .. ..$ 10: chr [1:38] "8.375" NA "6.375" NA ...
## .. ..$ 11: chr [1:38] "typical" NA "typical" NA ...
## .. ..$ 12: chr [1:38] "English" NA "English, German" NA ...
## .. ..$ 29: chr [1:38] NA "Dyadic Play" NA "Body Dimensions" ...
## .. ..$ 33: chr [1:38] NA NA NA NA ...
## .. ..$ 34: chr [1:38] NA NA NA NA ...
## .. ..$ 35: chr [1:38] NA NA NA NA ...
## .. ..$ 36: chr [1:38] NA NA NA NA ...
## .. ..$ 26: chr [1:38] NA NA NA NA ...
There’s a lot of information here. The containers field of this list contains the session ID numbers. These can be accessed via vol.444$containers$id.
vol.444$containers$id
## [1] 18695 18801 18803 18805 18806 18807 18808 18810 18811 18813 18814
## [12] 18815 18817 18818 18819 18821 18822 18823 18824 18825 18826
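The records field can be explored in a similar way. Judging only from the str() output above, category 1 rows seem to hold participant measures and category 6 rows seem to hold task names. That reading is my guess, not something the API output states. Here is a minimal sketch that separates the two, using a toy copy of the first four rows.

```r
# Toy copy of the first four rows of vol.444$records from the str() output.
records <- data.frame(
  id       = c(10820L, 11143L, 10826L, 11142L),
  category = c(1L, 6L, 1L, 6L)
)
measures <- data.frame(
  `1`  = c("05", NA, "09", NA),
  `5`  = c("Male", NA, "Male", NA),
  `29` = c(NA, "Dyadic Play", NA, "Body Dimensions"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# ASSUMPTION: category 1 = participant, category 6 = task (inferred from values).
is_participant <- records$category == 1

participants <- data.frame(
  id     = records$id[is_participant],
  code   = measures[["1"]][is_participant],
  gender = measures[["5"]][is_participant],
  stringsAsFactors = FALSE
)
tasks <- data.frame(
  id   = records$id[!is_participant],
  task = measures[["29"]][!is_participant],
  stringsAsFactors = FALSE
)

print(participants)
print(tasks)
```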
Based on the containers$date field, it looks like the first entry is not a real session (its date is NA and its top flag is TRUE), so let’s list the videos in the second entry, the one with the test date equal to 2016-05-23.
source("databrary_list_assets.R")
this.slot <- vol.444$containers$id[2]
databrary_list_assets(volume = 444, slot = this.slot)
## $id
## [1] 18801
##
## $date
## [1] "2016-05-23"
##
## $release
## [1] 2
##
## $assets
## id format duration segment name permission size
## 1 84604 -800 3904000 0, 3904000 S#1_1-hour.mov 5 2820412092
It looks like there’s one movie file there. A quick visit to volume 444, slot 18801 at https://databrary.org/volume/444/slot/18801/- confirms this.
I could download this video, but it’s big: about 2.8 GB, assuming the size field is in bytes. So, let’s do that in the next vignette with a smaller video file.
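Picking the session by position worked here, but selecting it by date is less fragile. Here is a minimal sketch using a toy copy of the first four rows of vol.444$containers. With the real object, vol.444$containers would take the place of the toy data frame.

```r
# Toy copy of the first four rows of vol.444$containers.
containers <- data.frame(
  id      = c(18695L, 18801L, 18803L, 18805L),
  top     = c(TRUE, NA, NA, NA),
  date    = c(NA, "2016-05-23", "2016-07-06", "2016-07-09"),
  release = c(NA, 2L, 2L, 0L),
  stringsAsFactors = FALSE
)

# Treat containers without a test date as non-sessions and drop them.
real_sessions <- containers[!is.na(containers$date), ]

# Look up the slot ID for a given test date.
this.slot <- real_sessions$id[real_sessions$date == "2016-05-23"]
print(this.slot)
```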
Let’s download the stimulus samples used in this study:
Gilmore, R.O. (2014). Four-month-olds' discrimination of optic flow patterns depicting different directions of observer motion. Databrary. Retrieved March 14, 2018 from http://doi.org/10.17910/B7Z593.
Let’s see how many sessions there are.
this.vol <- 31
vol.31 <- databrary_download_containers_records(volume = this.vol)
vol.31$containers$id
## [1] 6437 9803
In the previous example, the first container was not a real session, so let’s look inside the second container/session.
this.slot <- vol.31$containers$id[2]
(these.assets <- databrary_list_assets(volume = this.vol, slot = this.slot))
## $id
## [1] 9803
##
## $top
## [1] TRUE
##
## $name
## [1] "Top-level materials"
##
## $assets
## id format classification duration
## 1 11173 -800 3 667
## 2 11171 -800 3 667
## 3 11175 -800 3 667
## 4 11178 -800 3 4000
## 5 11179 -800 3 4000
## name
## 1 Movie depicting 180 degrees (backward) motion along a ground plane similar to that used in Experiments 1 and 2. In the actual experiment, this animation looped continuously in time.
## 2 Movie depicting 0 degrees (forward) motion along a ground plane similar to that used in Experiments 1 and 2. In the actual experiment, this animation looped continuously in time.
## 3 Movie depicting 16 degree motion along a ground plane similar to that used in Experiment 2. In the actual experiment, this animation looped continuously in time.
## 4 Movie with paired optic flow patterns depicting motion along the anterior/posterior axis on one side and optic flow that alternated between anterior/posterior motion and movement 32 degrees from that axis on the other side. This type of display was used in Experiment 3.
## 5 Movie with paired optic flow patterns depicting motion along the anterior/posterior axis on one side and optic flow that alternated between anterior/posterior motion and movement 32 degrees from that axis on the other side. This type of display was used in Experiment 3.
## permission size
## 1 5 23094
## 2 5 23325
## 3 5 24009
## 4 5 235741
## 5 5 251002
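Before downloading anything, it can help to check how much data is involved. Here is a minimal sketch on a toy copy of the id, duration, and size columns above. I’m assuming size is in bytes and duration is in milliseconds, which the output does not state.

```r
# Toy copy of three columns from these.assets$assets.
assets <- data.frame(
  id       = c(11173L, 11171L, 11175L, 11178L, 11179L),
  duration = c(667, 667, 667, 4000, 4000),              # ASSUMPTION: milliseconds
  size     = c(23094, 23325, 24009, 235741, 251002)     # ASSUMPTION: bytes
)

# Total download size, in kilobytes.
total_kb <- sum(assets$size) / 1000

# IDs of the assets under an arbitrary 100 kB threshold.
small_ids <- assets$id[assets$size < 100000]

print(total_kb)
print(small_ids)
```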
It looks like there are five videos here. Let’s download the first one.
videos <- these.assets$assets$id
source("databrary_download_asset.R")
databrary_download_asset(slot = this.slot, asset = videos[1])
That seems to work. Let’s see if we can embed it to check.
(fl <- list.files(path = ".", pattern = "\\.mp4$"))
## [1] "9803-11173-2018-03-14-1654-02.mp4" "9803-11173-2018-03-14-1716-16.mp4"
Yep, there’s a video called ‘9803-11173-2018-03-14-1654-02.mp4’ there. Here it is embedded:
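One way to embed a downloaded file in an HTML document knitted from R Markdown is to print a video tag from a chunk with results = 'asis'. Here is a minimal sketch, which is not necessarily how this document did it. The helper video_tag() is hypothetical, and the file names are copied from the listing above.

```r
fl <- c("9803-11173-2018-03-14-1654-02.mp4",
        "9803-11173-2018-03-14-1716-16.mp4")

# The file names appear to end in a download timestamp, so a plain
# alphabetical sort puts the most recent copy last.
newest <- tail(sort(fl), 1)

# Hypothetical helper: build an HTML5 <video> tag for a local file.
video_tag <- function(path, width = 320L) {
  sprintf('<video controls width="%d" src="%s"></video>', width, path)
}

# In an R Markdown chunk with results = 'asis', this prints raw HTML.
cat(video_tag(newest), "\n")
```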
Let’s see if we can download a set of videos.
lapply(videos, databrary_download_asset, slot = this.slot)
## [[1]]
## [1] 0
##
## [[2]]
## [1] 0
##
## [[3]]
## [1] 0
##
## [[4]]
## [1] 0
##
## [[5]]
## [1] 0
That is pretty satisfying.
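Each call returned 0. If databrary_download_asset() follows the convention of R’s download.file(), where 0 means success, then all five downloads succeeded, but that is an assumption about the function. Here is a minimal sketch of checking the codes in bulk. The stand-in fake_download() is hypothetical and only lets the example run offline.

```r
# Stand-in for databrary_download_asset() so the sketch runs offline.
# ASSUMPTION: like download.file(), the real function returns 0 on success.
fake_download <- function(asset, slot) {
  if (asset == 11175L) 1L else 0L   # pretend one download failed
}

videos <- c(11173L, 11171L, 11175L, 11178L, 11179L)

# vapply() returns a plain integer vector instead of a list.
status <- vapply(videos, fake_download, integer(1), slot = 9803L)
names(status) <- videos

# Asset IDs whose download reported a non-zero status.
failed <- names(status)[status != 0]
print(failed)
```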