Purpose

This document describes some explorations of Databrary using a set of R functions that interact with the Databrary API.

Use case: Summarizing the demographics of a dataset

Let’s say that I want to examine the demographic distribution of a shared dataset. The databrary_summarize_volume() command does this for me. It downloads the spreadsheet for a given volume – the default is volume 4:

Adolph, K. (2013). Crawling and walking infants see the world differently. Databrary. Retrieved March 14, 2018 from http://doi.org/10.17910/B7RP4H

Then it creates a boxplot of the age distribution by race or ethnicity.

source("databrary_summarize_volume.R")
databrary_summarize_volume()

Here, it appears that the spreadsheet contains some records where the gender was unspecified. These appear to be “Materials” sessions. A future version of the software should eliminate these from the spreadsheet.
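
One way to drop such rows before plotting is to filter on a demographic field that only real sessions fill in. The sketch below is a minimal illustration on a made-up data frame: the column names (session_name, gender, age_days) and all values are hypothetical stand-ins, not the actual headers or contents of the Databrary spreadsheet.

```r
# Toy stand-in for a downloaded volume spreadsheet.
# Column names and values are hypothetical, chosen only for illustration.
sheet <- data.frame(
  session_name = c("S01", "S02", "Materials", "S03"),
  gender       = c("Female", "Male", NA, "Female"),
  age_days     = c(395, 402, NA, 410),
  stringsAsFactors = FALSE
)

# Keep only rows with a recorded (non-missing, non-empty) gender.
has_gender    <- !is.na(sheet$gender) & nzchar(sheet$gender)
sessions_only <- sheet[has_gender, ]

nrow(sessions_only)
```

Filtering on a missing value rather than on the session name avoids depending on how materials rows happen to be labelled.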

Let’s try another example from a different dataset.

Tamis-LeMonda, C. (2013). Language, cognitive, and socio-emotional skills from 9 months until their transition to first grade in U.S. children from African-American, Dominican, Mexican, and Chinese backgrounds. Databrary. Retrieved March 14, 2018 from http://doi.org/10.17910/B7CC74.

I’ve disabled this call in the knitted document because it seems to time out during the knit process. The command does work when run on its own.

databrary_summarize_volume(volume = 8)

Again, there appear to be some non-session rows in the spreadsheet.

Use case: Listing all the videos in a session

Let’s imagine that I want to list all of the video files in a given session of a specific volume. First, I need to identify the session IDs for a given volume. We’ll use the PLAY pilot volume as an illustration.

Adolph, K., Tamis-LeMonda, C. & Gilmore, R.O. (2017). PLAY Pilot Data Collections. Databrary. Retrieved March 14, 2018 from https://nyu.databrary.org/volume/444

source("databrary_download_containers_records.R")
this.vol <- 444
vol.444 <- databrary_download_containers_records(volume = this.vol)
str(vol.444)
## List of 10
##  $ id             : int 444
##  $ name           : chr "PLAY Pilot Data Collections"
##  $ body           : chr "Pilot data collections of 12-24 month olds playing with their caregiver in their home."
##  $ creation       : chr "2017-06-30T19:11:09.360464Z"
##  $ owners         :'data.frame': 3 obs. of  2 variables:
##   ..$ name: chr [1:3] "Adolph, Karen" "Tamis-LeMonda, Catherine" "Gilmore, Rick O."
##   ..$ id  : int [1:3] 5 11 6
##  $ permission     : int 5
##  $ publicsharefull: NULL
##  $ publicaccess   : chr "none"
##  $ containers     :'data.frame': 21 obs. of  4 variables:
##   ..$ id     : int [1:21] 18695 18801 18803 18805 18806 18807 18808 18810 18811 18813 ...
##   ..$ top    : logi [1:21] TRUE NA NA NA NA NA ...
##   ..$ date   : chr [1:21] NA "2016-05-23" "2016-07-06" "2016-07-09" ...
##   ..$ release: int [1:21] NA 2 2 0 2 2 2 2 2 2 ...
##  $ records        :'data.frame': 38 obs. of  3 variables:
##   ..$ id      : int [1:38] 10820 11143 10826 11142 11141 11140 11139 11138 10814 11137 ...
##   ..$ category: int [1:38] 1 6 1 6 6 6 6 6 1 6 ...
##   ..$ measures:'data.frame': 38 obs. of  15 variables:
##   .. ..$ 1 : chr [1:38] "05" NA "09" NA ...
##   .. ..$ 4 : chr [1:38] "2014-08-17" NA "2015-03-09" NA ...
##   .. ..$ 5 : chr [1:38] "Male" NA "Male" NA ...
##   .. ..$ 6 : chr [1:38] "White" NA "White" NA ...
##   .. ..$ 7 : chr [1:38] "Not Hispanic or Latino" NA "Not Hispanic or Latino" NA ...
##   .. ..$ 8 : chr [1:38] "40" NA "41" NA ...
##   .. ..$ 10: chr [1:38] "8.375" NA "6.375" NA ...
##   .. ..$ 11: chr [1:38] "typical" NA "typical" NA ...
##   .. ..$ 12: chr [1:38] "English" NA "English, German" NA ...
##   .. ..$ 29: chr [1:38] NA "Dyadic Play" NA "Body Dimensions" ...
##   .. ..$ 33: chr [1:38] NA NA NA NA ...
##   .. ..$ 34: chr [1:38] NA NA NA NA ...
##   .. ..$ 35: chr [1:38] NA NA NA NA ...
##   .. ..$ 36: chr [1:38] NA NA NA NA ...
##   .. ..$ 26: chr [1:38] NA NA NA NA ...
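
The measures columns in this output are keyed by numeric IDs rather than names. A minimal sketch of giving a few of them readable labels follows. The values are the first four rows copied from the str() output, but the labels (gender, race, task_name) and the reading of category 1 as "participant" are guesses based on those values, not taken from Databrary documentation.

```r
# First four rows of three 'measures' columns, copied from the str() output.
measures <- data.frame(
  `5`  = c("Male", NA, "Male", NA),
  `6`  = c("White", NA, "White", NA),
  `29` = c(NA, "Dyadic Play", NA, "Body Dimensions"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
category <- c(1L, 6L, 1L, 6L)  # records$category for the same four rows

# Assumed labels, inferred from the values above (not an official mapping).
metric_labels <- c("5" = "gender", "6" = "race", "29" = "task_name")
names(measures) <- unname(metric_labels[names(measures)])

# Rows with category 1 look like participant records.
participants <- measures[category == 1L, c("gender", "race")]
participants
```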

There’s a lot of information here. The containers field of this list contains the session ID numbers. These can be accessed via vol.444$containers$id.

vol.444$containers$id
##  [1] 18695 18801 18803 18805 18806 18807 18808 18810 18811 18813 18814
## [12] 18815 18817 18818 18819 18821 18822 18823 18824 18825 18826
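
Not every container is necessarily a test session. In the str() output, the first container has top set to TRUE and no date. A sketch of filtering on that flag, using the first four rows of the containers table shown earlier, might look like this. Treating top as the marker for non-session containers is an assumption drawn from this one volume.

```r
# First four containers of volume 444, copied from the str() output.
containers <- data.frame(
  id      = c(18695L, 18801L, 18803L, 18805L),
  top     = c(TRUE, NA, NA, NA),
  date    = c(NA, "2016-05-23", "2016-07-06", "2016-07-09"),
  release = c(NA, 2L, 2L, 0L),
  stringsAsFactors = FALSE
)

# Drop containers flagged as top-level; %in% treats NA as "not TRUE".
sessions <- containers[!(containers$top %in% TRUE), ]
sessions$id
```

Using %in% rather than == sidesteps the NA values in the top column, which would otherwise propagate into the row index.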

Based on the containers$date field, it looks like the first entry is not a real session, so let’s list the videos in the second entry, the one with the test date equal to 2016-05-23.

source("databrary_list_assets.R")
this.slot <- vol.444$containers$id[2]
databrary_list_assets(volume = 444, slot = this.slot)
## $id
## [1] 18801
## 
## $date
## [1] "2016-05-23"
## 
## $release
## [1] 2
## 
## $assets
##      id format duration    segment           name permission       size
## 1 84604   -800  3904000 0, 3904000 S#1_1-hour.mov          5 2820412092

It looks like there’s one movie file there. A quick visit to volume 444, slot 18801 at https://databrary.org/volume/444/slot/18801/- confirms this.

I could download this video, but it’s big: the size field reports 2820412092, which is about 2.8 GB if the unit is bytes. So, let’s do that in the next vignette with a smaller video file.
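
To make the raw numbers in the asset listing easier to read, they can be converted to minutes and gigabytes. This sketch assumes that duration is in milliseconds and size is in bytes. The listing does not state the units, but the values are consistent with a file named "1-hour".

```r
# Values copied from the asset listing for slot 18801.
asset <- list(name = "S#1_1-hour.mov", duration = 3904000, size = 2820412092)

# Assumed units: duration in milliseconds, size in bytes.
duration_min <- asset$duration / 1000 / 60
size_gb      <- asset$size / 1e9

cat(sprintf("%s: %.1f min, %.2f GB\n", asset$name, duration_min, size_gb))
# prints "S#1_1-hour.mov: 65.1 min, 2.82 GB"
```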

Use case: Downloading the video from a session

Let’s download the stimulus samples used in this study:

Gilmore, R.O. (2014). Four-month-olds' discrimination of optic flow patterns depicting different directions of observer motion. Databrary. Retrieved March 14, 2018 from http://doi.org/10.17910/B7Z593.

Let’s see how many sessions there are.

this.vol <- 31
vol.31 <- databrary_download_containers_records(volume = this.vol)
vol.31$containers$id
## [1] 6437 9803

In volume 444, the first container was not a real session, so let’s look inside the second container here as well.

this.slot <- vol.31$containers$id[2]
(these.assets <- databrary_list_assets(volume = this.vol, slot = this.slot))
## $id
## [1] 9803
## 
## $top
## [1] TRUE
## 
## $name
## [1] "Top-level materials"
## 
## $assets
##      id format classification duration
## 1 11173   -800              3      667
## 2 11171   -800              3      667
## 3 11175   -800              3      667
## 4 11178   -800              3     4000
## 5 11179   -800              3     4000
##                                                                                                                                                                                                                                                                             name
## 1                                                                                          Movie depicting 180 degrees (backward) motion along a ground plane similar to that used in Experiments 1 and 2. In the actual experiment, this animation looped continuously in time.
## 2                                                                                             Movie depicting 0 degrees (forward) motion along a ground plane similar to that used in Experiments 1 and 2. In the actual experiment, this animation looped continuously in time.
## 3                                                                                                              Movie depicting 16 degree motion along a ground plane similar to that used in Experiment 2. In the actual experiment, this animation looped continuously in time.
## 4 Movie with paired optic flow patterns depicting motion along the anterior/posterior axis on one side and optic flow that alternated between anterior/posterior motion and movement 32 degrees from that axis on the other side. This type of display was used in Experiment 3.
## 5 Movie with paired optic flow patterns depicting motion along the anterior/posterior axis on one side and optic flow that alternated between anterior/posterior motion and movement 32 degrees from that axis on the other side. This type of display was used in Experiment 3.
##   permission   size
## 1          5  23094
## 2          5  23325
## 3          5  24009
## 4          5 235741
## 5          5 251002
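
The asset table can also be queried directly, for example to find the smallest file before committing to a download. The sketch below rebuilds the id, duration, and size columns from the listing as a plain data frame; the interpretation of size as bytes and duration as milliseconds is an assumption.

```r
# id, duration, and size columns copied from the asset listing for slot 9803.
assets <- data.frame(
  id       = c(11173L, 11171L, 11175L, 11178L, 11179L),
  duration = c(667, 667, 667, 4000, 4000),
  size     = c(23094, 23325, 24009, 235741, 251002)
)

smallest    <- assets[which.min(assets$size), ]      # row for the smallest file
total_bytes <- sum(assets$size)                      # size of the whole set
short_clips <- assets$id[assets$duration < 1000]     # the three brief clips

smallest$id
total_bytes
short_clips
```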

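It looks like there are five videos here. Note that this container is flagged as the volume’s top-level materials (top is TRUE), which is where the stimulus samples live. Let’s download the first one.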

videos <- these.assets$assets$id
source("databrary_download_asset.R")
databrary_download_asset(slot = this.slot, asset = videos[1])

That seems to work. Let’s see if we can embed it to check.

(fl <- list.files(path = ".", pattern = "\\.mp4$"))
## [1] "9803-11173-2018-03-14-1654-02.mp4" "9803-11173-2018-03-14-1716-16.mp4"

Yep, there’s a video called ‘9803-11173-2018-03-14-1654-02.mp4’ there. Here it is embedded:

[Embedded video: 9803-11173-2018-03-14-1654-02.mp4]
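
The downloaded file names appear to follow a slot-asset-timestamp pattern, since 9803 and 11173 match the slot and asset IDs used above. That layout is inferred from these two names only. Under that assumption, a sketch for parsing the names and picking the most recent copy of an asset could look like this:

```r
# File names copied from the list.files() output.
fl <- c("9803-11173-2018-03-14-1654-02.mp4",
        "9803-11173-2018-03-14-1716-16.mp4")

# Strip the extension and split on hyphens.
parts <- strsplit(sub("\\.mp4$", "", fl), "-", fixed = TRUE)

# Assumed layout: slot, asset, then year-month-day-HHMM-SS.
info <- data.frame(
  file  = fl,
  slot  = as.integer(vapply(parts, `[`, character(1), 1)),
  asset = as.integer(vapply(parts, `[`, character(1), 2)),
  stamp = vapply(parts, function(p) paste(p[3:7], collapse = "-"), character(1)),
  stringsAsFactors = FALSE
)
info$downloaded <- as.POSIXct(info$stamp, format = "%Y-%m-%d-%H%M-%S", tz = "UTC")

# Most recent download of this asset.
latest <- info$file[which.max(as.numeric(info$downloaded))]
latest
```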

Let’s see if we can download a set of videos.

lapply(videos, databrary_download_asset, slot = this.slot)
## [[1]]
## [1] 0
## 
## [[2]]
## [1] 0
## 
## [[3]]
## [1] 0
## 
## [[4]]
## [1] 0
## 
## [[5]]
## [1] 0

Each of the five calls returned 0, which presumably signals success. That is pretty satisfying.
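
For larger batches, it may help to keep one failed download from aborting the rest and to record which assets need a retry. The sketch below shows the pattern with a stand-in function. fake_download() is hypothetical and simply simulates one failure, and treating 0 as success mirrors the convention of base R's download.file(); whether databrary_download_asset() follows that convention is an assumption.

```r
videos <- c(11173, 11171, 11175, 11178, 11179)

# Hypothetical stand-in for the real downloader: returns 0 on success
# and raises an error for one asset to simulate a failed transfer.
fake_download <- function(asset, slot) {
  if (asset == 11175) stop("simulated failure")
  0L
}

# Convert errors into NA so the loop continues past a failure.
safe_download <- function(asset, slot) {
  tryCatch(fake_download(asset, slot), error = function(e) NA_integer_)
}

status <- vapply(videos, safe_download, integer(1), slot = 9803)
names(status) <- videos

# Asset IDs that errored or returned a non-zero status.
failed <- names(status)[is.na(status) | status != 0]
failed
```

Swapping fake_download for the real function would leave the rest of the pattern unchanged, provided it takes the same asset and slot arguments.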