2016-08-16

Curating identifiable data with Databrary

Acknowledgments

TL;DR

  • Psychology is harder than physics
  • Video is uniquely informative about behavior
  • Video is essential for reproducible science of behavior
    • Even where video is not the primary raw data
  • Databrary specializes in storing and sharing research video + metadata

TL;DR

  • Video is identifiable, but it can be securely stored, managed, and shared
  • All behavioral scientists should collect and share videos of their research methods
  • Video (+ other streams) can serve as the core of a multivariate/multi-level big data science of behavior

Why psychology is harder than physics

Video is a uniquely informative source of evidence about behavior

(Yu 2016)

(Wilkinson 2014)

Is behavioral science reproducible?

Why the dispute; what to do?

  • Behavior is rich and complex
  • Numeric, text-based measures reduce that complexity
  • Video captures and preserves it

(Frank 2014)

Why the dispute; what to do?

  • Replications can fail due to methodological differences
  • Methods sections can't possibly report all essential details
  • Video captures and preserves those details

(Adolph 2013)

A reproducible behavioral science must

  • Video record all tasks, measures, and behaviors
  • Share the recordings openly with other researchers

Journal of Visualized Experiments

A reproducible behavioral science must also

  • Share all questionnaires, tasks, displays
  • Share statistical, computational, data workflows
  • Make it easy to share from the beginning
  • Seek participants' permission to share their data
  • Store data securely

Databrary.org

  • Digital data library specialized for research video
  • Video/audio + participant/context metadata
  • Share displays, materials, text-based data files
  • Policy framework for sharing identifiable data
  • Developmental focus, but not exclusive

Databrary.org facilitates data sharing, re-use, preservation

  • High capacity, centralized storage
  • Transcoding to common, interoperable formats
  • Long-term preservation

Databrary helps overcome barriers to sharing video

  • Policies for sharing identifiable video data
  • Tools for reproducibly coding video
  • Tools for "active curation" (curating during data collection)
  • Tools for searching, filtering

Policies

  • Restrict access to authorized researchers (& affiliates)
  • Seek participants' permission to share their data

Standardized (reproducible) release levels

Tools for coding video

  • Raw research video must be coded by human observers
  • Datavyu is a free, open-source coding tool
  • Add codes, annotations time-locked to video segments
  • Turn behavior into quantifiable data
  • Ruby API for scripting reproducible workflows
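
To make "turn behavior into quantifiable data" concrete, here is a minimal Python sketch of time-locked codes summarized into per-behavior durations. It does not use Datavyu's Ruby API; the `Code` class, the `total_duration_ms` helper, and the labels and times are all hypothetical and invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Code:
    """One annotation time-locked to a video segment (times in milliseconds)."""
    label: str
    onset_ms: int
    offset_ms: int


def total_duration_ms(codes):
    """Sum the coded duration for each label across all segments."""
    totals = defaultdict(int)
    for code in codes:
        if code.offset_ms < code.onset_ms:
            raise ValueError(f"offset precedes onset in {code!r}")
        totals[code.label] += code.offset_ms - code.onset_ms
    return dict(totals)


# Hypothetical codes for one session; labels and times are made up.
session_codes = [
    Code("crawl", 0, 4_500),
    Code("walk", 4_500, 9_000),
    Code("crawl", 12_000, 13_250),
]
durations = total_duration_ms(session_codes)
print(durations)
```

Because the summary is computed by a script rather than by hand, anyone with the same coded file can rerun it and should get the same numbers.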

Tools for curating data as it is collected

  • After-the-fact curation is burdensome
  • Databrary organizes, shares, and standardizes participant metadata
  • Sharing based on
    • user access level
    • participant permission
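
The two-factor sharing rule above (user access level plus participant permission) can be sketched as a small decision function. This is an illustration under stated assumptions, not Databrary's implementation: the level names in `Release` and `Access` and the `can_view` function are hypothetical.

```python
from enum import IntEnum


class Release(IntEnum):
    """Hypothetical participant permission levels, most to least restrictive."""
    PRIVATE = 0
    AUTHORIZED_RESEARCHERS = 1
    PUBLIC = 2


class Access(IntEnum):
    """Hypothetical user access levels."""
    ANONYMOUS = 0
    AUTHORIZED = 1
    OWNER = 2


def can_view(user: Access, release: Release) -> bool:
    """Return True if this user may view a file with this release level."""
    if user is Access.OWNER:
        # The data owner can always see their own files.
        return True
    if user is Access.AUTHORIZED:
        # Authorized researchers see anything the participant agreed to share.
        return release >= Release.AUTHORIZED_RESEARCHERS
    # Everyone else sees only what was released publicly.
    return release is Release.PUBLIC


print(can_view(Access.AUTHORIZED, Release.PRIVATE))
print(can_view(Access.AUTHORIZED, Release.AUTHORIZED_RESEARCHERS))
```

Storing the permission level with each file at collection time means the rule can be applied automatically, rather than reconstructed from consent forms later.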

Databrary's structure

  • Each dataset has its own page; shared datasets have DOIs
  • Data about testing sessions (locations, dates/times, people) stored in spreadsheet
  • Session data organized in timeline
  • Store data AND materials (displays, protocols, etc.)

(Adolph 2013)

  • Here is an illustrative dataset
  • Datasets can have highlights (audio, video, or photos) that represent the main points of a study

Tools for searching, filtering by participant characteristics

"Big" questions about behavior require big(ger) data

  • Multiple levels of spatial, temporal resolution
  • Linked data sets and streams
  • Key linkage variables: when (time), where (location), who (age, gender)
  • Video is a spatially and temporally rich time series
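
As one illustration of linking streams on the "when" variable, the sketch below tags each sample from a second data stream with the video code that was active at that moment. The function name `label_samples`, the tuple layouts, and the example values are hypothetical; it assumes coded intervals are sorted by onset and do not overlap.

```python
from bisect import bisect_right


def label_samples(samples, intervals):
    """Attach the active video code to each timestamped sample.

    samples:   list of (time_ms, value)
    intervals: list of (onset_ms, offset_ms, label), sorted, non-overlapping
    """
    onsets = [onset for onset, _, _ in intervals]
    linked = []
    for time_ms, value in samples:
        # Find the last interval that starts at or before this sample.
        i = bisect_right(onsets, time_ms) - 1
        label = None
        if i >= 0:
            onset, offset, name = intervals[i]
            if onset <= time_ms < offset:
                label = name
        linked.append((time_ms, value, label))
    return linked


# Hypothetical coded intervals and sensor samples, invented for illustration.
intervals = [(0, 4_500, "crawl"), (4_500, 9_000, "walk")]
samples = [(1_000, 0.2), (4_500, 0.9), (9_500, 0.1)]
linked = label_samples(samples, intervals)
print(linked)
```

Samples that fall outside every coded interval keep a label of `None`, so gaps in coding stay visible instead of being silently filled.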

How data scientists can help…

  • Semi-automated video annotation
  • Speech stream extraction and transcription from natural video, with ms-level precision
  • Multivariate time series visualization, data reduction, analysis
  • Linking data, while preserving privacy
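
One common approach to linking records without exchanging raw identifiers is keyed hashing. The sketch below is an assumption-laden illustration, not a method described in this talk or used by Databrary: the `pseudonym` function, the key, and the participant IDs are hypothetical. It replaces IDs with pseudonyms only; it does nothing to make the video itself less identifiable.

```python
import hashlib
import hmac


def pseudonym(participant_id: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from an ID using HMAC-SHA256."""
    digest = hmac.new(secret_key, participant_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# Hypothetical key and records; a real key would be generated and stored securely.
key = b"example-key-not-for-real-use"
video_sessions = {pseudonym("P001", key): "session_a.mp4"}
questionnaires = {pseudonym("P001", key): {"age_months": 14}}

# Records from the two sources line up on the pseudonym, not the raw ID.
shared_ids = video_sessions.keys() & questionnaires.keys()
print(len(shared_ids))
```

Groups that hold the same key derive the same pseudonym and can join their records; without the key, the pseudonym cannot be recomputed from a guessed ID.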

Take homes

  • Video uniquely captures complexity & richness of behavior
  • Video captures vital methodological details
  • Identifiable video (and other data) can be shared securely and ethically
  • Preparing to share from the start reduces the burden of curation

Take homes

  • Video data sharing essential for a reproducible, robust behavioral science
  • Video can serve as the core of a big data revolution in behavioral science

References

Adolph, Karen. 2013. “Infants Crawling and Walking over High and Low Bridges.” Databrary. doi:10.17910/B7MW2K.

Batra, Erich K., Douglas M. Teti, Eric W. Schaefer, Brooke A. Neumann, Elizabeth A. Meek, and Ian M. Paul. 2016. “Nocturnal Video Assessment of Infant Sleep Environments.” Pediatrics, August, e20161533. doi:10.1542/peds.2016-1533.

Open Science Collaboration. 2015. “Estimating the Reproducibility of Psychological Science.” Science 349 (6251): aac4716. doi:10.1126/science.aac4716.

DeLoache, Judy. 2014. “Scale Errors Offer Evidence for a Perception-Action Dissociation Early in Life.” Databrary. doi:10.17910/B7H019.

DeLoache, Judy S., David H. Uttal, and Karl S. Rosengren. 2004. “Scale Errors Offer Evidence for a Perception-Action Dissociation Early in Life.” Science 304 (5673): 1027–9. doi:10.1126/science.1093567.

Frank, Michael C. 2014. “Representing Exact Number Visually Using Mental Abacus.” Databrary. doi:10.17910/B7PP4W.

Gilbert, Daniel T., Gary King, Stephen Pettigrew, and Timothy D. Wilson. 2016. “Comment on ‘Estimating the Reproducibility of Psychological Science’.” Science 351 (6277): 1037. doi:10.1126/science.aad7243.

Jayaraman, Swapnaa, Florian Raudies, and Linda B. Smith. 2014. “Natural Scene Statistics of Visual Experience Across Development and Culture.” Databrary. doi:10.17910/B7988V.

Wilkinson, Krista. 2014. “Preliminary Investigation of Visual Attention to Human Figures in Photographs: Potential Considerations for the Design of Aided AAC Visual Scene Displays.” Databrary. doi:10.17910/B7G59R.

Yu, Chen. 2016. “The Social Origins of Sustained Attention in One-Year-Old Human Infants.” Databrary. doi:10.17910/B7.236.