gilmore-csc-talk

Rick O. Gilmore

October 28, 2016

The future of big data in developmental science – Answering the big questions

Rick O. Gilmore

Support: NSF BCS-1147440, NSF BCS-1238599, NICHD U01-HD-076595

Overview

  • Where are we now?
  • Challenges
  • Some thoughts on the future

Big questions, big dreams

Shonkoff, J. P., & Phillips, D. A. (Eds.). (2000). From neurons to neighborhoods: The science of early childhood development. National Academies Press.

So, how is developmental science doing?

Challenges with 'big data' developmental science

  • Collect diverse types of data
  • Must aggregate and link data across space, time, and individual identities
  • Data not spatially uniform
  • Time series not uniformly sampled; sampling intervals differ (see the sketch after this list)
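
One concrete version of the aggregation problem is aligning measurement streams recorded at different rates for the same child. A minimal sketch, assuming two invented streams (gaze position at roughly 30 Hz, heart rate at 1 Hz) and using a nearest-prior join in pandas:

    # Illustrative only: the data, column names, and sampling rates are invented.
    import pandas as pd

    gaze = pd.DataFrame({
        "t_ms": [0, 33, 66, 99, 132, 165, 198],   # fast, slightly irregular sampling
        "gaze_x": [0.10, 0.12, 0.40, 0.45, 0.50, 0.52, 0.70],
    })
    heart = pd.DataFrame({
        "t_ms": [0, 1000],                        # slow, differently spaced sampling
        "bpm": [118, 121],
    })

    # Give each gaze sample the most recent heart-rate reading at or before it.
    linked = pd.merge_asof(gaze, heart, on="t_ms", direction="backward")
    print(linked)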

Challenges…

  • Aggregating big data about individuals poses privacy risks

Challenges…

"We have empirically assessed the distribution of published effect sizes and estimated power by extracting more than 100,000 statistical records from about 10,000 cognitive neuroscience and psychology papers published during the past 5 years…False report probability is likely to exceed 50% for the whole literature. In light of our findings the recently reported low replication success in psychology is realistic and worse performance may be expected for cognitive neuroscience."

(Szucs and Ioannidis 2016)
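
To see why figures like this are plausible, it helps to work through the positive predictive value arithmetic that Ioannidis (2005) popularized. The numbers below are assumptions chosen for illustration, not estimates taken from either paper:

    # Illustrative arithmetic only; alpha, power, and prior odds are assumed values.
    alpha = 0.05        # conventional false-positive rate
    power = 0.20        # assume low statistical power
    prior_odds = 0.25   # assume 1 in 5 tested effects is real (odds of 1:4)

    # Positive predictive value: P(effect is real | result is significant)
    ppv = (power * prior_odds) / (power * prior_odds + alpha)
    false_report_probability = 1 - ppv
    print(f"PPV = {ppv:.2f}, false report probability = {false_report_probability:.2f}")
    # With these assumptions, half of all significant findings are false reports.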

Challenges to replicability

  • Still collect data in non-electronic formats
  • Even electronic formats not readily shareable
  • Vital metadata (geo-, participant-level) often not collected
  • "Reproducible" workflows not standard practice
  • Results have limited robustness and generalizability
  • Misunderstanding of, and disagreement about, what reproducibility means

  • Methods reproducibility refers to the provision of enough detail about study procedures and data so the same procedures could, in theory or in actuality, be exactly repeated.
  • Results reproducibility (previously described as replicability) refers to obtaining the same results from the conduct of an independent study whose procedures are as closely matched to the original experiment as possible.

(Goodman, Fanelli, and Ioannidis 2016)

  • Robustness refers to the stability of experimental conclusions to variations in either baseline assumptions or experimental procedures.
  • Generalizability refers to the persistence of an effect in settings different from and outside of an experimental framework.

(Goodman, Fanelli, and Ioannidis 2016)

Do we have…

  • Reproducible methods
  • Reproducible results
  • Robust findings
  • Generalizable findings

The Year 4 A.D.

Gilmore, R. O. (2016). From big data to deep insight in developmental science. Wiley Interdisciplinary Reviews: Cognitive Science, 7(2), 112–126. https://doi.org/10.1002/wcs.1379

Lessons learned

Big data developmental studies have long histories

But, big cohort studies have uncertain futures

Data sharing is part of the solution, but

  • We don't agree about who owns data
    • Participants
    • Us
    • Penn State
    • The taxpayer
  • Minimal rewards for data sharing
  • Post hoc sharing hard, time-consuming, expensive

"You can checkout any time you like, but you can never leave."

Building a culture of reuse, reanalysis, meta-analysis

  • Why share if no one will reuse or build upon the data?
  • Journals don't always encourage, support, or mandate publication of data and detailed methods
  • Building community consensus works better than centralized mandates

Datasets can be "magnets" for scholarship

Centralizing shared data can enable discovery

Video essential

  • Numeric, text-based measures miss/reduce complexity of behavior
  • Video captures and preserves it
  • Replications can fail due to methodological differences
  • Methods sections can't possibly report all the essential details
  • Video captures and preserves them

Gilmore et al. talk

A robust and reproducible developmental science should…

  • Video record all tasks, measures, and behaviors
  • Share the recordings
  • Share all questionnaires, tasks, displays
  • Share statistical, computational, and data workflows (see the sketch after this list)
  • Prepare to share from the beginning
  • Seek permission to share data
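
A minimal sketch of what a shareable statistical workflow could look like: a single script that runs from the raw data file to the reported statistic, so every exclusion and transformation is inspectable and rerunnable. The file name and variables are hypothetical placeholders, not from any actual study.

    # Hypothetical example; "looking_times.csv" and its columns are placeholders.
    import pandas as pd
    from scipy import stats

    raw = pd.read_csv("looking_times.csv")   # raw data file, shared alongside this script

    # Exclusions happen in code, never by hand-editing the data file.
    clean = raw[raw["looking_time_s"].between(1, 60)]

    younger = clean[clean["age_months"] < 9]["looking_time_s"]
    older = clean[clean["age_months"] >= 9]["looking_time_s"]

    t, p = stats.ttest_ind(younger, older, equal_var=False)
    print(f"Welch t = {t:.2f}, p = {p:.3f}, n = {len(younger)} vs. {len(older)}")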

Of course, it's hard(er) to collect and share sensitive or identifiable data

  • But, not impossible

When asked, most participants say yes

People will volunteer information if they get something of value in return

What is the value of participating in research?

  • Contribute to public good
  • Aid discovery
  • Curiosity
  • Help institution
  • How can we better capitalize on the value the public places on our work?

We need better tools, training

Our measures need to be open, non-proprietary

  • Are your preferred standardized/normed tasks open, non-proprietary?
  • Can you share the questions/items with colleagues?
  • Are the data underlying the norms openly available?

We need to think bigger…

Imagine a developmental "Databservatory"

What would this micro/macro/telescope look like?

  • Recruiting – larger, more diverse samples
  • Data collection – more data types, allow linkage across levels
  • Data curation/management – easy/automatic, standardized formats
  • Data sharing – PI controls timing and permission levels

  • Data mining, visualization, linking
  • Search, filter by participant characteristics, tasks/measures, geo/temporal factors (see the sketch after this list)
  • Analysis in the "cloud"
  • Automatic versioning, history
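
A toy sketch of the search-and-filter step, assuming shared datasets expose a common metadata table; the table and its fields are invented for illustration:

    import pandas as pd

    # Hypothetical metadata describing shared sessions contributed by many labs.
    sessions = pd.DataFrame({
        "dataset":    ["A", "A", "B", "C"],
        "age_months": [8, 14, 9, 30],
        "task":       ["visual_habituation", "word_learning", "visual_habituation", "free_play"],
        "country":    ["US", "US", "DE", "US"],
        "year":       [2014, 2015, 2015, 2016],
    })

    # Filter by participant characteristics, task, and geographic/temporal factors.
    hits = sessions[
        (sessions["age_months"].between(6, 12))
        & (sessions["task"] == "visual_habituation")
        & (sessions["year"] >= 2014)
    ]
    print(hits)   # candidate sessions to request for reuse or meta-analysis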

The front end

  • App/web service (MeeSearch.com)
  • Linking researchers with participants (or parents)
  • Participants own/control their data, determine level of sharing (like datawallet.io; see the sketch after this list)
  • Lab-based, computer/smartphone-based, and survey tasks
  • Data visualizations, dashboard
  • 1,000+ psychology subject-pool participants per semester, 500K PSU alumni, 1M friends
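
One way to make participant-controlled sharing concrete is to attach an explicit release level to every record and check it before any data leave the system. The level names and check below are purely illustrative, not an existing platform's API:

    from dataclasses import dataclass
    from enum import IntEnum

    class ReleaseLevel(IntEnum):
        """Participant-chosen sharing level; higher values are more permissive."""
        PRIVATE = 0       # visible only to the collecting lab
        AUTHORIZED = 1    # shareable with authorized researchers
        PUBLIC = 2        # openly shareable

    @dataclass
    class Record:
        participant_id: str
        release: ReleaseLevel

    def can_share(record: Record, audience: ReleaseLevel) -> bool:
        # A record may be shown to an audience only if the participant's chosen
        # release level is at least as permissive as that audience requires.
        return record.release >= audience

    rec = Record("p-0042", ReleaseLevel.AUTHORIZED)
    print(can_share(rec, ReleaseLevel.AUTHORIZED))  # True
    print(can_share(rec, ReleaseLevel.PUBLIC))      # False: not cleared for open sharing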

The middle

Analytic/visualization/data publication engine

What do you think?

  • Shall we build it?
  • After all…

References

Open Science Collaboration. 2015. “Estimating the Reproducibility of Psychological Science.” Science 349 (6251): aac4716. doi:10.1126/science.aac4716.

Gilmore, Rick O. 2016. “From Big Data to Deep Insight in Developmental Science.” Wiley Interdisciplinary Reviews: Cognitive Science 7 (2): 112–26. doi:10.1002/wcs.1379.

Goodman, Steven N., Daniele Fanelli, and John P. A. Ioannidis. 2016. “What Does Research Reproducibility Mean?” Science Translational Medicine 8 (341): 341ps12. doi:10.1126/scitranslmed.aaf5027.

Henrich, Joseph, Steven J. Heine, and Ara Norenzayan. 2010. “The Weirdest People in the World?” The Behavioral and Brain Sciences 33 (2-3): 61–83; discussion 83–135. doi:10.1017/S0140525X0999152X.

Ioannidis, John P. A. 2005. “Why Most Published Research Findings Are False.” PLoS Med 2 (8): e124. doi:10.1371/journal.pmed.0020124.

Maxwell, Scott E. 2004. “The Persistence of Underpowered Studies in Psychological Research: Causes, Consequences, and Remedies.” Psychological Methods 9 (2): 147–63. doi:10.1037/1082-989X.9.2.147.

Szucs, Denes, and John P. A. Ioannidis. 2016. “Empirical Assessment of Published Effect Sizes and Power in the Recent Cognitive Neuroscience and Psychology Literature.” bioRxiv, August, 071530. doi:10.1101/071530.