October 28, 2016

The future of big data in developmental science – Answering the big questions

Rick O. Gilmore

Support: NSF BCS-1147440, NSF BCS-1238599, NICHD U01-HD-076595


  • Where are we now?
  • Challenges
  • Some thoughts on the future

Big questions, big dreams

Shonkoff, J. P., & Phillips, D. A. (Eds.). (2000). From neurons to neighborhoods: The science of early childhood development. National Academies Press.

So, how is developmental science doing?

Challenges with 'big data' developmental science

  • Collect diverse types of data
  • Must aggregate, link data across space, time, individual identities
  • Data not spatially uniform
  • Time series not uniformly sampled, different sampling intervals
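One way to picture the sampling problem: before two measures can be linked across time, they usually have to be placed on a shared time grid. The sketch below is a minimal illustration using linear interpolation, which is only one of several possible approaches. The function name `resample_linear` and the ages and values are made up for illustration, not taken from any study.

```python
from bisect import bisect_left


def resample_linear(times, values, grid):
    """Linearly interpolate an irregularly sampled series onto `grid`.

    `times` must be sorted in increasing order. Grid points outside the
    observed range yield None instead of being extrapolated.
    """
    out = []
    for t in grid:
        if t < times[0] or t > times[-1]:
            out.append(None)  # no observation brackets this grid point
            continue
        i = bisect_left(times, t)  # first index with times[i] >= t
        if times[i] == t:
            out.append(values[i])  # grid point coincides with a sample
        else:
            t0, t1 = times[i - 1], times[i]
            v0, v1 = values[i - 1], values[i]
            out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return out


# Hypothetical data: two measures sampled at different ages (in months).
grid = [0, 6, 12]
measure_a = resample_linear([0, 3, 12], [50.0, 60.0, 75.0], grid)
measure_b = resample_linear([0, 6, 12], [3.0, 7.0, 9.0], grid)

# After resampling, the two series can be paired point by point.
aligned = list(zip(grid, measure_a, measure_b))
```

Interpolation assumes the measure changes smoothly between visits, which may not hold for many developmental variables, so any real analysis would need to justify that choice.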


  • Aggregating big data about individuals poses privacy risks


"We have empirically assessed the distribution of published effect sizes and estimated power by extracting more than 100,000 statistical records from about 10,000 cognitive neuroscience and psychology papers published during the past 5 years…False report probability is likely to exceed 50% for the whole literature. In light of our findings the recently reported low replication success in psychology is realistic and worse performance may be expected for cognitive neuroscience."

(Szucs and Ioannidis 2016)
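To see how a false report probability above 50% can arise, it helps to work through the basic arithmetic. The sketch below uses the standard Bayesian formulation, in which the share of significant results that are false depends on the significance threshold, statistical power, and the prior probability that a tested effect is real. It ignores bias and is not necessarily the exact procedure the authors used. The input values are illustrative assumptions, not figures from the paper.

```python
def false_report_probability(alpha, power, prior_h1):
    """Fraction of statistically significant results that are false positives.

    alpha    -- significance threshold (Type I error rate)
    power    -- probability of detecting a true effect (1 - Type II error rate)
    prior_h1 -- assumed prior probability that a tested effect is real
    """
    false_pos = alpha * (1.0 - prior_h1)  # null is true, test is significant
    true_pos = power * prior_h1           # effect is real, test is significant
    return false_pos / (false_pos + true_pos)


# Illustrative (assumed) inputs: low power and few true effects.
low_power = false_report_probability(alpha=0.05, power=0.2, prior_h1=0.2)

# Same prior, but with conventionally adequate power.
high_power = false_report_probability(alpha=0.05, power=0.8, prior_h1=0.2)
```

With these assumed inputs, the low-power case gives a false report probability of roughly one half, and raising power to 0.8 brings it down to roughly one fifth.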

Challenges to replicability

  • Still collect data in non-electronic formats
  • Even electronic formats not readily shareable
  • Vital metadata (geo-, participant-level) often not collected
  • "Reproducible" workflows not standard practice
  • Results have limited robustness and generalizability
  • Misunderstanding/disagreement about what reproducibility means

  • Methods reproducibility refers to the provision of enough detail about study procedures and data so the same procedures could, in theory or in actuality, be exactly repeated.
  • Results reproducibility (previously described as replicability) refers to obtaining the same results from the conduct of an independent study whose procedures are as closely matched to the original experiment as possible.

(Goodman, Fanelli, and Ioannidis 2016)

  • Robustness refers to the stability of experimental conclusions to variations in either baseline assumptions or experimental procedures.
  • Generalizability refers to the persistence of an effect in settings different from and outside of an experimental framework.

(Goodman, Fanelli, and Ioannidis 2016)

Do we have…

  • Reproducible methods
  • Reproducible results
  • Robust findings
  • Generalizable findings

The Year 4 A.D.

Gilmore, R. O. (2016). From big data to deep insight in developmental science. Wiley Interdisciplinary Reviews: Cognitive Science, 7(2), 112–126. https://doi.org/10.1002/wcs.1379

Lessons learned

Big data developmental studies have long histories

But, big cohort studies have uncertain futures

Data sharing is part of the solution, but

  • We don't agree about who owns data
    • Participants
    • Us
    • Penn State
    • The taxpayer
  • Minimal rewards for data sharing
  • Post hoc sharing hard, time-consuming, expensive

"You can check out any time you like, but you can never leave."

Building a culture of reuse, reanalysis, meta-analysis

  • Why share if no one will reuse, build upon?
  • Journals don't always encourage/support/mandate publication of data, detailed methods
  • Building community consensus better than centralized mandates

Datasets can be "magnets" for scholarship

Centralizing shared data can enable discovery