20 July 2017

Overview

Motivation
Four general principles
Case studies
Costs and benefits

Motivation

Key ideas

Reproducibility is necessary for scientific progress
Computers wrangle the data, but they also obscure what was done to it
Point-and-click actions especially, because they leave no record
Technical solutions are available: open source, open formats, open data, open access

Four general principles of reproducible research that have emerged in other fields

✓ Make openly available the data and methods that generated the published result

✓ Write scripts to conduct analyses

✓ Use version control to track changes

✓ Describe and archive the computational environment

First principle

All files on figshare, OSF, university data repo, or similar

Data in CSV format

Organised as an R package

Second principle
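
As a minimal sketch of what "write scripts to conduct analyses" can look like, the example below reads a small CSV table, computes a per-group mean, and writes the result back out as CSV. It is written in Python purely for illustration; the compendia described in this talk use R. The data, the column names (`site`, `length_mm`) and the function names are hypothetical. The point is only that every step from raw data to result lives in a file that can be rerun, rather than in a sequence of mouse clicks.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical example data; in a real compendium this would be a CSV file
# stored alongside the script.
RAW_CSV = """site,length_mm
A,12.0
A,14.0
B,9.0
B,11.0
B,13.0
"""


def mean_by_group(csv_text, group_col, value_col):
    """Return {group: mean of value_col} computed from CSV text."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[group_col]].append(float(row[value_col]))
    return {group: mean(values) for group, values in sorted(groups.items())}


def write_summary(summary, group_col, value_col):
    """Serialise the summary as CSV text so the result is also plain text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([group_col, f"mean_{value_col}"])
    for group, value in summary.items():
        writer.writerow([group, value])
    return out.getvalue()


if __name__ == "__main__":
    summary = mean_by_group(RAW_CSV, "site", "length_mm")
    print(write_summary(summary, "site", "length_mm"), end="")
```

Because the script takes its input as data and returns its output as data, rerunning it on a corrected or extended table requires no manual steps, and the script itself can be placed under version control with the rest of the compendium.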

Third principle

All files tracked with Git, hosted on GitHub

Collaboration followed the GitHub flow (branch, pull request, review, merge)

Fourth principle

Docker image and Dockerfile to contain RStudio, packages, code and external dependencies

Based on Rocker image and templates

Continuous integration is very helpful
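
Describing the computational environment does not have to start with a container. As a lighter-weight complement to the Docker approach on this slide, the sketch below records the interpreter version, the operating system and the versions of selected packages in a form that can be archived next to the analysis. It is in Python for illustration only (the compendia in this talk use R and Rocker images), and the function name `describe_environment` and the package names passed to it are hypothetical.

```python
import json
import platform
from importlib import metadata


def describe_environment(packages):
    """Return a dict recording interpreter, OS and selected package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Hypothetical package list; replace with the packages the analysis imports.
    env = describe_environment(["pip", "a-package-that-is-not-installed"])
    print(json.dumps(env, indent=2, sort_keys=True))
```

A record like this describes the environment but does not recreate it. That gap is what an archived image and its Dockerfile are meant to close.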

Case studies of compendia

Research compendium + VCS repository, containing:

README.md
R package & manuscript
code CI
environment CI

Costs & benefits

Costs

Time learning the tools

Time doing new things

Built-in vs Bolt-on

Benefits

Comfort of knowing that I am right & have no secrets

Save time by reusing my previous code

Open data confers citation advantages, but the magnitude is highly variable

Open Source community membership provides access to high-quality help

Sustainability

Two implications: Training

Two implications: Incentives

Summary

Open methods and materials, scripted workflow, version control and environment control are generic principles suitable for most fields of research

The specific details will change over time, but the principles will endure

For most people, the technical problems already have good solutions; the remaining challenge is cultural

Colophon