The role of peer-reviewers in checking supporting information
Laurent Gatto
About me
- Have reviewed for Proteomics, Bioinformatics, BMC Genomics, BMC Research Notes, Expert Review of Proteomics, Journal of Proteomics, F1000Research, PeerJ, Journal of Open Research Software, Journal of Proteome Research, ISMB/ECCB conference, MRC funding body.
- By no means an authority on peer review.
- Good with data, data analysis and computer stuff. Open scientist. More details here.
My responsibility as a reviewer
- Accept sound/valid research and provide constructive comments
Hence
- Focus first on the validity of the research by inspecting the data, software and methods. If the methods and/or data fail, the rest is meaningless.
(I don’t see novelty, relevance or newsworthiness as my business as a reviewer. I leave that to the editor.)
Quick survey
Reviewing data/software/methods: is this asking too much from reviewers?
The above implies that:
- Review can be very quick.
- If no data/software are available, there can’t be any review.
Are data/software/methods accessible, understandable, reproducible? If so, we have taken a big step towards open science and reproducible research.
How much should we invest? (1)
- As a reviewer, I commit to making a reasonable effort to find the data, understand it, install the software, run the software, debug any issues, …
- I also know my strengths and weaknesses, and will review accordingly. For example, I consider myself reasonably well skilled in data analysis and software. If I struggle to install and run the software or understand the analysis, I will assume that many will struggle too.
How much should we invest? (2)
- Molecular biology is not a strength, and I will have to either read some background, or acknowledge my weakness.
- Note that it’s not only up to me (or, it shouldn’t be). Modern biology is multi-faceted, conducted by teams, and can’t be expected to be reviewed completely by a single person.
Tips (1): availability
- no data, no review
- no software, no review
- no meta-data, no review
- no reasonable methods description, no review
- impossible to reproduce any of it, no review
Timing: 1 - 10 minutes
Tips (2): do numbers match?
- Experimental design: 2 groups × 3 replicates, i.e. 6 samples, but
- Data files or columns: not a multiple of 6!
I also look at the file names: is there a consistent naming convention? (Both checks are easily scripted; see the sketch below.)
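As a minimal illustration, such a check could look like the Python sketch below. The directory name, file extension and naming pattern are hypothetical assumptions, not taken from any particular submission.

    import re
    from pathlib import Path

    # 2 groups x 3 replicates = 6 samples; file counts should be a multiple of 6.
    DESIGN_SIZE = 2 * 3
    # Hypothetical naming convention, e.g. "group1_rep2.csv".
    PATTERN = re.compile(r"^group[12]_rep[123]\.csv$")

    files = sorted(p.name for p in Path("supporting_data").glob("*.csv"))

    if len(files) % DESIGN_SIZE != 0:
        print(f"Suspicious: {len(files)} files, not a multiple of {DESIGN_SIZE}")

    misnamed = [f for f in files if not PATTERN.match(f)]
    if misnamed:
        print("Files breaking the naming convention:", misnamed)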
Tips (5): license
Open science is not only about accessing outputs for free (as in beer), but for free (as in speech), i.e. shared in a way that allows re-use.
If you share anything, make sure users are allowed to re-use it, and are aware of the terms under which they can do so (a quick check is sketched below).
Examples: CC-BY, CC0, GPL, BSD, …
NB: Ask your default data repository about licensing. You might be surprised!
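A reviewer can script this check too. Here is a minimal Python sketch; the directory and the candidate license file names are illustrative assumptions.

    from pathlib import Path

    # Common license file names; the directory name is an illustrative assumption.
    CANDIDATES = {"LICENSE", "LICENSE.txt", "LICENSE.md", "COPYING"}

    found = [p.name for p in Path("supporting_data").glob("*")
             if p.name in CANDIDATES]

    if found:
        print("License file(s) found:", found)
    else:
        print("No license file found: the re-use terms are unclear.")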
Quick survey (2)
- Is this asking too much?
- Who would like to see this done systematically?
- Who does this in their peer review activities?
The real question: who applies these principles when they prepare data?
Data/software citation
As somebody who values data, software and methods, I make every effort, when reviewing papers, to ensure that these are cited.
How to cite software/data
- A paper describing the software/data
- A DOI for the software or data (from Zenodo or Figshare, for example)
- If none of the above is available, a software name, a developer and a URL are enough to make a valid citation.
Where to share (find) data?
Where should you expect to find it? From most to least desirable:
- Subject-specific repository
- General repos (zenodo, figshare, …)
- Institutional repositories (Apollo here at the University)
- Supporting information (SI)
- Personal webpage
There is no perfect solution; often a combination of the above works best.
What matters in terms of data sharing
FAIR principles:
Findable and Accessible and Interoperable and Reusable
SI is not FAIR: it is not discoverable, not structured, often voluntary, and used to bury stuff. A personal web page is likely to disappear in the near future.
My ideal review system (1)
From Chris Hartgerink, 14 March 2017, OpenConCam
Bias in peer review:
- Sampling bias (small number of reviewers)
- Positive-results bias: methods that yield statistically significant results are judged more favourably, and mixed results are regarded as lower quality than all-positive results.
Two-stage review: start with the introduction and methods, then the results.
My ideal review system (2)
- Submit your data to a repository, where it gets checked (by specialists, data scientists, data curators) for quality, annotation and meta-data.
- Submit your research with a link to the peer-reviewed data.
Automation
- A lot of the requirements in terms of data and meta-data could be checked automatically (see the sketch after this list).
- There is no one-size-fits-all approach to data sharing; more is better: raw, processed and summary data; access through programmatic interfaces, GUIs, …
- Sharing of data and analysis methods should ideally be reproducible, possibly as software containers that include the data and the complete computational environment.
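As a minimal sketch of what such automation could look like, the Python check below validates a submitted metadata record against a set of required fields. The field names and the example record are hypothetical assumptions, not any repository's actual schema.

    # Required fields are hypothetical, not any repository's actual schema.
    REQUIRED_FIELDS = {"title", "authors", "organism", "instrument", "license"}

    def check_metadata(record: dict) -> list[str]:
        """Return a list of problems found in a submitted metadata record."""
        problems = [f"missing field: {f}"
                    for f in sorted(REQUIRED_FIELDS - record.keys())]
        if "license" in record and not record["license"]:
            problems.append("license field is present but empty")
        return problems

    # A deliberately incomplete example submission.
    submission = {"title": "Example dataset", "authors": ["A. Author"], "license": ""}
    for problem in check_metadata(submission):
        print(problem)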