The role of peer-reviewers in checking supporting information
Laurent Gatto
About me
- Have reviewed for Proteomics, Bioinformatics, BMC Genomics, BMC Research Notes, Expert Review of Proteomics, Journal of Proteomics, F1000Research, PeerJ, Journal of Open Research Software, Journal of Proteome Research, ISMB/ECCB conference, MRC funding body.
- By no means an authority on peer review.
- Good with data, data analysis and computer stuff. Open scientist. More details here.
My responsibility as a reviewer
- Accept sound/valid research and provide constructive comments
Hence
- Focus first on the validity of the research by inspecting the data, software and methods. If the methods and/or data fail, the rest is meaningless.
(I don’t see novelty, relevance or newsworthiness as my business as a reviewer. I leave that to the editor.)
Quick survey
Reviewing data/software/methods: is this asking too much from reviewers?
The above implies that:
- Review can be very quick.
- If no data/software are available, there can’t be any review.
Are data/software/methods accessible, understandable, reproducible? If so, we have taken a big step towards open science and reproducible research.
How much should we invest? (1)
- As a reviewer, I commit to making a reasonable effort to find the data, understand it, install the software, run the software, debug any issues, …
- I also know my strengths and weaknesses, and will review accordingly. For example, I consider myself reasonably well skilled in data analysis and software. If I struggle to install and run the software or understand the analysis, I will assume that many will struggle too.
How much should we invest? (2)
- Molecular biology is not a strength, and I will have to either read some background, or acknowledge my weakness.
- Note that it’s not only up to me (or, it shouldn’t be). Modern biology is multi-faceted, conducted by teams, and can’t be expected to be reviewed completely by a single person.
Tips (1): availability
- no data, no review
- no software, no review
- no meta-data, no review
- no reasonable methods description, no review
- impossible to reproduce any of it, no review
Timing: 1 - 10 minutes
Tips (2): do numbers match?
- Experimental design: 2 groups × 3 replicates, i.e. 6 samples, but
- Data files or columns: not a multiple of 6!
I also look at the file names: is there a consistent naming convention? (Both checks are easily scripted; see the sketch below.)
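As a minimal illustration, such a check could look like the Python sketch below. The directory name, file extension and naming pattern are hypothetical assumptions, not taken from any particular submission.

    import re
    from pathlib import Path

    # 2 groups x 3 replicates = 6 samples; file counts should be a multiple of 6.
    DESIGN_SIZE = 2 * 3
    # Hypothetical naming convention, e.g. "group1_rep2.csv".
    PATTERN = re.compile(r"^group[12]_rep[123]\.csv$")

    files = sorted(p.name for p in Path("supporting_data").glob("*.csv"))

    if len(files) % DESIGN_SIZE != 0:
        print(f"Suspicious: {len(files)} files, not a multiple of {DESIGN_SIZE}")

    misnamed = [f for f in files if not PATTERN.match(f)]
    if misnamed:
        print("Files breaking the naming convention:", misnamed)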
Tips (5): license
Open science is not only about accessing outputs for free (as in beer), but for free (as in speech), i.e. shared in a way that allows re-use.
If you share anything, make sure users are allowed to re-use it, and are aware of the terms under which they can do so (a quick check is sketched below).
Examples: CC-BY, CC0, GPL, BSD, …
NB: Ask your default data repository about licensing. You might be surprised!
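A reviewer can script this check too. Here is a minimal Python sketch; the directory and the candidate license file names are illustrative assumptions.

    from pathlib import Path

    # Common license file names; the directory name is an illustrative assumption.
    CANDIDATES = {"LICENSE", "LICENSE.txt", "LICENSE.md", "COPYING"}

    found = [p.name for p in Path("supporting_data").glob("*")
             if p.name in CANDIDATES]

    if found:
        print("License file(s) found:", found)
    else:
        print("No license file found: the re-use terms are unclear.")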
Quick survey (2)
- Is this asking too much?
- Who would like to see this done systematically?
- Who does this in their peer review activities?
The real question: who applies these principles when they prepare data?
Data/software citation
As somebody who values data, software and methods, I make every effort, when reviewing papers, to ensure that these are cited.
How to cite software/data
- A paper describing the software/data
- A DOI for the software or data (from Zenodo or Figshare, for example)
- If none of the above is available, a software name, a developer and a URL are enough to make a valid citation.
Where to share (find) data?
Where should you expect to find it? From most to least desirable:
- Subject-specific repository
- General repos (zenodo, figshare, …)
- Institutional repositories (Apollo here at the University)
- Supporting information (SI)
- Personal webpage
There is no perfect solution; often a combination of the above works best.
What matters in terms of data sharing
FAIR principles:
Findable and Accessible and Interoperable and Reusable
SI is not FAIR: it is not discoverable, not structured, often voluntary, and used to bury stuff. A personal web page is likely to disappear in the near future.
My ideal review system (1)
From Chris Hartgerink, 14 March 2017, OpenConCam
Bias in peer review:
- Sampling bias (small number of reviewers)
- Positive-results bias: methods that yield statistically significant results are judged more favourably, and mixed results are regarded as lower quality than all-positive results.
Two-stage review: start with the introduction and methods, then the results.
My ideal review system (2)
- Submit your data to a repository, where it gets checked (by specialists, data scientists, data curators) for quality, annotation and meta-data.
- Submit your research with a link to the peer-reviewed data.
Automation
- A lot of the requirements in terms of data and meta-data could be checked automatically (see the sketch after this list).
- There is no one-size-fits-all approach to data sharing; more is better: raw, processed and summary data; access through programmatic interfaces, GUIs, …
- Sharing of data and analysis methods should ideally be reproducible, possibly as software containers that include the data and the complete computational environment.
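As a minimal sketch of what such automation could look like, the Python check below validates a submitted metadata record against a set of required fields. The field names and the example record are hypothetical assumptions, not any repository's actual schema.

    # Required fields are hypothetical, not any repository's actual schema.
    REQUIRED_FIELDS = {"title", "authors", "organism", "instrument", "license"}

    def check_metadata(record: dict) -> list[str]:
        """Return a list of problems found in a submitted metadata record."""
        problems = [f"missing field: {f}"
                    for f in sorted(REQUIRED_FIELDS - record.keys())]
        if "license" in record and not record["license"]:
            problems.append("license field is present but empty")
        return problems

    # A deliberately incomplete example submission.
    submission = {"title": "Example dataset", "authors": ["A. Author"], "license": ""}
    for problem in check_metadata(submission):
        print(problem)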