January 25, 2017

R in Biodiversity Analysis: rOpenSci for all
Stockholm, Sweden, 24th-25th Jan 2017

Jan 25th 09:00 - 09:30

    "Mirroreum and EUBON R-packages"

EUBON - the challenge

  • The challenge in EUBON - silos in WP 1-8
  • Integrating workflows, processes and systems across work packages
  • Example: How to get biodiversity analysis tools into WP1 and 2 - web portal?

Traditional approach

One ring to rule them all - one Centralized System

May work locally - but internationally? Practical challenges:

  • Different ways to work - "cultural" diversity
  • Teams, time zones, languages, tools, methods
  • Code of conduct - don't control others - mutually beneficial collaboration
  • Reality: funding is provided on a project basis - does that give sustainable platforms?
  • The results - a paper is published and the project ends

More modern approach

Decentralized international collaboration over the Internet

  • How do people work today - in software?
  • FOSS Data Science tools like R and rOpenSci
  • Versioning and git + GitHub
  • Not one single system - but several in concert, integrated

The Internet as a polyglot environment, using open Internet protocols and standards as common ground

Rationale for Mirroreum

  • Mi ROR eum - reflecting Reproducible Open Research
  • No new budget at all for licenses, hardware etc - use existing infrastructure
  • Provide a web UI for biodiversity analysis research ("how to share work on the raquamaps package - an open source implementation of AquaMaps")
  • Log in and get a full-featured platform for authoring and publishing reproducible open research
  • Reuse existing widely used open source tools from the research community
  • Stay away from non-open non-free licenses
  • Reduce dependencies, increase freedom to create/innovate
  • Can you run all of it on your laptop, offline?

FOSS approach - tools

  • Use Docker for systems integration
  • Use R for reproducible research workflows
  • Allow any FOSS Data Science tools - Python, Julia, Spark etc
  • Web-based biodiversity analysis frontend - bundle R + rOpenSci + other packages
  • Any data source on the Internet can provide data, including
    • GBIF
    • Atlas of Living Australia
    • Other Web APIs
    • Any other source open for use (open science != commercial secrets?)
    • Closed sources such as traditional databases
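
As a minimal sketch of what "any data source on the Internet" can look like in practice, the snippet below builds a request URL for the public GBIF occurrence search API. It uses plain Python (one of the FOSS tools listed above) rather than the rgbif package, and the helper name gbif_occurrence_url and the example species are illustrative assumptions, not part of Mirroreum.

```python
from urllib.parse import urlencode

# Public GBIF REST API endpoint for occurrence searches.
GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def gbif_occurrence_url(scientific_name, country=None, limit=20):
    """Build a GBIF occurrence search URL (hypothetical helper, no network access)."""
    params = {"scientificName": scientific_name, "limit": limit}
    if country is not None:
        params["country"] = country  # ISO 3166-1 alpha-2 code, e.g. "SE"
    return GBIF_OCCURRENCE_SEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    # Example query: moose (Alces alces) records from Sweden.
    print(gbif_occurrence_url("Alces alces", country="SE", limit=5))
```

The resulting URL can be fetched with any HTTP client and returns JSON, so the same request works from R, Python or a browser.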

Typical workflow

Create an R or Python package with data, web UI and tutorial

  • Build a package - include data, web UI, tutorial (vignette)
  • Author a reproducible research paper using the above
  • When stable, publish/release to …
    • GitHub
    • R-Forge
    • rOpenSci
    • CRAN

Local customizations and integrations

  • Not all R packages are on CRAN or rOpenSci (yet!)
  • Packages from GitHub - local or regional adaptations, work in progress
  • Custom data sources
  • Re-bundle with Docker - extend with these customizations

Microservices architecture with Docker

  • Mirroreum is the web-based frontend - deploy it locally or regionally. The client is not tightly coupled to backend services: it can use any service but comes pre-configured for regional data services
  • Bundles rgbif and ALA4R configs - allows (pre-)configured R clients to connect to regional services
  • Deploy local and/or regional data services based on open-source GBIF and ALA components
  • Anyone can pull components and run anywhere - combine like LEGO
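
One way to picture a client that is "pre-configured but not tightly coupled" is a set of default service URLs that each deployment can override. The sketch below is only an illustration of that idea under stated assumptions: the environment variable names (MIRROREUM_GBIF_URL, MIRROREUM_ALA_URL), the function resolve_endpoints and the example.org hosts are hypothetical and not taken from Mirroreum.

```python
import os

# Illustrative defaults; the "ala" host is a placeholder, not a real service.
DEFAULT_ENDPOINTS = {
    "gbif": "https://api.gbif.org/v1",
    "ala": "https://ala.example.org/ws",
}


def resolve_endpoints(environ=None):
    """Return service base URLs, letting environment variables override defaults.

    The variable names (MIRROREUM_GBIF_URL, MIRROREUM_ALA_URL) are hypothetical.
    """
    environ = os.environ if environ is None else environ
    endpoints = {}
    for name, default in DEFAULT_ENDPOINTS.items():
        override = environ.get("MIRROREUM_%s_URL" % name.upper())
        # Fall back to the default when no override is set; drop trailing slashes.
        endpoints[name] = (override or default).rstrip("/")
    return endpoints


if __name__ == "__main__":
    # A regional deployment could set the override, e.g. with `docker run -e ...`.
    print(resolve_endpoints({"MIRROREUM_ALA_URL": "https://bioatlas.example.org/ws/"}))
```

With this kind of layout the same container image can point at global or regional services without being rebuilt.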

DevOps systems integration workflow

Examples