Ethoinformatics I

Developing a standard vocabulary and data model for behavioral field research

Kenneth L. Chiou · Anthony Di Fiore · Robyn Overstreet · Mike Chevett · Tom Igoe


Ethoinformatics

Building capacity for
data discovery, data reuse, and data comparison
for behavioral research

(with particular emphasis on field primatology)

Ethoinformatics Working Group

Washington University in St. Louis, November 2013
University of Texas at Austin, May 2015

Imagine . . .

Bad news

Primatological data are complex

On a typical day, we might be concerned with:

  • activities and social interactions of animals
  • locations, identities, and availability patterns of key resources
  • meteorological or climatological patterns of occupied sites
  • size, composition, and spatial arrangement of social aggregations
  • life histories of study animals
  • morphological measurements of study animals
  • molecular information derived from biomaterials

Obstacles to comparison

  • Research data are geographically disjunct
  • Datasets vary in their
    • terminology
    • organization
    • technology
  • Metadata (data about data) are seldom documented
  • Lack of training in information science
  • Concerns about data access and data ownership

Project goals

  1. A community-derived standard vocabulary
  2. Mobile data collection and support software
  3. Compatibility tools for existing data
  4. A framework for long-term data archiving

Breaking the Table

Famous primates (fictional) 🙊

ID  Name            Species (best guess)
1   Rafiki          mandrill
2   Curious George  chimpanzee
3   King Kong       gorilla

(key:value)

[ {
  "ID" : 1 ,
  "Name" : "Rafiki" ,
  "Species" : "mandrill"
} , {
  "ID" : 2 ,
  "Name" : "Curious George" ,
  "Species" : "chimpanzee"
} , {
  "ID" : 3 ,
  "Name" : "King Kong" ,
  "Species" : "gorilla"
} ]
[Figure: the same three records shown side by side in key-value design and in table design]
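The table and key:value forms above hold the same information. As a minimal sketch (the function name `rowsToDocuments` is ours, not part of any project code), a table can be recast into key:value objects by pairing each cell with its column heading:

```javascript
// Column headings and rows copied from the "Famous primates" table.
const columns = ["ID", "Name", "Species"];
const rows = [
  [1, "Rafiki", "mandrill"],
  [2, "Curious George", "chimpanzee"],
  [3, "King Kong", "gorilla"],
];

// Pair each cell with its column heading to get one key:value object per row.
function rowsToDocuments(columns, rows) {
  return rows.map((row) =>
    Object.fromEntries(columns.map((column, i) => [column, row[i]]))
  );
}

const documents = rowsToDocuments(columns, rows);
console.log(JSON.stringify(documents[0]));
// {"ID":1,"Name":"Rafiki","Species":"mandrill"}
```

Each object now carries its own keys, so records no longer depend on a shared column order.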

Key-value data are well-suited for distributed data challenges

The key-value model is sufficiently flexible for generalization across projects

The missing ingredient hindering data comparison is a standard vocabulary for keys, which is necessary for encoding semantics
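To illustrate why shared keys matter, here is a hedged sketch in which two hypothetical projects record the same kind of information under different local keys. The mappings and the `harmonize` helper are invented for illustration; `organismName` and `vernacularName` are Darwin Core term names used here as stand-ins, not confirmed EthoCore terms.

```javascript
// Hypothetical per-project mappings from local keys to shared vocabulary keys.
const projectAMapping = { Name: "organismName", Species: "vernacularName" };
const projectBMapping = { animal: "organismName", taxon: "vernacularName" };

// Rename the keys of a record according to a mapping.
function harmonize(record, mapping) {
  const out = {};
  for (const [key, value] of Object.entries(record)) {
    // Keys without a mapping are kept as-is so no data are silently dropped.
    const hasMapping = Object.prototype.hasOwnProperty.call(mapping, key);
    out[hasMapping ? mapping[key] : key] = value;
  }
  return out;
}

const a = harmonize({ Name: "Rafiki", Species: "mandrill" }, projectAMapping);
const b = harmonize({ animal: "King Kong", taxon: "gorilla" }, projectBMapping);

// After harmonization both records expose the same keys and can be compared.
console.log(Object.keys(a).join() === Object.keys(b).join()); // true
```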

Linked data

  • Discipline centered around connecting data across the web
  • Facts about anything can be represented as triples
  • sky ‣ isColored ‣ blue
  • All components can be named as URIs so that usage is consistent and meaning is unambiguous
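A minimal sketch of the triple idea, assuming a placeholder namespace (`http://example.org/` is not a real vocabulary) and a helper `toTriples` that we made up for illustration. It expresses the slide's example fact and then turns a key:value record into one triple per key:

```javascript
// Hypothetical namespace; real linked data would use published URIs.
const EX = "http://example.org/";

// The example fact above as a subject, predicate, object triple.
const skyFact = {
  subject: EX + "sky",
  predicate: EX + "isColored",
  object: EX + "blue",
};

// Turn a key:value record into one triple per key,
// with the record's own URI as the subject of every triple.
function toTriples(subjectUri, record) {
  return Object.entries(record).map(([key, value]) => ({
    subject: subjectUri,
    predicate: EX + key,
    object: value,
  }));
}

const triples = toTriples(EX + "primate/1", { Name: "Rafiki", Species: "mandrill" });

// Print in a simple N-Triples-like notation.
for (const t of triples) {
  console.log(`<${t.subject}> <${t.predicate}> ${JSON.stringify(t.object)} .`);
}
```

Because the predicate is a URI built from the key, two datasets that use the same key URI are asserting the same kind of fact.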

Linking Open Data cloud (last updated August 2014)
via lod-cloud.net/

A standard vocabulary for behavioral field research

EthoCore

Standard vocabulary for behavior modeled on Darwin Core

Darwin Core

  • Standard vocabulary for biodiversity science
  • Itself an extension of Dublin Core, a general-purpose metadata standard used across the web

EthoCore

Currently 166 terms in eleven classes

EthoCore is designed for primatology

A community-derived vocabulary for community usage is valuable and desirable because it facilitates communication and comparison

This vocabulary may be flexibly applied in a variety of contexts including as

columns in a table
keys in key-value pairs
predicates in a linked data triple

As with Darwin Core, the EthoCore vocabulary encompasses classes and properties of information

Categories of data, not their possible values

These correspond most closely to the column headings of your tables, not the values in your cells

Location (Darwin Core)

EthoCore classes

  • Observation
  • Measurement
  • Organism
  • Patch
  • MaterialSample
  • Event
  • Activity
  • Location
  • Identification
  • Taxon
  • ResourceRelationship

EthoCore: design principles

Complex data structures can be deconstructed into atomic elements

Applications still require a "packaging" of atomic elements into data structures

This is where common constraints reflecting a shared model of the world are useful

Document model

document-oriented design

EthoCore: design principles (cont'd)

Terms should be designed for packaged data structures that are as generalized as possible

Observation

New class designed specially for observational research

observationID | observationType | observationValue | observationAccuracy | observationUnit | observationDeterminedBy | observationMethod | observationRemarks
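As a hedged illustration, an Observation could be packaged as a key:value document whose keys are the terms listed above. All values below are invented, and the `unknownKeys` check is our own sketch rather than part of EthoCore:

```javascript
// Observation terms as listed above.
const OBSERVATION_TERMS = [
  "observationID", "observationType", "observationValue", "observationAccuracy",
  "observationUnit", "observationDeterminedBy", "observationMethod", "observationRemarks",
];

// Return the keys of a document that are not Observation terms.
function unknownKeys(doc) {
  return Object.keys(doc).filter((key) => !OBSERVATION_TERMS.includes(key));
}

// Hypothetical values, invented for illustration only.
const observation = {
  observationID: "obs-0001",
  observationType: "nearest neighbor distance",
  observationValue: 3,
  observationUnit: "m",
  observationDeterminedBy: "observer-01",
  observationMethod: "focal animal sampling",
};

console.log(unknownKeys(observation));                     // []
console.log(unknownKeys({ ...observation, distance: 3 })); // [ 'distance' ]
```

The same generic structure can hold a distance, a behavior code, or a count, because the kind of observation is itself a value (`observationType`) rather than a column.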

Observation documents

EthoCore: design principles (cont'd)

Relationships among resources (e.g., tables or documents) can themselves be represented as resources

ResourceRelationship

Darwin Core class for describing relationships between resources

resourceRelationshipID | resourceID | relatedResourceID | relationshipOfResource | relationshipAccordingTo | relationshipEstablishedDate | relationshipRemarks

ResourceRelationship documents
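A minimal sketch of a relationship expressed as its own document, using the ResourceRelationship terms listed above. The two linked documents, their IDs, the relationship label, and the helpers `relate` and `relatedTo` are all hypothetical:

```javascript
// Two hypothetical documents, identified by made-up IDs.
const organism = { id: "organism-rafiki", Name: "Rafiki" };
const observation = { id: "obs-0001", observationType: "activity", observationValue: "resting" };

// Build a relationship document from the ResourceRelationship terms.
function relate(subject, relationship, related, accordingTo, date) {
  return {
    resourceRelationshipID: `rel-${subject.id}-${related.id}`,
    resourceID: subject.id,
    relatedResourceID: related.id,
    relationshipOfResource: relationship,
    relationshipAccordingTo: accordingTo,
    relationshipEstablishedDate: date,
  };
}

const link = relate(observation, "observation of", organism, "observer-01", "2015-05-01");

// Follow links: find the IDs of every resource related to a given resource.
function relatedTo(resourceID, relationships) {
  return relationships
    .filter((r) => r.resourceID === resourceID)
    .map((r) => r.relatedResourceID);
}

console.log(relatedTo("obs-0001", [link])); // [ 'organism-rafiki' ]
```

Keeping the link outside both documents means neither the observation nor the organism record has to change when a new relationship is asserted.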

Closing thoughts

Tables

Tables have a place and relational databases are still powerful and efficient tools

The table paradigm, however, gets in the way of comparison

As a community, we need to break free of "table-thinking" and think about commonalities in our underlying data models

New tools are making this possible

NoSQL

NoSQL ("not only SQL") databases reject the dominance of the table-based relational data model

NoSQL databases are built for the Web and generally embrace web technologies such as HTTP/REST, JavaScript, and URIs/URLs

By employing data models such as the key-value or document models, NoSQL databases promise software that copes better with data that do not fit a single fixed table schema

For us, the document-oriented data design is a reasonable intermediary from our tables to, eventually, the global linked data ecosystem on the web

Next steps

Maintain our metadata initiative and solicit feedback

http://ethoinformatics.org/ethocore

Observation · Measurement · Organism · Patch · MaterialSample · Event · Activity · Location · Identification · Taxon · ResourceRelationship

Engage projects to begin incorporating EthoCore terms into their databases and workflows

Develop an infrastructure for the larger anthropological community

AnthroCore

(primatology + paleontology + archaeology + ...)

Acknowledgments

  • Jennifer Moore, Cynthia Hudson, & Aaron Addison (Washington University Libraries)
  • Jane Phillips-Conroy (Washington University · Neuroscience / Anthropology)
  • Denné Reed (University of Texas at Austin · Anthropology)
  • National Science Foundation (SMA 1338524 · SMA 1338467 · SMA 1338452)

2013 St. Louis & 2015 Austin working group meetings participants

Laura Abondano · Colin Addis · Elizabeth Archie · Louise Barrett · Thore Bergman · Maryjka Blaszczyk · Damien Caillaud · Shahrina Chowdhury · Kelsey Ellis · Gideon Erkenswick · Eduardo Fernandez-Duque · Steffen Foerster · Paul Garber · Peter Henzi · Katharine Jack · Clifford Jolly · Andreas Koenig · Kara Leimberger · Rebecca Lewis · Katie MacKinnon · Amely Martins · Monica McDonald · Amanda Melin · Mike Montague · Stephanie Musgrave · Joseph Orkin · Katie Ortiz · Steve Phelps · Crickette Sanz · Clara Scarry · Christopher Schmitt · Christopher Shaffer · Joan Silk · Karen Strier · Robert Sussman · Stacey Tecot · Claudia Valeggia · Sarie Van Belle · Anna Weyher

Ethoinformatics II

Developing open-source digital data services for behavioral field research

Anthony Di Fiore · Kenneth L. Chiou · Mike Chevett · Robyn Overstreet · Tom Igoe


Handheld apps for field data collection

What's currently available?

  • Plus various database + form apps for structured notetaking
  • Handheld apps for field data collection

    Significant limitations!

    • Most are proprietary
    • Limited customizability
    • No emphasis on using standard ontologies
    • Deal with limited data types, requiring multiple devices
    • Geolocation tagging not automatic
    • Not cross-platform

    Goals

    A handheld app that is...

    • Built on open source and standard web technologies
    • Configurable, modular, and allows multiple data streams
    • Runs on multiple platforms and older devices
    • Automatic geolocation tagging and time stamping
    • Works offline without a continuous web connection
    • Flexible enough to support different existing schemas
    • Compliant with standards and linked data friendly

    Caveat

    We're not (yet!) concerned with designing a slick interface for behavioral data entry...

    ...instead, we're focusing on designing a modular, customizable framework for collecting multiple data streams using a single app and device

    Design Philosophy

    Custom data structures

    • Information that goes together (metadata + data) is flexibly packaged into user-defined data structures called documents
    • Documents are made up of a set of key:value pairs, where keys either are EthoCore terms or have well-defined relationships with those terms

    Document model

    document-oriented design

    Example: "taxonomy" document

    Mapping to EthoCore terms

    Design Philosophy

    Structure of documents

    • Contain a URI
    • Contain key metadata
      • Who made it
      • Datetime (point or series) to which the data pertain
      • Geolocation (point or series) to which the data pertain
    • Represented in a standard format for packaging key:value text
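The structure above can be sketched as a small constructor function. This is an illustration under stated assumptions: JSON is assumed as the text format, and the function name, URI, key names, coordinates, and payload are all invented rather than taken from the app.

```javascript
// Wrap arbitrary key:value data with the metadata every document carries.
function makeDocument({ uri, createdBy, datetime, geolocation, data }) {
  const required = { uri, createdBy, datetime, geolocation };
  for (const [name, value] of Object.entries(required)) {
    if (value === undefined) throw new Error(`missing required metadata: ${name}`);
  }
  return { uri, createdBy, datetime, geolocation, ...data };
}

const contact = makeDocument({
  uri: "http://example.org/documents/contact-0001", // hypothetical URI
  createdBy: "observer-01",                          // who made it
  datetime: "2015-05-01T06:30:00Z",                  // a point; a series could be an array
  geolocation: { latitude: -0.638, longitude: -76.15 }, // made-up coordinates
  data: { groupName: "Group A" },                    // hypothetical payload key
});

// Serialize to text and read it back.
const text = JSON.stringify(contact, null, 2);
console.log(JSON.parse(text).uri === contact.uri); // true
```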

    Example: "contact" document

    Mapping to EthoCore terms

    Design Philosophy

    Documents can be linked to one or more other documents

    Design Philosophy

    History of revision of documents is preserved
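One way to preserve revision history is an append-only store in which saving a document adds a new revision instead of overwriting the old one. The class below is a generic sketch of that idea, not the app's actual storage layer; the class name, method names, and URI are hypothetical.

```javascript
// An append-only store: saving never overwrites, it adds a new revision.
class RevisionedStore {
  constructor() {
    this.history = new Map(); // document URI -> array of revisions, oldest first
  }

  save(uri, content) {
    const revisions = this.history.get(uri) || [];
    // Deep-copy so later edits to the caller's object cannot alter stored history.
    const snapshot = JSON.parse(JSON.stringify(content));
    this.history.set(uri, [...revisions, snapshot]);
    return revisions.length + 1; // revision number of the snapshot just saved
  }

  latest(uri) {
    const revisions = this.history.get(uri) || [];
    return revisions[revisions.length - 1];
  }

  revisions(uri) {
    return [...(this.history.get(uri) || [])];
  }
}

const store = new RevisionedStore();
const uri = "http://example.org/documents/taxonomy-0001"; // hypothetical URI
store.save(uri, { Species: "mandrill" });
store.save(uri, { Species: "Mandrillus sphinx" });

console.log(store.latest(uri).Species);       // Mandrillus sphinx
console.log(store.revisions(uri)[0].Species); // mandrill
```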

    Implementation

    The framework of our mobile app is built using...

    ...which are the primary tools used to build interactive web interfaces.

    The real programming work is done in JavaScript...

    ... and we use the Node.js runtime and its package manager, npm, to bundle code from various programming libraries.

    We use the Cordova framework to build the user interface...

    ... which lets us write one application that can be used on iOS devices, Android devices, or other laptops or tablets through a browser.

    And we use document-based NoSQL databases for data storage, both locally, in the app, and on a remote server.

          

    All documents tagged with geolocation, streaming continuously from a device's location services (from internal GPS, cell towers, external GPS)
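A hedged sketch of how continuous tagging might work: keep the most recent position fix and stamp every new document with it and with the current time. The `LocationTagger` class is invented for illustration. The commented-out call uses the standard browser Geolocation API, which we assume (but the slides do not state) is how fixes reach the app.

```javascript
// Keeps the most recent position fix and stamps documents with it.
class LocationTagger {
  constructor() {
    this.lastFix = null; // no fix received yet
  }

  // Call this for every fix delivered by the device's location service.
  update(fix) {
    this.lastFix = { ...fix };
  }

  // Return a copy of the document with a time stamp and the latest fix attached.
  tag(doc, now = new Date()) {
    return {
      ...doc,
      datetime: now.toISOString(),
      geolocation: this.lastFix ? { ...this.lastFix } : null,
    };
  }
}

const tagger = new LocationTagger();

// In a browser or web view, fixes could be streamed like this (assumption):
//   navigator.geolocation.watchPosition((p) => tagger.update({
//     latitude: p.coords.latitude,
//     longitude: p.coords.longitude,
//     accuracy: p.coords.accuracy,
//   }));

// Here we feed in a made-up fix by hand instead.
tagger.update({ latitude: -0.638, longitude: -76.15, accuracy: 8 });
const tagged = tagger.tag({ note: "group contact" }, new Date("2015-05-01T06:30:00Z"));
console.log(tagged.datetime); // 2015-05-01T06:30:00.000Z
```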

    Web-based admin tool

    Closing thoughts

    1. Our design emphasizes modularity and versatility and allows for custom data structures and user interfaces

    2. Use of the EthoCore vocabulary facilitates comparability and data sharing across studies and is a step towards joining the linked data ecosystem

    3. Using documents facilitates data interchange using standard protocols and is simultaneously human-readable and easily recast into other formats

    4. The timeline and map views provide real-time visual feedback in the field

    5. Our approach promotes digital data collection and minimizes initial investment for projects while allowing compatibility with existing data structures and workflows
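Closing thought 3 notes that documents are easily recast into other formats. As one illustration, the sketch below flattens a list of key:value documents into CSV text for use in existing table-based workflows. The `toCsv` helper and the `Remarks` value are hypothetical, and only flat (non-nested) documents are handled.

```javascript
// Recast flat key:value documents as CSV text.
function toCsv(documents) {
  // The union of keys across documents, in first-seen order, becomes the header row.
  const columns = [...new Set(documents.flatMap((d) => Object.keys(d)))];

  // Quote a cell only when it contains a comma, a double quote, or a newline.
  const escape = (value) => {
    const text = value === undefined || value === null ? "" : String(value);
    return /[",\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
  };

  const lines = [columns.map(escape).join(",")];
  for (const d of documents) {
    lines.push(columns.map((c) => escape(d[c])).join(","));
  }
  return lines.join("\n");
}

const csv = toCsv([
  { ID: 1, Name: "Rafiki", Species: "mandrill" },
  { ID: 2, Name: "Curious George", Species: "chimpanzee", Remarks: "curious, very" },
]);
console.log(csv);
// ID,Name,Species,Remarks
// 1,Rafiki,mandrill,
// 2,Curious George,chimpanzee,"curious, very"
```

Documents that lack a key simply produce an empty cell, so records with different keys can still share one table.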

            

    Acknowledgments

    • Nick Sears & Jonathan Cousins (Cousins & Sears Creative Technologists)
    • Jennifer Moore, Cynthia Hudson, & Aaron Addison (Washington University Libraries)
    • Jane Phillips-Conroy (Washington University · Neuroscience / Anthropology)
    • 2013 St. Louis & 2015 Austin working group meetings participants

      Laura Abondano · Colin Addis · Elizabeth Archie · Louise Barrett · Thore Bergman · Maryjka Blaszczyk · Damien Caillaud · Shahrina Chowdhury · Kelsey Ellis · Gideon Erkenswick · Eduardo Fernandez-Duque · Steffen Foerster · Paul Garber · Peter Henzi · Katharine Jack · Clifford Jolly · Andreas Koenig · Kara Leimberger · Rebecca Lewis · Katie MacKinnon · Amely Martins · Monica McDonald · Amanda Melin · Mike Montague · Stephanie Musgrave · Joseph Orkin · Katie Ortiz · Steve Phelps · Denné Reed · Crickette Sanz · Clara Scarry · Christopher Schmitt · Christopher Shaffer · Joan Silk · Karen Strier · Robert Sussman · Stacey Tecot · Claudia Valeggia · Sarie Van Belle · Anna Weyher

    • National Science Foundation (SMA 1338524 · SMA 1338467 · SMA 1338452)