Developing a standard vocabulary and data model for behavioral field research
Kenneth L. Chiou · Anthony Di Fiore · Robyn Overstreet · Mike Chevett · Tom Igoe
Building capacity for data discovery, data reuse, and data comparison for behavioral research (with particular emphasis on field primatology)
Imagine . . .
Bad news
On a typical day, we might be concerned with:
ID | Name | Species (best guess) |
---|---|---|
1 | Rafiki | mandrill |
2 | Curious George | chimpanzee |
3 | King Kong | gorilla |
Key-value (key:value) data
[
  {
    "ID": 1,
    "Name": "Rafiki",
    "Species": "mandrill"
  },
  {
    "ID": 2,
    "Name": "Curious George",
    "Species": "chimpanzee"
  },
  {
    "ID": 3,
    "Name": "King Kong",
    "Species": "gorilla"
  }
]
ID | Name | Species |
---|---|---|
1 | Rafiki | mandrill |
2 | Curious George | chimpanzee |
3 | King Kong | gorilla |
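The table and the key-value records above carry the same information. As a minimal sketch (Python is used here for illustration only, and the function names are hypothetical), the header row supplies the keys and each data row supplies the values:

```python
import json

# Header row and data rows copied from the example table above.
header = ["ID", "Name", "Species"]
rows = [
    [1, "Rafiki", "mandrill"],
    [2, "Curious George", "chimpanzee"],
    [3, "King Kong", "gorilla"],
]


def rows_to_records(header, rows):
    """Pair each cell with its column heading: one key-value record per row."""
    return [dict(zip(header, row)) for row in rows]


def records_to_rows(records, header):
    """Go the other way: read values out in column order (missing keys become None)."""
    return [[record.get(key) for key in header] for record in records]


records = rows_to_records(header, rows)
print(json.dumps(records, indent=2))
```

Because `records_to_rows` reverses the conversion, nothing is lost by moving from the table to the key-value form.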
Key-value data are well-suited for distributed data challenges
The key-value model is sufficiently flexible for generalization across projects
The missing ingredient hindering data comparison is a standard vocabulary for keys, which is necessary for encoding semantics
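A rough sketch of why shared keys matter, assuming two hypothetical projects that record the same kind of information under different key names. The project keys and the mappings are invented for illustration; `organismName` and `vernacularName` are borrowed from Darwin Core as stand-ins for shared terms.

```python
# Hypothetical records from two projects: same meaning, different keys.
project_a = {"animal": "Rafiki", "sp": "mandrill"}
project_b = {"subject_name": "King Kong", "taxon": "gorilla"}

# Hypothetical mappings from each project's keys to shared vocabulary terms.
MAP_A = {"animal": "organismName", "sp": "vernacularName"}
MAP_B = {"subject_name": "organismName", "taxon": "vernacularName"}


def to_shared_vocabulary(record, mapping):
    """Rename keys using the mapping; keys without a mapping are kept unchanged."""
    return {mapping.get(key, key): value for key, value in record.items()}


combined = [
    to_shared_vocabulary(project_a, MAP_A),
    to_shared_vocabulary(project_b, MAP_B),
]

# Once the keys agree, one query works across both projects.
names = [record["organismName"] for record in combined]
print(names)
```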
Linking Open Data cloud (last updated August 2014)
via lod-cloud.net/
A standard vocabulary for behavioral field research
Standard vocabulary for behavior modeled on Darwin Core
Currently 166 terms in eleven classes
EthoCore is designed for primatology
A community-derived vocabulary for community usage is valuable and desirable because it facilitates communication and comparison
This vocabulary may be flexibly applied in a variety of contexts including as
columns in a table
keys in key-value pairs
predicates in a linked data triple
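The three uses listed above can be sketched with the same handful of terms. This example borrows Darwin Core terms (`organismID`, `organismName`, `vernacularName`) rather than actual EthoCore terms, and the subject URI is a hypothetical identifier.

```python
DWC = "http://rs.tdwg.org/dwc/terms/"  # Darwin Core term namespace

terms = ["organismID", "organismName", "vernacularName"]
values = ["1", "Rafiki", "mandrill"]

# 1. Terms as columns in a table (header row followed by one data row).
table = [terms, values]

# 2. Terms as keys in key-value pairs.
record = dict(zip(terms, values))

# 3. Terms as predicates in linked data triples: (subject, predicate, object).
subject = "http://example.org/organism/1"  # hypothetical identifier
triples = [(subject, DWC + term, value) for term, value in record.items()]

for triple in triples:
    print(triple)
```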
As with Darwin Core, the EthoCore vocabulary encompasses classes and properties of information
Categories of data, not their possible values
These correspond most closely to the column headings of your tables, not the values in your cells
Complex data structures can be deconstructed into atomic elements
Applications still require a "packaging" of atomic elements into data structures
This is where common constraints reflecting a shared model of the world are useful
Terms should be designed for packaged data structures that are as generalized as possible
New class designed specially for observational research
observationID | observationType | observationValue | observationAccuracy | observationUnit | observationDeterminedBy | observationMethod | observationRemarks
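As an illustrative sketch, the terms above can serve as the allowed keys of an observation record. The term names come from the list above; every value below, and the `unknown_keys` helper, is hypothetical.

```python
# Terms of the Observation class, as listed above.
OBSERVATION_TERMS = {
    "observationID",
    "observationType",
    "observationValue",
    "observationAccuracy",
    "observationUnit",
    "observationDeterminedBy",
    "observationMethod",
    "observationRemarks",
}


def unknown_keys(record):
    """Return, in sorted order, any keys that are not Observation terms."""
    return sorted(set(record) - OBSERVATION_TERMS)


# A hypothetical observation; not every term has to be used.
observation = {
    "observationID": "obs-0001",
    "observationType": "nearest neighbor distance",
    "observationValue": 2.5,
    "observationUnit": "m",
    "observationDeterminedBy": "observer-01",
    "observationMethod": "focal animal sample",
}

print(unknown_keys(observation))
```

A check like this flags project-specific keys that still need a mapping to the shared vocabulary.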
Relationships among resources (e.g., tables or documents) can themselves be represented as resources
Darwin Core class for describing relationships between resources
resourceRelationshipID | resourceID | relatedResourceID | relationshipOfResource | relationshipAccordingTo | relationshipEstablishedDate | relationshipRemarks
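A minimal sketch of a relationship stored as its own record, using the ResourceRelationship terms above. The identifiers, the "observation of" label, and the helper functions are assumptions made for illustration.

```python
def relate(relationship_id, resource_id, related_id, relationship, according_to=None):
    """Build a relationship record that points from one resource to another."""
    return {
        "resourceRelationshipID": relationship_id,
        "resourceID": resource_id,
        "relatedResourceID": related_id,
        "relationshipOfResource": relationship,
        "relationshipAccordingTo": according_to,
    }


# Hypothetical links from observation records to organism records.
relationships = [
    relate("rel-1", "obs-0001", "org-1", "observation of", "observer-01"),
    relate("rel-2", "obs-0002", "org-1", "observation of", "observer-01"),
    relate("rel-3", "obs-0003", "org-2", "observation of", "observer-02"),
]


def related_to(resource_id, relationships):
    """IDs of all resources whose relationship records point at resource_id."""
    return [
        rel["resourceID"]
        for rel in relationships
        if rel["relatedResourceID"] == resource_id
    ]


print(related_to("org-1", relationships))
```

Keeping the link in a separate record means neither the observation nor the organism record has to change when a relationship is added.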
Tables have a place and relational databases are still powerful and efficient tools
The table paradigm, however, gets in the way of comparison
As a community, we need to break free of "table-thinking" and think about commonalities in our underlying data models
New tools are making this possible
NoSQL ("not only SQL") databases reject the dominance of the table-based relational data model
NoSQL databases are built for the Web and generally embrace web technologies such as HTTP/REST, JavaScript, and URIs/URLs
By employing data models such as the key-value or document models, NoSQL databases promise software that adapts more easily to varied and changing data structures
For us, the document-oriented data design is a reasonable intermediary from our tables to, eventually, the global linked data ecosystem on the web
Maintain our metadata initiative and solicit feedback
Observation · Measurement · Organism · Patch · MaterialSample · Event · Activity · Location · Identification · Taxon · ResourceRelationship
Engage projects to begin incorporating EthoCore terms into their databases and workflows
Develop an infrastructure for the larger anthropological community
AnthroCore
(primatology + paleontology + archaeology + ...)
2013 St. Louis & 2015 Austin working group meeting participants
Laura Abondano · Colin Addis · Elizabeth Archie · Louise Barrett · Thore Bergman · Maryjka Blaszczyk · Damien Caillaud · Shahrina Chowdhury · Kelsey Ellis · Gideon Erkenswick · Eduardo Fernandez-Duque · Steffen Foerster · Paul Garber · Peter Henzi · Katharine Jack · Clifford Jolly · Andreas Koenig · Kara Leimberger · Rebecca Lewis · Katie MacKinnon · Amely Martins · Monica McDonald · Amanda Melin · Mike Montague · Stephanie Musgrave · Joseph Orkin · Katie Ortiz · Steve Phelps · Crickette Sanz · Clara Scarry · Christopher Schmitt · Christopher Shaffer · Joan Silk · Karen Strier · Robert Sussman · Stacey Tecot · Claudia Valeggia · Sarie Van Belle · Anna Weyher
Developing open-source digital data services for behavioral field research
Anthony Di Fiore · Kenneth L. Chiou · Mike Chevett · Robyn Overstreet · Tom Igoe
What's currently available?
Significant limitations!
A handheld app that is...
We're not (yet!) concerned with designing a slick interface for behavioral data entry...
...instead, we're focusing on designing a modular, customizable framework for collecting multiple data streams using a single app and device
Custom data structures
key:value pairs, where keys either are EthoCore terms or have well-defined relationships with those terms
Example: "taxonomy" document
Mapping to EthoCore terms
Structure of documents
key:value text
Example: "contact" document
Mapping to EthoCore terms
Documents can be linked to one or more other documents
History of revision of documents is preserved
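The two properties above, linking and revision history, can be sketched with a small in-memory store. This is not the app's actual storage layer; the `DocumentStore` class, the `links` key, and the document IDs are hypothetical.

```python
import copy


class DocumentStore:
    """Tiny in-memory store that keeps every saved revision of each document."""

    def __init__(self):
        self._history = {}  # document id -> list of revisions, oldest first

    def save(self, doc_id, body):
        """Append a new revision and return its revision number (starting at 1)."""
        revisions = self._history.setdefault(doc_id, [])
        revisions.append(copy.deepcopy(body))
        return len(revisions)

    def latest(self, doc_id):
        """Return a copy of the most recent revision."""
        return copy.deepcopy(self._history[doc_id][-1])

    def history(self, doc_id):
        """Return copies of all revisions, oldest first."""
        return copy.deepcopy(self._history[doc_id])


store = DocumentStore()
store.save("taxonomy-1", {"vernacularName": "mandrill"})
store.save("contact-1", {"organismName": "Rafiki", "links": ["taxonomy-1"]})
rev = store.save(
    "contact-1",
    {"organismName": "Rafiki", "links": ["taxonomy-1"], "remarks": "re-sighted"},
)

# Follow the links from the "contact" document to the documents it refers to.
linked = [store.latest(doc_id) for doc_id in store.latest("contact-1")["links"]]
```

Saving never overwrites: the first revision of `contact-1` stays retrievable through `history` after the second is stored.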
The framework of our mobile app is built using...
...which are the primary tools used to build interactive web interfaces.
The real programming work is done in JavaScript...
... and we use the Node.js runtime and its package manager, npm, to bundle code from various programming libraries.
We use the Cordova framework to build the user interface...
... which lets us write one application that runs on iOS devices, on Android devices, or on other tablets and laptops through a browser.
And we use document-based NoSQL databases for data storage, both locally, in the app, and on a remote server.
All documents tagged with geolocation, streaming continuously from a device's location services (from internal GPS, cell towers, external GPS)
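One way to picture this tagging, as a sketch only: keep the most recent position fix and stamp it onto each document as it is created. The `LocationTagger` class and the coordinates are hypothetical; `decimalLatitude` and `decimalLongitude` are Darwin Core terms used here as example keys.

```python
class LocationTagger:
    """Remembers the latest position fix and adds it to new documents."""

    def __init__(self):
        self._last_fix = None  # (latitude, longitude) or None before the first fix

    def update(self, latitude, longitude):
        """Call whenever the device's location service reports a new position."""
        self._last_fix = (latitude, longitude)

    def tag(self, document):
        """Return a copy of the document with the latest coordinates attached."""
        tagged = dict(document)
        if self._last_fix is not None:
            tagged["decimalLatitude"], tagged["decimalLongitude"] = self._last_fix
        return tagged


tagger = LocationTagger()
untagged = tagger.tag({"observationType": "contact"})  # no fix received yet

tagger.update(-0.6380, -76.1500)  # hypothetical coordinates
tagged = tagger.tag({"observationType": "contact"})
```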
1. Our design emphasizes modularity and versatility and allows for custom data structures and user interfaces
2. Use of the EthoCore vocabulary facilitates comparability and data sharing across studies and is a step towards joining the linked data ecosystem
3. Using documents facilitates data interchange using standard protocols and is simultaneously human-readable and easily recast into other formats
4. The timeline and map views provide real-time visual feedback in the field
5. Our approach promotes digital data collection and minimizes initial investment for projects while allowing compatibility with existing data structures and workflows
Laura Abondano · Colin Addis · Elizabeth Archie · Louise Barrett · Thore Bergman · Maryjka Blaszczyk · Damien Caillaud · Shahrina Chowdhury · Kelsey Ellis · Gideon Erkenswick · Eduardo Fernandez-Duque · Steffen Foerster · Paul Garber · Peter Henzi · Katharine Jack · Clifford Jolly · Andreas Koenig · Kara Leimberger · Rebecca Lewis · Katie MacKinnon · Amely Martins · Monica McDonald · Amanda Melin · Mike Montague · Stephanie Musgrave · Joseph Orkin · Katie Ortiz · Steve Phelps · Denné Reed · Crickette Sanz · Clara Scarry · Christopher Schmitt · Christopher Shaffer · Joan Silk · Karen Strier · Robert Sussman · Stacey Tecot · Claudia Valeggia · Sarie Van Belle · Anna Weyher