John Deck, University of California, Berkeley Brian Stucky, University of Colorado, Boulder Lukasz Ziemba, University of Florida, Gainesville Nico Cellinese,

Presentation transcript:

BiSciCol: Tracking Biodiversity Objects to Brokering Standards
“Or, Gustav’s Big Problem”

John Deck, University of California, Berkeley
Brian Stucky, University of Colorado, Boulder
Lukasz Ziemba, University of Florida, Gainesville
Nico Cellinese, University of Florida, Gainesville
Rob Guralnick, University of Colorado, Boulder

BiSciCol Team: Reed Beaman, Nico Cellinese, Jonathan Coddington, Neil Davies, John Deck, Rob Guralnick, Bryan P. Heidorn, Chris Meyer, Tom Orrell, Rich Pyle, Kate Rachwal, Brian Stucky, Rob Whitton, Lukasz Ziemba

The Biological Science Collections (BiSciCol) Tracker is working towards building an infrastructure designed to tag and track scientific collections and all of their derivatives.
Funded by the National Science Foundation, 2010 – 2014.
Partners: University of Florida at Gainesville, University of Colorado at Boulder, Bishop Museum, University of California at Berkeley, Smithsonian Institution, University of Arizona at Tucson.
Relies on globally unique identifiers (GUIDs) to track objects.
Implements a Linked Data approach.
Provides support for the Global Names Architecture.

Tracking Facebook relationships … (from “Facebook Visualizer”)

Can we track relationships for Biological Objects as well?

Why? Here is Gustav’s Problem… Gustav (who prefers to collect stuff) generates lots of data. Due to project requirements and integration needs, he is left navigating a plethora of redundant and disconnected distributed databases. It takes a lot of effort to track objects and their derivatives.

Can we borrow from Facebook and social networking to help solve Gustav’s Problem?

A Biological Relationship Graph …
[Figure: a graph of Specimens, Tissues, Sequences, and Functions, with a Taxonomic Type Filter, a Class Filter, and an option to infer relationships across providers.]

Moorea Biocode Example: Tracking biological material from field collection through analysis, across multiple systems.
[Figure: Biocode Event, Essig Museum Specimen, Smithsonian Tissue, CAMERA Gut Sample Event, metagenomic sequencing, and GenBank Sequence, linked by Key, Taxon, and BLAST relations.]

How do we Track Biological Objects and their Relations Across Distributed, Heterogeneous systems?

Tracking Biological Object Relationships
1. Group like terms into classes. In Darwin Core, for example, we have the following “groups of terms”: Events, Locations, Occurrences, GeologicalContext, Identification, Taxon.
2. Assign identifiers. Use globally unique, resolvable, persistent identifiers for each class or term.
3. Link identifiers using relationship terms. For example, “This object is related to that object.”
4. Put this data on the Web.

Related Projects that are Grouping Like Terms into Classes
Darwin-SW: building an ontology of Darwin Core terms to make it possible to describe biodiversity resources on the web.
Gene Ontology: standardizing the representation of gene and gene product attributes across species and databases.
ENVO: annotating the environment for any biological sample.
OBO Foundry: a suite of orthogonal interoperable reference ontologies in the biomedical domain.

Creating Globally Unique Identifiers (GUIDs)
- Globally unique (mandatory)
- Persistent (not mandatory, but very helpful)
- Resolvable (not mandatory, but very helpful)
Form: Resolution/Domain + Identifier
Examples: JDeckSpecimen1 (a named identifier); a phone number (unique, at least for phones); 7217D A-11DF C9A66 (opaque)
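The "Resolution/Domain + Identifier" pattern can be sketched in a few lines of Python. This is only an illustration: the resolver address `http://example.org/id/` and the function name `make_guid` are hypothetical and are not part of BiSciCol.

```python
import uuid

# Hypothetical resolution service; not a real BiSciCol address.
RESOLVER = "http://example.org/id/"


def make_guid(local_id=None, resolver=RESOLVER):
    """Build 'resolver + identifier'.

    If no named identifier is supplied, mint an opaque one (a random UUID).
    """
    identifier = local_id if local_id is not None else str(uuid.uuid4())
    return resolver + identifier


named = make_guid("JDeckSpecimen1")   # named identifier
opaque = make_guid()                  # opaque identifier
print(named)
print(opaque)
```

Global uniqueness comes from the combination of the two parts: the resolver prefix scopes the name, and the UUID makes collisions between opaque identifiers extremely unlikely. Persistence and resolvability depend on how the resolution service is run, which this sketch does not address.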

Linking Identifiers Using Relationship Terms
An RDF statement: Subject, Predicate, Object. For example: GUID1 relatedTo GUID2.
relatedTo is transitive: if GUID1 relatedTo GUID2 and GUID2 relatedTo GUID3, then GUID1 relatedTo GUID3.
A simple BiSciCol graph (graph = set of RDF statements): GUID1, GUID2, and GUID3 are linked by relatedTo; each has a type (Event, Specimen, or Tissue, stated with the predicate “a”) and a Date.
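A minimal sketch of the transitive relatedTo rule, assuming a graph is stored as a plain Python set of (subject, predicate, object) tuples. The function name `transitive_closure` is illustrative, and the choice of Event as the type of GUID1 is an assumption made for the example.

```python
def transitive_closure(triples, predicate="relatedTo"):
    """Add every statement implied by treating `predicate` as transitive."""
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        edges = [(s, o) for s, p, o in closure if p == predicate]
        for s1, o1 in edges:
            for s2, o2 in edges:
                # s1 -> o1 and o1 -> o2 together imply s1 -> o2
                if o1 == s2 and (s1, predicate, o2) not in closure:
                    closure.add((s1, predicate, o2))
                    changed = True
    return closure


graph = {
    ("GUID1", "relatedTo", "GUID2"),
    ("GUID2", "relatedTo", "GUID3"),
    ("GUID1", "a", "Event"),  # assumed type, for illustration only
}
inferred = transitive_closure(graph) - graph
print(inferred)
```

Only the GUID1 relatedTo GUID3 statement is new; the type statement is left alone because the rule applies to a single predicate.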

Getting the most out of your data: Inferring Object Relationships
Facebook inferencing: “Let us sell you to others (or vice versa).”
BiSciCol inferencing: “What relationships exist that haven’t been explicitly expressed?”

Inferred Relationship Chains
[Figure: Location1 (Essig Museum), Organism1 (Essig Museum), Organism2 (Smithsonian), Tissue1 (Essig Museum), Tissue2 (Smithsonian), and Georeference1 (BioGeomancer, with a geo value ending “,16.371;crs=wgs84;u=40”). Objects are linked by relatedTo, the georeference by hasSpatialThingGeoreference, and Organism1 and Organism2 by sameAs. Tissue1 and Tissue2 are marked as inferred.]
Even though Tissue #2 is not directly related to Location1, we can still infer its relationship through Organism1 and Organism2 being the same as each other.
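The inference in this slide can be sketched as a graph walk that follows relatedTo in its stated direction and sameAs in both directions. The edge directions below are an assumption read off the slide text, not a confirmed copy of the figure, and `reachable` is an illustrative name.

```python
from collections import defaultdict, deque

# Assumed edges; the exact arrows in the original figure are not recoverable.
triples = [
    ("Location1", "relatedTo", "Organism1"),
    ("Organism1", "relatedTo", "Tissue1"),
    ("Organism2", "relatedTo", "Tissue2"),
    ("Organism1", "sameAs", "Organism2"),
]


def reachable(start, triples):
    """Return every object linked to `start`, directly or by inference."""
    related = defaultdict(set)
    same = defaultdict(set)
    for s, p, o in triples:
        if p == "relatedTo":
            related[s].add(o)
        elif p == "sameAs":      # sameAs is symmetric
            same[s].add(o)
            same[o].add(s)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in related[node] | same[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


print(sorted(reachable("Location1", triples)))
```

Without the sameAs statement, Tissue2 would not be reachable from Location1, which is the point the slide makes.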

Tools in Development “Bio-Plugins”

Update Mechanisms
Gustav’s Watchlist: GP (Occurrence), BE dd39 (Event), GP II3 (Occurrence), GP12dd xxxI (Tissue), GP9999-xkx9d-dkdkd (Occurrence), …
BiSciCol API: search on date and return the graph of an object.
Search descendants (by recent modification) to find updates.
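One way to picture the watchlist idea: for each watched object, walk its descendants and report those modified after a given date. The identifiers, dates, and function names below are invented for illustration; this is not the BiSciCol API.

```python
from datetime import date

# Hypothetical parent -> children links and last-modified dates.
children = {
    "EventA": ["SpecimenB"],
    "SpecimenB": ["TissueC"],
    "TissueC": [],
}
modified = {
    "EventA": date(2011, 1, 5),
    "SpecimenB": date(2011, 3, 1),
    "TissueC": date(2011, 6, 20),
}


def descendants(node):
    """Collect all descendants of `node` (safe against repeated visits)."""
    seen, stack = set(), list(children.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children.get(current, []))
    return seen


def updates(watchlist, since):
    """Map each watched object to its descendants modified after `since`."""
    return {
        watched: sorted(d for d in descendants(watched) if modified[d] > since)
        for watched in watchlist
    }


print(updates(["EventA", "SpecimenB"], date(2011, 4, 1)))
```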

Genomic Rosetta Stone Uses GUIDs, classed data, and links to tie Organismal data to Genomic Data.

“Triplifier”: linking biological objects
The Triplifier creates links from native data formats (MySQL, KEMU, Darwin Core Archive) for use in BiSciCol.
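The general idea of turning tabular records into linked statements can be sketched as follows. This is not the Triplifier's actual code or interface; the column names, the `triplify` function, and the sample rows are assumptions chosen to keep the example small.

```python
import csv
import io


def triplify(rows, id_column, class_name, link_columns):
    """Turn table rows into (subject, predicate, object) statements."""
    triples = []
    for row in rows:
        subject = row[id_column]
        triples.append((subject, "a", class_name))
        for column in link_columns:
            if row.get(column):  # skip empty cells
                triples.append((subject, "relatedTo", row[column]))
    return triples


# Hypothetical two-row export from a collections database.
data = "occurrenceID,eventID\nocc1,evt1\nocc2,\n"
rows = list(csv.DictReader(io.StringIO(data)))
triples = triplify(rows, "occurrenceID", "Occurrence", ["eventID"])
for triple in triples:
    print(triple)
```

Each row yields one type statement, plus one relatedTo statement per non-empty foreign-key style column.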

Example Taxonomic Query
Client interface: Search Scientific Name: Aedes increpitus [Run]
Taxon service (ITIS / GNUB): Aedes increpitus Dyar, 1916; Aedes vittata Theobald, 1903
BiSciCol service lookup:
dwc:IdentificationID1 :relatedTo […]
dwc:IdentificationID1 :relatedTo dwc:OccurrenceID1
dwc:IdentificationID2 :relatedTo […]
dwc:IdentificationID2 :relatedTo dwc:OccurrenceID3
Results: OccurrenceID1 (Aedes increpitus Dyar, 1916), OccurrenceID3 (Aedes vittata Theobald, 1903)
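A rough sketch of the two-step lookup this slide describes: first ask a taxon service for the names that match the search, then follow relatedTo links from identifications to occurrences. The in-memory `TAXON_SERVICE` table stands in for ITIS / GNUB, and the assumption that each identification links to a taxon name is a guess at the objects missing from the slide.

```python
# Stand-in for a taxon name service; the real services are not called here.
TAXON_SERVICE = {
    "Aedes increpitus": [
        "Aedes increpitus Dyar, 1916",
        "Aedes vittata Theobald, 1903",
    ],
}

# Assumed statements: each identification links a taxon name and an occurrence.
TRIPLES = [
    ("IdentificationID1", "relatedTo", "Aedes increpitus Dyar, 1916"),
    ("IdentificationID1", "relatedTo", "OccurrenceID1"),
    ("IdentificationID2", "relatedTo", "Aedes vittata Theobald, 1903"),
    ("IdentificationID2", "relatedTo", "OccurrenceID3"),
    ("OccurrenceID1", "a", "Occurrence"),
    ("OccurrenceID3", "a", "Occurrence"),
]


def find_occurrences(search_name):
    """Return (occurrence, matched name) pairs for a scientific name search."""
    names = set(TAXON_SERVICE.get(search_name, []))
    occurrences = {s for s, p, o in TRIPLES if p == "a" and o == "Occurrence"}
    results = []
    for ident, p, name in TRIPLES:
        if p == "relatedTo" and name in names:
            for s, p2, o in TRIPLES:
                if s == ident and p2 == "relatedTo" and o in occurrences:
                    results.append((o, name))
    return sorted(results)


print(find_occurrences("Aedes increpitus"))
```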

Working with Locations
E.g. tracking the location in space of a moving individual (whales): IndividualID1 is linked to EventID1, EventID2, and EventID3, each with its own georeference (GeoreferenceID1, GeoreferenceID2, GeoreferenceID3).

Data Impact Factor – Graph Metrics
What’s New?
Occurrence:MBIO1234 (“ :10:00”)
DNA Extraction:Extrac9999 (“ :00:00”)
Sequence:s (“ :00:00”)
Occurrence:MBIO1235 (“ :00:00”)
Photo:P (“ :00:00”)
Occurrences: MBIO99999 (1024 total descendants), IMBL (723 total descendants)
Events: Biocode10234 (4234 direct children), Expedition21234 (1023 direct children)
Collectors: Gustav Paulay (102,000 direct children), Christopher Meyer (83,000 direct children), Craig Moritz (523 direct children)
Graphs: [ ] GBIF Relations Graph, [X] Moorea Biocode, [X] SI MSNGR System, [+] Add New Graph
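The two metrics named here, direct children and total descendants, are simple to compute once the links are in a graph. The tiny graph below is made up for illustration, and the function names are not taken from BiSciCol.

```python
# Hypothetical parent -> children links.
links = {
    "Collector1": ["Event1", "Event2"],
    "Event1": ["Occ1", "Occ2"],
    "Event2": [],
    "Occ1": ["Tissue1"],
    "Occ2": [],
    "Tissue1": [],
}


def direct_children(node):
    """Number of objects linked directly beneath `node`."""
    return len(links.get(node, []))


def total_descendants(node):
    """Number of distinct objects anywhere beneath `node`."""
    seen, stack = set(), list(links.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(links.get(current, []))
    return len(seen)


print(direct_children("Collector1"), total_descendants("Collector1"))
```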

Web Interface (Demonstration Wed. 2pm at BiSciCol Meeting)

Summary
All objects are re-usable in the semantic web. We only need to express an identifier once, and then it can be linked by anything else (either directly or indirectly).
By using sameAs relations it is possible to infer relations that were not previously expressed in the data.
Queries are easily federated: it is possible to create global graphs and ask questions against heterogeneous databases.
Graph-based databases can help us understand the relevance of individual objects, for example by indicating the number of relations a particular object has at the 1st, 2nd, 3rd, or nth order.
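Counting relations by order can be sketched with a breadth-first search, treating each link as undirected so that distance 1 is a first-order relation, distance 2 a second-order relation, and so on. The sample links and the name `relations_by_order` are illustrative assumptions.

```python
from collections import deque


def relations_by_order(start, edges):
    """Return {order: number of objects at that distance from `start`}."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    distance = {start: 0}
    counts = {}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                counts[distance[nxt]] = counts.get(distance[nxt], 0) + 1
                queue.append(nxt)
    return counts


# Hypothetical links between objects.
edges = [
    ("Event1", "Specimen1"),
    ("Specimen1", "Tissue1"),
    ("Specimen1", "Photo1"),
    ("Tissue1", "Sequence1"),
]
print(relations_by_order("Event1", edges))
```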

How to Get Involved
“Create stable identifiers, link them to other stable identifiers, and put them on the web.”