
John Deck, University of California, Berkeley Brian Stucky, University of Colorado, Boulder Lukasz Ziemba, University of Florida, Gainesville Nico Cellinese,


1 BiSciCol: Tracking Biodiversity Objects to Brokering Standards
“Or, Gustav’s Big Problem”
John Deck, University of California, Berkeley; Brian Stucky, University of Colorado, Boulder; Lukasz Ziemba, University of Florida, Gainesville; Nico Cellinese, University of Florida, Gainesville; Rob Guralnick, University of Colorado, Boulder
BiSciCol Team: Reed Beaman, Nico Cellinese, Jonathan Coddington, Neil Davies, John Deck, Rob Guralnick, Bryan P. Heidorn, Chris Meyer, Tom Orrell, Rich Pyle, Kate Rachwal, Brian Stucky, Rob Whitton, Lukasz Ziemba

2 Biological Science Collections Tracker (BiSciCol): working towards building an infrastructure designed to tag and track scientific collections and all of their derivatives.
Funded by the National Science Foundation, 2010 – 2014.
Partners: University of Florida at Gainesville, University of Colorado at Boulder, Bishop Museum, University of California at Berkeley, Smithsonian Institution, University of Arizona at Tucson.
Relies on globally unique identifiers (GUIDs) to track objects.
Implements a Linked Data approach.
Provides support for the Global Names Architecture.

3 Tracking Facebook relationships … (from “Facebook Visualizer”)

4 Can we track relationships for Biological Objects as well?

5 Why? Here is Gustav’s Problem…
Gustav (prefers to collect stuff) generates lots of data.
Due to project requirements and integration needs, Gustav is left navigating a plethora of redundant and disconnected distributed databases.
It takes a lot of effort to track objects and their derivatives.

6 Can we borrow from Facebook and social networking to help solve Gustav’s Problem?

7 A Biological Relationship Graph …
[Figure: a graph of Specimens, Tissues, Sequences, and Functions with a Taxonomic Type Filter and a Class Filter. Caption: Infer relationships across providers.]

8 Moorea Biocode Example: Tracking biological material from field collection through analysis, across multiple systems
[Figure: objects shown are Biocode Event, Essig Museum Specimen, Smithsonian Tissue, CAMERA Gut Sample Event, Genbank Sequence, and metagenomic Sequencing. Other figure labels: Key, Blast*n, Taxon*n, Taxon, Blast.]

9 How do we Track Biological Objects and their Relations Across Distributed, Heterogeneous systems?

10 Tracking Biological Object Relationships
Group like terms into classes. In Darwin Core, for example, we have the following “groups of terms”: Events, Locations, Occurrences, GeologicalContext, Identification, Taxon.
Assign identifiers. Use globally unique, resolvable, persistent identifiers for each class or term.
Link identifiers using relationship terms. For example, “This object is related to that object.”
Put this data on the Web.

11 Related Projects that are Grouping Like Terms into Classes
Darwin-SW (http://code.google.com/p/darwin-sw/): Building an ontology of Darwin Core terms to make it possible to describe biodiversity resources on the web.
Gene Ontology (http://www.geneontology.org/): Standardizing the representation of gene and gene product attributes across species and databases.
ENVO (http://environmentontology.org/): Annotating the environment for any biological sample.
OBO Foundry (http://www.obofoundry.org/): A suite of orthogonal interoperable reference ontologies in the biomedical domain.

12 Creating Globally Unique Identifiers (GUIDs)
Globally unique (mandatory)
Persistent (not mandatory, but very helpful)
Resolvable (not mandatory, but very helpful)
Examples:
+1-541-914-4739 (unique, at least for phones)
7217D220-836A-11DF-8395-0800200C9A66 (opaque)
JDeckSpecimen1 (a named identifier)
Resolution/Domain + Identifier:
http://mycollection.org/specimen/ + JDeckSpecimen1 gives http://mycollection.org/specimen/JDeckSpecimen1
http://mycollection.org/specimen/uuid=7217D220-836A-11DF-8395-0800200C9A66
http://example.org/urn:lsid:example.org:specimen/ + 7217D220-836A-11DF-8395-0800200C9A66 gives http://example.org/urn:lsid:example.org:specimen/7217D220-836A-11DF-8395-0800200C9A66
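The "Resolution/Domain + Identifier" pattern can be sketched in a few lines of Python. This is only an illustration: the helper name `make_guid` is hypothetical, and the random UUID from the standard `uuid` module is an assumption standing in for whatever opaque identifier a collection actually mints.

```python
import uuid
from typing import Optional


def make_guid(resolution_prefix: str, local_id: Optional[str] = None) -> str:
    """Join a resolution prefix and a local identifier into one GUID string.

    If no local identifier is given, an opaque random UUID is generated
    (an assumption for this sketch, not a BiSciCol requirement).
    """
    if local_id is None:
        local_id = str(uuid.uuid4()).upper()
    # Normalize the prefix so exactly one slash separates the two parts.
    return resolution_prefix.rstrip("/") + "/" + local_id


# A named identifier, as in the slide's JDeckSpecimen1 example.
named = make_guid("http://mycollection.org/specimen/", "JDeckSpecimen1")
# An opaque identifier; the value differs on every run.
opaque = make_guid("http://mycollection.org/specimen/")
print(named)
print(opaque)
```

Uniqueness comes from the local identifier plus the domain; persistence and resolvability depend on the owner of the prefix keeping that URL alive, which no code can guarantee.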

13 Linking Identifiers Using Relationship Terms
An RDF Statement: Subject, Predicate, Object (for example: GUID1 Predicate GUID2)
relatedTo (Transitive): GUID1 relatedTo GUID2 relatedTo GUID3, OR the separate statements GUID1 relatedTo GUID2, GUID2 relatedTo GUID3, GUID1 relatedTo GUID3
A Simple BiSciCol Graph (graph = set of RDF Statements):
[Figure: GUID1, GUID2, and GUID3 linked by relatedTo. Each node has a type ("a" Event, Specimen, or Tissue) and a Date (“2011-05-01”, “2011-06-01”, “2011-06-20”).]
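A minimal sketch of what transitivity of relatedTo means in practice, assuming statements are stored as plain (subject, predicate, object) tuples. The function name `transitive_closure` is made up for this illustration and is not part of BiSciCol.

```python
def transitive_closure(triples, predicate="relatedTo"):
    """Return every (subject, object) pair implied by a transitive predicate."""
    pairs = {(s, o) for s, p, o in triples if p == predicate}
    changed = True
    while changed:
        changed = False
        # If a -> b and b -> d are known, then a -> d follows.
        for a, b in list(pairs):
            for c, d in list(pairs):
                if b == c and (a, d) not in pairs:
                    pairs.add((a, d))
                    changed = True
    return pairs


# Two explicit statements, as in the slide's GUID1/GUID2/GUID3 chain.
graph = {
    ("GUID1", "relatedTo", "GUID2"),
    ("GUID2", "relatedTo", "GUID3"),
}
closure = transitive_closure(graph)
print(sorted(closure))
```

The third pair, GUID1 relatedTo GUID3, is never stated in `graph`; it appears in `closure` only because the predicate is treated as transitive.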

14 Getting the most out of your data: Inferring Object Relationships
Facebook Inferencing: “Let us sell you, to others (or vice-versa)”
BiSciCol Inferencing: “What relationships exist that haven’t been explicitly expressed?”

15 Inferred Relationship Chains
[Figure: nodes Location1 (Essig Museum), Organism1 (Essig Museum), Organism2 (Smithsonian), Tissue1 (Essig Museum), Tissue2 (Smithsonian), and Georeference1 (BioGeomancer; 48.198,16.371;crs=wgs84;u=40), linked by relatedTo, sameAs, and hasSpatialThingGeoreference edges. Organism1 and Organism2 are joined by sameAs; the links reaching Tissue1 and Tissue2 are marked as inferred.]
Even though Tissue2 is not directly related to Location1, we can still infer its relationship through Organism1 and Organism2 being the same as each other.
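The inference on this slide can be sketched as follows. The edge directions (Location1 to Organism1, each organism to its tissue) are an assumption, since the flattened figure does not preserve them, and the helper names are hypothetical. Nodes joined by sameAs are merged into one representative, then relatedTo links are followed from the start node.

```python
from collections import defaultdict, deque


def make_canonicalizer(same_as_pairs):
    """Return a function mapping each node to one representative per sameAs group."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # shorten the path as we go
            node = parent[node]
        return node

    for a, b in same_as_pairs:
        parent[find(a)] = find(b)
    return find


def is_related(triples, start, target):
    """True if target is reachable from start via relatedTo, treating sameAs nodes as one."""
    find = make_canonicalizer((s, o) for s, p, o in triples if p == "sameAs")
    edges = defaultdict(set)
    for s, p, o in triples:
        if p == "relatedTo":
            edges[find(s)].add(find(o))
    seen = {find(start)}
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return find(target) in seen


explicit = [
    ("Location1", "relatedTo", "Organism1"),  # Essig Museum
    ("Organism1", "relatedTo", "Tissue1"),    # Essig Museum
    ("Organism2", "relatedTo", "Tissue2"),    # Smithsonian
]
with_same_as = explicit + [("Organism1", "sameAs", "Organism2")]

print(is_related(explicit, "Location1", "Tissue2"))
print(is_related(with_same_as, "Location1", "Tissue2"))
```

Without the sameAs statement the Smithsonian tissue is unreachable from Location1; adding that one statement connects the two providers' records.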

16 Tools in Development “Bio-Plugins”

17 Update Mechanisms
Gustav’s Watchlist:
GP12345-3939-33939 (Occurrence)
BE99999-3939-3dd39 (Event)
GP12346-3939-33II3 (Occurrence)
GP12dd6-3939-3xxxI (Tissue)
GP9999-xkx9d-dkdkd (Occurrence)
…
BiSciCol API: search on date and return the graph of an object.
Search descendants (by recent modification).
Updates
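One way to picture "search descendants by recent modification" is the sketch below. It is not the BiSciCol API, whose calls are not described in these slides; the identifiers, the dictionary layout, and the function name `recent_descendants` are all hypothetical.

```python
from datetime import date

# Hypothetical parent -> children links and last-modified dates.
children = {
    "Event1": ["Occurrence1", "Occurrence2"],
    "Occurrence1": ["Tissue1"],
}
modified = {
    "Occurrence1": date(2011, 5, 1),
    "Occurrence2": date(2011, 6, 20),
    "Tissue1": date(2011, 6, 21),
}


def recent_descendants(watched, children, modified, since):
    """Return descendants of the watched objects modified on or after `since`."""
    updates = []
    seen = set()
    stack = list(watched)
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            if child in seen:
                continue
            seen.add(child)
            stack.append(child)
            if modified[child] >= since:
                updates.append(child)
    return sorted(updates)


watchlist = ["Event1"]
updates = recent_descendants(watchlist, children, modified, date(2011, 6, 1))
print(updates)
```

Note that Tissue1 is reported even though its parent Occurrence1 has not changed recently, which is the point of walking the whole descendant graph rather than only the watched objects.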

18 Genomic Rosetta Stone Uses GUIDs, classed data, and links to tie Organismal data to Genomic Data.

19 “Triplifier”: linking biological objects
Create links from native data formats.
[Figure: data sources (Mysql, KEMU, Darwin Core Archive), the “Triplifier”, and BiSciCol.]
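The idea of creating links from native data formats can be illustrated with a toy conversion of tabular rows into statements. This is not the Triplifier's actual implementation; the column names, URL prefixes, and the function `triplify` are assumptions made up for the sketch.

```python
def triplify(rows, subject_col, object_col, subject_prefix, object_prefix,
             predicate="relatedTo"):
    """Turn each row's pair of ID columns into one (subject, predicate, object) statement."""
    triples = []
    for row in rows:
        # Skip rows where either side of the link is missing.
        if row.get(subject_col) is None or row.get(object_col) is None:
            continue
        triples.append((
            subject_prefix + str(row[subject_col]),
            predicate,
            object_prefix + str(row[object_col]),
        ))
    return triples


# Rows as they might come out of a relational table (hypothetical data).
specimen_rows = [
    {"specimen_id": "S1", "event_id": "E1"},
    {"specimen_id": "S2", "event_id": "E1"},
    {"specimen_id": "S3", "event_id": None},
]
triples = triplify(
    specimen_rows,
    subject_col="event_id",
    object_col="specimen_id",
    subject_prefix="http://example.org/event/",
    object_prefix="http://example.org/specimen/",
)
for t in triples:
    print(t)
```

The foreign key that was implicit in the table becomes an explicit, web-addressable link between two identifiers.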

20 Example Taxonomic Query
Client Interface: Search Scientific Name: Aedes increpitus [Run]
Taxon SERVICE (ITIS / GNUB):
http://lsid.itis.gov/urn:lsid:itis.gov:itis_tsn:126314
http://lsid.itis.gov/urn:lsid:itis.gov:itis_tsn:126317
http://gnub.org/8E19F1DC-74BA-47D4-A505-6498414B4CCE
BISCICOL SERVICE LOOKUP:
dwc:IdentificationID1 :relatedTo http://lsid.itis.gov/urn:lsid:itis.gov:itis_tsn:126314
dwc:IdentificationID1 :relatedTo dwc:OccurrenceID1
dwc:IdentificationID2 :relatedTo http://lsid.itis.gov/urn:lsid:itis.gov:itis_tsn:126317
dwc:IdentificationID2 :relatedTo dwc:OccurrenceID3
Results:
OccurrenceID1 (Aedes increpitus Dyar, 1916)
OccurrenceID3 (Aedes vittata Theobald, 1903)
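The two-step lookup can be sketched with in-memory data. The statements are copied from the slide; the dictionary standing in for the taxon service, and the assumption that it returns both ITIS identifiers for the searched name, are illustrative only. Neither the ITIS/GNUB interface nor the BiSciCol service is actually called here.

```python
TSN_126314 = "http://lsid.itis.gov/urn:lsid:itis.gov:itis_tsn:126314"
TSN_126317 = "http://lsid.itis.gov/urn:lsid:itis.gov:itis_tsn:126317"

# Stand-in for the taxon service: scientific name -> taxon identifiers (assumed mapping).
taxon_service = {"Aedes increpitus": [TSN_126314, TSN_126317]}

# Statements from the BiSciCol service lookup on the slide.
statements = [
    ("dwc:IdentificationID1", ":relatedTo", TSN_126314),
    ("dwc:IdentificationID1", ":relatedTo", "dwc:OccurrenceID1"),
    ("dwc:IdentificationID2", ":relatedTo", TSN_126317),
    ("dwc:IdentificationID2", ":relatedTo", "dwc:OccurrenceID3"),
]


def occurrences_for_name(name, taxon_service, statements):
    """Resolve a name to taxon IDs, then follow identifications to occurrences."""
    taxon_ids = set(taxon_service.get(name, []))
    # Step 1: identifications that point at any of the taxon identifiers.
    identifications = {s for s, p, o in statements
                       if p == ":relatedTo" and o in taxon_ids}
    # Step 2: everything else those identifications are related to.
    return sorted(o for s, p, o in statements
                  if p == ":relatedTo" and s in identifications and o not in taxon_ids)


results = occurrences_for_name("Aedes increpitus", taxon_service, statements)
print(results)
```

The client never needs to know how each provider stores names; it only follows identifiers from the taxon service into the relationship graph.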

21 Working with Locations
E.g., tracking the location in space of a moving individual (whales).
[Figure labels: IndividualID1; EventID1, EventID2, EventID3; GeoreferenceID1, GeoreferenceID2, GeoreferenceID3.]

22 Data Impact Factor – Graph Metrics
What’s New?
Occurrence:MBIO1234 (“2011-10-18 09:10:00”)
DNA Extraction:Extrac9999 (“2011-10-18 09:00:00”)
Sequence:s1113939999 (“2011-10-18 08:00:00”)
Occurrence:MBIO1235 (“2011-10-17 00:00:00”)
Photo:P123456 (“2011-10-17 00:00:00”)
Occurrences
MBIO99999 (1024 total descendants)
IMBL8888888 (723 total descendants)
Events
Biocode10234 (4234 direct children)
Expedition21234 (1023 direct children)
Collectors
Gustav Paulay (102,000 direct children)
Christopher Meyer (83,000 direct children)
Craig Moritz (523 direct children)
Graphs
[ ] GBIF Relations Graph
[X] Moorea Biocode
[X] SI MSNGR System
[+] Add New Graph
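The two metrics shown on this slide, direct children and total descendants, can be computed from a parent-to-children mapping as sketched below. The links between the sample identifiers are invented for the example; the slide lists the objects but not how they connect.

```python
def direct_children(children, node):
    """Number of objects linked directly beneath a node."""
    return len(children.get(node, []))


def total_descendants(children, node):
    """Number of distinct objects reachable beneath a node at any depth."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return len(seen)


# Hypothetical links between objects named on the slide.
children = {
    "Occurrence:MBIO1234": ["DNA Extraction:Extrac9999"],
    "DNA Extraction:Extrac9999": ["Sequence:s1113939999"],
    "Occurrence:MBIO1235": ["Photo:P123456"],
}

# Rank occurrences by how much downstream material they have produced.
ranking = sorted(
    ["Occurrence:MBIO1234", "Occurrence:MBIO1235"],
    key=lambda n: total_descendants(children, n),
    reverse=True,
)
print(ranking)
```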

23 Web Interface (Demonstration Wed. 2pm at BiSciCol Meeting)

24 Summary
All objects are re-usable in the semantic web. We only need to express an identifier once, and then it can be linked by anything else (either directly or indirectly).
By using sameAs relations, it is possible to infer relations for data that was not previously expressed.
Queries are easily federated: it becomes possible to create global graphs and ask questions against heterogeneous databases.
Graph-based databases can help us understand the relevance of individual objects. For example, they can indicate the number of relations a particular object has for 1st, 2nd, 3rd, or nth order relations.
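Counting relations by order, as mentioned in the last point, amounts to a breadth-first walk that records how far each object is from the starting one. The sketch below assumes a simple dictionary of links and uses made-up identifiers; `relations_by_order` is a hypothetical name.

```python
from collections import deque


def relations_by_order(neighbors, start):
    """Return {order: count}, where order is the shortest link distance from start."""
    distance = {start: 0}
    counts = {}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                counts[distance[nxt]] = counts.get(distance[nxt], 0) + 1
                queue.append(nxt)
    return counts


# Hypothetical chain: one event, two specimens, one tissue, one sequence.
neighbors = {
    "Event1": ["Specimen1", "Specimen2"],
    "Specimen1": ["Tissue1"],
    "Tissue1": ["Sequence1"],
}
orders = relations_by_order(neighbors, "Event1")
print(orders)
```

Here Event1 has two 1st order relations, one 2nd order relation, and one 3rd order relation.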

25 “Create stable identifiers, link them to other stable identifiers, and put them on the web.”
How to Get Involved:
http://biscicol.blogspot.com/
http://code.google.com/p/biscicol/

