Comparative Data Analysis Ontology (CDAO)


Comparative Data Analysis Ontology (CDAO)
Francisco Prosdocimi, Brandon Chisham, Julie Thompson, Enrico Pontelli, Arlin Stoltzfus

Objectives
- Develop a framework to formalize knowledge in the evolutionary biology domain
- Formalize an ontology for comparative data analysis: the Comparative Data Analysis Ontology (CDAO)
- Implement and evaluate the ontology

Motivation
Interoperation
- Ontologies formalize knowledge
- Overcome ambiguities in data formats (e.g., the multiple interpretations of NEXUS)
- Facilitate provably correct format conversions
Reasoning
- Beyond relational queries
- Automated generation of format converters
- Advanced reasoning required for workflow construction and validation
Miscellaneous
- Guide development of new data formats
- Lingua franca for knowledge exchange
- …

Development Process: Ontology Construction
Specification
- Identify purpose and scope
- Define use cases
Choice of Representation
- Language
- Development tools
Conceptualization
- Concept glossary
- Organize knowledge
Implementation
- Build model
- Integrate ontologies
Evaluation
- Consistency tests
- Completeness tests
- Use cases

Structure of CDAO
Current focus:
- Taxonomic units
- Tree-like networks of relationships
- Models of evolutionary changes

Structure of CDAO
Core Components
- Representation of Networks and Trees (e.g., NEXUS TREES block)
- Representation of Character Data (e.g., NEXUS CHARACTERS block)
Imported Components
- Amino Acid Ontology, http://www.co-ode.org/ontologies/amino-acid (U. Manchester, 2006)
- Nucleotide Ontology, http://www.co-ode.org/ontologies/basic-bio/

CDAO: Core Components
Network/Tree representation
- Rooted and Unrooted Trees
- Nodes
- Edges
- Sets of Nodes
[Diagram: class hierarchy for networks and trees. Classes shown: network, rooted tree, unrooted tree, topology, node, Parent Node, Child Node, edge, directed edge, set of nodes, lineage. Relations shown: is_a, part_of, has ancestor, has descendant, has element, mrca_of, Represents TU, has_annotation. Annotations shown: Tree Procedure, Model… and Transformation, Length…]

CDAO: Core Components
Representation of a Directed Tree
[Figure: a rooted tree with leaves A, B, C, D, E. Labels shown: Rooted_tree, Node, Node (Ancestral), MRCA_Node, Lineage, Subtree, Edge, Directed edge or branch, Edge Transformation (Character, Ancestor state, Derived state…). Relations shown: has_root_node, has_parent_Node, has_child_Node. Restriction shown: has_descendant min 2 Nodes.]
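
The notions of lineage and MRCA node named on this slide can be illustrated outside of OWL. The following is a minimal Python sketch, not CDAO's own definition: the tree topology, the internal node names (n1, n2, n3) and the helper names `lineage` and `mrca` are all assumptions made for illustration, since the slide's figure does not survive in the transcript.

```python
# Hypothetical child -> parent map for a small rooted tree with leaves A..E.
# The topology and the names n1, n2, n3 are assumptions, not taken from the slide.
PARENT = {
    "A": "n1", "B": "n1",
    "C": "n2", "n1": "n2",
    "D": "n3", "E": "n3",
    "n2": "root", "n3": "root",
}


def lineage(node, parent=PARENT):
    """Return the path from node up to the root, starting with node itself."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def mrca(nodes, parent=PARENT):
    """Return the most recent common ancestor of a non-empty list of nodes."""
    nodes = list(nodes)
    common = set(lineage(nodes[0], parent))
    for n in nodes[1:]:
        common &= set(lineage(n, parent))
    # Walking up one lineage, the first shared node is the most recent one.
    for candidate in lineage(nodes[0], parent):
        if candidate in common:
            return candidate
    raise ValueError("nodes do not share an ancestor")


print(mrca(["A", "B"]))  # n1
print(mrca(["A", "C"]))  # n2
print(mrca(["B", "E"]))  # root
```

Storing only child-to-parent links keeps the sketch short; an ontology-based representation would instead state these facts as edges with explicit parent and child nodes.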

CDAO: Core Components
Annotations
- Edge Annotations: Model Description, Length, Transformation
- Model Description: Gap Cost, Substitution Model
- TU Annotation: Taxonomic Link
- Tree Annotation: Tree Procedure
[Diagram: Edge Annotation and character state transformation. Classes shown: character state transformation, character state. Relations shown: transform_character, has_left_state, has_left_node, has_right_node, has_right_state.]

CDAO: Core Components
Character State Data Matrix
- Character
- Taxonomic Units
- Datum
- State
- Character State Data Matrix
[Diagram: class hierarchy for character data. Classes shown: character state data matrix, taxonomic unit, character, character state datum, character state, node, coordinate, coordinate system, compound, amino acid, nucleotide, discrete, continuous. Relations shown: is_a, part_of, belongs_to, has datum, has, represented by, has coordinate, is_transformation_of, has annotation. Annotations shown: Alignment procedures… and TAXID, DB-XREF…]

Implementation Details
Formalization
- OWL 1.1
Tools
- Protégé 4 [edit]
- Swoop 2.3 [validation]
- C++ and Perl+Prolog translators
- Swoop 2.3 [reasoning]
- Pellet [reasoning]
- FaCT++ [reasoning]

Preliminary Evaluation
- We are reaching the stage where concrete evaluation is possible
- NEXUS converters
- We hit several stumbling blocks:
  - A good formalization of CDAO requires sophisticated features (OWL 1.1)
  - Most reasoning engines do not yet support OWL 1.1 (even when they claim to…)

Some Examples
Simple NEXUS file
    #NEXUS
    BEGIN TAXA;
        DIMENSIONS ntax=10;
        TAXLABELS
            Arabidopsis_thaliana_AAD31363.1
            Arabidopsis_thaliana_CAB79970.1
            Oryza_sativa_BAB21282.1
            Dictyostelium_discoideum_AAO51107.1
            Caenorhabditis_elegans_CAA92686.1
            Drosophila_melanogaster_AAF55117.1
            Drosophila_melanogaster_AAF55115.1
            Mus_musculus_BAB61955.1
            Saccharomyces_cerevisiae_AAB68881.1
            Schizosaccharomyces_pombe_CAB16373.1;
    END;
    BEGIN CHARACTERS;
        TITLE dna;
        LINK taxa=PF00137_47;
        DIMENSIONS nchar=10;
        FORMAT datatype=dna gap=- missing=?;
        MATRIX
            Arabidopsis_thaliana_CAB79970.1       gtgtggttgc
            Schizosaccharomyces_pombe_CAB16373.1  tgtatatgct
            Drosophila_melanogaster_AAF55117.1    tgtacttcgt
            Arabidopsis_thaliana_AAD31363.1       gt---gtggc
            Oryza_sativa_BAB21282.1               ct--------
            Saccharomyces_cerevisiae_AAB68881.1   tgtacaagct
            Mus_musculus_BAB61955.1               tctgctacac
            Dictyostelium_discoideum_AAO51107.1   cacttactcc
            Caenorhabditis_elegans_CAA92686.1     tgttttacat
            Drosophila_melanogaster_AAF55115.1    ac------g-
        ;
    END;
    BEGIN TREES;
        TREE con_50_majrule = (((Arabidopsis_thaliana_AAD31363.1:0.004496,Arabidopsis_thaliana_CAB79970.1:0.009539)inode15:0.090479,Oryza_sativa_BAB21282.1:0.043596)inode14:0.219708,(Dictyostelium_discoideum_AAO51107.1:0.341768,(((Caenorhabditis_elegans_CAA92686.1:0.308884,(Drosophila_melanogaster_AAF55117.1:0.128132,Drosophila_melanogaster_AAF55115.1:0.384443)inode20:0.236060)inode19:0.093887,Mus_musculus_BAB61955.1:0.243982)inode18:0.150844,(Saccharomyces_cerevisiae_AAB68881.1:0.235101,Schizosaccharomyces_pombe_CAB16373.1:0.261646)inode21:0.225955)inode17:0.189073)inode16:0.127974)root;
    END;
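
Translating the TREE statement above into ontology instances requires reading the parenthesized tree description as a set of parent/child edges with lengths. The following Python sketch shows one way to do that; it is not the C++ or Perl+Prolog translator mentioned in the talk. The function name `parse_newick` is hypothetical, and the parser assumes every internal node carries a label (as in this file), with no quoted labels and no comments. The sample input is the inode14 subtree of the tree above.

```python
import re


def parse_newick(text):
    """Parse a parenthesized tree string into (root, edges).

    Each edge is a (parent, child, length) tuple. Minimal sketch: assumes
    every internal node is labelled, no quoted labels, no comments.
    """
    tokens = re.findall(r"[(),;]|[^(),;\s]+", text)
    edges = []
    stack = []       # one list of pending children per open parenthesis
    children = None  # children of a group just closed, awaiting its label
    root = None
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ",":
            continue
        elif tok == ")":
            children = stack.pop()
        elif tok == ";":
            break
        else:
            label, _, length = tok.partition(":")
            length = float(length) if length else None
            if children is not None:
                # This label names the internal node that owns the closed group.
                for child, child_length in children:
                    edges.append((label, child, child_length))
                children = None
            if stack:
                stack[-1].append((label, length))
            else:
                root = label
    return root, edges


# The inode14 subtree of the example tree.
SUBTREE = (
    "((Arabidopsis_thaliana_AAD31363.1:0.004496,"
    "Arabidopsis_thaliana_CAB79970.1:0.009539)inode15:0.090479,"
    "Oryza_sativa_BAB21282.1:0.043596)inode14;"
)

root, edges = parse_newick(SUBTREE)
print(root)
for parent, child, length in edges:
    print(parent, "->", child, length)
```

Each resulting tuple carries the information needed for one directed edge: a parent node, a child node and a branch length.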

Some Examples
Node:
    <cdao:Node rdf:ID="node_inode15">
        <cdao:part_of rdf:resource="#Tree"/>
        <cdao:belongs_to_Edge rdf:resource="#edge_inode15_inode14" />
        <cdao:belongs_to_Edge rdf:resource="#edge_Arabidopsis_thaliana_CAB79970_1_inode15" />
        <cdao:belongs_to_Edge rdf:resource="#edge_Arabidopsis_thaliana_AAD31363_1_inode15" />
        <cdao:belongs_to_Edge_as_Child rdf:resource="#edge_inode15_inode14" />
        <cdao:belongs_to_Edge_as_Parent rdf:resource="#edge_Arabidopsis_thaliana_CAB79970_1_inode15" />
        <cdao:belongs_to_Edge_as_Parent rdf:resource="#edge_Arabidopsis_thaliana_AAD31363_1_inode15" />
        <cdao:nca_node_of rdf:resource="#set_nca_44"/>
    </cdao:Node>
Directed_Edge:
    <cdao:Directed_Edge rdf:ID="edge_Arabidopsis_thaliana_CAB79970_1_inode15">
        <cdao:has_Parent_Node rdf:resource="#node_inode15"/>
        <cdao:has_Child_Node rdf:resource="#node_Arabidopsis_thaliana_CAB79970_1"/>
        <cdao:has_Annotation rdf:resource="#edge_Arabidopsis_thaliana_CAB79970_1_inode15_length"/>
    </cdao:Directed_Edge>
    <cdao:Edge_Length rdf:ID="edge_Arabidopsis_thaliana_CAB79970_1_inode15_length">
        <cdao:has_Value rdf:datatype="&xsd;float"> 0.009539 </cdao:has_Value>
    </cdao:Edge_Length>
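
The Directed_Edge instance above follows a regular naming pattern (edge_CHILD_PARENT, node_NAME, and a "_length" suffix for the annotation), so it can be generated from a (parent, child, length) triple. The Python sketch below imitates that pattern with plain string formatting. The helper names `rdf_id` and `directed_edge_xml` are hypothetical, the replacement of "." by "_" is inferred from the identifiers on the slide, and labels are assumed to contain only letters, digits, "_" and ".". As on the slide, the output relies on an `xsd` entity being declared in the enclosing document.

```python
def rdf_id(label):
    """Turn a NEXUS label into an identifier, e.g. 'CAB79970.1' -> 'CAB79970_1'.

    Assumption: this mirrors the naming visible in the slide's RDF/XML.
    """
    return label.replace(".", "_")


def directed_edge_xml(parent, child, length):
    """Render one edge and its length annotation as an RDF/XML fragment."""
    edge_id = f"edge_{rdf_id(child)}_{rdf_id(parent)}"
    length_id = f"{edge_id}_length"
    return "\n".join([
        f'<cdao:Directed_Edge rdf:ID="{edge_id}">',
        f'  <cdao:has_Parent_Node rdf:resource="#node_{rdf_id(parent)}"/>',
        f'  <cdao:has_Child_Node rdf:resource="#node_{rdf_id(child)}"/>',
        f'  <cdao:has_Annotation rdf:resource="#{length_id}"/>',
        "</cdao:Directed_Edge>",
        f'<cdao:Edge_Length rdf:ID="{length_id}">',
        f'  <cdao:has_Value rdf:datatype="&xsd;float">{length}</cdao:has_Value>',
        "</cdao:Edge_Length>",
    ])


fragment = directed_edge_xml("inode15", "Arabidopsis_thaliana_CAB79970.1", 0.009539)
print(fragment)
```

String templates are enough for a sketch; a production translator would more likely build the graph with an RDF library so that escaping and namespace handling are done for it.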

Some Examples
TU:
    <cdao:TU rdf:ID="Caenorhabditis_elegans_CAA92686_1">
        <cdao:belongs_to_Character_State_Data_Matrix rdf:resource="#Matrix"/>
        <cdao:represented_by_Node rdf:resource="#node_Caenorhabditis_elegans_CAA92686_1"/>
        <cdao:has_Nucleotide_Datum rdf:resource="#datum_Caenorhabditis_elegans_CAA92686_1_char_0"/>
        <cdao:has_Nucleotide_Datum rdf:resource="#datum_Caenorhabditis_elegans_CAA92686_1_char_1"/>
        <cdao:has_Nucleotide_Datum rdf:resource="#datum_Caenorhabditis_elegans_CAA92686_1_char_2"/>
        …
    </cdao:TU>
Character:
    <cdao:Nucleotide_Character rdf:ID="char_2">
        <cdao:has_Nucleotide_Datum rdf:resource="#datum_Oryza_sativa_BAB21282_1_char_2"/>
        <cdao:has_Nucleotide_Datum rdf:resource="#datum_Arabidopsis_thaliana_CAB79970_1_char_2"/>
        <cdao:has_Nucleotide_Datum rdf:resource="#datum_Mus_musculus_BAB61955_1_char_2"/>
    </cdao:Nucleotide_Character>
Datum:
    <cdao:Nucleotide_State_Datum rdf:ID="datum_Caenorhabditis_elegans_CAA92686_1_char_6">
        <cdao:belongs_to_Character rdf:resource="#char_6"/>
        <cdao:belongs_to_TU rdf:resource="#Caenorhabditis_elegans_CAA92686_1"/>
        <cdao:has_Nucleotide_State rdf:resource="#value_a"/>
    </cdao:Nucleotide_State_Datum>
State:
    <cdao:Nucleotide rdf:ID="value_a">
        <owl:sameAs rdf:resource="#dA"/>
    </cdao:Nucleotide>

Simple Reasoning Tasks
- Determine which TUs contain a gap in their tables [FaCT++]:
    (has_Datum some (has_State value gap)) and TU
- Determine the ancestors of a TU in the tree:
    has_Descendant value node_Drosophila_melanogaster_AAF55115_1
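
For readers unfamiliar with class expressions, the two queries on this slide can be paraphrased procedurally. The Python sketch below is only an analogy and does not show how a description-logic reasoner works internally. The matrix rows are a subset copied from the example NEXUS file, the child-to-parent links are read off the example tree, and the function names `tus_with_gap` and `ancestors` are hypothetical.

```python
# A subset of the rows in the MATRIX block of the example NEXUS file.
MATRIX = {
    "Arabidopsis_thaliana_CAB79970.1": "gtgtggttgc",
    "Arabidopsis_thaliana_AAD31363.1": "gt---gtggc",
    "Oryza_sativa_BAB21282.1": "ct--------",
    "Mus_musculus_BAB61955.1": "tctgctacac",
    "Drosophila_melanogaster_AAF55115.1": "ac------g-",
}

# Child -> parent links read off the example tree (part of the tree only).
PARENT = {
    "Drosophila_melanogaster_AAF55115.1": "inode20",
    "Drosophila_melanogaster_AAF55117.1": "inode20",
    "inode20": "inode19",
    "Caenorhabditis_elegans_CAA92686.1": "inode19",
    "inode19": "inode18",
    "Mus_musculus_BAB61955.1": "inode18",
    "inode18": "inode17",
    "inode21": "inode17",
    "inode17": "inode16",
    "inode16": "root",
    "inode14": "root",
}


def tus_with_gap(matrix, gap="-"):
    """Analogue of '(has_Datum some (has_State value gap)) and TU'."""
    return sorted(tu for tu, row in matrix.items() if gap in row)


def ancestors(node, parent):
    """Analogue of 'has_Descendant value <node>': every node above it."""
    found = []
    while node in parent:
        node = parent[node]
        found.append(node)
    return found


print(tus_with_gap(MATRIX))
print(ancestors("Drosophila_melanogaster_AAF55115.1", PARENT))
```

For this data the first call selects the three rows that contain "-", and the second walks from inode20 up to root.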

Simple Reasoning Tasks
Extract the row of a specific TU:
    SELECT ?z,?y
    WHERE (base:Arabidopsis_thaliana_AAD31363_1, cdao:has_Datum, ?x)
          (?x, cdao:has_State, ?y)
          (?x, cdao:belongs_to_Character, ?z)
    USING base FOR <file:/C:/Users/epontell/Documents/Research/Proposals/NEXUS/Research/Perl/inst_matrix.owl#>,
          cdao FOR <http://www.cs.nmsu.edu/~epontell/CURRENT_matrix.owl#>
Result:
    z                    y
    inst_matrix:char_9   CURRENT:dC
    inst_matrix:char_8   CURRENT:dG
    inst_matrix:char_7   CURRENT:dG
    inst_matrix:char_5   CURRENT:dG
    inst_matrix:char_0   CURRENT:dG
    inst_matrix:char_6   CURRENT:dT
    inst_matrix:char_1   CURRENT:dT
    inst_matrix:char_3   CURRENT:gap
    inst_matrix:char_2   CURRENT:gap
    inst_matrix:char_4   CURRENT:gap
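
The query above joins three triple patterns through the shared variable ?x (the datum). The same join can be sketched in plain Python over an in-memory list of triples, which may help when checking a translator's output by hand. This is an illustration rather than the query engine used in the talk: the function names `datum_triples` and `row_of` are hypothetical, and the state names dA, dC, dG, dT and gap are taken from the slides.

```python
# Symbol -> state name, following the names that appear on the slides.
STATE_NAMES = {"a": "dA", "c": "dC", "g": "dG", "t": "dT", "-": "gap"}


def datum_triples(tu, row):
    """Yield (subject, property, object) triples for one matrix row."""
    for i, symbol in enumerate(row):
        datum = f"datum_{tu}_char_{i}"
        yield (tu, "has_Datum", datum)
        yield (datum, "belongs_to_Character", f"char_{i}")
        yield (datum, "has_State", STATE_NAMES[symbol])


def row_of(tu, triples):
    """Join the three patterns of the query: TU -> datum -> (character, state)."""
    triples = list(triples)
    data = {o for s, p, o in triples if s == tu and p == "has_Datum"}
    chars = {s: o for s, p, o in triples
             if p == "belongs_to_Character" and s in data}
    states = {s: o for s, p, o in triples
              if p == "has_State" and s in data}
    return {chars[d]: states[d] for d in data}


TU = "Arabidopsis_thaliana_AAD31363_1"
row = row_of(TU, datum_triples(TU, "gt---gtggc"))
for char in sorted(row):
    print(char, row[char])
```

Applied to the row "gt---gtggc" from the example matrix, the join pairs each character with one state, matching the result listed on the slide.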

Future Work
- To facilitate evaluation: create an OWL 1.0 edition of the ontology (and corresponding NEXUS translator)
- Java-level reasoning
  - Aggregation
  - Etc.
- Large-scale NEXUS validation
- NeXML interface
- OBO distribution