Comparative Data Analysis Ontology (CDAO)


1 Comparative Data Analysis Ontology (CDAO)
Francisco Prosdocimi, Brandon Chisham, Julie Thompson, Enrico Pontelli, Arlin Stoltzfus

2 Objectives
Develop a framework to formalize knowledge in the evolutionary biology domain
Formalize an ontology for comparative data analysis
  Comparative Data Analysis Ontology (CDAO)
Implement and evaluate the ontology

3 Motivation
Interoperation
  Ontologies formalize knowledge
  Overcome ambiguities in data formats (e.g., the multiple interpretations of NEXUS)
  Facilitate provably correct format conversions
Reasoning
  Beyond relational queries
  Automated generation of format converters
  Advanced reasoning required for workflow construction and validation
Miscellaneous
  Guide development of new data formats
  Lingua franca for knowledge exchange

4 Development Process: Ontology Construction
Specification
  Identify purpose and scope
  Define use cases
Choice of Representation
  Language
  Development tools
Conceptualization
  Concept glossary
  Organize knowledge
Implementation
  Build model
  Integrate ontologies
Evaluation
  Consistency tests
  Completeness tests
  Use cases

5 Structure of CDAO
Current focus:
  Taxonomic units
  Tree-like networks of relationships
  Models of evolutionary changes

6 Structure of CDAO
Core Components
  Representation of networks and trees (e.g., NEXUS TREES block)
  Representation of character data (e.g., NEXUS CHARACTERS block)
Imported Components
  Amino Acid Ontology (U. Manchester, 2006)
  Nucleotide Ontology

7 CDAO: Core Components
Network/Tree representation
  Rooted and unrooted trees
  Nodes
  Edges
  Sets of nodes
[Diagram: concept map of the network/tree classes. Recoverable labels: network, rooted tree, unrooted tree, topology, node, parent node, child node, edge, directed edge, set of nodes, lineage, TU. Relations: is_a, part_of, has ancestor, has descendant, has element, mrca_of, represents TU, has_annotation. Tree annotations: tree procedure, model… Edge annotations: transformation, length…]

8 CDAO: Core Components
Representation of a Directed Tree
[Diagram: a rooted tree with tips A, B, C, D, E. Recoverable labels: Rooted_tree, has_root_node, Node, Node (Ancestral), MRCA_Node, Lineage, Subtree, Edge, directed edge or branch, has_parent_Node, has_child_Node, has_descendant min 2 Nodes, Edge Transformation, Character, ancestor state, derived state…]
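The MRCA_Node and Lineage concepts on this slide can be illustrated with a small sketch. The five-tip topology and the internal node names (n1, n2, n3, root) below are assumptions made for illustration only; the slide's figure does not survive in the transcript, and this is not CDAO's own implementation.

```python
from typing import Dict, List

# Hypothetical rooted tree stored as child -> parent; the root has no entry.
PARENT: Dict[str, str] = {
    "A": "n1", "B": "n1",
    "n1": "n2", "C": "n2",
    "D": "n3", "E": "n3",
    "n2": "root", "n3": "root",
}

def lineage(node: str) -> List[str]:
    """Path from a node up to the root, the node itself included."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def mrca(nodes: List[str]) -> str:
    """Most recent common ancestor: the first node on one lineage
    that also lies on the lineage of every other node."""
    if not nodes:
        raise ValueError("need at least one node")
    others = [set(lineage(n)) for n in nodes[1:]]
    for candidate in lineage(nodes[0]):
        if all(candidate in seen for seen in others):
            return candidate
    raise ValueError("nodes do not share a root")

print(mrca(["A", "B"]))  # n1
print(mrca(["A", "E"]))  # root
```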

9 CDAO: Core Components
Annotations
  Edge Annotations
    Length
    Transformation
  Model Description
    Gap Cost
    Substitution Model
  TU Annotation
    Taxonomic Link
  Tree Annotation
    Tree Procedure
[Diagram: an edge annotation of type character state transformation. Recoverable labels: transform_character (to character), has_left_state and has_right_state (to character state), has_left_node, has_right_node.]

10 CDAO: Core Components
Character State Data Matrix
  Character
  Taxonomic Units
  Datum
  State
[Diagram: concept map of the character state data matrix. Recoverable labels: character state data matrix, taxonomic unit, character, character state datum, character state, node, coordinate, coordinate system. Subtypes (is_a): compound, amino acid, nucleotide, discrete, continuous. Relations: part_of, belongs_to, has datum, has coordinate, represented by node, is_transformation_of, has annotation. Matrix annotations: alignment procedures… TU annotations: TAXID, DB-XREF…]

11 Implementation Details
Formalization
  OWL 1.1
Tools
  Protégé 4 [edit]
  Swoop 2.3 [validation]
  C++ and Perl+Prolog translators
  Swoop 2.3 [reasoning]
  Pellet [reasoning]
  Fact++ [reasoning]

12 Preliminary Evaluation
We are reaching the stage where concrete evaluation is possible
  NEXUS converters
We ran into several stumbling blocks
  A good formalization of CDAO requires sophisticated features (OWL 1.1)
  Most reasoning engines have not reached OWL 1.1 yet (even if they claim to)

13 Some Examples
Simple NEXUS file:

#NEXUS
BEGIN TAXA;
  DIMENSIONS ntax=10;
  TAXLABELS
    Arabidopsis_thaliana_AAD
    Arabidopsis_thaliana_CAB
    Oryza_sativa_BAB
    Dictyostelium_discoideum_AAO
    Caenorhabditis_elegans_CAA
    Drosophila_melanogaster_AAF
    Drosophila_melanogaster_AAF
    Mus_musculus_BAB
    Saccharomyces_cerevisiae_AAB
    Schizosaccharomyces_pombe_CAB
  ;
END;
BEGIN CHARACTERS;
  TITLE dna;
  LINK taxa=PF00137_47;
  DIMENSIONS nchar=10;
  FORMAT datatype=dna gap=- missing=?;
  MATRIX
    Arabidopsis_thaliana_CAB gtgtggttgc
    Schizosaccharomyces_pombe_CAB tgtatatgct
    Drosophila_melanogaster_AAF tgtacttcgt
    Arabidopsis_thaliana_AAD gt---gtggc
    Oryza_sativa_BAB ct
    Saccharomyces_cerevisiae_AAB tgtacaagct
    Mus_musculus_BAB tctgctacac
    Dictyostelium_discoideum_AAO cacttactcc
    Caenorhabditis_elegans_CAA tgttttacat
    Drosophila_melanogaster_AAF ac------g-
  ;
BEGIN TREES;
  TREE con_50_majrule = (((Arabidopsis_thaliana_AAD : ,Arabidopsis_thaliana_CAB : )inode15: ,Oryza_sativa_BAB : )inode14: ,(Dictyostelium_discoideum_AAO : ,(((Caenorhabditis_elegans_CAA : ,(Drosophila_melanogaster_AAF : ,Drosophila_melanogaster_AAF : )inode20: )inode19: ,Mus_musculus_BAB : )inode18: ,(Saccharomyces_cerevisiae_AAB : ,Schizosaccharomyces_pombe_CAB : )inode21: )inode17: )inode16: )root;
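As a rough illustration of the first step a NEXUS-to-CDAO translator has to perform, the sketch below reads the MATRIX command of a CHARACTERS block into a taxon-to-row mapping. It is not the authors' C++ or Perl+Prolog translator; the function name read_matrix is hypothetical, the input is a three-row excerpt of the file above, and only the simple one-row-per-line layout is handled.

```python
import re
from typing import Dict

# Three rows excerpted from the slide's CHARACTERS block.
NEXUS = """#NEXUS
BEGIN CHARACTERS;
  DIMENSIONS nchar=10;
  FORMAT datatype=dna gap=- missing=?;
  MATRIX
    Arabidopsis_thaliana_CAB gtgtggttgc
    Arabidopsis_thaliana_AAD gt---gtggc
    Mus_musculus_BAB tctgctacac
  ;
END;
"""

def read_matrix(text: str) -> Dict[str, str]:
    """Map each taxon label to its row of character states.

    Assumes non-interleaved rows of the form '<label> <states>' and
    rejects repeated labels, which would otherwise overwrite each other.
    """
    match = re.search(r"\bMATRIX\b(.*?);", text, flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("no MATRIX command found")
    rows: Dict[str, str] = {}
    for line in match.group(1).splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # blank line, or a layout this sketch does not cover
        label, states = parts
        if label in rows:
            raise ValueError(f"duplicate taxon label: {label}")
        rows[label] = states
    return rows

rows = read_matrix(NEXUS)
print(rows["Arabidopsis_thaliana_AAD"])  # gt---gtggc
```

The duplicate-label check matters for the file as transcribed: its labels appear truncated, so Drosophila_melanogaster_AAF occurs twice and the two rows could not be told apart.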

14 Some Examples
Node:

<cdao:Node rdf:ID="node_inode15">
  <cdao:part_of rdf:resource="#Tree"/>
  <cdao:belongs_to_Edge rdf:resource="#edge_inode15_inode14" />
  <cdao:belongs_to_Edge rdf:resource="#edge_Arabidopsis_thaliana_CAB79970_1_inode15" />
  <cdao:belongs_to_Edge rdf:resource="#edge_Arabidopsis_thaliana_AAD31363_1_inode15" />
  <cdao:belongs_to_Edge_as_Child rdf:resource="#edge_inode15_inode14" />
  <cdao:belongs_to_Edge_as_Parent rdf:resource="#edge_Arabidopsis_thaliana_CAB79970_1_inode15" />
  <cdao:belongs_to_Edge_as_Parent rdf:resource="#edge_Arabidopsis_thaliana_AAD31363_1_inode15" />
  <cdao:nca_node_of rdf:resource="#set_nca_44"/>
</cdao:Node>

Directed_Edge:

<cdao:Directed_Edge rdf:ID="edge_Arabidopsis_thaliana_CAB79970_1_inode15">
  <cdao:has_Parent_Node rdf:resource="#node_inode15"/>
  <cdao:has_Child_Node rdf:resource="#node_Arabidopsis_thaliana_CAB79970_1"/>
  <cdao:has_Annotation rdf:resource="#edge_Arabidopsis_thaliana_CAB79970_1_inode15_length"/>
</cdao:Directed_Edge>
<cdao:Edge_Length rdf:ID="edge_Arabidopsis_thaliana_CAB79970_1_inode15_length">
  <cdao:has_Value rdf:datatype="&xsd;float"> </cdao:has_Value>
</cdao:Edge_Length>
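The edge identifiers in the RDF above appear to follow the pattern edge_<child>_<parent>. The sketch below shows one way such parent/child pairs could be derived from a Newick tree string whose internal nodes are labelled and whose branch lengths are empty, as in the NEXUS example. The names parse_newick and edge_id are hypothetical, and the parser covers only this simple case; it is not the translator the authors used.

```python
from typing import List, Tuple

def parse_newick(text: str) -> List[Tuple[str, str]]:
    """Return (parent, child) label pairs from a Newick string.

    Assumes every node, internal ones included, carries a label and that
    the length after ':' may be empty.
    """
    s = "".join(text.split()).rstrip(";")  # drop whitespace and the final ';'
    pos = 0
    edges: List[Tuple[str, str]] = []

    def read_label() -> str:
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in "(),:":
            pos += 1
        label = s[start:pos]
        if pos < len(s) and s[pos] == ":":  # skip an optional branch length
            pos += 1
            while pos < len(s) and s[pos] not in "(),":
                pos += 1
        return label

    def read_node() -> str:
        nonlocal pos
        children: List[str] = []
        if pos < len(s) and s[pos] == "(":
            pos += 1
            children.append(read_node())
            while pos < len(s) and s[pos] == ",":
                pos += 1
                children.append(read_node())
            if pos >= len(s) or s[pos] != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1
        label = read_label()
        edges.extend((label, child) for child in children)
        return label

    read_node()
    return edges

def edge_id(parent: str, child: str) -> str:
    """Identifier in the style seen on the slide: edge_<child>_<parent>."""
    return f"edge_{child}_{parent}"

# Subtree excerpted from the slide's TREE command.
TREE = "((Arabidopsis_thaliana_AAD : ,Arabidopsis_thaliana_CAB : )inode15: ,Oryza_sativa_BAB : )inode14;"
for parent, child in parse_newick(TREE):
    print(edge_id(parent, child))
```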

15 Some Examples
TU:

<cdao:TU rdf:ID="Caenorhabditis_elegans_CAA92686_1">
  <cdao:belongs_to_Character_State_Data_Matrix rdf:resource="#Matrix"/>
  <cdao:represented_by_Node rdf:resource="#node_Caenorhabditis_elegans_CAA92686_1"/>
  <cdao:has_Nucleotide_Datum rdf:resource="#datum_Caenorhabditis_elegans_CAA92686_1_char_0"/>
  <cdao:has_Nucleotide_Datum rdf:resource="#datum_Caenorhabditis_elegans_CAA92686_1_char_1"/>
  <cdao:has_Nucleotide_Datum rdf:resource="#datum_Caenorhabditis_elegans_CAA92686_1_char_2"/>
</cdao:TU>

Character:

<cdao:Nucleotide_Character rdf:ID="char_2">
  <cdao:has_Nucleotide_Datum rdf:resource="#datum_Oryza_sativa_BAB21282_1_char_2"/>
  <cdao:has_Nucleotide_Datum rdf:resource="#datum_Arabidopsis_thaliana_CAB79970_1_char_2"/>
  <cdao:has_Nucleotide_Datum rdf:resource="#datum_Mus_musculus_BAB61955_1_char_2"/>
</cdao:Nucleotide_Character>

Datum:

<cdao:Nucleotide_State_Datum rdf:ID="datum_Caenorhabditis_elegans_CAA92686_1_char_6">
  <cdao:belongs_to_Character rdf:resource="#char_6"/>
  <cdao:belongs_to_TU rdf:resource="#Caenorhabditis_elegans_CAA92686_1"/>
  <cdao:has_Nucleotide_State rdf:resource="#value_a"/>
</cdao:Nucleotide_State_Datum>

State:

<cdao:Nucleotide rdf:ID="value_a">
  <owl:sameAs rdf:resource="#dA"/>
</cdao:Nucleotide>

16 Simple Reasoning Tasks
Determine which TUs contain a gap in their tables [Fact++]:
  (has_Datum some (has_State value gap)) and TU
Determine the ancestors of a TU in the tree:
  has_Descendant value node_Drosophila_melanogaster_AAF55115_1
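To make the intent of the two class expressions concrete, here is a minimal sketch that answers the same questions over plain Python dictionaries instead of an OWL reasoner. The data are excerpts of the NEXUS example; the names tus_with_gap and ancestors are hypothetical, and treating has_Descendant as reaching all nodes on the path to the root is an assumption about the ontology.

```python
from typing import Dict, List

# Rows excerpted from the example matrix ('-' is the gap symbol).
MATRIX: Dict[str, str] = {
    "Arabidopsis_thaliana_CAB": "gtgtggttgc",
    "Arabidopsis_thaliana_AAD": "gt---gtggc",
    "Mus_musculus_BAB": "tctgctacac",
}

# Part of the example tree, stored as child -> parent.
PARENT: Dict[str, str] = {
    "Arabidopsis_thaliana_AAD": "inode15",
    "Arabidopsis_thaliana_CAB": "inode15",
    "inode15": "inode14",
    "Oryza_sativa_BAB": "inode14",
    "inode14": "root",
}

def tus_with_gap(matrix: Dict[str, str], gap: str = "-") -> List[str]:
    """TUs with at least one datum whose state is the gap symbol."""
    return sorted(tu for tu, row in matrix.items() if gap in row)

def ancestors(node: str, parent: Dict[str, str]) -> List[str]:
    """Every node that has `node` as a descendant, nearest first."""
    found: List[str] = []
    while node in parent:
        node = parent[node]
        found.append(node)
    return found

print(tus_with_gap(MATRIX))                              # ['Arabidopsis_thaliana_AAD']
print(ancestors("Arabidopsis_thaliana_AAD", PARENT))     # ['inode15', 'inode14', 'root']
```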

17 Simple Reasoning Tasks
Extract the row of a specific TU:

SELECT ?z,?y
WHERE (<base:Arabidopsis_thaliana_AAD31363_1>, cdao:has_Datum, ?x)
      (?x, cdao:has_State, ?y)
      (?x, cdao:belongs_to_Character, ?z)
USING base FOR <file:/C:/Users/epontell/Documents/Research/Proposals/NEXUS/Research/Perl/inst_matrix.owl#>,
      cdao FOR <

Result:

z                     y
inst_matrix:char_9    CURRENT:dC
inst_matrix:char_8    CURRENT:dG
inst_matrix:char_7    CURRENT:dG
inst_matrix:char_5    CURRENT:dG
inst_matrix:char_0    CURRENT:dG
inst_matrix:char_6    CURRENT:dT
inst_matrix:char_1    CURRENT:dT
inst_matrix:char_3    CURRENT:gap
inst_matrix:char_2    CURRENT:gap
inst_matrix:char_4    CURRENT:gap
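The query joins three triple patterns on the shared datum variable ?x. A minimal sketch of that join over an in-memory list of triples is given below; it does not use an RDF library or query engine, the datum identifiers d0, d1, d2 and m0 are made up for illustration, and only the first three characters of the row are included.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical triples for the first three characters of one TU, plus one
# datum of another TU to show that the subject filter works.
TRIPLES: List[Triple] = [
    ("Arabidopsis_thaliana_AAD31363_1", "has_Datum", "d0"),
    ("Arabidopsis_thaliana_AAD31363_1", "has_Datum", "d1"),
    ("Arabidopsis_thaliana_AAD31363_1", "has_Datum", "d2"),
    ("Mus_musculus_BAB61955_1", "has_Datum", "m0"),
    ("d0", "has_State", "dG"), ("d0", "belongs_to_Character", "char_0"),
    ("d1", "has_State", "dT"), ("d1", "belongs_to_Character", "char_1"),
    ("d2", "has_State", "gap"), ("d2", "belongs_to_Character", "char_2"),
    ("m0", "has_State", "dT"), ("m0", "belongs_to_Character", "char_0"),
]

def row_of(tu: str, triples: List[Triple]) -> List[Tuple[str, str]]:
    """Return (character, state) pairs for one TU, sorted by character."""
    state = {s: o for s, p, o in triples if p == "has_State"}
    character = {s: o for s, p, o in triples if p == "belongs_to_Character"}
    data = [o for s, p, o in triples if s == tu and p == "has_Datum"]
    return sorted((character[d], state[d]) for d in data)

print(row_of("Arabidopsis_thaliana_AAD31363_1", TRIPLES))
```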

18 Future Work
To facilitate evaluation
  Create an OWL 1.0 edition of the ontology (and corresponding NEXUS translator)
Java-level reasoning
  Aggregation
  Etc.
Large scale NEXUS validation
NeXML Interface
OBO distribution

