1 An Ontological Approach for Describing Phospho-proteins in Rhodococcus
Dennis Wang, Gavin Ha, Jennifer Chen, Nancy Wang
Dept. of Computer Science, University of British Columbia
CPSC 445, April 5th, 2007

2 Purpose: knowledge representation and reasoning; ontologies facilitate knowledge sharing and reuse.
- Definition: a data model that represents a set of concepts within a domain and the relationships between those concepts. It is used to reason about the objects within that domain.
- An ontology describes individuals (instances), classes (concepts), attributes, relations and axioms.
- Uses: AI, information architecture, the Semantic Web, software engineering.

3 Biology is knowledge-based (prior knowledge is used to infer new knowledge) and data-rich.
- A biologist needs extensive prior knowledge to analyze the data obtained.
- Data are produced faster than any one person can acquire the knowledge needed to interpret them.
- We need an automated system that applies domain experts' knowledge to biological data.

4 A joint effort of biologists and computer scientists:
- Build ontologies using domain knowledge.
- Rapidly classify large datasets.
- Query to find the instances of a class.
- Create controlled vocabularies for shared use across different biological and medical domains.
In bioinformatics, an ontology can make knowledge available to the community and to its applications.
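To make "query to find the instances of a class" concrete, here is a minimal sketch assuming a tiny in-memory store of (subject, predicate, object) triples and a single triple pattern, loosely analogous to the SPARQL query `SELECT ?x WHERE { ?x rdf:type C }`. The individual names (`protein_A` etc.) and the helpers `match` and `instances_of` are hypothetical, chosen for illustration only. Note that it returns only asserted types; instances that follow from class definitions need a reasoner.

```python
from typing import Optional

# A triple is (subject, predicate, object), as in RDF.
Triple = tuple[str, str, str]

def match(triples: list[Triple], s: Optional[str], p: Optional[str],
          o: Optional[str]) -> list[Triple]:
    """Return the triples matching a pattern; None acts as a variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

def instances_of(triples: list[Triple], cls: str) -> list[str]:
    """Subjects explicitly asserted (not inferred) to have type `cls`."""
    return sorted(t[0] for t in match(triples, None, "rdf:type", cls))

# Hypothetical knowledge base with three asserted individuals.
kb = [
    ("protein_A", "rdf:type", "Protein_Kinase"),
    ("protein_B", "rdf:type", "Protein_Phosphatase"),
    ("protein_C", "rdf:type", "Protein_Kinase"),
]

print(instances_of(kb, "Protein_Kinase"))
```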

5 The Gene Ontology (GO) "provides structured, controlled vocabularies and classifications that cover several domains of molecular biology".
Uses:
- Annotation of large data sets.
- Grouping gene products under a high-level term.
- Computational (putative) assignment of molecular function based on sequence similarity to annotated genes or sequences.
Figure: an unknown gene product is compared, by sequence similarity, with a sequence of known function in SWISS-PROT; its function is then inferred from electronic annotation.
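The annotation-transfer step in the figure can be sketched as follows, under loud assumptions: sequences are pre-aligned and of equal length (real pipelines use BLAST alignments against SWISS-PROT), the 0.4 identity threshold is arbitrary, and the toy sequences and the `reference_db` contents are hypothetical. The GO identifiers are real terms (GO:0004725, protein tyrosine phosphatase activity, also appears in the Rhodococcus records below; GO:0004672 is protein kinase activity), and IEA is GO's "Inferred from Electronic Annotation" evidence code.

```python
def percent_identity(a: str, b: str) -> float:
    """Fraction of identical, non-gap positions in two pre-aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("expects non-empty, pre-aligned sequences of equal length")
    same = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return same / len(a)

def transfer_annotation(query: str, reference_db: dict, threshold: float = 0.4):
    """Copy GO terms from the most similar annotated sequence.

    Returns (go_terms, evidence_code), or ([], None) if no hit passes the
    (hypothetical) identity threshold.
    """
    best_id, best_score = None, 0.0
    for ref_id, (seq, _terms) in reference_db.items():
        score = percent_identity(query, seq)
        if score > best_score:
            best_id, best_score = ref_id, score
    if best_id is None or best_score < threshold:
        return [], None
    # IEA: the function is putative, inferred only from sequence similarity.
    return reference_db[best_id][1], "IEA"

# Hypothetical reference set: toy sequences with known GO function terms.
reference_db = {
    "ref_phosphatase": ("MKTAYIAK", ["GO:0004725"]),
    "ref_kinase": ("GGGGGGGG", ["GO:0004672"]),
}

print(transfer_annotation("MKTAYIAR", reference_db))
```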

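6 There is no standardized methodology for building ontologies, but there are efforts to produce more comprehensive guidelines. In general, there are two stages:
- Informal stage: the ontology is described in natural language.
- Formal stage: the ontology is encoded in a formal knowledge representation language.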

7 A methodology inspired by software engineering.
User model (biologist):
#1) Identify the purpose and scope of the ontology.
#2) Acquire domain knowledge.

8 Conceptualization model (bioinformatician/biologist):
#3) Identify the key concepts in the domain.
#4) Integrate other existing ontologies by reusing and incorporating them.

9 Implementation model (bioinformatician):
#5) Represent the concepts in a formal language.
#6) Document the informal and formal definitions.
#7) Evaluate whether the ontology is appropriate for its intended application.
Figure (building process): identify purpose and scope; knowledge acquisition; conceptualization; integrating existing ontologies; encoding; evaluation; language and representation; available development tools.

10 Question: can we use the Phosphabase ontology to describe the phospho-proteins discovered by the Rhodococcus Genome Project?
Figure (system overview): biologists (signal protein experts) provide phosphatase and kinase background knowledge; a bioinformatician builds the ontology (classes) from it using OWL-DL; proteomic experimental data supply the data (instances/individuals); the Pellet reasoner uses the ontology and the data to provide results.

11 OWL-DL (Description Logic): based on description logic, with certain restrictions that guarantee decidability. OWL can be written in an XML syntax.
- OWL uses the Resource Description Framework (RDF), whose statements are subject-predicate-object triples.
- Basic components in OWL: classes, individuals and properties.
Figure (example): class Professor is a subClassOf its superclass FacultyMember; individual Anne Condon is an instance of Professor; the property teaches links Anne Condon to individual Jennifer Chen.
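The slide's example can be written as subject-predicate-object triples. The sketch below, a minimal illustration rather than how OWL tools store data, applies one RDFS entailment rule repeatedly until nothing changes: if x has rdf:type C and C rdfs:subClassOf D, then x has rdf:type D. The underscored individual names and the un-namespaced `teaches` predicate are simplifications assumed for brevity.

```python
def infer_types(triples):
    """Close a set of triples under the rdf:type / rdfs:subClassOf rule."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat so that chains C < D < E are fully propagated
        changed = False
        subclass = {(s, o) for s, p, o in inferred if p == "rdfs:subClassOf"}
        types = {(s, o) for s, p, o in inferred if p == "rdf:type"}
        for x, c in types:
            for c2, d in subclass:
                if c == c2 and (x, "rdf:type", d) not in inferred:
                    inferred.add((x, "rdf:type", d))
                    changed = True
    return inferred

kb = [
    ("Professor", "rdfs:subClassOf", "FacultyMember"),
    ("Anne_Condon", "rdf:type", "Professor"),
    ("Anne_Condon", "teaches", "Jennifer_Chen"),
]

# Anne_Condon is now also typed as FacultyMember, which was never asserted.
print(sorted(t for t in infer_types(kb) if t[0] == "Anne_Condon"))
```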

12 Biological motivation: protein domain architecture is used to describe signalling protein families.
- Background knowledge required for construction: the signal protein domains, and which domains are present within which signal proteins.
OWL ontology:
- The ontology uses OWL-DL, so description-logic reasoners can be applied to classify proteins.
- There are many different ways to represent this knowledge in OWL.
(Wolstencroft et al., 2006)
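Classification by domain architecture can be sketched as follows, assuming each defined class has the shape "Protein and (hasDomain some D1) and (hasDomain some D2) ...", reduced here to a set of required domains. The class and domain names in `DEFINED_CLASSES` are hypothetical and are not Phosphabase's actual definitions. Because existential restrictions are satisfied by asserted domains, this check agrees with what a DL reasoner would infer for such definitions; a definition that requires the *absence* of a domain would, under OWL's open-world assumption, additionally need closure axioms on the individual.

```python
# Hypothetical defined classes: class name -> domains a member must have.
DEFINED_CLASSES = {
    "Protein_Phosphatase": {"PhosphataseDomain"},
    "Receptor_Phosphatase": {"PhosphataseDomain", "TransmembraneDomain"},
    "Protein_Kinase": {"KinaseDomain"},
}

def classify(domains, definitions=DEFINED_CLASSES):
    """All defined classes whose required domains are present."""
    present = set(domains)
    return sorted(name for name, required in definitions.items()
                  if required <= present)

def most_specific(classes, definitions=DEFINED_CLASSES):
    """Drop a match if another match requires a strict superset of its domains."""
    return sorted(c for c in classes
                  if not any(definitions[c] < definitions[d] for d in classes))

matches = classify(["PhosphataseDomain", "TransmembraneDomain"])
print(matches, most_specific(matches))
```

Reporting only the most specific matches mirrors how a reasoner's inferred hierarchy is usually browsed: the protein is still a member of the more general class.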

13 Figure (class hierarchy): Domain_Entity, Macromolecule, Protein_Phosphatase, Protein_Kinase

14 Input:
- Ontology in OWL-DL format: axioms about classes go into the TBox; type and property assertions about individuals go into the ABox.
- Query in RDQL (SPARQL) format.
- Instance data (individuals).
Tableau reasoner: checks the satisfiability of an ABox with respect to a TBox, that is, it tests knowledge base consistency [Parsia and Sirin, ISWC 2004].
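To show what "an ABox inconsistent with respect to a TBox" means, here is a deliberately simplified check. It is not a tableau algorithm (Pellet's approach): it handles only atomic subclass and disjointness axioms, closes each individual's asserted types upward, and reports any individual that lands in two disjoint classes. The axioms and individuals (`p1`, `bad`, the disjointness of Protein and ProteinDomain) are hypothetical examples.

```python
def superclasses(cls, subclass_of):
    """cls plus every class reachable from it via subClassOf axioms."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def inconsistencies(tbox, abox):
    """(individual, class_a, class_b) for each violated disjointness axiom."""
    subclass_of, disjoint = tbox["subClassOf"], tbox["disjointWith"]
    problems = []
    for individual, asserted in abox.items():
        types = set()
        for cls in asserted:
            types |= superclasses(cls, subclass_of)
        for a, b in disjoint:
            if a in types and b in types:
                problems.append((individual, a, b))
    return problems

# Hypothetical TBox: axioms about classes.
tbox = {
    "subClassOf": {"Protein_Kinase": ["Protein"], "KinaseDomain": ["ProteinDomain"]},
    "disjointWith": [("Protein", "ProteinDomain")],
}
# Hypothetical ABox: type assertions about individuals.
abox = {
    "p1": ["Protein_Kinase"],
    "bad": ["Protein_Kinase", "KinaseDomain"],  # both a protein and a domain
}

print(inconsistencies(tbox, abox))  # an empty list would mean "consistent"
```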

15 Locus ID: RHA1_ro01186
Strain: Rhodococcus sp. RHA1 (NCBI Taxonomy Database)
Replicon: Chromosome
RefSeq: NC_008268
Start: 1260414; Stop: 1260866
Protein / product name: protein-tyrosine-phosphatase
RefSeq GI number: 111018199
Category: Protein
Localization: Cytoplasmic (Class 3)
Transposon mutant available?: No transposon mutant available yet
COG predictions: COG0394, Wzb, Protein-tyrosine-phosphatase [Signal transduction mechanisms]
PseudoCAP EC number: 3.1.3.48
PFAM predictions: PF01451: LMWPc, Low molecular weight phosphotyrosine protein phosphatase
go_function: protein tyrosine phosphatase activity [goid 0004725]

17 Locus ID: RHA1_ro05453
Strain: Rhodococcus sp. RHA1 (NCBI Taxonomy Database)
Replicon: Chromosome
RefSeq: NC_008268
Start: 5845588; Stop: 5847288
Protein / product name: probable protein-tyrosine kinase
RefSeq GI number: 111022419
Category: Protein
Localization: Cytoplasmic Membrane (Class 3)
Transposon mutant available?: No transposon mutant available yet
COG predictions: COG0489, Mrp, ATPases involved in chromosome partitioning [Cell division and chromosome partitioning]
PseudoCAP EC number: 2.7.10.1
TIGRFAM predictions: TIGR01007, eps_fam - capsular exopolysaccharide family (6.7e-46); role: Transport and binding proteins; sub role: Carbohydrates, organic alcohols, and acids
PFAM predictions: PF02706: Wzz, Chain length determinant protein. This family includes proteins involved in lipopolysaccharide (LPS) biosynthesis. It comprises the whole length of the chain length determinant protein (or Wzz protein), which confers a modal distribution of chain length on the O-antigen component of LPS. This region is also found as part of bacterial tyrosine kinases.
go_component: signal recognition particle (sensu Eukaryota) [goid 0005786]

19 Locus ID: RHA1_ro05554
Strain: Rhodococcus sp. RHA1 (NCBI Taxonomy Database)
Replicon: Chromosome
RefSeq: NC_008268
Start: 5971327; Stop: 5972865
Protein / product name: probable alkaline phosphatase
RefSeq GI number: 111022520
Category: Protein
Localization: Unknown (this protein may have multiple localization sites) (Class 3)
Transposon mutant available?: No transposon mutant available yet
COG predictions: COG3540, PhoD, Phosphodiesterase/alkaline phosphatase D [Inorganic ion transport and metabolism]
PFAM predictions: PF00245: Alk_phosphatase, Alkaline phosphatase
go_component: organelle inner membrane [goid 0019866]
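Before these records can be loaded as individuals, the relevant fields have to be pulled out of the flat text. The sketch below is a hypothetical parser whose regular expressions are tuned to the scraped layout shown above (not to any official database format). `PFAM_TO_DOMAIN` reuses the Pfam names from the three records, but treating them as ontology domain classes is an assumption. The gene length assumes inclusive chromosome coordinates.

```python
import re

# Pfam accession -> short name, taken from the three records above.
PFAM_TO_DOMAIN = {"PF01451": "LMWPc", "PF02706": "Wzz", "PF00245": "Alk_phosphatase"}

def parse_record(text: str) -> dict:
    """Extract locus, Pfam accessions, EC numbers and gene length from a record."""
    locus = re.search(r"Locus ID:\s*(RHA1_ro\d+)", text)
    start = re.search(r"Start:\s*(\d+)", text)
    stop = re.search(r"Stop:\s*(\d+)", text)
    # Accessions can be duplicated in scraped text (link text + label).
    pfam = sorted(set(re.findall(r"PF\d{5}", text)))
    record = {
        "locus": locus.group(1) if locus else None,
        "pfam": pfam,
        "domains": [PFAM_TO_DOMAIN[p] for p in pfam if p in PFAM_TO_DOMAIN],
        "ec": re.findall(r"EC Number:\s*(\d+(?:\.\d+)+)", text),
    }
    if start and stop:
        # Inclusive coordinates, so add 1.
        record["length_bp"] = int(stop.group(1)) - int(start.group(1)) + 1
    return record

# Abridged excerpt of the RHA1_ro01186 record as it appears in scraped form.
sample = (
    "Locus ID:RHA1_ro01186 Replicon: Chromosome Refseq: NC_008268NC_008268 "
    "Start:1260414Stop:1260866 Protein / Product Name: protein-tyrosine- phosphatase "
    "PseudoCAP EC Number: 3.1.3.48 COG0394 Comments: PFAM predictions:PF01451PF01451: LMWPc"
)
print(parse_record(sample))
```

The extracted domains could then be asserted as hasDomain values of a new individual, leaving class membership to the reasoner.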

20 Ontologies can be used as a standard model for the exchange of biological information.
Building ontologies can get very complicated:
- Biologists have little description logic training.
- Computer scientists have little knowledge of biology.
- So we need more bioinformaticians.
Ontologies can facilitate the automated annotation of genes and gene products.
It is difficult to read and infer from ontologies:
- Ontologies can get very big (Phosphabase is only a small example).
- Reasoners are sometimes slow and inaccurate.
www.quicklybored.com

21 Rhodococcus sp. RHA1 data: Eltis Lab, Dr. Lindsay Eltis, Dept. Microbiology & Biochemistry
Phosphabase ontology: Wolstencroft Lab, University of Manchester, UK; Bioinformatics paper: Wolstencroft et al., 2006
Phosphabase ontology processing: Benjamin Good, iCAPTURE Centre, Vancouver