Published by Adam Hodges. Modified over 9 years ago.
1
Applying Semantic Technologies to the Glycoproteomics Domain
W. S. York
May 15, 2006
2
Some Goals of Glycoproteomics
How do changes in the expression levels of specific genes alter the expression of specific glycans on the cell surface?
Are changes in the expression of specific glycans at the cell surface related to cell function, cell development, and disease?
What are the mechanisms by which specific glycans at the cell surface affect cell function, cell development, and the progression of disease?
3
Challenges of Glycoproteomics
Vast amounts of data collected by high-throughput experiments: better methods for data archival, retrieval, and analysis are needed
Complex structures of glycans and glycoproteins: better methods for representing branched structures and finding structural and functional homologies are needed
Complex biology and biochemistry: better methods to find relationships between the glycoproteome and biological processes are needed
4
Glycoproteomics Solutions
Brute-force analysis of flat data files
  Too much data
  Data is heterogeneous
  What does the data represent?
Relational databases
  Data is well organized
  Data organization is relatively rigid
  What does the data represent?
Semantic Technologies
  Data is well organized
  Data organization is flexible
  Concepts represented by data are accessible
  Relationships between concepts are accessible
5
What is Semantic Technology?
Semantics: 1. (Linguistics) The study or science of meaning in language. 2. (Linguistics) The study of relationships between signs and symbols and what they represent. (The American Heritage® Dictionary of the English Language, Fourth Edition)
Semantic Technology: The use of formal representations of concepts and their relationships to enable efficient, intelligent software.
Ontology (Computer Science): A model that represents a domain and is used to reason about the objects in that domain and the relations between them. (http://en.wikipedia.org/wiki/Ontology_(computer_science))
The implication is that enabling computers to “understand” the meanings of and relationships between concepts will allow them to reason and communicate in a way that is analogous to the way humans do.
6
A Simple Ontology
[Figure: ontology diagram. Classes: Organism; Animal, Plant; Lion, Deer, Cow, Hosta, Alfalfa. Instances: Elsa, Simba, Elsie, Bambi, My Hosta, Peter’s Alfalfa. Relationships: is_a, ate.]
7
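A Simple Ontology
[Figure: the same ontology diagram extended with the classes Carnivore and Herbivore. Classes: Organism; Animal, Plant; Carnivore, Herbivore; Lion, Deer, Cow, Hosta, Alfalfa. Instances: Elsa, Simba, Elsie, Bambi, My Hosta, Peter’s Alfalfa. Relationships: is_a, ate, eats.]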
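The simple ontology on this slide can be sketched with plain Python dictionaries. This is only an illustration: the slide text lists the class names, instance names, and relationship names, but the individual arrows did not survive, so every edge below (which class is_a which, which class eats which) is an assumption read off the names. The helper names `ancestors`, `classes_of`, and `plausible_meal` are hypothetical.

```python
# Subclass edges (is_a). ASSUMPTION: inferred from the class names on the slide.
IS_A = {
    "Animal": {"Organism"},
    "Plant": {"Organism"},
    "Carnivore": {"Animal"},
    "Herbivore": {"Animal"},
    "Lion": {"Carnivore"},
    "Deer": {"Herbivore"},
    "Cow": {"Herbivore"},
    "Hosta": {"Plant"},
    "Alfalfa": {"Plant"},
}

# Instance-to-class assignments. ASSUMPTION: inferred from the instance names.
INSTANCE_OF = {
    "Elsa": "Lion",
    "Simba": "Lion",
    "Elsie": "Cow",
    "Bambi": "Deer",
    "My Hosta": "Hosta",
    "Peter's Alfalfa": "Alfalfa",
}

# Class-level "eats" relationship. ASSUMPTION: carnivores eat animals,
# herbivores eat plants.
EATS = {"Carnivore": "Animal", "Herbivore": "Plant"}


def ancestors(cls):
    """Return every class reachable from cls by following is_a edges."""
    seen = set()
    stack = [cls]
    while stack:
        for parent in IS_A.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def classes_of(individual):
    """Return the direct class of an instance plus all inherited classes."""
    cls = INSTANCE_OF[individual]
    return {cls} | ancestors(cls)


def plausible_meal(eater, eaten):
    """Check an 'ate' statement against the class-level 'eats' relationship."""
    allowed = {EATS[c] for c in classes_of(eater) if c in EATS}
    return bool(allowed & classes_of(eaten))


print(sorted(classes_of("Elsa")))
print(plausible_meal("Bambi", "My Hosta"))
print(plausible_meal("Bambi", "Elsa"))
```

Nothing in the data says directly that Elsa is an Organism or that Bambi may eat a hosta; both follow from the is_a and eats relationships, which is the kind of implicit knowledge an ontology makes accessible to software.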
8
The Structure of GlycO – Concept Taxonomy
[Figure: class hierarchy connected by is_a relationships. Classes: chemical entity; residue, molecule, molecular fragment; amino acid residue, carbohydrate residue; carbohydrate moiety; monoglycosyl moiety, glycan moiety; N-glycan, O-glycan.]
9
The Structure of GlycO – Concept Taxonomy
[Figure: excerpt of the class hierarchy connected by is_a relationships. Classes: residue; amino acid residue, carbohydrate residue; glycan moiety; N-glycan, O-glycan.]
10
The Structure of GlycO – Concept Taxonomy – Instances and Properties
[Figure: the class hierarchy excerpt (residue; amino acid residue, carbohydrate residue; glycan moiety; N-glycan, O-glycan; is_a) extended with instances and properties. Instances: N-glycan core b-D-Manp, N-glycan a-D-Manp 4, N-glycan_00020. Properties: is_instance_of, has_residue, is_linked_to.]
11
The GlycO Ontology in Protégé
3 Top-Level Classes are Defined in GlycO
12
The GlycO Ontology in Protégé
Semantics Include Chemical Context
This Class Inherits from 2 Parents
13
The GlycO Ontology in Protégé
The α-D-Manp residues in N-glycans are found in 8 different chemical environments
14
GlycoTree – A Canonical Representation of N-Glycans
N. Takahashi and K. Kato, Trends in Glycosciences and Glycotechnology, 15: 235-251
[Figure: canonical N-glycan tree. Residues: D-GlcpNAc, D-Manp. Linkages: (1-4), (1-6), (1-2), (1-3).]
We give a residue in this position the same name, regardless of the specific structure it resides in.
Semantics!
15
The GlycO Ontology in Protégé
Bisecting β-D-GlcpNAc
16
The GlycO Ontology in Protégé
17
1,3-linked α-L-Fucp
18
The GlycO Ontology in Protégé
19
Ontology Population Workflow
20
Ontology Population Workflow
[][Asn]{[(4+1)][b-D-GlcpNAc] {[(4+1)][b-D-GlcpNAc] {[(4+1)][b-D-Manp] {[(3+1)][a-D-Manp] {[(2+1)][b-D-GlcpNAc] {}[(4+1)][b-D-GlcpNAc] {}}[(6+1)][a-D-Manp] {[(2+1)][b-D-GlcpNAc]{}}}}}}
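The bracketed string on this slide encodes a branched glycan as nested text. A minimal sketch of how such a string could be read into a tree follows. The grammar used here, `[linkage][residue]{children}`, is an assumption inferred from this single example rather than from a format specification, and the names `Residue`, `parse_glycan`, and `count_residues` are hypothetical.

```python
import re
from dataclasses import dataclass, field

# The string exactly as it appears on the slide (whitespace is ignored).
SLIDE_STRING = (
    "[][Asn]{[(4+1)][b-D-GlcpNAc] {[(4+1)][b-D-GlcpNAc] {[(4+1)][b-D-Manp] "
    "{[(3+1)][a-D-Manp] {[(2+1)][b-D-GlcpNAc] {}[(4+1)][b-D-GlcpNAc] {}}"
    "[(6+1)][a-D-Manp] {[(2+1)][b-D-GlcpNAc]{}}}}}}"
)

# ASSUMED grammar: each node is "[linkage][residue]{" children "}".
NODE_HEAD = re.compile(r"\[([^\]]*)\]\[([^\]]*)\]\{")


@dataclass
class Residue:
    linkage: str            # e.g. "4+1"; empty for the root
    name: str               # e.g. "b-D-Manp"
    children: list = field(default_factory=list)


def _parse_node(s, pos):
    """Parse one node starting at pos; return (node, position after it)."""
    match = NODE_HEAD.match(s, pos)
    if match is None:
        raise ValueError(f"expected '[linkage][residue]{{' at offset {pos}")
    node = Residue(match.group(1).strip("()"), match.group(2))
    pos = match.end()
    # Children follow one after another until the closing brace.
    while pos < len(s) and s[pos] != "}":
        child, pos = _parse_node(s, pos)
        node.children.append(child)
    if pos >= len(s):
        raise ValueError("unbalanced braces")
    return node, pos + 1


def parse_glycan(text):
    """Parse a whole bracketed glycan string into a Residue tree."""
    s = "".join(text.split())
    root, pos = _parse_node(s, 0)
    if pos != len(s):
        raise ValueError(f"unexpected trailing text at offset {pos}")
    return root


def count_residues(node):
    """Count this residue and everything attached below it."""
    return 1 + sum(count_residues(child) for child in node.children)


tree = parse_glycan(SLIDE_STRING)
core_mannose = tree.children[0].children[0].children[0]
print(tree.name, count_residues(tree))
print(core_mannose.name, [c.linkage for c in core_mannose.children])
```

Once the structure is a tree, each residue's position (its parent and linkage) is explicit, which is what lets a residue be matched to a named position in a canonical representation.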
21
Ontology Population Workflow
22
The ProPreO Ontology in Protégé
3 Top-Level Classes are Defined in ProPreO
23
The ProPreO Ontology in Protégé
This Class Inherits from 2 Parents
24
The ProPreO Ontology in Protégé
This Class Inherits from 2 Parents
25
Semantic Annotation of MS Data
ms/ms peaklist data

parent ion m/z    parent ion abundance    parent ion charge
830.9570          194.9604                2

fragment ion m/z    fragment ion abundance
580.2985            0.3592
688.3214            0.2526
779.4759            38.4939
784.3607            21.7736
1543.7476           1.3822
1544.7595           2.9977
1562.8113           37.4790
1660.7776           476.5043
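The peak list on this slide is only numbers until each column is given a meaning. A small sketch of reading it into named fields is shown below, assuming the layout the slide's labels suggest: a first line with parent ion m/z, abundance, and charge, followed by one fragment ion (m/z, abundance) per line. The `Spectrum` class and `parse_peaklist` function are hypothetical names.

```python
from dataclasses import dataclass

# Values copied from the slide; the line layout is an assumption.
PEAKLIST = """\
830.9570 194.9604 2
580.2985 0.3592
688.3214 0.2526
779.4759 38.4939
784.3607 21.7736
1543.7476 1.3822
1544.7595 2.9977
1562.8113 37.4790
1660.7776 476.5043
"""


@dataclass
class Spectrum:
    parent_mz: float
    parent_abundance: float
    parent_charge: int
    fragments: list  # list of (fragment ion m/z, fragment ion abundance)


def parse_peaklist(text):
    """Turn the raw peak list text into a Spectrum with named fields."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    mz, abundance, charge = rows[0]
    fragments = [(float(m), float(a)) for m, a in rows[1:]]
    return Spectrum(float(mz), float(abundance), int(charge), fragments)


spectrum = parse_peaklist(PEAKLIST)
# The most abundant fragment ion in the list.
base_peak = max(spectrum.fragments, key=lambda peak: peak[1])
print(spectrum.parent_mz, spectrum.parent_charge, len(spectrum.fragments))
print(base_peak)
```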
26
Semantically Annotated MS Data
<parameter instrument="micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer" mode="ms/ms"/>
Ontological Concepts
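One way to produce an annotation like the one on this slide is to build the element programmatically and accept only attribute values that name known ontology concepts. The sketch below uses Python's standard `xml.etree.ElementTree`. The element and attribute names come from the slide, while `KNOWN_CONCEPTS` and `annotate_parameter` are hypothetical, and the concept set is a stand-in for a real lookup in ProPreO.

```python
import xml.etree.ElementTree as ET

# Stand-in for the ontology: the one concept name shown on the slide,
# spelled exactly as it appears there. A real system would query ProPreO.
KNOWN_CONCEPTS = {
    "micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer",
}


def annotate_parameter(instrument, mode):
    """Build a <parameter> element whose instrument is an ontology concept."""
    if instrument not in KNOWN_CONCEPTS:
        raise ValueError(f"unknown ontology concept: {instrument}")
    element = ET.Element("parameter", {"instrument": instrument, "mode": mode})
    return ET.tostring(element, encoding="unicode")


xml_text = annotate_parameter(
    "micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer", "ms/ms"
)
print(xml_text)
```

Letting the library serialize the element keeps the attribute values properly quoted, so the annotation stays well-formed XML.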
27
Web Services Based Workflow for Proteomics [1]
[Figure: workflow diagram. Labels: Agent; Storage (Raw Data, Standard Format Data, Filtered Data, Search Results, Final Output). Processing steps, each marked with I/O: Biological Sample Analysis by MS/MS; Raw Data to Standard Format; Data Pre-process [2]; DB Search (Mascot/Sequest); Results Post-process (ProValt [3]). Output: Biological Information.]
[1] Design and Implementation of Web Services based Workflow for proteomics. Journal of Proteome Research. Submitted.
[2] Computational tools for increasing confidence in protein identifications. Association of Biomolecular Resource Facilities Annual Meeting, Portland, OR, 2004.
[3] A Heuristic method for assigning a false-discovery rate for protein identifications from Mascot database search results. Mol. Cell. Proteomics. 4(6), 762-772.
28
An Integrated Semantic Information System
Formalized domain knowledge is in ontologies
  The schema defines the concepts
  Instances represent individual objects
  Relationships provide expressiveness
Data is annotated using concepts from the ontologies
  The semantic annotations facilitate the identification and extraction of relevant information
  The semantic relationships allow knowledge that is implicit in the data to be discovered
29
Satya Sahoo Christopher Thomas Cory Henson Ravi Pavagada Amit Sheth Krzysztof Kochut John Miller James Atwood Lin Alison Nairn Gerardo Alvarez-Manilla Saeed Roushanzamir Michael Pierce Ron Orlando Kelley Moremen Parastoo Azadi Alfred Merrill