1
Semantics-Powered Bioinformatics
Amit Sheth, William S. York, et al.
Large Scale Distributed Information Systems Lab & Complex Carbohydrate Research Center, University of Georgia
http://lsdis.cs.uga.edu
Project Information:
2
Background: SW for Life Sciences
o Bioinformatics of Glycan Expression – component of the NCRR "Integrated Technology Resource for Biomedical Glycomics".
o W3C Interest Group on Semantic Web for Health Care and Life Sciences
o Deployed Active Semantic Electronic Medical Patient Record application at the Athens Heart Center
3
Agenda
Review of Accomplishments/Ongoing Work:
o GLYDE standard
o GlycO Ontology
o ProPreO Ontology
o Semantic Analytical Glycomics Workflow
o Visualization
o Semantic Web Services: WSDL-S/METEOR-S
4
GLYDE standard
o An XML-based representation format for glycan structures
o Inter-convertible with existing data represented using IUPAC or LINUCS
o In progress: incorporation of probability-based representation
o In progress: incorporation of aspects for visualization of structures using GLYDE (XML) files
Reference: GLYDE - An expressive XML standard for the representation of glycan structure. Carbohydrate Research, 340 (18), Dec 30, 2005.
5
Collaborative GlycoInformatics
o Enable querying and export of query results in GLYDE format
o Using GLYDE representation for disambiguation, mapping and matching
[Diagram labels: MonosaccharideDB, SweetDB, KEGG, ...; QUERY; RESULT; GLYDE]
6
Collaborative GlycoInformatics
o Development of GLYDE semantic web portal
o Integration with www.glycosciences.de
o Visualization aspect integrated with LiGraph (Heidelberg) or OntoVista (UGA)
o Semantic annotation of publications in the GlycoProteomics domain
[Diagram labels: GLYDE Semantic Portal; KEGG; MonosaccharideDB; www.glycosciences.de]
7
Evolving collaboration between:
o LSDIS/CCRC: Will York, Amit Sheth, Michael Pierce
o EUROCarbDB (German Cancer Research Center): Willi von der Lieth
o Consortium for Functional Glycomics (CFG): Rahul Raman, Ram Sasisekharan, Thomas Lütteke
o N.D. Zelinsky Institute of Organic Chemistry (Moscow): Yuriy Knirel
o Mitsui Knowledge Industry (Japan): Hisashi Narimatsu, Norihiro Kikuchi
o Kyoto Encyclopedia of Genes and Genomes (KEGG): Minoru Kanehisa, Kiyoko F. Aoki-Kinoshita
o Palo Alto Research Center (PARC): David Goldberg,
8
Semantic GlycoInformatics - Ontologies
GlycO: A domain ontology for glycan structures, glycan functions and enzymes (embodying knowledge of the structure and metabolisms of glycans)
o Contains 600+ classes and 100+ properties that describe structural features of glycans; unique population strategy
o URL: http://lsdis.cs.uga.edu/projects/glycomics/glyco
ProPreO: A comprehensive process ontology modeling experimental proteomics
o Contains 330 classes, 6 million+ instances
o Models three phases of experimental proteomics
o URL: http://lsdis.cs.uga.edu/projects/glycomics/propreo
9
GlycO taxonomy
[Figure: the first levels of the GlycO taxonomy; most relationships and attributes in GlycO]
GlycO exploits the expressiveness of OWL-DL. Cardinality constraints, value constraints, and existential and universal restrictions on the range and domain of properties allow the classification of unknown entities as well as the deduction of implicit relationships.
10
Pathway representation in GlycO Pathways do not need to be explicitly defined in GlycO. The residue-, glycan-, enzyme- and reaction descriptions contain all the knowledge necessary to infer pathways.
11
Zooming in a little …
The N-Glycan with KEGG ID 00015 is the substrate to the reaction R05987, which is catalyzed by an enzyme of the class EC 2.4.1.145. The product of this reaction is the Glycan with KEGG ID 00020.
[Diagram labels: Reaction R05987; catalyzed by enzyme 2.4.1.145; adds_glycosyl_residue; N-glycan_b-D-GlcpNAc_13]
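The claim that pathways can be inferred from individual reaction descriptions can be illustrated with a minimal Python sketch. It is a plain graph traversal standing in for the ontology reasoner, not the GlycO implementation. Reaction R05987 and the glycan IDs 00015 and 00020 come from the slide; the second reaction, its enzyme and its product (`R_HYPOTHETICAL`, `EC_HYPOTHETICAL`, `GLYCAN_X`) are made-up placeholders, added only so that there is something to chain.

```python
from collections import defaultdict

# Each reaction is described on its own: (reaction id, enzyme class, substrate, product).
# R05987 is taken from the slide; the second entry is a hypothetical placeholder.
REACTIONS = [
    ("R05987", "EC 2.4.1.145", "00015", "00020"),
    ("R_HYPOTHETICAL", "EC_HYPOTHETICAL", "00020", "GLYCAN_X"),
]


def infer_pathways(reactions, start):
    """Return every maximal chain of reaction ids that starts at glycan `start`."""
    by_substrate = defaultdict(list)
    for reaction_id, _enzyme, substrate, product in reactions:
        by_substrate[substrate].append((reaction_id, product))

    pathways = []

    def walk(glycan, path, seen):
        # Follow only reactions whose product has not been visited (avoids cycles).
        steps = [(r, p) for r, p in by_substrate[glycan] if p not in seen]
        if not steps:
            if path:
                pathways.append(path)
            return
        for reaction_id, product in steps:
            walk(product, path + [reaction_id], seen | {product})

    walk(start, [], {start})
    return pathways


print(infer_pathways(REACTIONS, "00015"))
```

No pathway is stored anywhere in `REACTIONS`; the chain emerges because the product of one reaction is the substrate of the next, which is the property the slide attributes to the residue, glycan, enzyme and reaction descriptions.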
12
Ontology Population The next slides show the different steps that were necessary to populate GlycO with glycan structures from multiple sources. GLYDE is used to disambiguate between representations from multiple sources
13
Ontology population workflow
14
[][Asn]{[(4+1)][b-D-GlcpNAc] {[(4+1)][b-D-GlcpNAc] {[(4+1)][b-D-Manp] {[(3+1)][a-D-Manp] {[(2+1)][b-D-GlcpNAc] {}[(4+1)][b-D-GlcpNAc] {}}[(6+1)][a-D-Manp] {[(2+1)][b-D-GlcpNAc]{}}}}}}
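The string above is a glycan in LINUCS notation. As a rough sketch of how such a string could be turned into a tree before conversion to another format, the Python below parses it with a small recursive-descent parser. The grammar used here, `[linkage][residue]{children}`, is inferred from this one example and is an assumption, not the LINUCS specification; the function names and dictionary keys are illustrative.

```python
import re

# The LINUCS string from the slide, including the whitespace left by line wrapping.
SLIDE_LINUCS = (
    "[][Asn]{[(4+1)][b-D-GlcpNAc] {[(4+1)][b-D-GlcpNAc] {[(4+1)][b-D-Manp] "
    "{[(3+1)][a-D-Manp] {[(2+1)][b-D-GlcpNAc] {}[(4+1)][b-D-GlcpNAc] {}}"
    "[(6+1)][a-D-Manp] {[(2+1)][b-D-GlcpNAc]{}}}}}}"
)

# One node header: "[linkage][residue]{" (the linkage is empty for the root).
NODE_HEADER = re.compile(r"\[([^\]]*)\]\[([^\]]*)\]\{")


def _parse_node(s, pos):
    """Parse one node starting at `pos`; return (node, position after its '}')."""
    match = NODE_HEADER.match(s, pos)
    if match is None:
        raise ValueError(f"expected '[linkage][residue]{{' at offset {pos}")
    linkage, residue = match.group(1), match.group(2)
    pos = match.end()
    children = []
    while pos < len(s) and s[pos] != "}":
        child, pos = _parse_node(s, pos)
        children.append(child)
    if pos >= len(s):
        raise ValueError("missing closing '}'")
    return {"linkage": linkage, "residue": residue, "children": children}, pos + 1


def parse_linucs(text):
    """Parse a LINUCS-style string into nested dicts."""
    s = "".join(text.split())  # drop whitespace introduced by line wrapping
    node, pos = _parse_node(s, 0)
    if pos != len(s):
        raise ValueError(f"unexpected trailing text at offset {pos}")
    return node


def count_residues(node):
    """Count the node itself plus all of its descendants."""
    return 1 + sum(count_residues(child) for child in node["children"])


tree = parse_linucs(SLIDE_LINUCS)
print(tree["residue"], count_residues(tree))
```

Once the structure is a tree, branch points such as the b-D-Manp residue with its (3+1) and (6+1) arms are explicit, which is the kind of information a format converter or a disambiguation step would need.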
17
ProPreO: A process ontology to capture the proteomics experimental lifecycle:
o Separation
o Mass spectrometry
o Analysis
o 330 classes
o 110 properties
o 6 million+ instances
18
Usage: Mass spectrometry analysis
[Figure: manual annotation of mouse kidney spectrum by a human expert. For clarity, only 19 of the major peaks have been annotated.]
Goldberg, et al., Automatic annotation of matrix-assisted laser desorption/ionization N-glycan spectra, Proteomics 2005, 5, 865–875
19
P(S | M = 3461.57) = 0.6
P(T | M = 3461.57) = 0.4
Goldberg, et al., Automatic annotation of matrix-assisted laser desorption/ionization N-glycan spectra, Proteomics 2005, 5, 865–875
Semantic Annotation of Experimental Data
o Enables ontology-mediated disambiguation
o Allows correlation between disparate entities using semantic relations
20
Semantic GlycoProteomics Workflow
[Workflow diagram labels: Cell Culture; extract; Glycoprotein Fraction; proteolysis; Glycopeptides Fraction; Separation technique I; Separation technique II; PNGase; Peptide Fraction; Mass spectrometry; ms data; ms/ms data; Data reduction; ms peaklist; ms/ms peaklist; binning; Peptide identification; Peptide list; N-dimensional array; Signal integration; Data correlation; Glycopeptide identification and quantification; multiplicities n, n*m, 1]
21
Web Services based Workflow = Web Process
[Diagram labels: WORKFLOW composed of WS 1, WS 2, WS 3, WS 4; Web Service 1 to Web Service 4 hosted on LINUX, Solaris, MAC, Windows XP]
22
BOWSER
o Use semantics for describing Web Services: WSDL-S (LSDIS/IBM)
o Use service-level annotation of Web Services
o Graphical traversal of taxonomy of biological concepts to search for Web Services
http://128.192.9.11:8080/stargate/bowser.jsp
23
Semantic Annotation of Scientific Data
ms/ms peaklist data:
830.9570 194.9604 2
580.2985 0.3592
688.3214 0.2526
779.4759 38.4939
784.3607 21.7736
1543.7476 1.3822
1544.7595 2.9977
1562.8113 37.4790
1660.7776 476.5043
Annotated ms/ms peaklist data:
<parameter instrument="micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer" mode="ms/ms"/>
830.9570 194.9604 2
24
Semantic Annotation of Scientific Data
Annotated ms/ms peaklist data:
<parameter instrument="micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer" mode="ms/ms"/>
830.9570 194.9604 2
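A minimal Python sketch of what annotating the peaklist might look like in practice follows. It wraps the raw numbers from the slides in XML and attaches the `parameter` element shown above. Several things are assumptions rather than facts from the slides: that the first row holds the parent ion's m/z, abundance and charge, that each following row is an m/z and abundance pair, and the element and attribute names `peaklist`, `parent_ion`, `peak`, `mz`, `abundance` and `charge`, which are hypothetical.

```python
import xml.etree.ElementTree as ET

# Raw ms/ms peaklist values as they appear on the slide.
RAW_PEAKLIST = """\
830.9570 194.9604 2
580.2985 0.3592
688.3214 0.2526
779.4759 38.4939
784.3607 21.7736
1543.7476 1.3822
1544.7595 2.9977
1562.8113 37.4790
1660.7776 476.5043
"""

# Instrument identifier copied verbatim from the slide.
INSTRUMENT = "micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer"


def annotate_peaklist(raw, instrument, mode):
    """Wrap a whitespace-separated peaklist in XML with instrument metadata."""
    rows = [line.split() for line in raw.splitlines() if line.strip()]
    # Assumption: first row is the parent ion (m/z, abundance, charge).
    (mz, abundance, charge), peaks = rows[0], rows[1:]

    root = ET.Element("peaklist")  # hypothetical element name
    ET.SubElement(root, "parameter", instrument=instrument, mode=mode)
    ET.SubElement(root, "parent_ion", mz=mz, abundance=abundance, charge=charge)
    # Assumption: every remaining row is an (m/z, abundance) pair.
    for peak_mz, peak_abundance in peaks:
        ET.SubElement(root, "peak", mz=peak_mz, abundance=peak_abundance)
    return root


annotated = annotate_peaklist(RAW_PEAKLIST, INSTRUMENT, "ms/ms")
print(ET.tostring(annotated, encoding="unicode"))
```

Keeping the instrument and mode as attribute values drawn from an ontology, rather than free text, is what lets downstream tools compare peaklists produced in different labs.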
25
Discovery of relationship between biological entities
[Diagram labels: Identified and quantified peptides; Fragment of specific protein; Lectin; Collection of N-glycan ligands; Collection of biosynthetic enzymes; Specific cellular process; GlycO; ProPreO; Gene Ontology (GO); Genomic database (Mascot / Sequest)]
The inference: instances of the class collection of biosynthetic enzymes (GNT-V) are involved in the specific cellular process (metastasis).
26
Semantic Web Services using WSDL-S
Formalize description and classification of Web Services using ProPreO concepts
Description of a Web Service using: Web Service Description Language

WSDL ModifyDB:
<wsdl:definitions targetNamespace="urn:ngp" ….. xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<schema targetNamespace="urn:ngp" xmlns="http://www.w3.org/2001/XMLSchema"> …..

WSDL-S ModifyDB:
<wsdl:definitions targetNamespace="urn:ngp" …… xmlns:wssem="http://www.ibm.com/xmlns/WebServices/WSSemantics" xmlns:ProPreO="http://lsdis.cs.uga.edu/ontologies/ProPreO.owl">
<schema targetNamespace="urn:ngp" xmlns="http://www.w3.org/2001/XMLSchema"> ……
<wsdl:message name="replaceCharacterRequest" wssem:modelReference="ProPreO#peptide_sequence">

Concepts defined in process Ontology (ProPreO): data, sequence, peptide_sequence
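To show how a client might use these annotations, the following Python sketch reads a WSDL-S document and lists which ontology concept each message refers to. The XML is a cut-down stand-in built from the fragments on the slide, not the real ModifyDB description: the elided parts are simply left out, and the `xmlns:wsdl` binding to the WSDL 1.1 namespace is an assumption, since the slide elides it. The function name `model_references` is illustrative.

```python
import xml.etree.ElementTree as ET

# Cut-down WSDL-S document assembled from the fragments on the slide.
# Assumption: the service is described with WSDL 1.1, hence the xmlns:wsdl value.
WSDL_S = """\
<wsdl:definitions targetNamespace="urn:ngp"
    xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
    xmlns:wssem="http://www.ibm.com/xmlns/WebServices/WSSemantics"
    xmlns:ProPreO="http://lsdis.cs.uga.edu/ontologies/ProPreO.owl">
  <wsdl:message name="replaceCharacterRequest"
      wssem:modelReference="ProPreO#peptide_sequence"/>
</wsdl:definitions>
"""

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"
WSSEM_NS = "http://www.ibm.com/xmlns/WebServices/WSSemantics"


def model_references(wsdl_text):
    """Map each annotated message name to its wssem:modelReference value."""
    root = ET.fromstring(wsdl_text)
    references = {}
    # ElementTree addresses namespaced names as "{namespace-uri}local-name".
    for message in root.iter(f"{{{WSDL_NS}}}message"):
        reference = message.get(f"{{{WSSEM_NS}}}modelReference")
        if reference is not None:
            references[message.get("name")] = reference
    return references


print(model_references(WSDL_S))
```

A registry could index services by these concept references, so that a search for services consuming a `peptide_sequence` does not depend on how each provider happened to name its messages.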
27
Semantic Visualization
o Ontologies are meant for machine consumption
o Often too convoluted for the human eye
o The scientist needs to know the concepts she uses for annotation
o Build a visualization environment that translates the formal concepts into a representation the domain expert understands well
28
Single Glycan
29
Customizable Layouts
Using customizable layouts, knowledge can be formalized in a machine-understandable way and then visually translated for the user's needs.
– Cartoonist representation for the glycobiologist
– Chemical reactions shown as left side and right side, instead of the convoluted representation in the ontology
30
Ongoing and Future Work
o SemURI: Semantic URI based provenance scheme using ProPreO
o RDF-based version of the GLYDE schema
o A framework for semantic annotation of experimental data
o Integration of large datasets (~500MB) into ProPreO for reasoning
31
Further details at: http://lsdis.cs.uga.edu/projects/glycomics/