Web Services for N-Glycosylation Process Integrated Technology Resource for Biomedical Glycomics NCRR/NIH Satya S. Sahoo, Amit P. Sheth, William S. York,

Presentation transcript:

Web Services for N-Glycosylation Process Integrated Technology Resource for Biomedical Glycomics NCRR/NIH Satya S. Sahoo, Amit P. Sheth, William S. York, John A. Miller Presentation at International Symposium on Web Services For Computational Biology and Bioinformatics, VBI, Blacksburg, VA, May 26-27, 2005

2 Glycomics
- Glycomics: the study of the structure, function and quantity of the 'complex carbohydrates' synthesized by an organism
- Glycosylation: the addition of carbohydrates to the basic protein structure
[Figure: folded protein structure (schematic)]

3 Glycosylation – why is it important?
- The genome (comprised of DNA) and the proteome (proteins) are not the only factors in the life functions of an organism
- Carbohydrates attached to different protein structures (by glycosylation) are important for:
  - Identification of foreign entities by immune system cells
  - Markers to accurately diagnose diseases
  - Regulation of signaling activities
- Glycosylation is categorized by the way carbohydrates are attached to proteins. Example: N-glycosylation

4 N-Glycosylation Process (NGP)
- By N-glycosylation Process, we mean the identification and quantification of glycopeptides
[Figure: NGP workflow diagram. Materials and data: Cell Culture, Glycoprotein Fraction, Glycopeptides Fraction (n), Peptide Fraction (n*m), ms data, ms/ms data, ms peaklist, ms/ms peaklist, Peptide list, N-dimensional array. Operations: extract, proteolysis, Separation technique I, PNGase, Separation technique II, Mass spectrometry, Data reduction, binning, Peptide identification, Signal integration, Data correlation, Glycopeptide identification and quantification]

5 Integrated Technology Resource for Biomedical Glycomics
- This Resource was established by the National Center for Research Resources
- The aim is to develop the tools and technology to analyze glycoprotein and glycolipid expression of embryonic stem cells
- Our research provides bioinformatics support for four research groups:
  - Embryonic Stem Cell Culture Program
  - Glycomic Analysis of Glycoproteins
  - Glycomic Analyses of Glycosphingolipids and Sphingolipids
  - Transcript analysis by kinetic RT-PCR
- NGP – part of the Bioinformatics core

6 NGP – need in Glycomics
- Unlike proteomics or genomics, high-throughput experimental protocols are still being established in glycomics
- NGP involves a multitude of heterogeneous tasks, including human-mediated tasks
- NGP attempts to encapsulate particular computational steps as platform-independent, scalable and Web-accessible tools: Web Services
- This enables glycobiologists to integrate automated data generation tasks with data processing tools (Web Services) across the end-to-end experimental lifecycle

7 N-Glycosylation identification - Problems
- It is extremely difficult to identify glycosylated peptide sequences using standard analytical methods
- N-glycosylation occurs at particular sites on the protein structure: consensus sequences
[Figure: an example glycopeptide (schematic). Labels: Peptide, Glycan, Consensus Sequence (N, X, S/T), PNGaseF, Asparagine, Aspartate, D, J]
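The asparagine-to-aspartate conversion shown in the figure changes the peptide mass by a small, fixed amount, which is what makes a calculated molecular mass usable as evidence. The sketch below is illustrative only and is not the NGP code: it uses standard monoisotopic residue masses and assumes that 'J' stands for a former asparagine that now carries the aspartate residue mass. The function name `peptide_mass` is hypothetical.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added for the free N- and C-termini

# Assumption: 'J' marks an asparagine converted to aspartate by PNGase F,
# so it is given the aspartate residue mass.
RESIDUE_MASS["J"] = RESIDUE_MASS["D"]


def peptide_mass(sequence: str) -> float:
    """Return the monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[residue] for residue in sequence) + WATER


if __name__ == "__main__":
    # Difference between the 'J' form and the 'N' form of the same peptide.
    shift = peptide_mass("AJGSK") - peptide_mass("ANGSK")
    print(f"mass shift N -> J: {shift:.5f} Da")
```

Under these assumptions, a peptide containing J is roughly 0.984 Da heavier than the same peptide with N at that position.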

8 NGP - implementation
- NGP currently implements a Web Process made up of two Web Services:
  - DB Modifier Web Service: modifies the search database by replacing N (in consensus sequences) with J
  - Collator Web Service: identifies a probable N-glycosylated peptide using three parameters:
    - Calculated molecular mass
    - Presence of 'J' in a peptide sequence
    - MASCOT* score assigned to a hit
- NGP also involves a proprietary Mass Spectrometer search engine service (MASCOT*) as an intermediate task
- Hence, the NGP Web Process identifies probable glycosylated peptides, enabling rapid processing of data from high-throughput experiments
*
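The DB Modifier step can be sketched as a small text transformation. This is a minimal illustration, not the actual Web Service: it assumes the consensus sequence is the usual N-X-S/T sequon with X being any residue except proline, and that the search database is a FASTA file whose header lines start with '>'. The names `mark_sequons` and `modify_fasta` are hypothetical.

```python
import re
from typing import Iterable, Iterator

# N followed by any residue except P, then S or T. The lookahead leaves the
# following residues unconsumed, so adjacent sequons are both marked.
SEQUON = re.compile(r"N(?=[^P][ST])")


def mark_sequons(sequence: str) -> str:
    """Replace the N of every N-X-S/T sequon with J."""
    return SEQUON.sub("J", sequence)


def modify_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with sequons marked; header lines pass through.

    Caveat: a sequon that straddles a line wrap is missed, so sequences
    should be unwrapped to one line per record first.
    """
    for line in lines:
        line = line.rstrip("\n")
        yield line if line.startswith(">") else mark_sequons(line)


if __name__ == "__main__":
    for out in modify_fasta([">example protein", "MKANGSKLNPTR"]):
        print(out)
```

Only the N inside a sequon is changed, so ordinary asparagines elsewhere in the database stay searchable as N.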

9 NGP – Architecture (current)
[Figure: architecture diagram. Inputs: ms/ms raw data, PEAK LIST FILE, Primary Sequence Database. Components: ModifyDB Web Service, MASCOT* Mass Spectrometer Search Engine, Collator Web Service. Outputs: MASCOT* output file (contains both glycosylated and non-glycosylated peptide sequences), Deglycosylated peptide list]
*

10 NGP Results
- A typical MASCOT output file is about 3MB!
- High-throughput experimental protocols generate thousands of such files, so manual identification is not feasible

q1_p1=-1
q2_p1=0, , ,2,APGVAGR,18, ,1.49, ,0,0;"gi| ":0:190:196:1
q2_p2=1, , ,2,APARGR,18, ,1.33, ,0,0;"gi| ":0:2:7:2
q2_p3=0, , ,2,APAVGGR,18, ,1.33, ,0,0;"gi| ":0:212:218:1,"gi| ":0:212:218:1
q3_p3=0, , ,4,DIIFK,12, ,25.26, ,0,0;"gi| ":0:364:368:2,"gi| ":0:328:332:2
q3_p4=0, , ,4,MPLFK,12, ,25.24, ,0,0;"gi| ":0:95:99:1,"gi| ":0:1:5:2
q3_p5=0, , ,3,NNLFK,12, ,15.34, ,0,0;"gi| ":0:539:543:1
q3_p6=0, , ,3,LDIFK,12, ,15.34, ,0,0;"gi| ":0:891:895:1
q3_p7=0, , ,3,NNIFK,12, ,15.34, ,0,0;"gi| ":0:212:216:1
q3_p8=0, , ,3,LDLFK,12, ,15.34, ,0,0;"gi| ":0:237:241:1
q3_p9=0, , ,3,EVIFK,12, ,13.61, ,0,0;"gi| ":0:67:71:1
q3_p10=0, , ,3,VELFK,12, ,13.61, ,0,0;"gi| ":0:493:497:1,"gi| ":0:99:103:1
q4_p1=-1
q5_p1=0, , ,5,DLLFR,14, ,18.41, ,0,0;"gi| ":0:84:88:1,"gi| ":0:17:21:1,"gi| ":0:647:651:1
q5_p2=0, , ,3,DLFLR,14, ,12.81, ,0,0;"gi| ":0:407:411:1,"gi| ":0:330:334:1,"gi| ":0:6:10:1
q5_p3=0, , ,3,DIFIR,14, ,12.81, ,0,0;"gi| ":0:924:928:1,"gi| ":0:1170:1174:1
q5_p4=0, , ,3,NNFIR,14, ,11.84, ,0,0;"gi| ":0:667:671:1
q5_p5=0, , ,4,IDLFR,14, ,9.98, ,0,0;"gi| ":0:602:606:1,"gi| ":0:536:540:1,"gi| ":0:646:650:1
q5_p6=0, , ,4,LDLFR,14, ,9.98, ,0,0;"gi| ":0:335:339:1
q5_p7=0, , ,4,VELFR,14, ,9.98, ,0,0;"gi| ":0:436:440:1
q5_p8=0, , ,4,LDIFR,14, ,9.98, ,0,0;"gi| ":0:2699:2703:1
q5_p9=0, , ,4,NLNFR,64, ,5.89, ,0,0;"gi| ":0:816:820:1
q5_p10=1, , ,2,NRFAR,14, ,3.37, ,0,0;"gi| ":0:97:101:1
q6_p1=0, , ,4,VSDNIK,35, ,11.27, ,0,0;"gi| ":0:935:940:1
q6_p2=0, , ,5,EGDLGGK,21, ,7.97, ,0,0;"gi| ":0:1058:1064:1
q6_p3=0, , ,5,EATVAGK,21, ,7.88, ,0,0;"gi| ":0:527:533:1
q6_p4=1, , ,3,QRMLK,14, ,7.46, ,0,0;"gi| ":0:467:471:2,"gi| ":0:638:642:2
q6_p5=0, , ,5,LSSSPGK,56, ,7.38, ,0,0;"gi| ":0:806:812:1
q6_p6=0, , ,4,WDLGGK,42, ,6.40, ,0,0;"gi| ":0:123:128:1
q6_p7=0, , ,4,QATDLK,56, ,6.21, ,0,0;"gi| ":0:451:456:1
q6_p8=1, , ,3,QTNKGK,14, ,6.03, ,0,0;"gi| ":0:85:90:1
q6_p9=1, , ,6,QMRIK,28, ,5.77, ,0,0;"gi| ":0:269:273:1,"gi| ":0:278:282:1
q6_p10=1, , ,6,QMRLK,28, ,5.77, ,0,0;"gi| ":0:300:304:1
q7_p1=0, , ,4,YDASLK,14, ,8.86, ,0,0;"gi| ":0:2761:2766:1
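Filtering such files automatically is the job described for the Collator. The sketch below is a hedged illustration, not the actual service. It assumes, from the excerpt above, that the fifth comma-separated field of each hit is the peptide sequence and the eighth is the score, and that a value of -1 means no match. The score threshold and the J-containing sample line are made up for the example, and the mass check is left out because the mass fields are blank in this transcript.

```python
from typing import Iterable, List, NamedTuple, Optional


class Hit(NamedTuple):
    query: str      # key such as "q3_p5"
    peptide: str    # matched peptide sequence
    score: float    # score assigned to the hit


def parse_hit(line: str) -> Optional[Hit]:
    """Parse one 'qN_pM=...' line; return None when there is no usable hit."""
    key, _, value = line.strip().partition("=")
    if not value or value == "-1":
        return None
    # Drop the protein references after ';' before splitting on commas.
    fields = value.split(";", 1)[0].split(",")
    if len(fields) < 8 or not fields[7].strip():
        return None
    return Hit(key, fields[4].strip(), float(fields[7]))


def collate(lines: Iterable[str], min_score: float = 10.0) -> List[Hit]:
    """Keep hits whose peptide contains 'J' and whose score passes the cut.

    min_score is a hypothetical threshold chosen for illustration.
    """
    hits = (parse_hit(line) for line in lines)
    return [h for h in hits if h and "J" in h.peptide and h.score >= min_score]


if __name__ == "__main__":
    sample = [
        "q1_p1=-1",
        'q3_p5=0, , ,3,NNLFK,12, ,15.34, ,0,0;"gi| ":0:539:543:1',
        "q9_p1=0, , ,3,AJGSK,12, ,21.50, ,0,0",  # made-up line with a J
    ]
    for hit in collate(sample):
        print(hit)
```

Splitting on ';' first keeps the commas inside the protein reference list from shifting the field positions.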

11 NGP Web Services – Adding Semantics
- Two Ontologies developed as part of the NCRR-Glycomics project:
  - GlycO: a domain Ontology embodying knowledge of the structure and metabolism of glycans
    - Contains 770 classes that describe structural features of glycans
    - URL:
  - ProPreO: a comprehensive process Ontology modeling experimental proteomics
    - Contains 296 classes
    - Models three phases of experimental proteomics*: Separation techniques, Analytical techniques and Data analysis
    - URL:
* (PEDRO UML schema)

12 ProPreO - Experimental Proteomics Process Ontology
- ProPreO models the phases of a proteomics experiment using five fundamental concepts:
  - Data: (Example: a peaklist file from ms/ms raw data)
  - Data_processing_applications: (Example: MASCOT* search engine)
  - Hardware: embodies instrument types used in proteomics (Example: ABI_Voyager_DE_Pro_MALDI_TOF)
  - Parameter_list: describes the different types of parameter lists associated with experimental phases
  - Task: (Example: component separation, used in chromatography)
*

13 Service description using WSDL-S
- Formalize description and classification of Web Services using ProPreO concepts
- Description of a Web Service using: Web Service Description Language

WSDL ModifyDB:
<wsdl:definitions targetNamespace="urn:ngp" ….. xmlns:xsd="…"
<schema targetNamespace="urn:ngp" xmlns="…" …..

WSDL-S ModifyDB:
<wsdl:definitions targetNamespace="urn:ngp" …… xmlns:wssem="…" xmlns:ProPreO="…" >
<schema targetNamespace="urn:ngp" xmlns="…" ……
<wsdl:message name="replaceCharacterRequest" wssem:modelReference="ProPreO#peptide_sequence">

[Figure: concepts defined in the ProPreO process Ontology: data, sequence, peptide_sequence]
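A client or registry could read such annotations with an ordinary XML parser. The following is a minimal sketch under stated assumptions: the document is a cut-down stand-in for the ModifyDB description, and `urn:example:wssem` is a placeholder because the real WSDL-S namespace URI does not survive in this transcript. The WSDL 1.1 namespace is the standard one. The function name `model_references` is hypothetical.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"  # standard WSDL 1.1 namespace
WSSEM_NS = "urn:example:wssem"                # placeholder, not the real URI

# Cut-down stand-in for the annotated ModifyDB description.
DOC = f"""<wsdl:definitions xmlns:wsdl="{WSDL_NS}"
                  xmlns:wssem="{WSSEM_NS}"
                  targetNamespace="urn:ngp">
  <wsdl:message name="replaceCharacterRequest"
                wssem:modelReference="ProPreO#peptide_sequence"/>
  <wsdl:message name="replaceCharacterResponse"/>
</wsdl:definitions>"""


def model_references(xml_text: str) -> dict:
    """Map each annotated element's name to its modelReference value."""
    root = ET.fromstring(xml_text)
    attr = f"{{{WSSEM_NS}}}modelReference"  # ElementTree's {uri}local form
    return {
        element.get("name"): element.get(attr)
        for element in root.iter()
        if element.get(attr) is not None
    }


if __name__ == "__main__":
    print(model_references(DOC))
```

Elements without the annotation are skipped, so a plain WSDL file yields an empty mapping.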

14 Biological UDDI (BUDDI) WS Registry for Proteomics and Glycomics
- There are currently no registries that use semantic classification of Web Services in glycoproteomics
- BUDDI classification is based on proteomics and glycomics classification; BUDDI is part of an integrated glycoproteomics Web Portal called Stargate
- NGP is to be published in BUDDI
- This can enable other systems such as myGrid to use NGP Web Services to build a glycomics workbench

15 Conclusions
- As part of the NCRR Integrated Technology Resource for Biomedical Glycomics, we implemented a Semantic Web Process for high-throughput glycomics in an open, web-centric environment
- Large domain-specific ontologies with process (ProPreO) and domain (GlycO) knowledge concepts were used to describe and classify Web Services at the semantic level
- We used the proposed Semantic Web Service specification (WSDL-S) to add semantics to Web Service descriptions
- Biological UDDI (BUDDI), part of Stargate, is being developed as a single-window resource to discover and publish Web Services in the glycoproteomics domain

16 Resources
- NCRR (Integrated Technology Resource for Biomedical Glycomics):
- Bioinformatics core of Glycomics project:
- ProPreO process Ontology:
- GlycO domain Ontology:
- Stargate – GlycoProteomics Web Portal:
- WSDL-S: joint UGA-IBM technical note

17 Acknowledgement
Special Thanks:
- James Atwood (CCRC, UGA)
- Meenakshi Nagarajan (LSDIS Lab, UGA)
- Blake Hunter (LSDIS Lab, UGA)

18 Extra Slides: Stargate subsystems – a bit of detail
- BUDDI: BioUDDI is envisioned as the 'yellow pages' for all WS in life sciences
  - The classification of WS uses biological taxonomy
  - Open resource for the worldwide community of life sciences research
- Format Converter: enables conversion of two available representation formats into an XML-based representation
  - IUPAC to LINUCS to GLYDE (an XML-based representation)
- Web Service Generator: enables an existing Java application to be exposed as a Web Service
  - Generates the required files from a Java application to allow deployment as a Web Service
  - Enables the newly generated Web Service to be published on BioUDDI

19 Extra Slides: Stargate subsystems – a bit of detail
- Group Forum: members of the research group use it to foster a sense of community
  - Schedule meetings, discuss issues, collaborate on papers…
  - Post papers for peer review and publications on relevant topics
- Stargate Search: an integrated unit of Stargate
  - Enables search for research publications within the research group
  - Enables search on the internet
- Login: allows restrictions on accessibility of selected parts of Stargate

20 Extra Slides: The take home message…
[Figure: Stargate components. Labels: Internet, Forum, BUDDI, Search, Web Service Generator]