Published by Sebastian Sparling. Modified over 9 years ago.
1
Web Services for N-Glycosylation Process
Integrated Technology Resource for Biomedical Glycomics, NCRR/NIH
Satya S. Sahoo, Amit P. Sheth, William S. York, John A. Miller
Presentation at the International Symposium on Web Services for Computational Biology and Bioinformatics, VBI, Blacksburg, VA, May 26-27, 2005
2
Glycomics: the study of the structure, function and quantity of 'complex carbohydrates' synthesized by an organism.
Glycosylation: the addition of carbohydrates to the basic protein structure.
[Figure: folded protein structure (schematic)]
3
Glycosylation: why is it important?
The genome (comprised of DNA) and the proteome (proteins) are not the only factors in the life functions of an organism.
Carbohydrates attached to different protein structures (by glycosylation) are important for:
- Identification of foreign entities by immune system cells
- Markers to accurately diagnose diseases
- Regulation of signaling activities
Glycosylation is categorized by the way carbohydrates are attached to proteins. Example: N-glycosylation.
4
N-Glycosylation Process (NGP)
By N-glycosylation Process, we mean the identification and quantification of glycopeptides.
[Figure: NGP workflow diagram. Labels: Cell Culture; extract; Glycoprotein Fraction; proteolysis; Glycopeptides Fraction; Separation technique I; PNGase; Peptide Fraction; Separation technique II; Mass spectrometry; ms data; ms/ms data; Data reduction; ms peaklist; ms/ms peaklist; binning; Peptide identification; Peptide list; N-dimensional array; Signal integration; Data correlation; Glycopeptide identification and quantification. Fraction counts shown: 1, n, n*m.]
5
Integrated Technology Resource for Biomedical Glycomics
This Resource was established by the National Center for Research Resources.
The aim is to develop the tools and technology to analyze glycoprotein and glycolipid expression of embryonic stem cells.
Our research provides bioinformatics support for four research groups:
- Embryonic Stem Cell Culture Program
- Glycomic Analysis of Glycoproteins
- Glycomic Analyses of Glycosphingolipids and Sphingolipids
- Transcript analysis by kinetic RT-PCR
NGP is part of the Bioinformatics core.
6
NGP: need in Glycomics
- Unlike in proteomics or genomics, high-throughput experimental protocols are still being established in glycomics.
- NGP involves a multitude of heterogeneous tasks, including human-mediated tasks.
- NGP attempts to encapsulate particular computational steps as platform-independent, scalable and Web-accessible tools: Web Services.
- This enables glycobiologists to integrate automated data generation tasks with data processing tools (Web Services) across the end-to-end experimental lifecycle.
7
N-Glycosylation identification: Problems
- It is extremely difficult to identify glycosylated peptide sequences using standard analytical methods.
- N-glycosylation occurs at particular sites on the protein structure: consensus sequences.
[Figure: an example glycopeptide (schematic) showing the peptide, the glycan and the consensus sequence N-X-S/T, together with the PNGase F step (asparagine to aspartate). Residue labels: N, X, S/T, D, J.]
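The consensus-sequence idea on this slide can be illustrated with a short Python sketch. It is not taken from the NGP code. It assumes the usual definition of the N-glycosylation sequon, N-X-S/T with X being any residue except proline (the slide itself shows only N, X and S/T), and it uses standard monoisotopic residue masses to show the small mass shift left behind when PNGase F converts asparagine to aspartate. The function names are hypothetical.

```python
import re
from typing import List

# Assumption: sequon is N-X-S/T, where X is any residue except proline (P).
# The lookahead keeps matches zero-width after the N, so overlapping motifs
# such as "NNTS" are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")

# Standard monoisotopic residue masses in daltons.
ASN_RESIDUE_MASS = 114.04293  # asparagine (N)
ASP_RESIDUE_MASS = 115.02694  # aspartate (D)


def find_sequon_sites(sequence: str) -> List[int]:
    """Return the 0-based index of every N that starts an N-X-S/T motif."""
    return [match.start() for match in SEQUON.finditer(sequence)]


def deglycosylation_mass_shift() -> float:
    """Mass difference per site when an asparagine is converted to aspartate."""
    return ASP_RESIDUE_MASS - ASN_RESIDUE_MASS


if __name__ == "__main__":
    print(find_sequon_sites("MKNGTLNPSANNTS"))
    print(round(deglycosylation_mass_shift(), 5))
```

A shift of roughly 1 Da per site is easy to confuse with isotope peaks, which is one reason the site needs to be marked explicitly in the search database.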
8
NGP: implementation
NGP currently implements a Web Process made up of two Web Services:
- DB Modifier Web Service: modifies the search database by replacing N (in consensus sequences) with J.
- Collator Web Service: identifies a probable N-glycosylated peptide using three parameters:
  - Calculated molecular mass
  - Presence of 'J' in a peptide sequence
  - MASCOT* score assigned to a hit
NGP also involves a proprietary mass spectrometry search engine service (MASCOT*) as an intermediate task.
Hence, the NGP Web Process identifies probable glycosylated peptides, enabling rapid processing of data from high-throughput experiments.
*http://www.matrixscience.com/
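The DB Modifier step can be sketched in a few lines of Python. This is an illustration, not the NGP service itself: it assumes the search database is FASTA text with each sequence on a single line, and it assumes the consensus sequence is N-X-S/T with X not proline. The names `mark_sequons` and `modify_fasta` are hypothetical.

```python
import re

# Assumption: consensus sequence is N-X-S/T with X any residue except proline.
SEQUON_N = re.compile(r"N(?=[^P][ST])")


def mark_sequons(sequence: str) -> str:
    """Replace the N of every consensus sequence with the placeholder J."""
    return SEQUON_N.sub("J", sequence)


def modify_fasta(fasta_text: str) -> str:
    """Rewrite sequence lines of a FASTA database, leaving header lines alone.

    Assumption: each record keeps its sequence on one line. A motif split
    across wrapped lines would be missed by this simple version.
    """
    modified = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            modified.append(line)  # header: copy unchanged
        else:
            modified.append(mark_sequons(line))
    return "\n".join(modified)


if __name__ == "__main__":
    example = ">gi|0000 hypothetical example record\nANNTSKNPS"
    print(modify_fasta(example))
```

Because J is not a standard residue code, any peptide in the search results that contains J can be traced back to a consensus site, which is what the Collator relies on.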
9
NGP: Architecture (current)
[Figure: architecture diagram. Labels: ms/ms raw data; PEAK LIST FILE; Primary Sequence Database; ModifyDB Web Service; MASCOT* Mass Spectrometer Search Engine; MASCOT* output file (contains both glycosylated and non-glycosylated peptide sequences); Collator Web Service; Deglycosylated peptide list.]
*http://www.matrixscience.com/
10
NGP Results
A typical MASCOT output file is about 3 MB. High-throughput experimental protocols generate thousands of such files, so manual identification is not feasible.
Excerpt from a MASCOT output file:
q1_p1=-1
q2_p1=0,626.349945,-0.023321,2,APGVAGR,18,000000000,1.49,00020000000000000,0,0;"gi|51465537":0:190:196:1
q2_p2=1,626.361191,-0.034567,2,APARGR,18,00000000,1.33,00020000000000000,0,0;"gi|10140845":0:2:7:2
q2_p3=0,626.349945,-0.023321,2,APAVGGR,18,000000000,1.33,00020000000000000,0,0;"gi|51470766":0:212:218:1,"gi|51470768":0:212:218:1
q3_p3=0,634.368973,0.006151,4,DIIFK,12,0000000,25.26,00010020000000000,0,0;"gi|47078238":0:364:368:2,"gi|47078240":0:328:332:2
q3_p4=0,634.351227,0.023897,4,MPLFK,12,0000000,25.24,00010020000000000,0,0;"gi|41197108":0:95:99:1,"gi|4557311":0:1:5:2
q3_p5=0,634.343811,0.031313,3,NNLFK,12,0000000,15.34,00010020000000000,0,0;"gi|31377725":0:539:543:1
q3_p6=0,634.368973,0.006151,3,LDIFK,12,0000000,15.34,00010020000000000,0,0;"gi|39725634":0:891:895:1
q3_p7=0,634.343811,0.031313,3,NNIFK,12,0000000,15.34,00010020000000000,0,0;"gi|7661646":0:212:216:1
q3_p8=0,634.368973,0.006151,3,LDLFK,12,0000000,15.34,00010020000000000,0,0;"gi|51474898":0:237:241:1
q3_p9=0,634.368958,0.006166,3,EVIFK,12,0000000,13.61,00010020000000000,0,0;"gi|28376662":0:67:71:1
q3_p10=0,634.368958,0.006166,3,VELFK,12,0000000,13.61,00010020000000000,0,0;"gi|51467300":0:493:497:1,"gi|51467535":0:99:103:1
q4_p1=-1
q5_p1=0,662.375122,0.004702,5,DLLFR,14,0000000,18.41,00020020000000000,0,0;"gi|21536369":0:84:88:1,"gi|21536367":0:17:21:1,"gi|4557871":0:647:651:1
q5_p2=0,662.375122,0.004702,3,DLFLR,14,0000000,12.81,00010020000000000,0,0;"gi|33695153":0:407:411:1,"gi|4504043":0:330:334:1,"gi|11968045":0:6:10:1
q5_p3=0,662.375122,0.004702,3,DIFIR,14,0000000,12.81,00010020000000000,0,0;"gi|4505725":0:924:928:1,"gi|29788751":0:1170:1174:1
q5_p4=0,662.349960,0.029864,3,NNFIR,14,0000000,11.84,00010020000000000,0,0;"gi|24416002":0:667:671:1
q5_p5=0,662.375122,0.004702,4,IDLFR,14,0000000,9.98,00020020000000000,0,0;"gi|12957488":0:602:606:1,"gi|41148707":0:536:540:1,"gi|51464463":0:646:650:1
q5_p6=0,662.375122,0.004702,4,LDLFR,14,0000000,9.98,00020020000000000,0,0;"gi|42657517":0:335:339:1
q5_p7=0,662.375107,0.004717,4,VELFR,14,0000000,9.98,00020020000000000,0,0;"gi|6912230":0:436:440:1
q5_p8=0,662.375122,0.004702,4,LDIFR,14,0000000,9.98,00020020000000000,0,0;"gi|8922081":0:2699:2703:1
q5_p9=0,662.349960,0.029864,4,NLNFR,64,0000000,5.89,00010020000000000,0,0;"gi|19923416":0:816:820:1
q5_p10=1,662.361191,0.018633,2,NRFAR,14,0000000,3.37,00010020000000000,0,0;"gi|4758704":0:97:101:1
q6_p1=0,674.359863,-0.006639,4,VSDNIK,35,00000000,11.27,00010020000000000,0,0;"gi|32130516":0:935:940:1
q6_p2=0,674.323456,0.029768,5,EGDLGGK,21,000000000,7.97,00020020000000000,0,0;"gi|13569928":0:1058:1064:1
q6_p3=0,674.359848,-0.006624,5,EATVAGK,21,000000000,7.88,00020020000000000,0,0;"gi|51475822":0:527:533:1
q6_p4=1,674.389740,-0.036516,3,QRMLK,14,0000000,7.46,00020010000000000,0,0;"gi|24307905":0:467:471:2,"gi|24307905":0:638:642:2
q6_p5=0,674.359863,-0.006639,5,LSSSPGK,56,000000000,7.38,00000020000000000,0,0;"gi|8922075":0:806:812:1
q6_p6=0,674.338730,0.014494,4,WDLGGK,42,00000000,6.40,00010020000000000,0,0;"gi|13375817":0:123:128:1
q6_p7=0,674.359879,-0.006655,4,QATDLK,56,00000000,6.21,00020010000000000,0,0;"gi|21361684":0:451:456:1
q6_p8=1,674.371094,-0.017870,3,QTNKGK,14,00000000,6.03,00020010000000000,0,0;"gi|41117716":0:85:90:1
q6_p9=1,674.389740,-0.036516,6,QMRIK,28,0000000,5.77,00020020000000000,0,0;"gi|28329439":0:269:273:1,"gi|28558993":0:278:282:1
q6_p10=1,674.389740,-0.036516,6,QMRLK,28,0000000,5.77,00020020000000000,0,0;"gi|40255096":0:300:304:1
q7_p1=0,695.348969,0.007855,4,YDASLK,14,00000000,8.86,00020020000000000,0,0;"gi|4758454":0:2761:2766:1
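To show why automating this step matters, here is a hedged Python sketch of how lines like the ones above could be parsed and filtered in the spirit of the Collator. The field positions are an assumption read off the sample lines (second field taken as the calculated mass, third as the mass delta, fifth as the peptide sequence, eighth as the score). The thresholds and the names `PeptideHit`, `parse_hit` and `collate` are hypothetical, not part of NGP or MASCOT.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class PeptideHit:
    query: int
    rank: int
    mass: float       # assumed: calculated peptide mass
    delta: float      # assumed: mass difference for the match
    sequence: str
    score: float      # assumed: score assigned to the hit
    proteins: List[str]


def parse_hit(line: str) -> Optional[PeptideHit]:
    """Parse one 'qN_pM=...' line; return None for empty hits ('-1')."""
    key, _, value = line.strip().partition("=")
    if not value or value == "-1":
        return None
    query_part, rank_part = key.split("_")
    peptide_part, _, protein_part = value.partition(";")
    fields = peptide_part.split(",")
    # Accessions are the quoted strings in the protein section.
    proteins = [p.split('"')[1] for p in protein_part.split(",") if '"' in p]
    return PeptideHit(
        query=int(query_part[1:]),
        rank=int(rank_part[1:]),
        mass=float(fields[1]),
        delta=float(fields[2]),
        sequence=fields[4],
        score=float(fields[7]),
        proteins=proteins,
    )


def collate(lines: Iterable[str], min_score: float = 10.0,
            max_delta: float = 0.05) -> List[PeptideHit]:
    """Keep hits that contain J, score at least min_score, and match in mass.

    min_score and max_delta are illustrative defaults, not NGP settings.
    """
    hits = (parse_hit(line) for line in lines)
    return [
        hit for hit in hits
        if hit is not None
        and "J" in hit.sequence
        and hit.score >= min_score
        and abs(hit.delta) <= max_delta
    ]


if __name__ == "__main__":
    sample = ('q2_p1=0,626.349945,-0.023321,2,APGVAGR,18,000000000,1.49,'
              '00020000000000000,0,0;"gi|51465537":0:190:196:1')
    print(parse_hit(sample))
```

None of the excerpted hits contains J, so `collate` would return an empty list for them; a hit is kept only when the marked residue, the score and the mass agreement all line up.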
11
NGP Web Services: Adding Semantics
Two Ontologies were developed as part of the NCRR-Glycomics project:
- GlycO: a domain Ontology embodying knowledge of the structure and metabolism of glycans
  - Contains 770 classes that describe structural features of glycans
  - URL: http://lsdis.cs.uga.edu/projects/glycomics/glyco
- ProPreO: a comprehensive process Ontology modeling experimental proteomics
  - Contains 296 classes
  - Models three phases of experimental proteomics*: Separation techniques, Analytical techniques, and Data analysis
  - URL: http://lsdis.cs.uga.edu/projects/glycomics/propreo
*http://pedro.man.ac.uk/uml.html (PEDRO UML schema)
12
ProPreO: Experimental Proteomics Process Ontology
ProPreO models the phases of a proteomics experiment using five fundamental concepts:
- Data (Example: a peaklist file from ms/ms raw data)
- Data_processing_applications (Example: MASCOT* search engine)
- Hardware: embodies instrument types used in proteomics (Example: ABI_Voyager_DE_Pro_MALDI_TOF)
- Parameter_list: describes the different types of parameter lists associated with experimental phases
- Task (Example: component separation, used in chromatography)
*http://www.matrixscience.com/
13
Description of a Web Service using the Web Service Description Language
Formalize the description and classification of Web Services using ProPreO concepts.
Service description using WSDL-S.

WSDL ModifyDB:
  <wsdl:definitions targetNamespace="urn:ngp" …..
      xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <schema targetNamespace="urn:ngp" xmlns="http://www.w3.org/2001/XMLSchema">
  …..

WSDL-S ModifyDB:
  <wsdl:definitions targetNamespace="urn:ngp" ……
      xmlns:wssem="http://www.ibm.com/xmlns/WebServices/WSSemantics"
      xmlns:ProPreO="http://lsdis.cs.uga.edu/ontologies/ProPreO.owl">
  <schema targetNamespace="urn:ngp" xmlns="http://www.w3.org/2001/XMLSchema">
  ……
  <wsdl:message name="replaceCharacterRequest" wssem:modelReference="ProPreO#peptide_sequence">

[Figure: ProPreO process Ontology hierarchy: data, sequence, peptide_sequence (concepts defined in the process Ontology)]
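The difference between the two descriptions is a single annotation on the message element. As a rough illustration only, the Python sketch below adds a `wssem:modelReference` attribute to a named message in a small WSDL document using the standard library. The miniature WSDL document and the function name `annotate_message` are made up for this example, and the WSDL 1.1 namespace URI is an assumption, since the slide elides that part of the real ModifyDB description. The `wssem` namespace, the message name and the concept reference are the ones shown on the slide.

```python
import xml.etree.ElementTree as ET

# Assumption: the service uses the WSDL 1.1 namespace; the slide elides it.
WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"
# Namespace shown on the slide for WSDL-S annotations.
WSSEM_NS = "http://www.ibm.com/xmlns/WebServices/WSSemantics"

# Register prefixes so the serialized output uses wsdl: and wssem:.
ET.register_namespace("wsdl", WSDL_NS)
ET.register_namespace("wssem", WSSEM_NS)

# Hypothetical, minimal stand-in for the ModifyDB WSDL.
PLAIN_WSDL = (
    '<wsdl:definitions xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/" '
    'targetNamespace="urn:ngp">'
    '<wsdl:message name="replaceCharacterRequest"/>'
    '</wsdl:definitions>'
)


def annotate_message(wsdl_xml: str, message_name: str, concept: str) -> str:
    """Attach a wssem:modelReference attribute to the named wsdl:message."""
    root = ET.fromstring(wsdl_xml)
    for message in root.iter("{%s}message" % WSDL_NS):
        if message.get("name") == message_name:
            message.set("{%s}modelReference" % WSSEM_NS, concept)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(annotate_message(PLAIN_WSDL, "replaceCharacterRequest",
                           "ProPreO#peptide_sequence"))
```

The annotation leaves the rest of the WSDL untouched, so a client that ignores the `wssem` namespace can still consume the service description.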
14
Biological UDDI (BUDDI): WS Registry for Proteomics and Glycomics
- No current registries use semantic classification of Web Services in glycoproteomics.
- BUDDI classification is based on proteomics and glycomics classification; BUDDI is part of an integrated glycoproteomics Web Portal called Stargate.
- NGP is to be published in BUDDI.
- This can enable other systems, such as myGrid, to use NGP Web Services to build a glycomics workbench.
15
Conclusions
- As part of the NCRR Integrated Technology Resource for Biomedical Glycomics, we implemented a Semantic Web Process for high-throughput glycomics in an open, web-centric environment.
- Large domain-specific ontologies with process (ProPreO) and domain (GlycO) knowledge concepts were used to describe and classify Web Services at the semantic level.
- We used the proposed Semantic Web Service specification (WSDL-S) to add semantics to the Web Service description.
- Biological UDDI (BUDDI), part of Stargate, is being developed as a single-window resource to discover and publish Web Services in the glycoproteomics domain.
16
Resources
- NCRR (Integrated Technology Resource for Biomedical Glycomics): http://cell.ccrc.uga.edu/world/glycomics/glycomics.php
- Bioinformatics core of Glycomics project: http://lsdis.cs.uga.edu/projects/glycomics/
- ProPreO process Ontology: http://lsdis.cs.uga.edu/projects/glycomics/propreo/
- GlycO domain Ontology: http://lsdis.cs.uga.edu/projects/glycomics/glyco/
- Stargate, GlycoProteomics Web Portal: http://128.192.9.86/stargate
- WSDL-S, joint UGA-IBM technical note: http://lsdis.cs.uga.edu/library/download/WSDL-S-V1.pdf
17
Acknowledgement
Special thanks:
- James Atwood (CCRC, UGA)
- Meenakshi Nagarajan (LSDIS Lab, UGA)
- Blake Hunter (LSDIS Lab, UGA)
18
Extra Slides: Stargate subsystems, a bit of detail
- BUDDI: BioUDDI is envisioned as the 'yellow pages' for all WS in the life sciences.
  - The classification of WS uses biological taxonomy.
  - Open resource for the worldwide life sciences research community.
- Format Converter: enables conversion of two available representation formats into an XML-based representation.
  - IUPAC to LINUCS to GLYDE (an XML-based representation)
- Web Service Generator: enables an existing Java application to be exposed as a Web Service.
  - Generates the files required to deploy a Java application as a Web Service.
  - Enables the newly generated Web Service to be published on BioUDDI.
19
Extra Slides: Stargate subsystems, a bit of detail
- Group Forum: members of the research group use it to foster a sense of community.
  - Schedule meetings, discuss issues, collaborate on papers…
  - Post papers for peer review and publications on relevant topics.
- Stargate Search: an integrated unit of Stargate.
  - Enables search for research publications within the research group.
  - Enables search on the internet.
- Login: allows access to selected parts of Stargate to be restricted.
20
Extra Slides: The take home message…
[Figure: Stargate components. Labels: Internet; Forum; BUDDI; Search; Web Service Generator.]