Applying Semantic Technologies to the Glycoproteomics Domain
W. S. York, May 15, 2006



Some Goals of Glycoproteomics
- How do changes in the expression levels of specific genes alter the expression of specific glycans on the cell surface?
- Are changes in the expression of specific glycans at the cell surface related to cell function, cell development, and disease?
- What are the mechanisms by which specific glycans at the cell surface affect cell function, cell development, and the progression of disease?

Challenges of Glycoproteomics
- Vast amounts of data collected by high-throughput experiments: better methods for data archival, retrieval, and analysis are needed
- Complex structures of glycans and glycoproteins: better methods for representing branched structures and finding structural and functional homologies are needed
- Complex biology and biochemistry: better methods to find relationships between the glycoproteome and biological processes are needed

Glycoproteomics Solutions
- Brute-force analysis of flat data files: too much data; data is heterogeneous; what does the data represent?
- Relational databases: data is well organized; data organization is relatively rigid; what does the data represent?
- Semantic technologies: data is well organized; data organization is flexible; concepts represented by data are accessible; relationships between concepts are accessible

What is Semantic Technology?
Semantics:
1. (Linguistics) The study or science of meaning in language.
2. (Linguistics) The study of relationships between signs and symbols and what they represent.
(The American Heritage® Dictionary of the English Language, Fourth Edition)
Semantic technology: the use of formal representations of concepts and their relationships to enable efficient, intelligent software.
Ontology (computer science): a model that represents a domain and is used to reason about the objects in that domain and the relations between them.
The implication is that enabling computers to "understand" the meanings of concepts and the relationships between them will allow them to reason and communicate in a way analogous to the way humans do.

A Simple Ontology
(figure: classes Organism, Animal, Plant, Lion, Deer, Cow, Hosta and Alfalfa; individuals Elsa, Simba, Elsie, Bambi, My Hosta and Peter's Alfalfa; relations is_a and ate)

A Simple Ontology
(figure: the same diagram with the classes Carnivore and Herbivore added through is_a, plus a class-level relation eats)
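To make the distinction between the class-level relation (eats) and the instance-level relation (ate) concrete, here is a minimal Python sketch that stores the diagram as subject-predicate-object triples and reasons over them. The exact is_a and ate edges are an assumption (the figure itself did not survive), so treat the data as a hypothetical reading of the slide. The point is that facts such as "Elsa is an Organism" are never stated; they are inferred by following is_a links, and an ate fact can be checked against what the eater's classes eat.

```python
from collections import deque

# Hypothetical encoding of the slide's diagram as (subject, predicate, object)
# triples. Which classes link to which is an assumption for illustration.
TRIPLES = {
    ("Animal", "is_a", "Organism"), ("Plant", "is_a", "Organism"),
    ("Carnivore", "is_a", "Animal"), ("Herbivore", "is_a", "Animal"),
    ("Lion", "is_a", "Carnivore"), ("Deer", "is_a", "Herbivore"),
    ("Cow", "is_a", "Herbivore"),
    ("Hosta", "is_a", "Plant"), ("Alfalfa", "is_a", "Plant"),
    # class-level relation: what members of a class eat
    ("Carnivore", "eats", "Animal"), ("Herbivore", "eats", "Plant"),
    # individuals and the classes they are asserted to belong to
    ("Elsa", "instance_of", "Lion"), ("Simba", "instance_of", "Lion"),
    ("Elsie", "instance_of", "Cow"), ("Bambi", "instance_of", "Deer"),
    ("My Hosta", "instance_of", "Hosta"),
    ("Peter's Alfalfa", "instance_of", "Alfalfa"),
    # instance-level relation: who ate what (hypothetical facts)
    ("Simba", "ate", "Bambi"), ("Bambi", "ate", "My Hosta"),
    ("Elsie", "ate", "Peter's Alfalfa"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is asserted."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def superclasses(cls):
    """cls plus every class reachable from it by following is_a edges."""
    seen, queue = {cls}, deque([cls])
    while queue:
        for parent in objects(queue.popleft(), "is_a"):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def types_of(individual):
    """Asserted classes of an individual together with all their superclasses."""
    return set().union(*(superclasses(c) for c in objects(individual, "instance_of")))


def ate_is_consistent(eater, food):
    """True if some class of the eater 'eats' a class the food belongs to."""
    allowed = set().union(*(objects(c, "eats") for c in types_of(eater)))
    return bool(allowed & types_of(food))


if __name__ == "__main__":
    print(sorted(types_of("Elsa")))  # ['Animal', 'Carnivore', 'Lion', 'Organism']
    for s, p, o in sorted(TRIPLES):
        if p == "ate":
            print(s, "ate", o, "->", ate_is_consistent(s, o))
```

Because Lion inherits eats Animal from Carnivore, a statement such as "Bambi ate Simba" would be flagged as inconsistent without any rule mentioning lions or deer directly.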

The Structure of GlycO: Concept Taxonomy
(figure: is_a taxonomy over the concepts chemical entity, molecule, molecular fragment, residue, amino acid residue, carbohydrate residue, carbohydrate moiety, monoglycosyl moiety, glycan moiety, N-glycan and O-glycan)

The Structure of GlycO: Concept Taxonomy
(figure detail: residue, amino acid residue, carbohydrate residue, glycan moiety, N-glycan and O-glycan, linked by is_a)

The Structure of GlycO: Concept Taxonomy, Instances and Properties
(figure: the concept taxonomy extended with the instance N-glycan_00020 and residue instances of the classes N-glycan core b-D-Manp and N-glycan a-D-Manp 4, connected by is_instance_of, has_residue and is_linked_to)
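The instance layer on this slide (a glycan instance that has_residue residue instances, each of which is_instance_of a GlycO class and may be is_linked_to another residue) can be sketched with plain Python dataclasses. This is a hypothetical in-memory model, not GlycO's actual OWL encoding; the residue identifiers res_1 and res_2 are invented, and the class labels are copied from the slide as extracted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ResidueInstance:
    id: str                                            # hypothetical identifier
    instance_of: str                                   # GlycO class of this residue
    is_linked_to: Optional["ResidueInstance"] = None   # residue it is attached to


@dataclass
class GlycanInstance:
    id: str
    instance_of: str
    has_residue: List[ResidueInstance] = field(default_factory=list)

    def residues_of_class(self, cls: str) -> List[ResidueInstance]:
        """Residues of this glycan that are instances of the given class."""
        return [r for r in self.has_residue if r.instance_of == cls]

    def linked_to_class(self, cls: str) -> List[ResidueInstance]:
        """Residues whose is_linked_to target is an instance of the given class."""
        return [r for r in self.has_residue
                if r.is_linked_to is not None and r.is_linked_to.instance_of == cls]


core_man = ResidueInstance("res_1", "N-glycan core b-D-Manp")
arm_man = ResidueInstance("res_2", "N-glycan a-D-Manp 4", is_linked_to=core_man)
glycan = GlycanInstance("N-glycan_00020", "N-glycan", [core_man, arm_man])

print([r.id for r in glycan.linked_to_class("N-glycan core b-D-Manp")])  # ['res_2']
```

Queries are phrased in terms of ontology classes rather than residue identifiers, which is what lets the same question be asked of any glycan instance in the knowledge base.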

The GlycO Ontology in Protégé: 3 top-level classes are defined in GlycO

The GlycO Ontology in Protégé: semantics include chemical context; this class inherits from 2 parents
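A class with two parents inherits from both lines of the hierarchy: for example, one parent can carry the residue's chemical identity and the other its chemical context. The following minimal Python sketch illustrates that idea. The class names and annotations are hypothetical (the slide does not say which GlycO class is shown), and the "first value found wins" merge is a simplification; a real reasoner would treat conflicting inherited values as an inconsistency.

```python
# Hypothetical fragment of a class hierarchy in which one class has two parents:
# one giving its chemical identity, one giving its chemical context.
PARENTS = {
    "carbohydrate residue": ["residue"],
    "b-D-Manp": ["carbohydrate residue"],
    "N-glycan residue": ["residue"],
    "N-glycan b-D-Manp": ["b-D-Manp", "N-glycan residue"],  # two parents
}

# Hypothetical annotations attached directly to some classes.
ANNOTATIONS = {
    "b-D-Manp": {"monosaccharide": "mannose", "anomer": "beta"},
    "N-glycan residue": {"context": "N-glycan"},
}


def ancestors(cls):
    """Every class reachable through any parent link, each listed once."""
    found, stack = [], list(PARENTS.get(cls, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.append(current)
            stack.extend(PARENTS.get(current, []))
    return found


def inherited_annotations(cls):
    """Merge annotations of cls and all its ancestors."""
    merged = {}
    for c in [cls, *ancestors(cls)]:
        for key, value in ANNOTATIONS.get(c, {}).items():
            merged.setdefault(key, value)  # first value found wins in this sketch
    return merged


print(sorted(ancestors("N-glycan b-D-Manp")))
print(inherited_annotations("N-glycan b-D-Manp"))
```

The shared ancestor "residue" is reached along both paths but recorded only once, which is the usual behaviour expected of a class hierarchy that is a directed acyclic graph rather than a tree.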

The GlycO Ontology in Protégé The  -D-Manp residues in N-glycans are found in 8 different chemical environments

GlycoTree: A Canonical Representation of N-Glycans
N. Takahashi and K. Kato, Trends in Glycoscience and Glycotechnology, 15
(figure: GlycoTree residue positions β-D-GlcpNAc; β-D-Manp-(1-4)-; α-D-Manp-(1-6)+; β-D-GlcpNAc-(1-2)-; α-D-Manp-(1-3)+; β-D-GlcpNAc-(1-4)-; β-D-GlcpNAc-(1-2)+; β-D-GlcpNAc-(1-6)+)
We give a residue in this position the same name, regardless of the specific structure it resides in. Semantics!
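The principle on this slide, that a residue's name is fixed by its position rather than by the particular glycan it occurs in, can be sketched by keying every residue on its path from the reducing end. This is a hypothetical stand-in for illustration only; it does not reproduce GlycoTree's actual naming scheme. Residues are written in the b-/a- (β/α) notation used elsewhere in this talk, and the helper names `res`, `position_keys` and `n_glycan` are invented.

```python
def res(name, *links):
    """A residue node; links are (linkage, child) pairs such as ("1-4", res(...))."""
    return (name, list(links))


def position_keys(node, parent_key=""):
    """Give every residue a key built from its path from the reducing end."""
    name, links = node
    key = f"{parent_key}/{name}" if parent_key else name
    keys = [key]
    for linkage, child in links:
        keys.extend(position_keys(child, f"{key}({linkage})"))
    return keys


def n_glycan(*extra_on_core_man):
    """Biantennary N-glycan; extra (linkage, child) links go on the core b-D-Manp."""
    return res("b-D-GlcpNAc",
               ("1-4", res("b-D-GlcpNAc",
                           ("1-4", res("b-D-Manp",
                                       ("1-3", res("a-D-Manp", ("1-2", res("b-D-GlcpNAc")))),
                                       ("1-6", res("a-D-Manp", ("1-2", res("b-D-GlcpNAc")))),
                                       *extra_on_core_man)))))


plain = set(position_keys(n_glycan()))
bisected = set(position_keys(n_glycan(("1-4", res("b-D-GlcpNAc")))))

print(plain <= bisected)         # True: shared positions get identical keys
print(sorted(bisected - plain))  # only the bisecting GlcNAc is new
```

Because the two GlcNAc residues on the antennae are reached through different linkages ((1-3) versus (1-6) on the core mannose), they receive different keys even though they are the same monosaccharide, which is exactly the "same name for the same position" behaviour the slide describes.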

The GlycO Ontology in Protégé: bisecting β-D-GlcpNAc

The GlycO Ontology in Protégé

1,3-linked  -L-Fucp

The GlycO Ontology in Protégé

Ontology Population Workflow

Ontology Population Workflow
[][Asn]{[(4+1)][b-D-GlcpNAc]{[(4+1)][b-D-GlcpNAc]{[(4+1)][b-D-Manp]{[(3+1)][a-D-Manp]{[(2+1)][b-D-GlcpNAc]{}[(4+1)][b-D-GlcpNAc]{}}[(6+1)][a-D-Manp]{[(2+1)][b-D-GlcpNAc]{}}}}}}
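The structure string used in this workflow appears to follow a LINUCS-style bracket notation: each residue is written as [linkage][residue]{children}. The minimal recursive-descent parser below turns the string into a tree so that each residue could then be mapped onto an ontology instance. It is a sketch that handles only the constructs appearing in this one string; the `Node`, `parse` and `walk` names are invented for illustration, and whitespace is ignored.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

# Structure string from the slide, written without whitespace.
STRUCTURE = ("[][Asn]{[(4+1)][b-D-GlcpNAc]{[(4+1)][b-D-GlcpNAc]{[(4+1)][b-D-Manp]"
             "{[(3+1)][a-D-Manp]{[(2+1)][b-D-GlcpNAc]{}[(4+1)][b-D-GlcpNAc]{}}"
             "[(6+1)][a-D-Manp]{[(2+1)][b-D-GlcpNAc]{}}}}}}")


@dataclass
class Node:
    residue: str
    linkage: str                      # e.g. "4+1"; empty for the root
    children: List["Node"] = field(default_factory=list)


def parse(text: str) -> Node:
    """Parse the nested [linkage][residue]{children} notation into a tree."""
    s = "".join(text.split())         # whitespace carries no meaning here
    pos = 0

    def bracket() -> str:
        nonlocal pos
        if s[pos] != "[":
            raise ValueError(f"expected '[' at position {pos}")
        end = s.index("]", pos)
        content, pos = s[pos + 1:end], end + 1
        return content

    def node() -> Node:
        nonlocal pos
        linkage = bracket().strip("()")
        residue = bracket()
        if s[pos] != "{":
            raise ValueError(f"expected '{{' at position {pos}")
        pos += 1
        current = Node(residue, linkage)
        while s[pos] != "}":          # zero or more child residues
            current.children.append(node())
        pos += 1
        return current

    root = node()
    if pos != len(s):
        raise ValueError("unexpected trailing characters")
    return root


def walk(n: Node, depth: int = 0) -> Iterator[Tuple[int, Node]]:
    """Depth-first traversal yielding (depth, node) pairs."""
    yield depth, n
    for child in n.children:
        yield from walk(child, depth + 1)


tree = parse(STRUCTURE)
for depth, n in walk(tree):
    link = f"({n.linkage}) " if n.linkage else ""
    print("  " * depth + link + n.residue)
```

The indented printout makes the branching at the core b-D-Manp (the 3+1 and 6+1 arms) explicit, which is the information an ontology population step would need in order to create has_residue and is_linked_to assertions.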

Ontology Population Workflow

The ProPreO Ontology in Protégé: 3 top-level classes are defined in ProPreO

The ProPreO Ontology in Protégé: this class inherits from 2 parents


Semantic Annotation of MS Data
(figure: ms/ms peaklist data annotated with the concepts parent ion m/z, parent ion abundance, parent ion charge, fragment ion m/z and fragment ion abundance)

Semantically Annotated MS Data: the attribute values are ontological concepts
<parameter instrument="micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer" mode="ms/ms"/>
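Once peak-list data carry attributes that name ontology concepts, a program can extract values by concept rather than by file layout. The sketch below uses Python's standard `xml.etree.ElementTree` on a hypothetical annotated peak list. Only the `<parameter>` element comes from the slide; the other element and attribute names, the mapping table, and all numbers are placeholders, not measured data or ProPreO's real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotated peak list. Only <parameter> is taken from the slide;
# every other name and every number is a placeholder.
SAMPLE = """<peak_list>
  <parameter instrument="micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer" mode="ms/ms"/>
  <parent_ion mz="1000.0" abundance="1.0" charge="2">
    <fragment_ion mz="100.0" abundance="0.5"/>
    <fragment_ion mz="200.0" abundance="0.25"/>
  </parent_ion>
</peak_list>"""

# Map (element, attribute) pairs onto the concept names shown on the slide.
CONCEPT_OF = {
    ("parent_ion", "mz"): "parent ion m/z",
    ("parent_ion", "abundance"): "parent ion abundance",
    ("parent_ion", "charge"): "parent ion charge",
    ("fragment_ion", "mz"): "fragment ion m/z",
    ("fragment_ion", "abundance"): "fragment ion abundance",
}


def instrument_concept(xml_text):
    """Return the instrument concept named by the <parameter> element, if any."""
    param = ET.fromstring(xml_text).find("parameter")
    return None if param is None else param.get("instrument")


def concept_values(xml_text):
    """List (concept, value) pairs for every attribute that maps to a concept."""
    pairs = []
    for elem in ET.fromstring(xml_text).iter():   # document order
        for attr, value in elem.attrib.items():
            concept = CONCEPT_OF.get((elem.tag, attr))
            if concept is not None:
                pairs.append((concept, value))
    return pairs


if __name__ == "__main__":
    print(instrument_concept(SAMPLE))
    for concept, value in concept_values(SAMPLE):
        print(f"{concept}: {value}")
```

In a real system the lookup table would be replaced by references into the ontology itself, so that a query such as "all fragment ion m/z values from QTOF instruments" could also use the ontology's class hierarchy.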

Web Services Based Workflow for Proteomics [1]
(figure: processing steps Analysis by MS/MS, Raw Data to Standard Format, Pre-process [2], DB Search (Mascot/Sequest) and Results Post-process (ProValt [3]); data products Raw Data, Standard Format Data, Filtered Data, Search Results and Final Output; also shown: Storage, Agent, Biological Sample and Biological Information)
[1] Design and Implementation of Web Services based Workflow for Proteomics. Journal of Proteome Research. Submitted.
[2] Computational tools for increasing confidence in protein identifications. Association of Biomolecular Resource Facilities Annual Meeting, Portland, OR.
[3] A heuristic method for assigning a false-discovery rate for protein identifications from Mascot database search results. Mol. Cell. Proteomics 4(6).
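The architecture on this slide chains independent services, each turning one data product into the next, with intermediate products kept in storage. A minimal Python sketch of that control flow follows. The stage order is one reading of the figure, and the services are local stubs that only record that they ran; the real steps are remote web services (format conversion, pre-processing, Mascot/Sequest search, ProValt post-processing) whose interfaces are not shown here. The `stub` and `run` names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Service = Callable[[dict], dict]


def stub(stage: str) -> Service:
    """Stand-in for a remote web service: it only records that it ran."""
    def service(data: dict) -> dict:
        return {**data, "history": data["history"] + [stage]}
    return service


# (data product produced, service producing it); names follow the slide,
# the stubs themselves are hypothetical placeholders.
PIPELINE: List[Tuple[str, Service]] = [
    ("standard format data", stub("raw data to standard format")),
    ("filtered data", stub("pre-process")),
    ("search results", stub("DB search (Mascot/Sequest)")),
    ("final output", stub("results post-process (ProValt)")),
]


def run(raw: dict, pipeline: List[Tuple[str, Service]],
        storage: Dict[str, dict]) -> dict:
    """Call each service in turn and keep every intermediate product in storage."""
    data = raw
    storage["raw data"] = raw
    for product, service in pipeline:
        data = service(data)
        storage[product] = data
    return data


if __name__ == "__main__":
    store: Dict[str, dict] = {}
    result = run({"sample": "example", "history": []}, PIPELINE, store)
    print(result["history"])
    print(list(store))
```

Keeping each step behind the same dict-in, dict-out interface is what allows a step (for example, the database search engine) to be swapped without touching the rest of the workflow; storing every intermediate product makes each stage's output available for later inspection.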

An Integrated Semantic Information System
- Formalized domain knowledge is in ontologies
  - The schema defines the concepts
  - Instances represent individual objects
  - Relationships provide expressiveness
- Data is annotated using concepts from the ontologies
- The semantic annotations facilitate the identification and extraction of relevant information
- The semantic relationships allow knowledge that is implicit in the data to be discovered

Satya Sahoo Christopher Thomas Cory Henson Ravi Pavagada Amit Sheth Krzysztof Kochut John Miller James Atwood Lin Alison Nairn Gerardo Alvarez-Manilla Saeed Roushanzamir Michael Pierce Ron Orlando Kelley Moremen Parastoo Azadi Alfred Merrill