Bioinformatics Research Overview Outline Biomedical Ontologies oGlycO oEnzyO oProPreO Scientific Workflow for analysis of Proteomics Data Framework for.

Slides:



Advertisements
Similar presentations
Dr. Leo Obrst MITRE Information Semantics Information Discovery & Understanding Command & Control Center February 6, 2014February 6, 2014February 6, 2014.
Advertisements

Knowledge Modeling and its Application in Life Sciences: A Tale of two ontologies Bioinformatics for Glycan Expression Integrated Technology Resource for.
Semantic empowerment of Health Care and Life Science Applications WWW 2006 W3C Track, May WWW 2006 W3C Track, May Amit Sheth LSDIS LabLSDIS.
Web Services for N-Glycosylation Process Integrated Technology Resource for Biomedical Glycomics NCRR/NIH Satya S. Sahoo, Amit P. Sheth, William S. York,
Semantic Web & Semantic Web Services: Applications in Healthcare and Scientific Research International IFIP Conference on Applications of Semantic Web.
Prof. Carolina Ruiz Computer Science Department Bioinformatics and Computational Biology Program WPI WELCOME TO BCB4003/CS4803 BCB503/CS583 BIOLOGICAL.
Knowledge Enabled Information and Services Science What can SW do for HCLS today? Panel at HCSL Workshop, WWW2007 Amit Sheth Kno.e.sis Center Wright State.
Semantic Web: promising technologies and current applications in Health care & Life Sciences Amit Sheth Thanks: Kno.e.sis team, collaborators at CCRC,
Data Management in the DOE Genomics:GTL Program Janet Jacobsen and Adam Arkin Lawrence Berkeley National Laboratory University of California, Berkeley.
Fungal Semantic Web Stephen Scott, Scott Henninger, Leen-Kiat Soh (CSE) Etsuko Moriyama, Ken Nickerson, Audrey Atkin (Biological Sciences) Steve Harris.
Use of Ontologies in the Life Sciences: BioPax Graciela Gonzalez, PhD (some slides adapted from presentations available at
Semantic Web Technology in Support of Bioinformatics for Glycan Expression Amit Sheth Large Scale Distributed Information Systems (LSDIS) lab, Univ. of.
Semantics powered Bioinformatics Amit Sheth, William S. York, et al Large Scale Distributed Information Systems Lab & Complex Carbohydrate Research Center.
Proteomics: A Challenge for Technology and Information Science CBCB Seminar, November 21, 2005 Tim Griffin Dept. Biochemistry, Molecular Biology and Biophysics.
Proteomics Informatics – Protein identification II: search engines and protein sequence databases (Week 5)
Scaffold Download free viewer:
Proteomics Informatics (BMSC-GA 4437) Course Director David Fenyö Contact information
Proteomics Informatics (BMSC-GA 4437) Course Director David Fenyö Contact information
SCIENCE-DRIVEN INFORMATICS FOR PCORI PPRN Kristen Anton UNC Chapel Hill/ White River Computing Dan Crichton White River Computing February 3, 2014.
Review of Ondex Bernice Rogowitz G2P Visualization and Visual Analytics Team March 18, 2010.
Semantic Web applications in Financial Industry, Government, Health care and Life Sciences SWEG 2006, March 2006 Amit Sheth LSDIS Lab, Department of Computer.
Knowledge Enabled Information and Services Science GlycO.
EUROCarbDB CCRC – Database for high quality mass spectrometry data Khalifeh Al Jadda 1, Haseeb Yousef 1, Kitae Myong 1, Srikalyan Swayampakula 1, David.
Kno.e.sis Center, Wright State University,
Semantics Enabled Industrial and Scientific Applications: Research, Technology and Deployed Applications Part III: Biological Applications Keynote - the.
The dynamic nature of the proteome
Rahul Raman, Ram Sasisekharan Bioinformatics Core Massachusetts Institute of Technology Glue Grants Bioinformatics Meeting April 22-23, 2004 San Diego,
IPAW'08 – Salt Lake City, Utah, June 2008 Exploiting provenance to make sense of automated decisions in scientific workflows Paolo Missier, Suzanne Embury,
Helping scientists collaborate BioCAD. ©2003 All Rights Reserved.
Common parameters At the beginning one need to set up the parameters.
Analysis of Complex Proteomic Datasets Using Scaffold Free Scaffold Viewer can be downloaded at:
1 Schema Registries Steven Hughes, Lou Reich, Dan Crichton NASA 21 October 2015.
Laxman Yetukuri T : Modeling of Proteomics Data
Semantic empowerment of Life Science Applications October 2006 Amit Sheth LSDIS Lab, Department of Computer Science, University of Georgia Acknowledgement:
The Functional Genomics Experiment Object Model (FuGE) Andrew Jones, School of Computer Science, University of Manchester MGED Society.
©Ferenc Vajda 1 Semantic Grid Ferenc Vajda Computer and Automation Research Institute Hungarian Academy of Sciences.
Knowledge Enabled Information and Services Science SAWSDL: Tools and Applications Amit P. Sheth Kno.e.sis Center Wright State University, Dayton, OH Knoesis.wright.edu.
Quality views: capturing and exploiting the user perspective on data quality Paolo Missier, Suzanne Embury, Mark Greenwood School of Computer Science University.
Knowledge Enabled Information and Services Science Glycomics project overview.
From Domain Ontologies to Modeling Ontologies to Executable Simulation Models Gregory A. Silver Osama M. Al-Haj Hassan John A. Miller University of Georgia.
Software Project MassAnalyst Roeland Luitwieler Marnix Kammer April 24, 2006.
XML Standards for Proteomics Data Andrew Jones, Dr Jonathan Wastling and Dr Ela Hunt Department of Computing Science and the Institute of Biomedical and.
Proteomics databases for comparative studies: Transactional and Data Warehouse approaches Patricia Rodriguez-Tomé, Nicolas Pinaud, Thomas Kowall GeneProt,
ACGT: Open Grid Services for Improving Medical Knowledge Discovery Stelios G. Sfakianakis, FORTH.
FuGE: A framework for developing standards for functional genomics Andrew Jones School of Computer Science, University of Manchester Metabomeeting 2.0.
Applying Semantic Technologies to the Glycoproteomics Domain W. S York May 15, 2006.
Japan Consortium for Glycobiology and Glycotechnology DataBase 日本糖鎖科学統合データベース GDGDB - Glyco-Disease Genes Database The complexity of glycan metabolic pathways.
Databases, Ontologies and Text mining Session Introduction Part 2 Carole Goble, University of Manchester, UK Dietrich Rebholz-Schuhmann, EBI, UK Philip.
Overview of Mass Spectrometry
Ontology Quality by Detection of Conflicts in Metadata Budak I. Arpinar Karthikeyan Giriloganathan Boanerges Aleman-Meza LSDIS lab Computer Science University.
Proposed Research Problem Solving Environment for T. cruzi Intuitive querying of multiple sets of heterogeneous databases Formulate scientific workflows.
Data Management Support for Life Sciences or What can we do for the Life Sciences? Mourad Ouzzani
March 28, 2002 NIH Proteomics Workshop Bethesda, MD Lai-Su Yeh, Ph.D. Protein Scientist, National Biomedical Research Foundation Demo: Protein Information.
Interface for Glyco Vault Functionality and requirements. Initial proposal. Maciej Janik.
Proteomics Informatics (BMSC-GA 4437) Instructor David Fenyö Contact information
Copyright OpenHelix. No use or reproduction without express written consent1 1.
An Ontological Approach to Financial Analysis and Monitoring.
Japan Consortium for Glycobiology and Glycotechnology DataBase 日本糖鎖科学統合データベース PACDB - Pathogen Adherence to Carbohydrate Database The Pathogen Adherence.
Tools in Bioinformatics Ontologies and pathways. Why are ontologies needed? A free text is the best way to describe what a protein does to a human reader.
High throughput biology data management and data intensive computing drivers George Michaels.
Proteomics Informatics (BMSC-GA 4437) Course Directors David Fenyö Kelly Ruggles Beatrix Ueberheide Contact information
Organellar Proteomics: Turning Inventories into Insights
Databases, Ontologies and Text mining Session Introduction Part 2
LSDIS Lab, Department of Computer Science,
Semantic Visualization
Kenneth Baclawski et. al. PSB /11/7 Sa-Im Shin
Amit Sheth LSDIS Lab & Semagix University of Georgia
Accelerating Research in Life Sciences
Collaborative RO1 with NCBO
Accelerating Research in Life Sciences
Presentation transcript:

Bioinformatics Research Overview

Outline Biomedical Ontologies oGlycO oEnzyO oProPreO Scientific Workflow for analysis of Proteomics Data Framework for Semantic Provenance Annotation Biological Services Registry Demo of User Interface – T.cruzi Knowledge Base

GlycO is a focused ontology for the description of glycomics models the biosynthesis, metabolism, and biological relevance of complex glycans models complex carbohydrates as sets of simpler structures that are connected with rich relationships An ontology for structure and function of Glycopeptides Published through the National Center for Biomedical Ontology (NCBO) and Open Biomedical Ontologies (OBO) See:GlycoDoc, GlycOGlycoDocGlycO

The enzyme ontology EnzyO is highly intertwined with GlycO. While it’s structure is mostly that of a taxonomy, it is highly restricted at the class level and hence allows for comfortable classification of enzyme instances from multiple organisms GlycO together with EnzyO contain all the information that is needed for the description of Metabolic pathways –e.g. N-Glycan Biosynthesis EnzyO

GlycO Challenge – model hundreds of thousands of complex carbohydrate entities But, the differences between the entities are small (E.g. just one component) How to model all the concepts but preclude redundancy → ensure maintainability, scalability

GlycO population Assumption: with a large body of background knowledge, learning and extraction techniques can be used to assert facts. Asserted facts are compositions of individual building blocks Because the building blocks are richly described, the extracted larger structures will be of high quality

GlycO Population Multiple data sources used in populating the ontology oKEGG - Kyoto Encyclopedia of Genes and Genomes oSWEETDB oCARBBANK Database Each data source has a different schema for storing data There is significant overlap of instances in the data sources Hence, entity disambiguation and a common representational format are needed

Diverse Data From Multiple Sources Assures Quality Democratic principle Some sources can be wrong, but not all will be More likely to have homogeneity in correct data than in erroneous data

Ontology population workflow

[][Asn]{[(4+1)][b-D-GlcpNAc] {[(4+1)][b-D-GlcpNAc] {[(4+1)][b-D-Manp] {[(3+1)][a-D-Manp] {[(2+1)][b-D-GlcpNAc] {}[(4+1)][b-D-GlcpNAc] {}}[(6+1)][a-D-Manp] {[(2+1)][b-D-GlcpNAc]{}}}}}} Ontology population workflow

Ontology population workflow

Diverse Data From Multiple Sources Assures Quality Holds only, when the data in each source is independent In the case of GlycO, the sources that were meant to assure quality were not diverse. One original source (Carbbank) was copied by several Databases without curation Errors in the original propagated Errors in KEGG and Carbbank are the same Cannot use these sources for comparison Needs curation by the expert community ?

N. Takahashi and K. Kato, Trends in Glycosciences and Glycotechnology, 15:  - D -GlcpNAc  - D -Manp -(1-4)-  - D -Manp -(1-6)+  - D -GlcpNAc -(1-2)-  - D -Manp -(1-3)+  - D -GlcpNAc -(1-4)-  - D -GlcpNAc -(1-2)+ GlycoTree

Pathway representation in Ontology Evidence for this reaction from three experiments Pathway visualization tool by M. Eavenson and M. Janik, LSDIS Lab, Univ. of Georgia

Pathway Steps - Reaction Evidence for this reaction from three experiments Pathway visualization tool by M. Eavenson and M. Janik, LSDIS Lab, Univ. of Georgia

Pathway Steps - Glycan Pathway visualization tool by M. Eavenson and M. Janik, LSDIS Lab, Univ. of Georgia Abundance of this glycan in three experiments

An ontology for capturing process and lifecycle information related to proteomic experiments Two aspects of glycoproteomics: What is it? → identification How much of it is there? → quantification Heterogeneity in data generation process, instrumental parameters, formats Need data and process provenance → ontology-mediated provenance Hence, ProPreO models both the glycoproteomics experimental process and attendant data Published through the National Center for Biomedical Ontology (NCBO) and Open Biomedical Ontologies (OBO) ProPreO ontology

N-GlycosylationProcessNGP N-Glycosylation Process (NGP) Cell Culture Glycoprotein Fraction Glycopeptides Fraction extract Separation technique I Glycopeptides Fraction n*m n Signal integration Data correlation Peptide Fraction ms datams/ms data ms peaklist ms/ms peaklist Peptide listN-dimensional array Glycopeptide identification and quantification proteolysis Separation technique II PNGase Mass spectrometry Data reduction Peptide identification binning n 1

Workflow based on Web Services = Web Process

parent ion m/z fragment ion m/z ms/ms peaklist data fragment ion abundance parent ion abundance parent ion charge ProPreO: Ontology-mediated provenance Mass Spectrometry (MS) Data

<parameter instrument=“micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer” mode=“ms-ms”/> Ontological Concepts ProPreO: Ontology-mediated provenance Semantically Annotated MS Data

Semantic Web Process to incorporate provenance Storage Standard Format Data Raw Data Filtered Data Search Results Final Output Agent Biological Sample Analysis by MS/MS Raw Data to Standard Format Data Pre- process DB Search (Mascot/ Sequest) Results Post- process (ProValt) OIOIOIOIO Biological Information Semantic Annotation Applications ISiS – Integrated Semantic Information and Knowledge System

Raw2mzXMLmzXML2PklPkl2pSplitMASCOT SearchProVault Raw mzXMLPklpSplit MACOT result ProVault result Experimental Data Semantic Annotation Metadata File SPARQL query-based User Interface Semantic Metadata Registry PROTEOMECOMMONS PROTEOMICS WORKFLOW Integrated Semantic Information and knowledge System (Isis) ProPreO ontology EXPERIMENTAL DATA Have I performed an error? Give me all result files from a similar organism, cell, preparation, mass spectrometric conditions and compare results. Is the result erroneous? Give me all result files from a similar organism, cell, preparation, mass spectrometric conditions and compare results.

Evaluate the specific effects of changing a biological parameter: Retrieve abundance data for a given protein expressed by three different cell types of a specific organism. Retrieve raw data supporting a structural assignment: Find all the raw ms data files that contain the spectrum of a given peptide sequence having a specific modification and charge state. Detect errors: Find and compare all peptide lists identified in Mascot output files obtained using a similar organism, cell-type, sample preparation protocol, and mass spectrometry conditions. ProPreO concepts highlighted in red A Web Service Must Be Invoked Semantic Annotation Facilitates Complex Queries

Data Knowledge Base (ProPreO and GlycO Ontology) Data Generation Browsing & Querying Annotation WWW ISiS GlycoVault API

Semantic Biological Web Services Registry

Data, ontologies, more publications at Biomedical Glycomics project web site: Thank You