Semantics powered Bioinformatics Amit Sheth, William S. York, et al Large Scale Distributed Information Systems Lab & Complex Carbohydrate Research Center.


Semantics powered Bioinformatics Amit Sheth, William S. York, et al Large Scale Distributed Information Systems Lab & Complex Carbohydrate Research Center University of Georgia Project Information:

Background: Semantic Web (SW) for Life Sciences
- Bioinformatics of Glycan Expression, a component of the NCRR "Integrated Technology Resource for Biomedical Glycomics"
- W3C Interest Group on Semantic Web for Health Care and Life Sciences
- Deployed Active Semantic Electronic Medical Patient Record application at the Athens Heart Center

Agenda
Review of accomplishments and ongoing work:
- GLYDE standard
- GlycO ontology
- ProPreO ontology
- Semantic analytical glycomics workflow
- Visualization
- Semantic Web Services: WSDL-S/METEOR-S

GLYDE standard
- An XML-based representation format for glycan structures
- Inter-convertible with existing data represented using IUPAC or LINUCS
- In progress: incorporation of a probability-based representation
- In progress: incorporation of aspects for visualizing structures from GLYDE (XML) files
Reference: GLYDE: An expressive XML standard for the representation of glycan structure. Carbohydrate Research, 340 (18), Dec 30, 2005.
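To make "inter-convertible" concrete, here is a minimal sketch of a connection-table style XML round trip in Python. The element and attribute names (`glycan`, `residue`, `link`, `from_carbon`, `to_carbon`) are hypothetical and are not taken from the published GLYDE schema; the sketch only shows that residues plus explicit links can be written out and read back without loss.

```python
import xml.etree.ElementTree as ET

# Hypothetical example data: residue names follow the LINUCS style used in this talk.
RESIDUES = {"r1": "b-D-GlcpNAc", "r2": "b-D-GlcpNAc", "r3": "b-D-Manp"}
# (child, parent, child_carbon, parent_carbon): C1 of r2 linked to C4 of r1, i.e. LINUCS "(4+1)"
LINKS = [("r2", "r1", "1", "4"), ("r3", "r2", "1", "4")]


def to_xml(residues, links):
    """Serialize residues and links into a GLYDE-like (hypothetical) XML string."""
    root = ET.Element("glycan")
    for rid, name in residues.items():
        ET.SubElement(root, "residue", id=rid, name=name)
    for child, parent, c_child, c_parent in links:
        # "from" is a Python keyword, so the attributes are passed as a dict.
        ET.SubElement(root, "link", attrib={
            "from": child, "to": parent,
            "from_carbon": c_child, "to_carbon": c_parent,
        })
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Parse the XML back into the same residue dict and link list."""
    root = ET.fromstring(text)
    residues = {r.get("id"): r.get("name") for r in root.iter("residue")}
    links = [(lk.get("from"), lk.get("to"), lk.get("from_carbon"), lk.get("to_carbon"))
             for lk in root.iter("link")]
    return residues, links


if __name__ == "__main__":
    xml_text = to_xml(RESIDUES, LINKS)
    print(xml_text)
    assert from_xml(xml_text) == (RESIDUES, LINKS)
```

Storing links explicitly (rather than relying on nesting) is what lets the same record be regenerated as a tree in either notation.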

Collaborative GlycoInformatics
- Enable querying, and export of query results in GLYDE format
- Use the GLYDE representation for disambiguation, mapping and matching across MonosaccharideDB, SweetDB, KEGG, etc.
(Diagram: query results from these databases are returned in GLYDE.)
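One ingredient of matching structures across databases is that two sources may list the branches of the same glycan in different orders. A minimal Python sketch, assuming structures are already parsed into trees: sort sibling branches so that equal structures produce equal strings. The simple lexicographic sort used here is an illustrative choice, not the ordering rule defined by LINUCS or GLYDE.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    linkage: str = ""                 # e.g. "3+1"; empty for the root
    children: list[Node] = field(default_factory=list)


def canonical(node: Node) -> str:
    """LINUCS-like string with sibling branches sorted, so branch order does not matter."""
    branches = "".join(sorted(canonical(child) for child in node.children))
    return f"[{node.linkage}][{node.name}]{{{branches}}}"


def same_structure(a: Node, b: Node) -> bool:
    return canonical(a) == canonical(b)


# Hypothetical records of one core fragment from two sources, branches listed in opposite order.
from_source_a = Node("b-D-Manp", children=[Node("a-D-Manp", "3+1"), Node("a-D-Manp", "6+1")])
from_source_b = Node("b-D-Manp", children=[Node("a-D-Manp", "6+1"), Node("a-D-Manp", "3+1")])
different = Node("b-D-Manp", children=[Node("a-D-Manp", "2+1"), Node("a-D-Manp", "6+1")])

if __name__ == "__main__":
    print(same_structure(from_source_a, from_source_b))  # True
    print(same_structure(from_source_a, different))      # False
```

Because each child's canonical string includes its linkage, a change of linkage position (3+1 versus 2+1) still yields a mismatch.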

Development of the GLYDE semantic web portal
- Integration with a visualization component: LiGraph (Heidelberg) or OntoVista (UGA)
- Semantic annotation of publications in the glycoproteomics domain
(Diagram: the GLYDE Semantic Portal connected to KEGG and MonosaccharideDB.)

Evolving collaboration between:
- LSDIS/CCRC: Will York, Amit Sheth, Michael Pierce
- EUROCarbDB (German Cancer Research Center): Willi von der Lieth
- Consortium for Functional Glycomics (CFG): Rahul Raman, Ram Sasisekharan, Thomas Lütteke
- N.D. Zelinsky Institute of Organic Chemistry (Moscow): Yuriy Knirel
- Mitsui Knowledge Industry (Japan): Hisashi Narimatsu, Norihiro Kikuchi
- Kyoto Encyclopedia of Genes and Genomes (KEGG): Minoru Kanehisa, Kiyoko F. Aoki-Kinoshita
- Palo Alto Research Center (PARC): David Goldberg

Semantic GlycoInformatics: Ontologies
- GlycO: a domain ontology for glycan structures, glycan functions and enzymes, embodying knowledge of the structure and metabolism of glycans
  - Contains 600+ classes and 100+ properties that describe structural features of glycans; unique population strategy
  - URL:
- ProPreO: a comprehensive process ontology modeling experimental proteomics
  - Contains 330 classes and 6 million+ instances
  - Models three phases of experimental proteomics
  - URL:

GlycO taxonomy
(Figures: the first levels of the GlycO taxonomy; most relationships and attributes in GlycO.)
GlycO exploits the expressiveness of OWL-DL. Cardinality constraints, value constraints, and existential and universal restrictions on properties, together with property domains and ranges, allow both the classification of unknown entities and the deduction of implicit relationships.
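A toy Python sketch of classification by restrictions, under stated assumptions: the class names, properties and facts are hypothetical (not taken from GlycO), and the check is closed-world. A real OWL-DL reasoner works under the open-world assumption, so it would not conclude that a universal restriction holds merely because no counterexample is asserted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Some:
    """Existential restriction: at least one value of `prop` is an instance of `cls`."""
    prop: str
    cls: str


@dataclass(frozen=True)
class Only:
    """Universal restriction: every value of `prop` is an instance of `cls`."""
    prop: str
    cls: str


@dataclass(frozen=True)
class MinCard:
    """Cardinality restriction: `prop` has at least `n` distinct values."""
    prop: str
    n: int


def satisfies(ind, r, facts, types):
    values = facts.get(ind, {}).get(r.prop, [])
    if isinstance(r, Some):
        return any(r.cls in types.get(v, set()) for v in values)
    if isinstance(r, Only):
        return all(r.cls in types.get(v, set()) for v in values)
    if isinstance(r, MinCard):
        return len(set(values)) >= r.n
    raise TypeError(f"unknown restriction {r!r}")


def classify(ind, definitions, facts, types):
    """Defined classes whose restrictions the individual satisfies (closed-world toy check)."""
    return sorted(name for name, restrictions in definitions.items()
                  if all(satisfies(ind, r, facts, types) for r in restrictions))


# Hypothetical data for an unclassified glycan individual.
TYPES = {"r1": {"GlcNAc"}, "r2": {"GlcNAc"}, "r3": {"Man"}, "asn1": {"Asparagine"}}
FACTS = {"glycan_X": {"has_residue": ["r1", "r2", "r3"], "attached_to": ["asn1"]}}
DEFINITIONS = {
    "N_glycan": [Some("attached_to", "Asparagine")],
    "O_glycan": [Some("attached_to", "Serine")],
    "mannose_only_glycan": [Only("has_residue", "Man")],
    "glycan_with_3plus_residues": [MinCard("has_residue", 3)],
}

if __name__ == "__main__":
    print(classify("glycan_X", DEFINITIONS, FACTS, TYPES))
```

The individual is placed in `N_glycan` only because its asserted facts satisfy that class's defining restriction; no class membership was asserted directly.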

Pathway representation in GlycO
Pathways do not need to be explicitly defined in GlycO. The residue, glycan, enzyme and reaction descriptions contain all the knowledge necessary to infer pathways.
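Since each reaction links a substrate to a product, a pathway can be recovered as a path through those links. A minimal Python sketch, assuming reactions have been reduced to (reaction, substrate, product, enzyme class) tuples; every identifier below is a hypothetical placeholder, not a KEGG or GlycO ID. Breadth-first search returns a shortest chain of reactions.

```python
from collections import deque

# Hypothetical reactions: (reaction_id, substrate, product, enzyme_class)
REACTIONS = [
    ("rxn_1", "glycan_A", "glycan_B", "enzyme_class_1"),
    ("rxn_2", "glycan_B", "glycan_C", "enzyme_class_2"),
    ("rxn_3", "glycan_B", "glycan_D", "enzyme_class_3"),
    ("rxn_4", "glycan_C", "glycan_E", "enzyme_class_1"),
]


def infer_pathway(reactions, start, goal):
    """Shortest list of (substrate, reaction, enzyme, product) steps, or None if unreachable."""
    by_substrate = {}
    for rid, substrate, product, enzyme in reactions:
        by_substrate.setdefault(substrate, []).append((rid, product, enzyme))

    came_from = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            break
        for rid, product, enzyme in by_substrate.get(current, []):
            if product not in came_from:
                came_from[product] = (current, rid, enzyme)
                queue.append(product)

    if goal not in came_from:
        return None
    steps, node = [], goal
    while came_from[node] is not None:
        prev, rid, enzyme = came_from[node]
        steps.append((prev, rid, enzyme, node))
        node = prev
    return list(reversed(steps))


if __name__ == "__main__":
    for step in infer_pathway(REACTIONS, "glycan_A", "glycan_E"):
        print(step)
```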

Zooming in a little…
The N-glycan with KEGG ID is the substrate of reaction R05987, which is catalyzed by an enzyme of the class EC. The product of this reaction is the glycan with KEGG ID.
(Diagram labels: Reaction R05987; catalyzed by enzyme; adds_glycosyl_residue N-glycan_b-D-GlcpNAc_13.)
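The facts on this slide can be read as subject, predicate, object triples and queried with variable patterns, much as a SPARQL basic graph pattern would be. A minimal Python sketch: the glycan IDs and EC number are not preserved on the slide, so `substrate_glycan`, `product_glycan` and `enzyme_class_EC` are placeholders, and the predicates other than `adds_glycosyl_residue` are hypothetical names.

```python
# Triples for reaction R05987; placeholders stand in for identifiers missing from the slide.
TRIPLES = [
    ("R05987", "has_substrate", "substrate_glycan"),
    ("R05987", "has_product", "product_glycan"),
    ("R05987", "catalyzed_by", "enzyme_class_EC"),
    ("R05987", "adds_glycosyl_residue", "N-glycan_b-D-GlcpNAc_13"),
]


def match(patterns, triples, binding=None):
    """Yield variable bindings satisfying all patterns; terms starting with '?' are variables."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                if candidate.setdefault(term, value) != value:
                    break          # variable already bound to something else
            elif term != value:
                break              # constant does not match
        else:
            yield from match(rest, triples, candidate)


QUERY = [
    ("?reaction", "has_substrate", "substrate_glycan"),
    ("?reaction", "catalyzed_by", "?enzyme"),
    ("?reaction", "has_product", "?product"),
]

if __name__ == "__main__":
    for result in match(QUERY, TRIPLES):
        print(result)
```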

Ontology population
The next slides show the steps that were necessary to populate GlycO with glycan structures from multiple sources. GLYDE is used to disambiguate between representations from multiple sources.
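A minimal Python sketch of the merging step in population, assuming all source records have already been converted into one structure notation (the disambiguation role GLYDE plays here). The normalization below only strips whitespace, and the source names and IDs are hypothetical. Each distinct structure becomes one individual that keeps the list of source records it came from.

```python
from collections import defaultdict


def normalize(structure):
    """Placeholder normalization: drop whitespace. A real pipeline would first convert
    every source notation into a single representation."""
    return "".join(structure.split())


def populate(records):
    """Merge (source, source_id, structure) records into one individual per distinct structure."""
    groups = defaultdict(list)
    for source, source_id, structure in records:
        groups[normalize(structure)].append((source, source_id))
    return {
        f"glycan_{i:04d}": {"structure": structure, "sources": sorted(sources)}
        for i, (structure, sources) in enumerate(sorted(groups.items()), start=1)
    }


# Hypothetical records: two sources describe the same disaccharide with different spacing.
RECORDS = [
    ("source_A", "A-001", "[][b-D-GlcpNAc]{[(4+1)][b-D-GlcpNAc]{}}"),
    ("source_B", "B-17", "[][b-D-GlcpNAc] {[(4+1)][b-D-GlcpNAc] {}}"),
    ("source_B", "B-18", "[][b-D-Manp]{}"),
]

if __name__ == "__main__":
    for name, individual in populate(RECORDS).items():
        print(name, individual)
```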

Ontology population workflow

[][Asn]{[(4+1)][b-D-GlcpNAc] {[(4+1)][b-D-GlcpNAc] {[(4+1)][b-D-Manp] {[(3+1)][a-D-Manp] {[(2+1)][b-D-GlcpNAc] {}[(4+1)][b-D-GlcpNAc] {}}[(6+1)][a-D-Manp] {[(2+1)][b-D-GlcpNAc]{}}}}}}
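The string above uses the LINUCS bracket notation: `[linkage][residue]{children}`, with an empty linkage for the root. A minimal recursive-descent parser in Python turns it into a residue tree; the `Residue` class and function names are illustrative, and the parser handles only the bracket syntax seen here, not every LINUCS feature.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Residue:
    name: str
    linkage: str                      # e.g. "4+1"; empty for the root residue
    children: list[Residue] = field(default_factory=list)


def parse_linucs(text: str) -> Residue:
    """Parse a LINUCS-style string of the form [linkage][residue]{children}."""
    s = re.sub(r"\s+", "", text)      # whitespace is not significant
    pos = 0

    def expect(ch: str) -> None:
        nonlocal pos
        if pos >= len(s) or s[pos] != ch:
            raise ValueError(f"expected {ch!r} at position {pos}")
        pos += 1

    def read_bracket() -> str:
        nonlocal pos
        expect("[")
        end = s.find("]", pos)
        if end < 0:
            raise ValueError("unterminated '['")
        value = s[pos:end]
        pos = end + 1
        return value

    def parse_node() -> Residue:
        linkage = read_bracket().strip("()")
        node = Residue(read_bracket(), linkage)
        expect("{")
        while pos < len(s) and s[pos] != "}":
            node.children.append(parse_node())
        expect("}")
        return node

    root = parse_node()
    if pos != len(s):
        raise ValueError(f"unexpected trailing text at position {pos}")
    return root


def walk(node: Residue, depth: int = 0):
    """Depth-first traversal yielding (depth, residue)."""
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1)


LINUCS_EXAMPLE = (
    "[][Asn]{[(4+1)][b-D-GlcpNAc]{[(4+1)][b-D-GlcpNAc]{[(4+1)][b-D-Manp]"
    "{[(3+1)][a-D-Manp]{[(2+1)][b-D-GlcpNAc]{}[(4+1)][b-D-GlcpNAc]{}}"
    "[(6+1)][a-D-Manp]{[(2+1)][b-D-GlcpNAc]{}}}}}}"
)

if __name__ == "__main__":
    for depth, res in walk(parse_linucs(LINUCS_EXAMPLE)):
        label = f"({res.linkage}) {res.name}" if res.linkage else res.name
        print("  " * depth + label)
```

The printed indentation makes the branching at the b-D-Manp residue (3+1 and 6+1 arms) visible at a glance.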

ProPreO: a process ontology that captures the proteomics experimental lifecycle:
- Separation
- Mass spectrometry
- Analysis
Size: 330 classes, 110 properties, 6 million+ instances.

Manual annotation of a mouse kidney spectrum by a human expert. For clarity, only 19 of the major peaks have been annotated.
Usage: mass spectrometry analysis.
Goldberg et al., Automatic annotation of matrix-assisted laser desorption/ionization N-glycan spectra, Proteomics 2005, 5, 865–875.
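A basic step behind annotating a peak is finding glycan compositions whose calculated mass lies close to the observed one. A minimal Python sketch using standard monoisotopic residue masses; it assumes neutral, underivatized glycans, whereas real MALDI spectra show adduct ions and may involve derivatization, so observed values would need correcting first. The candidate list and tolerance are illustrative, and this is not the method of Goldberg et al.

```python
# Monoisotopic masses (Da) of monosaccharide residues (free sugar minus one water).
RESIDUE_MASS = {
    "Hex": 162.05282,     # e.g. Man, Gal, Glc
    "HexNAc": 203.07937,  # e.g. GlcNAc
    "dHex": 146.05791,    # e.g. Fuc
    "NeuAc": 291.09542,   # N-acetylneuraminic acid
}
WATER = 18.01056


def composition_mass(composition):
    """Neutral monoisotopic mass of a free, underivatized glycan with this composition."""
    return WATER + sum(RESIDUE_MASS[res] * n for res, n in composition.items())


def annotate_peak(observed_mass, candidates, tolerance=0.5):
    """Names of candidate compositions within `tolerance` Da of the observed mass."""
    return sorted(name for name, comp in candidates.items()
                  if abs(composition_mass(comp) - observed_mass) <= tolerance)


# Illustrative candidate compositions.
CANDIDATES = {
    "Hex5HexNAc2": {"Hex": 5, "HexNAc": 2},
    "Hex3HexNAc4": {"Hex": 3, "HexNAc": 4},
    "dHex1Hex3HexNAc4": {"dHex": 1, "Hex": 3, "HexNAc": 4},
}

if __name__ == "__main__":
    print(round(composition_mass(CANDIDATES["Hex5HexNAc2"]), 4))
    print(annotate_peak(1234.43, CANDIDATES))
```

A composition match alone cannot tell isomers apart, which is why the next slide assigns probabilities to alternative structures.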

P(S | M = ) = 0.6
P(T | M = ) = 0.4
(S and T are alternative structures proposed for a peak of mass M.)
Semantic annotation of experimental data:
- Enables ontology-mediated disambiguation
- Allows correlation between disparate entities using semantic relations
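One way such probabilities can arise is Bayes' rule over a finite candidate set. A minimal Python sketch, assuming hypothetical prior weights and likelihoods: S and T share a mass, so the peak supports both equally, while a third candidate U is given likelihood 0 to model exclusion by ontology knowledge. The numbers are chosen only so the arithmetic reproduces 0.6 and 0.4; this is not the scoring method of Goldberg et al.

```python
def posterior(prior, likelihood):
    """Bayes' rule over a finite set of candidate structures.

    prior:      candidate -> prior weight (need not sum to 1)
    likelihood: candidate -> P(observed peak | candidate); 0 excludes a candidate
    """
    unnormalized = {c: prior[c] * likelihood.get(c, 0.0) for c in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("no candidate is consistent with the observation")
    return {c: w / total for c, w in unnormalized.items()}


# Hypothetical weights: S and T are isomers; U is ruled out by ontology-derived constraints.
PRIOR = {"S": 3.0, "T": 2.0, "U": 5.0}
LIKELIHOOD = {"S": 1.0, "T": 1.0, "U": 0.0}

if __name__ == "__main__":
    print(posterior(PRIOR, LIKELIHOOD))
```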

Semantic GlycoProteomics Workflow
Cell culture → (extract) glycoprotein fraction → (proteolysis) glycopeptides fraction → separation technique I → n glycopeptides fractions → (PNGase) n peptide fractions → separation technique II → n*m peptide fractions → mass spectrometry → ms data and ms/ms data → data reduction → ms peaklist and ms/ms peaklist
- ms/ms peaklist → peptide identification → peptide list
- ms peaklist → binning → signal integration → N-dimensional array
- peptide list and N-dimensional array → data correlation → glycopeptide identification and quantification
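The workflow is a directed acyclic graph of stages, so a valid execution order follows from a topological sort. A minimal sketch using Python's standard `graphlib` module; the stage identifiers and dependencies are my reading of the diagram, not a published workflow definition.

```python
from graphlib import TopologicalSorter

# stage -> stages whose outputs it consumes (an assumed reading of the diagram)
WORKFLOW = {
    "glycoprotein_fraction": {"cell_culture"},            # extract
    "glycopeptide_fraction": {"glycoprotein_fraction"},   # proteolysis
    "separation_I": {"glycopeptide_fraction"},
    "peptide_fraction": {"separation_I"},                 # PNGase
    "separation_II": {"peptide_fraction"},
    "mass_spectrometry": {"separation_II"},
    "data_reduction": {"mass_spectrometry"},              # ms and ms/ms peaklists
    "peptide_identification": {"data_reduction"},         # ms/ms peaklist -> peptide list
    "signal_integration": {"data_reduction"},             # binning of ms peaklists
    "data_correlation": {"peptide_identification", "signal_integration"},
    "glycopeptide_identification_and_quantification": {"data_correlation"},
}


def execution_order(workflow):
    """One valid order in which the stages can run (raises graphlib.CycleError on cycles)."""
    return list(TopologicalSorter(workflow).static_order())


if __name__ == "__main__":
    for stage in execution_order(WORKFLOW):
        print(stage)
```

Peptide identification and signal integration have no dependency on each other, so a workflow engine could run them in parallel before data correlation.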

Web-services-based workflow = Web process
(Diagram: Web Services 1 to 4 (WS1 to WS4), hosted on Linux, Solaris, Mac and Windows XP, composed into a single workflow.)

BOWSER
- Uses semantics to describe Web Services: WSDL-S (LSDIS/IBM), with service-level annotation of Web Services
- Graphical traversal of a taxonomy of biological concepts to search for Web Services
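Searching by taxonomy traversal means a query for a concept should also return services annotated with any of its subconcepts. A minimal Python sketch: the concept names `data`, `sequence` and `peptide_sequence` appear on the WSDL-S slide, but the hierarchy below is assumed, and apart from ModifyDB the service names are hypothetical.

```python
# Hypothetical taxonomy fragment (child -> parent).
PARENT = {
    "sequence": "data",
    "peptide_sequence": "sequence",
    "mass_spectrum": "data",
    "ms_peaklist": "mass_spectrum",
}
# Service -> concept annotating its input (only ModifyDB appears in the talk).
SERVICES = {
    "ModifyDB": "peptide_sequence",
    "SequenceStats": "sequence",
    "PeaklistFilter": "ms_peaklist",
}


def is_subconcept(concept, ancestor):
    """True if `concept` equals `ancestor` or lies below it in the taxonomy."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False


def find_services(concept):
    """Services whose annotated concept is `concept` or one of its subconcepts."""
    return sorted(name for name, annotated in SERVICES.items()
                  if is_subconcept(annotated, concept))


if __name__ == "__main__":
    print(find_services("sequence"))
    print(find_services("data"))
```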

Semantic annotation of scientific data
ms/ms peaklist data, annotated with instrument and mode parameters:
<parameter instrument="micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer" mode="ms/ms"/>
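A minimal Python sketch of attaching ontology references to such parameters, using the standard `xml.etree.ElementTree` module. The namespace URI, the set of known instrument individuals, the `instrumentRef` attribute and the `peaklist` wrapper element are all hypothetical stand-ins for ProPreO; values that are not known concepts are left unannotated rather than guessed.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the ProPreO namespace and its instrument individuals.
PROPREO = "urn:example:propreo#"
KNOWN_INSTRUMENTS = {"micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer"}


def annotate(xml_text):
    """Add an ontology reference to each <parameter> whose instrument is a known concept."""
    root = ET.fromstring(xml_text)
    for param in root.iter("parameter"):
        instrument = param.get("instrument")
        if instrument in KNOWN_INSTRUMENTS:
            param.set("instrumentRef", PROPREO + instrument)
    return ET.tostring(root, encoding="unicode")


PEAKLIST = (
    '<peaklist><parameter instrument='
    '"micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer" mode="ms/ms"/></peaklist>'
)

if __name__ == "__main__":
    print(annotate(PEAKLIST))
```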


Discovery of relationships between biological entities
(Diagram: identified and quantified peptides; fragment of a specific protein; lectin; collection of N-glycan ligands; collection of biosynthetic enzymes; specific cellular process. Sources: GlycO, ProPreO, Gene Ontology (GO), genomic database (Mascot/Sequest).)
The inference: instances of the class "collection of biosynthetic enzymes" (GNT-V) are involved in the specific cellular process (metastasis).
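Inferences of this kind can be sketched as chain rules applied until a fixed point. A minimal Python sketch, in the spirit of an OWL 2 property chain or a rule language: GNT-V and metastasis come from the slide, while `protein_P`, the predicate names and the single rule are hypothetical. The rule says that if x is the same entity as y and y participates in process z, then x is involved in z.

```python
# Hypothetical facts; identifiers other than GNT-V and metastasis are placeholders.
FACTS = {
    ("GNT-V", "member_of", "collection_of_biosynthetic_enzymes"),  # glycan-side knowledge
    ("GNT-V", "same_as", "protein_P"),                            # link via a protein database
    ("protein_P", "participates_in", "metastasis"),               # GO-style process annotation
}
# Chain rule: same_as(x, y) and participates_in(y, z)  =>  involved_in(x, z)
RULES = [("same_as", "participates_in", "involved_in")]


def forward_chain(facts, rules):
    """Apply two-step chain rules until no new triple appears."""
    facts = set(facts)
    while True:
        new = {
            (x, inferred, z)
            for p1, p2, inferred in rules
            for (x, r1, y) in facts if r1 == p1
            for (y2, r2, z) in facts if r2 == p2 and y2 == y
        }
        if new <= facts:
            return facts
        facts |= new


def members_involved_in(facts, collection, process):
    """Members of `collection` inferred to be involved in `process`."""
    return sorted(x for (x, p, c) in facts
                  if p == "member_of" and c == collection
                  and (x, "involved_in", process) in facts)


if __name__ == "__main__":
    closure = forward_chain(FACTS, RULES)
    print(members_involved_in(closure, "collection_of_biosynthetic_enzymes", "metastasis"))
```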

Semantic Web Services using WSDL-S: formalize the description and classification of Web Services using ProPreO concepts.
WSDL (Web Service Description Language) description of ModifyDB:
  <wsdl:definitions targetNamespace="urn:ngp" … xmlns:xsd="…">
    <schema targetNamespace="urn:ngp" xmlns="…"> …
WSDL-S description of ModifyDB:
  <wsdl:definitions targetNamespace="urn:ngp" … xmlns:wssem="…" xmlns:ProPreO="…">
    <schema targetNamespace="urn:ngp" xmlns="…"> …
    <wsdl:message name="replaceCharacterRequest" wssem:modelReference="ProPreO#peptide_sequence">
(Diagram: concepts defined in the ProPreO process ontology: data, sequence, peptide_sequence. The WSDL-S message annotation points to ProPreO#peptide_sequence.)
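A client can collect these semantic annotations by reading the `modelReference` attribute from each message. A minimal Python sketch with `xml.etree.ElementTree`: the WSDL 1.1 namespace is the standard one, but the WSDL-S namespace URI on the slide is not preserved, so `urn:example:wssem` is a hypothetical stand-in, as is the unannotated response message.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"
WSSEM_NS = "urn:example:wssem"   # hypothetical stand-in for the WSDL-S namespace URI

DOC = f"""<wsdl:definitions xmlns:wsdl="{WSDL_NS}" xmlns:wssem="{WSSEM_NS}" targetNamespace="urn:ngp">
  <wsdl:message name="replaceCharacterRequest" wssem:modelReference="ProPreO#peptide_sequence"/>
  <wsdl:message name="replaceCharacterResponse"/>
</wsdl:definitions>"""


def model_references(wsdl_text):
    """Map each annotated message name to its ontology concept reference."""
    root = ET.fromstring(wsdl_text)
    refs = {}
    for msg in root.iter(f"{{{WSDL_NS}}}message"):
        ref = msg.get(f"{{{WSSEM_NS}}}modelReference")
        if ref is not None:
            refs[msg.get("name")] = ref
    return refs


if __name__ == "__main__":
    print(model_references(DOC))
```

The extracted concept (here `peptide_sequence`) is what a taxonomy-driven service search would match against.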

Semantic Visualization
Ontologies are meant for machine consumption and are often too convoluted for the human eye, yet the scientist needs to know the concepts she uses for annotation. The goal is to build a visualization environment that translates the formal concepts into a representation the domain expert understands well.

Single Glycan

Customizable Layouts
Using customizable layouts, knowledge can be formalized in a machine-understandable way and then visually translated for the user's needs:
- Cartoon representation for the glycobiologist
- Chemical reactions shown as left side → right side, instead of the convoluted representation in the ontology
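A minimal Python sketch of one such translation: gathering a reaction's substrate and product triples and rendering them as a left side → right side equation. The predicate names and identifiers are hypothetical, not GlycO terms.

```python
def render_reactions(triples):
    """Render reactions stored as (reaction, predicate, entity) triples as 'left -> right' lines."""
    reactions = {}
    for subject, predicate, obj in triples:
        entry = reactions.setdefault(subject, {"has_substrate": [], "has_product": []})
        if predicate in entry:                 # other predicates are not drawn here
            entry[predicate].append(obj)
    lines = []
    for rid in sorted(reactions):
        e = reactions[rid]
        if e["has_substrate"] or e["has_product"]:
            left = " + ".join(sorted(e["has_substrate"]))
            right = " + ".join(sorted(e["has_product"]))
            lines.append(f"{rid}: {left} -> {right}")
    return lines


# Hypothetical triples describing one reaction.
TRIPLES = [
    ("rxn_1", "has_substrate", "glycan_A"),
    ("rxn_1", "has_substrate", "donor_X"),
    ("rxn_1", "has_product", "glycan_B"),
    ("rxn_1", "catalyzed_by", "enzyme_class_1"),
]

if __name__ == "__main__":
    print("\n".join(render_reactions(TRIPLES)))
```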

Ongoing and Future Work
- SemURI: a semantic-URI-based provenance scheme using ProPreO
- RDF-based version of the GLYDE schema
- A framework for semantic annotation of experimental data
- Integration of large datasets (~500 MB) into ProPreO for reasoning

Further details at: