Semantics Enabled Industrial and Scientific Applications: Research, Technology and Deployed Applications Part III: Biological Applications Keynote - the.

Presentation transcript:

Semantics Enabled Industrial and Scientific Applications: Research, Technology and Deployed Applications
Part III: Biological Applications
Keynote - the First Online Metadata and Semantics Research Conference, November 23, 2005 http://
Amit Sheth, LSDIS Lab, Department of Computer Science, University of Georgia
Acknowledgement: NCRR funded Bioinformatics of Glycan Expression, collaborators, partners at CCRC (Dr. William S. York) and Satya S. Sahoo, Christopher Thomas, Cartic Ramakrishnan.

Computation, data and semantics in life sciences
“The development of a predictive biology will likely be one of the major creative enterprises of the 21st century.” Roger Brent, 1999
“The future will be the study of the genes and proteins of organisms in the context of their informational pathways or networks.” L. Hood, 2000
“Biological research is going to move from being hypothesis-driven to being data-driven.” Robert Robbins
“We’ll see over the next decade complete transformation (of life science industry) to very database-intensive as opposed to wet-lab intensive.” Debra Goldfarb
We will show how semantics is a key enabler for achieving the above predictions and visions.

Expressiveness Range: Knowledge Representation and Ontologies
[Figure: spectrum from Simple Taxonomies to Expressive Ontologies. Expressiveness levels: Catalog/ID; Terms/glossary; Thesauri (“narrower term” relation); Informal is-a; Formal is-a; Formal instance; Frames (properties); Value Restriction; Disjointness, Inverse, part of…; General Logical constraints. Examples placed along the range: Wordnet, CYC, RDF, DAML, OO, DB Schema, RDFS, IEEE SUO, OWL, UMLS, GO, KEGG, TAMBIS, EcoCyc, BioPAX, GlycO, SWETO, Pharma.]
Ontology Dimensions After McGuinness and Finin

Bioinformatics Apps & Ontologies
- GlycO: a domain ontology for glycan structures, glycan functions and enzymes (embodying knowledge of the structure and metabolisms of glycans)
  - Contains 600+ classes and 100+ properties that describe structural features of glycans; unique population strategy
  - URL:
- ProPreO: a comprehensive process ontology modeling experimental proteomics
  - Contains 330 classes, 40,000+ instances
  - Models three phases of experimental proteomics*: separation techniques, mass spectrometry, and data analysis
  - URL:
- Automatic semantic annotation of high throughput experimental data (in progress)
- Semantic Web Process with WSDL-S for semantic annotations of Web Services
-> Glycomics project (funded by NCRR)

GlycO – A domain ontology for glycans

GlycO

Structural modeling and population challenges in GlycO
- An extremely large number of glycans occur in nature
- But frequently there are only small differences in their structural properties
- Modeling all possible glycans would involve a significant number of redundant classes
- Redundancy results in often fatal complexities in maintenance and upgrade
- Population
  - Manual
  - Extraction and integration from external knowledge sources
  - GlycoTree: exploiting structural composition rules

Ontology population workflow GlycoTree Takahashi, Kato 2003

GlycoTree – A Canonical Representation of N-Glycans
N. Takahashi and K. Kato, Trends in Glycosciences and Glycotechnology, 15:
[Figure: residue labels in the tree] D-GlcpNAc; D-Manp-(1-4)-; D-Manp-(1-6)+; D-GlcpNAc-(1-2)-; D-Manp-(1-3)+; D-GlcpNAc-(1-4)-; D-GlcpNAc-(1-2)+; D-GlcpNAc-(1-6)+
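A canonical tree representation can be sketched in a few lines. The following is a minimal illustration, not the GlycoTree implementation: the `Residue` class, the `canonical` function and the tree topology in `example_tree` are all assumptions made for the example, and only the residue and linkage labels are borrowed from the slide. The point it shows is that two trees built in a different branch order serialize to the same string.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Residue:
    """One monosaccharide residue in a glycan tree (hypothetical model)."""
    name: str                      # e.g. "D-Manp"
    linkage: Optional[str] = None  # linkage to the parent, e.g. "1-4"; None for the root
    children: List["Residue"] = field(default_factory=list)

    def add(self, child: "Residue") -> "Residue":
        self.children.append(child)
        return child


def canonical(node: Residue) -> str:
    """Serialize a subtree with children sorted, so equal trees give equal strings."""
    label = node.name if node.linkage is None else f"{node.name}({node.linkage})"
    if not node.children:
        return label
    inner = ",".join(sorted(canonical(child) for child in node.children))
    return f"{label}[{inner}]"


def example_tree(reverse: bool = False) -> Residue:
    """Build a small illustrative tree; the topology is assumed, not taken from the figure."""
    root = Residue("D-GlcpNAc")
    core = root.add(Residue("D-Manp", "1-4"))
    branches = [Residue("D-Manp", "1-6"), Residue("D-Manp", "1-3")]
    for branch in (reversed(branches) if reverse else branches):
        core.add(branch)
    return root


a = canonical(example_tree())
b = canonical(example_tree(reverse=True))
print(a)  # D-GlcpNAc[D-Manp(1-4)[D-Manp(1-3),D-Manp(1-6)]]
```

Because branch order no longer matters, a canonical string of this kind can serve as a key for detecting that two glycan descriptions denote the same structure.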

Beyond the expressiveness afforded in OWL: probabilistic and more

Example: Mass spectrometry analysis
Manual annotation of mouse kidney spectrum by a human expert. For clarity, only 19 of the major peaks have been annotated.
Goldberg, et al, Automatic annotation of matrix-assisted laser desorption/ionization N-glycan spectra, Proteomics 2005, 5, 865–875

Mass Spectrometry Experiment
Each m/z value in mass spec diagrams can stand for many different structures (uncertainty with respect to the structure that corresponds to a peak):
- Different linkage
- Different bond
- Different isobaric structures

Very subtle differences
- Peak at
- Same molecular composition
- One diverging link
- Found in different organisms: background knowledge (found in honeybee venom or bovine cells) can resolve the uncertainty
- These are core-fucosylated high-mannose glycans
CBank: Honeybee venom
CBank: Bovine

Even in the same organism
- Both glycans found in bovine cells
- Both have a mass of
- Same composition
- Different linkage
- Since expression levels of different genes can be measured in the cell, we can get the probability of each structure in the sample
- Different enzymes lead to these linkages
CBank: CBank: 21982
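One simple way to turn measured expression levels into structure probabilities is to normalize them. The sketch below assumes, purely for illustration, that the probability of each structure is proportional to the expression level of the enzyme that produces its distinguishing linkage; the slide does not state the actual model, and the function name, enzyme labels and numbers are hypothetical.

```python
from typing import Dict


def structure_probabilities(expression: Dict[str, float]) -> Dict[str, float]:
    """Normalize non-negative expression levels into probabilities that sum to 1.

    Assumption: probability is proportional to the expression level of the
    enzyme responsible for each candidate structure's linkage.
    """
    if any(level < 0 for level in expression.values()):
        raise ValueError("expression levels must be non-negative")
    total = sum(expression.values())
    if total <= 0:
        raise ValueError("at least one expression level must be positive")
    return {structure: level / total for structure, level in expression.items()}


# Hypothetical measurements for the enzymes behind two isobaric structures.
levels = {"structure_A": 3.0, "structure_B": 1.0}
probs = structure_probabilities(levels)
print(probs)  # {'structure_A': 0.75, 'structure_B': 0.25}
```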

Model 1: associate probability as part of Semantic Annotation
Annotate the mass spec diagram with all possibilities and assign probabilities according to the scientist’s or tool’s best knowledge

P(S | M = ) = 0.6
P(T | M = ) = 0.4
Goldberg, et al, Automatic annotation of matrix-assisted laser desorption/ionization N-glycan spectra, Proteomics 2005, 5, 865–875
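Model 1 can be pictured as a peak record that carries every candidate structure together with its probability. The sketch below is an assumption about how such an annotation might be stored, not the format used in the project: `PeakAnnotation` is a hypothetical class, and the m/z value is a placeholder because the value on the slide is not preserved. Only the probabilities 0.6 and 0.4 for structures S and T come from the slide.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class PeakAnnotation:
    """A peak annotated with all candidate structures and their probabilities."""
    mz: float
    candidates: Dict[str, float]  # structure id -> P(structure | M = mz)

    def __post_init__(self) -> None:
        total = sum(self.candidates.values())
        if not math.isclose(total, 1.0, abs_tol=1e-9):
            raise ValueError(f"probabilities sum to {total}, expected 1.0")

    def most_likely(self) -> str:
        """Return the candidate with the highest probability."""
        return max(self.candidates, key=self.candidates.get)


# mz=1000.0 is a placeholder, not the value from the slide.
peak = PeakAnnotation(mz=1000.0, candidates={"S": 0.6, "T": 0.4})
print(peak.most_likely())  # S
```

Keeping all candidates, rather than only the most likely one, lets later evidence (for example the source organism) revise the ranking.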

Model 2: Probability in ontological representation of Glycan structure
Build a generalized probabilistic glycan structure that embodies several possible glycans
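A generalized probabilistic structure can be sketched as a tree in which a residue carries several alternative linkages, each with a probability. The code below is illustrative only: `ProbResidue` and `enumerate_structures` are hypothetical names, the residue and linkage labels are made up for the example, and the expansion assumes that linkage choices at different residues are independent, which the slide does not claim.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Dict, List, Tuple


@dataclass
class ProbResidue:
    """A residue whose linkage to its parent may be uncertain (hypothetical model)."""
    name: str
    linkage_probs: Dict[str, float] = field(default_factory=dict)  # linkage -> probability
    children: List["ProbResidue"] = field(default_factory=list)


def enumerate_structures(node: ProbResidue) -> List[Tuple[str, float]]:
    """Expand the probabilistic tree into (concrete structure, probability) pairs."""
    own_options = list(node.linkage_probs.items()) or [("", 1.0)]
    child_options = [enumerate_structures(child) for child in node.children]
    results = []
    for linkage, p_link in own_options:
        label = f"{node.name}({linkage})" if linkage else node.name
        # product() of an empty list yields one empty combination (a leaf).
        for combo in product(*child_options):
            prob = p_link
            parts = []
            for text, p_child in combo:
                prob *= p_child  # independence assumption
                parts.append(text)
            body = f"{label}[{','.join(parts)}]" if parts else label
            results.append((body, prob))
    return results


# Illustrative structure: one residue attached by one of two possible linkages.
root = ProbResidue("GlcpNAc", children=[ProbResidue("Fucp", {"1-3": 0.6, "1-6": 0.4})])
structures = enumerate_structures(root)
for text, prob in structures:
    print(text, prob)
```

One generalized structure then stands for every concrete glycan it can expand to, which avoids modeling each variant as a separate class.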

N-Glycosylation Process (NGP)
[Figure: workflow diagram. Materials: Cell Culture; Glycoprotein Fraction; Glycopeptides Fraction (1, n); Peptide Fraction (n, n*m). Steps: extract; proteolysis; Separation technique I; PNGase; Separation technique II; Mass spectrometry; Data reduction; binning; Peptide identification; Data correlation; Signal integration; Glycopeptide identification and quantification. Data: ms data; ms/ms data; ms peaklist; ms/ms peaklist; Peptide list; N-dimensional array.]

Phase II: Ontology Population
- Populate ProPreO with all experimental datasets?
- Two levels of ontology population for ProPreO:
  - Level 1: Populate the ontology with instances that are stable across experimental runs. Ex: Human tryptic peptides, 40,000 instances in ProPreO
  - Level 2: Use of URIs to point to actual experimental datasets
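The two population levels can be illustrated with a toy triple store. This is a sketch under stated assumptions, not ProPreO's storage layer: `TinyOntology`, the `ex:hasDataset` property, the instance names and the example.org URI are all hypothetical. Level 1 adds a stable instance directly; Level 2 stores only a URI that points at the experimental dataset.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


class TinyOntology:
    """A minimal in-memory triple store used only for illustration."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add_instance(self, instance: str, cls: str) -> None:
        """Level 1: assert a stable instance of an ontology class."""
        self.triples.add((instance, "rdf:type", cls))

    def link_dataset(self, instance: str, uri: str) -> None:
        """Level 2: point to an external dataset instead of importing its contents."""
        self.triples.add((instance, "ex:hasDataset", uri))

    def instances_of(self, cls: str) -> List[str]:
        return sorted(s for s, p, o in self.triples if p == "rdf:type" and o == cls)

    def datasets_of(self, instance: str) -> List[str]:
        return sorted(o for s, p, o in self.triples if s == instance and p == "ex:hasDataset")


onto = TinyOntology()
onto.add_instance("peptide_0001", "human_tryptic_peptide")      # Level 1
onto.add_instance("run_42", "experimental_run")
onto.link_dataset("run_42", "http://example.org/data/run_42")   # Level 2

print(onto.instances_of("human_tryptic_peptide"))
print(onto.datasets_of("run_42"))
```

Keeping run data behind URIs keeps the ontology itself small while still letting queries reach the experimental results.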

Ontology-mediated Proteomics Protocol
[Figure: protocol diagram. Labels: Mass Spectrometer; Instrument; Data Processing Application; DB; Conversion To PKL; Preprocessing; DB Search; Post processing; Storing Output. Files: RAW Files; PKL Files (XML-based Format); ‘Clean’ PKL Files; RAW Results File Output (*.dat).]
ProPreO concepts shown: Micromass_Q_TOF_ultima_quadrupole_time_of_flight_mass_spectrometer; Masslynx_Micromass_application; mass_spec_raw_data; Micromass_Q_TOF_micro_quadrupole_time_of_flight_ms_raw_data; produces_ms-ms_peak_list
All values of the produces ms-ms peaklist property are micromass pkl ms-ms peaklist
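The statement that all values of a property belong to one class corresponds to a universal (allValuesFrom-style) restriction. The sketch below checks such a restriction over plain tuples. It is a closed-world check written for illustration, so a value with no recorded type counts as a violation, whereas an OWL reasoner under open-world semantics would not conclude that. The function name, the class name `micromass_pkl_ms-ms_peaklist` and the instance names are assumptions; the property and application names are taken from the slide.

```python
from typing import Dict, List, Set, Tuple

Assertion = Tuple[str, str, str]  # (subject, property, value)


def restriction_violations(
    assertions: List[Assertion],
    prop: str,
    allowed_class: str,
    types: Dict[str, Set[str]],
) -> List[Tuple[str, str]]:
    """Return (subject, value) pairs whose value is not typed with allowed_class."""
    return [
        (subject, value)
        for subject, p, value in assertions
        if p == prop and allowed_class not in types.get(value, set())
    ]


# Hypothetical instance data.
types = {
    "peaklist_001": {"micromass_pkl_ms-ms_peaklist"},
    "peaklist_002": {"some_other_peaklist"},
}
assertions = [
    ("Masslynx_Micromass_application", "produces_ms-ms_peak_list", "peaklist_001"),
    ("Masslynx_Micromass_application", "produces_ms-ms_peak_list", "peaklist_002"),
]

violations = restriction_violations(
    assertions, "produces_ms-ms_peak_list", "micromass_pkl_ms-ms_peaklist", types
)
print(violations)  # [('Masslynx_Micromass_application', 'peaklist_002')]
```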

Semantic Annotation of Scientific Data
ms/ms peaklist data
<parameter instrument="micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer" mode="ms/ms"/>
Annotated ms/ms peaklist data

Semantic annotation of Scientific Data
Annotated ms/ms peaklist data
<parameter instrument="micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer" mode="ms/ms"/>
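An annotated element of this kind can be read with a standard XML parser. The following sketch uses Python's `xml.etree.ElementTree`; the `read_annotation` helper and the `known_concepts` set are hypothetical, and the check against the set simply stands in for looking the instrument name up in the ontology.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Set

snippet = (
    '<parameter instrument="micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer" '
    'mode="ms/ms"/>'
)

# Stand-in for the set of instrument concepts defined in the ontology (assumed).
known_concepts: Set[str] = {"micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer"}


def read_annotation(xml_text: str, concepts: Set[str]) -> Dict[str, object]:
    """Parse one annotated <parameter> element and check its instrument concept."""
    element = ET.fromstring(xml_text)
    instrument = element.get("instrument")
    return {
        "instrument": instrument,
        "mode": element.get("mode"),
        "instrument_is_known_concept": instrument in concepts,
    }


annotation = read_annotation(snippet, known_concepts)
print(annotation["mode"])                         # ms/ms
print(annotation["instrument_is_known_concept"])  # True
```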

Service description using WSDL-S
- Formalize description and classification of Web Services using ProPreO concepts
- Description of a Web Service using: Web Service Description Language
WSDL ModifyDB:
<wsdl:definitions targetNamespace="urn:ngp" ….. xmlns:xsd=" <schema targetNamespace="urn:ngp" xmlns=" …..
WSDL-S ModifyDB:
<wsdl:definitions targetNamespace="urn:ngp" …… xmlns:wssem=" xmlns:ProPreO=" > <schema targetNamespace="urn:ngp" xmlns=" …… <wsdl:message name="replaceCharacterRequest" wssem:modelReference="ProPreO#peptide_sequence">
ProPreO process Ontology: data, sequence, peptide_sequence (Concepts defined in process Ontology)
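The effect of the `wssem:modelReference` attribute can be shown by extracting it from a small, well-formed document. The sketch below is an illustration rather than the project's tooling: the namespace URIs are elided on the slide, so `WSSEM_NS` is a hypothetical placeholder URI, the sample document is a reduced stand-in for the ModifyDB description, and `model_references` is a made-up helper name.

```python
import xml.etree.ElementTree as ET
from typing import Dict

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"  # standard WSDL 1.1 namespace
WSSEM_NS = "urn:example:wssem"                # hypothetical; the real URI is elided on the slide

sample = f"""<wsdl:definitions targetNamespace="urn:ngp"
    xmlns:wsdl="{WSDL_NS}" xmlns:wssem="{WSSEM_NS}">
  <wsdl:message name="replaceCharacterRequest"
      wssem:modelReference="ProPreO#peptide_sequence"/>
</wsdl:definitions>"""


def model_references(xml_text: str) -> Dict[str, str]:
    """Map each annotated message name to the ontology concept it references."""
    root = ET.fromstring(xml_text)
    references = {}
    for message in root.iter(f"{{{WSDL_NS}}}message"):
        reference = message.get(f"{{{WSSEM_NS}}}modelReference")
        if reference is not None:
            references[message.get("name")] = reference
    return references


refs = model_references(sample)
print(refs)  # {'replaceCharacterRequest': 'ProPreO#peptide_sequence'}
```

The attribute leaves the WSDL structure untouched and only adds a pointer from a message to an ontology concept, which is what lets services be classified by the concepts they consume and produce.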

Summary, Observations, Conclusions
- Ontology schema: relatively simple in business/industry, highly complex in science
- Ontology population: could have millions of assertions, or unique features when modeling complex life science domains
- Ontology population could be largely automated if access to high quality/curated data/knowledge is available; ontology population involves disambiguation and results in a richer representation than the extracted sources; rule-based population
- Ontology freshness (and validation: not just schema correctness but knowledge, that is, how it reflects the changing world)

Summary, Observations, Conclusions
Some applications: semantic search, semantic integration, semantic analytics, decision support and validation (e.g., error prevention in healthcare), knowledge discovery, process/pathway discovery, …

More information at