Computational Challenges in Metabolomics (Part 1)

Presentation transcript:

Computational Challenges in Metabolomics (Part 1)
David Wishart, University of Alberta
Dagstuhl Seminar on Computational Mass Spectrometry
Schloss Dagstuhl, Germany, Aug. 23-28, 2015

The Pyramid of Life
[Figure: pyramid linking Genome (genomics), Proteome (proteomics) and Metabolome (metabolomics), with arrows labeled Physiological Influence and Environmental Influence]

Why Small Molecules Count
- 100% of all agricultural products (herbicides, pesticides, fertilizers) are small molecules
- >99% of all compounds that give food or drinks their aroma, color and taste are small molecules
- 91% of all known drugs are small molecules
- >85% of all common clinical assays test for small molecules
- 60% of all drugs are derived from pre-existing metabolites
- 10-15% of identified genetic disorders involve diseases of small molecule metabolism

Proteomics vs. Metabolomics

Proteomics:
- Very MS or MS/MS oriented
- Good separation is critical
- Generates lots of raw data
- Peptide and protein ID
- Isotopic labeling (ICAT) helps
- Possible to derive 3D structure
- Permits protein imaging
- Very dependent on databases
- Spectral processing and deconvolution is challenging
- Quantitation is challenging
- Data analysis requires MV stats
- Data integration is challenging
- Better software is needed

Metabolomics:
- Very MS or MS/MS oriented
- Good separation is critical
- Generates lots of raw data
- Chemical ID
- Isotopic labeling (SIL) helps
- Possible to derive 3D structure
- Permits metabolite imaging
- Very dependent on databases
- Spectral processing and deconvolution is challenging
- Quantitation is challenging
- Data analysis requires MV stats
- Data integration is challenging
- Better software is needed

Proteomics Workflow
[Figure: Biofluid/Extracts -> HPLC or PAGE -> Tryptic Digest -> MALDI plate -> MS analysis -> Mass Fingerprint -> Protein ID]

Protein ID by PMF-MS

Metabolomics Workflow
[Figure: Biological or Tissue Samples -> Extraction -> Biofluids or Extracts -> LC-MS or GC-MS -> LC/GC-MS Spectra -> Compound ID]

Compound ID by GC/LC-MS
[Figure: LC/GC-MS total ion chromatogram with an identified compound structure]

Proteomics vs. Metabolomics

Proteomics:
- Polymers of 20 amino acids (chemically similar)
- 185 million sequences (from DNA sequencing)
- Sequence defines MS & MS/MS spectra
- Trypsin gives definable cleavages
- MS alone can ID proteins (PMF)
- MS/MS fragmentation at 1 fixed energy
- MS/MS fragmentation is easily predictable and very distinct
- 30 common PTMs
- PTMs are somewhat predictable

Metabolomics:
- 1000s of distinct chemical classes (chemically diverse)
- No information from DNA sequencing
- Structure defines MS & MS/MS spectra (adducts, fragments)
- No trypsin for small molecules (CID only)
- MS alone cannot ID metabolites
- Different energies for different molecules
- MS/MS & EI-MS fragments not easily predictable, often similar
- >400 PTMs via metabolism
- PTMs are hard to predict
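The point that trypsin gives definable cleavages, and that MS alone can therefore identify a protein by its peptide mass fingerprint (PMF), can be illustrated with a minimal sketch. It assumes the commonly used rule of thumb that trypsin cleaves after K or R unless the next residue is P, uses standard monoisotopic residue masses, and ignores missed cleavages and modifications. The function names and the toy sequence are hypothetical and do not come from the slides.

```python
import re

# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def tryptic_digest(sequence):
    """Split after K or R, except when the next residue is P (assumed rule)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mass_fingerprint(sequence):
    """Sorted list of tryptic peptide masses: the predicted PMF."""
    return sorted(round(peptide_mass(p), 4) for p in tryptic_digest(sequence))


if __name__ == "__main__":
    toy_protein = "GAKRPGRAG"  # hypothetical sequence, for illustration only
    print(tryptic_digest(toy_protein))
    print(mass_fingerprint(toy_protein))
```

No comparable rule exists for small molecules, which is why the metabolomics column lists "No trypsin for small molecules" and "MS alone cannot ID metabolites".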

Challenges for Metabolomics
- Most MS-based metabolomics studies ID <100 cmpds (<1% of the known metabolome)
- Metabolite ID requires accurate, referential MS/MS or EI-MS spectra and/or RT information
- Limited experimental MS/MS, EI-MS & RT data
- The chemical space of most metabolomes is not fully known (perhaps >5 million compounds total)
- <1% of the chemicals in PubChem are relevant to metabolomics
- Metabolomics needs specialized compound and spectral (MS/MS, EI-MS, NMR) databases
- Metabolomics needs computational tools to predict biologically viable metabolites and their spectra

LC-MS Spectral DBs
- MoNA: 236,604 spectra, 69,946 cmpds** (12,000)
- METLIN: 68,124 spectra, 13,048 cmpds
- mzCloud: 422,349 spectra, 2975 cmpds
- NIST14 MS/MS: 234,284 spectra, 9344 cmpds
- MassBank: 28,185 spectra, 11,500 cmpds
- Wiley LC-MSn: >10,000 spectra, 4500 poisons
- ReSpect: 9107 spectra, 3595 cmpds
- GNPS: 9000 spectra, 4200 natural products
Total #compounds with exp. MS/MS spectra: ~20,000
Less than 60% are biologically relevant

How to Get Missing Spectra?
- Obtain or synthesize all biologically relevant molecules (metabolites, HPVs, drugs, pollutants, foods, etc.), prepare or synthesize their metabolites and collect their NMR, LC-MS and GC-MS spectra
  COST: 5,000,000 cmpds x $1000/cmpd = $5 billion
OR
- Do this entire exercise computationally
  COST: 5,000,000 cmpds x $0.10/cmpd = $500,000
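The two cost estimates above can be checked with a few lines of arithmetic. This is only a sketch that uses the per-compound figures quoted on the slide; the variable names are illustrative, and costs are kept in integer cents to avoid floating-point rounding.

```python
N_COMPOUNDS = 5_000_000

# Per-compound costs from the slide, expressed in cents.
EXPERIMENTAL_CENTS = 1000 * 100   # $1000 per compound
COMPUTATIONAL_CENTS = 10          # $0.10 per compound


def total_dollars(n_compounds, cents_per_compound):
    """Total cost in whole dollars."""
    return n_compounds * cents_per_compound // 100


experimental_total = total_dollars(N_COMPOUNDS, EXPERIMENTAL_CENTS)
computational_total = total_dollars(N_COMPOUNDS, COMPUTATIONAL_CENTS)
cost_ratio = experimental_total // computational_total

if __name__ == "__main__":
    print(f"Experimental:  ${experimental_total:,}")
    print(f"Computational: ${computational_total:,}")
    print(f"Ratio: {cost_ratio:,}x")
```

On these figures the experimental route costs 10,000 times more than the computational one.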

Computational Metabolomics
1. Known biomolecules (50,000)
2. Predicted biotransformations (50,000 --> 5,000,000)
3. Predicted MS/MS, NMR, GC-MS spectra of knowns + biotransformed
4. Match observed spectra to predicted spectra to ID compounds
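The final step, matching an observed spectrum against predicted spectra, can be sketched as follows. The slides do not say which scoring function is used; cosine similarity over binned peaks is assumed here because it is a common choice for spectral matching. The function names, the 1 Da bin width and the peak lists are all hypothetical.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum intensities of (m/z, intensity) peaks falling into the same m/z bin."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[round(mz / bin_width)] += intensity
    return binned


def cosine_score(spec_a, spec_b, bin_width=1.0):
    """Cosine similarity between two binned spectra (0 = no overlap, 1 = identical)."""
    a = bin_spectrum(spec_a, bin_width)
    b = bin_spectrum(spec_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(observed, predicted_library):
    """Name of the predicted spectrum most similar to the observed one."""
    return max(predicted_library,
               key=lambda name: cosine_score(observed, predicted_library[name]))


if __name__ == "__main__":
    # Made-up peak lists, for illustration only.
    library = {
        "candidate_A": [(100.0, 1.0), (150.0, 0.5)],
        "candidate_B": [(80.0, 1.0), (120.0, 0.3)],
    }
    observed = [(100.0, 0.9), (150.0, 0.6)]
    print(best_match(observed, library))
```

In practice a finer bin width or a tolerance-based peak alignment would be used; the coarse bins here only keep the example short.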

The Human Metabolome Database (HMDB)
- A web-accessible resource containing detailed information on 41,993 “quantified”, “detected” and “expected” metabolites
- Data mined from the literature and other eDBs
- 100’s of drug metabolites
- 1000’s of xenobiotics
- >10,000 reference spectra
- Supports sequence, spectral, structure and text searches as well as compound browsing
- Full data downloads
http://www.hmdb.ca

The Drug Database (DrugBank v. 4.3)
- 1602 small molecule drugs
- >5000 experimental drugs
- Data mined from the literature and other eDBs
- >1000 drugs with metabolizing enzyme data
- >1200 drug metabolites
- >600 MS+NMR spectra
- >4200 unique drug targets
- 208 data fields/drug
- Supports sequence, spectral, structure and text searches as well as compound browsing
- Full data downloads
http://www.drugbank.ca

The Toxic Exposome Database (T3DB)
- Comprehensive data on toxic compounds (drugs, pesticides, herbicides, endocrine disruptors, solvents, carcinogens, etc.)
- Data mined from the literature and other eDBs
- >3600 toxic compounds
- >1900 reference spectra
- ~2100 toxic targets
- Supports sequence, spectral, structure and text searches as well as compound browsing
- Full data downloads
http://www.t3db.ca

Secondary Metabolism
[Figure: metabolism of diazepam; structures shown: Diazepam, Temazepam, Nordazepam, Oxazepam, N-(2-Benzoyl-4-chlorophenyl)-2-acetamidoacetamide]

BioTransformer

BioTransformer - Flowchart
[Figure: flowchart with the boxes Query Molecule; Enzyme metabolite? (Machine Learning); SOM Predictor (Machine Learning); Reaction-specific structural constraints; Metabolite Generator; Metabolites. Branches are labeled Phase I and Other Reactions, with YES/NO decisions; NO outcomes lead to "No SOMs" or "No metabolites".]
All structures are generated as SMILES, SDF or MOL files
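One plausible reading of the flowchart's decision logic is sketched below. This is not BioTransformer's actual code or API: the order of the checks is inferred from the box labels, and every function name and stub is hypothetical. Each stage is passed in as a callable so the control flow can be seen on its own.

```python
def predict_metabolites(query_smiles, is_enzyme_substrate, predict_soms,
                        generate_metabolites):
    """Assumed control flow: substrate check -> SOM prediction -> generation.

    All three callables are hypothetical stand-ins for the flowchart boxes:
      is_enzyme_substrate(smiles) -> bool            ("Enzyme metabolite?")
      predict_soms(smiles) -> list of atom indices   ("SOM Predictor")
      generate_metabolites(smiles, soms) -> list     ("Metabolite Generator")
    """
    if not is_enzyme_substrate(query_smiles):
        return []  # "NO" branch: no metabolites
    soms = predict_soms(query_smiles)
    if not soms:
        return []  # "NO SOMs" branch: no metabolites
    return generate_metabolites(query_smiles, soms)


if __name__ == "__main__":
    # Toy stubs, for illustration only; real predictors would be trained models.
    result = predict_metabolites(
        "CCO",
        is_enzyme_substrate=lambda smiles: True,
        predict_soms=lambda smiles: [0],
        generate_metabolites=lambda smiles, soms: [f"{smiles}_oxidized_at_{i}" for i in soms],
    )
    print(result)
```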

BioTransformer - SOM Prediction
- Preference Learning based on 100 atomic (e.g. atom type) and 10 molecular features (e.g. mass)
- SOM predictor was trained for 9 CYP450s
- Average prediction accuracy of 84.54%
- Structures generated based on 92 Phase I reactions

BioTransformer Results
5,000 compounds
-> 6,230 Phase I metabolites
-> 9,510 Phase II metabolites
-> 6,110 Microbial metabolites
-> 12,340 Other metabolites
34,000 metabolites
~220,000
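As a quick consistency check, the four category counts on this slide can be summed and compared with the quoted total. The snippet uses only the slide's numbers; the variable names are illustrative.

```python
PARENT_COMPOUNDS = 5_000

# Predicted metabolite counts per category, as given on the slide.
metabolite_counts = {
    "Phase I": 6_230,
    "Phase II": 9_510,
    "Microbial": 6_110,
    "Other": 12_340,
}

total_metabolites = sum(metabolite_counts.values())
metabolites_per_parent = total_metabolites / PARENT_COMPOUNDS

if __name__ == "__main__":
    print(f"Total predicted metabolites: {total_metabolites:,}")
    print(f"Average per parent compound: {metabolites_per_parent:.1f}")
```

The categories sum to 34,190, which matches the rounded figure of 34,000 metabolites, or roughly seven predicted metabolites per parent compound.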

Computational Challenges in Metabolomics (Part 2)
Sebastian Böcker, Friedrich Schiller University
Dagstuhl Seminar on Computational Mass Spectrometry
Schloss Dagstuhl, Germany, Aug. 23-28, 2015