Previous Lecture: Regression and Correlation

This Lecture: Introduction to Biostatistics and Bioinformatics – Proteomics Informatics

Proteomics Informatics – Learning Objectives: structure of mass spectrometry data; protein identification; protein quantitation.

Protein Identification and Quantitation by Mass Spectrometry: samples → peptides → mass spectrometry (intensity vs. m/z) → peptide identity and quantity.

Sample preparation for protein identification, characterization and quantitation: lysis → fractionation → digestion → mass spectrometry.

Overview of Mass Spectrometry: ion source → mass analyzer → detector, producing a spectrum of intensity vs. mass/charge (m/z).

Mass Spectrometry (MS)

Example data – MALDI-TOF (figure: peptide intensity vs. m/z).

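Peptide Fragmentation: ion source → mass analyzer 1 → fragmentation → mass analyzer 2 → detector. The first analyzer selects a precursor ion. Fragmentation of the precursor's backbone gives b ions, which contain the N-terminus, and y ions, which contain the C-terminus. The second analyzer measures these fragments.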

Liquid Chromatography (LC)-MS/MS: ion source → mass analyzer 1 → fragmentation → mass analyzer 2 → detector. The measurement is repeated as peptides elute from the LC column, giving a series of spectra (intensity vs. mass/charge) as a function of time.

Example data – ESI-LC-MS/MS (figure: peptide intensity vs. m/z vs. time in MS, and fragment intensity vs. m/z in MS/MS of a doubly charged [M+2H]2+ precursor).

Charge-State Distributions: for peptides, MALDI mainly produces singly charged (1+) ions, while ESI produces several charge states (1+ to 4+). For intact proteins, MALDI gives low charge states (1+ to 5+), while ESI gives a broad envelope of high charge states (e.g., 27+ to 31+).

Charge-State Example: let M be the molecular mass, n the number of charges and H the mass of a proton (≈1 Da). The observed value is then $m/z = (M + nH)/n$. For a peptide of mass 898 Da:
- carrying 1 H+: (898 + 1)/1 = 899 m/z;
- carrying 2 H+: (898 + 2)/2 = 450 m/z;
- carrying 3 H+: (898 + 3)/3 = 300.3 m/z.
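A minimal Python sketch of the charge-state relation above. It assumes the monoisotopic proton mass of 1.007276 Da, which the slide rounds to 1. The function names are illustrative and do not come from any library:

```python
# Monoisotopic proton mass in Da (the slide rounds this to 1).
PROTON_MASS = 1.007276

def mz_for_charge(neutral_mass, charge, proton=PROTON_MASS):
    """m/z of an [M + nH]n+ ion: (M + n*H) / n."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * proton) / charge

def neutral_mass_from_mz(mz, charge, proton=PROTON_MASS):
    """Inverse relation: M = n * (m/z - H)."""
    return charge * (mz - proton)

for n in (1, 2, 3):
    # proton=1.0 reproduces the rounded slide values
    print(n, round(mz_for_charge(898, n, proton=1.0), 1))
# prints: 1 899.0 / 2 450.0 / 3 300.3
```

Passing `proton=1.0` reproduces the slide's rounded values. Once the charge is known, the inverse function recovers the neutral mass from an observed m/z.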

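Isotope Distributions: peptides are built mainly from 12C, 1H, 14N, 16O and 32S. Heavier isotopes give additional peaks at +1 Da, +2 Da and +3 Da. Natural abundances of the heavier isotopes: 0.015% 2H; 1.11% 13C; 0.366% 15N; 0.038% 17O, 0.200% 18O; 0.75% 33S, 4.21% 34S, 0.02% 36S.

Considering only 12C and 13C ($p = 0.0111$), let $n$ be the number of C atoms in the peptide, $m$ the number of 13C atoms, and $T_m$ the relative intensity of the peptide with $m$ 13C atoms. Then $T_m = \binom{n}{m} p^m (1-p)^{n-m}$.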
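The binomial formula can be evaluated directly. This sketch considers carbon only, as the slide does. The 50-carbon peptide is a hypothetical example:

```python
from math import comb

P_13C = 0.0111  # 13C abundance used on the slide

def carbon_isotope_distribution(n_carbons, max_heavy=4, p=P_13C):
    """Relative intensities T_m = C(n, m) p^m (1 - p)^(n - m) for m = 0..max_heavy
    13C atoms in a peptide with n_carbons carbon atoms (other elements ignored)."""
    max_heavy = min(max_heavy, n_carbons)
    return [comb(n_carbons, m) * p**m * (1 - p) ** (n_carbons - m)
            for m in range(max_heavy + 1)]

# Hypothetical peptide with 50 carbon atoms.
dist = carbon_isotope_distribution(50)
for m, t in enumerate(dist):
    print(f"+{m} Da: T_{m} = {t:.3f} ({t / dist[0]:.2f} x monoisotopic)")
```

Since $T_1/T_0 = np/(1-p)$, the +1 peak grows relative to the monoisotopic peak as peptides get larger. Including H, N, O and S would require combining (convolving) the isotope distributions of all elements.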

Isotope Clusters and Charge State: the isotope peaks of a 1+ ion are spaced by 1 m/z unit, those of a 2+ ion by 0.5 and those of a 3+ ion by 0.33. In general the spacing is about 1/z.

What is the Charge State?
Cluster 1: 713.3225, 713.8239, 714.3251, 714.8263. The isotopes are spaced by 0.5 (in m/z), so the charge is 2+.
Cluster 2: 432.8990, 433.2330, 433.5671, 433.9014. The isotopes are spaced by 0.33, so the charge is 3+.
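This reasoning can be automated. Adjacent isotope peaks are separated by the 13C-12C mass difference (about 1.00336 Da) divided by the charge. The sketch below uses the two clusters from this slide and assumes the first peak of each cluster is the monoisotopic peak:

```python
from statistics import mean

C13_C12_DIFF = 1.003355  # mass difference between 13C and 12C, Da
PROTON_MASS = 1.007276

def charge_from_isotope_spacing(mz_values):
    """Estimate the charge from the mean m/z spacing of an isotope cluster
    (spacing is approximately 1.003355 / z)."""
    spacings = [b - a for a, b in zip(mz_values, mz_values[1:])]
    return round(C13_C12_DIFF / mean(spacings))

cluster_a = [713.3225, 713.8239, 714.3251, 714.8263]
cluster_b = [432.8990, 433.2330, 433.5671, 433.9014]
for peaks in (cluster_a, cluster_b):
    z = charge_from_isotope_spacing(peaks)
    # Neutral mass, assuming the first peak is the monoisotopic one
    mass = z * (peaks[0] - PROTON_MASS)
    print(f"z = {z}+, M = {mass:.4f} Da")
```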

Protein Identification by Mass Spectrometry: samples → peptides → mass spectrometry (intensity vs. m/z) → identity.

Protein Identification - Exercise
1. Protein identification: NUP1 was genomically tagged with protein A and affinity purified under two conditions. The resulting protein mixtures were analyzed by liquid chromatography mass spectrometry (LC-MS). Search the resulting spectra (NUP1-less-stringent-wash.mgf, NUP1-more-stringent-wash.mgf) using X! Tandem (http://h.thegpm.org/tandem/thegpm_tandem.html). Change the taxon to “S. cerevisiae (budding yeast)”, but otherwise keep the default parameter settings.
a. Look at the list of identified proteins and explain why they are found in this sample. More information is available by selecting the “go”, “path”, “ppi”, “doms” and “string” tabs at the top of the page.
b. Select the “mh” display at the top right of the page and zoom in to +/-100 ppm (the default mass accuracy setting used in the search). What precursor mass accuracy should we have used? Zoom in further and determine what precursor mass accuracy could have been used if the spectra had been recalibrated, i.e., with the error distribution centered at zero.
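For part b, the mass error of each peptide-spectrum match is usually expressed in ppm. Recalibration amounts to removing the systematic offset of the error distribution. The sketch below is hedged: the m/z pairs are hypothetical and not taken from the NUP1 files, and using the median as the offset estimate is an assumption:

```python
from statistics import median

def ppm_error(observed_mz, theoretical_mz):
    """Mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def required_tolerance(errors_ppm, recalibrate=False):
    """Smallest symmetric window (+/- ppm) containing all errors.
    With recalibrate=True the systematic offset (median error) is removed first,
    which centres the error distribution at zero."""
    offset = median(errors_ppm) if recalibrate else 0.0
    return max(abs(e - offset) for e in errors_ppm)

# Hypothetical (observed, theoretical) precursor m/z pairs; illustrative only.
pairs = [(800.4052, 800.4020), (1200.6110, 1200.6060), (650.3301, 650.3275)]
errors = [ppm_error(o, t) for o, t in pairs]
print(f"uncalibrated: +/-{required_tolerance(errors):.1f} ppm")
print(f"recalibrated: +/-{required_tolerance(errors, recalibrate=True):.1f} ppm")
```

In real data, a few outlying (often false) matches inflate the maximum. Reading the window off the bulk of the error distribution, as the exercise suggests, is therefore more robust.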

Identification – Tandem MS

Tandem MS – Sequence Confirmation: the MS/MS spectrum (% relative abundance vs. m/z) is compared with the fragment ions expected for the candidate sequence (residues S, G, F, L, E, D, K).

Tandem MS – Sequence Confirmation: b ions (N-terminal fragments) of the candidate sequence appear at m/z 88 (S), 145 (G), 292 (F), 405, 534, 663, 778 (D), 907 (E) and 1020 (L). The intact peptide (adding K) has [M+H]+ = 1166.

Tandem MS – Sequence Confirmation: y ions (C-terminal fragments) appear at m/z 147 (K), 260, 389, 504, 633, 762, 875, 1022 and 1080, shown together with the b ions.

Tandem MS – Sequence Confirmation: the expected b and y ions are matched to the peaks in the observed MS/MS spectrum of the doubly charged precursor [M+2H]2+. The matched peaks are at 260, 292, 389, 405, 504, 534, 633, 663, 762, 778, 875, 907, 1020, 1022 and 1080.

Tandem MS – Sequence Confirmation: two pairs of adjacent peaks differ by 113, the residue mass of L (or its isobar I).

Tandem MS – Sequence Confirmation: two pairs of adjacent peaks differ by 129, the residue mass of E.
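The b- and y-ion ladders on these slides can be reproduced from monoisotopic residue masses. This sketch assumes the sequence SGFLEEDELK. That sequence is consistent with the de novo result SGF(I/L)EEDE(I/L)(K/Q) derived from the same spectrum, with L and K chosen for illustration:

```python
# Monoisotopic residue masses (Da) for the residues used here.
RESIDUE_MASS = {
    "G": 57.02146, "S": 87.03203, "L": 113.08406, "I": 113.08406,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "F": 147.06841,
}
PROTON = 1.00728
WATER = 18.01056

def b_ions(peptide):
    """Singly charged b1..b(n-1): sum of N-terminal residues + proton."""
    masses, total = [], 0.0
    for residue in peptide[:-1]:
        total += RESIDUE_MASS[residue]
        masses.append(total + PROTON)
    return masses

def y_ions(peptide):
    """Singly charged y1..y(n-1): sum of C-terminal residues + water + proton."""
    masses, total = [], 0.0
    for residue in reversed(peptide[1:]):
        total += RESIDUE_MASS[residue]
        masses.append(total + WATER + PROTON)
    return masses

def mh(peptide):
    """[M+H]+ of the intact peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER + PROTON

peptide = "SGFLEEDELK"  # assumed sequence, see lead-in
print("b:", [round(m, 1) for m in b_ions(peptide)])
print("y:", [round(m, 1) for m in y_ions(peptide)])
print("[M+H]+:", round(mh(peptide), 2))
```

The computed values agree with the nominal slide values to within about 0.5 Da. For example, y9 is 1079.53, which this slide rounds to 1080 and the de novo slide gives as 1079.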

Tandem MS – de novo Sequencing: without a candidate sequence, the mass differences between peaks in the spectrum (precursor [M+2H]2+; peaks at 260, 292, 389, 405, 504, 534, 633, 663, 762, 778, 875, 907, 1020, 1022, 1080) are compared with the amino acid masses. This yields the sequences consistent with the spectrum.

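Tandem MS – de novo Sequencing: the mass differences between peaks give the partial sequence …GF(I/L)EEDE(I/L)…, or …(I/L)EDEE(I/L)FG… when read in the opposite direction. The residues at the ends of the ladder (X) are still unknown. The peptide M+H is 1166.
- N-terminus: 1166 – 1079 = 87, which matches S, giving SGF(I/L)EEDE(I/L)….
- C-terminus: 1166 – 1020 – 18 (water) = 128, which matches K or Q, giving SGF(I/L)EEDE(I/L)(K/Q).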
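The gap-reading step can be sketched as a lookup of each peak-to-peak mass difference in a table of monoisotopic residue masses. The 0.5 Da tolerance is an assumption, chosen to suit the nominal masses on the slide:

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "I": 113.08406,
    "L": 113.08406, "N": 114.04293, "D": 115.02694, "K": 128.09496,
    "Q": 128.05858, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def residues_for_gap(delta, tolerance=0.5):
    """All amino acids whose residue mass matches a mass difference."""
    return "/".join(aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tolerance)

def read_ladder(peaks, tolerance=0.5):
    """Translate consecutive differences of a fragment-ion ladder into residues."""
    peaks = sorted(peaks)
    return [residues_for_gap(b - a, tolerance) or "?" for a, b in zip(peaks, peaks[1:])]

b_ladder = [88, 145, 292, 405, 534, 663, 778, 907, 1020]  # nominal b ions from the slide
print(read_ladder(b_ladder))               # ['G', 'F', 'I/L', 'E', 'E', 'D', 'E', 'I/L']
print(residues_for_gap(1166 - 1079))       # S   (N-terminal residue)
print(residues_for_gap(1166 - 1020 - 18))  # K/Q (C-terminal residue)
```

With nominal masses, I/L (both 113.084) and K/Q (128.095 vs. 128.059) stay ambiguous, exactly as on the slide. Only high-accuracy data can separate K from Q.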

Tandem MS – de novo Sequencing: challenges in de novo sequencing include neutral losses (-H2O, -NH3), modifications, background peaks and incomplete information.

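Tandem MS – Database Search: the sample is lysed, fractionated, digested and analyzed by LC-MS and MS/MS. For each protein in the sequence database, digest the protein in silico, pick a peptide and compute all of its fragment masses. Repeat this for all peptides and all proteins. Compare the computed fragment masses with each MS/MS spectrum, score the matches, and test their significance.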
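A toy version of this loop is sketched below. The two-protein database is hypothetical. The score (the number of observed peaks explained by b/y ions) is a deliberately simple stand-in, not the scoring or significance testing used by X! Tandem or any other search engine:

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "I": 113.08406,
    "L": 113.08406, "N": 114.04293, "D": 115.02694, "K": 128.09496,
    "Q": 128.05858, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON, WATER = 1.00728, 18.01056

def tryptic_peptides(protein):
    """In-silico trypsin digest: cleave after K or R unless followed by P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]

def precursor_mh(peptide):
    """[M+H]+ of a peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER + PROTON

def fragment_masses(peptide):
    """Singly charged b and y ion masses."""
    n = len(peptide)
    b = [sum(RESIDUE_MASS[r] for r in peptide[:i]) + PROTON for i in range(1, n)]
    y = [sum(RESIDUE_MASS[r] for r in peptide[i:]) + WATER + PROTON for i in range(1, n)]
    return b + y

def search(peaks, precursor, database, tol=0.6):
    """Score each database peptide whose [M+H]+ matches the precursor by the number
    of observed peaks explained by one of its b/y ions; best match first."""
    hits = []
    for name, sequence in database.items():
        for pep in tryptic_peptides(sequence):
            if abs(precursor_mh(pep) - precursor) > tol:
                continue  # precursor mass filter
            fragments = fragment_masses(pep)
            score = sum(any(abs(p - f) <= tol for f in fragments) for p in peaks)
            hits.append((score, name, pep))
    return sorted(hits, reverse=True)

# Hypothetical two-protein database and the nominal peaks from the slides.
DATABASE = {"protein_A": "MAGRSGFLEEDELKVPEAR", "protein_B": "MKGSFLEEDELKR"}
PEAKS = [88, 145, 292, 405, 534, 663, 778, 907, 1020,
         147, 260, 389, 504, 633, 762, 875, 1022, 1080]
for score, name, pep in search(PEAKS, 1166, DATABASE):
    print(f"{pep} ({name}): {score}/{len(PEAKS)} peaks matched")
```

The permuted peptide GSFLEEDELK has the same precursor mass, so only the fragment ions can separate the two candidates. GSFLEEDELK explains 16 of the 18 peaks, while SGFLEEDELK explains all 18.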

Information Content in a Single Mass Measurement (figure: average number of matching peptides vs. tryptic peptide mass [Da], 1000–3000 Da; panels: Human, S. cerevisiae).

Protein Identification and Quantitation by Mass Spectrometry: samples → peptides → mass spectrometry (intensity vs. m/z) → quantity.

Protein Quantitation by Mass Spectrometry: the amount of protein j in sample i is inferred from the MS signal of its peptides k after lysis, fractionation, digestion and LC-MS.

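Quantitation – Label-Free (MS): each sample i is processed separately (lysis, fractionation, digestion, LC-MS), and the MS signal of peptide k from protein j is compared between samples. Assumption: the relation between peptide amount and measured signal is constant for all samples.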

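Quantitation – Metabolic Labeling: light and heavy isotope-labeled samples are combined at the start and processed together (lysis, fractionation, digestion, LC-MS). Each peptide k of protein j appears in the MS spectrum as a light (L) and a heavy (H) peak, and their ratio gives the relative quantity. Oda et al. PNAS 96 (1999) 6591; Ong et al. MCP 1 (2002) 376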
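With metabolic labeling, relative quantitation reduces to a heavy-to-light ratio per peptide. In this sketch, both the summation of signal and the median-of-log-ratios protein summary are common choices assumed here; the slide does not prescribe them:

```python
from math import log2
from statistics import median

def heavy_light_ratio(heavy_signal, light_signal):
    """H/L ratio of one peptide from its summed heavy and light signal
    (e.g., the intensities of its isotope peaks)."""
    light = sum(light_signal)
    if light <= 0:
        raise ValueError("light peptide not detected")
    return sum(heavy_signal) / light

def protein_log2_ratio(peptide_ratios):
    """Summarize a protein by the median log2(H/L) of its peptides (assumed choice)."""
    return median(log2(r) for r in peptide_ratios)

# Hypothetical peptides of one protein: one from raw intensities, two given as ratios.
ratios = [heavy_light_ratio([4.0e5, 2.2e5], [2.0e5, 1.1e5]), 1.8, 2.3]
print(protein_log2_ratio(ratios))  # 1.0, i.e. about twice as much in the heavy sample
```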

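Quantitation – Labeled Synthetic Peptides: heavy-isotope-labeled synthetic peptides are added to the light (endogenous) sample after lysis, fractionation and digestion. An optional enrichment step with an anti-peptide antibody follows, and the mixture is analyzed by LC-MS. The ratio of the light (L) to heavy (H) peak gives the amount of the endogenous peptide. Assumption: all losses after mixing are identical for the heavy and light isotopes. Anderson, N.L., et al. Proteomics 3 (2004) 235-44; Gerber et al. PNAS 100 (2003) 6940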
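With a known amount of heavy standard, the light/heavy signal ratio converts directly into an amount. This sketch relies on the slide's assumption of identical losses after mixing, and the numbers are hypothetical:

```python
def absolute_amount(light_intensity, heavy_intensity, heavy_spiked_amount):
    """Endogenous (light) amount from a spiked heavy standard of known amount.
    Valid only if heavy and light peptides suffer identical losses after mixing,
    so that the L/H signal ratio equals the L/H amount ratio."""
    if heavy_intensity <= 0:
        raise ValueError("heavy standard not detected")
    return heavy_spiked_amount * light_intensity / heavy_intensity

# Hypothetical: 50 fmol heavy peptide spiked; the light peak is 0.4x the heavy peak.
print(absolute_amount(light_intensity=2.0e6, heavy_intensity=5.0e6,
                      heavy_spiked_amount=50.0))  # 20.0 (fmol)
```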

Estimating peptide quantity (figure: a peak in intensity vs. m/z, quantified by peak height, peak area or curve fitting).
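Peak height and peak area can be computed from a sampled peak profile, as shown below. Curve fitting would also need a peak-shape model and is not shown. The Gaussian test peak is a hypothetical example:

```python
import math

def peak_height(intensities):
    """Quantity estimate from the highest point of the peak."""
    return max(intensities)

def peak_area(mz, intensities):
    """Quantity estimate from the area under the peak (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for x0, x1, y0, y1 in zip(mz, mz[1:], intensities, intensities[1:]))

# Hypothetical profile: Gaussian peak, height 1000, centre 500.25, sigma 0.01 m/z.
mz = [500.20 + 0.002 * i for i in range(51)]
intensity = [1000 * math.exp(-((x - 500.25) ** 2) / (2 * 0.01**2)) for x in mz]
print(peak_height(intensity), peak_area(mz, intensity))
```

The area uses every sampled point, which is why it has better statistics than the height. It also integrates any overlapping signal, which makes it more sensitive to interference.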

What is the best way to estimate quantity?
Peak height: resistant to interference, but poor statistics.
Peak area: better statistics, but more sensitive to interference.
Curve fitting: better statistics, but needs to know the peak shape and is slow.
Spectrum counting: resistant to interference and easy to implement, but poor statistics for low-abundance proteins.

Proteomics Informatics - Summary: structure of mass spectrometry data; protein identification; protein quantitation.

Next Lecture: Gene Expression

Protein Quantitation - Exercise
2. Protein quantitation: two breast tumor xenografts (one basal and one luminal) were analyzed by LC-MS. The spectral counts for the identified peptides in the different analyses are listed in two-sample-three-replicate-comparison.txt.
a. Compare replicate one of Sample 1 with replicate one of Sample 2 using proteomics_no_replicate.py. Which differences are significant?
b. Compare replicates one and two of Sample 1 using proteomics_one_replicate.py, and compare the result to the distribution in 2a. Which of the differences in 2a are significant?
c. Compare the three replicates of Sample 1 with the three replicates of Sample 2 using proteomics_three_replicates.py. Which differences are significant?
d. When a protein is not observed in one sample, how many spectra do we need to observe in the other sample to say that there is a significant difference?
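For part d, one simple way to reason about spectral counts without replicates is an exact binomial test. Under the null hypothesis, a spectrum of the protein falls into sample 1 with probability proportional to that sample's total spectral count. This is a hedged sketch of that approach. It is not necessarily the method implemented in the proteomics_*.py scripts, and it applies no multiple-testing correction:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def spectral_count_p_value(count1, count2, total1, total2):
    """Exact two-sided binomial test for one protein's spectral counts in two samples.
    Under H0 each of its spectra falls in sample 1 with probability
    total1 / (total1 + total2), where total1/total2 are all spectra in each analysis."""
    n = count1 + count2
    if n == 0:
        return 1.0
    p0 = total1 / (total1 + total2)
    observed = binom_pmf(count1, n, p0)
    # Sum the probabilities of all outcomes no more likely than the observed one.
    p = sum(binom_pmf(k, n, p0) for k in range(n + 1)
            if binom_pmf(k, n, p0) <= observed * (1 + 1e-9))
    return min(1.0, p)

def min_spectra_against_zero(alpha=0.05, total1=1, total2=1, max_n=1000):
    """Smallest count in sample 2 that is significant when sample 1 has zero spectra."""
    for n in range(1, max_n + 1):
        if spectral_count_p_value(0, n, total1, total2) < alpha:
            return n
    return None

print(spectral_count_p_value(0, 5, 10000, 10000))  # 0.0625
print(spectral_count_p_value(0, 6, 10000, 10000))  # 0.03125
print(min_spectra_against_zero())                   # 6
```

With equal totals, 0 versus 5 spectra gives p = 0.0625 and 0 versus 6 gives p = 0.03125. In this simplified setting, at least 6 spectra are therefore needed at α = 0.05. Unequal totals shift that threshold.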

Phosphorylation Exercise: spectrum of an unmodified peptide, with its theoretical fragment ions. (Note: the theoretical fragment ions could be given as a hint to help see what changes.)

Spectrum of the phosphorylated peptide.

Spectrum of the peptide phosphorylated at a different site.
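The theoretical fragment ions for this exercise can be generated by adding the phosphate mass (HPO3, 79.96633 Da) to the modified residue. Only ions that contain that residue shift. The text does not give the exercise peptide, so this sketch uses a hypothetical peptide, ASTPEK, with two candidate sites:

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "I": 113.08406,
    "L": 113.08406, "N": 114.04293, "D": 115.02694, "K": 128.09496,
    "Q": 128.05858, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON, WATER = 1.00728, 18.01056
PHOSPHO = 79.96633  # HPO3 added by phosphorylation

def fragment_ions(peptide, phospho_site=None):
    """1+ b and y ions, optionally with a phosphate on residue index phospho_site (0-based)."""
    mods = [0.0] * len(peptide)
    if phospho_site is not None:
        if peptide[phospho_site] not in "STY":
            raise ValueError("phosphorylation expected on S, T or Y")
        mods[phospho_site] = PHOSPHO
    residue = [RESIDUE_MASS[aa] + d for aa, d in zip(peptide, mods)]
    n = len(peptide)
    b = [sum(residue[:i]) + PROTON for i in range(1, n)]
    y = [sum(residue[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b, y

def shifted_ions(peptide, site):
    """Names of the b/y ions whose mass changes when residue `site` is phosphorylated."""
    b0, y0 = fragment_ions(peptide)
    b1, y1 = fragment_ions(peptide, site)
    names = [f"b{i + 1}" for i, (u, m) in enumerate(zip(b0, b1)) if abs(m - u) > 1e-6]
    names += [f"y{i + 1}" for i, (u, m) in enumerate(zip(y0, y1)) if abs(m - u) > 1e-6]
    return names

peptide = "ASTPEK"  # hypothetical peptide, candidate sites S (index 1) and T (index 2)
print(shifted_ions(peptide, 1))  # ['b2', 'b3', 'b4', 'b5', 'y5']
print(shifted_ions(peptide, 2))  # ['b3', 'b4', 'b5', 'y4', 'y5']
```

Comparing the two lists shows which ions localize the site. Here b2 shifts only for phospho-S, and y4 shifts only for phospho-T.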