Previous Lecture: Regression and Correlation


1 Previous Lecture: Regression and Correlation

2 This Lecture: Proteomics Informatics
Introduction to Biostatistics and Bioinformatics

3 Proteomics Informatics – Learning Objectives
- Structure of mass spectrometry data
- Protein identification
- Protein quantitation

4 Protein Identification and Quantitation by Mass Spectrometry
Samples -> Peptides -> Mass Spectrometry -> spectrum (intensity vs m/z) -> Identity and Quantity

5 Sample preparation for protein identification, characterization and quantitation
Lysis -> Fractionation -> Digestion -> Mass spectrometry

6 Overview of Mass Spectrometry
Ion Source -> Mass Analyzer -> Detector
Output: intensity vs mass/charge

7 Mass Spectrometry (MS)

8 Example data – MALDI-TOF Peptide intensity vs m/z

9 Peptide Fragmentation
Ion Source -> Mass Analyzer 1 -> Fragmentation -> Mass Analyzer 2 -> Detector
Fragment ion types: b and y

10 Liquid Chromatography (LC)-MS/MS
Ion Source -> Mass Analyzer 1 -> Fragmentation -> Mass Analyzer 2 -> Detector
[Figure: a series of spectra (intensity vs mass/charge) acquired over time]

11 Example data – ESI-LC-MS/MS
MS: peptide intensity vs m/z vs time
MS/MS: fragment intensity vs m/z
[Figure: MS/MS spectrum of the [M+2H]2+ precursor, % relative abundance vs m/z (250 to 1000), with labeled peaks at 260, 292, 389, 405, 504, 534, 633, 663, 762, 778, 875, 907, 1020, 1022 and 1080]

12 Charge-State Distributions
m/z = (M + nH) / n
M - molecular mass
n - number of charges
H - mass of a proton
[Figure: intensity vs mass/charge for a peptide and for a protein, each measured with MALDI and with ESI. Peaks are labeled by charge state: 1+ to 4+ for the peptide; 1+ to 5+ and 27+ to 31+ for the protein]

13 Charge-State Example
m/z = (M + nH) / n
M - molecular mass
n - number of charges
H - mass of a proton
Example: peptide of mass 898 (taking H as approximately 1)
carrying 1 H+: (898 + 1) / 1 = 899 m/z
carrying 2 H+: (898 + 2) / 2 = 450 m/z
carrying 3 H+: (898 + 3) / 3 = 300.3 m/z
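
The charge-state arithmetic can be sketched in a few lines of Python. The function name `mz` and the constant `PROTON_MASS` are illustrative choices, not part of the lecture. The slide rounds the proton mass to 1, so the loop passes `proton_mass=1.0` to reproduce its numbers.

```python
PROTON_MASS = 1.007276  # Da; the slide example rounds this to 1


def mz(mass, charge, proton_mass=PROTON_MASS):
    """Return m/z for a molecule of neutral mass `mass` carrying `charge` protons."""
    return (mass + charge * proton_mass) / charge


for n in (1, 2, 3):
    # proton_mass=1.0 matches the rounded arithmetic used in the example
    print(f"{n}+ : {mz(898, n, proton_mass=1.0):.1f} m/z")
```

With the default proton mass the values shift slightly upward, which matters once mass accuracy is expressed in ppm.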

14 Isotope Distributions
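Most abundant isotopes: 12C, 14N, 16O, 1H, 32S
Natural abundance of the heavier isotopes: 2H 0.015%; 13C 1.11%; 15N 0.366%; 17O 0.038%; 18O 0.200%; 33S 0.75%; 34S 4.21%; 36S 0.02%
[Figure: isotope clusters, intensity vs m/z, with peaks at +1 Da, +2 Da and +3 Da]
Considering only 12C and 13C (p = 0.0111):
n is the number of C in the peptide
m is the number of 13C in the peptide
T_m is the relative intensity of the peptide containing m 13C
$$T_m = \binom{n}{m} p^m (1-p)^{n-m}$$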
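
As a minimal sketch of the carbon-only isotope model, the binomial expression can be evaluated directly. The function names are illustrative, and p = 0.0111 is the 13C abundance quoted on the slide. Contributions from other elements are ignored, as on the slide.

```python
from math import comb


def isotope_intensity(n_carbons, m, p=0.0111):
    """Relative intensity T_m of the peak containing m 13C atoms (binomial model)."""
    return comb(n_carbons, m) * p**m * (1 - p) ** (n_carbons - m)


def isotope_pattern(n_carbons, max_m=4, p=0.0111):
    """Relative intensities T_0 .. T_max_m for a peptide with n_carbons carbons."""
    return [isotope_intensity(n_carbons, m, p) for m in range(max_m + 1)]


# Example: a peptide with 50 carbons versus one with 100 carbons
for n in (50, 100):
    print(n, [round(t, 3) for t in isotope_pattern(n)])
```

In this model the +1 Da peak overtakes the monoisotopic peak once n p / (1 - p) exceeds 1, which happens at roughly 90 carbons.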

15 Isotope Clusters and Charge State
[Figure: isotope clusters, intensity vs m/z, for three charge states. Spacing between isotope peaks: 1 for 1+, 0.5 for 2+, 0.33 for 3+]

16 What is the Charge State?
Spacing between the isotopes is 0.5 Da
Spacing between the isotopes is 0.33 Da
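
Because neighboring isotope peaks differ by about 1 Da in mass, their spacing in m/z is about 1/n for charge state n. A minimal sketch of reading the charge state from the spacing follows; the function name is illustrative and the mass difference is taken as exactly 1 Da.

```python
def charge_from_spacing(spacing_da):
    """Estimate the charge state from the m/z spacing between isotope peaks.

    Assumes neighboring isotopes differ by about 1 Da, so spacing is about 1 / charge.
    """
    return round(1.0 / spacing_da)


for spacing in (1.0, 0.5, 0.33):
    print(f"spacing {spacing} -> charge {charge_from_spacing(spacing)}+")
```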

17 Protein Identification by Mass Spectrometry
Samples -> Peptides -> Mass Spectrometry -> spectrum (intensity vs m/z) -> Identity

18 Protein Identification - Exercise
1. Protein identification: NUP1 was genomically tagged with protein A and affinity purified under two conditions, and the resulting protein mixtures were analyzed with liquid chromatography mass spectrometry (LC-MS). Search the resulting spectra (NUP1-less-stringent-wash.mgf, NUP1-more-stringent-wash.mgf) using X! Tandem. Change the taxon to “S. cerevisiae (budding yeast)” but otherwise keep the default parameter settings.
a. Look at the list of identified proteins and explain why they are found in this sample. More information is available from the “go”, “path”, “ppi”, “doms” and “string” tabs at the top of the page.
b. Select the “mh” display at the top right of the page, and zoom in to +/-100 ppm (the default precursor mass accuracy used in the search). What precursor mass accuracy should we have used? Zoom in further and determine what precursor mass accuracy could have been used if the spectra were recalibrated (so that the error distribution is centered at zero).
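
For part b, it helps to recall how a mass error in ppm is defined and what recalibration does to the error distribution. The sketch below is independent of X! Tandem. The function names are illustrative, and recalibration is modeled in the simplest possible way, as subtracting the median error.

```python
from statistics import median


def ppm_error(measured, theoretical):
    """Mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def within_tolerance(measured, theoretical, tol_ppm=100.0):
    """True if the measured mass lies within +/- tol_ppm of the theoretical mass."""
    return abs(ppm_error(measured, theoretical)) <= tol_ppm


def recalibrated(errors_ppm):
    """Center an error distribution at zero by subtracting its median (an assumption)."""
    shift = median(errors_ppm)
    return [e - shift for e in errors_ppm]


print(round(ppm_error(1000.05, 1000.0), 1))  # error of 0.05 Da at mass 1000
print(recalibrated([8.0, 10.0, 12.0]))
```

After recalibration only the spread of the errors remains, which is why a narrower precursor tolerance becomes usable.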

19 Identification – Tandem MS

20 Tandem MS – Sequence Confirmation
Residue labels: K, L, E, D, F, G, S
[Figure: MS/MS spectrum, % relative abundance vs m/z (250 to 1000)]

21 Tandem MS – Sequence Confirmation
b ions (residue labels as shown on the slide): S 88, G 145, F 292, 405, 534, 663, D 778, E 907, L 1020, K 1166
[Figure: MS/MS spectrum, % relative abundance vs m/z (250 to 1000)]

22 Tandem MS – Sequence Confirmation
b ions (residue labels as shown on the slide): S 88, G 145, F 292, 405, 534, 663, D 778, E 907, L 1020, K 1166
y ions: 147, 260, 389, 504, 633, 762, 875, 1022, 1080
[Figure: MS/MS spectrum, % relative abundance vs m/z (250 to 1000)]

23 Tandem MS – Sequence Confirmation
Same b- and y-ion ladder as on slide 22.
[Figure: MS/MS spectrum of the [M+2H]2+ precursor, % relative abundance vs m/z (250 to 1000), with labeled peaks at 260, 292, 389, 405, 504, 534, 633, 663, 762, 778, 875, 907, 1020, 1022 and 1080]

24 Tandem MS – Sequence Confirmation
Same b- and y-ion ladder and labeled spectrum as on slide 23.

25 Tandem MS – Sequence Confirmation
Same b- and y-ion ladder and labeled spectrum as on slide 23, with a mass difference of 113 marked between neighboring peaks in each ion series.

26 Tandem MS – Sequence Confirmation
Same b- and y-ion ladder and labeled spectrum as on slide 23, with a mass difference of 129 marked between neighboring peaks in each ion series.
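
The ion ladders on these slides can be reproduced with a short calculation. This sketch makes two assumptions that are not stated on the slides: the peptide is taken to be SGFLEEDELK (I/L and K/Q cannot be told apart at this mass accuracy), and nominal integer residue masses are used. Under these assumptions every b ion and the M+H value of 1166 match the slide. The largest y ion comes out at 1079 rather than the 1080 shown, a difference of about 1 Da that comes from the integer rounding.

```python
# Nominal (integer) residue masses in Da for the residues in this peptide.
NOMINAL_RESIDUE_MASS = {"S": 87, "G": 57, "F": 147, "L": 113, "E": 129, "D": 115, "K": 128}
WATER = 18   # nominal mass of H2O
PROTON = 1   # nominal mass of H+


def b_ions(peptide):
    """Singly charged b ions: N-terminal prefixes plus one proton."""
    masses, total = [], 0
    for aa in peptide[:-1]:
        total += NOMINAL_RESIDUE_MASS[aa]
        masses.append(total + PROTON)
    return masses


def y_ions(peptide):
    """Singly charged y ions: C-terminal suffixes plus water plus one proton."""
    masses, total = [], 0
    for aa in reversed(peptide[1:]):
        total += NOMINAL_RESIDUE_MASS[aa]
        masses.append(total + WATER + PROTON)
    return masses


def precursor_mh(peptide):
    """Singly protonated peptide mass (M+H)."""
    return sum(NOMINAL_RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


PEPTIDE = "SGFLEEDELK"  # assumed sequence, see the note above
print("b:", b_ions(PEPTIDE))
print("y:", y_ions(PEPTIDE))
print("M+H:", precursor_mh(PEPTIDE))
```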

27 Tandem MS – de novo Sequencing
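Amino acid masses + Mass Differences -> Sequences consistent with spectrum
[Figure: MS/MS spectrum of the [M+2H]2+ precursor, % relative abundance vs m/z (250 to 1000), with labeled peaks at 260, 292, 389, 405, 504, 534, 633, 663, 762, 778, 875, 907, 1020, 1022 and 1080; the peak at 762 is the most intense]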

28 Tandem MS – de novo Sequencing

29 Tandem MS – de novo Sequencing

30 Tandem MS – de novo Sequencing
Partial sequences read from the mass differences (one per ion series, in opposite directions):
…GF(I/L)EEDE(I/L)…
…(I/L)EDEE(I/L)FG…
Peptide M+H = 1166
= 87 => S: SGF(I/L)EEDE(I/L)…
1166 – 1020 – 18 = 128 => K or Q: SGF(I/L)EEDE(I/L)(K/Q)
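
Reading a sequence from mass differences can be sketched as a lookup of each peak-to-peak difference in a table of residue masses. This is a simplified illustration that assumes a clean ladder from a single ion series and nominal integer masses. The function name and the table layout are not from the lecture. The two ladders are the b- and y-ion values from the sequence confirmation slides.

```python
# Nominal residue masses in Da; isobaric residues share one entry.
RESIDUE_BY_NOMINAL_MASS = {
    57: "G", 71: "A", 87: "S", 97: "P", 99: "V", 101: "T", 103: "C",
    113: "(I/L)", 114: "N", 115: "D", 128: "(K/Q)", 129: "E", 131: "M",
    137: "H", 147: "F", 156: "R", 163: "Y", 186: "W",
}


def read_ladder(peaks):
    """Translate consecutive mass differences in one ion series into residues.

    Differences that match no residue are reported as '?'.
    """
    peaks = sorted(peaks)
    residues = []
    for low, high in zip(peaks, peaks[1:]):
        residues.append(RESIDUE_BY_NOMINAL_MASS.get(high - low, "?"))
    return "".join(residues)


b_ladder = [88, 145, 292, 405, 534, 663, 778, 907, 1020]
y_ladder = [147, 260, 389, 504, 633, 762, 875, 1022]
print(read_ladder(b_ladder))  # read N- to C-terminal
print(read_ladder(y_ladder))  # read C- to N-terminal
```

Real spectra mix both ion series with noise and neutral losses, so practical de novo tools search over many candidate ladders rather than trusting one.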

31 Tandem MS – de novo Sequencing
Challenges in de novo sequencing:
- Neutral loss (-H2O, -NH3)
- Modifications
- Background peaks
- Incomplete information

32 Tandem MS – Database Search
Experiment: Lysis -> Fractionation -> Digestion -> LC-MS -> MS/MS
Search: Sequence DB -> Pick Protein -> Pick Peptide -> All Fragment Masses
Repeat for all peptides; repeat for all proteins
Compare, Score, Test Significance
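
The database search loop can be illustrated with a toy example. Everything here is a simplified sketch. The two protein sequences are hypothetical, masses are nominal integers, the score is a plain count of matching fragment masses, and the significance test is omitted. Real search engines such as X! Tandem use more elaborate scoring.

```python
import re

# Nominal (integer) residue masses in Da; a simplification for illustration.
NOMINAL = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
           "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128,
           "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186}
WATER, PROTON = 18, 1


def digest(protein):
    """Tryptic digestion: cleave after K or R, except before P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mh(peptide):
    """Singly protonated peptide mass (M+H)."""
    return sum(NOMINAL[aa] for aa in peptide) + WATER + PROTON


def fragment_masses(peptide):
    """Set of singly charged b- and y-ion masses."""
    residues = [NOMINAL[aa] for aa in peptide]
    b = {sum(residues[:i]) + PROTON for i in range(1, len(residues))}
    y = {sum(residues[i:]) + WATER + PROTON for i in range(1, len(residues))}
    return b | y


def search(precursor_mh, peaks, database, tol=1):
    """Score every peptide whose mass matches the precursor; best hit first."""
    observed = set(peaks)
    hits = []
    for name, protein in database.items():      # pick protein
        for peptide in digest(protein):         # pick peptide
            if abs(peptide_mh(peptide) - precursor_mh) > tol:
                continue
            score = len(fragment_masses(peptide) & observed)  # compare, score
            hits.append((score, name, peptide))
    return sorted(hits, reverse=True)


# Hypothetical two-protein database; the second contains a same-mass decoy peptide.
DATABASE = {"hypothetical_protein_1": "MTKSGFLEEDELKAAR",
            "hypothetical_protein_2": "AEKSGFELEDELKGGR"}
PEAKS = [260, 292, 389, 405, 504, 534, 633, 663, 762, 778, 875, 907, 1020, 1022, 1080]

for hit in search(1166, PEAKS, DATABASE):
    print(hit)
```

Both candidate peptides have the same precursor mass, so only the fragment masses separate them. This is the reason MS/MS adds so much information over a single mass measurement.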

33 Information Content in a Single Mass Measurement
[Figure: two panels, Human and S. cerevisiae, each showing the number of matching peptides and the average number of matching peptides vs tryptic peptide mass [Da]]

34 Protein Identification and Quantitation by Mass Spectrometry
Samples -> Peptides -> Mass Spectrometry -> spectrum (intensity vs m/z) -> Quantity

35 Protein Quantitation by Mass Spectrometry
Workflow: Lysis -> Fractionation -> Digestion -> LC-MS -> MS
Indices: sample i, protein j, peptide k

36 Quantitation – Label-Free (MS)
Workflow: Lysis -> Fractionation -> Digestion -> LC-MS -> MS
Indices: sample i, protein j, peptide k
Assumption: constant for all samples

37 Quantitation – Metabolic Labeling
Workflow: Light + Heavy -> Lysis -> Fractionation -> Digestion -> LC-MS -> MS (H and L peaks)
Indices: sample i, protein j, peptide k
Oda et al. PNAS 96 (1999) 6591
Ong et al. MCP 1 (2002) 376

38 Quantitation – Labeled Synthetic Peptides
Workflow: Light sample -> Lysis -> Fractionation -> Digestion; Synthetic Peptides (Heavy) added; Enrichment with peptide antibody -> LC-MS -> MS (H and L peaks)
Assumption: All losses after mixing are identical for the heavy and light isotopes
Gerber et al. PNAS 100 (2003) 6940
Anderson, N.L., et al. Proteomics 3 (2004)

39 Estimating peptide quantity
[Figure: a peak in intensity vs m/z, illustrating three estimates of quantity: peak height, peak area and curve fitting]

40 What is the best way to estimate quantity?
Peak height
- resistant to interference
- poor statistics
Peak area
- better statistics
- more sensitive to interference
Curve fitting
- better statistics
- needs to know the peak shape
- slow
Spectrum counting
- resistant to interference
- easy to implement
- poor statistics for low-abundance proteins
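
The difference between peak height and peak area can be seen on a synthetic peak. The Gaussian peak shape, the grid and all parameter values in this sketch are assumptions chosen for illustration. Real peak shapes depend on the instrument.

```python
import math


def gaussian_peak(mz_values, center, sigma, amplitude):
    """Synthetic Gaussian peak sampled on a grid of m/z values."""
    return [amplitude * math.exp(-0.5 * ((x - center) / sigma) ** 2) for x in mz_values]


def peak_height(intensities):
    """Quantity estimate from the single highest data point."""
    return max(intensities)


def peak_area(mz_values, intensities):
    """Quantity estimate from the trapezoidal area under the peak."""
    area = 0.0
    for i in range(len(mz_values) - 1):
        width = mz_values[i + 1] - mz_values[i]
        area += 0.5 * (intensities[i] + intensities[i + 1]) * width
    return area


grid = [762.0 + 0.001 * i for i in range(801)]  # m/z from 762.0 to 762.8
signal = gaussian_peak(grid, center=762.4, sigma=0.02, amplitude=1000.0)
print("height:", round(peak_height(signal), 1))
print("area:  ", round(peak_area(grid, signal), 2))
```

Height uses one data point and area uses all of them. That is why area has better statistics, and also why it picks up any interfering signal under the peak.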

41 Proteomics Informatics - Summary
- Structure of mass spectrometry data
- Protein identification
- Protein quantitation

42 Next Lecture: Gene Expression

43 Protein Quantitation - Exercise
2. Protein quantitation: Two breast tumor xenografts (one basal and one luminal) were analyzed by LC-MS, and the spectral counts for the identified peptides in the different analyses are listed in two-sample-three-replicate-comparison.txt.
a. Compare replicate one of Sample 1 with replicate one of Sample 2 using proteomics_no_replicate.py. Which differences are significant?
b. Compare replicates one and two of Sample 1 using proteomics_one_replicate.py. Compare the result to the distribution in 2a. Which differences are significant in 2a?
c. Compare the three replicates of Sample 1 with the three replicates of Sample 2 using proteomics_three_replicates.py. Which differences are significant?
d. In cases when a protein is not observed in one sample, how many spectra do we need to observe in the other sample to say that there is a significant difference?
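
One possible way to reason about part d is a simple binomial argument. It is not necessarily what the exercise scripts do. Suppose both samples yield similar total numbers of spectra and the protein is equally abundant in both. Then each of its spectra is equally likely to come from either sample, and the chance that all k spectra fall in the same sample is 2 x 0.5^k (two-sided). The sketch ignores replicate variation and multiple testing.

```python
def p_value_all_in_one(k):
    """Two-sided probability that all k spectra land in one sample by chance.

    Assumes equal abundance and equal total spectrum counts in both samples.
    """
    return min(1.0, 2 * 0.5**k)


def min_spectra(alpha=0.05):
    """Smallest k for which seeing k spectra vs 0 has p-value below alpha."""
    k = 1
    while p_value_all_in_one(k) >= alpha:
        k += 1
    return k


for k in range(1, 9):
    print(k, p_value_all_in_one(k))
print("minimum spectra at alpha = 0.05:", min_spectra(0.05))
```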

44 Phosphorylation Exercise: an unmodified peptide
Theoretical fragment ions

45 Spectrum of the phosphorylated peptide

46 Spectrum of the peptide phosphorylated at a different site

