1
Peptide Identification by Tandem Mass Spectrometry. Behshad Behzadi, April 2005
2
Outline: Proteomics; Tandem Mass Spectrometry; Peptide Identification Problem; Identification via Database; De Novo Peptide Identification
3
Proteomics: the systematic analysis of the proteins expressed by a cell or tissue (identification, quantification, interactions, …). Tandem mass spectrometry is an essential tool for identifying (and quantifying) the proteins in a mixture.
4
Proteins: the primary structure of a protein is a sequence over an alphabet of 20 amino acids.
5
Amino Acids
7
Tandem Mass Spectrum: An Example. [Figure: ionized parent peptide and its secondary fragmentation]
8
What is the goal? Spectrum → peptide sequence
9
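Protein Backbone: H…-HN-CH(R_{i-1})-CO-NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-…OH, read from the N-terminus (left) to the C-terminus (right). The three units shown are amino acid residues i-1, i, and i+1, with side chains R_{i-1}, R_i, and R_{i+1}.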
10
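Breaking of Protein Backbone: H…-HN-CH(R_{i-1})-CO | NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-…OH. The backbone breaks at the peptide bond (CO-NH) between residues i-1 and i, giving an N-terminal fragment and a C-terminal fragment; the figure marks the protons (H+).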
11
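How Does a Peptide Fragment? For a peptide A_1 A_2 A_3 A_4:
y ions (C-terminal fragments):
m(y_1) = 19 + m(A_4)
m(y_2) = 19 + m(A_4) + m(A_3)
m(y_3) = 19 + m(A_4) + m(A_3) + m(A_2)
b ions (N-terminal fragments):
m(b_1) = 1 + m(A_1)
m(b_2) = 1 + m(A_1) + m(A_2)
m(b_3) = 1 + m(A_1) + m(A_2) + m(A_3)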
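The b- and y-ion formulas on this slide can be checked with a few lines of Python. This is a minimal sketch, not part of the original presentation: it assumes nominal (integer) residue masses and singly charged ions, and the names `RESIDUE_MASS`, `b_ions`, and `y_ions` are illustrative. The offsets 1 and 19 are the ones given on the slide.

```python
# Nominal (integer) residue masses in Da. Using integers instead of
# monoisotopic masses is a simplification for illustration.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def b_ions(peptide):
    """m(b_i) = 1 + m(A_1) + ... + m(A_i) for i = 1 .. n-1."""
    masses, total = [], 1
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        masses.append(total)
    return masses


def y_ions(peptide):
    """m(y_i) = 19 + m(A_n) + ... + m(A_{n-i+1}) for i = 1 .. n-1."""
    masses, total = [], 19
    for aa in reversed(peptide[1:]):
        total += RESIDUE_MASS[aa]
        masses.append(total)
    return masses


print(b_ions("GASP"))  # [58, 129, 216]
print(y_ions("GASP"))  # [116, 203, 274]
```

With these offsets, each complementary pair b_i and y_{n-i} sums to the total residue mass plus 20 (here 58 + 274 = 312 + 20).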
14
The Identification Algorithms: database search algorithms (SEQUEST, Mascot, …); de novo algorithms (Lutefisk, PEAKS, …)
15
Database Search Algorithms: interpret the tandem mass spectral data by searching a protein database. SEQUEST (Eng et al. 1994); Mascot (Perkins et al. 1999); ProteinProspector (Clauser et al. 1999)
16
SEQUEST (Eng et al. 94): (1) Search the protein database for amino acid sequences whose mass matches the parent peptide mass within a tolerance of 1. (2) Produce theoretical spectra for these candidates. (3) Match each theoretical spectrum against the experimental spectrum using a score function (Xcorr). (4) Rank the candidates by this score.
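The four steps on this slide can be sketched in Python as follows. This is a toy illustration under stated assumptions, not SEQUEST itself: the database is a plain list of peptides rather than digested proteins, masses are nominal integers, the theoretical spectrum contains only b and y ions, and a simple shared-peak count stands in for Xcorr. All function names are hypothetical.

```python
# Nominal residue masses in Da (simplification; see assumptions above).
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def peptide_mass(peptide):
    """Sum of residue masses plus one water (18 Da)."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + 18


def theoretical_spectrum(peptide):
    """Set of b- and y-ion masses (offsets 1 and 19)."""
    total = sum(RESIDUE_MASS[aa] for aa in peptide)
    prefix, peaks = 0, set()
    for aa in peptide[:-1]:
        prefix += RESIDUE_MASS[aa]
        peaks.add(1 + prefix)            # b ion
        peaks.add(19 + total - prefix)   # complementary y ion
    return peaks


def shared_peak_count(experimental, theoretical, tol=0.5):
    """Stand-in score: theoretical peaks found in the experimental spectrum."""
    return sum(1 for t in theoretical
               if any(abs(t - e) <= tol for e in experimental))


def search(experimental, parent_mass, peptides, mass_tol=1):
    # Step 1: keep candidates whose mass is within the tolerance.
    candidates = [p for p in peptides
                  if abs(peptide_mass(p) - parent_mass) <= mass_tol]
    # Steps 2 and 3: build each theoretical spectrum and score it.
    scored = [(shared_peak_count(experimental, theoretical_spectrum(p)), p)
              for p in candidates]
    # Step 4: rank by descending score (ties broken alphabetically).
    return sorted(scored, key=lambda sp: (-sp[0], sp[1]))


spectrum = [58, 116, 129, 203, 216, 274]   # ideal b/y peaks of GASP
database = ["GASP", "PSAG", "GAPS", "VTLK"]
print(search(spectrum, 330, database))
# [(6, 'GASP'), (4, 'GAPS'), (0, 'PSAG')]
```

VTLK is rejected by the mass filter, while the three permutations of the same residues pass it and are separated only by the fragment-level score.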
17
Other probabilistic models for scores: Qin et al. (1997); Dancik et al. (2000); Bafna and Edwards (2001)
18
Why do we need de novo? The genomes of certain organisms are unknown. The sequences in the protein database are not accurate. Amino acids may be modified: RNA editing, post-translational modifications.
19
Methods: Tree-Based Search (Taylor et al. 97); Spectrum-Graph-Based Search (Dancik et al. 99); Dynamic Programming Algorithm (Chen et al. 2001); AuDeNS (Baginsky et al. 02); Sub-Optimal Algorithm (Lu and Chen 03); …
20
De Novo Identification: given a spectrum S and a defined scoring function f(), find a peptide sequence q that maximizes f(S|q).
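The problem statement can be made concrete with a brute-force sketch in Python. It is not any of the published methods listed in this deck: it simply enumerates every sequence whose residue masses add up to the parent mass and keeps those with the highest score. The scoring function (a shared-peak count over b and y ions), the nominal masses, and all names are assumptions made for illustration; the search is exponential in peptide length, which is the reason practical tools use spectrum graphs or dynamic programming.

```python
# Nominal residue masses in Da (simplification for illustration).
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def score(spectrum, peptide):
    """Hypothetical f(S|q): number of b/y peaks of q present in S."""
    total = sum(RESIDUE_MASS[aa] for aa in peptide)
    prefix, peaks = 0, set()
    for aa in peptide[:-1]:
        prefix += RESIDUE_MASS[aa]
        peaks.update((1 + prefix, 19 + total - prefix))
    return len(peaks & set(spectrum))


def de_novo(spectrum, parent_mass, alphabet):
    """Return (best score, sorted list of sequences reaching it)."""
    target = parent_mass - 18          # residue masses only, water removed
    best_score, best = -1, []

    def extend(prefix, mass):
        nonlocal best_score, best
        if mass == target:             # complete candidate: score it
            s = score(spectrum, prefix)
            if s > best_score:
                best_score, best = s, [prefix]
            elif s == best_score:
                best.append(prefix)
            return
        for aa in alphabet:            # try every residue that still fits
            if mass + RESIDUE_MASS[aa] <= target:
                extend(prefix + aa, mass + RESIDUE_MASS[aa])

    extend("", 0)
    return best_score, sorted(best)


spectrum = [58, 116, 129, 203, 216, 274]
print(de_novo(spectrum, 330, "GASPV"))  # (6, ['GASP'])
```

The example restricts the alphabet to five residues to keep the enumeration small. With the full alphabet, residues of equal nominal mass (I/L, K/Q) would produce tied candidates, which is why the function returns a list.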
40
AuDeNS: uses "grass mowers" to preprocess the spectrum and then applies the dynamic programming approach. It computes a relevance for each peak by using different mowers, then applies a weighted version of the Chen et al. algorithm (DP).
41
Mowers: Threshold Mower; Window Mower; Isotope Mower; Intersection Mower; Complement Mower
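The slide gives only the mower names, so the following Python sketch is a guess at what two of them might do, inferred from the names alone; it should not be read as the AuDeNS implementation. It assumes a threshold mower drops peaks below a minimum intensity and a window mower keeps the most intense peaks in each fixed-width m/z window. AuDeNS assigns relevance values to peaks rather than deleting them, so hard filtering is a further simplification. Function names and parameters are hypothetical.

```python
from collections import defaultdict


def threshold_mower(peaks, min_intensity):
    """Keep (m/z, intensity) peaks at or above min_intensity (assumed behaviour)."""
    return [(mz, inten) for mz, inten in peaks if inten >= min_intensity]


def window_mower(peaks, width, keep):
    """Keep the `keep` most intense peaks in each m/z bin of size `width`
    (assumed behaviour; fixed bins instead of a sliding window)."""
    bins = defaultdict(list)
    for mz, inten in peaks:
        bins[int(mz // width)].append((mz, inten))
    survivors = []
    for members in bins.values():
        members.sort(key=lambda p: p[1], reverse=True)
        survivors.extend(members[:keep])
    return sorted(survivors)


peaks = [(58.0, 10.0), (60.0, 2.0), (129.0, 40.0),
         (131.0, 35.0), (133.0, 1.0), (216.0, 5.0)]
print(threshold_mower(peaks, 5.0))
# [(58.0, 10.0), (129.0, 40.0), (131.0, 35.0), (216.0, 5.0)]
print(window_mower(peaks, 100.0, 1))
# [(58.0, 10.0), (129.0, 40.0), (216.0, 5.0)]
```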
42
Summary: De novo Sequencing Sequence
43
Intensities: peak intensities are the second dimension of information in a spectrum. Several factors contribute to determining them.
44
Intensities (2): amino-acid-dependent factors; ion-type factors; position-based factors (peaks in the middle of the spectrum are higher)
45
Conclusion: tandem mass spectrometry is now the most important tool for identifying proteins. Many approaches have been developed, but there is still a long way to go before all the information contained in mass spectra can be extracted.
46
Research Themes: combining de novo and database methods (e.g., extracting tags); using the intensities; dealing better with PTMs (200 types); high-throughput experiments; clustering; multi-dimensional interpretation.