Identify proteins
Proteomic workflow
Trypsin digestion of a typical sample:
- We add a solution of 50 mM NH4HCO3 (pH 7.8) containing trypsin (… µg/µl); the volume depends on the size of the sample.
- Incubation overnight at 37 °C.
- Supernatants are filtered through 0.22 µm filters.
- Depending on the volume, samples are concentrated or vacuum-dried.
- The peptide mixtures are then analysed by mass spectrometry.
Trypsin
Tryptic peptides (MH+ / peptide sequence): HIQK, LHSMK, VNELSK, TTMPLW, EDVPSER, EGIHAQQK, YLGYLEQLLR, FFVAPFPEVFGK, VPQLEIVPNSAEER, LLILTCLVAVALARPK, HQGLPQEVLNENLLR, DIGSESTEDQAMEDIK, EPMIGVNQELAYFYPELFR, QMEAESISSSEEIVPNSVEQK.
This set of masses constitutes a fingerprint of the protein: an MS analysis of the digest can therefore identify it.
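How such a fingerprint is computed can be sketched in Python: an in silico trypsin digestion followed by a monoisotopic MH+ calculation for each peptide. The sketch assumes the usual trypsin rule (cleavage after Lys or Arg, but not before Pro, which is consistent with LLILTCLVAVALARPK appearing as a single peptide in the list), no missed cleavages and no modifications. The residue masses are standard monoisotopic values; the input sequence is a hypothetical concatenation of three peptides from the list, not the real protein sequence.

```python
import re

# Standard monoisotopic residue masses (Da)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O of the free peptide termini
PROTON = 1.00728   # added proton of the [M+H]+ ion


def trypsin_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", sequence) if pep]


def mh_plus(peptide):
    """Monoisotopic MH+ of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


# Hypothetical toy sequence built from three peptides of the list above
toy_sequence = "YLGYLEQLLR" + "LLILTCLVAVALARPK" + "HQGLPQEVLNENLLR"
for peptide in trypsin_digest(toy_sequence):
    print(f"{mh_plus(peptide):10.2f}  {peptide}")
```

The R-P bond in the middle of the toy sequence is left intact, so the digest returns the same three peptides that were concatenated.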
Identification by searching protein sequence databases
MALDI-TOF spectrum of the tryptic digest of the pictorial sample containing milk, with the signals corresponding to peptides of bovine α-S1 casein indicated.
Database search with MS data only (MH+ data)
Database search results: all the signals in the spectrum are entered in the search box, and several signals remain unassigned.
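The step that separates assigned from unassigned signals can be sketched in Python. The peptide list is the one from the fingerprint slide; the observed MH+ values and the 0.2 Da tolerance are hypothetical, chosen only for illustration, and all masses assume unmodified peptides.

```python
# Standard monoisotopic residue masses (Da)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728

# Tryptic peptides listed on the fingerprint slide
FINGERPRINT_PEPTIDES = [
    "HIQK", "LHSMK", "VNELSK", "TTMPLW", "EDVPSER", "EGIHAQQK",
    "YLGYLEQLLR", "FFVAPFPEVFGK", "VPQLEIVPNSAEER", "LLILTCLVAVALARPK",
    "HQGLPQEVLNENLLR", "DIGSESTEDQAMEDIK", "EPMIGVNQELAYFYPELFR",
    "QMEAESISSSEEIVPNSVEQK",
]


def mh_plus(peptide):
    """Monoisotopic MH+ of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def assign_signals(observed, peptides, tol=0.2):
    """Split observed MH+ values into assigned (value, peptides) pairs and unassigned values."""
    theoretical = {pep: mh_plus(pep) for pep in peptides}
    assigned, unassigned = [], []
    for value in observed:
        matches = [pep for pep, mass in theoretical.items() if abs(mass - value) <= tol]
        if matches:
            assigned.append((value, matches))
        else:
            unassigned.append(value)
    return assigned, unassigned


# Hypothetical peak list: two signals close to listed peptides, one unrelated signal
assigned, unassigned = assign_signals([1267.72, 1384.70, 1000.00], FINGERPRINT_PEPTIDES)
print(assigned)
print(unassigned)
```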
Effect of multiple peptide masses on protein identification by mass fingerprinting (table columns: search m/z, mass tolerance (Da), number of hits). The number of detected peptides that belong to the same protein strongly influences the identification.
Effect of mass accuracy and mass tolerance on peptide mass fingerprinting search results (table columns: search m/z, mass tolerance (Da), number of hits).
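Both effects (more peptide masses, tighter tolerance) can be reproduced with a toy peptide mass fingerprinting search in Python. The three-protein "database" is entirely hypothetical: protein_A carries two peptides from the fingerprint slide, protein_B a variant with Q replaced by K (only about 0.036 Da heavier), and protein_C a peptide with the same composition as YLGYLEQLLR in a different order. A protein counts as a hit when at least `min_matches` query masses fall within the tolerance, a simplification of real search-engine scoring.

```python
# Standard monoisotopic residue masses (Da)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728

# Hypothetical toy database: protein name -> tryptic peptides
DATABASE = {
    "protein_A": ["YLGYLEQLLR", "FFVAPFPEVFGK"],
    "protein_B": ["YLGYLEKLLR", "HIQK"],     # Q -> K, about +0.036 Da
    "protein_C": ["LYGYLEQLLR", "EDVPSER"],  # same composition as YLGYLEQLLR
}


def mh_plus(peptide):
    """Monoisotopic MH+ of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def pmf_hits(query_masses, database, tol, min_matches=1):
    """Proteins matching at least min_matches query masses within tol (Da)."""
    hits = []
    for protein, peptides in database.items():
        masses = [mh_plus(pep) for pep in peptides]
        matched = sum(any(abs(q - m) <= tol for m in masses) for q in query_masses)
        if matched >= min_matches:
            hits.append(protein)
    return hits


# One mass, wide tolerance -> ['protein_A', 'protein_B', 'protein_C']
print(pmf_hits([1267.704], DATABASE, tol=0.5))
# One mass, narrow tolerance -> ['protein_A', 'protein_C']
print(pmf_hits([1267.704], DATABASE, tol=0.01))
# Two masses, narrow tolerance -> ['protein_A']
print(pmf_hits([1267.704, 1384.730], DATABASE, tol=0.01, min_matches=2))
```

Note that protein_C survives the narrow tolerance: peptides with identical composition cannot be separated by mass alone, which is one reason peptide sequence information is valuable.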
Identification from MS data alone (generally MALDI-TOF) suffers from:
- the complex mixtures of proteins in organic materials;
- the uneven relative quantities of the different components.
MS analysis: what can give us more information? The sequence of two or more peptides.
Despite their similar names, casein α-S1 and casein α-S2 are quite different proteins.
Peptide sequencing by MS
MS/MS of peptide mixtures: LC, MS, MS/MS
Chip LC-MS/MS on a Q-TOF
What is MS/MS? The MS spectrum of the peptide mixture shows the single peptides; single peptides are then selected for fragmentation (MS/MS), producing fragmentation spectra, also called tandem mass spectra.
Interpreting an MS/MS spectrum to derive structural information is analogous to solving a puzzle: the fragment ion masses are the specific pieces used to put the intact molecule back together.
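Figure: cleavages observed in MS/MS of peptides. Along the backbone -HN-CH(R_i)-CO-NH-CH(R_i+1)-CO-NH-, cleavage gives the N-terminal a_i, b_i and c_i ions and the complementary C-terminal x_n-i, y_n-i and z_n-i ions; side-chain cleavages give the d_i+1, v_n-i and w_n-i ions. The figure distinguishes low-energy from high-energy fragmentation.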
Simple fragmentation rules: ions of the b series and ions of the y series.
Fragmentation rules
Precursor ion: doubly charged; m/z = …; MH+ = …; C-terminal Arg.
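Converting between the MH+ value and the m/z of a multiply charged precursor is simple arithmetic. A Python sketch, using the proton mass 1.00728 Da and, as illustrative input, the monoisotopic MH+ of YLGYLEQLLR (about 1267.70, computed from standard residue masses rather than read from the slide):

```python
PROTON = 1.00728  # Da


def mz_from_mh(mh, z):
    """m/z of the [M+zH]z+ ion, given the singly protonated mass MH+."""
    return (mh + (z - 1) * PROTON) / z


def mh_from_mz(mz, z):
    """MH+ recovered from an m/z value observed at charge z."""
    return mz * z - (z - 1) * PROTON


precursor_mz = mz_from_mh(1267.7045, 2)
print(round(precursor_mz, 2))  # 634.36 for the doubly charged ion
print(round(mh_from_mz(precursor_mz, 2), 4))
```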
Tryptic peptides (MH+ / peptide sequence): HIQK, LHSMK, VNELSK, TTMPLW (C-terminus of the protein), EDVPSER, EGIHAQQK, YLGYLEQLLR, FFVAPFPEVFGK, VPQLEIVPNSAEER, LLILTCLVAVALARPK, HQGLPQEVLNENLLR, DIGSESTEDQAMEDIK, EPMIGVNQELAYFYPELFR, QMEAESISSSEEIVPNSVEQK.
Since the proteolytic enzyme is trypsin, all the peptides end with either Arg (R) or Lys (K), except the C-terminal peptide of the protein (TTMPLW). The y1 ion will therefore always be either 147 (K) or 175 (R).
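Reading the y series (precursor ion MH+ = 1267, C-terminal Arg):
y1 = 175 = Arg (R)
y2 - y1 = 288 - 175 = 113 = Leu (L)
y3 - y2 = 401 - 288 = 113 = Leu (L)
y4 - y3 = 529 - 401 = 128 = Gln (Q)
y5 - y4 = 658 - 529 = 129 = Glu (E)
y6 - y5 = 771 - 658 = 113 = Leu (L)
y7 - y6 = 934 - 771 = 163 = Tyr (Y)
y8 - y7 = 991 - 934 = 57 = Gly (G)
y9 - y8 = 1104 - 991 = 113 = Leu (L)
MH+ - y9 = 1267 - 1104 = 163 = Tyr (Y)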
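The ladder reading above can be automated with nominal masses. A Python sketch, assuming singly charged y ions with integer masses as on the slide (y5 = 658 is inferred as 529 + 129) and MH+ = 1267. Leu and Ile (113) are isobaric, and at nominal resolution Gln and Lys (both 128) cannot be told apart either, so the sketch reports them as L/I and Q/K.

```python
# Nominal (integer) residue masses; isobaric pairs share a key
NOMINAL_RESIDUE = {
    57: "G", 71: "A", 87: "S", 97: "P", 99: "V", 101: "T", 103: "C",
    113: "L/I", 114: "N", 115: "D", 128: "Q/K", 129: "E", 131: "M",
    137: "H", 147: "F", 156: "R", 163: "Y", 186: "W",
}
Y1_OFFSET = 19  # nominal H2O + H+ carried by every y ion


def read_y_ladder(y_ions, mh):
    """Residues from the C- to the N-terminus, read from y-ion mass differences."""
    ladder = sorted(y_ions) + [mh]
    # y1 minus (H2O + H+) is the C-terminal residue
    residues = [NOMINAL_RESIDUE.get(ladder[0] - Y1_OFFSET, "?")]
    for lighter, heavier in zip(ladder, ladder[1:]):
        residues.append(NOMINAL_RESIDUE.get(heavier - lighter, "?"))
    return residues


y_series = [175, 288, 401, 529, 658, 771, 934, 991, 1104]
c_to_n = read_y_ladder(y_series, mh=1267)
print("-".join(reversed(c_to_n)))  # Y-L/I-G-Y-L/I-E-Q/K-L/I-L/I-R
```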
Complete reading of the y series from the C-terminus: R-L-L-Q-E-L-Y-G-L-Y, i.e. the peptide YLGYLEQLLR. Annotated signals: y1 to y9 and b2 to b7.
MS/MS peptide fragmentation: cleavage of the peptide bonds of Ala-Gly-His-Leu-…-Phe-Glu-Cys-Tyr gives the b1 to b5 signals (N-terminal fragments) and the y1 to y5 signals (C-terminal fragments).
Should we manually interpret each fragmentation spectrum?
Peptide sequencing: this set of numbers identifies the fragmentation spectrum of this peptide.
Signals in the fragmentation spectra can be predicted (b ion, residue, y ion):
YLGYLEQLLR: b1 Y y10; b2 L y9; b3 G y8; b4 Y y7; b5 L y6; b6 E y5; b7 Q y4; b8 L y3; b9 L y2; b10 R y1.
YLGYWEQLLR: b1 Y y10; b2 L y9; b3 G y8; b4 Y y7; b5 W y6; b6 E y5; b7 Q y4; b8 L y3; b9 L y2; b10 R y1.
A single change in the sequence changes the profile of expected signals: replacing Leu with Trp at position 5 shifts b5 to b10 and y6 to y10 by 73 Da (186 - 113).
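The effect of a single substitution can be checked by computing both ion series and listing the ions whose masses differ. A Python sketch for two peptides of equal length, assuming singly charged, unmodified ions (i = 1 .. n-1) and standard monoisotopic residue masses:

```python
# Standard monoisotopic residue masses (Da)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728


def b_y_ions(peptide):
    """Singly charged b_i and y_i ions, i = 1 .. n-1, for an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y_ions = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b_ions, y_ions


def shifted_ions(seq_a, seq_b, tol=0.01):
    """Indices i of the b_i and y_i ions whose masses differ between two equal-length peptides."""
    b_a, y_a = b_y_ions(seq_a)
    b_b, y_b = b_y_ions(seq_b)
    shifted_b = [i for i, (m1, m2) in enumerate(zip(b_a, b_b), start=1)
                 if abs(m1 - m2) > tol]
    shifted_y = [i for i, (m1, m2) in enumerate(zip(y_a, y_b), start=1)
                 if abs(m1 - m2) > tol]
    return shifted_b, shifted_y


print(shifted_ions("YLGYLEQLLR", "YLGYWEQLLR"))  # ([5, 6, 7, 8, 9], [6, 7, 8, 9])
```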
Figure: the real spectrum is cross-correlated with the theoretical spectra of database peptides (e.g. IIGHFYDDWCPLK, SPAFDSIMAETLK, AFDSLPDDIHEK, GGILAQSPFLIIK). Database searches compare the experimental MS/MS spectra with the virtual spectra of the peptides generated by in silico digestion of the proteins in the database.
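The comparison between experimental and virtual spectra can be illustrated in Python with a deliberately simplified stand-in: it counts how many experimental peaks fall within a tolerance of each candidate's theoretical b/y ions, rather than computing a true cross-correlation or a probability-based score as real search engines do. The candidates are the peptides shown on the slide; the "experimental" peak list is hypothetical, built from 12 ions of SPAFDSIMAETLK plus two noise peaks; cysteine is left unmodified and the 0.02 Da tolerance is an assumption.

```python
# Standard monoisotopic residue masses (Da)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728


def b_y_ions(peptide):
    """Singly charged b_i and y_i ions, i = 1 .. n-1, for an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y_ions = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b_ions, y_ions


def shared_peak_count(peaks, peptide, tol=0.02):
    """Number of experimental peaks lying within tol (Da) of a theoretical b or y ion."""
    b_ions, y_ions = b_y_ions(peptide)
    theoretical = b_ions + y_ions
    return sum(any(abs(p - t) <= tol for t in theoretical) for p in peaks)


def rank_candidates(peaks, candidates, tol=0.02):
    """Candidates sorted by decreasing shared-peak count."""
    scored = [(pep, shared_peak_count(peaks, pep, tol)) for pep in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


candidates = ["IIGHFYDDWCPLK", "SPAFDSIMAETLK", "AFDSLPDDIHEK", "GGILAQSPFLIIK"]
b_true, y_true = b_y_ions("SPAFDSIMAETLK")
# Hypothetical spectrum: b1-b6 and y3-y8 of SPAFDSIMAETLK plus two noise peaks
peaks = b_true[:6] + y_true[2:8] + [450.9, 777.9]
for peptide, score in rank_candidates(peaks, candidates):
    print(score, peptide)
```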
NanoLC-MS/MS of 100 fmol BSA injected on column: base peak chromatogram (BPC) of m/z … (MS and MS/MS traces, time in min) and a typical MS/MS spectrum (right inset) with the y2, y3, y4, y5, y6, y7, y9, y10 and b3, b7, b8, b9, b11, b12 ions annotated.
Database search
Database search with raw data from LC-MS/MS (MH+ data): the fragmentation spectra are uploaded automatically.
Why is trypsin preferred? Because MS detects only ions. Upon fragmentation, the positively charged C-terminal residue (Lys or Arg) ensures that the fragments on the C-terminal side carry a charge, generating the y series. The amino group at the N-terminus should ensure that the fragments on the N-terminal side carry a charge, generating the b series.
For this kind of sample we do not use any fixed modifications.
Variable modifications account for modifications induced by sample treatment and/or sample deterioration.
These settings depend on the instrument.
A typical output of the results
Proteins are ranked by decreasing score. Proteins are grouped into families using a novel hierarchical clustering algorithm. If a family contains multiple members, the accessions, scores and descriptions are aligned with a dendrogram, which illustrates the degree of similarity between members. A short description of each hit is given, together with the organism of origin.
- The Report Builder tab allows you to build a customised table of protein hits, which is particularly useful if you need a minimal list of proteins for a publication.
- Columns of the protein table: number of peptide matches; number of significant peptide matches (above the significance threshold); number of independent sequences; number of significant independent sequences (above the significance threshold).
A hit, column by column:
- experimental m/z value;
- experimental m/z transformed to a relative molecular mass;
- relative molecular mass calculated from the matched peptide sequence;
- difference (error) between the experimental and calculated masses;
- ions score (if there are duplicate matches to the same peptide, the lower-scoring matches are shown in brackets);
- expectation value for the peptide match: the number of times we would expect to obtain an equal or higher score purely by chance (the lower this value, the more significant the result);
- the letter U if the peptide sequence is unique to the protein family;
- sequence of the peptide in one-letter code (if the peptide is modified, each affected residue is underlined).
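The link between score and expectation value can be illustrated numerically. The sketch below assumes a score defined as S = -10·log10(P), where P is the probability that the match is a random event (the form Mascot documents for its ions score), and approximates the expectation value as P multiplied by a hypothetical number N of candidate peptides compared, treating the comparisons as independent. It illustrates the definition in the list, not Mascot's exact computation.

```python
def probability_from_score(score):
    """Invert a score defined as S = -10 * log10(P)."""
    return 10 ** (-score / 10)


def expectation_value(score, n_candidates):
    """Expected number of chance matches scoring >= score over n independent comparisons."""
    return probability_from_score(score) * n_candidates


# Hypothetical search space of 1000 candidate peptides
for score in (20, 40, 60):
    print(score, expectation_value(score, n_candidates=1000))
```

Each 10-point increase in score lowers the expectation value tenfold, which is why higher scores correspond to more significant matches.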
Clicking the query number (the number of the spectrum in the query) shows other possible interpretations of the same fragmentation spectrum.
The matched fragment ions are shown in tabular format below the spectrum. The ion series are those specified by the INSTRUMENT search parameter. If you choose to label the matches used for scoring, bold italic red means the series contributed to the score. Bold red means that the number of matches in the ion series is greater than would be expected by chance, indicating that the ion series is present. Non-bold red means that the number of matches in the ion series is no greater than would be expected by chance, so that the matches themselves may be by chance.
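Whether an ion series matches "more than expected by chance" can be illustrated with a simple binomial model, used here as an assumption for illustration rather than as the documented Mascot algorithm: if each of n predicted ions has probability p of matching a random peak, the expected number of chance matches is n·p, and the probability of k or more chance matches is a binomial tail. The values n = 9 and p = 0.05 below are hypothetical.

```python
from math import comb


def expected_chance_matches(n_ions, p_random):
    """Mean number of predicted ions matched purely by chance."""
    return n_ions * p_random


def prob_at_least(n_ions, k_matched, p_random):
    """Binomial probability of k_matched or more chance matches among n_ions predicted ions."""
    return sum(
        comb(n_ions, j) * p_random ** j * (1 - p_random) ** (n_ions - j)
        for j in range(k_matched, n_ions + 1)
    )


# Hypothetical y series: 9 predicted ions, 5 % chance of a random match for each
print(expected_chance_matches(9, 0.05))
print(prob_at_least(9, 6, 0.05))
```

With these assumptions fewer than one chance match is expected, so six matched ions would be very unlikely to arise by chance and the series would be considered present.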
Advantages:
- Unambiguous identifications
- Multiple identifications
- A few peptides are sufficient
- Proteins in mixtures can be distinguished
- Protein contamination becomes less relevant
- Deductive results: no prior hypothesis is required
- Organisms can be recognized and differentiated