Peptide Identification by Tandem Mass Spectrometry Behshad Behzadi April 2005.

Outline: Proteomics; Tandem Mass Spectrometry; Peptide Identification Problem; Identification via Database; De novo Peptide Identification

Proteomics: the systematic analysis of the proteins expressed by a cell or tissue (identification, quantification, interactions, …). Tandem mass spectrometry is an essential tool for identifying (and quantifying) the proteins in a mixture.

Proteins: the primary structure of a protein is a sequence over an alphabet of 20 amino acids.

Amino Acids

Tandem Mass Spectrum: An Example [Figure labels: ionized parent peptide; secondary fragmentation]

What is the goal? Spectrum → Peptide sequence

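Protein Backbone
N-terminus  H…-HN-CH(R_{i-1})-CO-NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-…OH  C-terminus
Each -NH-CH(R)-CO- unit is one amino acid (AA) residue; the three shown are residues i-1, i, and i+1, with side chains R_{i-1}, R_i, and R_{i+1}.

Breaking of Protein Backbone
N-terminus  H…-HN-CH(R_{i-1})-CO | NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-…OH  C-terminus
The break (|) falls on the bond between the CO of residue i-1 and the NH of residue i; the figure also marks a proton (H+).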

How Does a Peptide Fragment?
m(y_1) = 19 + m(A_4)
m(y_2) = 19 + m(A_4) + m(A_3)
m(y_3) = 19 + m(A_4) + m(A_3) + m(A_2)
m(b_1) = 1 + m(A_1)
m(b_2) = 1 + m(A_1) + m(A_2)
m(b_3) = 1 + m(A_1) + m(A_2) + m(A_3)
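
The b- and y-ion formulas on this slide amount to prefix and suffix sums over the residue masses. The following is a minimal Python sketch of that arithmetic, not code from the presentation. The function name `fragment_masses` is hypothetical, the residue masses are the standard monoisotopic values (they do not appear on the slides), and the offsets 1 and 19 are the nominal values the slide uses.

```python
# Standard monoisotopic residue masses in daltons (assumption: not from the slides).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

B_OFFSET = 1.0   # nominal offset used on the slide for b-ions
Y_OFFSET = 19.0  # nominal offset used on the slide for y-ions


def fragment_masses(peptide):
    """Return (b_ions, y_ions) for a peptide A_1..A_n, following the slide.

    b_k = 1 + m(A_1) + ... + m(A_k)
    y_k = 19 + m(A_n) + ... + m(A_{n-k+1})
    for k = 1 .. n-1.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []

    prefix = 0.0
    for m in masses[:-1]:            # prefixes of length 1 .. n-1
        prefix += m
        b_ions.append(B_OFFSET + prefix)

    suffix = 0.0
    for m in reversed(masses[1:]):   # suffixes of length 1 .. n-1
        suffix += m
        y_ions.append(Y_OFFSET + suffix)

    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_masses("GASP")
    print("b-ions:", [round(x, 3) for x in b])
    print("y-ions:", [round(x, 3) for x in y])
```

For a four-residue peptide this yields exactly the three b-ions and three y-ions listed on the slide.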

The Identification Algorithms: database search algorithms (SEQUEST, Mascot, …); de novo algorithms (Lutefisk, PEAKS, …)

Database Search Algorithms: interpret the tandem mass spectral data by searching a protein database. SEQUEST (Eng et al. 1994); Mascot (Perkins et al. 1999); ProteinProspector (Clauser et al. 1999)

SEQUEST (Eng et al. 94): (1) Search the protein database for candidate amino acid sequences whose mass matches that of the parent peptide within a tolerance of 1. (2) Produce the theoretical spectrum of each candidate. (3) Match each theoretical spectrum against the experimental spectrum using a score function (Xcorr). (4) Rank the candidates by this score.
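
The filter, predict, score, rank pipeline described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SEQUEST itself: the score is a simple shared-peak count standing in for Xcorr, the mass tolerance is taken to be in daltons, the "database" is a plain list of peptide strings, and all function names are hypothetical.

```python
# Standard monoisotopic residue masses in daltons (assumption: not from the slides).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added to the residue sum of an intact peptide


def peptide_mass(peptide):
    """Mass of the intact, uncharged peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def theoretical_spectrum(peptide):
    """b- and y-ion masses using the nominal offsets 1 and 19 from the slides."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    peaks, prefix = [], 0.0
    for m in masses[:-1]:
        prefix += m
        peaks.append(1.0 + prefix)             # b-ion for this prefix
        peaks.append(19.0 + (total - prefix))  # y-ion for the complementary suffix
    return sorted(peaks)


def shared_peak_count(experimental, theoretical, tol=0.5):
    """Stand-in score (NOT Xcorr): theoretical peaks found in the experiment."""
    return sum(any(abs(e - t) <= tol for e in experimental) for t in theoretical)


def search(database, experimental, parent_mass, mass_tol=1.0):
    """Filter candidates by parent mass, score them, and rank best first."""
    candidates = [p for p in database
                  if abs(peptide_mass(p) - parent_mass) <= mass_tol]
    scored = [(shared_peak_count(experimental, theoretical_spectrum(p)), p)
              for p in candidates]
    return sorted(scored, key=lambda item: (-item[0], item[1]))


if __name__ == "__main__":
    db = ["GASP", "PSAG", "GAVL", "WWWW"]
    spectrum = theoretical_spectrum("GASP")  # idealized, noise-free spectrum
    for score, pep in search(db, spectrum, peptide_mass("GASP")):
        print(pep, score)
```

In this toy run the mass filter keeps only GASP and its reversal PSAG, and the score then separates the two, which is the role the slide assigns to the scoring step.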

Other probabilistic models for scores: Qin et al. (1997); Dancik et al. (2000); Bafna and Edwards (2001)

Why do we need de novo? The genomes of certain organisms are unknown. The sequences in the protein database are not always accurate. Amino acids may be modified: RNA editing, post-translational modifications (PTMs).

Methods: Tree-Based Search (Taylor et al. 97); Spectrum Graph Based Search (Dancik et al. 99); Dynamic Programming Algorithm (Chen et al. 2001); AuDeNS (Baginsky et al. 02); Sub-Optimal Algorithm (Lu and Chen 03); …

De Novo Identification: given a spectrum S and a defined scoring function f(), find a peptide sequence q that maximizes f(S|q).
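
To make the optimization statement concrete, here is a deliberately naive Python sketch that enumerates every sequence whose residue masses add up to the target and keeps the one with the highest score. It is not one of the algorithms cited in this presentation: the search is exhaustive (exponential in peptide length), the alphabet is restricted to five residues to keep it small, f is assumed to be a shared-peak count, and all names are hypothetical.

```python
# Five-residue subset of the standard monoisotopic masses (assumption, for brevity).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
}


def theoretical_spectrum(peptide):
    """b- and y-ion masses using the nominal offsets 1 and 19 from the slides."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    peaks, prefix = [], 0.0
    for m in masses[:-1]:
        prefix += m
        peaks.append(1.0 + prefix)
        peaks.append(19.0 + (total - prefix))
    return peaks


def f(spectrum, peptide, tol=0.5):
    """Assumed scoring function f(S|q): theoretical peaks of q present in S."""
    return sum(any(abs(s - t) <= tol for s in spectrum)
               for t in theoretical_spectrum(peptide))


def de_novo(spectrum, residue_total, tol=0.5):
    """Exhaustively find the sequence q maximizing f(S|q).

    residue_total is the sum of residue masses of the unknown peptide;
    only sequences matching it within tol are scored.
    """
    best_score, best_peptide = -1, None

    def extend(prefix, mass):
        nonlocal best_score, best_peptide
        if prefix and abs(mass - residue_total) <= tol:
            score = f(spectrum, prefix, tol)
            if score > best_score:
                best_score, best_peptide = score, prefix
        for aa in sorted(RESIDUE_MASS):
            m = RESIDUE_MASS[aa]
            if mass + m <= residue_total + tol:  # prune sequences that are too heavy
                extend(prefix + aa, mass + m)

    extend("", 0.0)
    return best_score, best_peptide


if __name__ == "__main__":
    target = "GASP"
    spectrum = theoretical_spectrum(target)  # idealized, noise-free spectrum
    total = sum(RESIDUE_MASS[aa] for aa in target)
    print(de_novo(spectrum, total))
```

The methods listed on the preceding slide exist precisely because this kind of enumeration does not scale to the full alphabet and realistic peptide lengths.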

AuDeNS: uses "grass mowers" to preprocess the spectrum and then applies the dynamic programming approach. Compute a relevance value for each peak using different mowers. Apply a weighted version of the Chen et al. algorithm (DP).

Mowers: Threshold Mower; Window Mower; Isotope Mower; Intersection Mower; Complement Mower

Summary: De novo Sequencing Sequence

Intensities: peak intensities are the second dimension of information in a spectrum. Several factors determine the intensities.

Intensities (2): amino-acid-dependent factors; ion-type factors; position-based factors (peaks in the middle of the spectrum are higher)

Conclusion: tandem mass spectrometry is now the most important tool for identifying proteins. Many approaches have been developed, but there is still a long way to go before all the information contained in mass spectra can be extracted.

Research Themes: combining de novo and database methods (e.g., extracting tags); using the intensities; dealing better with PTMs (200 types); high-throughput experiments; clustering; multi-dimensional interpretation.