Kaizhong Zhang Department of Computer Science University of Western Ontario London, Ontario, Canada Joint work with Bin Ma, Gilles Lajoie, Amanda Doherty-Kirby,


Kaizhong Zhang
Department of Computer Science, University of Western Ontario, London, Ontario, Canada
Joint work with Bin Ma, Gilles Lajoie, Amanda Doherty-Kirby, Chengzhi Liang, Ming Li
The peptide de novo sequencing from MS/MS spectrum

Introduction
- Tandem mass spectrometry (MS/MS) now plays a very important role in protein identification because of its speed and high sensitivity.
- Deriving a peptide's sequence from its MS/MS spectrum is an important task in proteomics.
- Derivation without the help of a protein database is called de novo sequencing, which is especially important for identifying unknown proteins.

Introduction (2)
- The basic experimental steps of this method are the following:
  1. The proteins are digested with an enzyme to produce peptides;
  2. The peptides are charged (ionized) and separated according to their mass-to-charge (m/z) ratios;
  3. Each peptide is fragmented into fragment ions, and the m/z values of the fragment ions are measured.

Introduction (3)
- Steps 2 and 3 are both performed within a tandem mass spectrometer.
- Since many copies of each peptide are fragmented and the fragmentation can occur anywhere along the peptide, a spectrum of the observed m/z values is obtained.

Mass spectrum
- For each possible fragment ion there could be a peak at the corresponding m/z value.
- The height of the peak is proportional to the frequency with which that m/z value is observed by the mass spectrometer.
- In general, proteins consist of 20 different types of amino acids, most of which have different masses (except for one pair, Leucine and Isoleucine).

Mass spectrum (2)
- Consequently, different peptides usually produce different spectra.
- It is therefore possible, and now common practice, to use the spectrum of a peptide to determine its sequence.

Peptide fragmentation
- A charged peptide may be fragmented into two pieces in three ways, which may produce a pair of a- and x-ions, a pair of b- and y-ions, or a pair of c- and z-ions.
- Theoretically, a fragmentation can occur at any place in a peptide, and a spectrum is expected to contain all the possible ion peaks.
- In practice, because the bonds at different positions differ in strength, different ions occur with different frequencies.

Peptide fragmentation (2)

Peptide fragmentation (3)
- The most abundant ions are y-ions, which often form the complete series in a spectrum.
- The next are a- and b-ions, of which many are not observed.
- The c-, x-, and z-ions occur much less frequently.
- In addition, these ions can often form new ions through loss of water or loss of ammonia.

The approximate masses of some atoms that appear in peptides, where C13 is the isotope of C
Atom: C, C13, H, O, N
Mass (Dalton): (values not preserved in the transcript)

Mass of an amino acid
- For any amino acid $a$, we use $\|a\|$ to denote the mass of C$_2$H$_2$RNO, i.e., the amino acid $a$ with the loss of a water.
- For $P = a_1 a_2 \dots a_k$ being a sequence of amino acids, let $\|P\| = \sum_{1 \le j \le k} \|a_j\|$.
- Therefore the actual mass of peptide $P$ is $18 + \|P\|$ because of the extra H$_2$O in it.
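The definition above can be sketched in a few lines of Python. The table below is an assumption: it uses rounded nominal residue masses (each amino acid minus one water) rather than the values from the slides, and the names `RESIDUE_MASS`, `residue_sum`, and `peptide_mass` are hypothetical.

```python
# Nominal (integer) residue masses in daltons: each amino acid minus one water.
# Assumption: rounded textbook values, not the table shown on the slides.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}
WATER = 18  # approximate mass of H2O


def residue_sum(peptide: str) -> int:
    """Return ||P||, the sum of the residue masses of the peptide."""
    return sum(RESIDUE_MASS[a] for a in peptide)


def peptide_mass(peptide: str) -> int:
    """Return the actual mass of the peptide, 18 + ||P||."""
    return WATER + residue_sum(peptide)
```

With these rounded values, Leucine and Isoleucine are indistinguishable by mass, which matches the exception noted earlier in the talk.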

The approximate masses of the 20 amino acids
Amino acid: A, R, N, D, C, E, Q, G, H, I, L, K, M, F, P, S, T, W, Y, V
Mass (Dalton): (values not preserved in the transcript)

The hypothetical spectrum of P
- Let $A = a_1 a_2 \dots a_n$ be a sequence of amino acids. We introduce two notations:
  $\|A\|_b = 1 + \|A\|$
  $\|A\|_y = 19 + \|A\|$

The hypothetical spectrum of P (2)
- Let $b_i$ be the mass of the b-ion of $P$ with $i$ amino acids; then $b_i = \|a_1 a_2 \dots a_i\|_b$ $(1 \le i < k)$.
- Let $y_i$ be the mass of the y-ion of $P$ with $i$ amino acids; then $y_i = \|a_{k-i+1} \dots a_k\|_y$ $(1 \le i < k)$.
- Clearly, $y_{k-i} + b_i = 20 + \|P\|$.
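As an illustration, the b- and y-ion masses can be computed directly from these definitions, and the identity relating complementary ions can be checked numerically. This is a sketch under assumptions: the residue masses are rounded nominal values for three amino acids only, and `b_ions` and `y_ions` are hypothetical helper names.

```python
# Assumption: rounded nominal residue masses for a few amino acids only.
RESIDUE_MASS = {"G": 57, "A": 71, "S": 87}


def b_ions(peptide: str) -> list[int]:
    """Masses b_1 .. b_{k-1}: 1 + mass of the first i residues."""
    return [1 + sum(RESIDUE_MASS[a] for a in peptide[:i])
            for i in range(1, len(peptide))]


def y_ions(peptide: str) -> list[int]:
    """Masses y_1 .. y_{k-1}: 19 + mass of the last i residues."""
    return [19 + sum(RESIDUE_MASS[a] for a in peptide[-i:])
            for i in range(1, len(peptide))]


peptide = "GAS"
total = sum(RESIDUE_MASS[a] for a in peptide)  # ||P|| = 215
b, y = b_ions(peptide), y_ions(peptide)
k = len(peptide)
# Each b_i pairs with the complementary y_{k-i}; lists are 0-based here.
pair_sums = [b[i - 1] + y[k - i - 1] for i in range(1, k)]
```

For the peptide GAS every entry of `pair_sums` equals 20 + ||P|| = 235, which is why both ends of the spectrum carry information about the same cleavage site.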

The hypothetical spectrum of P (3)
- Around each y-ion peak, it is possible to have other peaks.
- For each y-ion with mass $x$, the corresponding x-ion and z-ion weigh $x+26$ and $x-17$.
- An ion may lose a water to generate a peak at mass $x-18$.
- An ion with mass $x$ usually has a peak at $x+1$, corresponding to the isotopic ion that contains a C13.

The hypothetical spectrum of P (4)
- Therefore, for each y-ion with mass $x$, there are possible peaks at the masses in the following set:
  $Y(x) = \{x-18,\ x-17,\ x,\ x+1,\ x+26\}$
- Similarly, for each b-ion with mass $x$, the possible masses are from the following set:
  $B(x) = \{x-28,\ x-18,\ x,\ x+1,\ x+17\}$

The hypothetical spectrum of P (5)
- Therefore, the hypothetical spectrum of the peptide $P$ has peaks at each mass in the following set:
  $S(P) = \bigcup_{0 < i < k} \left( B(b_i) \cup Y(y_i) \right)$
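The hypothetical spectrum can be built by taking the union of the per-ion peak sets. The following is a minimal sketch, assuming rounded nominal residue masses for three amino acids; the function names are hypothetical.

```python
# Assumption: rounded nominal residue masses for a few amino acids only.
RESIDUE_MASS = {"G": 57, "A": 71, "S": 87}


def y_peaks(x: int) -> set[int]:
    """Y(x): possible peak masses around a y-ion of mass x."""
    return {x - 18, x - 17, x, x + 1, x + 26}


def b_peaks(x: int) -> set[int]:
    """B(x): possible peak masses around a b-ion of mass x."""
    return {x - 28, x - 18, x, x + 1, x + 17}


def hypothetical_spectrum(peptide: str) -> set[int]:
    """S(P): union of B(b_i) and Y(y_i) over every internal cleavage site."""
    masses = [RESIDUE_MASS[a] for a in peptide]
    total = sum(masses)
    spectrum: set[int] = set()
    prefix = 0
    for m in masses[:-1]:              # one cleavage after each residue but the last
        prefix += m
        b = 1 + prefix                 # b-ion holding the prefix
        y = 19 + (total - prefix)      # complementary y-ion holding the suffix
        spectrum |= b_peaks(b) | y_peaks(y)
    return spectrum


spectrum = hypothetical_spectrum("GAS")
```

Because the result is a set, a mass produced by two different ions appears only once.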

The de novo sequencing problem
- Let $P$ be a peptide and $M = \|P\| + 20$.
- Given a solution containing peptide $P$, a tandem mass spectrometer can measure a peak list $L$.
- $L$ is a set of pairs $\{(x_i, h_i) \mid 0 < i < n+1\}$, where $0 < x_1 < \dots < x_n$ are the masses and $h_i$ is the intensity of the peak at $x_i$.
- The total mass of $P$, which is $M - 2$, can also be measured.

The de novo sequencing problem (2)
- The masses given by the spectrometer are not accurate.
- The maximum error varies from $\pm 0.01$ dalton to $\pm 0.5$ dalton, depending on the type of spectrometer used.

The de novo sequencing problem (3)
- Let $\delta$ be the error of the spectrometer.
- Let $S$ be a set of masses. We say a peak $(x, h)$ in $L$ is supported by $S$ if there is a $y$ in $S$ such that $|x - y| < \delta$.
- The subset of peaks in $L$ supported by $S$ is denoted by $L_S$:
  $L_S = \{(x, h) \in L \mid \text{there is } y \in S \text{ s.t. } |x - y| < \delta\}$
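The supported subset is a simple filter over the peak list. Here is a minimal sketch; the function name `supported_peaks` and the sample numbers are illustrative assumptions, not data from the talk.

```python
def supported_peaks(peaks, masses, delta):
    """Return L_S: the peaks (x, h) lying within delta of some mass in `masses`."""
    return [(x, h) for (x, h) in peaks
            if any(abs(x - y) < delta for y in masses)]


# Illustrative peak list of (mass, intensity) pairs and a candidate mass set.
peaks = [(58.2, 10.0), (100.0, 5.0), (177.4, 7.0)]
masses = {58, 177}
kept = supported_peaks(peaks, masses, delta=0.5)
```

Here the peaks at 58.2 and 177.4 are kept, while the peak at 100.0 has no supporting mass within the error bound.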

The de novo sequencing problem (4)
- Therefore $L_{S(P)}$ consists of all the peaks in $L$ that are supported by the masses of the hypothetical ions of $P$.
- The more high-intensity peaks there are in $L_{S(P)}$, the more likely $L$ is the mass spectrum of $P$.

The de novo sequencing problem (5)
- For any peak list $L'$, we define $h(L') = \sum_{(x, h) \in L'} h$.
- The de novo sequencing problem is defined as follows.
- Given a mass spectrum $L$, a positive number $M$, and an error bound $\delta$, construct a peptide $P$ so that $\left|\, \|P\| + 20 - M \,\right| < \delta$ and $h(L_{S(P)})$ is maximized.
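To make the objective concrete, the sketch below scores a candidate peptide and finds the best one by exhaustive enumeration. This is a naive baseline for tiny inputs only, not the dynamic programming algorithm described in the talk. The residue masses are rounded nominal values, and the three-letter alphabet, the length limit, and all function names are assumptions made for illustration.

```python
from itertools import product

# Assumption: rounded nominal residue masses for a few amino acids only.
RESIDUE_MASS = {"G": 57, "A": 71, "S": 87}


def hypothetical_spectrum(peptide):
    """S(P): union of B(b_i) and Y(y_i) over all internal cleavage sites."""
    masses = [RESIDUE_MASS[a] for a in peptide]
    total, prefix, spectrum = sum(masses), 0, set()
    for m in masses[:-1]:
        prefix += m
        b, y = 1 + prefix, 19 + (total - prefix)
        spectrum |= {b - 28, b - 18, b, b + 1, b + 17}
        spectrum |= {y - 18, y - 17, y, y + 1, y + 26}
    return spectrum


def score(peptide, peaks, delta):
    """h(L_S(P)): each supported peak is counted once, even if two ions match it."""
    spectrum = hypothetical_spectrum(peptide)
    return sum(h for x, h in peaks
               if any(abs(x - y) < delta for y in spectrum))


def brute_force_de_novo(peaks, M, delta, alphabet="GAS", max_len=4):
    """Try every peptide up to max_len that satisfies the mass constraint."""
    best, best_score = None, -1.0
    for length in range(1, max_len + 1):
        for cand in product(alphabet, repeat=length):
            p = "".join(cand)
            if abs(sum(RESIDUE_MASS[a] for a in p) + 20 - M) >= delta:
                continue  # violates | ||P|| + 20 - M | < delta
            s = score(p, peaks, delta)
            if s > best_score:
                best, best_score = p, s
    return best, best_score


# Illustrative spectrum: the b- and y-ion peaks of GAS with made-up intensities.
peaks = [(58.0, 5.0), (129.0, 4.0), (106.0, 9.0), (177.0, 8.0)]
result = brute_force_de_novo(peaks, M=235, delta=0.5)
```

The enumeration grows exponentially with peptide length, which is one reason a more careful algorithm is needed in practice.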

Algorithms
- There are two major difficulties in the de novo sequencing problem.
- First, each fragmentation may produce a pair of ions.
- This means that both ends of the spectrum must be considered at the same time.

Algorithms (2)
- Second, the types of the peaks are unknown, and a peak may be matched by zero, one, or two different types of ions.
- When a peak is matched by two ions, the height of the peak can only be counted once.

Algorithms (3)
- The straightforward approach of "growing" the peptide from one terminal to the other does not work.
- We use a more sophisticated dynamic programming algorithm for the de novo sequencing problem.
- Our algorithm gradually "grows" a prefix and a suffix of the optimal solution along a carefully designed pathway until the prefix and the suffix are long enough to form the optimal solution.

Experiments
- Our model and algorithm account for most of the ion types that have been observed in practice.
- Overlaps of two different ions are correctly modeled.
- The algorithm tolerates mass error and handles missing ions in the spectrum.

Experiments (2)
- Experimental results demonstrated that our algorithm performed extremely well.
- The program has been integrated into a software package, PEAKS, which is now accessible online at