De Novo Peptide Sequencing via Probabilistic Network Modeling PepNovo.


1 De Novo Peptide Sequencing via Probabilistic Network Modeling PepNovo

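2 Peptide Fragmentation
[Figure: under Collision-Induced Dissociation (CID), the peptide ACFETPGR (N-terminus to C-terminus) fragments into the prefix ACFET, of mass m, and the suffix PGR, of mass PM-m.]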

3 Peptide Fragmentation
- A peptide with mass PM that fragments into a prefix of mass m and a suffix of mass PM-m can produce different fragment ions:
    Prefix ion   Position   Suffix ion   Position
    b            m+1        y            PM-m+19
    b-H2O        m-17       y-NH3        PM-m+2
    b+2          (m+2)/2    y-H2O-H2O    PM-m-17
    ...
- The intensities at the expected offsets from mass m are used to create an intensity vector.
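A minimal sketch of how such an intensity vector might be assembled, using only the offsets listed on this slide. The function names, the (m/z, intensity) peak representation, the matching tolerance `tol`, and the choice of keeping the strongest matching peak are assumptions for illustration, not PepNovo's implementation.

```python
def expected_positions(m, pm):
    """Expected peak positions for a cleavage at prefix mass m in a peptide of mass pm.

    Only the six ion types listed on the slide are included.
    """
    return {
        "b": m + 1,
        "b-H2O": m - 17,
        "b+2": (m + 2) / 2,
        "y": pm - m + 19,
        "y-NH3": pm - m + 2,
        "y-H2O-H2O": pm - m - 17,
    }


def intensity_vector(spectrum, m, pm, tol=0.5):
    """Collect one intensity per ion type; 0.0 means no peak was found.

    spectrum is a list of (mz, intensity) pairs. tol is an assumed
    matching tolerance, and taking the strongest match is an assumption.
    """
    vector = {}
    for ion, position in expected_positions(m, pm).items():
        matches = [inten for mz, inten in spectrum if abs(mz - position) <= tol]
        vector[ion] = max(matches, default=0.0)
    return vector


# Hypothetical spectrum: a b peak near 101, a y peak at 419, one unrelated peak.
example = intensity_vector([(101.1, 40.0), (419.0, 80.0), (250.0, 5.0)], m=100, pm=500)
```

Ion types with no matching peak keep an intensity of 0.0, which is what lets a scoring function penalize missing peaks.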

4 The Spectrum Graph

5 Scoring for De Novo Sequencing
- All masses in the spectrum's range can be considered putative cleavage sites.
- Given the observed intensities, how do we evaluate whether mass m is a cleavage site?
- A common statistical tool used by many scoring functions is the likelihood ratio test (Dancik et al. '99, Havilio et al. '03, ...).

6 Dancik et al. '99 – Hypotheses
- The main concept: give a premium for present peaks and penalties for missing peaks.
- Fragmentation hypothesis: uses a probability table:
    Fragment     Probability
    y            0.71 (P_1)
    b            0.66 (P_2)
    a            0.26 (P_3)
    y-H2O-H2O    0.09 (P_k)
- Random hypothesis: P_R is the probability of observing a random peak (~0.1).

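7 Scoring a Cleavage Site (Dancik '99)
- Out of k possible ions for a cleavage at m, t are detected (w.l.o.g. fragments 1, ..., t) and k-t are missing (t+1, ..., k).
- Score using a log ratio test, where the numerator is the probability of cleavage site m according to the Fragmentation hypothesis and the denominator is its probability according to the Random hypothesis:
  \[ \mathrm{Score}(m) = \log \frac{\prod_{i=1}^{t} P_i \, \prod_{i=t+1}^{k} (1 - P_i)}{P_R^{\,t} \, (1 - P_R)^{k-t}} \]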
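A minimal sketch of this log ratio score, assuming each detected ion contributes log(P_i / P_R) and each missing ion contributes log((1 - P_i) / (1 - P_R)). The probabilities are the four values shown in the slide-6 table and P_R = 0.1. The function name and the dictionary layout are hypothetical.

```python
import math

# Fragment probabilities from the slide-6 table (only the rows shown there).
ION_PROBS = {"y": 0.71, "b": 0.66, "a": 0.26, "y-H2O-H2O": 0.09}


def dancik_score(detected, ion_probs=ION_PROBS, p_random=0.1):
    """Log ratio of the fragmentation hypothesis to the random hypothesis.

    detected is the set of ion types for which a peak was observed.
    """
    score = 0.0
    for ion, p in ion_probs.items():
        if ion in detected:
            score += math.log(p / p_random)  # premium for a present peak
        else:
            score += math.log((1 - p) / (1 - p_random))  # penalty for a missing peak
    return score


both_main_ions = dancik_score({"y", "b"})
nothing_found = dancik_score(set())
```

With these numbers, observing both y and b gives a positive score, while observing no expected ion gives a negative one.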

8 PepNovo Scoring
- PepNovo implements a similar likelihood ratio test mechanism.
- It can be viewed as extending the scoring model of Dancik et al. '99.
- It includes several factors that are not sufficiently addressed in current scoring functions.

9 Enhancements to Dancik et al. ('99)
1. Several intensity values.
2. Combinations of fragment ions.
3. Incorporation of additional chemical knowledge (e.g., preferred cleavage sites).
4. Positional influence of the cleavage site.
5. Improved random model.

10 H_CID - Fragmentation Network
[Figure: probabilistic network over the nodes pos(m) (region in peptide), N-aa (N-terminal amino acid), C-aa (C-terminal amino acid), and the fragment ions y, b, y2, a, b2, a-NH3, a-H2O, b-NH3, b-H2O, y-NH3, y-H2O, b-H2O-NH3, b-H2O-H2O, y-H2O-NH3, y-H2O-H2O. Edges are grouped as amino acid influence, ion combinations, and positional influence.]
Example conditional probability table:
    pos   y   P(y2 | y, pos)
    0     0   0.1
    0     1   0.22
    2     3   0.52
    4     3   0.08

11 Discrete Intensity Values
- Peak intensity is normalized according to the grass level (the average of the weakest 33% of peaks in the spectrum).
- Normalized intensities are discretized into 4 intensity levels:
    zero:   I < 0.05
    low:    0.05 ≤ I < 2    (62% of peaks)
    medium: 2 ≤ I < 10      (26% of peaks)
    high:   I ≥ 10          (12% of peaks)
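The normalization and thresholds above can be sketched as follows. The thresholds and the 33% fraction come from the slide. The function names, the integer encoding of the levels (0 = zero, 1 = low, 2 = medium, 3 = high), and the rounding of the "weakest 33%" count are assumptions.

```python
def grass_level(intensities, fraction=0.33):
    """Average of the weakest `fraction` of peaks (at least one peak is used)."""
    ranked = sorted(intensities)
    count = max(1, int(len(ranked) * fraction))
    return sum(ranked[:count]) / count


def discretize(normalized):
    """Map a grass-normalized intensity to a level: 0 zero, 1 low, 2 medium, 3 high."""
    if normalized < 0.05:
        return 0
    if normalized < 2:
        return 1
    if normalized < 10:
        return 2
    return 3


def intensity_levels(intensities):
    """Normalize raw peak intensities by the grass level, then discretize.

    Assumes at least one strictly positive intensity among the weakest peaks.
    """
    grass = grass_level(intensities)
    return [discretize(value / grass) for value in intensities]


levels = intensity_levels([1.0, 1.0, 1.0, 2.0, 5.0, 30.0])
```

A missing peak can be passed through `discretize(0.0)`, which falls into the zero level.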

12 Combinations of Fragments
- Different combinations have significantly different probabilities:
    P(b = high | y = high) = 0.36, vs. P(b = high | y = low) = 0.03.
    P(b-H2O > zero | b = high) = 0.5, vs. P(b-H2O > zero | b = zero) = 0.24.
[Figure: the fragment-ion nodes of the network: y, b, y2, a, b2, a-NH3, a-H2O, b-NH3, b-H2O, y-NH3, y-H2O, b-H2O-NH3, b-H2O-H2O, y-H2O-NH3, y-H2O-H2O.]

13 Additional Chemical Knowledge
- The identity of the flanking amino acids influences the peak intensities:
    Increased intensities N-terminal to Proline and Glycine.
    Increased intensities C-terminal to Aspartic Acid.
- The 400 amino acid combinations are reduced to 15 equivalence sets (X-P, X-G, etc.).
[Figure: network nodes N-aa (N-terminal amino acid) and C-aa (C-terminal amino acid) with the fragment ions y and b.]

14 Positional Influence
- Creates separate models for different locations in the peptide.
- Models phenomena such as:
    weak b/y ions near the ends;
    prevalence of a-ions in the first half of the peptide;
    prevalence of b2 towards the peptide's C-terminus and y2 near the N-terminus.
[Figure: network node pos(m) (region in peptide) with the fragment ions y, b, y2, a, b2.]

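15 Probability under H_CID
- From the decomposition properties of probabilistic networks, each node is conditionally independent of its non-descendants given the values of its parents, so:
  \[ P(\vec{I} \mid H_{CID}) = \prod_{f} P(I_f \mid \pi(f)) \]
  where I_f is the intensity level of node f and π(f) are the parents of node f.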
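A toy illustration of this factorization: the joint probability is a product of one conditional-probability lookup per node, keyed by the values of that node's parents. The two-node structure, the table layout, and every number below are toy values chosen for illustration. They are not the network or the trained tables used by PepNovo.

```python
import math

# Assumed toy structure: y depends on pos; b depends on y and pos.
PARENTS = {"y": ("pos",), "b": ("y", "pos")}

# CPT[node][parent values][node value] -> probability (all toy numbers).
CPT = {
    "y": {(0,): {"high": 0.3, "low": 0.7}},
    "b": {
        ("high", 0): {"high": 0.36, "low": 0.64},
        ("low", 0): {"high": 0.03, "low": 0.97},
    },
}


def log_probability(values, parents=PARENTS, cpt=CPT):
    """Sum of log P(node value | parent values) over all non-root nodes.

    values maps every node name (including roots such as pos) to its value.
    """
    total = 0.0
    for node, node_parents in parents.items():
        key = tuple(values[p] for p in node_parents)
        total += math.log(cpt[node][key][values[node]])
    return total


joint = math.exp(log_probability({"pos": 0, "y": "high", "b": "high"}))
```

Working in log space keeps the product numerically stable when many nodes are multiplied together.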

16 H_Random – Regional Density
[Figure: peaks along the m/z axis, labeled by intensity level (levels 0 to 3), inside a window of width w that contains a bin of width 2ε.]

17 Computing the Random Probability
- The probability of a single peak missing the bin is 1 - (2ε)/w.
- Let n_i, 1 ≤ i ≤ d, be the counts of peaks with intensity i in window w.

18 Random Model for H_Random
- Peak occurrences are treated as random independent events.
- The probability of observing a peak at random is estimated from the local density of peaks in the spectrum.
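One way to turn these quantities into per-level random probabilities is sketched below. It assumes that peaks land independently and uniformly in the window, and that the level seen in the bin is that of the strongest peak landing in it. Both the formula and the names (`random_level_probs`, `miss`) are assumptions made for this sketch, since the slide's own equation is not preserved in the transcript.

```python
def random_level_probs(counts, window, eps):
    """Probability that a bin of width 2*eps shows each intensity level at random.

    counts[i - 1] is n_i, the number of peaks of level i (1..d) in the window.
    Returns a list p where p[0] is the probability of an empty bin and
    p[t] is the probability that the strongest peak in the bin has level t.
    """
    miss = 1.0 - (2.0 * eps) / window  # a single peak misses the bin
    d = len(counts)
    probs = [0.0] * (d + 1)
    probs[0] = miss ** sum(counts)  # every peak misses the bin
    for t in range(1, d + 1):
        higher = sum(counts[t:])  # peaks stronger than level t must all miss
        probs[t] = (miss ** higher) * (1.0 - miss ** counts[t - 1])
    return probs


# Hypothetical window: one level-1 peak, four level-2 peaks, two level-3 peaks.
example_probs = random_level_probs([1, 4, 2], window=100.0, eps=0.5)
```

Under these assumptions the probabilities over all levels sum to one, so they form a proper distribution for the random hypothesis.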

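19 The Likelihood Ratio Score
- A putative cleavage site is scored according to the log ratio test:
  \[ \mathrm{Score}(m) = \log \frac{P(\vec{I} \mid H_{CID})}{P(\vec{I} \mid H_{Random})} \]
- This can be used to score a peptide by summing the scores for its prefix masses m_1, ..., m_n:
  \[ \mathrm{Score}(\text{peptide}) = \sum_{i=1}^{n} \mathrm{Score}(m_i) \]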
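As a small worked example of the two steps, the sketch below takes the two hypothesis probabilities for each prefix mass as given and sums the per-site log ratios. The function names and the input numbers are hypothetical.

```python
import math


def cleavage_score(p_cid, p_random):
    """Log ratio of the probability under H_CID to the probability under H_Random."""
    return math.log(p_cid / p_random)


def peptide_score(site_probabilities):
    """Sum the cleavage scores over a peptide's prefix masses.

    site_probabilities is a list of (p_cid, p_random) pairs, one per prefix mass.
    """
    return sum(cleavage_score(p_cid, p_random) for p_cid, p_random in site_probabilities)


# Hypothetical peptide with two cleavage sites: one well supported, one not.
total = peptide_score([(0.4, 0.1), (0.05, 0.1)])
```

A site that is better explained by the random hypothesis contributes a negative term, so it lowers the peptide's total.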

20 PepNovo's De Novo Sequencing
- A spectrum graph is created from the experimental MS/MS spectrum.
- The nodes are scored using our method.
- The highest-scoring anti-symmetric path is found using a dynamic programming algorithm.

21 Spectrum Graph
- Acyclic graph.
- Nodes are cleavage sites; each has a mass m and a score s.
- Edges connect nodes whose mass difference corresponds to an amino acid.
[Figure: example spectrum graph with nodes (m: 0, s: 5.0), (m: 71.2, s: 4.3), (m: 99.1, s: 8.1), (m: 113, s: -1.2), (m: 163.2, s: 2.8), (m: 199.4, s: 5.6) and edges labeled A, L, V, S, W, Q.]
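A simplified sketch of path finding in such a graph: nodes are sorted by mass, an edge is added wherever the mass difference matches a residue mass, and dynamic programming keeps the best-scoring path into each node. This sketch deliberately omits the anti-symmetry constraint, so it is not the full algorithm described in the talk. The residue table is a small subset of monoisotopic masses, and the tolerance, node masses, and scores are hypothetical.

```python
# Subset of monoisotopic amino acid residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
    "L": 113.08406, "Q": 128.05858, "W": 186.07931,
}


def best_path(nodes, tol=0.02):
    """Highest-scoring path from the lightest node to the heaviest node.

    nodes is a list of (mass, score) pairs. Returns (total score, sequence).
    If the heaviest node is unreachable, returns (-inf, "").
    """
    nodes = sorted(nodes)
    n = len(nodes)
    best = [float("-inf")] * n
    back = [None] * n
    best[0] = nodes[0][1]
    for j in range(1, n):
        for i in range(j):
            if best[i] == float("-inf"):
                continue  # node i is not reachable from the start
            diff = nodes[j][0] - nodes[i][0]
            for residue, mass in RESIDUE_MASS.items():
                if abs(diff - mass) <= tol and best[i] + nodes[j][1] > best[j]:
                    best[j] = best[i] + nodes[j][1]
                    back[j] = (i, residue)
    # Walk the back-pointers from the heaviest node to recover the sequence.
    sequence = []
    j = n - 1
    while back[j] is not None:
        j, residue = back[j]
        sequence.append(residue)
    return best[n - 1], "".join(reversed(sequence))


# Hypothetical nodes: prefix masses of A-V-L plus one low-scoring node at 113.08.
score, sequence = best_path(
    [(0.0, 5.0), (71.04, 4.3), (113.08, -1.2), (170.11, 2.8), (283.19, 5.6)]
)
```

In this example two full paths exist (A-V-L and L-G-L), and the dynamic program keeps the one passing through the higher-scoring node at 71.04.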

22 Results
    Algorithm   Average Accuracy   Sequence Length   Tag 3   Tag 4   Tag 5   Tag 6
    PepNovo     0.727              10.30             0.946   0.871   0.800   0.654
    Sherenga    0.690              8.65              0.821   0.711   0.564   0.364
    Peaks       0.673              10.32             0.889   0.814   0.689   0.575
    Lutefisk    0.566              8.79              0.661   0.521   0.425   0.339
Benchmarking reported for 280 spectra.

23 Q & A

