An Algorithmic Approach to Peptide Sequencing via Tandem Mass Spectrometry. Ming-Yang Kao, Department of Computer Science, Northwestern University, Evanston, Illinois, U.S.A.

(Journal of Computational Biology, 2001) (SODA, 2000)
Presentation transcript:

1 An Algorithmic Approach to Peptide Sequencing via Tandem Mass Spectrometry Ming-Yang Kao Department of Computer Science Northwestern University Evanston, Illinois U. S. A.

2 Collaborators of This Project. University of Southern California: Ting Chen. Harvard Medical School: George M. Church, John Rush, Matthew Tepel.

3 Perspectives
A key goal of bioinformatics: to study biological systems based on global knowledge of genomes, transcriptomes, and proteomes.
Genome: entire sets of materials in the chromosomes.
Transcriptome: entire sets of gene transcripts.
Proteome: entire sets of proteins.
Genome (DNA) → Transcriptome (RNA) → Proteome (Protein)

4 Perspectives (continued) Genome (DNA) → Transcriptome (RNA) → Proteome (Protein), with the proteome marked as this talk's focus.

5 Proteomics
Proteome: all proteins encoded within a genome
– about half a million distinct proteins (temporal, spatial, modifications)
– ~30,000 human genes
– mRNA and protein expression may not correlate
Proteomics: study of protein expression by biological systems
– relative abundance and stability; post-translational modifications
– fluctuations as a response to environment and altered cellular needs
– correlations between protein expression and disease state
– protein-protein interactions, protein complexes
Technologies:
– 2D gel electrophoresis
– mass spectrometry (this talk's focus)
– yeast two-hybrid system
– protein chips

6 A Key Step of Proteomics How to sequence proteins? How to sequence protein peptides? (this talk’s focus)

7 Outline of This Talk
1. Problem Formulation (Biology)
2. Problem Formulation (Computer Science)
3. Basic Computational Techniques
4. Improved Computational Complexity and More Robust Algorithms
5. Conclusions

8 Outline of This Talk (1): Problem Formulation (Biology)

9 Protein Identification: HPLC-MS-MS [Figure: panels labeled Proteins, Peptides, One Peptide, B-ions / Y-ions; a mass spectrum and a tandem mass spectrum, each plotted against mass/charge]


11 Peptide Fragmentation and Ionization [Figure: a peptide fragmenting into a B-ion and a Y-ion] Complementary: Mass(B-ion) + Mass(Y-ion) = Mass(peptide) + 4H + O

12 B-ions and Y-ions Fragmentation

13 Tandem Mass Spectrum [Figure: abundance (100%) versus mass/charge]

14 Raw Tandem Mass Spectrum

15 Prediction from Raw Tandem Mass Spectrum

16 Protein Database Search Find the peptide sequences in a protein database that optimally fit the spectrum. It does not work if the target peptide sequence is not in the database. It does not work if there is an unknown modification at some amino acid. It is very slow because it must search the entire database. E.g., SEQUEST, Yates, Univ. of Washington.

17 De Novo Peptide Sequencing Problem
Input: (1) the mass W of an unknown target peptide, and (2) a set S of the masses of some or all b-ions and y-ions of the peptide.
Output: a peptide P such that (1) mass(P) = W and (2) S is a subset of all the ion masses of P.
Example: P = SWR, with Mass(P) and the six masses in Ions(P) [numeric values not preserved].
[Figure: abundance (100%) versus mass/charge; peptide mass in daltons]

18 Tandem Mass Spectrum [Figure: abundance (100%) versus mass/charge; peptide mass in daltons]

19 Amino Acid Mass Table

20 Feature 1: All B-ions form a forward mass ladder. [Figure: spectrum of S W R, abundance (100%) versus mass/charge; ladder b1, b2, b3; label: 1]

21 Feature 2: All Y-ions form a reverse mass ladder. [Figure: spectrum of S W R read as R W S, abundance (100%) versus mass/charge; ladder y1, y2, y3; label: 19]

22 Basic Difficulty #1: It is unknown whether an ion is a B-ion or a Y-ion. [Figure: abundance (100%) versus mass/charge; peptide mass in daltons]

23 Basic Difficulty #2: There are missing ions. [Figure: abundance (100%) versus mass/charge showing only Ion 1 and Ion 2; peptide mass in daltons]

24 Feature 3 (to our Rescue): Complementary ion pairs b1/y2 and b2/y1. [Figure: spectrum of S W R with both ladders, y1, y2, y3 and b1, b2, b3]

25 Outline of This Talk (2): Problem Formulation (Computer Science)

26 Formulating the Computational Problem
1. T = an alphabet of 20 characters a_1, a_2, …, a_20.
2. Two special characters: alpha and beta.
3. The mass of alpha = 1, the mass of beta = 19, and the mass of a_i is m_i.
4. A peptide sequence is x_1, x_2, x_3, …, x_{n-1}, x_n, where each x_i is from T.
5. A b-ion is x_0, x_1, x_2, …, x_i for some 1 <= i <= n, where x_0 = alpha.
6. A y-ion is x_i, …, x_{n-2}, x_{n-1}, x_n, x_{n+1} for some 1 <= i <= n, where x_{n+1} = beta.
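The definitions on slide 26 can be tried out on the running example P = SWR. What follows is a minimal sketch rather than the authors' code: the function names `b_ions` and `y_ions` are illustrative, and the residue masses are standard monoisotopic values that do not appear in the transcript.

```python
# Monoisotopic residue masses in daltons for the residues of the example
# peptide SWR (standard values; an assumption, not taken from the slides).
RESIDUE_MASS = {"S": 87.03203, "W": 186.07931, "R": 156.10111}

ALPHA = 1.0   # mass of the special character alpha (slide 26)
BETA = 19.0   # mass of the special character beta (slide 26)


def b_ions(peptide):
    """Masses of alpha, x_1, ..., x_i for i = 1..n (the forward ladder)."""
    masses, running = [], ALPHA
    for residue in peptide:
        running += RESIDUE_MASS[residue]
        masses.append(running)
    return masses


def y_ions(peptide):
    """Masses of x_i, ..., x_n, beta, shortest suffix first (the reverse ladder)."""
    masses, running = [], BETA
    for residue in reversed(peptide):
        running += RESIDUE_MASS[residue]
        masses.append(running)
    return masses


if __name__ == "__main__":
    peptide = "SWR"
    b, y = b_ions(peptide), y_ions(peptide)
    n = len(peptide)
    residue_total = sum(RESIDUE_MASS[r] for r in peptide)
    print("b-ions:", [round(m, 2) for m in b])
    print("y-ions:", [round(m, 2) for m in y])
    # b_i and y_(n-i) split the peptide at the same bond, so each pair
    # sums to the residue total plus alpha plus beta.
    for i in range(1, n):
        print(f"b{i} + y{n - i} =", round(b[i - 1] + y[n - i - 1], 2),
              "vs", round(residue_total + ALPHA + BETA, 2))
```

Under these assumptions b2 comes out near 274.11, which agrees with the value quoted for Ion #1 on slide 32, and the pairs b1/y2 and b2/y1 are the complementary pairs named on slide 24.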


28 Amino Acid Mass Table

29 Outline of This Talk (3): Basic Computational Techniques

30 Basic Computing Scheme
1. From the peptide mass W and the tandem mass spectrum S, build the NC-spectrum graph.
2. Find feasible paths to order the masses in S and to identify all the b-ions and y-ions consistent with S.
3. Convert feasible paths into legal peptide sequences.

31 NC-Spectrum Graph: Nodes (1) [Figure: nodes N0 and C0; label: mass of this peptide]

32 NC-Spectrum Graph: Nodes (2) Ion #1 (274.11). mass( ) + mass( ) = mass(P) + 18. Assumption 1: if Ion 1 is a y-ion, C1 is a b-ion node. Assumption 2: if Ion 1 is a b-ion, N1 is a b-ion node. [Figure: nodes N0, C0, C1, N1; label: mass of this peptide]

33 NC-Spectrum Graph: Nodes (3) Ion #2 (88.10). mass( ) + mass( ) = mass(P) + 18. [Figure: nodes N0, C0, C1, N1, C2, N2]

34 NC-Spectrum Graph: Edges (1) [Figure: nodes N0, C0, C1, N1, C2, N2; edge labeled S with Mass(S)]

35 NC-Spectrum Graph: Edges (2) [Figure: as before, adding an edge labeled W with Mass(W)]

36 NC-Spectrum Graph: Edges (3) [Figure: as before, adding an edge labeled S+W with Mass(S+W)]

37 NC-Spectrum Graph: Edges (4) [Figure: as before, adding an edge labeled R with Mass(R)]

38 NC-Spectrum Graph [Figure: the complete graph on nodes N0, C0, C1, N1, C2, N2]

39 NC-Spectrum Graph: Paths = Sequences [Figure: a path through b-ion nodes with edge labels S, W, R]

40 NC-Spectrum Graph: A Feasible Path (1) Definition: A feasible path is a path from N0 to C0 that goes through exactly one node for each pair (either Nj or Cj). [Figure: a feasible path with edge labels S, W, R through b-ion nodes]

41 NC-Spectrum Graph: A Feasible Path (2) [Figure: a feasible path with edge labels SS GVV through b-ion and y-ion nodes]

42 NC-Spectrum Graph: Not A Feasible Path (1) [Figure: not a feasible path: (1) it misses ion #2]

43 NC-Spectrum Graph: Not A Feasible Path (2) [Figure: not a feasible path: (2) it repeats ion #1]

44 NC-Spectrum Graph: Not A Feasible Path (3) [Figure: not a feasible path: (1) it misses ion #2 and (2) it repeats ion #1]

45 Reformulating the De Novo Peptide Sequencing Problem Input: an NC-spectrum graph G. Output: a feasible path from N0 to C0.

46 Observations A longest path does not always go through exactly one of each pair of nodes. It is an NP-hard problem if the spectrum graph is a general directed graph.

47 Basic Algorithm
Input: a peptide mass W and a tandem mass spectrum S.
Output: a feasible peptide sequence.
Steps:
1. Compute the nodes of the NC-spectrum graph G.
2. Compute the edges of G.
3. Compute a feasible path P in G.
4. Convert P into a feasible sequence.
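Step 4 is not spelled out in the transcript. One way it could be sketched, assuming the path is given as node coordinates in integer mass units and that any residue string matching each gap is acceptable, is shown below. The names `residues_for_mass` and `path_to_sequence` are hypothetical, and the nominal integer masses in the usage lines are illustrative.

```python
def residues_for_mass(target_units, residue_units):
    """Return one list of residue letters whose masses sum to target_units, or None.

    residue_units maps a residue letter to its mass in integer units.
    """
    # choice[j] records one residue that can end a sequence of total mass j;
    # j is a key of choice exactly when mass j is attainable.
    choice = {0: None}
    for j in range(1, target_units + 1):
        for letter, mass in residue_units.items():
            if (j - mass) in choice:
                choice[j] = letter
                break
    if target_units not in choice:
        return None
    letters, j = [], target_units
    while j > 0:  # walk the recorded choices back down to mass 0
        letters.append(choice[j])
        j -= residue_units[choice[j]]
    return letters[::-1]


def path_to_sequence(path_units, residue_units):
    """Spell out each consecutive gap on the path; None if some gap has no spelling."""
    sequence = []
    for u, v in zip(path_units, path_units[1:]):
        part = residues_for_mass(v - u, residue_units)
        if part is None:
            return None
        sequence.extend(part)
    return "".join(sequence)


if __name__ == "__main__":
    nominal = {"S": 87, "W": 186, "R": 156}  # illustrative nominal masses
    print(path_to_sequence([0, 87, 273, 429], nominal))
```

When a gap spans several residues, their order inside the gap is not determined by the masses alone, so this sketch returns just one of the possible spellings.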

48 Basic Algorithm (1): Step 1, compute the nodes of the NC-spectrum graph G.

49 Compute the Nodes of the NC-Spectrum Graph
Step 1. Compute the nodes and place them in increasing order of mass.
Step 2. Rename the nodes from left to right as X_0, …, X_k, Y_k, …, Y_0.
Observation: X_i and Y_i form a complementary pair of nodes N_i and C_i for ion i.
Running Time: O(k), where k = number of masses in the spectrum.
[Figure: nodes N0, C0, C1, N1, C2, N2 relabeled as X0, X1, X2, Y2, Y1, Y0]
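The node construction and renaming can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function name `nc_spectrum_nodes` is hypothetical, coordinates are taken to be prefix residue masses, and the first argument is assumed to be the sum of the residue masses of the target peptide (the transcript does not say whether W includes terminal groups).

```python
def nc_spectrum_nodes(total_residue_mass, ion_masses, alpha=1.0, beta=19.0):
    """Return (xs, ys), where xs[i] is the coordinate of X_i and ys[i] that of Y_i.

    Assumption: every node is expressed as a prefix residue mass, so an
    ion of mass m yields one node for "m is a b-ion" and one for "m is a
    y-ion".
    """
    pairs = []
    for m in ion_masses:
        as_b = m - alpha                        # prefix mass if m is a b-ion
        as_y = total_residue_mass - (m - beta)  # prefix mass if m is a y-ion
        pairs.append((min(as_b, as_y), max(as_b, as_y)))
    # All pairs share the same midpoint, so sorting by the smaller
    # coordinate also orders the larger coordinates in decreasing order.
    pairs.sort()
    xs = [0.0] + [low for low, _ in pairs]                  # X_0 plays the role of N0
    ys = [total_residue_mass] + [high for _, high in pairs]  # Y_0 plays the role of C0
    return xs, ys


if __name__ == "__main__":
    # Nominal integer masses for SWR (87 + 186 + 156 = 429) with the two
    # ions b2 = 274 and b1 = 88.
    xs, ys = nc_spectrum_nodes(429.0, [274.0, 88.0])
    print(xs)
    print(ys)
```

The sketch sorts the pairs, which costs O(k log k); the O(k) bound on the slide presumably assumes the spectrum masses arrive already sorted.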

50 Basic Algorithm (2): Step 2, compute the edges of G. [Slide annotation: inverse of each other]

51 Compute the Edges of the NC-Spectrum Graph
Basic Question: Given a mass u, is there a protein sequence with that mass?
Solution: dynamic programming via a Boolean array E( ).
1. Precision = 0.01.
2. Boolean array length L = peptide mass W / precision.
3. E(u/0.01) = 1 if u is the mass of a peptide; otherwise 0.
4. Dynamic programming: E(j) = 1 if and only if E(j - m_i) = 1 for some amino acid mass m_i.
5. Running Time: (1) computing E( ) takes O(L) time, or O(L/log L) via Four-Russians preprocessing; (2) computing the edges takes O(k^2) time.
[Figure: nodes X0, X1, X2, Y2, Y1, Y0]
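A minimal sketch of the Boolean array and the quadratic edge scan is given below. It assumes masses have already been converted to integer units of the chosen precision, and it treats E(0) as true (the empty sequence) as the base case, which the slide does not state. The names `reachable_masses` and `spectrum_edges` are hypothetical.

```python
def reachable_masses(max_units, residue_units):
    """E[j] is True iff j is a sum of zero or more residue masses (integer units)."""
    E = [False] * (max_units + 1)
    E[0] = True  # assumed base case: the empty sequence has mass 0
    for j in range(1, max_units + 1):
        E[j] = any(j >= m and E[j - m] for m in residue_units)
    return E


def spectrum_edges(coords, E, precision=0.01):
    """Pairs (u, v) of node coordinates, u < v, whose gap is a reachable mass."""
    nodes = sorted(coords)
    edges = []
    for a in range(len(nodes)):
        for b in range(a + 1, len(nodes)):
            gap = round((nodes[b] - nodes[a]) / precision)
            if 0 < gap < len(E) and E[gap]:
                edges.append((nodes[a], nodes[b]))
    return edges


if __name__ == "__main__":
    # Toy run with nominal integer masses for S, W, R and precision 1.
    E = reachable_masses(429, [87, 156, 186])
    for edge in spectrum_edges([0, 87, 174, 273, 360, 429], E, precision=1):
        print(edge)
```

With a constant-size alphabet the array costs O(L) time, and the double loop over nodes is the O(k^2) step. Measured spectra carry mass error, so a practical version would probably test a small window around each gap instead of one rounded value.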

52 Basic Algorithm (3): Step 3, compute a feasible path P in G.

53 Compute a Feasible Path (1)
Recursion: Use the feasible paths of X_0, …, X_i, Y_j, …, Y_0 to compute the feasible paths of X_0, …, X_i, X_{i+1}, Y_{j+1}, Y_j, …, Y_0.
Dynamic Programming: M(i,j) = 1 if there exist a path PL from X_0 to X_i and a path PR from Y_j to Y_0 such that PL and PR together contain exactly one of X_q and Y_q for each q = 1, …, max{i,j}.
Observation: There is a feasible path if and only if (1) for some i, there is an edge e from X_i to Y_k and M(i,k) = 1, or (2) for some j, there is an edge e from X_k to Y_j and M(k,j) = 1.
[Figure: nodes X0, X1, X2, Y2, Y1, Y0]

54 Compute a Feasible Path (2) [Figure: the two cases of the observation. Case 1: PL from X_0 to X_i, edge e from X_i to Y_k, PR from Y_k to Y_0. Case 2: PL from X_0 to X_k, edge e from X_k to Y_j, PR from Y_j to Y_0.]

55 Compute a Feasible Path (3)
Base Case: M(0,0), M(0,1), M(1,0).
Recurrence:
(1) If M(i,j-1) = 1 and edge(X_i, X_j) = 1, then M(j,j-1) = 1.
(2) If M(i,j-1) = 1 and edge(Y_j, Y_{j-1}) = 1, then M(i,j) = 1.
(3) If M(j-1,i) = 1 and edge(X_{j-1}, X_j) = 1, then M(j,i) = 1.
(4) If M(j-1,i) = 1 and edge(Y_j, Y_i) = 1, then M(j-1,j) = 1.
Idea: Extend PL and PR by one edge at a time.

56 Compute a Feasible Path (4) [Figure: extending PL from X_i to X_j, or PR from Y_{j-1} to Y_j, by one edge e]

57 Compute a Feasible Path (5) Computational Complexity: O(k^2).
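The recurrence on slides 55 to 57 can be sketched directly. The code below is an illustration under stated assumptions, not the authors' implementation: `find_feasible_path` and `has_edge` are hypothetical names, states with M(i,j) = 1 are stored as dictionary keys together with a back-pointer so that one path can be recovered, and only M(0,0) is seeded as the base case.

```python
def find_feasible_path(xs, ys, has_edge):
    """Return node coordinates of one feasible path from X_0 to Y_0, or None.

    xs[i] and ys[i] are the coordinates of X_i and Y_i (xs[0] = N0,
    ys[0] = C0); has_edge(u, v) says whether the graph has an edge u -> v.
    """
    k = len(xs) - 1
    parent = {(0, 0): None}  # keys are exactly the states with M(i, j) = 1

    def mark(state, previous):
        parent.setdefault(state, previous)

    for j in range(1, k + 1):
        for i in range(j):
            if (i, j - 1) in parent:  # PL ends at X_i, PR starts at Y_(j-1)
                if has_edge(xs[i], xs[j]):
                    mark((j, j - 1), (i, j - 1))   # rule (1)
                if has_edge(ys[j], ys[j - 1]):
                    mark((i, j), (i, j - 1))       # rule (2)
            if (j - 1, i) in parent:  # PL ends at X_(j-1), PR starts at Y_i
                if has_edge(xs[j - 1], xs[j]):
                    mark((j, i), (j - 1, i))       # rule (3)
                if has_edge(ys[j], ys[i]):
                    mark((j - 1, j), (j - 1, i))   # rule (4)

    # Join PL and PR with one closing edge once every pair 1..k is covered.
    for i, j in list(parent):
        if max(i, j) == k and has_edge(xs[i], ys[j]):
            left, right = set(), set()
            state = (i, j)
            while state is not None:  # follow back-pointers to (0, 0)
                left.add(state[0])
                right.add(state[1])
                state = parent[state]
            return ([xs[a] for a in sorted(left)] +
                    [ys[b] for b in sorted(right, reverse=True)])
    return None


if __name__ == "__main__":
    # Toy graph in nominal integer masses for SWR with ions 274 and 88.
    edges = {(0, 87), (0, 174), (87, 174), (87, 273),
             (174, 360), (273, 360), (273, 429)}
    xs, ys = [0, 87, 174], [429, 360, 273]
    print(find_feasible_path(xs, ys, lambda u, v: (u, v) in edges))
```

The double loop visits O(k^2) index pairs and does constant work per pair, in line with the bound on slide 57, assuming `has_edge` answers in constant time.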

58 Algorithmic Result #1: Finding a Feasible Path Input: an NC-spectrum graph G=(V,E). Output: a feasible path in G. Computational Complexity: O(|V|^2) time and O(|V|^2) space.

59 Outline of This Talk (4): Improved Computational Complexity and More Robust Algorithms

60 Algorithmic Result #2: Finding a Feasible Path (Improved) Input: an NC-spectrum graph G=(V,E). Output: A feasible path can be found in O(|V|+|E|) time. Idea: Speed up via pre-processing.

61 Amino Acid Modifications A modification is an amino acid with slightly different atoms (and thus a different mass) from the typical molecule. Importance of modifications: Amino acid modifications are related to functions. For example, a protein is active when phosphorylated and inactive when de-phosphorylated.

62 Modification in the Tandem Mass Spectrum [Figure: abundance (100%) versus mass/charge; peak labels S, W+d, R and R, S]

63 Spectrum Graph: As Before

64 Spectrum Graph: A Modified Feasible Path Idea: One mass change leads to one missing edge.

65 Algorithmic Result #3: Finding One Modification A modification is an amino acid with slightly different atoms (and thus a different mass) from the typical molecule. Theorem: Finding the position of the modification takes O(|V|+|E|) space and O(|V| |E|) time.

66 Algorithmic Result #4: Noisy Data Define a scoring function s(): –s(edge) = function(mass). –s(node) = function(abundance). Redefine the problem: Find the maximum score path that goes through at most one node for each ion. Solution: dynamic programming in O(|V|+|E|) space and O(|V| |E|) time.

67 Outline of This Talk (5): Conclusions

68 Further Difficulties for Tandem Mass Spectrum Interpretation Each ion has a couple of isotopic forms. Other ions (a or z) may appear. Some ions may lose a water or an ammonia. Multiple ion charges. Noise. Amino acid modifications.

69 Further Research Directions Efficient algorithms to deal with more modifications in conjunction with data noise. Efficient algorithms to combine de novo peptide sequencing with peptide database search. Efficient algorithms to assess statistical significance of feasible peptide sequences. Efficient algorithms to deal with multiple peptides. Practical implementation; speed-up via preprocessing. More …

70 Further Research Directions Looking for top-rate graduate students for this project (and other projects). Immediate and expedited admission for the coming fall semester.