The dynamic nature of the proteome

Slides:



Advertisements
Similar presentations
Tandem MS (MS/MS) on the Q-ToF2
Advertisements

Protein Sequencing and Identification by Mass Spectrometry.
Protein Sequencing and Identification by Mass Spectrometry
In-depth Analysis of Protein Amino Acid Sequence and PTMs with High-resolution Mass Spectrometry Lian Yang 2 ; Baozhen Shan 1 ; Bin Ma 2 1 Bioinformatics.
MN-B-C 2 Analysis of High Dimensional (-omics) Data Kay Hofmann – Protein Evolution Group Week 5: Proteomics.
How to identify peptides October 2013 Gustavo de Souza IMM, OUS.
Profiles for Sequences
Mass Spectrometry in a drug discovery setting Claus Andersen Senior Scientist Sienabiotech Spa.
CSE182 CSE182-L12 Mass Spectrometry Peptide identification.
Introduction to Bioinformatics Burkhard Morgenstern Institute of Microbiology and Genetics Department of Bioinformatics Goldschmidtstr. 1 Göttingen, March.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez.
Peptide Identification by Tandem Mass Spectrometry Behshad Behzadi April 2005.
Data Processing Algorithms for Analysis of High Resolution MSMS Spectra of Peptides with Complex Patterns of Posttranslational Modifications Shenheng Guan.
PEAKS: De Novo Sequencing using MS/MS spectra Bin Ma, U. Western Ontario, Canada Kaizhong Zhang,U. Western Ontario, Canada Chengzhi Liang, Bioinformatics.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez June 23, 2005.
Introduction to Bioinformatics Algorithms Sequence Alignment.
Sequence Alignment Bioinformatics. Sequence Comparison Problem: Given two sequences S & T, are S and T similar? Need to establish some notion of similarity.
Mass Spectrometry Peptide identification
Sequence similarity.
Fa 05CSE182 CSE182-L8 Mass Spectrometry. Fa 05CSE182 Bio. quiz What is a gene? What is a transcript? What is translation? What are microarrays? What is.
Introduction to Bioinformatics Algorithms Sequence Alignment.
Bioinformatics Unit 1: Data Bases and Alignments Lecture 3: “Homology” Searches and Sequence Alignments (cont.) The Mechanics of Alignments.
Mass spectrometry in proteomics Modified from: I519 Introduction to Bioinformatics, Fall, 2012.
Proteomics Informatics – Protein identification II: search engines and protein sequence databases (Week 5)
Previous Lecture: Regression and Correlation
Mass Spectrometry. What are mass spectrometers? They are analytical tools used to measure the molecular weight of a sample. Accuracy – 0.01 % of the total.
Scaffold Download free viewer:
Proteomics Informatics (BMSC-GA 4437) Course Director David Fenyö Contact information
My contact details and information about submitting samples for MS
1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)
1 Mass Spectrometry-based Proteomics Xuehua Shen (Adapted from slides with textbook)
Proteomics Informatics (BMSC-GA 4437) Course Director David Fenyö Contact information
Protein sequencing and Mass Spectrometry. Sample Preparation Enzymatic Digestion (Trypsin) + Fractionation.
TM Biological Sequence Comparison / Database Homology Searching Aoife McLysaght Summer Intern, Compaq Computer Corporation Ballybrit Business Park, Galway,
Developing Pairwise Sequence Alignment Algorithms
Sequence Alignment.
PROTEIN STRUCTURE NAME: ANUSHA. INTRODUCTION Frederick Sanger was awarded his first Nobel Prize for determining the amino acid sequence of insulin, the.
INF380 - Proteomics-91 INF380 – Proteomics Chapter 9 – Identification and characterization by MS/MS The MS/MS identification problem can be formulated.
Pairwise Sequence Alignment (II) (Lecture for CS498-CXZ Algorithms in Bioinformatics) Sept. 27, 2005 ChengXiang Zhai Department of Computer Science University.
Common parameters At the beginning one need to set up the parameters.
Pairwise alignment of DNA/protein sequences I519 Introduction to Bioinformatics, Fall 2012.
Sequence Analysis CSC 487/687 Introduction to computing for Bioinformatics.
Analysis of Complex Proteomic Datasets Using Scaffold Free Scaffold Viewer can be downloaded at:
A Comprehensive Comparison of the de novo Sequencing Accuracies of PEAKS, BioAnalyst and PLGS Bin Ma 1 ; Amanda Doherty-Kirby 1 ; Aaron Booy 2 ; Bob Olafson.
Laxman Yetukuri T : Modeling of Proteomics Data
INF380 - Proteomics-101 INF380 – Proteomics Chapter 10 – Spectral Comparison Spectral comparison means that an experimental spectrum is compared to theoretical.
Chapter 3 Computational Molecular Biology Michael Smith
Introduction to Bioinformatics Algorithms Protein Sequencing and Identification by Mass Spectrometry.
PEAKS: De Novo Sequencing using Tandem Mass Spectrometry Bin Ma Dept. of Computer Science University of Western Ontario.
CSE182 CSE182-L12 Mass Spectrometry Peptide identification.
CSE182 CSE182-L11 Protein sequencing and Mass Spectrometry.
Sequence Based Analysis Tutorial March 26, 2004 NIH Proteomics Workshop Lai-Su L. Yeh, Ph.D. Protein Science Team Lead Protein Information Resource at.
EBI is an Outstation of the European Molecular Biology Laboratory. In silico analysis of accurate proteomics, complemented by selective isolation of peptides.
Introduction to Bioinformatics Algorithms Protein Sequencing and Identification by Mass Spectrometry.
Error tolerant search Large number of spectra remain without significant score. Reasonable number of fragment ion peaks might have not match. – Underestimated.
Proteomics Informatics (BMSC-GA 4437) Instructor David Fenyö Contact information
Tag-based Blind Identification of PTMs with Point Process Model 1 Chunmei Liu, 2 Bo Yan, 1 Yinglei Song, 2 Ying Xu, 1 Liming Cai 1 Dept. of Computer Science.
De Novo Peptide Sequencing via Probabilistic Network Modeling PepNovo.
Proteomics Informatics (BMSC-GA 4437) Course Directors David Fenyö Kelly Ruggles Beatrix Ueberheide Contact information
Constructing high resolution consensus spectra for a peptide library
4.2 - Algorithms Sébastien Lemieux Elitra Canada Ltd.
김지형. Introduction precursor peptides are dynamically selected for fragmentation with exclusion to prevent repetitive acquisition of MS/MS spectra.
Post translational modification n- acetylation Peptide Mass Fingerprinting (PMF) is an analytical technique for identifying unknown protein. Proteins to.
Introduction to Profile HMMs
The ideal approach is simultaneous alignment and tree estimation.
Proteomics Informatics David Fenyő
Intro to Alignment Algorithms: Global and Local
Interpretation of Mass Spectra I
NoDupe algorithm to detect and group similar mass spectra.
Presentation transcript:

The dynamic nature of the proteome The proteome of the cell is changing Various extra-cellular, and other signals activate pathways of proteins. A key mechanism of protein activation is post-translational modification (PTM) These pathways may lead to other genes being switched on or off Mass spectrometry is key to probing the proteome and detecting PTMs

Post-Translational Modifications Proteins are involved in cellular signaling and metabolic regulation. They are subject to a large number of biological modifications. Almost all protein sequences are post-translationally modified and 200 types of modifications of amino acid residues are known.

Examples of Post-Translational Modification Post-translational modifications increase the number of “letters” in amino acid alphabet and lead to a combinatorial explosion in both database search and de novo approaches.

Search for Modified Peptides: Virtual Database Approach Yates et al.,1995: an exhaustive search in a virtual database of all modified peptides. Exhaustive search leads to a large combinatorial problem, even for a small set of modifications types. Problem (Yates et al.,1995). Extend the virtual database approach to a large set of modifications.

Exhaustive Search for modified peptides. YFDSTDYNMAK 25=32 possibilities, with 2 types of modifications! Phosphorylation? Oxidation? For each peptide, generate all modifications. Score each modification.

Peptide Identification Problem Revisited Goal: Find a peptide from the database with maximal match between an experimental and theoretical spectrum. Input: S: experimental spectrum database of peptides Δ: set of possible ion types m: parent mass Output: A peptide of mass m from the database whose theoretical spectrum matches the experimental S spectrum the best

Modified Peptide Identification Problem Goal: Find a modified peptide from the database with maximal match between an experimental and theoretical spectrum. Input: S: experimental spectrum database of peptides Δ: set of possible ion types m: parent mass Parameter k (# of mutations/modifications) Output: A peptide of mass m that is at most k mutations/modifications apart from a database peptide and whose theoretical spectrum matches the experimental S spectrum the best

Database Search: Sequence Analysis vs. MS/MS Analysis similar peptides (that a few mutations apart) have similar sequences MS/MS analysis: similar peptides (that a few mutations apart) have dissimilar spectra

Peptide Identification Problem: Challenge Very similar peptides may have very different spectra! Goal: Define a notion of spectral similarity that correlates well with the sequence similarity. If peptides are a few mutations/modifications apart, the spectral similarity between their spectra should be high.

Deficiency of the Shared Peaks Count Shared peaks count (SPC): intuitive measure of spectral similarity. Problem: SPC diminishes very quickly as the number of mutations increases. Only a small portion of correlations between the spectra of mutated peptides is captured by SPC.

SPC Diminishes Quickly no mutations SPC=10 1 mutation SPC=5 2 mutations SPC=2 S(PRTEIN) = {98, 133, 246, 254, 355, 375, 476, 484, 597, 632} S(PRTEYN) = {98, 133, 254, 296, 355, 425, 484, 526, 647, 682} S(PGTEYN) = {98, 133, 155, 256, 296, 385, 425, 526, 548, 583}

Spectral Convolution

Elements of S2 S1 represented as elements of a difference matrix Elements of S2 S1 represented as elements of a difference matrix. The elements with multiplicity >2 are colored; the elements with multiplicity =2 are circled. The SPC takes into account only the red entries

Spectral Convolution: An Example 1 2 3 4 5 -150 -100 -50 0 50 100 150 x

Spectral Comparison: Difficult Case Which of the spectra S’ = {10, 20, 30, 40, 50, 55, 65, 75,85, 95} or S” = {10, 15, 30, 35, 50, 55, 70, 75, 90, 95} fits the spectrum S the best? SPC: both S’ and S” have 5 peaks in common with S. Spectral Convolution: reveals the peaks at 0 and 5.

Spectral Comparison: Difficult Case S S’ S S’’

Limitations of the Spectrum Convolutions Spectral convolution does not reveal that spectra S and S’ are similar, while spectra S and S” are not. Clumps of shared peaks: the matching positions in S’ come in clumps while the matching positions in S” don't. This important property was not captured by spectral convolution.

Shifts A = {a1 < … < an} : an ordered set of natural numbers. A shift (i,) is characterized by two parameters, the position (i) and the length (). The shift (i,) transforms {a1, …., an} into {a1, ….,ai-1,ai+,…,an+  }

Shifts: An Example The shift (i,) transforms {a1, …., an} into {a1, ….,ai-1,ai+,…,an+  } e.g. 10 20 30 40 50 60 70 80 90 10 20 30 35 45 55 65 75 85 10 20 30 35 45 55 62 72 82 shift (4, -5) shift (7,-3)

Spectral Alignment Problem Find a series of k shifts that make the sets A={a1, …., an} and B={b1,….,bn} as similar as possible. k-similarity between sets D(k) - the maximum number of elements in common between sets after k shifts.

Representing Spectra in 0-1 Alphabet Convert spectrum to a 0-1 string with 1s corresponding to the positions of the peaks.

Comparing Spectra=Comparing 0-1 Strings A modification with positive offset corresponds to inserting a block of 0s A modification with negative offset corresponds to deleting a block of 0s Comparison of theoretical and experimental spectra (represented as 0-1 strings) corresponds to a (somewhat unusual) edit distance/alignment problem where elementary edit operations are insertions/deletions of blocks of 0s Use sequence alignment algorithms!

Spectral Alignment vs. Sequence Alignment Manhattan-like graph with different alphabet and scoring. Movement can be diagonal (matching masses) or horizontal/vertical (insertions/deletions corresponding to PTMs). At most k horizontal/vertical moves.

Spectral Product A={a1, …., an} and B={b1,…., bn} Spectral product AB: two-dimensional matrix with nm 1s corresponding to all pairs of indices (ai,bj) and remaining elements being 0s. 10 20 30 40 50 55 65 75 85 95  1 1 1 1 1 1 1 1 1 1 SPC: the number of 1s at the main diagonal. -shifted SPC: the number of 1s on the diagonal (i,i+ )

Spectral Alignment: k-similarity k-similarity between spectra: the maximum number of 1s on a path through this graph that uses at most k+1 diagonals. k-optimal spectral alignment = a path. The spectral alignment allows one to detect more and more subtle similarities between spectra by increasing k.

Use of k-Similarity SPC reveals only D(0)=3 matching peaks. Spectral Alignment reveals more hidden similarities between spectra: D(1)=5 and D(2)=8 and detects corresponding mutations.

Black line represent the path for k=0 Red lines represent the path for k=1 Blue lines (right) represents the path for k=2

Spectral Convolution’ Limitation The spectral convolution considers diagonals separately without combining them into feasible mutation scenarios. 10 20 30 40 50 55 65 75 85 95 10 20 30 40 50 60 70 80 90 100 10 15 30 35 50 55 70 75 90 95  D(1) =10 shift function score = 10 D(1) =6

Dynamic Programming for Spectral Alignment Dij(k): the maximum number of 1s on a path to (ai,bj) that uses at most k+1 diagonals. Running time: O(n4 k)

Edit Graph for Fast Spectral Alignment diag(i,j) – the position of previous 1 on the same diagonal as (i,j)

Fast Spectral Alignment Algorithm Running time: O(n2 k)

Spectral Alignment: Complications Spectra are combinations of an increasing (N-terminal ions) and a decreasing (C-terminal ions) number series. These series form two diagonals in the spectral product, the main diagonal and the perpendicular diagonal. The described algorithm deals with the main diagonal only.

Spectral Alignment: Complications Simultaneous analysis of N- and C-terminal ions Taking into account the intensities and charges Analysis of minor ions