Proteomics February 15, 2017 Dr. ir. Perry Moerland

Presentation transcript:

Proteomics. February 15, 2017. Dr. ir. Perry Moerland, Bioinformatics Laboratory, Academic Medical Center, p.d.moerland@amc.uva.nl. Graduate School ‘Bioinformatics’.

Mass spectrometry: MALDI-TOF. MALDI-TOF MS stands for Matrix-Assisted Laser Desorption Ionization Time Of Flight Mass Spectrometry: an ion source plus a mass analyzer. Only femtomoles of peptide material are needed. Ionized peptides are accelerated, and their time of flight gives the mass spectrum (intensity versus mass). Figure from Lodish et al.

MS ID: peptide mass fingerprint. Wet lab: a protein from the gel is digested into peptides, which give a measured spectrum. In silico: a protein in the database (http://www.uniprot.org/: > 59,000,000 entries) is digested to give a theoretical spectrum. The two spectra are compared and the match is scored. Figure from Graves & Haystead; K = lysine, R = arginine. We will come back later in this lecture to how the sticks representation of a spectrum is obtained.
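As a rough illustration of the in silico branch, the sketch below cleaves a sequence after every K or R, computes monoisotopic peptide masses, and counts how many observed masses lie within a tolerance of a theoretical one. The function names, the toy sequences and the 0.01 Da tolerance are assumptions made for illustration; real trypsin rules (for instance the proline exception) and real search engines are more involved.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def tryptic_digest(sequence):
    """Simplified trypsin: cleave after every K or R (no proline rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed_masses, sequence, tol_da=0.01):
    """Number of observed masses explained by the theoretical digest."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(
        any(abs(m - t) <= tol_da for t in theoretical)
        for m in observed_masses
    )


# Toy example: the "measured" masses come from the first sequence.
observed = [peptide_mass(p) for p in tryptic_digest("AGKCDRLM")]
for name, seq in {"toy_A": "AGKCDRLM", "toy_B": "WWKYYR"}.items():
    print(name, count_matches(observed, seq))
```

Counting matches is only the starting point; the later slides explain why a raw count is not a sufficient score.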

Identification: peptide mass fingerprint. Why a fingerprint and not the intact protein mass? Pre-sequences (e.g. a signal peptide); post-translational modifications; alternative splicing; errors in databases: 1 in 1000-10000 bases (i.e. 300-3000 amino acids); measurement error: a 10 ppm error on 100 kDa = 1 Da.
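The measurement-error figure is plain arithmetic: a relative error in parts per million times the mass. A minimal sketch (the helper name and the 1.5 kDa peptide are illustrative assumptions):

```python
def ppm_to_da(mass_da, ppm):
    """Absolute mass error (Da) for a relative error given in ppm."""
    return mass_da * ppm / 1e6


intact_error = ppm_to_da(100_000, 10)   # intact 100 kDa protein
peptide_error = ppm_to_da(1_500, 10)    # hypothetical 1.5 kDa peptide
print(intact_error, peptide_error)
```

At the same relative accuracy the absolute error on a peptide is far smaller than on the intact protein, which is one reason peptide masses are more informative.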

Intact mass and protein identity. Sources of deviation between intact mass and predicted mass: variable/unknown modifications (Pi, Ac, GlcNAc, X), known/fixed modifications, intron/exon boundaries, N-terminal processing, C-terminal processing, sequencing errors & polymorphisms. Only a few peptides are modified; most peptides will have the predicted mass.

Identification: how to score? Simply counting the number of matches is not enough. The limited mass accuracy of MS can lead to both false positives and false negatives.

Identification: how to score? Simply counting the number of matches is not enough. False positives arise because MS has only limited mass accuracy and because there is a bias towards ‘heavy’ proteins. False negatives arise because peptides can be post-translationally modified and because digestion is sometimes not perfect (missed cleavages). The limited mass accuracy of MS can lead to both false positives and false negatives.

Importance of mass accuracy. Monoisotopic masses:
  1H   1.0078250
  12C  12
  14N  14.0030740
  16O  15.9949140
  31P  30.9737620
  32S  31.9720707
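To see why these decimals matter, the sketch below sums the isotope masses from the slide for two residues with the same nominal mass of 128: lysine (C6H12N2O) and glutamine (C5H8N2O2). The helper name is an assumption; the elemental compositions are the standard residue formulas.

```python
# Monoisotopic masses (Da) as listed on the slide.
ISOTOPE_MASS = {
    "H": 1.0078250, "C": 12.0, "N": 14.0030740,
    "O": 15.9949140, "P": 30.9737620, "S": 31.9720707,
}


def composition_mass(composition):
    """Monoisotopic mass of an elemental composition such as {"C": 2, "H": 3}."""
    return sum(ISOTOPE_MASS[element] * n for element, n in composition.items())


lysine_residue = composition_mass({"C": 6, "H": 12, "N": 2, "O": 1})
glutamine_residue = composition_mass({"C": 5, "H": 8, "N": 2, "O": 2})
difference = lysine_residue - glutamine_residue
print(round(lysine_residue, 4), round(glutamine_residue, 4), round(difference, 4))
```

The two residues differ by only a few hundredths of a dalton, so an instrument needs correspondingly high mass accuracy to tell them apart.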

Peptide mass fingerprinting: Mascot. Trade-off between false positives (specificity) and false negatives (sensitivity). http://www.matrixscience.com

Peptide mass fingerprinting: missed cleavages. Allowing missed cleavages gives higher sensitivity but lower specificity.
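A minimal sketch of what a missed-cleavage setting does to the theoretical digest, assuming simplified trypsin rules (cleave after every K or R); the function name and toy sequence are illustrative:

```python
def digest_with_missed_cleavages(sequence, max_missed=1):
    """Tryptic peptides allowing up to `max_missed` uncleaved K/R sites."""
    # First split at every cleavage site (simplified rule, no proline exception).
    fragments, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in "KR":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])

    # Then join up to max_missed + 1 neighbouring fragments.
    peptides = []
    for i in range(len(fragments)):
        last = min(i + max_missed, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


print(digest_with_missed_cleavages("AGKCDRLM", max_missed=0))
print(digest_with_missed_cleavages("AGKCDRLM", max_missed=1))
```

Every extra allowed missed cleavage lengthens the list of candidate masses: more true peptides can be matched, but random matches become more likely as well.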

Scoring scheme: MASCOT. What are Score and Expect?

Scoring scheme: MOWSE/MASCOT (I). For protein A: Mowse score = 50 / (N × H), where 50 stands for an average protein of 50 kDa, H is the molecular weight of protein A, and N is the product of the match scores, each match score M (0 ≤ M ≤ 1) being weighted inversely proportionally to peptide mass. The match scores come from accumulated statistics on the size distribution of peptide masses as a function of protein mass; see http://www.matrixscience.com/help/scoring_help.html and https://proteomicsresource.washington.edu/mascot/help/scoring_help.html (figure axis: # of matching proteins). Note: this implicitly incorporates the number of matches. Warning: many scoring schemes (and there are tens of them) are very ad hoc and prone to false positives.
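The formula as written on the slide can be sketched directly. The match scores below are made-up numbers; in the real scheme they are looked up in frequency tables, which this sketch does not attempt to reproduce.

```python
import math


def mowse_score(match_scores, protein_kda):
    """Slide formula: 50 / (N * H), N = product of match scores, H in kDa."""
    n = math.prod(match_scores)
    return 50.0 / (n * protein_kda)


two_matches = mowse_score([0.1, 0.2], protein_kda=50)
three_matches = mowse_score([0.1, 0.2, 0.5], protein_kda=50)
heavy_protein = mowse_score([0.1, 0.2], protein_kda=100)
print(two_matches, three_matches, heavy_protein)
```

Because every match score is at most 1, each additional match shrinks N and raises the score, which is how the number of matches enters implicitly; dividing by H counteracts the bias towards heavy proteins.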

Scoring scheme: MOWSE/MASCOT (II). Probabilistic: the Mowse score is transformed into the probability P that the match is a random event. Score: corrected for the size of the database, e.g. P = 1 / (20 × 5000) gives S = -10 log10(P) = 50. Expect (E-value): the number of times you could expect to get this score or better by chance. E-value ≥ 1: completely random.
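The worked number on this slide can be checked in a couple of lines, assuming the logarithm is base 10 (the function names are illustrative):

```python
import math


def score_from_probability(p):
    """S = -10 * log10(P), the transformation shown on the slide."""
    return -10 * math.log10(p)


def probability_from_score(score):
    """Inverse transformation: P = 10 ** (-S / 10)."""
    return 10 ** (-score / 10)


p = 1 / (20 * 5000)
score = score_from_probability(p)
print(p, score, probability_from_score(score))
```

Each 10 points of score therefore corresponds to a tenfold drop in the probability that the match is random.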

LC-MS/MS in 30 sec. LC (chromatogram), MS (mass spectrum), fragmentation. Figure from Peng and Gygi.

MS/MS fragmentation. Breaks a selected peptide into smaller parts. The fragmentation spectrum identifies individual amino acids.

MS/MS peptide identification: sequence searching, sequence tag, de novo.

MS/MS ID: sequence searching. Workflow: database, shortlist, predict MS/MS spectra, compare, match (example: tubulin 353-370). The computer generates a theoretical fragmentation spectrum (Mascot). Some daughter ions show a very dominant fragmentation pattern, for good chemical reasons, so that no good contiguous sequence can be established with a high degree of confidence. In these cases raw MS/MS searching is much better suited to identifying the parent. Raw MS/MS searching will be the future of fragmentation database searching.

MS/MS ID: sequence tag. Format: 409,76-----T1A2G3-------528,13 (mass1, internal sequence, mass2). Workflow: compare the masses and the internal sequence tag against the database to get a shortlist, then compare fragments to find the match. A stretch of contiguous high peaks is easily found, and the peak-to-peak distance corresponds to an amino acid mass. Fragmentation is predominantly y'' at low-energy CID. The C- and N-terminal masses correspond to a combination of amino acids (all permutations). 2+ precursor charge state: fragment masses > parent mass ensure that these are products of a molecule that is not singly charged.
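Reading a tag from peak-to-peak distances can be sketched as follows. The peak list and the 0.02 Da tolerance are hypothetical, and the sketch ignores the reading direction, which depends on the ion series the peaks belong to.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_tag(peaks, tol=0.02):
    """Map each gap between consecutive peaks to the residue(s) it fits."""
    tag = []
    for low, high in zip(peaks, peaks[1:]):
        delta = high - low
        hits = [aa for aa, mass in RESIDUE_MASS.items() if abs(delta - mass) <= tol]
        tag.append("/".join(hits) if hits else "?")
    return tag


# Hypothetical ladder of singly charged fragment peaks (m/z).
print(read_tag([409.76, 510.81, 581.85, 638.87]))
print(read_tag([100.0, 213.084]))  # leucine and isoleucine are isobaric
```

Gaps that fit no residue are reported as "?", and isobaric residues are reported together because mass alone cannot separate them.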

MS/MS ID: de novo (I). De novo is for peptides/proteins not yet in a database or not identified by fingerprint techniques: unsequenced species, modified peptides. Algorithms for extracting long stretches of peptide sequence are needed; they require high-resolution, very good quality spectra.

MS/MS ID: de novo (II). Problem: MS/MS spectra display a mixture of N- and C-terminal fragment ion series (a, b, c, and x, y, z, respectively). Solution: reduce the complexity of MS/MS spectra. Protein digestion using a metalloendopeptidase with Lys-N specificity (cleavage N-terminally of lysine), combined with electron-transfer-induced dissociation: the MS/MS spectrum is dominated by c-ions and has few gaps. In a gold standard dataset, 42% of the peptides identified via database search can be identified de novo. Van Breukelen et al, Proteomics (2010)
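To make the ion series concrete, this sketch computes singly charged b and c ions (N-terminal fragments) and y ions (C-terminal fragments) for a short unmodified peptide, using the standard monoisotopic relations: b = prefix residues + proton, c = b + NH3, y = suffix residues + water + proton. Function names and the toy peptide are assumptions for illustration.

```python
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.01056
AMMONIA = 17.026549


def b_ions(peptide):
    """Singly charged b ions b1..b(n-1), read from the N-terminus."""
    total, ions = 0.0, []
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        ions.append(total + PROTON)
    return ions


def c_ions(peptide):
    """Singly charged c ions: the matching b ion plus NH3."""
    return [b + AMMONIA for b in b_ions(peptide)]


def y_ions(peptide):
    """Singly charged y ions y1..y(n-1), read from the C-terminus."""
    total, ions = 0.0, []
    for aa in reversed(peptide[1:]):
        total += RESIDUE_MASS[aa]
        ions.append(total + WATER + PROTON)
    return ions


for name, series in (("b", b_ions), ("c", c_ions), ("y", y_ions)):
    print(name, [round(mz, 4) for mz in series("AGK")])
```

A spectrum dominated by a single series, such as the c-ions mentioned above, gives one clean ladder instead of several overlapping ones, which is what makes de novo reading easier.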

MS/MS ID: validation (I). Many MS/MS search engines are threshold-based. Probabilistic approaches compute the probability that a match is correct: Mascot, PeptideProphet. Nesvizhskii et al, Nature Methods (2007)

MS/MS ID: validation (II). Many MS/MS search engines are threshold-based. Target-decoy searching: the database is augmented with reversed or randomized versions of its sequences, and matches are filtered using various score cutoffs (FDR). http://www.matrixscience.com/help/decoy_help.html Nesvizhskii et al, Nature Methods (2007)
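A minimal sketch of the target-decoy idea, assuming reversed sequences as decoys and the simple estimate FDR = decoy hits / target hits above a score cutoff. The scores, names and the "DECOY_" prefix are hypothetical, and search engines may use other estimators.

```python
def reversed_decoys(proteins):
    """Build a decoy entry for every target by reversing its sequence."""
    return {"DECOY_" + name: seq[::-1] for name, seq in proteins.items()}


def fdr_at_threshold(matches, threshold):
    """Estimate FDR as decoy hits / target hits among matches >= threshold.

    `matches` is a list of (score, is_decoy) pairs.
    """
    targets = sum(1 for score, is_decoy in matches if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


database = {"toy_A": "AGKCDRLM", "toy_B": "WWKYYR"}
database.update(reversed_decoys(database))

# Hypothetical search results: (score, hit came from a decoy entry).
matches = [(60, False), (55, False), (50, False), (45, True),
           (40, False), (35, True), (30, False)]
for cutoff in (50, 40, 30):
    print(cutoff, fdr_at_threshold(matches, cutoff))
```

Lowering the cutoff accepts more matches but lets more decoys through, so the cutoff is chosen to reach an acceptable estimated FDR.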

Cellular changes in T-cell expression patterns upon infection with HIV. Analysis of human T cells upon HIV-1 infection: 1921 protein spots detected with 2D-DIGE; 288 spots differentially expressed at peak infection; 93 unique proteins identified via peptide mass fingerprinting; 188 unidentified spots. Goal: try to find candidate proteins in silico. Nandal et al, 16:25, BMC Bioinformatics (2015)

Recapitulation: 2D gel electrophoresis. Horizontal axis: separation by iso-electric point (pI). Vertical axis: separation by molecular weight (Mw), lower = lighter. 2D-DIGE: quantification of changes in protein expression using fluorescent labeling.

In-depth mining of 2D-DIGE data

Step 1: Calculation of pI and Mw. http://web.expasy.org/compute_pi/

Step 2: Fitting calibration curves. Cubic smoothing splines.

Step 3: Generation of candidate list. The properties and pI/Mw ranges of a spot are sent to TagIdent (http://web.expasy.org/tagident/), which returns a list of UniProt identifiers of proteins that are close to the predicted pI/Mw. Isoforms are disregarded, as STRING does not handle isoforms.
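The kind of window query involved can be mimicked locally. This sketch does not call TagIdent; it only filters a small in-memory table, and the accession names, pI/Mw values and window sizes are all hypothetical.

```python
def candidates_in_window(proteins, spot_pi, spot_mw, pi_range, mw_fraction):
    """Return identifiers whose pI and Mw lie within the window around a spot.

    `proteins` maps identifier -> (pI, Mw in Da); the Mw window is relative.
    """
    return [
        accession
        for accession, (pi, mw) in proteins.items()
        if abs(pi - spot_pi) <= pi_range and abs(mw - spot_mw) <= mw_fraction * spot_mw
    ]


# Hypothetical theoretical values for four proteins.
proteins = {
    "HYP001": (5.10, 42_000),
    "HYP002": (5.35, 44_500),
    "HYP003": (6.80, 43_000),
    "HYP004": (5.20, 70_000),
}
print(candidates_in_window(proteins, spot_pi=5.2, spot_mw=43_000,
                           pi_range=0.25, mw_fraction=0.10))
```

Wider windows return longer candidate lists, which is why the later prioritization step is needed.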

Step 4: Prioritization of candidates. STRING.

STRING: functional protein association network. STRING is a database of known and predicted protein interactions, derived from different sources. Interactions are visualized via graphs; different line colors represent the types of evidence for the association. v9.1: >5,200,000 proteins from 1,133 organisms. STRING 10: the database currently covers 9,643,763 proteins from 2,031 organisms. http://string-db.org/

STRING network: 2D-DIGE identified proteins. Visualisation of the STRING network. Associations are represented with lines. Different colours of lines code for different evidence categories. New proteins have to fit in this network as well as possible. 377 observed interactions as compared to 66.2 expected interactions (P << 0.0001).

Step 5: gene expression-based filtering. Gene expression ~ protein expression? http://barcode.luhs.org/

Properties of candidate lists: post-translational modifications, hydrophobic proteins.

Results. TPR: true-positive rate. With optimal settings for pI range and Mw range: ~44% of correct proteins in top-5.
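A top-5 figure of this kind can be computed as the fraction of spots whose known protein appears among the first five ranked candidates. The sketch below assumes that definition; the spot names, rankings and identifiers are made up.

```python
def top_k_rate(ranked_candidates, true_protein, k=5):
    """Fraction of spots whose true protein is among the first k candidates."""
    hits = sum(
        1
        for spot, ranking in ranked_candidates.items()
        if true_protein[spot] in ranking[:k]
    )
    return hits / len(ranked_candidates)


# Hypothetical rankings for four spots with known identity.
ranked_candidates = {
    "spot1": ["A", "B", "C", "D", "E", "F"],
    "spot2": ["G", "H", "I", "J", "K", "L"],
    "spot3": ["M", "N", "O"],
    "spot4": ["P", "Q", "R", "S", "T", "U"],
}
true_protein = {"spot1": "C", "spot2": "L", "spot3": "Z", "spot4": "P"}
print(top_k_rate(ranked_candidates, true_protein, k=5))
```

Evaluating on spots whose identity is already known, as here, is what allows the pI and Mw ranges to be tuned.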

Results: gene expression-based filtering. http://string-db.org/9_1/p/885743793

Further pointers. Catalogues: http://www.humanproteomemap.org/ https://www.proteomicsdb.org/ http://www.proteinatlas.org/ Galaxy: https://usegalaxy.org/ Boekel et al, Nature Biotechnology, 33:2, 139 (2015)