Fa 06CSE182 CSE182-L13 Mass Spectrometry Quantitation and other applications.

Presentation transcript:


What happens to the spectrum upon modification?
Consider the peptide MSTYER. Any of S, T, or Y (one or more) can be phosphorylated. Upon phosphorylation, the b- and y-ions shift in a characteristic fashion. Can you determine where the modification has occurred? If T is phosphorylated, b3, b4, b5, b6 and y4, y5, y6 will shift.
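The shift pattern can be checked with a short sketch. The function name `shifted_ions` and the 1-based site convention are illustrative choices, not part of the lecture; the mass constant is the standard monoisotopic shift for phosphorylation (HPO3).

```python
PHOSPHO_SHIFT = 79.966  # Da, monoisotopic mass added by one phosphorylation (HPO3)

def shifted_ions(peptide, site):
    """Return the indices of the b- and y-ions that contain the modified residue.

    site is the 1-based position of the modified residue from the N-terminus.
    b_i covers residues 1..i, so it shifts when i >= site.
    y_i covers the last i residues, so it shifts when i >= n - site + 1.
    """
    n = len(peptide)
    b = [i for i in range(1, n + 1) if i >= site]
    y = [i for i in range(1, n + 1) if i >= n - site + 1]
    return b, y

b, y = shifted_ions("MSTYER", 3)  # T is residue 3
print(["b%d" % i for i in b])  # ['b3', 'b4', 'b5', 'b6']
print(["y%d" % i for i in y])  # ['y4', 'y5', 'y6']
```

Each listed ion moves up by `PHOSPHO_SHIFT` (divided by the ion's charge on the m/z axis); the remaining ions stay where they were, which is what localizes the site.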

Effect of PT modifications on identification
The shifts do not affect de novo interpretation too much. Why? Database matching algorithms are affected and must be changed. Given a candidate peptide and a spectrum, can you identify the sites of modification?

Db matching in the presence of modifications
Consider MSTYER. The number of modifications can be obtained from the difference in parent mass. With 1 phosphorylation event, we have 3 possibilities:
–MS*TYER
–MST*YER
–MSTY*ER
Which of these is the best match to the spectrum? If 2 phosphorylations occurred, we would again have 3 possibilities (MS*T*YER, MS*TY*ER, MST*Y*ER); in general, k events over n candidate sites give C(n, k) placements, which grows quickly for longer peptides. Can you compute more efficiently?
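A brute-force enumeration of the candidate placements might look like the sketch below. The function name `placements` and the `*` marking are illustrative; for k identical events over n candidate S/T/Y sites the number of placements is the binomial coefficient C(n, k).

```python
from itertools import combinations

def placements(peptide, k, targets="STY"):
    """List every way to place k phosphorylations on the S/T/Y residues.

    A '*' after a residue marks it as modified.
    """
    sites = [i for i, aa in enumerate(peptide) if aa in targets]
    result = []
    for combo in combinations(sites, k):
        chosen = set(combo)
        marked = "".join(aa + ("*" if i in chosen else "")
                         for i, aa in enumerate(peptide))
        result.append(marked)
    return result

print(placements("MSTYER", 1))  # ['MS*TYER', 'MST*YER', 'MSTY*ER']
print(placements("MSTYER", 2))  # ['MS*T*YER', 'MS*TY*ER', 'MST*Y*ER']
```

Scoring each placement separately against the spectrum is the naive approach; it becomes expensive as the number of sites and events grows.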

Scoring spectra in the presence of modification
Can we predict the sites of the modification? A simple trick lets us predict the modification sites. Consider the peptide ASTYER. The peptide may have 0, 1, or 2 phosphorylation events. The difference in parent mass gives us the number of phosphorylation events. Assume it is 1. Create a table with the number of b- and y-ions matched at each breakage point, assuming 0 or 1 modifications. Arrows determine the possible paths. Note that there are only 2 downward arrows. The max scoring path determines the phosphorylated residue.
[Table: columns A S T Y E R; one row for 0 modifications and one row for 1 modification]
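The table-and-path idea can be sketched as a small dynamic program. This is one possible reading of the slide, not the lecturer's code: `table[j][i]` is assumed to hold the number of b- and y-ions matched at the breakage point after residue i+1 when j modifications lie to its left, and the numbers in the example table are made up for illustration.

```python
def best_sites(peptide, table, k, targets="STY"):
    """Find the max-scoring path through the match table.

    Moving right keeps the modification count; moving "down" (j-1 -> j) is
    allowed only when the residue just passed can be phosphorylated.
    Returns (score, 1-based modified positions).
    """
    n = len(peptide)
    NEG = float("-inf")
    best = [[NEG] * (k + 1) for _ in range(n + 1)]
    down = [[False] * (k + 1) for _ in range(n + 1)]
    best[0][0] = 0
    for p in range(1, n + 1):              # p = prefix length
        for j in range(k + 1):             # j = modifications inside the prefix
            stay = best[p - 1][j]
            step = best[p - 1][j - 1] if j > 0 and peptide[p - 1] in targets else NEG
            prev = max(stay, step)
            if prev == NEG:
                continue
            gain = table[j][p - 1] if p < n else 0   # no breakage after the last residue
            best[p][j] = prev + gain
            down[p][j] = step > stay
    sites, j = [], k                        # trace back the downward moves
    for p in range(n, 0, -1):
        if down[p][j]:
            sites.append(p)
            j -= 1
    return best[n][k], sorted(sites)

# Hypothetical match counts for ASTYER at its 5 breakage points.
table = [
    [2, 2, 0, 1, 0],   # assuming 0 modifications in the prefix
    [0, 1, 2, 2, 2],   # assuming 1 modification in the prefix
]
print(best_sites("ASTYER", table, 1))  # (10, [3]): T is the best-scoring site
```

The table has (k+1) rows and one column per breakage point, so all placements are scored in a single pass instead of one pass per placement.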

Modifications
Modifications significantly increase search time. The path-based scoring algorithm speeds the search up somewhat, but it is still expensive.

Fast identification of modified peptides

Filtering Peptides to speed up search
As with genomic sequence, we build computational filters that eliminate much of the database, leaving only a few candidates for the more expensive scoring.
[Figure: search pipeline with stages labeled De novo, Filter, Extension, Score, and Significance; a database (Db) of 55M peptides is reduced to a set of candidate peptides. Example database sequence: MDERHILNMKLQWVCS DLPTYWASDLENQIKRS ACVMTLACHGGEMNGA LPQWRTHLLERTYKMN VVGGPASSDALITGMQS DPILLVCATRGHEWAILF GHNLWACVNMLETAIKL EGVFGSVLRAEKLNKAA PETYIN..]

Basic Filtering
Typical tools score all peptides with a close enough parent mass and tryptic termini. Filtering by parent mass is problematic when PTMs are allowed, as one must consider multiple parent masses.

Tag-based filtering
A tag is a short peptide with a prefix and suffix mass. Efficient: an average tripeptide tag matches Swiss-Prot ~700 times. Analogy: using tags to search the proteome is similar to moving from full Smith-Waterman alignment to BLAST.

Tag generation
Using local paths in the spectrum graph, construct peptide tags. Use the top ten tags to filter the database. Tagging is related to de novo sequencing, yet different. Objective: compute a subset of short strings, at least one of which must be correct. Longer tags => better filter.
[Figure: spectrum graph over the residues W R A C V G E K D W L P T L T, with a table of tags and their prefix masses (tags AVG, WTD, PET; the only prefix mass that survives in the transcript is 0.0)]

Tag based search using tries
Tags from de novo: YFD, DST, STD, TDY, YNM
[Figure: trie built from the tags]
Trie scan of the database: …..YFDSTGSGIFDESTMTKTYFDSTDYNMAK….
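A minimal sketch of the trie scan, using the five tags and the visible part of the database string from the slide. The nested-dict trie and the function names are illustrative choices; a real tag filter would also check the prefix and suffix masses of each hit, which is omitted here.

```python
def build_trie(tags):
    """Build a nested-dict trie; the key '$' marks the end of a tag."""
    root = {}
    for tag in tags:
        node = root
        for ch in tag:
            node = node.setdefault(ch, {})
        node["$"] = tag
    return root

def scan(database, trie):
    """Walk the trie from every database position and report (start, tag) hits."""
    hits = []
    for start in range(len(database)):
        node = trie
        for ch in database[start:]:
            if ch not in node:
                break
            node = node[ch]
            if "$" in node:
                hits.append((start, node["$"]))
    return hits

tags = ["YFD", "DST", "STD", "TDY", "YNM"]
database = "YFDSTGSGIFDESTMTKTYFDSTDYNMAK"
print(scan(database, build_trie(tags)))
# [(0, 'YFD'), (2, 'DST'), (18, 'YFD'), (20, 'DST'), (21, 'STD'), (22, 'TDY'), (24, 'YNM')]
```

All tags are matched in one pass over the database, and each walk stops after at most the length of the longest tag.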

Modification Summary
Modifications shift spectra in characteristic ways. A modification-sensitive database search can identify modifications, but is computationally expensive. Filtering using de novo tag generation can speed up the process, making identification of modified peptides tractable.

MS based quantitation

The consequence of signal transduction
The ‘signal’ from extracellular stimuli is transduced via phosphorylation. At some point, a ‘transcription factor’ (TF) might be activated. The TF goes into the nucleus and binds to DNA upstream of a gene. Subsequently, it ‘switches’ the downstream gene on or off.

Transcription
Transcription is the process of ‘transcribing’ or copying a gene from DNA to RNA.

Translation
The transcript goes outside the nucleus and is translated into a protein. Therefore, the consequence of a change in the environment of a cell is a change in transcription, or a change in translation.

Counting transcripts
cDNA from the cell hybridizes to complementary DNA fixed on a ‘chip’. The intensity of the signal is a ‘count’ of the number of copies of the transcript.

Quantitation: transcript versus Protein Expression
Our goal is to construct a matrix as shown for proteins and for RNA, and use it to identify differentially expressed transcripts/proteins.
[Figure: two expression matrices, one for mRNA and one for proteins; rows include Protein 1, Protein 2, Protein 3 and columns are Sample 1 and Sample 2]

Gene Expression
Measuring expression at the transcript level is done by micro-arrays and other tools. Expression at the protein level is being done using mass spectrometry. Two problems arise:
–Data: How to populate the matrices on the previous slide? (‘easy’ for mRNA, difficult for proteins)
–Analysis: Is a change in expression significant? (Identical for both mRNA and proteins.)
We will consider the data problem here. The analysis problem will be considered when we discuss micro-arrays.

MS based Quantitation
The intensity of a peak depends upon:
–Abundance, ionization potential, substrate, etc.
We are interested in abundance. Two peptides with the same abundance can have very different intensities. Assumption: relative abundance can be measured by taking the ratio of the intensities of the same peptide in 2 samples.

Quantitation issues
The two samples might be from a complex mixture. How do we identify identical peptides in two samples? In micro-arrays this is possible because the cDNA is spotted in a precise location. Can we have a ‘location’ for proteins/peptides?

LC-MS based separation
As the peptides elute (separated by physicochemical properties), spectra are acquired.
[Figure: HPLC -> ESI -> TOF; peptides p1, p2, p3, p4, ..., pn elute over time, each scan producing a spectrum]

LC-MS Maps
A peptide/feature can be labeled with the triple (M, T, I):
–monoisotopic M/Z, centroid retention time, and intensity
An LC-MS map is a collection of features.
[Figure: LC-MS map with axes time, m/z, and intensity (I), showing peaks for Peptide 1 and Peptide 2 and the elution of Peptide 2]

Peptide Features
A peptide (feature) combines an isotope pattern and an elution profile. Capture ALL peaks belonging to a peptide for quantification!

Data reduction (feature detection)
First step in LC-MS data analysis. Identify ‘features’: each feature is represented by
–Monoisotopic M/Z, centroid retention time, aggregate intensity

Feature Identification
Input: a collection of peaks (Time, M/Z, Intensity)
Output: a collection of ‘features’
–Monoisotopic m/z, mean time, sum of intensities.
–Time range [T_beg, T_end] for the elution profile.
–List of peaks in the feature.
[Figure: isotope pattern, intensity versus M/Z]

Feature Identification
Approximate method: Select the dominant peak.
–Collect all peaks in the same M/Z track
–For each peak, collect isotopic peaks.
–Note: the dominant peak is not necessarily the monoisotopic one.
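The approximate method can be sketched as a greedy loop. Everything beyond the three bullet steps is an assumption made for illustration: the charge is taken as known, isotopic peaks are expected at multiples of about 1.00335/charge in m/z, the lowest m/z in the group is reported as the monoisotopic one, and the tolerances are arbitrary defaults.

```python
ISOTOPE_SPACING = 1.00335  # approximate 13C - 12C mass difference in Da

def detect_features(peaks, charge=1, mz_tol=0.01, time_window=2.0, max_isotopes=3):
    """Greedy feature detection on (time, mz, intensity) peaks.

    Repeatedly take the most intense unassigned peak, then gather unassigned
    peaks in the same m/z track or in a neighboring isotope track that elute
    within time_window of it. Returns (mono_mz, centroid_time, total_intensity).
    """
    unused = set(range(len(peaks)))
    step = ISOTOPE_SPACING / charge
    features = []
    while unused:
        seed = max(unused, key=lambda i: peaks[i][2])
        seed_time, seed_mz, _ = peaks[seed]
        members = []
        for i in sorted(unused):
            time, mz, _ = peaks[i]
            k = round((mz - seed_mz) / step)            # nearest isotope offset
            on_track = abs(mz - (seed_mz + k * step)) <= mz_tol
            if abs(k) <= max_isotopes and on_track and abs(time - seed_time) <= time_window:
                members.append(i)
        unused.difference_update(members)
        total = sum(peaks[i][2] for i in members)
        centroid = sum(peaks[i][0] * peaks[i][2] for i in members) / total
        mono_mz = min(peaks[i][1] for i in members)     # lowest observed isotope
        features.append((mono_mz, centroid, total))
    return features

peaks = [
    (10.0, 500.00, 100), (10.5, 500.00, 300), (11.0, 500.00, 100),  # one m/z track
    (10.5, 501.00335, 150),                                          # its +1 isotope
    (30.0, 620.00, 50),                                              # unrelated peak
]
features = detect_features(peaks)
print(len(features))  # 2
```

Because the seed may be a heavier isotope, the search looks both below and above the seed m/z, in line with the note that the dominant peak is not necessarily the monoisotopic one.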

Relative abundance using MS
Recall that our goal is to construct an expression data matrix with abundance values for each peptide in a sample. How do we identify that it is the same peptide in the two samples?
–Differential isotope labeling (ICAT/SILAC)
–External standards (AQUA)
–Direct map comparison

ICAT
The reactive group attaches to Cysteine. Only Cys-peptides will get tagged. The biotin at the other end is used to pull down peptides that contain this tag. The X is either Hydrogen or Deuterium (heavy).
–Difference = 8 Da

ICAT
ICAT reagent is attached to particular amino acids (Cys). Affinity purification leads to simplification of the complex mixture.
[Figure: workflow for Cell state 1 and Cell state 2 (“Normal” and “diseased”)]
–Label proteins from one cell state with heavy ICAT and from the other with light ICAT
–Combine
–Fractionate protein prep (membrane, cytosolic)
–Proteolysis
–Isolate ICAT-labeled peptides
Nat. Biotechnol. 17: ,1999

Differential analysis using ICAT
ICAT pairs (heavy and light) appear at a known distance.
[Figure: LC-MS map, Time versus M/Z, with heavy/light pairs marked]
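Pairing features at the known distance can be sketched as follows. The sketch assumes one ICAT label per peptide (a mass difference of 8 Da, hence 8/z on the m/z axis for charge z), co-elution of the heavy and light forms within a tolerance, and features given as (mz, time, intensity) tuples; the function name and the tolerances are hypothetical.

```python
def find_icat_pairs(features, charges=(1, 2, 3), delta=8.0, mz_tol=0.02, time_tol=0.5):
    """Pair light and heavy features whose m/z differ by delta / charge.

    features: list of (mz, time, intensity).
    Returns (light, heavy, charge, heavy/light intensity ratio) tuples.
    """
    pairs = []
    for light in features:
        for heavy in features:
            for z in charges:
                mz_ok = abs(heavy[0] - light[0] - delta / z) <= mz_tol
                time_ok = abs(heavy[1] - light[1]) <= time_tol
                if mz_ok and time_ok:
                    pairs.append((light, heavy, z, heavy[2] / light[2]))
    return pairs

features = [
    (500.0, 20.0, 1000.0),   # light form
    (504.0, 20.1, 2000.0),   # heavy form, +8 Da at charge 2
    (700.0, 35.0, 500.0),    # unpaired feature
]
for light, heavy, z, ratio in find_icat_pairs(features):
    print(light[0], heavy[0], z, ratio)  # 500.0 504.0 2 2.0
```

The intensity ratio of a pair is the relative abundance estimate for that peptide. Peptides with several cysteines carry several labels and show up at multiples of the distance, which this simple pairing does not resolve.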

ICAT issues
–The tag is heavy, and decreases the dynamic range of the measurements.
–The tag might break off.
–Only Cysteine-containing peptides are retrieved.
–Non-specific binding to streptavidin.

Serum ICAT data
MA13_02011_02_ALL01Z3I9A* Overview (exhibits ’stack-ups’)

Serum ICAT data
Instead of pairs, we see entire clusters at 0, +8, +16, +22. ICAT-based strategies must clarify ambiguous pairing.

ICAT problems
–Tag is bulky, and can break off.
–Cys is low abundance.
–MS2 analysis to identify the peptide is harder.

SILAC
A novel stable isotope labeling strategy. Mammalian cell lines do not ‘manufacture’ all amino acids. Where do they come from? Labeled amino acids are added to amino-acid-deficient culture, and are incorporated into all proteins as they are synthesized. No chemical labeling or affinity purification is performed. Leucine was used (10% abundance vs 2% for Cys).

SILAC vs ICAT
–Leucine is higher abundance than Cys.
–No affinity tagging is done.
–Fragmentation patterns for the two peptides are identical, so identification is easier.
Ong et al. MCP, 2002

Incorporation of Leu-d3 at various time points
Doubling time of the cells is 24 hrs. Peptide = VAPEEHPVLLTEAPLNPK. What is the charge on the peptide?

Quantitation on controlled mixtures

Identification
MS/MS of differentially labeled peptides

Peptide Matching
SILAC/ICAT allow us to compare relative peptide abundances without identifying the peptides. Another way to do this is computational. Under identical liquid chromatography conditions, peptides will elute in the same order in two experiments.
–These peptides can be paired computationally

Map Comparison for Quantification
[Figure: two LC-MS maps side by side, Map 1 (normal) and Map 2 (diseased)]

Comparison of features across maps
–Hard to reduce features to single spots.
–Matching paired features is critical.
–M/Z is accurate, but time is not. A time scaling might be necessary.

Time scaling: Approach 1 (geometric matching)
Match features based on M/Z and (loose) time matching. Objective: minimize $\sum_f (t_1 - t_2)^2$ over the matched features $f$. Let $t_2' = a t_2 + b$. Select $a, b$ so as to minimize $\sum_f (t_1 - t_2')^2$.
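Once features are matched, choosing a and b is an ordinary least-squares line fit. The sketch below uses the closed-form solution; the function name and the (t1, t2) pair format are illustrative, and it assumes at least two matched pairs with distinct t2 values.

```python
def fit_time_scaling(pairs):
    """Fit t2' = a * t2 + b so that sum (t1 - t2')^2 is minimal.

    pairs: list of (t1, t2) retention times of matched features.
    Returns (a, b).
    """
    n = len(pairs)
    mean1 = sum(t1 for t1, _ in pairs) / n
    mean2 = sum(t2 for _, t2 in pairs) / n
    sxy = sum((t2 - mean2) * (t1 - mean1) for t1, t2 in pairs)
    sxx = sum((t2 - mean2) ** 2 for _, t2 in pairs)
    a = sxy / sxx
    b = mean1 - a * mean2
    return a, b

# Run 1 times are exactly 2 * (run 2 time) + 1 in this toy example.
a, b = fit_time_scaling([(3.0, 1.0), (5.0, 2.0), (7.0, 3.0)])
print(a, b)  # 2.0 1.0
```

With real data the loose matches contain errors, so the fit and the matching are typically alternated: fit, rescale, rematch with a tighter time tolerance.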

Geometric matching
Make a graph. Peptide a in LCMS1 is linked to all peptides in the other run with identical m/z. Each edge has a score proportional to $t_1 / t_2$. Compute a maximum weight matching. The ratio of times of the matched pairs gives a. Rescale and compute the scaling factor.
[Figure: features plotted on axes T and M/Z]

Approach 2: Scan alignment
Each time scan is a vector of intensities. Two scans in different runs can be scored for similarity (using a dot product):
$M(S_{1i}, S_{2j}) = \sum_k S_{1i}(k) \, S_{2j}(k)$
[Figure: scans $S_{11}, S_{12}$ of run 1 and $S_{21}, S_{22}$ of run 2, with example intensity vectors for $S_{1i}$ and $S_{2j}$]

Scan Alignment
Compute an alignment of the two runs. Let W(i,j) be the score of the best alignment of the first i scans in run 1 and the first j scans in run 2.
Advantage: does not rely on feature detection.
Disadvantage: might not handle affine shifts in time scaling, but is better for local shifts.
[Figure: scans $S_{11}, S_{12}$ of run 1 aligned against scans $S_{21}, S_{22}$ of run 2]
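The slide defines W(i,j) but the transcript does not give its recurrence. The sketch below assumes the standard global-alignment form, in which a pair of scans is either matched (scored by the dot product) or one scan is skipped at a gap cost; the `gap` parameter and the function names are assumptions.

```python
def dot(u, v):
    """Similarity of two scans given as equal-length intensity vectors."""
    return sum(x * y for x, y in zip(u, v))

def align_scans(run1, run2, gap=0.0):
    """Return the table W, where W[i][j] is the best alignment score of the
    first i scans of run1 with the first j scans of run2."""
    n, m = len(run1), len(run2)
    W = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        W[i][0] = W[i - 1][0] + gap
    for j in range(1, m + 1):
        W[0][j] = W[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            W[i][j] = max(
                W[i - 1][j - 1] + dot(run1[i - 1], run2[j - 1]),  # match the scans
                W[i - 1][j] + gap,                                # skip a run1 scan
                W[i][j - 1] + gap,                                # skip a run2 scan
            )
    return W

run1 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
run2 = [[1, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 1]]  # same elution plus one empty scan
W = align_scans(run1, run2)
print(W[3][4])  # 3.0
```

Because scans may be skipped at any point, the alignment absorbs local retention-time shifts. In practice the scans would be binned to a common m/z grid and often normalized before taking dot products.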
