Published by Lucinda Harmon. Modified over 9 years ago.
1
Direct Experimental Observation of Functional Protein Isoforms by Tandem Mass Spectrometry. Nathan Edwards, Center for Bioinformatics and Computational Biology, University of Maryland, College Park
2
2 Synopsis: MS/MS spectra provide evidence for the amino-acid sequence of functional proteins. Key concepts: spectrum acquisition is unbiased; direct observation of amino-acid sequence; sensitive to small sequence variations.
3
3 Synopsis MS/MS spectra provide evidence for the amino-acid sequence of functional proteins. Applications: Cancer biomarkers Genome annotation
4
4 Mass Spectrometry for Proteomics: Measure mass of many (bio)molecules simultaneously (high bandwidth). Mass is an intrinsic property of all (bio)molecules (no prior knowledge required).
5
5 Mass Spectrometer: Sample → Ionizer → Mass Analyzer → Detector. Ionizer: MALDI, Electrospray Ionization (ESI). Mass Analyzer: Time-of-Flight (TOF), Quadrupole, Ion-Trap. Detector: Electron Multiplier (EM).
6
6 High Bandwidth
7
7 Mass is fundamental!
8
8 Mass Spectrometry for Proteomics: Measure mass of many molecules simultaneously... but not too many (abundance bias). Mass is an intrinsic property of all (bio)molecules... but a reference is needed for comparison.
9
9 Mass Spectrometry for Proteomics: Mass spectrometry has been around since the turn of the 20th century... so why is MS-based proteomics so new? Ionization methods: MALDI, electrospray. Protein chemistry & automation: chromatography, gels, computers. Protein / genome sequences: a reference for comparison.
10
10 Sample Preparation for Peptide Identification Enzymatic Digest and Fractionation
11
11 Single Stage MS MS m/z
12
12 Tandem Mass Spectrometry (MS/MS) Precursor selection m/z
13
13 Tandem Mass Spectrometry (MS/MS) Precursor selection + collision induced dissociation (CID) MS/MS m/z
14
14 Peptide Identification: For each (likely) peptide sequence: 1. Compute fragment masses; 2. Compare with spectrum; 3. Retain those that match well. Peptide sequences come from (any) sequence database: Swiss-Prot, IPI, NCBI’s nr, ESTs, genomes,... Result: automated, high-throughput peptide identification in complex mixtures.
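The three steps above can be sketched in a few lines of Python. This is a toy illustration only, not the scoring used by Mascot, X!Tandem, OMSSA, or any other search engine: it assumes singly charged b- and y-ions, standard monoisotopic residue masses, a fixed mass tolerance, and a plain count of matched fragments as the score. The function names (`fragment_mz`, `count_matches`, `best_candidates`) and the thresholds are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O added to y-ions
PROTON = 1.00728   # mass of the charge-carrying proton


def fragment_mz(peptide):
    """Step 1: singly charged b- and y-ion m/z values for a peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, len(peptide))]
    y_ions = [sum(masses[i:]) + WATER + PROTON for i in range(1, len(peptide))]
    return b_ions + y_ions


def count_matches(peptide, peaks, tol=0.5):
    """Step 2: count theoretical fragments with an observed peak within tol."""
    return sum(
        any(abs(mz - peak) <= tol for peak in peaks)
        for mz in fragment_mz(peptide)
    )


def best_candidates(candidates, peaks, min_matches=4, tol=0.5):
    """Step 3: keep candidates that match well, best score first."""
    scored = [(count_matches(p, peaks, tol), p) for p in candidates]
    return sorted([sp for sp in scored if sp[0] >= min_matches], reverse=True)


# Pretend the observed spectrum is a perfect MVMAR fragmentation.
observed = fragment_mz("MVMAR")
print(best_candidates(["MVMAR", "MARNR"], observed))
```

Real search engines also use peak intensities, multiple charge states, and a statistical significance estimate; the count of matched fragments here only stands in for that score.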
15
15 Peptide Identification... can provide direct experimental evidence for the amino-acid sequence of functional proteins. Evidence for: functional protein isoforms; translation start and frame; proteins with short open-reading-frames.
16
16 Why is this useful for... genome annotation? Evidence for SNPs and alternative splicing stops with transcription. No genomic or transcript evidence for translation start-site. Conservation doesn’t stop at coding bases! Statistical gene-finders struggle with microexons, translation start-site, and short ORFs.
17
17 Why is this useful for... cancer biomarkers? Alternative splicing is the norm! Only 20-25K human genes; each gene makes many proteins. Some splicing is believed to be silencing. Lots of splicing in cancer. Proteins have clinical implications: statistical biomarker discovery; putative malfunctioning proteins.
18
18 What can be observed? Known coding SNPs; novel coding mutations; alternative splicing isoforms; microexons (non-canonical splice-sites); alternative translation start-sites (codons); alternative translation frames; “dark” open-reading-frames.
19
19 Splice Isoform: Human Jurkat leukemia cell-line. Lipid-raft extraction protocol, targeting T cells (von Haller, et al. MCP 2003). LIME1 gene: LCK interacting transmembrane adaptor 1. LCK gene: Leukocyte-specific protein tyrosine kinase; proto-oncogene; chromosomal aberration involving LCK in leukemias. Multiple significant peptide identifications.
20
20 Splice Isoform
21
21 Novel Splice Isoform
22
22 Novel Mutation: HUPO Plasma Proteome Project. Pooled samples from 10 male & 10 female healthy Chinese subjects. Plasma/EDTA sample protocol (Li, et al. Proteomics 2005; Lab 29). TTR gene: Transthyretin (pre-albumin). Defects in TTR are a cause of amyloidosis. Familial amyloidotic polyneuropathy: late-onset, dominant inheritance.
23
23 Novel Mutation Ala2→Pro associated with familial amyloid polyneuropathy
24
24 Novel Mutation
25
25 Translation Start-Site: Human erythroleukemia K562 cell-line. Depth of coverage study (Resing et al. Anal. Chem. 2004). THOC2 gene: part of the heteromultimeric THO/TREX complex. Initially believed to be a “novel” ORF. RefSeq mRNA in Jun 2007, no RefSeq protein. TrEMBL entry Feb 2005, no SwissProt entry. Genbank mRNA in May 2002 (complete CDS). Plenty of EST support. ~100,000 bases upstream of other isoforms.
26
26 Translation Start-Site
27
27 Translation Start-Site
28
28 Translation Start-Site
29
29 Translation Start-Site
30
30 Easily distinguish minor sequence variations: Two B. anthracis Sterne α/β SASP annotations: RefSeq/Gb: MVMARN... (7441 Da); CMR: MARN... (7211 Da). Intact proteins differ by 230 Da (7441 Da vs 7211 Da). N-terminal tryptic peptides: MVMAR (606.3 Da), MVMARNR (876.4 Da), vs MARNR (646.3 Da). Very different MS/MS spectra.
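The peptide masses quoted on this slide can be reproduced with a short calculation, assuming they are neutral monoisotopic masses (sum of residue masses plus one water). The sketch below uses standard monoisotopic residue masses for only the five residues needed; the helper name `peptide_mass` is hypothetical.

```python
# Monoisotopic residue masses (Da); only the residues used in this example.
MONO = {"M": 131.04049, "V": 99.06841, "A": 71.03711,
        "R": 156.10111, "N": 114.04293}
WATER = 18.01056  # one H2O per peptide (free N- and C-termini)


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in seq) + WATER


for pep in ("MVMAR", "MVMARNR", "MARNR"):
    print(pep, round(peptide_mass(pep), 1))

# The two annotations differ by the N-terminal residues M and V.
print("difference:", round(MONO["M"] + MONO["V"], 1))
```

The M + V residue mass is what accounts for the 230 Da gap between the two intact-protein masses.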
31
31 Bacterial Gene-Finding …TAGAAAAATGGCTCTTTAGATAAATTTCATGAAAAATATTGA… Stop codon Find all the open-reading-frames......courtesy of Art Delcher
32
32 Bacterial Gene-Finding: Forward strand: …TAGAAAAATGGCTCTTTAGATAAATTTCATGAAAAATATTGA… (stop codon). Reverse strand: …ATCTTTTTACCGAGAAATCTATTTAAAGTACTTTTTATAACT… (shifted stop, stop codon). Find all the open-reading-frames... but they overlap: which ones are correct? (courtesy of Art Delcher)
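A naive six-frame scan makes the overlap problem concrete. This sketch is not Glimmer's method: it assumes an ORF runs from the first in-frame ATG to the next in-frame stop codon (TAA, TAG, TGA), ignores alternative start codons, and reports coordinates on whichever strand was scanned. The function names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse strand, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def find_orfs(dna, min_codons=1):
    """Return (strand, start, end, sequence) for ATG...stop in all six frames.

    start/end are 0-based offsets on the scanned strand, end exclusive.
    """
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if codon == "ATG" and start is None:
                    start = i                      # first start codon opens an ORF
                elif codon in STOPS and start is not None:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, seq[start:i + 3]))
                    start = None                   # stop codon closes it
    return orfs


# The forward-strand fragment shown on the slide.
fragment = "TAGAAAAATGGCTCTTTAGATAAATTTCATGAAAAATATTGA"
for orf in find_orfs(fragment):
    print(orf)
```

On this fragment the scan reports one short ORF on each strand, and the two cover overlapping stretches of the same DNA, which is the ambiguity a coding-sequence score has to resolve.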
33
33 Coding-Sequence “Score”...courtesy of Art Delcher
34
34 Glimmer3 Performance: Glimmer3 trained & compared to RefSeq genes with annotated function. Correct STOP: 99.6%. Correct START: 84.3%. “Not all the genomes necessarily have carefully/accurately annotated start sites, so the results for number of correct starts may be suspect.”
35
35 N-terminal peptides: (Protein) N-terminal peptides establish the start-site of known & unexpected ORFs. Use: directly to annotate genomes; to evaluate and improve algorithms; to map cross-species.
36
36 N-terminal peptide workflows Typical proteomics workflows sample peptides from the proteome “randomly” Caulobacter crescentus (70%) 3733 Proteins (RefSeq Genome annot.) 66K tryptic peptides (600 Da to 3000 Da) 2085 N-terminal tryptic peptides (3%)
37
37 N-terminal peptide workflow: Protect protein N-terminus. Digest to peptides. Chemically modify free peptide N-term. Use chem. mod. to capture unwanted peptides. (Nat Biotech, Vol. 21, pp. 566-569, 2003.)
38
38 Increasing N-terminal peptide coverage: Multiple (digest) enzymes: trypsin-R: 60% (80%); acid + lys-C + trypsin: 85% (94%). Repeated LC-MS/MS: precursor exclusion / inclusion lists. MALDI / ESI. Protein separation and/or orthogonal fractionation. (Anal Chem, Vol. 76, pp. 4193-4201, 2004.)
39
39 Proteomics Informatics: Search spectra against: the entire bacterial genome; all Met-initiated peptides; or statistically likely Met-initiated peptides. Initial Met loss (a PTM) is easily considered, too. Off-the-shelf MS/MS search engines (Mascot / X!Tandem / OMSSA).
40
40 Other Practical Issues Suitable for commonly available instrumentation Only the sample prep. is (somewhat) novel. Need living organism Stage of life-cycle? Bang for buck? N-terminal peptides / $$$$ In discussions with JCVI (ex TIGR) Possible pilot project?
41
41 Other Research Projects: Improving peptide identification by MS/MS: spectral matching using HMMs; combining search engine results. Spectral matching for detection and quantitation. Microorganism identification using MS: live public web-site and database. (Inexact) uniqueness guarantees: primer/probe oligo design; pathogen detection (DNA & peptide). Significant false-positive peptide identifications.
42
42 Spectral Matching: Detection vs. identification: increased sensitivity; no novel peptides. NIST GC/MS Spectral Library: identifies small molecules; 100,000’s of (consensus) spectra; bundled/sold with many instruments; “dot-product” spectral comparison. Current project: peptide MS/MS.
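A “dot-product” comparison treats each spectrum as a vector of intensities and scores two spectra by the cosine of the angle between them. The sketch below is a minimal version under stated assumptions: spectra are lists of (m/z, intensity) pairs, peaks are summed into fixed-width bins, and no intensity weighting or noise filtering is applied. It is not the NIST library's scoring function, and the function names and bin width are hypothetical.

```python
import math
from collections import defaultdict


def binned(spectrum, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins."""
    vec = defaultdict(float)
    for mz, intensity in spectrum:
        vec[int(mz / bin_width)] += intensity
    return vec


def dot_product_score(spec_a, spec_b, bin_width=1.0):
    """Normalized dot product (cosine similarity) of two binned spectra."""
    va, vb = binned(spec_a, bin_width), binned(spec_b, bin_width)
    dot = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    norm = (math.sqrt(sum(x * x for x in va.values()))
            * math.sqrt(sum(x * x for x in vb.values())))
    return dot / norm if norm else 0.0


# Made-up spectra: a library entry, a noisy re-measurement, an unrelated one.
library = [(175.1, 40.0), (246.2, 100.0), (377.2, 65.0)]
query = [(175.1, 35.0), (246.2, 90.0), (377.2, 70.0), (500.3, 5.0)]
unrelated = [(120.1, 80.0), (310.4, 100.0)]

print(round(dot_product_score(library, query), 3))
print(dot_product_score(library, unrelated))
```

A score near 1 indicates the query closely resembles the library spectrum; spectra with no shared bins score 0. Because only library entries can be matched, this detects known peptides but cannot identify novel ones, as the slide notes.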
43
43 Peptide DLATVYVDVLK
44
44 Peptide DLATVYVDVLK
45
45 Hidden Markov Models for Spectral Matching: Capture statistical variation and consensus in peak intensity. Capture semantics of peaks. Extrapolate model to other peptides. Good specificity with superior sensitivity for peptide detection. Assign 1000’s of additional spectra (w/ p-value < 10^-5).
46
46 www.RMIDb.org
47
47 www.RMIDb.org Statistics: 16.7 × 10^6 (6.4 × 10^6) protein sequences; ~40,000 organisms, ~19,700 species; 557 (415) complete genomes. Sources: TIGR’s CMR, SwissProt, TrEMBL, Genbank Proteins, RefSeq Proteins & Genomes. Inclusive Glimmer3 predictions on genomes. Pfam and GO assignments using BOINC grid.
48
48 www.RMIDb.org Accessed from all over the world...
49
49 Uniqueness guarantees: 20-mer oligo signatures for B. anthracis. In all available strains as exact match. No (inexact) match to other Bacillus species.
Specificity | # Signatures | % of genome
Exact | 2035086 | 39.4%
k = 1 | 866787 | 16.8%
k = 2 | 75795 | 1.5%
k = 3 | 174 | 0.003%
50
50 Uniqueness guarantees: Human genome primer design problem. “4-unique” DNA 20-mers: edit-distance ≥ 5 to any non-specific hybridization site. No such valid loci on Chr. 22! Currently analyzing entire genome. “3-unique” DNA 20-mers: initial experiments suggest ~0.01% valid, approx. 1 valid oligo every 10,000 bases.
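The “k-unique” definition above can be written down directly: an oligo is k-unique if its edit distance to every non-specific site is at least k + 1. The sketch below uses the textbook dynamic-programming (Levenshtein) edit distance and checks an oligo against an explicit list of candidate sites. That list is an assumption made for illustration: a genome-scale check needs an index or a dedicated "uniqueness oracle" rather than pairwise comparison, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum substitutions, insertions, deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # match or substitute
        prev = cur
    return prev[-1]


def is_k_unique(oligo, other_sites, k):
    """True if every other site is more than k edits away from the oligo."""
    return all(edit_distance(oligo, site) > k for site in other_sites)


# Made-up 20-mers: the second differs from the first by four substitutions.
oligo = "ACGTACGTACGTACGTACGT"
near_miss = "ACGTTCGTACGAACGTAGGA"
print(edit_distance(oligo, near_miss))
print(is_k_unique(oligo, [near_miss], k=3), is_k_unique(oligo, [near_miss], k=4))
```

In this made-up case the oligo would pass a 3-unique test against the listed site but fail the stricter 4-unique test.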
51
51 Future Research Plans Cancer biomarkers: Optimize proteomics workflow for protein sequence coverage Improve informatics infrastructure to make interpretation easier Identify splice variants in cancer cell-lines (MCF-7) and clinical brain tumor samples
52
52 Future Research Plans Genome Annotation Collect evidence for functional alternative splicing in public datasets into dbPEP. Conduct pilot project for bacterial genome annotation with JCVI. Improve informatics infrastructure to make interpretation easier.
53
53 Future Research Plans: Peptide Identification. Expand library of HMM models for high-confidence spectral matching. Spectral matching for biomarkers and quantitation (with Calibrant). Specificity metric for peptides identified using MS/MS.
54
54 Future Research Plans Microorganism identification by mass spectrometry Specificity of tandem mass spectra Revamp RMIDb prototype Incorporate spectral matching, top-down.
55
55 Future Research Plans Oligonucleotide Design Uniqueness oracle for inexact match in human Integration with Primer3 Tiling, multiplexing, pooling, & tag arrays
56
56 Acknowledgements: Catherine Fenselau, Steve Swatkoski (UMCP Biochemistry); Chau-Wen Tseng, Xue Wu (UMCP Computer Science); Cheng Lee, Brian Balgley (Calibrant Biosystems); PeptideAtlas, HUPO PPP, X!Tandem. Funding: NIH/NCI, USDA/ARS.