Karl Clauser Proteomics and Biomarker Discovery
Sample Experiment Summary

Spider
Workflow: 1 ug venom; Reduce/alkylate; De-salt; LC-MS/MS (300 ng, C18 300A Microsorb 3um); Orbitrap Elite CID/HCD/ETD; Spectrum Mill
Cys modification: Cys iodoacetamide, Cys MeAziridine
Digest: Lys-C

Database
Venom gland mRNA library; Ion Torrent sequencing, bp reads; Transcriptome assembly (Trinity); Translation (extractORFS.pl)

Database
Venom gland mRNA library; 454 sequencing, bp reads; Transcriptome assembly (MIRA); Translation (extractORFS.pl)

Centipede
Workflow: 1 ug venom; Reduce/alkylate (Cys-ethanolyl-gp); Digest (500 ng & 500 ng; Trypsin, Glu-C); De-salt; LC-MS/MS (150 ng; C18 300A Microsorb 3um, C18 120A Reprosil 3um); Orbitrap Elite CID/HCD/ETD; Spectrum Mill

Scorpion
Workflow: 1 ug venom; Reduce/alkylate; De-salt; LC-MS/MS (300, 600 ng; C18 300A Microsorb 3um, C18 200A Magic 3um); Orbitrap Elite CID/HCD/ETD; Spectrum Mill
Cys modification: Cys-ethanolyl-sol, Cys-iodoacetamide, Cys-MeAziridine
Digest: Lys-C

Database
Venom gland mRNA library; 454 sequencing, bp reads; Transcriptome assembly (Trinity); Translation (extractORFS.pl)

Spider Protein Group.subgroup List
Matches to Arachnoserver but not RNA-Seq: suggests a need to improve paralog inclusion in RNA-Seq transcript assembly.
Samples are from different animals: Nov 2012, Feb 2012.
MeAzCys, IAAcys; Lys-C, undigested

Group 2 omega-hexatoxin Hi2
6 Cys Sub Group

MTPSCTLGICAPSVGGLVGGLLG reads: 4382
MTPSCTLGICAPSVGGIVGGLLG reads: 0251
MTPSCTLGICAPNVGGLVGGLLG reads: 0201
MTPSCTMGLCVPNVGGLVGGILG reads: 0000
MTPSCTMGICVPNVGGLVGGILG reads: 0000
MTPSCTMGICVPNVGGLVGGLLG reads: 0000
CAPNVGG reads: 0442
CVPNVGG reads: 0000

Evidence exists in spider.q17 reads for paralogs missing from the assembly.

ERGVLDCVVNTLGC reads: 635
ERGVVDCVLNTLGC reads: 106
ERGLVDCVLNTLGC reads: 553
ERGLADCVLNTLGC reads: 21
ER_QLDCVLNTLGC reads: 26
ERGVVGCVLNTLGC reads: 8

VLDCVVNTLGCSSDKDCCGMTPSCTLGICAPSVGGL reads: 99
VVDCVLNTLGCSSDKDCCGMTPSCTLGICAPSVGGL reads: 33
LVDCVLNTLGCSSDKDCCGMTPSCTLGICAPSVGGL reads: 72
LADCVLNTLGCSSDKDCCGMTPSCTLGICAPSVGGL reads: 4
VLDCVVNTLGCSSDKDCCGMTPSCTLGICAPNVGGL reads: 2
QLDCVLNTLGCSSDKDCCGMTPSCTLGICAPSVGGL reads: 1
QLDCVLNTLGCSSDKDCCGMTPSCTLGICAPNVGGL reads: 2

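One way to gather read-level evidence for a variant peptide is to translate each read in all six frames and count the reads whose translation contains the peptide. The sketch below is only an illustration under that assumption: the function names are hypothetical, reads are taken to be plain DNA strings, and nothing here is confirmed to be the procedure behind the counts on the slide.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate frame 0 of a DNA string; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Yield the six translations (3 forward, 3 reverse) of a read."""
    dna = dna.upper()
    for strand in (dna, reverse_complement(dna)):
        for offset in range(3):
            yield translate(strand[offset:])


def count_supporting_reads(reads, peptides):
    """Count, per peptide, the reads with any frame containing it."""
    counts = dict.fromkeys(peptides, 0)
    for read in reads:
        frames = list(six_frames(read))
        for peptide in peptides:
            if any(peptide in frame for frame in frames):
                counts[peptide] += 1
    return counts


# Toy reads (hypothetical, not from spider.q17) encoding CAPNVGG.
forward = "TGTGCTCCTAATGTTGGTGGT"
toy_reads = [forward, "A" + forward, reverse_complement(forward)]
support = count_supporting_reads(toy_reads, ["CAPNVGG", "CVPNVGG"])
```

Exact substring matching is the simplest choice; a real search would probably need to tolerate sequencing errors, which this sketch does not attempt.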
Only 1 spectrum supports LCVPN
(G)L C V P n\V\G\G\L\V\G\G\I\L(G)

MTPSCTLGICAPSVGGLVGGLLG reads: 4382
MTPSCTLGICAPSVGGIVGGLLG reads: 0251
MTPSCTLGICAPNVGGLVGGLLG reads: 0201
MTPSCTMGLCVPNVGGLVGGILG reads: 0000
MTPSCTMGICVPNVGGLVGGILG reads: 0000
MTPSCTMGICVPNVGGLVGGLLG reads: 0000
CAPNVGG reads: 0442
CVPNVGG reads: 0000

Group 23 Fusion Toxin or Transcript Misassembly
Group 23 looks like a fusion toxin, but could be transcript misassembly?
12 Cys, 2 pairs of adjacent CC's
maybe 11 Cys in mature form, active dimer?
perhaps 2 concatenated 6 Cys toxins
Present in 2 libraries

>as:pi-theraphotoxin-Pc1a|sp:P Toxin from venom of the spider Psalmopoeus cambridgei that inhibits ASIC1a channels
Score = 64.7 bits (156), Expect = 1e-13
Identities = 26/39 (66%), Positives = 28/39 (71%)

Query: 52 QECIAKWKSCAGRKLDCCEGLECWKRRWGHEVCVPITQK 90
          ++CI KWK C  R  DCCEGLECWKRR   EVCVP T K
Sbjct: 1  EDCIPKWKGCVNRHGDCCEGLECWKRRRSFEVCVPKTPK 39

Top BLAST hit does not span the fusion junction (pro/mature).

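The identity figure in the BLAST hit can be checked directly from the two aligned strings. This is a minimal sketch with a hypothetical helper name; it assumes the query and subject rows are the same length and that the reported percentage is truncated rather than rounded, which is what the slide's 26/39 (66%) suggests.

```python
def alignment_identity(query, subject):
    """Return (identical positions, alignment length, truncated percent)."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    # A gap character never counts as an identity.
    identical = sum(q == s and q != "-" for q, s in zip(query, subject))
    length = len(query)
    return identical, length, 100 * identical // length


# Aligned rows copied from the Group 23 BLAST hit.
QUERY = "QECIAKWKSCAGRKLDCCEGLECWKRRWGHEVCVPITQK"
SBJCT = "EDCIPKWKGCVNRHGDCCEGLECWKRRRSFEVCVPKTPK"
identities = alignment_identity(QUERY, SBJCT)
```

Counting "Positives" as well would need a substitution matrix such as BLOSUM62, which is left out to keep the sketch self-contained.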
Alignment of CCC 8 Cys containing groups 7,14,16,30
[Figure: alignment; axis label: Protein Group]

WCAKNEDCCCPMKCIGAWYN reads: 312
WCGKEGDCCCPWKCIGQWYN reads: 25
RCNANSDCCCPLKCVIRLVG reads: 3
YCEKDKDCCCPMRCVKSYWK reads: 10

Group  Proposed name
16.1   delta-hexatoxin-Hi3
7.1    delta-hexatoxin-Hi1
14.1   delta-hexatoxin-Hi2
30.1   delta-hexatoxin-Hi4

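The groupings used on these slides (CCC, CC, CXC) are defined by cysteine patterns, so a small helper can report which patterns a sequence contains. The function name and the returned keys below are hypothetical, and the inputs are the fragments shown on the slide, so the Cys counts are for the fragment rather than the full toxin.

```python
import re


def cys_summary(sequence):
    """Summarise cysteine patterns in a peptide sequence."""
    return {
        "n_cys": sequence.count("C"),
        "has_CCC": "CCC" in sequence,
        # Lookaheads count overlapping occurrences.
        "n_CC": len(re.findall(r"(?=CC)", sequence)),
        "n_CXC": len(re.findall(r"(?=C[^C]C)", sequence)),
    }


# Fragments copied from the CCC 8 Cys alignment slide.
FRAGMENTS = [
    "WCAKNEDCCCPMKCIGAWYN",
    "WCGKEGDCCCPWKCIGQWYN",
    "RCNANSDCCCPLKCVIRLVG",
    "YCEKDKDCCCPMRCVKSYWK",
]
summaries = [cys_summary(fragment) for fragment in FRAGMENTS]
```

A CC pair inside a CCC run is counted twice by `n_CC`; whether that is desirable depends on how the groups are meant to be separated.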
Coverage of CCC 8 Cys containing groups 7,14,16,30

Alignment of noCCC 8 Cys containing groups ,5,8,12,13,15,20,21
CC, CXC, CXC: groups 5,13,15,20,21
[Figure: alignment; axis label: Protein Group]

Group 1
Top BLAST hit not in Arachnoserver
gi| |gb|ADF |putative mature sequence toxin-like ACSKQ [Pelinobius muticus]

Query  EQIAAEENQLVEDLVQYAGTRLTQKRATRCSKKLGEKCNYHCECCGATVACSTVYVGGKETNFCSDKTSNNGALNTVGQGLNVVSNGLSAFQCWG
        + A+E ++L+E L   +   + Q+ A  CSK++GEKC + C+CCGATV C T+YVGG     C  KTSNN  LNT+G G+N V N  ++  CWG
Sbjct  RKTASETSKLLEKL-GVSREAIPQEMARACSKQIGEKCEHDCQCCGATVVCGTIYVGGNAVEQCMSKTSNNAVLNTMGHGMNAVQNAFTSVMCWG

8 Cys

CSDKTSNNGALNTVGQGLNVVSNGLSAFQC reads: 1023
CSKKLGEKCNYHCECCGATVACS reads: 371
CSKKLGEKCDYHCECCGATVACD reads: 49
CSTVYVGGKETNFC reads: 5251
TVGQGLNVVSNGLSAFQC reads: 4739
CSDKTSNNGALNTVGQGLN reads: 1887
GGKETNFCSDKTSNN reads: 3771
GGRETNFCSDKTSNN reads: 10
CSKKLGEKCNYHC reads: 769
CSKKLGEKCDYHC reads: 111

Evidence exists in spider.q17 reads for paralogs missing from the assembly.
N to D supported in MS/MS, matched as N to n deamidation.
NVVVNGFSAFQC reads: 25

Alignment of CECCG 8 Cys containing groups 1,11,29
[Figure: alignment; axis label: Protein Group]
Groups 1, 11, and 29 are highly related. Lower-level variant reads not assembled with Group 1 may be the missing N-term for Groups 11 and 29.

Alignment of CECCG 8 Cys containing groups 1,11,12
[Figure: alignment; axis label: Protein Group]
Groups 1 and 11 are highly related. Lower-level variant reads not assembled with Group 1 may be the missing N-term for Group 11.

Alignment of CC, CXC, CXC 8 Cys containing groups 5,13,15,20,21, ,13,15,20,21
[Figure: alignment; panel label: CC, CXC, CXC; axis label: Protein Group]

Coverage of CC, CXC, CXC 8 Cys containing groups 5,20

Coverage of CC, CXC, CXC 8 Cys containing groups 13,15,21,26
[Figure: coverage; axis label: Protein Group]

Coverage of CC, CYIC 8 Cys containing group 8,

Alignment of 10 Cys containing protein groups 3,4,10,19
[Figure: alignment; axis label: Protein Group]
Groups 3, 4, and 10 are too divergent to be paralogs. Examine Group 19 to determine spectral coverage of variant AAs.

Group  Proposed name
4.1    mu1-hexatoxin-Hi1a
19.1   mu1-hexatoxin-Hi1b
3.1    mu1-hexatoxin-Hi2a
3.2    mu1-hexatoxin-Hi2b
10.1   mu1-hexatoxin-Hi3a

Coverage of 10 Cys containing groups 3,4,10,19

PSM Overlap for Group

Coverage of 6 Cys containing groups 6, 9

Coverage of 6 Cys groups 17, 22, 25, 27,

Group Cys - only intact MS/MS z
(R)M F C K P L D\Q|Q\C\N K|D L H\C C|K P L/K C/R|R/S/N/N/G/R/K Y C/K P(-)
(R)M F\C\K P L D\Q|Q\C N K\D L H\C\C|K P/L/K C/R/R/S/N/N/G R/K Y C/K P(-)
(R)M F\C\K P L D\Q|Q\C N K\D L H\C C K P L K C|R/R/S/N/N/G/R/K/Y C/K P(-)
(R)M F C\K P L D\Q\Q\C N K\D/L H\C\C|K P L K\C/R/R S/N/N/G/R/K/Y C K P(-)
(R)M/F C/K P L\D\Q|Q\C N K D L H C C K P L K\C\R\R/S/N/N/G/R/K Y C K P(-)
z9 z8 z7 z6 z5
(R)M/F/C\K/P L D\Q Q C N K D|L H/C C K P L K C R R S N N G R K\Y C K\P(-)
z6 z5
(R)M/F/C|K/P L D\Q Q/C N/K/D|L H C C K P L K C R R S N N G R K\Y C K\P(-)
(R)M F C|K/P/L/D/Q/Q/C/N/K/D/L H/C C K P L K C R R S N N G R K\Y C K P(-)
z5
ETD CID HCD
Cys Iodoacetamide
56 spectra, 2 peptides

Group Cys - only intact MS/MS z
Cys Iodoacetamide
56 spectra, 2 peptides

MFCKPLDQQCN reads: 6
CRRSNNGRKYCKP reads: 12
CNKDLHCCKPLKC reads: 1
CCKPLKCRR reads: 11
CRKPLKKRL reads: 66

Group Cys - only intact MS/MS z8-9 ETD
(R)M F C K P L D\Q|Q\C\N K|D L H\C C|K P L/K C/R|R/S/N/N/G/R/K Y C/K P(-)
(R)M F\C\K P L D\Q|Q\C N K\D L H\C\C|K P/L/K C/R/R/S/N/N/G R/K Y C/K P(-)

Group Cys - only intact MS/MS z6-7, ETD
(R)M F\C\K P L D\Q|Q\C N K\D L H\C C K P L K C|R/R/S/N/N/G/R/K/Y C/K P(-)
(R)M F C\K P L D\Q\Q\C N K\D/L H\C\C|K P L K\C/R/R S/N/N/G/R/K/Y C K P(-)

Group Cys - only intact MS/MS z5
(R)M/F C/K P L\D\Q|Q\C N K D L H C C K P L K\C\R\R/S/N/N/G/R/K Y C K P(-)
(R)M F C|K/P/L/D/Q/Q/C/N/K/D/L H/C C K P L K C R R S N N G R K\Y C K P(-)
ETD HCD

BLAST2GO – Functional Annotation
Export FASTA of valid hits from SM
Run BLAST step
Run GO mapping step
Run annotation step
Run InterProScan, SIGNALP
Run GO-Slim
Export results to SM categories file

Venom Toxin Nomenclature
King GF, Gentz MC, Escoubas P, Nicholson GM. A rational nomenclature for naming peptide toxins from spiders and other venomous animals. Toxicon 52 (2008) 264–276.

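Names such as pi-theraphotoxin-Pc1a follow a fixed pattern: an activity descriptor, a family-based toxin name, a genus/species abbreviation, a number, and an optional isoform letter. The helper below is a simplified sketch of that pattern only; the function and its parameters are hypothetical, and it does not encode the rules for choosing the activity descriptor or family name.

```python
def toxin_name(activity, family, genus, species, number, isoform=""):
    """Assemble a toxin name from its parts.

    Simplified assumption: the organism code is the upper-case first
    letter of the genus plus the lower-case first letter of the species.
    """
    organism_code = genus[0].upper() + species[0].lower()
    return f"{activity}-{family}-{organism_code}{number}{isoform}"


# Example from the BLAST hit on the Group 23 slide.
pc1a = toxin_name("pi", "theraphotoxin", "Psalmopoeus", "cambridgei", 1, "a")
# Proposed name for group 16.1, assuming H. infensa is Hadronyche infensa.
hi3 = toxin_name("delta", "hexatoxin", "Hadronyche", "infensa", 3)
```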
Next Steps
Improve paralog inclusion in RNA-Seq transcript assembly.
Call full-length gene, predict signal & propeptide.
10 Cys toxins lack propeptide?
CCC 8 Cys toxins: propeptide so long that the assembly doesn't extend upstream enough to cover the signal peptide?
Group 23 looks like a fusion toxin, misassembled? 12 Cys, 2 pairs of adjacent CC's; perhaps 2 concatenated 6 Cys toxins; present in 2 libraries.
Run SM homology searches.
Improve SM monoisotopic m/z assignment for z>4.
Obtain ETD MS/MS of intact toxins after MeAziridine Cys mod.
Assemble spectra de novo.
Name novel toxins.

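For intact toxins at z>4, the monoisotopic m/z follows from the neutral mass as (M + z × proton mass) / z, and adjacent isotope peaks sit about 1.00335/z apart, so the peaks crowd together as charge rises. The sketch below works through that arithmetic for the intact 6 Cys sequence shown in the fragment maps. It uses standard monoisotopic residue masses and a carbamidomethyl shift for iodoacetamide-treated Cys; the function names are hypothetical and this is not Spectrum Mill's algorithm.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.007276
C13_C12_DELTA = 1.003355
CARBAMIDOMETHYL = 57.02146  # iodoacetamide adduct on Cys


def monoisotopic_mass(sequence, cys_mod=0.0):
    """Neutral monoisotopic mass of a linear peptide."""
    residues = sum(RESIDUE_MASS[aa] for aa in sequence)
    return residues + WATER + cys_mod * sequence.count("C")


def mz(neutral_mass, z):
    """m/z of the monoisotopic (M + zH) ion."""
    return (neutral_mass + z * PROTON) / z


def isotope_spacing(z):
    """Approximate spacing between adjacent isotope peaks."""
    return C13_C12_DELTA / z


# Intact sequence read off the fragment-map slides.
TOXIN = "MFCKPLDQQCNKDLHCCKPLKCRRSNNGRKYCKP"
mass = monoisotopic_mass(TOXIN, cys_mod=CARBAMIDOMETHYL)
table = {z: (mz(mass, z), isotope_spacing(z)) for z in range(5, 10)}
```

At z = 5 to 9, the charge states shown in the fragment maps, the spacing shrinks from roughly 0.20 to 0.11 m/z, which is one reason picking the monoisotopic peak gets harder.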
Adding Charge to Cys for better ETD
Fig. 1. Overview of the de novo sequencing strategy. (I) UV trace of HPLC separation of crude venom extract from C. textile. (II) MALDI TOF MS of fraction i after no treatment, reduction, and alkylation. (III) On-line LC ESI-MS/MS using CAD and ETD on reduced and alkylated aliquots of fraction i. The final step shows the conversion of Cys residues to dimethylated Lys analogs followed by ETD fragmentation; MS/MS is shown for the (M+5H)5+ ion of the 1, Da species in II. c ions are indicated by and z ions by. Shell image Copyright 2005, Richard Ling.
Ueberheide BM, Fenyö D, Alewood PF, and Chait BT. PNAS , 6910–
Methylaziridine

Lys-C Cleaves at MeAziridine Cys mod
Cys cleavages observed only in most abundant proteins.
Kinetics: K >> C

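The effect of the extra cleavage sites can be illustrated with a simple in silico digest. This sketch assumes cleavage C-terminal to each listed residue, complete digestion, and no proline rule; the function name is hypothetical. Since the kinetics favour K over C, a real digest would show Cys cleavage only partially, so the second call is a limiting case rather than a prediction.

```python
def digest(sequence, cleave_after="K"):
    """Split a sequence C-terminal to every residue in cleave_after."""
    peptides, start = [], 0
    for index, residue in enumerate(sequence):
        if residue in cleave_after:
            peptides.append(sequence[start:index + 1])
            start = index + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return peptides


# Intact 6 Cys sequence read off the fragment-map slides.
TOXIN = "MFCKPLDQQCNKDLHCCKPLKCRRSNNGRKYCKP"
lys_only = digest(TOXIN)                      # Lys-C at K only
lys_and_cys = digest(TOXIN, cleave_after="KC")  # K plus modified Cys
```

Comparing the two lists shows how many additional, shorter peptides appear once modified Cys also acts as a cleavage site.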
MeAziridine Cys mod Yields Great ETD Spectra
(K)S/Y\W|K/G|H/G\V\C\S A\S|L|F|E|R|L|K\G\C(-)
(K)C/I|G|Q|W|Y/N|G|Q|A S C|Q|S/T\F|m|G|L\F\K(S)

MeAziridine Cys mod Yields Great ETD Spectra
(K)C/N|Y|H\C|E\C\C|G/A T V\A\C|S/T|V\Y/V|G G\K(E)
(K)C Y\C/D|Y|G|L|F|G|N\C|N\C|Y\K(R)

MeAziridine Cys mod Yields Great ETD Spectra
(Q)T C/G|G P|D D\C G|E|G/S C\C V|G|S|F|S|R\K(C)
(N)W C|A|K/N\E|D\C\C\C P M\K(C)

Centipedes, Spiders, Scorpions
S. morsitans, A. xerolimniorum, L. weigensis, H. jugulans, C. westwoodi, E. rubripes, U. manicatus, L. variatus, L. buchari, C. squama, I. vescus, Thereuopoda sp., S. foelschei, C. Tropix, H. infensa
7 scorpions, 4 centipedes, 1 spider