Personal Genomics meets Quantitative Proteomics
George Church (Harvard GTL & CEGS Centers)
5-Jan-2006, HPCGG, Landsdowne, 2 PM
Thanks to: NHGRI Seq Tech
2004: Agencourt, 454, Microchip
2005: Nanofluidics, Network, VisiGen, Affymetrix, Helicos, Solexa-Lynx
"Open-source" Personal Genome Project (PGP)
Harvard Medical School IRB Human Subjects protocol submitted 16-Sep-2004, approved Aug 31.
Gradual plan: start with "highly-informed" individuals consenting to non-anonymous genomes & extensive phenotypes (medical records, imaging, omics).
Cell lines in Coriell NIGMS Repository.
Diploid genome subsets at $0.1/kb, <3E-7 FP errors.
How? Polony bead Sequencing-by-Ligation (SbL).
Analyses of single chromosomes (single cells, RNAs, particles)
(1) When we only have one cell, as in Preimplantation Genetic Diagnosis (PGD) or environmental samples
(2) Candidate chromosome region sequencing
(3) Prioritizing or pooling (rare) species based on an initial DNA screen
(4) Multiple chromosomes in a cell or virus
(5) RNA splicing
(6) Cell-cell interactions (predator-prey, symbionts, commensals, parasites)
CD44 Exon Combinatorics (Zhu & Shendure)
Alternatively spliced cell adhesion molecule.
Specific variable exons are up- or down-regulated in various cancers (>2000 papers).
v6 & v7 enable direct binding to chondroitin sulfate, heparin…
Zhu J, et al. Science 301:836-8.
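The "exon combinatorics" idea can be illustrated with a minimal sketch: if each variable exon is independently included or skipped, n variable exons allow up to 2^n ordered isoforms. The exon labels below are placeholders chosen for illustration (only v6 and v7 are named on the slide), and the function name is hypothetical.

```python
from itertools import combinations


def enumerate_isoforms(variable_exons):
    """Return every inclusion pattern of the variable exons, keeping exon order."""
    patterns = []
    for k in range(len(variable_exons) + 1):
        # combinations() preserves the input order, so exons stay in genomic order
        patterns.extend(combinations(variable_exons, k))
    return patterns


# Assumed exon labels for illustration only
exons = ["v5", "v6", "v7"]
isoforms = enumerate_isoforms(exons)
for iso in isoforms:
    print("-".join(iso) if iso else "(no variable exons)")
```

The count doubles with each additional variable exon, which is why profiling single molecules, rather than one exon at a time, is needed to tell which combinations actually occur.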
Zhu J, Shendure J, Mitra RD, Church GM. Single molecule profiling of alternative pre-mRNA splicing. Science 301:836-8.
CD44 RNA isoforms
Eph4 = murine mammary epithelial cell line
Eph4bDD = stable transfection of Eph4 with MEK-1 (tumorigenic)
Molecular Weight Assessment of Proteins in Total Proteome Profiles Using 1D-PAGE and LC/MS/MS.
Ahmad R, Nguyen DH, Wingerd MA, Church GM, Steffen MA. Proteome Sci. 3:6 (2005).
Candidates for alternative splicing (AS), endoproteolytic processing (EPP), & post-translational modifications (PTMs) in lymphoblastoid cells
Table columns: Protein Name | Predicted MW | Observed MW | Difference before & after leader cleavage
Proteins listed (values not preserved):
- Cytochrome c oxidase subunit IV isoform
- NADH dehydrogenase
- Coproporphyrinogen oxidase
- MHC II, DQ
- NADH (ubiquinone) Fe-S protein
- Mito short-chain enoyl-coA hydratase
- Peptidylprolyl isomerase B (cyclophilin)
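A minimal sketch of how predicted and observed molecular weights might be compared to shortlist candidates. This is not the published method: the 10% tolerance, the function name, and the example values are all assumptions for illustration, and the proteins are generic placeholders rather than entries from the table.

```python
def flag_mw_shifts(proteins, rel_tol=0.10):
    """Return (name, observed - predicted) for proteins whose observed MW
    differs from the predicted MW by more than rel_tol (fraction of predicted)."""
    flagged = []
    for name, predicted_kda, observed_kda in proteins:
        diff = observed_kda - predicted_kda
        if abs(diff) > rel_tol * predicted_kda:
            flagged.append((name, diff))
    return flagged


# Made-up placeholder values in kDa, NOT data from the paper
example = [
    ("protein_A", 50.0, 49.0),  # within tolerance
    ("protein_B", 30.0, 22.0),  # runs lighter than predicted
    ("protein_C", 20.0, 28.0),  # runs heavier than predicted
]
flagged = flag_mw_shifts(example)
for name, diff in flagged:
    print(name, diff)
```

Under these assumptions, a negative difference would be consistent with endoproteolytic processing such as leader cleavage, and a positive one with a modification or a longer splice form; either would still need confirmation from the peptide evidence.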
Central Carbon Metabolism (Zinser et al. unpubl.)
[Figure: glycogen pathway; metabolites: α-Glc-1P, ADP-Glc, α-1,4-glucosyl-glucan, glycogen; genes: glgC, glgX, glgA, glgB, glgP]
Light regulated, circadian metabolism
Viral Photosynthetic Proteins (Lindell, Sullivan, Chisholm et al.)
[Figure: photosynthesis gene regions in three phage genomes; size labels: ~500 bp, 2.8 kb, 6.4 kb, 12 kb, 24 kb]
Phages: Podovirus P-SSP7 (46 kb), Myovirus P-SSM4 (181 kb), Myovirus P-SSM2 (255 kb)
Gene labels: PC, HLIP(s), Fd, D1, D2
Photosynthesis genes in marine viruses yield proteins during host infection. Nature :86-9. Lindell D, Jaffe JD, Johnson ZI, Church GM, Chisholm SW.
[Figure labels: 15N, 13C synthetic standards; host; phage]
Improving MS Peptide Coverage
? Ionization efficiency
X Ions outside the mass range of the analyzer
? Chromatographic behavior
? Sample preparation bias
X Instrument duty cycle
Improve spectra interpretation over current algorithms:
- Details of fragmentation patterns
- Dipeptide P, DE/KR, V.G intensity effects
- B & Y ions unequal & co-dependent
- More intense ions in middle of peptides
MDQuest: Mike Chou, Dan Schwartz, Steve Gygi, Josh Elias
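To make the "ions outside the mass range of the analyzer" point concrete, here is a small sketch that computes the m/z of a peptide at several charge states and reports which ones fall inside a scan window. The residue masses are standard monoisotopic values; the 400 to 2000 m/z window, the charge limit, and the function names are assumptions for illustration, not settings from the talk.

```python
# Standard monoisotopic residue masses (Da)
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide (free N- and C-termini)
PROTON = 1.007276   # mass of each charge-carrying proton


def peptide_mz(sequence, charge):
    """m/z of an unmodified peptide carrying `charge` protons."""
    neutral_mass = sum(MONO[aa] for aa in sequence) + WATER
    return (neutral_mass + charge * PROTON) / charge


def charges_in_range(sequence, mz_min=400.0, mz_max=2000.0, max_charge=4):
    """Charge states whose m/z falls inside the (assumed) analyzer window."""
    return [z for z in range(1, max_charge + 1)
            if mz_min <= peptide_mz(sequence, z) <= mz_max]


for z in range(1, 5):
    print(z, round(peptide_mz("EKLAVSAR", z), 3))
print(charges_in_range("EKLAVSAR"))
```

Short peptides drop below the window at higher charge states and very long ones exceed it at low charge, so some fraction of a digest is never observed regardless of how well it ionizes.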
SEQUEST vs MDQUEST Performance
MapQuant is a program designed to isolate unique organic species and quantify their relative abundances from an LC/MS experiment.
Scheme: data from an LC/MS experiment are formatted into a data structure called a 2-D map, analogous to a gray-scale image, and then analyzed.
[Figure: successive scans (N, N+1, N+2, N+3) stacked into a 2-D peptide map; axes: time or scans, m/z units]
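The 2-D map idea can be sketched as follows: each scan becomes one row of a grid, each m/z bin one column, and the cell value is the summed intensity, much like pixel brightness in a gray-scale image. This is only an illustration of the data structure, not MapQuant's actual code; the function name, bin width, and example peaks are assumptions.

```python
def build_2d_map(scans, mz_min, mz_max, bin_width):
    """Bin a list of scans into a 2-D intensity grid.

    scans: list of scans, each a list of (mz, intensity) peaks, in scan order.
    Returns grid[scan_index][mz_bin] = summed intensity.
    """
    n_bins = int((mz_max - mz_min) / bin_width) + 1
    grid = [[0.0] * n_bins for _ in scans]
    for row, peaks in zip(grid, scans):
        for mz, intensity in peaks:
            if mz_min <= mz <= mz_max:
                row[int((mz - mz_min) / bin_width)] += intensity
    return grid


# Made-up peaks for three consecutive scans (illustrative only)
scans = [
    [(400.2, 10.0), (401.0, 5.0)],
    [(400.3, 20.0)],
    [(401.1, 7.0), (401.2, 3.0)],
]
grid = build_2d_map(scans, mz_min=400.0, mz_max=402.0, bin_width=0.5)
for row in grid:
    print(row)
```

Once the data are in this form, a species that elutes over several scans appears as a contiguous blob in the grid, so image-analysis style peak detection can be applied to it.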
MapQuant Gives a List of All Organic Species In the Sample
[Figure: 2-D map; axes: retention time, m/z units]
Leptos et al. Proteomics 2006
MapQuant is publicly available at
MapQuant gives me a list of all organic species in the sample, BUT WHAT ARE THEIR IDENTITIES?
[Figure: 2-D map; axes: retention time (in min), m/z units; some species labeled EKLAVSAR, QEPERSEK, DAFLSGER, others marked "?"]
Dealing With Many Peptides (Organic Species)
MapQuant identifies approx. 2x10^4 organic species per LC/MS experiment.
ONLY ~500 (3%) organic species have fragmentation (CID) spectra and hence sequence IDs.
[Figure: 2-D map; axes: retention time (in min), m/z units; species labeled EKLAVSAR, QEPERSEK, DAFLSGER, others marked "?"; legend: symbol = CID spectrum or MS/MS event]
Dealing With Many Peptides (Organic Species)
Database of peptides from ALL LC/MS experiments carried out on Prochlorococcus samples, with (rt, m/z) coordinates.
[Figure: 2-D map; axes: retention time (in min), m/z units; species labeled EKLAVSAR, QEPERSEK, DAFLSGER, others marked "?"]
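One way to read this slide is that species lacking a CID spectrum can be assigned an identity by looking up their (rt, m/z) coordinates in the pooled database. The sketch below shows such a lookup under stated assumptions: the tolerances, the scoring rule, the function name, and all coordinates are invented for illustration (only the peptide sequences come from the slides), and it is not claimed to be the procedure actually used.

```python
def match_features(features, database, rt_tol=1.0, mz_tol=0.5):
    """Assign each (rt, mz) feature the closest database peptide within tolerance.

    features: list of (rt_min, mz) for species without a sequence ID.
    database: list of (sequence, rt_min, mz) from previously identified peptides.
    Returns {feature_index: sequence or None}.
    """
    matches = {}
    for idx, (rt, mz) in enumerate(features):
        best = None
        for seq, db_rt, db_mz in database:
            d_rt, d_mz = abs(rt - db_rt), abs(mz - db_mz)
            if d_rt <= rt_tol and d_mz <= mz_tol:
                # Normalized distance, so both axes contribute comparably
                score = d_rt / rt_tol + d_mz / mz_tol
                if best is None or score < best[0]:
                    best = (score, seq)
        matches[idx] = best[1] if best else None
    return matches


# Hypothetical coordinates; sequences taken from the slide labels
database = [
    ("EKLAVSAR", 12.0, 437.26),
    ("QEPERSEK", 18.5, 501.75),
    ("DAFLSGER", 25.0, 440.71),
]
features = [(12.3, 437.30), (25.4, 440.60), (40.0, 600.00)]
matches = match_features(features, database)
print(matches)
```

In practice retention times drift between runs, so a real lookup would likely need retention-time alignment before any fixed tolerance is meaningful.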
TOTAL NUMBER OF ORFS: Protein Distribution Among Experiments
Sequence Coverage of the Protein groES
Summary
- Open Personal Genome Project (PGP) including proteomics
- Single molecule RNAs for alternative splicing (AS)
- Gel-MS methods for endoproteolytic processing (Ahmad R, Nguyen DH, Wingerd MA, Church GM, Steffen MA. Proteome Sci. 3:6 (2005))
- MapQuant for MS quantitation without isotopic labeling