28-Apr 8:15AM – 8:45AM Next-Gen Seq Data Management Thanks to: Advancing Personal Genetics with Second Generation Sequencing
Context: Personal Genomics Landscape direct-to-consumer -- hybrid -- research only REVEAL * * * *
23andme
Over 600 alleles of BRCA1 (Myriad/DNAdirect sequencing not chips)
PersonalGenomes.org Project Goals 1) Low cost: <$1K : 98% exome (or more) 2) Active subject participation, informed redaction 3) Avoid over-promising de-identification 4) Entrance exam to ensure highly informed consent 5) Multiple samples to ensure consistent IDs 6) Open access (not just researcher subset) 7) Trait questionnaire, stem cell RNA, biome 8) Cells available for personal functional genomics 9) Scaleable to 100,000 diverse research subjects Coriell GM2 Employers/Insurers > Non-Discrimination Act Actionable alleles are rare > all at risk Non-actionable alleles > activism 1731
3 Exponential technologies 3 to 18 month doubling times Shendure J, Mitra R, Varma C, Church GM, 2004 Nature Reviews of Genetics. Kurzweil 2002; Moore 1965 urea B12 tRNA telegraph Computation & Communication Analytic tRNA Synthetic chemistry human Gbp chips
Chips vs. Gen-2 Sequencing Illumina Affymetrix bead-array Roche-454 Illumina ABI-SOLiD Harvard-Danaher Polonator-G007 Chips: 0.02% of the genome – assumes common DNA variants stay associated with deleterious variants over 50,000 years Sequencing 98% genome accesses the deleterious variants directly Helicos
G A C T Multiplex Cyclic Sequencing by Synthesis Single instrument, multiple chemistries: polonies on slides or beads Polymerase -or- Ligase Shendure, Porreca, et al Science Illumina, IBS AB- SOLiD, CGI Mitra, et al Analyt. Biochem NAR
36 to 64 flowcells (+ DNA barcodes) 2 to 4 billion beads 8.5 thick sequence image
Open-source hardware, software, wetware: Polonator G.007 (12TB image > 120 Gbp /run) Enzyme/oligo kits Polymerase or Ligase chemistries $150K including computer & 1 yr service, software, support Danaher Inc.
Effect of improvements on cost ImprovementFactorFeature cost/run Sequencing cost/run Gb/runReagent cost/Gb Fold decrease None1$1,677$68510$292 Flowcell volume5$1,677$13710$ Useable yield6$1,677$68560$396 Instrument speed2$3,354$1,37020$2361 Emulsion sorting18$93$68510$783 Readlength 48bp3.7$1,677$2,53437$1142 ALL$186$ $ Polonator instrument 3 yr amortization: $150k / 300 runs = $500/run = $50/Gb $150k / 81 runs = $1850/run = $4.2/Gb ($10 vs. $2000 / Gb for other 2 nd gen)
Personal genome sequencing options/goals Technology Genome Cost Raw bp AB %$30M 7x = 42 Gb (3.5x each) Knome 98%$350K 15x = 84 Gb SNP-chip 0.02%$1K 2 Mbp PGP coding 1%$90 30x = 1Gb PGP RNA 99%$20 30x 20K*n = 60 Mb (n=100 cell types for RNA) -path/resistome - $20 rRNA + 20K genes VDJ-Immunome- $20 ?
Selective genome sequencing Shendure, et al. Science 309(5741): Nilsson et al. (2006) Trends Biotechnol 24:83. Red=Synthetic; Yellow=genome/cDNA How do we optimize >100K 100mers ? 8 ways to capture alleles from genomic or c-DNA In vitro Paired-end- tags (PET) Gap fill Cleave & ligate Zhang, Chou, Shendure, Li, Leproust, Dahl, Davis, Nilsson, Church For rearrangements Hybr-select-chip 5. Hybr-select-solution 6. fluidic PCR 7. Multiplex PCR 1.
Circle Capture DNA from Chips
Aug 2007 R=.53 Jan 2008 R=.986 Zhang, Li et al. unpublished Gap fill Circle-capture 1% genome
Genome to Phenome: Population Variation G A T C Zhang & Church unpublished cis Trans Gene products Gene Expression Genome Environment Traits
G A T C Allele-specific expression (ASE) Combine all cis element variants G A AAAAA T C T T Enhancer, promoter, splicing, polyA, termination, transport, decay. Eliminate environmental & trans-acting variation among individuals. G A G G Allele-specific transcription factor binding TF ChIP-Seq Digital RNA allelotyping Zhang, LI, Church unpublished Forton et al. Genome Res. 2007
Genomic DNA Lymphocyte cDNA Lymphocyte cDNA Fibroblast cDNA Keratinocyte rs , ATP5F1, ATP synthase T/C = 0.51 T/C = 3.47 T/C = 3.73 Tissue specific & allele specific gene expression confirmatory assays Kun Zhang & Alice Li
25X probe * 72X time =1800X Better efficiency. Kun Zhang & Billy Li Genomic DNA Aug 2007 Genomic DNA Jan 2008 cDNA Jan 2008
Challenge: Multiple cell types from healthy adults 3mm skin sample
PGP Physicians Network Volunteers Induction of Multiple Gene Sets (not necessarily functional tissues) Primary fibroblasts Complex Traits via Allele-Specific Gene Expression Induced Stem Cells mRNA Multiplexed Differentiation Multiplexed Reprogramming Sequence tag quantitation Jay Lee et al. unpublished
Induced Pluripotent Stem Cell Generation & Transdifferentiation (Oct4/Sox2/Myc/Klf4) Retroviral Infection Tissue Culture on a Mouse Feeder Layer ES Cell Colony Identification Clonal Isolation and Propagation Embryoid Body Induction & Guided Differentiation Adenoviral Infection Mixture of differentiated cell types & Guided Differentiation 2 months Multiple integration sites 1 week No genomic integration Yamanaka, Daley, Thomson Hochedlinger, Jaenisch labs Lee & Church
Multiple cell-types with transdifferentiation Retroviral Infection Adenoviral Infection MyoD CD34 Collagen
Kun Zhang & Fan Liang Green: phase contrast image Red: Cy5-labelled Alu probe Nunc or UCSD Haplotyping by amplification of single chromosomes or fragments
Ultra-clean conditions for reduction of background amplification + Real-Time monitoring Post-amplification chip hybridization distinguishes alleles Amplification variation random & easily filled by PCR error rate < –5 Single-cell or Single DNA-fragment (haplotype) sequencing: 5 Mbp Zhang et al. Nature Biotec 2006
Environments of Genomes VDJ-ome TRAITS biome RNAome PERSONAL GENOME One in a life-time genome + yearly ( to daily) tests Bio-weather map : Allergens, Microbes, Viruses
PGP Resistome: 18 Antibiotics Dantas, Sommer, Church unpublished
Bacteria Subsisting on 18 Antibiotics Dantas Sommer Church Science 2008
Personal genome sequencing options/goals Technology Genome Cost Raw bp AB %$30M 7x = 42 Gb (3.5x each) Knome 98%$350K 15x = 84 Gb SNP-chip 0.02%$1K 2 Mbp PGP coding 1%$90 30x = 1Gb PGP RNA 99%$20 30x 20K*n = 60 Mb (n=100 cell types for RNA) -path/resistome - $20 rRNA + 20K genes VDJ-Immunome- $20 ?