28-Apr 8:15AM – 8:45AM Next-Gen Seq Data Management Thanks to: Advancing Personal Genetics with Second Generation Sequencing.

Slides:



Advertisements
Similar presentations
High throughput sequencing Barbera van Schaik
Advertisements

The Past, Present, and Future of DNA Sequencing
Virus discovery-454 sequencing
The Good, Bad, and Ugly of Next-Gen Sequencing
Vanderbilt Center for Quantitative Sciences Summer Institute Sequencing Analysis Yan Guo.
Next–generation DNA sequencing technologies – theory & practice
Recombinant DNA technology
20,000 GENES IN HUMAN GENOME; WHAT WOULD HAPPEN IF ALL THESE GENES WERE EXPRESSED IN EVERY CELL IN YOUR BODY? WHAT WOULD HAPPEN IF THEY WERE EXPRESSED.
Detecting DNA-protein Interactions Xinghua Lu Dept Biomedical Informatics BIOST 2055.
RNA-Seq An alternative to microarray. Steps Grow cells or isolate tissue (brain, liver, muscle) Isolate total RNA Isolate mRNA from total RNA (poly.
What Is Genomics? Genomics is the study of how the entire genome of a species functions as a unit and evolves over time. It is the study of life’s blueprint,
Next-generation sequencing and PBRC. Next Generation Sequencer Applications DeNovo Sequencing Resequencing, Comparative Genomics Global SNP Analysis Gene.
Greg Phillips Veterinary Microbiology
Transcriptomics Jim Noonan GENE 760.
Accurate Multiplex Polony Sequencing of an Evolved Bacterial Genome Jay Shendure, Gregory J. Porreca, Nikos B. Reppas, Xiaoxia Lin, John P. McCutcheon.
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. CHAPTER 18 LECTURE SLIDES.
Personal health citzenship Huntington's Chorea Nancy Wexler’s family Adrenoleukodystrophy Augusto Odone’s son Doug Melton’s son, Sam, has diabetes Parkison’s.
1 8-Jul-08 Secretary’s Advisory Committee on Genetics, Health, & Society Thanks to: Personal Genomes Services : Research.
Molecular Genomic Imaging Center (CEGS) Harvard / Wash U George Church, Rob Mitra Greg Porreca, Jay Shendure Sequencing by Ligation on Polony Beads with.
Single Cell, RNA, & Chromosome Sequencing Technologies
RNA-Seq An alternative to microarray. Steps Grow cells or isolate tissue (brain, liver, muscle) Isolate total RNA Isolate mRNA from total RNA (poly.
Microarrays: Theory and Application By Rich Jenkins MS Student of Zoo4670/5670 Year 2004.
Chris Chander, Luke Adea BioSci D145 Feb. 12, 2015
Targeted Sequencing of Human Genomes, Transcriptomes, and Methylomes Jin Billy Li George Church Lab Harvard Medical School
What Can You Do With qPCR?
High Throughput Sequencing
CS 6293 Advanced Topics: Current Bioinformatics
Next Generation DNA Sequencing Platforms: Evolving Tools for
Diabetes and Endocrinology Research Center The BCM Microarray Core Facility: Closing the Next Generation Gap Alina Raza 1, Mylinh Hoang 1, Gayan De Silva.
Paola CASTAGNOLI Maria FOTI Microarrays. Applicazioni nella genomica funzionale e nel genotyping DIPARTIMENTO DI BIOTECNOLOGIE E BIOSCIENZE.
Analysis of Transgenic Plants. 1.Regeneration on Selective Medium Selectable Marker Gene.
Sequencing Technologies and Applications at JGI
Fine Structure and Analysis of Eukaryotic Genes
Mapping protein-DNA interactions by ChIP-seq Zsolt Szilagyi Institute of Biomedicine.
How do you identify and clone a gene of interest? Shotgun approach? Is there a better way?
Gene expression and DNA microarrays Old methods. New methods based on genome sequence. –DNA Microarrays Reading assignment - handout –Chapter ,
Genetics: Chapter 7. What is genetics? The science of heredity; includes the study of genes, how they carry information, how they are replicated, how.
DNA Chips Attach DNA to tiny spots on glass slides (i.e., chip). Hybridize fluorescently-labeled DNA probes to chip. Detect hybridization to different.
Chromatin Immunoprecipitation DNA Sequencing (ChIP-seq)
Literature reviews revised is due4/11 (Friday) turn in together: revised paper (with bibliography) and peer review and 1st draft.
Transcription factor profiling in individual hematopoietic progenitors by digital RT-PCR Luigi Warren, David Bryder, Irving L. Weissman, and Stephen R.
How will new sequencing technologies enable the HMP? Elaine Mardis, Ph.D. Associate Professor of Genetics Co-Director, Genome Sequencing Center Washington.
Summarization of Oligonucleotide Expression Arrays BIOS Winter 2010.
Different approaches: Site degeneracy data from libraries with randomized target sites (in vitro, complexity of the initial library is crucial, involves.
Gene expression. The information encoded in a gene is converted into a protein  The genetic information is made available to the cell Phases of gene.
Lecture 6. Functional Genomics: DNA microarrays and re-sequencing individual genomes by hybridization.
Proteome and Gene Expression Analysis Chapter 15 & 16.
ANALYSIS OF GENE EXPRESSION DATA. Gene expression data is a high-throughput data type (like DNA and protein sequences) that requires bioinformatic pattern.
Jan. 13, 2011 B4730/5730 Plant Physiological Ecology Introduction to Physiology and Genetics.
Rest of Chapter 11 Chapter 12 Genomics, Proteomics, and Transgenics Jones and Bartlett Publishers © 2005.
Cell Therapy in CV Disease: Newest Evidence. 2 Induction of pluripotent stem cells from human fibroblasts Study Source of fibroblasts Transcription factors.
Microarray: An Introduction
Introduction to Illumina Sequencing
Next-generation sequencing technology
Next generation sequencing
Lecture 6: Genotype by sequencing
Cancer Genomics Core Lab
Lecture 8 A toolbox for mechanistic biologists (continued)
Next-generation sequencing technology
Functional Genomics in Evolutionary Research
Design and Analysis of Single-Cell Sequencing Experiments
Chapter 20 – DNA Technology and Genomics
Lecture 6: Genotype by sequencing
Induction of Pluripotent Stem Cells from Mouse Embryonic and Adult Fibroblast Cultures by Defined Factors  Kazutoshi Takahashi, Shinya Yamanaka  Cell 
Massively Parallel Sequencing: The Next Big Thing in Genetic Medicine
Volume 13, Issue 1, Pages (July 2013)
Next-generation DNA sequencing
Luigi Warren, David Bryder, Irving L. Weissman, and Stephen R. Quake
Tools for Molecular Biology
Presentation transcript:

28-Apr 8:15AM – 8:45AM Next-Gen Seq Data Management Thanks to: Advancing Personal Genetics with Second Generation Sequencing

Context: Personal Genomics Landscape direct-to-consumer -- hybrid -- research only REVEAL * * * *

23andme

Over 600 alleles of BRCA1 (Myriad/DNAdirect sequencing not chips)

PersonalGenomes.org Project Goals 1) Low cost: <$1K : 98% exome (or more) 2) Active subject participation, informed redaction 3) Avoid over-promising de-identification 4) Entrance exam to ensure highly informed consent 5) Multiple samples to ensure consistent IDs 6) Open access (not just researcher subset) 7) Trait questionnaire, stem cell RNA,  biome 8) Cells available for personal functional genomics 9) Scaleable to 100,000 diverse research subjects Coriell GM2 Employers/Insurers > Non-Discrimination Act Actionable alleles are rare > all at risk Non-actionable alleles > activism 1731

3 Exponential technologies 3 to 18 month doubling times Shendure J, Mitra R, Varma C, Church GM, 2004 Nature Reviews of Genetics. Kurzweil 2002; Moore 1965 urea B12 tRNA telegraph Computation & Communication Analytic tRNA Synthetic chemistry human Gbp chips

Chips vs. Gen-2 Sequencing Illumina Affymetrix bead-array Roche-454 Illumina ABI-SOLiD Harvard-Danaher Polonator-G007 Chips: 0.02% of the genome – assumes common DNA variants stay associated with deleterious variants over 50,000 years Sequencing 98% genome accesses the deleterious variants directly Helicos

G A C T Multiplex Cyclic Sequencing by Synthesis Single instrument, multiple chemistries: polonies on slides or beads Polymerase -or- Ligase Shendure, Porreca, et al Science Illumina, IBS AB- SOLiD, CGI Mitra, et al Analyt. Biochem NAR

36 to 64 flowcells (+ DNA barcodes) 2 to 4 billion beads 8.5  thick sequence image

Open-source hardware, software, wetware: Polonator G.007 (12TB image > 120 Gbp /run) Enzyme/oligo kits Polymerase or Ligase chemistries $150K including computer & 1 yr service, software, support Danaher Inc.

Effect of improvements on cost ImprovementFactorFeature cost/run Sequencing cost/run Gb/runReagent cost/Gb Fold decrease None1$1,677$68510$292 Flowcell volume5$1,677$13710$ Useable yield6$1,677$68560$396 Instrument speed2$3,354$1,37020$2361 Emulsion sorting18$93$68510$783 Readlength 48bp3.7$1,677$2,53437$1142 ALL$186$ $ Polonator instrument 3 yr amortization: $150k / 300 runs = $500/run = $50/Gb $150k / 81 runs = $1850/run = $4.2/Gb ($10 vs. $2000 / Gb for other 2 nd gen)

Personal genome sequencing options/goals Technology Genome Cost Raw bp AB %$30M 7x = 42 Gb (3.5x each) Knome 98%$350K 15x = 84 Gb SNP-chip 0.02%$1K 2 Mbp PGP coding 1%$90 30x = 1Gb PGP RNA 99%$20 30x 20K*n = 60 Mb (n=100 cell types for RNA)  -path/resistome - $20 rRNA + 20K genes VDJ-Immunome- $20 ?

Selective genome sequencing Shendure, et al. Science 309(5741): Nilsson et al. (2006) Trends Biotechnol 24:83. Red=Synthetic; Yellow=genome/cDNA How do we optimize >100K 100mers ? 8 ways to capture alleles from genomic or c-DNA In vitro Paired-end- tags (PET) Gap fill Cleave & ligate Zhang, Chou, Shendure, Li, Leproust, Dahl, Davis, Nilsson, Church For rearrangements Hybr-select-chip 5. Hybr-select-solution 6.  fluidic PCR 7. Multiplex PCR 1.

Circle Capture DNA from Chips

Aug 2007 R=.53 Jan 2008 R=.986 Zhang, Li et al. unpublished Gap fill Circle-capture 1% genome

Genome to Phenome: Population Variation G A T C Zhang & Church unpublished cis Trans Gene products Gene Expression Genome Environment Traits

G A T C Allele-specific expression (ASE) Combine all cis element variants G A AAAAA T C T T Enhancer, promoter, splicing, polyA, termination, transport, decay. Eliminate environmental & trans-acting variation among individuals. G A G G Allele-specific transcription factor binding TF ChIP-Seq Digital RNA allelotyping Zhang, LI, Church unpublished Forton et al. Genome Res. 2007

Genomic DNA Lymphocyte cDNA Lymphocyte cDNA Fibroblast cDNA Keratinocyte rs , ATP5F1, ATP synthase T/C = 0.51 T/C = 3.47 T/C = 3.73 Tissue specific & allele specific gene expression confirmatory assays Kun Zhang & Alice Li

25X probe * 72X time =1800X Better efficiency. Kun Zhang & Billy Li Genomic DNA Aug 2007 Genomic DNA Jan 2008 cDNA Jan 2008

Challenge: Multiple cell types from healthy adults 3mm skin sample

PGP Physicians Network Volunteers Induction of Multiple Gene Sets (not necessarily functional tissues) Primary fibroblasts Complex Traits via Allele-Specific Gene Expression Induced Stem Cells mRNA Multiplexed Differentiation Multiplexed Reprogramming Sequence tag quantitation Jay Lee et al. unpublished

Induced Pluripotent Stem Cell Generation & Transdifferentiation (Oct4/Sox2/Myc/Klf4) Retroviral Infection Tissue Culture on a Mouse Feeder Layer ES Cell Colony Identification Clonal Isolation and Propagation Embryoid Body Induction & Guided Differentiation Adenoviral Infection Mixture of differentiated cell types & Guided Differentiation 2 months Multiple integration sites 1 week No genomic integration Yamanaka, Daley, Thomson Hochedlinger, Jaenisch labs Lee & Church

Multiple cell-types with transdifferentiation Retroviral Infection Adenoviral Infection MyoD CD34 Collagen

Kun Zhang & Fan Liang Green: phase contrast image Red: Cy5-labelled Alu probe Nunc or UCSD Haplotyping by amplification of single chromosomes or fragments

Ultra-clean conditions for reduction of background amplification + Real-Time monitoring Post-amplification chip hybridization distinguishes alleles Amplification variation random & easily filled by PCR error rate < –5 Single-cell or Single DNA-fragment (haplotype) sequencing: 5 Mbp Zhang et al. Nature Biotec 2006

Environments of Genomes VDJ-ome TRAITS  biome RNAome PERSONAL GENOME One in a life-time genome + yearly ( to daily) tests Bio-weather map : Allergens, Microbes, Viruses

PGP  Resistome: 18 Antibiotics Dantas, Sommer, Church unpublished

Bacteria Subsisting on 18 Antibiotics Dantas Sommer Church Science 2008

Personal genome sequencing options/goals Technology Genome Cost Raw bp AB %$30M 7x = 42 Gb (3.5x each) Knome 98%$350K 15x = 84 Gb SNP-chip 0.02%$1K 2 Mbp PGP coding 1%$90 30x = 1Gb PGP RNA 99%$20 30x 20K*n = 60 Mb (n=100 cell types for RNA)  -path/resistome - $20 rRNA + 20K genes VDJ-Immunome- $20 ?