Targeted Sequencing of Human Genomes, Transcriptomes, and Methylomes
Jin Billy Li, George Church Lab, Harvard Medical School

Genetic Loci × Sample Size = Information
[Figure: methods placed on axes of # samples and # genetic loci: PCR seq, Mass-spec, Shotgun seq, RNA-seq, ChIP-seq, SNP array]

Target Capturing with Padlock Probes (aka MIPs)
[Diagram: feature 1 … feature n; pol, lig; PCR (or RCA)]
Porreca et al., Nat Methods 2007

Mass Production of Padlock Oligos
55k features of up to 200 nt
[Figure: size markers at 50 nt, 100 nt, 150 nt]

~10,000-fold Improvement Since Nov 2007
1. longer hybridization time; 2. more probes; 3. right [dNTP]
20-fold improvement already by better probe design and synthesis
Li et al., in preparation

Improved Technology -> Better Performance
Sensitivity + uniformity (current): 95% captured; 85% within 100-fold range; 55% within 10-fold range
[Figure panels: Sensitivity + Uniformity; Correlation; Nov 2007 vs. Current]
Li et al., in preparation

Summary of Improvements (Nov 2007 -> Current)
Specificity: ~100%
Sensitivity/Multiplexity (of 55k): 18% -> 95%
Uniformity (in 100-fold range): 16% -> 85%
Correlation of replicates (r):
Accuracy (heterozygous calls): 31% -> 99%
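The slides do not spell out how sensitivity and uniformity are computed, so the following is only a sketch under stated assumptions: sensitivity is taken as the fraction of targets with at least one read, and "uniformity in an N-fold range" as the largest fraction of all targets whose read counts fit inside a single N-fold window. The function names and the example counts are hypothetical, not from the talk.

```python
from bisect import bisect_right


def sensitivity(counts):
    """Fraction of targets seen at least once (assumed definition)."""
    if not counts:
        raise ValueError("counts must not be empty")
    return sum(1 for c in counts if c > 0) / len(counts)


def uniformity(counts, fold):
    """Largest fraction of all targets whose nonzero read counts
    fall inside one window [low, low * fold] (assumed definition)."""
    if not counts:
        raise ValueError("counts must not be empty")
    captured = sorted(c for c in counts if c > 0)
    best = 0
    for i, low in enumerate(captured):
        # Index just past the last count that is <= low * fold.
        j = bisect_right(captured, low * fold)
        best = max(best, j - i)
    return best / len(counts)


# Hypothetical per-target read counts for ten targets.
example_counts = [0, 1, 5, 10, 50, 100, 1000, 0, 20, 8]
example_sensitivity = sensitivity(example_counts)
example_uniformity_10 = uniformity(example_counts, 10)
example_uniformity_100 = uniformity(example_counts, 100)
```

Other reasonable definitions exist (for example, a window centred on the median count), and they would give different percentages for the same data.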

Targeted Capturing of
Genomes
– Exome: PGP etc.
– Contiguous regions or gene panels
– SNPs
– Hypermutable CpG dinucleotides
Transcriptomes
– Allelotyping
– RNA editing sites
Methylomes
– CpG methylation

Predicting Putative Editing Sites
A in the genome, G in some mRNAs or ESTs
A -> I (G) RNA Editing
Post-transcriptional A -> I; I is read as G during translation
Only 10 targets are known in human coding regions
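The prediction rule on this slide (A in the genome, G in some mRNAs or ESTs) can be sketched as a position-by-position comparison. This is a minimal illustration, assuming the transcript sequences are already aligned to the genomic sequence on the same strand with no gaps; the function name and the toy sequences are hypothetical, and the actual prediction pipeline is not described in the talk.

```python
def candidate_editing_sites(genome, transcripts):
    """Return (position, n_transcripts_with_G) for every genomic A
    that reads as G in at least one aligned transcript.

    Assumes gap-free, same-strand alignment starting at position 0.
    """
    sites = []
    for pos, base in enumerate(genome):
        if base != "A":
            continue
        # Count transcripts that cover this position and show a G.
        n_g = sum(1 for t in transcripts if pos < len(t) and t[pos] == "G")
        if n_g > 0:
            sites.append((pos, n_g))
    return sites


# Toy example: one genomic sequence and three aligned transcripts.
toy_genome = "ACGATA"
toy_transcripts = ["ACGGTA", "ACGATG", "GCGGTA"]
toy_sites = candidate_editing_sites(toy_genome, toy_transcripts)
```

A genomic A/G polymorphism or a sequencing error produces the same signal in this sketch, so candidates found this way still need experimental confirmation.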

Discovery of 100’s of Novel Editing Sites
– 36,000 predicted editing sites
– gDNA + 7 tissue cDNAs from an individual
– Padlock + Solexa: 239 sites found to be edited
– Validation (PCR + Sanger): 18 of 20 random sites are obviously edited
with Erez Levanon, in preparation

Example: VEZF1
[Figure tracks: Genomic DNA; RNA from intestine, kidney, diencephalon, frontal lobe, corpus callosum, cerebellum, adrenal]

Bisulfite Padlock Probes (BSP): CpG Methylation
Bisulfite-treated genome: a “3-base” genome
High specificity of padlock
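Why the bisulfite-treated genome is called a “3-base” genome can be shown with a small in-silico conversion: bisulfite treatment deaminates unmethylated C, which is then read as T, while methylated C is protected. The sketch below handles one strand only; the function name and the example sequence are hypothetical.

```python
def bisulfite_convert(seq, methylated=frozenset()):
    """Simulate bisulfite conversion of one DNA strand.

    Every C becomes T unless its 0-based index is in `methylated`.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


example_seq = "ACGTCCGA"
# The C at index 1 (part of a CpG) is treated as methylated and survives.
partly_methylated = bisulfite_convert(example_seq, methylated={1})
# With no methylation, the converted strand contains only A, G, and T.
unmethylated = bisulfite_convert(example_seq)
```

Because most of the converted strand uses only three bases, sequence complexity drops and probe specificity becomes a real concern.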

Methylation Level Accurately Measured
[Figure: BSP-BSP correlation (methylation level, replicate 1 vs. replicate 2) and BSP-Sanger correlation (methylation level measured by BSP sequencing vs. methylation level estimated by Sanger sequencing); r = 0.966]
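A minimal sketch of the two quantities behind these plots, under the common assumption that the methylation level at a CpG is the share of reads still showing C after bisulfite conversion, and that r is the Pearson correlation between paired measurements. The function names and the numbers used are hypothetical and are not the data from the talk.

```python
import math


def methylation_level(c_reads, t_reads):
    """Fraction of reads showing C (unconverted) at one CpG site."""
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("site has no reads")
    return c_reads / total


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical (C reads, T reads) at three CpG sites in two replicates.
replicate_1 = [methylation_level(c, t) for c, t in [(30, 10), (5, 45), (20, 20)]]
replicate_2 = [methylation_level(c, t) for c, t in [(28, 12), (8, 42), (22, 18)]]
r_between_replicates = pearson_r(replicate_1, replicate_2)
```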

Methylation Pattern around Genes: Gene-Body Methylation
with Madeleine Price Ball, in preparation (poster)

Acknowledgements
Church Lab: George Church
Padlock technology: Kun Zhang, John Aach, Abraham Rosenbaum, Jay Shendure, Greg Porreca, Annika Ahlford
RNA editing: Erez Levanon, Jung-Ki Yoon
CpG methylation: Madeleine Price Ball
Agilent: Emily Leproust, Wilson Woo
Sequencing: Yuan Gao, Bin Xie, Bob Steen

Superior Quality of Padlock Oligos
55k features of up to 200 nt; PCR (2x); Solexa sequencing
[Figure: size markers at 50 nt, 100 nt, 150 nt; axis: Fraction of probes]

From Agilent Oligos to Padlock Probes: amplification and selection
[Diagram: Agilent oligo, 136 bp, with 18 bp at each end -> PCR -> exonuclease -> USER + DpnII, annealed with DpnII guide oligo -> padlock probe]

Heterozygous Genotypes Correctly Called
[Figure: before vs. after; genotype classes: homozygous wild type, heterozygous variation, homozygous variation]

Methods in Comparison
Feature | Padlock | Array-based hyb
Upfront probe cost (10-20% of exome) | $12,000 per 55k 100mers | $600 per 385k 70mers
Probes amplifiable? | Yes | No
Reaction phase | Solution, μl | Surface, 200 μl
Enzymatic hyb? | Yes | No
gDNA required | ~0.5-1 μg | 20 μg (WGA)
Efficiency (->accuracy) | 1% | N/A (<0.1%?)
Uniformity | 100-fold range | 10-fold range
Specificity | ~100% on target | 30-80% on or near target

Differential Clamping at Ligation Junction

% GC VS Capturing Efficiency

99% Concordance Between Padlock and HapMap

The Editing “Calls” Are Well Correlated r = 0.964

Bisulfite Padlock Probes (BSP): CpG Methylation
Bisulfite-treated genome
10k CpG sites tiling the ENCODE regions
– 1 CpG site every 3 kb region on average
High specificity
– 79 of 80 Sanger reads match correct locations

[Diagram, labels B, strep, P: collected in a tube; PCR; λ exonuclease; shearing, end polishing; adapter ligation; hybridization in closed-tube solution; denaturing, PCR]
Li et al., unpublished

Methods in Comparison
Feature | Padlock | Array-based hyb | Biotin-coupled hyb
Upfront probe cost (10-20% of exome) | $12,000 per 55k 100mers | $600 per 385k 70mers | $500 per 244k 60mers
Probes amplifiable? | Yes | No | Yes
Reaction phase | Solution, μl | Surface, 200 μl | Solution, μl
Enzymes in hyb? | Yes | No |
gDNA required | ~0.5-1 μg | 20 μg (WGA) | ~0.5-1 μg
Efficiency (->accuracy) | 1% | N/A (<0.1%?) | ~10%?
Uniformity | 100-fold range | 10-fold range | 10-fold range?
Specificity | ~100% on target | 30-80% on or near target | ~55% on or near target

Two Tech Replicates Are Well Correlated
[Figure panels: Uniformity (ranked target sites vs. number of reads per site); Correlation of counts (counts, replicate 1 vs. counts, replicate 2)]