Published by Sheryl Harrington. Modified over 9 years ago.
1
How will new sequencing technologies enable the HMP (Human Microbiome Project)?
Elaine Mardis, Ph.D.
Associate Professor of Genetics
Co-Director, Genome Sequencing Center
Washington University School of Medicine
emardis@wustl.edu
2
Advantages of Next Gen Platforms
- No sub-cloning and no use of E. coli as a host
  - cloning bias abolished
  - one FTE (full-time equivalent) can keep several instruments busy
- Each sequence is from a unique DNA molecule
  - quantitation is possible through “counting”
  - enhanced dynamic range
  - detection of rare variants
- Multiple sequence-based assays on one platform
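Because each read comes from a single DNA molecule, relative abundance can be estimated simply by counting reads per category. A minimal sketch, assuming reads have already been assigned a taxon label upstream (the function name and labels are hypothetical):

```python
from collections import Counter


def relative_abundance(read_labels):
    """Estimate each label's share of the sample as its read count over total reads."""
    counts = Counter(read_labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Hypothetical per-read assignments; not data from any experiment here.
reads = ["taxon_A", "taxon_B", "taxon_A", "taxon_A"]
print(relative_abundance(reads))  # {'taxon_A': 0.75, 'taxon_B': 0.25}
```

Dynamic range scales with read count: the smallest nonzero share this estimate can report is 1/total.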
3
New Sequencing Platforms
- Roche FLX Sequencer
- Illumina 1G Analyzer
- ABI SOLiD Sequencer
- Helicos single-molecule sequencer
4
Roche FLX: Vital Statistics
Currently:
- >100 Mb of data per 7-hour run, at $16K per run
- Read lengths average 250 bp
- Accuracy is limited by insertion/deletion (indel) errors in homopolymer runs
- The coverage model calls for higher coverage than for 3730 data
By year’s end:
- Improved pipeline and read assembly software
- Paired end reads
- 400 bp read lengths
- Bar-code tagging of libraries
© Elaine Mardis, Ph.D.
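Pyrosequencing infers the length of a homopolymer from the intensity of a single light signal, so long runs of one base are where FLX indels concentrate. A minimal sketch that flags such runs in a read or reference; the `min_len=4` cutoff is an arbitrary assumption, not a platform specification:

```python
import re


def homopolymer_runs(seq, min_len=4):
    """Return (start, base, length) for every run of one base at least min_len long."""
    runs = []
    # ([ACGT])\1* matches a base followed by any repeats of that same base.
    for m in re.finditer(r"([ACGT])\1*", seq.upper()):
        length = m.end() - m.start()
        if length >= min_len:
            runs.append((m.start(), m.group(1), length))
    return runs


print(homopolymer_runs("ACGTTTTTGAAAAC"))  # [(3, 'T', 5), (9, 'A', 4)]
```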
5
Illumina 1G Analyzer: Vitals
Currently:
- 1 Gb per 4-day run, at $3,000-5,000 per run
- 40 bp read lengths, 8-channel flow cell
- Read accuracy is highest in the first 25 bp; ~1% overall error rate
- Biased representation of high-AT regions
By year’s end:
- Paired end read capability
- 50 bp read lengths
- Improved short read mapping and assembly algorithms (?)
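A ~1% error rate is easier to reason about on the Phred scale and per read. A minimal sketch; treating the error rate as the same at every position is a simplifying assumption that the slide itself contradicts (accuracy is best in the first 25 bp), so the per-read figure is only an average:

```python
import math


def phred(error_rate):
    """Phred-scaled quality: Q = -10 * log10(probability of a miscalled base)."""
    return -10 * math.log10(error_rate)


def expected_errors(read_len, error_rate):
    """Expected miscalled bases per read, assuming a uniform per-base error rate."""
    return read_len * error_rate


print(round(phred(0.01)))                   # 20, i.e. Q20
print(round(expected_errors(40, 0.01), 2))  # 0.4 errors per 40 bp read
```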
6
Cross-Platform Comparisons
Criterion        | 3730    | Roche                                    | Illumina
Platform cost    | $350K   | $500K                                    | $395K
Read length      | 650+ bp | 250 bp                                   | 40-50 bp
Cost/run         | $55     | $16,000                                  | $3,000-5,000
Mbp/day          | 1.4     | 200                                      | 333
Cost/Mbp         | $880    | $160                                     | $5
Accuracy         | High    | No substitutions; indels at homopolymers | High
Paired end reads | Yes     | Coming                                   | Yes*
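The Cost/Mbp row follows from Cost/run divided by the yield of one run. A quick check, assuming a 3730 run yields 96 reads of ~650 bp (our assumption, which reproduces the table’s ~$880), taking the FLX yield as 100 Mb and the Illumina cost at the top of its $3,000-5,000 range:

```python
def cost_per_mbp(cost_per_run, mbp_per_run):
    """Dollars per megabase for a single instrument run."""
    return cost_per_run / mbp_per_run


# (cost per run in $, yield per run in Mbp); the 3730 yield is an assumption.
platforms = {
    "3730": (55, 96 * 650 / 1e6),
    "Roche FLX": (16_000, 100),
    "Illumina 1G": (5_000, 1_000),
}
for name, (cost, mbp) in platforms.items():
    print(f"{name}: ${cost_per_mbp(cost, mbp):,.0f}/Mbp")
```

This prints roughly $881, $160 and $5 per Mbp, matching the table to within rounding.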
7
AB SOLiD™: Vital Statistics
- 500 Mb-1 Gb per 5-day run; cost not yet known (?$$)
- 50 bp read lengths; paired end or fragment reads
- Ligation-based sequencing, with high accuracy due to 2-base encoding
- Analysis software is unknown
- Early access platform due Q3 of ’07
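In 2-base encoding each fluorescent “color” reports a pair of adjacent bases, so every base is interrogated twice. A minimal sketch of the widely used color code (A=0, C=1, G=2, T=3; color = XOR of the two codes); the function names are ours, and production analysis decodes against a reference rather than naively as here:

```python
# Two-bit base codes; a dinucleotide's color is the XOR of its two codes.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"


def encode(seq):
    """Translate bases into colors, one per adjacent base pair."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def decode(first_base, colors):
    """Naively rebuild bases from a known first base plus the color calls."""
    bases = [first_base]
    for c in colors:
        bases.append(BASE[CODE[bases[-1]] ^ c])
    return "".join(bases)


ref = "ATGGCA"
colors = encode(ref)                # [3, 1, 0, 3, 1]
print(decode(ref[0], colors))       # ATGGCA

# A true SNP (C -> A at position 3) changes two adjacent colors...
print(encode("ATGACA"))             # [3, 1, 2, 1, 1]

# ...whereas one miscalled color changes only one, and naive decoding
# then corrupts every base after it.
bad = colors.copy()
bad[3] ^= 1
print(decode("A", bad))             # ATGGAC
```

The two-color signature is what lets a true variant be told apart from an isolated measurement error.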
8
HeliScope sequencer
- Single molecule detection obviates the PCR amplification step
- >25 Mbp/hour initial data rate, 1000 Mbp/hour ultimately, with <1% error rate
- Short read lengths; single molecule sequencing with high fidelity
- Two 25-channel flow cells
- Read mapping/assembly capability (?)
9
Comparative metagenomics: Cecal contents of obese mice (ob/ob) and lean littermates
EXPERIMENTAL DESIGN:
1) Remove cecal contents of 2 ob/ob, 2 +/+, and 1 ob/+ C57Bl/6J mice and isolate DNA.
2) 454 pyrosequencing of total DNA: 350,000 reads/mouse (one ob/ob, one +/+ mouse).
3) Compare data from each mouse to all known bacterial sequences.
4) Use data clustering methods to examine similarities and differences between all 5 mice that were sequenced.
5) Perform microbiota transplantation to test the ability to transfer the phenotype to gnotobiotic mice.
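Step 4 names “data clustering methods” without specifying them. One common starting point, shown here as our own choice rather than the study’s method, is a pairwise Bray-Curtis dissimilarity matrix on per-taxon read counts, which can then feed any hierarchical clustering routine. Sample names and counts are hypothetical; when sequencing depths differ, profiles should first be rarefied or converted to proportions:

```python
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity of two abundance vectors (0 = identical, 1 = disjoint)."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den


def pairwise(profiles):
    """Dissimilarity for every unordered pair of samples."""
    return {(x, y): bray_curtis(profiles[x], profiles[y])
            for x, y in combinations(sorted(profiles), 2)}


# Hypothetical read counts per taxon (same taxon order in each list).
profiles = {"mouse1": [60, 30, 10], "mouse2": [55, 35, 10], "mouse3": [20, 70, 10]}
for pair, d in pairwise(profiles).items():
    print(pair, round(d, 3))
```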
10
Next Gen RNA Sequencing
Our laboratory has developed a robust full-length cDNA process for 454-based sequencing of eukaryotic transcriptomes. It requires only a low input of total RNA, uses enzyme-based normalization, and can preferentially sequence the 5’ ends of cDNAs. We are presently working to modify this approach for sequencing microbiota transcriptomes and clinical isolates likely to contain viral RNA genomes (e.g., nasal lavage samples).
11
Illumina ‘Mockagenomics’ Experiment
We created two mock metagenomic samples by combining known bacterial and human genomic DNAs, and sequenced them on the Illumina platform to generate short (30 bp) reads. We plan to compare the relative strengths of classification by assembly and alignment with those of “signature” characterization (GC content, k-mer analysis) for short read data.
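“Signature” characterization reduces a read to composition statistics that need no reference alignment. A minimal sketch of the two signatures named above for one short read; the read is made up and `k=4` is our arbitrary choice:

```python
from collections import Counter


def gc_content(read):
    """Fraction of bases that are G or C."""
    read = read.upper()
    return (read.count("G") + read.count("C")) / len(read)


def kmer_profile(read, k=4):
    """Count overlapping k-mers; a read of length L contains L - k + 1 of them."""
    read = read.upper()
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))


read = "ACGTACGTGGCCATATGCGCAATTGGCCAA"  # hypothetical 30 bp read
print(len(read), round(gc_content(read), 2))  # 30 0.53
print(kmer_profile(read).most_common(3))
```

A 30 bp read yields only 27 tetramers, so per-read k-mer profiles are sparse; this is one reason the comparison with assembly and alignment is worth making.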
12
Practical Issues
- DNA quality and quantity
- Value of paired end vs. fragment reads
- Normalization vs. quantitation
- Depth of “search space”
13
Sample prep
Fragment reads: Evaluate DNA → Fragment (2-500bp) → Repair ends → Adapter ligate → Enrich → Amplify on bead (Roche/AB) or on glass slide (Illumina)
Paired end reads: Evaluate DNA → Fragment (2.5 kb) → Repair ends → Adapter ligate → Methylate → Restrict adapters → Circularize → 2° restriction with type IIS enzyme → Purify tags + adapter → Amplify
14
Paired End Libraries
Mate pair library: 25-base Tag #1 | Internal Adapter | 25-base Tag #2
Tags are generated by EcoP15I digestion or by fragmentation.
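Downstream, each mate-pair read has to be split at the internal adapter into its two 25-base tags, which are then mapped independently. A minimal sketch; the adapter sequence below is hypothetical, and the exact-match search is a simplification (real reads need mismatch-tolerant matching):

```python
def split_mate_pair(read, adapter, tag_len=25):
    """Return (tag1, tag2) flanking the internal adapter, or None if either is incomplete."""
    i = read.find(adapter)
    if i < tag_len:  # adapter missing (-1) or too few bases before it for Tag #1
        return None
    tag1 = read[i - tag_len:i]
    start = i + len(adapter)
    tag2 = read[start:start + tag_len]
    if len(tag2) < tag_len:  # read ends before Tag #2 is complete
        return None
    return tag1, tag2


ADAPTER = "GTCAGTCAGT"  # hypothetical internal adapter, not a real kit sequence
read = "A" * 25 + ADAPTER + "C" * 25
print(split_mate_pair(read, ADAPTER))
```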
15
Sequencing: 3-primer PE method
- Sequencing primers: PESP#1, PESP#2
- Read 1 (25-40 cycles) + Read 2 (25-40 cycles): total 50-80 cycles
- Linearisation reagents: NaIO₄ and U.S.E.R.
- Graft: P7 : P7diol : 9TUP5, with [P7 + P7diol] = [9TUP5]
- P7diol and 9TUP5 are linearisable; P7 is non-linearisable
- Cluster formation: heterogeneous clusters containing P7/9TUP5 bridges and P7diol/9TUP5 bridges
16
What are the issues?
- Consented sample availability!!
- Read length and accuracy
- Sample complexity
- Sensitivity of detection
- Coverage and cost
- DNA vs. RNA
- Bioinformatics-based analyses
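Sensitivity and coverage trade off directly against cost: the rarer a community member, the more reads are needed to see it at all. A minimal sketch, assuming reads are independent random draws from the sample (which ignores extraction and amplification biases), so that n reads include at least one read from a taxon at fraction p with probability 1 - (1 - p)^n:

```python
import math


def reads_needed(abundance, confidence=0.95):
    """Smallest n such that P(at least one read from the taxon) >= confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - abundance))


for p in (0.01, 0.001, 0.0001):
    print(p, reads_needed(p))
```

For comparison, the 350,000 reads per mouse in the cecal study would on average contain 35 reads (350,000 × 0.0001) from a taxon at 0.01% abundance.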
17
Bioinformatics Challenges
- Most daunting issue: the ability to analyze enormous data sets intelligently and efficiently
- Metagenomic analysis tools are now emerging for next gen sequence data
- Testing and integration into analysis pipelines will follow
- Output is only as good as the depth of the search space and the depth of coverage for any given combination of sample and sequencer
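Depth of coverage for a single community member depends on its share of the sample as well as on the sequencer’s output. A minimal sketch using the 350,000 reads per mouse and ~250 bp FLX reads from the earlier slides; the 1% abundance and 5 Mb genome size are hypothetical, and the Lander-Waterman estimate assumes uniformly random read placement (which the AT bias noted for Illumina violates):

```python
import math


def member_coverage(total_reads, read_len, abundance, genome_size):
    """Expected fold coverage of one member: reads x read length x share / genome size."""
    return total_reads * read_len * abundance / genome_size


def fraction_covered(coverage):
    """Lander-Waterman estimate of the fraction of bases hit by at least one read."""
    return 1 - math.exp(-coverage)


c = member_coverage(350_000, 250, 0.01, 5_000_000)
print(round(c, 3), round(fraction_covered(c), 3))  # 0.175 0.161
```

Even a taxon at 1% of such a sample would have only about a sixth of its genome touched by any read.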