DNA Sequencing: Basic Techniques, Project Design, Process Improvements
Chuck Staben, 9/18/2018
Project Size/Type
- 500 bases: 1 locus (EST, STS)
- 2500 bases: whole cDNA/EST
- 10 kbp: gene, virus
- 150 kbp: BAC, big virus
- 3 Mbp: bacterial genome, YAC-size
- BIG: HUMAN, etc. (simple repeats)
DNA Sequencing Methods
- Chain termination / dideoxy / Sanger: the fluorescence paradigm (ABI, Hood)
- Sequencing by hybridization: chips (Affymetrix; Lander, et al.); other formats (Hyseq; Church, et al.; Lark)
Dideoxy/Chain Terminator/Sanger
- Template
- Primer
- Extension chemistry: polymerase, termination, labeling
- Separation
- Detection
Chain Terminator Basics
- Template-primer; target sequence TGCA
- Terminators: ddA, ddG, ddC, ddT
- Extend with dN : ddN at 100 : 1
- Result is a ladder of products n, n+1, ...: ddA; A ddC; AC ddG; ACG ddT
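A minimal sketch of the ladder above, assuming the primer ends immediately before the first template base and that every product stops at a dideoxy terminator. The helper names are hypothetical. Reading the terminal base of each band from shortest to longest recovers the complement of the template.

```python
# Watson-Crick pairing used to build the synthesized strand.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def termination_ladder(template: str) -> list[str]:
    """Return every terminated product, shortest first.

    Product n carries n-1 normal dNTPs followed by one ddNTP ("dd" prefix).
    """
    synthesized = "".join(COMPLEMENT[b] for b in template)
    ladder = []
    for n in range(1, len(synthesized) + 1):
        ladder.append(synthesized[: n - 1] + "dd" + synthesized[n - 1])
    return ladder


def read_from_gel(ladder: list[str]) -> str:
    """Shortest bands migrate fastest; read the terminal base of each in order."""
    return "".join(product[-1] for product in ladder)


ladder = termination_ladder("TGCA")
print(ladder)                 # ['ddA', 'AddC', 'ACddG', 'ACGddT']
print(read_from_gel(ladder))  # ACGT
```

In a real reaction the 100 : 1 dN : ddN ratio is what spreads terminations across hundreds of positions; this sketch simply enumerates one product per position.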
Electrophoresis
Template Preparation
- ssDNA: vectors (M13), PCR
- dsDNA (+/- PCR): pUC
Primers
- Labeling: primer label or dye terminator
- Universal primers: cheap, reliable, easy, fast, parallel; suited to BULK sequencing
- Custom primers: expensive, slow, one-at-a-time; ADAPTABLE
Extension Chemistry
- Goals: 100% termination, accurate, even signal
- Polymerase: Sequenase; thermostable (cycle sequencing)
- Terminators: dye labels ("Big Dye"), spectrally different, high fluorescence (mass labels??); or ddA, ddC, ddG, ddT with primer labels
Separation
- Gel electrophoresis
- Capillary electrophoresis: suited to automation; rapid (2 hrs vs 12 hrs); re-usable; simple temperature control; 96-well format
- Migration ~ 1/log N (N = fragment length)
Paradigm Instrument
- Applied Biosystems ABI3700 (early 1999): 1500 samples/day!
- http://www2.perkin-elmer.com/ga/3700/features.html
- ABI377 (gel) and ABI310 (capillary)
Alternate Instruments (not a complete list)
- Molecular Dynamics, Beckman Coulter…
- ALF
- LiCor: infrared detection
Sample Output (1 lane)
Trace Editing
- EditView (Mac)
- Chromas (WinNT)
- Consed (UNIX)
Project Goals
- de novo sequence: chain terminators
- Repetitive sequencing (resequencing known regions): sequencing by hybridization, e.g., chip technology
Sequencing Strategies
- Random: brute force
- Ordered: divide and conquer
- Mix to suit
- Stages: sequencing, assembly, finishing, annotation
Random Method
1. Shear DNA (nebulize)
2. Produce template: finish ends, ligate into vector
3. Sequence to target coverage: read length (500 typical), accuracy (99% good)
4. Assemble contigs
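A rough simulation of the random method, assuming uniformly distributed read start positions on a single strand and reads clipped at the end of the sequence. The function name and parameters are hypothetical; a difference array tallies per-base depth without storing each read.

```python
import random


def shotgun_coverage(genome_size: int, read_length: int, n_reads: int,
                     seed: int = 0) -> tuple[float, int]:
    """Place reads at random; return (mean depth, number of uncovered bases)."""
    rng = random.Random(seed)
    # +1 at each read start, -1 one past each read end.
    diff = [0] * (genome_size + 1)
    for _ in range(n_reads):
        start = rng.randrange(genome_size)
        end = min(start + read_length, genome_size)  # clip at the sequence end
        diff[start] += 1
        diff[end] -= 1
    depth = uncovered = total = 0
    for i in range(genome_size):
        depth += diff[i]  # running sum gives depth at base i
        total += depth
        if depth == 0:
            uncovered += 1
    return total / genome_size, uncovered


mean_depth, gaps = shotgun_coverage(100_000, 500, 1_000)
print(f"mean depth {mean_depth:.2f}x, uncovered bases {gaps}")
```

With 1,000 reads of 500 bases on 100 kb (about 5x), the uncovered count should land in the neighborhood of 100,000 x e^-5 (roughly 670), varying with the seed.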
Random: problems after assembly
- No coverage
- DISAGREEMENT between reads (e.g., T vs C)
- Only 1 strand read
Poisson Statistics
- $P_0 = e^{-LN/G}$, the probability that a given base is covered by no read
- L = read length, N = number of reads, G = genome size
Poisson-2
- Gap Length = $P_0 G$ (expected number of uncovered bases)
Poisson-3
- Gap Number = $P_0 N$ (assuming L = 500 bases)
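The three Poisson formulas above can be evaluated directly. This is a minimal sketch with a hypothetical function name; it uses only $P_0 = e^{-LN/G}$, gap length $P_0 G$, and gap number $P_0 N$.

```python
import math


def poisson_gaps(read_length: int, n_reads: int, genome_size: int) -> dict:
    coverage = read_length * n_reads / genome_size  # c = LN/G
    p0 = math.exp(-coverage)                        # P0 = e^{-LN/G}
    return {
        "coverage": coverage,
        "p0": p0,
        "gap_length": p0 * genome_size,  # expected uncovered bases, P0 * G
        "gap_number": p0 * n_reads,      # expected number of gaps, P0 * N
    }


# 4 Mbp genome, 80,000 reads of 500 bases (10x coverage)
r = poisson_gaps(500, 80_000, 4_000_000)
print(f"{r['gap_number']:.1f} gaps, {r['gap_length']:.0f} bases uncovered")
```

For the 4 Mbp case these formulas give about 3.6 gaps and about 180 uncovered bases.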
4 Mbp Genome
- 10x coverage: 80,000 reads at 500 bases/read
- 4 gaps, 400 bases in gaps
- 55 instrument days on ABI3700
3000 Mbp Genome (HUMAN)
- 50,000 instrument days on ABI3700
- 300 machines, 3 years
- Plenty
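The instrument-time estimates scale roughly as coverage x genome size / read length, divided by throughput. The sketch below assumes one usable read per sample at the ABI3700's quoted 1500 samples/day and assumes the same 10x coverage for the human genome (that slide does not state a coverage). Because it counts only successful reads, it comes out below the slides' 55 and 50,000 instrument days.

```python
import math


def project_size(genome_size: int, coverage: float, read_length: int,
                 samples_per_day: int) -> tuple[int, float, float]:
    """Return (reads, instrument days, expected gaps) for a random project.

    Hypothetical helper: ignores failed runs, re-runs, and finishing reactions.
    """
    reads = math.ceil(coverage * genome_size / read_length)
    days = reads / samples_per_day
    # Expected gaps P0 * N with P0 = e^{-c}, as on the Poisson slides.
    expected_gaps = reads * math.exp(-coverage)
    return reads, days, expected_gaps


for name, size in [("4 Mbp", 4_000_000), ("human", 3_000_000_000)]:
    reads, days, gaps = project_size(size, 10, 500, 1500)
    print(f"{name}: {reads:,} reads, {days:,.0f} instrument days, ~{gaps:,.0f} gaps")
```

At 10x, the human-sized project leaves on the order of 2,700 expected gaps even though the coverage is the same as in the 4 Mbp case.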
Automation
Costs
- Raw cost: ~$0.01/base
- "Semi-finished": $0.10/base
- High-quality genome project: $0.50/base
Ordered Methods
- Primer walking
- Nested deletion
Limitations
- Slow, expensive
- Expertise needed, especially nested deletion
- Repeat problems, especially primer walking
Finishing
- Finish when random sequencing is no longer productive (3-10X range)
- GOALS: >95% coverage on BOTH strands; every base covered 3X; resolve ambiguities
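The finishing goals can be checked mechanically once reads are placed on a contig. This sketch assumes each read is a hypothetical `(start, end, strand)` tuple with a half-open interval and strand `"+"` or `"-"`. It reports the fraction of bases read on both strands and the minimum total depth, using the >95% and 3X thresholds above as defaults.

```python
def finishing_report(reads, contig_length, min_both_fraction=0.95, min_depth=3):
    """Summarize strand coverage and depth for one contig."""
    plus = [0] * contig_length
    minus = [0] * contig_length
    for start, end, strand in reads:
        track = plus if strand == "+" else minus
        for i in range(max(start, 0), min(end, contig_length)):
            track[i] += 1
    # Bases read at least once on each strand.
    both = sum(1 for p, m in zip(plus, minus) if p > 0 and m > 0)
    lowest = min(p + m for p, m in zip(plus, minus))
    both_fraction = both / contig_length
    return {
        "both_strands_fraction": both_fraction,
        "min_depth": lowest,
        "meets_goals": both_fraction > min_both_fraction and lowest >= min_depth,
    }


print(finishing_report([(0, 100, "+"), (0, 100, "-"), (0, 100, "+")], 100))
```

Three reads on one strand only would give enough depth but fail the both-strands goal.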
Finishing: How
- Identify gaps, ambiguities
- Extend from ends of contigs: specific primers, subclones, etc.
- Resolve ambiguities: consensus, or resequence with specific primers or a different chemistry
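One way to resolve ambiguities by consensus is a simple majority vote per aligned column. This is a plain stand-in, not phrap's quality-weighted consensus. It assumes reads are already aligned as equal-length strings with `-` for no data; positions without a clear majority (the hypothetical `min_fraction` threshold) come back as `N` and are listed as candidates for resequencing.

```python
from collections import Counter


def consensus(aligned_reads, min_fraction=0.75):
    """Majority-vote consensus; returns (sequence, ambiguous positions)."""
    calls, ambiguous = [], []
    for pos, column in enumerate(zip(*aligned_reads)):
        counts = Counter(b for b in column if b != "-")
        if not counts:
            calls.append("N")  # no read covers this column
            ambiguous.append(pos)
            continue
        base, n = counts.most_common(1)[0]
        if n / sum(counts.values()) >= min_fraction:
            calls.append(base)
        else:
            calls.append("N")  # reads disagree too much to call
            ambiguous.append(pos)
    return "".join(calls), ambiguous


print(consensus(["ACGTA", "ACGTA", "ACCTA", "ACGT-"]))  # ('ACGTA', [])
print(consensus(["ACT", "ACC"]))                      # ('ACN', [2])
```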
Assembly Methods
- Strip out vector
- Mask known repeats
- Trim off unreliable data
- Find matches (500 x 500 x many!!): how long (and what k-tuple)? how perfect (reliability index)? where to look (ends only vs entire read)?
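Comparing every 500-base read against every other is expensive, so match finding usually starts from shared k-tuples. This minimal sketch indexes every k-tuple and reports read pairs sharing at least `min_shared` distinct k-tuples as candidate overlaps for full alignment. It assumes vector and known repeats are already stripped and masked (otherwise repeats create spurious pairs) and, for brevity, it ignores reverse complements. The function and parameter names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def candidate_overlaps(reads: dict[str, str], k: int = 12, min_shared: int = 3):
    """Return {(read_a, read_b): shared k-tuple count} for likely overlaps."""
    index = defaultdict(set)  # k-tuple -> names of reads containing it
    for name, seq in reads.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    shared = defaultdict(int)
    for names in index.values():
        for a, b in combinations(sorted(names), 2):
            shared[(a, b)] += 1
    return {pair: n for pair, n in shared.items() if n >= min_shared}


reads = {
    "r1": "ACGTACGGTCAGTTAGC",
    "r2": "GGTCAGTTAGCATTGCA",
    "r3": "TTTTTTTTTTTTTTTTT",
}
print(candidate_overlaps(reads, k=5))  # {('r1', 'r2'): 7}
```

Larger k reduces chance matches but needs more accurate reads; the trade-off is the "how long, how perfect" question above.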
Assembly Programs
- PHRAP family and others: phrap, kangaroo, phrapo, GAP4, TIGRAssembler, ...
- GCG: gelstart, gelenter, gelmerge, gelassemble, geldisassemble (a thinly veiled vi editor)
- SeqWeb…
Assembly Improvements (repeat problems)
- Multiple fragment sizes in 1 project
- Use length/distance info
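Length/distance information can be used by sequencing both ends of inserts of known size and checking that each pair lands the expected distance apart in the assembly; pairs that disagree point to a misassembly, often a collapsed repeat. This sketch assumes hypothetical `(pair_id, library, forward_pos, reverse_pos)` tuples on one contig, with positions at the outer ends of the two reads, and a hypothetical 20% tolerance. Using several libraries mirrors "multiple fragment sizes in 1 project".

```python
def discordant_pairs(pairs, libraries, tolerance_fraction=0.2):
    """Flag end pairs whose spacing disagrees with their library's insert size.

    pairs: iterable of (pair_id, library, forward_pos, reverse_pos)
    libraries: library name -> expected insert size in bp
    """
    flagged = []
    for pair_id, library, fwd, rev in pairs:
        expected = libraries[library]
        observed = rev - fwd
        if abs(observed - expected) > tolerance_fraction * expected:
            flagged.append((pair_id, library, observed))
    return flagged


libraries = {"2kb": 2000, "10kb": 10000}  # hypothetical library names and sizes
pairs = [("p1", "2kb", 100, 2150), ("p2", "10kb", 500, 10400), ("p3", "2kb", 3000, 3600)]
print(discordant_pairs(pairs, libraries))  # [('p3', '2kb', 600)]
```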
Project Management
- Editing and assembly: RepeatMasker, Phred/Phrap, Consed
- Databases: ACeDB (A C. elegans DataBase), Oracle
Annotation
- ORFs: GRAIL, PowerBLAST
- Repeats
- Other regions
- Submit to GenBank: HTGS (phases 1, 2, 3) ... nr
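ORF annotation can start from a naive scan for ATG-to-stop stretches in each reading frame. GRAIL and PowerBLAST do far more than this; the sketch below only illustrates the basic idea. It scans the three forward frames (run it on the reverse complement for the other three), and the `min_codons` cutoff, which counts the stop codon, is a hypothetical parameter.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100):
    """Return (frame, start, end) for ATG...stop ORFs; end is exclusive."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # look for the next ATG in this frame
    return orfs


print(find_orfs("CCATGAAATTTTAGCC", min_codons=1))  # [(2, 2, 14)]
```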
Sequencing by Hybridization
- Hybridize labeled query DNA to a CHIP of OLIGOS (20-mers)
- Each site is probed by four oligos that differ only at that position (A, C, G, T):
  site 1: ...gaactAatact... ...gaactCatact... ...gaactGatact... ...gaactTatact...
  site 2: ...gaactaAtact... ...gaactaCtact... ...gaactaGtact... ...gaactaTtact...
- Query: GAACTATGTACT
Modern Sequencing Challenges
- Heterozygous DNAs: germline differences, somatic variation
- Massive sequencing: population studies, genome scans
- Minimal sample preparation: "Doctor's Office"
- Chips, quantitative sequencing
- Automation, miniaturization
Physical Mapping
- Genome characterization
- Genome fragmentation and cloning: vectors, etc.
- Physical map assembly: hybridization, fingerprinting