Kelley Bullard, Henry Dewhurst, Kizee Etienne, Esha Jain, VivekSagar KR, Benjamin Metcalf, Raghav Sharma, Charles Wigington, Juliette Zerick Genome Assembly.

1 Kelley Bullard, Henry Dewhurst, Kizee Etienne, Esha Jain, VivekSagar KR, Benjamin Metcalf, Raghav Sharma, Charles Wigington, Juliette Zerick Genome Assembly

2 Outline
- Stakeholders
- Biology
- NGS Review
- Introduction to Genome Assembly
- Challenges
- Analysis pipeline/strategy
- Tool selection
- Summary (final pipeline)

3 Stakeholders
- CDC (Centers for Disease Control and Prevention)
- GaTech
- Immunocompromised individuals
- Consumers of seafood
- Prediction group (and subsequent groups)
Stakeholders / Biology / NGS Review / Introduction to Genome Assembly / Challenges /Analysis Pipeline-Strategy / Tool Selection / Summary

4 Biology
[Image of V. vulnificus]

5 Vibrio vulnificus
- Gram-negative
  o Lipopolysaccharide membrane
- Motile, facultative anaerobe
- Halophilic (salt-loving) organism abundant in estuarine ecosystems
- Major cause of seafood-related deaths

6 Vibrio vulnificus – genome architecture
- Bacterial genomes are coding-dense
  o Introns rare
- Contains plasmids (pYJ016)
- V. vulnificus ~5.2 Mbp genome (similar to E. coli, ~50%)
  o GC content: 45-47%

7 Vibrio navarrensis
- Gram-negative
- Lipopolysaccharide membrane
- Motile, facultative anaerobe
- Moderately halophilic organism
- Some strains do not grow well in moderate to high salt concentrations

8 Vibrio navarrensis - genomic architecture

9 NGS - Review

10 Roche 454 sequencing workflow overview
- Sample input: genomic DNA, BACs, amplicons, cDNA
- Generation of small DNA fragments via shearing
- Ligation of A/B adaptors flanking single-stranded DNA fragments
- Emulsification of beads and fragments in water-in-oil microreactors
- Clonal amplification of fragments bound to beads in microreactors
- Sequencing and base calling
One fragment, one bead, one read
400,000 reads per run

11 GS FLX data analysis – flowgram generation
[Figure: flowgram for the sequence TTCTGCGAA; flow order T, A, C, G; signal levels labelled 1-mer, 2-mer, 3-mer, 4-mer]
Example of homopolymer errors from 454 sequencing data
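As a rough illustration of how a flowgram becomes bases, the sketch below rounds each flow signal to a homopolymer length. The intensity values are hypothetical, chosen only so that they spell the slide's example sequence; `call_bases` is an illustrative name, not part of any 454 software.

```python
FLOW_ORDER = "TACG"  # nucleotide flow cycle, as labelled on the slide


def call_bases(signals, flow_order=FLOW_ORDER):
    """Turn flow signal intensities into a base sequence.

    Each signal is rounded to the nearest integer, and that integer is
    taken as the homopolymer length for the nucleotide flowed at that step.
    """
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        run_length = int(signal + 0.5)  # round half up; signals are non-negative
        bases.append(nucleotide * run_length)
    return "".join(bases)


# Hypothetical intensities that spell the example sequence.
clean = [2.1, 0.1, 0.9, 0.05, 1.1, 0.0, 0.1, 1.0, 0.1, 0.0, 0.95, 1.05, 0.1, 1.9]
print(call_bases(clean))  # TTCTGCGAA

# A noisier first flow (2.1 -> 2.6) is called as a 3-mer instead of a 2-mer.
noisy = [2.6] + clean[1:]
print(call_bases(noisy))  # TTTCTGCGAA
```

The second call shows why homopolymer runs are the characteristic 454 error: a modest shift in one signal changes the called run length, producing an insertion or deletion rather than a substitution.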

12 Example of 454 sff file (text format)

13 Illumina sequencing overview
[Figure labels: 0.1 - 1.0 μg; user or core facility; cBot; GAIIx]

14 Example of Illumina *.fastq file
@C3PO_0001:2:1:17:1499#0/1
TGAATTCATTGACCATAACAATCATATGCATGATGCAAATTATAATATCATTTTTAGTGACGTCGTGAATCGTTT
+C3PO_0001:2:1:17:1499#0/1
abaaaaaaaaaaa`a`aa_aaaaaaaaaaaaaaaa_a aaa`aaaaa^aaaaa`a]^`a YZYZ^`NJDJ\_Z
@C3PO_0001:2:1:17:1291#0/1
TGTTTGAGCAAATGATTCATAATAATGTATTTCAATATTTTTAGGAATATCTCCCAATATTGCGCGTGCTGAATT
+C3PO_0001:2:1:17:1291#0/1
a`_`_\a_aaaa_a^Z^^a[a^aa]a_^_a_``aa `aa`X^X^^`aa_\_]VR`\a_]W\_`_a]a]][\RZV
@C3PO_0001:2:2:1452:1316#0/1
GTCCATCCGCAGCAGCGAATTTTTGACGTCCCCCCCCGAANGGANGNGANNNNGNNGNNNTNTNNAAANGNNNNN
+C3PO_0001:2:2:1452:1316#0/1
_U a\ `]_`ZP\\_Z^[]aa^a_]XNBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
…
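The four-line record layout shown in the example (header, sequence, separator, quality string) can be read with a few lines of code. The sketch below is a minimal reader, not a full FASTQ parser. It assumes one sequence line per record and a Phred+64 quality offset, which is an assumption inferred from quality characters such as `a` and `B` in the example. The record used in the demo is hypothetical.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@"):
            raise ValueError(f"bad header: {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"length mismatch in {header!r}")
        yield header[1:], sequence, quality


def phred_scores(quality, offset=64):
    """Decode a quality string; offset=64 is assumed for this older Illumina format."""
    return [ord(ch) - offset for ch in quality]


# Hypothetical record (not taken from the slide).
example = io.StringIO("@read1\nGATTN\n+read1\naa`_B\n")
for name, sequence, quality in read_fastq(example):
    print(name, sequence, phred_scores(quality))  # read1 GATTN [33, 33, 32, 31, 2]
```

Under this encoding a run of `B` characters decodes to a score of 2, which matches the low-quality, N-rich tail of the third read in the example.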

15 Genome Assembly

16 Input reads
V. navarrensis | V. vulnificus
2423-01 | 2009V-1368
08-2462 | 06-2432
2541-90 | 08-2435
2756-81 | 08-2439
07-2444

17 Introduction to genome assembly
- An assembly is a hierarchical data structure that maps the sequence data to a putative reconstruction of the target.
- In addition to contigs, a set of unassembled or partially assembled reads is also given as an output.
Hierarchy:
- Reads
- Contigs: multiple sequence alignment of reads plus the consensus sequence
- Scaffolds: define the contig order and orientation
- Output (FASTA)

18 How do we check the quality of our assembly? METRICS!
- N50
- Minimum/maximum contig length
- No. of contigs
- No. of errors
- FRC (feature response curve)
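The size-based metrics in this list are simple to compute from a list of contig lengths. The sketch below uses the common definition of N50 (the length of the contig at which the running total, taken from the longest contig down, first reaches half of the assembled bases). The function names and the example lengths are illustrative only.

```python
def n50(lengths):
    """Return the contig length at which half of the assembled bases are reached."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")


def size_metrics(lengths):
    """Collect the size-only metrics named on the slide."""
    return {
        "contigs": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "N50": n50(lengths),
    }


# Hypothetical contig lengths: total 300, and 80 + 70 = 150 reaches half.
print(size_metrics([80, 70, 50, 40, 30, 20, 10]))
# {'contigs': 7, 'min': 10, 'max': 80, 'N50': 70}
```

None of these numbers says anything about correctness, which is why error counts and the FRC appear in the same list.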

19 Feature-by-Feature – evaluating de-novo assembly
- BREAKPOINT: Points in the assembly where leftover reads partially align
- COMPRESSION: Area representing a possible repeat collapse
- STRETCH: Area representing a possible repeat expansion
- LOW_GOOD_CVG: Area composed of paired reads at the right distance and with the right orientation but at low coverage
- HIGH_NORMAL_CVG: Area composed of normal oriented reads but at high coverage
- HIGH_LINKING_CVG: Area composed of reads with associated mates in another scaffold
- HIGH_SPANNING_CVG: Area composed of reads with associated mates in another contig
- HIGH_OUTIE_CVG: Area composed of incorrectly oriented mates (e.g. --> -->)
- HIGH_SINGLEMATE_CVG: Area composed of single reads (mate not present anywhere)
- HIGH_READ_COVERAGE: Region in assembly with unexpectedly high local read coverage
- HIGH_SNP: SNP with high coverage
- KMER_COV: Problematic k-mer distribution

20 Feature-by-Feature – evaluating de-novo assembly
Most traditional metrics used to evaluate assemblies (N50, mean contig size, etc.) emphasize only size and say almost nothing about how correct the assemblies are. A typical correctness check, especially in the NGS context, consists of aligning contigs back to an available reference. However, this naive technique simply counts the number of misassemblies without attempting to distinguish or categorize them any further.
After running amosvalidate, each contig is assigned the number of features that correspond to doubtful sequences in the assembly. For a fixed feature threshold w, the contigs are sorted by size and, starting from the longest, only those contigs are tallied whose sum of features is ≤ w. For this set of contigs, the corresponding approximate genome coverage is computed, leading to a single point of the Feature-Response curve (FRC).
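The FRC construction described above can be sketched as follows. This is one reading of the description, not the published implementation: contigs are taken from longest to shortest, and accumulation stops as soon as the running feature total would exceed the threshold w. The per-contig feature counts would come from a tool such as amosvalidate; here they are hypothetical numbers.

```python
def feature_response_curve(contigs, genome_size, thresholds):
    """Compute (threshold, approximate coverage) points of an FRC.

    contigs: list of (length, feature_count) pairs.
    """
    ordered = sorted(contigs, key=lambda c: c[0], reverse=True)
    curve = []
    for w in thresholds:
        features = 0
        covered = 0
        for length, count in ordered:
            if features + count > w:
                break  # adding this contig would exceed the feature budget
            features += count
            covered += length
        curve.append((w, covered / genome_size))
    return curve


# Hypothetical assembly: (contig length, number of suspicious features).
contigs = [(400, 2), (300, 0), (200, 5), (100, 1)]
for w, coverage in feature_response_curve(contigs, 1000, [0, 2, 3, 8]):
    print(w, coverage)
```

With these numbers the curve stays at 0 until the threshold admits the longest contig, then rises to 0.7 and finally to 1.0. An assembler whose curve rises faster covers more of the genome for the same number of doubtful features.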

21 Assembly Challenges

22 Challenges
- Intrinsic
  o Genome architecture
  o Repeats
  o Homopolymer runs
  o Sequence complexity
  o Chimeras?
  o Contaminants
- Technical
  o Short reads
  o Poisson distribution of coverage
  o Sequencing errors
  o Variable quality
  o Sequence tags
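The "Poisson distribution of coverage" item can be made concrete with the standard shotgun-sequencing approximation: if reads land uniformly at random, per-base depth is roughly Poisson with mean c = N × L / G, so a fraction e^(-c) of bases is expected to receive no read at all. In the sketch below the 400 bp read length is an assumption for illustration; the read count and genome size are the figures quoted earlier in the deck.

```python
import math


def mean_coverage(num_reads, read_length, genome_size):
    """Average depth c = N * L / G."""
    return num_reads * read_length / genome_size


def poisson_pmf(k, mean):
    """Probability that a base is covered by exactly k reads."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def fraction_uncovered(mean):
    """Expected fraction of bases with zero coverage, e^(-c)."""
    return poisson_pmf(0, mean)


# 400,000 reads per run and a ~5.2 Mbp genome; read length is assumed.
c = mean_coverage(400_000, 400, 5_200_000)
print(f"mean coverage: {c:.1f}x")
print(f"expected uncovered fraction at 5x: {fraction_uncovered(5):.4f}")
```

Even at high average depth the distribution has tails, so some regions are covered thinly by chance alone; real data usually deviate further because of GC bias and repeats.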

23 [Flowchart: analysis pipeline/strategy. Labels recoverable from the figure:]
- Inputs: 454 raw reads; Illumina raw reads; published genomes from public databases (V. vulnificus YJ016, V. vulnificus CMCP6, V. vulnificus MO6-24/O)
- Pre-processing and NGS QC: Fastqc, Prinseq; statistical analysis; read stats
- Reference selection: align Illumina against the reference (bwa); compare mapping statistics (samstats); reference genome
- Illumina / 454 / hybrid de novo assembly
  o Hybrid de novo: Ray, MIRA
  o 454 de novo: Newbler, CABOG, SUTTA
  o Illumina de novo: Allpaths LG, SOAP DeNovo, Velvet, Abyss, Taipan, Bambus2, SUTTA
  o Outputs: contigs * 3; unmapped reads
- Align Illumina reads against 454 contigs: Mac vector, CLC wb; contigs; unmapped reads
- Illumina/(454?) reference based assembly: AMOScmp; contigs; unmapped reads
- Evaluation: GAGE, Hawk-eye
- Reference evaluation: DNA Diff
- Parameter optimization
- Contig merging (all possible combinations of the best 3): Minimus, MAIA, PAGIT, Mauve
- Scaffolds: GAGE
- Genome finishing: gap filling; nucleotide identity (DNA Diff); GRASS
- Draft/finished genome
- Legend: built-in process; 454; Illumina; hybrid; info.; chosen ref.; assemblers

24 Tool Selection - Assembly Algorithm profile

25 Greedy / Seed-and-extension / Graph based / Branch-and-Bound
- Basic operation: given any read or contig, add one more read or contig until no more reads or contigs are available
- The contigs grow by "greedy extension", always incorporating the read that is found with the highest scoring overlap
- Makes a locally optimal choice in the hope of finding a globally optimal one
- No foresight -> misassembly
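A toy version of the greedy strategy described above: repeatedly merge the pair of sequences with the longest exact suffix-prefix overlap until no overlap of at least `min_len` remains. This is a sketch under simplifying assumptions (error-free reads, one strand, exact matches); real greedy assemblers score inexact overlaps. All names and the example reads are illustrative.

```python
def overlap(a, b, min_len=1):
    """Length of the longest suffix of a that equals a prefix of b."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair until no pair overlaps by min_len or more."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            return contigs
        merged = contigs[i] + contigs[j][size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


# Hypothetical reads sampled from ATGGCGTGCA.
print(greedy_assemble(["ATGGCG", "GCGTGC", "GTGCA"]))  # ['ATGGCGTGCA']
```

Because each merge is chosen only by overlap length, two reads that share a repeat can be joined even when they come from different loci, which is the "no foresight" failure mode named in the last bullet.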

26 Greedy / Seed-and-extension / Graph based / Branch-and-Bound
[Figure: assembly of overlapping five-word reads]
Reads (in slide order): It was the best of | of times, it was the | best of times, it was | times, it was the worst | was the best of times, | the best of times, it | of times, it was the | times, it was the age | It was the best of | of times, it was the | times, it was the worst | was the best of times, | the best of times, it | was the worst of times, | best of times, it was | it was the age of | it was the worst of | of times, it was the | times, it was the age | was the age of wisdom, | the age of wisdom, it | age of wisdom, it was | of wisdom, it was the | it was the age of | was the age of foolishness, | the worst of times, it
Result: It was the best of times, it was the [worst/age]

27 Greedy / Seed-and-Extension / Graph based / Branch-and-Bound
- Variation of the greedy assembler
- Common in aligners, thus some assemblers/aligners may incorporate this approach
- Particularly designed for short reads based on a contig heuristic scheme
- Prefix-tree data structure
- A contig is elongated at either end contingent upon the existence of reads with a prefix of minimal length perfectly matching the end of the contig
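The elongation rule in the last bullet can be sketched for one end of a contig. This toy version scans the reads linearly instead of using the prefix tree that the slide mentions, extends only the 3' end, and stops when no read supports an extension or when the supporting reads disagree on the next base. The function name and the reads are hypothetical.

```python
def extend_right(seed, reads, min_overlap=4):
    """Extend seed to the right using reads whose prefix exactly matches its end."""
    contig = seed
    unused = set(reads) - {seed}
    while True:
        candidates = []
        for read in unused:
            # Longest proper prefix of the read that matches the contig end.
            for k in range(len(read) - 1, min_overlap - 1, -1):
                if contig.endswith(read[:k]):
                    candidates.append((k, read))
                    break
        next_bases = {read[k] for k, read in candidates}
        if len(next_bases) != 1:
            return contig  # no supporting read, or reads disagree (e.g. a repeat)
        k, read = max(candidates)  # prefer the longest overlap
        contig += read[k:]
        unused.discard(read)


# Hypothetical reads sampled from ATGGCGTGCA.
print(extend_right("ATGGCG", ["ATGGCG", "GCGTGC", "GTGCA"], min_overlap=3))
# ATGGCGTGCA

# Two reads share the prefix CCC but continue differently, so extension stops.
print(extend_right("AAACCC", ["CCCGGT", "CCCTTA"], min_overlap=3))
# AAACCC
```

Stopping at a disagreement is what distinguishes this approach from plain greedy merging: the contig ends at the ambiguity instead of guessing.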

28 Greedy / Seed-and-extension / Graph based / Branch-and-Bound
Overlap-layout-consensus (OLC): pairwise consensus

29 Hamiltonian Approach
Greedy / Seed-and-extension / Graph based / Branch-and-Bound
Find an assembled sequence that explains the observed sequence = finding a path through a graph that visits every vertex once
[Figure labels: Repeat, Repeat]

30 Greedy / Seed-and-extension / Graph based / Branch-and-Bound
- Basic operation: k-mer approach
- Eulerian approach (de Bruijn graph)
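A minimal sketch of the k-mer/Eulerian idea: every distinct k-mer in the reads becomes an edge between its (k-1)-mer prefix and suffix, and an assembly corresponds to a path that uses every edge once. The code assumes error-free reads from one strand and a graph that actually has an Eulerian path; production de Bruijn assemblers add error correction, coverage tracking, and graph simplification. Names and reads are illustrative.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a graph whose edges are the distinct k-mers found in the reads."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm: return a node path that uses every edge once."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for target in targets:
            in_deg[target] += 1
    nodes = set(graph) | set(in_deg)
    # Start where out-degree exceeds in-degree by one, if such a node exists.
    start = next(
        (n for n in sorted(nodes) if len(graph.get(n, [])) - in_deg[n] == 1),
        min(nodes),
    )
    remaining = {node: list(targets) for node, targets in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def spell(path):
    """Concatenate a node path back into a sequence."""
    return path[0] + "".join(node[-1] for node in path[1:])


# Hypothetical reads sampled from ATGGCGTGCA, with k = 4.
graph = de_bruijn(["ATGGCG", "GCGTGC", "GTGCA"], 4)
print(spell(eulerian_path(graph)))  # ATGGCGTGCA
```

Note that no pairwise read overlaps are computed: reads only contribute k-mers, which is why this approach scales to high-coverage short reads but needs memory proportional to the number of distinct k-mers.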

31 Greedy / Seed-and-extension / Graph based / Branch-and-Bound
- Basic operation: relies on "consistent layouts"; it generates all possible consistent layouts, organizing them as paths in a "double tree" structure rooted at a randomly selected seed read
- Progressive evaluation of optimality criteria encoded by a set of score functions based on the set of overlaps along the layout

32 Tid-bits of advice
Algorithm | Advantages | Disadvantages
Greedy | Guaranteed to find a solution | Misassembly; easily confused by complex repeats
Seed-and-Extension | Sensitivity | Can be very slow; memory usage
OLC | Suitable for low coverage long reads | Computation of overlaps is time intensive
De Bruijn | Repeats are immediately recognized; suitable for high coverage short reads | RAM intensive
Branch-and-Bound | Algorithm allows for checks | Ambiguities delay pruning

33 Tools of Choice

34 454 platform assembly
Name | Algorithm | Supporting evidence
Newbler 2.5 | OLC | Comparative analysis of algorithms for whole-genome assembly of pyrosequencing data
CABOG | OLC | Comparative analysis of algorithms for whole-genome assembly of pyrosequencing data
SUTTA | Branch-and-Bound | Feature-by-Feature – Evaluating De Novo Sequence Assembly

35 Evaluation of 454 assemblers
- Genomes used for comparison
Source: Comparative analysis of algorithms for whole-genome assembly of pyrosequencing data. Brief Bioinform (2012) 13(3): 269-280

36 Comparison of 454 assemblers using E. coli genome
Source: Comparative analysis of algorithms for whole-genome assembly of pyrosequencing data. Brief Bioinform (2012) 13(3): 269-280

37 Comparison of 454 assemblers using E. coli genome
- The maximum value reached by the bars is the hypothetical reconstruction (HR), defined as the ratio between the assembled bases and the reference length
- The white section represents the real reconstruction (RR), i.e. the portion of the genome correctly reconstructed by the assemblers
- The difference between HR and RR, here called erroneous reconstruction (ER), is shown in black
Source: Comparative analysis of algorithms for whole-genome assembly of pyrosequencing data. Brief Bioinform (2012) 13(3): 269-280
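The three bar components defined above reduce to simple ratios. The sketch below assumes RR is expressed, like HR, as a fraction of the reference length (correctly reconstructed bases divided by reference length); that normalization is an assumption, and the input numbers are hypothetical.

```python
def reconstruction_ratios(assembled_bases, correct_bases, reference_length):
    """Return hypothetical (HR), real (RR) and erroneous (ER) reconstruction."""
    hr = assembled_bases / reference_length
    rr = correct_bases / reference_length  # assumed normalization
    return {"HR": hr, "RR": rr, "ER": hr - rr}


# Hypothetical assembly of a 5.0 Mbp reference.
ratios = reconstruction_ratios(4_700_000, 4_500_000, 5_000_000)
for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```

An assembler with a tall bar but a large black section has assembled many bases that do not match the reference, so HR alone overstates its quality.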

38 Illumina platform assembly
Name | Algorithm | Supporting evidence
ALLPATHS-LG | de Bruijn | GAGE: A critical evaluation of genome assemblies and assembly algorithms
Velvet | de Bruijn | Comparative studies of de novo assembly tools for next-generation sequencing technologies
Taipan | Hybrid (greedy-based and graph) | A Practical Comparison of De Novo Genome Assembly Software Tools for Next-Generation Sequencing Technologies
SOAPdenovo | de Bruijn | Feature-by-Feature – Evaluating De Novo Sequence Assembly
SUTTA | Branch-and-Bound | Feature-by-Feature – Evaluating De Novo Sequence Assembly

39 Evaluation of Illumina assemblers
- Genomes used for comparison
Source: GAGE: A critical evaluation of genome assemblies and assembly algorithms. Steven L. Salzberg, Adam M. Phillippy, Aleksey Zimin, et al. Genome Res. 2012 22: 557-567

40 Comparison of Illumina assemblers
The best value for each column is shown in bold. For all assemblies, the Errors column contains the number of misjoins plus indel errors >5 bp for contigs, and the total number of misjoins for scaffolds. Corrected N50 values were computed after correcting contigs and scaffolds by breaking them at each error. See the evaluation section in the text for details on how errors were identified.
Source: GAGE: A critical evaluation of genome assemblies and assembly algorithms. Steven L. Salzberg, Adam M. Phillippy, Aleksey Zimin, et al. Genome Res. 2012 22: 557-567
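The "corrected N50" idea can be illustrated in a few lines: split each contig at every detected error position, then recompute N50 on the resulting pieces. This is a sketch of the principle only, not GAGE's evaluation code; how errors are detected is outside its scope, and the contigs and error positions below are hypothetical.

```python
def n50(lengths):
    """Contig length at which half of the total bases are reached."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")


def break_at_errors(contig_length, error_positions):
    """Split one contig into piece lengths at each internal error position."""
    pieces = []
    start = 0
    for pos in sorted(p for p in set(error_positions) if 0 < p < contig_length):
        pieces.append(pos - start)
        start = pos
    pieces.append(contig_length - start)
    return pieces


def corrected_n50(contigs):
    """contigs: list of (length, [error positions])."""
    pieces = []
    for length, errors in contigs:
        pieces.extend(break_at_errors(length, errors))
    return n50(pieces)


# Hypothetical assembly: the longest contig contains one misjoin at position 40.
contigs = [(100, [40]), (60, []), (40, [])]
print(n50([length for length, _ in contigs]))  # 100
print(corrected_n50(contigs))                  # 60
```

The drop from 100 to 60 shows how a single misjoin in a long contig can inflate the uncorrected N50.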

41 Comparison of Illumina assemblers
A "chaff" contig is defined as a single contig <200 bp in length. In many cases, these contigs can be as small as the k-mer size used to build the de Bruijn graph (e.g., 36 bp) and are too short to support any further genomic analysis. A duplicated repeat is one that appears in more copies than necessary in the assembly, and a compressed repeat is one that occurs in fewer copies.
Source: GAGE: A critical evaluation of genome assemblies and assembly algorithms. Steven L. Salzberg, Adam M. Phillippy, Aleksey Zimin, et al. Genome Res. 2012 22: 557-567

42 Comparison of Illumina assemblers
"Misjoin" errors are perhaps the most harmful type, in that they represent a significant structural error. A misjoin occurs when an assembler incorrectly joins two distant loci of the genome, which most often occurs within a repeat sequence. We have tallied three types of misjoins: (1) inversions, where part of a contig or scaffold is reversed with respect to the true genome; (2) relocations, or rearrangements that move a contig or scaffold within a chromosome; and (3) translocations, or rearrangements between chromosomes.
Source: GAGE: A critical evaluation of genome assemblies and assembly algorithms. Steven L. Salzberg, Adam M. Phillippy, Aleksey Zimin, et al. Genome Res. 2012 22: 557-567

43 Comparison of Illumina assemblers
Average contig (A) and scaffold (B) sizes, measured by N50 values, versus error rates, averaged over all three genomes for which the true assembly is known: S. aureus, R. sphaeroides, and human chromosome 14. Errors (vertical axis) are measured as the average distance between errors, in kilobases. In both plots, the best assemblers appear in the upper right.
Source: GAGE: A critical evaluation of genome assemblies and assembly algorithms. Steven L. Salzberg, Adam M. Phillippy, Aleksey Zimin, et al. Genome Res. 2012 22: 557-567

44 Applicability of assemblers
- Genomes used for comparison
Source: A Practical Comparison of De novo Genome Assembly Software Tools for Next-Generation Sequencing Technologies. Wenyu Zhang, et al. PLoS One. 2011 6: 1-12

45 Comparison of Illumina assemblers
Source: A Practical Comparison of De novo Genome Assembly Software Tools for Next-Generation Sequencing Technologies. Wenyu Zhang, et al. PLoS One. 2011 6: 1-12

46 Comparison of Illumina assemblers
Source: A Practical Comparison of De novo Genome Assembly Software Tools for Next-Generation Sequencing Technologies. Wenyu Zhang, et al. PLoS One. 2011 6: 1-12

47 Hybrid Platform Assembly
Name | Algorithm | Supporting evidence
Ray | SBH | Feature-by-Feature – Evaluating De Novo Sequence Assembly

48 Feature-by-Feature – evaluating de-novo assembly
- COMPRESSION: Area representing a possible repeat collapse
- LOW_GOOD_CVG: Area composed of paired reads at the right distance and with the right orientation but at low coverage
- HIGH_OUTIE_CVG: Area composed of incorrectly oriented mates (e.g. --> -->)
- HIGH_SINGLEMATE_CVG: Area composed of single reads (mate not present anywhere)
- HIGH_READ_COVERAGE: Region in assembly with unexpectedly high local read coverage
- KMER_COV: Problematic k-mer distribution

49 Feature-by-Feature – evaluating de-novo assembly
- Real Data - Long Reads

50 Feature-by-Feature – evaluating de-novo assembly
- Real Data - Short Reads

51 Final Approach

52 [Flowchart: final analysis pipeline. Labels recoverable from the figure:]
- Inputs: 454 raw reads; Illumina raw reads; published genomes from public databases (V. vulnificus YJ016, V. vulnificus CMCP6, V. vulnificus MO6-24/O)
- Pre-processing and NGS QC: Fastqc, Prinseq; statistical analysis; read stats
- Reference selection: align Illumina against the reference (bwa); compare mapping statistics (samstats); reference genome
- Illumina / 454 / hybrid de novo assembly
  o Hybrid de novo: Ray
  o 454 de novo: Newbler, CABOG, SUTTA
  o Illumina de novo: Allpaths LG, SOAP DeNovo, Velvet, Taipan, SUTTA
  o Outputs: contigs * 3; unmapped reads
- Align Illumina reads against 454 contigs: Mac vector, CLC wb; contigs; unmapped reads
- Illumina/(454?) reference based assembly: AMOScmp; contigs; unmapped reads
- Evaluation: GAGE, Hawk-eye
- Reference evaluation: DNA Diff, MUMmer
- Parameter optimization
- Contig merging (all possible combinations of the best 3): Minimus, MAIA, PAGIT, Mauve
- Scaffolds: GAGE
- Genome finishing: gap filling; nucleotide identity (MUMmer); GRASS
- Draft/finished genome
- Legend: built-in process; 454; Illumina; hybrid; info.; chosen ref.; assemblers

53 References
1. Finotello, F., et al., Comparative analysis of algorithms for whole-genome assembly of pyrosequencing data. Brief Bioinform, 2012. 13(3): p. 269-80.
2. Vezzi, F., G. Narzisi, and B. Mishra, Feature-by-feature – evaluating de novo sequence assembly. PLoS One, 2012. 7(2): p. e31002.
3. Zhang, W., et al., A practical comparison of de novo genome assembly software tools for next-generation sequencing technologies. PLoS One, 2011. 6(3): p. e17915.
4. Salzberg, S.L., et al., GAGE: A critical evaluation of genome assemblies and assembly algorithms. Genome Res, 2012. 22(3): p. 557-67.
5. Narzisi, G. and B. Mishra, Comparing de novo genome assembly: the long and short of it. PLoS One, 2011. 6(4): p. e19175.
6. Miller, J.R., S. Koren, and G. Sutton, Assembly algorithms for next-generation sequencing data. Genomics, 2010. 95(6): p. 315-27.
7. Li, Z., et al., Comparison of the two major classes of assembly algorithms: overlap-layout-consensus and de-bruijn-graph. Brief Funct Genomics, 2012. 11(1): p. 25-37.
8. Lin, Y., et al., Comparative studies of de novo assembly tools for next-generation sequencing technologies. Bioinformatics, 2011. 27(15): p. 2031-7.
9. Zhang, J., et al., The impact of next-generation sequencing on genomics. J Genet Genomics, 2011. 38(3): p. 95-109.

