FuzzyPath - A Hybrid De novo Assembler using Solexa and 454 Short Reads Zemin Ning The Wellcome Trust Sanger Institute.

Slides:



Advertisements
Similar presentations
FuzzyPath Assemblies - from Bacterial to Mammalian Genomes and Zebrafish Finishing Zemin Ning The Wellcome Trust Sanger Institute.
Advertisements

V Improvements to 3kb Long Insert Size Paired-End Library Preparation Naomi Park, Lesley Shirley, Michael Quail, Harold Swerdlow Wellcome Trust Sanger.
Large Plant Genome Assemblies using Phusion2 Zemin Ning The Wellcome Trust Sanger Institute.
WGS Assembly and Reads Clustering Zemin Ning Production Software Group Informatics Division.
RNA Assembly Using extending method. Wei Xueliang
DNA Sequencing Lecture 9, Tuesday April 29, 2003.
DNA Sequencing. The Walking Method 1.Build a very redundant library of BACs with sequenced clone- ends (cheap to build) 2.Sequence some “seed” clones.
CS273a Lecture 9, Aut08, Batzoglou CS273a Lecture 9, Fall 2008 Quality of assemblies—mouse N50 contig length Terminology: N50 contig length If we sort.
Sequencing Informatics Gabor T. Marth Department of Biology, Boston College BI420 – Introduction to Bioinformatics.
Expanding the Tool Kit for BAC Extension Summary of completion criteria developed for NSF Tomato Sequencing Workshop January 14, 2007.
Sequencing and Assembly Cont’d. CS273a Lecture 5, Win07, Batzoglou Steps to Assemble a Genome 1. Find overlapping reads 4. Derive consensus sequence..ACGATTACAATAGGTT..
Novel multi-platform next generation assembly methods for mammalian genomes The Baylor College of Medicine, Australian Government and University of Connecticut.
Sequencing and Assembly Cont’d. CS273a Lecture 5, Aut08, Batzoglou Steps to Assemble a Genome 1. Find overlapping reads 4. Derive consensus sequence..ACGATTACAATAGGTT..
CS273a Lecture 4, Autumn 08, Batzoglou Hierarchical Sequencing.
Genome sequencing and assembling
Genome sequencing. Vocabulary Bac: Bacterial Artificial Chromosome: cloning vector for yeast Pac, cosmid, fosmid, plasmid: cloning vectors for E. coli.
Genome Assembly Bonnie Hurwitz Graduate student TMPL.
De-novo Assembly Day 4.
Todd J. Treangen, Steven L. Salzberg
CUGI Pilot Sequencing/Assembly Projects Christopher Saski.
PE-Assembler: De novo assembler using short paired-end reads Pramila Nuwantha Ariyaratne.
Developing Bioinformatics Tools for Genome Analysis Zemin Ning The Wellcome Trust Sanger Institute.
Tomato Chromosome 4: A Mapping & Sequencing Update 28 th September 2005 Christine Nicholson Mapping Core Group Welcome Trust Sanger Institute, UK.
1 Velvet: Algorithms for De Novo Short Assembly Using De Bruijn Graphs March 12, 2008 Daniel R. Zerbino and Ewan Birney Presenter: Seunghak Lee.
Meraculous: De Novo Genome Assembly with Short Paired-End Reads
Sequence assembly using paired- end short tags Pramila Ariyaratne Genome Institute of Singapore SOC-FOS-SICS Joint Workshop on Computational Analysis of.
Steps in a genome sequencing project Funding and sequencing strategy source of funding identified / community drive development of sequencing strategy.
P. Tang ( 鄧致剛 ); RRC. Gan ( 甘瑞麒 ); PJ Huang ( 黄栢榕 ) Bioinformatics Center, Chang Gung University. Genome Sequencing Genome Resequencing De novo Genome.
Fuzzypath – Algorithms, Applications and Future Developments
The Changing Face of Sequencing
Solanum lycopersicum Chromosome 4 Sequencing Update UK-SOL– Dec 2008 Wellcome Trust Medical Photographic Library.
FuzzyPath Assemblies - from Mixed Solexa/454 Datasets to Extremely GC Biased Genomes Zemin Ning The Wellcome Trust Sanger Institute.
Problems of Genome Assembly James Yorke and Aleksey Zimin University of Maryland, College Park 1.
Finishing tomato chromosomes #6 and #12 using a Next Generation whole genome shotgun approach Roeland van Ham, CBSG, NL René Klein Lankhorst, EUSOL Giovanni.
Cancer Genome Assemblies and Variations between Normal and Tumour Human Cells Zemin Ning The Wellcome Trust Sanger Institute.
Chromosome 2 Doil Choi, Sunghwan Jo KOREA. Cytological architecture of chromosome kb/µm DAPI (4’-6-diamidino-2-phenylindole) stained pachytene chromosome.
Assembly of Paired-end Solexa Reads by Kmer Extension using Base Qualities Zemin Ning The Wellcome Trust Sanger Institute.
RNA Sequence Assembly WEI Xueliang. Overview Sequence Assembly Current Method My Method RNA Assembly To Do.
1.Data production 2.General outline of assembly strategy.
Genome De Novo Assemblies and Applications in NGS Sequencing Zemin Ning The Wellcome Trust Sanger Institute.
billion-piece genome puzzle
University of Connecticut School of Engineering Assembler Reference Abyss Simpson et al., J. T., Wong, K., Jackman, S. D., Schein, J. E., Jones,
The Genome Assemblies of Tasmanian Devil Zemin Ning The Wellcome Trust Sanger Institute.
The Wellcome Trust Sanger Institute
13 th January 2008 Plant & Animal Genome Conference Progress with Sequencing Tomato Chromosome 4 Clare Riddle Tomato Project Group Wellcome Trust Sanger.
16 th April 2007 Christine Nicholson, Mapping Core Group Wellcome Trust Sanger Institute Tomato Chromosome 4 Mapping & Use of FPC Copyright Wellcome Trust.
Chapter 5 Sequence Assembly: Assembling the Human Genome.
Cross_genome: Assembly Scaffolding using Cross-species Synteny Zemin Ning High Performance Assembly.
Meet the ants Camponotus floridanus Carpenter ant Harpegnathos saltator Jumping ant Solenopsis invicta Red imported fire ant Pogonomyrmex barbatus Harvester.
ALLPATHS: De Novo Assembly of Whole-Genome Shotgun Microreads
Genome Research 12:1 (2002), Assembly algorithm outline ● Input and trimming ● Overlap detection ● Error correction ● Evaluation of alignments.
Phusion2 Assemblies and Indel Confirmation Zemin Ning The Wellcome Trust Sanger Institute.
Variation Detections and De novo Assemblies from Next-gen Data Zemin Ning The Wellcome Trust Sanger Institute.
RNA Sequencing and transcriptome reconstruction Manfred G. Grabherr.
Sequence Alignment and Genome Assembly Zemin Ning The Wellcome Trust Sanger Institute.
Virginia Commonwealth University
Phusion2 and The Genome Assembly of Tasmanian Devil
A B PI: 2646 PI: K-mer depth Figure S1. Estimation of Sogatella furcifera genome size based on flow cytometry and 17-mer statistics. (A) Estimation.
Cross_genome: Assembly Scaffolding using Cross-species Synteny
CAP5510 – Bioinformatics Sequence Assembly
Denovo genome assembly of Moniliophthora roreri
Professors: Dr. Gribskov and Dr. Weil
A Hybrid Assembly System in Zebrafish Pooled Clones
Ssaha_pileup - a SNP/indel detection pipeline from new sequencing data
Very important to know the difference between the trees!
Do You Want to Build a Transcriptome?
Plant & Animal Genome Conference
Next-generation DNA sequencing
Introduction to Sequencing
AMOS Assembly Validation and Visualization
Presentation transcript:

FuzzyPath - A Hybrid De novo Assembler using Solexa and 454 Short Reads Zemin Ning The Wellcome Trust Sanger Institute

Outline of the Talk:  Assembly strategy  Read extension using base qualities and read pairs  Repeat junctions and single base variation  Fuzzy kmers – how to find mismatches  Assemblies with mixed Solexa and 454 reads  Solexa reads guided by a closely related reference  Long Solexa reads with 70 bps  Future Work

Assembly Strategy Selexa reads assembler to extend long reads of 1-2Kb Genome/Chromosome Capillary reads assembler Phrap/Phusion forward-reverse paired reads bp known dist ~500 bp bp

Kmer Extension & Repeat Junctions

Quality Filters on Junctions

Repetitive Contig and Read Pairs Depth For each hit read in the contig, contig index and offset are stored. Insert length Current read position Contig start Pair read position Depth

Handling of Single Base Variations

ACGTAACTAACAGTT ACGTAACTCACAGTT ACGTAACT ACAGTT Number of Mismatches between Two Kmers

Use of Kmers with Mismatches

Mixed Solexa and 454 Reads L = ~250 bp L-K+1 kmers L-N-K+1 kmers Pileup of 454 reads at a repeat junction

Pileup of Solexa and 454 Reads

Guided by A Closely Related Reference L = 3000 bp L-K+1 kmers L-N-K+1 kmers Pileup of shredded reads at a repeat junction

Pileup of Solexa and Shredded Reads

Long Solexa Reads with 70 bp L = 70 bp L-K+1 kmers Pileup of long Solexa reads at a repeat junction

Pileup of Long 70 bp Solexa Reads

Solexa reads : Number of reads: 3,084,185; Finished genome size: 2,007,491 bp; Read length:39 and 36 bp; Estimated read coverage: ~55X; Number of 454 reads:100,000; Read coverage of 454:10X; Assembly features: - contig stats Total number of contigs: 73; Total bases of contigs: 1,999,817 bp N50 contig size: 62,508; Largest contig:162,190 Averaged contig size: 27,394; Contig coverage over the genome: ~99 %; Contig extension errors: 2 Mis-assembly errors:3 S.Suis P1/7 Solexa/454 Assembly

Shredded reads : Number of reads: 1,338,161; Finished genome size: 2,007,491 bp; Read length:36; Estimated read coverage: 24X; Insert size:500 bp; Assembly features: Paired _Data Not_Paired Number of contigs: Total assembled bases: Mb1.956 Mb N50 contig size: 243,03913,929 Largest contig: 474,070 33,460 Averaged contig size: 57,0436,168 Contig coverage: >99.0 %>99.0 % Contig extension errors: 0 0 Mis-assembly errors: 32 S.Suis P1/7 with Shredded Pair-end Reads

Solexa reads : Number of reads: 6,346,317; Finished genome size: 4.7 Mbp; Read length:33 bp; Estimated read coverage: ~40 X; Shredded reference of SpA: 10X; Assembly features: - contig stats Total number of contigs: 66; Total bases of contigs: 4,615,704 bp N50 contig size: 168,793; Largest contig:401,700 Averaged contig size: 69,934; Contig coverage over the genome: ~98 %; Contig extension errors: 0 Mis-assembly errors:2 Salmonella delhi5 Solexa Assembly Guided by A Close Reference

Shredded reads : Number of reads: 1,338,161; Finished genome size: 2,007,491 bp; Read length:36; Estimated read coverage: 24X; Insert size:500 bp; Assembly features: Paired _Data Not_Paired Number of contigs: Total assembled bases: Mb1.956 Mb N50 contig size: 243,03913,929 Largest contig: 474,070 33,460 Averaged contig size: 57,0436,168 Contig coverage: >99.0 %>99.0 % Contig extension errors: 0 0 Mis-assembly errors: 32 S Suis P1/7 Shredded Read Assembly

Acknowledgements:  Yong Gu  Ben Blackburne  Hannes Ponstingl  Harold Swerdlow  Michael Quail  Tony Cox  Richard Durbin