Published by Shannon Wheeler. Modified over 9 years ago.
1
FuzzyPath - A Hybrid De novo Assembler using Solexa and 454 Short Reads
Zemin Ning, The Wellcome Trust Sanger Institute
2
Outline of the Talk:
  Assembly strategy
  Read extension using base qualities and read pairs
  Repeat junctions and single base variation
  Fuzzy kmers – how to find mismatches
  Assemblies with mixed Solexa and 454 reads
  Solexa reads guided by a closely related reference
  Long Solexa reads with 70 bp
  Future work
3
Assembly Strategy
  Genome/Chromosome
  Forward-reverse paired reads: 30-70 bp each, known distance ~500 bp
  Solexa reads assembler: to extend reads into long reads of 1-2 Kb
  Capillary reads assembler: Phrap/Phusion
4
Kmer Extension & Repeat Junctions
5
Quality Filters on Junctions
6
Repetitive Contig and Read Pairs
For each hit read in the contig, contig index and offset are stored.
(Figure labels: Depth, Insert length, Current read position, Contig start, Pair read position)
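The stored contig index and offset make it possible to ask whether a read and its mate sit at a plausible distance from each other. The sketch below is only an illustration of that idea, not FuzzyPath's own logic: the names `ReadHit` and `pair_is_consistent` and the `tolerance` value are hypothetical, the 500 bp default is taken from the "known dist ~500 bp" on the strategy slide, and read length and orientation are ignored for simplicity.

```python
from dataclasses import dataclass


@dataclass
class ReadHit:
    """Placement of one read, as described on the slide (hypothetical layout)."""
    contig_index: int  # which contig the read was placed in
    offset: int        # start position of the read within that contig


def pair_is_consistent(read: ReadHit, mate: ReadHit,
                       insert_length: int = 500, tolerance: int = 100) -> bool:
    """Return True if both reads lie in the same contig and their separation
    is within `tolerance` of the expected insert length (assumed values)."""
    if read.contig_index != mate.contig_index:
        return False
    separation = abs(mate.offset - read.offset)
    return abs(separation - insert_length) <= tolerance


print(pair_is_consistent(ReadHit(3, 1000), ReadHit(3, 1480)))  # True
print(pair_is_consistent(ReadHit(3, 1000), ReadHit(3, 4000)))  # False
```

A mate that lands far outside the expected window, or in another contig, is the kind of signal that could flag a repetitive region.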
7
Handling of Single Base Variations
8
Number of Mismatches between Two Kmers
  Kmer 1:  ACGTAACTAACAGTT   00 01 10 11 00 00 01 11 00 00 01 00 10 11 11
  Kmer 2:  ACGTAACTCACAGTT   00 01 10 11 00 00 01 11 01 00 01 00 10 11 11
  XOR:     ACGTAACT ACAGTT   00 00 00 00 00 00 00 00 01 00 00 00 00 00 00
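The slide's example can be reproduced with a few lines of code. This is a minimal sketch assuming the 2-bit codes implied by the slide (A=00, C=01, G=10, T=11); the function names are illustrative and not taken from FuzzyPath. Two packed kmers are XORed, and every non-zero 2-bit slot in the result marks one mismatching base.

```python
# 2-bit base codes as implied by the slide: A=00, C=01, G=10, T=11.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def encode(kmer: str) -> int:
    """Pack a kmer into an integer, two bits per base."""
    value = 0
    for base in kmer:
        value = (value << 2) | CODE[base]
    return value


def count_mismatches(kmer_a: str, kmer_b: str) -> int:
    """Count positions where two equal-length kmers differ."""
    if len(kmer_a) != len(kmer_b):
        raise ValueError("kmers must have the same length")
    diff = encode(kmer_a) ^ encode(kmer_b)
    mismatches = 0
    while diff:
        if diff & 0b11:  # any set bit in this 2-bit slot marks a mismatch
            mismatches += 1
        diff >>= 2
    return mismatches


print(count_mismatches("ACGTAACTAACAGTT", "ACGTAACTCACAGTT"))  # 1
```

Checking each 2-bit slot rather than counting set bits matters: A versus T differs in both bits (00 vs 11) but is still a single mismatch.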
9
Use of Kmers with Mismatches
10
Mixed Solexa and 454 Reads
(Figure: pileup of 454 reads at a repeat junction. Labels: L = ~250 bp; L-K+1 kmers; L-N-K+1 kmers)
11
Pileup of Solexa and 454 Reads
12
Guided by A Closely Related Reference
(Figure: pileup of shredded reads at a repeat junction. Labels: L = 3000 bp; L-K+1 kmers; L-N-K+1 kmers)
13
Pileup of Solexa and Shredded Reads
14
Long Solexa Reads with 70 bp
(Figure: pileup of long Solexa reads at a repeat junction. Labels: L = 70 bp; L-K+1 kmers)
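The "L-K+1 kmers" label is the count of overlapping kmers of length K in a read of length L. A short sketch makes this concrete; the kmer size of 31 and the read sequence are made up for illustration, since the slides do not state K.

```python
def kmers(read: str, k: int) -> list[str]:
    """Return every overlapping kmer of length k from a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


read = "ACGT" * 17 + "AC"  # a made-up 70 bp read
k = 31                     # hypothetical kmer size, not given on the slides
print(len(read), len(kmers(read, k)))  # 70 40
```

With L = 70 and K = 31 the read contributes 70 - 31 + 1 = 40 kmers; a shorter read of the same K contributes proportionally fewer.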
15
Pileup of Long 70 bp Solexa Reads
16
S.Suis P1/7 Solexa/454 Assembly
Solexa reads:
  Number of reads:                  3,084,185
  Finished genome size:             2,007,491 bp
  Read length:                      39 and 36 bp
  Estimated read coverage:          ~55X
  Number of 454 reads:              100,000
  Read coverage of 454:             10X
Assembly features (contig stats):
  Total number of contigs:          73
  Total bases of contigs:           1,999,817 bp
  N50 contig size:                  62,508
  Largest contig:                   162,190
  Averaged contig size:             27,394
  Contig coverage over the genome:  ~99%
  Contig extension errors:          2
  Mis-assembly errors:              3
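For readers unfamiliar with the contig statistics, the sketch below shows how N50 and the average contig size are commonly computed. It uses the standard definition of N50 (the contig length at which the running total, taken from the longest contig downwards, first reaches half of all assembled bases). The toy contig lengths are invented; only the last line uses figures from the slide.

```python
def n50(lengths: list[int]) -> int:
    """Length of the contig at which the running total, from the longest
    contig downwards, first reaches half of all assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")


contigs = [100, 80, 60, 40, 20]     # toy lengths, not from the slides
print(n50(contigs))                 # 80
print(sum(contigs) / len(contigs))  # 60.0

# Average contig size from the slide's totals: 1,999,817 bases in 73 contigs.
print(1_999_817 // 73)              # 27394
```

The last line matches the "Averaged contig size" of 27,394 reported for this assembly.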
17
S.Suis P1/7 with Shredded Pair-end Reads
Shredded reads:
  Number of reads:          1,338,161
  Finished genome size:     2,007,491 bp
  Read length:              36 bp
  Estimated read coverage:  24X
  Insert size:              500 bp
Assembly features:          Paired_Data    Not_Paired
  Number of contigs:        35             317
  Total assembled bases:    1.996 Mb       1.956 Mb
  N50 contig size:          243,039        13,929
  Largest contig:           474,070        33,460
  Averaged contig size:     57,043         6,168
  Contig coverage:          >99.0%         >99.0%
  Contig extension errors:  0              0
  Mis-assembly errors:      3              2
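The estimated read coverage on this slide follows from the other three numbers: reads times read length, divided by genome size. A minimal check, with a hypothetical helper name:

```python
def read_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of times each genome base is covered by a read."""
    return num_reads * read_length / genome_size


coverage = read_coverage(1_338_161, 36, 2_007_491)
print(round(coverage))  # 24
```

The result rounds to the 24X quoted for the shredded reads.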
18
Salmonella delhi5 Solexa Assembly Guided by A Close Reference
Solexa reads:
  Number of reads:                  6,346,317
  Finished genome size:             4.7 Mbp
  Read length:                      33 bp
  Estimated read coverage:          ~40X
  Shredded reference of SpA:        10X
Assembly features (contig stats):
  Total number of contigs:          66
  Total bases of contigs:           4,615,704 bp
  N50 contig size:                  168,793
  Largest contig:                   401,700
  Averaged contig size:             69,934
  Contig coverage over the genome:  ~98%
  Contig extension errors:          0
  Mis-assembly errors:              2
19
S.Suis P1/7 Shredded Read Assembly
Shredded reads:
  Number of reads:          1,338,161
  Finished genome size:     2,007,491 bp
  Read length:              36 bp
  Estimated read coverage:  24X
  Insert size:              500 bp
Assembly features:          Paired_Data    Not_Paired
  Number of contigs:        35             317
  Total assembled bases:    1.996 Mb       1.956 Mb
  N50 contig size:          243,039        13,929
  Largest contig:           474,070        33,460
  Averaged contig size:     57,043         6,168
  Contig coverage:          >99.0%         >99.0%
  Contig extension errors:  0              0
  Mis-assembly errors:      3              2
20
Acknowledgements: Yong Gu, Ben Blackburne, Hannes Ponstingl, Harold Swerdlow, Michael Quail, Tony Cox, Richard Durbin