A Hybrid Assembly System in Zebrafish Pooled Clones
Zemin Ning
The Wellcome Trust Sanger Institute
Pipeline diagram (labels only): Solexa reads, 30-75 bp, insert ~300 bp; FuzzyPath; extended long reads of 1-2 kb; Phusion; Solexa assembly; fishing WGS reads; WGS reads at 5X; combined reads; Phusion or Phrap; genome/chromosome assembly.
Read Coverage or Kmer Coverage
Minimum Kmer Coverage is 2
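The minimum kmer coverage rule can be illustrated with a small sketch. This is not the FuzzyPath implementation; the function names `count_kmers` and `solid_kmers`, the toy reads, and the choice to count the forward strand only are all assumptions made for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def solid_kmers(counts, min_coverage=2):
    """Keep only k-mers seen at least min_coverage times."""
    return {kmer for kmer, n in counts.items() if n >= min_coverage}


# Toy reads (hypothetical), overlapping by one base shift each.
reads = ["ACGTAC", "CGTACG", "GTACGA"]
counts = count_kmers(reads, k=4)
solid = solid_kmers(counts, min_coverage=2)
```

With a threshold of 2, kmers seen only once (often sequencing errors) are dropped before any extension is attempted.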
Kmer Extension & Repeat Junctions
Figure: pileup of other reads (454, Sanger, etc.) at a repeat junction, with consensus.
Means to handle repeats:
- Base quality
- Read pairs
- Fuzzy kmers
- Closely related reference
- 454 or Sanger reads
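A minimal sketch of kmer extension that stops at a repeat junction is given below. It is a plain greedy walk over exact kmers, not the fuzzy-kmer method named on the slide; `kmers_of`, `extend_right`, the status strings, and the toy sequences are hypothetical, and k is assumed to be at least 2.

```python
def kmers_of(seq, k):
    """Return the set of k-mers in a sequence (helper for the toy example)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def extend_right(seed, kmers, max_len=10_000):
    """Greedily extend a seed k-mer to the right, one base at a time.

    Stops when no successor exists ("dead_end"), when more than one
    successor exists ("junction", a candidate repeat boundary), or when
    a k-mer would be revisited ("cycle"). Assumes len(seed) >= 2.
    """
    k = len(seed)
    contig = seed
    seen = {seed}
    while len(contig) < max_len:
        suffix = contig[-(k - 1):]
        successors = [suffix + b for b in "ACGT" if suffix + b in kmers]
        if not successors:
            return contig, "dead_end"
        if len(successors) > 1:
            return contig, "junction"
        nxt = successors[0]
        if nxt in seen:
            return contig, "cycle"
        seen.add(nxt)
        contig += nxt[-1]
    return contig, "max_len"


# Two toy sequences that share a prefix and then diverge.
branching = kmers_of("ACGTTGCA", 4) | kmers_of("ACGTTGGA", 4)
unique = kmers_of("ACGTTGCA", 4)
at_junction = extend_right("ACGT", branching)
to_end = extend_right("ACGT", unique)
```

In this sketch the walk simply halts at a junction. The means listed above (base quality, read pairs, a related reference, or longer 454 or Sanger reads) would be the evidence used to choose a branch instead of stopping.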
Pooled Clones: Zfish 9, Pig 3
Clone Name  Length (bp)  Finished  Cloning Vector    Species    Capillary Data Pathway
zH117H1     129221       Yes       pTARBAC2.1        D. rerio   /nfs/repository/d0012/zH117H1
zH141B18    119622                                              /nfs/repository/d0012/zH141B18
zH151M17    122622                                              /nfs/repository/d0014/zH151M17
zH117E7     139449                                              /nfs/repository/d0015/zH117E7
zH137D22    122615                                              /nfs/repository/d0023/zH137D22
zH97A24     113538                                              /nfs/repository/d0027/zH97A24
zH146D21    109862                                              /nfs/repository/d0040/zH146D21
zH140N19    118794                                              /nfs/repository/d0013/zH140N19
zH147D24    111470                                              /nfs/repository/d0011/zH147D24
bE2F11      170585                 pTARBAC1.3_BamHI  S. scrofa  /nfs/repository/d0027/bE2F11
bE156J20    210831                                              /nfs/repository/d0041/bE156J20
bE240L11    216560*      No                                     /nfs/repository/d0012/bE240L11
* Finished length may be shorter or longer once complete.
Boundary of Solexa Contigs: WGS DH reads and contigs
Mapping of Solexa Reads On the Reference
Zfish and “Pig” Clone Assemblies
Solexa reads:
- Number of reads: 4.3 million
- Estimated size of covered region: 1.72 Mbp
- Read length: 2 x 36 bp
- Estimated read coverage: ~180X
- Insert size: 260/50-400 bp
Zfish DH reads: 12,539
Assembly features (contig stats):
             Solexa     Hybrid_Ctg  Hybrid_Super
N contigs:   496        152         95
Bases:       1.25 Mbp   1.68 Mbp    1.69 Mbp
N50 size:    4,975      25,817      74,598
Largest:     23,906     79,730      144,808
Averaged:    2,513      11,072      17,815
Coverage:    ~72.6%     ~73%        ~73%
Errors:      ?          ?           ?
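The read coverage estimate for this clone set can be checked with simple arithmetic. The sketch below assumes that "number of reads" counts read pairs, which is the reading under which 4.3 million reads of 2 x 36 bp over 1.72 Mbp gives the stated ~180X; the function name `read_coverage` is hypothetical.

```python
def read_coverage(n_pairs, read_len, region_bp):
    """Estimated fold coverage: total sequenced bases / region size.

    Assumes n_pairs counts read pairs, so each contributes 2 * read_len bases.
    """
    return n_pairs * 2 * read_len / region_bp


# Figures from the first clone set: 4.3 million pairs, 2 x 36 bp, 1.72 Mbp.
coverage_first_set = read_coverage(4_300_000, 36, 1_720_000)
```

If the 4.3 million figure counted single reads, the same arithmetic would give half this value, so the pair interpretation is the one consistent with the stated coverage.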
Second Set with 50 Zfish Clones
Solexa reads:
- Number of reads: 17.5 million
- Estimated size of covered region: ~9.0 Mbp
- Read length: 2 x 54 bp
- Estimated read coverage: ~190X
- Insert size: 260/50-400 bp
Zfish DH capillary reads: 112,583
Assembly features (contig stats):
             Solexa     Hybrid_Ctg  Hybrid_Super
N contigs:   3,143      688         359
Bases:       4.01 Mbp   8.39 Mbp    8.43 Mbp
N50 size:    3,189      24,448      70,703
Largest:     23,018     108,090     274,224
Averaged:    1,275      12,194      23,493
Coverage:    ~50%       ~93%        ~94%
Errors:      ?          ?           ?
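The N50 sizes reported in the contig statistics follow the usual definition: the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of all assembled bases. A minimal sketch, with a hypothetical function name `n50` and toy contig lengths rather than the actual assembly data:

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point issues at exactly half.
        if running * 2 >= total:
            return length
    return 0


# Toy contig lengths (hypothetical): total 20, half is 10.
toy_n50 = n50([2, 3, 4, 5, 6])
```

Because N50 weights contigs by length, joining contigs into supercontigs raises it sharply even when the total number of bases barely changes, which is the pattern seen between the Hybrid_Ctg and Hybrid_Super columns.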
Figure panels: maq, ssaha2
Figure labels: contig of hybrid assembly; contig of Zv8; contig of hybrid assembly
Acknowledgements: Yong Gu, James Bonfield, Hannes Ponstingl, Helen Beasley, Siobhan Whitehead, Michael Quail, Tony Cox