Cross_genome: Assembly Scaffolding using Cross-species Synteny. Zemin Ning, High Performance Assembly
Can synteny help? And how? Contig gap closure; scaffolding
RACA - Reference-assisted chromosome assembly
Lattice of Target - Reference. [Figure: target sequence (Scaffold 1, Scaffold 2, Scaffold 3) plotted against the reference] Q = scaff(i) * contig_loci(j)
After Noise Cleaning. [Figure: target sequence (Scaffold 1, Scaffold 2, Scaffold 3) against the reference, with positions X and Y marked] Gap_size = Y - X
Cases That Shouldn’t Join. [Figure: two Reference/Target panels showing Scaffold 1, Scaffold 2 and the Gap_size between them]
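The Gap_size = Y - X relation on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the Cross_genome implementation: it assumes X is the reference coordinate where one scaffold's placement ends and Y is where the next scaffold's placement begins, and the `Placement` record and `gap_sizes` function are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    """Hypothetical record: where a target scaffold lands on the reference."""
    name: str
    ref_start: int  # 0-based start on the reference
    ref_end: int    # end on the reference (exclusive)


def gap_sizes(placements):
    """Return (left, right, gap) for each pair of neighbouring scaffolds.

    Assumption: X is the reference end of the left scaffold and Y is the
    reference start of the right scaffold, so Gap_size = Y - X.
    A negative value means the two placements overlap on the reference.
    """
    ordered = sorted(placements, key=lambda p: p.ref_start)
    gaps = []
    for left, right in zip(ordered, ordered[1:]):
        x = left.ref_end
        y = right.ref_start
        gaps.append((left.name, right.name, y - x))
    return gaps


if __name__ == "__main__":
    example = [
        Placement("Scaffold 2", 5000, 9000),
        Placement("Scaffold 1", 0, 4000),
        Placement("Scaffold 3", 9500, 12000),
    ]
    for left, right, gap in gap_sizes(example):
        print(f"{left} -> {right}: Gap_size = {gap}")
```

The coordinates in the example are made up; in practice they would come from the noise-cleaned alignments of the scaffolds against the reference.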
GAGE: Human Chr14 and RACA using Orangutan
Columns: Assembler, N_bases, N_scaffs, N50 (Mb). Rows per assembler: Original, RACA, Cross_genome.
ALLPATHS-LG   RACA   86.8
Bambus2       RACA   72.1
CABOG         RACA   81.4
MSR-CA        RACA   83.4
SGA           RACA   57.4
SOAPdenovo    RACA   84.4
Velvet        RACA   123
Scaffold N50 for Other Genome Assemblies
Species            Original   Cross_genome   References
Panda              1.3Mb      25Mb           Dog, Human
Tibetan Antelope   2.6Mb      42Mb           Cattle, Dog, Human
Tasmanian Devil    1.8Mb      6.8Mb          Opossum
Availability: ftp://ftp.sanger.ac.uk/pub/users/zn1/merge/cross_genome/
Improve gorilla assembly using human reference. [Diagram labels: Human Reference; Gorilla Assembly; Pair-wise/Multiple; Combined Gorilla-Human Assembly; Read Alignment; Variation correction; Contig gap size re-estimation; Contig Merge/Break; Final Gorilla Assembly]
Re-estimate Contig Gap Sizes from Reference. [Figure labels: Target sequence; Reference sequence; Gap size; New gap size; Ref seq inserted; Read alignment and variation correction]
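The "Ref seq inserted" step can be illustrated with a small Python sketch. It is only one possible reading of the slide, under stated assumptions: the two flanking contigs are already anchored on the reference, `x` is the reference coordinate where the left contig's alignment ends, `y` is where the right contig's alignment begins, and the function name `fill_gap_from_reference` is hypothetical. The inserted reference bases are lower-cased so that they stay distinguishable until read alignment and variation correction have been run over them.

```python
def fill_gap_from_reference(left_contig, right_contig, reference, x, y):
    """Join two contigs by inserting the reference bases between them.

    Assumptions (not taken from the slides): `x` is the 0-based reference
    coordinate just past the left contig's alignment, `y` is the 0-based
    reference coordinate where the right contig's alignment starts.
    Returns the joined sequence and the new gap size (y - x).
    """
    new_gap = y - x
    if new_gap < 0:
        # Overlapping placements need a merge step, which this sketch omits.
        raise ValueError("contigs overlap on the reference")
    # Lower case marks reference-derived bases that still await correction.
    filler = reference[x:y].lower()
    return left_contig + filler + right_contig, new_gap


if __name__ == "__main__":
    ref = "AAAACCCCGGGGTTTT"
    joined, gap = fill_gap_from_reference("AAAA", "TTTT", ref, 4, 12)
    print(joined, gap)
```

Keeping the inserted bases in a different case is a design choice for the sketch only; any per-base flag would serve the same purpose.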
Contig Consensus using Gap5. Before: Target (query) aligned against Reference
Reference Sequence Replacement & Variation Correction. Target (query) aligned against Reference
Variations: 2 indels (4bp and 1bp) corrected
Original Contig (query) against New Assembly after Contig Break
Alignment Inconsistency
The Gorilla Assemblies
Statistic                  Original   New
Total number of contigs:   464,875    285,139
N50 contig size:           11.7kb     23.9kb
Largest contig:            191,556    322,733
Averaged contig size:
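For readers unfamiliar with the N50 statistic quoted on these slides, the following Python sketch shows the usual definition: the length of the contig (or scaffold) at which the running sum of lengths, taken from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the example lengths are illustrative and not taken from the gorilla data.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths.

    Sort lengths from longest to shortest and return the length at which
    the cumulative sum first reaches at least half of the total.
    """
    if not lengths:
        raise ValueError("no lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


if __name__ == "__main__":
    # Made-up contig lengths, for illustration only.
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Because N50 is weighted by length, merging contigs can raise it sharply even when the total number of bases barely changes.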
Acknowledgements:
Hannes Ponstingl
Frank Liu – Nanjing University of Information Technology (NUIT)
Yan Li – (NUIT)
Gorilla genome sequencing data
BGI – Panda and Tibetan Antelope assemblies