Cross_genome: Assembly Scaffolding using Cross-species Synteny


1 Cross_genome: Assembly Scaffolding using Cross-species Synteny
Zemin Ning, High Performance Assembly

2 Can synteny help? And How?
Scaffolding; contig gap closure

3 RACA - Reference-assisted chromosome assembly

4 Lattice of Target - Reference
Q = scaff(i) * 2^32 + contig_loci(j)
[Figure: target sequence plotted against reference, showing Scaffold 1, Scaffold 2 and Scaffold 3]
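
The key on this slide packs a scaffold index and a contig locus into a single integer, so that sorting the keys orders hits by scaffold first and by position second. A minimal sketch of that idea, assuming the factor is 2^32 (the exponent appears flattened in the transcript) and that loci fit in 32 bits; the function names and sample hits are hypothetical and not taken from Cross_genome:

```python
SHIFT = 32  # assumption: the slide's factor is read as 2**32


def encode_key(scaff_index: int, contig_locus: int) -> int:
    """Pack (scaffold index, contig locus) into one sortable integer."""
    if scaff_index < 0:
        raise ValueError("scaff_index must be non-negative")
    if not 0 <= contig_locus < (1 << SHIFT):
        raise ValueError("contig_locus must fit in 32 bits")
    return scaff_index * (1 << SHIFT) + contig_locus


def decode_key(q: int) -> tuple[int, int]:
    """Recover (scaffold index, contig locus) from a packed key."""
    return divmod(q, 1 << SHIFT)


# Hypothetical hits: (scaffold index, locus on that scaffold)
hits = [(2, 150), (1, 9000), (1, 40), (3, 7)]
keys = sorted(encode_key(s, p) for s, p in hits)
ordered = [decode_key(q) for q in keys]
```

A single integer sort then groups hits by scaffold and orders them by locus within each scaffold, without a composite comparison.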

5 After Noise Cleaning
Gap_size = Y - X
[Figure: target sequence plotted against reference, showing Scaffold 1, Scaffold 2 and Scaffold 3, with points X and Y marked]
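
The gap estimate on this slide is the distance between two neighbouring scaffolds once both are placed on the reference. A minimal sketch, assuming X is the reference coordinate where one scaffold's alignment ends and Y is the coordinate where the next scaffold's alignment begins (the slide does not define them explicitly); the `Placement` type and the coordinates are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Placement:
    """Hypothetical record of where a scaffold aligns on the reference."""
    name: str
    ref_start: int
    ref_end: int


def gap_size(left: Placement, right: Placement) -> int:
    """Gap_size = Y - X, with X = left.ref_end and Y = right.ref_start (assumed)."""
    return right.ref_start - left.ref_end


# Made-up coordinates for illustration only
s2 = Placement("Scaffold 2", ref_start=1000, ref_end=5000)
s3 = Placement("Scaffold 3", ref_start=5400, ref_end=9000)
gap = gap_size(s2, s3)
```

Under this reading, a negative result would mean the two scaffolds overlap on the reference rather than leaving a gap.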

6 Cases Shouldn't Join
[Figure labels: Reference, Target, Scaffold 1, Scaffold 2, Gap_size]

7 GAGE: Human Chr14 and RACA using Orangutan
Assembler N_bases N_scaffs N50 (Mb) Original 88.8 418 81.6 ALLPATHS-LG RACA 86.8 Cross_genome 89 221 85.5 78.6 1472 0.37 Bambus2 72.1 1094 13.7 86.5 498 0.4 CABOG 81.4 86.3 46 89.7 0.88 MSR-CA 83.4 89.6 94.7 30975 0.075 SGA 57.4 94.8 29662 77.3 108 38477 0.453 SOAPdenovo 84.4 102.8 12955 78.9 143.8 61455 0.84 Velvet 123 139.4 3278 8.71

8 Scaffold N50 for Other Genome Assemblies
                  Original  Cross_g  References
Panda             Mb        25Mb     Dog, Human
Tibetan Antelope  2.6Mb     42Mb     Cattle, Dog, Human
Tasmanian Devil   1.8Mb     6.8Mb    Opossum

Availability: ftp://ftp.sanger.ac.uk/pub/users/zn1/merge/cross_genome/
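
The scaffold N50 figures reported in these slides use the standard definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled bases. A minimal sketch of that calculation; the scaffold lengths are made up to show how joining two scaffolds raises N50, and are not data from the talk:

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from longest to shortest until half of the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Made-up scaffold lengths before and after joining the two longest scaffolds
before = [80, 70, 50, 40, 30, 20]
after = [80 + 70, 50, 40, 30, 20]
n50_before = n50(before)
n50_after = n50(after)
```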

9 Improve gorilla assembly using human reference
Contig gap size re-estimation
Combined Gorilla-Human Assembly
Read Alignment
Pair-wise/Multiple Read Clustering
Local Assembly
Final Gorilla Assembly

10 Re-estimate Contig Gap Sizes from Reference
[Figure labels: Target sequence, Reference sequence, Gap size, New gap size, Ref seq inserted, Local assembly based on clustered reads]

11 Assemblies using Synteny-guided Method
Reads: 2x100 with 500bp insert, 60X

                                        Human Chr6 - Simulation         Gorilla Genome - Real Data
Original Assembly Contig N50            24.3kb                          13.5kb
Average contig length                   6850bp                          6940bp
N of clusters ( pairs)                  504                             5807
                                        43.7kb                          24.0kb
Gap closed                              7809                            10433
N of base errors in gap closed regions  256 subs and 12 indels (24bps)  N/A

12 Gorilla - Merge with other De novo Assemblies
                  Original assembly (dev5)  Merge with Fermi*  Merge with Masurca+
Contig N50        13.5kb                    30.2kb             53.1kb
Average length    6850                      12577              18768
Largest contig    215kb                     391.2kb            448.8kb
N of gaps closed                            182661             257167

*Fermi assembler:
+Masurca assembler:

13 Gs = (Kn - Ks)/D = 4.5 x 10^9
Kn = 125.4 x 10^9: total number of kmer words
Ks = 2.4 x 10^9: number of single copy kmer words
D: depth of kmer occurrence
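
The slide gives Kn, Ks and the resulting Gs but not the value of D. A minimal sketch that plugs the slide's numbers into Gs = (Kn - Ks)/D and back-solves for the depth this implies; the function names are hypothetical, and the depth is derived here rather than quoted from the talk:

```python
def genome_size(kn: float, ks: float, depth: float) -> float:
    """Gs = (Kn - Ks) / D, as written on the slide."""
    return (kn - ks) / depth


def implied_depth(kn: float, ks: float, gs: float) -> float:
    """Rearranged form: D = (Kn - Ks) / Gs."""
    return (kn - ks) / gs


KN = 125.4e9  # total number of kmer words (from the slide)
KS = 2.4e9    # number of single copy kmer words (from the slide)
GS = 4.5e9    # value of Gs (from the slide)

depth = implied_depth(KN, KS, GS)      # roughly 27.3
check = genome_size(KN, KS, depth)     # recovers roughly 4.5e9
```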

14 Original Contig (query) against New Assembly after Contig Break

15 Alignment Inconsistency

16 Original Contig (query) against New Assembly after Contig Break

17 Alignment Inconsistency

18 The Gorilla Assemblies
Original New Total number of contigs: , ,139 N50 contig size: kb 23.9kb Largest contig: 191, ,733 Averaged contig size:

19 Acknowledgements:
Hannes Ponstingl
Frank Liu - Nanjing University of Information Technology (NUIT)
Yan Li - (NUIT)
Gorilla genome sequencing data
BGI - Panda and Tibetan Antelope assemblies

