Cross_genome: Assembly Scaffolding using Cross-species Synteny Zemin Ning High Performance Assembly.

Presentation transcript:

Can synteny help? And how?
Contig gap closure
Scaffolding

RACA - Reference-assisted chromosome assembly

Lattice of Target - Reference
Q = scaff(i) * contig_loci(j)
[Figure: target sequence with Scaffold 1, Scaffold 2 and Scaffold 3 placed against the reference.]

After Noise Cleaning
Gap_size = Y - X
[Figure: target sequence with Scaffold 1, Scaffold 2 and Scaffold 3 placed against the reference; X and Y are marked on the figure.]
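The Gap_size = Y - X relation can be illustrated with a short sketch. The slide does not define X and Y in text, so this example assumes X is the reference position where the left scaffold's alignment ends and Y is the reference position where the next scaffold's alignment begins. The `Placement` record and its field names are hypothetical and are not taken from Cross_genome.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    """Hypothetical record: where a target scaffold aligns on the reference."""
    name: str
    ref_start: int  # first reference base covered (0-based, inclusive)
    ref_end: int    # one past the last reference base covered (exclusive)


def gap_size(left: Placement, right: Placement) -> int:
    """Return Gap_size = Y - X for two scaffolds adjacent on the reference.

    Assumption: X is the reference end of the left scaffold and
    Y is the reference start of the right scaffold.
    A negative result means the two placements overlap on the reference.
    """
    x = left.ref_end
    y = right.ref_start
    return y - x


s1 = Placement("Scaffold 1", 1_000, 51_000)
s2 = Placement("Scaffold 2", 53_500, 90_000)
print(gap_size(s1, s2))
```

A negative value flags placements that overlap on the reference, which a scaffolder would probably want to inspect rather than join blindly.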

Cases Shouldn’t Join
[Figure: two reference/target diagrams, one with Scaffold 1 and Scaffold 2, one with Scaffold 1 and Gap_size.]

GAGE: Human Chr14 and RACA using Orangutan
Columns: Assembler, N_bases, N_scaffs, N50 (Mb)
Allpaths-LG: Original; RACA 86.8; Cross_genome
Bambus2: Original; RACA 72.1; Cross_genome
CABOG: Original; RACA 81.4; Cross_genome
MSR-CA: Original; RACA 83.4; Cross_genome
SGA: Original; RACA 57.4; Cross_genome
SOAPdenovo: Original; RACA 84.4; Cross_genome
Velvet: Original; RACA 123; Cross_genome

Scaffold N50 for Other Genome Assemblies
                   Original   Cross_g   References
Panda              1.3Mb      25Mb      Dog, Human
Tibetan Antelope   2.6Mb      42Mb      Cattle, Dog, Human
Tasmanian Devil    1.8Mb      6.8Mb     Opossum
Availability: ftp://ftp.sanger.ac.uk/pub/users/zn1/merge/cross_genome/

Improve gorilla assembly using human reference
Contig Merge/Break
Variation correction
Contig gap size re-estimation
[Flowchart boxes: Gorilla Assembly; Human Reference; Read Alignment (Pair-wise/Multiple); Combined Gorilla-Human Assembly; Final Gorilla Assembly.]

Re-estimate Contig Gap Sizes from Reference
[Figure: target sequence against reference sequence; the gap size is replaced by a new gap size; read alignment and variation correction; ref seq inserted.]
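One way to picture gap size re-estimation is the sketch below. It is not the Cross_genome implementation. It assumes each contig in a scaffold has either a reference span or no alignment, and that a gap keeps its old estimate whenever a neighbouring contig is unaligned or the reference-based value is not positive. The function name and input layout are hypothetical.

```python
from typing import List, Optional, Tuple

Span = Optional[Tuple[int, int]]  # (ref_start, ref_end), end exclusive; None if unaligned


def reestimate_gaps(contig_ref_spans: List[Span], old_gaps: List[int]) -> List[int]:
    """Return new gap sizes between consecutive contigs of one scaffold.

    contig_ref_spans lists the reference span of each contig in scaffold order.
    old_gaps[i] is the current gap estimate between contig i and contig i + 1.
    Assumption: a reference-based gap is used only when both neighbours
    are aligned and the resulting gap is positive; otherwise the old value stays.
    """
    if len(old_gaps) != len(contig_ref_spans) - 1:
        raise ValueError("need exactly one gap between each pair of contigs")
    new_gaps = []
    for i, old in enumerate(old_gaps):
        left, right = contig_ref_spans[i], contig_ref_spans[i + 1]
        if left is None or right is None:
            new_gaps.append(old)  # no reference evidence for this gap
            continue
        ref_gap = right[0] - left[1]  # reference start of next minus end of previous
        new_gaps.append(ref_gap if ref_gap > 0 else old)
    return new_gaps


spans = [(0, 100), (150, 300), None, (500, 600)]
print(reestimate_gaps(spans, [10, 20, 30]))
```

Keeping the old estimate for unaligned or overlapping neighbours is a conservative choice made for this example; a real pipeline might instead flag those gaps for contig merging or breaking.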

Contig Consensus using Gap5
Before: Target (query) aligned against Reference

Reference Sequence Replacement & Variation Correction
Target (query) aligned against Reference

Variations: 2 indels (4bp and 1bp) corrected

Original Contig (query) against New Assembly after Contig Break

Alignment Inconsistency

The Gorilla Assemblies
                           Original   New
Total number of contigs:   464,875    285,139
N50 contig size:           11.7kb     23.9kb
Largest contig:            191,556    322,733
Averaged contig size:
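For readers unfamiliar with the N50 figures quoted above, the following is a minimal sketch of the usual definition: sort the contig (or scaffold) lengths from largest to smallest and report the length at which the running total first reaches half of the assembly size. The function name and the toy lengths are illustrative only and are not taken from the gorilla data.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths.

    Walk the lengths from largest to smallest and return the length
    at which the cumulative sum first covers at least half of the total.
    Returns 0 for an empty input.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    return 0


toy_contigs = [2, 3, 4, 5, 6]  # made-up lengths, total 20
print(n50(toy_contigs))
```

Because N50 depends on the total assembly size, values are most comparable between assemblies of the same genome, as in the Original and New columns above.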

Acknowledgements:
- Hannes Ponstingl
- Frank Liu, Nanjing University of Information Technology (NUIT)
- Yan Li (NUIT)
- Gorilla genome sequencing data
- BGI: Panda and Tibetan Antelope assemblies