A Hybrid Assembly System in Zebrafish Pooled Clones

Zemin Ning
The Wellcome Trust Sanger Institute

[Pipeline diagram. Labels: Solexa Reads (30-75 bp, insert ~300 bp); FuzzyPath; Solexa assembly; extended long reads of 1-2 Kb; Fishing WGS Reads; WGS Reads 5X; Combined Reads; Phusion or Phrap; Phusion; Genome/Chromosome Assembly]

Read Coverage or Kmer Coverage

Minimum Kmer Coverage is 2
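The minimum kmer coverage of 2 can be illustrated with a short sketch. This is only an illustration of the idea, not FuzzyPath's implementation: the function names, the toy reads and the choice of k = 4 are all assumptions, and reverse complements are ignored for brevity.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def solid_kmers(counts, min_coverage=2):
    """Keep only k-mers seen at least min_coverage times."""
    return {kmer: n for kmer, n in counts.items() if n >= min_coverage}


# Toy reads (hypothetical); the last read ends in a likely sequencing error.
reads = ["ACGTAC", "CGTACG", "ACGTTT"]
counts = count_kmers(reads, k=4)
kept = solid_kmers(counts, min_coverage=2)
```

K-mers that occur only once are more likely to come from sequencing errors than from the genome, so dropping them before extension is a common first filter.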

Kmer Extension & Repeat Junctions

Pileup of other reads (454, Sanger, etc.) at a repeat junction; consensus.

Means to handle repeats:
- Base quality
- Read pair
- Fuzzy kmers
- Closely related reference
- 454 or Sanger reads
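A minimal sketch of kmer extension that stops at a repeat junction, under simplifying assumptions: forward strand only, exact kmers, and no use of base quality, read pairs or fuzzy kmers. The function names, the toy reads and k = 4 are hypothetical. The sketch only detects the junction; resolving it is where the means listed above would come in.

```python
from collections import Counter

BASES = "ACGT"


def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def extend_right(seed, counts, k, min_coverage=2, max_len=10_000):
    """Extend seed one base at a time; stop at a dead end, a branch or a cycle.

    The seed must be at least k bases long and k must be at least 2.
    """
    contig = seed
    seen = {seed[-k:]}
    while len(contig) < max_len:
        suffix = contig[-(k - 1):]
        candidates = [b for b in BASES
                      if counts.get(suffix + b, 0) >= min_coverage]
        if len(candidates) != 1:
            # No candidate: coverage ran out. Several: possible repeat junction.
            return contig, ("dead_end" if not candidates else "junction")
        next_kmer = suffix + candidates[0]
        if next_kmer in seen:
            return contig, "cycle"
        seen.add(next_kmer)
        contig += candidates[0]
    return contig, "max_len"


# Two toy sequences (hypothetical) that share the k-mer CCGG and then diverge.
reads = ["AACCGGTT", "AACCGGTT", "CCGGAA", "CCGGAA"]
counts = count_kmers(reads, k=4)
contig, reason = extend_right("AACC", counts, k=4)
```

Here the extension halts after AACCGG because both CGGT and CGGA pass the coverage threshold.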

Pooled Clones: Zfish 9, Pig 3

Clone Name  Length (bp)  Finished  Cloning Vector    Species    Capillary Data Pathway
zH117H1     129221       Yes       pTARBAC2.1        D. rerio   /nfs/repository/d0012/zH117H1
zH141B18    119622       Yes       pTARBAC2.1        D. rerio   /nfs/repository/d0012/zH141B18
zH151M17    122622       Yes       pTARBAC2.1        D. rerio   /nfs/repository/d0014/zH151M17
zH117E7     139449       Yes       pTARBAC2.1        D. rerio   /nfs/repository/d0015/zH117E7
zH137D22    122615       Yes       pTARBAC2.1        D. rerio   /nfs/repository/d0023/zH137D22
zH97A24     113538       Yes       pTARBAC2.1        D. rerio   /nfs/repository/d0027/zH97A24
zH146D21    109862       Yes       pTARBAC2.1        D. rerio   /nfs/repository/d0040/zH146D21
zH140N19    118794       Yes       pTARBAC2.1        D. rerio   /nfs/repository/d0013/zH140N19
zH147D24    111470       Yes       pTARBAC2.1        D. rerio   /nfs/repository/d0011/zH147D24
bE2F11      170585       Yes       pTARBAC1.3_BamHI  S. scrofa  /nfs/repository/d0027/bE2F11
bE156J20    210831       Yes       pTARBAC1.3_BamHI  S. scrofa  /nfs/repository/d0041/bE156J20
bE240L11    216560*      No        pTARBAC1.3_BamHI  S. scrofa  /nfs/repository/d0012/bE240L11

* Finished length may be shorter or longer once complete

Boundary of Solexa Contigs WGS DH reads and contigs

Mapping of Solexa Reads On the Reference

Zfish and “Pig” Clone Assemblies

Solexa reads:
- Number of reads: 4.3 million
- Estimated size of covered region: 1.72 Mbp
- Read length: 2x36 bp
- Estimated read coverage: ~180X
- Insert size: 260/50-400 bp

Zfish DH reads: 12,539

Assembly features:
- contig stats

            Solexa     Hybrid_Ctg  Hybrid_Super
N contigs:  496        152         95
Bases:      1.25 Mbp   1.68 Mbp    1.69 Mbp
N50 size:   4,975      25,817      74,598
Largest:    23,906     79,730      144,808
Averaged:   2,513      11,072      17,815
Coverage:   ~72.6%     ~73%        ~73%
Errors:     ?          ?           ?
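The estimated read coverage follows from the read count, read length and region size. The sketch below assumes that "4.3 million" counts read pairs, since that is the reading under which the arithmetic reproduces the quoted ~180X; the function name is hypothetical.

```python
def read_coverage(n_pairs, read_len, region_bp):
    """Average depth: total sequenced bases divided by the region size."""
    return n_pairs * 2 * read_len / region_bp


# Figures from the slide: 4.3 million pairs of 2x36 bp over 1.72 Mbp.
coverage = read_coverage(4.3e6, 36, 1.72e6)   # roughly 180X

# Fraction of the region represented by the Solexa-only contigs (1.25 Mbp).
solexa_fraction = 1.25e6 / 1.72e6             # roughly 0.73
```

The second figure matches the ~72.6% coverage listed for the Solexa-only assembly.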

Second Set with 50 Zfish Clones

Solexa reads:
- Number of reads: 17.5 million
- Estimated size of covered region: ~9.0 Mbp
- Read length: 2x54 bp
- Estimated read coverage: ~190X
- Insert size: 260/50-400 bp

Zfish DH capillary reads: 112,583

Assembly features:
- contig stats

            Solexa     Hybrid_Ctg  Hybrid_Super
N contigs:  3,143      688         359
Bases:      4.01 Mbp   8.39 Mbp    8.43 Mbp
N50 size:   3,189      24,448      70,703
Largest:    23,018     108,090     274,224
Averaged:   1,275      12,194      23,493
Coverage:   ~50%       ~93%        ~94%
Errors:     ?          ?           ?
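For reference, the N50 size reported in the contig stats can be computed as below. This is a sketch using the usual definition (the length of the contig at which the running total of contigs, sorted longest first, reaches half of all assembled bases); the contig lengths in the example are made up for illustration and are not from these assemblies.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths, or 0 for an empty list."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


# Hypothetical contig lengths summing to 300 bases.
example_lengths = [80, 70, 50, 40, 30, 20, 10]
example_n50 = n50(example_lengths)
```

A higher N50 at a similar total base count, as in the Hybrid_Ctg and Hybrid_Super columns, indicates that the same sequence is held in fewer, longer pieces.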

[Figure panels: maq, ssaha2]

[Figure labels: contig of hybrid assembly; contig of Zv8; contig of hybrid assembly]

Acknowledgements:
Yong Gu, James Bonfield, Hannes Ponstingl, Helen Beasley, Siobhan Whitehead, Michael Quail, Tony Cox