MCB3895-004 Lecture #15 Oct 23/14 De novo assemblies using PacBio.



PacBio
- Long-read sequencing technology
- High error rate (~13%) threw people at first
- What would this be good for?
- Scaffolding was an early focus
- Reads were also corrected using Illumina data (now obsolete)

HGAP
"Hierarchical Genome Assembly Process"
1. Preassembly - corrects the longest reads by mapping shorter reads to them, then quality trims
2. Assembly - OLC (overlap-layout-consensus) approach
3. Polishing - Quiver software derives a consensus from mapped reads and uses it to correct the assembly
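
To make the idea of "deriving a consensus from mapped reads" concrete, here is a minimal sketch. It uses a plain column-wise majority vote, which is a deliberate simplification and not Quiver's actual statistical model. The function name, the input format (reads already aligned and padded to equal length, with `-` for a gap and `.` for an uncovered position) and the example reads are all assumptions made for illustration.

```python
from collections import Counter


def majority_consensus(aligned_reads):
    """Return a column-wise majority-vote consensus.

    aligned_reads: equal-length strings; '-' marks a gap in the read,
    '.' marks a position the read does not cover (hypothetical format).
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")

    consensus = []
    for column in zip(*aligned_reads):
        # Only reads that actually cover this column get a vote.
        votes = [base for base in column if base != "."]
        if not votes:
            consensus.append("N")  # no coverage at all
            continue
        winner, _ = Counter(votes).most_common(1)[0]
        if winner != "-":  # a majority gap drops the column
            consensus.append(winner)
    return "".join(consensus)


if __name__ == "__main__":
    reads = ["ACGTAC", "ACGAAC", "ACGTAC", "ACCTAC"]
    print(majority_consensus(reads))
```

With random errors and enough coverage, most columns are dominated by the true base, which is why a high per-read error rate does not prevent an accurate final assembly.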

Results
- My test gave an impressive 1 contig!
- High ~60X coverage, tame dataset
- Known problem: still some SNP errors
- Can run Quiver again:
1. Import the assembly as a reference sequence
2. Perform reference mapping using the same reads vs. the new reference
3. This will output a new consensus fasta file incorporating the variants it finds
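
The "~60X coverage" figure is just total sequenced bases divided by genome size. A small sketch of that arithmetic follows; the read count, read length and the rounded 4.6 Mb genome size are illustrative assumptions, not values from the actual test dataset.

```python
def mean_coverage(read_lengths, genome_size):
    """Mean depth of coverage = total bases sequenced / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


if __name__ == "__main__":
    # Hypothetical dataset: 46,000 reads of 6,000 bp each against an
    # assumed 4.6 Mb genome (roughly the size of an E. coli genome).
    reads = [6000] * 46000
    print(f"{mean_coverage(reads, 4_600_000):.1f}X")
```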

PacBio chemistries
- PacBio has continually updated both its polymerases and detection chemistry
- Current test data uses P4-C2 chemistry
- P5-C3 gave slightly better length, maybe a bit more error
- Fastq available for this E. coli: SRR
- Brand new: P6-C4

P6-C4
- As per last week
- 10-15 kb read N50
- Slightly better accuracy?
- nces.com/2014/10/new-chemistry-boosts-average-read.html
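
For reference, read N50 is the length L such that reads of length L or longer contain at least half of all sequenced bases. A minimal sketch of the calculation, with made-up read lengths:

```python
def n50(lengths):
    """Return the N50 of a collection of read (or contig) lengths."""
    if not lengths:
        return 0
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # unreachable for positive lengths


if __name__ == "__main__":
    # Hypothetical read lengths in bp.
    print(n50([15000, 12000, 10000, 3000, 2000]))
```

The same function works for contig lengths when comparing assemblies.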

Other options: hybrid assembly
- It is possible to combine multiple data types
- Goal: cover the respective strengths of each (of course, could confound too!)
- SPAdes is one of the most flexible assemblers in this regard
- Must have some Illumina
- Will accept corrected or uncorrected PacBio (and many more, including Oxford Nanopore)
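
As a rough sketch of what a hybrid SPAdes run looks like, the helper below only builds the command line so it can be pasted into a lab notebook; it does not run SPAdes. The flags `-1`, `-2`, `--pacbio`, `--nanopore` and `-o` are SPAdes options; the helper name and all file and directory names are hypothetical.

```python
import shlex


def spades_hybrid_command(illumina_r1, illumina_r2, out_dir,
                          pacbio=None, nanopore=None):
    """Build (but do not run) a SPAdes hybrid assembly command.

    Paired-end Illumina reads are mandatory arguments, reflecting the
    "must have some Illumina" requirement; long reads are optional.
    """
    cmd = ["spades.py", "-1", illumina_r1, "-2", illumina_r2]
    if pacbio:
        cmd += ["--pacbio", pacbio]
    if nanopore:
        cmd += ["--nanopore", nanopore]
    cmd += ["-o", out_dir]
    return cmd


if __name__ == "__main__":
    # All file names below are placeholders.
    cmd = spades_hybrid_command(
        "illumina_R1.fastq", "illumina_R2.fastq", "hybrid_out",
        pacbio="pacbio_reads.fastq",
    )
    print(shlex.join(cmd))
```

Check the SPAdes manual for the version installed on your system before relying on any particular option.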

Assignment #7
Create 2 E. coli assemblies using PacBio data. Options:
- Use P4-C2 alone and HGAP
- Use Illumina + P5-C3 uncorrected
- Use Illumina + P4-C2 uncorrected
- Use Illumina + P4-C2 corrected
- Multiple Quiver steps to correct
- Some other option!
Hand in:
- 2 genome assemblies
- Lab notebook file detailing exact commands