Jeong-Hyeon Choi, Sun Kim, Haixu Tang, Justen Andrews, Don G. Gilbert

Slides:



Advertisements
Similar presentations
Celera Assembler Arthur L. Delcher Senior Research Scientist CBCB University of Maryland.
Advertisements

Genome Assembly: a brief introduction
WGS Assembly and Reads Clustering Zemin Ning Production Software Group Informatics Division.
RNA Assembly Using extending method. Wei Xueliang
Lecture 14 Genome sequencing projects
Alignment Problem (Optimal) pairwise alignment consists of considering all possible alignments of two sequences and choosing the optimal one. Sub-optimal.
CS273a Lecture 4, Autumn 08, Batzoglou Some Terminology insert a fragment that was incorporated in a circular genome, and can be copied (cloned) vector.
DNA Sequencing Lecture 9, Tuesday April 29, 2003.
Genome Sequence Assembly: Algorithms and Issues Fiona Wong Jan. 22, 2003 ECS 289A.
Hidden Markov Model Special case of Dynamic Bayesian network Single (hidden) state variable Single (observed) observation variable Transition probability.
CISC667, F05, Lec4, Liao CISC 667 Intro to Bioinformatics (Fall 2005) Whole genome sequencing Mapping & Assembly.
Class 02: Whole genome sequencing. The seminal papers ``Is Whole Genome Sequencing Feasible?'' ``Whole-Genome DNA.
Genome sequence assembly
CS262 Lecture 11, Win07, Batzoglou Some Terminology insert a fragment that was incorporated in a circular genome, and can be copied (cloned) vector the.
DNA Sequencing. The Walking Method 1.Build a very redundant library of BACs with sequenced clone- ends (cheap to build) 2.Sequence some “seed” clones.
DNA Sequencing Some Terminology insert a fragment that was incorporated in a circular genome, and can be copied (cloned) vector the circular genome (host)
Assembly.
Sequencing and Assembly Cont’d. CS273a Lecture 5, Win07, Batzoglou Steps to Assemble a Genome 1. Find overlapping reads 4. Derive consensus sequence..ACGATTACAATAGGTT..
Novel multi-platform next generation assembly methods for mammalian genomes The Baylor College of Medicine, Australian Government and University of Connecticut.
Sequencing and Assembly Cont’d. CS273a Lecture 5, Aut08, Batzoglou Steps to Assemble a Genome 1. Find overlapping reads 4. Derive consensus sequence..ACGATTACAATAGGTT..
CS273a Lecture 4, Autumn 08, Batzoglou Hierarchical Sequencing.
CS273a Lecture 2, Autumn 10, Batzoglou DNA Sequencing (cont.)
CSE182-L10 LW statistics/Assembly. Whole Genome Shotgun Break up the entire genome into pieces Sequence ends, and assemble using a computer LW statistics.
Genome sequencing and assembling
Genome sequencing. Vocabulary Bac: Bacterial Artificial Chromosome: cloning vector for yeast Pac, cosmid, fosmid, plasmid: cloning vectors for E. coli.
Genome Assembly Bonnie Hurwitz Graduate student TMPL.
Sequencing Data Quality Saulo Aflitos. Read (≈100bp) Contig (≈2Kbp) Scaffold (≈ 2Mbp) Pseudo Molecule (Super Scaffold) Paired-End Mate-Pair LowComplexityRegion.
De-novo Assembly Day 4.
How to Build a Horse Megan Smedinghoff.
CS 394C March 19, 2012 Tandy Warnow.
PE-Assembler: De novo assembler using short paired-end reads Pramila Nuwantha Ariyaratne.
1 Velvet: Algorithms for De Novo Short Assembly Using De Bruijn Graphs March 12, 2008 Daniel R. Zerbino and Ewan Birney Presenter: Seunghak Lee.
Next generation sequence data and de novo assembly For human genetics By Jaap van der Heijden.
Meraculous: De Novo Genome Assembly with Short Paired-End Reads
Sequence assembly using paired- end short tags Pramila Ariyaratne Genome Institute of Singapore SOC-FOS-SICS Joint Workshop on Computational Analysis of.
Genome sequencing Haixu Tang School of Informatics.
Steps in a genome sequencing project Funding and sequencing strategy source of funding identified / community drive development of sequencing strategy.
SIZE SELECT SHEAR Shotgun DNA Sequencing (Technology) DNA target sample LIGATE & CLONE Vector End Reads (Mates) SEQUENCE Primer.
RNA-Seq Assembly 转录组拼接 唐海宝 基因组与生物技术研究中心 2013 年 11 月 23 日.
RNA Sequence Assembly WEI Xueliang. Overview Sequence Assembly Current Method My Method RNA Assembly To Do.
Overview of the Drosophila modENCODE hybrid assemblies Wilson Leung01/2014.
Short read alignment BNFO 601. Short read alignment Input: –Reads: short DNA sequences (upto a few hundred base pairs (bp)) produced by a sequencing machine.
Mojavensis: Issues of Polymorphisms Chris Shaffer GEP 2009 Washington University.
Whole Genome Sequencing (Lecture for CS498-CXZ Algorithms in Bioinformatics) Sept. 13, 2005 ChengXiang Zhai Department of Computer Science University of.
CS 173, Lecture B Introduction to Genome Assembly (using Eulerian Graphs) Tandy Warnow.
COMPUTATIONAL GENOMICS GENOME ASSEMBLY
Drosophila Genomics Where are we now? Where are we going? Christopher Shaffer, Wilson Leung, Sarah Elgin Dept of Biology; Washington University in St.
Chapter 5 Sequence Assembly: Assembling the Human Genome.
454 Genome Sequence Assembly and Analysis HC70AL S Brandon Le & Min Chen.
CISC667, S07, Lec4, Liao CISC 667 Intro to Bioinformatics (Spring 2007) Whole genome sequencing Mapping & Assembly.
Phusion2 Assemblies and Indel Confirmation Zemin Ning The Wellcome Trust Sanger Institute.
JERI DILTS SUZANNA KIM HEMA NAGRAJAN DEEPAK PURUSHOTHAM AMBILY SIVADAS AMIT RUPANI LEO WU Genome Assembly Final Results
Are Roche 454 shotgun reads giving a accurate picture of the genome?
Assembly algorithms for next-generation sequencing data
DNA Sequencing Project
Sequence Assembly.
Sequence assembly Jose Blanca COMAV institute bioinf.comav.upv.es.
COMPUTATIONAL GENOMICS GENOME ASSEMBLY
A Fast Hybrid Short Read Fragment Assembly Algorithm
Metafast High-throughput tool for metagenome comparison
Genome sequence assembly
Professors: Dr. Gribskov and Dr. Weil
Assembly.
Pre-genomic era: finding your own clones
Introduction to Genome Assembly
CS 598AGB Genome Assembly Tandy Warnow.
How to Build a Horse: Final Report
Bioinformatics: Buzzword or Discipline (???)
CSCI 1810 Computational Molecular Biology 2018
Sequence the 3 billion base pairs of human
Presentation transcript:

Jeong-Hyeon Choi, Sun Kim, Haixu Tang, Justen Andrews, Don G. Gilbert A machine-learning approach to combined evidence validation of genome assemblies  Jeong-Hyeon Choi, Sun Kim, Haixu Tang, Justen Andrews, Don G. Gilbert John K. Colbourne Presented By – F A Rezaur Rahman Chowdhury

Mate-Pair Shotgun DNA Sequencing DNA target sample SHEAR & SIZE End Reads / Mate Pairs 550bp CLONE & END SEQUENCE 10,000bp

Assembling the fragments NOte that contig orientation/order is not determined

Building Scaffolds Break DNA into random fragments Sequence the ends of the fragments Assemble the sequenced ends Build scaffolds We need to determine the relative order/orientation of contigs Using forward-reverse constraints helps

Assembly Overview Assembly Scaffolding

Assembly Algorithms Greedy (TIGR , phrap, CAP3) De bruijn Graph Graph based Greedy(Celera, Arachne) Euler path based

Statistical Error detection Significant deviations from average coverage.

Distribution of clone length

Good and Bad Clones intra-contig or intra-scaffold clone is called good if the absolute Z- score of its length is smaller than a threshold  Half-placed clones also bad Clones with paired-end reads that are placed in the same or outer orientation 

Machine-learning approach Combine evidence assembly validation Features are taken from the statistical approaches Five different classifier (J48, RF, RT, NB, BN)

Evaluation Simulated dataset with different error rates ( 0.001, 0.003, 0.005) Draft Assembly of Drosophila (D. mojavensis, D. erecta and D. virilis)

ROC Curve

Simulated Data

Drosophila Assembly

Cross Species

Questions?