GigAssembler. Genome Assembly: A big picture

Slides:



Advertisements
Similar presentations
Next Generation Sequencing, Assembly, and Alignment Methods
Advertisements

Introduction to Short Read Sequencing Analysis
Greg Phillips Veterinary Microbiology
Genome Sequence Assembly: Algorithms and Issues Fiona Wong Jan. 22, 2003 ECS 289A.
CISC667, F05, Lec4, Liao CISC 667 Intro to Bioinformatics (Fall 2005) Whole genome sequencing Mapping & Assembly.
Class 02: Whole genome sequencing. The seminal papers ``Is Whole Genome Sequencing Feasible?'' ``Whole-Genome DNA.
Large-Scale Global Alignments Multiple Alignments Lecture 10, Thursday May 1, 2003.
Sequencing Informatics Gabor T. Marth Department of Biology, Boston College BI420 – Introduction to Bioinformatics.
Sequencing Informatics Gabor T. Marth Department of Biology, Boston College BI420 – Introduction to Bioinformatics.
CS273a Lecture 4, Autumn 08, Batzoglou Hierarchical Sequencing.
Genome sequencing and assembling
Genome Assembly Charles Yan Fragment Assembly Given a large number of fragments, such as ACC AC AT AC AT GG …, the goal is to figure out the original.
Towards accurate detection and genotyping of expressed variants from whole transcriptome sequencing data Jorge Duitama 1, Pramod Srivastava 2, and Ion.
Genome sequencing. Vocabulary Bac: Bacterial Artificial Chromosome: cloning vector for yeast Pac, cosmid, fosmid, plasmid: cloning vectors for E. coli.
Sequence comparison: Local alignment
CS 6293 Advanced Topics: Current Bioinformatics
Next generation sequencing platforms Applications
Biostatistics-Lecture 15 High-throughput sequencing and sequence alignment Ruibin Xi Peking University School of Mathematical Sciences.
Assembling Genomes BCH364C/391L Systems Biology / Bioinformatics – Spring 2015 Edward Marcotte, Univ of Texas at Austin Edward Marcotte/Univ. of Texas/BCH364C-391L/Spring.
Genomic sequencing and its data analysis Dong Xu Digital Biology Laboratory Computer Science Department Christopher S. Life Sciences Center University.
Sequencing Technologies
Mouse Genome Sequencing
CS 394C March 19, 2012 Tandy Warnow.
Todd J. Treangen, Steven L. Salzberg
CUGI Pilot Sequencing/Assembly Projects Christopher Saski.
Phred/Phrap/Consed Analysis A User’s View Arthur Gruber International Training Course on Bioinformatics Applied to Genomic Studies Rio de Janeiro 2001.
Introduction to Short Read Sequencing Analysis
1 Velvet: Algorithms for De Novo Short Assembly Using De Bruijn Graphs March 12, 2008 Daniel R. Zerbino and Ewan Birney Presenter: Seunghak Lee.
PERFORMANCE COMPARISON OF NEXT GENERATION SEQUENCING PLATFORMS Bekir Erguner 1,3, Duran Üstek 2, Mahmut Ş. Sağıroğlu 1 1Advanced Genomics and Bioinformatics.
Assembling Sequences Using Trace Signals and Additional Sequence Information Bastien Chevreux, Thomas Pfisterer, Thomas Wetter, Sandor Suhai Deutsches.
O PTICAL M APPING AS A M ETHOD OF W HOLE G ENOME A NALYSIS M AY 4, 2009 C OURSE : 22M:151 P RESENTED BY : A USTIN J. R AMME.
Sequence assembly using paired- end short tags Pramila Ariyaratne Genome Institute of Singapore SOC-FOS-SICS Joint Workshop on Computational Analysis of.
Next Generation Sequencing and its data analysis challenges Background Alignment and Assembly Applications Genome Epigenome Transcriptome.
CS/BioE 598AGB: Genome Assembly, part II Tandy Warnow.
Physical Mapping of DNA BIO/CS 471 – Algorithms for Bioinformatics.
The iPlant Collaborative
Stratton Nature 45: 719, 2009 Evolution of DNA sequencing technologies to present day DNA SEQUENCING & ASSEMBLY.
Jan Pačes Institute of Molecular Genetics AS CR
Chromosome 12 M. Pietrella 1, G. Falcone 1, E. Fantini 1, A. Fiore 1, C. Perla 1, M.R. Ercolano 2, A. Barone 2, M.L. Chiusano 2, S. Grandillo 3, N. D’Agostino.
Chromosome 12 M. Pietrella 1, G. Falcone 1, E. Fantini 1, A. Fiore 1, M.R. Ercolano 2, A. Barone 2, M.L. Chiusano 2, S. Grandillo 3, N. D’Agostino 2, A.
RNA-Seq Primer Understanding the RNA-Seq evidence tracks on the GEP UCSC Genome Browser Wilson Leung08/2014.
1.Data production 2.General outline of assembly strategy.
Anna Shcherbina Bioinformatics Challenge Day 01/10/2013 De novo assembly from clinical sample This work is sponsored by the Defense Threat Reduction Agency.
Splicing Exons: A Eukaryotic Challenge to Gene Prediction Ian McCoy.
Sequencing technologies and Velvet assembly Lecturer : Du Shengyang September 29 , 2012.
COMPUTATIONAL GENOMICS GENOME ASSEMBLY
Chapter 5 Sequence Assembly: Assembling the Human Genome.
454 Genome Sequence Assembly and Analysis HC70AL S Brandon Le & Min Chen.
CISC667, S07, Lec4, Liao CISC 667 Intro to Bioinformatics (Spring 2007) Whole genome sequencing Mapping & Assembly.
Structural genomics includes the genetic mapping, physical mapping and sequencing of entire genomes.
JERI DILTS SUZANNA KIM HEMA NAGRAJAN DEEPAK PURUSHOTHAM AMBILY SIVADAS AMIT RUPANI LEO WU Genome Assembly Final Results
Introduction to Illumina Sequencing
RNA-Seq with the Tuxedo Suite Monica Britton, Ph.D. Sr. Bioinformatics Analyst September 2015 Workshop.
Next-generation sequencing technology
RNA-Seq Primer Understanding the RNA-Seq evidence tracks on
Virginia Commonwealth University
PDCB BioC for HTS topic Understanding the tech. 01
CAP5510 – Bioinformatics Sequence Assembly
COMPUTATIONAL GENOMICS GENOME ASSEMBLY
A Fast Hybrid Short Read Fragment Assembly Algorithm
Genome sequence assembly
Next-generation sequencing technology
Sequence comparison: Local alignment
2nd (Next) Generation Sequencing
Padova sequencing contribution:
Count on 2 (Over the bridge)
Assembly of Solexa tomato reads
(Top) Construction of synthetic long read clouds with 10× Genomics technology. (Top) Construction of synthetic long read clouds with 10× Genomics technology.
IWGS workflow. iWGS workflow. A typical iWGS analysis consists of four steps: (1) data simulation (optional); (2) preprocessing (optional); (3) de novo.
Assembling Genomes BCH339N Systems Biology / Bioinformatics – Spring 2016 Edward Marcotte, Univ of Texas at Austin.
Presentation transcript:

GigAssembler

Genome Assembly: A big picture

GigAssembler – Preprocessing 1. Decontaminating & Repeat Masking. 2. Aligning of mRNAs, ESTs, BAC ends & paired reads against initial sequence contigs. psLayout → BLAT 3. Creating an input directory (folder) structure.

The image from Reference: Jones NC, Pevzner PA, Introduction to Bioinformatics Algorithms, MIT press

RepBase + RepeatMasker

GigAssembler: Build merged sequence contigs (“rafts”)

Sequencing quality (Phred Score)

Base-calling Error Probability

GigAssembler: Build merged sequence contigs (“rafts”)

GigAssembler: Build sequenced clone contigs (“barges”)

GigAssembler: Build a “raft-ordering” graph

Add information from mRNAs, ESTs, paired plasmid reads, BAC end pairs: building a “bridge” Different weight to different data type: (mRNA ~ highest) Conflicts with the graph as constructed so far are rejected. Build a sequence path through each raft. Fill the gap with N. 100: between rafts 50,000: between bridged barges

Bellman-Ford algorithm

A B C DE Find the shortest path to all nodes Take every edge and try to relax it (N – 1 times where N is the count of nodes)

A B C DE Find the shortest path to all nodes Take every edge and try to relax it (N – 1 times where N is the count of nodes)

A B C DE Find the shortest path to all nodes START Inf. Take every edge and try to relax it (N – 1 times where N is the count of nodes)

A B C DE Find the shortest path to all nodes START Inf. +7 ( → A) +6 ( → A) Take every edge and try to relax it (N – 1 times where N is the count of nodes)

A B C DE Find the shortest path to all nodes START +7 ( → A) +6 ( → A) +2 ( → B) +4 ( → D) Take every edge and try to relax it (N – 1 times where N is the count of nodes)

A B C DE Find the shortest path to all nodes START +7 ( → A) +2 ( → B) +4 ( → D) +2 ( → C) Take every edge and try to relax it (N – 1 times where N is the count of nodes)

A B C DE Find the shortest path to all nodes START +7 ( → A) +4 ( → D) +2 ( → C) -2 ( → B) Take every edge and try to relax it (N – 1 times where N is the count of nodes)

A B C DE START +7 ( → A) +4 ( → D) +2 ( → C) -2 ( → B) Answer: A-D-C-B-E

Next-generation sequencing

Mardis ER, Annu. Rev. Genomics Hum. Genet., 2008 Illumina

Mardis ER, Annu. Rev. Genomics Hum. Genet., 2008 Illumina

Mardis ER, Annu. Rev. Genomics Hum. Genet., 2008 Roche/454

Mardis ER, Annu. Rev. Genomics Hum. Genet., 2008 SOLiD

An example of single molecule DNA sequencing, from Helicos (approx. 1 billion reads / run) Pushkarev, D., N.F. Neff, and S.R. Quake. Nat Biotechnol (2009) 27, Harris, T.D., et al. Science (2008) 320, 106-9

Mapping program Trapnell C, Salzberg SL, Nat. Biotech., 2009

Two strategies in mapping Trapnell C, Salzberg SL, Nat. Biotech., 2009