John Dorband, Yaacov Yesha, and Ashwin Ganesan Analysis of DNA Sequence Alignment Tools.



Project Goal The goal of our project is to analyze DNA sequence alignment tools such as SHRiMP [1], Bowtie [2], BWA [3], and BFAST [4], to explain why different tools produce different results, and to find ways of improving the tools.

Alignment of Short Reads A common task is aligning short reads of DNA to a reference genome (database). A common technique used by DNA alignment tools is creating a searchable index.
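As an illustration of the searchable-index idea, here is a minimal sketch that assumes a plain hash table of fixed-length substrings (k-mers). This is only one simple form of index and is not the structure used by any particular tool named in these slides; the function names and the value of k are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer of the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_positions(read, index, k):
    """Return reference start positions suggested by exact k-mer hits of the read."""
    candidates = set()
    for offset in range(len(read) - k + 1):
        # .get avoids inserting empty entries into the defaultdict
        for pos in index.get(read[offset:offset + k], ()):
            start = pos - offset  # where the read would begin in the reference
            if start >= 0:
                candidates.add(start)
    return sorted(candidates)


reference = "ACGTACGTTAGCCGATAGGCTA"
index = build_kmer_index(reference, k=4)
hits = candidate_positions("TAGCCGAT", index, k=4)
print(hits)
```

Each candidate position would still need to be verified by a full alignment of the read against the reference at that location.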

Transitions Vs. Transversions As mentioned in [5], transition mutations (A↔G and C↔T) have higher probability than transversion mutations (all other substitutions). [5] utilized this fact to improve DNA alignment. We introduced the following technique: in situations where the mutation rate is sufficiently high compared with the sequencing error rate, use different penalties for transition mismatches and transversion mismatches in algorithms related to the Burrows-Wheeler transform [6], such as those used in Bowtie [2] and BWA [3]. We plan to test our technique.
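The penalty scheme can be sketched as follows. This is a hypothetical illustration of scoring an ungapped alignment, not code from Bowtie or BWA; the penalty values 1 and 2 are assumptions chosen only to make transitions cheaper than transversions.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(a, b):
    """True if a and b differ and are both purines or both pyrimidines."""
    return a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES)


def mismatch_penalty(ref_base, read_base, transition_penalty=1, transversion_penalty=2):
    """Penalty for one aligned base pair (penalty values are assumed, not from the slides)."""
    if ref_base == read_base:
        return 0
    if is_transition(ref_base, read_base):
        return transition_penalty
    return transversion_penalty


def alignment_penalty(ref_segment, read, transition_penalty=1, transversion_penalty=2):
    """Total mismatch penalty of an ungapped alignment of equal-length strings."""
    if len(ref_segment) != len(read):
        raise ValueError("ungapped alignment requires equal lengths")
    return sum(
        mismatch_penalty(r, q, transition_penalty, transversion_penalty)
        for r, q in zip(ref_segment, read)
    )


# A->G is a transition (1), T->A is a transversion (2)
total = alignment_penalty("ACGT", "GCGA")
print(total)
```

With equal penalties the two mismatches would be indistinguishable; with separate penalties a candidate location whose mismatches are transitions ranks above one with the same number of transversions.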

Comparing DNA Alignment Tools Our work also includes comparing several DNA alignment tools. We compared Bowtie and SHRiMP and found that SHRiMP mapped 74.18%, while Bowtie mapped 35.79%. We plan to use simulated data, as was used in [5], to compare the sensitivity and specificity of different DNA alignment tools.
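With simulated data the true origin of every read is known, so sensitivity and specificity can be computed directly. The sketch below assumes one possible convention: a read mapped to the wrong position counts as a false negative, and reads that do not come from the reference have a true position of None. The data layout and names are hypothetical, and other definitions are in use.

```python
def evaluate_mapping(truth, reported, tolerance=0):
    """Compute (sensitivity, specificity) for one tool.

    truth:    read id -> true reference position, or None if the read
              does not originate from the reference.
    reported: read id -> position reported by the tool, or None if unmapped.
    """
    tp = fn = tn = fp = 0
    for read_id, true_pos in truth.items():
        mapped_pos = reported.get(read_id)
        if true_pos is None:
            if mapped_pos is None:
                tn += 1  # correctly left unmapped
            else:
                fp += 1  # mapped although it has no true origin
        elif mapped_pos is not None and abs(mapped_pos - true_pos) <= tolerance:
            tp += 1      # mapped to the correct place
        else:
            fn += 1      # unmapped or mapped to the wrong place
    sensitivity = tp / (tp + fn) if tp + fn else None
    specificity = tn / (tn + fp) if tn + fp else None
    return sensitivity, specificity


truth = {"r1": 100, "r2": 250, "r3": None, "r4": None}
reported = {"r1": 100, "r2": None, "r3": None, "r4": 40}
sensitivity, specificity = evaluate_mapping(truth, reported)
print(sensitivity, specificity)
```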

A Performance Issue At IGS it was found that BWA was performing an enormous number of file opens and closes, which resulted in extremely poor performance. We analyzed the problem and concluded that it is likely caused by file locks imposed by the system. We recommend that the BWA code be checked, and likely modified, to eliminate this problem.

Polymorphism One claimed strength of SHRiMP [1] is handling substantial polymorphism. We plan to use simulated test data that includes substantial polymorphism in addition to sequencing errors. We plan to run SHRiMP and other mapping tools on that data and compare their sensitivity and specificity.
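One way such test data might be generated is sketched below. It assumes that polymorphism is modeled as substitutions applied once to a copy of the reference, and that sequencing errors are substitutions applied independently to each read; insertions and deletions are left out. All names, rates, and the seed are hypothetical.

```python
import random

BASES = "ACGT"


def substitute(sequence, rate, rng):
    """Replace each base with a different base with probability `rate`."""
    out = []
    for base in sequence:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate_reads(reference, n_reads, read_length,
                   polymorphism_rate, error_rate, seed=0):
    """Return a list of (true_start, read) pairs sampled from a mutated reference."""
    rng = random.Random(seed)
    # Polymorphism: one mutated copy of the reference shared by all reads
    donor = substitute(reference, polymorphism_rate, rng)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(donor) - read_length + 1)
        fragment = donor[start:start + read_length]
        # Sequencing errors: independent substitutions in each read
        reads.append((start, substitute(fragment, error_rate, rng)))
    return reads


reference = "ACGTACGTTAGCCGATAGGCTAACCGTTAGC"
reads = simulate_reads(reference, n_reads=5, read_length=10,
                       polymorphism_rate=0.05, error_rate=0.01, seed=42)
for start, read in reads:
    print(start, read)
```

Because the true start of every read is recorded, the output can be fed to several mapping tools and scored against the known positions.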

References
[1] Stephen M. Rumble, Phil Lacroute, Adrian V. Dalca, Marc Fiume, Arend Sidow, Michael Brudno, SHRiMP: Accurate Mapping of Short Color-space Reads, PLoS Computational Biology, May.
[2] Ben Langmead, Cole Trapnell, Mihai Pop and Steven L. Salzberg, Ultrafast and memory-efficient alignment of short DNA sequences to the human genome, Genome Biology.
[3] Heng Li and Richard Durbin, Fast and accurate short read alignment with Burrows–Wheeler transform, Bioinformatics (2009).

References (continued)
[4] Nils Homer, Barry Merriman, Stanley F. Nelson, BFAST: An Alignment Tool for Large Scale Genome Resequencing, PLoS ONE.
[5] Laurent Noé and Gregory Kucherov, Improved hit criteria for DNA local alignment, BMC Bioinformatics 2004, 5:149.
[6] M. Burrows and D.J. Wheeler, A Block-sorting Lossless Data Compression Algorithm, SRC Research Report 124, May 10, 1994, Digital Systems Research Center, 130 Lytton Avenue, Palo Alto, California 94301.

Thank You!