Aligning Reads Ramesh Hariharan Strand Life Sciences IISc.


What is Read Alignment?

Subject’s Genome: AGGCTACGCATTTCCCATAAAGACCCACGCTTAAGTTC
Reference Genome: AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC
The Reference Genome is close to, but not quite the same as, the Subject’s Genome.
Where do these match in the Reference?

What does “Match” mean?

Reference Genome: AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC
Exact Match: GCTACGCA
With Mismatches: CATAAAGAC
With Gaps: CACTT_AGT
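The distinction between an exact match and a match with mismatches can be checked by hand. The following is a minimal sketch, assuming 0-based positions and a hypothetical helper name `count_mismatches` (neither is from the slides); it compares a read against the reference at one given position and does not handle gaps.

```python
REFERENCE = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"  # reference genome from the slide


def count_mismatches(reference: str, read: str, start: int) -> int:
    """Count positions where read differs from reference[start:start + len(read)]."""
    window = reference[start:start + len(read)]
    if len(window) != len(read):
        raise ValueError("read runs past the end of the reference")
    return sum(1 for r, w in zip(read, window) if r != w)


# GCTACGCA lines up with the reference at position 2 with no mismatches.
print(count_mismatches(REFERENCE, "GCTACGCA", 2))
# CATAAAGAC lines up at position 15 against CATAATGAC, differing in one base.
print(count_mismatches(REFERENCE, "CATAAAGAC", 15))
```

A gapped read such as CACTT_AGT cannot be scored this way, because the bases after the gap shift relative to the reference; that case needs an alignment method that allows insertions and deletions.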

Why mismatches and gaps?

The subject genome could be different from the reference

[Figure: Reads aligned against the Reference Genome, showing a SNP and a Deletion (Mismatches and Gaps)]

The reading process could be erroneous

How many mismatches and gaps?

Short reads, ~50: few mismatches and gaps
Long reads, ~1000: many more mismatches and gaps

How do aligners fare?

BWA: Very few mismatches and gaps
CoBWeb
BWA-SW: Many mismatches and gaps
BowTie: only mismatches, no gaps
No paired read handling
No handling of adaptor trimming for small RNA
Separate handling for RNASeq
BowTie2

How does an Aligner work?

For simplicity, assume Exact Match

For each read, scan the entire reference genome sequence SLOW!!!!
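The brute-force scan can be written in a few lines. This is a minimal sketch with a hypothetical function name, `naive_exact_matches`; it is shown only to make the cost visible: every read is compared against every position of the reference, so the work grows with (reference length) × (read length) × (number of reads).

```python
def naive_exact_matches(reference: str, read: str) -> list[int]:
    """Return every 0-based position where read occurs exactly in reference."""
    positions = []
    # Slide the read along the reference one base at a time.
    for start in range(len(reference) - len(read) + 1):
        if reference[start:start + len(read)] == read:
            positions.append(start)
    return positions


reference = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"
print(naive_exact_matches(reference, "GCTACGCA"))  # prints [2]
```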

The Reference: CGACG
Index the Reference
[Figure: tree index built over the reference, with edges labeled by bases A, C, G, T]

How can we find Exact Matches of a read quickly with this index?

The Reference: CGACG
[Figure: the same tree index of the reference, with the labels CG and C marked]
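One way to picture a tree index is an uncompressed suffix trie. The sketch below is an assumption about the kind of structure meant, not the slides' exact data structure; the names `Node`, `build_suffix_trie` and `find` are hypothetical. Each node records the start positions of the suffixes that pass through it, so a read is located by walking one edge per base, independent of the reference length.

```python
class Node:
    """One trie node: outgoing edges by base, plus suffix start positions."""

    def __init__(self) -> None:
        self.children: dict[str, "Node"] = {}
        self.starts: list[int] = []


def build_suffix_trie(reference: str) -> Node:
    """Insert every suffix of the reference into the trie."""
    root = Node()
    for start in range(len(reference)):
        node = root
        for base in reference[start:]:
            node = node.children.setdefault(base, Node())
            node.starts.append(start)
    return root


def find(root: Node, read: str) -> list[int]:
    """Walk the read from the root; return all positions where it occurs."""
    node = root
    for base in read:
        if base not in node.children:
            return []
        node = node.children[base]
    return node.starts


trie = build_suffix_trie("CGACG")
print(find(trie, "CG"))  # prints [0, 3]
```

This naive trie needs space quadratic in the reference length, so it only works for toy inputs; the size of the tree is exactly what makes a tree index over a whole genome expensive.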

The problem: 24GB

Can this structure be compressed?

The Burrows-Wheeler based Index
The Reference: CGAC$
All its circular shifts, sorted lexicographically:
AC$CG
CGAC$
C$CGA
GAC$C
$CGAC
The last column (G$ACC) is the BWT
The Index: now an array instead of a tree
Sampled to reduce memory at the expense of speed (Ferragina and Manzini)
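The construction and the exact-match search over the BWT can be sketched as follows. This is a toy illustration under stated assumptions: the ordering `ACGT$` (with `$` sorting last) is chosen to reproduce the row order on the slide, the function names are hypothetical, and nothing is sampled here. The full position array is kept and occurrence counts are recomputed by scanning, whereas a practical index stores sampled versions of both to trade speed for memory.

```python
ALPHABET = "ACGT$"  # assumed sort order: '$' last, matching the slide's rows
RANK = {c: i for i, c in enumerate(ALPHABET)}


def build_bwt(text: str) -> tuple[str, list[int]]:
    """Sort all circular shifts of text (which must end in '$').

    Returns the BWT (last column) and the start position of each sorted shift.
    """
    order = sorted(range(len(text)),
                   key=lambda i: [RANK[c] for c in text[i:] + text[:i]])
    bwt = "".join(text[i - 1] for i in order)  # character preceding each shift
    return bwt, order


def backward_search(read: str, bwt: str, order: list[int]) -> list[int]:
    """Find all exact occurrences of read by extending it right to left."""
    first, total = {}, 0
    for c in ALPHABET:  # first[c] = number of characters sorting before c
        first[c] = total
        total += bwt.count(c)
    lo, hi = 0, len(bwt)  # current range of sorted shifts
    for c in reversed(read):
        lo = first[c] + bwt[:lo].count(c)
        hi = first[c] + bwt[:hi].count(c)
        if lo >= hi:
            return []
    return sorted(order[lo:hi])


bwt, order = build_bwt("CGAC$")
print(bwt)                              # prints G$ACC
print(backward_search("C", bwt, order))  # prints [0, 3]
print(backward_search("AC", bwt, order))  # prints [2]
```

Each step of the search narrows a range of rows in the sorted array, so the time to match a read depends on the read length rather than on the reference length.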

How about Mismatches and Gaps?

BWA, BWA-SW and BowTie force mismatches and gaps into the BW Index searching procedure

CoBWeb uses the BW Index to find a ‘seed’ exact match and does Smith-Waterman around this seed
This 15-mer occurs at locations x1, x2…
This 15-mer occurs at locations x3, x4…
This whole 30-mer occurs at location x5
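The seeding step can be illustrated with a small sketch. It is not CoBWeb's implementation: a plain dictionary of k-mers stands in for the BW Index, k = 4 replaces the 15-mers because the example reference is short, and the names `build_kmer_index` and `candidate_starts` are hypothetical. Each seed hit votes for the read start position it implies; positions supported by several seeds are the strongest candidates for the later dynamic-programming step.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of the reference to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_starts(read: str, index: dict[str, list[int]], k: int) -> Counter:
    """Look up non-overlapping k-mer seeds; vote for the implied read start."""
    votes = Counter()
    for offset in range(0, len(read) - k + 1, k):
        seed = read[offset:offset + k]
        for pos in index.get(seed, []):
            votes[pos - offset] += 1
    return votes


ref = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"
kmer_index = build_kmer_index(ref, 4)
# Both seeds of an exact read agree on one start position.
print(candidate_starts("GCTACGCA", kmer_index, 4))
# A read with a mismatch in its second seed is still anchored by the first,
# here at two candidate locations.
print(candidate_starts("CCCATAAA", kmer_index, 4))
```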

Dynamic Programming
Given a location in the reference with a read anchor, how well does the read match here?
[Figure: Reference and Read, joined at a 14-mer Anchor]
Smith-Waterman (optimized for large gaps)
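For reference, the textbook Smith-Waterman recurrence looks like this. It is a minimal sketch, not the large-gap-optimized variant named on the slide: the scoring values (match +2, mismatch -1, gap -2) are illustrative assumptions, the gap penalty is linear, and only the best local score is returned, not the alignment itself.

```python
def smith_waterman_score(read: str, window: str,
                         match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between read and a reference window."""
    prev = [0] * (len(window) + 1)  # previous row of the DP table
    best = 0
    for i in range(1, len(read) + 1):
        curr = [0] * (len(window) + 1)
        for j in range(1, len(window) + 1):
            same = read[i - 1] == window[j - 1]
            diagonal = prev[j - 1] + (match if same else mismatch)
            # A cell never drops below 0: a local alignment may restart anywhere.
            curr[j] = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Read with one mismatch against the reference window around its seed.
print(smith_waterman_score("CATAAAGAC", "CATAATGAC"))  # prints 15
```

Running this only on the window around each seed location, rather than on the whole reference, is what keeps the quadratic cost of the dynamic programming affordable.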

Comparison with BWA
Read Length 50
Read Length
% faster than BWA with comparable results
CoBWeb: 3 mismatches and 2 gaps
BWA: 2 mismatches + 1 gap of possibly multiple length

Comparison with BWA-SW
Read Length mismatches plus 10 gaps
                      CoBWeb    BWA-SW
Reads                 1m
Time taken            1130s     2242s
Incorrectly Mapped
mapped incorrectly by BWA-SW
The remainder has poor BWA mapping quality

Avadis NGS

Alignment, DNA Var Detection, RNASeq, ChIPSeq, Small RNASeq

Thank You