Published by Adam Davidson. Modified over 9 years ago.
1
Aligning Reads
Ramesh Hariharan
Strand Life Sciences, IISc
2
What is Read Alignment?
3
Subject's genome:  AGGCTACGCATTTCCCATAAAGACCCACGCTTAAGTTC
Reference genome:  AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC
Reads come from the subject's genome. Where do they match in the reference, which is close to, but not quite the same as, the subject's genome?
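To make "close but not quite the same" concrete, here is a minimal sketch that lists the 0-based positions where the two sequences on this slide disagree. It assumes both sequences are the same length and compares them base by base; the function name `differing_positions` is hypothetical.

```python
# Subject and reference sequences copied from the slide.
SUBJECT = "AGGCTACGCATTTCCCATAAAGACCCACGCTTAAGTTC"
REFERENCE = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"


def differing_positions(a: str, b: str) -> list[int]:
    """0-based positions where two equal-length sequences disagree."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


diffs = differing_positions(SUBJECT, REFERENCE)
print(diffs)                                              # [11, 20, 28]
print(f"{len(diffs)} of {len(REFERENCE)} bases differ")  # 3 of 38 bases differ
```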
4
What does “Match” mean?
5
Reference genome:  AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC
GCTACGCA: exact match
CATAAAGAC: match with a mismatch (A where the reference has T)
CACTT_AGT: match with a gap (_ marks a base missing from the read)
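The three kinds of match can be told apart with two simple distance measures. The sketch below is an illustration under simple assumptions, not how any particular aligner scores reads: mismatches are counted with Hamming distance over ungapped windows, and gaps are handled with unit-cost edit distance in a semi-global setting (the whole read must align, but it may start and end anywhere in the reference). The function names are hypothetical.

```python
REFERENCE = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"


def hamming_hits(reference: str, read: str, max_mismatches: int) -> list[tuple[int, int]]:
    """Ungapped placements: (start, mismatches) for every window within the budget."""
    m = len(read)
    hits = []
    for start in range(len(reference) - m + 1):
        mismatches = sum(a != b for a, b in zip(reference[start:start + m], read))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


def gapped_cost(reference: str, read: str) -> int:
    """Fewest edits (mismatches plus single-base gaps) to place the whole read in the reference.

    Semi-global dynamic programming: the read is used end to end,
    but the alignment may start and stop anywhere in the reference.
    """
    prev = [0] * (len(reference) + 1)  # empty read prefix: zero cost at any reference offset
    for i, r in enumerate(read, start=1):
        cur = [i] + [0] * len(reference)  # read prefix against empty reference: i gaps
        for j, g in enumerate(reference, start=1):
            cur[j] = min(
                prev[j - 1] + (r != g),  # match or mismatch
                prev[j] + 1,             # extra base in the read
                cur[j - 1] + 1,          # base missing from the read
            )
        prev = cur
    return min(prev)


print(hamming_hits(REFERENCE, "GCTACGCA", 0))   # [(2, 0)]: exact match
print(hamming_hits(REFERENCE, "CATAAAGAC", 1))  # [(15, 1)]: one mismatch
print(gapped_cost(REFERENCE, "CACTTAGT"))       # 1: one base missing from the read
```

Hamming distance alone cannot place the third read well, because a single missing base shifts every later base out of register; that is why gaps need the dynamic-programming formulation.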
6
Why mismatches and gaps?
7
The subject genome could be different from the reference
8
[Figure: reads aligned to the reference genome; a SNP shows up as a mismatch and a deletion as a gap]
9
The reading process could be erroneous
10
How many mismatches and gaps?
11
Short reads (~50 bases): few mismatches and gaps
Long reads (~1000 bases): many more mismatches and gaps
12
How do aligners fare?
13
BWA: very few mismatches and gaps
CoBWeb
BWA-SW: many mismatches and gaps
BowTie: only mismatches, no gaps
No paired-read handling
No handling of adaptor trimming for small RNA
Separate handling for RNASeq
BowTie2
14
How does an Aligner work?
15
For simplicity, assume Exact Match
16
For each read, scan the entire reference genome sequence: very slow!
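A minimal sketch of this naive approach, shown only to make the cost visible: every offset of the reference is compared with the read, so each read costs on the order of len(reference) × len(read) character comparisons. The function name is hypothetical.

```python
def naive_exact_matches(reference: str, read: str) -> list[int]:
    """Compare the read against every offset of the reference.

    Worst case about len(reference) * len(read) character comparisons per read,
    which is what makes this impractical for a whole genome and millions of reads.
    """
    m = len(read)
    return [i for i in range(len(reference) - m + 1) if reference[i:i + m] == read]


REFERENCE = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"
print(naive_exact_matches(REFERENCE, "GCTACGCA"))  # [2]
print(naive_exact_matches(REFERENCE, "CCCA"))      # [13, 23]
```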
17
[Figure: the reference (example: CGACG) and a tree index built over it]
Index the reference.
18
How can we find Exact Matches of a read quickly with this index?
19
[Figure: the read CG is looked up by following its characters down the tree built over the reference CGACG]
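The slides do not say exactly which tree is used; a suffix trie (every suffix of the reference inserted character by character) is assumed here as the simplest such tree. In this sketch each node records the start positions of the suffixes that pass through it, so an exact-match lookup is a walk of len(read) steps from the root, independent of genome size. The function names are hypothetical.

```python
def build_suffix_trie(reference: str) -> dict:
    """Insert every suffix of the reference into a character trie.

    Each node remembers the start positions of the suffixes passing through it,
    so the node reached by a read lists where the read occurs.
    """
    root = {"children": {}, "starts": []}
    for start in range(len(reference)):
        node = root
        for base in reference[start:]:
            node = node["children"].setdefault(base, {"children": {}, "starts": []})
            node["starts"].append(start)
    return root


def find_exact(trie: dict, read: str) -> list[int]:
    """Walk down one edge per read character; cost depends on the read, not the genome."""
    node = trie
    for base in read:
        node = node["children"].get(base)
        if node is None:
            return []  # the read falls off the tree: no exact match
    return node["starts"]


def count_nodes(node: dict) -> int:
    """Total number of nodes, a rough proxy for the index's memory footprint."""
    return 1 + sum(count_nodes(child) for child in node["children"].values())


trie = build_suffix_trie("CGACG")
print(find_exact(trie, "CG"))  # [0, 3]
print(find_exact(trie, "GA"))  # [1]
print(find_exact(trie, "GT"))  # []
print(count_nodes(trie))       # 13
```

Speed is not the issue here; size is. A plain suffix trie can need a number of nodes quadratic in the reference length, and compacted suffix trees bring that down to linear, but with a large per-base constant.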
20
The problem: this tree index takes 24 GB of memory.
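The slide gives only the figure. As a rough consistency check, assume a human-sized reference of about $3\times10^9$ bases and a pointer-based tree index costing on the order of 8 bytes per base. The 8 bytes/base constant is an assumption; real suffix-tree implementations vary.

$$
3\times10^{9}\ \text{bases} \times 8\ \tfrac{\text{bytes}}{\text{base}} = 2.4\times10^{10}\ \text{bytes} \approx 24\ \text{GB}
$$

For comparison, the bases alone at 2 bits each need $3\times10^{9}\times 2/8 = 7.5\times10^{8}$ bytes, about 0.75 GB. That gap is what makes compressing the index worthwhile.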
21
Can this structure be compressed?
22
The Burrows-Wheeler based index
The reference (example: CGAC, with end marker $) and all its circular shifts, sorted lexicographically:
$CGAC
AC$CG
C$CGA
CGAC$
GAC$C
The last column (CGA$C) is the BWT.
The index is now an array instead of a tree.
It is sampled to reduce memory at the expense of speed (Ferragina and Manzini).
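A toy sketch of a BWT-based (FM) index, assuming the textbook construction: sort the circular shifts, take the last column as the BWT, and answer exact-match queries by backward search over the count table `C` and occurrence counts `occ`. To illustrate the sampling idea, occurrence counts are stored only at every `step`-th row, and the rest are recovered by scanning a few BWT characters; the full suffix array is kept here for simplicity, whereas practical FM-indexes sample that too. The class and method names are hypothetical.

```python
from collections import Counter


class FMIndex:
    """Toy Burrows-Wheeler (FM) index with sampled occurrence counts."""

    def __init__(self, reference: str, step: int = 2):
        text = reference + "$"  # '$' marks the end and sorts before A, C, G, T
        n = len(text)
        # Sort all circular shifts; each shift's start offset gives the suffix array.
        self.sa = sorted(range(n), key=lambda i: text[i:] + text[:i])
        # The BWT is the last column of the sorted shifts.
        self.bwt = "".join(text[i - 1] for i in self.sa)
        # C[c] = number of characters in the text smaller than c.
        self.C, total = {}, 0
        for c, count in sorted(Counter(text).items()):
            self.C[c] = total
            total += count
        # Sampling: counts of bwt[:i] are stored only for i = 0, step, 2*step, ...
        self.step = step
        self.checkpoints = []
        running = Counter()
        for i in range(n + 1):
            if i % step == 0:
                self.checkpoints.append(dict(running))
            if i < n:
                running[self.bwt[i]] += 1

    def occ(self, c: str, i: int) -> int:
        """Occurrences of c in bwt[:i]: nearest stored checkpoint plus a short scan."""
        k = i // self.step
        return self.checkpoints[k].get(c, 0) + self.bwt[k * self.step:i].count(c)

    def locate(self, read: str) -> list[int]:
        """Backward search: extend the match one character at a time, right to left."""
        lo, hi = 0, len(self.bwt)
        for c in reversed(read):
            if c not in self.C:
                return []
            lo = self.C[c] + self.occ(c, lo)
            hi = self.C[c] + self.occ(c, hi)
            if lo >= hi:
                return []
        return sorted(self.sa[lo:hi])


index = FMIndex("CGAC")
print(index.bwt)          # CGA$C
print(index.locate("C"))  # [0, 3]
print(index.locate("AC")) # [2]
```

A larger `step` stores fewer checkpoints but scans more BWT characters per `occ` call, which is the memory-for-speed trade the slide refers to.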
23
How about Mismatches and Gaps?
24
BWA, BWA-SW and BowTie build mismatches and gaps directly into the BW-index search procedure.
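One way to build mismatches into the index search is backtracking. At each step of backward search, the sketch below also tries substituting the other bases while a mismatch budget remains. This is a simplified illustration of that general idea, covering mismatches only (no gaps); it is not the actual BWA or BowTie code, and it keeps full (unsampled) occurrence tables for brevity. The function names are hypothetical.

```python
def build_fm(reference: str):
    """Suffix array, C table and full occurrence table for reference + '$'."""
    text = reference + "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = [text[i - 1] for i in sa]
    alphabet = sorted(set(text))
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    occ = {c: [0] for c in alphabet}  # occ[c][i] = count of c in bwt[:i]
    for b in bwt:
        for c in alphabet:
            occ[c].append(occ[c][-1] + (b == c))
    return sa, C, occ


def search_with_mismatches(reference: str, read: str, max_mismatches: int) -> list[int]:
    """Backtracking backward search that substitutes other bases while budget remains."""
    sa, C, occ = build_fm(reference)
    hits = set()

    def extend(pos: int, lo: int, hi: int, budget: int) -> None:
        if lo >= hi:
            return  # no reference suffix starts with this (possibly modified) read suffix
        if pos < 0:
            hits.update(sa[lo:hi])  # whole read consumed: record every position
            return
        for c in "ACGT":
            if c not in C:
                continue
            cost = 0 if c == read[pos] else 1
            if cost > budget:
                continue
            extend(pos - 1, C[c] + occ[c][lo], C[c] + occ[c][hi], budget - cost)

    extend(len(read) - 1, 0, len(sa), max_mismatches)
    return sorted(hits)


REFERENCE = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"
print(search_with_mismatches(REFERENCE, "CATAAAGAC", 0))  # []
print(search_with_mismatches(REFERENCE, "CATAAAGAC", 1))  # [15]
```

The number of branches explored grows quickly with the mismatch budget and the read length, which fits the slide's point that this style suits reads with few differences.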
25
CoBWeb uses the BW index to find a 'seed' exact match and then runs Smith-Waterman around this seed.
This 15-mer occurs at locations x1, x2…
This 15-mer occurs at locations x3, x4…
This whole 30-mer occurs at location x5
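A minimal sketch of the seeding step, under stated assumptions. The slide looks seeds up in the BW index; to keep this short, a plain hash table of k-mers stands in for it, and seeds are tiny (k = 4) instead of 15-mers. Seeds are combined by voting on the read start each one implies (seed location minus its offset in the read). That voting rule is an assumption for illustration, not necessarily what CoBWeb does. The function names are hypothetical.

```python
from collections import defaultdict

REFERENCE = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"


def kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of the reference to its start positions (stand-in for the BW index)."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def candidate_starts(reference: str, read: str, k: int) -> list[tuple[int, int]]:
    """Look up non-overlapping k-mer seeds and vote on the read start each one implies.

    Returns (implied start, number of agreeing seeds), most-supported first.
    """
    index = kmer_index(reference, k)
    votes = defaultdict(int)
    for offset in range(0, len(read) - k + 1, k):
        for pos in index.get(read[offset:offset + k], []):
            votes[pos - offset] += 1
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


print(candidate_starts(REFERENCE, "GCTACGCATGTC", 4))  # [(2, 3)]: all three seeds agree
print(candidate_starts(REFERENCE, "CACTTAGT", 4))      # [(27, 1)]: only the first seed survives the gap
```

The second read shows why seeding alone is not enough: the gap breaks the second seed, so the candidate location must then be checked with a gapped alignment such as Smith-Waterman.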
26
Dynamic Programming
Given a location in the reference with a read anchor, how well does the read match there?
[Figure: reference and read, joined by a 14-mer anchor]
Smith-Waterman (optimized for large gaps)
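The slide does not say how CoBWeb's Smith-Waterman is optimized for large gaps. One standard way to make long gaps affordable is an affine gap penalty (Gotoh's recurrences), where opening a gap is expensive but extending it is cheap. The sketch below assumes that scheme and illustrative scores (match +2, mismatch −3, gap open 5, gap extend 1). It returns only the best local score, without a traceback. The function name is hypothetical.

```python
NEG_INF = float("-inf")


def smith_waterman_affine(ref: str, read: str, match: int = 2, mismatch: int = -3,
                          gap_open: int = 5, gap_extend: int = 1) -> int:
    """Best local alignment score with affine gaps (Gotoh's recurrences).

    A gap of length L costs gap_open + (L - 1) * gap_extend, so one long gap
    is much cheaper than many short ones.
    """
    n, m = len(ref), len(read)
    H = [[0] * (m + 1) for _ in range(n + 1)]        # best score ending at (i, j)
    E = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with a reference base opposite a gap
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with a read base opposite a gap
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i - 1][j] - gap_open, E[i - 1][j] - gap_extend)
            F[i][j] = max(H[i][j - 1] - gap_open, F[i][j - 1] - gap_extend)
            diag = H[i - 1][j - 1] + (match if ref[i - 1] == read[j - 1] else mismatch)
            H[i][j] = max(0, diag, E[i][j], F[i][j])  # 0 lets a local alignment restart
            best = max(best, H[i][j])
    return best


# Reference window around a hypothetical anchor: the read is missing an 8-base block.
window = "ACGTACGT" + "T" * 8 + "GGCATGCA"
read = "ACGTACGTGGCATGCA"
print(smith_waterman_affine(window, read))  # 20
```

Here the best alignment keeps all 16 read bases and one 8-base gap: 16 × 2 − (5 + 7 × 1) = 20, which beats aligning only part of the read.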
27
Comparison with BWA
Read length 50 and read length 150
CoBWeb is 20% faster than BWA, with comparable results
CoBWeb: 3 mismatches and 2 gaps
BWA: 2 mismatches + 1 gap, possibly several bases long
28
Comparison with BWA-SW
Read length 400, 8 mismatches plus 10 gaps
Reads: 1M
Time taken: CoBWeb 1130 s, BWA-SW 2242 s
Incorrectly mapped (CoBWeb, BWA-SW): 125989819 (the two values are run together in the source)
5650 were mapped incorrectly by BWA-SW; the remainder have poor BWA mapping quality.
29
Avadis NGS
30
Alignment, DNA Variant Detection, RNASeq, ChIPSeq, Small RNASeq
31
Thank You