Genome alignment
Usman Roshan
Applications
Genome sequencing on the rise
Whole genome comparison provides a deeper understanding of biology:
  Evolutionary history
  Non-coding regions
  Variant detection
Methods
General two-fold approach:
1. Find high-scoring segments between a pair of genomes
   Similar to BLAST-like k-mer search using hash tables
   Also done with suffix trees
   Similar to short-read mapping strategies
2. Perform constrained alignment between the high-scoring segments
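Step 1 with a hash table can be sketched in Python as follows: a dictionary maps every k-mer of genome A to its start positions, and scanning genome B against it yields seed pairs (position in A, position in B). This is a minimal illustration under stated assumptions, not how BLAST or any particular aligner is implemented; the function names and the choice of k are hypothetical, and a real tool would go on to extend and score each seed to obtain a high-scoring segment.

```python
from collections import defaultdict


def build_kmer_index(genome, k):
    """Hash table: map every k-mer in `genome` to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def find_seeds(genome_a, genome_b, k):
    """Return sorted (pos_in_a, pos_in_b) pairs where the same k-mer occurs in both genomes."""
    index = build_kmer_index(genome_a, k)
    seeds = []
    for j in range(len(genome_b) - k + 1):
        # .get() does not insert missing keys into the defaultdict.
        for i in index.get(genome_b[j:j + k], ()):
            seeds.append((i, j))
    seeds.sort()
    return seeds
```

For example, `find_seeds("ACGTACGGTTAC", "TTACGTACGA", 5)` returns `[(0, 2), (1, 3), (2, 4)]`: three overlapping 5-mers from the shared stretch ACGTACG.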
Longest increasing subsequence
Simple dynamic-programming algorithm takes O(n^2) time, where n is the input size (total numbers in the sequence)
Can be solved in O(n log n) time by keeping, for each length, the smallest value that ends an increasing subsequence of that length (located by binary search), and remembering where the previous longest subsequence ended
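Both bounds can be illustrated in Python. The first function is the quadratic dynamic program; the second keeps, for each length, the index of the smallest value that ends an increasing subsequence of that length, places each new number with binary search (`bisect_left`), and stores a back-pointer to where the previous subsequence ended so the subsequence itself can be recovered. The function names are hypothetical; this is a sketch, and when several longest subsequences exist the two functions may return different ones.

```python
from bisect import bisect_left


def lis_quadratic(seq):
    """O(n^2) DP: best[i] = length of the longest increasing subsequence ending at seq[i]."""
    if not seq:
        return []
    best = [1] * len(seq)
    prev = [-1] * len(seq)
    for i in range(len(seq)):
        for j in range(i):
            if seq[j] < seq[i] and best[j] + 1 > best[i]:
                best[i] = best[j] + 1
                prev[i] = j
    # Walk back from the (first) index where a longest subsequence ends.
    i = max(range(len(seq)), key=best.__getitem__)
    out = []
    while i != -1:
        out.append(seq[i])
        i = prev[i]
    return out[::-1]


def lis_nlogn(seq):
    """O(n log n): tails[k] is the index of the smallest value ending an
    increasing subsequence of length k + 1; tail_vals mirrors it for bisect."""
    tails, tail_vals = [], []
    prev = [-1] * len(seq)
    for i, x in enumerate(seq):
        k = bisect_left(tail_vals, x)  # strictly increasing: replace first value >= x
        if k > 0:
            prev[i] = tails[k - 1]     # where the previous subsequence ended
        if k == len(tails):
            tails.append(i)
            tail_vals.append(x)
        else:
            tails[k] = i
            tail_vals[k] = x
    out = []
    i = tails[-1] if tails else -1
    while i != -1:
        out.append(seq[i])
        i = prev[i]
    return out[::-1]
```

For `[3, 1, 4, 1, 5, 9, 2, 6]` both return a subsequence of length 4: `lis_quadratic` gives `[3, 4, 5, 9]` and `lis_nlogn` gives `[1, 4, 5, 6]`.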
Simple genome alignment
1. Find high-scoring segments with hash tables
2. Line up the high-scoring segments and find the longest increasing subsequence (as in MUMmer)
3. Align between the segments
4. Output the full genome alignment
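The four steps can be strung together in a toy Python sketch. Several choices here are assumptions for illustration, not MUMmer's actual method: anchors are k-mers that occur exactly once in each genome (a rough stand-in for unique matches), the chain is the longest increasing subsequence of the anchors' positions in genome B after sorting by position in genome A, anchors overlapping the previously chained anchor are skipped, and the gaps between anchors are filled by a plain Needleman-Wunsch global alignment with hypothetical scores (match 1, mismatch -1, gap -1). All function names are hypothetical.

```python
from bisect import bisect_left
from collections import Counter


def unique_anchors(a, b, k):
    """Exact k-mer matches (i, j) whose k-mer occurs exactly once in each genome."""
    count_a = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    count_b = Counter(b[j:j + k] for j in range(len(b) - k + 1))
    pos_b = {b[j:j + k]: j for j in range(len(b) - k + 1) if count_b[b[j:j + k]] == 1}
    # Scanning a left to right keeps the anchors sorted by position in a.
    return [(i, pos_b[a[i:i + k]]) for i in range(len(a) - k + 1)
            if count_a[a[i:i + k]] == 1 and a[i:i + k] in pos_b]


def chain(anchors):
    """Longest increasing subsequence of b-positions (anchors already sorted by a-position)."""
    tails, tail_vals, prev = [], [], [-1] * len(anchors)
    for idx, (_, j) in enumerate(anchors):
        k = bisect_left(tail_vals, j)
        if k > 0:
            prev[idx] = tails[k - 1]
        if k == len(tails):
            tails.append(idx)
            tail_vals.append(j)
        else:
            tails[k] = idx
            tail_vals[k] = j
    out, idx = [], (tails[-1] if tails else -1)
    while idx != -1:
        out.append(anchors[idx])
        idx = prev[idx]
    return out[::-1]


def global_align(x, y, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch with a linear gap penalty; returns the two gapped strings."""
    n, m = len(x), len(y)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = i * gap
    for j in range(1, m + 1):
        s[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = match if x[i - 1] == y[j - 1] else mismatch
            s[i][j] = max(s[i - 1][j - 1] + d, s[i - 1][j] + gap, s[i][j - 1] + gap)
    ax, ay, i, j = [], [], n, m
    while i > 0 or j > 0:  # trace back from the bottom-right cell
        if i > 0 and j > 0 and s[i][j] == s[i - 1][j - 1] + (match if x[i - 1] == y[j - 1] else mismatch):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and s[i][j] == s[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return "".join(reversed(ax)), "".join(reversed(ay))


def align_genomes(a, b, k=4):
    """Anchor with unique k-mers, chain anchors by LIS, align the gaps between them."""
    top, bottom = [], []
    last_i = last_j = 0  # end (exclusive) of the region already emitted
    for i, j in chain(unique_anchors(a, b, k)):
        if i < last_i or j < last_j:
            continue  # overlaps the previous anchor; the gap alignment covers it
        gx, gy = global_align(a[last_i:i], b[last_j:j])
        top += [gx, a[i:i + k]]
        bottom += [gy, b[j:j + k]]
        last_i, last_j = i + k, j + k
    gx, gy = global_align(a[last_i:], b[last_j:])
    top.append(gx)
    bottom.append(gy)
    return "".join(top), "".join(bottom)
```

Removing the `-` characters from either output row of `align_genomes` gives back the corresponding input genome, which is a quick sanity check on the stitching.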
Programs and experimental comparison
Alignathon (a competitive assessment of whole-genome alignment methods)
Exact methods
What if we used Smith-Waterman or another exact method to find high-scoring segments?
Results on simulated data
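For comparison with the seed-based approach in the Methods step, here is a minimal Smith-Waterman sketch in Python that finds the single highest-scoring local segment pair. It fills the full (n+1) x (m+1) score table, so time and memory grow with the product of the two sequence lengths. The linear gap penalty and the scores (match 2, mismatch -1, gap -2) are hypothetical choices for illustration; practical tools typically use affine gap penalties.

```python
def smith_waterman(x, y, match=2, mismatch=-1, gap=-2):
    """Best local alignment of x and y with a linear gap penalty.
    Returns (score, (x_start, x_end), (y_start, y_end)) as half-open intervals."""
    n, m = len(x), len(y)
    h = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = match if x[i - 1] == y[j - 1] else mismatch
            # Local alignment: scores are floored at zero so a segment can start anywhere.
            h[i][j] = max(0, h[i - 1][j - 1] + d, h[i - 1][j] + gap, h[i][j - 1] + gap)
            if h[i][j] > best:
                best, bi, bj = h[i][j], i, j
    # Trace back from the best cell until the score drops to zero.
    i, j = bi, bj
    while i > 0 and j > 0 and h[i][j] > 0:
        d = match if x[i - 1] == y[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + d:
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i, bi), (j, bj)
```

For example, `smith_waterman("TTTACGTAAA", "GGACGTGG")` returns `(8, (3, 7), (2, 6))`, the shared segment ACGT.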