Short read alignment BNFO 601
Short read alignment
Input:
–Reads: short DNA sequences (up to a few hundred base pairs (bp)) produced by a sequencing machine. Reads are fragments of a longer DNA sequence present in the sample given as input to the machine, and there are usually millions of them.
–Genome sequence: a reference DNA sequence much longer than the read length
Short read alignment
Applications:
–Genome assembly
–RNA splicing studies
–Gene expression studies
–Discovery of new genes
–Discovery of cancer-causing mutations
Short read alignment
Two approaches:
–Hashing-based algorithms: BFAST, SHRIMP, MAQ, STAMPY (statistical alignment)
–Burrows-Wheeler transform: Bowtie, BWA
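The two approaches can be contrasted with a toy sketch. This is not how BFAST, Bowtie, or any of the listed tools is implemented; it only illustrates the underlying ideas on a tiny reference. All function names (build_kmer_index, hash_align, bwt, bwt_count) and the parameters k and max_mismatches are made up for illustration. The hashing part indexes every k-mer of the reference and verifies candidate positions by counting mismatches; the Burrows-Wheeler part counts exact occurrences of a pattern by backward search.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer of the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def hash_align(read, reference, index, k, max_mismatches=1):
    """Seed with the read's first k-mer, then verify by counting mismatches."""
    hits = []
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # read would run off the end of the reference
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (slow, demo only)."""
    text += "$"  # unique end marker that sorts before A, C, G, T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def bwt_count(pattern, bwt_string):
    """Count exact occurrences of pattern using backward search on the BWT."""
    first_col = sorted(bwt_string)
    # first_row[c] = index of the first sorted rotation that starts with c
    first_row = {}
    for i, c in enumerate(first_col):
        first_row.setdefault(c, i)

    def occ(c, i):
        # Number of occurrences of c in the first i characters of the BWT.
        return bwt_string[:i].count(c)

    lo, hi = 0, len(bwt_string)  # half-open range of matching rotations
    for c in reversed(pattern):
        if c not in first_row:
            return 0
        lo = first_row[c] + occ(c, lo)
        hi = first_row[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


reference = "ACGTACGTTAGC"
index = build_kmer_index(reference, 4)
print(hash_align("ACGTTA", reference, index, 4))
print(bwt_count("ACGT", bwt(reference)))
```

The hash index trades memory for simple lookups and tolerates mismatches outside the seed. The BWT version here handles only exact matches; real BWT aligners add compressed occurrence tables and mismatch handling that this sketch omits.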
Courtesy of Nature Biotechnology 27, (2009)
BFAST overview PLoS ONE 4(11): e7767.
BFAST algorithm PLoS ONE 4(11): e7767.
BFAST masked keys
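A minimal sketch of the masked-key idea, assuming a mask is a string of 1s and 0s in which only the positions marked 1 contribute to the index key (a spaced seed). The mask "1101", the function names, and the single-index layout are hypothetical choices for illustration and are not taken from the BFAST paper. The point is that a read with a mismatch at a masked-out position still produces the same key as the reference.

```python
def apply_mask(sequence, mask):
    """Keep only the bases at positions where the mask has a '1'."""
    return "".join(base for base, m in zip(sequence, mask) if m == "1")


def build_masked_index(reference, mask):
    """Index every window of the reference by its masked key."""
    index = {}
    width = len(mask)
    for pos in range(len(reference) - width + 1):
        key = apply_mask(reference[pos:pos + width], mask)
        index.setdefault(key, []).append(pos)
    return index


def lookup(read, index, mask):
    """Return candidate reference positions for the start of the read."""
    return index.get(apply_mask(read[:len(mask)], mask), [])


reference = "ACGTACGTTAGC"
mask = "1101"  # hypothetical mask: the third base is ignored
index = build_masked_index(reference, mask)

# "ACTT" differs from the reference window "ACGT" only at the masked-out base,
# so it still finds both occurrences of that window.
print(lookup("ACTT", index, mask))
```

Using several different masks, each with its own index, lets a read be found as long as at least one mask happens to skip over its mismatched positions.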
Short read alignment
Empirical performance:
Simulated data:
–Extract random substrings of fixed length with random mutations and gaps
–Realign them back to the reference genome
Real data:
–Paired reads: two ends of the same sequence
–Count the number of paired reads within 500 to bases of each other
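Both evaluations can be sketched in a few lines. This is a simplified illustration, not the benchmark used in the cited papers: the simulated reads get substitutions only (no gaps), realignment is a brute-force Hamming scan standing in for a real aligner, and the distance bounds for paired reads are left as parameters. All names and the numeric values in the usage lines are hypothetical.

```python
import random


def simulate_reads(reference, read_len, n_reads, mutation_rate, rng):
    """Extract random fixed-length substrings and mutate bases at random.

    Returns a list of (true_start, read) pairs. Substitutions only.
    """
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(reference) - read_len + 1)
        bases = list(reference[start:start + read_len])
        for i, base in enumerate(bases):
            if rng.random() < mutation_rate:
                bases[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append((start, "".join(bases)))
    return reads


def best_hamming_position(read, reference):
    """Brute-force stand-in for an aligner: first position with fewest mismatches."""
    best_pos, best_dist = -1, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        dist = sum(a != b for a, b in zip(read, window))
        if dist < best_dist:
            best_pos, best_dist = pos, dist
    return best_pos


def fraction_realigned(reads, reference):
    """Fraction of simulated reads placed back at their true start position."""
    correct = sum(best_hamming_position(read, reference) == start
                  for start, read in reads)
    return correct / len(reads)


def count_concordant_pairs(pair_positions, min_dist, max_dist):
    """Count read pairs whose two ends map within [min_dist, max_dist] bases."""
    return sum(min_dist <= abs(a - b) <= max_dist for a, b in pair_positions)


rng = random.Random(0)
reference = "".join(rng.choice("ACGT") for _ in range(300))
reads = simulate_reads(reference, read_len=25, n_reads=20, mutation_rate=0.05, rng=rng)
print(fraction_realigned(reads, reference))

# Mapped positions of the two ends of three hypothetical pairs; the bounds
# 500 and 1000 are arbitrary example values.
print(count_concordant_pairs([(0, 600), (0, 100), (1000, 300)], 500, 1000))
```

On simulated data the true origin of each read is known, so accuracy can be measured directly. On real data it is not, which is why the fraction of pairs mapping at a plausible distance from each other serves as a proxy.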
Short read alignment Courtesy of Genome Res. June : ;