Sequence Alignment
CS262 Lecture 3, Win06, Batzoglou

Sequence Alignment
-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC
Definition
Given two strings $x = x_1 x_2 \ldots x_M$ and $y = y_1 y_2 \ldots y_N$, an alignment is an assignment of gaps to positions $0, \ldots, M$ in x and $0, \ldots, N$ in y, so as to line up each letter in one sequence with either a letter or a gap in the other sequence.
The example above aligns
x = AGGCTATCACCTGACCTCCAGGCCGATGCCC
y = TAGCTATCACGACCGCGGTCGATTTGCCCGAC
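To make the definition concrete, here is a minimal Python sketch that checks the example alignment and scores it column by column. The scoring scheme (match +1, mismatch -1, gap -2) and the names `alignment_score` and `ungapped` are hypothetical choices; the slide does not fix a scoring scheme at this point.

```python
def alignment_score(top: str, bottom: str,
                    match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Score two gapped rows of equal length, one column at a time (linear gap cost)."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    score = 0
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            raise ValueError("a column cannot pair a gap with a gap")
        if a == "-" or b == "-":
            score += gap          # letter lined up with a gap
        elif a == b:
            score += match        # letter lined up with the same letter
        else:
            score += mismatch     # letter lined up with a different letter
    return score


def ungapped(row: str) -> str:
    """Recover the original string by deleting the gap symbols."""
    return row.replace("-", "")


top = "-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---"
bottom = "TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC"
print(ungapped(top))                 # AGGCTATCACCTGACCTCCAGGCCGATGCCC
print(alignment_score(top, bottom))  # 0
```

With these parameters the example has 24 matching columns, 2 mismatches and 11 gap columns, so it scores 24 - 2 - 22 = 0.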

The Needleman-Wunsch Algorithm
1. Initialization.
   a. $F(0, 0) = 0$
   b. $F(0, j) = -j \cdot d$
   c. $F(i, 0) = -i \cdot d$
2. Main iteration. For each $i = 1, \ldots, M$ and each $j = 1, \ldots, N$:
$$F(i, j) = \max \begin{cases} F(i-1, j-1) + s(x_i, y_j) & \text{[case 1]} \\ F(i-1, j) - d & \text{[case 2]} \\ F(i, j-1) - d & \text{[case 3]} \end{cases}$$
   Ptr(i, j) = DIAG if [case 1]; LEFT if [case 2]; UP if [case 3]
3. Termination. $F(M, N)$ is the optimal score, and from Ptr(M, N) we can trace back an optimal alignment.
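A direct Python transcription of the three steps, as a sketch. It assumes a hypothetical scoring scheme s(a, b) = +1 for a match and -1 for a mismatch, with d = 1, and breaks ties in favour of DIAG. The pointer names follow the slide, where case 2 (consuming x_i against a gap) is LEFT.

```python
from typing import List, Optional, Tuple

DIAG, LEFT, UP = "DIAG", "LEFT", "UP"


def needleman_wunsch(x: str, y: str, match: int = 1, mismatch: int = -1,
                     d: int = 1) -> Tuple[int, str, str]:
    """Global alignment with linear gap penalty d; returns (F(M, N), row_x, row_y)."""
    M, N = len(x), len(y)
    F = [[0] * (N + 1) for _ in range(M + 1)]
    ptr: List[List[Optional[str]]] = [[None] * (N + 1) for _ in range(M + 1)]
    # 1. Initialization: a prefix aligned entirely against gaps
    for i in range(1, M + 1):
        F[i][0], ptr[i][0] = -i * d, LEFT
    for j in range(1, N + 1):
        F[0][j], ptr[0][j] = -j * d, UP
    # 2. Main iteration: best of the three cases (max keeps the first tie, i.e. DIAG)
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j], ptr[i][j] = max(
                (F[i - 1][j - 1] + s, DIAG),   # case 1: x_i with y_j
                (F[i - 1][j] - d, LEFT),       # case 2: x_i with a gap
                (F[i][j - 1] - d, UP),         # case 3: y_j with a gap
                key=lambda c: c[0],
            )
    # 3. Termination: trace back from Ptr(M, N)
    row_x: List[str] = []
    row_y: List[str] = []
    i, j = M, N
    while i > 0 or j > 0:
        if ptr[i][j] == DIAG:
            row_x.append(x[i - 1]); row_y.append(y[j - 1]); i, j = i - 1, j - 1
        elif ptr[i][j] == LEFT:
            row_x.append(x[i - 1]); row_y.append("-"); i -= 1
        else:
            row_x.append("-"); row_y.append(y[j - 1]); j -= 1
    return F[M][N], "".join(reversed(row_x)), "".join(reversed(row_y))


print(needleman_wunsch("GATTACA", "GCATGCU"))
```

For GATTACA vs. GCATGCU the optimal score under this scheme is 0; which of the equally good alignments comes back depends on the tie-breaking order.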

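The Smith-Waterman algorithm
Idea: ignore badly aligning regions.
Modifications to Needleman-Wunsch:
Initialization: $F(0, j) = F(i, 0) = 0$
Iteration:
$$F(i, j) = \max \begin{cases} 0 \\ F(i-1, j) - d \\ F(i, j-1) - d \\ F(i-1, j-1) + s(x_i, y_j) \end{cases}$$
Termination: the optimal local score is the largest $F(i, j)$ anywhere in the matrix; trace back from that cell until reaching a cell with $F = 0$.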
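A score-only sketch of the modified recurrence, assuming match = +2, mismatch = -1 and d = 1 (hypothetical values). It returns the best cell rather than a full traceback; the traceback would follow the same pointers as Needleman-Wunsch and stop at the first cell with F = 0.

```python
from typing import Tuple


def smith_waterman(x: str, y: str, match: int = 2, mismatch: int = -1,
                   d: int = 1) -> Tuple[int, Tuple[int, int]]:
    """Return the best local alignment score and the cell (i, j) where it ends."""
    M, N = len(x), len(y)
    F = [[0] * (N + 1) for _ in range(M + 1)]   # row 0 and column 0 stay 0
    best, best_cell = 0, (0, 0)
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(0,                    # start a fresh local alignment here
                          F[i - 1][j] - d,
                          F[i][j - 1] - d,
                          F[i - 1][j - 1] + s)
            if F[i][j] > best:                  # keep the first cell reaching the max
                best, best_cell = F[i][j], (i, j)
    return best, best_cell


print(smith_waterman("AAACGTCCC", "TTCGTGG"))   # (6, (6, 5))
```

Here the best local alignment is the shared word CGT (score 3 x 2 = 6), ending at x_6 and y_5; the unrelated flanks are simply ignored.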

Scoring the gaps more accurately
Simple, linear gap model: a gap of length n incurs penalty $n \cdot d$.
However, gaps usually occur in bunches.
Convex gap penalty function $\gamma(n)$: for all n, $\gamma(n+1) - \gamma(n) \le \gamma(n) - \gamma(n-1)$
Algorithm: $O(N^3)$ time, $O(N^2)$ space
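The O(N^3) bound comes from letting every cell consider every possible gap length. A minimal sketch of that direct recurrence, for an arbitrary gap function `gamma` supplied by the caller (the function name and the match/mismatch values are assumptions):

```python
from typing import Callable


def align_general_gaps(x: str, y: str, gamma: Callable[[int], float],
                       match: float = 1, mismatch: float = -1) -> float:
    """Global alignment score where a gap of length n costs gamma(n).

    Each cell looks back over every possible gap length, hence O(MN(M + N))
    time (O(N^3) when M ~ N) and O(MN) space.
    """
    NEG = float("-inf")
    M, N = len(x), len(y)
    F = [[NEG] * (N + 1) for _ in range(M + 1)]
    F[0][0] = 0
    for i in range(M + 1):
        for j in range(N + 1):
            if i == 0 and j == 0:
                continue
            best = NEG
            if i > 0 and j > 0:
                s = match if x[i - 1] == y[j - 1] else mismatch
                best = F[i - 1][j - 1] + s
            for n in range(1, i + 1):   # x_{i-n+1} .. x_i against one gap of length n
                best = max(best, F[i - n][j] - gamma(n))
            for n in range(1, j + 1):   # y_{j-n+1} .. y_j against one gap of length n
                best = max(best, F[i][j - n] - gamma(n))
            F[i][j] = best
    return F[M][N]


print(align_general_gaps("ACGTACGT", "ACGT", lambda n: 2 + 0.5 * (n - 1)))  # 0.5
```

With gamma(n) = n this reduces to the linear model; with the cheaper-to-extend gamma(n) = 2 + 0.5(n - 1), aligning ACGT inside ACGTACGT uses one gap of length 4 (cost 3.5) and scores 4 - 3.5 = 0.5.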

Compromise: affine gaps
$\gamma(n) = d + (n - 1) \cdot e$, where d is the gap-open penalty and e is the gap-extend penalty.
To compute the optimal alignment, at position (i, j) we need to "remember" the best score if a gap is open and the best score if a gap is not open:
$F(i, j)$: score of alignment $x_1 \ldots x_i$ to $y_1 \ldots y_j$ if $x_i$ aligns to $y_j$
$G(i, j)$: score if $x_i$ aligns to a gap after $y_j$
$H(i, j)$: score if $y_j$ aligns to a gap after $x_i$
$V(i, j)$: best score of alignment $x_1 \ldots x_i$ to $y_1 \ldots y_j$

Needleman-Wunsch with affine gaps
Why do we need matrices F, G, H?
1. $x_i$ aligns to $y_j$:
   x_1 … x_{i-1} x_i x_{i+1}
   y_1 … y_{j-1} y_j   -
   Add $-d$: $G(i+1, j) = V(i, j) - d$
2. $x_i$ aligns to a gap:
   x_1 … x_{i-1} x_i x_{i+1}
   y_1 … y_j     -   -
   Add $-e$: $G(i+1, j) = G(i, j) - e$
Because, perhaps, $G(i, j) < V(i, j)$ (it is best to align $x_i$ to $y_j$ if we were aligning only $x_1 \ldots x_i$ to $y_1 \ldots y_j$ and not the rest of x, y), but on the contrary $G(i, j) - e > V(i, j) - d$ (i.e., had we "fixed" our decision that $x_i$ aligns to $y_j$, we could regret it at the next step when aligning $x_1 \ldots x_{i+1}$ to $y_1 \ldots y_j$).

Needleman-Wunsch with affine gaps
Initialization: $V(0, 0) = 0$, $V(i, 0) = -[d + (i - 1) \cdot e]$, $V(0, j) = -[d + (j - 1) \cdot e]$
Iteration:
$$V(i, j) = \max\{F(i, j),\ G(i, j),\ H(i, j)\}$$
$$F(i, j) = V(i-1, j-1) + s(x_i, y_j)$$
$$G(i, j) = \max \begin{cases} V(i-1, j) - d \\ G(i-1, j) - e \end{cases}$$
$$H(i, j) = \max \begin{cases} V(i, j-1) - d \\ H(i, j-1) - e \end{cases}$$
Termination: $V(M, N)$ is the optimal alignment score.
Time? Space? (Each of the four matrices has $(M+1)(N+1)$ cells, each filled in O(1), so both are $O(MN)$.)
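A sketch of the four recurrences in Python, using -infinity for states that cannot occur (for example G(0, j)). The match/mismatch values and the function name are hypothetical; d and e are the gap-open and gap-extend penalties from the slide, subtracted as penalties.

```python
from typing import List

NEG = float("-inf")


def nw_affine(x: str, y: str, d: float, e: float,
              match: float = 1, mismatch: float = -1) -> float:
    """Optimal global score V(M, N) with gap penalty gamma(n) = d + (n - 1) * e."""
    M, N = len(x), len(y)

    def grid() -> List[List[float]]:
        return [[NEG] * (N + 1) for _ in range(M + 1)]

    F, G, H, V = grid(), grid(), grid(), grid()
    V[0][0] = 0
    for i in range(1, M + 1):          # x_1 .. x_i against one gap
        G[i][0] = V[i][0] = -(d + (i - 1) * e)
    for j in range(1, N + 1):          # y_1 .. y_j against one gap
        H[0][j] = V[0][j] = -(d + (j - 1) * e)
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = V[i - 1][j - 1] + s                       # x_i aligns to y_j
            G[i][j] = max(V[i - 1][j] - d, G[i - 1][j] - e)     # x_i aligns to a gap
            H[i][j] = max(V[i][j - 1] - d, H[i][j - 1] - e)     # y_j aligns to a gap
            V[i][j] = max(F[i][j], G[i][j], H[i][j])
    return V[M][N]


print(nw_affine("ACGTACGT", "ACGT", d=4, e=1))   # -3
```

Setting e = d recovers the linear model. With d = 4 and e = 1, ACGT inside ACGTACGT scores 4 - (4 + 3 x 1) = -3, since one gap of length 4 is cheaper than any split into several gaps.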

To generalize a little…
…think of how you would compute the optimal alignment with this gap function in time $O(MN)$.
[Figure: the gap function $\gamma(n)$]

Bounded Dynamic Programming
Assume we know that x and y are very similar.
Assumption: # gaps(x, y) < k(N)
Then, if $x_i$ aligns to $y_j$, $|i - j| < k(N)$.
We can align x and y more efficiently:
Time, Space: $O(N \cdot k(N)) \ll O(N^2)$

Bounded Dynamic Programming
Initialization: $F(i, 0)$, $F(0, j)$ undefined for $i, j > k$
Iteration:
For $i = 1 \ldots M$
  For $j = \max(1, i - k) \ldots \min(N, i + k)$
$$F(i, j) = \max \begin{cases} F(i-1, j-1) + s(x_i, y_j) & \\ F(i, j-1) - d, & \text{if } j > i - k(N) \\ F(i-1, j) - d, & \text{if } j < i + k(N) \end{cases}$$
Termination: same
Easy to extend to the affine gap case.
[Figure: DP matrix of $x_1 \ldots x_M$ against $y_1 \ldots y_N$, restricted to a band of width k(N) around the main diagonal]
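A sketch of the banded recurrence that stores only the 2k + 1 cells of each row that lie inside the band, so both time and space are O(M k). Cells outside the band read as -infinity, which plays the role of the "if j > i - k(N)" and "if j < i + k(N)" side conditions. The scoring values and the function name are hypothetical.

```python
NEG = float("-inf")


def banded_nw(x: str, y: str, k: int, match: int = 1, mismatch: int = -1,
              d: int = 1) -> float:
    """Global alignment score restricted to cells with |i - j| <= k.

    Row i stores only columns j = i-k .. i+k (at offset j - i + k), so time
    and space are O(M * k) instead of O(M * N).
    """
    M, N = len(x), len(y)
    if abs(M - N) > k:
        raise ValueError("the band cannot reach F(M, N): need |M - N| <= k")
    F = [[NEG] * (2 * k + 1) for _ in range(M + 1)]

    def get(i: int, j: int) -> float:
        # Cells outside the matrix or outside the band count as -infinity
        if i < 0 or j < 0 or j > N or abs(i - j) > k:
            return NEG
        return F[i][j - i + k]

    for j in range(min(N, k) + 1):
        F[0][j + k] = -j * d
    for i in range(1, M + 1):
        if i <= k:
            F[i][k - i] = -i * d              # column j = 0 is still inside the band
        for j in range(max(1, i - k), min(N, i + k) + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j - i + k] = max(get(i - 1, j - 1) + s,
                                  get(i, j - 1) - d,
                                  get(i - 1, j) - d)
    return get(M, N)


print(banded_nw("GATTACA", "GCATGCU", k=1))   # 0
```

For GATTACA vs. GCATGCU an optimal alignment stays within one diagonal of the main one, so k = 1 already gives the unbanded score 0, while k = 0 forbids gaps entirely and gives -1.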

Linear-Space Alignment

Subsequences and Substrings
Definition
A string x′ is a substring of a string x if x = ux′v for some prefix string u and suffix string v (similarly, $x' = x_i \ldots x_j$ for some $1 \le i \le j \le |x|$).
A string x′ is a subsequence of a string x if x′ can be obtained from x by deleting 0 or more letters ($x' = x_{i_1} \ldots x_{i_k}$ for some $1 \le i_1 < \cdots < i_k \le |x|$).
Note: a substring is always a subsequence.
Example: x = abracadabra
y = cadabr: substring
z = brcdbr: subsequence, not substring
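A small Python sketch that returns the witnesses from the two definitions: the start index i of a substring, and indices i_1 < … < i_k for a subsequence. The function names are hypothetical; indices are 1-based to match the slide.

```python
from typing import List, Optional


def substring_start(xp: str, x: str) -> Optional[int]:
    """1-based i such that x' = x_i ... x_j, or None if x' is not a substring of x."""
    pos = x.find(xp)
    return None if pos == -1 else pos + 1


def subsequence_indices(xp: str, x: str) -> Optional[List[int]]:
    """1-based indices i_1 < ... < i_k with x' = x_{i_1} ... x_{i_k}, or None.

    Greedily takes the leftmost usable occurrence of each letter, which finds
    a valid index list whenever one exists.
    """
    indices: List[int] = []
    start = 0
    for letter in xp:
        pos = x.find(letter, start)
        if pos == -1:
            return None
        indices.append(pos + 1)
        start = pos + 1
    return indices


print(substring_start("cadabr", "abracadabra"))      # 5
print(subsequence_indices("brcdbr", "abracadabra"))  # [2, 3, 5, 7, 9, 10]
```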

Hirschberg's algorithm
Given a set of strings x, y, …, a common subsequence is a string u that is a subsequence of all strings x, y, …
Longest common subsequence
Given strings $x = x_1 x_2 \ldots x_M$, $y = y_1 y_2 \ldots y_N$,
find the longest common subsequence $u = u_1 \ldots u_k$.
Algorithm:
$$F(i, j) = \max \begin{cases} F(i-1, j) \\ F(i, j-1) \\ F(i-1, j-1) + [1 \text{ if } x_i = y_j;\ 0 \text{ otherwise}] \end{cases}$$
Ptr(i, j) = (same as in N-W)
Termination: trace back from Ptr(M, N), and prepend a letter to u whenever Ptr(i, j) = DIAG and $F(i-1, j-1) < F(i, j)$.
Hirschberg's algorithm solves this in linear space.
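A quadratic-space sketch of the LCS recurrence and traceback (Hirschberg's contribution is doing the same in linear space). It collects letters from the end and reverses them once, which is equivalent to prepending. The function name is illustrative.

```python
from typing import List, Optional

DIAG, LEFT, UP = "DIAG", "LEFT", "UP"


def longest_common_subsequence(x: str, y: str) -> str:
    """LCS via the recurrence above, with pointers as in Needleman-Wunsch."""
    M, N = len(x), len(y)
    F = [[0] * (N + 1) for _ in range(M + 1)]
    ptr: List[List[Optional[str]]] = [[None] * (N + 1) for _ in range(M + 1)]
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            same = 1 if x[i - 1] == y[j - 1] else 0
            F[i][j], ptr[i][j] = max(
                (F[i - 1][j - 1] + same, DIAG),
                (F[i - 1][j], LEFT),
                (F[i][j - 1], UP),
                key=lambda c: c[0],
            )
    u: List[str] = []
    i, j = M, N
    while i > 0 and j > 0:
        if ptr[i][j] == DIAG:
            if F[i - 1][j - 1] < F[i][j]:   # this diagonal step used a match
                u.append(x[i - 1])
            i, j = i - 1, j - 1
        elif ptr[i][j] == LEFT:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(u))             # letters were collected back to front


print(longest_common_subsequence("abracadabra", "cadabr"))   # cadabr
```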

Introduction: Compute optimal score
It is easy to compute F(M, N) in linear space:
Allocate( column[0] )
For i = 1 … M
  Allocate( column[i] )
  For j = 1 … N
    F(i, j) = …   (reads only column[i − 1] and column[i])
  Free( column[i − 1] )
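The same idea as runnable Python for the linear-gap Needleman-Wunsch score: only the previous column and the current one are kept, so memory is O(N). The function name and scoring defaults (+1, -1, d = 1) are assumptions. It returns the whole last column F(M, 0 … N), which is exactly what the divide-and-conquer step needs.

```python
from typing import List


def last_column_scores(x: str, y: str, match: int = 1, mismatch: int = -1,
                       d: int = 1) -> List[int]:
    """Return [F(M, 0), ..., F(M, N)] for Needleman-Wunsch in O(N) space.

    Column i holds F(i, 0..N); only columns i - 1 and i are alive at once.
    """
    prev = [-j * d for j in range(len(y) + 1)]        # column 0: F(0, j) = -j*d
    for i in range(1, len(x) + 1):
        cur = [-i * d] + [0] * len(y)                 # F(i, 0) = -i*d
        for j in range(1, len(y) + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s, prev[j] - d, cur[j - 1] - d)
        prev = cur                                    # column i - 1 can now be freed
    return prev


print(last_column_scores("GATTACA", "GCATGCU")[-1])   # 0
```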

Linear-space alignment
To compute both the optimal score and the optimal alignment:
Divide & Conquer approach
Notation:
$x^r, y^r$: reverse of x, y
E.g. x = accgg; $x^r$ = ggcca
$F^r(i, j)$: optimal score of aligning $x^r_1 \ldots x^r_i$ & $y^r_1 \ldots y^r_j$,
same as aligning $x_{M-i+1} \ldots x_M$ & $y_{N-j+1} \ldots y_N$

Linear-space alignment
Lemma (assume M is even):
$$F(M, N) = \max_{k = 0 \ldots N} \bigl( F(M/2, k) + F^r(M/2, N - k) \bigr)$$
[Figure: the optimal path crosses column M/2 at $k^*$; $F(M/2, k)$ scores the part before column M/2 and $F^r(M/2, N - k)$ the part after it]
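A quick numerical check of the lemma, as a sketch: compute F(h, k) for the first half of x, compute F^r on the reversed second half against reversed y, and take the best k. The helper names and scoring values are hypothetical; using h = floor(M/2) and M - h for the second half lifts the "M even" assumption.

```python
from typing import List, Tuple


def last_column_scores(x: str, y: str, match: int = 1, mismatch: int = -1,
                       d: int = 1) -> List[int]:
    """[F(|x|, 0), ..., F(|x|, |y|)] for linear-gap Needleman-Wunsch, two columns of space."""
    prev = [-j * d for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        cur = [-i * d] + [0] * len(y)
        for j in range(1, len(y) + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s, prev[j] - d, cur[j - 1] - d)
        prev = cur
    return prev


def best_split(x: str, y: str) -> Tuple[int, int]:
    """Return (max_k F(h, k) + F^r(M - h, N - k), k*) with h = M // 2."""
    M, N = len(x), len(y)
    h = M // 2
    forward = last_column_scores(x[:h], y)                # F(h, k), k = 0..N
    backward = last_column_scores(x[h:][::-1], y[::-1])   # F^r(M - h, k), k = 0..N
    k_star = max(range(N + 1), key=lambda k: forward[k] + backward[N - k])
    return forward[k_star] + backward[N - k_star], k_star


print(best_split("GATTACA", "GCATGCU"))
```

The first component always equals the full score F(M, N), which is what the lemma asserts; k* says where the optimal path crosses the middle column.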

Linear-space alignment
Now, using 2 columns of space, we can compute, for $k = 0 \ldots N$, $F(M/2, k)$ and $F^r(M/2, N - k)$, PLUS the backpointers.

Linear-space alignment
Now, we can find $k^*$ maximizing $F(M/2, k) + F^r(M/2, N - k)$.
Also, we can trace the path exiting column M/2 from $k^*$.
[Figure: the optimal path through $(M/2, k^*)$ and its continuation into column M/2 + 1]

Linear-space alignment
Iterate this procedure to the left and right!
[Figure: the two subproblems, left of column M/2 (height $k^*$) and right of it (height $N - k^*$)]

Linear-space alignment
Hirschberg's linear-space algorithm:
MEMALIGN(l, l′, r, r′): (aligns $x_l \ldots x_{l'}$ with $y_r \ldots y_{r'}$)
1. Let $h = l + \lceil (l' - l)/2 \rceil$
2. Find (in time $O((l' - l) \times (r' - r))$, space $O(r' - r)$) the optimal path, $L_h$, entering column h − 1 and exiting column h.
   Let $k_1$ = position at column h − 2 where $L_h$ enters
       $k_2$ = position at column h + 1 where $L_h$ exits
3. MEMALIGN(l, h − 2, r, $k_1$)
4. Output $L_h$
5. MEMALIGN(h + 1, l′, $k_2$, r′)
Top level call: MEMALIGN(1, M, 1, N)
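A compact linear-space sketch in Python. It follows the common textbook variant that splits at the middle column h = floor(M/2) and the optimal crossing point k*, then recurses on (x_1 … x_h, y_1 … y_k*) and (x_h+1 … x_M, y_k*+1 … y_N), instead of outputting the middle path segment L_h separately as MEMALIGN does. Scoring defaults and all function names are assumptions; the quadratic base case only ever sees a one-letter x or an empty y.

```python
from typing import List, Tuple


def _s(a: str, b: str, match: int, mismatch: int) -> int:
    return match if a == b else mismatch


def last_column_scores(x: str, y: str, match: int, mismatch: int, d: int) -> List[int]:
    """F(|x|, k) for k = 0..|y|, keeping only two columns."""
    prev = [-j * d for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        cur = [-i * d] + [0] * len(y)
        for j in range(1, len(y) + 1):
            cur[j] = max(prev[j - 1] + _s(x[i - 1], y[j - 1], match, mismatch),
                         prev[j] - d, cur[j - 1] - d)
        prev = cur
    return prev


def nw_small(x: str, y: str, match: int, mismatch: int, d: int) -> Tuple[str, str]:
    """Plain Needleman-Wunsch with traceback; called only when len(x) <= 1
    or y is empty, so the matrix has O(M + N) cells."""
    M, N = len(x), len(y)
    F = [[0] * (N + 1) for _ in range(M + 1)]
    for i in range(M + 1):
        F[i][0] = -i * d
    for j in range(N + 1):
        F[0][j] = -j * d
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            F[i][j] = max(F[i - 1][j - 1] + _s(x[i - 1], y[j - 1], match, mismatch),
                          F[i - 1][j] - d, F[i][j - 1] - d)
    row_x: List[str] = []
    row_y: List[str] = []
    i, j = M, N
    while i > 0 or j > 0:   # recompute which case produced F[i][j]
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + _s(x[i - 1], y[j - 1], match, mismatch):
            row_x.append(x[i - 1]); row_y.append(y[j - 1]); i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] - d:
            row_x.append(x[i - 1]); row_y.append("-"); i -= 1
        else:
            row_x.append("-"); row_y.append(y[j - 1]); j -= 1
    return "".join(reversed(row_x)), "".join(reversed(row_y))


def hirschberg(x: str, y: str, match: int = 1, mismatch: int = -1,
               d: int = 1) -> Tuple[str, str]:
    """Optimal global alignment of x and y in linear space (divide & conquer)."""
    if len(x) <= 1 or len(y) == 0:
        return nw_small(x, y, match, mismatch, d)
    h = len(x) // 2                                              # middle column
    forward = last_column_scores(x[:h], y, match, mismatch, d)
    backward = last_column_scores(x[h:][::-1], y[::-1], match, mismatch, d)
    N = len(y)
    k = max(range(N + 1), key=lambda k: forward[k] + backward[N - k])   # k*
    left_x, left_y = hirschberg(x[:h], y[:k], match, mismatch, d)
    right_x, right_y = hirschberg(x[h:], y[k:], match, mismatch, d)
    return left_x + right_x, left_y + right_y


print(*hirschberg("GATTACA", "GCATGCU"), sep="\n")
```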

Linear-space alignment
Time, space analysis of Hirschberg's algorithm:
To compute the optimal path at the middle column, for a box of size M × N:
Space: 2N
Time: cMN, for some constant c
Then the left and right calls cost $c\,(M/2 \times k^* + M/2 \times (N - k^*)) = cMN/2$.
All recursive calls cost:
Total time: $cMN + cMN/2 + cMN/4 + \cdots = 2cMN = O(MN)$
Total space: O(N) for computation, O(N + M) to store the optimal alignment

Heuristic Local Aligners
1. The basic indexing & extension technique
2. Indexing: techniques to improve sensitivity (pairs of words, patterns)
3. Systems for local alignment

State of biological databases
[Figure: growth of biological sequence databases, roughly 10× every 3 years]

State of biological databases
Number of genes in these genomes:
- Mammals: ~24,000
- Insects: ~14,000
- Worms: ~17,000
- Fungi: ~6,000-10,000
- Small organisms: 100s-1,000s
Each known or predicted gene has one or more associated protein sequences.
>1,000,000 known / predicted protein sequences

Some useful applications of alignments
Given a newly discovered gene,
- Does it occur in other species?
- How fast does it evolve?
Assume we try Smith-Waterman:
[Figure: our new gene aligned against the entire genomic database]

Some useful applications of alignments
Given a newly sequenced organism, which subregions align with other organisms?
- Potential genes
- Other biological characteristics
Assume we try Smith-Waterman:
[Figure: our newly sequenced mammal aligned against the entire genomic database]

Indexing-based local alignment (BLAST: Basic Local Alignment Search Tool)
Main idea:
1. Construct a dictionary of all the words in the query
2. Initiate a local alignment for each word match between query and DB
Running time: O(MN)
However, orders of magnitude faster than Smith-Waterman
[Figure: words of the query matched against the DB]

Indexing-based local alignment
Dictionary: all words of length k (~10)
Alignment initiated between words of alignment score ≥ T (typically T = k)
Alignment: ungapped extensions until score below statistical threshold
Output: all local alignments with score > statistical threshold
[Figure: the query dictionary is built once, then the DB is scanned]
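A sketch of the dictionary and scan steps for DNA, restricted to exact word matches: with a +1 match score, a length-k word pair reaches T = k only if all k letters match. The names are hypothetical, and real BLAST implementations do considerably more (for example neighbourhood words for proteins, and the extension step itself).

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_dictionary(query: str, k: int) -> Dict[str, List[int]]:
    """Map every length-k word of the query to the positions where it starts."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    return index


def seed_hits(query: str, db: str, k: int) -> List[Tuple[int, int]]:
    """Scan the DB once; report (query_pos, db_pos) for every exact word match."""
    index = build_dictionary(query, k)
    hits: List[Tuple[int, int]] = []
    for j in range(len(db) - k + 1):
        for i in index.get(db[j:j + k], []):   # .get does not add keys to the dict
            hits.append((i, j))
    return hits


print(seed_hits("ACGTAC", "TTACGT", 4))   # [(0, 2)]
```

Each hit would then seed an extension; the dictionary is built once per query, so the DB scan is a single linear pass plus the work per hit.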

Indexing-based local alignment: Extensions
[Figure: query and DB sequences, with the matching word highlighted]
Example: k = 4
The matching word GGTC initiates an alignment.
Extension to the left and right with no gaps until the alignment score falls more than C below the best so far.
Output:
GTAAGGTCC
GTTAGGTCC
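A sketch of ungapped extension in both directions. We read the slide's stopping rule as "stop once the running score is more than C below the best seen so far, then trim back to the best point"; that reading and the +1/-1 scores are assumptions. The query string is read off the slide's figure, while the database string below is hypothetical, chosen so that the output reproduces the slide's GTAAGGTCC / GTTAGGTCC.

```python
from typing import Iterable, Tuple


def ungapped_extend(q: str, t: str, i: int, j: int, k: int, C: int,
                    match: int = 1, mismatch: int = -1) -> Tuple[int, str, str]:
    """Extend the word hit q[i:i+k] / t[j:j+k] in both directions without gaps.

    Each direction stops once its running score drops more than C below the
    best score seen so far in that direction, and is trimmed back to that best.
    """
    def s(a: str, b: str) -> int:
        return match if a == b else mismatch

    def extend(pairs: Iterable[Tuple[str, str]]) -> Tuple[int, int]:
        best = best_len = run = 0
        for step, (a, b) in enumerate(pairs, start=1):
            run += s(a, b)
            if run > best:
                best, best_len = run, step
            elif best - run > C:
                break
        return best, best_len

    seed = sum(s(a, b) for a, b in zip(q[i:i + k], t[j:j + k]))
    right_gain, r = extend(zip(q[i + k:], t[j + k:]))
    left_gain, l = extend(zip(reversed(q[:i]), reversed(t[:j])))
    return seed + left_gain + right_gain, q[i - l:i + k + r], t[j - l:j + k + r]


# GGTC starts at position 9 of the query and position 7 of the database string
print(ungapped_extend("ACGAAGTAAGGTCCAGT", "CTCGTTAGGTCCTCA", 9, 7, 4, C=2))
# (7, 'GTAAGGTCC', 'GTTAGGTCC')
```

A smaller C stops earlier: with C = 0 the same hit only grows to AGGTCC on both sides, because the first mismatch in each direction ends the extension.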

Indexing-based local alignment: Extensions
[Figure: query and DB sequences around the anchor]
Gapped extensions
Extensions with gaps in a band around the anchor
Output:
GTAAGGTCCAGT
GTTAGGTC-AGT

Indexing-based local alignment: Extensions
[Figure: query and DB sequences around the anchor]
Gapped extensions until threshold
Extensions with gaps until the score falls more than C below the best score so far
Output:
GTAAGGTCCAGT
GTTAGGTC-AGT

Sensitivity-Speed Tradeoff
[Figure: sensitivity vs. speed for long words (k = 15) and short words (k = 7); Kent WJ, Genome Research 2002]

Sensitivity-Speed Tradeoff
Methods to improve sensitivity/speed:
1. Using pairs of words
2. Using inexact words
3. Patterns: non-consecutive positions
……ATAACGGACGACTGATTACACTGATTCTTAC……
……GGCACGGACCAGTGACTACTCTGATTCCCAG……
……ATAACGGACGACTGATTACACTGATTCTTAC……
……GGCGCCGACGAGTGATTACACAGATTGCCAG……
[Figure: example words and a non-consecutive pattern matched in these regions]

Measured improvement
[Figure: measured sensitivity/speed improvement; Kent WJ, Genome Research 2002]

Non-consecutive words: Patterns
Patterns increase the likelihood of at least one match within a long conserved region.
[Figure: consecutive vs. non-consecutive positions, with 3, 5, 6, and 7 positions in common]
On a 100-long 70% conserved region, consecutive and non-consecutive words are compared by:
- Expected # hits
- Prob[at least one hit]
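The comparison can be estimated by simulation. This sketch assumes each position of a 100-long region is conserved independently with probability 0.7 (a simplification) and compares 11 consecutive positions against the weight-11, length-18 pattern 111010010100110111 (the one usually cited for PatternHunter). By linearity of expectation the expected hit counts are 90 × 0.7^11 ≈ 1.78 and 83 × 0.7^11 ≈ 1.64, yet the spaced pattern hits at least once more often, because its hits overlap each other less. Function and constant names are hypothetical.

```python
import random
from typing import List, Tuple


def count_hits(region: List[bool], pattern: str) -> int:
    """Offsets where every '1' position of the pattern lands on a conserved base."""
    care = [p for p, c in enumerate(pattern) if c == "1"]
    return sum(
        all(region[offset + p] for p in care)
        for offset in range(len(region) - len(pattern) + 1)
    )


def simulate(pattern: str, length: int = 100, identity: float = 0.7,
             trials: int = 3000, rng_seed: int = 0) -> Tuple[float, float]:
    """Monte Carlo estimate of (expected # hits, Prob[at least one hit])."""
    rng = random.Random(rng_seed)
    total_hits = regions_hit = 0
    for _ in range(trials):
        # each position is conserved independently with probability `identity`
        region = [rng.random() < identity for _ in range(length)]
        hits = count_hits(region, pattern)
        total_hits += hits
        regions_hit += hits > 0
    return total_hits / trials, regions_hit / trials


CONSECUTIVE = "11111111111"        # 11 consecutive positions
SPACED = "111010010100110111"      # weight 11, length 18
consecutive = simulate(CONSECUTIVE)
spaced = simulate(SPACED)
print("consecutive:", consecutive)
print("spaced:     ", spaced)
```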

Advantage of Patterns
[Figure: comparison of patterns with 11 positions and 10 positions]

Multiple patterns
K patterns
- Takes K times longer to scan
- Patterns can complement one another
Computational problem:
- Given: a model (probability distribution) for homology between two regions
- Find: the best set of K patterns that maximizes Prob(at least one match)
[Figure: several patterns matched against an example region]
Buhler et al. RECOMB 2003
Sun & Buhler RECOMB 2004
How long does it take to search the query?

Variants of BLAST
NCBI BLAST: search the universe
MEGABLAST:
- Optimized to align very similar sequences
- Works best when k = 4i ≥ 16
- Linear gap penalty
WU-BLAST (Wash U BLAST):
- Very good optimizations
- Good set of features & command line arguments
BLAT:
- Faster, less sensitive than BLAST
- Good for aligning huge numbers of queries
CHAOS:
- Uses inexact k-mers, sensitive
PatternHunter:
- Uses patterns instead of k-mers
BlastZ:
- Uses patterns, good for finding genes
Typhon:
- Uses multiple alignments to improve sensitivity/speed tradeoff

Example
Query: gattacaccccgattacaccccgattaca (29 letters) [2 mins]
Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences)
1,726,556 sequences; 8,074,398,388 total letters
>gi| |gb|AC | Oryza sativa chromosome 3 BAC OSJNBa0087C10 genomic sequence, complete sequence
Length =
Score = 34.2 bits (17), Expect = 4.5
Identities = 20/21 (95%)
Strand = Plus / Plus
Query: 4    tacaccccgattacaccccga 24
            ||||||| |||||||||||||
Sbjct:      tacacccagattacaccccga
Score = 34.2 bits (17), Expect = 4.5
Identities = 20/21 (95%)
Strand = Plus / Plus
Query: 4    tacaccccgattacaccccga 24
            ||||||| |||||||||||||
Sbjct:      tacacccagattacaccccga
>gi| |gb|AC | Oryza sativa chromosome 3 BAC OSJNBa0052F07 genomic sequence, complete sequence
Length =
Score = 34.2 bits (17), Expect = 4.5
Identities = 20/21 (95%)
Strand = Plus / Plus
Query: 4    tacaccccgattacaccccga 24
            ||||||| |||||||||||||
Sbjct: 3891 tacacccagattacaccccga 3911

Example
Query: Human atoh enhancer, 179 letters [1.5 min]
Result: 57 blast hits
1. gi| |gb|AF |AF Homo sapiens ATOH1 enhanc   e-95
2. gi| |gb|AC | Mus musculus Strain C57BL6/J ch   264   e-68
3. gi| |gb|AF |AF Mus musculus Atoh1 enhanc   e-66
4. gi| |gb|AF | Gallus gallus CATH1 (CATH1) gene   78   e-12
5. gi| |emb|AL | Zebrafish DNA sequence from clo   54   e-05
6. gi| |gb|AC | Oryza sativa chromosome 10 BAC O   44
7. gi| |ref|NM_ | Mus musculus suppressor of Ty   42
8. gi| |gb|BC | Mus musculus, Similar to suppres   42
>gi| |gb|AF |AF218258 Mus musculus Atoh1 enhancer sequence
Length = 1517
Score = 256 bits (129), Expect = 9e-66
Identities = 167/177 (94%), Gaps = 2/177 (1%)
Strand = Plus / Plus
Query: 3    tgacaatagagggtctggcagaggctcctggccgcggtgcggagcgtctggagcggagca 62
            ||||||||||||| ||||||||||||||||||| ||||||||||||||||||||||||||
Sbjct: 1144 tgacaatagaggggctggcagaggctcctggccccggtgcggagcgtctggagcggagca 1203
Query: 63   cgcgctgtcagctggtgagcgcactctcctttcaggcagctccccggggagctgtgcggc 122
            ||||||||||||||||||||||||||  ||||||||| |||||||||||||||| |||||
Sbjct: 1204 cgcgctgtcagctggtgagcgcactc-gctttcaggccgctccccggggagctgagcggc 1262
Query: 123  cacatttaacaccatcatcacccctccccggcctcctcaacctcggcctcctcctcg 179
            ||||||||||||| || ||| |||||||||||||||||||| |||||||||||||||
Sbjct: 1263 cacatttaacaccgtcgtca-ccctccccggcctcctcaacatcggcctcctcctcg

The Four-Russian Algorithm
Brief overview: a (not so useful) speedup of Dynamic Programming [Arlazarov, Dinic, Kronrod, Faradzev 1970]

Main Observation
Within a rectangle of the DP matrix, values of D depend only on the values of A, B, C, and substrings $x_{l \ldots l'}$, $y_{r \ldots r'}$.
Definition: a t-block is a t × t square of the DP matrix.
Idea: divide the matrix into t-blocks; precompute t-blocks.
Speedup: O(t)
[Figure: a t × t block with inputs A, B, C and output D, spanning $x_l \ldots x_{l'}$ and $y_r \ldots y_{r'}$]

The Four-Russian Algorithm
Main structure of the algorithm:
1. Divide the N × N DP matrix into K × K $\log_2 N$-blocks that overlap by 1 column & 1 row
2. For i = 1 … K
3.   For j = 1 … K
4.     Compute $D_{i,j}$ as a function of $A_{i,j}$, $B_{i,j}$, $C_{i,j}$, $x[l_i \ldots l'_i]$, $y[r_j \ldots r'_j]$
Time: $O(N^2 / \log^2 N)$ times the cost of step 4

The Four-Russian Algorithm
[Figure: the DP matrix tiled by overlapping t × t blocks]
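A sketch of the main observation, using unit-cost edit distance as the DP (an assumption; the slides do not fix the recurrence here). Each block's border is passed as the corner A plus the step differences along B and C; for edit distance those steps lie in {-1, 0, 1}, which is what keeps the number of distinct blocks small. `functools.lru_cache` stands in for the precomputed t-block table, filled lazily instead of in advance, and in this simplified version the block size t must divide both lengths. All function names are hypothetical.

```python
from functools import lru_cache
from itertools import accumulate
from typing import List, Tuple


def fill_block(top: List[int], left: List[int], xs: str, ys: str) -> Tuple[List[int], List[int]]:
    """Fill one (t+1) x (t+1) block of the unit-cost edit-distance matrix.

    top / left are the D values along the block's first row / first column
    (both start with the corner A). Returns the last row and last column (D).
    """
    t = len(xs)
    D = [list(top)] + [[left[r]] + [0] * t for r in range(1, t + 1)]
    for r in range(1, t + 1):
        for c in range(1, t + 1):
            D[r][c] = min(D[r - 1][c] + 1, D[r][c - 1] + 1,
                          D[r - 1][c - 1] + (xs[r - 1] != ys[c - 1]))
    return D[t], [D[r][t] for r in range(t + 1)]


@lru_cache(maxsize=None)
def block_lookup(top_steps: Tuple[int, ...], left_steps: Tuple[int, ...],
                 xs: str, ys: str) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Block output with the corner A shifted to 0; memoised, so it acts as the t-block table."""
    bottom, right = fill_block([0, *accumulate(top_steps)],
                               [0, *accumulate(left_steps)], xs, ys)
    return tuple(bottom), tuple(right)


def edit_distance_by_blocks(x: str, y: str, t: int) -> int:
    M, N = len(x), len(y)
    if M % t or N % t:
        raise ValueError("this sketch assumes t divides len(x) and len(y)")
    D = [[None] * (N + 1) for _ in range(M + 1)]   # only block borders get filled
    for i in range(M + 1):
        D[i][0] = i
    for j in range(N + 1):
        D[0][j] = j
    for bi in range(0, M, t):
        for bj in range(0, N, t):
            top = [D[bi][bj + c] for c in range(t + 1)]     # B, starting at A
            left = [D[bi + r][bj] for r in range(t + 1)]    # C, starting at A
            top_steps = tuple(b - a for a, b in zip(top, top[1:]))
            left_steps = tuple(b - a for a, b in zip(left, left[1:]))
            bottom, right = block_lookup(top_steps, left_steps,
                                         x[bi:bi + t], y[bj:bj + t])
            A = top[0]                                      # shift the outputs back by A
            for c in range(t + 1):
                D[bi + t][bj + c] = A + bottom[c]
            for r in range(t + 1):
                D[bi + r][bj + t] = A + right[r]
    return D[M][N]


print(edit_distance_by_blocks("flaw", "lawn", 2))   # 2
```

Only every t-th row and column is ever written, and identical (steps, substrings) blocks are computed once; with t around log N that reuse is the source of the speedup.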