1
Synthesizable, Space and Time Efficient Algorithms for String Editing Problem. Vamsi K. Kundeti
2
Agenda.
Synthesizable:
–Digital circuit to implement edit distance in hardware.
–High speed and area efficient.
Space and time efficient algorithms:
–Computing the edit script and edit distance in O(n^2/log(n)) time and O(n) space.
3
Edit Distance Optimization Problem
4
Edit Distance in hardware.
Related work:
–Parallel systolic array based designs.
–Issues with systolic arrays.
–e.g. [lipton85], [lopresti87] & [sastry95].
Sequential design:
–Area efficient and high speed.
–Adding edit distance to the instruction set of a general CPU.
–Speedup by reduction in constants.
5
Basic idea behind systolic arrays. [Figure: DP table mapped onto a linear array of processing elements PE-1 ... PE-7. One group of entries is computed by a single processor; entries spread across the PEs are computed in parallel.]
6
Basic idea behind systolic arrays. [Figure, step T = x: the marked entries can be computed in parallel.]
7
Basic idea behind systolic arrays. [Figure, steps T = x+1 and T = x+2: the next sets of entries computed in parallel.]
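The parallelism in these figures can be sketched in software by filling the table one anti-diagonal at a time. This is a minimal sequential sketch, not the systolic design itself: the function name `edit_distance_wavefront` and the `steps` counter are illustrative assumptions, and the inner loop stands in for the work a linear array of PEs would do in one time step.

```python
def edit_distance_wavefront(s1, s2):
    """Unit-cost edit distance, filled one anti-diagonal (wavefront) at a time.

    Returns (distance, steps), where steps counts the wavefronts.
    """
    m, n = len(s1), len(s2)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    steps = 0
    for d in range(m + n + 1):  # anti-diagonal index: i + j == d
        steps += 1
        # Cells on one anti-diagonal depend only on diagonals d-1 and d-2,
        # so this inner loop is the part that could run in parallel.
        for i in range(max(0, d - n), min(m, d) + 1):
            j = d - i
            if i == 0:
                D[i][j] = j
            elif j == 0:
                D[i][j] = i
            else:
                D[i][j] = min(
                    D[i - 1][j] + 1,                           # deletion
                    D[i][j - 1] + 1,                           # insertion
                    D[i - 1][j - 1] + (s1[i - 1] != s2[j - 1]),  # match / substitution
                )
    return D[m][n], steps
```

For two strings of length n there are 2n+1 wavefronts, which is where the O(n) step count of the parallel designs comes from.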
8
Systolic Array Issues. S1 = [abc], S2 = [bca]. [Figure: DP table for S1 and S2 with its entries assigned to processing elements pe-1 ... pe-5.]
1. pe-2 and pe-4 have to wait until pe-1 is done (synchronous).
2. pe-3 does more computation than the others.
3. Increased I/O complexity.
9
Systolic Array Problems.
Pros:
–Need only O(n) steps to compute edit distance.
Cons:
–Design is too complex.
–Although we need only O(n) time, we pay a big price.
–Clock speed reduction: the design needs a clock with a large time period, so it can only reach speeds in the MHz range. This is due to the synchronous nature of the design; e.g. the [sastry95] design runs at only 80 MHz.
–Increased area, with redundancy in the form of PEs doing less work.
–I/O bandwidth limits the cost model: it constrains the cost of operations to a limited range.
–Needs custom hardware, which limits the usage of the hardware.
These issues make the usage of systolic arrays very limited.
10
Motivation behind our work.
–CPUs are everywhere: servers, desktops, laptops, etc.
–Almost all bioinformatics software runs on general-purpose CPUs rather than on custom hardware (systolic arrays).
–Can we add an edit distance instruction to the processor instruction set?
–This can really help software by reducing the constants hidden in the asymptotic complexity.
11
Our Contribution. Key idea behind our design:
–"Can we compute edit distance using exactly n+2 memory locations?"
–We know that if we need to compute only the edit distance, we just need to keep track of two rows, which is 2n memory locations.
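The two-row baseline mentioned on this slide can be sketched as follows. This is a minimal illustration under the unit cost model; the function name `edit_distance_two_rows` is an assumption, not something from the slides.

```python
def edit_distance_two_rows(s1, s2):
    """Unit-cost edit distance keeping only the previous and current DP rows."""
    prev = list(range(len(s2) + 1))  # row 0: distance from "" to each prefix of s2
    for i, a in enumerate(s1, start=1):
        curr = [i] + [0] * len(s2)   # column 0: distance from s1[:i] to ""
        for j, b in enumerate(s2, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (a != b),  # match / substitution
            )
        prev = curr                  # the older row is no longer needed
    return prev[-1]
```

Only the distance is returned; recovering the edit script from two rows alone is not possible without further work.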
12
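Basic Idea behind our algorithm. Edit distance table for S1 = aaabcada (rows) and S2 = aaaabcda (columns), shown at step T = x:
      a  a  a  a  b  c  d  a
   0  1  2  3  4  5  6  7  8
a  1  0  1  2  3  4  5  6  7
a  2  1  0  1  2  3  4  5  6
a  3  2  1  0  1  2  3  4  5
b  4  3  2  1  1  1  2  3  4
c  5  4  3  2  2  2  1  2  3
a  6  5  4  3  2  3  2  2  2
d  7  6  5  4  3  3  3  2  3
a  8  7  6  5  4  4  4  3  2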
13-23
Basic Idea behind our algorithm (animation over the table from slide 12, steps T = x, T = x+1, T = x+2). Each frame marks the entry just computed, the entry computed in the previous step, the entries still needed for further computation, and the entries that have become redundant. Redundant entries are dropped one at a time: across the frames the first row shrinks from 012345678 to 12345678, then 2345678, then 345678.
24
[Figure: the table from slide 12.] Shift register of size n+2. Elements are shifted in as they are computed, and redundant elements are shifted out.
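The shift-register idea can be mimicked in software with a bounded queue. This is a sketch of the data movement only, not of the circuit: it assumes a row-major traversal of the table (boundary cells included), and the name `edit_distance_shift_register` is illustrative. With that traversal, the three neighbors of the cell being computed always sit at fixed positions in a window of the last n+2 entries.

```python
from collections import deque


def edit_distance_shift_register(s1, s2):
    """Unit-cost edit distance using a sliding window of n+2 entries."""
    n = len(s2)
    # A full deque with maxlen drops its oldest entry on append,
    # which plays the role of shifting the redundant entry out.
    window = deque(maxlen=n + 2)
    for i in range(len(s1) + 1):
        for j in range(n + 1):
            if i == 0:
                value = j              # first row of the table
            elif j == 0:
                value = i              # first column of the table
            else:
                diag = window[0]       # D[i-1][j-1], oldest entry
                up = window[1]         # D[i-1][j]
                left = window[-1]      # D[i][j-1], newest entry
                value = min(up + 1, left + 1, diag + (s1[i - 1] != s2[j - 1]))
            window.append(value)       # shift the new entry in
    return window[-1]
```

The design choice here is that nothing is indexed by row or column: every access is relative to the ends of the window, which is what makes a fixed-size register sufficient.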
25
Top Level Circuit Diagram
26
Design Block: AlgoShifter
27
Design Block: ComputeBlock
28
Design Block: CounterBlock.
29
Verification Simulation ex-1
30
Verification Simulation ex-2
31
Edit Distance Instruction. If we have a t x t edit distance instruction, we spend only O(n^2/t^2) time in software. The instruction thus helps reduce the constants and speeds up edit distance computation.
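To illustrate where the O(n^2/t^2) count comes from, the sketch below emulates a hypothetical t x t instruction with an ordinary function and counts how often software would issue it. Both `edit_block` (the stand-in for the instruction) and `edit_distance_blocked` are assumed names; the real instruction's interface is not given in the slides.

```python
def edit_block(top, left, rows, cols):
    """Stand-in for a hypothetical t x t edit distance instruction.

    top  : table values along the block's top edge (corner included)
    left : table values along the block's left edge (corner included)
    Returns the values along the bottom edge and the right edge.
    """
    prev = list(top)
    right = [top[-1]]
    for i in range(1, len(rows) + 1):
        curr = [left[i]] + [0] * len(cols)
        for j in range(1, len(cols) + 1):
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1,
                          prev[j - 1] + (rows[i - 1] != cols[j - 1]))
        right.append(curr[-1])
        prev = curr
    return prev, right


def edit_distance_blocked(s1, s2, t):
    """Return (distance, number of block-instruction calls)."""
    m, n = len(s1), len(s2)
    calls = 0
    row = list(range(n + 1))                  # table row on top of the current block row
    for i0 in range(0, m, t):
        rows = s1[i0:i0 + t]
        left = list(range(i0, i0 + len(rows) + 1))   # first column of the table
        new_row = [left[-1]]
        for j0 in range(0, n, t):
            cols = s2[j0:j0 + t]
            bottom, right = edit_block(row[j0:j0 + len(cols) + 1], left, rows, cols)
            calls += 1
            new_row.extend(bottom[1:])        # bottom edge feeds the next block row
            left = right                      # right edge feeds the next block
        row = new_row
    return row[-1], calls
```

For two strings of length 8 and t = 4, software issues 4 calls instead of visiting 64 cells.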
32
Design Metrics.
33
PART-2: Space and Time Efficient Algorithms for Edit Distance.
–Brief overview of the Four Russian Algorithm [russian70].
–Brief overview of Hirschberg's Algorithm [hirschberg75].
–Algorithm to compute edit distance and edit script in O(n^2/log(n)) time and O(n) space.
34
The Four Russian Algorithm. [Figure: the edit distance table from slide 12 with one t-block marked, together with its row overlap and column overlap with the neighboring blocks.]
–The table is divided into n^2/t^2 t-blocks.
–The idea is to do some preprocessing so that we spend only O(t) time to compute the entries in each block.
–Runtime: O(n^2/t).
35
Four Russian Algorithm. In the unit cost model the following is true:
|D[i+1,j] - D[i,j]| <= 1 (across a column)
|D[i,j+1] - D[i,j]| <= 1 (across a row)
This helps us characterize any t-block by two vectors of size t.
–The vectors contain only values from {-1,0,1}.
–e.g. [0,1,2,3,...,n] can be replaced by the offset vector [0,1,1,1,...,1].
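The bound on adjacent entries and the offset encoding can be checked on a small table. This is an illustrative sketch; the helper names `full_table`, `has_unit_steps` and `offsets` are assumptions made for the example.

```python
def full_table(s1, s2):
    """Complete unit-cost edit distance table, D[i][j] for s1[:i] vs s2[:j]."""
    m, n = len(s1), len(s2)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 or j == 0:
                D[i][j] = i + j
            else:
                D[i][j] = min(D[i - 1][j] + 1, D[i][j - 1] + 1,
                              D[i - 1][j - 1] + (s1[i - 1] != s2[j - 1]))
    return D


def has_unit_steps(D):
    """True if every pair of horizontally or vertically adjacent entries differs by at most 1."""
    rows, cols = len(D), len(D[0])
    across_rows = all(abs(D[i][j + 1] - D[i][j]) <= 1
                      for i in range(rows) for j in range(cols - 1))
    across_cols = all(abs(D[i + 1][j] - D[i][j]) <= 1
                      for i in range(rows - 1) for j in range(cols))
    return across_rows and across_cols


def offsets(values):
    """Encode a run of table values as a leading 0 followed by successive differences."""
    return [0] + [b - a for a, b in zip(values, values[1:])]
```

Applied to the first row of a table, `offsets` turns [0,1,2,3,...] into [0,1,1,1,...], so each entry of the encoded vector needs only one of three symbols.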
36
Look Up table for t-block. [Figure: the edit distance table from slide 12 with the top-left t-block marked.]
A = [0,1,1,1,1], B = [0,1,1,1,1], C = [_aaab], D = [_aaaa]
E = [0,-1,-1,-1,0], F = [0,-1,-1,-1,0]
[E,F] = table(A,B,C,D)
Preprocessing time O(3^t Σ^t t^2)
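A small sketch of the block function follows. It assumes that A and E are the offset vectors of the block's top and bottom rows, that B and F are those of its left and right columns, and that C labels the rows and D the columns (the slide's example is consistent with this reading but does not state it). Instead of an eagerly precomputed table, the sketch fills the lookup lazily with `functools.lru_cache`, which is a swapped-in simplification.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def table(A, B, C, D):
    """Map a t-block's input offsets and substrings to its output offsets (E, F).

    A, B : tuples of offsets for the top row and left column (leading 0)
    C, D : row and column characters, each with a one-character placeholder in front
    """
    t = len(A) - 1
    T = [[0] * (t + 1) for _ in range(t + 1)]
    # Only differences matter, so the block's top-left corner is taken as 0.
    for k in range(1, t + 1):
        T[0][k] = T[0][k - 1] + A[k]
        T[k][0] = T[k - 1][0] + B[k]
    for i in range(1, t + 1):
        for j in range(1, t + 1):
            T[i][j] = min(T[i - 1][j] + 1, T[i][j - 1] + 1,
                          T[i - 1][j - 1] + (C[i] != D[j]))
    E = (0,) + tuple(T[t][k] - T[t][k - 1] for k in range(1, t + 1))  # bottom row
    F = (0,) + tuple(T[k][t] - T[k - 1][t] for k in range(1, t + 1))  # right column
    return E, F
```

Calling `table((0, 1, 1, 1, 1), (0, 1, 1, 1, 1), "_aaab", "_aaaa")` reproduces the E and F vectors of the slide's example; a repeated call with the same arguments is answered from the cache.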
37
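Hirschberg's Dynamic Programming formulation.
Standard DP: align (a_1 a_2 ... a_{n-1}) a_n with (b_1 b_2 ... b_{n-1}) b_n, i.e. peel off the last characters of (a_1 a_2 ... a_{n-1} a_n) and (b_1 b_2 ... b_{n-1} b_n).
Hirschberg: split the first string in the middle, (a_1 a_2 ... a_{n/2}) (a_{n/2+1} ... a_n), and try every split of the second string:
(a_1 a_2 ... a_{n/2}) (a_{n/2+1} ... a_n) against (.) (...............)
(a_1 a_2 ... a_{n/2}) (a_{n/2+1} ... a_n) against (..) (............)
(a_1 a_2 ... a_{n/2}) (a_{n/2+1} ... a_n) against (...) (.........)
...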
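The split formulation can be sketched as a linear-space routine that returns an edit script. This is a minimal sketch of Hirschberg's divide and conquer adapted to unit-cost edit distance; the names `last_row` and `hirschberg` and the (op, x, y) script format are assumptions for illustration, and string slicing is used for brevity rather than efficiency.

```python
def last_row(a, b):
    """Last row of the edit distance table for a vs b, using two rows of space."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev


def hirschberg(a, b):
    """Edit script turning a into b, as a list of (op, char_from_a, char_from_b)."""
    if not a:
        return [("ins", "", y) for y in b]
    if not b:
        return [("del", x, "") for x in a]
    if len(a) == 1:
        # Align the single character with its first occurrence in b, if any.
        k = max(b.find(a), 0)
        op = "match" if b[k] == a else "sub"
        return ([("ins", "", y) for y in b[:k]] + [(op, a, b[k])]
                + [("ins", "", y) for y in b[k + 1:]])
    mid = len(a) // 2
    fwd = last_row(a[:mid], b)                # forward row: D[n/2, *]
    rev = last_row(a[mid:][::-1], b[::-1])    # reverse row: D^r[n/2, *]
    n = len(b)
    # Pick the split of b that minimizes forward cost plus reverse cost.
    split = min(range(n + 1), key=lambda k: fwd[k] + rev[n - k])
    return hirschberg(a[:mid], b[:split]) + hirschberg(a[mid:], b[split:])
```

The number of operations other than "match" in the returned script equals the edit distance; the two `last_row` calls at each level are the part the later slides aim to speed up.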
38
Hirschberg's Algorithm runtime.
39
Our Algorithm. In Hirschberg's algorithm we spend O(n^2) time to compute D[n/2,*] and D^r[n/2,*]. Can we use the Four Russian framework to compute D[n/2,*] and D^r[n/2,*] in O(n^2/log(n)) time and O(n) space?
40-43
Using Four Russian Framework at each level. [Figure (animation): space usage while the blocks are processed, with rows D[n/2-1,*] and D^r[n/2-1,*] marked.] Spend only O(n^2/t) time to compute D[n/2,*] and D^r[n/2,*].
44
Cases which require a row k that is not a multiple of t. [Figure: space usage, with the required row k marked.] Use the Four Russian framework up to the largest multiple of t not exceeding k (FLOOR(k)), then spend at most O(nt) time to compute row k. However, O(n^2/t^2) dominates.
45
Runtime and Space Analysis.
Space:
1. Space during the core algorithm, which we saw is linear.
2. Space to hold the lookup table after the preprocessing. [Condition not captured in the transcript] then the space required for the lookup table would be linear.
46
References.
[sastry95] R. Sastry, N. Ranganathan, and K. Remedios. CASM: A VLSI chip for approximate string matching. IEEE Trans. Pattern Anal. Mach. Intell., 17(8):824-830, 1995.
[lopresti87] D. P. Lopresti. P-NAC: A systolic array for comparing nucleic acid sequences. Computer, 20(7):98-99, 1987.
[lipton85] R. J. Lipton and D. Lopresti. A systolic array for rapid string comparison. In Chapel Hill Conf. on VLSI, pages 363-376, 1985.
[russian70] V. L. Arlazarov, E. A. Dinic, M. A. Kronrod, and I. A. Faradzev. On economic construction of the transitive closure of a directed graph. Dokl. Akad. Nauk SSSR, 194:487-488, 1970.
[hirschberg75] D. S. Hirschberg. Linear space algorithm for computing maximal common subsequences. Communications of the ACM, 18(6):341-343, 1975.