Published by Dulcie Thomas. Modified over 9 years ago.
1
Length Reduction in Binary Transforms
Oren Kapah, Ely Porat, Amir Rothschild, Amihood Amir
Bar Ilan University and Johns Hopkins University
2
Motivation
3
[Figure: the two kinds of error, an error in address and an error in content, illustrated with the labels “Bar Ilan University” and “U. of Minnesota”.]
4
Motivation: Architecture. Assume distributed memory. Our processor has the text and requests a pattern of length m. The pattern arrives in m asynchronous packets of the form ⟨symbol, address⟩. Example: the pattern BCBAA arrives as five such packets, in arbitrary order.
5
Our Model…
Text: T[0], T[1], …, T[n].
Pattern packets: ⟨P[0], I[0]⟩, ⟨P[1], I[1]⟩, …, ⟨P[m], I[m]⟩, where the content P[i] ∈ Σ and the address I[i] ∈ {1, …, m}.
Standard pattern matching: no error in the addresses. Asynchronous pattern matching: no error in the contents. Eventually: errors in both.
6
[Figure: an address register of log m bits, some of which are “bad”.]
What does “bad” mean?
1. The bit always “flips” its value.
2. The bit sometimes flips its value.
3. Transient error.
7
We will now concentrate on consistent bit flips. Example: let Σ = {a, b}.
T[0] T[1] T[2] T[3] = a a b b
P[0] P[1] P[2] P[3] = b b a a
8
Example: BAD
P[0] P[1] P[2] P[3] = b b a a
P[00] P[01] P[10] P[11] = b b a a
With no bits flipped, every position mismatches the text a a b b.
9
Example: GOOD
P[0] P[1] P[2] P[3] = b b a a
P[00] P[01] P[10] P[11] = a a b b
After flipping address bits, the relabeled pattern matches the text a a b b.
10
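Example: BEST
P[0] P[1] P[2] P[3] = b b a a
P[00] P[01] P[10] P[11] = a a b b
The best match is the one that flips the fewest bits; here, flipping only the most significant address bit suffices.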
11
Naïve Algorithm: for each of the $2^{\log m} = m$ different bit combinations, try matching, and choose the match that flips the minimum number of bits. Time: $O(m^2)$.
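A minimal Python sketch of the naïve algorithm, assuming the text and pattern have the same power-of-two length m and are indexed by address; the function name `best_bit_flip` is hypothetical. A mask j records which address bits flip, so the pattern symbol stored at address i ^ j is compared with T[i].

```python
def best_bit_flip(text, pattern):
    """Naive consistent-bit-flip matching (illustrative sketch).

    Tries every mask j in [0, m) and checks text[i] == pattern[i ^ j] for all i;
    returns the matching mask with the fewest 1 bits, or None if none matches.
    """
    m = len(pattern)
    assert len(text) == m and (m & (m - 1)) == 0, "m must be a power of two"
    best = None
    for mask in range(m):                                   # 2^(log m) = m masks
        if all(text[i] == pattern[i ^ mask] for i in range(m)):   # O(m) check
            if best is None or bin(mask).count("1") < bin(best).count("1"):
                best = mask
    return best

print(best_bit_flip("aabb", "bbaa"))   # 2, i.e. binary 10
```

On the running example, masks 10 and 11 both produce a match; the sketch returns 2 (binary 10) because flipping only the most significant bit is the cheapest. The two nested loops give the $O(m^2)$ bound.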
12
Approximate Pattern Matching
Hamming distance: for every location, write the number of mismatches.
Text: A B B A B C B A A B C B A B B C
Pattern: A B C B A
13
Approximate Pattern Matching: location 0 has 3 mismatches.
14
Approximate Pattern Matching: location 1 has 3 mismatches.
15
Approximate Pattern Matching: location 2 has 5 mismatches.
16
Approximate Pattern Matching: location 3 has 0 mismatches (an exact match).
17
Approximate Pattern Matching: location 4 has 4 mismatches.
Naïve algorithm: compare the pattern with the text at every location. Time: O(nm).
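The naïve O(nm) algorithm fits in a few lines; this Python sketch (function name hypothetical) slides the pattern across the text and counts the mismatches at each location.

```python
def hamming_at_every_location(text, pattern):
    """Number of mismatches of `pattern` against `text` at every location: O(nm)."""
    n, m = len(text), len(pattern)
    return [sum(t != p for t, p in zip(text[i:i + m], pattern))
            for i in range(n - m + 1)]

counts = hamming_at_every_location("ABBABCBAABCBABBC", "ABCBA")
print(counts[:5])   # [3, 3, 5, 0, 4]
```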
18
In Pattern Matching, and in Polynomial Multiplication:
[Figure: a polynomial multiplication with coefficients b_0, b_1, b_2.]
Naïve time: O(nm).
19
What do the two examples have in common? What really happened?
[Figure: the text T[0] … T[3], padded with three zeros on each side; the pattern P[0] … P[3] slides across it, and each alignment produces one entry of the dot-products array C[-3], C[-2], …, C[3].]
26
Another way of defining the transform:
$C[i] = \sum_{x=0}^{n} T[x]\,P[x - i]$,
where we define $P[x] = 0$ for $x < 0$ or $x \ge m$.
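A direct Python sketch of this definition, assuming numeric arrays; it returns a dictionary keyed by the shift i, mirroring the C[-3] … C[3] array in the figure, and takes O(nm) time. The name `shift_convolution` is an illustrative choice, and `len_t` stands for the length of T.

```python
def shift_convolution(T, P):
    """C[i] = sum over x of T[x] * P[x - i], treating P[y] as 0 outside 0..m-1."""
    len_t, m = len(T), len(P)
    C = {}
    for i in range(-(m - 1), len_t):          # every shift with some overlap
        C[i] = sum(T[x] * P[x - i] for x in range(len_t) if 0 <= x - i < m)
    return C

print(shift_convolution([1, 2, 3, 4], [1, 1, 0, 0]))
# {-3: 0, -2: 0, -1: 1, 0: 3, 1: 5, 2: 7, 3: 4}
```

For 0/1 inputs, C[i] counts the positions where a pattern 1 meets a text 1 at shift i.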
27
FFT solution to the “shift” convolution:
1. Evaluate the text and pattern polynomials at the roots of unity in time O(m log m).
2. For polynomial multiplication, compute the values of the product polynomial at the roots of unity in time O(m log m).
3. Compute the coefficients of the product polynomial, again in time O(m log m).
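A self-contained Python sketch of the three steps, using a textbook recursive radix-2 FFT rather than a library (an implementation choice of this sketch; function names are hypothetical). Reversing P turns the shift convolution into an ordinary polynomial product, so C[i] is coefficient i + m − 1 of the product; zero-padding to a power of two at least len(T) + m − 1 avoids wrap-around.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT: values of the polynomial with coefficients `a`
    at the len(a)-th roots of unity (len(a) must be a power of two)."""
    size = len(a)
    if size == 1:
        return list(a)
    even, odd = fft(a[0::2], invert), fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * size
    for k in range(size // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / size) * odd[k]
        out[k], out[k + size // 2] = even[k] + w, even[k] - w
    return out

def shift_convolution_fft(T, P):
    """C[i] = sum_x T[x] * P[x - i] for i = -(m-1) .. len(T)-1, via the FFT."""
    len_t, m = len(T), len(P)
    size = 1
    while size < len_t + m - 1:
        size *= 2
    ft = fft(list(T) + [0] * (size - len_t))               # step 1
    fp = fft(list(reversed(P)) + [0] * (size - m))         # step 1
    prod = [x * y for x, y in zip(ft, fp)]                 # step 2: pointwise
    coeffs = [v / size for v in fft(prod, invert=True)]    # step 3: interpolate
    return {i: round(coeffs[i + m - 1].real) for i in range(-(m - 1), len_t)}

print(shift_convolution_fft([1, 2, 3, 4], [1, 1, 0, 0]))
# {-3: 0, -2: 0, -1: 1, 0: 3, 1: 5, 2: 7, 3: 4}
```

Rounding the real parts is adequate for small integer inputs; large values would need more care with floating-point error.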
28
A General Convolution:
$C_f[j] = \sum_{i} T[i]\,P[f_j(i)]$,
where the $f_j$ are bijections, $j = 1, \dots, O(m)$.
29
Consistent Bit Flip as a Convolution
Construct a mask of length log m that has 0 in every bit except the bad bits, where it has a 1. Example: assume the bad bits are at indices i, j, k ∈ {0, …, log m − 1}. Then the mask has 1s exactly at positions i, j, k, e.g. 000001000100001000. An exclusive OR between the mask and a pattern index gives the target index.
30
Example: Mask: 0010
Index 1010 → 1010 ⊕ 0010 = 1000
Index 1000 → 1000 ⊕ 0010 = 1010
31
Our Case: for each of the $2^{\log m} = m$ masks $j \in \{0,1\}^{\log m}$, our convolution is
$C[j] = \sum_{i=0}^{m-1} T[i]\,P[i \oplus j]$.
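A brute-force Python sketch of this convolution (name hypothetical), assuming arrays indexed by address so that index 0 is address 0…0. The example uses the text 01000010 and pattern 10000001 from the Walsh transform example; the slides list addresses from 111 down to 000, so stored by address the arrays read as below.

```python
def xor_convolution(T, P):
    """C[j] = sum_i T[i] * P[i ^ j] for every mask j; O(m^2) time."""
    m = len(T)
    assert len(P) == m and (m & (m - 1)) == 0, "length must be a power of two"
    return [sum(T[i] * P[i ^ j] for i in range(m)) for j in range(m)]

T = [0, 1, 0, 0, 0, 0, 1, 0]   # 1s at addresses 001 and 110
P = [1, 0, 0, 0, 0, 0, 0, 1]   # 1s at addresses 000 and 111
print(xor_convolution(T, P))   # [0, 2, 0, 0, 0, 0, 2, 0]
```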
32
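To compute the minimum bit flip: let T, P be over the alphabet {0,1}. For each j, $P_j[i] = P[i \oplus j]$ is a permutation of P. Thus, assuming T and P contain the same number of 1s (otherwise no flip can match), only the j's for which $C[j]$ equals the number of 1s in T are valid flips, since for them all 1s match 1s and all 0s match 0s. Choose the valid j with the minimum number of 1s.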
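A Python sketch of this test for the binary alphabet (function name hypothetical). It makes explicit the assumption that T and P contain the same number of 1s, and it computes the convolution by brute force for clarity, so it runs in $O(m^2)$. Encoding a as 1 and b as 0, the earlier example becomes T = 1100 and P = 0011.

```python
def min_bit_flip_binary(T, P):
    """Mask j with the fewest 1 bits such that T[i] == P[i ^ j] for all i, or None.

    T and P are 0/1 lists of the same power-of-two length.
    """
    m, ones = len(T), sum(T)
    if sum(P) != ones:            # a permutation of P cannot match T otherwise
        return None
    C = [sum(T[i] * P[i ^ j] for i in range(m)) for j in range(m)]
    valid = [j for j in range(m) if C[j] == ones]     # every 1 of T meets a 1
    return min(valid, key=lambda j: bin(j).count("1")) if valid else None

print(min_bit_flip_binary([1, 1, 0, 0], [0, 0, 1, 1]))   # 2
```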
33
Time: all convolutions can be computed in time $O(m^2)$ after preprocessing the permutation functions as tables. Can we do better (as the FFT does for the shift convolution)?
34
Idea: Divide and Conquer, the Walsh Transform
1. Split T and P into arrays of length m/2.
2. Compute the transforms of the two half-length subproblems recursively.
3. Use their values to compute the full transform in time O(m).
Time: recurrence t(m) = 2t(m/2) + m; closed form t(m) = O(m log m).
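One standard way to realize this divide-and-conquer step is the Walsh-Hadamard butterfly, sketched below in Python as an assumption about the construction rather than the slide's exact formulas (function names hypothetical). The transform of an array is the transform of (first half + second half) followed by the transform of (first half − second half), which gives exactly t(m) = 2t(m/2) + m. The transform turns the XOR convolution into a pointwise product, and applying it twice multiplies by m, so C is recovered by transforming, multiplying, transforming back and dividing by m.

```python
def walsh_hadamard(a):
    """Unnormalized Walsh-Hadamard transform by divide and conquer, O(m log m)."""
    m = len(a)
    if m == 1:
        return list(a)
    half = m // 2
    lo, hi = a[:half], a[half:]
    # Two half-length subproblems, combined in O(m): t(m) = 2 t(m/2) + m.
    return (walsh_hadamard([x + y for x, y in zip(lo, hi)]) +
            walsh_hadamard([x - y for x, y in zip(lo, hi)]))

def xor_convolution_fast(T, P):
    """C[j] = sum_i T[i] * P[i ^ j] via the transform (integer inputs)."""
    m = len(T)
    product = [x * y for x, y in zip(walsh_hadamard(T), walsh_hadamard(P))]
    return [v // m for v in walsh_hadamard(product)]   # exact: multiples of m

print(xor_convolution_fast([0, 1, 0, 0, 0, 0, 1, 0], [1, 0, 0, 0, 0, 0, 0, 1]))
# [0, 2, 0, 0, 0, 0, 2, 0]
```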
35
Sparse Transform: applications where most of the input is 0. The locations of the 1s are given as input. We are only interested in the transform results at locations where all pattern 1s match text 1s.
36
Motivation – Point Set Matching
1-D point set matching: $T = (t_1, t_2, \dots, t_n)$, $P = (p_1, p_2, \dots, p_m)$.
2-D point set matching: searching in music.
37
Notations:
Length of text: N
Length of pattern: M
Number of 1s in text: n
Number of 1s in pattern: m
38
Idea: map the text and pattern to a small text and pattern with a hash function h.
39
Idea: Do fast transform on the small text and pattern
40
Idea: map the results back onto the transform result of the original text and pattern using $h^{-1}$.
41
Length Reduction in DFT
Goal: given two vectors $V_1$ and $V_2$, obtain two vectors $V'_1$ and $V'_2$ of size $O(n')$ such that every non-zero of $V_1$ and of $V_2$ appears as a singleton in $V'_1$ and $V'_2$, respectively, while maintaining the distance property.
The Distance Property: if $V'_2[h(0)]$ is aligned with $V'_1[h(i)]$, then $V'_2[h(j)]$ is aligned with $V'_1[h(f_i(j))] = V'_1[h(i + j)]$.
Using the reduced-size vectors, matching can be done in time $O(n' \log n')$ using the FFT algorithm.
42
Example: Length Reduction
The vectors are given as sets of (index, value) pairs:
$V_1$: (0, 5), (6, 2), (13, 3), (19, 1)
$V_2$: (0, 2), (7, 3)
Length reduction hash function: mod 5
$V'_1$: 5 2 0 3 1
$V'_2$: 2 0 3 0 0
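A Python sketch reproducing the example (function name hypothetical): each (index, value) pair is sent to cell index mod s, and values that collide are summed.

```python
def reduce_mod(pairs, s):
    """Length reduction by h(i) = i mod s; colliding values are summed."""
    out = [0] * s
    for index, value in pairs:
        out[index % s] += value
    return out

V1 = [(0, 5), (6, 2), (13, 3), (19, 1)]
V2 = [(0, 2), (7, 3)]
print(reduce_mod(V1, 5))   # [5, 2, 0, 3, 1]
print(reduce_mod(V2, 5))   # [2, 0, 3, 0, 0]
```

The distance property holds up to wrap-around because (i + j) mod s = ((i mod s) + (j mod s)) mod s: at shift 6, the entry of $V_2$ at index 7 (cell 2) is aligned with the entry of $V_1$ at index 13 (cell 3), and indeed h(6) + h(7) = 1 + 2 = 3 = h(13).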
43
The Randomized Algorithm of Cole & Hariharan [STOC 02]
Idea: find a set of log(n) short vectors in which, with high probability, each non-zero of V appears as a singleton in at least one of the vectors.
Hash functions: (ax mod q) mod s, where q is a large prime number and s is O(n).
If s = c·n, the probability that a non-zero shares its cell with another non-zero is bounded by a constant less than 1. Using log(n) different hash functions reduces the failure probability exponentially.
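A rough Python sketch of the hashing idea only, not Cole and Hariharan's actual algorithm: the prime q = 2^31 − 1, the random multipliers a, the constant c = 2 in s = c·n, and all helper names are assumptions of this sketch. Each hash function folds the non-zeros into a vector of length s (summing collisions), and `singleton_indices` reports which non-zeros landed alone in their cell for a given hash.

```python
import math
import random

Q = 2_147_483_647  # 2^31 - 1, a prime (assumed choice of "large prime")

def bucket(a, x, s):
    """The hash (a*x mod q) mod s."""
    return (a * x) % Q % s

def reduced_vectors(nonzeros, s, seed=0):
    """Fold the non-zeros (dict index -> value) into about log2(n) vectors of
    length s, one per random multiplier a. Returns a list of (a, vector)."""
    rng = random.Random(seed)
    k = max(1, math.ceil(math.log2(max(2, len(nonzeros)))))
    out = []
    for _ in range(k):
        a = rng.randrange(1, Q)
        vec = [0] * s
        for x, value in nonzeros.items():
            vec[bucket(a, x, s)] += value      # colliding values are summed
        out.append((a, vec))
    return out

def singleton_indices(nonzeros, a, s):
    """Indices of the non-zeros that do not share their cell under hash a."""
    cells = {}
    for x in nonzeros:
        cells.setdefault(bucket(a, x, s), []).append(x)
    return {xs[0] for xs in cells.values() if len(xs) == 1}

V = {0: 5, 6: 2, 13: 3, 19: 1}
s = 2 * len(V)                                  # s = c*n with c = 2 (assumed)
covered = set()
for a, vec in reduced_vectors(V, s):
    covered |= singleton_indices(V, a, s)
print(sorted(covered))   # non-zeros seen as a singleton at least once
```

Adding hash functions shrinks the chance that some index is never a singleton; in a real implementation the number of functions and the constant c would be tuned to the desired failure probability.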
44
Problem For the Walsh Transform, the mod function is useless. The distance property has to do with exclusive or, not addition!
45
Idea: instead of the modulo function, take the exclusive or (⊕) of the index bits with a random bit string.
46
Example (addresses listed from 111 down to 000):
address: 111 110 101 100 011 010 001 000
Text:     0   1   0   0   0   0   1   0
Pattern:  1   0   0   0   0   0   0   1
Let’s do the Walsh transform.
47
Walsh transform, step by step: for each location j, XOR every pattern address with j and take the dot product of the text with the relocated pattern.
Location j | pattern addresses 111 … 000 after XOR with j | dot product
000 | 111 110 101 100 011 010 001 000 | 0
001 | 110 111 100 101 010 011 000 001 | 2
010 | 101 100 111 110 001 000 011 010 | 0
011 | 100 101 110 111 000 001 010 011 | 0
100 | 011 010 001 000 111 110 101 100 | 0
101 | 010 011 000 001 110 111 100 101 | 0
110 | 001 000 011 010 101 100 111 110 | 2
111 | 000 001 010 011 100 101 110 111 | 0
Dot product array (locations 111 … 000): 0 2 0 0 0 0 2 0
62
The Length Reduction: reduce the length by half. Choose a mask of log n − 1 bits at random, prepend a most significant bit 1, and XOR the mask with each index in the second half. Since the mask's MSB cancels the MSB of every second-half index, this randomly hashes all 1s in the second half into the first half.
63
Length Reduction: Example
Original: address 111 110 101 100 011 010 001 000; Text 0 1 0 0 0 0 1 0; Pattern 1 0 0 0 0 0 0 1.
Let the mask be 101. XORing each second-half address with the mask gives the addresses 010 011 000 001 011 010 001 000 (Text and Pattern values unchanged).
64
Reduced Strings: Example
Original: address 111 110 101 100 011 010 001 000; Text 0 1 0 0 0 0 1 0; Pattern 1 0 0 0 0 0 0 1. The mask is 101.
Reduced: address 011 010 001 000; Text 1 0 1 0; Pattern 0 1 0 1.
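A Python sketch of one halving step on this example (helper names hypothetical; arrays are indexed by address, so index 0 is address 000). `random_mask` shows how a mask with MSB 1 could be drawn; the example fixes the mask to 101 to match the slides, and a brute-force XOR convolution of the reduced strings gives their Walsh transform result.

```python
import random

def random_mask(length, rng=random):
    """A random mask of log(length) bits whose most significant bit is 1."""
    half = length // 2
    return half | rng.randrange(half)

def halve(bits, mask):
    """Fold a 0/1 array of power-of-two length onto its first half.

    Second-half index i moves to i ^ mask, which lies in the first half because
    the mask's MSB cancels the MSB of i. Returns (reduced, stayed, moved).
    """
    half = len(bits) // 2
    assert (mask & half) and mask < len(bits)
    stayed = list(bits[:half])              # entries already in the first half
    moved = [0] * half                      # arrivals from the second half
    for i in range(half, len(bits)):
        moved[i ^ mask] += bits[i]
    reduced = [x + y for x, y in zip(stayed, moved)]
    return reduced, stayed, moved

def xor_convolution(T, P):
    return [sum(T[i] * P[i ^ j] for i in range(len(T))) for j in range(len(T))]

T = [0, 1, 0, 0, 0, 0, 1, 0]
P = [1, 0, 0, 0, 0, 0, 0, 1]
T_red = halve(T, 0b101)[0]       # [0, 1, 0, 1]: read from 011 down, 1 0 1 0
P_red = halve(P, 0b101)[0]       # [1, 0, 1, 0]: read from 011 down, 0 1 0 1
print(xor_convolution(T_red, P_red))   # [0, 2, 0, 2]: read from 011 down, 2 0 2 0
```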
65
Reduced Strings: Example
Walsh transform of the reduced strings (address 011 010 001 000; Text 1 0 1 0; Pattern 0 1 0 1):
Walsh transform (locations 011 … 000): 2 0 2 0
Questions:
1. Does the distance property hold?
2. Which of these results are “legal”?
3. Where should each result be mapped?
69
Answers: Distance property
The Distance Property: if $T[h(0)]$ is aligned with $P[h(i)]$, then $T[h(j)]$ is aligned with $P[h(f_i(j))] = P[h(i \oplus j)]$.
This holds because both h and f are XOR functions, and XOR is commutative and associative.
70
Answers: which are “legal”?
Original: address 111 110 101 100 011 010 001 000; Text 0 1 0 0 0 0 1 0; Pattern 1 0 0 0 0 0 0 1.
Let the mask be 101. XORing each second-half address with the mask gives the addresses 010 011 000 001 011 010 001 000.
71
Answers: which are “legal”?
Original: address 111 110 101 100 011 010 001 000; Text 0 1 0 0 0 0 1 0; Pattern 1 0 0 0 0 0 0 1. The mask is 101.
Reduced: address 011 010 001 000
Text:    m 0 s 0
Pattern: 0 m 0 s
Here s marks an original tenant (a 1 that was already in the first half and stayed put) and m marks a Johnny-come-lately (a 1 moved in from the second half by the mask).
Observation: the legal multiplications are either all s in the text by s in the pattern and m in the text by m in the pattern, or all s in the text by m in the pattern and m in the text by s in the pattern.
74
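Answers: which are “legal”?
Observation: the legal multiplications are either
1. all s in the text by s in the pattern and m in the text by m in the pattern, or
2. all s in the text by m in the pattern and m in the text by s in the pattern.
This can be checked with a constant number of binary discrete Walsh transforms (DWTs), with an added benefit: case 1 means the result stays where it is; case 2 means the result moves to its address XORed with the mask (exactly one of the two multiplied entries had its address XORed with the mask, so their distance is off by the mask).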
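A Python sketch of the bookkeeping for one halving step, assuming (as a simplification of this sketch) that the s and m entries are kept in separate arrays, so a collision between an s and an m never mixes the two classes. The four products s·s, m·m, s·m and m·s use a brute-force XOR convolution for brevity; each could be replaced by a fast Walsh transform. Case-1 results stay at address j, and case-2 results move to j ⊕ mask. The function names are hypothetical.

```python
def xor_conv(a, b):
    """Brute-force C[j] = sum_i a[i] * b[i ^ j]."""
    return [sum(a[i] * b[i ^ j] for i in range(len(a))) for j in range(len(a))]

def recover_after_halving(T, P, mask):
    """Full XOR convolution of T and P from four half-length ones.

    s arrays keep the first half as is; m arrays hold the second-half entries
    at their masked addresses (index i holds the entry from address i ^ mask).
    """
    half = len(T) // 2
    Ts, Ps = T[:half], P[:half]
    Tm = [T[i ^ mask] for i in range(half)]
    Pm = [P[i ^ mask] for i in range(half)]
    same = [x + y for x, y in zip(xor_conv(Ts, Ps), xor_conv(Tm, Pm))]   # case 1
    cross = [x + y for x, y in zip(xor_conv(Ts, Pm), xor_conv(Tm, Ps))]  # case 2
    C = [0] * len(T)
    for j in range(half):
        C[j] += same[j]           # case 1: the result stays at address j
        C[j ^ mask] += cross[j]   # case 2: the result moves to j XOR mask
    return C

T = [0, 1, 0, 0, 0, 0, 1, 0]
P = [1, 0, 0, 0, 0, 0, 0, 1]
print(recover_after_halving(T, P, 0b101))   # [0, 2, 0, 0, 0, 0, 2, 0]
```

On the example, the case-1 value 2 stays at 001 and the case-2 value 2 computed at 011 lands at 011 ⊕ 101 = 110, matching the full dot-product array.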
75
Reduced Strings: Example
Walsh transform of the reduced strings (address 011 010 001 000; Text 1 0 1 0; Pattern 0 1 0 1): the value 2 at location 001 comes from s·s and m·m products, so this result is already in the correct address.
76
Reduced Strings: Example
Walsh transform of the reduced strings (address 011 010 001 000; Text 1 0 1 0; Pattern 0 1 0 1): the value 2 at location 011 comes from s·m and m·s products, so this result belongs in address 011 ⊕ 101 = 110.
77
Reminder – the dot product of the original strings
address: 111 110 101 100 011 010 001 000; Text 0 1 0 0 0 0 1 0; Pattern 1 0 0 0 0 0 0 1
Dot product (locations 111 … 000): 0 2 0 0 0 0 2 0
78
Analysis: we could continue this process recursively and analyze the probability that a masked element clashes with an element that is already there, but there is a more elegant solution:
79
Polynomials over a Finite Field
Consider indices as elements of $\mathbb{F}_{2^L}$. In $\mathbb{F}_{2^L}$: $x + y = x \oplus y$.
Length reduction: every index in $\mathbb{F}_{2^L}$ is written as a polynomial in $\mathbb{F}_{2^\ell}[X]$ of degree $d = L/\ell - 1$.
80
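Length reduction example: index = 17, in binary 10001. Take $\ell = 2$ and split the bits into 2-bit coefficients from the right: 1 | 00 | 01.
The polynomial: $1 \cdot X^2 + 00 \cdot X + 01 \cdot 1 = X^2 + 1$.
Choose a value for X from $\mathbb{F}_{2^\ell}$ at random and evaluate the polynomial.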
81
Length reduction example: recall that in $\mathbb{F}_{2^\ell}$, addition is exclusive or, and multiplication is polynomial multiplication modulo some irreducible polynomial. So evaluating the polynomial at X gives a number with $\ell$ bits.
82
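Probability of Collision: the probability that indices i and j collide equals the probability that the chosen value of X is a root of the difference polynomial $P_i(X) - P_j(X)$, where $P_i, P_j$ are the polynomials of indices i and j, respectively. For $i \ne j$ the difference polynomial is nonzero and has degree at most d, so it has at most d roots in $\mathbb{F}_{2^\ell}$; the probability is therefore at most $d/2^\ell$.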
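A small Python sketch of the polynomial length reduction over $\mathbb{F}_{2^\ell}$, with its assumptions labeled: $\ell = 2$, L = 6 and the irreducible polynomial $X^2 + X + 1$ are illustrative choices, and the helper names are hypothetical. `poly_hash` splits an index into ℓ-bit coefficients and evaluates the polynomial at X with Horner's rule; because evaluation is linear over XOR, h(i ⊕ j) = h(i) ⊕ h(j), which is what preserves the distance property. The last function counts, for every pair of distinct indices, how many of the $2^\ell$ choices of X make them collide, so the $d/2^\ell$ bound can be checked exhaustively at toy size.

```python
def gf_mul(a, b, ell, modulus):
    """Product of a and b in GF(2^ell): carry-less multiplication reduced by
    `modulus`, an irreducible polynomial of degree ell given as a bit mask."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^ell) is XOR
        b >>= 1
        a <<= 1
        if a >> ell:             # degree reached ell: reduce by the modulus
            a ^= modulus
    return result

def poly_hash(index, L, ell, X, modulus):
    """Read an L-bit index as a polynomial with ell-bit coefficients (most
    significant first, degree d = L/ell - 1) and evaluate it at X (Horner)."""
    d = L // ell - 1
    acc = 0
    for k in range(d, -1, -1):
        coeff = (index >> (k * ell)) & ((1 << ell) - 1)
        acc = gf_mul(acc, X, ell, modulus) ^ coeff
    return acc

def worst_collision_count(L, ell, modulus):
    """Max, over pairs of distinct L-bit indices, of the number of X values
    at which the two hashes agree; the bound above says this is at most d."""
    worst = 0
    for i in range(1 << L):
        for j in range(i + 1, 1 << L):
            hits = sum(poly_hash(i, L, ell, X, modulus) == poly_hash(j, L, ell, X, modulus)
                       for X in range(1 << ell))
            worst = max(worst, hits)
    return worst

MOD4 = 0b111                                  # X^2 + X + 1 (assumed choice)
print(poly_hash(17, 6, 2, 2, MOD4))           # 2: X^2 + 1 evaluated at X = 2
print(worst_collision_count(6, 2, MOD4))      # 2, which equals d = 6/2 - 1
```

The bound is tight here: index 20 is the polynomial $X^2 + X$, which shares its hash with index 0 at both roots X = 0 and X = 1.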
83
Moral of the Story: Polynomials are a good candidate for locality preserving length reductions for discrete transforms.
84
The End