Deterministic Length Reduction: Fast Convolution in Sparse Data and Applications Written by: Amihood Amir, Oren Kapah and Ely Porat.


1 Deterministic Length Reduction: Fast Convolution in Sparse Data and Applications Written by: Amihood Amir, Oren Kapah and Ely Porat

2 Motivation – Point Set Matching Integer 1-D Point Set Matching: T: (t_1, t_2, …, t_n), P: (p_1, p_2, …, p_m), where t_i and p_i are integers. Let N = t_n and M = p_m (the maximal indices). Time: O(nm), O(N·log(M))
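The O(nm) bound can be illustrated with a brute-force matcher. This is only a minimal sketch, assuming the point sets are given as sorted lists of integers; the function name `match_point_sets` is hypothetical and not taken from the slides.

```python
def match_point_sets(text, pattern):
    """Return every shift s such that p + s is a text point for all p in pattern.

    Each of the n text points proposes one candidate shift, and each candidate
    is checked against the m pattern points, giving O(n*m) set lookups.
    """
    text_set = set(text)
    anchor = pattern[0]
    shifts = []
    for t in text:
        s = t - anchor  # align the first pattern point with this text point
        if all(p + s in text_set for p in pattern):
            shifts.append(s)
    return shifts


print(match_point_sets([1, 4, 6, 9, 11], [0, 2]))
```

The convolution-based methods discussed in the rest of the deck aim to beat this bound when the vectors are sparse.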

3 Motivation – Point Set Matching 2-D Point Set Matching – Searching in Music: T: (i_1, j_1), (i_2, j_2), …, (i_n, j_n); P: (i_1, j_1), (i_2, j_2), …, (i_m, j_m). Dimension Reduction: (i, j) → i·N + j
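The dimension reduction can be sketched in a few lines. The parameter `width` stands in for N and is an assumption of this sketch: it must exceed every second coordinate that can occur (including after shifting), otherwise neighbouring rows run into each other.

```python
def flatten_points(points, width):
    """Map each 2-D point (i, j) to the 1-D index i * width + j."""
    return sorted(i * width + j for i, j in points)


pattern = [(0, 0), (1, 2)]
shifted = [(i + 3, j + 4) for i, j in pattern]  # the same shape moved by (3, 4)

flat_pattern = flatten_points(pattern, 10)
flat_shifted = flatten_points(shifted, 10)
# A 2-D translation becomes a single 1-D offset of 3 * 10 + 4.
print([b - a for a, b in zip(flat_pattern, flat_shifted)])
```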

4 Motivation – Generalized Case The generalized case of these problems is the d-dimensional sparse wildcard matching problem. Problem Definition: Given a d-dimensional text T with zeros and non-zeros, and a d-dimensional pattern P with wildcards and non-zeros, find all the locations where P matches T. Applications: d-dimensional point set matching, searching in music, protein activity research, etc.

5 Length Reduction Goal: Given two vectors V_1 and V_2, obtain two vectors V'_1 and V'_2 of size O(n_1) such that all non-zeros in V_1 and in V_2 appear as singletons in V'_1 and V'_2 respectively, while maintaining the distance property. The Distance Property: If V'_2[f(0)] is aligned with V'_1[f(i)], then V'_2[f(j)] will be aligned with V'_1[f(i + j)]. Using the reduced-size vectors, matching can be done in time O(n_1 log(n_1)) using convolutions.

6 Example: Length Reduction The vectors are given as sets of pairs: (index, value).
V_1: (0, 5), (6, 2), (13, 3), (19, 1)
V_2: (0, 2), (7, 3)
Length Reduction Function: mod(5)
V'_1: 5 2 0 3 1
V'_2: 2 0 3 0 0
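The mod(5) reduction in this example can be reproduced with a short sketch. The helper name `reduce_length` and the extra occupancy counter are assumptions added for illustration; the counter makes it easy to see whether every non-zero landed as a singleton.

```python
def reduce_length(pairs, q):
    """Fold a sparse vector [(index, value), ...] into a dense vector of length q.

    Returns (reduced, counts): reduced[r] is the sum of the values whose index
    is congruent to r mod q, and counts[r] is how many non-zeros landed on r
    (1 means a singleton, 2 or more means a multiple).
    """
    reduced = [0] * q
    counts = [0] * q
    for index, value in pairs:
        reduced[index % q] += value
        counts[index % q] += 1
    return reduced, counts


v1 = [(0, 5), (6, 2), (13, 3), (19, 1)]
v2 = [(0, 2), (7, 3)]
print(reduce_length(v1, 5))
print(reduce_length(v2, 5))
```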

7 The Randomized Algorithm (Cole & Hariharan – STOC02) Idea: Find a set of log(n) short vectors in which, with high probability, each non-zero in V appears as a singleton in at least one of the vectors. Hash functions: (a·x mod q) mod s, where q is a large prime number and s is O(n). If s = c·n, then the probability of a non-zero appearing as a multiple is constant. Using log(n) different hash functions reduces the failure probability exponentially.
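A small sketch of this hash family follows. The multiplier `a` is passed in explicitly so the behaviour is reproducible; in the randomized setting it would be drawn at random. The function names and the concrete numbers are assumptions chosen for illustration.

```python
from collections import Counter


def hash_index(x, a, q, s):
    """The hash (a*x mod q) mod s from the slide."""
    return (a * x % q) % s


def singletons(indices, a, q, s):
    """Return the indices that share their hash bucket with no other index."""
    buckets = Counter(hash_index(x, a, q, s) for x in indices)
    return [x for x in indices if buckets[hash_index(x, a, q, s)] == 1]


indices = [0, 5, 13, 20]
print(singletons(indices, 1, 101, 5))   # one choice of multiplier
print(singletons(indices, 30, 101, 5))  # a different multiplier, different singletons
```

Different multipliers isolate different non-zeros, which is why several hash functions are combined.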

8 The Randomized Algorithm Sources of Errors 1. Some non-zeros may appear only as multiples in every vector of the set. 2. The non-zero from the text that was aligned with a non-zero from the pattern may have come from a different index (false matches). 3. This algorithm was created for matching, but in convolution each non-zero should be calculated only once.

9 Deterministic Length Reduction Our Goal: Find a set of log(n) hash functions which ensures that each non-zero appears as a singleton at least once. Finding the hash functions is done in a preprocessing step based on V_1. The algorithm distinguishes between 2 cases: N_1 is polynomial in n_1. N_1 is exponential in n_1.

10 The Polynomial Case: N < n^c Let q be a prime number of size O(n), and let mod(q) be the suggested hash function. Let i, j be the indices of two non-zeros, and let d_ij = |i - j| be the distance between them. Observation: If i and j are mapped to the same location, then q divides d_ij. Observation: At most c prime numbers of size O(n) divide d_ij. Corollary: A non-zero can appear as a multiple under at most c·n prime numbers.
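The first observation can be checked directly: two indices collide under mod(q) exactly when q divides their difference. The sketch below uses a hypothetical helper `colliding_primes` and a hand-picked list of small primes.

```python
def colliding_primes(i, j, primes):
    """Return the primes q for which indices i and j land on the same residue."""
    return [q for q in primes if (i - j) % q == 0]


small_primes = [2, 3, 5, 7, 11]
print(colliding_primes(13, 20, small_primes))  # the difference is 7
print(colliding_primes(0, 20, small_primes))   # the difference is 20 = 2 * 2 * 5

# Divisibility of the difference and equality of residues are the same test.
for q in small_primes:
    assert ((13 - 20) % q == 0) == (13 % q == 20 % q)
```

Since a difference has only a few prime divisors, only a few primes can make any given pair collide.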

11 Choosing Prime Numbers Test 2c·n prime numbers (of size O(n log n)), and build the following table: Each column represents a non-zero (n columns). Each row represents a prime number (2c·n rows). Reminder: Each non-zero can appear as a multiple at most c·n times. Corollary: The table is at least half full with ones.

        NZ_1  NZ_2  NZ_3  NZ_4  NZ_5
P_1      1     0     0     1     0
P_2      1     1     0     1     0
P_3      0     0     0     0     1
P_4      1     0     1     0     1
P_5      0     1     1     1     0
P_6      0     1     0     0     1
P_7      1     0     1     1     0
P_8      0     0     1     1     1
P_9      1     1     0     0     0
P_10     0     1     1     0     1
P_11     0     1     0     1     1
P_12     1     0     1     0     0
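One way such a table could be built is sketched below. It assumes that a 1 marks a non-zero that is a singleton under the row's prime, which is the reading suggested by the corollary; the function name and the sample indices are illustrative only.

```python
from collections import Counter


def build_singleton_table(indices, primes):
    """Return one row per prime; row[k] is 1 if indices[k] is a singleton mod that prime."""
    table = []
    for q in primes:
        occupancy = Counter(x % q for x in indices)
        table.append([1 if occupancy[x % q] == 1 else 0 for x in indices])
    return table


for row in build_singleton_table([0, 5, 13, 20], [5, 7]):
    print(row)
```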

12 Choosing Prime Numbers: Cont.
1. Select a prime number which generates a row that is at least half full (for example P_2).
2. Delete the row and all the columns in which there was a 1 in the deleted row.
3. Repeat steps 1 and 2 until the whole table is deleted.

Table before the first selection:

        NZ_1  NZ_2  NZ_3  NZ_4  NZ_5
P_1      1     0     0     1     0
P_2      1     1     0     1     0
P_3      0     0     0     0     1
P_4      1     0     1     0     1
P_5      0     1     1     1     0
P_6      0     1     0     0     1
P_7      1     0     1     1     0
P_8      0     0     1     1     1
P_9      1     1     0     0     0
P_10     0     1     1     0     1
P_11     0     1     0     1     1
P_12     1     0     1     0     0

Table after selecting P_2:

        NZ_3  NZ_5
P_1      0     0
P_3      0     1
P_4      1     1
P_5      1     0
P_6      0     1
P_7      1     0
P_8      1     1
P_9      0     0
P_10     1     1
P_11     0     1
P_12     1     0

Selected primes: P_2, P_4
Time: O(n^2)
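The selection loop can be sketched as a greedy cover over the table. One deliberate simplification: instead of taking any row that is at least half full, this sketch takes the row with the most remaining ones (the first such row on ties), which is at least half full whenever a half-full row exists. The name `select_primes` and the string encoding of the rows are assumptions.

```python
def select_primes(table):
    """Greedily pick rows until every column is covered by a 1 in a picked row.

    table maps a row label to a list of 0/1 entries, one per non-zero.
    """
    n_cols = len(next(iter(table.values())))
    remaining = set(range(n_cols))
    selected = []
    while remaining:
        # Row with the most ones among the columns that are still present.
        best = max(table, key=lambda label: sum(table[label][c] for c in remaining))
        covered = {c for c in remaining if table[best][c] == 1}
        if not covered:
            raise ValueError("some non-zero is never a singleton in this table")
        selected.append(best)
        remaining -= covered  # delete the columns that had a 1 in the chosen row
    return selected


rows = ["10010", "11010", "00001", "10101", "01110", "01001",
        "10110", "00111", "11000", "01101", "01011", "10100"]
slide_table = {f"P{k + 1}": [int(ch) for ch in row] for k, row in enumerate(rows)}
print(select_primes(slide_table))
```

Because each pick removes at least half of the remaining columns when the table is half full, the number of selected primes stays logarithmic in the number of non-zeros.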

13 The Exponential Case: N < 2^n Idea: Reduce the length of the vector to polynomial and continue with the previous algorithm. Any distance d_ij can be divided by at most n prime numbers. There are at most n^2 different distances. Corollary: There are at most n^3 prime numbers which generate multiples.

14 The Reduction Algorithm
1. Choose a prime number q of size O(n^4).
2. Create the reduced-size vector using the mod(q) hash function.
3. Repeat steps 1 and 2 if a multiple was created.
4. Duplicate the obtained vector (create a vector of size 2q), to allow further reduction of the vector.
Time: O(n^4)
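Steps 1 to 3 can be sketched as follows. Several choices here are assumptions rather than the authors' method: candidate primes are scanned upward starting from n^4 instead of being chosen in some other way, primality is tested by trial division, and step 4 (the duplication to size 2q) is left out because the transcript does not spell out its layout.

```python
def is_prime(k):
    """Trial-division primality test, adequate for the small q used here."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True


def reduce_to_polynomial(pairs):
    """Find a prime q >= n**4 under which all indices are singletons mod q.

    pairs is a list of (index, value) with distinct indices.
    Returns (q, reduced_pairs) with every index replaced by index mod q.
    """
    n = len(pairs)
    q = max(2, n ** 4)
    while True:
        if is_prime(q):
            residues = {index % q for index, _ in pairs}
            if len(residues) == n:  # no multiple was created
                return q, [(index % q, value) for index, value in pairs]
        q += 1


q, reduced = reduce_to_polynomial([(0, 5), (2 ** 40, 2), (3 ** 30, 3)])
print(q, reduced)
```

The loop terminates for distinct indices because only finitely many primes divide any of the pairwise distances.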

15 The Randomized Algorithm Sources of Errors 1. Some non-zeros may appear only as multiples in every vector of the set. 2. The non-zero from the text that was aligned with a non-zero from the pattern may have come from a different index (false matches). 3. This algorithm was created for matching, but in convolution each non-zero should be calculated only once.

16 The Convolution Algorithm
1. For each prime number P_i:
   1. Create the reduced-size vectors V'_{1,i} and V'_{2,i} using the indices of the non-zeros and perform shift matching.
   2. Create the reduced-size vectors V'_{1,i} and V'_{2,i} using 1's instead of the non-zeros and perform convolution.
   3. Create the reduced-size vectors V'_{1,i} and V'_{2,i} using the values of the non-zeros and perform convolution.
   4. Zero the values of the non-zeros that appeared as singletons.
2. For all indices where a shift match was found:
   1. Sum the results of the 1's convolutions.
   2. If the result is n_2, then sum the results of the values convolutions and report the result.
Time: O(n log^3(n))
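The values convolution for a single prime can be sketched as below. This is not the full algorithm: it omits shift matching, the 1's convolution and the zeroing step, and it uses a direct O(q^2) cyclic correlation where an FFT would be used in practice. It reuses the vectors of the mod(5) length-reduction example and shows that each reduced result equals the sum of all true alignments whose shift is congruent to that residue. All function names are hypothetical.

```python
def reduce_mod(pairs, q):
    """Fold sparse (index, value) pairs into a dense vector of length q."""
    out = [0] * q
    for index, value in pairs:
        out[index % q] += value
    return out


def cyclic_correlation(a, b):
    """c[s] = sum over k of a[(s + k) mod q] * b[k], computed directly."""
    q = len(a)
    return [sum(a[(s + k) % q] * b[k] for k in range(q)) for s in range(q)]


def folded_true_correlation(v1, v2, q):
    """Compute every true product v1[i] * v2[j] and file it under (i - j) mod q."""
    folded = [0] * q
    for i, x in v1:
        for j, y in v2:
            folded[(i - j) % q] += x * y
    return folded


v1 = [(0, 5), (6, 2), (13, 3), (19, 1)]
v2 = [(0, 2), (7, 3)]
reduced_result = cyclic_correlation(reduce_mod(v1, 5), reduce_mod(v2, 5))
print(reduced_result)
print(folded_true_correlation(v1, v2, 5))
```

Residue 1 holds the genuine match at shift 6 (2·2 + 3·3), while residue 3 mixes the unrelated shifts 13 and -7. Mixing of this kind is one reason a reduced result cannot be trusted without an additional check on the original indices.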

17 Example
V_1: (0, 5), (5, 2), (13, 3), (20, 1)
V_2: (0, 2), (8, 3)
Prime Numbers: 5, 7
V'_{1,1}, V'_{2,1}: (5, 1, 9), (13, 1, 6)
V'_{1,2}, V'_{2,2}: (0, 1, 10), (5, 1, 4)
[Figure: the reduced index, value and 1's vectors for each prime]
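A brute-force reference is useful for checking an example like this one. The sketch below is not the length-reduction algorithm: it multiplies every pair of non-zeros directly in O(n_1·n_2) time and records, for each shift, how many pairs aligned and the sum of their products. Function names are hypothetical.

```python
from collections import defaultdict


def sparse_correlation(v1, v2):
    """Map each shift s = i - j to (number of aligned pairs, sum of products)."""
    counts = defaultdict(int)
    sums = defaultdict(int)
    for i, a in v1:
        for j, b in v2:
            counts[i - j] += 1
            sums[i - j] += a * b
    return {s: (counts[s], sums[s]) for s in counts}


def full_matches(v1, v2):
    """Keep only the shifts at which every non-zero of v2 met a non-zero of v1."""
    result = sparse_correlation(v1, v2)
    return {s: total for s, (count, total) in result.items() if count == len(v2)}


v1 = [(0, 5), (5, 2), (13, 3), (20, 1)]
v2 = [(0, 2), (8, 3)]
print(sparse_correlation(v1, v2)[5])
print(full_matches(v1, v2))
```

At shift 5 the two partial results listed on the slide, (5, 1, 9) and (5, 1, 4), add up to a count of 2 = n_2 and a value of 13, which agrees with the brute-force output.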

18 Conclusions and Open Problems A deterministic algorithm for length reduction and fast convolution was presented. Preprocessing time: O(n^2) in the polynomial case, O(n^4) in the exponential case. Running time: O(n log^2 n). Open problems: Can the preprocessing time be reduced? Can the size of the vectors be reduced? Can the number of vectors be reduced?

19 Thank You!

20 Questions?

