Deterministic Length Reduction: Fast Convolution in Sparse Data and Applications Written by: Amihood Amir, Oren Kapah and Ely Porat.

Motivation – Point Set Matching
Integer 1-D point set matching:
T: (t_1, t_2, …, t_n)
P: (p_1, p_2, …, p_m)
where the t_i and p_i are integers. Let N = t_n and M = p_m (the maximal indices).
Time: O(nm), O(N·log(M))

Motivation – Point Set Matching
2-D point set matching (searching in music):
T (text): (i_1, j_1), (i_2, j_2), …, (i_n, j_n)
P (pattern): (i_1, j_1), (i_2, j_2), …, (i_m, j_m)
Dimension reduction: (i, j) → i·N + j
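The dimension reduction can be illustrated with a small sketch. The function name `flatten_points` and the sample points are made up for illustration, and the sketch assumes N is larger than every j coordinate; wrap-around between rows is not handled here.

```python
def flatten_points(points, n_cols):
    """Map each 2-D point (i, j) to the 1-D index i * n_cols + j."""
    return sorted(i * n_cols + j for i, j in points)


N = 100  # assumed bound: every j coordinate is smaller than N
pattern = [(0, 0), (1, 2), (3, 1)]
shift = (2, 5)
text = [(i + shift[0], j + shift[1]) for i, j in pattern]

flat_pattern = flatten_points(pattern, N)
flat_text = flatten_points(text, N)

# A 2-D shift (di, dj) becomes the single 1-D shift di * N + dj.
offsets = {t - p for t, p in zip(flat_text, flat_pattern)}
print(flat_pattern)  # [0, 102, 301]
print(offsets)       # {205}
```

Because every point moves by the same 1-D offset, a 2-D match can be searched for as a 1-D match on the flattened indices.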

Motivation – Generalized Case
The generalized case of these problems is the d-dimensional sparse wildcard matching problem.
Problem definition: Given a d-dimensional text T with zeros and non-zeros, and a d-dimensional pattern P with wildcards and non-zeros, find all the locations where P matches T.
Applications: d-dimensional point set matching, searching in music, protein activity research, etc.

Length Reduction
Goal: Given two vectors V_1 and V_2, obtain two vectors V'_1 and V'_2 of size O(n_1) such that all non-zeros in V_1 and V_2 appear as singletons in V'_1 and V'_2 respectively, while maintaining the distance property.
The distance property: If V'_2[f(0)] is aligned with V'_1[f(i)], then V'_2[f(j)] will be aligned with V'_1[f(i + j)].
Using the reduced-size vectors, matching can be done in time O(n_1 log(n_1)) using convolutions.

Example: Length Reduction
The vectors are given as sets of pairs (index, value).
V_1: (0, 5), (6, 2), (13, 3), (19, 1)
V_2: (0, 2), (7, 3)
Length reduction function: mod(5)
V'_1: 5 2 0 3 1
V'_2: 2 0 3 0 0
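The mod(5) example can be reproduced with a few lines of Python. This is a minimal sketch; `reduce_length` is a hypothetical helper name, and it simply refuses inputs in which two non-zeros land in the same slot.

```python
def reduce_length(pairs, q):
    """Fold a sparse vector of (index, value) pairs into q slots via index mod q."""
    reduced = [0] * q
    used = set()
    for index, value in pairs:
        slot = index % q
        if slot in used:
            # Two non-zeros share a slot: this one would be a "multiple".
            raise ValueError(f"slot {slot} holds more than one non-zero")
        used.add(slot)
        reduced[slot] = value
    return reduced


V1 = [(0, 5), (6, 2), (13, 3), (19, 1)]
V2 = [(0, 2), (7, 3)]

print(reduce_length(V1, 5))  # [5, 2, 0, 3, 1]
print(reduce_length(V2, 5))  # [2, 0, 3, 0, 0]
```

Since (i + j) mod q equals ((i mod q) + (j mod q)) mod q, the mod(q) function keeps the distance property as long as alignments are taken cyclically.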

The Randomized Algorithm (Cole & Hariharan, STOC 2002)
Idea: Find a set of log(n) short vectors in which, with high probability, each non-zero in V appears as a singleton in at least one of the vectors.
Hash functions: (a·x mod q) mod s, where q is a large prime number and s is O(n).
If s is c·n, then the probability of a non-zero appearing as a multiple is constant.
Using log(n) different hash functions reduces the failure probability exponentially.
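A small sketch of this hash family, for intuition only. The helper names `random_hash` and `singletons` and the sample indices are assumptions for illustration; this is not the Cole and Hariharan implementation.

```python
import random


def random_hash(q, s, rng):
    """Return h(x) = (a * x mod q) mod s for a random multiplier a in [1, q)."""
    a = rng.randrange(1, q)
    return lambda x: (a * x % q) % s


def singletons(indices, h):
    """Return the indices that are alone in their bucket under h."""
    buckets = {}
    for x in indices:
        buckets.setdefault(h(x), []).append(x)
    return {bucket[0] for bucket in buckets.values() if len(bucket) == 1}


rng = random.Random(0)
indices = [3, 10, 57, 64, 90]
q = 101              # a prime larger than every index
s = 2 * len(indices) # s = c * n with c = 2

# Try several hash functions and record which indices were ever singletons.
covered = set()
for _ in range(4):
    covered |= singletons(indices, random_hash(q, s, rng))
print(sorted(covered))
```

Each additional hash function gives every non-zero another independent chance to land alone in a bucket, which is why the failure probability drops exponentially in the number of functions.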

The Randomized Algorithm: Sources of Errors
1. Some non-zeros may appear only as multiples in every vector of the set.
2. The text non-zero that was aligned with a pattern non-zero may have come from a different index (false matches).
3. The algorithm was created for matching, but in a convolution each non-zero must be counted exactly once.

Deterministic Length Reduction
Our goal: Find a set of log(n) hash functions which ensures that each non-zero appears as a singleton at least once.
Finding the hash functions is done in a preprocessing step based on V_1.
The algorithm distinguishes between two cases:
N_1 is polynomial in n_1.
N_1 is exponential in n_1.

The Polynomial Case: N < n^c
Let q be a prime number of size O(n), and let mod(q) be the suggested hash function.
Let i, j be the indices of two non-zeros, and let d_ij be the distance between them.
Observation: If i and j are mapped to the same location, then q divides d_ij.
Observation: There are at most c prime numbers of size O(n) which divide d_ij.
Corollary: A non-zero can appear as a multiple under at most c·n prime numbers.
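The first observation can be checked directly: a non-zero becomes a multiple under mod(q) exactly when q divides its distance to some other non-zero. The sketch below uses a hypothetical helper name and sample indices chosen for illustration.

```python
def primes_making_multiple(indices, target, primes):
    """List the primes q under which `target` collides with another index mod q."""
    return [
        q
        for q in primes
        if any(j != target and (target - j) % q == 0 for j in indices)
    ]


indices = [0, 6, 13, 19]
small_primes = [2, 3, 5, 7, 11, 13]

# Index 0 has distances 6, 13 and 19 to the other non-zeros;
# among the listed primes only 2, 3 (dividing 6) and 13 divide one of them.
print(primes_making_multiple(indices, 0, small_primes))   # [2, 3, 13]
print(primes_making_multiple(indices, 13, small_primes))  # [2, 3, 7, 13]
```

Each of the other non-zeros contributes only a few dividing primes, which is what bounds the number of "bad" primes per non-zero.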

Choosing Prime Numbers
Test 2c·n prime numbers (of size O(n log n)) and build the following table:
Each column represents a non-zero (n columns).
Each row represents a prime number (2c·n rows).
Reminder: Each non-zero can appear as a multiple at most c·n times.
Corollary: The table is at least half full with ones.

        NZ_1  NZ_2  NZ_3  NZ_4  NZ_5
P_1      1     0     0     1     0
P_2      1     1     0     1     0
P_3      0     0     0     0     1
P_4      1     0     1     0     1
P_5      0     1     1     1     0
P_6      0     1     0     0     1
P_7      1     0     1     1     0
P_8      0     0     1     1     1
P_9      1     1     0     0     0
P_10     0     1     1     0     1
P_11     0     1     0     1     1
P_12     1     0     1     0     0

Choosing Prime Numbers: Cont.
1. Select a prime number which generates a row that is at least half full (for example P_2).
2. Delete the row and all the columns in which there was a 1 in the deleted row.
3. Repeat steps 1 and 2 until the whole table is deleted.

Full table:

        NZ_1  NZ_2  NZ_3  NZ_4  NZ_5
P_1      1     0     0     1     0
P_2      1     1     0     1     0
P_3      0     0     0     0     1
P_4      1     0     1     0     1
P_5      0     1     1     1     0
P_6      0     1     0     0     1
P_7      1     0     1     1     0
P_8      0     0     1     1     1
P_9      1     1     0     0     0
P_10     0     1     1     0     1
P_11     0     1     0     1     1
P_12     1     0     1     0     0

After selecting P_2 (columns NZ_1, NZ_2 and NZ_4 deleted):

        NZ_3  NZ_5
P_1      0     0
P_3      0     1
P_4      1     1
P_5      1     0
P_6      0     1
P_7      1     0
P_8      1     1
P_9      0     0
P_10     1     1
P_11     0     1
P_12     1     0

Selected primes: P_2, P_4
Time: O(n^2)
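The selection procedure can be sketched as a greedy loop over the table. One assumption is made here: instead of taking any row that is at least half full, the sketch takes the fullest remaining row (which is at least half full whenever the remaining table is). The name `select_primes` is hypothetical.

```python
def select_primes(table):
    """Greedily pick rows until every column (non-zero) is covered by a 1.

    `table` maps a prime's label to a list of 0/1 flags, one per non-zero;
    1 means the non-zero is a singleton under that prime.
    """
    rows = dict(table)
    remaining = set(range(len(next(iter(rows.values())))))
    chosen = []
    while remaining:
        if not rows:
            raise ValueError("some non-zero is never a singleton")
        # Assumption: take the fullest row over the remaining columns.
        best = max(rows, key=lambda name: sum(rows[name][c] for c in remaining))
        covered = {c for c in remaining if rows[best][c]}
        if not covered:
            raise ValueError("some non-zero is never a singleton")
        chosen.append(best)
        remaining -= covered  # delete the covered columns
        del rows[best]        # delete the selected row
    return chosen


bits = ["10010", "11010", "00001", "10101", "01110", "01001",
        "10110", "00111", "11000", "01101", "01011", "10100"]
slide_table = {f"P{k + 1}": [int(b) for b in row] for k, row in enumerate(bits)}

print(select_primes(slide_table))  # ['P2', 'P4']
```

Every selected row removes at least half of the remaining columns, so the loop ends after at most about log(n) selections.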

The Exponential Case: N < 2^n
Idea: Reduce the length of the vector to polynomial and continue with the previous algorithm.
Any distance d_ij can be divided by at most n prime numbers.
There are at most n^2 different distances.
Corollary: There are at most n^3 prime numbers which generate multiples.

The Reduction Algorithm
1. Choose a prime number q of size O(n^4).
2. Create the reduced-size vector using the mod(q) hash function.
3. Repeat steps 1 and 2 if a multiple was created.
4. Duplicate the obtained vector (create a vector of size 2q) to allow further reduction of the vector.
Time: O(n^4)
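Steps 1 to 3 can be sketched as a search for a prime under which no two indices collide. The sketch assumes candidates are tried in increasing order from a starting value and uses plain trial division; both choices, and the names `is_prime` and `collision_free_prime`, are illustrative. Step 4 (duplicating the vector) is not shown.

```python
def is_prime(k):
    """Trial-division primality test; adequate for small illustrative inputs."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True


def collision_free_prime(indices, start):
    """Return the first prime q >= start with all indices distinct mod q."""
    q = max(start, 2)
    while True:
        if is_prime(q) and len({i % q for i in indices}) == len(indices):
            return q
        q += 1


# Starting from 2 to make the retries visible: 2 and 3 create multiples, 5 does not.
print(collision_free_prime([0, 6, 13, 19], 2))  # 5

# For very long vectors one would start near n**4, as on the slide.
big = [3, 2**40, 10**12 + 7, 10**15]
q = collision_free_prime(big, len(big) ** 4)
print(sorted(i % q for i in big))
```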

The Randomized Algorithm: Sources of Errors
1. Some non-zeros may appear only as multiples in every vector of the set.
2. The text non-zero that was aligned with a pattern non-zero may have come from a different index (false matches).
3. The algorithm was created for matching, but in a convolution each non-zero must be counted exactly once.

The Convolution Algorithm
1. For each prime number P_i:
   1. Create the reduced-size vectors V'_1,i and V'_2,i using the indices of the non-zeros and perform shift matching.
   2. Create the reduced-size vectors V'_1,i and V'_2,i using 1's instead of the non-zeros and perform convolution.
   3. Create the reduced-size vectors V'_1,i and V'_2,i using the values of the non-zeros and perform convolution.
   4. Zero the values of the non-zeros that appeared as singletons.
2. For all indices where shift matching was found:
   1. Sum the results of the 1's convolutions.
   2. If the result is n_2, then sum the results of the values convolutions and report the result.
Time: O(n log^3(n))

Example
V_1: (0, 5), (5, 2), (13, 3), (20, 1)
V_2: (0, 2), (8, 3)
Prime numbers: 5, 7

Prime 5:
V'_1,1 (values): 0 0 0 3 0
V'_2,1 (values): 2 0 0 3 0
V'_1,1 (1's): 0 0 0 1 0
V'_2,1 (1's): 1 0 0 1 0
Results: (5, 1, 9), (13, 1, 6)

Prime 7:
V'_1,2 (values): 5 0 0 0 0 2 0
V'_2,2 (values): 2 3 0 0 0 0 0
Results: (0, 1, 10), (5, 1, 4)
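As a cross-check of the example, the following brute-force reference computes the same kind of result directly in O(n_1·n_2) time. It is not the fast algorithm of the slides; it assumes that a shift is reported only when all n_2 non-zeros of V_2 align with non-zeros of V_1, and the function name is hypothetical.

```python
def sparse_match_values(v1, v2):
    """For every shift where all non-zeros of v2 hit non-zeros of v1,
    return the sum of the products of the aligned values."""
    text = dict(v1)
    first_index = v2[0][0]
    results = {}
    for t_index in text:
        shift = t_index - first_index
        if all(shift + j in text for j, _ in v2):
            results[shift] = sum(text[shift + j] * value for j, value in v2)
    return results


V1 = [(0, 5), (5, 2), (13, 3), (20, 1)]
V2 = [(0, 2), (8, 3)]

print(sparse_match_values(V1, V2))  # {5: 13}
```

Shift 5 is the only one where both pattern non-zeros align (counts 1 + 1 = n_2 = 2 across the two primes), and the summed value 9 + 4 = 13 agrees with the brute-force result.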

Conclusions and Open Problems
A deterministic algorithm for length reduction and fast convolution was presented.
Preprocessing time: O(n^2) in the polynomial case, O(n^4) in the exponential case.
Running time: O(n log^2 n)
Open problems:
Can the preprocessing time be reduced?
Can the size of the vectors be reduced?
Can the number of vectors be reduced?

Thank You!

Questions?
