Length Reduction in Binary Transforms Oren Kapah Ely Porat Amir Rothschild Amihood Amir Bar Ilan University and Johns Hopkins University.

Slides:



Advertisements
Similar presentations
Fast Fourier Transform for speeding up the multiplication of polynomials an Algorithm Visualization Alexandru Cioaca.
Advertisements

Mathematics of Cryptography Part II: Algebraic Structures
Applied Algorithmics - week7
Cryptography and Network Security
Algorithms Analysis Lecture 6 Quicksort. Quick Sort Divide and Conquer.
Greedy Algorithms Amihood Amir Bar-Ilan University.
Asynchronous Pattern Matching - Metrics Amihood Amir CPM 2006.
Bar Ilan University And Georgia Tech Artistic Consultant: Aviya Amir.
1 2 Dimensional Parameterized Matching Carmit Hazay Moshe Lewenstein Dekel Tsur.
1 Algorithms for Large Data Sets Ziv Bar-Yossef Lecture 13 June 25, 2006
Reliability of Disk Systems. Reliability So far, we looked at ways to improve the performance of disk systems. Next, we will look at ways to improve the.
Tirgul 10 Rehearsal about Universal Hashing Solving two problems from theoretical exercises: –T2 q. 1 –T3 q. 2.
CSC 2300 Data Structures & Algorithms February 27, 2007 Chapter 5. Hashing.
Tirgul 8 Universal Hashing Remarks on Programming Exercise 1 Solution to question 2 in theoretical homework 2.
String Matching COMP171 Fall String matching 2 Pattern Matching * Given a text string T[0..n-1] and a pattern P[0..m-1], find all occurrences of.
Deterministic Length Reduction: Fast Convolution in Sparse Data and Applications Written by: Amihood Amir, Oren Kapah and Ely Porat.
Introduction Polynomials
Pattern Matching in Weighted Sequences Oren Kapah Bar-Ilan University Joint Work With: Amihood Amir Costas S. Iliopoulos Ely Porat.
CSE 421 Algorithms Richard Anderson Lecture 13 Divide and Conquer.
String Matching with Mismatches Some slides are stolen from Moshe Lewenstein (Bar Ilan University)
Asynchronous Pattern Matching - Address Level Errors Amihood Amir Bar Ilan University 2010.
S C A L E D Pattern Matching Amihood Amir Ayelet Butman Bar-Ilan University Moshe Lewenstein and Johns Hopkins University Bar-Ilan University.
Error Detection and Reliable Transmission EECS 122: Lecture 24 Department of Electrical Engineering and Computer Sciences University of California Berkeley.
Survey: String Matching with k Mismatches Moshe Lewenstein Bar Ilan University.
Induction and recursion
Arithmetic.
String Matching. Problem is to find if a pattern P[1..m] occurs within text T[1..n] Simple solution: Naïve String Matching –Match each position in the.
FINITE FIELDS 7/30 陳柏誠.
Great Theoretical Ideas in Computer Science.
CS212: DATA STRUCTURES Lecture 10:Hashing 1. Outline 2  Map Abstract Data type  Map Abstract Data type methods  What is hash  Hash tables  Bucket.
1 Channel Coding (II) Cyclic Codes and Convolutional Codes.
Analysis of Algorithms
FFT1 The Fast Fourier Transform. FFT2 Outline and Reading Polynomial Multiplication Problem Primitive Roots of Unity (§10.4.1) The Discrete Fourier Transform.
On The Connections Between Sorting Permutations By Interchanges and Generalized Swap Matching Joint work of: Amihood Amir, Gary Benson, Avivit Levy, Ely.
1 SNS COLLEGE OF ENGINEERING Department of Electronics and Communication Engineering Subject: Digital communication Sem: V Cyclic Codes.
Data Security and Encryption (CSE348) 1. Lecture # 12 2.
Linear Feedback Shift Register. 2 Linear Feedback Shift Registers (LFSRs) These are n-bit counters exhibiting pseudo-random behavior. Built from simple.
Great Theoretical Ideas in Computer Science.
Hashing Sections 10.2 – 10.3 CS 302 Dr. George Bebis.
The Fast Fourier Transform and Applications to Multiplication
Fast Approximate Point Set Matching for Information Retrieval Raphaël Clifford and Benjamin Sach
Hashing Amihood Amir Bar Ilan University Direct Addressing In old days: LD 1,1 LD 2,2 AD 1,2 ST 1,3 Today: C
Chapter 10 Hashing. The search time of each algorithm depend on the number n of elements of the collection S of the data. A searching technique called.
Information Security Lab. Dept. of Computer Engineering 87/121 PART I Symmetric Ciphers CHAPTER 4 Finite Fields 4.1 Groups, Rings, and Fields 4.2 Modular.
Information and Coding Theory Cyclic codes Juris Viksna, 2015.
06/12/2015Applied Algorithmics - week41 Non-periodicity and witnesses  Periodicity - continued If string w=w[0..n-1] has periodicity p if w[i]=w[i+p],
1 Embedding and Similarity Search for Point Sets under Translation Minkyoung Cho and David M. Mount University of Maryland SoCG 2008.
Ravello, Settembre 2003Indexing Structures for Approximate String Matching Alessandra Gabriele Filippo Mignosi Antonio Restivo Marinella Sciortino.
Some Computation Problems in Coding Theory
Error Detection and Correction
Digital Communications I: Modulation and Coding Course Term Catharina Logothetis Lecture 9.
CS6045: Advanced Algorithms Data Structures. Hashing Tables Motivation: symbol tables –A compiler uses a symbol table to relate symbols to associated.
Applied Symbolic Computation1 Applied Symbolic Computation (CS 567) The Fast Fourier Transform (FFT) and Convolution Jeremy R. Johnson TexPoint fonts used.
Hardware Implementations of Finite Field Primitives
May 9, 2001Applied Symbolic Computation1 Applied Symbolic Computation (CS 680/480) Lecture 6: Multiplication, Interpolation, and the Chinese Remainder.
Reliability of Disk Systems. Reliability So far, we looked at ways to improve the performance of disk systems. Next, we will look at ways to improve the.
Rabin & Karp Algorithm. Rabin-Karp – the idea Compare a string's hash values, rather than the strings themselves. For efficiency, the hash value of the.
Amihood Amir, Gary Benson, Avivit Levy, Ely Porat, Uzi Vishne
Communication Networks: Technology & Protocols
Advanced Computer Networks
Polynomial + Fast Fourier Transform
Randomized Algorithms
Rabin & Karp Algorithm.
DFT and FFT By using the complex roots of unity, we can evaluate and interpolate a polynomial in O(n lg n) An example, here are the solutions to 8 =
Randomized Algorithms
Locality Sensitive Hashing
In Pattern Matching Convolutions: O(n log m) using FFT b0 b1 b2
Hashing Sections 10.2 – 10.3 Lecture 26 CS302 Data Structures
Presentation transcript:

Length Reduction in Binary Transforms Oren Kapah Ely Porat Amir Rothschild Amihood Amir Bar Ilan University and Johns Hopkins University

Motivation

Bar Ilan UniversityU. of Minnesota Error in Address: Error in Content: U. of MinnesotaBar Ilan University

Motivation: Architecture. Assume distributed memory. Our processor has text and requests pattern of length m. Pattern arrives in m asynchronous packets, of the form: Example:,,,, Pattern: BCBAA

Our Model… Text: T[0],T[1],…,T[n] Pattern: P[0]=, P[1]=, …, P[m]= ; P[i] є ∑, I[i] є {1, …,m}. Standard pattern Matching: no error in A. Asynchronous Pattern Matching: no error in C. Eventually: error in both.

Address Register log m bits “bad” bits What does “bad” mean? 1. bit “flips” its value. 2. bit sometimes flips its value. 3. Transient error.

We will now concentrate on consistent bit flips Example: Let ∑={a,b} T[0] T[1] T[2] T[3] a a b b P[0] P[1] P[2] P[3] b b a a

P[0] P[1] P[2] P[3] b b a a P[00] P[01] P[10] P[11] b b a a Example: BAD

P[0] P[1] P[2] P[3] b b a a P[00] P[01] P[10] P[11] a a b b Example: GOOD

P[0] P[1] P[2] P[3] b b a a P[00] P[01] P[10] P[11] a a b b Example: BEST

Naïve Algorithm For each of the 2 = m different bit combinations try matching. Choose match with minimum bits. Time: O(m ). 2 log m

Approximate Pattern Matching Hamming distance: For every location, write number of mismatches Text: A B B A B C B A A B C B A B B C Pattern: A B C B A

Approximate Pattern Matching Hamming distance: For every location, write number of mismatches Text: A B B A B C B A A B C B A B B C Pattern: A B C B A 3

Approximate Pattern Matching Hamming distance: For every location, write number of mismatches Text: A B B A B C B A A B C B A B B C Pattern: A B C B A 3

Approximate Pattern Matching Hamming distance: For every location, write number of mismatches Text: A B B A B C B A A B C B A B B C Pattern: A B C B A 5

Approximate Pattern Matching Hamming distance: For every location, write number of mismatches Text: A B B A B C B A A B C B A B B C Pattern: A B C B A 0

Approximate Pattern Matching Naïve Algorithm Time: O(nm) Hamming distance: For every location, write number of mismatches Text: A B B A B C B A A B C B A B B C Pattern: A B C B A 4

In Pattern Matching Polynomial Multiplication: b 0 b 1 b 2 Naïve Time: O(nm)

What do the Two Examples have in Common? What Really Happened? T[0] T[1] T[2] T[3] C[-3] C[-2] C[-1] C[0] C[1] C[2] C[3] Dot products array: P[0] P[1] P[2] P[3]

What Really Happened? T[0] T[1] T[2] T[3] C[-3] C[-2] C[-1] C[0] C[1] C[2] C[3] P[0] P[1] P[2] P[3]

What Really Happened? T[0] T[1] T[2] T[3] C[-3] C[-2] C[-1] C[0] C[1] C[2] C[3] P[0] P[1] P[2] P[3]

What Really Happened? T[0] T[1] T[2] T[3] C[-3] C[-2] C[-1] C[0] C[1] C[2] C[3] P[0] P[1] P[2] P[3]

What Really Happened? T[0] T[1] T[2] T[3] C[-3] C[-2] C[-1] C[0] C[1] C[2] C[3] P[0] P[1] P[2] P[3]

What Really Happened? T[0] T[1] T[2] T[3] C[-3] C[-2] C[-1] C[0] C[1] C[2] C[3] P[0] P[1] P[2] P[3]

What Really Happened? T[0] T[1] T[2] T[3] C[-3] C[-2] C[-1] C[0] C[1] C[2] C[3] P[0] P[1] P[2] P[3]

Another way of defining the transform: Where we define: P[x]=0 for x m.

FFT solution to the “shift” convolution: 1. Compute in time O(m log m) (values of X at roots of unity). 2. For polynomial multiplication compute values of product polynomial at roots of unity in time O(m log m). 3. Compute the coefficient of the product polynomial, again in time O(m log m).

A General Convolution C f Bijections ; j=1,….,O(m)

Consistent bit flip as a Convolution Construct a mask of length log m that has 0 in every bit except for the bad bits where it has a 1. Example: Assume the bad bits are in indices i,j,k є{0,…,log m}. Then the mask is i j k An exclusive OR between the mask and a pattern index Gives the target index.

Example: Mask: 0010 Index: Index:

Our Case: Denote our convolution by: Our convolution: For each of the 2 =m masks, let jє{0,1} log m

To compute min bit flip: Let T,P be over alphabet {0,1}: For each j, is a permutation of P. Thus, only the j ’s for which = number of 1 ‘s in T are valid flips. Since for them all 1’s match 1’s and all 0’s match 0’s. Choose valid j with minimum number of 1’s.

Time All convolutions can be computed in time O(m ) After preprocessing the permutation functions as tables. Can we do better? (As in the FFT, for example) 2

Idea – Divide and Conquer- Walsh Transform 1.Split T and P to the length m/2 arrays: 2.Compute 3.Use their values to compute in time O(m). Time: Recurrence: t(m)=2t(m/2)+m Closed Form: t(m)=O(m log m)

Sparse Transform Applications where most of the input is 0. The locations where there are “1”s are given as inputs We are only interested in the transform results for the locations where all pattern “1”s match text “1”s.

Motivation – Point Set Matching 1-D Point Set Matching: T: (t 1,t 2,…,t n ) P: (p 1,p 2,…,p m ) 2-D Point Set Matching – Searching in Music:

Notations: Length of text: N Length of Pattern: M Number of “1”s in text: n Number of “1”s in pattern: m.

Idea: Map text and pattern to small text and pattern: Hash function h

Idea: Do fast transform on the small text and pattern

Idea: Map results onto transform result of original text and pattern. h -1

Length Reduction in DFT Goal: Given two vectors V 1 &V 2, obtain two vectors V’ 1 &V’ 2 of size O(n’) such that all non-zero in V 1 and in V 2 will appear as singletons respectively while maintaining the distance property. The Distance Property: If V’ 2 [h(0)] is aligned with V’ 1 [h(i )], then V’ 2 [h(j)] is aligned with V’ 1 [h(f i (j))] = V’ 1 [f(i +j)]. Using the reduced size vectors, matching can be done in time O(n’ log n’) using the FFT algorithm.

Example: Length Reduction The vectors are given as sets of pairs: (index, value) V 1 : (0, 5), (6, 2), (13, 3), (19, 1) V 2 : (0, 2), (7, 3) Length Reduction Hash Function: mod(5) V’ 1 : V’ 2 :

The Randomized Algorithm of Cole & Hariharan [STOC 02] Idea: Find a set of log(n) short vectors, in which with high probability, each non-zero in V, appears as a singleton in at least one of the vectors. Hash functions: (ax mod(q))mod(s). Where q is a large prime number, and s is O(n). If s is c·n, then the probability of a non-zero appearing as a multiple is constant. Using log(n) different hash functions will reduce the failure probability exponentially.

Problem For the Walsh Transform, the mod function is useless. The distance property has to do with exclusive or, not addition!

IDEA Instead of the modulo function Do an exclusive or( ) of the index bits with a random bit string.

Example address Text Pattern Let’s do the Walsh Transform

Location address Text address Pattern 0 Dot Product

Location 001-XOR address Text address Pattern 0 Dot Product

Location 001-dot product address Text address Pattern 20 Dot Product

Location 010-XOR address Text address Pattern 20 Dot Product

Location 010-dot product address Text address Pattern 020 Dot Product

Location 011-XOR address Text address Pattern 020 Dot Product

Location 011-dot product address Text address Pattern 0020 Dot Product

Location 100-XOR address Text address Pattern 0020 Dot Product

Location 100-dot product address Text address Pattern Dot Product

Location 101-XOR address Text address Pattern Dot Product

Location 101-dot product address Text address Pattern Dot Product

Location 110-XOR address Text address Pattern Dot Product

Location 110-dot product address Text address Pattern Dot Product

Location 111-XOR address Text address Pattern Dot Product

Location 111-dot product address Text address Pattern Dot Product

The Length Reduction Reduce the length by half. Choose a mask of log n - 1 bits at random, add to it a MSB 1, and XOR it with each index in the second half. This will randomly hash all 1’s in the second half to the first half.

Length Reduction - Example address Text Pattern Let mask be address Text Pattern

Reduced Strings - Example address Text Pattern mask is address 1010 Text 0101 Pattern

Reduced Strings - Example Walsh Transform of reduced string: address 1010 Text 0101 Pattern 0 Walsh transform

Reduced Strings - Example Walsh Transform of reduced string: address 1010 Text 0101 Pattern 20 Walsh transform

Reduced Strings - Example Walsh Transform of reduced string: address 1010 Text 0101 Pattern 020 Walsh transform

Reduced Strings - Example Walsh Transform of reduced string: address 1010 Text 0101 Pattern 2020 Walsh transform Questions: 1. Does the distance property hold? 2. Which of these results is “legal”? 3. Where should it be mapped?

Answers: Distance property The Distance Property: If T[h(0)] is aligned with P[h(i )], then T[h(j)] is aligned with P[h(f i (j))] = P[f(i j)]. Holds because both h and f are XOR functions and because of the commutativity and associativity of XOR.

Answers: which are “legal”? address Text Pattern Let mask be address Text Pattern

Answers: which are “legal”? address Text Pattern mask is address 1010 Text 0101 Pattern Original tenants Johnny-come-latelies

Answers: which are “legal”? address Text Pattern mask is address m0s0 Text 0m0s Pattern Original tenants Johnny-come-latelies

Answers: which are “legal”? address Text Pattern mask is address m0s0 Text 0m0s Pattern Observation: Legal multiplications are: all s in text by s in pattern and m in text by m in pattern or all s in text by m in pattern and m in text by s in pattern.

Answers: which are “legal”? Observation: Legal multiplications are: 1. all s in text by s in pattern and m in text by m in pattern or 2. all s in text by m in pattern and m in text by s in pattern. This can be checked by a constant number of binary DWT’s with an added benefit: 1. means result stays. 2. means result is moved to its address XOR with the mask.

Reduced Strings - Example Walsh Transform of reduced string: address 1010 Text 0101 Pattern 20 Walsh transform Result in correct address

Reduced Strings - Example Walsh Transform of reduced string: address 1010 Text 0101 Pattern 2020 Walsh transform Result belongs in address = 110

Reminder – the dot product address Text address Pattern Dot Product

Analysis: We could continue this process recursively and analyze probability of clash of masked element with an element that is already there but… More elegant solution:

Polynomials over a finite field Consider indices as elements in F 2 L. In F 2 L : x+y = x y. Length Reduction: Every index in F 2 L is written as a polynomial in F 2 ℓ [X] of degree d = L/ℓ - 1.

Length reduction example: Index = 17 In binary = Take ℓ = The polynomial: 1·X ·X + 01·1= X 2 +1 Choose a value for X from F 2 ℓ at random and evaluate the polynomial.

Length reduction example: Recall that in F 2 ℓ : addition is exclusive or multiplication is polynomial multiplication modulo some irreducible polynomial. So, evaluating a polynomial at X gives a number with ℓ bits.

Probability of Collision: Probability of collision of index i and j = Probability that the chosen value of X is the root of the difference polynomial P i (x)-P j (x) Where P i, P j are the polynomials of index i and j, resp. Degree of difference polynomial = d So probability= d/2 ℓ.

Moral of the Story: Polynomials are a good candidate for locality preserving length reductions for discrete transforms.

The End