Fast Fourier Transform

Presentation transcript:

Algorithms in Action: Fast Fourier Transform
Haim Kaplan, Uri Zwick, Tel Aviv University
March 2016. Last updated: March 28, 2017

String Matching
Given a text of length $n$ and a pattern of length $m$, find all occurrences of the pattern in the text.
Example: text abraabracadabracadabraabara, pattern abracadabra (occurrences at positions 4 and 11, counting from 0).
The naïve algorithm runs in $O(mn)$ time. Several classical algorithms run in $O(m+n)$ time. [Knuth-Morris-Pratt (1977)] [Boyer-Moore (1977)]

More String Matching Problems
Example: text abraabracadabracadabraabara, pattern abracadabra.
Count the number of matches/mismatches in each alignment of the pattern with the text. (Find all alignments with at most $k$ mismatches.)
Allow a wildcard ("don't care") symbol ($*$) that matches any single symbol, in the pattern and/or the text.
"Traditional" string matching techniques are not very efficient for these extensions.

(Cross-)Correlation
Example with $\mathbf{x} = (x_0, x_1, x_2, x_3)$ and $\mathbf{y} = (y_0, y_1, y_2, y_3)$:
$z_{-3} = x_0 y_3$
$z_{-2} = x_0 y_2 + x_1 y_3$
$z_{-1} = x_0 y_1 + x_1 y_2 + x_2 y_3$
$z_0 = x_0 y_0 + x_1 y_1 + x_2 y_2 + x_3 y_3$
$z_1 = x_1 y_0 + x_2 y_1 + x_3 y_2$
$z_2 = x_2 y_0 + x_3 y_1$
$z_3 = x_3 y_0$

(Cross-)Correlation
$$z_k = \sum_i x_i\, y_{i-k} = \sum_j x_{j+k}\, y_j = (\mathbf{x} * \mathbf{y}^R)_{k+n-1}, \qquad k = -(n-1), \ldots, n-1.$$
A convolution without the initial reversal, with a shift of indices.
The correlation of two vectors of length $n$ can be computed in $O(n \log n)$ time.
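The identity $z_k = (\mathbf{x} * \mathbf{y}^R)_{k+n-1}$ can be checked with a small pure-Python sketch. The function names (`fft`, `convolve`, `correlate`, `correlate_direct`) are illustrative and not taken from the slides; the FFT is a plain recursive radix-2 version, and results are rounded to integers, so the sketch assumes integer inputs of moderate size.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def convolve(x, y):
    """Linear convolution of two integer sequences via FFT."""
    out_len = len(x) + len(y) - 1
    size = 1
    while size < out_len:
        size *= 2
    fx = fft([complex(v) for v in x] + [0j] * (size - len(x)))
    fy = fft([complex(v) for v in y] + [0j] * (size - len(y)))
    res = fft([u * v for u, v in zip(fx, fy)], invert=True)
    # The inverse transform above is unnormalized, so divide by size.
    return [round((v / size).real) for v in res[:out_len]]

def correlate(x, y):
    """Return {k: z_k} with z_k = sum_j x[j+k] * y[j], via convolution with reversed y."""
    m = len(y)
    conv = convolve(x, y[::-1])
    return {k: conv[k + m - 1] for k in range(-(m - 1), len(x))}

def correlate_direct(x, y):
    """Same values computed straight from the definition (quadratic time)."""
    n, m = len(x), len(y)
    return {k: sum(x[j + k] * y[j] for j in range(m) if 0 <= j + k < n)
            for k in range(-(m - 1), n)}

z = correlate([1, 2, 3, 4], [5, 6, 7, 8])
print(z[0])  # 1*5 + 2*6 + 3*7 + 4*8 = 70
```

A production implementation would normally use an iterative FFT from a numerical library; the recursive version is used here only to keep the sketch self-contained.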

(Cross-)Correlation (unequal lengths)
Example with $\mathbf{x} = (x_0, x_1, x_2, x_3, x_4, x_5)$ and $\mathbf{y} = (y_0, y_1, y_2, y_3)$:
$z_{-3} = x_0 y_3$
$z_{-2} = x_0 y_2 + x_1 y_3$
$z_{-1} = x_0 y_1 + x_1 y_2 + x_2 y_3$
$z_0 = x_0 y_0 + x_1 y_1 + x_2 y_2 + x_3 y_3$
$z_1 = x_1 y_0 + x_2 y_1 + x_3 y_2 + x_4 y_3$
$z_2 = x_2 y_0 + x_3 y_1 + x_4 y_2 + x_5 y_3$
$z_3 = x_3 y_0 + x_4 y_1 + x_5 y_2$
$z_4 = x_4 y_0 + x_5 y_1$
$z_5 = x_5 y_0$

(Cross-)Correlation
$$z_k = \sum_i x_i\, y_{i-k} = \sum_j x_{j+k}\, y_j = (\mathbf{x} * \mathbf{y}^R)_{k+m-1}$$
If $\mathbf{x}$ is of length $n$ and $\mathbf{y}$ is of length $m$, where $m \le n$, then $k = -(m-1), \ldots, n-1$.
Sometimes only the values $k = 0, \ldots, n-m$, corresponding to a full overlap of $\mathbf{x}$ with a shift of $\mathbf{y}$, are of interest.
Exercise: The correlation of two vectors of lengths $n$ and $m$, where $m \le n$, can be computed in $O(n \log m)$ time.
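One common way to approach the exercise, sketched here for the full-overlap values only, is to cut $\mathbf{x}$ into overlapping blocks of length at most $2m-1$ and correlate each block with $\mathbf{y}$ separately. The names below are hypothetical, and the per-block routine is a direct quadratic stand-in; with an FFT-based correlation per block, each of the roughly $n/m$ blocks would cost $O(m \log m)$, for $O(n \log m)$ in total.

```python
def full_overlap_direct(x, y):
    """z_k = sum_j x[k+j] * y[j] for k = 0..n-m, straight from the definition."""
    n, m = len(x), len(y)
    return [sum(x[k + j] * y[j] for j in range(m)) for k in range(n - m + 1)]

def full_overlap_blocked(x, y, correlate_block=full_overlap_direct):
    """Same values, computed block by block.

    correlate_block is a stand-in: any routine returning the full-overlap
    correlation of a short block with y (for example an FFT-based one).
    """
    n, m = len(x), len(y)
    out = []
    for s in range(0, n - m + 1, m):
        # A block of length at most 2m-1 starting at s covers shifts s .. s+m-1.
        block = x[s:s + 2 * m - 1]
        out.extend(correlate_block(block, y))
    return out

x = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
y = [2, 7, 1]
print(full_overlap_blocked(x, y) == full_overlap_direct(x, y))
```

The values with partial overlap ($k < 0$ or $k > n-m$) can be handled in the same way after padding $\mathbf{x}$ with zeros on both sides.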

Counting mismatches [Fischer-Paterson (1974)]
Let $\Sigma$ be the alphabet of the pattern and text. We may assume that $|\Sigma| \le m+1$. (Why?)
For every $a \in \Sigma$ create two Boolean strings:
$P_a[j] = 1$ iff $P[j] = a$
$T_a[i] = 1$ iff $T[i] \ne a$
The correlation of $P_a$ and $T_a$ counts the mismatches involving $a$.

Counting mismatches (example for the character a)
T   = abraabracadabracadabraabara
T_a = 011001101010110101011001010
P   = abracadabra
P_a = 10010101001

Counting mismatches
With $P_a$ and $T_a$ defined as above for every $a \in \Sigma$ ($P_a[j] = 1$ iff $P[j] = a$, and $T_a[i] = 1$ iff $T[i] \ne a$), the correlation of $P_a$ and $T_a$ counts the mismatches involving $a$.
Summing over all $a \in \Sigma$ gives the total number of mismatches.
Complexity: $O(|\Sigma|\, n \log m)$ word operations. (Each word is assumed to hold $\Theta(\log n)$ bits.)
Fast only if $|\Sigma|$ is small.
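The per-character construction can be sketched as follows. The function names are illustrative, and the correlation is computed directly from its definition for clarity; replacing it by an FFT-based correlation would give the $O(|\Sigma|\, n \log m)$ bound stated above.

```python
def correlate_full_overlap(x, y):
    """z_k = sum_j x[k+j] * y[j] for k = 0..n-m (direct stand-in for an FFT correlation)."""
    n, m = len(x), len(y)
    return [sum(x[k + j] * y[j] for j in range(m)) for k in range(n - m + 1)]

def count_mismatches(text, pattern):
    """Number of mismatching positions for every full alignment of pattern with text."""
    n, m = len(text), len(pattern)
    total = [0] * (n - m + 1)
    # Characters that do not occur in the pattern have an all-zero P_a,
    # so only the pattern's characters need to be considered.
    for a in set(pattern):
        p_a = [1 if c == a else 0 for c in pattern]  # P_a[j] = 1 iff P[j] = a
        t_a = [1 if c != a else 0 for c in text]     # T_a[i] = 1 iff T[i] != a
        for k, v in enumerate(correlate_full_overlap(t_a, p_a)):
            total[k] += v
    return total

counts = count_mismatches("abraabracadabracadabraabara", "abracadabra")
print([k for k, c in enumerate(counts) if c == 0])  # exact matches: [4, 11]
```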

Counting mismatches with wildcards [Fischer-Paterson (1974)]
For every $a \in \Sigma$ create two Boolean strings:
$P_a[j] = 1$ iff $P[j] = a$
$T_a[i] = 1$ iff $T[i] \ne a$ and $T[i] \ne *$
Complexity: $O(|\Sigma|\, n \log m)$ word operations.

Counting mismatches with wildcards (examples for the character a)
T   = abraabraca*abracadabraabara
T_a = 011001101000110101011001010
P   = abracada*ra
P_a = 10010101001

T   = abraabra*adabracadabraabara
T_a = 011001100010110101011001010
P   = abracada*ra
P_a = 10010101001
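A minimal sketch of the wildcard variant follows, assuming `*` is the wildcard symbol and using hypothetical function names. As before, the correlation is computed directly; an FFT-based correlation would be substituted to reach the stated complexity.

```python
WILDCARD = "*"  # assumed wildcard symbol, as in the examples above

def correlate_full_overlap(x, y):
    """z_k = sum_j x[k+j] * y[j] for k = 0..n-m (direct stand-in for an FFT correlation)."""
    n, m = len(x), len(y)
    return [sum(x[k + j] * y[j] for j in range(m)) for k in range(n - m + 1)]

def count_mismatches_wild(text, pattern):
    """Mismatch counts per alignment; a wildcard on either side never counts as a mismatch."""
    n, m = len(text), len(pattern)
    total = [0] * (n - m + 1)
    for a in set(pattern) - {WILDCARD}:
        # P_a is 0 at pattern wildcards because they are not equal to a.
        p_a = [1 if c == a else 0 for c in pattern]
        # T_a is 0 at text wildcards, so they never contribute a mismatch.
        t_a = [1 if c != a and c != WILDCARD else 0 for c in text]
        for k, v in enumerate(correlate_full_overlap(t_a, p_a)):
            total[k] += v
    return total

counts = count_mismatches_wild("abraabraca*abracadabraabara", "abracada*ra")
print([k for k, c in enumerate(counts) if c == 0])  # [4, 11]
```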

Counting mismatches with wildcards
If we only want to find exact matches, replace each character $a \in \Sigma$ by a specific $\lceil \log_2 |\Sigma| \rceil$-bit string.

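Counting mismatches with wildcards
Example encoding with 3-bit codes: ordinary characters (such as b, r, c) receive codes such as 001, 010, 011, 100, and the wildcard $*$ is replaced by $***$.
Count mismatches of the binary strings as before (2 convolutions).
A result of 0 corresponds to a match.
Complexity drops to $O(\log|\Sigma| \cdot n \log m)$.
Can we get rid of the dependence on $|\Sigma|$?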

$L_2$-matching [Lipsky-Porat (2011)]
Standard string matching uses the Hamming distance: two characters either match or they do not, and $a$ is not closer to $b$ than to $z$.
Suppose that each "character" is a real number. We want to find approximate matches.
For each $k = 0, 1, \ldots, n-m$ we want to compute
$$d_k = \sum_{j=0}^{m-1} (p_j - t_{k+j})^2$$
$L_2$-distance: $\|\mathbf{x} - \mathbf{y}\|^2 = \sum_{j=0}^{m-1} (x_j - y_j)^2$

$L_2$-matching [Lipsky-Porat (2011)]
$$\sum_{j=0}^{m-1} (p_j - t_{k+j})^2 = \sum_{j=0}^{m-1} p_j^2 \;-\; 2 \sum_{j=0}^{m-1} p_j\, t_{k+j} \;+\; \sum_{j=0}^{m-1} t_{k+j}^2$$
First term: a constant, computed in $O(m)$ time.
Second term: a correlation, computed in $O(n \log m)$ time.
Third term: easy to compute for all $k$ in $O(n)$ time.
$L_2$-matching can be computed in $O(n \log m)$ time.
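The three-term decomposition can be sketched as below. The names are illustrative; the cross term uses a direct correlation as a stand-in for the FFT-based one, and the third term uses prefix sums of $t_i^2$, which is one simple way to obtain all window sums in $O(n)$ time.

```python
def correlate_full_overlap(x, y):
    """z_k = sum_j x[k+j] * y[j] for k = 0..n-m (direct stand-in for an FFT correlation)."""
    n, m = len(x), len(y)
    return [sum(x[k + j] * y[j] for j in range(m)) for k in range(n - m + 1)]

def l2_distances(text, pattern):
    """d_k = sum_j (p_j - t_{k+j})^2 for every full alignment k = 0..n-m."""
    n, m = len(text), len(pattern)
    p_sq = sum(p * p for p in pattern)             # first term: constant, O(m)
    cross = correlate_full_overlap(text, pattern)  # second term: correlation
    prefix = [0]                                   # prefix sums of t_i^2
    for t in text:
        prefix.append(prefix[-1] + t * t)
    return [p_sq - 2 * cross[k] + (prefix[k + m] - prefix[k])
            for k in range(n - m + 1)]

print(l2_distances([1, 2, 3, 4, 5], [2, 3]))  # [2, 0, 2, 8]
```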

Exact matches with wildcards [Clifford-Clifford (2007)]
Replace each character by a positive integer. Replace the wildcard by 0.
For each $k = 0, 1, \ldots, n-m$ compute
$$d_k = \sum_{j=0}^{m-1} p_j\, t_{k+j}\, (p_j - t_{k+j})^2$$
There is an exact match at position $k$ iff $d_k = 0$.

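Exact matches with wildcards [Clifford-Clifford (2007)]
$$d_k = \sum_{j=0}^{m-1} p_j\, t_{k+j}\, (p_j - t_{k+j})^2 = \sum_{j=0}^{m-1} p_j^3\, t_{k+j} \;-\; 2 \sum_{j=0}^{m-1} p_j^2\, t_{k+j}^2 \;+\; \sum_{j=0}^{m-1} p_j\, t_{k+j}^3$$
Every term of the first sum is non-negative, so $d_k = 0$ exactly when each aligned pair either contains a wildcard or consists of equal characters.
Compute three correlations of appropriate sequences in $O(n \log m)$ time.
The running time is independent of $|\Sigma|$, assuming that each character fits in a $\Theta(\log n)$-bit word and that operations on such words take constant time.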
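The three correlations can be sketched as follows. The function names are hypothetical, `*` is assumed to be the wildcard symbol, and characters are mapped to positive integers with `ord`, which is just one convenient choice. The correlations are computed directly here; the $O(n \log m)$ bound requires an FFT-based correlation in their place.

```python
def correlate_full_overlap(x, y):
    """z_k = sum_j x[k+j] * y[j] for k = 0..n-m (direct stand-in for an FFT correlation)."""
    n, m = len(x), len(y)
    return [sum(x[k + j] * y[j] for j in range(m)) for k in range(n - m + 1)]

def wildcard_matches(text, pattern, wildcard="*"):
    """Positions k where pattern matches text, wildcards allowed on both sides."""
    # Wildcard -> 0, every other character -> a positive integer (its code point).
    p = [0 if c == wildcard else ord(c) for c in pattern]
    t = [0 if c == wildcard else ord(c) for c in text]
    c1 = correlate_full_overlap(t, [v ** 3 for v in p])                    # sum p^3 t
    c2 = correlate_full_overlap([v * v for v in t], [v * v for v in p])    # sum p^2 t^2
    c3 = correlate_full_overlap([v ** 3 for v in t], p)                    # sum p t^3
    d = [a - 2 * b + c for a, b, c in zip(c1, c2, c3)]
    return [k for k, v in enumerate(d) if v == 0]

print(wildcard_matches("abraabraca*abracadabraabara", "abracada*ra"))  # [4, 11]
```

Python integers do not overflow, so the cubes are exact here; an FFT over floating-point numbers would need care with precision for large character codes.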