1
Fast Fourier Transform
Algorithms in Action: Fast Fourier Transform
Haim Kaplan, Uri Zwick
Tel Aviv University
March 2016 (last updated: March 28, 2017)
2
String Matching
Text: abraabracadabracadabraabara
Pattern: abracadabra (occurs twice in the text)
Given a text of length $n$ and a pattern of length $m$, find all occurrences of the pattern in the text.
The naïve algorithm runs in $O(mn)$ time.
Several classical algorithms run in $O(m+n)$ time. [Knuth-Morris-Pratt (1977)] [Boyer-Moore (1977)]
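As a point of reference for the naïve bound, here is a minimal sketch of the quadratic scan. The function name `naive_find_all` is an illustrative choice, not something from the slides.

```python
def naive_find_all(text, pattern):
    """Return every start index at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    # Try each of the n - m + 1 alignments; each comparison costs up to m steps.
    return [k for k in range(n - m + 1) if text[k:k + m] == pattern]


print(naive_find_all("abraabracadabracadabraabara", "abracadabra"))  # [4, 11]
```

The two reported positions are the two occurrences of the pattern in the example text.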
3
More String Matching Problems
Text: abraabracadabracadabraabara
Pattern: abracadabra
Count the number of matches/mismatches in each alignment of the pattern with the text. (Find all alignments with at most $k$ mismatches.)
Allow a wildcard ("don't care") symbol (*) that matches any single symbol in the pattern and/or text.
"Traditional" string matching techniques are not so efficient for these extensions.
4
(Cross-)Correlation
$\mathbf{x} = (x_0, x_1, x_2, x_3)$, $\mathbf{y} = (y_0, y_1, y_2, y_3)$
$z_{-3} = x_0 y_3$
$z_{-2} = x_0 y_2 + x_1 y_3$
$z_{-1} = x_0 y_1 + x_1 y_2 + x_2 y_3$
$z_0 = x_0 y_0 + x_1 y_1 + x_2 y_2 + x_3 y_3$
$z_1 = x_1 y_0 + x_2 y_1 + x_3 y_2$
$z_2 = x_2 y_0 + x_3 y_1$
$z_3 = x_3 y_0$
5
(Cross-)Correlation
$z_k = \sum_i x_i\, y_{i-k} = \sum_i x_{i+k}\, y_i = (\mathbf{x} * \mathbf{y}^R)_{k+n-1}$, for $k = -(n-1), \ldots, n-1$, where $\mathbf{y}^R$ denotes $\mathbf{y}$ reversed.
A convolution without the initial reversal, with a shift of indices.
The correlation of two vectors of length $n$ can be computed in $O(n \log n)$ time.
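The reduction from correlation to convolution can be sketched in plain Python. This is an illustrative sketch, not the slides' implementation: `fft`, `correlate_fft` and `correlate_direct` are hypothetical names, the transform is a textbook recursive radix-2 FFT, and the inputs are assumed to be small integers so that rounding recovers exact values.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT (unnormalised); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for j in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * j / n)
        out[j] = even[j] + w * odd[j]
        out[j + n // 2] = even[j] - w * odd[j]
    return out


def correlate_fft(x, y):
    """Return {k: z_k} with z_k = sum_i x[i+k] * y[i], for integer inputs."""
    n, m = len(x), len(y)
    size = 1
    while size < n + m - 1:
        size *= 2
    fx = fft(list(x) + [0] * (size - n))
    fy = fft(list(reversed(y)) + [0] * (size - m))  # reverse y, then convolve
    conv = fft([p * q for p, q in zip(fx, fy)], invert=True)
    # conv[j] / size is entry j of the convolution; z_k sits at j = k + m - 1.
    return {k: round((conv[k + m - 1] / size).real) for k in range(-(m - 1), n)}


def correlate_direct(x, y):
    """Same quantity evaluated straight from the definition, for checking."""
    n, m = len(x), len(y)
    return {k: sum(x[i + k] * y[i] for i in range(m) if 0 <= i + k < n)
            for k in range(-(m - 1), n)}


x, y = [1, 2, 3, 4], [5, 6, 7, 8]
z = correlate_fft(x, y)
print([z[k] for k in range(-3, 4)])  # [8, 23, 44, 70, 56, 39, 20]
```

For two vectors of equal length the index shift `k + m - 1` coincides with the shift used in the formula above.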
6
(Cross-)Correlation (unequal lengths)
$\mathbf{x} = (x_0, x_1, x_2, x_3, x_4, x_5)$, $\mathbf{y} = (y_0, y_1, y_2, y_3)$
$z_{-3} = x_0 y_3$
$z_{-2} = x_0 y_2 + x_1 y_3$
$z_{-1} = x_0 y_1 + x_1 y_2 + x_2 y_3$
$z_0 = x_0 y_0 + x_1 y_1 + x_2 y_2 + x_3 y_3$
$z_1 = x_1 y_0 + x_2 y_1 + x_3 y_2 + x_4 y_3$
$z_2 = x_2 y_0 + x_3 y_1 + x_4 y_2 + x_5 y_3$
$z_3 = x_3 y_0 + x_4 y_1 + x_5 y_2$
$z_4 = x_4 y_0 + x_5 y_1$
$z_5 = x_5 y_0$
14
(Cross-)Correlation
$z_k = \sum_i x_i\, y_{i-k} = \sum_i x_{i+k}\, y_i = (\mathbf{x} * \mathbf{y}^R)_{k+m-1}$
If $\mathbf{x}$ is of length $n$ and $\mathbf{y}$ of length $m$, where $m \le n$, then $k = -(m-1), \ldots, n-1$.
Sometimes, only the values $k = 0, \ldots, n-m$, corresponding to a full overlap of $\mathbf{x}$ with a shift of $\mathbf{y}$, are of interest.
Exercise: The correlation of two vectors of length $n$ and $m$, where $m \le n$, can be computed in $O(n \log m)$ time.
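The index ranges for unequal lengths are easy to get wrong, so here is a small sketch that evaluates the definition directly and then picks out the full-overlap shifts. It is a quadratic-time illustration only and does not attempt the exercise; the helper name `correlate` and the sample vectors are made up for the example.

```python
def correlate(x, y):
    """z_k = sum_i x[i+k] * y[i] for k = -(m-1), ..., n-1 (direct evaluation)."""
    n, m = len(x), len(y)
    return {k: sum(x[i + k] * y[i] for i in range(m) if 0 <= i + k < n)
            for k in range(-(m - 1), n)}


x = [1, 2, 3, 4, 5, 6]   # n = 6
y = [1, 0, 0, 1]         # m = 4
z = correlate(x, y)

print(sorted(z))                                            # shifts -3, ..., 5
full_overlap = [z[k] for k in range(len(x) - len(y) + 1)]   # k = 0, ..., n-m
print(full_overlap)                                         # [5, 7, 9]
```

Only the shifts $0, \ldots, n-m$ use every entry of the shorter vector, which is the case that matters for pattern matching.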
15
Counting mismatches [Fischer-Paterson (1974)]
Let $\Sigma$ be the alphabet of the pattern and text. We may assume that $|\Sigma| \le m+1$. (Why?)
For every $a \in \Sigma$ create two Boolean strings:
$P^a_i = 1$ iff $p_i = a$
$T^a_i = 1$ iff $t_i \ne a$
The correlation of $T^a$ and $P^a$ counts the mismatches involving $a$.
16
Counting mismatches
Text: abraabracadabracadabraabara
Pattern: abracadabra
18
Counting mismatches
Let $\Sigma$ be the alphabet of the pattern and text. We may assume that $|\Sigma| \le m+1$. (Why?)
For every $a \in \Sigma$ create two Boolean strings:
$P^a_i = 1$ iff $p_i = a$
$T^a_i = 1$ iff $t_i \ne a$
The correlation of $T^a$ and $P^a$ counts the mismatches involving $a$.
Summing over all $a \in \Sigma$ we get the total number of mismatches.
Complexity: $O(|\Sigma|\, n \log m)$ word operations. (Each word is assumed to hold $\Theta(\log n)$ bits.)
Fast only if $|\Sigma|$ is small.
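The per-character counting scheme can be sketched as follows. This is a hedged illustration: `overlap_correlation` and `count_mismatches` are hypothetical names, and the correlation is evaluated directly to keep the block short, so an FFT-based correlation would have to be substituted to reach the stated complexity.

```python
def overlap_correlation(x, y):
    """z_k = sum_i x[i+k] * y[i] for the full-overlap shifts k = 0, ..., n-m."""
    n, m = len(x), len(y)
    return [sum(x[i + k] * y[i] for i in range(m)) for k in range(n - m + 1)]


def count_mismatches(text, pattern):
    """Number of mismatching positions for every alignment of pattern in text."""
    n, m = len(text), len(pattern)
    total = [0] * (n - m + 1)
    # Characters absent from the pattern give an all-zero indicator, so skip them.
    for a in set(pattern):
        in_pattern = [1 if c == a else 0 for c in pattern]   # 1 iff p_i = a
        not_in_text = [1 if c != a else 0 for c in text]     # 1 iff t_i != a
        for k, value in enumerate(overlap_correlation(not_in_text, in_pattern)):
            total[k] += value
    return total


mismatches = count_mismatches("abraabracadabracadabraabara", "abracadabra")
print([k for k, d in enumerate(mismatches) if d == 0])  # [4, 11]
```

Alignments with a count of zero are exact occurrences; a threshold on the count gives the alignments with at most a given number of mismatches.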
19
Counting mismatches with wildcards [Fischer-Paterson (1974)]
For every $a \in \Sigma$ create two Boolean strings:
$P^a_i = 1$ iff $p_i = a$
$T^a_i = 1$ iff $t_i \ne a$ and $t_i \ne *$
Complexity: $O(|\Sigma|\, n \log m)$ word operations.
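With wildcards, the only change is that a wildcard position never sets an indicator bit, so it can never contribute a mismatch. A minimal sketch, assuming `*` is the wildcard and using a direct (quadratic) evaluation of each correlation; `count_mismatches_wild` is an illustrative name.

```python
WILDCARD = "*"  # assumed wildcard symbol, not part of the alphabet


def count_mismatches_wild(text, pattern):
    """Mismatches per alignment, where '*' in text or pattern matches anything."""
    n, m = len(text), len(pattern)
    total = [0] * (n - m + 1)
    for a in set(pattern) - {WILDCARD}:
        in_pattern = [1 if c == a else 0 for c in pattern]
        # A text wildcard is neither "equal to a" nor a mismatch against a.
        bad_text = [1 if c != a and c != WILDCARD else 0 for c in text]
        for k in range(n - m + 1):
            total[k] += sum(bad_text[i + k] * in_pattern[i] for i in range(m))
    return total


counts = count_mismatches_wild("abraabraca*abracadabraabara", "abracada*ra")
print([k for k, d in enumerate(counts) if d == 0])  # [4, 11]
```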
20
Counting mismatches with wildcards
Text: abraabraca*abracadabraabara
Pattern: abracada*ra
Text: abraabra*adabracadabraabara
Pattern: abracada*ra
21
Counting mismatches with wildcards
If we only want to find exact matches, replace each character $a \in \Sigma$ by a specific $\lceil \log_2 |\Sigma| \rceil$-bit string.
22
Counting mismatches with wildcards
a → 001, b → 010, r → 011, * → ***, c → 100
Count mismatches of the binary strings as before (2 convolutions).
A result of 0 corresponds to a match.
Complexity drops to $O(\log|\Sigma| \cdot n \log m)$.
Can we get rid of the dependence on $|\Sigma|$?
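The binary-encoding idea can be sketched like this. The code assignment is arbitrary (any distinct fixed-width codes work, so it need not match the table above), `exact_matches_binary` is a hypothetical name, and the two per-bit correlations are evaluated directly rather than by FFT.

```python
def exact_matches_binary(text, pattern, wildcard="*"):
    """Exact matching with wildcards via fixed-width binary codes."""
    alphabet = sorted(set(text + pattern) - {wildcard})
    width = max(1, (len(alphabet) - 1).bit_length())   # ceil(log2 |alphabet|), at least 1
    codes = {c: format(i, "0{}b".format(width)) for i, c in enumerate(alphabet)}
    codes[wildcard] = wildcard * width                  # wildcard becomes all-wildcard bits

    text_bits = "".join(codes[c] for c in text)
    pattern_bits = "".join(codes[c] for c in pattern)

    # One pair of indicator strings per binary symbol, as in mismatch counting.
    pairs = [([1 if c == bit else 0 for c in pattern_bits],
              [1 if c not in (bit, wildcard) else 0 for c in text_bits])
             for bit in "01"]

    matches = []
    for k in range(len(text) - len(pattern) + 1):
        shift = k * width   # only character-aligned shifts are meaningful
        mismatched_bits = sum(p[i] * t[shift + i]
                              for p, t in pairs
                              for i in range(len(pattern_bits)))
        if mismatched_bits == 0:
            matches.append(k)
    return matches


print(exact_matches_binary("abraabraca*abracadabraabara", "abracada*ra"))  # [4, 11]
```

The bit-level count no longer equals the number of mismatching characters, which is why this variant only answers the exact-match question.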
23
$L_2$-matching [Lipsky-Porat (2011)]
Standard string matching uses the Hamming distance. Two characters either match or they do not: $a$ is not closer to $b$ than to $z$.
Suppose that each "character" is a real number. We want to find approximate matches.
For each $k = 0, 1, \ldots, n-m$ we want to compute $d_k = \sum_{i=0}^{m-1} (p_i - t_{i+k})^2$.
$L_2$-distance: $\|\mathbf{x} - \mathbf{y}\|^2 = \sum_{i=0}^{n-1} (x_i - y_i)^2$
24
$L_2$-matching can be computed in $O(n \log m)$ time.
[Lipsky-Porat (2011)]
$\sum_{i=0}^{m-1} (p_i - t_{i+k})^2 = \sum_{i=0}^{m-1} p_i^2 - 2 \sum_{i=0}^{m-1} p_i t_{i+k} + \sum_{i=0}^{m-1} t_{i+k}^2$
First term: constant, $O(m)$ time.
Second term: correlation, $O(n \log m)$ time.
Third term: easy in $O(n)$ time.
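The three-term split can be sketched directly. This is an illustration under stated assumptions: `l2_distances` is a hypothetical name, the cross term is computed by a direct loop where an FFT-based correlation would be used to reach the stated bound, and the window sums of squares come from prefix sums.

```python
def l2_distances(text, pattern):
    """Squared L2 distance between pattern and every length-m window of text."""
    n, m = len(text), len(pattern)

    # Term 1: sum of p_i^2, the same for every shift.
    pattern_sq = sum(p * p for p in pattern)

    # Term 3: sum of t_{i+k}^2 over each window, via prefix sums in O(n).
    prefix = [0]
    for t in text:
        prefix.append(prefix[-1] + t * t)

    # Term 2: the correlation sum_i p_i * t_{i+k} (direct here, FFT in practice).
    cross = [sum(pattern[i] * text[i + k] for i in range(m))
             for k in range(n - m + 1)]

    return [pattern_sq - 2 * cross[k] + (prefix[k + m] - prefix[k])
            for k in range(n - m + 1)]


print(l2_distances([1, 2, 3, 4, 5], [2, 3, 4]))  # [3, 0, 3]
```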
25
Exact matches with wildcards
[Clifford-Clifford (2007)]
Replace each character by a positive integer. Replace the wildcard by 0.
For each $k = 0, 1, \ldots, n-m$ compute $c_k = \sum_{i=0}^{m-1} p_i\, t_{i+k}\, (p_i - t_{i+k})^2$.
There is an exact match at position $k$ iff $c_k = 0$.
26
Exact matches with wildcards
[Clifford-Clifford (2007)]
$c_k = \sum_{i=0}^{m-1} p_i\, t_{i+k}\, (p_i - t_{i+k})^2 = \sum_{i=0}^{m-1} p_i^3\, t_{i+k} - 2 \sum_{i=0}^{m-1} p_i^2\, t_{i+k}^2 + \sum_{i=0}^{m-1} p_i\, t_{i+k}^3$
Compute three correlations of appropriate sequences in $O(n \log m)$ time.
Running time is independent of $|\Sigma|$!
This assumes that each character fits in a $\Theta(\log n)$-bit word and that operations on such words take constant time.
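A minimal sketch of the three-correlation test follows. The integer coding, the name `wildcard_matches` and the `*` wildcard are assumptions made for the example, and each correlation is evaluated directly instead of by FFT.

```python
def wildcard_matches(text, pattern, wildcard="*"):
    """Positions k where pattern matches text, with wildcards on either side."""
    symbols = sorted(set(text + pattern) - {wildcard})
    code = {c: i + 1 for i, c in enumerate(symbols)}   # positive integers
    t = [code.get(c, 0) for c in text]                 # wildcard -> 0
    p = [code.get(c, 0) for c in pattern]
    n, m = len(t), len(p)

    def correlate(u, v):
        # sum_i u[i] * v[i+k] for the full-overlap shifts k = 0, ..., n-m
        return [sum(u[i] * v[i + k] for i in range(m)) for k in range(n - m + 1)]

    first = correlate([v ** 3 for v in p], t)                      # sum p^3 t
    second = correlate([v ** 2 for v in p], [v ** 2 for v in t])   # sum p^2 t^2
    third = correlate(p, [v ** 3 for v in t])                      # sum p t^3

    # Every summand p*t*(p-t)^2 is non-negative, so the total is 0 exactly
    # when each position either agrees or involves a wildcard.
    return [k for k in range(n - m + 1)
            if first[k] - 2 * second[k] + third[k] == 0]


print(wildcard_matches("abraabraca*abracadabraabara", "abracada*ra"))  # [4, 11]
```

Exact integer arithmetic is used here; with a floating-point FFT the sums would need rounding before the comparison with zero.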