1 Advanced Algorithms
Piyush Kumar (Lecture 2: Divide & Conquer). Welcome to COT5405.
Source: Kevin Wayne, Harald Prokop.

2 Divide and Conquer
Inversions
Closest point problems
Integer multiplication
Matrix multiplication
Cache-aware / cache-oblivious algorithms
Static van Emde Boas layout
FFTs
Streaming algorithms

3 5.3 Counting Inversions

4 Counting Inversions
A music site tries to match your song preferences with others. You rank n songs; the site consults its database to find people with similar tastes (amazon.com, launch.com, restaurants, movies, …). Similarity metric: the number of inversions between two rankings.
My rank: 1, 2, …, n. Your rank: a1, a2, …, an. Songs i and j are inverted if i < j but ai > aj.
Songs:  A B C D E
Me:     1 2 3 4 5
You:    1 3 4 2 5
Inversions: 3-2, 4-2.
Brute force: check all Θ(n^2) pairs i and j. Note: there can be a quadratic number of inversions, so an asymptotically faster algorithm must compute the total number without even looking at each inversion individually.

5 Applications
Voting theory. Collaborative filtering. Measuring the "sortedness" of an array. Sensitivity analysis of Google's ranking function. Rank aggregation for meta-searching on the Web. Nonparametric statistics (e.g., Kendall's tau distance).

6 Counting Inversions: Divide-and-Conquer
1 5 4 8 10 2 6 9 12 11 3 7

7 Counting Inversions: Divide-and-Conquer
Divide: separate the list into two pieces. Divide: O(1).
1 5 4 8 10 2 | 6 9 12 11 3 7

8 Counting Inversions: Divide-and-Conquer
Divide: separate the list into two pieces. Divide: O(1).
Conquer: recursively count inversions in each half. Conquer: 2T(n/2).
1 5 4 8 10 2 | 6 9 12 11 3 7
5 blue-blue inversions: 5-4, 5-2, 4-2, 8-2, 10-2.
8 green-green inversions: 6-3, 9-3, 9-7, 12-3, 12-7, 12-11, 11-3, 11-7.

9 Counting Inversions: Divide-and-Conquer
Divide: separate the list into two pieces. Divide: O(1).
Conquer: recursively count inversions in each half. Conquer: 2T(n/2).
Combine: count inversions where ai and aj are in different halves, and return the sum of the three quantities. Combine: ???
1 5 4 8 10 2 | 6 9 12 11 3 7
5 blue-blue inversions; 8 green-green inversions.
9 blue-green inversions: 5-3, 4-3, 8-6, 8-3, 8-7, 10-6, 10-9, 10-3, 10-7.
Total = 5 + 8 + 9 = 22.

10 Counting Inversions: Combine
Combine: count blue-green inversions. Assume each half is sorted. Count inversions where ai and aj are in different halves, then merge the two sorted halves into a sorted whole (to maintain the sorted invariant).
Left half: 3 7 10 14 18 19    Right half: 2 11 16 17 23 25
When a right-half element is merged, it forms an inversion with every element still remaining in the left half: 6 + 3 + 2 + 2 = 13 blue-green inversions. Count: O(n).
Merged result: 2 3 7 10 11 14 16 17 18 19 23 25. Merge: O(n).

11 Counting Inversions: Implementation
Pre-condition. [Merge-and-Count] A and B are sorted.
Post-condition. [Sort-and-Count] L is sorted.
Sort-and-Count(L) {
   if list L has one element
      return 0 and the list L
   Divide the list into two halves A and B
   (rA, A) ← Sort-and-Count(A)
   (rB, B) ← Sort-and-Count(B)
   (r, L)  ← Merge-and-Count(A, B)
   return rA + rB + r and the sorted list L
}
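A minimal runnable Python version of the pseudocode above (a sketch; the function and variable names are mine, not the slides'):

def sort_and_count(L):
    # Returns (number of inversions, sorted copy of L).
    if len(L) <= 1:
        return 0, list(L)
    mid = len(L) // 2
    rA, A = sort_and_count(L[:mid])
    rB, B = sort_and_count(L[mid:])
    r, merged, i, j = 0, [], 0, 0
    while i < len(A) and j < len(B):        # Merge-and-Count
        if A[i] <= B[j]:
            merged.append(A[i]); i += 1
        else:
            # B[j] precedes every remaining element of A:
            # each of those len(A) - i elements forms an inversion with it.
            r += len(A) - i
            merged.append(B[j]); j += 1
    merged += A[i:] + B[j:]
    return rA + rB + r, merged

On the example list above, sort_and_count([1,5,4,8,10,2,6,9,12,11,3,7]) returns 22 inversions.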

12 Homework 1
#!/usr/bin/env python
"""
When this program is run, it outputs 1 million random
permutations of 0..31 into the 'input.txt' file.
"""
import random
L = [str(x) for x in range(32)]
fp = open("input.txt", "w")
for i in range(10**6):
    random.shuffle(L)
    fp.write(",".join(L) + "\n")
fp.close()

13 Homework 1
On Linprog, implement an executable that:
Reads input.txt
Pre-processes it for fast queries
Given a new permutation q, quickly finds the permutation in input.txt at least inversion distance from q
Usage: a.out < query.txt
Outputs the line number of the closest permutation (if there are multiple at the same distance, outputs them all)

14 Size-3 Example
input.txt:   query.txt:   a.out < query.txt outputs:
1,2,3        2,3,1        1
3,2,1
Warning: my script generates random permutations of size 32, not 3.

15 5.4 Closest Pair of Points
"Divide and conquer" is Caesar's other famous quote, besides "I came, I saw, I conquered." The divide-and-conquer idea dates back to Julius Caesar: his favorite war tactic was to divide an opposing army in two halves, and then assault one half with his entire force.

16 Closest Pair of Points
Closest pair. Given n points in the plane, find a pair with the smallest Euclidean distance between them.
Fundamental geometric primitive. Graphics, computer vision, geographic information systems, molecular modeling, air traffic control. Special case of nearest neighbor, Euclidean MST, Voronoi (the fast closest-pair algorithm inspired fast algorithms for these problems).
Brute force. Check all pairs of points p and q with Θ(n^2) comparisons.
1-D version. O(n log n) easy if points are on a line.
Assumption. No two points have the same x coordinate (to make the presentation cleaner).
History. Shamos and Hoey (1970s), working out basic computational primitives in the then-fledgling field of computational geometry, asked whether it was possible to do better than quadratic; it was surprisingly challenging to find an efficient algorithm. The algorithm we present is essentially their solution.

17 Closest Pair of Points: First Attempt
Divide. Sub-divide the region into 4 quadrants.

18 Closest Pair of Points: First Attempt
Divide. Sub-divide the region into 4 quadrants.
Obstacle. Impossible to ensure n/4 points in each piece.

19 Closest Pair of Points
Algorithm. Divide: draw a vertical line L so that roughly ½n points lie on each side.

20 Closest Pair of Points
Algorithm. Divide: draw a vertical line L so that roughly ½n points lie on each side.
Conquer: find the closest pair in each side recursively. (Figure: one half's closest pair has distance 12, the other's 21.)

21 Closest Pair of Points
Algorithm. Divide: draw a vertical line L so that roughly ½n points lie on each side.
Conquer: find the closest pair in each side recursively.
Combine: find the closest pair with one point in each side (seems like Θ(n^2)). Return the best of the 3 solutions. (Figure: a cross pair of distance 8 beats 12 and 21.)

22 Closest Pair of Points
Find the closest pair with one point in each side, assuming that distance < δ, where δ = min(12, 21).

23 Closest Pair of Points
Find the closest pair with one point in each side, assuming that distance < δ, where δ = min(12, 21).
Observation: only need to consider points within δ of line L.

24 Closest Pair of Points
Find the closest pair with one point in each side, assuming that distance < δ, where δ = min(12, 21).
Observation: only need to consider points within δ of line L.
Sort the points in the 2δ-strip by their y coordinate. (Figure: strip points labeled 1-7 from bottom to top.)

25 Closest Pair of Points
Find the closest pair with one point in each side, assuming that distance < δ, where δ = min(12, 21).
Observation: only need to consider points within δ of line L.
Sort the points in the 2δ-strip by their y coordinate.
Only check distances of those within 11 positions in the sorted list!

26 Closest Pair of Points
Def. Let si be the point in the 2δ-strip with the ith smallest y-coordinate.
Claim. If |i − j| ≥ 12, then the distance between si and sj is at least δ.
Pf. Tile the strip with ½δ-by-½δ boxes. No two points lie in the same box (two points in one box would be closer than δ). Two points at least 2 rows apart have distance ≥ 2(½δ) = δ. ▪
Fact. Still true if we replace 12 with 7 (can reduce the neighbors checked to 6).
Note: no points lie on the median line (the assumption that no two points have the same x coordinate ensures the median line can be drawn this way).

27 Closest Pair Algorithm
Closest-Pair(p1, …, pn) {
   Compute separation line L such that half the points
   are on one side and half on the other.              // O(n log n); O(n) with a linear-time median algorithm
   δ1 = Closest-Pair(left half)                        // 2T(n / 2)
   δ2 = Closest-Pair(right half)
   δ  = min(δ1, δ2)
   Delete all points further than δ from line L.       // O(n)
   Sort remaining points by y-coordinate.              // O(n log n)
   Scan points in y-order and compare the distance
   between each point and its next 11 neighbors.
   If any of these distances is less than δ, update δ. // O(n)
   return δ
}

28 Closest Pair of Points: Analysis
Running time. T(n) ≤ 2T(n/2) + O(n log n) ⇒ T(n) = O(n log^2 n).
Q. Can we achieve O(n log n)?
A. Yes. Don't sort the points in the strip from scratch each time. Each recursive call returns two lists: all points sorted by y coordinate, and all points sorted by x coordinate. Sort by merging the two pre-sorted lists.
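A runnable Python sketch of the whole algorithm (my own code, not the slides'; it filters the pre-sorted y-list at each level, which costs O(n) per level and so keeps the O(n log n) bound; assumes at least two distinct points):

import math

def closest_pair(points):
    # Divide-and-conquer closest pair; points is a list of distinct (x, y) tuples.
    px = sorted(points)                       # sorted by x
    py = sorted(points, key=lambda p: p[1])   # sorted by y

    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    def rec(px, py):
        n = len(px)
        if n <= 3:  # brute-force base case
            return min(dist(px[i], px[j])
                       for i in range(n) for j in range(i + 1, n))
        mid = n // 2
        x_mid = px[mid][0]
        left, right = px[:mid], px[mid:]
        left_set = set(left)
        ly = [p for p in py if p in left_set]      # y-order preserved
        ry = [p for p in py if p not in left_set]
        d = min(rec(left, ly), rec(right, ry))
        strip = [p for p in py if abs(p[0] - x_mid) < d]   # 2d-strip, in y order
        for i in range(len(strip)):
            for j in range(i + 1, min(i + 12, len(strip))):  # next 11 neighbors
                d = min(d, dist(strip[i], strip[j]))
        return d

    return rec(px, py)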

29 5.5 Integer Multiplication
"Divide and conquer", Caesar's other famous quote "I came, I saw, I conquered" Divide-and-conquer idea dates back to Julius Caesar. Favorite war tactic was to divide an opposing army in two halves, and then assault one half with his entire force.

30 Integer Arithmetic
Add. Given two n-digit integers a and b, compute a + b. O(n) bit operations.
Multiply. Given two n-digit integers a and b, compute a × b. Brute force solution: Θ(n^2) bit operations.
(Figure: grade-school binary addition and multiplication examples.)

31 Divide-and-Conquer Multiplication: Warmup
To multiply two n-digit integers (assume n is a power of 2): multiply four ½n-digit integers, then add two ½n-digit integers and shift to obtain the result.
x = 2^(n/2) x1 + x0,  y = 2^(n/2) y1 + y0
xy = 2^n (x1 y1) + 2^(n/2) (x1 y0 + x0 y1) + x0 y0
T(n) = 4T(n/2) + Θ(n) ⇒ T(n) = Θ(n^2)

32 Karatsuba Multiplication
To multiply two n-digit integers: add two ½n-digit integers; multiply three ½n-digit integers; add, subtract, and shift ½n-digit integers to obtain the result.
xy = 2^n A + 2^(n/2) (C − A − B) + B, where A = x1 y1, B = x0 y0, C = (x1 + x0)(y1 + y0).
Theorem. [Karatsuba-Ofman, 1962] Can multiply two n-digit integers in O(n^1.585) bit operations (1.585 ≈ log2 3).
Karatsuba also works for multiplying two degree-n univariate polynomials.
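A compact Python sketch of the recursion (my own code; the base-case cutoff of 64 is an arbitrary choice):

def karatsuba(x, y):
    # Multiply nonnegative integers x and y using three half-size products.
    if x < 64 or y < 64:        # small enough: use machine multiply
        return x * y
    half = max(x.bit_length(), y.bit_length()) // 2
    x1, x0 = x >> half, x & ((1 << half) - 1)
    y1, y0 = y >> half, y & ((1 << half) - 1)
    A = karatsuba(x1, y1)
    B = karatsuba(x0, y0)
    C = karatsuba(x1 + x0, y1 + y0)
    return (A << (2 * half)) + ((C - A - B) << half) + B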

33 Karatsuba: Recursion Tree
T(n) = 3T(n/2) + n. Level 0 costs n; level 1 has three subproblems costing 3(n/2); level k has 3^k subproblems of size n/2^k, costing 3^k (n/2^k); the bottom level has 3^(lg n) leaves of size 2.
Summing the geometric series: T(n) = Σ_{k=0}^{lg n} (3/2)^k n = Θ(n^(lg 3)) = O(n^1.585).

34 Matrix Multiplication
"Divide and conquer", Caesar's other famous quote "I came, I saw, I conquered" Divide-and-conquer idea dates back to Julius Caesar. Favorite war tactic was to divide an opposing army in two halves, and then assault one half with his entire force.

35 Matrix Multiplication (MM)
(Courtesy Harald Prokop.) C = A × B, where c_ij = Σ_{k=1}^{n} a_ik b_kj.

36 Matrix Multiplication
Matrix multiplication. Given two n-by-n matrices A and B, compute C = AB. (Among the most basic problems in linear algebra and scientific computing.)
Brute force. Θ(n^3) arithmetic operations.
Fundamental question. Can we improve upon brute force?

37

38

39 Matrix Multiplication: Warmup
Divide-and-conquer.
Divide: partition A and B into ½n-by-½n blocks.
Conquer: multiply 8 pairs of ½n-by-½n blocks recursively.
Combine: add the appropriate products using 4 matrix additions.
T(n) = 8T(n/2) + Θ(n^2) ⇒ T(n) = Θ(n^3): no better than brute force.
(For scalars it is probably not worth eliminating one multiplication at the expense of 14 extra additions, since an add and a multiply cost about the same; but for matrices it can make a huge difference, since standard block multiplies are much more expensive than block additions.)

40 Matrix Multiplication: Key Idea
Key idea. Multiply 2-by-2 block matrices with only 7 multiplications (Strassen's identities):
P1 = (A11 + A22)(B11 + B22)
P2 = (A21 + A22) B11
P3 = A11 (B12 − B22)
P4 = A22 (B21 − B11)
P5 = (A11 + A12) B22
P6 = (A21 − A11)(B11 + B12)
P7 = (A12 − A22)(B21 + B22)
C11 = P1 + P4 − P5 + P7,  C12 = P3 + P5,  C21 = P2 + P4,  C22 = P1 − P2 + P3 + P6
7 multiplications; 18 = 10 + 8 additions (or subtractions). Somewhat counterintuitive to be subtracting, especially if all original matrix entries are positive.
Winograd's variant. 7 multiplications, 15 additions, but less numerically stable than Strassen.

41 Fast Matrix Multiplication
Fast matrix multiplication. (Strassen, 1969)
Divide: partition A and B into ½n-by-½n blocks.
Compute: 14 ½n-by-½n matrices via 10 matrix additions.
Conquer: multiply 7 ½n-by-½n matrices recursively.
Combine: the 7 products into 4 terms using 8 matrix additions.
Analysis. Assume n is a power of 2, and let T(n) = # arithmetic operations.
T(n) = 7T(n/2) + Θ(n^2) ⇒ T(n) = Θ(n^(lg 7)) = O(n^2.81).
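A NumPy sketch of the recursion for n a power of 2 (my own code; the crossover value echoes the n ≈ 128 figure on the next slide but is otherwise arbitrary):

import numpy as np

def strassen(A, B, cutoff=128):
    # A, B: n-by-n arrays, n a power of 2. Classical multiply below the crossover.
    n = A.shape[0]
    if n <= cutoff:
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    P1 = strassen(A11 + A22, B11 + B22, cutoff)
    P2 = strassen(A21 + A22, B11, cutoff)
    P3 = strassen(A11, B12 - B22, cutoff)
    P4 = strassen(A22, B21 - B11, cutoff)
    P5 = strassen(A11 + A12, B22, cutoff)
    P6 = strassen(A21 - A11, B11 + B12, cutoff)
    P7 = strassen(A12 - A22, B21 + B22, cutoff)
    C = np.empty_like(A)
    C[:h, :h] = P1 + P4 - P5 + P7
    C[:h, h:] = P3 + P5
    C[h:, :h] = P2 + P4
    C[h:, h:] = P1 - P2 + P3 + P6
    return C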

42 Fast Matrix Multiplication in Practice
Implementation issues. Sparsity, caching effects, numerical stability, odd matrix dimensions, parallelization. Crossover to the classical algorithm around n = 128.
Common misperception: "Strassen is only a theoretical curiosity." The Advanced Computation Group at Apple Computer reports an 8x speedup on the G4 Velocity Engine when n ≈ 2,500. The range of instances where it is useful is a subject of controversy. Not strongly numerically stable, but in practice its behavior is better than the theory suggests.
Remark. Can "Strassenize" Ax = b, determinant, eigenvalues, and other matrix ops.

43 Fast Matrix Multiplication in Theory
Q. Multiply two 2-by-2 matrices with only 7 scalar multiplications?
A. Yes! [Strassen, 1969]
Q. Multiply two 2-by-2 matrices with only 6 scalar multiplications?
A. Impossible. [Hopcroft and Kerr, 1971] (The Hopcroft-Kerr bound applies to the non-commutative case.)
Q. Two 3-by-3 matrices with only 21 scalar multiplications?
A. Also impossible.
Q. Two 70-by-70 matrices with only 143,640 scalar multiplications?
A. Yes! [Pan, 1980]
Decimal wars. December 1979: O(n^2.521813). January 1980: O(n^2.521801). ("Imagine the excitement in January 1980 when this was improved.")

44 Fast Matrix Multiplication in Theory
Best known. O(n^2.376). [Coppersmith-Winograd, 1987]
Conjecture. O(n^(2+ε)) for any ε > 0.
Caveat. Theoretical improvements to Strassen are progressively less practical.

45 Cache-Aware MM [HK81]
BLOCK-MULT(A, B, C, n)
1  for i ← 1 to n/s
2     do for j ← 1 to n/s
3        do for k ← 1 to n/s
4           do ORD-MULT(A_ik, B_kj, C_ij, s)
(ORD-MULT multiplies two s-by-s blocks into a third in the ordinary way; A_ik, B_kj, C_ij denote the s-by-s blocks of A, B, C.)
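In Python, the same blocked loop structure looks like this (a sketch of mine, with ORD-MULT inlined as the three inner loops; C must start as the zero matrix):

def block_mult(A, B, C, n, s):
    # C += A * B, processed one s-by-s block triple at a time.
    for i0 in range(0, n, s):
        for j0 in range(0, n, s):
            for k0 in range(0, n, s):
                # ORD-MULT on blocks A[i0.., k0..], B[k0.., j0..], C[i0.., j0..]
                for i in range(i0, min(i0 + s, n)):
                    for j in range(j0, min(j0 + s, n)):
                        acc = C[i][j]
                        for k in range(k0, min(k0 + s, n)):
                            acc += A[i][k] * B[k][j]
                        C[i][j] = acc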

46 Towards faster matrix multiplication… (Blocked version)
Advantages: exploits locality using blocking; does not assume that each access to memory is O(1); can be extended to multiple levels of cache; usually the fastest algorithm after tuning.
Disadvantages: needs tuning every time it runs on a new machine; "s" is usually a voodoo parameter that is unknown in advance.

47 Cache-Aware Matrix Multiplication
BLOCK-MULT(A, B, C, n)
1  for i ← 1 to n/s
2     do for j ← 1 to n/s
3        do for k ← 1 to n/s
4           do ORD-MULT(A_ik, B_kj, C_ij, s)
Voodoo! Tune s so that A_ik, B_kj, and C_ij just fit into the cache: s = Θ(√Z). Then the number of cache misses is Q(n) = Θ(n^3 / (L√Z)) when n > √Z, which is optimal [HK81]. (© Harald Prokop, 18 Oct 99.)

48

49 Cache Oblivious

50

51 Algorithms for External Memory

52 The Memory Hierarchy
(Figure: P = Processor, C = Cache, M = Main Memory, connected to Secondary Storage (Hard Disks).)

53 Storage Capacity
(Figure from Gray & Reuter (2002), typical capacity in bytes vs. access time in seconds: cache ~10^3-10^5, electronic main memory ~10^7, electronic secondary storage / online tape ~10^9, magnetic and optical disks ~10^11, nearline tape & optical disks ~10^13, offline tape ~10^15.)

54 Storage Cost
(Figure from Gray & Reuter (2002), dollars/MB vs. access time in seconds: roughly 10^4 $/MB for cache, decreasing through electronic main and secondary memory and magnetic/optical disks, down to ~10^-4 $/MB for offline tape.)

55 Disk Access Time
Time = Seek Time + Rotational Delay + Transfer Time + Other.
Typical numbers (for a random block access): seek time = 10 ms; rotational delay = 4 ms (7200 rpm); transfer rate = 50 MB/sec.
Other = CPU access delays, contention for controllers, contention for the bus, multiple copies of the same data.

56 Rules for EM Algorithms
Sequential I/O is cheap compared to random I/O: for a 1 KB block, sequential ≈ 1 ms vs. random ≈ 20 ms. The difference becomes smaller and smaller as the block size becomes larger.

57 The DAM Model
Count the number of I/Os. Explicitly control which blocks are in memory. A two-level memory model.
Notation: M = size of memory, B = size of a disk block, N = size of the data.
Question: how many block transfers does one scan of the data set take? (Answer: Θ(N/B).)

58 Mergesort
How many block I/Os does mergesort need to sort N numbers (when the size of the data is extremely large compared to M)? Can we do better?

59 External Memory Sorting
Penny Sort Competition: 40 GB, 433 million records, 1541 seconds on a $614 Linux/AMD system.

60 EM Sorting
Two-pass external memory merge sort.
Pass 1: read the input in chunks of size O(M) (chunk 0, chunk 1, …), sort each chunk in memory, and write it back out as a sorted run (run 0, run 1, …, run n).
Pass 2: merge k = O(M/B) runs at a time, keeping one block-sized buffer (size B) per run feeding a k-merger; this turns, e.g., runs 0-3 into run 0123, and repeats until one run remains.
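A toy Python version of the two-pass scheme (my own sketch; assumes newline-terminated records, that M lines fit in memory, and that all runs can be open at once so a single merge pass suffices):

import heapq, itertools, os, tempfile

def external_sort(infile, outfile, M=1_000_000):
    # Pass 1: sort M-line chunks in memory and write each out as a run.
    runs = []
    with open(infile) as f:
        while True:
            chunk = list(itertools.islice(f, M))
            if not chunk:
                break
            chunk.sort()
            run = tempfile.NamedTemporaryFile("w+", delete=False)
            run.writelines(chunk)
            run.seek(0)
            runs.append(run)
    # Pass 2: k-way merge of all runs (heapq.merge streams them lazily).
    with open(outfile, "w") as out:
        out.writelines(heapq.merge(*runs))
    for run in runs:
        run.close()
        os.unlink(run.name)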

61 The CO Memory Model
Cache-oblivious memory model: reason about a two-level hierarchy, but prove results that hold for unknown, multilevel memory hierarchies. The parameters B and M are unknown to the algorithm, so the analysis is simultaneously optimized for all levels of the memory hierarchy. (In cache terms, B = L, the line length, and M = Z, the cache size.)

62 Matrix Transpose: DAM vs. CO
What is the number of blocks you need to move in a transpose of a large matrix, in the DAM model? In the CO model? (A cache-oblivious sketch follows.)
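For reference, a cache-oblivious transpose sketch (my own code): recursively split the longer dimension, so that at some recursion depth every subproblem fits in cache, whatever B and M happen to be:

def co_transpose(A, B, i0, i1, j0, j1):
    # Write A[i][j] into B[j][i] for i in [i0, i1), j in [j0, j1).
    if (i1 - i0) * (j1 - j0) <= 16:          # base-case size is an arbitrary choice
        for i in range(i0, i1):
            for j in range(j0, j1):
                B[j][i] = A[i][j]
    elif i1 - i0 >= j1 - j0:                 # split the taller dimension
        m = (i0 + i1) // 2
        co_transpose(A, B, i0, m, j0, j1)
        co_transpose(A, B, m, i1, j0, j1)
    else:                                    # split the wider dimension
        m = (j0 + j1) // 2
        co_transpose(A, B, i0, i1, j0, m)
        co_transpose(A, B, i0, i1, m, j1)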

63 Static Searches
Only for balanced binary trees: assume there are no insertions and deletions, only searches. Can we speed up such searches?

64 What is a layout?
A mapping of the nodes of a tree to memory. Different kinds of layouts: in-order, post-order, pre-order, van Emde Boas.
Main idea: store recursive subtrees in contiguous memory (see the sketch below).
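A toy sketch of the van Emde Boas layout (my own construction): for a complete binary tree of height h with BFS labels (root = 1), lay out a top subtree of height ⌊h/2⌋ first, then each bottom subtree, every block contiguous and itself laid out recursively:

def veb_order(height):
    # Returns BFS labels of a complete binary tree in van Emde Boas layout order.
    def rec(root, h):
        if h == 1:
            return [root]
        top_h = h // 2
        order = rec(root, top_h)          # top subtree first, contiguously
        frontier = [root]                 # descend to the top subtree's leaves
        for _ in range(top_h - 1):
            frontier = [c for v in frontier for c in (2 * v, 2 * v + 1)]
        for leaf in frontier:             # then each bottom subtree, contiguously
            for child in (2 * leaf, 2 * leaf + 1):
                order += rec(child, h - top_h)
        return order
    return rec(1, height)

# veb_order(4) -> [1, 2, 3, 4, 8, 9, 5, 10, 11, 6, 12, 13, 7, 14, 15]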

65 Example of Van Emde Boas

66 Another View

67 Theoretical Guarantees?
Cache complexity: Q(n) = O(log_B n) block transfers per search.
Work complexity: W(n) = O(log n).
(From Prokop's thesis.)

68 In Practice??

69 In Practice II

70 In Practice!
"Matrix Operations by Morton Ordering", David S. Wise (cache-oblivious practical matrix-operation results).
Bender, Duan, Wu (cache-oblivious dictionaries).
Rahman, Cole, Raman (cache-oblivious B-trees).

71 Known Optimal Results
Matrix multiplication. Matrix transpose. n-point FFT. LUP decomposition. Sorting. Searching.

72 Other Known Results
Priority queues. List ranking. Tree algorithms. Directed BFS/DFS. Undirected BFS. MSF (minimum spanning forest).

73 Introduction to Streaming Algorithms

74 Brain Teaser
Let P = {1…n} and let P′ = P \ {x} for some x in P. Paul shows Carole the elements of P′ one at a time; Carole may use only O(log n) bits of memory, yet must name the missing x at the end. Now what about P′′, where two elements are missing? (One standard solution is sketched below.)
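A sketch of one standard solution (not given on the slide): a running sum fits in O(log n) bits, and the missing element is its difference from n(n+1)/2. For two missing elements, also track the sum of squares and solve the two equations.

def find_missing(stream, n):
    # Carole's strategy: total of 1..n minus the running sum is the missing x.
    s = 0
    for v in stream:            # s never exceeds n(n+1)/2: O(log n) bits
        s += v
    return n * (n + 1) // 2 - s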

75 Streaming Algorithms
The data that computers are being asked to process is growing astronomically, and it is often infeasible to store. If I can't store the data I am looking at, how do I compute a summary of it?

76 Another Brain Teaser
Given a set of numbers in a large array, decide in one pass whether some item is in the majority (assume you have only constant-size memory to work with). Example: N = 12, with seven 9s and one each of 3, 4, 6, 7, 2; item 9 is the majority. (Courtesy: Subhash Suri.)

77 Misra-Gries Algorithm (‘82)
Keep one counter and one ID. If the new item matches the stored ID, increment the counter. Otherwise, if the counter is 0, store the new item with count = 1; else decrement the counter. At the end, if the counter > 0, its item is the only candidate for majority.
(Figure: the stream 2 9 7 6 4 3 … processed with the ID/count state shown at each step.)
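A Python sketch of the one-pass candidate-finding step (my code; a second pass must verify that the candidate really is a majority):

def majority_candidate(stream):
    # One counter and one ID: O(1) memory, one pass.
    ident, count = None, 0
    for item in stream:
        if count == 0:
            ident, count = item, 1      # adopt the new item
        elif item == ident:
            count += 1
        else:
            count -= 1
    return ident  # the only possible majority element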

78 Data Stream Algorithms
Majority and Frequent are examples of data stream algorithms.
Data arrives as an online sequence x1, x2, …, potentially infinite.
The algorithm processes the data in one pass, in the given order.
The algorithm's memory is significantly smaller than the input data.
Goal: summarize the data and compute useful patterns.

79 Streaming Data Sources
Internet traffic monitoring New Computer Graphics hardware Web logs and click streams Financial and stock market data Retail and credit card transactions Telecom calling records Sensor networks, surveillance RFID Instruction profiling in microprocessors Data warehouses (random access too expensive).

80 FFT

81 Fast Fourier Transform: Applications
Applications. Optics, acoustics, quantum physics, telecommunications, control systems, signal processing, speech recognition, data compression, image processing: DVD, JPEG, MP3, MRI, CAT scan; numerical solutions to Poisson's equation. (Progress in these areas was limited by the lack of fast algorithms; variants of the FFT, e.g., fast sine/cosine transforms, are used for JPEG.)
"The FFT is one of the truly great computational developments of this [20th] century. It has changed the face of science and engineering so much that it is not an exaggeration to say that life as we know it would be very different without the FFT." — Charles van Loan
Significance. Perhaps the single algorithmic discovery that has had the greatest practical impact in history.

82 Fast Fourier Transform: Brief History
Gauss (1805, 1866). Analyzed periodic motion of the asteroid Ceres. (Gauss' algorithm was published posthumously in 1866.)
Runge-König (1924). Laid the theoretical groundwork.
Danielson-Lanczos (1942). Efficient algorithm.
Cooley-Tukey (1965). Monitoring nuclear tests in the Soviet Union and tracking submarines; rediscovered and popularized the FFT. Its importance was not fully realized until the advent of digital computers.

83 Polynomials: Coefficient Representation
Polynomial. [coefficient representation] A(x) = a0 + a1 x + … + an−1 x^(n−1).
Add: O(n) arithmetic operations.
Evaluate: O(n) using Horner's method.
Multiply (convolve): O(n^2) using brute force. (Can also multiply in O(n^1.585) using Karatsuba polynomial multiplication.)

84 Polynomials: Point-Value Representation
Fundamental theorem of algebra. [Gauss, PhD thesis] A degree-n polynomial with complex coefficients has n complex roots. (A deep theorem whose proof requires analysis; conjectured in the 17th century, but not rigorously proved until Gauss' PhD thesis, which established imaginary numbers as fundamental mathematical objects.)
Corollary. A degree n−1 polynomial A(x) is uniquely specified by its evaluation at n distinct values of x, i.e., by the pairs (xj, yj) with yj = A(xj).

85 Polynomials: Point-Value Representation
Polynomial. [point-value representation] {(x0, y0), …, (xn−1, yn−1)}.
Add: O(n) arithmetic operations.
Multiply: O(n), but need 2n−1 points.
Evaluate: O(n^2) using Lagrange's formula. (Lagrange's formula is numerically unstable.)

86 Converting Between Two Polynomial Representations
Tradeoff. Fast evaluation or fast multiplication. We want both!
Representation   Multiply   Evaluate
Coefficient      O(n^2)     O(n)
Point-value      O(n)       O(n^2)
Goal. Make all ops fast by efficiently converting between the two representations.

87 Converting Between Two Polynomial Representations: Brute Force
Coefficient to point-value. Given a polynomial a0 + a1 x + … + an−1 x^(n−1), evaluate it at n distinct points x0, …, xn−1: O(n^2) for the (Vandermonde) matrix-vector multiply.
Point-value to coefficient. Given n distinct points x0, …, xn−1 and values y0, …, yn−1, find the unique polynomial a0 + a1 x + … + an−1 x^(n−1) that has the given values at the given points: O(n^3) for Gaussian elimination. (The Vandermonde matrix is invertible iff the xi are distinct.)

88 Coefficient to Point-Value Representation: Intuition
Coefficient to point-value. Given a polynomial a0 + a1 x + … + an−1 x^(n−1), evaluate it at n distinct points x0, …, xn−1.
Divide. Break the polynomial up into even and odd powers.
A(x) = a0 + a1 x + a2 x^2 + a3 x^3 + a4 x^4 + a5 x^5 + a6 x^6 + a7 x^7.
Aeven(x) = a0 + a2 x + a4 x^2 + a6 x^3.
Aodd(x) = a1 + a3 x + a5 x^2 + a7 x^3.
A(x) = Aeven(x^2) + x Aodd(x^2).
A(−x) = Aeven(x^2) − x Aodd(x^2).
Intuition. Choose the two points to be ±1:
A(1) = Aeven(1) + 1 · Aodd(1).
A(−1) = Aeven(1) − 1 · Aodd(1).
So we can evaluate a polynomial of degree ≤ n at 2 points by evaluating two polynomials of degree ≤ ½n at 1 point.

89 Coefficient to Point-Value Representation: Intuition
Coefficient to point-value. Given a polynomial a0 + a1 x + … + an−1 x^(n−1), evaluate it at n distinct points x0, …, xn−1.
Divide. Break the polynomial up into even and odd powers, as before.
Intuition. Choose the four points to be ±1, ±i:
A(1) = Aeven(1) + 1 · Aodd(1).
A(−1) = Aeven(1) − 1 · Aodd(1).
A(i) = Aeven(−1) + i · Aodd(−1).
A(−i) = Aeven(−1) − i · Aodd(−1).
So we can evaluate a polynomial of degree ≤ n at 4 points by evaluating two polynomials of degree ≤ ½n at 2 points.

90 Discrete Fourier Transform
Coefficient to point-value. Given a polynomial a0 + a1 x + … + an−1 x^(n−1), evaluate it at n distinct points x0, …, xn−1.
Key idea: choose xk = ω^k, where ω is a principal nth root of unity. The resulting linear map y = Fn a is the discrete Fourier transform; Fn is the Fourier matrix, with entries (Fn)_kj = ω^(kj).

91 Roots of Unity
Def. An nth root of unity is a complex number x such that x^n = 1.
Fact. The nth roots of unity are ω^0, ω^1, …, ω^(n−1), where ω = e^(2πi/n).
Pf. (ω^k)^n = (e^(2πik/n))^n = (e^(πi))^(2k) = (−1)^(2k) = 1.
Fact. The ½nth roots of unity are ν^0, ν^1, …, ν^(n/2−1), where ν = e^(4πi/n).
Fact. ω^2 = ν and (ω^2)^k = ν^k.
(Figure, n = 8: the 8th roots of unity on the unit circle; ω^0 = ν^0 = 1, ω^2 = ν^1 = i, ω^4 = ν^2 = −1, ω^6 = ν^3 = −i. The roots form a group under multiplication, similar to the additive group Z_n modulo n.)

92 Fast Fourier Transform
Goal. Evaluate a degree n−1 polynomial A(x) = a0 + … + an−1 x^(n−1) at its nth roots of unity: ω^0, ω^1, …, ω^(n−1).
Divide. Break the polynomial up into even and odd powers:
Aeven(x) = a0 + a2 x + a4 x^2 + … + an−2 x^(n/2 − 1).
Aodd(x) = a1 + a3 x + a5 x^2 + … + an−1 x^(n/2 − 1).
A(x) = Aeven(x^2) + x Aodd(x^2).
Conquer. Evaluate Aeven(x) and Aodd(x) at the ½nth roots of unity: ν^0, ν^1, …, ν^(n/2−1).
Combine. Using ν^k = (ω^k)^2 = (ω^(k+n/2))^2 and ω^(k+n/2) = −ω^k:
A(ω^k) = Aeven(ν^k) + ω^k Aodd(ν^k), 0 ≤ k < n/2
A(ω^(k+n/2)) = Aeven(ν^k) − ω^k Aodd(ν^k), 0 ≤ k < n/2

93 FFT Algorithm
fft(n, a0, a1, …, an−1) {
   if (n == 1) return a0
   (e0, e1, …, en/2−1) ← FFT(n/2, a0, a2, a4, …, an−2)
   (d0, d1, …, dn/2−1) ← FFT(n/2, a1, a3, a5, …, an−1)
   for k = 0 to n/2 − 1 {
      ωk ← e^(2πik/n)
      yk ← ek + ωk dk
      yk+n/2 ← ek − ωk dk
   }
   return (y0, y1, …, yn−1)
}
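The same recursion in runnable Python (my transcription of the pseudocode; len(a) must be a power of 2):

import cmath

def fft(a):
    # Evaluate the polynomial with coefficients a at the nth roots of unity.
    n = len(a)
    if n == 1:
        return list(a)
    e = fft(a[0::2])                 # even-index coefficients
    d = fft(a[1::2])                 # odd-index coefficients
    y = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(2j * cmath.pi * k / n)   # ω^k
        y[k] = e[k] + w * d[k]
        y[k + n // 2] = e[k] - w * d[k]
    return y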

94 FFT Summary
Theorem. The FFT algorithm evaluates a degree n−1 polynomial at each of the nth roots of unity in O(n log n) steps (assumes n is a power of 2).
Running time. T(2n) = 2T(n) + O(n) ⇒ T(n) = O(n log n).
(Diagram: coefficient representation → point-value representation, O(n log n).)

95 Point-Value to Coefficient Representation: Inverse DFT
Goal. Given the values y0, …, yn−1 of a degree n−1 polynomial at the n points ω^0, ω^1, …, ω^(n−1), find the unique polynomial a0 + a1 x + … + an−1 x^(n−1) that has the given values at the given points. This is the inverse DFT: a = (Fn)^(−1) y, where (Fn)^(−1) is the inverse of the Fourier matrix.

96 Inverse FFT
Claim. The inverse of the Fourier matrix is given by ((Fn)^(−1))_kj = ω^(−kj) / n.
Consequence. To compute the inverse FFT, apply the same algorithm but use ω^(−1) = e^(−2πi/n) as the principal nth root of unity (and divide by n).

97 Inverse FFT: Proof of Correctness
Claim. Fn and Gn, the matrix with entries (Gn)_kj = ω^(−kj)/n, are inverses.
Pf. The (k, k′) entry of Gn Fn is (1/n) Σ_j ω^(−kj) ω^(jk′) = (1/n) Σ_j ω^((k′−k)j), which by the summation lemma is 1 if k = k′ and 0 otherwise. ▪
Summation lemma. Let ω be a principal nth root of unity. Then Σ_{j=0}^{n−1} ω^(kj) = n if k is a multiple of n, and 0 otherwise.
Pf. If k is a multiple of n, then ω^k = 1, so the sum is n. Otherwise, ω^k ≠ 1 is a root of x^n − 1 = (x − 1)(1 + x + x^2 + … + x^(n−1)), so 1 + ω^k + ω^(2k) + … + ω^((n−1)k) = 0, i.e., the sum is 0. ▪

98 Inverse FFT: Algorithm
ifft(n, a0, a1, …, an−1) {
   if (n == 1) return a0
   (e0, e1, …, en/2−1) ← ifft(n/2, a0, a2, a4, …, an−2)
   (d0, d1, …, dn/2−1) ← ifft(n/2, a1, a3, a5, …, an−1)
   for k = 0 to n/2 − 1 {
      ωk ← e^(−2πik/n)
      yk ← ek + ωk dk
      yk+n/2 ← ek − ωk dk
   }
   return (y0, y1, …, yn−1)
}
(Divide every output by n once, at the top level.)

99 Inverse FFT Summary
Theorem. The inverse FFT algorithm interpolates a degree n−1 polynomial given its values at each of the nth roots of unity in O(n log n) steps (assumes n is a power of 2).
(Diagram: coefficient representation ↔ point-value representation, O(n log n) each way.)

100 Polynomial Multiplication
Theorem. Can multiply two degree n−1 polynomials in O(n log n) steps:
coefficient representation → FFT, O(n log n) → point-value multiplication, O(n) → inverse FFT, O(n log n) → coefficient representation.
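A Python sketch tying the pipeline together (my code; it reuses the fft function sketched after slide 93 and obtains the inverse FFT by conjugation):

def ifft(y):
    # Inverse FFT via conjugation: ifft(y) = conj(fft(conj(y))) / n.
    n = len(y)
    return [v.conjugate() / n for v in fft([complex(v).conjugate() for v in y])]

def poly_multiply(a, b):
    # Coefficient lists in, coefficient list of the product out.
    n = 1
    while n < len(a) + len(b) - 1:
        n *= 2                        # pad to a power of 2 with >= 2n-1 points
    fa = fft(list(a) + [0] * (n - len(a)))
    fb = fft(list(b) + [0] * (n - len(b)))
    c = ifft([x * y for x, y in zip(fa, fb)])
    return [v.real for v in c[:len(a) + len(b) - 1]]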

101 FFT in Practice
Fastest Fourier Transform in the West. [Frigo and Johnson] An optimized C library. Features: DFT, DCT, real, complex, any size, any dimension. Won the 1999 Wilkinson Prize for Numerical Software. Portable, competitive with vendor-tuned code.
Implementation details. Instead of executing a predetermined algorithm, it evaluates your hardware and uses a special-purpose compiler to generate an optimized algorithm catered to the "shape" of the problem. The core algorithm is a nonrecursive version of the Cooley-Tukey radix-2 FFT. O(n log n), even for prime sizes.
Reference:

102 More on FFT
(Figure: iterative FFT butterfly network with inputs in bit-reversed order 0, 4, 2, 6, 1, 5, 3, 7.)

103 FFT

104 Parallel FFT
(Figure: parallel FFT butterfly circuit mapping inputs a0, …, a7 to outputs y0, …, y7.)

105 Integer Multiplication
Integer multiplication. Given two n-bit integers a = an−1 … a1 a0 and b = bn−1 … b1 b0, compute their product c = a · b.
Convolution algorithm. Form two polynomials A(x) = Σ ai x^i and B(x) = Σ bi x^i; note that a = A(2) and b = B(2). Compute C(x) = A(x) · B(x), then evaluate C(2) = a · b. Running time: O(n log n) complex arithmetic steps.
Theory. [Schönhage-Strassen, 1971] O(n log n log log n) bit operations.
Practice. [GNU Multiple Precision Arithmetic Library] GMP proclaims itself "the fastest bignum library on the planet." It uses brute force, Karatsuba, and FFT, depending on the size of n.

