Advanced Algorithms Piyush Kumar (Lecture 11: Div & Conquer) Welcome to COT5405 Source: Kevin Wayne, Harold Prokop.



Divide and Conquer –Inversions –Closest point problems –Integer multiplication –Matrix multiplication –Cache-aware / cache-oblivious algorithms –Static Van Emde Boas layout

5.3 Counting Inversions

Counting Inversions. Music site tries to match your song preferences with others. –You rank n songs. –Music site consults database to find people with similar tastes. Similarity metric: number of inversions between two rankings. –My rank: 1, 2, …, n. –Your rank: a_1, a_2, …, a_n. –Songs i and j are inverted if i < j but a_i > a_j. Brute force: check all Θ(n²) pairs i and j. Example (from the figure): songs A B C D E, my ranking 1 2 3 4 5, your ranking 1 3 4 2 5; the inversions are 3-2 and 4-2.

Applications. –Voting theory. –Collaborative filtering. –Measuring the "sortedness" of an array. –Sensitivity analysis of Google's ranking function. –Rank aggregation for meta-searching on the Web. –Nonparametric statistics (e.g., Kendall's Tau distance).

Counting Inversions: Divide-and-Conquer Divide-and-conquer

Counting Inversions: Divide-and-Conquer. Divide-and-conquer. –Divide: separate list into two pieces. Divide: O(1).

Counting Inversions: Divide-and-Conquer. Divide-and-conquer. –Divide: separate list into two pieces. –Conquer: recursively count inversions in each half. Divide: O(1). Conquer: 2T(n/2). Example: 5 blue-blue inversions (5-4, 5-2, 4-2, 8-2, …) and 8 green-green inversions (…, 9-3, 9-7, 12-3, 12-7, 12-11, 11-3, 11-7).

Counting Inversions: Divide-and-Conquer. Divide-and-conquer. –Divide: separate list into two pieces. –Conquer: recursively count inversions in each half. –Combine: count inversions where a_i and a_j are in different halves, and return the sum of the three quantities. Divide: O(1). Conquer: 2T(n/2). Combine: ??? Example: 9 blue-green inversions (5-3, 4-3, 8-6, 8-3, 8-7, 10-6, 10-9, 10-3, 10-7). Total = 5 + 8 + 9 = 22.

Counting Inversions: Combine. Combine: count blue-green inversions. –Assume each half is sorted. –Count inversions where a_i and a_j are in different halves. –Merge the two sorted halves into a sorted whole. Count: O(n). Merge: O(n), maintaining the sorted invariant. (The figure's example has 13 blue-green inversions.)

Counting Inversions: Implementation. Pre-condition. [Merge-and-Count] A and B are sorted. Post-condition. [Sort-and-Count] L is sorted.

Sort-and-Count(L) {
   if list L has one element
      return 0 and the list L
   Divide the list into two halves A and B
   (r_A, A) ← Sort-and-Count(A)
   (r_B, B) ← Sort-and-Count(B)
   (r, L) ← Merge-and-Count(A, B)
   return r_A + r_B + r and the sorted list L
}
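A direct Python transcription of Sort-and-Count / Merge-and-Count (a sketch; the sample array below is illustrative, chosen so its inversion count matches the running total of 22 in the example above):

```python
def sort_and_count(L):
    """Return (#inversions in L, sorted copy of L)."""
    if len(L) <= 1:
        return 0, list(L)
    mid = len(L) // 2
    r_a, A = sort_and_count(L[:mid])
    r_b, B = sort_and_count(L[mid:])
    r, merged = merge_and_count(A, B)
    return r_a + r_b + r, merged

def merge_and_count(A, B):
    """Merge sorted A and B; count cross pairs (a in A, b in B) with a > b."""
    result, count = [], 0
    i = j = 0
    while i < len(A) and j < len(B):
        if A[i] <= B[j]:
            result.append(A[i]); i += 1
        else:
            # B[j] is smaller than every remaining element of A.
            count += len(A) - i
            result.append(B[j]); j += 1
    result.extend(A[i:]); result.extend(B[j:])
    return count, result

print(sort_and_count([1, 5, 4, 8, 10, 2, 6, 9, 12, 11, 3, 7]))
```

Total work is the mergesort recurrence T(n) = 2T(n/2) + O(n) = O(n log n).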

Closest Pair of Points

Closest Pair of Points. Closest pair. Given n points in the plane, find a pair with smallest Euclidean distance between them. Fundamental geometric primitive. –Graphics, computer vision, geographic information systems, molecular modeling, air traffic control. –Special case of nearest neighbor, Euclidean MST, Voronoi (fast closest pair inspired fast algorithms for these problems). Brute force. Check all pairs of points p and q with Θ(n²) comparisons. 1-D version. O(n log n) easy if points are on a line. Assumption. No two points have the same x coordinate (to make the presentation cleaner).

Closest Pair of Points: First Attempt. Divide. Sub-divide region into 4 quadrants.

Closest Pair of Points: First Attempt. Divide. Sub-divide region into 4 quadrants. Obstacle. Impossible to ensure n/4 points in each piece.

Closest Pair of Points. Algorithm. –Divide: draw vertical line L so that roughly ½n points are on each side.

Closest Pair of Points. Algorithm. –Divide: draw vertical line L so that roughly ½n points are on each side. –Conquer: find closest pair in each side recursively.

Closest Pair of Points. Algorithm. –Divide: draw vertical line L so that roughly ½n points are on each side. –Conquer: find closest pair in each side recursively. –Combine: find closest pair with one point in each side. –Return best of the 3 solutions. The combine step seems like Θ(n²).

Closest Pair of Points. Find closest pair with one point in each side, assuming that distance < δ. δ = min(12, 21) in the figure.

Closest Pair of Points. Find closest pair with one point in each side, assuming that distance < δ. –Observation: only need to consider points within δ of line L.

Closest Pair of Points. Find closest pair with one point in each side, assuming that distance < δ. –Observation: only need to consider points within δ of line L. –Sort points in the 2δ-strip by their y coordinate.

Closest Pair of Points. Find closest pair with one point in each side, assuming that distance < δ. –Observation: only need to consider points within δ of line L. –Sort points in the 2δ-strip by their y coordinate. –Only check distances of those within 11 positions in the sorted list!

Closest Pair of Points. Def. Let s_i be the point in the 2δ-strip with the i-th smallest y-coordinate. Claim. If |i − j| ≥ 12, then the distance between s_i and s_j is at least δ. Pf. –No two points lie in the same ½δ-by-½δ box. –Two points at least 2 rows apart have distance ≥ 2(½δ) = δ. ▪ Fact. Still true if we replace 12 with 7.

Closest Pair Algorithm.

Closest-Pair(p_1, …, p_n) {
   Compute separation line L such that half the points        (O(n log n))
   are on one side and half on the other side.
   δ_1 = Closest-Pair(left half)                              (2T(n/2))
   δ_2 = Closest-Pair(right half)
   δ = min(δ_1, δ_2)
   Delete all points further than δ from separation line L    (O(n))
   Sort remaining points by y-coordinate.                     (O(n log n))
   Scan points in y-order and compare distance between        (O(n))
   each point and next 11 neighbors. If any of these
   distances is less than δ, update δ.
   return δ.
}
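A Python sketch of the algorithm above. This is the simpler O(n log² n) variant (the strip is re-sorted by y inside every call); the slides' O(n log n) refinement merges pre-sorted lists instead:

```python
import math

def closest_pair(points):
    """Smallest Euclidean distance among >= 2 points with distinct x."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    def rec(P):                       # P is sorted by x
        if len(P) <= 3:               # base case: brute force
            return min(dist(p, q) for i, p in enumerate(P) for q in P[i + 1:])
        mid = len(P) // 2
        x_mid = P[mid][0]             # separation line L
        d = min(rec(P[:mid]), rec(P[mid:]))
        # Keep only points within d of L, sorted by y-coordinate.
        strip = sorted((p for p in P if abs(p[0] - x_mid) < d),
                       key=lambda p: p[1])
        for i, p in enumerate(strip):
            for q in strip[i + 1:i + 12]:   # next 11 neighbors suffice
                if q[1] - p[1] >= d:
                    break
                d = min(d, dist(p, q))
        return d

    return rec(sorted(points))

print(closest_pair([(0, 0), (3, 4), (1, 1), (7, 2), (2, 2)]))
```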

Closest Pair of Points: Analysis. Running time. T(n) = 2T(n/2) + O(n log n) ⇒ T(n) = O(n log² n). Q. Can we achieve O(n log n)? A. Yes. Don't sort the points in the strip from scratch each time. –Each recursive call returns two lists: all points sorted by y coordinate, and all points sorted by x coordinate. –Sort by merging the two pre-sorted lists.

Integer Multiplication

Integer Arithmetic. Add. Given two n-digit integers a and b, compute a + b. –O(n) bit operations. Multiply. Given two n-digit integers a and b, compute a × b. –Brute force solution: Θ(n²) bit operations.

Divide-and-Conquer Multiplication: Warmup. To multiply two n-digit integers: –Multiply four ½n-digit integers. –Add two ½n-digit integers, and shift to obtain the result. T(n) = 4T(n/2) + Θ(n) ⇒ T(n) = Θ(n²). (Assumes n is a power of 2.)

Karatsuba Multiplication. To multiply two n-digit integers: –Add two ½n-digit integers. –Multiply three ½n-digit integers. –Add, subtract, and shift ½n-digit integers to obtain the result. Theorem. [Karatsuba-Ofman, 1962] Can multiply two n-digit integers in O(n^{log_2 3}) = O(n^1.585) bit operations.
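A Python sketch of Karatsuba's three-multiplication trick on decimal digits (illustrative; production libraries work on machine words and fall back to schoolbook multiply below a tuned threshold):

```python
def karatsuba(x, y):
    """Multiply nonnegative integers using 3 half-size recursive products."""
    if x < 10 or y < 10:                    # base case: single digit
        return x * y
    m = max(len(str(x)), len(str(y))) // 2  # split around m decimal digits
    high_x, low_x = divmod(x, 10 ** m)
    high_y, low_y = divmod(y, 10 ** m)
    a = karatsuba(high_x, high_y)           # product of high parts
    c = karatsuba(low_x, low_y)             # product of low parts
    # One more multiplication recovers both cross terms:
    b = karatsuba(high_x + low_x, high_y + low_y) - a - c
    return a * 10 ** (2 * m) + b * 10 ** m + c

print(karatsuba(1234, 5678))  # → 7006652
```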

Karatsuba: Recursion Tree. T(n) = 3T(n/2) + O(n). Level costs: n, 3(n/2), 9(n/4), …, 3^k (n/2^k), …, down to 3^{lg n} leaves T(2). Summing the geometrically increasing levels gives T(n) = O(n^{lg 3}).

Matrix Multiplication

Matrix Multiplication (MM). C = A × B, where c_ij = Σ_{k=1}^{n} a_ik b_kj. Courtesy Harold Prokop.

Matrix Multiplication. Matrix multiplication. Given two n-by-n matrices A and B, compute C = AB. Brute force. Θ(n³) arithmetic operations. Fundamental question. Can we improve upon brute force?

Matrix Multiplication: Warmup. Divide-and-conquer. –Divide: partition A and B into ½n-by-½n blocks. –Conquer: multiply 8 pairs of ½n-by-½n blocks recursively. –Combine: add appropriate products using 4 matrix additions. T(n) = 8T(n/2) + Θ(n²) ⇒ T(n) = Θ(n³).

Matrix Multiplication: Key Idea. Key idea. Multiply 2-by-2 block matrices with only 7 block multiplications. –7 multiplications. –18 = 10 + 8 additions (or subtractions).
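The seven products and their recombination can be checked numerically. A sketch for 2-by-2 scalar matrices, using Strassen's standard formulas (the same identities apply when the entries are ½n-by-½n blocks):

```python
def strassen_2x2(A, B):
    """Multiply 2x2 matrices with 7 multiplications and 18 additions."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    # 7 products (10 additions/subtractions to form the operands):
    p1 = a11 * (b12 - b22)
    p2 = (a11 + a12) * b22
    p3 = (a21 + a22) * b11
    p4 = a22 * (b21 - b11)
    p5 = (a11 + a22) * (b11 + b22)
    p6 = (a12 - a22) * (b21 + b22)
    p7 = (a11 - a21) * (b11 + b12)
    # 4 entries of C (8 more additions/subtractions):
    return [[p5 + p4 - p2 + p6, p1 + p2],
            [p3 + p4,           p5 + p1 - p3 - p7]]

print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # → [[19, 22], [43, 50]]
```

Applied recursively to blocks, 7 multiplications instead of 8 is exactly what drops the exponent below 3.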

Fast Matrix Multiplication. Fast matrix multiplication. (Strassen, 1969) –Divide: partition A and B into ½n-by-½n blocks. –Compute: 14 ½n-by-½n matrices via 10 matrix additions. –Conquer: multiply 7 pairs of ½n-by-½n matrices recursively. –Combine: the 7 products into 4 terms using 8 matrix additions. Analysis. –Assume n is a power of 2. –T(n) = # arithmetic operations. –T(n) = 7T(n/2) + Θ(n²) ⇒ T(n) = Θ(n^{log_2 7}) = O(n^2.81).

Fast Matrix Multiplication in Practice Implementation issues. –Sparsity. –Caching effects. –Numerical stability. –Odd matrix dimensions. –Crossover to classical algorithm around n = 128. Common misperception: "Strassen is only a theoretical curiosity." –Advanced Computation Group at Apple Computer reports 8x speedup on G4 Velocity Engine when n ~ 2,500. –Range of instances where it's useful is a subject of controversy. Remark. Can "Strassenize" Ax=b, determinant, eigenvalues, and other matrix ops.

Fast Matrix Multiplication in Theory. Q. Multiply two 2-by-2 matrices with only 7 scalar multiplications? A. Yes! [Strassen, 1969] Q. Multiply two 2-by-2 matrices with only 6 scalar multiplications? A. Impossible. [Hopcroft and Kerr, 1971] Q. Two 3-by-3 matrices with only 21 scalar multiplications? A. Also impossible. Q. Two 70-by-70 matrices with only 143,640 scalar multiplications? A. Yes! [Pan, 1980] Decimal wars. –December, 1979: O(n^…). –January, 1980: O(n^…).

Fast Matrix Multiplication in Theory. Best known. O(n^2.376) [Coppersmith-Winograd, 1987]. Conjecture. O(n^{2+ε}) for any ε > 0. Caveat. Theoretical improvements to Strassen are progressively less practical.

Cache-Aware MM [HK81]. A, B, C are n×n matrices, partitioned into s×s blocks A_ik, B_kj, C_ij; ORD-MULT is the ordinary multiply on s×s blocks.

BLOCK-MULT(A, B, C, n)
1  for i ← 1 to n/s
2     do for j ← 1 to n/s
3        do for k ← 1 to n/s
4           do ORD-MULT(A_ik, B_kj, C_ij, s)

Towards faster matrix multiplication… (Blocked version) Advantages –Exploit locality using blocking –Do not assume that each access to memory is O(1) –Can be extended to multiple levels of cache –Usually the fastest algorithm after tuning. Disadvantages –Needs tuning every time it runs on a new machine. –Usually “s” is a voodoo parameter that is unknown.
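A Python sketch of the blocked scheme (illustrative only; a real cache-aware implementation lives in tuned C/Fortran, and s is the machine-dependent tuning parameter discussed above):

```python
def block_mult(A, B, s):
    """Cache-aware blocked multiply C = A*B on n x n lists, block size s."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i0 in range(0, n, s):
        for j0 in range(0, n, s):
            for k0 in range(0, n, s):
                # ORD-MULT on the s x s blocks C_ij += A_ik * B_kj;
                # all three blocks are meant to fit in cache together.
                for i in range(i0, min(i0 + s, n)):
                    for j in range(j0, min(j0 + s, n)):
                        acc = C[i][j]
                        for k in range(k0, min(k0 + s, n)):
                            acc += A[i][k] * B[k][j]
                        C[i][j] = acc
    return C

print(block_mult([[1, 2], [3, 4]], [[5, 6], [7, 8]], 1))
```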

Cache-Aware Matrix Multiplication. Tune s so that A_ik, B_kj, and C_ij just fit into cache: s = Θ(√Z). Then the cache complexity is Q(n) = (n/s)³ · Θ(s²/L) = Θ(n³ / (L√Z)), which is optimal if n > s [HK81]. The catch: s is a voodoo tuning parameter. (© Harald Prokop, 18 Oct 99.)

Cache Oblivious

Algorithms for External Memory

The memory hierarchy: P = Processor, C = Cache, M = Main Memory, connected to Secondary Storage (Hard Disks).

Storage Capacity. [Figure: typical capacity (bytes) vs. access time (sec), from cache and electronic main memory through magnetic secondary storage, optical disks, and online/nearline/offline tape; from Gray & Reuter (2002).]

Storage Cost. [Figure: dollars/MB vs. access time (sec) for the same hierarchy; from Gray & Reuter (2002).]

Disk Access Time. Time = Seek Time + Rotational Delay + Transfer Time + Other. Typical numbers (for random block access): –Seek time = 10 ms. –Rotational delay = 4 ms (7200 rpm). –Transfer rate = 50 MB/sec. –Other = CPU time to access delays, contention for controllers, contention for the bus, multiple copies of the same data.

Rules for EM Algorithms. Sequential IO is cheap compared to random IO. For a 1 KB block: –Sequential: 1 ms. –Random: 20 ms. The difference becomes smaller as the block size becomes larger.

The DAM Model. Count the number of IOs. Explicitly control which blocks are in memory. Two-level memory model. Notation: –M = size of memory. –B = size of a disk block. –N = size of the data. Question: how many block transfers does one scan of the data set take? (⌈N/B⌉ = O(N/B).)

Problem. Mergesort: –How many block IOs does it need to sort N numbers (when the size of the data is extremely large compared to M)? –Can we do better?

External Memory Sorting. Penny Sort competition: 40 GB, 433 million records, 1541 seconds on a $614 Linux/AMD system.

EM Sorting. Two-pass external memory merge sort. Pass 1: read the input in O(M)-sized chunks, sort each chunk in memory, and write it back as a sorted run (run 0, run 1, …, run n). Pass 2: merge k = O(M/B) runs at a time with a k-merger, keeping one B-sized buffer per run.

The CO Memory Model. Cache-oblivious memory model. –Reason about a two-level hierarchy, but prove results for unknown multilevel memory hierarchies. –Parameters B and M are unknown to the algorithm, so it is optimized for all levels of the memory hierarchy at once. (In Prokop's notation: B = L, M = Z.)

Matrix Transpose: DAM n CO What is the number of blocks you need to move in a transpose of a large matrix? –In DAM –In CO

Static Searches. Only for balanced binary trees. Assume there are no insertions and deletions; only searches. Can we speed up such searches?

What is a layout? A mapping of the nodes of a tree to memory. Different kinds of layouts: –In-order –Post-order –Pre-order –Van Emde Boas. Main idea: store recursive subtrees in contiguous memory.
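The Van Emde Boas layout's recursive splitting can be sketched for a complete binary tree stored with BFS (heap) indices. This helper is a hypothetical illustration, not from the slides; it returns the order in which nodes are laid out in memory:

```python
def veb_order(height):
    """BFS indices (root = 1) of a complete binary tree of the given height,
    listed in van Emde Boas (recursive blocked) memory order."""
    def rec(root, h):
        if h == 1:
            return [root]
        top_h = h // 2            # top tree of ~half the height...
        bot_h = h - top_h         # ...then its bottom subtrees, contiguously
        order = rec(root, top_h)
        # Leaves of the top tree, left to right, in heap indexing:
        leaves = [root * (1 << (top_h - 1)) + i for i in range(1 << (top_h - 1))]
        for leaf in leaves:
            order += rec(2 * leaf, bot_h)      # left bottom subtree
            order += rec(2 * leaf + 1, bot_h)  # right bottom subtree
        return order
    return rec(1, height)

print(veb_order(3))  # → [1, 2, 4, 5, 3, 6, 7]
```

Each recursive subtree occupies a contiguous range, which is what makes a root-to-leaf search touch only O(log_B n) blocks for any block size B.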

Example of Van Emde Boas

Another View

Theoretical Guarantees? Cache complexity Q(n) = O(log_B n). Work complexity W(n) = O(log n). From Prokop's thesis.

In Practice??

In Practice II

In Practice! –Matrix Operations by Morton Ordering, David S. Wise (cache-oblivious practical matrix operation results). –Bender, Duan, Wu (cache-oblivious dictionaries). –Rahman, Cole, Raman (CO B-Trees).

Known Optimal Results. –Matrix multiplication –Matrix transpose –n-point FFT –LUP decomposition –Sorting –Searching

Other Results Known. –Priority queues –List ranking –Tree algorithms –Directed BFS/DFS –Undirected BFS –MSF

Introduction to Streaming Algorithms

Brain Teaser. Let P = {1, …, n} and P′ = P \ {x} for some x in P. Paul shows Carole the elements of P′, one at a time. Carole may use only O(log n) bits of memory, yet must name x at the end. Now what about P′′ (two elements removed)?
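One standard answer, sketched in Python: keep only the running sum, which fits in O(log n) bits; the missing x is the difference from n(n+1)/2. (For P′′, additionally track the sum of squares and solve for the two missing values.)

```python
def find_missing(n, stream):
    """Identify the one element of {1..n} absent from the stream,
    using only a counter of O(log n) bits."""
    total = n * (n + 1) // 2        # sum of 1..n
    seen = 0
    for x in stream:                # one pass over P' = P \ {x}
        seen += x
    return total - seen

print(find_missing(10, [i for i in range(1, 11) if i != 7]))  # → 7
```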

Streaming Algorithms. The data that computers are being asked to process is growing astronomically, and is often infeasible to store. If I can't store the data I am looking at, how do I compute a summary of it?

Another Brain Teaser. Given a set of numbers in a large array, decide in one pass whether some item is in the majority (assume you only have constant-size memory to work with). Example: N = 12; item 9 is the majority. Courtesy: Subhash Suri.

Misra-Gries Algorithm ('82). Keep a counter and an ID. –If the new item is the same as the stored ID, increment the counter. –Otherwise, decrement the counter. –If the counter reaches 0, store the new item with count = 1. If the counter is > 0 at the end, its item is the only candidate for majority.
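A sketch of this single-counter scheme in Python (the input array is illustrative, matching the N = 12 / item 9 example above):

```python
def majority_candidate(stream):
    """One pass, one (ID, count) pair of state; returns the only possible
    majority element. In general the candidate still needs a second
    verification pass -- if no majority exists, the answer is arbitrary."""
    candidate, count = None, 0
    for x in stream:
        if count == 0:
            candidate, count = x, 1     # adopt new item
        elif x == candidate:
            count += 1                  # same as stored ID
        else:
            count -= 1                  # different item cancels one vote
    return candidate

data = [3, 9, 9, 4, 9, 9, 9, 2, 9, 9, 1, 9]   # N = 12, item 9 appears 8 times
print(majority_candidate(data))  # → 9
```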

Data Stream Algorithms. Majority and Frequent are examples of data stream algorithms. –Data arrives as an online sequence x_1, x_2, …, potentially infinite. –The algorithm processes the data in one pass (in the given order). –The algorithm's memory is significantly smaller than the input data. –Summarize the data: compute useful patterns.

Streaming Data Sources –Internet traffic monitoring –New Computer Graphics hardware –Web logs and click streams –Financial and stock market data –Retail and credit card transactions –Telecom calling records –Sensor networks, surveillance –RFID –Instruction profiling in microprocessors –Data warehouses (random access too expensive).

FFT

Fast Fourier Transform: Applications Applications. –Optics, acoustics, quantum physics, telecommunications, control systems, signal processing, speech recognition, data compression, image processing. –DVD, JPEG, MP3, MRI, CAT scan. –Numerical solutions to Poisson's equation. The FFT is one of the truly great computational developments of this [20th] century. It has changed the face of science and engineering so much that it is not an exaggeration to say that life as we know it would be very different without the FFT. -Charles van Loan

Fast Fourier Transform: Brief History Gauss (1805, 1866). Analyzed periodic motion of asteroid Ceres. Runge-König (1924). Laid theoretical groundwork. Danielson-Lanczos (1942). Efficient algorithm. Cooley-Tukey (1965). Monitoring nuclear tests in Soviet Union and tracking submarines. Rediscovered and popularized FFT. Importance not fully realized until advent of digital computers.

Polynomials: Coefficient Representation. Polynomial. [coefficient representation] A(x) = a_0 + a_1 x + a_2 x² + … + a_{n-1} x^{n-1}. Add: O(n) arithmetic operations. Evaluate: O(n) using Horner's method. Multiply (convolve): O(n²) using brute force.

Polynomials: Point-Value Representation. Fundamental theorem of algebra. [Gauss, PhD thesis] A degree n polynomial with complex coefficients has n complex roots. Corollary. A degree n−1 polynomial A(x) is uniquely specified by its evaluation at n distinct values of x. [Figure: sample points (x_j, y_j = A(x_j)) on the curve.]

Polynomials: Point-Value Representation. Polynomial. [point-value representation] (x_0, y_0), …, (x_{n-1}, y_{n-1}). Add: O(n) arithmetic operations. Multiply: O(n), but need 2n−1 points. Evaluate: O(n²) using Lagrange's formula.

Converting Between Two Polynomial Representations. Tradeoff. Fast evaluation or fast multiplication. We want both!

Representation    Multiply    Evaluate
Coefficient       O(n²)       O(n)
Point-value       O(n)        O(n²)

Goal. Make all ops fast by efficiently converting between the two representations.

Converting Between Two Polynomial Representations: Brute Force. Coefficient to point-value. Given a polynomial a_0 + a_1 x + … + a_{n-1} x^{n-1}, evaluate it at n distinct points x_0, …, x_{n-1}. O(n²) for the matrix-vector multiply. Point-value to coefficient. Given n distinct points x_0, …, x_{n-1} and values y_0, …, y_{n-1}, find the unique polynomial a_0 + a_1 x + … + a_{n-1} x^{n-1} that has the given values at the given points. O(n³) for Gaussian elimination; the Vandermonde matrix is invertible iff the x_i are distinct.

Coefficient to Point-Value Representation: Intuition. Coefficient to point-value. Given a polynomial a_0 + a_1 x + … + a_{n-1} x^{n-1}, evaluate it at n distinct points x_0, …, x_{n-1}. Divide. Break the polynomial up into even and odd powers. –A(x) = a_0 + a_1 x + a_2 x² + a_3 x³ + a_4 x⁴ + a_5 x⁵ + a_6 x⁶ + a_7 x⁷. –A_even(x) = a_0 + a_2 x + a_4 x² + a_6 x³. –A_odd(x) = a_1 + a_3 x + a_5 x² + a_7 x³. –A(x) = A_even(x²) + x A_odd(x²). –A(−x) = A_even(x²) − x A_odd(x²). Intuition. Choose the two points to be ±1. –A(1) = A_even(1) + 1·A_odd(1). –A(−1) = A_even(1) − 1·A_odd(1). Can evaluate a polynomial of degree ≤ n at 2 points by evaluating two polynomials of degree ≤ ½n at 1 point.

Coefficient to Point-Value Representation: Intuition. Coefficient to point-value. Given a polynomial a_0 + a_1 x + … + a_{n-1} x^{n-1}, evaluate it at n distinct points x_0, …, x_{n-1}. Divide. Break the polynomial up into even and odd powers. –A(x) = a_0 + a_1 x + a_2 x² + a_3 x³ + a_4 x⁴ + a_5 x⁵ + a_6 x⁶ + a_7 x⁷. –A_even(x) = a_0 + a_2 x + a_4 x² + a_6 x³. –A_odd(x) = a_1 + a_3 x + a_5 x² + a_7 x³. –A(x) = A_even(x²) + x A_odd(x²). –A(−x) = A_even(x²) − x A_odd(x²). Intuition. Choose the four points to be ±1, ±i. –A(1) = A_even(1) + 1·A_odd(1). –A(−1) = A_even(1) − 1·A_odd(1). –A(i) = A_even(−1) + i·A_odd(−1). –A(−i) = A_even(−1) − i·A_odd(−1). Can evaluate a polynomial of degree ≤ n at 4 points by evaluating two polynomials of degree ≤ ½n at 2 points.

Discrete Fourier Transform. Coefficient to point-value. Given a polynomial a_0 + a_1 x + … + a_{n-1} x^{n-1}, evaluate it at n distinct points x_0, …, x_{n-1}. Key idea: choose x_k = ω^k, where ω is a principal n-th root of unity. The resulting map is the discrete Fourier transform; its matrix is the Fourier matrix F_n.

Roots of Unity. Def. An n-th root of unity is a complex number x such that x^n = 1. Fact. The n-th roots of unity are ω^0, ω^1, …, ω^{n-1}, where ω = e^{2πi/n}. Pf. (ω^k)^n = (e^{2πik/n})^n = (e^{πi})^{2k} = (−1)^{2k} = 1. Fact. The ½n-th roots of unity are ν^0, ν^1, …, ν^{n/2−1}, where ν = e^{4πi/n}. Fact. ω² = ν and (ω²)^k = ν^k. Example (n = 8): ω^0 = ν^0 = 1, ω^2 = ν^1 = i, ω^4 = ν^2 = −1, ω^6 = ν^3 = −i.

Fast Fourier Transform. Goal. Evaluate a degree n−1 polynomial A(x) = a_0 + … + a_{n-1} x^{n-1} at its n-th roots of unity: ω^0, ω^1, …, ω^{n-1}. Divide. Break the polynomial up into even and odd powers. –A_even(x) = a_0 + a_2 x + a_4 x² + … + a_{n-2} x^{n/2−1}. –A_odd(x) = a_1 + a_3 x + a_5 x² + … + a_{n-1} x^{n/2−1}. –A(x) = A_even(x²) + x A_odd(x²). Conquer. Evaluate A_even(x) and A_odd(x) at the ½n-th roots of unity: ν^0, ν^1, …, ν^{n/2−1}. Combine, using ν^k = (ω^k)² = (ω^{k+n/2})² and ω^{k+n/2} = −ω^k: –A(ω^k) = A_even(ν^k) + ω^k A_odd(ν^k), 0 ≤ k < n/2. –A(ω^{k+n/2}) = A_even(ν^k) − ω^k A_odd(ν^k), 0 ≤ k < n/2.

FFT Algorithm.

fft(n, a_0, a_1, …, a_{n-1}) {
   if (n == 1) return a_0
   (e_0, e_1, …, e_{n/2-1}) ← FFT(n/2, a_0, a_2, a_4, …, a_{n-2})
   (d_0, d_1, …, d_{n/2-1}) ← FFT(n/2, a_1, a_3, a_5, …, a_{n-1})
   for k = 0 to n/2 - 1 {
      ω^k ← e^{2πik/n}
      y_k ← e_k + ω^k d_k
      y_{k+n/2} ← e_k − ω^k d_k
   }
   return (y_0, y_1, …, y_{n-1})
}
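The pseudocode transcribes almost line for line into Python (a sketch; `cmath` supplies the complex exponential, and n must be a power of 2):

```python
import cmath

def fft(a):
    """Recursive radix-2 FFT: evaluate the polynomial with coefficients a
    at the n-th roots of unity. len(a) must be a power of 2."""
    n = len(a)
    if n == 1:
        return a[:]
    even = fft(a[0::2])            # A_even at the (n/2)-th roots of unity
    odd = fft(a[1::2])             # A_odd  at the (n/2)-th roots of unity
    y = [0] * n
    for k in range(n // 2):
        w = cmath.exp(2j * cmath.pi * k / n)   # omega^k
        y[k] = even[k] + w * odd[k]
        y[k + n // 2] = even[k] - w * odd[k]
    return y

# A(x) = 1 + x + x^2 + x^3 evaluated at 1, i, -1, -i:
print(fft([1, 1, 1, 1]))
```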

FFT Summary. Theorem. The FFT algorithm evaluates a degree n−1 polynomial at each of the n-th roots of unity in O(n log n) steps. (Assumes n is a power of 2.) Running time. T(2n) = 2T(n) + O(n) ⇒ T(n) = O(n log n). Coefficient representation → point-value representation in O(n log n).

Point-Value to Coefficient Representation: Inverse DFT. Goal. Given the values y_0, …, y_{n-1} of a degree n−1 polynomial at the n points ω^0, ω^1, …, ω^{n-1}, find the unique polynomial a_0 + a_1 x + … + a_{n-1} x^{n-1} that has the given values at the given points. This is the inverse DFT; its matrix is the Fourier matrix inverse (F_n)^{-1}.

Inverse FFT. Claim. The inverse of the Fourier matrix F_n is the matrix G_n with entries (G_n)_{jk} = (1/n) ω^{-jk}. Consequence. To compute the inverse FFT, apply the same algorithm but use ω^{-1} = e^{−2πi/n} as the principal n-th root of unity (and divide by n).

Inverse FFT: Proof of Correctness. Claim. F_n and G_n are inverses. Pf. (F_n G_n)_{kk'} = (1/n) Σ_{j=0}^{n-1} ω^{(k−k')j}, which equals 1 if k = k' and 0 otherwise by the summation lemma. Summation lemma. Let ω be a principal n-th root of unity. Then Σ_{j=0}^{n-1} ω^{kj} = n if k is a multiple of n, and 0 otherwise. Pf. –If k is a multiple of n, then ω^k = 1, so the sum is n. –Each n-th root of unity ω^k is a root of x^n − 1 = (x − 1)(1 + x + x² + … + x^{n-1}). –If ω^k ≠ 1, then 1 + ω^k + ω^{2k} + … + ω^{(n−1)k} = 0, so the sum is 0. ▪

Inverse FFT: Algorithm. Same recursion with ω^{-1} in place of ω; dividing by 2 at each of the lg n levels accounts for the overall 1/n factor.

ifft(n, a_0, a_1, …, a_{n-1}) {
   if (n == 1) return a_0
   (e_0, e_1, …, e_{n/2-1}) ← IFFT(n/2, a_0, a_2, a_4, …, a_{n-2})
   (d_0, d_1, …, d_{n/2-1}) ← IFFT(n/2, a_1, a_3, a_5, …, a_{n-1})
   for k = 0 to n/2 - 1 {
      ω^k ← e^{−2πik/n}
      y_k ← (e_k + ω^k d_k) / 2
      y_{k+n/2} ← (e_k − ω^k d_k) / 2
   }
   return (y_0, y_1, …, y_{n-1})
}

Inverse FFT Summary. Theorem. The inverse FFT algorithm interpolates a degree n−1 polynomial given its values at each of the n-th roots of unity in O(n log n) steps. (Assumes n is a power of 2.) Point-value representation → coefficient representation in O(n log n).

Polynomial Multiplication. Theorem. Can multiply two degree n−1 polynomials in O(n log n) steps: –FFT both inputs: coefficient → point-value representation, O(n log n). –Pointwise multiply: O(n) point-value multiplication. –Inverse FFT: point-value → coefficient representation, O(n log n).
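A self-contained Python sketch of this pipeline (illustrative; the inverse transform here reuses the forward recursion with ω^{-1} and divides by n at the end, and inputs are padded to a power of 2 with at least 2n−1 points):

```python
import cmath

def fft(a, invert=False):
    """Radix-2 FFT of a (len a power of 2); invert=True uses omega^-1."""
    n = len(a)
    if n == 1:
        return a[:]
    sign = -1 if invert else 1
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    y = [0] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        y[k] = even[k] + w * odd[k]
        y[k + n // 2] = even[k] - w * odd[k]
    return y

def poly_multiply(a, b):
    """Coefficient vectors (lowest degree first) -> product coefficients."""
    n = 1
    while n < len(a) + len(b) - 1:
        n *= 2                              # pad: need >= 2n-1 sample points
    fa = fft(a + [0] * (n - len(a)))        # coefficient -> point-value
    fb = fft(b + [0] * (n - len(b)))
    fc = [x * y for x, y in zip(fa, fb)]    # pointwise multiply
    c = [v / n for v in fft(fc, invert=True)]  # inverse FFT, divide by n
    return [round(v.real) for v in c[:len(a) + len(b) - 1]]

# (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2
print(poly_multiply([1, 2], [3, 4]))  # → [3, 10, 8]
```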

FFT in Practice Fastest Fourier transform in the West. [Frigo and Johnson] –Optimized C library. –Features: DFT, DCT, real, complex, any size, any dimension. –Won 1999 Wilkinson Prize for Numerical Software. –Portable, competitive with vendor-tuned code. Implementation details. –Instead of executing predetermined algorithm, it evaluates your hardware and uses a special-purpose compiler to generate an optimized algorithm catered to "shape" of the problem. –Core algorithm is nonrecursive version of Cooley-Tukey radix 2 FFT. –O(n log n), even for prime sizes. Reference:

More on FFT. The iterative, in-place FFT consumes its inputs in bit-reversed order; for n = 8 that order is 0, 4, 2, 6, 1, 5, 3, 7.

FFT

Parallel FFT. [Figure: butterfly network mapping inputs a_0, …, a_7 to outputs y_0, …, y_7.]

Integer Multiplication. Integer multiplication. Given two n-bit integers a = a_{n-1} … a_1 a_0 and b = b_{n-1} … b_1 b_0, compute their product c = a × b. Convolution algorithm. –Form two polynomials A(x) = a_0 + a_1 x + … + a_{n-1} x^{n-1} and B(x) = b_0 + b_1 x + … + b_{n-1} x^{n-1}. –Note: a = A(2), b = B(2). –Compute C(x) = A(x) × B(x). –Evaluate C(2) = a × b. –Running time: O(n log n) complex arithmetic steps. Theory. [Schönhage-Strassen 1971] O(n log n log log n) bit operations. Practice. [GNU Multiple Precision Arithmetic Library] GMP proclaims to be "the fastest bignum library on the planet." It uses brute force, Karatsuba, and FFT, depending on the size of n.