Communication costs of Schönhage-Strassen fast integer multiplication

Slides:



Advertisements
Similar presentations
1 Fast Multiplication of Large Numbers Using Fourier Techniques Henry Skiba Advisor: Dr. Marcus Pendergrass.
Advertisements

Communication Lower Bound for the Fast Fourier Transform Michael Anderson Communication-Avoiding Algorithms (CS294) Fall 2011.
DFT properties Note: it is important to ensure that the DFTs are the same length If x1(n) and x2(n) have different lengths, the shorter sequence must be.
Digital Kommunikationselektronik TNE027 Lecture 5 1 Fourier Transforms Discrete Fourier Transform (DFT) Algorithms Fast Fourier Transform (FFT) Algorithms.
Counting the bits Analysis of Algorithms Will it run on a larger problem? When will it fail?
A simple example finding the maximum of a set S of n numbers.
DIVIDE AND CONQUER. 2 Algorithmic Paradigms Greedy. Build up a solution incrementally, myopically optimizing some local criterion. Divide-and-conquer.
Strassen's Matrix Multiplication Sibel KIRMIZIGÜL.
1 Divide-and-Conquer The most-well known algorithm design strategy: 1. Divide instance of problem into two or more smaller instances 2. Solve smaller instances.
CS4413 Divide-and-Conquer
Lecture 10 Jianjun Hu Department of Computer Science and Engineering University of South Carolina CSCE350 Algorithms and Data Structure.
Advanced Topics in Algorithms and Data Structures Lecture pg 1 Recursion.
CS Section 600 CS Section 002 Dr. Angela Guercio Spring 2010.
Solving the Greatest Common Divisor Problem in Parallel Derrick Coetzee University of California, Berkeley CS 273, Fall 2010, Prof. Satish Rao.
Reconfigurable Computing S. Reda, Brown University Reconfigurable Computing (EN2911X, Fall07) Lecture 16: Application-Driven Hardware Acceleration (1/4)
Design and Analysis of Algorithms - Chapter 41 Divide and Conquer The most well known algorithm design strategy: 1. Divide instance of problem into two.
CS2336: Computer Science II
1 CSE 417: Algorithms and Computational Complexity Winter 2001 Lecture 5 Instructor: Paul Beame TA: Gidon Shavit.
Design and Analysis of Algorithms - Chapter 41 Divide and Conquer The most well known algorithm design strategy: 1. Divide instance of problem into two.
Fast Fourier Transform Irina Bobkova. Overview I. Polynomials II. The DFT and FFT III. Efficient implementations IV. Some problems.
Numerical Analysis – Digital Signal Processing Hanyang University Jong-Il Park.
MA/CSSE 473 Day 17 Divide-and-conquer Convex Hull Strassen's Algorithm: Matrix Multiplication.
Introduction to Algorithms 6.046J/18.401J/SMA5503 Lecture 3 Prof. Erik Demaine.
1 Computer Algorithms Lecture 7 Master Theorem Some of these slides are courtesy of D. Plaisted, UNC and M. Nicolescu, UNR.
FFT1 The Fast Fourier Transform. FFT2 Outline and Reading Polynomial Multiplication Problem Primitive Roots of Unity (§10.4.1) The Discrete Fourier Transform.
Karatsuba’s Algorithm for Integer Multiplication
Complexity 20-1 Complexity Andrei Bulatov Parallel Arithmetic.
Applied Symbolic Computation1 Applied Symbolic Computation (CS 300) Karatsuba’s Algorithm for Integer Multiplication Jeremy R. Johnson.
The Fast Fourier Transform and Applications to Multiplication
1 Fast Polynomial and Integer Multiplication Jeremy R. Johnson.
CSE 421 Algorithms Lecture 15 Closest Pair, Multiplication.
1 Chapter 4-2 Divide and Conquer Slides by Kevin Wayne. Copyright © 2005 Pearson-Addison Wesley. All rights reserved.
1 Mathematical Algorithms 1. Arithmetic, (Pseudo) Random Numbers and all that 2. Evaluation & Multiplication of Polynomials I 3. Multiplication of Large.
Divide and Conquer. Recall Divide the problem into a number of sub-problems that are smaller instances of the same problem. Conquer the sub-problems by.
Introduction: Efficient Algorithms for the Problem of Computing Fibonocci Numbers Prepared by John Reif, Ph.D. Analysis of Algorithms.
Master Method Some of the slides are from Prof. Plaisted’s resources at University of North Carolina at Chapel Hill.
1 How to Multiply Slides by Kevin Wayne. Copyright © 2005 Pearson-Addison Wesley. All rights reserved. integers, matrices, and polynomials.
Fast Fourier Transforms. 2 Discrete Fourier Transform The DFT pair was given as Baseline for computational complexity: –Each DFT coefficient requires.
Divide and Conquer Faculty Name: Ruhi Fatima Topics Covered Divide and Conquer Matrix multiplication Recurrence.
Sorting Lower Bounds n Beating Them. Recap Divide and Conquer –Know how to break a problem into smaller problems, such that –Given a solution to the smaller.
CSCI-256 Data Structures & Algorithm Analysis Lecture Note: Some slides by Kevin Wayne. Copyright © 2005 Pearson-Addison Wesley. All rights reserved. 16.
May 9, 2001Applied Symbolic Computation1 Applied Symbolic Computation (CS 680/480) Lecture 6: Multiplication, Interpolation, and the Chinese Remainder.
Prepared by John Reif, Ph.D.
Introduction to Algorithms: Divide-n-Conquer Algorithms
Recursion Ali.
DIGITAL SIGNAL PROCESSING ELECTRONICS
An Iterative FFT We rewrite the loop to calculate nkyk[1] once
Chapter 4 Divide-and-Conquer
Data Structures and Algorithms (AT70. 02) Comp. Sc. and Inf. Mgmt
Chapter 4: Divide and Conquer
Chapter 4 Divide-and-Conquer
Applied Symbolic Computation
CSCE 411 Design and Analysis of Algorithms
Applied Symbolic Computation
Real-time double buffer For hard real-time
DFT and FFT By using the complex roots of unity, we can evaluate and interpolate a polynomial in O(n lg n) An example, here are the solutions to 8 =
4.1 DFT In practice the Fourier components of data are obtained by digital computation rather than by analog processing. The analog values have to be.
Punya Biswas Lecture 15 Closest Pair, Multiplication
The Fast Fourier Transform
Applied Symbolic Computation
September 4, 1997 Applied Symbolic Computation (CS 567) Fast Polynomial and Integer Multiplication Jeremy R. Johnson.
Lecture 17 DFT: Discrete Fourier Transform
Divide-and-Conquer The most-well known algorithm design strategy:
Lecture 15, Winter 2019 Closest Pair, Multiplication
David Kauchak cs161 Summer 2009
Applied Symbolic Computation
Divide-and-Conquer 7 2  9 4   2   4   7
Lecture 15 Closest Pair, Multiplication
Fast Fourier Transform
Fast Polynomial and Integer Multiplication
Presentation transcript:

Communication costs of Schönhage-Strassen fast integer multiplication Derrick Coetzee This work released under Creative Commons Zero Waiver by its author, Derrick Coetzee (dcoetzee@eecs.berkeley.edu).

Schönhage-Strassen Algorithm Classical FFT-based integer multiplication algorithm O(n log n log log n) on n-bit inputs Asymptotically fastest algorithm used in practice

Acyclic convolution Vector operation “Multiplication without carrying” Can compute efficiently with convolution theorem 1 2 3 4 5 6 7 8 8 16 24 32 7 14 21 28 6 12 18 24 5 10 15 20_________ 5 16 34 60 61 52 32

Using convolution to multiply 1234 5678 Split and zero pad Split and zero pad 4 3 2 1 8 7 6 5 DFT DFT Recursive pointwise multiplication IDFT 32 52 61 60 34 16 5______ 7006652 32 52 61 60 34 16 5 Recombination (carrying) 7006652

Vector length and entry size 01111001 Take n-bit input, split into 𝑛 parts of 𝑛 bits each Vector length = 𝑛 Vector entries must be large enough to hold sum of 𝑛 values, each a product of two 𝑛 -bit numbers ⌈lg 𝑛 ( 2 𝑛 −1)2⌉ = 2 𝑛 + ½lg n bits 01 11 10 01 m=2 d = 4 1 2 3 4 5 6 7 8 8 16 24 32 7 14 21 28 6 12 18 24 5 10 15 20_________ 5 16 34 60 61 52 32

Arithmetic cost All ops except FFT and recursive multiplications are O(n) Avoid recursive multiplications in FFTs by performing all operations in ℤ2n′+1 FFTs take O(n log n) time T(n) = 𝑛 T(2 𝑛 + ½lg n) + O(n log n) Solution: T(n) = O(n log n log log n)

Communication cost Suppose each FFT done independently Hong-Kung FFT lower bound (met by algorithm of Frigo): O(n log n/log M) Stop recursion when subproblem fits in cache (Q(n) = n for n < αM) Q(n) = 𝑛 Q(2 𝑛 ) + O(n log n/log M) Q(n) = O(n (log n/log M) log(log n/log M))