What is the WHT anyway, and why are there so many ways to compute it?


What is the WHT anyway, and why are there so many ways to compute it?
Jeremy Johnson
1, 2, 6, 24, 112, 568, 3032, 16768, …

Walsh-Hadamard Transform

y = WHT_N · x,  N = 2^n

WHT Algorithms

Factor WHT_N into a product of sparse structured matrices and compute

y = (M_1 M_2 … M_t) x

by applying the factors right to left:

y_t = M_t x
…
y_2 = M_2 y_3
y = M_1 y_2

Factoring the WHT Matrix

(AC) ⊗ (BD) = (A ⊗ B)(C ⊗ D)
A ⊗ B = (A ⊗ I)(I ⊗ B)
A ⊗ (B ⊗ C) = (A ⊗ B) ⊗ C
I_m ⊗ I_n = I_mn

WHT_2 ⊗ WHT_2 = (WHT_2 ⊗ I_2)(I_2 ⊗ WHT_2)
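These Kronecker-product identities can be checked numerically. The following NumPy sketch (an illustration, not part of the WHT package) verifies the mixed-product rule and the factorization of WHT_2 ⊗ WHT_2:

```python
import numpy as np

# WHT_2: the 2x2 Walsh-Hadamard matrix
W2 = np.array([[1.0, 1.0], [1.0, -1.0]])
I2 = np.eye(2)

# A (x) B = (A (x) I)(I (x) B), applied to WHT_2 (x) WHT_2
lhs = np.kron(W2, W2)
rhs = np.kron(W2, I2) @ np.kron(I2, W2)
assert np.allclose(lhs, rhs)

# (AC) (x) (BD) = (A (x) B)(C (x) D), checked on random 2x2 matrices
rng = np.random.default_rng(0)
A, B, C, D = (rng.standard_normal((2, 2)) for _ in range(4))
assert np.allclose(np.kron(A @ C, B @ D), np.kron(A, B) @ np.kron(C, D))
```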

Recursive and Iterative Factorization

WHT_8 = (WHT_2 ⊗ I_4)(I_2 ⊗ WHT_4)
      = (WHT_2 ⊗ I_4)(I_2 ⊗ ((WHT_2 ⊗ I_2)(I_2 ⊗ WHT_2)))
      = (WHT_2 ⊗ I_4)(I_2 ⊗ (WHT_2 ⊗ I_2))(I_2 ⊗ (I_2 ⊗ WHT_2))
      = (WHT_2 ⊗ I_4)(I_2 ⊗ (WHT_2 ⊗ I_2))((I_2 ⊗ I_2) ⊗ WHT_2)
      = (WHT_2 ⊗ I_4)(I_2 ⊗ WHT_2 ⊗ I_2)((I_2 ⊗ I_2) ⊗ WHT_2)
      = (WHT_2 ⊗ I_4)(I_2 ⊗ WHT_2 ⊗ I_2)(I_4 ⊗ WHT_2)

Recursive Algorithm

WHT_8 = (WHT_2 ⊗ I_4)(I_2 ⊗ (WHT_2 ⊗ I_2)(I_2 ⊗ WHT_2))
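The binary split WHT_N = (WHT_2 ⊗ I_{N/2})(I_2 ⊗ WHT_{N/2}) can be sketched recursively in Python (an illustration; the package itself is implemented in C):

```python
import numpy as np

def wht_recursive(x):
    """Apply WHT_N (N a power of 2) via the binary split
    WHT_N = (WHT_2 (x) I_{N/2}) (I_2 (x) WHT_{N/2})."""
    N = len(x)
    if N == 1:
        return x.copy()
    half = N // 2
    # I_2 (x) WHT_{N/2}: transform each half independently
    top = wht_recursive(x[:half])
    bot = wht_recursive(x[half:])
    # WHT_2 (x) I_{N/2}: butterfly across the halves
    return np.concatenate([top + bot, top - bot])

print(wht_recursive(np.arange(8.0)).tolist())
# [28.0, -4.0, -8.0, 0.0, -16.0, 0.0, 0.0, 0.0]
```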

Iterative Algorithm

WHT_8 = (WHT_2 ⊗ I_4)(I_2 ⊗ WHT_2 ⊗ I_2)(I_4 ⊗ WHT_2)

WHT Algorithms: Recursive, Iterative, General

WHT Implementation

Definition/formula: N = N_1 N_2 ··· N_t, with N_i = 2^{n_i}

WHT_N = ∏_{i=1}^{t} ( I_{2^{n_1 + ··· + n_{i-1}}} ⊗ WHT_{2^{n_i}} ⊗ I_{2^{n_{i+1} + ··· + n_t}} )

Notation: x(b; s) denotes the length-M subvector (x(b), x(b+s), …, x(b+(M-1)s)).

Implementation (nested loop):

R = N; S = 1;
for i = t, …, 1
    R = R / N_i
    for j = 0, …, R-1
        for k = 0, …, S-1
            x(jN_iS + k; S) = WHT_{N_i} · x(jN_iS + k; S)
    S = S * N_i
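A direct Python transcription of this nested loop (a sketch for clarity; the package itself generates C code) might look like:

```python
import numpy as np

def hadamard(N):
    """Dense WHT_N built by repeated Kronecker products (for the leaves)."""
    H = np.array([[1.0]])
    while H.shape[0] < N:
        H = np.kron(np.array([[1.0, 1.0], [1.0, -1.0]]), H)
    return H

def wht_iterative(x, ns):
    """Apply WHT_N to x, where N = 2^(n_1 + ... + n_t) and ns = [n_1, ..., n_t].
    Factor i updates the strided subvectors x(j*Ni*S + k; S) of length Ni."""
    x = np.asarray(x, dtype=float).copy()
    R, S = len(x), 1
    for Ni in [2 ** n for n in reversed(ns)]:   # i = t, ..., 1
        R //= Ni
        H = hadamard(Ni)
        for j in range(R):
            for k in range(S):
                base = j * Ni * S + k
                x[base : base + Ni * S : S] = H @ x[base : base + Ni * S : S]
        S *= Ni
    return x

print(wht_iterative(np.arange(8.0), [1, 1, 1]).tolist())
# [28.0, -4.0, -8.0, 0.0, -16.0, 0.0, 0.0, 0.0]
```

Any split of the exponent gives the same transform; for example `wht_iterative(x, [1, 2])` and `wht_iterative(x, [3])` agree with the dense product.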

Partition Trees

[Figure: example partition trees, e.g. for n = 4: left-recursive, right-recursive, balanced, and iterative trees.]

Ordered Partitions

There is a 1-1 mapping from ordered partitions of n onto (n-1)-bit binary numbers, so there are 2^{n-1} ordered partitions of n.

Example: 162 = 10100010 in binary. Reading the bits as cuts in a row of nine 1's, 1|1 1|1 1 1 1|1 1 gives the partition 1 + 2 + 4 + 2 = 9.
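The bijection can be sketched in Python (the function name and bit-reading convention are illustrative):

```python
def partition_from_bits(n, bits):
    """Decode an (n-1)-bit number into an ordered partition of n:
    bit i = 1 inserts a cut after position i in a row of n ones."""
    parts, run = [], 1
    for i in range(n - 1):
        if (bits >> (n - 2 - i)) & 1:   # read bits left to right
            parts.append(run)
            run = 1
        else:
            run += 1
    parts.append(run)
    return parts

# The slide's example: 162 = 10100010 encodes a partition of 9
print(partition_from_bits(9, 162))   # [1, 2, 4, 2]
# All 2^(n-1) bit patterns give all 2^(n-1) ordered partitions
assert len({tuple(partition_from_bits(5, b)) for b in range(16)}) == 16
```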

Enumerating Partition Trees

[Figure: the partition trees for n = 3, enumerated by the 2-bit labels 00, 01, 10, 11.]

Search Space

Optimization of the WHT becomes a search, over the space of partition trees, for the fastest algorithm. The number of trees T_n grows rapidly: 1, 2, 6, 24, 112, 568, 3032, 16768, …

Size of Search Space

Let T(z) be the generating function for T_n. Then

T_n = Θ(α^n / n^{3/2}), where α = 4 + √8 ≈ 6.8284.

Restricting to binary trees,

T_n = Θ(5^n / n^{3/2}).
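The exact counts T_n follow from the recurrence T_1 = 1 and T_n = 1 + Σ ∏ T_{n_i}, summing over ordered partitions n = n_1 + ··· + n_t with t ≥ 2 (the leading 1 is the leaf small[n]). A short sketch reproducing the sequence from the title:

```python
def count_wht_trees(N):
    """T_n: number of WHT partition trees of size n.
    T_1 = 1; T_n = 1 (the leaf small[n]) plus one tree per ordered
    partition n = n_1 + ... + n_t (t >= 2) and choice of subtrees."""
    T = {1: 1}
    for n in range(2, N + 1):
        # W[m] = number of sequences of trees with sizes <= n-1 summing to m;
        # a sequence summing to n then necessarily has length >= 2.
        W = [1] + [0] * n
        for m in range(1, n + 1):
            W[m] = sum(T[k] * W[m - k] for k in range(1, min(m, n - 1) + 1))
        T[n] = 1 + W[n]
    return [T[n] for n in range(1, N + 1)]

print(count_wht_trees(8))   # [1, 2, 6, 24, 112, 568, 3032, 16768]
```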

WHT Package (Püschel & Johnson, ICASSP ’00)

Allows easy implementation of any of the possible WHT algorithms.

Partition tree representation: W(n) = small[n] | split[W(n_1), …, W(n_t)]

Tools:
- Measure runtime of any algorithm
- Measure hardware events
- Search for a good implementation: dynamic programming, evolutionary algorithm

Histogram (n = 16, 10,000 samples)

- Wide range in performance despite an equal number of arithmetic operations (n2^n flops)
- The Pentium III consumes more run time (more pipeline stages)
- The UltraSPARC II spans a larger range

Operation Count

Theorem. Let W_N be any WHT algorithm of size N = 2^n. Then the number of floating point operations (flops) used by W_N is N·lg(N).

Proof. By induction: a child transform of size 2^{n_i} uses n_i·2^{n_i} flops and is applied N/2^{n_i} times, so a split n = n_1 + ··· + n_t uses Σ_i (N/2^{n_i})·n_i·2^{n_i} = N(n_1 + ··· + n_t) = N·lg(N) flops.
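The induction can be checked mechanically. The sketch below (the nested-list tree encoding is illustrative, not the package's notation) computes the flop count of any partition tree and confirms it depends only on the total size:

```python
def flops(tree):
    """Flop count of a WHT partition tree given as a nested list:
    a leaf n means small[n] (n * 2^n flops); an internal node
    [t1, ..., tt] applies each child of size 2^{n_i} exactly
    N / 2^{n_i} times."""
    def size(t):
        return t if isinstance(t, int) else sum(size(c) for c in t)
    n = size(tree)
    N = 2 ** n
    if isinstance(tree, int):
        return N * n
    return sum((N // 2 ** size(child)) * flops(child) for child in tree)

# Every tree over n = 3 (N = 8) costs 8 * 3 = 24 flops
for tree in [3, [1, 2], [2, 1], [1, 1, 1], [1, [1, 1]], [[1, 1], 1]]:
    assert flops(tree) == 24
```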

Instruction Count Model

A(n) = number of calls to the WHT procedure
α = number of instructions outside the loops
A_l(n) = number of calls to the base case of size l
α_l = number of instructions in the base case of size l
L_i = number of iterations of the outer (i=1), middle (i=2), and inner (i=3) loop
β_i = number of instructions in the outer (i=1), middle (i=2), and inner (i=3) loop body

Small[1]

.file "s_1.c"
.version "01.01"
gcc2_compiled.:
.text
.align 4
.globl apply_small1
.type apply_small1,@function
apply_small1:
movl 8(%esp),%edx      //load stride S into EDX
movl 12(%esp),%eax     //load x array's base address into EAX
fldl (%eax)            //st(0)=x[0]
fldl (%eax,%edx,8)     //st(0)=x[S], st(1)=x[0]
fld %st(1)             //st(0)=x[0]
fadd %st(1),%st        //st(0)=x[0]+x[S]
fxch %st(2)            //st(0)=x[0], st(2)=x[0]+x[S]
fsubp %st,%st(1)       //st(0)=x[0]-x[S]
fxch %st(1)            //st(0)=x[0]+x[S], st(1)=x[0]-x[S]
fstpl (%eax)           //store x[0]=x[0]+x[S]
fstpl (%eax,%edx,8)    //store x[S]=x[0]-x[S]
ret

Recurrences


Histogram using Instruction Model (P3)

α_1 = 12, α_2 = 34, α_3 = 106 (base cases of size 2^1, 2^2, 2^3)
α = 27
β_1 = 18, β_2 = 18, β_3 = 20

Algorithm Comparison

Dynamic Programming

T_n = min over n_1 + ··· + n_t = n of Cost(split[T_{n_1}, …, T_{n_t}]),

where T_n is the optimal tree of size n. This depends on the assumption that Cost depends only on the size of a tree and not on where it is located (true for instruction count, but false for runtime). For instruction count, the optimal tree is iterative with appropriate leaves. For runtime, DP is a good heuristic (used with binary trees).
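The DP over binary trees can be sketched as follows. Here `leaf_cost`, `MAX_LEAF`, and `SPLIT_OVERHEAD` are made-up stand-ins for measured costs, not numbers from the WHT package; the point is the recursion shape, which reuses the optimal subtree of each size.

```python
from functools import lru_cache

MAX_LEAF = 3        # hypothetical largest unrolled base case
SPLIT_OVERHEAD = 20 # hypothetical per-split control cost

def leaf_cost(n):
    """Hypothetical cost of small[n]."""
    return (2 ** n) * n + 10

@lru_cache(maxsize=None)
def best(n):
    """Return (cost, tree) for the optimal binary partition tree of size n,
    assuming Cost depends only on subtree size."""
    candidates = []
    if n <= MAX_LEAF:
        candidates.append((leaf_cost(n), n))
    for n1 in range(1, n):          # binary splits n = n1 + n2
        n2 = n - n1
        c1, t1 = best(n1)
        c2, t2 = best(n2)
        N = 2 ** n
        # each child of size 2^{ni} is applied N / 2^{ni} times
        cost = (N // 2 ** n1) * c1 + (N // 2 ** n2) * c2 + SPLIT_OVERHEAD
        candidates.append((cost, [t1, t2]))
    return min(candidates, key=lambda c: c[0])

cost, tree = best(16)
print(cost, tree)
```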

Optimal Formulas

Pentium:
[1], [2], [3], [4], [5], [6], [7]
[[4],[4]], [[5],[4]], [[5],[5]], [[5],[6]]
[[2],[[5],[5]]], [[2],[[5],[6]]]
[[2],[[2],[[5],[5]]]], [[2],[[2],[[5],[6]]]]
[[2],[[2],[[2],[[5],[5]]]]], [[2],[[2],[[2],[[5],[6]]]]]
[[2],[[2],[[2],[[2],[[5],[5]]]]]], [[2],[[2],[[2],[[2],[[5],[6]]]]]]
[[2],[[2],[[2],[[2],[[2],[[5],[5]]]]]]]

UltraSPARC:
[1], [2], [3], [4], [5], [6]
[[3],[4]], [[4],[4]], [[4],[5]], [[5],[5]], [[5],[6]]
[[4],[[4],[4]]], [[[4],[5]],[4]], [[4],[[5],[5]]], [[[5],[5]],[5]], [[[5],[5]],[6]]
[[4],[[[4],[5]],[4]]], [[4],[[4],[[5],[5]]]], [[4],[[[5],[5]],[5]]], [[5],[[[5],[5]],[5]]]

Different Strides

The dynamic programming assumption does not hold exactly: execution time depends on the stride at which a subtransform accesses its data, not just on its size.