On the Performance of Parametric Polymorphism in Maple Laurentiu DraganStephen M. Watt Ontario Research Centre for Computer Algebra University of Western Ontario Maple Conference 2006

Outline
- Parametric Polymorphism
- SciMark
- SciGMark
- A Maple Version of SciGMark
- Results
- Conclusions

Parametric Polymorphism
- Type polymorphism allows a single definition of a function to be used with different types of data.
- Parametric polymorphism is a form of polymorphism where the code does not use any specific type information; instances are obtained by supplying type parameters.
- Increasing popularity: C++, C#, Java.
- Improves code reusability and reliability.
- Generic libraries: STL, Boost, NTL, LinBox, Sum-IT (Aldor).
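To make the idea concrete before the Maple examples later in the talk, here is a minimal Python analog (illustrative only; all names are hypothetical): one algorithm definition, parameterized by the operations of its element type, reused at two different types.

```python
import operator

# A generic "sum of squares": the algorithm names no concrete type,
# only the operations (add, mul) and identity (zero) it needs.
def sum_of_squares(xs, add, mul, zero):
    acc = zero
    for x in xs:
        acc = add(acc, mul(x, x))
    return acc

# The same definition instantiated at integers and at floats:
ints = sum_of_squares([1, 2, 3], operator.add, operator.mul, 0)
floats = sum_of_squares([1.5, 2.5], operator.add, operator.mul, 0.0)
```

The point is that `sum_of_squares` is written once and specialized per call; the talk's Maple version achieves the same effect with module parameters.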

SciMark
- From the National Institute of Standards and Technology.
- Consists of five kernels:
1. Fast Fourier transform
   - One-dimensional transform of 1024 complex numbers.
   - Each complex number occupies 2 consecutive entries in the array.
   - Exercises complex arithmetic, non-constant memory references and trigonometric functions.

SciMark
2. Jacobi successive over-relaxation
   - 100 × 100 grid, represented by a two-dimensional array.
   - Exercises basic "grid averaging": each A(i, j) is assigned the average weighting of its four nearest neighbors.
3. Monte Carlo
   - Approximates the value of π by computing the area of the quarter unit circle.
   - Random points inside the unit square; compute the ratio of those within the circle.
   - Exercises random-number generators and function inlining.
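The Monte Carlo kernel's idea fits in a few lines; this is a Python sketch of the method, not the SciMark source: sample points in the unit square, count those inside the quarter circle, and multiply the fraction by 4.

```python
import random

def monte_carlo_pi(num_samples, seed=113):
    """Estimate pi by sampling the unit square.
    The fraction of points with x^2 + y^2 <= 1 approximates pi/4."""
    rng = random.Random(seed)
    under_curve = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:   # point lies inside the quarter circle
            under_curve += 1
    return 4.0 * under_curve / num_samples

est = monte_carlo_pi(100_000)
```

With 100,000 samples the estimate typically lands within a few hundredths of π, which is why the kernel stresses the random-number generator rather than the arithmetic.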

SciMark
4. Sparse matrix multiplication
   - Uses an unstructured sparse matrix representation stored in compressed-row format.
   - Exercises indirect addressing and non-regular memory references.
5. Dense LU factorization
   - LU factorization of a dense 100 × 100 matrix using partial pivoting.
   - Exercises dense matrix operations.
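Compressed-row storage keeps only the nonzeros plus two index arrays; the multiply below is a Python sketch (hypothetical helper, not the SciMark code) of the access pattern the kernel exercises: every read of `x` goes through the indirection `col[k]`.

```python
def crs_matvec(val, col, row_ptr, x):
    """y = A*x for A in compressed-row format:
    val[k] holds the nonzeros, col[k] their column indices, and
    row_ptr[i]:row_ptr[i+1] spans the entries of row i."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += val[k] * x[col[k]]   # indirect, non-regular read
    return y

# 2x2 example: A = [[1, 2], [0, 3]], x = [1, 1]
y = crs_matvec([1.0, 2.0, 3.0], [0, 1, 1], [0, 2, 3], [1.0, 1.0])
```

Because `col` is unstructured, the reads of `x` jump around memory, which is exactly the behavior the benchmark is designed to measure.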

SciMark
- The kernels are repeated until the time spent in each kernel exceeds a certain threshold (2 seconds in our case).
- After the threshold is reached, the kernel is run once more and timed.
- The time is divided by the number of floating-point operations.
- The result is reported in MFlops (millions of floating-point operations per second).

SciMark
- There are two data sets for the tests: large and small.
- Small uses small data sets to reduce the effect of cache misses; large is the opposite.
- For our Maple tests we used only the small data set.

SciGMark
- Generic version of SciMark (SYNASC 2005).
- Measures the difference in performance between generic and specialized code.
- Kernels rewritten to operate over a generic numerical type supporting basic arithmetic operations (+, -, ×, /, zero, one).
- The current version implements a wrapper for numbers using double-precision floating-point representation.

Parametric Polymorphism in Maple
- Module-producing functions: functions that take one or more modules as arguments and produce modules as their result.
- The resulting modules use operations from the parameter modules to provide abstract algorithms in a generic form.

Example

MyGenericType := proc(R)
    module ()
        export f, g;
        # Here f and g can use u and v from R
        f := proc(a, b) foo(R:-u(a), R:-v(b)) end;
        g := proc(a, b) goo(R:-u(a), R:-v(b)) end;
    end module:
end proc:
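For readers unfamiliar with Maple modules, the same pattern can be sketched in Python (all names hypothetical, standing in for `foo`, `goo`, `u`, `v` above): a function receives a "module" of operations and returns a new one whose exports close over it.

```python
from types import SimpleNamespace

def my_generic_type(R):
    """Python analog of a module-producing function:
    R supplies u and v; the result exposes f and g built from them."""
    def f(a, b):
        return R.u(a) + R.v(b)   # stands in for foo(R:-u(a), R:-v(b))
    def g(a, b):
        return R.u(a) * R.v(b)   # stands in for goo(R:-u(a), R:-v(b))
    return SimpleNamespace(f=f, g=g)

# Instantiate with a concrete parameter "module":
R = SimpleNamespace(u=lambda a: a + 1, v=lambda b: 2 * b)
M = my_generic_type(R)
r1, r2 = M.f(1, 2), M.g(1, 2)
```

Each call to `my_generic_type` yields a fresh instance specialized to its parameter, just as `MyGenericType(R)` does in Maple.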

Approaches
- Object-oriented
  - Data and operations together; a module for each value.
  - Closer to the original SciGMark implementation.
- Abstract data type
  - Each value is some data object; operations are implemented separately in a generic module.
  - The same module is shared by all the values belonging to each type.

Object-Oriented Approach

DoubleRing := proc(val::float)
    local Me;
    Me := module()
        export v, a, s, m, d, gt, zero, one, coerce,
               absolute, sine, sqroot;
        v := val;  # Data value of object
        # Implementations for +, -, *, /, >, etc.
        a := (b) -> DoubleRing(Me:-v + b:-v);
        s := (b) -> DoubleRing(Me:-v - b:-v);
        m := (b) -> DoubleRing(Me:-v * b:-v);
        d := (b) -> DoubleRing(Me:-v / b:-v);
        gt := (b) -> Me:-v > b:-v;
        zero := () -> DoubleRing(0.0);
        coerce := () -> Me:-v;
        ...
    end module:
    return Me;
end proc:

Object-Oriented Approach
- The previous example simulates an object-oriented approach by storing the value in the module.
- The exports a, s, m, d correspond to the basic arithmetic operations.
- We chose names other than the standard +, -, ×, / for two reasons:
  - The code looks similar to the original SciGMark (Java does not have operator overloading).
  - It is not very easy to overload operators in Maple.
- Functions like sine and sqroot are used by the FFT algorithm to replace complex operations.
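In Python terms the object-oriented model looks like the sketch below (an analog, not the Maple source): every value is boxed in an object carrying its own methods, and every arithmetic operation allocates a new box, which is the source of the overhead measured later.

```python
class DoubleRing:
    """Each value is boxed; every operation allocates a fresh box."""
    def __init__(self, v):
        self.v = v
    def a(self, b):                 # addition
        return DoubleRing(self.v + b.v)
    def m(self, b):                 # multiplication
        return DoubleRing(self.v * b.v)
    def coerce(self):               # unbox back to a plain float
        return self.v

x, y = DoubleRing(3.0), DoubleRing(4.0)
r = x.m(x).a(y.m(y)).coerce()       # computes x*x + y*y via method calls
```

Four objects are created just to evaluate x*x + y*y once; done inside a benchmark loop, that allocation dominates the arithmetic.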

Abstract Data Type Approach

DoubleRing := module()
    export a, s, m, d, gt, zero, one, coerce,
           absolute, sine, sqroot;
    # Implementations for +, -, *, /, >, etc.
    a := (a, b) -> a + b;
    s := (a, b) -> a - b;
    m := (a, b) -> a * b;
    d := (a, b) -> a / b;
    gt := (a, b) -> a > b;
    zero := () -> 0.0;
    one := () -> 1.0;
    coerce := (a::float) -> a;
    absolute := (a) -> abs(a);
    sine := (a) -> sin(a);
    sqroot := (a) -> sqrt(a);
end module:

Abstract Data Type Approach
- The module does not store data; it provides only the operations.
- By convention, one must coerce the float type to the representation used by this module.
- In this case the representation is exactly float.
- The DoubleRing module is created only once for each kernel.
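The contrast with the object-oriented model can be sketched in Python (again an analog, not the Maple source): values stay as plain unboxed floats, and all calls route through one shared table of operations.

```python
from types import SimpleNamespace

# One shared operations module; no per-value allocation.
double_ring = SimpleNamespace(
    a=lambda x, y: x + y,     # addition
    m=lambda x, y: x * y,     # multiplication
    coerce=lambda x: x,       # representation is already float
)

R = double_ring
r = R.coerce(R.a(R.m(3.0, 3.0), R.m(4.0, 4.0)))   # x*x + y*y
```

The cost per operation is one indirect call, with no object construction, which is why this model stays much closer to the specialized code in the results.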

Kernels
- Each SciGMark kernel exports an implementation of its algorithm and a function to compute the estimated number of floating-point operations.
- Each kernel is parameterized by a module R that abstracts the numerical type.

Kernel Structure

gFFT := proc(R)
    module()
        export num_flops, transform, inverse;
        local transform_internal, bitreverse;
        num_flops := ...;
        transform := ...;
        inverse := ...;
        transform_internal := ...;
        bitreverse := ...;
    end module:
end proc:

Kernels
- The high-level structure is the same for the object-oriented and abstract data type versions.
- The implementation inside the functions is different:

Model              | Code
Specialized        | x*x + y*y
Object-oriented    | (x:-m(x):-a(y:-m(y))):-coerce()
Abstract data type | R:-coerce(R:-a(R:-m(x,x), R:-m(y,y)))

Kernel Sample (Abstract Data)

GenMonteCarlo := proc(DR::`module`)
    local m;
    m := module ()
        export num_flops, integrate;
        local SEED;
        SEED := 113;
        num_flops := (Num_samples) -> Num_samples * 4.0;
        integrate := proc (numSamples)
            local R, under_curve, count, x, y, nsm1;
            R := Random(SEED);
            under_curve := 0;
            nsm1 := numSamples - 1;
            for count from 0 to nsm1 do
                x := DR:-coerce(R:-nextDouble());
                y := DR:-coerce(R:-nextDouble());
                if DR:-coerce(DR:-a(DR:-m(x,x), DR:-m(y,y))) <= 1.0 then
                    under_curve := under_curve + 1;
                end if;
            end do;
            return (under_curve / numSamples) * 4.0;
        end proc;
    end module:
    return m;
end proc:

Kernel Sample (Object-Oriented)

GenMonteCarlo := proc(r::`procedure`)
    local m;
    m := module ()
        export num_flops, integrate;
        local SEED;
        SEED := 113;
        num_flops := (Num_samples) -> Num_samples * 4.0;
        integrate := proc (numSamples)
            local R, under_curve, count, x, y, nsm1;
            R := Random(SEED);
            under_curve := 0;
            nsm1 := numSamples - 1;
            for count from 0 to nsm1 do
                x := r(R:-nextDouble());
                y := r(R:-nextDouble());
                if (x:-m(x):-a(y:-m(y))):-coerce() <= 1.0 then
                    under_curve := under_curve + 1;
                end if;
            end do;
            return (under_curve / numSamples) * 4.0;
        end proc;
    end module:
    return m;
end proc:

Kernel Sample (Contd.)

measureMonteCarlo := proc(min_time, R)
    local Q, cycles;
    Q := Stopwatch();
    cycles := 1;
    while true do
        Q:-strt();
        GenMonteCarlo(DoubleRing):-integrate(cycles);
        Q:-stp();
        if Q:-rd() >= min_time then break; end if;
        cycles := cycles * 2;
    end do;
    return GenMonteCarlo(DoubleRing):-num_flops(cycles) / Q:-rd() * 1.0e-6;
end proc;

Results (MFlops)

Test                         | Specialized | Abstract Data Type | Object-Oriented
Fast Fourier Transform       |             |                    |
Successive Over-Relaxation   |             |                    |
Monte Carlo                  |             |                    |
Sparse Matrix Multiplication |             |                    |
LU Factorization             |             |                    |
Composite Ratio              | 100%        | 74%                | 10%

Note: larger means faster.

Results
- The abstract data type version is very close in performance to the specialized version: about 75% as fast.
- The object-oriented model simulates the original SciGMark closely, but it produces many modules, and this leads to a significant overhead: only about 10% as fast.
- It is useful to separate the instance-specific data from the shared methods module: values are formed as composite objects from the instance data and the shared methods module.

Conclusions
- The performance penalty should not discourage writing generic code:
  - It provides code reusability that can simplify libraries.
  - Writing generic programs in a mathematical context helps programmers operate at a higher level of abstraction.
- Generic code optimization is possible; we proposed an approach that specializes the generic type according to the instances of the type parameters.
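The specialization idea can be pictured as partial evaluation; the Python sketch below illustrates the principle only (it is not the authors' optimizer): once the type parameter is known, the abstract operation calls can be replaced by direct built-in operations.

```python
from types import SimpleNamespace

def generic_dot(R, xs, ys):
    """Generic inner product: every step goes through R's operations."""
    acc = R.zero
    for x, y in zip(xs, ys):
        acc = R.a(acc, R.m(x, y))
    return acc

def specialized_dot(xs, ys):
    """What a specializer could emit for R = floats:
    the indirect calls are replaced by built-in arithmetic."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc += x * y
    return acc

R = SimpleNamespace(zero=0.0, a=lambda p, q: p + q, m=lambda p, q: p * q)
g = generic_dot(R, [1.0, 2.0], [3.0, 4.0])
s = specialized_dot([1.0, 2.0], [3.0, 4.0])
```

Both compute the same result; the specialized form simply pays no per-operation call overhead.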

Conclusions (Contd.)
- Parametric polymorphism does not introduce an excessive performance penalty.
  - This is possible because of the interpreted nature of Maple: not many optimizations are performed on the specialized code (even specialized code uses many function calls).
- Object-oriented use of modules is not well supported in Maple; simulating subclassing polymorphism in Maple is very expensive and should be avoided.
- Better support for overloading would help programmers write more generic code in Maple.
- More info about SciGMark at:

Acknowledgments
- ORCCA members
- MapleSoft