UBlas: Boost High Performance Vector and Matrix Classes Juan José Gómez Cadenas University of Geneve and University of Valencia (thanks to: Joerg Walter,

Slides:



Advertisements
Similar presentations
Brown Bag #2 Advanced C++. Topics  Templates  Standard Template Library (STL)  Pointers and Smart Pointers  Exceptions  Lambda Expressions  Tips.
Advertisements

CpSc 3220 File and Database Processing Lecture 17 Indexed Files.
The University of Adelaide, School of Computer Science
Rational Apex 4.0 Optimization “Beware the benchmark!”
1cs542g-term Notes  Assignment 1 will be out later today (look on the web)
1cs542g-term Notes  Assignment 1 is out (questions?)
Stanford University CS243 Winter 2006 Wei Li 1 Loop Transformations and Locality.
Chapter 3 Program translation1 Chapt. 3 Language Translation Syntax and Semantics Translation phases Formal translation models.
U NIVERSITY OF M ASSACHUSETTS, A MHERST D EPARTMENT OF C OMPUTER S CIENCE Emery Berger University of Massachusetts, Amherst Advanced Compilers CMPSCI 710.
C++ Training Datascope Lawrence D’Antonio Lecture 1 Quiz 1.
Computer Science and Software Engineering University of Wisconsin - Platteville 7. Inheritance and Polymorphism Yan Shi CS/SE 2630 Lecture Notes.
Chapter 1 Algorithm Analysis
Parallel Programming in Java with Shared Memory Directives.
5.3 Machine-Independent Compiler Features
LECTURE LECTURE 17 More on Templates 20 An abstract recipe for producing concrete code.
Spring 2010 Advanced Programming Section 1-STL Computer Engineering Department Faculty of Engineering Cairo University Advanced Programming Spring 2010.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Taken from slides of Starting Out with C++ Early Objects Seventh Edition.
Prof. amr Goneid, AUC1 CSCE 110 PROGRAMMING FUNDAMENTALS WITH C++ Prof. Amr Goneid AUC Part 10. Pointers & Dynamic Data Structures.
SPL: A Language and Compiler for DSP Algorithms Jianxin Xiong 1, Jeremy Johnson 2 Robert Johnson 3, David Padua 1 1 Computer Science, University of Illinois.
CSE 425: Data Types II Survey of Common Types I Records –E.g., structs in C++ –If elements are named, a record is projected into its fields (e.g., via.
CS 11 C++ track: lecture 7 Today: Templates!. Templates: motivation (1) Lots of code is generic over some type Container data types: List of integers,
Feb 20, 2002Soumya Mohanty, AEI1 GEO++ A general purpose C++ DSP library Plan of presentation Soumya Mohanty: Overview R. Balasubramanian: MPI Shell, Frame.
Compiler Support for Profiling C++ Template Metaprograms József Mihalicza, Norbert Pataki, Zoltán Porkoláb Eötvös Loránd University Faculty of Informatics.
C++ Template Metaprogramming Why, When and How? Zoltán Porkoláb Dept. of Programming Languages and Compilers, Faculty of Informatics Eötvös.
CSE 332: C++ Type Programming: Associated Types, Typedefs and Traits A General Look at Type Programming in C++ Associated types (the idea) –Let you associate.
Generative Programming. Automated Assembly Lines.
Programming Language Support for Generic Libraries Jeremy Siek and Walid Taha Abstract The generic programming methodology is revolutionizing the way we.
Performance Analysis Of Generics In Scientific Computing Laurentiu Dragan Stephen M. Watt Ontario Research Centre for Computer Algebra University of Western.
Copyright © Curt Hill Generic Classes Template Classes or Container Classes.
1. The term STL stands for ? a) Simple Template Library b) Static Template Library c) Single Type Based Library d) Standard Template Library Answer : d.
Can’t provide fast insertion/removal and fast lookup at the same time Vectors, Linked Lists, Stack, Queues, Deques 4 Data Structures - CSCI 102 Copyright.
C arrays are limited: -they are represented by pointers (which may or may not be valid); -Indexes not checked (which means you can overrun your array);
Standard Template Library The Standard Template Library was recently added to standard C++. –The STL contains generic template classes. –The STL permits.
Operator Overloading: indexing Useful to create range-checked structures: class four_vect { double stor[4]; // private member, actual contents of vector.
Generating Truly Optimal Code Using a Metaprogramming Library Don Clugston First D Programming Conference, 24 August 2007.
Linear Algebra Libraries: BLAS, LAPACK, ScaLAPACK, PLASMA, MAGMA
FORTRAN History. FORTRAN - Interesting Facts n FORTRAN is the oldest Language actively in use today. n FORTRAN is still used for new software development.
Computer Graphics 3 Lecture 1: Introduction to C/C++ Programming Benjamin Mora 1 University of Wales Swansea Pr. Min Chen Dr. Benjamin Mora.
More Array Access Examples Here is an example showing array access logic: const int MAXSTUDENTS = 100; int Test[MAXSTUDENTS]; int numStudents = 0;... //
 2008 Pearson Education, Inc. All rights reserved. 1 Arrays and Vectors.
Generic Compressed Matrix Insertion P ETER G OTTSCHLING – S MART S OFT /TUD D AG L INDBO – K UNGLIGA T EKNISKA H ÖGSKOLAN SmartSoft – TU Dresden
STL CSSE 250 Susan Reeder. What is the STL? Standard Template Library Standard C++ Library is an extensible framework which contains components for Language.
Chapter 3 Templates. Objective In Chapter 3, we will discuss: The concept of a template Function templates Class templates vector and matrix classes Fancy.
1 Chapter 1 C++ Templates Readings: Sections 1.6 and 1.7.
CIS 595 MATLAB First Impressions. MATLAB This introduction will give Some basic ideas Main advantages and drawbacks compared to other languages.
Parallel Programming & Cluster Computing Linear Algebra Henry Neeman, University of Oklahoma Paul Gray, University of Northern Iowa SC08 Education Program’s.
ECE 750 Topic 8 Meta-programming languages, systems, and applications Automatic Program Specialization for J ava – U. P. Schultz, J. L. Lawall, C. Consel.
DGrid: A Library of Large-Scale Distributed Spatial Data Structures Pieter Hooimeijer,
CMSC 202 Computer Science II for Majors. CMSC 202UMBC Topics Templates Linked Lists.
Prof. Amr Goneid, AUC1 CSCE 210 Data Structures and Algorithms Prof. Amr Goneid AUC Part R2. Elementary Data Structures.
Copyright © 2002 Pearson Education, Inc. Slide 1.
Copyright © 2002 Pearson Education, Inc. Slide 1.
5th MaCS Debrecen On the Turing-Completeness of Generative Metaprograms Zoltán Porkoláb, István Zólyomi Dept. of Programming.
Inheritance Modern object-oriented (OO) programming languages provide 3 capabilities: encapsulation inheritance polymorphism which can improve the design,
Lecture 04: Sequences structures
Indexing and hashing.
CS 215 Final Review Ismail abumuhfouz Fall 2014.
Sparse Matrix-Vector Multiplication (Sparsity, Bebop)
CS 213: Data Structures and Algorithms
CSCE 210 Data Structures and Algorithms
Introduction to MATLAB
Lab 03 - Iterator.
CMSC 202 Lesson 22 Templates I.
Constructors and destructors
(HPEC-LB) Outline Notes
Memory Efficient Radar Processing
<PETE> Shape Programmability
Simulation And Modeling
Templates I CMSC 202.
C++Builder Existing user? Migration and Performance David Millington C++ Product Manager, Embarcadero.
Presentation transcript:

uBlas: Boost High Performance Vector and Matrix Classes Juan José Gómez Cadenas University of Geneve and University of Valencia (thanks to: Joerg Walter, uBlas co-author. Todd Vedhuizem, ET co-inventor)

Vector and Matrix classes in C++ Use of C++ vector and matrix classes for scientific calculations typically results in poor performance w.r.t Fortran or C. This is due two several factors: Use of virtual functions (dynamic polymorphism) Temporaries

Polymorphism Standard tool in C++ Requires virtual functions that have big performance penalties Extra memory access Compiler cannot optimize around the virtual function call. It prevents desired features such as loop unrolling, etc. Virtual functions are acceptable if function is big or not called very often

Polymorphism (II) Unfortunately, in scientific code some of the most useful places for virtual functions are in inner loop bodies and involve small routines class HepGenMatrix {HepGenMatrix public: virtual ~HepGenMatrix() {} virtual int num_row() const = 0; virtual int num_col() const = 0; virtual const double & operator()(int row, int col) const =0; virtual double & operator()(int row, int col) =0 Virtual function dispatch to operator () results in poor performance

Static Polymorphism Replace dynamic polymorphism with static (i.e, compile time) polymorphism Use of expression templates Expression templates heavily depend on the famous Barton-Nackman trick, also coined 'curiously defined recursive templates'

Barton-Nachman trick template class class Matrix{ public: T_leaf& assign_leaf(){ return static_cast (*this);} double operator () (int i, int j){ //delegate to leaf return assign_leaf()(i,j) … class symmetric_matric : public Matrix

Static Polymorphism at Work The trick is that the base class takes a template parameter which is the type of the leaf class. This ensures that the complete type of an object is known at compile time. No need for virtual function dispatch Methods can be selectively specialized in the leaf classes (default in the base, overridden when necessary) Leaf classes can have methods which are unique to the leaf class

Temporaries When you write: Vector a(n), b(n), c(n); a = b + c + d; The compiler does the following: Vector* _t1 = new Vector(n); for(int i=0; i < n; i++) _t1(i) = b(i) + c(i); Vector* _t2 = new Vector(n); for(int i=0; i < n; i++) _t2(i) = _t1(i) + b(i);

Temporaries(II) for(int i=0; i < n; i++) a(i) = _t2(i) + _t1(i) ; delete _t2; delete _t1; So you have created and deleted two temporaries!

Performance Implications For small arrays (HEP case!) the overhead of new and delete result in very poor performance (about 1/10 of C) For large arrays the cost is in the temporaries. It depends on the operation. For example, they are expensive for + operation

Expression Templates Invented independently by Todd Veldhuizen and Daveed Vandevoorde The basic idea is to use operator overloading to build parse trees. Take advantage of the basic fact that a class can take itself as a template parameter

Example Array A,B,C,D; D=A+B+C; The expression A+B+C could be represented by a type such as: X > Consider: struct plus{} ; // addition class Array {}; // some array class

Example (cont) // X represents a node in a parse tree template class x{}; //The overloaded operator with does parsing for expressions of the // form A+B+C+D… Template X operator + (T, Array){ return x (); }

Example (cont) With the above code, A+B+C is parsed like this: Array A,B,C,D; D=A+B+C; X ()+ C; =X, plus, Array> ();

uBlas Consistent use of expression templates to eliminate virtual function calls and temporaries results in very high performance (for a C++ standalone library) Carefully designed (boost pair reviewed) interface. Maps Blas calls supports conventional dense, packed and basic sparse vector and matrix storage layouts Symmetric, hermitian, triangular matrices, etc Template type (T=int, float, double, complex…) STL like iterators Proxies (ranges, slices) to access views of vector and matrices

uBlas (ii) Extensive checking via consistent use of exceptions Very well documented Part of the boost library (i.e, reliable maintenance)

uBlas (III) Real High Performance libraries (like ATLAS) are using platform specific assembler kernels Toon Knapen and Kresimir Fresl are working on C++ bindings to such kernels, which already allow the interfacing of uBLAS with ATLAS

Comments on CLHEP matrix classes 10 years old already (i.e, a success!) But: Use of virtual functions Inefficient array indexing M[][] (temp objects) Temporaries problems “Messy” interface Linear algebra functions are often part of the class M.inverse()???

Conclusion uBlas: Modern C++, very professional, very well documented, part of boost. Fast “Blas compliant” Very clean interface Seems a very good candidate to replace current CLHEP vector and matrix classes