Memory Efficient Radar Processing

Slides:



Advertisements
Similar presentations
U Albany SUNY PETE code Review Xingmin Luo 6/12/2003.
Advertisements

INTRODUCTION TO PROGRAMMING STRUCTURE Chapter 4 1.
College of Nanoscale Science and Engineering A uniform algebraically-based approach to computational physics and efficient programming James E. Raynolds.
XYZ 11/16/2015 MIT Lincoln Laboratory AltiVec Extensions to the Portable Expression Template Engine (PETE)* Edward Rutledge HPEC September,
©Fraser Hutchinson & Cliff Green C++ Certificate Program C++ Intermediate Operator Overloading.
XP New Perspectives on XML, 2 nd Edition Tutorial 7 1 TUTORIAL 7 CREATING A COMPUTATIONAL STYLESHEET.
 2008 Pearson Education, Inc. All rights reserved. 1 Arrays and Vectors.
Lecture 5 functions 1 © by Pearson Education, Inc. All Rights Reserved.
Fall 2015CISC/CMPE320 - Prof. McLeod1 CISC/CMPE320 Assignment 1 due Friday, 7pm. RAD due next Friday. Presentations week 6. Today: –More details on functions,
U Albany SUNY 1 Outline Notes (not in presentation) Intro Overview of PETE (very high level) –What pete does. –Files – say what each file does Explain.
U Albany SUNY PETE code Review Xingmin Luo 6/12/2003.
University at Albany,SUNY lrm-1 lrm 6/28/2016 Levels of Processor/Memory Hierarchy Can be Modeled by Increasing Dimensionality of Data Array. –Additional.
Lesson 3 Functions. Lesson 3 Functions are declared with function. For example, to calculate the cube of a number function function name (parameters)
Arrays Chapter 7.
Arrays.
Pointers and Linked Lists
Introduction to Analysis of Algorithms
Chapter 13: Overloading and Templates
Pointers and Linked Lists
Programming with ANSI C ++
Standard Template Library
Chapter 19 Java Data Structures
Arrays in PHP are quite versatile
Chapter 14 Templates C++ How to Program, 8/e
Introduction to Custom Templates
5. Function (2) and Exercises
Chapter 15: Overloading and Templates
Student Book An Introduction
JavaScript: Functions.
CS212: Object Oriented Analysis and Design
Java Review: Reference Types
Data Structures Recursion CIS265/506: Chapter 06 - Recursion.
Starting Out with C++ Early Objects Eighth Edition
The dirty secrets of objects
Tuesday, February 20, 2018 Announcements… For Today… 4+ For Next Time…
C++ for Engineers and Scientists Second Edition
FIRST AND SECOND-ORDER TRANSIENT CIRCUITS
Dynamic Memory Allocation
C Passing arrays to a Function
Outline Notes (not in presentation)
Object - Oriented Programming Language
Building the Support for Radar Processing Across Memory Hierarchies:
7 Arrays.
The Relational Algebra and Relational Calculus
Chapter 6 Intermediate-Code Generation
Monolithic Compiler Experiments Using C++ Expression Templates*
Arrays We often want to organize objects or primitive data in a way that makes them easy to access and change. An array is simple but powerful way to.
Constructors and Other Tools
Object Oriented Programming in java
Monolithic Compiler Experiments Using C++ Expression Templates*
Building the Support for Radar Processing Across Memory Hierarchies:
Building the Support for Radar Processing Across Memory Hierarchies:
Building the Support for Radar Processing Across Memory Hierarchies:
(HPEC-LB) Outline Notes
Stacks.
Building the Support for Radar Processing Across Memory Hierarchies:
Building the Support for Radar Processing Across Memory Hierarchies:
Building the Support for Radar Processing Across Memory Hierarchies:
7 Arrays.
Overloading functions
<PETE> Shape Programmability
Outline Notes (not in presentation)
Object - Oriented Programming Language
VIRTUAL FUNCTIONS RITIKA SHARMA.
Submitted By : Veenu Saini Lecturer (IT)
Introduction to Programming
Managing 2D Arrays Simple Dynamic Allocation
Object - Oriented Programming Language
Building the Support for Radar Processing Across Memory Hierarchies:
SPL – PS2 C++ Memory Handling.
Presentation transcript:

Memory Efficient Radar Processing Paper: An Array Class with Shape and C++ Expression Templates by Lenore R. Mullin, Xingmin Luo and Lawrence A. Bush

Motivation Paper Summary The objective of the paper/ and more generally our project is to enable efficient / fast array computations. Applications - This is important to many scientific programming applications such as dsp computations. Strategies - Various strategies have been used in this pursuit, for example 1,2,3. Our strategy is different from these strategies, but is related to the expression template strategies used in MTL and PETE. Our Strategy Using PETE (portable expression template engine). PETE is a library that facilitates loop unrolling for computations. This greatly speeds it up because it removes most temporary arrays from the computation. Integrate Psi with PETE In our paper, we take steps toward this goal. In order to do this, we wrote a specialized multi-dimensional array class which works with pete. This was needed in order to give pete n-dimensional capability. It will also serve as a platform to integrate Psi into PETE. As part of our paper, we tested the class to show that the performance of our class using pete was comparable to the performance of hand coded C for the same multi-dimensional array operations. Our strategy wants to extend pete to use psi calculus rules. Psi calculus rules are one strategy to significantly speed up computations because it reduces intermediate computations to index manipulations (putting it briefly) which has the added effect of elimination many intermediate arrays used to store intermediate computations. i.e. td convolution and moa design Paper Summary:

Array Class N – Dimensionl Psi-Calculus Platform Fast Why do we need shape.h? A: so that we can do multi dimensional arrays in pete. We also want our array class to be extendable to use psi calculus.

Results:

Expression Tree + C + A B Show how it is evaluated (tree) Explain that the nodes are just templated types and are not actually evaluated. The evaluation is done by the overloaded assignment operator rather than cascading operator overloading. A B

Type const Expression <BinaryNode<OpAdd, Reference<Array>, &expr = A + B + C Show templated type for the A = B + C + D expression. Related notes: The last step in making Vec3 PETE-compatible is to provide a way for PETE to assign to a Vec3 from an arbitrary expression. This is done by overloading operator= to take a PETE expression as input, and copy values into its owner: 064 template<class RHS> 065 Vec3 &operator=(const Expression<RHS> &rhs) 066 { 067 d[0] = forEach(rhs, EvalLeaf1(0), OpCombine()); 068 d[1] = forEach(rhs, EvalLeaf1(1), OpCombine()); 069 d[2] = forEach(rhs, EvalLeaf1(2), OpCombine()); 070 071 return *this; 072 } The first thing to notice about this method is that it is templated on an arbitrary class RHS, but its single formal parameter has type Expression<RHS>. This combination means that the compiler can match it against anything that is wrapped in the generic PETE template Expression<>, and only against things that are wrapped in that way. The compiler cannot match against int, complex<short>, or GreatAuntJane_t, since these do not have the form Expression<RHS> for some type RHS. The forEach function is used to traverse expression trees. The first argument is the expression. The second argument is the leaf tag denoting the operation applied at the leaves. The third argument is a combiner tag, which is used to combine results at non-leaf nodes. By passing EvalLeaf1(0) in line 67, we are indicating that we want the Vec3s at the leaves to return the element at index 0. The LeafFunctor<Scalar<T>, EvalLeaf1> (defined inside of PETE) ensures that scalars return their value no matter the index. While EvalLeaf1 obtains values from the leaves, OpCombine takes these values and combines them according to the operators present at the non-leaf nodes. The result is that line 67 evaluates the expression on the right side of the assignment operator at index 0. Line 68 does this at index 1, and so on. Once evaluation is complete, operator= returns the Vec3 to which values have been assigned, in keeping with normal C++ conventions.

Future Work Psi – Calculus Implementation Benefit Psi complements this (reverse). Psi will improve this (to remove temporaries) How to implement: 1-Iterator like concept. 2-Index composition using expression templates -> special scalar – like type that is copied by value but only performed once per array operation regardless of the matrix size. Removes more temporaries which pete cannot No intermediate computations on itself (i.e. reverse). Eliminates the entire loop (not merely reducing it to one (or the minimum number). Reduces loop to a constant time indexing operation rather than a loop calculation.

End