Memory Efficient Radar Processing

Memory Efficient Radar Processing Paper: An Array Class with Shape and C++ Expression Templates by Lenore R. Mullin, Xingmin Luo and Lawrence A. Bush

Motivation: Paper Summary

Objective: The objective of the paper, and more generally of our project, is to enable fast, efficient array computations.

Applications: This is important to many scientific programming applications, such as DSP computations.

Strategies: Various strategies have been used in this pursuit, for example 1, 2, 3. Our strategy differs from these, but it is related to the expression template strategies used in MTL and PETE.

Our strategy: Use PETE (the Portable Expression Template Engine). PETE is a library that facilitates loop unrolling for array computations. This greatly speeds up a computation because it removes most temporary arrays from it.

Integrating Psi with PETE: In our paper, we take steps toward this goal. To do so, we wrote a specialized multi-dimensional array class that works with PETE. This was needed to give PETE n-dimensional capability, and it will also serve as a platform for integrating Psi into PETE. As part of the paper, we tested the class to show that its performance with PETE is comparable to that of hand-coded C for the same multi-dimensional array operations.

Our strategy is to extend PETE to use Psi calculus rules. Psi calculus rules can significantly speed up computations because, put briefly, they reduce intermediate computations to index manipulations. This has the added effect of eliminating many of the intermediate arrays otherwise used to store intermediate results. Examples: TD convolution and MoA design.
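To make the point about temporary arrays concrete, here is a minimal sketch comparing a naive overloaded operator+ with a hand-fused loop. It is not code from the paper: NaiveArray, its temporaries counter, and fused_add3 are hypothetical names introduced only for illustration. The fused loop stands in for the kind of code an expression-template library aims to produce.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical naive array type (illustration only, not the paper's class):
// every operator+ allocates a fresh result array.
struct NaiveArray {
    std::vector<double> data;
    static int temporaries;  // counts result arrays created by operator+
    explicit NaiveArray(std::size_t n, double v = 0.0) : data(n, v) {}
};
int NaiveArray::temporaries = 0;

// Element-wise sum that materializes its result in a new array.
NaiveArray operator+(const NaiveArray& a, const NaiveArray& b) {
    assert(a.data.size() == b.data.size());
    NaiveArray result(a.data.size());
    ++NaiveArray::temporaries;
    for (std::size_t i = 0; i < a.data.size(); ++i)
        result.data[i] = a.data[i] + b.data[i];
    return result;
}

// Hand-fused version: one loop over the data and no intermediate arrays.
void fused_add3(NaiveArray& out, const NaiveArray& a,
                const NaiveArray& b, const NaiveArray& c) {
    assert(out.data.size() == a.data.size());
    for (std::size_t i = 0; i < out.data.size(); ++i)
        out.data[i] = a.data[i] + b.data[i] + c.data[i];
}
```

With the naive operator, out = a + b + c creates one array for a + b and a second for the full sum before the assignment; the fused loop writes straight into out.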

Array Class: N-Dimensional, Psi-Calculus Platform, Fast

Why do we need shape.h? So that we can use multi-dimensional arrays in PETE. We also want our array class to be extensible to use the Psi calculus.
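As a rough illustration of what "an array class with shape" can mean, the sketch below stores a shape vector next to a flat buffer and maps a full index vector to a row-major offset. It is an assumption-laden stand-in, not the contents of shape.h: the class name ShapedArray and all of its members are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch of an n-dimensional array that carries its shape.
// This is not the paper's shape.h; names and layout are assumptions.
class ShapedArray {
public:
    explicit ShapedArray(std::vector<std::size_t> shape)
        : shape_(std::move(shape)), data_(total(shape_), 0.0) {}

    std::size_t rank() const { return shape_.size(); }
    std::size_t size() const { return data_.size(); }
    const std::vector<std::size_t>& shape() const { return shape_; }

    // Row-major offset of a full index vector (one index per dimension).
    std::size_t offset(const std::vector<std::size_t>& index) const {
        assert(index.size() == shape_.size());
        std::size_t off = 0;
        for (std::size_t k = 0; k < shape_.size(); ++k) {
            assert(index[k] < shape_[k]);
            off = off * shape_[k] + index[k];
        }
        return off;
    }

    double& at(const std::vector<std::size_t>& index) {
        return data_[offset(index)];
    }

private:
    // Number of elements implied by a shape (1 for an empty shape).
    static std::size_t total(const std::vector<std::size_t>& shape) {
        std::size_t n = 1;
        for (std::size_t extent : shape) n *= extent;
        return n;
    }

    std::vector<std::size_t> shape_;  // declared first so it is initialized first
    std::vector<double> data_;
};
```

Because the data lives in one flat buffer, element-wise operations can loop over size() without caring about the rank, while shape-aware operations work on index vectors.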

Results:

Expression Tree

[Figure: expression tree with + nodes and leaves A, B, and C]

Show how the expression is evaluated (tree). Explain that the nodes are just templated types and are not themselves evaluated. Evaluation is done by the overloaded assignment operator rather than by cascading operator overloading.
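The idea that tree nodes are only types, and that all work happens in the assignment operator, can be sketched in a few lines. This is a simplified illustration, not PETE: AddNode and Array are hypothetical names, only addition is supported, and the nodes hold references that are valid only within the single statement that builds them.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical node type: records the operands of an addition, computes nothing
// until an element is requested.
template <class L, class R>
struct AddNode {
    const L& left;
    const R& right;
    double operator[](std::size_t i) const { return left[i] + right[i]; }
    std::size_t size() const { return left.size(); }
};

class Array {
public:
    explicit Array(std::size_t n, double v = 0.0) : data_(n, v) {}
    double operator[](std::size_t i) const { return data_[i]; }
    std::size_t size() const { return data_.size(); }

    // All evaluation happens here: one loop over the whole expression tree.
    template <class L, class R>
    Array& operator=(const AddNode<L, R>& expr) {
        assert(expr.size() == data_.size());
        for (std::size_t i = 0; i < data_.size(); ++i) data_[i] = expr[i];
        return *this;
    }

private:
    std::vector<double> data_;
};

// operator+ only builds a node type; no array arithmetic happens here.
inline AddNode<Array, Array> operator+(const Array& a, const Array& b) {
    return {a, b};
}

template <class L, class R>
AddNode<AddNode<L, R>, Array> operator+(const AddNode<L, R>& n, const Array& a) {
    return {n, a};
}
```

In this sketch the type of a + b + c is AddNode<AddNode<Array, Array>, Array>, which mirrors the shape of the tree, and out = a + b + c runs a single loop with no intermediate Array.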

Type

const Expression <BinaryNode<OpAdd, Reference<Array>, &expr = A + B + C

Show the templated type for the A = B + C + D expression.

Related notes: The last step in making Vec3 PETE-compatible is to provide a way for PETE to assign to a Vec3 from an arbitrary expression. This is done by overloading operator= to take a PETE expression as input, and copy values into its owner:

    064 template<class RHS>
    065 Vec3 &operator=(const Expression<RHS> &rhs)
    066 {
    067   d[0] = forEach(rhs, EvalLeaf1(0), OpCombine());
    068   d[1] = forEach(rhs, EvalLeaf1(1), OpCombine());
    069   d[2] = forEach(rhs, EvalLeaf1(2), OpCombine());
    070
    071   return *this;
    072 }

The first thing to notice about this method is that it is templated on an arbitrary class RHS, but its single formal parameter has type Expression<RHS>. This combination means that the compiler can match it against anything that is wrapped in the generic PETE template Expression<>, and only against things that are wrapped in that way. The compiler cannot match against int, complex<short>, or GreatAuntJane_t, since these do not have the form Expression<RHS> for some type RHS.

The forEach function is used to traverse expression trees. The first argument is the expression. The second argument is the leaf tag denoting the operation applied at the leaves. The third argument is a combiner tag, which is used to combine results at non-leaf nodes. By passing EvalLeaf1(0) in line 67, we are indicating that we want the Vec3s at the leaves to return the element at index 0. The LeafFunctor<Scalar<T>, EvalLeaf1> (defined inside of PETE) ensures that scalars return their value no matter the index. While EvalLeaf1 obtains values from the leaves, OpCombine takes these values and combines them according to the operators present at the non-leaf nodes. The result is that line 67 evaluates the expression on the right side of the assignment operator at index 0. Line 68 does this at index 1, and so on.

Once evaluation is complete, operator= returns the Vec3 to which values have been assigned, in keeping with normal C++ conventions.
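The traversal described in these notes (a leaf tag applied at the leaves, a combiner applied at the inner nodes) can be imitated without the library. The sketch below is deliberately not PETE's API: Vec3Like, ScalarLeaf, SumNode, LeafAt, CombineAdd, traverse, and assign are all hypothetical stand-ins, and only addition is modeled.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the pieces the notes describe (not PETE's types).
struct Vec3Like { double d[3]; };     // leaf: a 3-element vector
struct ScalarLeaf { double value; };  // leaf: a scalar
template <class L, class R>
struct SumNode { const L& left; const R& right; };  // non-leaf: an addition

struct LeafAt { std::size_t index; };  // leaf tag: "return the element at index"
struct CombineAdd {};                  // combiner tag: "add the child results"

// Leaf rule for vectors: return the requested element.
inline double traverse(const Vec3Like& v, LeafAt tag, CombineAdd) {
    assert(tag.index < 3);
    return v.d[tag.index];
}

// Leaf rule for scalars: return the value no matter the index.
inline double traverse(const ScalarLeaf& s, LeafAt, CombineAdd) {
    return s.value;
}

// Non-leaf rule: evaluate both children, then combine the two results.
template <class L, class R>
double traverse(const SumNode<L, R>& n, LeafAt tag, CombineAdd c) {
    return traverse(n.left, tag, c) + traverse(n.right, tag, c);
}

// Plays the role of the assignment operator: one traversal per element.
template <class Expr>
void assign(Vec3Like& out, const Expr& expr) {
    for (std::size_t i = 0; i < 3; ++i)
        out.d[i] = traverse(expr, LeafAt{i}, CombineAdd{});
}
```

Each call to traverse with LeafAt{i} evaluates the whole right-hand side at index i, which is the role the notes assign to forEach with EvalLeaf1(i) and OpCombine.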

Future Work: Psi-Calculus Implementation

Benefit: Psi complements PETE (for example, reverse). Psi will improve on it by removing temporaries.

How to implement:
1 - An iterator-like concept.
2 - Index composition using expression templates: a special scalar-like type that is copied by value, with the composition performed only once per array operation regardless of the matrix size.

This removes more temporaries, which PETE alone cannot do. It requires no intermediate computations on the array itself (for example, reverse). It eliminates the entire loop, rather than merely reducing the computation to one loop (or the minimum number of loops). The loop is reduced to a constant-time indexing operation rather than a loop calculation.
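The "index composition" idea can be sketched for the one-dimensional case: an operation such as reverse is represented as a small, by-value index map, and composing maps costs the same no matter how large the array is. This is only an illustration under stated assumptions, not the planned implementation: IndexMap, identity, reverse, drop, and read are hypothetical names, and the Psi calculus itself covers general n-dimensional arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical by-value index map for a 1-D view: view[i] = data[start + step * i].
struct IndexMap {
    std::ptrdiff_t start;
    std::ptrdiff_t step;
    std::size_t length;
    std::ptrdiff_t operator()(std::size_t i) const {
        return start + step * static_cast<std::ptrdiff_t>(i);
    }
};

// The view that leaves an n-element array unchanged.
inline IndexMap identity(std::size_t n) { return {0, 1, n}; }

// Reverse a view by rewriting the map only; no element is read or moved.
inline IndexMap reverse(const IndexMap& m) {
    if (m.length == 0) return m;
    return {m(m.length - 1), -m.step, m.length};
}

// Drop the first k elements of a view, again by rewriting the map only.
inline IndexMap drop(const IndexMap& m, std::size_t k) {
    assert(k <= m.length);
    return {m.start + m.step * static_cast<std::ptrdiff_t>(k), m.step,
            m.length - k};
}

// Read element i of the view described by m.
inline double read(const std::vector<double>& data, const IndexMap& m,
                   std::size_t i) {
    assert(i < m.length);
    return data[static_cast<std::size_t>(m(i))];
}
```

Composing reverse and drop touches three integers regardless of the array length, so the only loop left is the one that finally reads or writes the elements.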

End