Building the Support for Radar Processing Across Memory Hierarchies:


1 Building the Support for Radar Processing Across Memory Hierarchies:
Paper: Building the Support for Radar Processing Across Memory Hierarchies: On the Development of an Array Class with Shape using C++ Expression Templates, by Lenore R. Mullin, Xingmin Luo and Lawrence A. Bush.

Our approach combines high performance with programmability. Related algorithm: TD convolution. We map radar algorithms to memory hierarchies; the ODES paper/talk explains this mapping using hand-designed algorithms for convolution derived with the Psi calculus. The idea of this talk is to mechanize that process by building MOA and the Psi calculus into PETE. This requires adding shape information, so that PETE creates an AST which the Psi calculus rules can then rewrite. The idea was presented last year at HPEC. In this talk we restructure the two vector algorithms for convolution and lift them to higher dimensions.

With support for shapes in PETE we will be able to mechanize everything. Although this is not yet done, we demonstrate that supporting shapes and an array class causes no performance degradation. The "+" example suffices, since adding the other operators will not change performance.

We define an N-dimensional array class with shape in order to support the mechanization of linear transformations in the Psi calculus. The new array class extends PETE's support for array operations by defining a shape for the array class. We ran the experiment on two different platforms and got similar results: with PETE and our array class, we achieved performance similar to that of hand-coded C. Future work includes adding algorithm methods to enable other Psi calculus operations. Application: RADAR, SAR.
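The N-dimensional array class with shape described above can be sketched roughly as follows. The names and layout here are illustrative assumptions, not the paper's actual class or its shape.h interface; the point is only that the array carries its shape and maps a multi-index to a linear offset (the gamma function of MOA).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of an N-dimensional array that carries its shape,
// in the spirit of the array class the talk adds to PETE.
class Array {
public:
    explicit Array(std::vector<std::size_t> shape)
        : shape_(shape), data_(size(), 0.0) {}

    // Total number of elements: the product of the extents in the shape.
    std::size_t size() const {
        std::size_t n = 1;
        for (std::size_t d : shape_) n *= d;
        return n;
    }

    const std::vector<std::size_t>& shape() const { return shape_; }

    // Row-major linear offset of a multi-index (MOA's gamma function).
    std::size_t offset(const std::vector<std::size_t>& idx) const {
        std::size_t off = 0;
        for (std::size_t k = 0; k < shape_.size(); ++k)
            off = off * shape_[k] + idx[k];
        return off;
    }

    double& operator()(const std::vector<std::size_t>& idx) {
        return data_[offset(idx)];
    }

private:
    std::vector<std::size_t> shape_;  // one extent per dimension
    std::vector<double> data_;        // flat row-major storage
};
```

Because the shape is explicit, reshaping for a memory hierarchy is pure index arithmetic over the same flat storage.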

2 Application: RADAR Paper summary
Application: start with Kong's paper and argument. Over the last five decades, synthetic aperture radar (SAR) has been developed as a unique imaging instrument with high resolution and day/night, all-weather operation. As a result, SAR has found a wide range of applications, including target detection, continuous observation of dynamic phenomena such as seismic movement, ocean currents and sea ice motion, and classification of vegetation. In comparison with spectral analysis (FFT) and frequency-domain convolution, time-domain (TD) analysis is the simplest and most accurate approach for SAR signal processing. Because SAR transmits and receives a time-modulated wave, the TD algorithm processes the signal echo directly using matched filters, without approximation. However, the TD algorithm is also the most computationally expensive, so it can only be applied to size-limited SAR data. To meet the demand for large, high-resolution SAR imagery, a novel fast time-domain scheme is being developed. With the far-field approximation, and by treating the scatterers as non-dispersive media, the range and cross-range parameters are decoupled and the integral reduces to a closed-form function. SAR image simulation based on this analytical process greatly increases computational efficiency and makes it possible to process large images.

TD convolution is used in radar DSP in various configurations: for different purposes, or under different strategies, it can be combined with other methods to extract the desired, or cleanest, information from the signal. For example, a frequency-domain deconvolution approach for transmitter noise cancellation is being developed.
The time-domain radar return from distributed clutter is the convolution of the coded transmit pulse and the distributed clutter field. By taking the Fast Fourier Transform (FFT) of the distributed clutter return, the IPN contribution of the noisy transmit waveform can be removed by dividing by the frequency spectrum of the measured transmit waveform. An IFFT then returns to the time domain for subsequent MTI processing. (From: Adaptive Distributed Clutter Improvement Factor (ADDCIF), John Hoffman, Louis Vasquez, Charles Farthing, and Clarence Ng, Systems Engineering Group, Inc.) In short: one method to remove clutter uses the TD convolution of the coded transmit pulse and the distributed clutter field; the result is FFTed, certain noise is removed by dividing by the frequency spectrum of the waveform, and an IFFT reverts to the time domain.

The main issues on the way to the efficient algorithm: non-materialization, Psi rules on indexing operations (take, drop, reverse), and mapping to memory/processors.
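As a reference point for the discussion above, direct time-domain convolution can be sketched as below. This is a naive O(N*M) double loop, not the restructured, hierarchy-aware algorithm the talk derives; it only fixes what "TD convolution" computes.

```cpp
#include <cstddef>
#include <vector>

// Direct (time-domain) convolution: y[n] = sum_k h[k] * x[n - k].
// x is the input signal (e.g. the clutter return), h the filter
// (e.g. the coded transmit pulse); y has length |x| + |h| - 1.
std::vector<double> tdConvolve(const std::vector<double>& x,
                               const std::vector<double>& h) {
    std::vector<double> y(x.size() + h.size() - 1, 0.0);
    for (std::size_t n = 0; n < y.size(); ++n)
        for (std::size_t k = 0; k < h.size(); ++k)
            if (k <= n && n - k < x.size())  // stay inside x's support
                y[n] += h[k] * x[n - k];
    return y;
}
```

It is exactly this nested sum that the MOA/Psi restructuring splits across processors and cache blocks.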

3 Processor/Memory Mapping
Approach example: "raising" array dimensionality. [Slide diagram: a vector x is reshaped across processors P0, P1, P2 and memory hierarchy levels; the element lists are not recoverable from the transcript.]

HPEC '02 showed that Psi calculus operations (take, drop, reverse) could be implemented with expression templates; this was integrated into PETE. In this talk we restructure the two vector algorithms for convolution and lift them to higher dimensions. ODES '02: for TD convolution, the processor split is represented vertically, and the splitting of the problem over an arbitrary cache size is represented by the sum-matrix notation.

Benefits of MOA (A Mathematics of Arrays) and the Psi calculus: a processor/memory hierarchy can be modeled by reshaping data, using an extra dimension for each level. Compositions of monolithic operations can be re-expressed as compositions of operations on smaller data granularities that match memory hierarchy levels and avoid materialization of intermediate arrays. Algorithms can be automatically, algebraically transformed to reflect the array reshapings above. This facilitates programming expressed at a high level, intentional program design and analysis, and portability and scalability. The approach is applicable to many problems in radar. For any given array expression, reduction rules from the Psi calculus can be applied in a mechanical process guaranteed to produce an implementation having the least possible number of memory reads and writes (Mullin, "A Mathematics of Arrays", PhD thesis, 1988).
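The "raising" step can be illustrated with a small index-arithmetic sketch (hypothetical names; the real mapping in the talk is derived mechanically with MOA/Psi rules). Reshaping a length p*b*l vector to shape <p, b, l> assigns each element a processor, a cache block, and a position within the block, without moving any data.

```cpp
#include <cstddef>

// Sketch of "raising" a 1-d array of length n = p * b * l to a 3-d
// view <p, b, l>: processor, cache block, element within block.
// The reshape moves no data; it only reinterprets the linear index.
struct RaisedView {
    std::size_t p;  // number of processors
    std::size_t b;  // cache blocks per processor
    std::size_t l;  // elements per block

    // Linear index of element (i, j, k) of the raised view in the
    // original flat vector (row-major).
    std::size_t flat(std::size_t i, std::size_t j, std::size_t k) const {
        return (i * b + j) * l + k;
    }
};
```

Each level of the hierarchy simply adds one more dimension and one more term to this affine index map.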

4 Processor/Memory Mapping
[Slide diagram: the TD convolution sum Σ from I = 0 to (T_H) - 1, split across processor 0 and processor 1, with each processor's portion further blocked to fit in CACHE; the element lists are not recoverable from the transcript.]

5 Processor/Memory Mapping
[Slide diagram: the TD convolution sum Σ from I = 0 to (T_H) - 1, split across processor 0 and processor 1 with cache blocking; this slide repeats the preceding diagram as an animation step.]

6 Processor/Memory Mapping
Approach example: "raising" array dimensionality. [Slide diagram repeated: vector x reshaped across processors P0, P1, P2 and memory hierarchy levels.] ODES '02: for TD convolution, the processor split is represented vertically, and the splitting of the problem over an arbitrary cache size is represented by the sum-matrix notation. HPEC '02 showed that Psi calculus operations (take, drop, reverse) could be implemented with expression templates; this was integrated into PETE.

7 Shape (our contribution) AST type graph
With support for shapes in PETE we will be able to mechanize everything. Although this is not yet done, we demonstrate that supporting shapes and an array class causes no performance degradation. The "+" example suffices, since adding the other operators will not change performance. We define an N-dimensional array class with shape in order to support the mechanization of linear transformations in the Psi calculus; the new array class extends PETE's support for array operations by defining a shape for the array class. We ran the experiment on two different platforms and got similar results: with PETE and our array class, we achieved performance similar to that of hand-coded C. Future work: adding algorithm methods to enable other Psi calculus operations. Application: RADAR, SAR.

8 End

9 Motivation Paper Summary
The objective of the paper, and more generally of our project, is to enable efficient, fast array computations.

Applications: this is important to many scientific programming applications, such as DSP computations.

Strategies: various strategies have been used in this pursuit. Ours is related to the expression-template strategies used in MTL and PETE.

Our strategy: use PETE (the Portable Expression Template Engine). PETE is a library that facilitates loop fusion for array computations, which greatly speeds them up by removing most temporary arrays.

Integrate Psi with PETE: in our paper we take steps toward this goal. We wrote a specialized multi-dimensional array class that works with PETE; this was needed to give PETE N-dimensional capability, and it will also serve as a platform for integrating the Psi calculus into PETE. As part of the paper, we tested the class to show that its performance with PETE is comparable to hand-coded C for the same multi-dimensional array operations. We then want to extend PETE to use Psi calculus rules. The Psi calculus can significantly speed up computations because it reduces intermediate computations to index manipulations (putting it briefly), which also eliminates many of the intermediate arrays used to store intermediate results, e.g. in TD convolution and the MOA design.

10 Array Class: N-Dimensional, Psi-Calculus Platform, Fast
Why do we need shape.h? So that we can handle multi-dimensional arrays in PETE. We also want our array class to be extensible to the Psi calculus.
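A minimal sketch of what such a shape.h might provide (hypothetical names, not the paper's actual header): a Shape type, a conformance check for elementwise operations, and the element count that drives the fused evaluation loop.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical shape.h-style helpers: a shape is just the list of
// extents, one per dimension.
using Shape = std::vector<std::size_t>;

// Elementwise operations require operands of identical shape.
bool conformable(const Shape& a, const Shape& b) { return a == b; }

// Number of elements an array of this shape holds: the product of
// its extents. This is the trip count of the single fused loop.
std::size_t elementCount(const Shape& s) {
    std::size_t n = 1;
    for (std::size_t d : s) n *= d;
    return n;
}
```

With the shape available at the AST's leaves, the evaluator can check conformance once and then run one flat loop of elementCount iterations over any N-dimensional expression.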

11 Expression Tree
The expression A + B + C builds the tree (+ (+ A B) C) and, since + is left-associative, the templated type

const Expression<BinaryNode<OpAdd, BinaryNode<OpAdd, Reference<Array>, Reference<Array> >, Reference<Array> > > &expr = A + B + C;

Show how it is evaluated (tree). Explain that the nodes are just templated types and are not actually evaluated; the evaluation is done by the overloaded assignment operator rather than by cascading operator overloading.
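The idea on this slide, operator+ building a typed tree that only the assignment operator evaluates, can be sketched in a few lines. The names here (Vec, AddNode, Expr) are simplified illustrations, not PETE's actual BinaryNode/OpAdd/Reference machinery.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Minimal expression-template sketch. A + B + C builds the nested
// type AddNode<AddNode<Vec, Vec>, Vec>; nothing is computed until
// the tree is assigned to a Vec.
struct Expr {};  // tag so operator+ matches only expression types

struct Vec : Expr {
    std::vector<double> d;
    explicit Vec(std::size_t n) : d(n, 0.0) {}
    double operator[](std::size_t i) const { return d[i]; }
    std::size_t size() const { return d.size(); }

    // Evaluation happens here: one loop over the whole tree,
    // no temporary arrays for the intermediate sums.
    template <class E>
    Vec& operator=(const E& e) {
        for (std::size_t i = 0; i < d.size(); ++i) d[i] = e[i];
        return *this;
    }
};

template <class L, class R>
struct AddNode : Expr {  // plays the role of BinaryNode<OpAdd, L, R>
    const L& l;
    const R& r;
    AddNode(const L& lhs, const R& rhs) : l(lhs), r(rhs) {}
    double operator[](std::size_t i) const { return l[i] + r[i]; }
    std::size_t size() const { return l.size(); }
};

// Build a node instead of computing; restricted to Expr subclasses
// so the template does not hijack operator+ for other types.
template <class L, class R>
typename std::enable_if<std::is_base_of<Expr, L>::value &&
                        std::is_base_of<Expr, R>::value,
                        AddNode<L, R> >::type
operator+(const L& lhs, const R& rhs) {
    return AddNode<L, R>(lhs, rhs);
}
```

The design choice matches the slide: cascading operator+ only records the computation as a type; the single loop in operator= does all the work.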

12 Expression Tree
[Slide diagram: the tree for A + B + C, with + nodes over leaves A, B, C.] Show how it is evaluated (tree). Explain that the nodes are just templated types and are not actually evaluated; the evaluation is done by the overloaded assignment operator rather than by cascading operator overloading.

13 Results:

14 Type
const Expression<BinaryNode<OpAdd, BinaryNode<OpAdd, Reference<Array>, Reference<Array> >, Reference<Array> > > &expr = A + B + C;

Show the templated type for the A = B + C + D expression. Related notes: the last step in making Vec3 PETE-compatible is to provide a way for PETE to assign to a Vec3 from an arbitrary expression. This is done by overloading operator= to take a PETE expression as input and copy values into its owner:

064 template<class RHS>
065 Vec3 &operator=(const Expression<RHS> &rhs)
066 {
067   d[0] = forEach(rhs, EvalLeaf1(0), OpCombine());
068   d[1] = forEach(rhs, EvalLeaf1(1), OpCombine());
069   d[2] = forEach(rhs, EvalLeaf1(2), OpCombine());
070
071   return *this;
072 }

The first thing to notice about this method is that it is templated on an arbitrary class RHS, but its single formal parameter has type Expression<RHS>. This combination means that the compiler can match it against anything wrapped in the generic PETE template Expression<>, and only against things wrapped in that way. The compiler cannot match against int, complex<short>, or GreatAuntJane_t, since these do not have the form Expression<RHS> for some type RHS.

The forEach function is used to traverse expression trees. The first argument is the expression. The second argument is the leaf tag denoting the operation applied at the leaves. The third argument is a combiner tag, used to combine results at non-leaf nodes. By passing EvalLeaf1(0) in line 67, we indicate that we want the Vec3s at the leaves to return the element at index 0. The LeafFunctor<Scalar<T>, EvalLeaf1> (defined inside PETE) ensures that scalars return their value no matter the index. While EvalLeaf1 obtains values from the leaves, OpCombine combines these values according to the operators present at the non-leaf nodes. The result is that line 67 evaluates the expression on the right side of the assignment operator at index 0, line 68 does the same at index 1, and so on.
Once evaluation is complete, operator= returns the Vec3 to which values have been assigned, in keeping with normal C++ conventions.

15 Future Work: Psi-Calculus Implementation Benefit
Psi complements this (e.g. reverse) and will improve it by removing temporaries. How to implement: (1) an iterator-like concept; (2) index composition using expression templates, via a special scalar-like type that is copied by value but evaluated only once per array operation, regardless of matrix size. This removes temporaries that PETE cannot: operations such as reverse need no intermediate computations at all. It eliminates the entire loop, rather than merely reducing it to one (or the minimum number of) loops, replacing a loop calculation with a constant-time indexing operation.

16 Implementing Psi Calculus with Expression Templates
Example: A = take(4, drop(3, rev(B))), B = < >, A = < >.

Recall: Psi reduction for 1-d arrays always yields one or more expressions of the form x[i] = y[stride*i + offset], l ≤ i < u.

1. Form the expression tree: take(4, drop(3, rev(B))), with B of size 10.
2. Add size information: B: size 10; rev: size 10; drop 3: size 7; take 4: size 4.
3. Apply Psi reduction rules, from the leaf upward:
   B, size 10: A[i] = B[i]
   rev, size 10: A[i] = B[-i + B.size - 1] = B[-i + 9]
   drop 3, size 7: A[i] = B[-(i + 3) + 9] = B[-i + 6]
   take 4, size 4: A[i] = B[-i + 6]
4. Rewrite as sub-expressions with iterators at the leaves and loop-bounds information at the root: iterator offset = 6, stride = -1, size = 4. Iterators are used for efficiency, rather than recalculating indices for each i; one "for" loop evaluates each sub-expression.
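The reduction above can be sketched as affine index-map composition. The helper names below (IndexMap, identity, gather) are assumptions for illustration; the slide's point is only that rev, drop, and take each rewrite the map x[i] = y[stride*i + offset] and that just the final map touches memory.

```cpp
#include <cstddef>
#include <vector>

// Sketch of 1-d Psi reduction as affine index-map rewriting.
// Every step keeps the form x[i] = y[stride*i + offset], 0 <= i < size;
// no intermediate array is ever materialized.
struct IndexMap {
    long stride;
    long offset;
    std::size_t size;
};

IndexMap identity(std::size_t n) { return {1, 0, n}; }

// rev: traverse the mapped range backwards.
IndexMap rev(IndexMap m) {
    return {-m.stride, m.offset + m.stride * (long)(m.size - 1), m.size};
}

// drop k: advance the start by k steps and shrink the size.
IndexMap drop(std::size_t k, IndexMap m) {
    return {m.stride, m.offset + m.stride * (long)k, m.size - k};
}

// take k: keep the first k elements; only the size changes.
IndexMap take(std::size_t k, IndexMap m) { return {m.stride, m.offset, k}; }

// Materialize the result with one loop over the final composed map.
std::vector<double> gather(const std::vector<double>& y, IndexMap m) {
    std::vector<double> x(m.size);
    for (std::size_t i = 0; i < m.size; ++i)
        x[i] = y[(std::size_t)(m.offset + m.stride * (long)i)];
    return x;
}
```

For a B of size 10, take(4, drop(3, rev(identity(10)))) composes to stride = -1, offset = 6, size = 4, matching the slide's final form A[i] = B[-i + 6], 0 ≤ i < 4.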

17

18 End

