High Performance Embedded Computing Software Initiative (HPEC-SI)


High Performance Embedded Computing Software Initiative (HPEC-SI)
Dr. Jeremy Kepner / Lincoln Laboratory

This work is sponsored by the Department of Defense under Air Force Contract F19628-00-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the author and are not necessarily endorsed by the United States Government.

Title Slide: Technical and program review of the HPEC-SI effort

Outline
Introduction (Goals; Program Structure)
Demonstration
Development
Applied Research
Future Challenges
Summary

Outline Slide: Introduction to program goals and program structure

Overview - High Performance Embedded Computing (HPEC) Initiative
HPEC Software Initiative programs: Demonstration, Development, Applied Research (DARPA).
Diagram elements: Common Imagery Processor (CIP), embedded multi-processor, shared-memory server, ASARS-2, Enhanced Tactical Radar Correlator (ETRAC).
Core activity is the transition of advanced software research into DoD programs.
Challenge: transition advanced software technology and practices into major defense acquisition programs.

Why Is DoD Concerned with Embedded Software?
Source: "HPEC Market Study," March 2001 - estimated DoD expenditures for embedded signal and image processing hardware and software ($B).
DoD HPEC signal and image processing hardware and software costs are increasing.
COTS acquisition practices have shifted the burden from "point design" hardware to "point design" software.
Software costs for embedded systems could be reduced by one-third with improved programming models, methodologies, and standards.

Issues with Current HPEC Development: Inadequacy of Software Practices & Standards
High Performance Embedded Computing is pervasive throughout DoD applications: NSSN, AEGIS, Rivet Joint, Standard Missile, Predator, Global Hawk, U-2, JSTARS, MSAT-Air, P-3/APS-137, F-16, MK-48 Torpedo.
Airborne radar insertion program: 85% software rewrite for each hardware platform.
Missile common processor: processor board costs < $100K, software development costs > $100M.
Torpedo upgrade: two software rewrites required after changes in hardware design.
System development/acquisition stages (4 years between program milestones): system technology development, system field demonstration, engineering/manufacturing development, insertion to military asset. Over this period the signal processor evolves through multiple generations (1st through 6th gen.); the DoD timescale is 10 years, while the COTS timescale is 18 months.
Today, embedded software is: not portable, not scalable, difficult to develop, and expensive to maintain.

Evolution of Software Support Towards "Write Once, Run Anywhere/Anysize"
The diagram contrasts DoD software development with COTS development across 1990, 2000, and 2005, with layers of applications, middleware, embedded standards, and vendor software.
Application software has traditionally been tied to the hardware.
Many acquisition programs are developing stove-piped middleware "standards".
Open software standards can provide portability, performance, and productivity benefits.
The natural evolution of software is towards portable middleware and scalable applications that support "Write Once, Run Anywhere/Anysize".

Quantitative Goals & Impact
Program goals:
Develop and integrate software technologies for embedded parallel systems to address portability, productivity, and performance.
Engage the acquisition community to promote technology insertion.
Deliver quantifiable benefits: performance (1.5x), portability (3x), productivity (3x).
HPEC Software Initiative cycle: Demonstrate, Develop, Prototype - object oriented, open standards, interoperable and scalable.
The goal of this initiative is to improve in a quantifiable manner the practice of developing software for embedded signal and image processing. Past and current efforts have demonstrated that the use of standards should provide at least a 3x improvement in software portability. Using object-oriented programming languages like C++ typically provides a 3x reduction in code size; current efforts indicate that the domain of signal and image processing should expect the same level of code compaction as seen in other domains. Past and current efforts have also demonstrated that the performance overhead of using object-oriented languages can be eliminated and can even result in performance improvements as high as 1.5x.
Metrics: Portability - reduction in lines of code changed to port/scale to a new system. Productivity - reduction in overall lines of code. Performance - computation and communication benchmarks.

Organization
Government Lead: Dr. Rich Linderman, AFRL
Executive Committee: Dr. Charles Holland, PADUSD(S+T), ...
Technical Advisory Board: Mr. John Grosh, OSD; Mr. Bob Graybill, DARPA/ITO; Dr. Mark Richards, GTRI; Dr. Jeremy Kepner, MIT/LL
Demonstration: Dr. Keith Bromley, SPAWAR; Dr. Richard Games, MITRE; Dr. Jeremy Kepner, MIT/LL; Mr. Brian Sroka, MITRE; Mr. Ron Williams, MITRE; ...
Development: Dr. James Lebak, MIT/LL; Mr. Dan Campbell, GTRI; Mr. Ken Cain, MERCURY; Mr. Randy Judd, SPAWAR
Applied Research: Mr. Bob Bond, MIT/LL; Mr. Ken Flowers, MERCURY; Dr. Spaanenburg, PENTUM; Mr. Dennis Cottel, SPAWAR; Capt. Bergmann, AFRL; Dr. Tony Skjellum, MPISoft
Advanced Research: Mr. Bob Graybill, DARPA
The Embedded Software Initiative involves a wide range of participants, including many from DoD labs, FFRDCs, and universities. As the effort progresses, vendor and contractor representatives will be added to the working groups.
Partnership with ODUSD(S&T), government labs, FFRDCs, universities, contractors, vendors, and DoD programs - over 100 participants from over 20 organizations.

HPEC-SI Capability Phases
Status: first demo successfully completed; second demo selected; VSIPL++ v0.8 spec completed; VSIPL++ v0.2 code available; Parallel VSIPL++ v0.1 spec completed; high-performance C++ demonstrated.
Phase 1 - Demonstration: Existing Standards (VSIPL, MPI); Development: Object-Oriented Standards; Applied Research: Unified Computation/Communication Library. Goals: demonstrate insertions into fielded systems (CIP), demonstrate 3x portability.
Phase 2 - Demonstration: Object-Oriented Standards (VSIPL++); Development: Unified Computation/Communication Library (Parallel VSIPL++); Applied Research: Fault tolerance. Goals: high-level code abstraction (AEGIS), reduce code size 3x.
Phase 3 - Demonstration: Unified Computation/Communication Library prototype (Parallel VSIPL++); Development: Fault tolerance prototype; Applied Research: Hybrid architectures. Goals: unified embedded computation/communication standard, demonstrate scalability.
The embedded software initiative has three planned phases, each of which culminates in demonstrations. The demonstrations have specific numerical targets in terms of portability, productivity, and scalability. For all demonstrations, compute and communication performance will also be measured, with the expectation of no loss in performance.

Outline
Introduction
Demonstration (Common Imagery Processor; AEGIS BMD (planned))
Development
Applied Research
Future Challenges
Summary

Outline Slide: Demonstrations in the Common Imagery Processor and AEGIS BMD

Common Imagery Processor - Demonstration Overview
The Common Imagery Processor (CIP) is a cross-service component used in a wide variety of DoD assets (TEG and TES, JSIPS-N and TIS, JSIPS and CARS).
Sample list of CIP modes: U-2 (ASARS-2, SYERS), F/A-18 ATARS (EO/IR/APG-73), LO HAE UAV (EO, SAR).
Diagram: System Manager, 38.5" CIP*, ETRAC.
* CIP picture courtesy of Northrop Grumman Corporation.

Common Imagery Processor - Demonstration Overview
Goals (MITRE and Northrop Grumman):
Demonstrate standards-based, platform-independent CIP processing (ASARS-2).
Assess performance of current COTS portability standards (MPI, VSIPL).
Validate software development productivity of the emerging Data Reorganization Interface.
The first demonstration of this initiative is to port one mode of the APG-73 SAR image formation (IF) pipeline to an embedded multi-processor using open software standards.
A single code base optimized for all high-performance architectures - embedded multicomputers, shared-memory servers, commodity clusters, and massively parallel processors - provides future flexibility.

Software Ports
Embedded multicomputers:
CSPI - 500 MHz PPC7410 (vendor loan)
Mercury - 500 MHz PPC7410 (vendor loan)
Sky - 333 MHz PPC7400 (vendor loan)
Sky - 500 MHz PPC7410 (vendor loan)
Mainstream servers:
HP/COMPAQ ES40LP - 833 MHz Alpha EV6 (CIP hardware)
HP/COMPAQ ES40 - 500 MHz Alpha EV6 (CIP hardware)
SGI Origin 2000 - 250 MHz R10k (CIP hardware)
SGI Origin 3800 - 400 MHz R12k (ARL MSRC)
IBM - 1.3 GHz POWER4 (ARL MSRC)
Generic Linux cluster
CIP software was successfully ported to a wide range of HPEC and HPC platforms.

Portability: SLOC Comparison
Chart: source lines of code for the sequential VSIPL, shared-memory VSIPL, and DRI VSIPL versions, showing roughly a 1% increase for one port and a 5% increase for the other relative to the sequential baseline.
CIP software showed minimal code bloat.

Shared Memory / CIP Server versus Distributed Memory / Embedded Vendor
Chart: latency versus number of CPUs, showing the latency requirement and the shared-memory limit for this Alpha server.
HPEC systems were able to meet the latency requirement with 8 CPUs.
The application can now exploit many more processors, including embedded processors (3x form factor advantage) and Linux clusters (3x cost advantage).

Form Factor Improvements
Current configuration: IOP (6U VME chassis, 9 slots potentially available) plus IFP1 and IFP2 (HP/COMPAQ ES40LP).
Possible configuration: the CIP could accommodate more CPUs or shrink its overall form factor.
The IOP could support 2 G4 IFPs - a form factor reduction of 2x.
A 6U VME chassis can support 5 G4 IFPs - a processing capability increase of 2.5x.

HPEC-SI Goals: 1st Demo Achievements
Portability (goal 3x): achieved - zero code changes required.
Productivity (goal 3x): achieved* - DRI code 6x smaller vs. MPI (estimated).
Performance (goal 1.5x): achieved - 2x reduced cost or form factor.
All quantitative goals were achieved or exceeded.
Metrics: Portability - reduction in lines of code changed to port/scale to a new system. Productivity - reduction in overall lines of code. Performance - computation and communication benchmarks.
HPEC Software Initiative cycle: Demonstrate, Develop, Prototype - object oriented, open standards, interoperable and scalable.

Outline
Introduction
Demonstration
Development (Object Oriented (VSIPL++); Parallel (||VSIPL++))
Applied Research
Future Challenges
Summary

Outline Slide: Development of the VSIPL++ and ||VSIPL++ standards

Emergence of Component Standards
Parallel embedded processor diagram: system controller and node controllers over processors P0-P3, connected to consoles and other computers.
Computation: VSIPL, VSIPL++, ||VSIPL++.
Data communication: MPI, MPI/RT, DRI.
Control communication: CORBA, HP-CORBA.
Definitions: VSIPL = Vector, Signal, and Image Processing Library; ||VSIPL++ = Parallel Object-Oriented VSIPL; MPI = Message Passing Interface; MPI/RT = MPI Real-Time; DRI = Data Reorganization Interface; CORBA = Common Object Request Broker Architecture; HP-CORBA = High Performance CORBA.
HPEC-SI builds on a number of existing HPEC and mainstream standards - the initiative builds on completed research and existing standards and libraries.
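
The data-communication layer in this stack is what moves distributed data between processing stages. As a concrete illustration, the sketch below performs a simple block "corner turn" (row-block to column-block redistribution) directly with MPI_Alltoall - the kind of operation the DRI standard wraps behind a higher-level interface. The matrix size, the even block split, and the packing scheme are assumptions made for this example, not taken from the CIP code or the DRI specification.

    // Illustrative sketch only: a block-distributed corner turn expressed with
    // MPI_Alltoall. Sizes and layout are invented for the example.
    #include <mpi.h>
    #include <vector>
    #include <cstdio>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank = 0, np = 1;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &np);

        const int N = 8 * np;        // global N x N matrix, N divisible by np
        const int rows = N / np;     // row block owned before the corner turn
        const int cols = N / np;     // column block owned after the corner turn

        // Local row block, row-major, filled with this rank's id so the exchange is visible.
        std::vector<double> byRows(rows * N, double(rank));
        std::vector<double> sendbuf(byRows.size()), recvbuf(byRows.size());

        // Pack: copy the sub-block of columns destined for each rank into a
        // contiguous region, as MPI_Alltoall expects.
        for (int d = 0; d < np; ++d)
            for (int r = 0; r < rows; ++r)
                for (int c = 0; c < cols; ++c)
                    sendbuf[(d * rows + r) * cols + c] = byRows[r * N + d * cols + c];

        // The data reorganization itself: every rank exchanges a rows x cols tile
        // with every other rank.
        MPI_Alltoall(sendbuf.data(), rows * cols, MPI_DOUBLE,
                     recvbuf.data(), rows * cols, MPI_DOUBLE, MPI_COMM_WORLD);

        // recvbuf now holds this rank's column block as np row-major tiles (one per
        // source rank); a final local copy would put it in whatever layout the next
        // stage expects.
        if (rank == 0) std::printf("corner turn complete on %d ranks\n", np);
        MPI_Finalize();
        return 0;
    }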

VSIPL++ Productivity Examples
BLAS zherk routine (BLAS = Basic Linear Algebra Subprograms). For a Hermitian matrix M, conjug(M) = M^T. zherk performs a rank-k update of a Hermitian matrix C: C <- alpha*A*conjug(A)^T + beta*C.

VSIPL code:
    A = vsip_cmcreate_d(10,15,VSIP_ROW,MEM_NONE);
    C = vsip_cmcreate_d(10,10,VSIP_ROW,MEM_NONE);
    tmp = vsip_cmcreate_d(10,10,VSIP_ROW,MEM_NONE);
    vsip_cmprodh_d(A,A,tmp);        /* A*conjug(A)^T */
    vsip_rscmmul_d(alpha,tmp,tmp);  /* alpha*A*conjug(A)^T */
    vsip_rscmmul_d(beta,C,C);       /* beta*C */
    vsip_cmadd_d(tmp,C,C);          /* alpha*A*conjug(A)^T + beta*C */
    vsip_cblockdestroy(vsip_cmdestroy_d(tmp));
    vsip_cblockdestroy(vsip_cmdestroy_d(C));
    vsip_cblockdestroy(vsip_cmdestroy_d(A));

VSIPL++ code (also parallel):
    Matrix<complex<double> > A(10,15);
    Matrix<complex<double> > C(10,10);
    C = alpha * prodh(A,A) + beta * C;

VSIPL++ code will be smaller and easier to understand.
Sonar example: a K-W beamformer converted from C VSIPL to VSIPL++ used 2.5x fewer SLOCs.

PVL PowerPC AltiVec Experiments: Results
Benchmarked expressions: A=B+C, A=B+C*D, A=B+C*D+E*F, A=B+C*D+E/F.
The hand-coded loop achieves good performance, but is problem-specific and low level.
Optimized VSIPL performs well for simple expressions, worse for more complex expressions.
PETE-style array operators perform almost as well as the hand-coded loop and are general, can be composed, and are high level.
Software technology compared:
AltiVec loop (C): direct use of AltiVec extensions; assumes unit stride; assumes vector alignment.
Vendor-optimized VSIPL (C): AltiVec-aware VSIPro Core Lite (www.mpi-softtech.com); no multiply-add; cannot assume unit stride; cannot assume vector alignment.
PETE with AltiVec (C++): PETE operators; indirect use of AltiVec extensions; assumes unit stride; assumes vector alignment.
C for loop (baseline).
VSIPL++ code can also take better advantage of special hardware features.
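
PETE-style operators get their efficiency from C++ expression templates. The following minimal sketch uses invented class names (Vec, Add, Mul) rather than PETE's or VSIPL++'s actual types, to show the core idea: operator+ and operator* build lightweight expression objects, and the assignment walks the whole expression once per element, so a compound expression such as A = B + C*D runs as a single fused loop with no temporaries.

    // Minimal expression-template sketch (invented names, not the PETE/VSIPL++ API).
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    template <class L, class R>
    struct Add {
        const L& l; const R& r;
        double operator[](std::size_t i) const { return l[i] + r[i]; }
    };

    template <class L, class R>
    struct Mul {
        const L& l; const R& r;
        double operator[](std::size_t i) const { return l[i] * r[i]; }
    };

    struct Vec {
        std::vector<double> data;
        explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
        double  operator[](std::size_t i) const { return data[i]; }
        double& operator[](std::size_t i)       { return data[i]; }

        // Assigning any expression evaluates it element by element in one pass.
        template <class Expr>
        Vec& operator=(const Expr& e) {
            for (std::size_t i = 0; i < data.size(); ++i) data[i] = e[i];
            return *this;
        }
    };

    // Operators only build expression objects; no arithmetic happens here.
    template <class L, class R> Add<L, R> operator+(const L& l, const R& r) { return {l, r}; }
    template <class L, class R> Mul<L, R> operator*(const L& l, const R& r) { return {l, r}; }

    int main() {
        const std::size_t n = 8;
        Vec A(n), B(n, 1.0), C(n, 2.0), D(n, 3.0);
        A = B + C * D;                      // one fused loop: A[i] = B[i] + C[i]*D[i]
        std::printf("A[0] = %g\n", A[0]);   // prints 7
        return 0;
    }

Because the whole expression is a type known at compile time, the compiler can inline the per-element calls, which is consistent with the slide's observation that PETE operators track the hand-coded loop.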

Parallel Pipeline Mapping
Signal processing algorithm: Filter (XOUT = FIR(XIN)), then Beamform (XOUT = w*XIN), then Detect (XOUT = |XIN| > c), mapped onto a parallel computer - data parallel within stages, task/pipeline parallel across stages.
Parallel Mapping Challenge: mapping is the process of deciding how an algorithm is assigned to elements of the hardware. In this example a three-stage pipeline is mapped to distinct sets of processors in a parallel computer.
Preserving the software investment requires the ability to write software that will run efficiently on a variety of computing systems without having to be rewritten. The primary technical challenge in writing parallel software is how to assign different parts of the algorithm to processors in the parallel computing hardware. This process is referred to as "mapping" (see figure above) and is analogous to the "place and route" problem encountered by hardware designers. Thus, for a program to run on a variety of systems it must be map independent (i.e., the algorithm description does not depend upon the details of the parallel hardware). To achieve this level of hardware independence it is most efficient to write applications on top of a library built of hardware-independent pieces. The library, or middleware, that we have developed to provide this capability is called the Parallel Vector Library (PVL). A sketch of the stage-to-processor-set assignment follows this slide.
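
To make the task/pipeline-parallel mapping concrete, here is a small sketch in plain MPI (not PVL or ||VSIPL++, which hide this plumbing behind maps); the three-stage split and the fraction of processors given to each stage are arbitrary choices made for illustration.

    // Sketch only: three pipeline stages (filter, beamform, detect) mapped onto
    // disjoint processor sets with MPI_Comm_split.
    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank = 0, np = 1;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &np);

        // Assign the first half of the ranks to filtering, the next quarter to
        // beamforming, and the rest to detection (an arbitrary example split).
        int stage;
        if (rank < np / 2)          stage = 0;   // filter:   XOUT = FIR(XIN)
        else if (rank < 3 * np / 4) stage = 1;   // beamform: XOUT = w * XIN
        else                        stage = 2;   // detect:   XOUT = |XIN| > c

        // Each stage gets its own communicator for data-parallel work within the
        // stage; data would still flow between stages over MPI_COMM_WORLD.
        MPI_Comm stageComm;
        MPI_Comm_split(MPI_COMM_WORLD, stage, rank, &stageComm);

        int stageRank = 0, stageSize = 1;
        MPI_Comm_rank(stageComm, &stageRank);
        MPI_Comm_size(stageComm, &stageSize);
        std::printf("world rank %d -> stage %d (rank %d of %d)\n",
                    rank, stage, stageRank, stageSize);

        MPI_Comm_free(&stageComm);
        MPI_Finalize();
        return 0;
    }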

Scalable Approach: Lincoln Parallel Vector Library (PVL)
Single-processor and multi-processor mappings of A = B + C use the same code:

    #include <Vector.h>
    #include <AddPvl.h>

    void addVectors(aMap, bMap, cMap) {
      Vector< Complex<Float> > a('a', aMap, LENGTH);
      Vector< Complex<Float> > b('b', bMap, LENGTH);
      Vector< Complex<Float> > c('c', cMap, LENGTH);
      b = 1;
      c = 2;
      a = b + c;
    }

A scalable approach separates mapping from the algorithm:
Single-processor and multi-processor code are the same.
Maps can be changed without changing software.
High-level code is compact.

Outline
Introduction
Demonstration
Development
Applied Research (Fault Tolerance; Parallel Specification; Hybrid Architectures (see SBR))
Future Challenges
Summary

Outline Slide: Applied research being conducted on fault tolerance, parallel specification, and hybrid architectures

Dynamic Mapping for Fault Tolerance
Diagram: an input task and an output task exchange XIN/XOUT through maps (Map0, Map2) over node sets {0,2}, {0,1}, and {1,3}; on a failure, work is remapped to spare nodes of the parallel processor.
The separation of mapping from the algorithm allows the application to more easily adapt to changing hardware by modifying the mapping at runtime. Remapping a matrix to spare processors allows the program to react to failed processors without requiring any fundamental change in the signal processing algorithm.
The fundamental goal of ||VSIPL++ is to be able to write software that is independent of the underlying hardware, including when the hardware changes because of failure. By abstracting the hardware details, it is possible to implement a variety of fault tolerance strategies without having to modify the core signal processing algorithm. For example, if a matrix was built on one set of processors, it is not difficult to destroy the matrix and rebuild it on another set of processors by simply providing the matrix with a new map (see figure above).
Switching processors is accomplished by switching maps; no change to the algorithm is required. Requirements for ||VSIPL++ are being developed. A small sketch of the remapping idea follows this slide.
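
A minimal sketch of the remapping idea, using invented types rather than the ||VSIPL++ or PVL API: the distributed object carries its map (the list of nodes it lives on), and reacting to a failure amounts to handing it a new map built from spare nodes, while the algorithm code that uses the object stays unchanged.

    // Sketch only (invented types, not the ||VSIPL++ or PVL API).
    #include <algorithm>
    #include <cstdio>
    #include <vector>

    struct Map {
        std::vector<int> nodes;   // processor ids the object is spread across
    };

    struct DistMatrix {
        Map map;
        // ... distributed data blocks would live here ...
        void remap(const Map& newMap) {
            // A real library would redistribute the data; here we only record
            // the new placement.
            map = newMap;
        }
    };

    // Replace a failed node with an available spare and rebuild the map.
    Map repairMap(const Map& current, int failedNode, std::vector<int>& spares) {
        Map repaired = current;
        auto it = std::find(repaired.nodes.begin(), repaired.nodes.end(), failedNode);
        if (it != repaired.nodes.end() && !spares.empty()) {
            *it = spares.back();
            spares.pop_back();
        }
        return repaired;
    }

    int main() {
        std::vector<int> spares = {4, 5};
        DistMatrix X;
        X.map.nodes = {0, 1};     // matrix initially mapped to nodes 0 and 1

        int failedNode = 1;       // pretend node 1 has failed
        X.remap(repairMap(X.map, failedNode, spares));

        std::printf("matrix now mapped to nodes:");
        for (int n : X.map.nodes) std::printf(" %d", n);
        std::printf("\n");        // prints: 0 5
        return 0;
    }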

Parallel Specification: Clutter Calculation (Linux Cluster)

    % Initialize
    pMATLAB_Init; Ncpus=comm_vars.comm_size;
    % Map X to first half and Y to second half.
    mapX=map([1 Ncpus/2],{},[1:Ncpus/2]);
    mapY=map([Ncpus/2 1],{},[Ncpus/2+1:Ncpus]);
    % Create arrays.
    X = complex(rand(N,M,mapX),rand(N,M,mapX));
    Y = complex(zeros(N,M,mapY));
    % Initialize coefficients
    coefs = ...
    weights = ...
    % Parallel filter + corner turn.
    Y(:,:) = conv2(coefs,X);
    % Parallel matrix multiply.
    Y(:,:) = weights*Y;
    % Finalize pMATLAB and exit.
    pMATLAB_Finalize; exit;

Parallel performance chart: speedup versus number of processors.
Matlab is the main specification language for signal processing; pMatlab allows parallel specification using the same mapping constructs being developed for ||VSIPL++.

Outline
Introduction
Demonstration
Development
Applied Research
Future Challenges
Summary

Outline Slide: Future challenges to be addressed by the HPEC-SI program

Optimal Mapping of Complex Algorithms
Application stages: input (XIN); low-pass filter (XIN, W1, FIR1; W2, FIR2, XOUT); beamform (XIN, W3, mult, XOUT); matched filter (XIN, W4, FFT, IFFT, XOUT).
Hardware targets with different optimal maps: Intel cluster, workstation, embedded multi-computer, embedded board, PowerPC cluster.
Optimal Mapping Challenge: real applications can easily involve several stages with potentially thousands of mapped vectors, matrices, and computations. The maps that produce the best performance will change depending on the hardware. Automated capabilities are necessary to generate maps and to determine which maps are best.
PVL allows algorithm concerns to be separated from mapping concerns. However, the issue of determining the mapping of an application has not changed. As software is moved from one piece of hardware to another, the mapping will need to change (see figure above). The current approach is for humans with knowledge of both the algorithm and the parallel computer to use intuition and simple rules to try to estimate the best mapping for a particular application on particular hardware. This method is time consuming, labor intensive, and inaccurate, and it does not guarantee an optimal solution. The generation of optimal mappings is an important problem because without this key piece of information it is not possible to write truly portable parallel software. To address this problem, we have developed a framework called Self-optimizing Software for Signal Processing (S3P).
Need to automate the process of mapping algorithms to hardware. A small sketch of the measure-and-select idea follows this slide.
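
The S3P internals are not shown in this talk, so the sketch below only illustrates the general idea behind automated mapping described above: treat each candidate map as a runnable configuration, measure it on the target hardware, and keep the fastest. The candidate names and stand-in workloads are invented for the example.

    // Sketch only (not the actual S3P framework): pick the best mapping by
    // timing each candidate configuration and keeping the fastest.
    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <functional>
    #include <numeric>
    #include <string>
    #include <vector>

    struct Candidate {
        std::string name;             // e.g. how stages are split across processors
        std::function<void()> run;    // the pipeline executed under that mapping
    };

    int main() {
        // Stand-in workloads with different costs, representing different mappings.
        auto work = [](std::size_t n) {
            std::vector<double> v(n, 1.0);
            volatile double sum = std::accumulate(v.begin(), v.end(), 0.0);
            (void)sum;
        };
        std::vector<Candidate> candidates = {
            {"4-2-2 split", [&] { work(4000000); }},
            {"2-4-2 split", [&] { work(6000000); }},
            {"2-2-4 split", [&] { work(3000000); }},
        };

        std::string bestName;
        double bestMs = 1e300;
        for (const auto& c : candidates) {
            auto t0 = std::chrono::steady_clock::now();
            c.run();                  // benchmark this mapping on the target hardware
            auto t1 = std::chrono::steady_clock::now();
            double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
            std::printf("%s: %.3f ms\n", c.name.c_str(), ms);
            if (ms < bestMs) { bestMs = ms; bestName = c.name; }
        }
        std::printf("selected mapping: %s\n", bestName.c_str());
        return 0;
    }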

HPEC-SI Future Challenges
The HPEC-SI 5-year plan ends after Phase 3; future phases are being planned to leverage the HPEC-SI technology transition model.
Phase 3 - Demonstration: Unified Computation/Communication Library (Parallel VSIPL++); Development: Fault tolerance; Applied Research: Hybrid architectures. Goals: unified computation/communication standard, demonstrate scalability.
Phase 4 - Demonstration: Fault tolerance (FT VSIPL); Development: Hybrid architectures; Applied Research: PCA/Self-optimization. Goals: demonstrate fault tolerance, increased reliability (Fault Tolerant VSIPL).
Phase 5 - Demonstration: Hybrid architectures prototype (Hybrid VSIPL); Development: Self-optimization prototype; Applied Research: Higher-level languages (Java?). Goals: portability across architectures, RISC/FPGA transparency.

Summary
The HPEC-SI program is on track toward changing software practice in DoD HPEC signal and image processing.
Outside funding obtained for DoD program-specific activities (on top of the core HPEC-SI effort).
1st demo completed; 2nd demo selected.
World's first parallel, object-oriented standard.
Applied research into task/pipeline parallelism, fault tolerance, and parallel specification.
Keys to success:
Program office support.
A 5-year time horizon is a better match to DoD program development.
Quantitative goals for portability, productivity, and performance.
Engineering community support.
Summary of HPEC-SI program accomplishments and keys to success.

Web Links
High Performance Embedded Computing Workshop: http://www.ll.mit.edu/HPEC
High Performance Embedded Computing Software Initiative: http://www.hpec-si.org/
Vector, Signal, and Image Processing Library: http://www.vsipl.org/
MPI Software Technologies, Inc.: http://www.mpi-softtech.com/
Data Reorganization Initiative: http://www.data-re.org/
CodeSourcery, LLC: http://www.codesourcery.com/
MatlabMPI: http://www.ll.mit.edu/MatlabMPI
Important HPEC-SI web links.