Autonomously Adaptive Computing: Coping with Scalability, Reliability, and Dynamism in Future Generations of Computing Roman Lysecky Department of Electrical.

1 Autonomously Adaptive Computing: Coping with Scalability, Reliability, and Dynamism in Future Generations of Computing
Roman Lysecky
Department of Electrical and Computer Engineering, University of Arizona
Collaborators: Frank Vahid, Greg Stitt

2 Introduction Past & Present: Standard Software Binaries
Software Binaries of the Past
  Binary directly related to processor’s ISA
  Limited portability
  Instructions were executed as specified and in order
Current Software Binary
  Specifies application functionality but is not specific to the underlying processor architecture
  New architectures can be developed for existing applications
Figure: SW Application -> Compiler -> SW Binary -> Processor Architecture
Kavli-NNIN Symposium on Computing Challenges, Cornell, October 12-14, 2008

3 Introduction Past & Present: Standard Software Binaries
Standard SW Binary
  Enabled an ecosystem of applications, compilers, and architectures
  Provided separation of concerns
Applications: Developers can focus on the application
  Choose an appropriate programming language to capture functionality
Architectures: Focus on improving existing architectures and developing new ones that execute the SW binary better
Compilers: Focus on optimizing the application for a specific architecture
Figure: SW Application -> Compiler -> SW Binary -> Processor Architecture

4 Introduction Past & Present: Standard Software Binaries
Processor Architectures
  Many alternative architectures can exist for a given standard binary
Current Software Binary
  Specifies application functionality in a well-defined manner but is not specific to the underlying processor architecture
  New architectures can be developed for existing applications
Figure: SW Application -> Compiler -> SW Binary -> Processor Architecture (VLIW, SuperScalar)

5 Reconfigurable Computing Past & Present: FPGAs
Field Programmable Gate Arrays (FPGAs)
  Reconfigurable device that can implement any circuit simply by downloading bits
  Basic logic elements are N-input look-up tables (LUTs) and flip-flops
    LUT: small memory with N address inputs (2^N entries) that stores the truth table for any N-input logic function
  Arrays of LUTs and routing elements, called switch matrices (SMs)
Figure: LUT (labels a, b, c, d, e, f, o1, o2, o3, o4) and FPGA fabric (SM, CLB)
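To make the LUT description concrete, here is a minimal sketch of a 4-input LUT modeled in C. The names `lut4_t`, `lut4_program`, `lut4_eval`, and `parity4` are hypothetical and not from the talk. The sketch only illustrates that "configuring" a LUT means filling a 2^N-entry truth table and that evaluating it is a memory lookup addressed by the inputs.

```c
#include <assert.h>
#include <stdint.h>

/* A 4-input LUT modeled as a 16-entry truth table packed into 16 bits.
 * Bit i of `table` holds the output for input combination i. */
typedef struct {
    uint16_t table;
} lut4_t;

/* "Configure" the LUT by evaluating a reference function on all 16 inputs. */
lut4_t lut4_program(int (*fn)(unsigned a, unsigned b, unsigned c, unsigned d))
{
    assert(fn);
    lut4_t lut = { 0 };
    for (unsigned i = 0; i < 16; i++) {
        unsigned a = i & 1u, b = (i >> 1) & 1u;
        unsigned c = (i >> 2) & 1u, d = (i >> 3) & 1u;
        if (fn(a, b, c, d))
            lut.table |= (uint16_t)(1u << i);
    }
    return lut;
}

/* Evaluate: the inputs form the address, the stored bit is the output. */
unsigned lut4_eval(lut4_t lut, unsigned a, unsigned b, unsigned c, unsigned d)
{
    unsigned addr = (a & 1u) | ((b & 1u) << 1) | ((c & 1u) << 2) | ((d & 1u) << 3);
    return (lut.table >> addr) & 1u;
}

/* Example logic function: 4-input XOR (parity). */
int parity4(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return (int)(a ^ b ^ c ^ d);
}
```

Any other 4-input function can be loaded the same way. Only the 16 stored bits change, which is what "downloading bits" amounts to for a single LUT.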

6 Reconfigurable Computing Past & Present: FPGAs
FPGAs are sometimes better than microprocessors
  Provide concurrency from the bit level to the application level
C Code for Bit Reversal (original X value in, bit-reversed X value out)
    x = (x >> 16) | (x << 16);
    x = ((x >> 8) & 0x00ff00ff) | ((x << 8) & 0xff00ff00);
    x = ((x >> 4) & 0x0f0f0f0f) | ((x << 4) & 0xf0f0f0f0);
    x = ((x >> 2) & 0x33333333) | ((x << 2) & 0xcccccccc);
    x = ((x >> 1) & 0x55555555) | ((x << 1) & 0xaaaaaaaa);
Compilation -> SW Binary (Processor): requires between 32 and 128 cycles
    sll $v1[3],$v0[2],0x10
    srl $v0[2],$v0[2],0x10
    or $v0[2],$v1[3],$v0[2]
    srl $v1[3],$v0[2],0x8
    and $v1[3],$v1[3],$t5[13]
    sll $v0[2],$v0[2],0x8
    and $v0[2],$v0[2],$t4[12]
    srl $v1[3],$v0[2],0x4
    and $v1[3],$v1[3],$t3[11]
    sll $v0[2],$v0[2],0x4
    and $v0[2],$v0[2],$t2[10]
    ...
Synthesis -> Circuit for Bit Reversal (FPGA): requires only 1 cycle (speedup of 32x to 128x)
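The bit-reversal steps from this slide can be wrapped into a compilable function, as sketched below. The masks 0x33333333 and 0x55555555 are assumed here as the complements of 0xcccccccc and 0xaaaaaaaa. The names `reverse_bits`, `reverse_bits_naive`, and `check_reverse_bits` are hypothetical. The bit-at-a-time reference is included only to check the result.

```c
#include <assert.h>
#include <stdint.h>

/* Swap-based 32-bit reversal: halves, bytes, nibbles, bit pairs, single bits. */
uint32_t reverse_bits(uint32_t x)
{
    x = (x >> 16) | (x << 16);
    x = ((x >> 8) & 0x00ff00ffu) | ((x << 8) & 0xff00ff00u);
    x = ((x >> 4) & 0x0f0f0f0fu) | ((x << 4) & 0xf0f0f0f0u);
    x = ((x >> 2) & 0x33333333u) | ((x << 2) & 0xccccccccu);
    x = ((x >> 1) & 0x55555555u) | ((x << 1) & 0xaaaaaaaau);
    return x;
}

/* Reference version: shifts one bit per iteration from x into r. */
uint32_t reverse_bits_naive(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

/* Self-check: both versions agree on a few sample values. */
void check_reverse_bits(void)
{
    const uint32_t samples[] = { 0u, 1u, 0x12345678u, 0xdeadbeefu, 0xffffffffu };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(reverse_bits(samples[i]) == reverse_bits_naive(samples[i]));
}
```

In software each line costs several shift, mask, and OR instructions. In a circuit the same permutation is only a rewiring of the 32 signals, which is why the hardware version fits in a single cycle.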

7 Reconfigurable Computing Past & Present: FPGAs
Figure: SW Application -> Compiler -> SW Binary -> Processor (software); HW Circuit -> Synthesis -> Bitstream -> FPGA (hardware)
An FPGA can implement “circuits” simply by downloading “software” bits

8 Reconfigurable Computing Past & Present: FPGAs
FPGAs can be combined with microprocessors
Benefits of HW/SW Partitioning
  Speedup of 2X to 10X
    1000X possible for some highly parallelizable applications
  Energy reduction of 25% to 95%
Figure: Software Application (C/C++) -> Application Profiling -> Critical Kernels -> Partitioning -> SW on µP (I$, D$), HW on COPROCESSOR (FPGA)
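One way to see why moving only the critical kernels to hardware yields overall speedups in this range is Amdahl's law. The sketch below is not from the talk. The function name `partitioned_speedup` and the example numbers are hypothetical, and communication overhead between µP and FPGA is ignored.

```c
#include <assert.h>

/* Overall speedup when a fraction `kernel_fraction` of the software execution
 * time moves to hardware that runs it `kernel_speedup` times faster.
 * Assumption: no µP/FPGA communication overhead (Amdahl's law). */
double partitioned_speedup(double kernel_fraction, double kernel_speedup)
{
    assert(kernel_fraction >= 0.0 && kernel_fraction <= 1.0);
    assert(kernel_speedup > 0.0);
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup);
}
```

For example, if the kernels account for 90% of execution time and run 10 times faster in hardware, `partitioned_speedup(0.9, 10.0)` is about 5.3. The part left in software caps the overall gain, so even an infinitely fast circuit for a 50% kernel cannot reach 2X.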

9 Reconfigurable Computing Past & Present: FPGAs
Why aren’t FPGAs common?
  Programmability
  Bitstream not standardized
Solution: Hide the FPGA from the application developer
  Just as the underlying processor architecture is hidden
Figure: Software Application (C/C++) -> Application Profiling -> Critical Kernels -> Partitioning -> SW on µP (I$, D$), HW on COPROCESSOR (FPGA)

10 Adaptive Computing Present: Warp Processing

11 Adaptive Computing Present: Warp Processing
Figure: warp processor architecture (µP, I$, D$, Profiler, W-FPGA, On-chip CAD) and on-chip CAD flow. Labels: SW Binary, Decompilation, Partitioning, RT Synthesis, JIT FPGA Compilation (Logic Synthesis, Tech. Mapping/Packing, Placement, Routing), Circuit, Bitstream, Binary Update, Updated SW Binary

12 Adaptive Computing Present: Warp Processing
Warp Processing: Adaptive Computing
  Embeds compiler/synthesis within the architecture
  Autonomously adapts/optimizes the software binary at runtime to improve performance or reduce power consumption
Performance-Driven Warp Processing (Low End)
  Goal: Maximize application performance over software execution
  Target: Low to mid-range embedded processors
    E.g., MHz ARM processor
  Average speedup of 7.4X across several embedded benchmark applications
Figure: warp processor (µP, I$, D$, Profiler, W-FPGA, On-chip CAD)
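The slides name a Profiler block but do not say how it finds the kernels to move to the W-FPGA. As a hedged illustration only, the sketch below assumes a profiler that counts backward branches per loop start address and reports the most frequent one. The table size, type, and function names (`MAX_LOOPS`, `profiler_t`, `profiler_record`, `profiler_hottest`) are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LOOPS 8  /* hypothetical size of the profiler table */

typedef struct {
    uint32_t branch_target[MAX_LOOPS]; /* loop start addresses seen so far */
    uint32_t count[MAX_LOOPS];         /* backward branches taken to each */
    size_t   used;                     /* number of valid table entries */
} profiler_t;

/* Record one backward branch to `target`.
 * New targets are dropped once the table is full. */
void profiler_record(profiler_t *p, uint32_t target)
{
    assert(p);
    for (size_t i = 0; i < p->used; i++) {
        if (p->branch_target[i] == target) {
            p->count[i]++;
            return;
        }
    }
    if (p->used < MAX_LOOPS) {
        p->branch_target[p->used] = target;
        p->count[p->used] = 1;
        p->used++;
    }
}

/* Return the start address of the most frequent loop, or 0 if none was seen. */
uint32_t profiler_hottest(const profiler_t *p)
{
    assert(p);
    uint32_t best_target = 0, best_count = 0;
    for (size_t i = 0; i < p->used; i++) {
        if (p->count[i] > best_count) {
            best_count = p->count[i];
            best_target = p->branch_target[i];
        }
    }
    return best_target;
}
```

In this sketch the address returned by `profiler_hottest` would be the starting point handed to the on-chip CAD tools for decompilation and synthesis.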

13 Adaptive Computing Present: Warp Processing
Performance-Driven Warp Processing (High End)
  Target: High-end 624 MHz XScale processor
  Average speedup of 2.5X compared to the 624 MHz XScale processor
  Maximum speedup of 6X
Figure: warp processor (µP, I$, D$, Profiler, W-FPGA, On-chip CAD); speedup chart

14 Adaptive Computing Past & Present: Warp Processing
Low-Power Warp Processing
  Goal: Reduce overall power consumption without any degradation in performance
  Leverage dynamic voltage/frequency scaling of processor and FPGA
Figure: warp processor (µP, I$, D$, Profiler, W-FPGA, On-chip CAD); power and performance timeline with numbered steps 1-7. Labels: SW EXECUTION (µP), PROFILE, ON-CHIP CAD, WARPED HW/SW EXECUTION (µP/FPGA), µP V/FREQ SCALING, FPGA FREQ SCALING, LOW-POWER WARPED EXECUTION, POWER REDUCTION, NO PERF. DECREASE

15 Adaptive Computing Past & Present: Warp Processing
Low-Power Warp Processing
  Goal: Reduce overall power consumption without any degradation in performance
  Leverage dynamic voltage/frequency scaling of processor and FPGA
  Average reduction in power consumption of 74%
  Maximum reduction of 97%
Figure: warp processor (µP, I$, D$, Profiler, W-FPGA, On-chip CAD); power reduction chart
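The idea of trading a speedup for lower power can be illustrated with the standard dynamic power model P = C * V^2 * f. The sketch below is an assumption-laden simplification, not the method behind the reported numbers. It ignores static power and the FPGA's own power, and assumes execution time scales inversely with clock frequency. The function names are hypothetical.

```c
#include <assert.h>

/* Frequency scale that brings a system running `speedup` times faster
 * back to the original software performance.
 * Assumption: execution time is inversely proportional to frequency. */
double freq_scale_for_speedup(double speedup)
{
    assert(speedup >= 1.0);
    return 1.0 / speedup;
}

/* Dynamic power relative to the unscaled system, using P = C * V^2 * f.
 * Both arguments are ratios of the scaled value to the original value. */
double relative_dynamic_power(double voltage_scale, double freq_scale)
{
    assert(voltage_scale > 0.0 && freq_scale > 0.0);
    return voltage_scale * voltage_scale * freq_scale;
}
```

With a hypothetical 2X warped speedup, the clock can run at half the original frequency with no performance loss. If the lower frequency also allows the voltage to drop to 80% (again a made-up figure), `relative_dynamic_power(0.8, 0.5)` gives 0.32, a 68% reduction in dynamic power under this model.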

16 Adaptive Computing Present: Warp Processing
Maintains the ecosystem supported by the standard software binary
  Optimizes software binary execution without developer effort, or even knowledge thereof
Builds upon the software binary concept to leverage the benefits of FPGAs
  For those applications where FPGAs are beneficial
  Optimized at runtime, where additional information may be known
Figure: warp processor (µP, I$, D$, Profiler, W-FPGA, On-chip CAD)

18 Adaptive Computing Future: How can adaptive computing be used to solve future challenges?
Adaptive Computing
  Any computing system that can modify its execution at runtime
  How the system modifies its execution defines the system’s adaptability
How can adaptive computing be used to solve future challenges?
Figure: SW Application -> Compiler -> SW Binary -> Processor; adaptation loop: Application Binary -> Dynamically Monitor -> Adapt system? -> Compile/Optimize/Reconfigure -> Modified Application

19 Adaptive Computing Computing Challenges: Multicore and Many Core Systems
Computing systems have incorporated, and will continue to incorporate, multiple processor cores
  How can application developers best reap the rewards of these parallel implementations?
  How can compilers and architectures optimize an application?

20 Adaptive Computing Computing Challenges: Multicore and Many Core Systems
As the number of cores increases, a many-core system looks more and more like a reconfigurable computing device
  The biggest challenge moving forward is programmability

21 Adaptive Computing Computing Challenges: Multicore and Many Core Systems
Adaptive computing for multicore and many-core systems
  Adaptive computing can be used to optimize an application’s execution on a many-core device
  Application execution can be optimized to use the best resources for the current operations
Runtime recompilation, optimization, and synthesis
  Can optimize the application using additional information gathered at runtime
Figure: many-core device with heterogeneous resources (FPGA, DSP, VLIW)
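"Use the best resources for the current operations" can be pictured as a runtime selection step. The sketch below assumes the system holds a cycle estimate for running a kernel on each resource type and simply picks the smallest. The enum, the function `pick_resource`, and the idea of a per-resource estimate are illustrative assumptions, not part of the talk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical resource types, taken from the labels on the slide. */
typedef enum { RES_VLIW, RES_DSP, RES_FPGA, RES_COUNT } resource_t;

/* Pick the resource with the lowest estimated cycle count for a kernel.
 * Ties go to the resource with the lower enum value. */
resource_t pick_resource(const unsigned long est_cycles[RES_COUNT])
{
    assert(est_cycles != NULL);
    resource_t best = RES_VLIW;
    for (int r = 1; r < RES_COUNT; r++) {
        if (est_cycles[r] < est_cycles[best])
            best = (resource_t)r;
    }
    return best;
}
```

Because the estimates can be refreshed from runtime measurements, the same kernel may be mapped to a different resource as its behavior changes.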

22 Adaptive Computing Computing Challenges: Dynamism
Application behavior is becoming increasingly complex and dynamic
  Statically optimizing an application at compile time may lead to sub-optimal results
  Analyzing both the computation and the characteristics of the data being processed is needed to fully optimize application execution
    Data being processed is not known a priori
Figure: adaptation loop: Application Binary and Data Characteristics -> Dynamically Monitor -> Adapt system? -> Compile/Optimize/Reconfigure -> Modified Application

23 Adaptive Computing Computing Challenges: Reliability
Permanent and transient device failures are expected to increase as devices continue to shrink
  Electromigration
  NBTI
Individual hardware components may fail intermittently over time
  Applications should execute as if no failures occurred

24 Adaptive Computing Computing Challenges: Reliability
Self-healing Systems
  Computing systems that can detect and recover from device failures autonomously
  Different from fault-tolerant systems
Figure: healing loop: Application -> Dynamically Monitor -> Is system executing correctly? -> No -> Heal System -> Modified Application/Architecture
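The monitor/heal loop can be sketched as a small simulation. Everything here is hypothetical: a pool of interchangeable units, a simulated permanent fault that corrupts results, and a software comparison that stands in for whatever detector a real system would use. It only shows the control flow of detecting an incorrect execution and remapping to another resource.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_UNITS 3  /* hypothetical pool of interchangeable hardware units */

typedef struct {
    bool     stuck_fault[NUM_UNITS]; /* simulated permanent failure per unit */
    size_t   active;                 /* unit currently mapped to the application */
    unsigned heals;                  /* number of remaps performed so far */
} system_t;

/* Simulated hardware adder: a faulty unit flips the lowest result bit. */
int unit_add(const system_t *s, size_t unit, int a, int b)
{
    return s->stuck_fault[unit] ? (a + b) ^ 1 : a + b;
}

/* Execute, check, and remap to the next unit on a mismatch.
 * Returns true and stores the result once a unit produces a correct answer;
 * returns false if every unit in the pool has failed. */
bool self_healing_add(system_t *s, int a, int b, int *out)
{
    assert(s && out);
    for (size_t tries = 0; tries < NUM_UNITS; tries++) {
        int result = unit_add(s, s->active, a, b);
        if (result == a + b) {  /* software check stands in for a real detector */
            *out = result;
            return true;
        }
        s->active = (s->active + 1) % NUM_UNITS;  /* heal: remap to another unit */
        s->heals++;
    }
    return false;
}
```

From the caller's point of view the addition simply succeeds, which matches the goal that applications execute as if no failure occurred.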

25 Summary
Adaptive computing can autonomously optimize application execution at runtime
  Continuously and dynamically recompiles/synthesizes software implementations to hardware at runtime
  Can optimize performance, power, reliability, …
Adaptive computing will be both beneficial and necessary for future computing systems
  Lets developers focus on developing applications without worrying about underlying architectures