
Autonomously Adaptive Computing: Coping with Scalability, Reliability, and Dynamism in Future Generations of Computing
Roman Lysecky
Department of Electrical and Computer Engineering, University of Arizona
rlysecky@ece.arizona.edu
http://www.ece.arizona.edu/~embedded
Collaborators: Frank Vahid, Greg Stitt
Kavli-NNIN Symposium on Computing Challenges, Cornell, October 12-14, 2008

Introduction
Past & Present: Standard Software Binaries
Software binaries of the past: the binary was directly tied to the processor's ISA, with limited portability, and instructions were executed as specified and in order
Current software binaries: specify application functionality but are not specific to the underlying processor architecture, so new architectures can be developed for existing applications
(Figure: SW Application -> Compiler -> SW Binary -> Processor Architecture)

Introduction
Past & Present: Standard Software Binaries
The standard SW binary enabled an ecosystem of applications, compilers, and architectures, and provided a separation of concerns:
Applications: developers can focus on the application and choose the appropriate programming language to capture its functionality
Architectures: architects focus on improving and developing new architectures that execute the SW binary better
Compilers: compiler writers focus on optimizing the application for a specific architecture
(Figure: SW Application -> Compiler -> SW Binary -> Processor Architecture)

Introduction
Past & Present: Standard Software Binaries
Processor architectures: many alternative architectures can exist for a given standard binary
Current software binaries specify application functionality in a well-defined manner but are not specific to the underlying processor architecture
New architectures (e.g., VLIW, superscalar) can be developed for existing applications
(Figure: SW Application -> Compiler -> SW Binary -> VLIW or SuperScalar Processor Architecture)

Reconfigurable Computing
Past & Present: FPGAs
Field Programmable Gate Arrays (FPGAs): reconfigurable devices that can implement any circuit simply by downloading bits
The basic logic elements are N-input look-up tables (LUTs) and flip-flops
A LUT is a small N-address memory that stores the truth table of any N-input logic function
An FPGA is an array of LUTs (grouped into configurable logic blocks, CLBs) and routing elements called switch matrices (SMs)
(Figure: FPGA fabric of CLBs and switch matrices; a LUT with inputs a-f and outputs o1-o4)
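To make the LUT idea concrete, here is a minimal C sketch (not from the slides) that models a 4-input LUT as a 16-entry one-bit truth table; the type and function names are illustrative assumptions.

#include <stdint.h>
#include <stdio.h>

/* A 4-input LUT is just a 16-entry (2^4) one-bit memory: the 4 inputs
 * form an address, and the stored bit at that address is the output. */
typedef struct {
    uint16_t truth_table;  /* bit i holds the output for input pattern i */
} lut4_t;

/* Evaluate the LUT for inputs a, b, c, d (each 0 or 1). */
static int lut4_eval(const lut4_t *lut, int a, int b, int c, int d) {
    int addr = (d << 3) | (c << 2) | (b << 1) | a;
    return (lut->truth_table >> addr) & 1;
}

int main(void) {
    /* Configure the LUT to implement f = a AND b (c and d are ignored):
     * the output is 1 only when the two low address bits are both 1. */
    lut4_t and_lut = { .truth_table = 0 };
    for (int addr = 0; addr < 16; addr++)
        if ((addr & 0x3) == 0x3)
            and_lut.truth_table |= (uint16_t)(1u << addr);

    printf("a=1,b=1 -> %d\n", lut4_eval(&and_lut, 1, 1, 0, 0)); /* prints 1 */
    printf("a=1,b=0 -> %d\n", lut4_eval(&and_lut, 1, 0, 0, 0)); /* prints 0 */
    return 0;
}

Any 4-input function can be implemented the same way by changing only the stored truth table, which is exactly what downloading FPGA configuration bits does.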

Reconfigurable Computing
Past & Present: FPGAs
FPGAs are sometimes better than microprocessors: they provide concurrency from the bit level up to the application level
Example: bit reversal of a 32-bit value x (the circuit takes the original x value and outputs the bit-reversed x value)

C code for bit reversal:
...
x = (x >> 16) | (x << 16);
x = ((x >> 8) & 0x00ff00ff) | ((x << 8) & 0xff00ff00);
x = ((x >> 4) & 0x0f0f0f0f) | ((x << 4) & 0xf0f0f0f0);
x = ((x >> 2) & 0x33333333) | ((x << 2) & 0xcccccccc);
x = ((x >> 1) & 0x55555555) | ((x << 1) & 0xaaaaaaaa);

Compilation for a processor yields a SW binary of shift/mask instructions such as:
sll $v1, $v0, 0x10
srl $v0, $v0, 0x10
or  $v0, $v1, $v0
srl $v1, $v0, 0x8
and $v1, $v1, $t5
sll $v0, $v0, 0x8
and $v0, $v0, $t4
...
This requires between 32 and 128 cycles on the processor
Synthesis for the FPGA yields a bit-reversal circuit that requires only 1 cycle (a speedup of 32x to 128x)
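For completeness, the snippet above can be wrapped into a runnable routine; this is a minimal sketch in which the function name reverse_bits32 and the test values are ours, not from the slides.

#include <stdint.h>
#include <stdio.h>

/* Reverse the bit order of a 32-bit word using the slide's five
 * shift-and-mask steps (swap halves, bytes, nibbles, pairs, single bits). */
static uint32_t reverse_bits32(uint32_t x) {
    x = (x >> 16) | (x << 16);
    x = ((x >> 8) & 0x00ff00ffu) | ((x << 8) & 0xff00ff00u);
    x = ((x >> 4) & 0x0f0f0f0fu) | ((x << 4) & 0xf0f0f0f0u);
    x = ((x >> 2) & 0x33333333u) | ((x << 2) & 0xccccccccu);
    x = ((x >> 1) & 0x55555555u) | ((x << 1) & 0xaaaaaaaau);
    return x;
}

int main(void) {
    printf("0x%08x\n", reverse_bits32(0x00000001u)); /* prints 0x80000000 */
    printf("0x%08x\n", reverse_bits32(0x12345678u)); /* prints 0x1e6a2c48 */
    return 0;
}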

Reconfigurable Computing
Past & Present: FPGAs
The software and hardware flows mirror each other: a SW application is compiled into a SW binary that runs on a processor, while a HW circuit is synthesized into a bitstream that configures an FPGA
An FPGA can therefore implement "circuits" simply by downloading "software" bits

Reconfigurable Computing
Past & Present: FPGAs
FPGAs can be combined with microprocessors: the software application (C/C++) is profiled, its critical kernels are identified, and the application is partitioned between the microprocessor (SW) and an FPGA coprocessor (HW)
Benefits of HW/SW partitioning:
Speedup of 2X to 10X, with 1000X possible for some highly parallelizable applications
Energy reduction of 25% to 95%
(Figure: µP with I$/D$ coupled to an FPGA coprocessor)

Reconfigurable Computing Past & Present: FPGAs Why aren’t FPGAs common? Programmability Bitstream not standardized Solution: Hide FPGA from application developer Just like the underlying processor architecture is hidden Software Application (C/C++) Application Profiling Critical Kernels Partitioning µP I$ D$ COPROCESSOR (FPGA) HW SW Kavli-NNIN Symposium on Computing Challenges, Cornell, October 12-14, 2008

Adaptive Computing
Present: Warp Processing
1. The application initially executes on the microprocessor
2. The profiler dynamically detects the application's kernels
3. The on-chip CAD maps the kernels onto the FPGA
4. The FPGA is configured and the application binary is updated
5. The warped execution is 2-100X faster, or consumes 75% less power
(Figure: µP with I$/D$, profiler, on-chip CAD, and warp FPGA (W-FPGA); a sketch of this loop follows)
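A minimal C sketch of the runtime behavior implied by these five steps; the kernel names, profile percentages, and the functions onchip_cad_synthesize and configure_fpga_and_patch_binary are hypothetical stand-ins, not the actual Warp Processing interfaces.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of one application kernel as seen by the on-chip profiler. */
typedef struct {
    const char *name;
    int exec_fraction_pct;   /* % of execution time spent in this kernel */
    bool in_hardware;        /* already mapped onto the warp FPGA? */
} kernel_t;

/* Stand-ins for the on-chip CAD and FPGA configuration steps (steps 3 and 4). */
static bool onchip_cad_synthesize(kernel_t *k) {
    printf("synthesizing %s\n", k->name);
    return true;
}
static void configure_fpga_and_patch_binary(kernel_t *k) {
    k->in_hardware = true;
    printf("warped %s\n", k->name);
}

int main(void) {
    /* Step 2: profiler output (illustrative numbers, not measured data). */
    kernel_t kernels[] = { {"fir_filter", 62, false}, {"crc32", 21, false}, {"ui_loop", 3, false} };

    /* Steps 3-4: migrate hot kernels to the FPGA; cold code stays in software. */
    for (size_t i = 0; i < sizeof kernels / sizeof kernels[0]; i++) {
        if (kernels[i].exec_fraction_pct >= 10 && onchip_cad_synthesize(&kernels[i]))
            configure_fpga_and_patch_binary(&kernels[i]);
    }
    /* Step 5: execution now proceeds with the warped kernels in hardware. */
    return 0;
}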

Adaptive Computing
Present: Warp Processing
On-chip CAD tool flow: the SW binary is decompiled, partitioned, and passed through RT synthesis; JIT FPGA compilation (logic synthesis, technology mapping/packing, placement, and routing) then produces the circuit and its bitstream, and a binary update produces the updated SW binary
(Figure: µP with I$/D$, profiler, on-chip CAD, and W-FPGA)
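To show the ordering of the stages named above, here is a minimal C sketch in which every stage is a stub; the function and type names are illustrative assumptions, not the on-chip CAD tools' actual interfaces.

#include <stdio.h>

/* Hypothetical stage outputs; in the real flow these are the decompiled IR,
 * netlists, placed designs, and the FPGA bitstream. */
typedef struct { int dummy; } ir_t, netlist_t, placed_t, bitstream_t;

/* Stubs standing in for each CAD stage named on the slide. */
static ir_t        decompile(const char *sw_binary) { printf("decompile %s\n", sw_binary); return (ir_t){0}; }
static ir_t        partition(ir_t ir)               { puts("partition critical kernels"); return ir; }
static netlist_t   rt_synthesis(ir_t ir)            { (void)ir; puts("RT synthesis"); return (netlist_t){0}; }
static netlist_t   logic_synthesis(netlist_t n)     { puts("logic synthesis"); return n; }
static placed_t    map_pack_place(netlist_t n)      { (void)n; puts("tech mapping/packing + placement"); return (placed_t){0}; }
static bitstream_t route(placed_t p)                { (void)p; puts("routing -> bitstream"); return (bitstream_t){0}; }
static void        update_binary(const char *sw_binary, bitstream_t b) { (void)b; printf("patch %s to invoke the new circuit\n", sw_binary); }

/* Sketch of the JIT flow from software binary to configured circuit. */
int main(void) {
    const char *binary = "app.bin";                    /* illustrative name */
    ir_t ir = partition(decompile(binary));
    netlist_t net = logic_synthesis(rt_synthesis(ir));
    bitstream_t bits = route(map_pack_place(net));
    update_binary(binary, bits);                       /* updated SW binary invokes the HW */
    return 0;
}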

Adaptive Computing
Present: Warp Processing
Warp processing is adaptive computing: it embeds the compiler/synthesis tools within the architecture and autonomously adapts/optimizes the software binary at runtime to improve performance or reduce power consumption
Performance-driven warp processing (low end):
Goal: maximize application performance over software-only execution
Target: low- to mid-range embedded processors, e.g., a 100-200 MHz ARM processor
Average speedup of 7.4X across several embedded benchmark applications

Adaptive Computing
Present: Warp Processing
Performance-driven warp processing (high end):
Target: high-end 624 MHz XScale processor
Average speedup of 2.5X, with a maximum speedup of 6X, compared to the 624 MHz XScale processor alone

Adaptive Computing
Past & Present: Warp Processing
Low-power warp processing:
Goal: reduce overall power consumption without any degradation in performance
Leverages dynamic voltage/frequency scaling of the processor and the FPGA
(Figure: execution timeline: SW execution on the µP; the on-chip CAD profiles the application and maps kernels; warped HW/SW execution on the µP/FPGA; µP voltage/frequency scaling and FPGA frequency scaling then yield low-power warped execution with a power reduction and no performance decrease relative to SW-only execution; a numeric sketch follows)
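A back-of-the-envelope C sketch of why this works, using a simplified first-order dynamic-power model (P proportional to V^2 * f) and illustrative speedup, frequency, and voltage values that are assumptions, not the slide's measured data.

#include <stdio.h>

/* After warping yields a speedup S, scale frequency (and, within the DVFS
 * range, voltage) down by roughly S so end-to-end performance matches the
 * original SW-only execution while dynamic power drops sharply. */
int main(void) {
    double speedup   = 2.0;    /* assumed speedup from warped HW/SW execution */
    double f_nominal = 200e6;  /* assumed nominal clock (Hz) */
    double v_nominal = 1.2;    /* assumed nominal supply voltage (V) */

    double f_scaled = f_nominal / speedup;
    double v_scaled = v_nominal / speedup;   /* crude assumption: V tracks f */

    double p_ratio = (v_scaled * v_scaled * f_scaled) / (v_nominal * v_nominal * f_nominal);
    printf("frequency: %.0f MHz -> %.0f MHz\n", f_nominal / 1e6, f_scaled / 1e6);
    printf("relative dynamic power: %.1f%% (%.1f%% reduction), same end-to-end performance\n",
           100.0 * p_ratio, 100.0 * (1.0 - p_ratio));
    return 0;
}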

Adaptive Computing
Past & Present: Warp Processing
Low-power warp processing:
Goal: reduce overall power consumption without any degradation in performance
Leverages dynamic voltage/frequency scaling of the processor and the FPGA
Average reduction in power consumption of 74%, with a maximum reduction of 97%

Adaptive Computing
Present: Warp Processing
Warp processing maintains the ecosystem supported by the standard software binary: the software binary's execution is optimized without developer effort, or even knowledge thereof
It builds upon the software binary concept to leverage the benefits of FPGAs for those applications where FPGAs are beneficial
Optimization occurs at runtime, where additional information may be known

Adaptive Computing
Future: How can adaptive computing be used to solve future challenges?
Adaptive computing: any computing system that can modify its execution at runtime
How the system modifies its execution defines the system's adaptability
(Figure: SW Application -> Compiler -> SW Binary -> Processor, extended with a runtime loop: dynamically monitor the application binary -> adapt system? -> compile/optimize/reconfigure -> modified application; a minimal sketch of this loop follows)
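A minimal C sketch of the monitor/adapt loop in the figure; the metrics, threshold, and function names are illustrative assumptions rather than a specific system's interface.

#include <stdbool.h>
#include <stdio.h>

/* Illustrative runtime metrics an adaptive system might monitor. */
typedef struct { double kernel_time_pct; double power_watts; } metrics_t;

/* Hypothetical stand-ins for the monitor, decision, and reconfigure steps. */
static metrics_t monitor_execution(int epoch)        { return (metrics_t){ 10.0 + 15.0 * epoch, 1.8 }; }
static bool      should_adapt(metrics_t m)           { return m.kernel_time_pct > 30.0; }
static void      compile_optimize_reconfigure(void)  { puts("re-optimizing binary / reconfiguring hardware"); }

int main(void) {
    /* The loop: monitor -> decide -> adapt. In a real system this runs
     * continuously alongside the application's execution. */
    for (int epoch = 0; epoch < 3; epoch++) {
        metrics_t m = monitor_execution(epoch);
        if (should_adapt(m))
            compile_optimize_reconfigure();
    }
    return 0;
}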

Adaptive Computing
Computing Challenges: Multicore and Many-Core Systems
Computing systems have incorporated, and will continue to incorporate, multiple processor cores
How can application developers best reap the rewards of these parallel implementations?
How can compilers and architectures optimize an application?

Adaptive Computing
Computing Challenges: Multicore and Many-Core Systems
As the number of cores increases, a many-core system looks more and more like a reconfigurable computing device
The biggest challenge moving forward is programmability

Adaptive Computing
Computing Challenges: Multicore and Many-Core Systems
Adaptive computing for multicore and many-core systems:
Adaptive computing can be utilized to optimize an application's execution on a many-core device
Application execution can be optimized to utilize the best resources (e.g., FPGA, DSP, or VLIW cores) for the current operations, as sketched below
Runtime re-compilation, optimization, and synthesis can optimize the application using additional information gathered at runtime
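A toy C sketch of resource selection on a heterogeneous many-core device; the kernel names, characterization fields, and selection policy are illustrative assumptions, and a real adaptive system would also weigh area, power, and reconfiguration cost.

#include <stdio.h>

/* Heterogeneous resources a many-core/adaptive device might offer. */
typedef enum { RES_CPU, RES_VLIW, RES_DSP, RES_FPGA } resource_t;

/* Illustrative runtime characterization of one kernel. */
typedef struct {
    const char *name;
    int ilp;             /* instruction-level parallelism observed at runtime */
    int is_streaming;    /* regular streaming data access? */
    int is_bit_level;    /* dominated by bit-level operations? */
} kernel_info_t;

/* Toy policy: pick the resource best matched to the kernel's observed behavior. */
static resource_t choose_resource(kernel_info_t k) {
    if (k.is_bit_level) return RES_FPGA;
    if (k.is_streaming) return RES_DSP;
    if (k.ilp >= 4)     return RES_VLIW;
    return RES_CPU;
}

int main(void) {
    kernel_info_t kernels[] = {
        { "crc32",      2, 0, 1 },   /* bit-level   -> FPGA */
        { "fir_filter", 3, 1, 0 },   /* streaming   -> DSP  */
        { "matrix_mul", 6, 0, 0 },   /* high ILP    -> VLIW */
    };
    const char *names[] = { "CPU", "VLIW", "DSP", "FPGA" };
    for (int i = 0; i < 3; i++)
        printf("%s -> %s\n", kernels[i].name, names[choose_resource(kernels[i])]);
    return 0;
}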

Adaptive Computing
Computing Challenges: Dynamism
Application behavior is becoming increasingly complex and dynamic, so statically optimizing an application at compile time may lead to sub-optimal results
Fully optimizing the application's execution requires analyzing both its computational aspects and the characteristics of the data being processed
The data being processed is not known a priori, as the monitoring sketch below illustrates
(Figure: runtime loop in which the data characteristics and the application binary are dynamically monitored, the system decides whether to adapt, and compile/optimize/reconfigure produces the modified application)
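An illustrative C sketch of monitoring one data characteristic at runtime and triggering re-optimization when it changes; the characteristic (fraction of zero samples), threshold, and variant selection are assumptions for illustration only.

#include <stdio.h>

static int use_sparse_variant = 0;   /* currently selected implementation variant */

static void process_block(const int *data, int n) {
    int zeros = 0;
    for (int i = 0; i < n; i++)
        if (data[i] == 0) zeros++;

    /* Adapt when the observed data characteristic crosses a threshold (>80% zeros). */
    if (!use_sparse_variant && zeros * 10 > n * 8) {
        use_sparse_variant = 1;
        printf("re-optimizing for sparse input data\n");
    }
    /* ... process the block with the currently selected variant ... */
}

int main(void) {
    int dense[8]  = { 5, 3, 7, 1, 2, 9, 4, 6 };
    int sparse[8] = { 0, 0, 0, 0, 0, 0, 0, 3 };
    process_block(dense, 8);    /* no adaptation needed */
    process_block(sparse, 8);   /* triggers re-optimization at runtime */
    return 0;
}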

Adaptive Computing
Computing Challenges: Reliability
Permanent and transient device failures are expected to increase as device features continue to shrink, due to mechanisms such as electromigration and negative-bias temperature instability (NBTI)
Individual hardware components may fail intermittently over time, yet applications should execute as if no failures had occurred

Adaptive Computing
Computing Challenges: Reliability
Self-healing systems: computing systems that can detect and recover from device failures autonomously
Different from fault-tolerant systems
(Figure: runtime loop in which the application is dynamically monitored; if the system is not executing correctly, it heals itself, yielding a modified application/architecture; a minimal sketch of this loop follows)
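A minimal C sketch of the detect-and-heal loop in the figure; the self-test (recomputing a known result) and the remapping-to-a-healthy-unit step are illustrative assumptions, not a specific self-healing mechanism.

#include <stdio.h>

#define NUM_UNITS 4

static int unit_faulty[NUM_UNITS] = { 0, 0, 1, 0 };  /* unit 2 has failed (simulated) */
static int active_unit = 2;

/* Execute a simple known computation on a unit and check the result. */
static int unit_self_test(int unit) {
    int result = unit_faulty[unit] ? -1 : (3 + 4);    /* a fault corrupts the output */
    return result == 7;                               /* 1 = executing correctly */
}

static void heal_system(void) {
    /* Remap work onto a spare/healthy unit: the "modified architecture". */
    for (int u = 0; u < NUM_UNITS; u++) {
        if (!unit_faulty[u]) { active_unit = u; break; }
    }
    printf("healed: remapped execution to unit %d\n", active_unit);
}

int main(void) {
    if (!unit_self_test(active_unit))   /* Is the system executing correctly? */
        heal_system();                  /* No -> heal, then continue execution */
    printf("continuing on unit %d\n", active_unit);
    return 0;
}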

Summary
Adaptive computing can autonomously optimize application execution at runtime
It continuously and dynamically re-compiles/re-synthesizes software implementations to hardware at runtime
It can optimize for performance, power, reliability, and more
Adaptive computing will be both beneficial and necessary for future computing systems
It lets developers focus on developing applications rather than worrying about the underlying architectures
Kavli-NNIN Symposium on Computing Challenges, Cornell, October 12-14, 2008