Targeting Tiled Architectures in Design Exploration

Presentation transcript:

Targeting Tiled Architectures in Design Exploration
Lilian Bossuet (1), Wayne Burleson (2), Guy Gogniat (1), Vikas Anand (2), Andrew Laffely (2), Jean-Luc Philippe (1)
(1) LESTER Lab, Université de Bretagne Sud, Lorient, France, {lilian.bossuet, guy.gogniat, jean-luc.philippe}@univ-ubs.fr
(2) Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, USA, {burleson, vanand, alaffely}@ecs.umass.edu
Slide footer: "Approche de conception d’interface de communication pour les systèmes sur puce" (Design approach for communication interfaces in systems-on-chip)

Outline
- Introduction: Design Space Exploration
- Design Space of Reconfigurable Architecture
- A Target Architecture: aSoC
- Proposition of Design Space Exploration Flow
- Results
- Conclusion and Future Work

Design Space Exploration: Motivations
- Design solutions for new telecommunication and multimedia applications targeting embedded systems
- Optimization and reduction of SoC power consumption
- Increase computing performance
  - Increase parallelism
  - Increase speed
- Be flexible
  - Take run-time reconfiguration into account
  - Target multi-granularity (heterogeneous) architectures

Design Space Exploration: Flow
- Progressive design space reduction:
  - iterative exploration
  - refinement of the architecture model
  - increase of performance estimation accuracy
- One level of abstraction for one level of estimation accuracy
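The progressive reduction described on this slide can be illustrated with a minimal Python sketch. It assumes that each abstraction level comes with its own cost estimator and that a fixed fraction of candidates survives each level. The `Candidate` layout, the two cost functions, and `keep_fraction` are hypothetical stand-ins, not part of the presented flow.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical design point: (number of tiles, link width)
Candidate = Tuple[int, int]


def progressive_reduction(
    candidates: Sequence[Candidate],
    estimators: Sequence[Callable[[Candidate], float]],
    keep_fraction: float = 0.5,
) -> List[Candidate]:
    """Prune the candidate set level by level.

    Each estimator stands for one abstraction level; later estimators are
    assumed to be more accurate (and more expensive) than earlier ones.
    """
    survivors = list(candidates)
    for estimate in estimators:
        ranked = sorted(survivors, key=estimate)  # lower estimated cost is better
        keep = max(1, int(len(ranked) * keep_fraction))
        survivors = ranked[:keep]
    return survivors


def coarse(c: Candidate) -> float:
    """Crude illustrative model: fewer tiles is cheaper."""
    return float(c[0])


def refined(c: Candidate) -> float:
    """Illustrative refinement: adds a penalty for narrow links."""
    return c[0] + 8 / c[1]


points = [(tiles, width) for tiles in (4, 9, 16) for width in (1, 2, 4, 8)]
best = progressive_reduction(points, [coarse, refined])
print(best)
```

The coarse model halves the twelve candidates cheaply, and the refined model only runs on the six survivors, which is the point of matching one estimation accuracy to one abstraction level.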

Reconfigurable Architectures
- Bridge the flexibility gap between ASICs and microprocessors [Hartenstein DATE 2001]
- Energy-efficient solution for low-power programmable DSP [Rabaey ICASSP 1997, FPL 2000]
- Run-time reconfigurable [Compton & Hauck 1999]
=> A key ingredient for future silicon platforms [Schaumont et al. DAC 2001]

Design Space of Reconfigurable Architecture
RECONFIGURABLE ARCHITECTURES (R-SOC)
- FINE GRAIN (FPGA)
  - Island Topology: Xilinx Virtex, Xilinx Spartan, Atmel AT40K, Lattice ispXPGA
  - Hierarchical Topology: Altera Stratix, Altera Apex, Altera Cyclone
- MULTI GRANULARITY (Heterogeneous)
  - Processor + Coprocessor
    - Coarse Grain Coprocessor: Chameleon, REMARC, MorphoSys, Pleiades
    - Fine Grain Coprocessor: Garp, FIPSOC, Triscend E5, Triscend A7, Xilinx Virtex-II Pro, Altera Excalibur, Atmel FPSLIC
  - Tile-Based Architecture: aSoC, E-FPFA
- COARSE GRAIN (Systolic)
  - Mesh Topology: RAW, CHESS, MATRIX, KressArray, Systolix PulseDSP
  - Linear Topology: Systolic Ring, RaPiD, PipeRench
  - Hierarchical Topology: DART, FPFA

A Target Architecture: aSoC
Adaptive System-on-a-Chip (aSoC)
- Tiled architecture containing many heterogeneous processing cores (RISC, DSP, FPGA, motion estimation, Viterbi decoder)
- Mesh communication network controlled by a statically determined communication schedule
- A scalable architecture

aSoC Architecture
[Figure: array of tiles with heterogeneous cores (uProc, MUL, FPGA) linked by point-to-point connections. Each tile pairs a core with a communication interface (ctrl) that has North, South, East, and West ports.]

aSoC Communications Interface
- Interface crossbar: inter-tile transfer and tile-to-core transfer
- Interconnect/instruction memory: contains instructions to configure the interface crossbar (cycle by cycle)
- Interface controller: selects the instruction
- Coreports: data interface and storage for transfers with the tile IP core
- Dynamic voltage and frequency selection
- Dynamic power management
[Figure: tile interface block diagram. Labels: core, coreports, interface crossbar with North/South/East/West inputs and outputs, local config., local frequency & voltage, decoder, controller, PC, instruction memory; example instruction "North to South & East".]
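As a rough illustration of the cycle-by-cycle crossbar configuration described above, the following Python sketch models one tile interface whose instruction memory is replayed by a program counter. The class name `InterfaceSketch`, the port names, and the dictionary encoding of an instruction are assumptions made for this example; the actual aSoC instruction format is not given on the slide.

```python
from typing import Dict, List

# Assumed port names: four mesh neighbours plus the local core (coreport)
PORTS = ("north", "south", "east", "west", "core")

# One crossbar instruction: output port -> input port driving it this cycle
Instruction = Dict[str, str]


class InterfaceSketch:
    """Toy model of a statically scheduled tile interface (not the real aSoC RTL)."""

    def __init__(self, program: List[Instruction]) -> None:
        if not program:
            raise ValueError("instruction memory must not be empty")
        for instr in program:
            for out_port, in_port in instr.items():
                if out_port not in PORTS or in_port not in PORTS:
                    raise ValueError(f"unknown port in {instr}")
        self.program = program
        self.pc = 0  # program counter into the instruction memory

    def step(self, inputs: Dict[str, int]) -> Dict[str, int]:
        """Apply the current instruction to this cycle's inputs, then advance the PC."""
        instr = self.program[self.pc]
        outputs = {out: inputs[src] for out, src in instr.items() if src in inputs}
        self.pc = (self.pc + 1) % len(self.program)  # the schedule repeats
        return outputs


# Cycle 0: "North to South & East" (the example label in the figure)
# Cycle 1: West input delivered to the local core
tile = InterfaceSketch([
    {"south": "north", "east": "north"},
    {"core": "west"},
])
cycle0 = tile.step({"north": 7, "west": 3})
cycle1 = tile.step({"north": 8, "west": 3})
print(cycle0, cycle1)
```

Because the schedule is fixed before run time, the controller needs no arbitration logic: it only fetches the next instruction and wraps around at the end of the program.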

aSoC Exploration ...
- Type of tiles
- Number of each type of tile
- Placement of the tiles
- Internal architecture of reconfigurable tiles (FPGA core)
- Communication scheduling
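To give a feel for how quickly these exploration dimensions multiply, here is a small Python sketch that enumerates tile-type placements on a mesh. The three tile types, the rule that at least one RISC tile is required, and the function names are illustrative assumptions; the internal FPGA architecture and the communication schedule are left out.

```python
from collections import Counter
from itertools import product
from typing import Iterator, Sequence, Tuple

# Assumed tile library for the example
TILE_TYPES = ("RISC", "DSP", "FPGA")

Grid = Tuple[Tuple[str, ...], ...]


def enumerate_placements(
    rows: int,
    cols: int,
    tile_types: Sequence[str] = TILE_TYPES,
    required: Sequence[str] = ("RISC",),
) -> Iterator[Grid]:
    """Yield every assignment of tile types to mesh positions.

    `required` lists tile types that must appear at least once
    (a hypothetical constraint, e.g. one RISC core for control code).
    """
    for assignment in product(tile_types, repeat=rows * cols):
        if all(kind in assignment for kind in required):
            yield tuple(
                tuple(assignment[r * cols:(r + 1) * cols]) for r in range(rows)
            )


def tile_counts(grid: Grid) -> Counter:
    """Number of tiles of each type in one placement."""
    return Counter(kind for row in grid for kind in row)


placements = list(enumerate_placements(2, 2))
print(len(placements))             # size of this small design space
print(tile_counts(placements[0]))  # type counts of the first placement
```

Even a 2x2 mesh with three tile types yields dozens of placements, which is why exhaustive evaluation at full accuracy is impractical for realistic array sizes.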

Design Space Exploration: Goals
- Goal: rapid exploration of various architectural solutions to be implemented on heterogeneous reconfigurable architectures (aSoC), in order to select the most efficient architecture for one or several applications
- Takes place before architectural synthesis (algorithmic specification in a high-level abstraction language)
- Estimations are based on a functional architecture model (generic, technology-independent)
- Iterative exploration flow that progressively refines the architecture definition, from a coarse model to a dedicated model

Design Exploration Flow Targeting Tiled Architecture
[Figure: exploration flow diagram. Specification: the application passes through a C to HCDFG parser, giving HCDFG graphs of the application functions, alongside models of the aSoC architectures and tiles (HF model, THF model). Steps shown: Application Analysis, Tile Exploration (results of the tile exploration step: indexed Performance, C, and Occ values), aSoC Builder with static communication scheduling, and aSoC Analysis, leading to the final model of the aSoC architecture.]

Application Analysis
- Use of algorithmic metrics and dedicated scheduling algorithms to highlight the target architectures
- Algorithmic metrics:
  - Characterize the application orientation: processing, memory, control
  - Characterize the application's potential parallelism

Tile Exploration: 3 steps
- Projection: links the necessary resources (application) to the available resources (tile), using an allocation algorithm based on communication cost reduction
- Composition: takes the function scheduling into account to estimate additional resources (registers, multiplexers, …)
- Estimation: performance interval computation (lower and upper bounds); speed / resource utilization / power characterization
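The estimation step reports an interval rather than a single number. One plausible way to obtain such bounds, shown here purely as an assumption, is to take the critical-path latency of the operation graph as the lower bound (unlimited resources) and the fully sequential latency as the upper bound (one operation at a time). The function name, the operation names, and the latencies are hypothetical, and the slide does not state how the authors compute their bounds.

```python
from typing import Dict, List, Tuple


def performance_interval(
    latency: Dict[str, int],
    deps: Dict[str, List[str]],
) -> Tuple[int, int]:
    """Return (lower, upper) latency bounds for an acyclic operation graph.

    lower: longest dependency chain, i.e. unlimited parallel resources.
    upper: sum of all latencies, i.e. strictly sequential execution.
    """
    finish: Dict[str, int] = {}

    def finish_time(op: str) -> int:
        # Memoized longest-path computation; assumes `deps` has no cycles.
        if op not in finish:
            start = max((finish_time(p) for p in deps.get(op, [])), default=0)
            finish[op] = start + latency[op]
        return finish[op]

    lower = max(finish_time(op) for op in latency)
    upper = sum(latency.values())
    return lower, upper


# Illustrative graph: two loads feed a multiply, which feeds an add
latency = {"load_a": 2, "load_b": 2, "mul": 3, "add": 1}
deps = {"mul": ["load_a", "load_b"], "add": ["mul"]}
bounds = performance_interval(latency, deps)
print(bounds)
```

The true latency on a given tile falls somewhere inside the interval, depending on how many operators the projection step managed to allocate.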

aSoC Builder Environment
- AppMapper: partition and assignment, based on run-time estimation
- Compilation
  - Communication scheduling
  - Core compilation
- Generate tile configurations
  - Communication instructions
  - Bitstreams (for reconfigurable tiles)
  - RISC instructions

aSoC Analysis
- Uses the results of previous steps
  - Function scheduling
  - Tile allocation
  - Communication scheduling
- Complete estimation of the proposed solution
  - Global execution time
  - Global power consumption
  - Total area

Results
- aSoC architecture (UMASS)
  - Prototype of aSoC interconnect
  - Technology: 0.18 µm
  - Clock speed of 400 MHz
- AppMapper (UMASS): several mapped applications
  - Matrix operations
  - Median filter
  - Viterbi decoder
  - DCT
- Tile exploration (UBS)
  - Application analysis: intelligent camera (motion detection), matching pursuit
  - Projection step: Lee DCT, matrix operations

Conclusion and Future Work
- Original design exploration flow working at a high level of abstraction
  - Fast and flexible (uses a functional view of the architectures)
- Targets an efficient reconfigurable architecture: aSoC
  - Statically scheduled, point-to-point communications
- Future work
  - Development of a larger set of design exploration benchmarks
  - Exploration of other configurable systems

Thank you ...

Previous Work
- Xplorer, University of Kaiserslautern, Germany [Hartenstein PATMOS 2000]
  - Targets a mesh coarse-grain architecture: the KressArray (fast reconfigurable ALUs)
  - Gives design guidance concerning the size of the array, the available operators, the communication architecture, and the connection structure
  - Controlled by performance and power estimations
  - Starts with a high-level specification of the application (ALE-X language)
- RAW, Massachusetts Institute of Technology, USA [Moritz FCCM 1998]
  - Targets an architecture reminiscent of a coarse-grained FPGA: the MIT Raw microprocessor
  - Addresses the balance problem: determining the best division of VLSI resources among computing, memory, and communication
  - Addresses the grain problem: determining the optimum size of each architecture tile
  - Uses several models: architecture model, cost model, and performance model

HCDFG: Hierarchical Control Data Flow Graph
[Figure: example graph hierarchy. Labels: Task 1, Task 2; functions F1 to F5; graph levels HCDFG, HDFG, LOOP, CDFG, DFG; node labels Loop, CORE, MAC, ALU, X, Y, Y#, C, A.]
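A hierarchical graph of this kind can be represented as nodes that either stand for a single operation or contain further sub-graphs. The Python sketch below is only a simplified illustration: the `Node` class, its `kind` labels (borrowed from the figure), and the tree-only structure without data or control edges are assumptions, not the authors' HCDFG format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node of a simplified hierarchy: a leaf operation or a sub-graph."""

    name: str
    kind: str  # "op" for a leaf, otherwise e.g. "DFG", "CDFG", "LOOP", "HCDFG"
    children: List["Node"] = field(default_factory=list)

    def count_ops(self) -> int:
        """Total number of leaf operations below (and including) this node."""
        if self.kind == "op":
            return 1
        return sum(child.count_ops() for child in self.children)

    def depth(self) -> int:
        """Number of hierarchy levels from this node down to the deepest leaf."""
        return 1 + max((child.depth() for child in self.children), default=0)


# Hypothetical application: F1 is a loop around a small DFG, F2 is a plain DFG
f1_body = Node("F1_body", "DFG", [Node("mac", "op"), Node("alu", "op")])
f1 = Node("F1", "LOOP", [f1_body])
f2 = Node("F2", "DFG", [Node("alu", "op")])
app = Node("App", "HCDFG", [f1, f2])

print(app.count_ops(), app.depth())
```

Keeping the hierarchy explicit lets an exploration tool estimate one function or loop at a time instead of flattening the whole application.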

Application's Metrics
Y. Le Moullec, N. Ben Amor, J-Ph. Diguet, M. Abid and J-L. Philippe. Multi-Granularity Metrics for the Era of Strongly Personalized SOCs. In DATE 2002, Munich, Germany, March 2002.

Average parallelism metric (a lot of parallelism if γ is high):
$$\gamma = \frac{\text{Nb of global memory accesses and processing operations}}{\text{Critical path}}$$

MOM = Memory Orientation Metric, in [0,1]:
$$\mathrm{MOM} = \frac{\text{Nb of global memory accesses}}{\text{Nb of processing operations} + \text{Nb of global memory accesses}}$$

COM = Control Orientation Metric, in [0,1]:
$$\mathrm{COM} = \frac{\text{Nb of tests}}{\text{Nb of global memory accesses} + \text{Nb of processing operations} + \text{Nb of tests}}$$
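The three metrics on this slide are simple ratios, so a short Python sketch can serve as a worked example. The function and parameter names are chosen here for readability, and the operation counts in the usage lines are invented numbers, not measurements from the paper.

```python
def average_parallelism(n_mem: int, n_proc: int, critical_path: int) -> float:
    """gamma: (memory accesses + processing operations) / critical path length."""
    return (n_mem + n_proc) / critical_path


def memory_orientation(n_mem: int, n_proc: int) -> float:
    """MOM in [0, 1]: share of global memory accesses among memory + processing."""
    return n_mem / (n_proc + n_mem)


def control_orientation(n_mem: int, n_proc: int, n_test: int) -> float:
    """COM in [0, 1]: share of tests among all counted operations."""
    return n_test / (n_mem + n_proc + n_test)


# Invented counts for illustration only
n_mem, n_proc, n_test, critical_path = 20, 60, 20, 16

gamma = average_parallelism(n_mem, n_proc, critical_path)
mom = memory_orientation(n_mem, n_proc)
com = control_orientation(n_mem, n_proc, n_test)
print(gamma, mom, com)
```

With these counts, 80 operations over a critical path of 16 give γ = 5, a quarter of the memory and processing operations are global memory accesses (MOM = 0.25), and a fifth of all counted operations are tests (COM = 0.2).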