Ideas for the design of an ASIP for LQCD
Target Compiler Technologies, CASTNESS’11, Rome, Italy (© 2011 Target Compiler Technologies)


Slide 2: Agenda
- ASIPs and IP Designer
- EURETILE platform
- An ASIP for LQCD

Slide 3: ASIPs in Multi-Core SoC
- ASIP: Application-Specific Processor
  - Anything between a general-purpose µP and a hardwired data-path
  - Flexibility through programmability and design-time reconfigurability
  - High throughput and low energy through parallelism and specialization
- The ASIP is the foundation of a heterogeneous multi-core SoC
  - A balanced SoC architecture offers the best performance at the lowest energy and lowest cost

Slide 4: Why ASIPs?
- Maximise performance
  - Specialisation
  - Parallelism: VLIW, SIMD, multi-core
- Minimise power dissipation
  - Specialisation
  - Parallelism: VLIW, SIMD, multi-core
  - Power-optimised RTL generation
- Leverage the benefits of programmability
  - React to changing requirements
  - Ship first for evolving standards
  - Remedy defects
  - Extend products to new markets without an SoC respin

Slide 5: IP Designer Tool Suite

Slide 6: nML – ASIP description language
Example: architectural specialisation, an absolute-difference instruction for motion estimation

Structural skeleton (registers, busses, functional units):

    reg V[4];
    trn vecr; trn vecs; trn vecd; trn vect;
    fu vec; fu vabs; ...

Instruction-set grammar:

    opn vec_adiff_opn(t: c2u, r: c2u)
    {
        action {
        stage E1:
            vecd = ...
            V[t] = ...
            vect = ...
        }
        syntax : "vadiff v"t ",v"r ",v"t;
        image  : t::r;
    }

- Application-specific data type 'vector'
- Primitive functions: vsub(), vabs()
- Operation pattern: V = vabs(vsub(V, V))
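To make the vadiff example concrete, the following plain-C sketch shows what the operation pattern V = vabs(vsub(V, V)) computes on the four-entry register file V[4]. The lane count (8), the unsigned 8-bit element type and the widened intermediate type `vdiff` are assumptions chosen for illustration; the slide does not specify them, and this is neither nML nor IP Designer output. Because the absolute difference is symmetric, the operand order of vsub() does not change the result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical lane count and element type (not given on the slide):
   8 lanes of unsigned 8-bit pixels, a common choice in motion estimation. */
#define VLANES 8

typedef struct { uint8_t e[VLANES]; } vector;  /* the 'vector' data type */
typedef struct { int16_t e[VLANES]; } vdiff;   /* widened difference (assumption) */

static vector V[4];                             /* mirrors "reg V[4]" */

/* Primitive vsub(): lane-wise difference, widened so it cannot wrap. */
static vdiff vsub(vector a, vector b)
{
    vdiff d;
    for (int i = 0; i < VLANES; i++)
        d.e[i] = (int16_t)(a.e[i] - b.e[i]);
    return d;
}

/* Primitive vabs(): lane-wise absolute value; |a - b| <= 255 fits in uint8_t. */
static vector vabs(vdiff d)
{
    vector r;
    for (int i = 0; i < VLANES; i++)
        r.e[i] = (uint8_t)(d.e[i] < 0 ? -d.e[i] : d.e[i]);
    return r;
}

/* "vadiff vT,vR,vT": V[t] = vabs(vsub(V[t], V[r])). */
static void vadiff(unsigned t, unsigned r)
{
    assert(t < 4 && r < 4);  /* t and r each select one of the four V registers */
    V[t] = vabs(vsub(V[t], V[r]));
}
```

In a motion-estimation kernel, summing the lanes of such results over a block gives the sum of absolute differences (SAD); that reduction is a separate step and not part of the instruction shown on the slide.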


Slide 8: EURETILE hardware platform
- Communication: DNP
- Control: RISC
- Computation: DSP
- ASIPs: specialised towards the application
  - Lattice quantum chromodynamics (LQCD)
  - Neural network (Izhikevich)


Slide 10: LQCD ASIP
Goals
- Increase performance
- Decrease gate count or usage of FPGA blocks
Means
- Task-level parallelism (multi-tile architecture)
- Data-level parallelism
- Instruction-level parallelism
- Architecture specialisation

Slide 11: LQCD ASIP
Instruction-level parallelism: a VLIW instruction word with issue slots VU_1 … VU_n (vector units) and LS_0 … LS_m (load/store units)
- Arithmetic operations execute in parallel with load/store operations
- The mix of n and m is chosen based on feedback from compiling the Qphi() function
- Ideally, a speed improvement over a scalar architecture equal to the number of issue slots that can be kept busy in each cycle
Data-level parallelism: 3-way SIMD (lanes c1, c2, c3)
- 3-way SIMD fits with SU(3) matrix algebra
- 3x speed improvement over a scalar architecture
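The fit between 3-way SIMD and SU(3) algebra can be sketched in C. Multiplying a 3x3 complex SU(3) matrix by a 3-component colour vector can be written as three steps, each of which updates all three components (c1, c2, c3) at once with a complex multiply-accumulate against one broadcast component of the input. The column-wise storage and the names `su3_vector`, `su3_matrix` and `su3_mul` are hypothetical; the idea is only that the inner loop over `lane` is the part a single 3-way SIMD instruction would replace.

```c
#include <assert.h>
#include <complex.h>

/* One colour vector: 3 complex components, one per SIMD lane (c1, c2, c3). */
typedef struct { double complex c[3]; } su3_vector;

/* A 3x3 complex matrix stored column by column, so each column is a 3-lane vector. */
typedef struct { su3_vector col[3]; } su3_matrix;

/* y = U * x, written as three 3-lane steps: y[lanes] += U.col[j][lanes] * x[j]. */
static su3_vector su3_mul(const su3_matrix *U, su3_vector x)
{
    su3_vector y = { { 0, 0, 0 } };
    assert(U != 0);
    for (int j = 0; j < 3; j++)                 /* three vector steps...            */
        for (int lane = 0; lane < 3; lane++)    /* ...each a 3-way complex MAC      */
            y.c[lane] += U->col[j].c[lane] * x.c[j];
    return y;
}
```

A scalar machine executes all nine complex multiply-accumulates one after another; with three lanes per instruction only three remain, which is the 3x factor stated on the slide.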

Slide 12: LQCD ASIP
Architecture specialisation: complex floating-point operations (C: complex operand, R: real operand)
- C + C, C + i*C → 2x speedup over a scalar architecture
- C − C, C − i*C
- C * R → 4x speedup over a scalar architecture
- C * C → 8x speedup over a scalar architecture
- …
Behaviour of floating-point operations
- Defined in a C dialect intended for modelling functional units
- Translated into simulation and implementation (RTL) models
- Synthesis on a standard-cell library, mapping on FPGA primitives
Vector types and operators defined for the C compiler:

    vector v1, va[4], vb[4];
    v1 += va[0] * vb[1];
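One reading that is consistent with these factors (an assumption; the slide does not say how they were counted) is that each counts the scalar floating-point operations replaced by one complex instruction. C + C needs 2 real additions (2x), and C + i*C is likewise 2 operations, since (a + ib) + i(c + id) = (a − d) + i(b + c). If the multiplications accumulate, as in `v1 += va[0] * vb[1]`, then C * R needs 2 multiplies plus 2 additions (4x) and C * C needs 4 multiplies plus 4 additions or subtractions (8x). The standard-C sketch below models that statement without the compiler's operator overloading; the three complex lanes per `vector` and the elementwise semantics are hypothetical, and `cplx`, `cmac`, `cmac_real` and `vector_mac` are illustrative names.

```c
#include <assert.h>

/* Hypothetical model of the slide's 'vector' type: three complex lanes
   (matching the 3-way SIMD of slide 11), each a re/im pair of floats. */
typedef struct { float re, im; } cplx;
typedef struct { cplx lane[3]; } vector;

/* acc += a * b for complex a, b: 4 multiplies + 4 adds/subs in scalar code. */
static cplx cmac(cplx acc, cplx a, cplx b)
{
    acc.re += a.re * b.re - a.im * b.im;   /* 2 mul, 2 add/sub */
    acc.im += a.re * b.im + a.im * b.re;   /* 2 mul, 2 add     */
    return acc;
}

/* acc += a * r for complex a and real r: 2 multiplies + 2 adds. */
static cplx cmac_real(cplx acc, cplx a, float r)
{
    acc.re += a.re * r;
    acc.im += a.im * r;
    return acc;
}

/* Lane-wise equivalent of "v1 += va[0] * vb[1];" (elementwise semantics assumed). */
static void vector_mac(vector *v1, const vector *a, const vector *b)
{
    assert(v1 && a && b);
    for (int i = 0; i < 3; i++)
        v1->lane[i] = cmac(v1->lane[i], a->lane[i], b->lane[i]);
}
```

On the ASIP, each `cmac` body would presumably be a single vector-unit operation, so the scalar operation count in the comments is the upper bound on the gain for that operation.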

Slide 13: LQCD ASIP
Architecture specialisation: address generation
- Goal: the vector units should be used every cycle, so address generation must be done in parallel
- How: to be investigated, after feedback from the C compiler
Deliverables
- SDK (compiler, assembler, linker, simulator, debugger) based on IP Designer
- SystemC model
- RTL model + FPGA mapping
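The slide leaves the address-generation mechanism open. Purely to illustrate the kind of index arithmetic such a unit would have to overlap with the vector operations, the sketch below computes nearest-neighbour site indices on a 4-D lattice with periodic boundaries. It assumes that Qphi() applies a nearest-neighbour stencil, as lattice Dirac operators typically do; the lattice extents, the lexicographic site ordering and the names `site_index` and `neighbour` are all hypothetical.

```c
#include <assert.h>

/* Hypothetical 4-D lattice extents (x, y, z, t); not given on the slides. */
enum { NDIM = 4 };
static const int L[NDIM] = { 4, 4, 4, 8 };

/* Lexicographic site index, x fastest and t slowest (assumed ordering). */
static int site_index(const int coord[NDIM])
{
    int idx = 0;
    for (int mu = NDIM - 1; mu >= 0; mu--)
        idx = idx * L[mu] + coord[mu];
    return idx;
}

/* Index of the site one step from `site` in direction mu (dir = +1 or -1),
   with periodic boundary conditions. */
static int neighbour(int site, int mu, int dir)
{
    int coord[NDIM];
    assert(mu >= 0 && mu < NDIM && (dir == 1 || dir == -1));
    for (int nu = 0; nu < NDIM; nu++) {           /* decode site -> coordinates */
        coord[nu] = site % L[nu];
        site /= L[nu];
    }
    coord[mu] = (coord[mu] + dir + L[mu]) % L[mu]; /* wrap around the boundary */
    return site_index(coord);
}
```

The divisions and modulo operations here are for clarity only; a hardware address generator would more plausibly use incrementing counters or precomputed neighbour tables so that it never stalls the vector units.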
