Enabling Technologies for Reconfigurable Computing and Software / Configware Co-Design, Part 2: Data-Stream-based Computing

Presentation transcript:

Enabling Technologies for Reconfigurable Computing and Software / Configware Co-Design, Part 2: Data-Stream-based Computing - Reiner Hartenstein, University of Kaiserslautern, July 8, 2002, ENST, Paris, France

© 2002, University of Kaiserslautern 2 Schedule timeslot – Reconfigurable Computing (RC) – coffee break – Data-Stream-based Computing – lunch break – Resources for RC and Data-Stream-based Computing – Recent developments – Discussion

© 2002, University of Kaiserslautern 3 Opportunities through new patent laws? To clever people keen on patents: do not file patents on the following details! Everything shown in this presentation was published years ago.

© 2002, University of Kaiserslautern 4 >> EDA revolution EDA revolution Dead Supercomputer Data-Stream-based Computing Design Space Explorers KressArray Xplorer Machine paradigms Co-Compilation

© 2002, University of Kaiserslautern 5 Makimoto’s 3rd wave [Hartenstein] The next EDA Industry Revolution 1978 Transistor entry: Applicon, Calma, CV Synthesis: Cadence, Synopsys Schematics entry: Daisy, Mentor, Valid... [Keutzer / Newton] EDA industry paradigm switching every 7 years 1999 (Co-) Compilation Data-Stream-based DPU arrays 2006

© 2002, University of Kaiserslautern 6 [Richard Newton]

© 2002, University of Kaiserslautern 7 Biggest Mistake in History

© 2002, University of Kaiserslautern 8 Missing the next revolution: ignoring Reconfigurable Computing and Data-stream-based Computing when teaching computing fundamentals within our CS curricula, which causes the waste of billions of dollars, is one of the biggest mistakes in the history of information technology application [Hartenstein]

© 2002, University of Kaiserslautern 9 >> Dead Supercomputer EDA revolution Dead Supercomputer Data-Stream-based Computing Stream-based Memory Architecture Design Space Explorers KressArray Xplorer Machine paradigms Co-Compilation

© 2002, University of Kaiserslautern 10 Dead Supercomputer Society [Gordon Bell, keynote at ISCA 2000]: ACRI, Alliant, American Supercomputer, Ametek, Applied Dynamics, Astronautics, BBN, CDC, Convex, Cray Computer, Cray Research, Culler-Harris, Culler Scientific, Cydrome, Dana/Ardent/Stellar/Stardent, DAPP, Denelcor, Elexsi, ETA Systems, Evans and Sutherland Computer, Floating Point Systems, Galaxy YH-1, Goodyear Aerospace MPP, Gould NPL, Guiltech, ICL, Intel Scientific Computers, International Parallel Machines, Kendall Square Research, Key Computer Laboratories, MasPar, Meiko, Multiflow, Myrias, Numerix, Prisma, Tera, Thinking Machines, Saxpy, Scientific Computer Systems (SCS), Soviet Supercomputers, Supertek, Supercomputer Systems, Suprenum, Vitesse Electronics

© 2002, University of Kaiserslautern 11 Dying Parallel Computing Society

© 2002, University of Kaiserslautern 12 >> Stream-based Computing EDA revolution Dead Supercomputer Data-Stream-based Computing Design Space Explorers KressArray Xplorer Machine paradigms Co-Compilation

© 2002, University of Kaiserslautern 13 Anti particles. 1928: Paul Dirac: "there should be an anti electron having positive charge" (Nobel Prize 1933). 1932: Carl David Anderson detected this "positron" in cosmic radiation (Nobel Prize 1936). 1954: new accelerators: synchrotrons, like Berkeley's Bevatron. 1955: Owen Chamberlain et al. create the anti proton on the Bevatron. 1956: anti neutron created on the Bevatron. 1965: creation of a deuterium anti nucleus at CERN. 1995: hydrogen anti atom (anti hydrogen) created at CERN by forcing a positron and an anti proton to merge at very low energy. "In the universe there should be regions of anti matter … but there are asymmetries."

© 2002, University of Kaiserslautern 14 Matter & Antimatter: Atom and Anti Atom. For the world of matter, the machine paradigm is the atom; for anti matter, the machine paradigm is the anti atom. [Figure: atom with spinning electron; anti atom with spinning positron.]

© 2002, University of Kaiserslautern 15 Matter & Antimatter of Informatics: Machine and Anti Machine. Machine paradigm "von Neumann" (+ CPU, instruction stream spinning): 1st electronic computer (Konrad Zuse); 1946 v. N. machine paradigm; 1st microprocessor (Ted Hoff). Anti machine paradigm (- DPU, data-procedural, data stream spinning): 1979 "data streams" (systolic array: Kung / Leiserson); 1995 rDPA / DPSS (supersystolic: Rainer Kress); anti machine paradigm published.

© 2002, University of Kaiserslautern 16 CPU: RAM-based (RAM, instruction sequencer, data path). Simple machine paradigm + scalability + relocatability + compatibility = secret of success of the software industry.

© 2002, University of Kaiserslautern 17 Nasty Matter: the CPU (RAM, instruction sequencer, data path). Address computation overhead; instruction fetch overhead; central von Neumann bottleneck; extremely power-hungry and area-inefficient; performance problems. Reconfigurable? The wrong machine paradigm: a new instruction sequencer is always needed.

© 2002, University of Kaiserslautern 18 Parallelism by Concurrency independent instruction streams

© 2002, University of Kaiserslautern 19 Concurrent Computing: several CPUs (each a data path with its own instruction sequencer) connected by bus(es) or a switch box. Extremely inefficient: massive switching activity at run time may degrade performance far beyond Amdahl's law.
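
For reference, Amdahl's law bounds the speedup of a program in which only a fraction of the run time can be spread over several processors. The sketch below is an illustration only; the function name `amdahl_speedup` and its parameters are assumptions, not taken from the slides. It gives the ideal bound, whereas the slide argues that run-time switching overhead makes real concurrent machines perform worse than this bound.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Ideal speedup when `parallel_fraction` of the run time scales with `processors`.

    Hypothetical helper for illustration; it models no communication or
    switching overhead, so real systems can only do worse.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be within [0, 1]")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    for n in (2, 10, 100, 1000):
        print(n, round(amdahl_speedup(0.9, n), 3))
```

With 90% of the work parallelizable, the bound can never exceed 10, no matter how many processors are added.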

© 2002, University of Kaiserslautern 20 Coarse Grain Reconfigurable Arrays vs. Parallel Processes. Parallelism at process level: hardwired (I-Seq + ALU). Parallelism at datapath level: reconfigurable rDPA with data sequencer, no instruction sequencing!

© 2002, University of Kaiserslautern 21 Some differences: CPU versus DPU. + CPU (data path + instruction sequencer): central vN bottleneck; instruction stream routed here. - DPU (Data Path Unit): transport-triggered by data streams, an external signal, or nothing; no vN bottleneck: multiple ports; instruction fetch not at run time: no overhead; data streams scheduled elsewhere (RAM + data sequencer, RAM + data sequencer, …).

© 2002, University of Kaiserslautern 22 Machine paradigm: some differences. Matter: + CPU, no. of streams = 1. Antimatter: - DPU / - DPA, no. of streams ≥ 1.

© 2002, University of Kaiserslautern 23 DPA = DPU array: coherent data streams spinning around

© 2002, University of Kaiserslautern 24 >>> Extremely high efficiency: avoiding address computation overhead; avoiding instruction fetch and interpretation overhead; high parallelism, massively multiple deep pipelines; much less configuration memory; no routing areas to configure functions from CLBs

© 2002, University of Kaiserslautern 25 Computing in space and time. [Figure: coefficients a_11 … a_33 placed in an array (computing in space); data streams x_1, x_2, x_3 and y_1, y_2, y_3 (computing in time).] Placement; migration by re-timing; systolic arrays etc. and other transformations. This dichotomy is completely ignored by our CS curricula.
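
The dichotomy can be made concrete with a small sketch. It assumes, as an illustration only, that the a/x/y symbols on the slide stand for a matrix-vector product y = A·x; the function names are hypothetical. `matvec_in_time` is the usual sequential loop nest. `matvec_in_space` simulates a linear array of DPUs, one per output, through which the x stream is shifted one position per clock tick.

```python
def matvec_in_time(a, x):
    """Computing in time: one multiply-accumulate after another."""
    y = [0] * len(a)
    for i in range(len(a)):
        for j in range(len(x)):
            y[i] += a[i][j] * x[j]
    return y


def matvec_in_space(a, x):
    """Computing in space: simulate a linear array of len(a) DPUs.

    DPU i keeps the accumulator for y[i]. The x stream enters at DPU 0
    and moves one DPU to the right per tick, so DPU i sees x[j] at tick
    i + j and needs coefficient a[i][j] at exactly that tick.
    """
    n, m = len(a), len(x)
    acc = [0] * n       # one accumulator per DPU
    pipe = [None] * n   # the x value currently held in each DPU
    for tick in range(n + m - 1):
        # shift the x stream by one DPU and feed the next element (if any)
        pipe = [x[tick] if tick < m else None] + pipe[:-1]
        # in hardware all DPUs fire in the same tick; here they are visited in turn
        for i in range(n):
            if pipe[i] is not None:
                acc[i] += a[i][tick - i] * pipe[i]
    return acc


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    x = [1, 0, -1]
    print(matvec_in_time(a, x), matvec_in_space(a, x))
```

Both functions return `[-2, -2, -2]` for this input. The sequential version needs 9 multiply-accumulate steps, the simulated array needs 5 ticks, because up to three DPUs work in each tick.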

© 2002, University of Kaiserslautern 26 General Stream-based Computing System: heterogeneous array of rDPUs (reconf. data path units). Mapper: expression tree and DPU architectures; simultaneous placement & routing by simulated annealing; free form pipe network (space). Scheduler: data streams (time). The same mapper for both reconfigurable and hardwired: Kress DPSS [1995].
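
A toy version of annealing-based placement can be sketched as follows. This is not the DPSS mapper: it handles placement only (no routing), uses total Manhattan wire length as an assumed cost function, and all names (`wire_length`, `anneal_placement`, the operator labels) are hypothetical.

```python
import math
import random


def wire_length(place, edges):
    """Sum of Manhattan distances between connected operators."""
    return sum(abs(place[u][0] - place[v][0]) + abs(place[u][1] - place[v][1])
               for u, v in edges)


def anneal_placement(nodes, edges, rows, cols, steps=2000, t0=2.0, seed=0):
    """Place operator nodes on a rows x cols grid by simulated annealing."""
    if len(nodes) > rows * cols:
        raise ValueError("more operators than array cells")
    rng = random.Random(seed)
    # grid[k] holds the operator in cell divmod(k, cols), or None if empty
    grid = list(nodes) + [None] * (rows * cols - len(nodes))
    rng.shuffle(grid)

    def to_place(g):
        return {n: divmod(k, cols) for k, n in enumerate(g) if n is not None}

    cost = wire_length(to_place(grid), edges)
    best, best_cost = list(grid), cost
    for step in range(steps):
        temp = t0 * (1.0 - step / steps) + 1e-9    # linear cooling schedule
        i, j = rng.sample(range(len(grid)), 2)
        grid[i], grid[j] = grid[j], grid[i]        # propose swapping two cells
        new_cost = wire_length(to_place(grid), edges)
        accept = new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp)
        if accept:
            cost = new_cost
            if cost < best_cost:
                best, best_cost = list(grid), cost
        else:
            grid[i], grid[j] = grid[j], grid[i]    # reject: undo the swap
    return to_place(best), best_cost


if __name__ == "__main__":
    ops = ["mul1", "mul2", "add", "sh", "sub"]
    nets = [("mul1", "add"), ("mul2", "add"), ("add", "sub"), ("sh", "sub")]
    placement, cost = anneal_placement(ops, nets, rows=3, cols=3)
    print(placement, cost)
```

Uphill moves are accepted with a probability that shrinks as the temperature falls, which lets the search escape poor local placements early on and settle later.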

© 2002, University of Kaiserslautern 27 Super Pipe Networks: the key is mapping, rather than architecture*. *) KressArray [1995]

© 2002, University of Kaiserslautern 28 >> Design Space Explorers EDA revolution Dead Supercomputer Data-Stream-based Computing Design Space Explorers KressArray Xplorer Machine paradigms Co-Compilation

© 2002, University of Kaiserslautern 29 Fully general purpose reconfigurable is unrealistic, an illusion, just as the general purpose massively parallel computer system is. Domain-specific Reconfigurable Platforms will be suitable to cope with the 2nd Design Crisis: KressArray Explorer.

© 2002, University of Kaiserslautern 30 Motivation. Universal RAs: is it feasible? The General Purpose (coarse grain) Reconfigurable Array appears to be an illusion, as obviously does the Universal Massively Parallel Computer Architecture. Counter-example: the application domain of image processing.

© 2002, University of Kaiserslautern 31 -> Design Space Exploration. Exploration: Design Space Explorers (DSEs); Platform Space Explorers (PSEs); Compiler / PSE symbiosis; parallel computing vs. reconfigurable. Design Space Explorers: for VLSI design in general; for parallel computer systems; Xplorer, the only one for reconfigurable platforms.

© 2002, University of Kaiserslautern 32 Design Space Exploration Systems

© 2002, University of Kaiserslautern 33 >> KressArray Xplorer EDA revolution Dead Supercomputer Data-Stream-based Computing Design Space Explorers KressArray Xplorer Machine paradigms Co-Compilation

© 2002, University of Kaiserslautern 34 KressArray Xplorer: (Design Space) Platform Space Explorer. Components: Architecture & Mapping Editor, Statistics, KressArray DPSS, Datastream Generator, HDL Generator, Simulator, Datapath Generator, Delay & Power Estimator, Improvement Proposal Generator. Inputs: DPSS source input from the user, application set. Accessible by internet: runs best with Netscape 4.6.1

© 2002, University of Kaiserslautern 35 >> Machine paradigms EDA revolution Dead Supercomputer Data-Stream-based Computing Stream-based Memory Architecture Design Space Explorers KressArray Xplorer Machine paradigms Co-Compilation

© 2002, University of Kaiserslautern 36 Computer: the wrong Machine Paradigm ("von Neumann"). Compiler, RAM, Sequencer with program counter (state register), hardwired Datapath; instructions: tightly coupled by compact instruction code. "von Neumann" does not support soft data paths (reconfigurable Datapath).

© 2002, University of Kaiserslautern 37 Xputer: The Soft Machine Paradigm (anti machine), University of Kaiserslautern. Scheduler, Compiler, RAM, (multiple) sequencers with data counters, Datapath Array (reconfigurable, also for hardwired); data stream spec; "instructions": loosely coupled by decision data bits only. There are some differences to the Computer, the wrong Machine Paradigm ("von Neumann").

© 2002, University of Kaiserslautern 38 Machine Paradigms

© 2002, University of Kaiserslautern 39 All Fundamental Concepts available: Data Sequencer Methodology; Data-procedural Languages (Duality w. v. N.), supporting memory bandwidth optimization; Soft Data Path Synthesis Algorithms; Parallelizing Loop Transformation Methods; Compilers supporting Soft Machines; SW / CW Partitioning Co-Compilers (Part 3)

© 2002, University of Kaiserslautern 40 >> Co-Compilation EDA revolution Dead Supercomputer Data-Stream-based Computing Design Space Explorers KressArray Xplorer Machine paradigms Co-Compilation

© 2002, University of Kaiserslautern 41 Changing Models of Computing. "von Neumann": RAM, data path, instruction sequencer, I/O; (procedural) Software by downloading. Contemporary: host (RAM, Software by downloading) plus hardwired accelerator(s) (ASICs) by CAD: hardware designer needed, done at vendor site. Reconfigurable computing: host (RAM, Software by downloading) plus reconf. accelerator(s) (RAM, Configware by downloading): both done at customer site.

© 2002, University of Kaiserslautern 42 We introduce: Co-Compilation. A partitioning compiler takes high level programming language source and produces Software running on the µProcessor (Computer Machine Paradigm) and Configware running on the Reconfigurable Accelerators ("Soft" Anti Machine Paradigm: Reconfigurable Architecture, rDPA), coupled by an interface. No CAD! Compilation instead! Hardware / Software Co-Design turns into Configware / Software Co-Design.

© 2002, University of Kaiserslautern 43 Jürgen Becker's Co-DE-X Co-Compiler. Input: X-C (the C language extended by MoPL). Analyzer / Profiler; Partitioner with Loop Transformations, driven by Resource Parameters; GNU C compiler for the Host (vN, Computer machine paradigm); X-C compiler and DPSS for the KressArray (rDPA, Anti machine paradigm). Supporting different platforms; supporting platform-based design.
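
How a resource-parameter-driven partitioner might decide between host and array can be illustrated with a deliberately simple greedy sketch. It is not the Co-DE-X algorithm; the `Loop` record, its cycle estimates, and the `partition` function are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Loop:
    """Hypothetical profiler record for one loop nest."""
    name: str
    host_cycles: int    # estimated run time on the host
    array_cycles: int   # estimated run time on the reconfigurable array
    dpus_needed: int    # array cells the mapped loop body would occupy (> 0)


def partition(loops, dpus_available):
    """Greedy split: move the loops with the best saving per DPU to the array."""
    to_array, to_host = [], []
    ranked = sorted(loops,
                    key=lambda lp: (lp.host_cycles - lp.array_cycles) / lp.dpus_needed,
                    reverse=True)
    for lp in ranked:
        saves_time = lp.host_cycles > lp.array_cycles
        if saves_time and lp.dpus_needed <= dpus_available:
            to_array.append(lp.name)
            dpus_available -= lp.dpus_needed   # resource parameter shrinks
        else:
            to_host.append(lp.name)
    return to_array, to_host


if __name__ == "__main__":
    profile = [Loop("fir", 1000, 100, 6),
               Loop("init", 50, 80, 2),
               Loop("dct", 2000, 400, 10)]
    print(partition(profile, dpus_available=12))
```

With 12 DPUs available, only `dct` goes to the array; `fir` no longer fits and `init` would run slower there, so both stay on the host. Changing the resource parameter changes the split, which is the sense in which such a partitioner can support different platforms.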

© 2002, University of Kaiserslautern 44 Loop Transformation Examples (resource parameter driven Co-Compilation). Sequential processes: loop 1-16 body endloop. Strip mining: fork; loop 1-8 body endloop; loop 9-16 body endloop; join. Host / reconf. array: loop 1-8 body endloop becomes loop 1-8 trigger endloop on the host; loop unrolling: loop 1-4 trigger endloop; loop 1-2 trigger endloop.
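
The two transformations named on this slide can be sketched in software terms. The loop body, the strip size, and the unroll factor below are assumptions chosen for illustration; the slide itself gives only the loop bounds.

```python
def body(i: int) -> int:
    """Stand-in for the loop body (hypothetical)."""
    return i * i


def original(n: int = 16):
    """loop 1..n body endloop"""
    return [body(i) for i in range(1, n + 1)]


def strip_mined(n: int = 16, strip: int = 8):
    """Strip mining: split the iteration space into strips of `strip` iterations.

    Each strip is an independent inner loop, so the strips could be forked
    to separate resources and joined afterwards.
    """
    results = []
    for start in range(1, n + 1, strip):          # one pass per strip
        end = min(start + strip - 1, n)
        results.extend(body(i) for i in range(start, end + 1))
    return results


def unrolled_by_two(n: int = 16):
    """Loop unrolling by 2: half as many loop iterations, two bodies each."""
    results = []
    i = 1
    while i + 1 <= n:
        results.append(body(i))
        results.append(body(i + 1))
        i += 2
    if i <= n:                                    # leftover iteration for odd n
        results.append(body(i))
    return results


if __name__ == "__main__":
    assert original() == strip_mined() == unrolled_by_two()
    print(original())
```

On an array, each unrolled copy of the body would occupy its own datapath, which is why the trigger loop on the host shrinks as the unroll factor grows.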

© 2002, University of Kaiserslautern 45 Future Coarse Grain RA Development: It is indispensable to operate within the convergence area of compilers, co-compilers, architecture, and full-custom-style VLSI design (array cells). Products must come with a development platform that encourages users, especially those with a limited hardware background.

© 2002, University of Kaiserslautern 46 >> Design Space Explorers EDA revolution Dead Supercomputer Stream-based Computing Stream-based Memory Architecture Design Space Explorers KressArray Xplorer Machine paradigms Co-Compilation

© 2002, University of Kaiserslautern 47 END

© 2002, University of Kaiserslautern 48 Computer: the wrong Machine Paradigm ("von Neumann"): Compiler, Memory, Sequencer with program counter (state register), hardwired Datapath; instructions: tightly coupled by compact instruction code; does not support soft data paths (reconfigurable Datapath). Xputer: The Soft Machine Paradigm (University of Kaiserslautern): Scheduler, Compiler, Memory, multiple sequencers with data counters, Datapath Array (reconfigurable, also for hardwired); "instructions": loosely coupled by decision data bits only.

© 2002, University of Kaiserslautern 49 Anti Machine Paradigm. Xputer: Scheduler, Compiler, Memory, Sequencer with data counter, reconfigurable Datapath, "instructions". Parallel Xputer: Scheduler, Compiler, multiple memory, Sequencers with data counters, reconfigurable Datapath, "instructions". Decision data only, i.e. loose coupling.

© 2002, University of Kaiserslautern 50 Jürgen Becker's Co-DE-X Co-Compiler [ASP-DAC'95]. Co-Compilation: partitioning compiler for high level programming language source (X-C): Analyzer / Profiler, Partitioner, Resource Parameters (supporting different platforms). GNU C compiler: Software running on the µProcessor (Computer Machine Paradigm). X-C compiler and KressArray DPSS: Configware running on the Reconfigurable Accelerators (Xputer "Soft" Machine Paradigm). Interface between both. Hardware / Software Co-Design turns into Configware / Software Co-Design.

© 2002, University of Kaiserslautern 51 Computer: the wrong Machine Paradigm ("von Neumann"). Compiler, Memory, Instruction Sequencer with program counter, Decoder, hardwired Datapath; instructions: tightly coupled by a compact instruction code. "von Neumann" does not support soft data paths (reconfigurable Datapath): at run time there is no instruction fetch.

© 2002, University of Kaiserslautern 52 KressArray Xplorer (Platform Design Space Explorer); DPSS published at ASP-DAC 1995. User Interface: Architecture Editor, Mapping Editor, Suggestion Selection. KressArray DPSS: ALE-X Compiler (ALEX Code from the User, expr. tree, interm. form 2), Mapper (Design Rules, interm. form 3), Scheduler (data stream Schedule). Analyzer, Architecture Estimator, Delay Estim., Power Estimator (statist. Data, Power Data). Xplorer Inference Engine (FOX) with Improvement Proposal Generator (Suggestion). HDL Generator (VHDL, Verilog), Simulator. Datapath Generator (Kress rDPU Layout). Inputs: Application Set, KressArray family parameters.

© 2002, University of Kaiserslautern 53 Changing Models of Computation. Contemporary: host (RAM, Software via Compiler) plus hardwired accelerator(s) (ASICs via CAD): EDA tools needed*, done at vendor site. Reconfigurable computing: host plus reconf. accelerator(s) (RAM); Software and Configware via Co-Compiler (machine paradigm): both done at customer site, no hardware experts needed. *) even 80% of hardware people hate their tools

© 2002, University of Kaiserslautern 54 Machine Paradigms

© 2002, University of Kaiserslautern 55 KressArray Design Space Xplorer. DPSS-N Data Path Synthesis System: Editor / User Interface; ALE-X Compiler (ALE-X Code, .alex, from the User); Intermediate Format (.map); Architecture Estimation; Analyser (Statistical Data, .stat); Mapping: Mapper with Technology Mapping, Placement & Routing (Interm. Format, .map, including configware code); Scheduler (Sequencing Code, Data, .seq); Module Generator (Kress rDPU Layout, .krs; Kress IP Library, .krs; other IP); HDL Generator (HDL Description, .v) to Synthesis Environment.

© 2002, University of Kaiserslautern 56 FPGA-Style Mapping for coarse grain reconfigurable arrays: KressArray DPSS (Datapath Synthesis System) with Compiler, Mapper, and Scheduler; the Scheduler specifies and assembles the data streams from / to the array.

© 2002, University of Kaiserslautern 57 Design Flow of Domain-specific Architecture Optimization Nageldinger’s KressArray Design Space Xplorer: including a Fuzzy Logic Improvement Proposal Generator accessible by internet: runs best with Netscape 4.6.1

© 2002, University of Kaiserslautern 58 History of Loop Transformations. 70ies - 80ies: at Process Level: Sequential to Parallel Processes, incl. Vectorization: David Loveman, 1977, Allen and Kennedy, et al.: Loop Unrolling, Loop Fusion, Strip Mining, .... 1995/97 [Karin Schmidt / Jürgen Becker]: down to Datapath Level: (Parameter-driven) Time to Time/Space Partitioning, e.g. Transformation from Sequential Process to Super-systolic. 2000 [Michael Herz]: optimized RA to Memory Communication Bandwidth: Multi-dimensional Loop Unrolling / Storage Scheme Optimization supporting burst-mode & parallel Memory Banks.

© 2002, University of Kaiserslautern 59 History of Loop Transformations (Instruction Code vs. Reconfiguration Code). For Sequential Programs on Parallel Computers: David Loveman, 1977, Allen and Kennedy, etc.: Loop Unrolling, Loop Fusion, Strip Mining, .... For parallel Datapaths: Jürgen Becker (1997): Sequential to Super-Systolic Transformation to optimize Throughput of Reconfigurable Arrays (RAs). For memory communication: Michael Herz (2000): Multi-Level Loop Unrolling to reduce Memory Cycles needed to create RA Data Streams.

© 2002, University of Kaiserslautern 60 Paradigm Shift Mainstream Tornado Development of Hypergrowth Markets Harper Business 1995

© 2002, University of Kaiserslautern 61 EDA: where Electronics begins [Richard Newton] 1k Dataquest Initiative New book NASDAQ index EDA index

© 2002, University of Kaiserslautern 62 What is next after VHDL ? Motivations HDL-savvy designers needed New Business Model Co-Design never ending HDLs ? Extended HDLs – how far ? Automatic Partitioning

© 2002, University of Kaiserslautern 63 Hot Research Topic: Memory Architectures. High Performance Embedded Memory Architectures; High Performance Memory Communication Architectures [Herz]; Custom Memory Management Methodology [Catthoor]; Data Reuse Transformations [Kougia et al.]; Data Reuse Exploration [Soudris, Wuytack]

© 2002, University of Kaiserslautern 64 RAs: Cache does not help. Stream-based arrays are a memory bandwidth problem: the memory bandwidth problem is often more dramatic than for microprocessors; interleaving is not practicable, since it is based on sequential instruction streams; classical caches do not help, since instruction sequencing is not used. The problem: throughput of parallel data streams, not instruction streams; super pipe networks, not parallel computers!

© 2002, University of Kaiserslautern 65 Memory Communication Architecture hot research topic in embedded systems storage context transformations [Herz, others] for low power for high performance startups provide memory IP or generators

© 2002, University of Kaiserslautern 66 Stream-based Soft Machine: Scheduler, Compiler, Memory (data memory: memory bank, ...), Sequencers (data stream generator), rDPA, "instructions"

© 2002, University of Kaiserslautern 67 Stream-based Computing: DPU driven by data stream from / to memory, or from / to peripheral interface; transport-triggered execution; no instruction sequencer inside!
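
A software analogy of this arrangement, offered only as a sketch: Python generators play the role of data sequencers that produce address streams, and the DPU is a function that fires whenever an operand pair arrives. The names (`data_sequencer`, `read_stream`, `mac_dpu`, `run`), the linear address pattern, and the multiply-add operation are all assumptions for illustration.

```python
from typing import Iterable, Iterator, List


def data_sequencer(base: int, stride: int, count: int) -> Iterator[int]:
    """Generate the address stream base, base + stride, ... (count addresses)."""
    for k in range(count):
        yield base + k * stride


def read_stream(memory: List[int], addresses: Iterable[int]) -> Iterator[int]:
    """Turn an address stream into a data stream read from memory."""
    for addr in addresses:
        yield memory[addr]


def mac_dpu(xs: Iterable[int], ys: Iterable[int], coeff: int) -> Iterator[int]:
    """A DPU configured as coeff * x + y.

    It has no program counter and fetches no instructions: it produces one
    result whenever an operand pair arrives on its input streams.
    """
    for x, y in zip(xs, ys):
        yield coeff * x + y


def run(memory: List[int], n: int) -> List[int]:
    """Stream two n-element operand blocks through the DPU, write results back."""
    xs = read_stream(memory, data_sequencer(0, 1, n))
    ys = read_stream(memory, data_sequencer(n, 1, n))
    out_addresses = data_sequencer(2 * n, 1, n)
    for addr, value in zip(out_addresses, mac_dpu(xs, ys, coeff=3)):
        memory[addr] = value
    return memory


if __name__ == "__main__":
    mem = [1, 2, 3, 4, 10, 20, 30, 40, 0, 0, 0, 0]
    print(run(mem, 4)[8:])
```

For this memory image the last four cells become `[13, 26, 39, 52]`. All addressing lives in the sequencers, so the DPU itself stays free of address computation and instruction fetch.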

© 2002, University of Kaiserslautern 68 Stream-based Computing: (r)DPU array, for both reconfigurable and hardwired; DPU driven by data streams

© 2002, University of Kaiserslautern 69 Systolic Stream-based Computing System. Systolic Array [H. T. Kung, 1980]: an array of DPUs (Data Path Units). The Mathematician's Synthesis Method: equations mapped to DPU architecture and data streams; placement by linear projection or algebraic mapping; linear pipelines and uniform arrays only; no routing!

© 2002, University of Kaiserslautern 70 Converging Design Flows. Terms: DPU: datapath unit; DPA: data path array; rDPU: reconfigurable DPU; rDPA: reconfigurable DPA. This synthesis method is a generalization of systolic array synthesis: super systolic synthesis. The same synthesis method may be used for mapping an algorithm onto both: rDPA [Kress, 1995] and DPA [Brodersen, 2000].

© 2002, University of Kaiserslautern 71 Innovation Stalled ? [Richard Newton] What is next after VHDL ?