DSD2001 Reconfigurable Computing: the Roadmap to a New Business Model – and its Impact on SoC Design TS4: Tuesday, 14.00 hrs Reiner Hartenstein University.

Presentation transcript:

DSD2001 Reconfigurable Computing: the Roadmap to a New Business Model – and its Impact on SoC Design. TS4: Tuesday, 14.00 hrs. Reiner Hartenstein, University of Kaiserslautern. Pirenópolis, GO, Brazil, Sept , 2001

© 2001, University of Kaiserslautern 2 Conferences on Reconfigurable Logic. Topic adoption by congresses: ASP-DAC, DAC, DATE, ISCAS, SPIE, ... FCCM, FPGA (founded 1992), and FPL (founded 1991 at Oxford, UK). FPL 2002, The International Conference on Field-Programmable Logic and Applications: La Grande Motte (Montpellier, France), Sept. 2-4, 2002; Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier. Paper submission deadline: 15th March 2002. Notification of acceptance: 20th May 2002.

© 2001, University of Kaiserslautern 3 >> Introduction Introduction FPGA boom Coarse Grain Architectures Fascinating Paradigm Shift Programming Coarse Grain rDPAs Principles of Soft Computing Machines Future developments expected Conclusions fine grain coarse grain fundamental issues

© 2001, University of Kaiserslautern 4 Logic Gate Price Trend (source: Altera). [Chart: price per logic element, normalized to Q1/1993, from Q1 '93 to Q1 '00.] Price per logic element: 40% lower per year.
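A steady 40% annual decline compounds quickly. The following is a minimal sketch, assuming the decline applies once per year at a constant rate; the function name `relative_price` is hypothetical and not taken from the slide.

```python
def relative_price(years, annual_decline=0.40):
    """Price relative to the starting year, assuming a constant
    fractional decline per year (an idealization of the chart)."""
    return (1.0 - annual_decline) ** years


# Tabulate Q1 '93 (year 0) through Q1 '00 (year 7).
for year in range(8):
    print(f"Q1 '{(93 + year) % 100:02d}: {relative_price(year):.4f}")
```

Under this assumption the normalized price after seven years is 0.6 to the power 7, i.e. below 3% of the Q1/1993 value.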

© 2001, University of Kaiserslautern 5 The Impact of Reconfigurable Logic: Reconfigurable platforms bring a new dimension to digital system development and have a strong impact on SoC design. There is a rapidly growing, large user base of HDL-savvy designers with FPGA experience. Flexibility supports turn-around times of minutes instead of months for real-time in-system debugging, profiling, verification, tuning, field maintenance, and field upgrades. However, this is completely ignored by CS & CSE curricula.

© 2001, University of Kaiserslautern 6 The History of Paradigm Shifts: Makimoto's Wave (published in 1989). "Mainstream silicon application is switching every 10 years", alternating between standard and custom. [Figure: waves labelled TTL; LSI, MSI; µproc., memory; ASICs, accel's; reconfigurable (?); with 1st design crisis and 2nd design crisis marked.] "The Programmable System-on-a-Chip is the next wave." What's coming next?

© 2001, University of Kaiserslautern 7 How's the next wave? Tredennick's Paradigm Shifts: hardwired (algorithm: fixed, resources: fixed); procedural programming (algorithm: variable, resources: fixed); structural programming (algorithm: variable, resources: variable). [Figure: custom / standard wave with FPGAs and coarse grain RAs, 2007 marked with a question mark.] Hartenstein's Curve: no further wave! 4th wave?

© 2001, University of Kaiserslautern 8 The Impact of Makimoto's Paradigm Shifts. [Figure: custom / standard waves labelled TTL; LSI, MSI; µproc., memory; ASICs, accel's; reconfigurable.] Personalization (CAD) before fabrication. Procedural personalization via RAM-based machine paradigm: the software industry's secret of success. Structural personalization: RAM-based, before run time. Repeat the success story by a new machine paradigm! Dr. Makimoto: FPL 2000 keynote.

© 2001, University of Kaiserslautern 9 Terminology

© 2001, University of Kaiserslautern 10 Reconfigurable Logic going Mainstream. Please lobby for new curricula. Comprehensive methodology. One of the goals of this talk: to motivate you by key issues and visionary highlights. Fine grain: FPGAs killing the ASIC market. Coarse grain: several startups. Substantially improved design flow and libraries. Fastest growing segment of the semiconductor market.

© 2001, University of Kaiserslautern 11 >> FPGA boom Introduction FPGA boom Coarse Grain Architectures Fascinating Paradigm Shift Programming Coarse Grain RAs Principles of Soft Computing Machines Future development expected Conclusions

© 2001, University of Kaiserslautern 12 What is an FPGA? [Figure: Xilinx XC4000E: an array of logic blocks (L) and switch boxes (S) connected by single-length lines, double-length lines, and longlines.] S = Switch Box, L = Logic Block.

© 2001, University of Kaiserslautern 13 Top 4 FPGA Manufacturers 2000: Xilinx 42%, Altera 37%, Lattice 15%, Actel 6%. Top 4 PLD Manufacturers 2000: $3.7 billion.

© 2001, University of Kaiserslautern 14 FPGA market 1998 / (rank, vendor, global sales in million $): Xilinx; Altera; Lattice; Actel; Lucent; Cypress 41 / 43; 7 Quicklogic 30 / 40; 8 Atmel 32 / 38. Source: IC Insights Inc. Meanwhile, Xilinx acquired Philips' MOS PLD business, Lattice purchased Vantis.

© 2001, University of Kaiserslautern 15 FPGAs going Mainstream [Dataquest] PLD market > $7 billion by IP reuse and "pre-fabricated" components for the efficiency of design and use for PLDs FPGAs are going into every type of application. FPGA, from an IP standpoint, starting to look like an ASIC. PLD vendors provide libraries to support their products. today Altera and Xilinx own >65% of PLD business. FPGAs soon reach 50 million system gates

© 2001, University of Kaiserslautern 16 Away from complex design flow Place and Route Netlist Schematics/ HDL Netlister Bitstream Compiler HLL [S. Guccione] EDA trends....

© 2001, University of Kaiserslautern 17 Drop traditional separate design flow User Code Compiler Executable Netlister Netlist Place and Route. Bitstream Schematics/ HDL [S. Guccione] HLL Compiler [S. Guccione] EDA trends....

© 2001, University of Kaiserslautern 18 embedded hardw. CPU & memory cores HLL Compiler CPU core FPGA core Memory core [S. Guccione] embedded CPU and memory available HLL Compiler [S. Guccione] memory

© 2001, University of Kaiserslautern 19 CPU for configuration management on-board microprocessor CPU is available anyhow - even along with a little RTOS HLL Compiler [S. Guccione] Compiler HLL [S. Guccione] EDA trends....

© 2001, University of Kaiserslautern 20 Configuration Architectures. Straightforward: host (Compiler, Mapper, RTOS etc.), RAM, Soft Data Path. Multi-context: host (Compiler, Mapper, RTOS etc.), RAM, Soft Data Path. Configuration caching*: host (Compiler, Mapper, RTOS etc.), Config. Cache, RAM, Soft Data Path. Configuration loading resources: separate configuration fabrics (e.g. FPGA); wormhole routing (KressArray, Colt, PipeRench); RA part computes code for other RA part (self reconfiguration). Dynamic (RTR) vs. static configuration. *) no cache as usual!

© 2001, University of Kaiserslautern 21 Converging factors for RTR [S. Guccione]: million-gate FPGAs and co-processing with standard microprocessors are commonplace; direct implementation of complex algorithms; new tools like the Xilinx JBits tool suite directly support coprocessing and Run Time Reconfiguration (RTR). [Figure: User Java Code, Java Compiler, JBits API, Executable; CPU core, FPGA core, Memory core.]

© 2001, University of Kaiserslautern 22 (5) static vs. dynamic reconfiguration 15 min supports ASAT, adaptable devices requires disciplined implementation to avoid a testing nightmare supported by on-board / on-chip CPU core supports in-field debugging and upgrading (new business model) supported by on-board / on-chip CPU core Revenue / month Time / months Update 1 Product Update ASIC Product reconfigurable Product with download 30 [Kean] page 109

© 2001, University of Kaiserslautern 23 Configware as the Key Enabler. The configware market is taking off for mainstream. FPGA-based designs are becoming more complex, even SoC. No design productivity and quality without good configware libraries (soft IP cores) from various application areas. Growing number of independent configware houses (soft IP core vendors) and design services: Xilinx AllianceCORE & Reference Design Alliance et al. Currently the top FPGA vendors are key innovators and meet most configware demand.

© 2001, University of Kaiserslautern 24 "Driver" & "OS" for FPGAs: a separate EDA software market, comparable to the compiler / OS market in computers. Cadence, Mentor, Synopsys just jumped in. Xilinx and Altera are fabless FPGA vendors. < 5% of Xilinx / Altera income is from EDA software. > 50% of Xilinx people work on support, EDA & configware.

© 2001, University of Kaiserslautern 25 >> Coarse Grain Architectures Introduction FPGA boom Coarse Grain Architectures Fascinating Paradigm Shift Programming Coarse Grain rDPAs Principles of Soft Computing Machines Future developments expected Conclusions for detailed overview see proceedings

© 2001, University of Kaiserslautern 26 Why coarse grain? [Chart: normalized growth of algorithmic complexity (Shannon's Law), transistors/chip, microprocessor / DSP speed, memory, and battery performance across wireless generations 1G to 4G; computational efficiency in mA/MIP, e.g. StrongARM, SH7752.] Sources: Proc. ISSCC, ICSPAT, DAC, DSPWorld.

© 2001, University of Kaiserslautern 27 Fine-grained vs. coarse-grained reconfiguration. Fine grain is general purpose: slow and area-inefficient, but high parallelism. Coarse grain is application domain-specific: highly area-efficient, extremely high performance.

© 2001, University of Kaiserslautern 28 Reconfigurability Overhead. [Figure: FPGA array of logic blocks (L) and switch boxes (S), contrasting the area used by the application with the resources needed for reconfigurability, partly for configuration code storage; "hidden RAM" not shown.]

© 2001, University of Kaiserslautern 29 Why Coarse Grain instead of FPGA? [Chart: transistors / chip for microprocessor, memory, FPGA physical, FPGA logical, FPGA routed, and supersystolic (physical, logical).] Reconfigurability overhead reduced by up to ~ 1000; drastically smaller configuration memory; much faster loading; a lot more benefits. Sources: Proc. ISSCC, ICSPAT, DAC, DSPWorld.

© 2001, University of Kaiserslautern 30 Commercial RAs. XPU family (IP cores), XPU128: PACT corp., Munich; flexible array: MorphICs; CALISTO: Silicon Spice*; CS2000 family: Chameleon Systems; MECA family: Malleable*; FIPSOC: SIDSA; ACM: Quicksilver Tech; CHESS array: Elixent. *) bought

© 2001, University of Kaiserslautern 31 Universal RAs are not feasible... Often functional resources are not the throughput bottleneck. Some application areas, e.g. wireless communication, need extremely rich communication resources. The general purpose (coarse grain) reconfigurable array appears to be an illusion... Use domain-specific platform generators!

© 2001, University of Kaiserslautern 32 KressArray Family generic fabrics: a few examples. Select function repertory. Select Nearest Neighbour (NN) interconnect: select mode, number, and width of NNports; more NNports: rich routing resources. An example rDPU: route-through only, or function plus route-through. Examples of 2nd level interconnect: laid out over the rDPU cell, no separate routing areas!

© 2001, University of Kaiserslautern 33 KressArray Mapping Example: SNN filter. Array size: 10 x 16 = 160 rDPUs. Legend: route thru only; not used; backbus connect.

© 2001, University of Kaiserslautern 34 route-thru-only rDPU 3 vert. NNports, 32 bit Xplorer Plot: SNN Filter Example + [13] 2 hor. NNports, 32 bit operator result operand route thru backbus connect

© 2001, University of Kaiserslautern 35 >> Fascinating Paradigm Shift Introduction FPGA boom Coarse Grain Architectures Fascinating Paradigm Shift Programming Coarse Grain rDPAs Principles of Soft Computing Machines Future development expected Conclusions

© 2001, University of Kaiserslautern 36 Paradigm Shift Mainstream Tornado Development of Hypergrowth Markets Harper Business 1995

© 2001, University of Kaiserslautern 37 Makimoto’s 3rd wave The next EDA Industry Revolution 1978 Transistor entry: Applicon, Calma, CV Synthesis: Cadence, Synopsys Schematics entry: Daisy, Mentor, Valid... [Keutzer / Newton] EDA industry paradigm switching every 7 years 1999 (Co-) Compilation Stream-based DPU arrays [Hartenstein] 2006

© 2001, University of Kaiserslautern 38 It's a General Paradigm Shift! Using FPGAs (fine grain reconfigurable): just logic synthesis on a strange platform. Coarse grain reconfigurable arrays (Reconfigurable Computing): a fundamental paradigm shift, ignored by curricula & most R&D scenes. Replacing concurrent processes by much more efficient parallelism: stream-based computing arrays: systolic array* [1980], KressArray** [1995], chip-on-a-day* [2000]. *) hardwired **) reconfigurable

© 2001, University of Kaiserslautern 39 Stream-based Computing (2) terms: DPU: datapath unit DPA: datapath array rDPU: reconfigurable DPU rDPA: reconfigurable DPA stream-based computing: using complex pipe network (super-systolic: Kress et al.)

© 2001, University of Kaiserslautern 40 Converging Design Flows. The same synthesis method may be used for mapping an algorithm onto both: rDPA [Kress, 1995] and DPA [Brodersen, 2000]. This synthesis method is a generalization of systolic array synthesis: super systolic synthesis. Terms: DPU: datapath unit; DPA: datapath array; rDPU: reconfigurable DPU; rDPA: reconfigurable DPA.

© 2001, University of Kaiserslautern 41 Concurrent Computing: extremely inefficient. [Figure: several CPUs, each consisting of a DPU and an instruction sequencer, connected by bus(es) or switch box.]

© 2001, University of Kaiserslautern 42 Stream-based Computing DPU driven by data stream from / to memory or, from / to peripheral interface transport-triggered execution no instruction sequencer inside !

© 2001, University of Kaiserslautern 43 Stream-based Computing: (r) DPU array for both, reconfigurable, and, hardwired DPU driven by data streams

© 2001, University of Kaiserslautern 44 >>> Extremely high efficiency: avoiding address computation overhead; avoiding instruction fetch and interpretation overhead; high parallelism, massively multiple deep pipelines; much less configuration memory; no routing areas; no need to configure functions from CLBs.

© 2001, University of Kaiserslautern 45 >> Programming Coarse Grain RAs Introduction FPGA boom Coarse Grain Architectures Fascinating Paradigm Shift Programming Coarse Grain rDPAs Principles of Soft Computing Machines Future development expected Conclusions

© 2001, University of Kaiserslautern 46 Systolic Stream-based Computing System. Systolic array [H. T. Kung, 1980]: an array of DPUs (Data Path Units). The Mathematician's Synthesis Method: equations are mapped to a placement by linear projection or algebraic mapping; linear pipelines and uniform arrays only; no routing! [Figure: DPU architecture with operators + and * on y, x, a; data streams.]
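To make the idea of a linear pipeline of DPUs driven by data streams concrete, here is a minimal simulation sketch of a matrix-vector product. It assumes each DPU computes y + a * x (the operators shown on the slide); the particular dataflow (x held in the DPUs, partial sums moving one DPU per cycle, coefficients arriving skewed in time) and the name `systolic_matvec` are illustrative assumptions, not the author's design.

```python
def systolic_matvec(a, x):
    """Simulate a linear array of len(x) DPUs computing y = A * x.

    DPU j keeps x[j]; a partial sum for row i enters DPU 0 at cycle i
    and advances one DPU per cycle, picking up a[i][j] * x[j] on the way.
    """
    rows, cols = len(a), len(x)
    pipe = [None] * cols          # pipe[j]: (row index, partial sum) at DPU j's output
    results = []
    for cycle in range(rows + cols - 1):
        new_pipe = [None] * cols
        for j in range(cols):
            if j == 0:
                incoming = (cycle, 0) if cycle < rows else None
            else:
                incoming = pipe[j - 1]   # value produced in the previous cycle
            if incoming is not None:
                i, y = incoming
                new_pipe[j] = (i, y + a[i][j] * x[j])   # the DPU operation
        pipe = new_pipe
        if pipe[-1] is not None:
            results.append(pipe[-1][1])  # a finished row leaves the array
    return results


print(systolic_matvec([[1, 2], [3, 4], [5, 6]], [10, 1]))
```

No instruction sequencer appears anywhere in the loop body: each DPU fires whenever data arrives, which is the transport-triggered behaviour the talk contrasts with instruction-driven CPUs.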

© 2001, University of Kaiserslautern 47 Computing in space and time. [Figure: matrix-vector example with data streams x and y and coefficients a11 to a33 placed on an array.] Computing in time vs. computing in space: placement, systolic arrays etc.; migration by re-timing and other transformations. This dichotomy is completely ignored by our CS curricula.

© 2001, University of Kaiserslautern 48 General Stream-based Computing System: heterogeneous array of DPUs (data path units). Kress DPSS [1995]: Mapper and Scheduler; simultaneous placement & routing by simulated annealing; free form pipe network. The same mapper for both: reconfigurable, or hardwired. [Figure: expression tree and DPU architectures (operators +, *, -, sh, xf on y, x, a); data streams.]
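The slide names simulated annealing as the mapping technique. The sketch below shows only the general idea on a toy problem: it places the operators of a small expression graph on a grid and minimizes total Manhattan wire length by randomly swapping cells. It does placement only, with no routing, and every name, the cost function, and the cooling schedule are assumptions for illustration; it is not the DPSS mapper.

```python
import math
import random


def anneal_placement(nodes, edges, width, height,
                     steps=4000, t_start=2.0, t_end=0.05, seed=1):
    """Place nodes on a width x height grid; return (slots, best_cost, initial_cost).

    slots[i] is the node in cell i (row-major) or None for an empty cell.
    """
    if len(nodes) > width * height:
        raise ValueError("grid too small for the given nodes")
    rng = random.Random(seed)
    slots = list(nodes) + [None] * (width * height - len(nodes))
    rng.shuffle(slots)

    def cost_of(cells):
        pos = {n: divmod(i, width) for i, n in enumerate(cells) if n is not None}
        return sum(abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1])
                   for a, b in edges)

    cost = initial = cost_of(slots)
    best, best_cost = slots[:], cost
    for step in range(steps):
        # Geometric cooling from t_start towards t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)
        i, j = rng.randrange(len(slots)), rng.randrange(len(slots))
        slots[i], slots[j] = slots[j], slots[i]
        new_cost = cost_of(slots)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = slots[:], cost
        else:
            slots[i], slots[j] = slots[j], slots[i]   # undo the swap
    return best, best_cost, initial


ops = ["mul1", "mul2", "add", "sub", "shift"]
nets = [("mul1", "add"), ("mul2", "add"), ("add", "sub"), ("sub", "shift")]
placement, final_cost, start_cost = anneal_placement(ops, nets, width=3, height=3)
print(start_cost, "->", final_cost)
```

Because the best placement seen so far is kept, the returned cost never exceeds the cost of the random starting placement.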

© 2001, University of Kaiserslautern 49 Super Pipe Networks The key is mapping, rather than architecture * *) KressArray [1995]

© 2001, University of Kaiserslautern 50 Processor Memory Performance Gap

© 2001, University of Kaiserslautern 51 Efficient memory communication should be directly supported by the mapper tools. [Figure: an example by Nageldinger's KressArray Xplorer: synthesizable memory communication with an optimized parallel memory controller. Legend: sequencers, memory ports, application, not used.]

© 2001, University of Kaiserslautern 52 Memory Communication Architecture: a hot research topic in embedded systems, for low power and for high performance. Storage context transformations [Herz, others]. Startups provide memory IP or generators.

© 2001, University of Kaiserslautern 53 Stream-based Soft Machine Scheduler Memory (data memory) memory bank... “instructions” rDPA Compiler Sequencers (data stream generator)

© 2001, University of Kaiserslautern 54 Hot Research Topic: Memory Architectures. High Performance Embedded Memory Architectures; High Performance Memory Communication Architectures [Herz]; Custom Memory Management Methodology [Catthoor]; Data Reuse Transformations [Kougia et al.]; Data Reuse Exploration [Soudris, Wuytack].

© 2001, University of Kaiserslautern 55 >> Principles of Soft Computing Machines Introduction FPGA boom Coarse Grain Architectures Fascinating Paradigm Shift Programming Coarse Grain rDPAs Principles of Soft Computing Machines Future development expected Conclusions

© 2001, University of Kaiserslautern 56 KressArray DPSS (published at ASP-DAC 1995) and KressArray Xplorer (Platform Design Space Explorer). [Figure: tool flow with User, ALE-X Code, Application Set; ALE-X Compiler (expr. tree, interm. form 2, interm. form 3); Mapper; Scheduler (data stream schedule); Architecture Editor; Mapping Editor; Architecture Estimator; Analyzer (statist. data); Delay Estim.; Power Estimator (power data); HDL Generator (VHDL, Verilog); Simulator; Datapath Generator (design rules, Kress rDPU layout); Improvement Proposal Generator; Xplorer Inference Engine (FOX); Suggestion Selection; User Interface; KressArray family parameters.]

© 2001, University of Kaiserslautern 57 Architecture & Mapping Editor Statistics KressArray DPSS Datastream Generator HDL Generator Simulator Datapath Generator Delay & Power Estimator Improvement Proposal Generator User DPSS Source Input KressArray (Design Space) Platform Space Explorer Xplorer Application Set

© 2001, University of Kaiserslautern 58 Design Flow of Domain-specific Architecture Optimization Nageldinger’s KressArray Design Space Xplorer: including a Fuzzy Logic Improvement Proposal Generator accessible by internet: runs best with Netscape 4.6.1

© 2001, University of Kaiserslautern 59 Machine paradigms, University of Kaiserslautern. Computer ("von Neumann"): Compiler, Memory, hardwired Datapath, and Sequencer with program counter (state register); instructions; tightly coupled by compact instruction code; does not support soft data paths. Computer: the wrong machine paradigm. Xputer: Scheduler, Compiler, Memory, multiple sequencers with data counters, reconfigurable Datapath Array; "instructions"; loosely coupled by decision data bits only. Xputer: the Soft Machine Paradigm, for reconfigurable, also for hardwired.

© 2001, University of Kaiserslautern 60 Machine Paradigms

© 2001, University of Kaiserslautern 61 Fundamental Ideas available: Data Sequencer Methodology; Data-procedural Languages (duality with von Neumann), supporting memory bandwidth optimization; Soft Data Path Synthesis Algorithms; Parallelizing Loop Transformation Methods; Compilers supporting Soft Machines; SW / CW Partitioning Co-Compilers.

© 2001, University of Kaiserslautern 62 JPEG zigzag scan pattern (data counter moving over x, y)

    *> Declarations
    EastScan is
      step by [1,0]
    end EastScan;
    SouthScan is
      step by [0,1]
    end SouthScan;
    NorthEastScan is
      loop 8 times until [*,1]
        step by [1,-1]
      endloop
    end NorthEastScan;
    SouthWestScan is
      loop 8 times until [1,*]
        step by [-1,1]
      endloop
    end SouthWestScan;
    HalfZigZag is
      EastScan
      loop 3 times
        SouthWestScan
        SouthScan
        NorthEastScan
        EastScan
      endloop
    end HalfZigZag;

    goto PixMap[1,1]
    HalfZigZag;
    SouthWestScan
    uturn (HalfZigZag)
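For readers who want to check the address sequence such a scan program is meant to produce, here is a small sketch that generates the standard JPEG zigzag order directly. It uses 1-based (x, y) coordinates to match `PixMap[1,1]`; the function name `zigzag_scan` and the diagonal-by-diagonal formulation are assumptions made for illustration and do not emulate the scan-pattern language itself.

```python
def zigzag_scan(n=8):
    """Return the JPEG zigzag visiting order of an n x n block as 1-based (x, y)."""
    order = []
    for s in range(2, 2 * n + 1):            # s = x + y identifies one anti-diagonal
        xs = range(max(1, s - n), min(n, s - 1) + 1)
        if s % 2 == 1:
            xs = reversed(xs)                # odd diagonals run south-west (x decreasing)
        order.extend((x, s - x) for x in xs) # even diagonals run north-east
    return order


scan = zigzag_scan(8)
print(scan[:6])
```

The first moves are one step east, one diagonal run south-west, one step south, and one diagonal run north-east, which is the same sequence of scan primitives that `HalfZigZag` composes.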

© 2001, University of Kaiserslautern 63 Similar Programming Language Paradigms very easy to learn

© 2001, University of Kaiserslautern 64 Generic GAG Scheme. GAG = Generic Address Generator. [Figure: Limit Stepper, Base Stepper, and Address Stepper; symbols B0, L0, B, L, A, ΔA, limit.]

© 2001, University of Kaiserslautern 65 GAG: Address Stepper. GAG = Generic Address Generator. [Figure: Step Counter (maxStepCount), + / - unit (stepVector), End Detect, Escape Clause; signals init, tag, endExec; Base B, Limit L, step ΔA, Address A.]
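The labels on this slide (base, limit, step vector, maximum step count, end detect) suggest a simple stepping loop. The following is a one-dimensional sketch under that reading; the function `address_stepper`, its parameter names, and the exact end condition are assumptions for illustration and are not taken from the GAG hardware.

```python
def address_stepper(base, step, limit, max_step_count):
    """Yield addresses base, base + step, ... until the limit is passed
    or max_step_count addresses have been produced (end detect)."""
    address = base
    for _ in range(max_step_count):
        passed_limit = (step > 0 and address > limit) or (step < 0 and address < limit)
        if passed_limit:
            return                      # end detect: limit reached
        yield address
        address += step                 # the + / - stepping unit


print(list(address_stepper(base=0, step=4, limit=12, max_step_count=8)))
print(list(address_stepper(base=10, step=-3, limit=0, max_step_count=2)))
```

A data stream is then simply the memory contents read at the generated addresses, so the datapath needs no address arithmetic of its own.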

© 2001, University of Kaiserslautern 66 Generic Sequence Examples. [Figure: Limit Slider, Base Slider, and GAG Address Stepper; symbols B0, ΔA, L0, A.]

© 2001, University of Kaiserslautern 67 floor F address Slider Operation Demo Example B 0

© 2001, University of Kaiserslautern 68 Changing Models of Computation. Contemporary: host with hardwired accelerator(s) (ASICs); software via Compiler into RAM, accelerators via CAD; EDA tools needed*; done at vendor site. Reconfigurable computing: host with reconfigurable accelerator(s); software and configware via Co-Compiler into RAM; both done at customer site; no hardware experts needed. [Figure label: machine paradigm.] *) even 80% of hardware people hate their tools

© 2001, University of Kaiserslautern 69 Co-Compilation. Hardware / Software Co-Design turns to Configware / Software Co-Design. We introduce: Co-Compilation. A partitioning compiler takes a high level programming language source and produces software running on the processor (Computer machine paradigm) and configware running on the reconfigurable accelerators (Xputer "Soft" machine paradigm), with an interface between them. Reconfigurable Architecture (RA) instead of hardwired: no CAD! Compilation instead!

© 2001, University of Kaiserslautern 70 Jürgen Becker's Co-DE-X Co-Compiler. [Figure: X-C source; Analyzer / Profiler; Partitioner; Loop Transformations; host: GNU C compiler (Computer machine paradigm); KressArray: DPSS, X-C compiler (Xputer machine paradigm); Resource Parameters.] X-C is the C language extended by MoPL. Supporting different platforms; supporting platform-based design.

© 2001, University of Kaiserslautern 71 Loop Transformation Examples (resource parameter driven Co-Compilation). [Figure: a sequential process "loop 1-16 body endloop"; strip mining into "loop 1-8 body endloop" and "loop 9-16 body endloop" between fork and join; loop unrolling with the body on the reconf. array and trigger loops on the host: "loop 1-8 trigger endloop", "loop 1-4 trigger endloop", "loop 1-2 trigger endloop".]
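The two transformations named on this slide can be sketched in ordinary code. The example below splits a 16-iteration loop into strips of 8 (matching the 1-8 and 9-16 loops on the slide) and, separately, unrolls the body by a factor of two. The function names and the sequential execution of the strips are assumptions for illustration; a real co-compiler would run the strips in parallel between fork and join.

```python
def run_original(body, n=16):
    for i in range(1, n + 1):
        body(i)


def strip_mine(n=16, strip=8):
    """Split iterations 1..n into independent strips of at most `strip` iterations."""
    return [range(start, min(start + strip, n + 1)) for start in range(1, n + 1, strip)]


def run_strip_mined(body, n=16, strip=8):
    for chunk in strip_mine(n, strip):     # each chunk could be forked as its own process
        for i in chunk:
            body(i)


def run_unrolled_by_two(body, n=16):
    for i in range(1, n, 2):               # half as many loop tests, two bodies per trip
        body(i)
        body(i + 1)
    if n % 2:                              # leftover iteration when n is odd
        body(n)


trace_a, trace_b, trace_c = [], [], []
run_original(trace_a.append)
run_strip_mined(trace_b.append)
run_unrolled_by_two(trace_c.append)
print(trace_a == trace_b == trace_c)
```

All three variants execute the same iterations; what changes is how the work can be distributed over time and space, which is the resource-parameter-driven choice the slide refers to.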

© 2001, University of Kaiserslautern 72 History of Loop Transformations. 70ies - 80ies: at process level: sequential to parallel processes, incl. vectorization: David Loveman, 1977, Allen and Kennedy, et al.: loop unrolling, loop fusion, strip mining... 1995/97 [Karin Schmidt / Jürgen Becker]: (parameter-driven) time to time/space partitioning, down to datapath level, e.g. transformation from sequential process to super-systolic. 2000 [Michael Herz]: optimized RA to memory communication bandwidth: multi-dimensional loop unrolling / storage scheme optimization supporting burst-mode & parallel memory banks.

© 2001, University of Kaiserslautern 73 >> Future developments expected Introduction FPGA boom Coarse Grain Architectures Fascinating Paradigm Shift Programming Coarse Grain rDPAs Principles of Soft Computing Machines Future developments expected Conclusions

© 2001, University of Kaiserslautern 74 EH conferences. "Evolvable Hardware" (EH), "Evolutionary Methods" (EM), "Darwinistic Methods", and biologically inspired electronics: a new FPGA application [genetic FPGA]; the "DNA" metaphor. Conferences: EH (NASA/DoD Workshop on Evolvable Hardware), ICES (Evolvable Systems), EuroGP and GP (Genetic Programming), CEC (Congress on Evolutionary Computation), GECCO (Genetic and Evolutionary Computation), EvoWorkshops 2002 (Evolutionary Computing Workshops), MAPLD (Military and Aerospace Applications of Programmable Logic Devices and Technologies), ICGA (Genetic Algorithms).

© 2001, University of Kaiserslautern 75 Evolvable Hardware and Computing: what is it? YAFA: yet another FPGA application. What is the relation between Reconfigurable Computing and Evolvable Computing/Hardware? Currently: research on darwinistic methods to generate or optimize IT systems by electronic sex*. "Chromosome": a synonym for "configuration code". *) by crossing chromosomes

© 2001, University of Kaiserslautern 76 How important is evolvable computing? New conferences in their visionary phase. Some NASA / DoD expectations look unrealistic. Coming shake-out: the future is hard to guess. Reminds me of past AI daze. Partly a revival of cybernetics, bionics, etc. Genetic algorithms people (who do not talk to EDA people) dominate the scene. GA suck

© 2001, University of Kaiserslautern 77 Embedded Soft IP Cores soft CPU FPGA Memory core FPGA Compiler HLL

© 2001, University of Kaiserslautern 78 Some soft CPU core examples (core: architecture; platform). MicroBlaze, 125 MHz, 70 D-MIPS: 32 bit standard RISC, 32 reg. by 32 LUT RAM-based reg.; Xilinx, up to 100 on one FPGA. Nios: 16-bit instr. set; Altera Mercury. Nios, 50 MHz, 22 D-MIPS: 32-bit instr. set; Altera. Nios: 8 bit; Altera Mercury. gr bit; gr bit. My80: i8080A; FLEX10K30 or EPF6016. DSPuva16: 16 bit DSP; Spartan-II. Leon, 25 MHz: SPARC. ARM7 clone: ARM. uP bit: CISC, 32 reg.; 200 XC4000E CLBs. REGIS: 8 bits instr. + ext. ROM; 2 XILINX 3020 LCA. Reliance-1: 12 bit DSP; Lattice 4 isp30256, 4 isp1016. 1Popcorn-1: 8 bit CISC; Altera, Lattice, Xilinx. Acorn-1: 1 Flex 10K20. YARD-1A: 16-bit RISC, 2 opd. instr.; old Xilinx FPGA board. xr16: RISC, integer C; SpartanXL.

© 2001, University of Kaiserslautern 79 FPGA CPUs in teaching and academic research: UCSC (1990!); Mälardalen University, Eskilstuna, Sweden; Chalmers University, Göteborg, Sweden; Cornell University; Gray Research; Georgia Tech; Hiroshima City University, Japan; Michigan State; Universidad de Valladolid, Spain; Virginia Tech; Washington University, St. Louis; New Mexico Tech; UC Riverside; Tokai University, Japan.

© 2001, University of Kaiserslautern 80 Soft rDPA
(Diagram labels: HLL, Compiler, Hardware Design, soft CPU, Memory, soft DPU array, miscellaneous)

© 2001, University of Kaiserslautern 81 By >2005: don’t care about area efficiency?
Area efficiency: still relevant today.
Rapid technology progress: 50 million system gates soon.
FPGAs for relocatable configware code?
Compatibility at configuration code level?
Slower clock: compensated by more parallelism.
Even large rDPAs as soft IP become feasible.

© 2001, University of Kaiserslautern 82 >> Conclusions
Introduction
FPGA boom
Coarse Grain Architectures
Fascinating Paradigm Shift
Programming Coarse Grain rDPAs
Principles of Soft Computing Machines
Future development expected
Conclusions

© 2001, University of Kaiserslautern 83 Main problems to be solved (1)
Main EDA tools required:
De facto standard soft IP core libraries.
Tools for much better designer productivity.
Configuration code compatibility by a de facto standard RC platform family.
Compilers accepting a high level programming language.
Scalable FPGA architectures supporting relocatable configuration code.

© 2001, University of Kaiserslautern 84 Main problems to be solved (2)
Compare the most successful microprocessor:
most software written for it: many application areas
object code compatibility for new µP products
accepted OS, compilers, development tools available
Needed to become the dominant FPGA vendor:
most configware (soft IP cores) written for it
object code compatibility for new FPGA products
widely accepted "OS", compilers, development tools

© 2001, University of Kaiserslautern 85 Main problems to be solved (3)
Needing HDL-savvy users is a severe limitation: easy-to-use C- or Java-based compilers are needed.
Computing in time versus computing in space (systolic arrays etc.); migration by re-timing and other transformations.
This dichotomy is completely ignored by our CS curricula.
Each programmer and each MBA should have a qualified awareness of the dichotomy and of FPGAs.
Curricular innovations are urgently needed.
Lobbying is urgently needed.
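
The time/space dichotomy named on this slide can be illustrated with a small sketch. It is not taken from the slides: the FIR filter example and the names `fir_in_time` and `fir_in_space` are assumptions chosen for illustration. The first function reuses one multiply-accumulate unit step by step (computing in time). The second models a pipeline with one processing element per coefficient, all firing on each clock tick (computing in space), using the transposed filter form that re-timing yields.

```python
def fir_in_time(h, xs):
    """Computing in time: one multiply-accumulate unit reused step by step."""
    out = []
    for n in range(len(xs)):
        acc = 0
        for k in range(len(h)):          # sequential steps on one shared unit
            if n - k >= 0:
                acc += h[k] * xs[n - k]
        out.append(acc)
    return out


def fir_in_space(h, xs):
    """Computing in space: one processing element (PE) per coefficient.

    Models a transposed-form FIR pipeline. On every clock tick all PEs
    multiply the same input sample, and partial sums shift one register
    toward the output.
    """
    regs = [0] * (len(h) - 1)            # pipeline registers between PEs
    out = []
    for x in xs:                         # one loop iteration = one clock tick
        products = [hk * x for hk in h]  # all PEs work in parallel
        out.append(products[0] + (regs[0] if regs else 0))
        # Simultaneous register update, computed from the old register values.
        regs = [
            products[i + 1] + (regs[i + 1] if i + 1 < len(regs) else 0)
            for i in range(len(regs))
        ]
    return out


if __name__ == "__main__":
    coeffs = [1, 2, 3]
    samples = [1, 0, 0, 1]
    print(fir_in_time(coeffs, samples))
    print(fir_in_space(coeffs, samples))
```

Both functions return the same output stream. The difference is where the work sits: the first needs about `len(h)` sequential steps per sample on one unit, while the second spends `len(h)` units to deliver one result per tick.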

© 2001, University of Kaiserslautern 86 However, current CS Education ... is based on the Submarine Model
Hardware invisible: under the surface.
Brain usage: procedural-only.
Software faculty colleagues shy away from the paradigm shift: their brain hurts? Can’t be: this half has been amputated.
This model disables...
(Diagram labels: Algorithm, procedural high level Programming Language, Assembly Language, Software, Hardware)

© 2001, University of Kaiserslautern 87 Hardware and Software as Alternatives
Partitioning an algorithm gives one of: software only; software & hardware/configware; hardware/configware only.
Software: procedural. Hardware, configware: structural.
Brain usage: both hemispheres.

© 2001, University of Kaiserslautern 88 The Dominance of the Submarine Model ...
... indicates that our CS education system produces zillions of mentally disabled persons:
(procedural) structurally disabled ...
... completely unable to cope with solutions other than software only.
(Diagram label: Hardware)

© 2001, University of Kaiserslautern 89 It’s time to crush the Submarine Model
Computing in space: von Neumann book already in the 50s.
Now fundamentals and technology are available.
It’s time to innovate CS&E curricula ... toward a dichotomy of computing science:
Computing in time: procedural programming; Computer: "von Neumann" machine paradigm.
Computing in space: structural programming; Xputer machine paradigm.
Co-compilation.
Migration by re-timing and other transformations (systolic arrays etc.).

© 2001, University of Kaiserslautern 90 >>> thank you for listening

© 2001, University of Kaiserslautern 91 >>> END