Published by Kaley Viney, modified over 9 years ago
1
Enabling Technologies for Reconfigurable Computing, Part 2: Stream-based Computing for RC
Wednesday, November 21, 2001, 10.30 – 12.00 hrs., Tampere, Finland
Reiner Hartenstein, University of Kaiserslautern
2
© 2001, reiner@hartenstein.de http://www.fpl.uni-kl.de University of Kaiserslautern 2
Schedule
08.30 – 10.00 Reconfigurable Computing (RC)
10.00 – 10.30 coffee break
10.30 – 12.00 Stream-based Computing for RC
12.00 – 14.00 lunch break
14.00 – 15.30 Resources for RC
15.30 – 16.00 coffee break
16.00 – 17.30 FPGAs: recent developments
3
>> EDA revolution
Outline: EDA revolution; Dead Supercomputer; Stream-based Computing; Stream-based Memory Architecture; Design Space Explorers; KressArray Xplorer; Machine paradigms; Co-Compilation
http://www.uni-kl.de
4
EDA: where Electronics begins [Richard Newton]
[Figure: NASDAQ index vs. EDA index; annotations: Dataquest Initiative, New book]
5
[Richard Newton]
6
The End is near: the end of Hypergrowth?
[Figure: transistors/chip (log scale, 10^0 to 10^15) vs. year to market (1960 – 2040); growth x1.6/year, i.e. x100/decade]
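A quick consistency check of the two growth figures on this slide (our arithmetic, not part of the original slide): a factor of 1.6 per year compounds over ten years to roughly a factor of 100.

```latex
1.6^{10} = e^{10 \ln 1.6} \approx e^{4.70} \approx 110 \approx 10^{2}
\quad\Rightarrow\quad \times 1.6/\text{year} \approx \times 100/\text{decade}
```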
7
Development of Hypergrowth Markets [Harper Business 1995]
[Figure labels: Paradigm Shift, Mainstream, Tornado]
8
The next EDA Industry Revolution (Makimoto's 3rd wave)
EDA industry paradigm switching every 7 years [Keutzer / Newton]:
1978 Transistor entry: Applicon, Calma, CV...
1985 Schematics entry: Daisy, Mentor, Valid...
1992 Synthesis: Cadence, Synopsys...
1999 / 2006: (Co-)Compilation, stream-based DPU arrays [Hartenstein]
9
Biggest Mistake in History
10
Innovation Stalled? [Richard Newton]
What is next after VHDL?
11
What is next after VHDL? Motivations:
HDL-savvy designers needed
New business model
Co-Design never ending
HDLs?
Extended HDLs: how far?
Automatic partitioning
12
>> Dead Supercomputer
13
Dead Supercomputer Society
37 university and corporate R&D projects (research 1985 – 1995): 2 or 3 successes… All the rest failed to work or to be successful.
14
Dead Supercomputer Society [Gordon Bell, keynote at ISCA 2000]:
ACRI, Alliant, American Supercomputer, Ametek, Applied Dynamics, Astronautics, BBN, CDC, Convex, Cray Computer, Cray Research, Culler-Harris, Culler Scientific, Cydrome, Dana/Ardent/Stellar/Stardent, DAP (ICL), Denelcor, Elexsi, ETA Systems, Evans and Sutherland Computer, Floating Point Systems, Galaxy YH-1, Goodyear Aerospace MPP, Gould NPL, Guiltech, Intel Scientific Computers, International Parallel Machines, Kendall Square Research, Key Computer Laboratories, MasPar, Meiko, Multiflow, Myrias, Numerix, Prisma, Tera, Thinking Machines, Saxpy, Scientific Computer Systems (SCS), Soviet Supercomputers, Supertek, Supercomputer Systems, Suprenum, Vitesse Electronics
16
>> Stream-based Computing
17
Coarse Grain Reconfigurable Arrays vs. Parallel Processes
Parallelism at process level: hardwired processors, each with an instruction sequencer (I-Seq) and an ALU
Parallelism at datapath level: reconfigurable ALU (rALU) driven by a data sequencer; no instruction sequencing!
18
Concurrent Computing
CPUs, each a DPU with its own instruction sequencer, connected by bus(es) or a switch box: extremely inefficient
19
Stream-based Computing
DPU driven by data streams from / to memory, or from / to a peripheral interface
transport-triggered execution: no instruction sequencer inside!
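To make "transport-triggered execution" concrete, here is a minimal Python sketch. It models a DPU as a generator that fires whenever one operand has arrived on each of its input streams; `stream_dpu` and the example operations are hypothetical illustrations, not part of any KressArray tool.

```python
from typing import Callable, Iterable, Iterator

def stream_dpu(op: Callable[..., int], *inputs: Iterable[int]) -> Iterator[int]:
    """Hypothetical model of a stream-driven DPU.

    The DPU holds one fixed (configured) operation and has no instruction
    sequencer: it fires once per set of operands arriving on its input
    streams and stops as soon as any input stream runs dry.
    """
    for operands in zip(*inputs):  # arrival of one operand per input triggers execution
        yield op(*operands)

# Two DPUs chained into a pipeline: multiply-accumulate y + a*x, then a left shift
a, x, y0 = [1, 2, 3], [4, 5, 6], [0, 0, 0]
mac = stream_dpu(lambda ai, xi, yi: yi + ai * xi, a, x, y0)
shifted = stream_dpu(lambda v: v << 1, mac)
print(list(shifted))  # [8, 20, 36]
```

Nothing in the sketch decides *when* to compute except the arrival of data, which is the point of the slide.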
20
Stream-based Computing: (r)DPU arrays
for both reconfigurable and hardwired DPUs, driven by data streams
21
>>> extremely high efficiency
avoiding address computation overhead
avoiding instruction fetch and interpretation overhead
high parallelism: massively multiple deep pipelines
much less configuration memory
no routing areas needed to configure functions from CLBs
22
Systolic Stream-based Computing System
Systolic Array [H. T. Kung, 1980]: an array of DPUs (Data Path Units) driven by data streams
DPU architecture: y + a * x
The Mathematician's Synthesis Method: from equations to placement by linear projection or algebraic mapping
linear pipelines and uniform arrays only: no routing!
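The "linear projection" step can be illustrated with a small sketch for the matrix-vector equations y_i = Σ_j a_ij x_j, assuming the common textbook projection cell = j, cycle = i + j (our choice for illustration, not necessarily the one on the slide). It checks that no two computations collide and that each partial sum only moves to the neighbouring cell, which is why such an array needs no routing. The function names are hypothetical.

```python
from typing import Callable, Dict, Tuple

Index = Tuple[int, int]
Proj = Callable[[int, int], int]

def linear_projection(m: int, n: int, place: Proj, time: Proj) -> Dict[Tuple[int, int], Index]:
    """Map the index space {(i, j)} of y_i += a_ij * x_j onto (cell, cycle) slots.

    Raises ValueError if two computations would need the same cell in the same cycle.
    """
    schedule: Dict[Tuple[int, int], Index] = {}
    for i in range(m):
        for j in range(n):
            slot = (place(i, j), time(i, j))
            if slot in schedule:
                raise ValueError(f"conflict: {schedule[slot]} and {(i, j)} both need {slot}")
            schedule[slot] = (i, j)
    return schedule

def nearest_neighbour_only(m: int, n: int, place: Proj, time: Proj) -> bool:
    """True if partial sum y_i moves from (i, j-1) to (i, j) one cell right, one cycle later."""
    return all(place(i, j) - place(i, j - 1) == 1 and time(i, j) - time(i, j - 1) == 1
               for i in range(m) for j in range(1, n))

# Projection along i: cell = j, cycle = i + j (this skews the input streams)
sched = linear_projection(3, 3, place=lambda i, j: j, time=lambda i, j: i + j)
print(len(sched), nearest_neighbour_only(3, 3, lambda i, j: j, lambda i, j: i + j))  # 9 True
```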
23
Computing in space and time
[Figure: matrix-vector multiplication on a systolic array. Data streams enter the array: y1, y2, y3 with initial value 0, x1, x2, x3, and the matrix elements a11 ... a33 in skewed order; the results y1, y2, y3 leave it]
computing in time vs. computing in space: placement (systolic arrays etc.) and other transformations; migration by re-timing
This dichotomy is completely ignored by our CS curricula.
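The following cycle-level Python sketch simulates y = A x on a linear array of DPUs computing y + a * x. For simplicity it is an assumed variant in which x_j stays resident in cell j and only the partial sums and the skewed a_ij stream move; Kung's original design moves x and y through the array in opposite directions, so treat this as an illustration of computing in space and time, not a reproduction of the slide.

```python
from typing import List, Optional, Tuple

def systolic_matvec(a: List[List[int]], x: List[int]) -> List[int]:
    """Cycle-level sketch of y = A x on a linear array of n multiply-accumulate DPUs.

    Cell j keeps x[j] stationary. The partial sum of row i enters cell 0 with
    value 0 at cycle i and moves one cell to the right per cycle, so a[i][j]
    must reach cell j at cycle i + j (skewed input stream).
    """
    m, n = len(a), len(x)
    regs: List[Optional[Tuple[int, int]]] = [None] * n  # (row, partial sum) held by each cell
    y = [0] * m
    for t in range(m + n - 1):
        # Update right to left so each cell reads its left neighbour's value from the previous cycle.
        for j in reversed(range(n)):
            incoming = ((t, 0) if t < m else None) if j == 0 else regs[j - 1]
            if incoming is None:
                regs[j] = None
            else:
                i, partial = incoming
                regs[j] = (i, partial + a[i][j] * x[j])  # DPU function: y + a * x
        if regs[-1] is not None:  # a finished y_i leaves the last cell
            i, total = regs[-1]
            y[i] = total
    return y

print(systolic_matvec([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [1, 0, 2]))  # [7, 16, 25]
```

The same result is computed "in time" by a plain double loop; the array trades that loop for n cells working concurrently on different rows.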
24
General Stream-based Computing System
heterogeneous array of DPUs (data path units), connected as a free-form pipe network
input: expression tree and DPU architectures (e.g. y + a * x)
Mapper: simultaneous placement & routing by simulated annealing (Kress DPSS [1995]), placing operators such as +, *, sh, xf, - onto the DPUs
Scheduler: generates the data streams
The same mapper serves both reconfigurable and hardwired arrays.
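A toy version of placement by simulated annealing, assuming a grid of identical DPUs, one operator per DPU, and total Manhattan wire length as a stand-in for routing cost. The cost function, the move set (swapping the contents of two cells) and the geometric cooling schedule are our assumptions; this is not the DPSS mapper.

```python
import math
import random
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]

def wire_length(edges: List[Tuple[str, str]], pos: Dict[str, Cell]) -> int:
    """Total Manhattan distance between connected operators (stand-in routing cost)."""
    return sum(abs(pos[u][0] - pos[v][0]) + abs(pos[u][1] - pos[v][1]) for u, v in edges)

def anneal(nodes: List[str], edges: List[Tuple[str, str]], rows: int, cols: int,
           steps: int = 5000, t0: float = 4.0, alpha: float = 0.999,
           seed: int = 1) -> Tuple[Dict[str, Cell], int]:
    rng = random.Random(seed)
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    if len(nodes) > len(cells):
        raise ValueError("array too small for the expression tree")
    occupant: List[Optional[str]] = list(nodes) + [None] * (len(cells) - len(nodes))
    rng.shuffle(occupant)

    def positions() -> Dict[str, Cell]:
        return {name: cells[k] for k, name in enumerate(occupant) if name is not None}

    cost = wire_length(edges, positions())
    best, best_cost, t = positions(), cost, t0
    for _ in range(steps):
        p, q = rng.sample(range(len(cells)), 2)  # move: swap the contents of two cells
        if occupant[p] is None and occupant[q] is None:
            continue
        occupant[p], occupant[q] = occupant[q], occupant[p]
        new_cost = wire_length(edges, positions())
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cost = new_cost  # accept (uphill moves are accepted more often while t is high)
            if cost < best_cost:
                best, best_cost = positions(), cost
        else:
            occupant[p], occupant[q] = occupant[q], occupant[p]  # reject: undo the swap
        t = max(t * alpha, 1e-3)  # geometric cooling
    return best, best_cost

# Hypothetical expression tree ((a*b) + (c*d)) * (e - f): one operator per DPU
nodes = ["mul1", "mul2", "add", "sub", "mul3"]
edges = [("mul1", "add"), ("mul2", "add"), ("add", "mul3"), ("sub", "mul3")]
placement, length = anneal(nodes, edges, rows=3, cols=3)
print(length, placement)
```

Because every placement uses distinct cells, each edge costs at least 1, so 4 is a lower bound for this tree.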
25
Converging Design Flows
This synthesis method is a generalization of systolic array synthesis: super-systolic synthesis.
The same synthesis method may be used for mapping an algorithm onto both: rDPA [Kress, 1995] and DPA [Brodersen, 2000].
Terms: DPU: data path unit; DPA: data path array; rDPU: reconfigurable DPU; rDPA: reconfigurable DPA
26
Super Pipe Networks
The key is mapping, rather than architecture*
*) KressArray [1995]
27
>> Stream-based Memory Architecture
28
Hot Research Topic: Memory Architectures
High Performance Embedded Memory Architectures
High Performance Memory Communication Architectures [Herz]
Custom Memory Management Methodology [Catthoor]
Data Reuse Transformations [Kougia et al.]
Data Reuse Exploration [Soudris, Wuytack]
29
Processor Memory Performance Gap
30
RAs: Cache does not help
the memory bandwidth problem is often more dramatic than for microprocessors
interleaving is not practicable, since it is based on sequential instruction streams
classical caches do not help, since instruction sequencing is not used
the problem: throughput of parallel data streams, not of instruction streams
super pipe networks, not parallel computers!
Stream-based arrays pose a memory bandwidth problem.
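Why the throughput of parallel data streams depends on how data is laid out across memory banks can be shown with a deliberately simplified model (an assumption, not a model of any specific memory controller): word `addr` lives in bank `addr % banks`, each bank serves one access per cycle, and colliding requests are serialized.

```python
from collections import Counter
from typing import List, Tuple

def cycles_needed(streams: List[Tuple[int, int]], length: int, banks: int) -> int:
    """Memory cycles to deliver `length` words to each of several parallel data streams.

    Each stream is (base address, stride). In every step all streams request
    their next word; requests that hit the same bank are serialized.
    """
    total = 0
    for k in range(length):
        hits = Counter((base + k * stride) % banks for base, stride in streams)
        total += max(hits.values())
    return total

# Two unit-stride streams offset by one word: no conflicts on 4 banks
print(cycles_needed([(0, 1), (1, 1)], length=8, banks=4))   # 8
# Column-wise streams through arrays A (base 0) and B (base 64), row pitch 4: every access hits bank 0
print(cycles_needed([(0, 4), (64, 4)], length=8, banks=4))  # 16
# Shifting array B by one word (a different storage scheme) removes the conflicts
print(cycles_needed([(0, 4), (65, 4)], length=8, banks=4))  # 8
```

The same two streams need half the memory cycles once the storage scheme is changed, without touching the datapath.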
31
Synthesizable Memory Communication
Efficient memory communication should be directly supported by the mapper tools.
Optimized Parallel Memory Controller: an example by Nageldinger's KressArray Xplorer (http://kressarray.de)
[Figure legend: sequencers, memory ports, application, not used]
32
The Disk Farm? or a System On a Card? [Gordon Bell, Jim Gray, ISCA 2000]
The 500 GB disc card (14"): LOTS of bandwidth
A few disks replaced by >10s of Gbytes of RAM and a processor
MicroDrive: 1.7" x 1.4" x 0.2"
1999: 340 MB, 5400 RPM, 5 MB/s, 15 ms seek
2006: 9 GB, 50 MB/s? (1.6X/yr capacity, 1.4X/yr BW)
Integrated IRAM processor (2x height): 16 Mbytes; 1.6 Gflops; 6.4 Gops
Connected via crossbar switch, growing like Moore's law
100/board = 1 TB; 0.16 Tflops
10,000+ nodes in one rack!
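Checking the slide's projections (our arithmetic): seven years (1999 to 2006) of 1.6X/yr capacity and 1.4X/yr bandwidth growth from the 1999 MicroDrive, and 100 IRAM nodes of 1.6 Gflops each per board:

```latex
\begin{aligned}
340\,\text{MB} \times 1.6^{7} &\approx 340\,\text{MB} \times 26.8 \approx 9.1\,\text{GB} \\
5\,\text{MB/s} \times 1.4^{7} &\approx 5\,\text{MB/s} \times 10.5 \approx 53\,\text{MB/s} \\
100 \times 1.6\,\text{Gflops} &= 160\,\text{Gflops} = 0.16\,\text{Tflops}
\end{aligned}
```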
33
Memory Communication Architecture
hot research topic in embedded systems
storage context transformations [Herz, others], for low power and for high performance
startups provide memory IP or generators
34
Stream-based Soft Machine
[Diagram: Compiler and Scheduler generate the "instructions" for the rDPA; data memory organized as memory banks...; Sequencers (data stream generators) between the memory and the rDPA]
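A sequencer acting as a data stream generator can be sketched as an address generator. The sketch below assumes a simple 2-D block scan inside a larger row-major array; `scan_addresses` and its parameters are hypothetical, chosen only to show that the address arithmetic lives in the sequencer rather than in the rDPA.

```python
from typing import Iterator, List

def scan_addresses(base: int, rows: int, cols: int, row_pitch: int,
                   col_step: int = 1) -> Iterator[int]:
    """Hypothetical data stream generator: emits the addresses of a rows x cols
    block inside a larger row-major array with row_pitch words per row.

    The rDPA fed by this stream never computes an address or fetches an instruction.
    """
    for r in range(rows):
        for c in range(cols):
            yield base + r * row_pitch + c * col_step

memory: List[int] = list(range(100, 164))  # 8 x 8 array of data words, row-major
block = [memory[addr] for addr in scan_addresses(base=9, rows=2, cols=3, row_pitch=8)]
print(block)  # [109, 110, 111, 117, 118, 119]
```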
35
>> Design Space Explorers
36
Domain-specific reconfigurable platforms will be suitable to cope with the 2nd Design Crisis (KressArray Explorer...).
Just as the general purpose massively parallel computer system is unrealistic, an illusion..., so a fully general purpose reconfigurable platform sometimes is....
37
Motivation: Universal RAs: is it feasible?
The General Purpose (coarse grain) Reconfigurable Array appears to be an illusion..., as obviously is the Universal Massively Parallel Computer Architecture...
Counter-example: application domain of image processing
38
-> Design Space Exploration
– Design Space Explorers (DSEs)
– Platform Space Explorers (PSEs)
– Compiler / PSE symbiosis
– Parallel computing vs. reconfigurable
39
Design Space Exploration Systems
40
DSEs: an overview
for VLSI design in general
for parallel computer systems
Xplorer: the only one for reconfigurable platforms (also MATRIX?)
41
>> KressArray Xplorer
42
KressArray Xplorer (Platform Design Space Explorer)
[Diagram components: Application Set (ALE-X code); KressArray DPSS (published at ASP-DAC 1995) with ALE-X Compiler, Mapper and Scheduler (data stream schedule); expression tree and intermediate forms 2 and 3; design rules; Architecture Editor, Mapping Editor, User Interface; Analyzer (statistical data), Architecture Estimator, Delay Estimator, Power Estimator (power data); HDL Generator (VHDL, Verilog), Simulator; Datapath Generator (Kress rDPU layout); Improvement Proposal Generator with the Xplorer Inference Engine (FOX) and suggestion selection by the user; KressArray family parameters]
43
KressArray (Design Space) Platform Space Explorer, http://kressarray.de
[Diagram components: Application Set as DPSS source input; Xplorer with Architecture & Mapping Editor, KressArray DPSS, statistics, Datastream Generator, HDL Generator, Simulator, Datapath Generator, Delay & Power Estimator, Improvement Proposal Generator; User]
44
Design Flow of Domain-specific Architecture Optimization
Nageldinger's KressArray Design Space Xplorer, including a Fuzzy Logic Improvement Proposal Generator
accessible via the internet: http://kressarray.de (runs best with Netscape 4.6.1)
45
KressArray Design Space Xplorer
[Diagram components: User; Editor / User Interface; ALE-X Compiler (ALE-X code, .alex) producing an intermediate format (.map); DPSS-N (Data Path Synthesis System) with Mapper (technology mapping, placement & routing) producing an intermediate format (.map) including configware code, and Scheduler producing sequencing code (data, .seq); Analyser (statistical data, .stat); Architecture Estimation; HDL Generator (HDL description, .v, to synthesis environment); Module Generator (.krs) using the Kress IP library, other IP and the Kress rDPU layout (.krs)]
46
>> Machine paradigms
47
Computer: the wrong Machine Paradigm; Xputer: the Soft Machine Paradigm
Computer ("von Neumann"): Compiler, Memory holding instructions, Sequencer with program counter (state register), hardwired Datapath; tightly coupled by compact instruction code; does not support soft data paths
Xputer (University of Kaiserslautern): Compiler and Scheduler produce the "instructions"; Memory; multiple sequencers with data counters; reconfigurable Datapath Array (also applicable to hardwired datapaths); loosely coupled by decision data bits only
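"Loosely coupled by decision data bits only" can be illustrated with a hypothetical sketch: the sequencer's data counter walks through memory, the datapath processes each word, and the only information flowing back to the sequencer is one decision bit. The function names and the end-marker convention are illustrative assumptions, not the actual Xputer hardware interface.

```python
from typing import Callable, List, Tuple

def run_xputer_loop(memory: List[int], start: int,
                    dpu: Callable[[int, int], Tuple[int, bool]]) -> Tuple[int, int]:
    """Hypothetical sketch: the sequencer reacts only to the DPU's decision bit.

    Returns (accumulated result, data counter value when the loop ended).
    """
    acc, data_counter = 0, start
    while data_counter < len(memory):
        acc, stop = dpu(acc, memory[data_counter])  # datapath: compute + decision bit
        if stop:                                    # sequencer sees nothing but this bit
            break
        data_counter += 1                           # data counter advances; no instruction fetch
    return acc, data_counter

# Datapath configured to sum words until it sees a negative word (assumed end marker)
def sum_until_negative(acc: int, word: int) -> Tuple[int, bool]:
    return (acc, True) if word < 0 else (acc + word, False)

print(run_xputer_loop([3, 4, 5, -1, 7], 0, sum_until_negative))  # (12, 3)
```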
48
Soft Machine Paradigm: Xputer and Parallel Xputer
Xputer: Compiler, Scheduler, Memory, Sequencer with data counter, reconfigurable Datapath, "instructions"
Parallel Xputer: Compiler, Scheduler, multiple Sequencers with data counters, reconfigurable Datapaths, memory, "instructions"
Decision data only, i.e. loose coupling
49
Computer: the wrong Machine Paradigm
"von Neumann": Compiler, Memory holding instructions, Sequencer with program counter and Decoder, hardwired Datapath; tightly coupled by a compact instruction code; does not support soft data paths
with a reconfigurable Datapath: at run time, no instruction fetch by the Instruction Sequencer
50
Machine Paradigms
51
Machine Paradigms
52
Fundamental Ideas available
Data Sequencer Methodology
Data-procedural Languages (duality with von Neumann)... supporting memory bandwidth optimization
Soft Data Path Synthesis Algorithms
Parallelizing Loop Transformation Methods
Compilers supporting Soft Machines
SW / CW Partitioning Co-Compilers
53
>> Co-Compilation
54
FPGA-Style Mapping for coarse grain reconfigurable arrays
KressArray DPSS (Datapath Synthesis System): Compiler, Mapper, Scheduler
the Scheduler specifies and assembles the data streams from / to the array
55
Changing Models of Computing
"von Neumann": (procedural) software downloaded into RAM; hardwired data path with instruction sequencer and I / O
contemporary: host plus hardwired accelerator(s) (ASICs), designed with CAD at the vendor site: hardware designer needed
reconfigurable computing: host plus reconfigurable accelerator(s); software and configware both (re-)downloaded into RAM at the customer site
56
Changing Models of Computation
contemporary: host (Compiler, software in RAM) plus hardwired accelerator(s) (ASICs, designed with CAD at the vendor site): EDA tools needed*
reconfigurable computing: host plus reconfigurable accelerator(s); a Co-Compiler generates software and configware, both done at the customer site: no hardware experts needed
*) even 80% of hardware people hate their tools
57
Jürgen Becker's Co-DE-X Co-Compiler [ASP-DAC'95]
Hardware / Software Co-Design turns to Configware / Software Co-Design
high level programming language source → Analyzer / Profiler → X-C Partitioner (partitioning compiler, driven by resource parameters, supporting different platforms)
software: GNU C compiler → Processor (Computer machine paradigm)
configware: X-C compiler and KressArray DPSS → Reconfigurable Accelerators (Xputer "Soft" machine paradigm)
Processor and accelerators connected via an interface
58
Co-Compilation
Reconfigurable Architecture (RA) instead of hardwired: no CAD! Compilation instead!
Hardware / Software Co-Design turns to Configware / Software Co-Design. We introduce: Co-Compilation
a partitioning compiler splits the high level programming language source into software running on the Processor (Computer machine paradigm) and configware running on the Reconfigurable Accelerators (Xputer "Soft" machine paradigm), connected via an interface
59
Jürgen Becker's Co-DE-X Co-Compiler
X-C source → Analyzer / Profiler → Partitioner with loop transformations
host: GNU C compiler (Computer machine paradigm)
KressArray: X-C compiler and DPSS (Xputer machine paradigm)
X-C is the C language extended by MoPL
Resource parameters: supporting different platforms, supporting platform-based design
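How resource parameters can drive partitioning is sketched below with a toy greedy partitioner. Everything here is a hypothetical assumption for illustration (the `LoopProfile` fields, the time units, a single DPU budget shared by all array loops, no reconfiguration over time); it is not Co-DE-X's actual partitioning algorithm.

```python
from typing import List, NamedTuple, Tuple

class LoopProfile(NamedTuple):
    name: str
    host_time: float   # profiled run time on the host (hypothetical units)
    array_time: float  # estimated run time on the reconfigurable array
    dpus: int          # DPUs the loop body needs when mapped to the array

def partition(loops: List[LoopProfile], dpu_budget: int) -> Tuple[List[str], List[str]]:
    """Greedy sketch: move the loops with the largest time saving to the array
    while they fit into the DPU budget (the resource parameter)."""
    to_array: List[str] = []
    to_host: List[str] = []
    used = 0
    for loop in sorted(loops, key=lambda p: p.host_time - p.array_time, reverse=True):
        saving = loop.host_time - loop.array_time
        if saving > 0 and used + loop.dpus <= dpu_budget:
            to_array.append(loop.name)
            used += loop.dpus
        else:
            to_host.append(loop.name)
    return to_array, to_host

loops = [LoopProfile("fir", 10.0, 2.0, 6),
         LoopProfile("sobel", 8.0, 1.0, 5),
         LoopProfile("scan", 3.0, 4.0, 2)]
print(partition(loops, dpu_budget=10))  # (['fir'], ['sobel', 'scan'])
print(partition(loops, dpu_budget=12))  # (['fir', 'sobel'], ['scan'])
```

Changing only the resource parameter (`dpu_budget`) changes the partition, which is the sense in which one co-compiler can support different platforms.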
60
Loop Transformation Examples: resource parameter driven Co-Compilation
original: loop 1-16 body endloop
fork / join into sequential processes: loop 1-8 body endloop and loop 9-16 body endloop, distributed between host and reconf. array
strip mining and loop unrolling: loop 1-8 trigger endloop, loop 1-4 trigger endloop, loop 1-2 trigger endloop
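Strip mining and loop unrolling as used on this slide can be sketched in Python. Reading the "trigger" loops as the host triggering an unrolled loop body on the array (8 triggers without unrolling, 4 with two body copies, 2 with four) is our interpretation; the function names are hypothetical.

```python
from typing import Callable, Iterator, List

def strip_mine(lo: int, hi: int, strip: int) -> Iterator[range]:
    """Split the inclusive loop range lo..hi into strips of at most `strip` iterations."""
    for start in range(lo, hi + 1, strip):
        yield range(start, min(start + strip - 1, hi) + 1)

def unrolled(lo: int, hi: int, factor: int, body: Callable[[int], object]) -> int:
    """Run body(i) for i = lo..hi with `factor` body copies per trigger (loop unrolling).

    Returns the number of triggers, i.e. how often the outer (host) loop fires.
    """
    triggers = 0
    for strip in strip_mine(lo, hi, factor):
        triggers += 1
        for i in strip:  # on the array these copies would run side by side; here, sequentially
            body(i)
    return triggers

# fork / join: loop 1-16 split into two sequential processes, 1-8 and 9-16
print([list(r) for r in strip_mine(1, 16, 8)])
seen: List[int] = []
print([unrolled(1, 8, f, seen.append) for f in (1, 2, 4)])  # [8, 4, 2]
print(seen == list(range(1, 9)) * 3)  # True: every variant executes the same iterations
```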
61
History of Loop Transformations
1970s – 1980s: at process level: sequential to parallel processes, incl. vectorization (David Loveman, 1977; Allen and Kennedy, et al.): Loop Unrolling, Loop Fusion, Strip Mining....
1995/97 [Karin Schmidt / Jürgen Becker]: (parameter-driven) time to time/space partitioning, down to datapath level, e.g. transformation from sequential process to super-systolic
2000 [Michael Herz]: optimized RA to memory communication bandwidth: multi-dimensional loop unrolling / storage scheme optimization, supporting burst-mode & parallel memory banks
62
History of Loop Transformations
For sequential programs on parallel computers: David Loveman, 1977; Allen and Kennedy, etc.: Loop Unrolling, Loop Fusion, Strip Mining....
For parallel datapaths: Jürgen Becker (1997): sequential to super-systolic transformation, to optimize the throughput of Reconfigurable Arrays (RAs)
For memory communication: Michael Herz (2000): multi-level loop unrolling to reduce the memory cycles needed to create RA data streams
Instruction code vs. reconfiguration code
63
Future Coarse Grain RA Development
It is indispensable to operate within the convergence area of compilers, co-compilers, architecture and full-custom-style VLSI design (array cells).
Products must come with a development platform that encourages users, especially those with a limited hardware background.
64
>> Design Space Explorers
66
END