The von Neumann Syndrome calls for a Revolution

The von Neumann Syndrome calls for a Revolution
9 November 2018
HPRCTA'07 - First International Workshop on High-Performance Reconfigurable Computing Technology and Applications - in conjunction with SC07 - Reno, NV, November 11, 2007
Reiner Hartenstein, TU Kaiserslautern
http://hartenstein.de

About Scientific Revolutions
Thomas S. Kuhn: The Structure of Scientific Revolutions
Ludwik Fleck: Genesis and Development of a Scientific Fact

What is the von Neumann Syndrome?
Computing in the von Neumann style is tremendously inefficient. Multiple layers of massive overhead phenomena at run time often lead to code sizes of astronomic dimensions, resident in drastically slower off-chip memory. The manycore programming crisis requires complete re-mapping and re-implementation of applications. A sufficiently large population of programmers qualified to program applications for 4 and more cores is far from available.

Education for multi-core
[Cartoon: Mateo Valero; "I programming multicores"; a multicore-based pacifier]

Will Computing be affordable in the Future?
Another problem is a high-priority political issue: the very high energy consumption of von-Neumann-based systems. The electricity consumption of all visible and hidden computers reaches more than 20% of our total electricity consumption. A study predicts 35 - 50% for the US by the year 2020.

Reconfigurable Computing highly promising
Fundamental concepts from Reconfigurable Computing promise a speed-up by almost one order of magnitude, for some application areas by up to 2 or 3 orders of magnitude, while at the same time slashing the electricity bill to 10% or less. It is really time to fully exploit the most disruptive revolution since the mainframe: Reconfigurable Computing, also to reverse the downtrend in CS enrolment. Reconfigurable Computing shows us the road map to the personal desktop supercomputer, making HPC affordable also for small firms and individuals, and to a drastic reduction of energy consumption. Contracts between microprocessor firms and Reconfigurable Computing system vendors are on the way but not yet published. The technology is ready, but most users are not. Why?

A Revolution is overdue
The talk sketches a road map requiring a redefinition of the entire discipline, inspired by the mindset of Reconfigurable Computing.

much more saved by coarse-grain

platform example | energy (W / Gflops) | energy factor
MDgrape-3* (domain-specific, 2004) | 0.2 | 1
Pentium 4 | 14 | 70
Earth Simulator (supercomputer, 2003) | 128 | 640

*) feasible also with rDPA
http://www.thomaslfriedman.com
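The "energy factor" column is simply each platform's W/Gflops normalized to the domain-specific MDgrape-3; a quick check of the table's arithmetic (my own sketch, not from the slides):

```python
# Energy factor = W/Gflops relative to the domain-specific MDgrape-3.
watts_per_gflops = {
    "MDgrape-3": 0.2,        # domain-specific, 2004
    "Pentium 4": 14,
    "Earth Simulator": 128,  # supercomputer, 2003
}
baseline = watts_per_gflops["MDgrape-3"]
factors = {p: w / baseline for p, w in watts_per_gflops.items()}
print(factors)
```

The factors 1, 70, and 640 reproduce the table: the general-purpose and supercomputer platforms burn roughly two to three orders of magnitude more energy per Gflops than the specialized datapath.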

Power-aware Applications
Cyber infrastructure energy consumption: several predictions; the most pessimistic: almost 50% by 2025 in the USA.
[Chart, projections to 2020, for: Mobile Computation, Communication, Entertainment, etc. (high-volume market, 2003 and later); PCs and servers (high volume); HPC and Supercomputing]

An Example: FPGAs in Oil and Gas (1)
[Herb Riley, R. Associates] "Application migration [from supercomputer] has resulted in a 17-to-1 increase in performance."
For this example speed-up is not my key issue (Jürgen Becker's tutorial showed much higher speed-ups, going up to a factor of 6000). For this oil and gas example a side effect is much more interesting than the speed-up.

An Example: FPGAs in Oil and Gas (2)
[Herb Riley, R. Associates] "Application migration [from supercomputer] has resulted in a 17-to-1 increase in performance."
Saves more than $10,000 in electricity bills per year (at 7¢ / kWh) per 64-processor 19" rack.
Did you know that 25% of Amsterdam's electric energy consumption goes into server farms? That a quarter square-kilometer of office floor space within New York City is occupied by server farms? This is a strategic issue.
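The $10,000-per-year figure at 7¢ / kWh implies a substantial continuous power draw per rack; a rough back-check (my arithmetic, not from the slide):

```python
# What does saving $10,000/yr at $0.07/kWh mean in continuous power?
saved_kwh_per_year = 10_000 / 0.07            # ~143,000 kWh per year
saved_kw_continuous = saved_kwh_per_year / (365 * 24)
print(round(saved_kw_continuous, 1))          # roughly 16 kW, per rack
```

So one migrated 64-processor rack saves on the order of 16 kW of continuous load, which is why the energy argument, not the raw speed-up, is called the strategic issue here.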

Oil and Gas as a strategic issue
Low-power design: not only to keep the chips cool. Do you know the amount of Google's electricity bill? It should be investigated how far the migrational achievements obtained for computationally intensive applications can also be utilized for servers. Recently the US Senate ordered a study on the energy consumption of servers.

Flagship conference series: IEEE ISCA - migration of the lemmings
[Chart: 98.5% of papers von Neumann (2001: 84%); parallelism faded away; other topics: cache coherence? speculative scheduling? - David Padua, John Hennessy, Jean-Loup Baer, et al.]

Unqualified for RC?
Using FPGAs for scientific computation? Hiring a student from the EE dept.? Application disciplines use their own trick boxes: transdisciplinary fragmentation of methodology. CS is responsible for providing an RC common model for transdisciplinary education, and for fixing its intradisciplinary fragmentation.

Computing Curricula 2004 fully ignores Reconfigurable Computing
The Joint Task Force for Computing Curricula 2004: FPGA & synonyms: 0 hits (Google: 10 million hits). Not even here.

Curriculum Recommendations, v. 2005
Upon my complaints the only change was the addition, to the last paragraph of the survey volume, of: "programmable hardware (including FPGAs, PGAs, PALs, GALs, etc.)." However, no structural changes at all. V. 2005 is intended to be the final version (?), torpedoing the transdisciplinary responsibility of CS curricula. This is criminal!

fine-grained vs. coarse-grained reconfigurability
"fine-grained" means: data path width ~1 bit; "coarse-grained": path width = many bits (e.g. 32 bits).

instruction-stream-based: domain-specific CPU design (not reconfigurable) | CPU with extensible instruction set (partially reconfigurable) | soft-core CPU (reconfigurable)
data-stream-based: domain-specific rDPU design | rDPU with extensible "instruction" set

coarse-grained: terminology

 | program counter | execution triggered by | paradigm
CPU | yes | instruction fetch | instruction-stream-based
DPU** | no | data arrival* | data-stream-based

*) "transport-triggered"  **) does not have a program counter
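The distinction can be caricatured in a few lines (an illustrative model of mine, not anyone's real architecture): a CPU steps a program counter and pays an instruction fetch per step, while a DPU has a fixed operator that fires only when data arrives:

```python
# Instruction-stream-based: a program counter drives instruction fetch.
def cpu_run(program, x):
    pc = 0
    while pc < len(program):      # every step costs an instruction fetch
        op = program[pc]
        x = op(x)
        pc += 1
    return x

# Data-stream-based ("transport-triggered"): no program counter;
# the fixed operator fires on each data arrival.
def dpu_run(op, data_stream):
    return [op(x) for x in data_stream]

print(cpu_run([lambda x: x + 1, lambda x: 2 * x], 3))  # 8
print(dpu_run(lambda x: x + 1, [3, 4, 5]))             # [4, 5, 6]
```

The CPU re-decides what to do at every step; the DPU's behaviour is fixed before the data ever flows, which is exactly what "no program counter" buys.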

PACT Corp., Munich, offers rDPU arrays (rDPAs).

The Paradigm Shift to Data-Stream-Based
The method of communication and data transport: by software, the von Neumann syndrome; by configware, a complex pipe network on an rDPA.

The Anti Machine
A kind of trans(sub)disciplinary effort: the fusion of paradigms. Interpretation [Thomas S. Kuhn]: clean up the terminology! A non-von-Neumann machine paradigm (generalization of the systolic array model). Twin paradigm? Split up into 2 paradigms? Like matter & antimatter: one elementary particle physics.

Languages turned into Religions
Java is a religion, not a language [Yale Patt]. Teaching students the tunnel view of language designers falling in love with the subtleties of formalisms instead of meeting the needs of the user.

The language and tool disaster
At the end of April, a DARPA brainstorming conference. Software people do not speak VHDL; hardware people do not speak MPI. Bad quality of the application development tools: a poll at FCCM'98 revealed that 86% of hardware designers hate their tools.

The first Reconfigurable Computer
Prototyped in 1884 by Herman Hollerith (*29 Feb 1860, Buffalo), a century before FPGA introduction: data-stream-based. 60 years later the von Neumann (vN) model took over: instruction-stream-based.

Reconfigurable Computing came back
As a separate community - the clash of paradigms:
1960 "fixed plus variable structure computer" proposed by G. Estrin
1970 PLD (programmable logic device*)
1985 FPGA (Field Programmable Gate Array)
1989 Anti Machine Model - counterpart of von Neumann
1990 Coarse-grained Reconfigurable Datapath Array
(when?) Foundation of PACT
(when?) reconfigurable address generator
1994 MoPL ... does not support massive parallelism in large systems ...

*) Boolean equations in sum-of-products form, implemented by an AND matrix and an OR matrix:
device | AND matrix | OR matrix
PLA | reconfigurable | reconfigurable
(e)PROM | fixed | reconfigurable
PAL | reconfigurable | fixed
Structured VLSI design like memory chips: integration density very close to the Moore curve.

Outline
- von Neumann overhead hits the memory wall
- The manycore programming crisis
- Reconfigurable Computing is the solution
- We need a twin paradigm approach
- Conclusions

The spirit of the Mainframe Age
For decades we have trained programmers to think sequentially, breaking complex parallelism down into atomic instruction steps, finally tending to code sizes of astronomic dimensions. Even in "hardware" courses (the unloved child of CS scenes) we often teach von Neumann machine design, deepening this tunnel view. 1951: hardware design going von Neumann (microprogramming).

von Neumann: array of massive overhead phenomena
... piling up to code sizes of astronomic dimensions. Von Neumann machine overhead:
- instruction fetch (instruction stream state)
- address computation (data)
- data meet PU
- i/o to / from off-chip RAM
- multi-threading overhead
- ... other overhead

von Neumann: array of massive overhead phenomena (cont.)
Temptations by von Neumann style software engineering:
- [Dijkstra 1968] the "go to" considered harmful
- [R.H., Koch 1975] massive communication congestion: the universal bus considered harmful
- [Backus 1978] Can programming be liberated from the von Neumann style?
- [Arvind et al. 1983] A critique of multiprocessing the von Neumann style


von Neumann overhead: just one example
[1989, image processing example]: 94% of the computation load goes only into moving this window, i.e. into address computation rather than into the computation itself.
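A back-of-the-envelope model (my own illustration, not from the talk) of why software sliding-window code spends most of its instructions on address arithmetic and loop control rather than on the filter itself:

```python
# Crude instruction-mix model for a naive k x k sliding-window filter
# over a W x H row-major image, assuming ~2 ops per address computation
# (row * W + col) and ~2 loop-control ops per pixel visit.
def instruction_mix(W, H, k=3):
    windows = (W - k + 1) * (H - k + 1)
    useful = windows * (k * k - 1)            # e.g. 8 additions per 3x3 sum
    overhead = windows * (k * k * 2 + k * k * 2)
    return useful, overhead, overhead / (useful + overhead)

useful, overhead, frac = instruction_mix(640, 480)
print(f"overhead fraction: {frac:.0%}")  # ~82% under this crude model
```

Under these (arbitrary) cost assumptions the overhead fraction is already above 80%, in the same ballpark as the 94% quoted on the slide; note that the proportion is independent of image size, so bigger images do not amortize it away.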

instruction stream code size of astronomic dimensions ...
... needs off-chip RAM, which fully hits the Memory Wall.
[Chart, 1980-2005: µProc performance growing ~60%/yr, DRAM ~7%/yr; the gap grows ~50% per year, reaching ~1000x by 2005]
Dave Patterson's Law - the "Performance" Gap: better compare off-chip vs. fast on-chip memory. CPU clock speed ≠ performance: the processor's silicon is mostly cache; processors are not that good.
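The ~1000x figure follows from compounding the two quoted growth rates; a quick check (my arithmetic, not Patterson's chart data):

```python
# Ratio of CPU performance (+60%/yr) to DRAM performance (+7%/yr),
# compounded over n years from a common baseline.
def perf_gap(years, cpu=0.60, dram=0.07):
    return ((1 + cpu) / (1 + dram)) ** years

# the ratio grows ~1.5x per year; ~17 years of divergence gives ~1000x
print(round(perf_gap(17)))
```

Whether the 1000x point lands in the late 1990s or in 2005 depends on the chosen baseline year and exact rates; the qualitative conclusion, an exponentially widening memory wall, is insensitive to that choice.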

Benchmarked Computational Density [stolen from Bob Colwell; BWRC, UC Berkeley, 2004]
CPU clock speed ≠ performance: the processor's silicon is mostly cache.
[Chart, 1990-2005, SPECfp2000/MHz/Billion Transistors for DEC alpha, IBM, SUN, HP: alpha down by 100 in 6 years; IBM down by 20 in 6 years. The Intel curve was removed; meanwhile all curves were removed from the RAMP website]

Outline
- von Neumann overhead hits the memory wall
- The manycore programming crisis
- Reconfigurable Computing is the solution
- We need a twin paradigm approach
- Conclusions

The Manycore future
We are embarking on a new computing age, the age of massive parallelism [Burton Smith]. Everyone will have multiple parallel computers [B.S.]. Even mobile devices will exploit multicore processors, also to extend battery life [B.S.]. Multiple von Neumann CPUs on the same microprocessor chip lead to exploding (vN) instruction stream overhead [R.H.].

von Neumann parallelism
The watering pot model [Hartenstein]: the sprinkler head has only a single hole, the von Neumann bottleneck.

Several overhead phenomena
The instruction-stream-based parallel von Neumann approach (the watering pot model [Hartenstein]) has several von Neumann overhead phenomena, per CPU!

Explosion of overhead by von Neumann parallelism
Monoprocessor local overhead (proportionate to the number of processors):
- instruction fetch (instruction stream state)
- address computation (data)
- data meet PU
- i/o to / from off-chip RAM
- ... other overhead
Parallel global overhead (disproportionate to the number of processors):
- inter-PU communication: message passing
[R.H. 2006] MPI considered harmful

Rewriting Applications
More processors means rewriting applications: we need to map an application onto different-size manycore configurations, but most applications are not readily mappable onto a regular array. Mapping is much less problematic with Reconfigurable Computing.

Disruptive Development
The computer industry is probably going to be disrupted by some very fundamental changes [Ian Barron]. We must reinvent computing [Burton J. Smith]. A parallel [vN] programming model for manycore machines will not emerge for five to 10 years [experts from Microsoft Corp.] (... does not support massive parallelism in large systems ...). I don't agree: we have a model. Reconfigurable Computing: the technology is ready, the users are not. The Education Wall: it's mainly an education problem.

Outline
- von Neumann overhead hits the memory wall
- The manycore programming crisis
- Reconfigurable Computing is the solution
- We need a twin paradigm approach
- Conclusions

The Reconfigurable Computing Paradox
Bad FPGA technology: reconfigurability overhead, wiring overhead, routing congestion, slow clock speed. Yet migration to FPGA yields up to 4 orders of magnitude speedup while tremendously slashing the electricity bill. The reason for this paradox? There is something fundamentally wrong in using the von Neumann paradigm: the spirit from the Mainframe Age is collapsing under the von Neumann syndrome.

beyond von Neumann Parallelism
The watering pot model [Hartenstein]: the instruction-stream-based von Neumann approach has several von Neumann overhead phenomena, per CPU! We need an approach like RC* instead: it's data-stream-based. *) "RC" = Reconfigurable Computing


von Neumann overhead vs. Reconfigurable Computing
rDPA: reconfigurable datapath array (coarse-grained reconfigurable)

 | von Neumann machine | hardwired anti machine | reconfigurable anti machine
sequenced by | program counter | data counters | reconfigurable data counters
instruction fetch | instruction-stream overhead | none* | none*

Further von Neumann overhead: address computation (data), data meet PU (+ other overhead), i/o to / from off-chip RAM, inter-PU communication (message passing overhead). The anti machines do no instruction fetch at run time. *) configured before run time

von Neumann overhead vs. Reconfigurable Computing (cont.)
[1989, image processing example]: 17x speedup by the GAG** alone; 15,000x total speedup from this migration project. **) just by the reconfigurable address generator

Reconfigurable Computing means ...
For HPC, run time is more precious than compile time. Reconfigurable Computing means moving overhead from run time to compile time**; it replaces "looping" at run time* by configuration before run time. *) e.g. complex address computation **) or loading time
http://www.tnt-factory.de/videos_hamster_im_laufrad.htm


Data meeting the Processing Unit (PU): explaining the RC advantage
We have 2 choices:
- by software: routing the data through shared memory by memory-cycle-hungry instruction streams
- by configware (data-stream-based): placement* of the execution locality; the pipe network is generated by configware compilation
*) before run time

What pipe network? Generalization* of the systolic array [R. Kress, 1995]
Array ports receive or send data streams; the pipe network is organized at compile time, depending on the connect fabrics. *) supporting non-linear pipes on free-form hetero arrays. rDPA = rDPU array, i.e. coarse-grained; rDPU = reconfigurable datapath unit (no program counter).
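A toy illustration (mine, not from the talk) of the data-stream-based idea: a pipeline of datapath units is wired up once "at compile time", and data then streams through it with no instruction fetch per operand:

```python
# Toy data-stream-based pipeline: the "wiring" (sequence of stage
# functions) is fixed before run time; execution is triggered purely
# by data arrival, with no per-operand instruction fetch.
def build_pipe(*stages):
    def run(stream):
        for x in stream:            # data stream flows through the pipe
            for stage in stages:    # the schedule is frozen in the wiring
                x = stage(x)
            yield x
    return run

# configure once ("before run time") ...
pipe = build_pipe(lambda x: x + 1, lambda x: x * x)
# ... then stream data through
print(list(pipe([1, 2, 3])))  # [4, 9, 16]
```

In a real rDPA the stages are rDPUs and the composition is a routed pipe network rather than a Python loop, but the division of labour is the same: structure decided before run time, data movement at run time.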

Migration benefit by on-chip RAM
Some RC chips have hundreds of on-chip RAM blocks, orders of magnitude faster than off-chip RAM, so that the drastic code size reduction by software-to-configware migration can beat the memory wall. Multiple on-chip RAM blocks are the enabling technology for ultra-fast anti machine solutions: GAGs inside ASMs generate the data streams. ASM = Auto-Sequencing Memory (data counter + GAG + RAM block); GAG = generic address generator; rDPA = rDPU array, i.e. coarse-grained; rDPU = reconfigurable datapath unit (no program counter).
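A minimal sketch (hypothetical, not PACT's or Hartenstein's actual GAG design) of what a generic address generator does: it emits a precomputed address sequence, here a 2D sliding-window scan, so the datapath receives a pure data stream with no run-time address arithmetic in the compute units:

```python
# Hypothetical generic address generator (GAG): yields the address
# sequence for a k x k sliding window over a row-major W x H array.
# The sequence is fully determined by parameters configured before
# run time; the compute units never do address arithmetic themselves.
def gag_window_scan(W, H, k):
    for row in range(H - k + 1):
        for col in range(W - k + 1):
            for dr in range(k):
                for dc in range(k):
                    yield (row + dr) * W + (col + dc)

# an ASM pairs such a generator with a RAM block to emit a data stream
memory = list(range(16))                  # 4 x 4 "RAM block"
stream = [memory[a] for a in gag_window_scan(4, 4, 2)]
print(stream[:4])  # first 2x2 window: [0, 1, 4, 5]
```

Because the scan pattern is configured, not computed per element at run time, the 94%-style address-computation overhead of the software version disappears from the data path.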

Coarse-grained Reconfigurable Array example
Image processing: SNN filter (mainly a pipe network), coming close to the programmer's mind set (much closer than FPGA). Compiled by Nageldinger's KressArray Xplorer (Juergen Becker's CoDe-X inside). Array size: 10 x 16 = 160 such rDPUs, 32 bits wide, mesh-connected (exceptions: see the 3 x 3 fast on-chip RAM). [Array diagram: some rDPUs route through only, some not used; backbus connect; ASMs at the ports]. Note: a kind of software perspective, but without instruction streams: data streams + pipelining.

Outline
- von Neumann overhead hits the memory wall
- The manycore programming crisis
- Reconfigurable Computing is the solution
- We need a twin paradigm approach
- Conclusions

Apropos compilation: Software / Configware Co-Compilation
The CoDe-X co-compiler [Juergen Becker, 1996]: a C language source goes into an analyzer / profiler and partitioner; SW code goes to the SW compiler ("vN" machine paradigm), CW code to the CW compiler (anti machine paradigm), yielding FW code. Both the partitioner and the DPSS use simulated annealing for mapping and optimization. But we need a dual paradigm approach, to run legacy software together with configware. Reconfigurable Computing: the technology is ready; the users are not?

Curricula from the mainframe age
[Diagram: procedural curricula, structurally disabled; the education wall is the main problem; non-von-Neumann accelerators: no common model, not really taught (this is not a lecture on brain regions). The common model is ready, but users are not.]

We need a twin paradigm education
Brain usage: both hemispheres (this is not a lecture on brain regions). Each side needs its own common model: procedural and structural.

Teaching RC? RCeducation 2008
The 3rd International Workshop on Reconfigurable Computing Education, April 10, 2008, Montpellier, France. http://fpl.org/RCeducation/

We need new courses
We need undergraduate lab courses with HW / CW / SW partitioning. We need new courses with extended scope on parallelism and algorithmic cleverness for HW / CW / SW co-design. "We urgently need a Mead-&-Conway-like textbook" [R.H., Dagstuhl Seminar 03301, Germany, 2003]. Here it is!

Outline
- von Neumann overhead hits the memory wall
- The manycore programming crisis
- Reconfigurable Computing is the solution
- We need a twin paradigm approach
- Conclusions

Conclusions
- We need to increase the population of HPC-competent people [B.S.]
- We need to increase the population of RC-competent people [R.H.]
- Data streaming is the key model of parallel computation, not vN
- Von-Neumann-type instruction streams considered harmful [R.H.]
- But we need them for some small code sizes, old legacy software, etc.
- The twin paradigm approach is inevitable, also in education [R.H.]

An Open Question
Coarse-grained arrays: technology ready*, users not ready. Much closer to the programmer's mind set: really much closer than FPGAs**. Which effect is delaying the breakthrough? Please reply to:
*) offered by startups (PACT Corp. and others)
**) "FPGAs? Do we need to learn hardware design?"

thank you

END