Seminar at Kyushu University Reconfigurable Technologies (1) Reiner Hartenstein TU Kaiserslautern July 23, 2004, Fukuoka, Japan.


Slide 2 (© 2004, TU Kaiserslautern): The new machine paradigm. Configware is going mainstream. Hardware / configware / software co-design is the new mind set for digital systems engineering, not only in embedded systems. It calls for a co-education toward a symbiosis of instruction-stream-based and data-stream-based concepts: a dichotomy of machine paradigms is needed for qualification.

Slide 3: Software to Configware Migration. This talk illustrates the performance benefit that may be obtained from Reconfigurable Computing (RC), stressing the coarse-grain RC point of view, so it hardly mentions FPGAs (but coarse grain may be mapped onto FPGAs). Software-to-configware migration is the most important source of speed-up. Model: hardware is just frozen configware.

Slide 4: Terminology: "soft hardware"? Soft hardware is called morphware [DARPA]. Programming sources: software, for scheduling instruction streams (von Neumann); flowware, for scheduling data streams, and configware, for configuring morphware (both primarily non-von Neumann).

Slide 5: >> HPC << Outline: HPC; Embedded Computing; The wrong Roadmap; Configware Engineering; Dual Machine Paradigms; Speed-up Examples; Final Remarks.

Slide 6: Earth Simulator (ES): 5120 processors, 5000 pins each; ES 20: TFLOPS; crossbar weight: 220 t; 3000 km of cable, moving data around inside the ...

Slide 7: Data are moved around by software (slower than the CPU clock by 2 orders of magnitude), i.e., by memory-cycle-hungry instruction streams which fully hit the memory wall: extremely unbalanced. (Figure stolen from Bob Colwell.)

Slide 8: The path of least resistance*: avoiding a paradigm shift. Many researchers never seem to stop working on sophisticated solutions for marginal improvements, continuously ignoring methodologies that promise speed-ups by orders of magnitude... wearing blinders to ignore the impact of morphware... they continue to bang their heads against the memory wall instead of ... *) [Michel Dubois]

Slide 9: The data-stream-based approach has no von Neumann bottleneck. The instruction-stream-based approach (the only one many understand) suffers from von Neumann bottlenecks and cannot cope with this parallelism solution.

Slide 10: >> Embedded Computing <<

Slide 11: What's coming next? The history of paradigm shifts: "Mainstream silicon application is switching every 10 years" (Makimoto's Wave, published in 1989), alternating between standard and custom: TTL; LSI, MSI; µproc., memory; ASICs, accelerators; and next, reconfigurable? The figure also marks a 1st and a 2nd design crisis. "The Programmable System-on-a-Chip is the next wave."

Slide 12: Makimoto's 3rd wave. Fine-grain subsystems (FPGAs): 1st half of the 3rd wave; universal (but less efficient). Coarse-grain subsystems: 2nd half of the 3rd wave; domain-specific; much more flexible than the 2nd half of the 2nd wave.

Slide 13: How's the next wave? Tredennick's paradigm shifts: hardwired: algorithm fixed, resources fixed; procedural programming: algorithm variable, resources fixed; structural programming (FPGAs, coarse-grain RAs): algorithm variable, resources variable. 2007: no further wave (Hartenstein's curve)? Or a 4th wave?

Slide 14: History of silicon application: TTL; LSI, MSI; µproc., memory; ASICs, accelerators; FPGAs; coarse grain; soft CPUs. Three different mind sets (hardware people, CS people, and a new breed that is needed): a common terminology is needed.

Slide 15: History of machine models. Mainframe age: mainframe, compile; procedural mind set: instruction-stream-based (coordinates by Makimoto's wave). Computer age (PC age): µProc., compile, plus accelerator. Example: the MD-GRAPE-2 PCI board [1997], with 4 chips for N-body simulation, converts a PC to 64 GFLOPS. Users: RIKEN institute, ARI Heidelberg, etc. Scientific computing examples: molecular dynamics, astrophysics, plasma physics, hydrodynamics.
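
The kernel that GRAPE-style boards put into hardware is the pairwise force sum of the N-body problem. A minimal software sketch in Python, assuming plain Newtonian gravity in units with G = 1 and an optional softening length `eps` (both our assumptions, not from the slide). This is the O(N²) loop a CPU executes instruction by instruction, not the GRAPE pipeline itself:

```python
import math


def accelerations(pos, mass, G=1.0, eps=0.0):
    """Direct-summation gravitational accelerations: O(N^2) pair interactions.

    pos:  list of (x, y, z) tuples
    mass: list of masses, same length as pos
    eps:  softening length (hypothetical parameter; 0 means pure Newtonian)
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        xi, yi, zi = pos[i]
        for j in range(n):
            if i == j:
                continue  # no self-interaction
            dx = pos[j][0] - xi
            dy = pos[j][1] - yi
            dz = pos[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + eps * eps
            s = G * mass[j] / (r2 * math.sqrt(r2))  # G * m_j / r^3
            acc[i][0] += s * dx
            acc[i][1] += s * dy
            acc[i][2] += s * dz
    return acc
```

For example, two unit masses at distance 1 pull each other with acceleration 1: `accelerations([(0, 0, 0), (1, 0, 0)], [1.0, 1.0])` returns `[[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]`. Every one of the N² pair evaluations costs instruction fetches on a CPU, which is exactly what a hardwired or configured pipeline avoids.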

Slide 16: History of machine models (continued). Besides the procedural, instruction-stream-based mind set (µProc., compile), the accelerator stands for a structural mind set: data-stream-based, designed by hardware guys.

Slide 17: History of machine models (continued): accelerators designed by hardware guys, e.g., GRAPE (RIKEN institute).

Slide 18: The hardware / software chasm: typical programmers don't understand function evaluation without machine mechanisms (counters, state registers). It's the gap between the procedural (instruction-stream-based) mind set of the µprocessor world and the structural (data-stream-based) mind set of accelerators.

Slide 19: Growth rate of embedded software. Silicon grows by a factor of 1.4 per year [Moore's law]; embedded software [DTI*] by about 2.5 per year. By 2010, more than 10 times as many programmers will write embedded applications as computer software. Already today, more than 98% of all microprocessors are used within embedded systems. *) Department of Trade and Industry, London.
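
To see how quickly the two growth rates on this slide diverge, a small sketch in Python that compounds them. The rates are the slide's; treating them as exact, constant annual factors is our simplifying assumption:

```python
import math

MOORE_PER_YEAR = 1.4        # silicon growth factor per year (slide)
EMBEDDED_SW_PER_YEAR = 2.5  # embedded software growth factor per year (DTI, approx.)


def gap_after(years):
    """How much more embedded software has grown than silicon after `years`."""
    return (EMBEDDED_SW_PER_YEAR / MOORE_PER_YEAR) ** years


def years_until_gap(factor):
    """Years until the software/silicon growth gap reaches `factor`."""
    return math.log(factor) / math.log(EMBEDDED_SW_PER_YEAR / MOORE_PER_YEAR)
```

Under these assumptions `years_until_gap(10)` is just under 4, i.e., the gap reaches an order of magnitude in less than four years.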

Slide 20: Typical CS graduates: the "have-nots". Today, "typical" CS graduates are unqualified for this labor market: they cannot cope with hardware / configware / software partitioning issues, and they cannot implement configware.

Slide 21: Hardware / configware / software partitioning skills are urgently needed. Partitioning an algorithm into HW, CW and SW requires the skills to cope with each of them (SW, CW, HW) or with any combination in co-design: SW / HW, SW / CW, CW / HW, or SW / CW / HW. Software-to-configware migration is the most important source of speed-up. Model: hardware is just frozen configware.

Slide 22: By the way... International Conference on Field-Programmable Logic and Applications (FPL), Aug. 30 - Sept. 1, 2004, Antwerp, Belgium: 288 submissions! The oldest and largest conference in the field. Accelerators next to the µProc. are going into every type of application, and they all work on high performance.

Slide 23: CS education. You cannot teach hardware to a programmer* (*efficiently), but you can always teach programming to a hardware guy. Programmers have the procedural mind set but not the structural one; hardware guys have the structural mind set naturally.

Slide 24: >> The wrong Roadmap <<

Slide 25: Future HPC: a completely wrong mind set. The key problem, the memory wall, cannot be solved by new CPU technology. Beef up old architecture principles with new technology? Communication is the problem, not execution! The vN paradigm is not a communication paradigm, and its monopoly creates a completely wrong mind set. We need a 2nd machine paradigm (a 2nd mind set...): an architectural communication paradigm. But we need both paradigms: a dichotomy.

Slide 26: The 3rd machine model became mainstream. Mainframe age: mainframe, compile (instruction-stream-based). Computer age (PC age): µProc., compile, plus accelerator, design (Makimoto's wave). Morphware age: µProc. plus programmable (r)DPA. Most CS curricula and HPC have not yet arrived in the morphware age.

Slide 27: >> Configware Engineering <<

Slide 28: De facto duality of RAM-based platforms. Traditional RAM-based platform: the CPU, running software; machine paradigm: von Neumann etc., instruction-stream-based. New RAM-based platform: morphware (FPGA, rDPA, ...), running configware; machine paradigm: the anti machine, data-stream-based (the 2nd paradigm). We now have 2 types of programmable platforms. Hardware can be viewed as frozen configware: just earlier binding.

Slide 29: CW has become mainstream... going into every type of application [Gordon Bell]. Morphware is the fastest growing sector of the IC market. The HPC scene believed itself smart when smiling about us CW guys; others experienced that "the brain hurts" when trying the paradigm shift [Gordon Bell].

Slide 30: From software to configware industry: repeat the success story by a 2nd machine paradigm! 1) The software industry's secret of success: procedural personalization via a RAM-based machine paradigm (computer age, PC age: µProc., compile). 2) A growing configware industry: structural personalization, RAM-based, by the anti machine (morphware age: (r)DPA).

Slide 31: Benefit from RAM-based platforms and the 2nd paradigm. 1) A RAM-based platform is needed for flexibility and programmability, avoiding the need for specific silicon (mask cost: currently $2 million and rapidly growing). 2) A simple 2nd machine paradigm is needed as a common model, to avoid the need for circuit expertise and to educate zillions of programmers.

Slide 32: Nick Tredennick's paradigm shifts explain the differences. Software engineering (CPU): algorithm variable, resources fixed; 1 programming source needed (software). Configware engineering: algorithm variable, resources variable; 2 programming sources needed (configware for the resources, flowware for the algorithm).

Slide 33: Compilation: software vs. configware. Software engineering: source program -> software compiler -> software code. Configware engineering: source "program" -> configware compiler, consisting of a mapper (placement & routing), which produces configware code, and a scheduler, which produces flowware code for the data.

Slide 34: Compilation: software vs. flowware. Software engineering: source program -> software compiler -> software code. Flowware engineering, for a hardwired anti machine: source "program" -> flowware compiler (scheduler) -> flowware code for the data.

Slide 35: Flowware programs data streams. A DPA is fed by input data streams and produces output data streams, each laid out over time and port number. Flowware defines which data item at which time at which port.
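
The flowware idea, which data item at which time at which port, can be captured as a plain schedule. A minimal sketch in Python, assuming a hypothetical representation in which each port receives a list of items and may start at its own time step (the skew typical of systolic arrays); the function name and parameters are illustrative only and do not come from a real flowware language such as MoPL:

```python
from collections import defaultdict


def make_schedule(items_by_port, start_time=None):
    """Build a flowware-style schedule {time: {port: item}}.

    items_by_port: {port: [item0, item1, ...]}, one data stream per port
    start_time:    optional {port: first time step}, i.e. each stream's skew
    """
    start_time = start_time or {}
    schedule = defaultdict(dict)
    for port, items in items_by_port.items():
        t0 = start_time.get(port, 0)
        for k, item in enumerate(items):
            schedule[t0 + k][port] = item  # item k enters `port` at time t0 + k
    return dict(schedule)
```

For example, `make_schedule({0: ["a0", "a1"], 1: ["b0", "b1"]}, {1: 1})` gives `{0: {0: "a0"}, 1: {0: "a1", 1: "b0"}, 2: {1: "b1"}}`: stream b is skewed by one time step. Nothing in the schedule is an instruction; it only says where and when data items appear.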

Slide 36: Flowware: not new. (Machine-model figure: mainframe age, computer age (PC age), morphware age.) Data streams*... flowware: around ... *) No confusion, please: this is not a "dataflow machine"!

Slide 37: Data streams*: not new. 1980: data streams (Kung, Leiserson: systolic arrays); 1989: data-stream-based Xputer architecture; 1990: rDPU (Rabaey); 1994: flowware language MoPL (Becker et al.); 1995: super systolic array (rDPA) + DPSS tool (Kress); 1996+: Streams-C language, SCCC (Los Alamos), SCORE, ASPRC, Bee (UC Berkeley), DSP-C, Brook, ...; and a configware / software partitioning compiler (Becker). *) Please don't confuse with "data flow".
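
Since systolic arrays head this list, a small simulation may help. It is a sketch in Python of a linear array computing an FIR filter in a "broadcast inputs, move results, weights stay" organization (strictly semi-systolic, because the inputs are broadcast); the choice of this variant is ours, not the slide's. Each cell keeps one weight, and per clock tick every cell adds weight times input to the partial sum arriving from its neighbour:

```python
def systolic_fir(weights, xs):
    """Simulate a linear array of len(weights) cells computing
    y[t] = sum_j weights[j] * xs[t - j] (samples before t = 0 count as zero).

    Cell i holds weights[i] and a partial-sum register; partial sums move
    one cell toward cell 0 per clock tick, and cell 0 delivers the output.
    """
    k = len(weights)
    regs = [0] * k
    ys = []
    for x in xs:  # one clock tick per input sample; x is broadcast to all cells
        # all cells update in parallel from the old register values
        regs = [(regs[i + 1] if i + 1 < k else 0) + weights[i] * x
                for i in range(k)]
        ys.append(regs[0])
    return ys


def direct_fir(weights, xs):
    """Reference: straightforward convolution, used to check the array."""
    return [sum(w * xs[t - j] for j, w in enumerate(weights) if t - j >= 0)
            for t in range(len(xs))]
```

Feeding an impulse returns the weights: `systolic_fir([1, 2, 3], [1, 0, 0, 0])` gives `[1, 2, 3, 0]`. Once the weights are placed, the array needs no instruction stream; only the data stream and the clock drive it.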

Slide 38: >> Dual Machine Paradigms <<

Slide 39: Why a new machine paradigm??? The anti machine as the 2nd paradigm is the key to curricular innovation: the rDPA next to the µprocessor is a Trojan horse to introduce the structural domain to the procedural-only mind set of programmers. Programming by flowware instead of software is very easy to learn (... same language primitives). Flowware education: no fully fledged hardware expert is needed to program embedded systems.

Slide 40: von Neumann vs. anti machine. Instruction stream machine (von Neumann etc.): a CPU whose program counter sequences the instructions for its DPU, with RAM memory across the von Neumann bottleneck. Data stream machine (anti machine): an (r)DPA without sequencer (no CPU!), fed by an asMA (auto-sequencing memory array); each asM (auto-sequencing memory) consists of a memory bank and a data counter.

Slide 41: Behavior of the counter. In the CPU, the program counter is programmed by software: an execution paradigm. In the anti machine, the data counters of the asM memory banks are programmed by flowware and generate the data streams for the (r)DPA: a communication paradigm.

Slide 42: Counters: the same micro architecture? Could the data counter of an asM (anti machine) be the same as the program counter of a CPU (von Neumann etc.)? Yes, that is possible, but for data counters a much better AGU (address generator unit) methodology is available*. The asM is not new; history: DMA ("DMA controls memory without using the CPU"). *) For the history of AGUs see Herz et al.: Proc. ICECS 2002, Dubrovnik, Croatia.
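
What a data counter does can be sketched as a generic address generator: nested counters that step through base + Σ index·stride without fetching any instructions. A minimal sketch in Python; the interface (a base address plus per-level counts and strides, innermost level last) is our assumption of a typical AGU, not the methodology of Herz et al.:

```python
def agu(base, counts, strides):
    """Yield the address sequence of nested counters (innermost level last).

    Address = base + sum(index[d] * strides[d]); counters wrap like an odometer.
    """
    if len(counts) != len(strides):
        raise ValueError("counts and strides must have the same length")
    if any(c <= 0 for c in counts):
        return  # an empty loop level produces no addresses
    idx = [0] * len(counts)
    while True:
        yield base + sum(i * s for i, s in zip(idx, strides))
        d = len(counts) - 1  # step the innermost counter first
        while d >= 0:
            idx[d] += 1
            if idx[d] < counts[d]:
                break
            idx[d] = 0  # wrap and carry into the next outer counter
            d -= 1
        if d < 0:
            return  # all counters wrapped: sequence complete
```

`list(agu(100, [2, 3], [10, 1]))` gives `[100, 101, 102, 110, 111, 112]`, and `list(agu(0, [3, 2], [1, 3]))` gives `[0, 3, 1, 4, 2, 5]`, a column-by-column scan of a 2 x 3 array stored row by row. On a CPU each of these addresses would cost address-computation instructions fetched from memory.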

Slide 43: Commercial rDPA example: PACT XPP - XPU128. XPP128 rDPA: full 32- or 24-bit design, working silicon, 2 configuration hierarchies (buses not shown). Evaluation board available, plus the XDS development tool with simulator. © PACT AG.

Slide 44: XPP64A: platform development board (SDR board) in debug phase. XPP64A chips from the STMicro fab; assembly & test; available March 2003.

Slide 45: Mapping algorithms efficiently onto an rDPA: by DPSS, based on simulated annealing [Ulrich Nageldinger]. Example: an SNN filter on a KressArray, array size 10 x 16 = 160 rDPUs of 32 bits each (legend: not used, backbus connect, route-through only). "Compilers play a key role in mapping a problem to a platform" [Bill Dally, WCAE'04]. Do not move data to the operator inefficiently at run time; place the rDPU into the data stream at compile time. "Many problems are better solved at compile time" [Bill Dally]. Coarse-grain morphware (no FPGA) is area-efficient.
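
DPSS itself is not reproduced here. As a generic illustration of placement by simulated annealing, a minimal sketch in Python that places operators onto a rows x cols array of rDPUs and minimizes the total Manhattan length of two-pin connections. The cost function, the move set (swap the contents of two cells), and the cooling schedule are our assumptions, and routing is ignored:

```python
import math
import random


def wire_length(pos, nets):
    """Total Manhattan length of two-pin nets, given {node: (row, col)}."""
    return sum(abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1])
               for a, b in nets)


def anneal_placement(nodes, nets, rows, cols,
                     t0=5.0, alpha=0.9, steps=300, t_min=0.01, seed=1):
    """Return ({node: (row, col)}, cost) found by simulated annealing."""
    rng = random.Random(seed)
    n_cells = rows * cols
    if len(nodes) > n_cells:
        raise ValueError("more operators than rDPUs")
    grid = list(nodes) + [None] * (n_cells - len(nodes))  # None = unused rDPU
    rng.shuffle(grid)                                     # random initial placement

    def positions():
        return {node: divmod(i, cols) for i, node in enumerate(grid)
                if node is not None}

    current = wire_length(positions(), nets)
    t = t0
    while t > t_min:
        for _ in range(steps):
            i, j = rng.randrange(n_cells), rng.randrange(n_cells)
            grid[i], grid[j] = grid[j], grid[i]          # propose: swap two cells
            new = wire_length(positions(), nets)
            delta = new - current
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current = new                            # accept the move
            else:
                grid[i], grid[j] = grid[j], grid[i]      # reject: undo the swap
        t *= alpha                                       # geometric cooling
    return positions(), current
```

For a chain of six operators on a 2 x 3 array, `anneal_placement(chain, list(zip(chain, chain[1:])), 2, 3)` searches for a snake-like placement; every net needs length at least 1, so the cost can never drop below the number of nets. Accepting some uphill moves at high temperature is what lets annealing escape poor local placements, all of which happens at compile time rather than run time.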

Slide 46: Symbiosis of machine models: from the mainframe age via the computer age (PC age) to the morphware age, in which a µProc. and an (r)DPA work in symbiosis, programmed by a co-compiler. Replace PC by PS.

Slide 47: Software / configware co-compilation (Juergen Becker's CoDe-X, 1996). A high-level programming-language source passes through an analyzer / profiler into a partitioner, which uses resource parameters to support different platforms. The SW compiler generates SW code for the "vN" machine paradigm; the CW compiler generates CW code and FW code for the anti machine paradigm.
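
A greatly simplified sketch in Python of what a partitioner might do with profiling data and resource parameters. This is a hypothetical greedy heuristic, not CoDe-X's algorithm: each task carries a profiled software time, an estimated configware time and an rDPU area (all illustrative numbers), and the tasks with the best time saved per rDPU move to morphware until the area budget is used up:

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    sw_time: float  # profiled run time on the CPU (hypothetical units)
    cw_time: float  # estimated run time on morphware (same units)
    area: int       # estimated number of rDPUs needed (must be > 0)


def partition(tasks, area_budget):
    """Return (software_task_names, configware_task_names)."""
    if any(t.area <= 0 for t in tasks):
        raise ValueError("every task needs a positive area estimate")
    # only tasks that actually run faster on morphware are candidates
    candidates = [t for t in tasks if t.cw_time < t.sw_time]
    # greedy order: time saved per rDPU, best first
    candidates.sort(key=lambda t: (t.sw_time - t.cw_time) / t.area, reverse=True)
    configware, used = [], 0
    for t in candidates:
        if used + t.area <= area_budget:
            configware.append(t.name)
            used += t.area
    chosen = set(configware)
    software = [t.name for t in tasks if t.name not in chosen]
    return software, configware
```

With `tasks = [Task("fir", 100, 5, 16), Task("sort", 50, 45, 40), Task("ctrl", 10, 20, 4)]`, a budget of 32 rDPUs moves only `fir` to configware, while a budget of 64 moves `fir` and `sort`; `ctrl` stays in software because it would run slower on morphware. A real partitioner also weighs communication cost between the two sides, which this sketch ignores.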

Slide 48: >> Speed-up Examples <<

Slide 49: Better solutions by configware instead of software methodologies. Memory cycles are minimized, e.g.: no instruction fetch at run time, plus other effects; memory access for data: caches do not help anyhow; loop transformations: no intra-stream data memory cycles; complex address computation: no memory cycles; no cache misses! Not new: high-level synthesis (1980+), loop transformations (1970+), and many other areas.

Slide 50: Speed-up examples (platform; application example; speed-up factor; method):
- PACT Xtreme 4-by-4 array [2003]; 16-tap FIR filter; x16 MOPS/mW; straightforward.
- MoM anti machine with DPLA* [1983]; grid-based DRC** for 1-metal 1-poly nMOS*** with 256 reference patterns; > x1000 (computation time); multiple aspects (key issue: algorithmic cleverness).
- CPU to FPGA [FPGA 2004]; migration of several simple application examples; x7 to x46 (compute time); high-level synthesis.
- DSP to FPGA [Xilinx]; from the fastest DSP (10 GMACs) to 1 teraMAC; x100 (compute time); method not specified (source: Wim Roelandts).
*) DPLA: fabricated by MPC via the E.I.S. multi-university project. **) Design Rule Check. ***) For 10-metal 3-poly CMOS, expected: >> x10,000.

Slide 51: Hypothetical branching example to illustrate time-to-space migration: S = R + (if C then A else B endif), with C = 1.
Simple conservative CPU example (memory cycles / nanoseconds):
- if C then read A: read instruction (1 / 100), instruction decoding, read operand* (1 / 100), operate & register transfers
- if not C then read B: read instruction (1 / 100), instruction decoding
- add & store: read instruction (1 / 100), instruction decoding, operate & register transfers, store result (1 / 100)
- total: 5 memory cycles / 500 ns
*) If there is no intermediate storage in the register file.
On the rDPU (a section of a major pipe network), clocked at 200 MHz (5 ns): no memory cycles, speed-up factor = 100.
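
The arithmetic of this example, together with the function of the rDPU section (a multiplexer feeding an adder), can be restated in a few lines of Python. The 100 ns memory cycle, the 5 memory cycles and the 200 MHz clock are the slide's figures; modelling the rDPU section as delivering one result per clock with no memory cycles follows the slide's assumption:

```python
MEMORY_CYCLE_NS = 100   # one memory cycle in the simple conservative CPU example
CPU_MEMORY_CYCLES = 5   # instruction fetches, operand read and result store (C = 1)
RDPU_CLOCK_MHZ = 200    # rDPU clock: 200 MHz, i.e. 5 ns per result


def rdpu_section(r, a, b, c):
    """Time-to-space migration: the branch becomes a multiplexer feeding an adder,
    so S = R + (if C then A else B) is computed in space, without instructions."""
    return r + (a if c else b)


cpu_ns = CPU_MEMORY_CYCLES * MEMORY_CYCLE_NS  # 500 ns on the CPU
rdpu_ns = 1000 / RDPU_CLOCK_MHZ               # 5.0 ns per clock on the rDPU
speedup = cpu_ns / rdpu_ns                    # 100.0
```

The factor of 100 comes entirely from eliminating memory cycles: the decision itself costs neither an instruction fetch nor a clock cycle of its own.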

Slide 52: rDPA (coarse grain) vs. FPGA (fine grain): chart of roughly estimated area efficiency (transistors/chip) and performance (MOPS/mW), both in orders of magnitude, for hardwired, rDPA, FPGA, DSP and µProc (status ~1998, commodity).

Slide 53: Why the speed-up, although the FPGA clock is slower by x3 or even more? (Most of the know-how comes from the "high level synthesis" discipline.) The operator is moved to the data stream (before run time); support operations need neither clock nor memory cycles; decisions need neither memory cycles nor clock cycles; most "data fetch" needs no memory cycle.

Slide 54: >> Final Remarks <<

Slide 55: First indications of change:
- 10th RAW at IPDPS, Nice, France, April 2003: after a decade of non-overlap, the first IPDPS people are coming.
- HPC Asia, International Conference on High Performance Computing, July 20-22, 2004, Omiya Sonic City, Tokyo area, Japan: Workshop on Reconfigurable Systems for HPC (RHPC) + keynote address*.
- HPCA-11, 11th International Symposium on High-Performance Computer Architecture, San Francisco, Feb. 2005: topic area explicitly includes embedded and reconfigurable architectures.
- SBAC-PAD, Symposium on Computer Architecture and High Performance Computing, Foz do Iguacu, PR, Brazil, October 27-29, 2004: topic area explicitly includes reconfigurable systems.
- PARS & Speed-up, Basel, Switzerland, March 2003: keynote address*.
- IPDPS, Santa Fe, NM, USA, April 2004: keynote address*.
- PDP'04, La Coruna, Spain, Feb. 2004: keynote address*.
*) keynote speaker: ...

Slide 56: HPC experts are coming... Simulation of star clusters: x10 speed-up by supercomputer-to-morphware migration (also molecular biology et al.). Rainer Spurzem, University of Heidelberg; Reinhard Maenner, University of Mannheim, HPC pioneer since 1976 (Physics Dept. Heidelberg). Configware by example from astrophysics: the N-body problem going configware; a paper already at FPL. ARI, the Astronomisches Rechen-Institut, founded 1700 in Berlin, moved to Heidelberg in 1945 by August Kopff (pictured: Gottfried Kirch, August Kopff).

Slide 57: August Kopff, 18th Director of the Astronomisches Rechen-Institut (ARI), discovered the Kopff comet (Koenigstuhl Observatory, Heidelberg, Germany, 1906) and the asteroid 631 Philippina (21 March 1907), which became the first asteroid ever visited by a spacecraft, on the Galileo mission to Jupiter. The Galileo spacecraft's 14-year odyssey came to an end on Sunday, Sept. 21, 2003. (Image copyright © 1996 by Masayuki Suzuki.)

Slide 58: Conclusions. We need an academic grass-roots movement for... RC has become mainstream in all kinds of applications... by a merger with the embedded systems mind set. CS education deficits: a curricular revision is overdue... free material & tools for undergraduate lab courses to program and emulate small SW/CW/HW examples. All the know-how needed is readily available: get involved!

Slide 59: END

Slide 60: Edholm's Law of Bandwidth [IEEE Spectrum, July 2004]: chart of bits per second vs. year for wireless, nomadic and wireline communication (labels include pager, GSM, UMTS, MIMO, 802.11b/g, 9.6 kBit / 28.8 kBit / 56 kBit modems, Ethernet).