Reiner Hartenstein University of Kaiserslautern

July 8, 2002, ENST, Paris, France
Enabling Technologies for Reconfigurable Computing and Software / Configware Co-Design
Part 3: Resources for RC and Data-Stream-based Computing

Schedule
10.00 – 11.00 Reconfigurable Computing (RC)
11.00 – 11.30 coffee break
11.30 – 12.30 Data-Stream-based Computing
12.30 – 14.00 lunch break
14.00 – 15.00 Resources for RC and Data-Stream-based Computing
15.00 – 15.30 Recent developments
15.30 – 16.00 Discussion

>> Configware Industry
Outline: Configware Industry, Terminology, MoPL data-procedural language, Anti architecture and circuitry, Stream-based Memory Architecture

Configware heading for mainstream
- Configware market taking off for the mainstream
- FPGA-based designs are getting more complex, even SoC
- No design productivity and quality without good configware libraries (soft IP cores) from various application areas
- Suppliers: FPGA vendors and a growing number of independent configware houses (soft IP core vendors) and design services

OS for PLDs
- A separate EDA software market, comparable to the compiler / OS market in computers
- Cadence, Mentor, Synopsys just jumped in
- Less than 5% of Xilinx / Altera income comes from EDA software
- Alliances with hundreds of partners providing hundreds of IP cores, synthesizable (hopefully); web sites difficult to navigate

>> Terminology
Outline: Configware Industry, Terminology, MoPL data-procedural language, Anti architecture and circuitry, Stream-based Memory Architecture

Terminology

Terminology & Acronyms
DPU: datapath unit
DPA: datapath array
rDPU: reconfigurable DPU
rDPA: reconfigurable DPA
RC: reconfigurable computing
RL: reconfigurable logic
RA: reconfigurable array
Software (SW): procedural sources*
Configware (CW): structural sources
Hardware (HW): hardwired platforms
ASIC: customizable hardwired platforms
Flexware (FW): reconfigurable platforms
FPGA: field-programmable gate array
FPL: field-programmable logic
*) note: firmware is SW!

Babylonian Confusion
Communication between areas, and between abstraction levels, is hampered, mainly because of non-intuitive, misleading or ambiguous terminology.

>> MoPL data-procedural language
Outline: Configware Industry, Terminology, MoPL data-procedural language, Anti architecture and circuitry, Stream-based Memory Architecture

Fundamental Ideas available (1)
- Data sequencer methodology
- Data-procedural languages (duality with von Neumann) ... supporting memory bandwidth optimization
- Soft datapath synthesis algorithms
- Parallelizing loop transformation methods
- Compilers supporting soft machines
- SW / CW partitioning co-compilers

Fundamental Ideas available (2)
- Programming Xputers
- Similarities to programming computers
- How not to get confused by the similarities
- What benefits vs. computers?

Programming Language Paradigms
easy to learn

Similar Programming Language Paradigms
very easy to learn

JPEG zigzag scan pattern
(MoPL, published in 1993)
[Figure: HalfZigZag on the pixel map (x, y), with scans 1 to 4 executed by the data counter]

    *> Declarations
    EastScan is
      step by [1,0]
    end EastScan;

    SouthScan is
      step by [0,1]
    end SouthScan;

    NorthEastScan is
      loop 8 times until [*,1]
        step by [1,-1]
      endloop
    end NorthEastScan;

    SouthWestScan is
      loop 8 times until [1,*]
        step by [-1,1]
      endloop
    end SouthWestScan;

    HalfZigZag is
      EastScan
      loop 3 times
        SouthWestScan
        SouthScan
        NorthEastScan
        EastScan    *> step east along row 1 before the next diagonal
      endloop
    end HalfZigZag;

    *> JPEG zigzag scan pattern
    goto PixMap[1,1]
    HalfZigZag;
    SouthWestScan    *> main anti-diagonal
    uturn (HalfZigZag)
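To see how the scan procedures above combine into the full 8 x 8 JPEG zigzag order, here is a small Python sketch that replays them with a single 2-D data counter. It is a hypothetical interpreter, not the original MoPL system. Two semantics are assumptions: the "until" condition is tested before each step, and uturn (HalfZigZag) replays the step vectors of HalfZigZag in reverse order.

```python
# Hypothetical sketch: a tiny interpreter for the MoPL scan procedures above.
# Not the original MoPL compiler. Addresses are (x, y), 1-based, as in PixMap[1,1].
# ASSUMPTIONS: "until" is tested before each step; uturn(S) replays the step
# vectors of scan S in reverse order.

N = 8  # the PixMap is N x N; "loop 8 times" bounds each diagonal scan


class DataCounter:
    """2-D data counter that records every address it visits."""

    def __init__(self, x, y):
        self.pos = (x, y)
        self.trace = [self.pos]  # visited addresses in scan order
        self.steps = []          # step vectors applied so far

    def step(self, dx, dy):
        x, y = self.pos
        self.pos = (x + dx, y + dy)
        self.trace.append(self.pos)
        self.steps.append((dx, dy))


def east_scan(dc):          # EastScan is step by [1,0]
    dc.step(1, 0)


def south_scan(dc):         # SouthScan is step by [0,1]
    dc.step(0, 1)


def north_east_scan(dc):    # loop 8 times until [*,1] step by [1,-1]
    for _ in range(N):
        if dc.pos[1] == 1:
            break
        dc.step(1, -1)


def south_west_scan(dc):    # loop 8 times until [1,*] step by [-1,1]
    for _ in range(N):
        if dc.pos[0] == 1:
            break
        dc.step(-1, 1)


def half_zigzag(dc):
    east_scan(dc)
    for _ in range(3):
        south_west_scan(dc)
        south_scan(dc)
        north_east_scan(dc)
        east_scan(dc)


def jpeg_zigzag():
    dc = DataCounter(1, 1)          # goto PixMap[1,1]
    half_zigzag(dc)                 # HalfZigZag
    half_steps = list(dc.steps)     # remember its step vectors
    south_west_scan(dc)             # SouthWestScan along the main anti-diagonal
    for dx, dy in reversed(half_steps):
        dc.step(dx, dy)             # uturn (HalfZigZag), as assumed above
    return dc.trace


order = jpeg_zigzag()
print(order[:6])              # [(1, 1), (2, 1), (1, 2), (1, 3), (2, 2), (3, 1)]
print(len(order), order[-1])  # 64 (8, 8)
```

Under these assumptions the trace visits all 64 pixels exactly once and ends at PixMap[8,8], which is why HalfZigZag needs the EastScan at the end of its loop body: without it, NorthEastScan stops on row 1 and the next SouthWestScan would revisit pixels.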

>> Anti architecture and circuitry

GAU (Generic Address Unit) Scheme
GAG = Generic Address Generator. A GAU consists of a Base Slider (B0), a Limit Slider (L0) and an Address Stepper (step ΔA, address A, scan range B [ ΔA | L ]); all 3 are copies of the same BSU stepper circuit.

BSU: Basic Stepper Unit
[Figure: BSU block diagram: registers B (Base), L (Limit) and ΔA (stepVector), an init tag, a +/– stepper producing address A, a step counter compared (=) with maxStepCount, and end detect with escape clause and endExec; used for address sequencing in the GAG (Generic Address Generator)]
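The behaviour suggested by the BSU labels can be sketched in a few lines of Python. This is a hypothetical model, not the actual circuit: it assumes the address starts at B, the stepVector ΔA is added (its sign selecting + or –) on each step, and end detect raises endExec when the next address would pass the limit L (taken as inclusive) or when the step counter reaches maxStepCount.

```python
# Hypothetical sketch of one Basic Stepper Unit (BSU). Assumed behaviour, read
# from the block labels: init A = B, add stepVector dA per step, and let
# "end detect" stop the unit (endExec) when the next address would pass the
# limit L or the step counter reaches maxStepCount.
from dataclasses import dataclass
from typing import Iterator


@dataclass
class BSU:
    base: int            # B: first address
    limit: int           # L: last address allowed (assumed inclusive)
    step: int            # dA: step vector; its sign selects + or -
    max_step_count: int  # compared with the step counter

    def run(self) -> Iterator[int]:
        a, step_counter = self.base, 0          # init
        while True:
            yield a                             # address A goes to memory
            if step_counter == self.max_step_count:
                return                          # endExec: step count exhausted
            nxt = a + self.step
            if (self.step >= 0 and nxt > self.limit) or (self.step < 0 and nxt < self.limit):
                return                          # endExec: limit reached
            a, step_counter = nxt, step_counter + 1


print(list(BSU(base=10, limit=20, step=3, max_step_count=100).run()))   # [10, 13, 16, 19]
print(list(BSU(base=20, limit=10, step=-4, max_step_count=100).run()))  # [20, 16, 12]
print(list(BSU(base=0, limit=100, step=1, max_step_count=3).run()))     # [0, 1, 2, 3]
```

Because the Base Slider, Limit Slider and Address Stepper are described as copies of the same BSU, one such class could in principle serve all three roles, with the sliders' outputs feeding the stepper's B and L.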

GAG Complex Sequencer Implementation
[Figure: Generic Address Generator (GAG) as a complex sequencer: a VLIW stack and an SDS controlling several GAUs, each with Base Slider (B0), Limit Slider (L0) and Address Stepper (ΔA, address A)]

Generic Sequence Examples
[Figure: scans generated by a GAU (Base Slider B0, Limit Slider L0, Address Stepper ΔA, A): atomic scan, linear scan, video scan, -90° rotated video scan, -45° rotated (mirx (v scan)), sheared video scan, non-rectangular video scan, zigzag video scan, spiral scan, feed-back-driven scans, perfect shuffle]
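Several of these scans differ only in the step vectors given to the sliders and the stepper. The following Python sketch illustrates that idea under a simplifying assumption: a base slider moves the base address after every complete run of the address stepper, and the limit slider is not modelled. The function names are hypothetical.

```python
# Hypothetical sketch of a GAU: a base slider moves the base address after each
# complete run of the address stepper. Changing only the step vectors yields
# different generic scans. The limit slider is not modelled here.
from typing import Iterator, List, Tuple

Vec = Tuple[int, int]  # (x, y) address in the 2-D data memory


def stepper(start: Vec, step: Vec, count: int) -> Iterator[Vec]:
    """Yield count addresses start, start + step, start + 2*step, ..."""
    for i in range(count):
        yield (start[0] + i * step[0], start[1] + i * step[1])


def gau_scan(b0: Vec, base_step: Vec, n_lines: int,
             addr_step: Vec, line_len: int) -> List[Vec]:
    out: List[Vec] = []
    for base in stepper(b0, base_step, n_lines):        # base slider
        out.extend(stepper(base, addr_step, line_len))  # address stepper
    return out


video = gau_scan((0, 0), (0, 1), 2, (1, 0), 3)    # lines run east, base slides south
rotated = gau_scan((0, 0), (1, 0), 3, (0, 1), 2)  # lines run south, base slides east
sheared = gau_scan((0, 0), (1, 1), 2, (1, 0), 3)  # each line starts one column further right
print(video)    # [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
print(rotated)  # [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
print(sheared)  # [(0, 0), (1, 0), (2, 0), (1, 1), (2, 1), (3, 1)]
```

Here the video scan, a 90°-rotated (column-wise) video scan and a sheared video scan come from the same two nested steppers; only the parameters change, not the circuit.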

Slider Demo
[Figure: GAU with Base Slider (B0), Limit Slider (L0) and Address Stepper (ΔA, address A) moving over the address floor F]

XMDS Scan Pattern Editor GUI

>> Stream-based Memory Architecture

MoM Xputer Architecture
Published in 1990: rDPA, multiple RAM banks, smart memory interface, scan window "cache"

Antimachine: MoM architecture

Linear Filter Application 11 x 22: initial
[Dissertation Michael Herz] 9 × 20 = 180, 1620

Linear Filter: scanline unrolling
3 × 20 = 60, 900

90° Rotation of Scan Pattern
3 × 10 = 30, 600

Linear Filter Application: final
Parallelized merged-buffer linear filter application, with an example image of x = 22 by y = 11 pixels. Design steps: initial design; after inner scan line loop unrolling; after scan line unrolling; final design after hardware-level access optimization. Speed-up factor: 11.2
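The three pairs of numbers on the linear filter slides (180 / 1620, 60 / 900, 30 / 600) are consistent with a simple access-count model. This model is one possible reading, an assumption not stated on the slides: a 3 × 3 filter window over the 22 × 11 image gives 20 × 9 output pixels, and each scan position computes an r × c block of outputs while reading the (r + 2) × (c + 2) pixel footprint of that block. The sketch does not attempt to reproduce the 11.2 speed-up, which also depends on hardware-level access optimization.

```python
# One reading of the slide numbers (an assumption, not from the dissertation):
# a k x k filter window over a width x height image; each scan position
# computes an r x c block of output pixels and reads its whole pixel footprint.
from typing import Tuple


def filter_cost(width: int = 22, height: int = 11, k: int = 3,
                r: int = 1, c: int = 1) -> Tuple[int, int]:
    """Return (scan positions, pixel reads) for r x c output blocks per position."""
    out_w, out_h = width - k + 1, height - k + 1   # 20 x 9 output pixels
    positions = (out_h // r) * (out_w // c)        # assumes r, c divide evenly
    reads = positions * (r + k - 1) * (c + k - 1)  # footprint of one block
    return positions, reads


for label, r, c in [("initial", 1, 1),
                    ("scan line unrolling", 3, 1),
                    ("90° rotation", 3, 2)]:
    print(label, filter_cost(r=r, c=c))
# initial (180, 1620)
# scan line unrolling (60, 900)
# 90° rotation (30, 600)
```

In this reading, larger output blocks per scan position let neighbouring windows share pixels, so fewer positions and fewer reads are needed for the same result.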

MoM Application Examples
Image processing: grid-based design rule check [1983*]
- 4 by 4 word scan cache
- pattern-matching based
- our own nMOS "DPLA" design
- design rule violation pixel map
- automatically generated from textual design rules
- 256 M&C nMOS, 800 single metal CMOS
- speed-up > vs. Motorola 68000
*) "machine" not yet discovered

MoM Architecture Features
- Scan cache size adjustable at run time
- Any shape other than square supported
- 2-dimensional memory space
- Supports generic "scan patterns"
- Subject of parallel access transformations (compare Francky Catthoor et al.)
- Supports visualization

Hot Research Topic: Memory Architectures
- High Performance Embedded Memory Architectures [Catthoor et al.]
- High Performance Memory Communication Architectures [Herz]
- Custom Memory Management Methodology [Catthoor et al.]
- Data Reuse Transformations [Kougia et al.]
- Data Reuse Exploration [Soudris, Wuytack]
- Rapidly growing market: IP cores, module generators etc.

Processor Memory Performance Gap
[Figure: performance (log scale, 1 to 1000) from 1980 to 2000: µProc grows 60%/yr, DRAM 7%/yr]
Processor-memory performance gap: grows 50% / year
CPU: von Neumann bottleneck
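The "50% / year" figure follows from dividing the two growth factors, not from subtracting the percentages (60 - 7 = 53). A short worked check, using only the rates on the slide:

```python
# Worked check of the growth rates on the slide (rates from the slide; the
# 20-year compounding is only an illustration of what they imply).
cpu_per_year = 1.60    # µProc: +60 % per year
dram_per_year = 1.07   # DRAM: +7 % per year

gap_per_year = cpu_per_year / dram_per_year   # about 1.495 per year
gap_after_20_years = gap_per_year ** 20       # 1980 -> 2000, compounded

print(f"gap grows {100 * (gap_per_year - 1):.1f} % per year")  # gap grows 49.5 % per year
print(f"gap factor 1980-2000: about {gap_after_20_years:.0f}x")
```

Compounded over the 20 years of the plot, these rates open a gap of roughly three orders of magnitude, which matches the span of the performance axis.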

rDPAs: classical cache does not help
- Stream-based arrays are a memory bandwidth problem: super pipe networks, not parallel computers!
- The memory bandwidth problem is often more dramatic than for microprocessors
- Classical interleaving is not practicable, since it is based on sequential instruction streams
- Classical caches do not help, since instruction sequencing is not used
- The problem: throughput of parallel data streams, not of instruction streams

Cache does not help ...
however, the anti machine has no von Neumann (v.N.) bottleneck!

Data-Stream-based Soft Anti Machine
[Figure: compiler producing the "instructions"; rDPA; data memory organized in memory banks ...; scheduler; sequencers (data stream generators)]
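The figure's division of labour can be illustrated with a toy Python model. The names and structure are hypothetical, not the MoM or KressArray tools: the rDPA operation is fixed before run time, and sequencers (data stream generators) decide which memory words flow through it, so no instruction is fetched while the data streams run.

```python
# Toy model of the figure (hypothetical names, not the MoM or KressArray tools):
# the rDPA operation is fixed before run time, and sequencers (data stream
# generators) produce the address streams; no instruction is fetched at run time.
from operator import mul
from typing import Callable, Dict, Iterator, List, Tuple

Stream = Tuple[str, Iterator[int]]  # (memory bank name, address stream)


def sequencer(base: int, step: int, count: int) -> Iterator[int]:
    """Data stream generator: yields count addresses base, base+step, ..."""
    for i in range(count):
        yield base + i * step


def run_anti_machine(banks: Dict[str, List[int]],
                     rdpa: Callable[[int, int], int],
                     src_a: Stream, src_b: Stream, dst: Stream) -> None:
    """Stream operands from two banks through the configured rDPA into a third bank."""
    (bank_a, addr_a), (bank_b, addr_b), (bank_d, addr_d) = src_a, src_b, dst
    for a, b, d in zip(addr_a, addr_b, addr_d):
        banks[bank_d][d] = rdpa(banks[bank_a][a], banks[bank_b][b])


banks = {"A": [1, 2, 3, 4, 5, 6], "B": [10, 20, 30], "C": [0, 0, 0]}
run_anti_machine(banks, mul,                 # configuration loaded before run time
                 ("A", sequencer(0, 2, 3)),  # every second word of bank A
                 ("B", sequencer(0, 1, 3)),
                 ("C", sequencer(0, 1, 3)))
print(banks["C"])  # [10, 60, 150]
```

Each memory bank is addressed by its own sequencer, which is one way to picture why the bottleneck becomes the throughput of parallel data streams rather than an instruction stream.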

The Disk Farm? or a System On a Card?
[Gordon Bell, Jim Gray, ISCA 2000]
- The 500 GB disc card: LOTS of bandwidth; a few disks replaced by >10s of Gbytes of RAM and a processor (14")
- MicroDrive: 1.7" x 1.4" x 0.2"
  - 1999: 340 MB, 5400 RPM, 5 MB/s, 15 ms seek
  - 2006: 9 GB, 50 MB/s? (1.6X/yr capacity, 1.4X/yr BW)
- Integrated IRAM processor (2x height): 16 Mbytes; 1.6 Gflops; 6.4 Gops
- Connected via crossbar switch, growing like Moore's law
- 100/board = 1 TB; 0.16 Tflops
- 10,000+ nodes in one rack!
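The 2006 MicroDrive projection follows from compounding the stated yearly growth factors over the 7 years from 1999; the board figure follows from 100 nodes at 1.6 Gflops each. A worked check using only the numbers on the slide:

```python
# Worked check of the slide's 1999 -> 2006 MicroDrive projection (7 years)
# and of the per-board Tflops figure; all inputs are taken from the slide.
years = 2006 - 1999
capacity_mb = 340 * 1.6 ** years   # 1.6X/yr capacity growth
bandwidth = 5 * 1.4 ** years       # 1.4X/yr bandwidth growth, MB/s
board_tflops = 100 * 1.6 / 1000    # 100 nodes/board at 1.6 Gflops each

print(f"{capacity_mb / 1000:.1f} GB, {bandwidth:.0f} MB/s, {board_tflops:.2f} Tflops")
# 9.1 GB, 53 MB/s, 0.16 Tflops
```

The results agree with the slide's rounded "9 GB, 50 MB/s" and "0.16 Tflops".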

>>> Coarse Grain
- END -

- APPENDIX -

Alliances

Xilinx Alliances
- The Software Alliance EDA Program
- ... Xilinx Inc.'s Foundation ...
- free WebPACK downloadable tool palette
- The Xilinx XtremeDSP Initiative (with Mentor Graphics)
- MathWorks / Xilinx Alliance
- The Wind River / Xilinx alliance

The Software Alliance EDA Program
The program provides a wide selection of EDA tools from Acugen Software, Agilent EEsof EDA, Aldec, Aptix, Auspy Development, Cadence, Celoxica, Dolphin Integration, Elanix, Exemplar, Flynn Systems, Hyperlynx, IKOS Systems, Innoveda, Mentor Graphics, MiroTech, Model Technology, Protel International, Simucad, SynaptiCAD, Synopsys, Synplicity, Translogic, and Virtual Computer Corporation. It helps leading EDA vendors to integrate Xilinx Alliance software tightly into their tools.

The Xilinx AllianceCORE program
A cooperation between Xilinx and third-party core developers to produce a broad selection of industry-standard solutions for use in Xilinx platforms. Partners are: Amphion Semiconductor, Ltd.; ARC Cores; CAST, Inc.; DELTATEC; Derivation Systems, Inc.; Dolphin Integration (Grenoble); Eureka Technology Inc.; Frontier Design Inc.; GV & Associates, Inc.; inSilicon Corporation; iCODING Technology Inc.; Loarant Corporation; Mindspeed Technologies - A Conexant Business (formerly Applied Telecom); MemecCore; Mentor Graphics Inventra; NewLogic Technologies, Inc. (Europe); NMI Electronics; Paxonet Communications, Inc.; Perigee, LLC; Rapid Prototypes Inc.; sci-worx GmbH (Hannover, Germany); SysOnChip; TILAB (Telecom Italia Lab); VAutomation; Virtual IP Group, Inc.; XYLON.

The Xilinx Reference Design Alliance Program
The Xilinx Reference Design Alliance Program supports the development of multi-component reference designs that incorporate Xilinx devices and other semiconductors. The designs are fully functional, but come with no warranties and no liability. Partners are: JK microsystems, Inc.; LYR Technologies; NetLogic Microsystems; ADI Engineering; Innovative Integration.

The Xilinx University Program
The Xilinx University Program provides Xilinx Student Edition Software, Professor Workshops, a Xilinx University User Group, Presentation Materials and Lab Files, Course Examples, Research, Books, etc.

Altera offers over a hundred IP cores (1)
Altera offers over a hundred IP cores, for example: modulator, synchronizer, DDR SDRAM controller, Hadamard transform, interrupt controller, Real86 16-bit microprocessor, floating point, FIR filter, discrete cosine transform, ATM cell processor, controller, UART, microprocessor, decoder, bus control, USB controller, PCI bus interface, Viterbi controller, fast Ethernet MAC receiver or transmitter, and many others.

Altera offers over a hundred IP cores (2)
From: Altera; AMIRIX Systems, Inc.; Amphion Semiconductor, Ltd.; Arasan Chip Systems, Inc.; CAST, Inc.; Digital Core Design; Eureka Technology Inc.; HammerCores; Innocor; Ktech Telecommunications, Inc.; Lexra Computing Engines; Mentor Graphics - Inventra; Modelware; Ncomm, Inc.; NewLogic Technologies; Northwest Logic; Nova Engineering, Inc.; Palmchip Corporation; Paxonet Communications; PLD Applications; Sciworx; Simple Silicon; Tensilica; TurboConcept.

Altera IP core design services
Altera IP core design services are available from: Northwest Logic

Altera Certified Design Center (CDC) Program
Barco Silex; El Camino GmbH; Excel Consultants; Plextek; Reflex Consulting; Sci-worx; Tality; Zaiq Technologies.

The Altera Consultants Alliance Program (ACAP):
The Altera Consultants Alliance Program (ACAP) lists 41 offices in North America and 29 in the rest of the world.

Development boards
Development boards are offered by: Altera; El Camino GmbH; Gid'el Limited; Nova Engineering, Inc.; PLD Applications; Princeton Technology Group; RPA Electronics Design, LLC; Tensilica.

Consultants and services not listed by Xilinx or Altera (index)
Flexibilis, Tampere, Finland; Geoff Bostock Designs, Wiltshire, England; Great River Technology, Albuquerque, NM; New Horizons GB Ltd, United Kingdom; North West Logic; Silicon System Solutions, Canterbury, Australia; Smartech, Tampere, Finland; Tekmos, Austin, Texas; The Rockland Group, Garden Valley, CA; Nick Tredennick, Los Gatos, California; Vitesse; Algotronix, Edinburgh; Andraka Consulting Group; Arkham Technology, Pasadena, CA; Barco Silex, Louvain-la-Neuve, Belgium; Bottom Line Technologies, Milford, NJ; Codelogic, Helderberg, South Africa; Coelacanth Engineering, Norwell, MA; Comit Systems, Inc., Santa Clara, CA; EDTN Programmable Logic Design Center

Consultants and services not listed by Xilinx or Altera (1)
- Algotronix, Edinburgh: reconfigurable computing and FPL in software radio, communications and computer security
- Andraka Consulting Group: high-performance FPGA designs for DSP applications
- Arkham Technology, Pasadena: low-cost IP cores for Xilinx and Atmel, embedded processor, DSP, wireless communication, COM / CORBA / DirectX, client-server database programming, software internationalization, PCB design
- Barco Silex, Louvain-la-Neuve, Belgium: IP integration boards for ASIC and FPGA, consultancy, design, sub-contracting

- Bottom Line Technologies, Milford, New Jersey: FPGA design, training; designing Xilinx parts since 1985
- Codelogic, Helderberg, South Africa: consulting, FPGA design services
- Coelacanth Engineering, Norwell, Massachusetts: design services and test development services in wireless communication, DSP-based instrumentation, mixed-signal ATE
- Comit Systems, Inc., Santa Clara, California: DSP, ASIC, networking, embedded control in avionics; FPGA / ASIC design and system software
- EDTN Programmable Logic Design Center

- FirstPass, Castle Rock, Colorado
- Vitesse: ASIC design
- Flexibilis, Tampere, Finland: VHDL IP cores for Xilinx products
- Geoff Bostock Designs, Wiltshire, England: FPGA design services
- Great River Technology, Albuquerque, New Mexico: FPGA design services in digital video and point-to-point data transmission for aerospace, military, and commercial broadcasters
- New Horizons GB Ltd, United Kingdom: FPGA design and training, Xilinx specialist
- North West Logic: FPGA and embedded processor design in digital communications, digital video

- Silicon System Solutions, Canterbury, Australia: VHDL IP cores for the ASIC and FPGA/CPLD/EPLD markets
- Smartech, Tampere, Finland: ASIC and FPGA design
- Tekmos, Austin, Texas: multiple designs on a single gate array, HDL synthesis, design conversions, chip debug, test generation
- The Rockland Group, Garden Valley, California: a TeleConsulting organization for logic design for FPGAs
- Nick Tredennick, Los Gatos, California: investor and consultant

Terms

Confusing Terminology
Computer science and EE, as well as their R&D and application areas, suffer from a Babylonian confusion. Communication not only between computer science and EE, but also between their special areas, and even between their different abstraction levels, is made difficult, mainly because of immature terminology related to reconfigurable circuits and their applications. Terms are rarely standardized and are often used with drastically different meanings, even within the same special area. Often terms have been coined so badly that they are not self-explanatory, but misleading. A demonstrative example is a comparison of the terms used in VHDL and Verilog. "Intuitive" terms would be ideal, but intuition often yields the wrong idea. Whenever a new term appears in teaching, I often have to tell the students that the term does not mean what they believe.

Terms (1) [à la Ingo Kreuz]
- Hardware: hardwired. Examples: processor, ASIC
- Flexware: reconfigurable (structurally programmable). Examples: FPLA, FPGA, KressArray
- Firmware: microprogram (rarely used after the introduction of RISC processors). Example: IBM 360 computer family
- Software: procedural programs (sequentially executable by a CPU). Examples: Word, C, OS, compiler, etc.
- Configware: structural programs, soft IP cores, personalizing CPLD, FPGA, or other Flexware. Examples: configuration for rDPA or FPGA, e.g. as a logic circuit, state machine, datapath, function

Terms (2) [à la Ingo Kreuz]
- data: objects of computing; the "data" property depends on the moment of watching. Examples: bits, numbers, operands, results, any text (also compiler input), lists, graphs, tables, images, ...
- data stream: ordered (also parallel) data word lists, obtained by scheduling. Example: I/O data streams for systolic or other arrays
- programming: personalization by loading program code. Examples: procedural code, or structural code for (re)configuration
- program: source text or object code for programming, procedural or structural

Terms (3) [à la Ingo Kreuz]
- boot program: simple program to enable programming, usually saved in non-volatile memory. Comparable to the starter of a car's motor
- booting: loading and executing a boot program

Hardware Terms (1) [à la Ingo Kreuz]
- machine: execution unit, driven by a deterministic sequencer. Example: von Neumann machine. Note: a "dataflow machine" is not a machine, since it has no deterministic sequencer (exotic concept; sleeping research area)
- CPU: instruction set processor ("von Neumann"): program counter (instruction sequencer) and DPU; mode of operation: deterministically instruction-driven. Examples: ARM, Pentium core

Hardware Terms (2) [à la Ingo Kreuz]
- DPU: data path unit, processes operands; not a CPU, since it has no sequencer; not a machine. Example: ALU with registers, multiplexers etc.
- Computer: CPU with RAM and interfaces
- Parallel computer: ensemble of several computers
- Xputer: deterministically data-driven machine (transport-triggered); data counter(s) used instead of a program counter. Example: MoM architectures (Kaiserslautern)
- dataflow machine: indeterministically data-driven (execution sequence unpredictable); sleeping research area

Terms on Parallelism (1)
[à la Ingo Kreuz]
- parallelism: several levels of parallelism are distinguished. Examples: parallel processes, parallelism at instruction set level, pipelines
- concurrent parallel processes: run on different CPUs of a parallel computer; may occasionally exchange signals or data. Examples: weather prognosis, complex simulations, etc.
- ISP (instruction set parallelism): several CPUs run in parallel by clocked synchronization. Example: VLIW (very long instruction word) computer

Terms on Parallelism (2) [à la Ingo Kreuz]
- pipelining: several uniform or different DPUs running simultaneously, connected into a pipeline by buffer registers. Examples: pipelined CPUs, pipe networks, systolic arrays, etc.
- chaining: several uniform or different DPUs running simultaneously, connected into a pipeline without buffer registers. Examples: combinational networks, complex arithmetic operators
- pipe network: ensemble of DPUs, also multiple pipelines, also with irregular or wild structures. Examples: systolic arrays, stream-based computing arrays

Terms on Parallelism (3) [à la Ingo Kreuz]
- systolic array: pipe network with only linear (straight-on, no branching), uniform pipelines (all DPUs hardwired and with the same functionality). Examples: matrix computation, DSP, DNA sequencing, etc.
- stream-based computing arrays (super-systolic arrays): pipe network, configured before fabrication. Examples: image processing, DSP, complex functions and algorithms
- (coarse grain) reconfigurable stream-based arrays: stream-based arrays, configurable after fabrication. Example: KressArray
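A DSP example illustrates pipelining as defined above: a linear pipeline of identical, hardwired multiply-add DPUs separated by buffer registers, computing an FIR filter y[t] = sum over k of w[k] · x[t-k]. The sketch below is a hypothetical illustration. It uses the transposed FIR form, in which the input sample is broadcast to all DPUs, so strictly speaking it is a semi-systolic variant of the systolic arrays defined above.

```python
# Hypothetical sketch: a linear pipe of identical multiply-add DPUs computing
# an FIR filter y[t] = sum_k w[k] * x[t-k] in transposed form. Each DPU holds
# one coefficient; partial sums move one buffer register per clock. The input
# sample is broadcast to all DPUs (semi-systolic).
from typing import List


def systolic_fir(w: List[int], xs: List[int]) -> List[int]:
    k = len(w)
    regs = [0] * k                      # one buffer register behind each DPU
    ys = []
    for x in xs:                        # one clock cycle per input sample
        # every DPU fires in the same clock, reading the *old* register contents
        regs = [w[i] * x + (regs[i + 1] if i + 1 < k else 0) for i in range(k)]
        ys.append(regs[0])              # the pipeline head emits y[t]
    return ys


def direct_fir(w: List[int], xs: List[int]) -> List[int]:
    """Reference: the same filter computed directly from the formula."""
    return [sum(w[j] * xs[t - j] for j in range(len(w)) if t - j >= 0)
            for t in range(len(xs))]


print(systolic_fir([1, 2, 3], [1, 0, 0, 0]))  # [1, 2, 3, 0]
```

The buffer registers are what make this pipelining rather than chaining: without them, one clock would have to cover the whole multiply-add chain.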

Counterparts [à la Ingo Kreuz]
- programming mode: procedural (classical) vs. structural (synthesis, design): "field-programmable", PLA "programming", etc.
- machine, principle of operation: control-flow-driven (instruction-driven): von Neumann vs. data-driven: Xputer
- machine system, principle of operation: instruction-flow-driven (parallel computer etc.) vs. data-stream-based (systolic array, DPU array, KressArray)
- set-up time (datapaths switched through): during run time (instruction-driven) vs. before run time: FPGA (at compile time), gate array (at fabrication)


Synthesizable Memory Communication
An example by Nageldinger's KressArray Xplorer: efficient memory communication should be directly supported by the mapper tools.
[Figure: optimized parallel memory controller; legend: sequencers, memory ports, application, not used]

Opportunities through new patent laws?
To clever guys keen on patents: don't file for patents on the following details! Everything shown in this presentation was published years ago.
