Presentation is loading. Please wait.

Presentation is loading. Please wait.

Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany.

Similar presentations


Presentation on theme: "Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany."— Presentation transcript:

1 Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany

2 - 2 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund Embedded System Hardware Embedded system hardware is frequently used in a loop („hardware in a loop“): actuators

3 - 3 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund Target technologies Processing units  Power efficiency of target technologies  ASICs  Processors Energy efficiency Code size efficiency and code compaction Run-time efficiency DSP processors Multimedia processors Very long instruction word (VLIW) & EPIC machines MPSoCs  Reconfigurable Hardware  Memory

4 - 4 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund Energy Efficiency of FPGAs Courtesy: Philips© Hugo De Man, IMEC, 2007

5 - 5 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund Reconfigurable Logic Full custom chips may be too expensive, software too slow. Combine the speed of HW with the flexibility of SW  HW with programmable functions and interconnect.  Use of configurable hardware; common form: field programmable gate arrays (FPGAs) Applications: bit-oriented algorithms like  encryption,  fast “object recognition“ (medical and military)  Adapting mobile phones to different standards. Include programmable interconnected logic blocks

6 - 6 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund Types of logic blocks Multiplexer-based SRAM-based (volatile) EPROM-based (non-volatile) Inputs depend on configuration

7 - 7 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund Types of interconnections  Anti-fuses: Isolation between conductions removed by short high-voltage peak Non-volatile, non-reversible.  RAM-controlled MOS- switches Volatile, reversible

8 - 8 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund Generating a configuration Example: ISE design flow http://www.xilinx.com/support/ sw_manuals/xilinx8/download/qst.zip Design entry (VHDL, Verilog, State diagrams, pre-designed cores) Net lists Placement, routing

9 - 9 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund Configuring a volatile FPGA with an EPROM Configuration can be copied from non-volatile memory at power-up © Xilinx

10 - 10 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund XILINX VIRTEX II FPGAs RAM-based FPGAs

11 - 11 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund Virtex II Configurable Logic Block (CLB)

12 - 12 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund Virtex II Slice (simplified) Look-up tables LUT F and G can be used to compute any Boolean function of  4 variables. abcdG 00000 00011 00101 00110 01001 01010 01100 01111 10001 10010 10100 10111 11000 11011 11101 11110 Example:

13 - 13 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund Virtex II (Pro) Slice [© and source: Xilinx Inc.: Virtex-II Pro™ Platform FPGAs: Functional Description, Sept. 2002, //www.xilinx.com]

14 - 14 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund 2 carry paths per CLB (Vertex II Pro) [© and source: Xilinx Inc.: Virtex-II Pro™ Platform FPGAs: Functional Description, Sept. 2002, //www.xilinx.com] Enables efficient implementation of adders.

15 - 15 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund Implementing sums of products Dedicated or chain for computing sum of products 16-input AND gate

16 - 16 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund Number of resources available in Virtex II Pro devices [© and source: Xilinx Inc.: Virtex-II Pro™ Platform FPGAs: Functional Description, Sept. 2002, //www.xilinx.com]

17 - 17 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund Embedded Multipliers A Virtex-II Pro multiplier block is an 18-bit by 18- signed multiplier. DeviceColumnsMultipliers XC2VP2412 XC2VP4428 XC2VP7644 XC2VP20888 XC2VP308136 XC2VPX20888 XC2VP4010192 XC2VP5012232 XC2VP7014328 XC2VPX7014308 XC2VP10016444 Multipliers are connected to a switch matrix, share some bits with RAM (  MAC instruction).

18 - 18 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund Interconnect Hierarchical Routing Resources

19 - 19 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund Virtex II Pro Devices include up to 4 PowerPC processor cores [© and source: Xilinx Inc.: Virtex-II Pro™ Platform FPGAs: Functional Description, Sept. 2002, //www.xilinx.com]

20 - 20 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund Summary Processing units  Power efficiency of target technologies  ASICs  Processors Energy efficiency Code size efficiency and code compaction Run-time efficiency DSP processors Multimedia processors Very long instruction word (VLIW) & EPIC machines MPSoCs  Reconfigurable Hardware  Memory

21 Peter Marwedel Informatik 12 Technische Universität Dortmund Memory Peter Marwedel Informatik 12 TU Dortmund Germany

22 - 22 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund Memory For the memory, efficiency is again a concern:  speed (latency and throughput); predictable timing  energy efficiency  size  cost  other attributes (volatile vs. persistent, etc)

23 - 23 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund Access times and energy consumption increases with the size of the memory Example (CACTI Model): "Currently, the size of some applications is doubling every 10 months" [STMicroelectronics, Medea+ Workshop, Stuttgart, Nov. 2003]

24 - 24 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund Access times and energy consumption for multi-ported register files Rixner’s et al. model [HPCA’00], Technology of 0.18  m Cycle Time (ns) Area ( 2 x10 6 ) Power (W) Source and © H. Valero, 2001

25 - 25 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund Reducing energy consumption with Sub-banking  shorter wires,  lower capacitances,  faster access http://www.ecse.rpi.edu/frisc/theses/ MaierThesis/FIG319.GIF

26 - 26 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund How much of the energy consumption of a system is memory-related? Mobile PC Thermal Design (TDP) System Power Note: Based on Actual Measurements 600/500 MHz uP 37% LCD 10" 19% HDD 9% Memory+Graphics 12% Power Supply 10% Other 13% Mobile PC Average System Power 600/500 MHz uP 13% LCD 10" 30% HDD 19% Memory+Graphics 15% Power Supply 10% Other 13% CPU Dominates Thermal Design Power Multiple Platform Components Comprise Average Power [Courtesy: N. Dutt; Source: V. Tiwari]

27 - 27 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund „CPU“ Power Dissipation IEEE Journal of SSC Nov. 96 Proceedings of ISSCC 94 Based on slide by and ©: Osman S. Unsal, Israel Koren, C. Mani Krishna, Csaba Andras Moritz, University of Massachusetts, Amherst, 2001 42%/40% again memory-related !

28 - 28 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund Energy consumption in mobile devices [O. Vargas (Infineon Technologies): Minimum power consumption in mobile-phone memory subsystems; Pennwell Portable Design - September 2005;] Thanks to Thorsten Koch (Nokia/ Univ. Dortmund) for providing this source.

29 - 29 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund Access-times will be a problem Speed gap between processor and main DRAM increases 2 4 8 245 Speed years CPU (1.5-2 p.a.) DRAM (1.07 p.a.) 31 early 60ties (Atlas): page fault ~ 2500 instructions 2002 (2 GHz µP): access to DRAM ~ 500 instructions  penalty for cache miss about same as for page fault in Atlas early 60ties (Atlas): page fault ~ 2500 instructions 2002 (2 GHz µP): access to DRAM ~ 500 instructions  penalty for cache miss about same as for page fault in Atlas [P. Machanik: Approaches to Addressing the Memory Wall, TR Nov. 2002, U. Brisbane]  2x every 2 years 1 0

30 - 30 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund Predictability is a problem Embedded system systems are often real-time systems:  Have to guarantee meeting timing constraints. Predictability: For satisfying timing constraints in hard real- time systems, predictability is the most important concern; pre run-time scheduling is often the only practical means of providing predictability in a complex system [Xu, Parnas]  Time-triggered, statically scheduled operating systems Currently available caches don‘t solve the problem: Improve the average case behavior Use „non-deterministic“ cache replacement algorithms  Scratch-pad/tightly coupled memory based predictability

31 - 31 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund Hierarchical memories using scratch pad memories (SPM) Address space ARM7TDMI cores, well- known for low power consumption scratch pad memory 0 FFF.. main SPM processor Hierarchy Example no tag memory SPM select Selection is by an appropriate address decoder (simple!) SPM is a small, physically separate memory mapped into the address space

32 - 32 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund Comparison of currents using measurements E.g.: ATMEL board with ARM7TDMI and ext. SRAM

33 - 33 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund Comparison of energy consumption Example: Atmel ARM-Evaluation board € Main memory access takes more cycles  savings (86%) larger than for current. energy reduction: / 7.06

34 - 34 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund Why not just use a cache ? [P. Marwedel et al., ASPDAC, 2004] 1.Predictability? Worst case execution time (WCET) may be large

35 - 35 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund Why not just use a cache ? (2) [R. Banakar, S. Steinke, B.-S. Lee, 2001] 2.Energy for parallel access of sets, in comparators, muxes.

36 - 36 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund Influence of the associativity Parameters different from previous slides [P. Marwedel et al., ASPDAC, 2004]

37 - 37 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund Flash Memory Based on EPROM/EEPROM-Memory floating gate FG can be charged by high voltage. Charged transistor non-conducting if row selected. Discharging with UV light. EPROM storage cell FG charged FG not charged Read amplifier Ground Row (j)

38 - 38 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund EEPROM storage cell EEPROM= electrically erasable read only memory. EEPROM needs additional transistor per bit Writing and erasing requires just electrical signals FG charged FG not charged Ground Row (j)

39 - 39 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund NOR- and NAND-Flash NOR: Transistor between bit line and ground NAND: Several transistor between bit line and ground [www.samsung.com/Products/Semiconductor/ Flash/FlashNews/FlashStructure.htm] contact

40 - 40 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund Properties of NOR- and NAND- Flash memories Type/PropertyNORNAND Random accessYes No  Erase blockSlow  Fast Size of cellLarger Small ReliabilityLarger Smaller  Execute in placeYes No  Applications Code storage, boot flash, set top box Data storage, USB sticks, memory cards [ www.samsung.com/Products/Semiconduc tor/Flash/FlashNews/FlashStructure.htm]

41 - 41 -  P. Marwedel, TU Dortmund, Informatik 12, 2007 TU Dortmund Summary Processing units  Power efficiency of target technologies  ASICs  Processors Energy efficiency Code size efficiency and code compaction Run-time efficiency DSP processors Multimedia processors Very long instruction word (VLIW) & EPIC machines MPSoCs  Reconfigurable Hardware  Memory


Download ppt "Peter Marwedel Informatik 12 Technische Universität Dortmund Reconfigurable Hardware Peter Marwedel Informatik 12 TU Dortmund Germany."

Similar presentations


Ads by Google