
1 ECE 636 Reconfigurable Computing, Lecture 14: Power Reduction Techniques for FPGAs

2 Overview
FPGAs are generally considered power hungry compared to their ASIC and processor counterparts, mostly because of unused interconnect. Power reduction is a recent area of extensive research. Device techniques: voltage scaling, sleep mode. Software techniques: reduced switching, reduced capacitance.

3 Dynamic Power
Dynamic power is required to charge and discharge load capacitances when transistors switch. One cycle involves a rising and a falling output. On a rising output, charge $Q = C V_{DD}$ is drawn from the supply; on a falling output, that charge is dumped to GND. Two currents contribute: short-circuit current and charge/discharge current. (Courtesy: Harris)
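The charge bookkeeping above can be checked numerically. The following is a minimal Python sketch; the 10 fF load and 1.2 V supply are hypothetical values, not taken from the slides. It shows that the supply delivers $C V_{DD}^2$ per full cycle. Half of that energy is lost in the pull-up on the rising edge, and the other half, stored on the capacitor, is dumped to GND on the falling edge.

```python
def cycle_energy(c_load, vdd):
    """Charge and energy bookkeeping for one full output cycle (rise then fall).

    c_load: load capacitance in farads; vdd: supply voltage in volts.
    """
    charge = c_load * vdd                   # Q = C * VDD drawn from the supply on the rise
    supply_energy = charge * vdd            # energy delivered by VDD: C * VDD^2
    stored = 0.5 * c_load * vdd ** 2        # energy held on the capacitor after the rise
    return {
        "charge": charge,
        "supply_energy": supply_energy,
        "lost_on_rise": supply_energy - stored,  # dissipated in the pull-up network
        "lost_on_fall": stored,                  # dumped to GND through the pull-down
    }

# Hypothetical node: 10 fF load on a 1.2 V supply.
result = cycle_energy(10e-15, 1.2)
```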

4 Short-Circuit Power
Short-circuit power is less than 10% of dynamic power.

5 FPGA Static Power Consumption
Three leakage components: junction leakage, gate oxide leakage, and subthreshold leakage.

6 FPGA Static Power Consumption
Junction leakage: a small fraction of total leakage. Subthreshold leakage: when Vgs < Vt there is still some source-drain current; it increases exponentially as Vt decreases and decreases exponentially as Vgs decreases. Gate oxide leakage: increases exponentially as Vgs increases. (Figure: technology trend; courtesy: Nowak)
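The exponential dependence of subthreshold leakage on Vt can be illustrated with the common simplified model $I = I_0 e^{(V_{gs} - V_t)/(n v_T)}$. This Python sketch is an illustration under stated assumptions: `i0` and the slope factor `n` are placeholder values rather than data for any real process, and drain-voltage effects are ignored. It compares an OFF transistor (Vgs = 0) at two threshold voltages.

```python
import math

THERMAL_VOLTAGE = 0.026  # kT/q at roughly room temperature, in volts

def subthreshold_current(vgs, vt, i0=1e-7, n=1.5, v_t=THERMAL_VOLTAGE):
    """Simplified subthreshold model: I = I0 * exp((Vgs - Vt) / (n * vT)).

    i0 and n are illustrative placeholders, not taken from any process;
    the drain-voltage dependence (DIBL, the (1 - exp(-Vds/vT)) term) is ignored.
    """
    return i0 * math.exp((vgs - vt) / (n * v_t))

# Leakage of an OFF transistor (Vgs = 0) for two threshold voltages.
low_vt = subthreshold_current(vgs=0.0, vt=0.3)
high_vt = subthreshold_current(vgs=0.0, vt=0.4)
ratio = low_vt / high_vt   # exp(0.1 / 0.039), roughly 13x
```

Under these assumptions, lowering Vt by only 0.1 V raises leakage by roughly an order of magnitude. This is the same effect that motivates high-Vt transistors for configuration SRAM cells.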

7 FPGA Power Reduction Goals
Dynamic power goals: reduce Vdd along non-critical paths; use low-swing signalling; use CAD approaches to limit long, high-toggle paths. $P_{dynamic} = 0.5\,C\,V_{dd}^2\,f$. Static power goals: cut off Vdd for unused transistors; use high-Vt transistors for SRAM cells; apply various other voltage-biasing techniques.
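To see why reducing Vdd on non-critical paths pays off, the slide's formula can be evaluated directly. This Python sketch uses hypothetical numbers (20 pF of switched capacitance at 100 MHz, with 1.2 V and 0.9 V supplies), not values from the lecture.

```python
def dynamic_power(c, vdd, f):
    """P_dynamic = 0.5 * C * Vdd^2 * f, as written on the slide (f: switching frequency)."""
    return 0.5 * c * vdd ** 2 * f

# Hypothetical example: 20 pF of switched capacitance at 100 MHz.
nominal = dynamic_power(20e-12, 1.2, 100e6)  # about 1.44 mW
scaled = dynamic_power(20e-12, 0.9, 100e6)   # same logic moved to a lower-Vdd, non-critical path
saving = 1 - scaled / nominal                 # 1 - (0.9 / 1.2)^2 = 0.4375
```

Because power scales with the square of Vdd, a 25% supply reduction removes about 44% of the dynamic power, provided the path has enough slack to tolerate the slower switching.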

8 Traditional Routing Switch
(Figure: traditional routing switch driving a level-restoring buffer; courtesy: Anderson)

9 Proposed Switch Designs: Anderson
The designs are based on three observations: (1) routing switch inputs are tolerant of weak-1 signals, thanks to their level-restoring buffers; (2) FPGA designs have considerable slack, so many switches can be slowed down; (3) most routing switches feed other routing switches, so a switch can produce weak-1 logic signals.

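10 “Basic” Switch Design
The switch is powered from a virtual supply VVD, which is connected to VDD through transistors MNX and MPX. Mode operation:
high-speed: MNX and MPX ON
low-power: MNX ON, MPX OFF
sleep: MNX OFF, MPX OFF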

11 High-Speed Mode
MNX and MPX are ON, so VVD = VDD and the output swing is rail-to-rail.

12 Low-Power Mode
MNX is ON and MPX is OFF, so VVD = VDD - VTH and the output swings from GND to (VDD - VTH).

13 Sleep Mode
MNX and MPX are both OFF, so the virtual supply VVD is disconnected from VDD.
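The three modes can be summarized as a lookup from mode to transistor states and virtual supply. In this Python sketch, `Mode` and `virtual_supply` are hypothetical names and the voltages are illustrative. It encodes the behavior described in the slides: the PMOS passes a full VDD, the NMOS alone passes only VDD - VTH, and sleep disconnects VVD.

```python
from enum import Enum

class Mode(Enum):
    HIGH_SPEED = "high-speed"
    LOW_POWER = "low-power"
    SLEEP = "sleep"

def virtual_supply(mode, vdd, vth):
    """Return (mnx_on, mpx_on, vvd) for the basic switch in the given mode.

    vvd is None in sleep mode, meaning the virtual supply is disconnected from VDD.
    """
    if mode is Mode.HIGH_SPEED:
        return True, True, vdd          # PMOS MPX passes a full VDD: rail-to-rail swing
    if mode is Mode.LOW_POWER:
        return True, False, vdd - vth   # NMOS MNX alone passes only VDD - VTH (weak 1)
    return False, False, None           # sleep: both devices off

# Hypothetical voltages, chosen only for illustration.
for mode in Mode:
    print(mode.value, virtual_supply(mode, vdd=1.0, vth=0.25))
```

The low-power mode's weak-1 output is acceptable only because, per the observations on slide 9, the downstream level-restoring buffers tolerate it.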

14 Leakage Power Results: Anderson
(Bar chart: % leakage power reduction vs. high-speed mode for several switch configurations: the basic switch in LP mode and sleep mode, LP mode with unused fanout, LP mode with used fanout, and a traditional switch. Reductions reach up to 60.8%.)

15 Region Constrained Placement
Rather than focusing only on routing, consider constraining logic placement to regions, so that unused regions can be put to sleep. Most circuits exhibit locality. (Gayasen, FPGA 2004)

16 Region Constrained Placement
Several issues need to be considered. Size of the sleep transistor: too large increases leakage and area; too small affects logic performance. Size of the region: too large may leave resources unused and complicates placement; too small means the sleep transistors take up too much room.
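The region-size trade-off can be illustrated by counting how many regions of a placed design contain no logic and could therefore be put to sleep. This Python sketch assumes square regions on a grid of logic blocks. `sleeping_regions` and the toy 4 × 4 placement are hypothetical and do not reproduce the Gayasen flow. Sleep-transistor sizing and placement constraints are not modeled.

```python
def sleeping_regions(used, region):
    """Count regions with no used logic blocks; those regions can be put to sleep.

    used: 2-D list of booleans, one per logic block (True = occupied).
    region: side length of a square sleep region, in logic blocks (hypothetical parameter).
    Returns (idle_regions, total_regions).
    """
    rows, cols = len(used), len(used[0])
    total = idle = 0
    for r0 in range(0, rows, region):
        for c0 in range(0, cols, region):
            total += 1
            busy = any(used[r][c]
                       for r in range(r0, min(r0 + region, rows))
                       for c in range(c0, min(c0 + region, cols)))
            if not busy:
                idle += 1
    return idle, total

# Toy placement: logic packed into the top-left 2 x 2 corner of a 4 x 4 array.
grid = [[r < 2 and c < 2 for c in range(4)] for r in range(4)]
```

With 2 × 2 regions, three of the four regions can sleep. A single 4 × 4 region cannot sleep at all because it contains the used corner. This mirrors the observation that savings fall as region size grows.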

17 Experimental Flow: RCP
Different region sizes are considered in the flow. Area constraints for portions of the design are determined by hand, which may encourage designers to create more granular designs.

18 Power Savings: RCP
Note the significant reduction in leakage power savings as region size increases. The bottom curve is primarily due to luck.

19 Performance Limitation: RCP
Performance is limited by the use of regions: many designs see a clock frequency reduction of nearly 10%.

20 Low-Swing Signalling
The techniques examined so far tinker with the supply voltage. It is also possible to modify wire signalling to reduce the voltage swing. Since most of an FPGA is made up of interconnect, this approach targets dynamic power consumption. (George and Rabaey, 1997)

21 Low-Swing Signalling
Interconnect swings 0.8 V while the rest of the circuit operates at 1.5 V. Cascode circuitry at the sink overcomes the resulting slow-speed issues. The result is 50% energy savings at the cost of 25% additional delay.
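A first-order model shows why reducing the swing helps. Each 0-to-1 transition draws charge $C V_{swing}$ from the driver supply, so the energy is $C V_{swing} V_{supply}$. This Python sketch assumes a hypothetical 1 pF wire and assumes that the 0.8 V swing is driven from a dedicated 0.8 V supply. Neither assumption is taken from the George and Rabaey design.

```python
def transition_energy(c, v_swing, v_supply):
    """Energy drawn from the driver supply for one 0->1 transition: E = C * Vswing * Vsupply."""
    return c * v_swing * v_supply

c_wire = 1e-12  # hypothetical 1 pF interconnect segment
full = transition_energy(c_wire, 1.5, 1.5)   # full-swing signalling at 1.5 V
low = transition_energy(c_wire, 0.8, 0.8)    # assumes a dedicated 0.8 V driver supply
ratio = low / full                           # (0.8 / 1.5)^2, about 0.28
```

This idealized, wire-only model ignores driver and receiver (cascode) overhead, so it should not be expected to match the slide's reported 50% savings.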

22 Alternate approach: Modifying FPGA CAD
FPGA architecture modifications impact all designs, even those that don't care about power. Can placement and routing instead be modified to consider dynamic power? This requires knowing which signals are high toggle. The goal is to minimize the length of high-toggle wires while minimizing the impact on performance and area. These techniques fit well with our previous work on placement and routing. (Lamoreaux and Wilton)

23 Modifying FPGA CAD Placement
Previous cost metrics for annealing considered bounding-box wire length and timing costs. Include an additional term that considers signal switching activity.

24 FPGA Placement for Power
Adding a switching-activity term to the bounding-box wire length and timing costs gives the following results: power decreases by 7% but delay increases by 4%, and post-route energy is reduced by 3.0%.
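The idea of adding an activity term to the annealing cost can be sketched as a weighted sum. This Python sketch is a simplified stand-in and not Lamoreaux and Wilton's published formulation. The weights, the `Net` fields, and the use of criticality-weighted bounding box as a timing proxy are all hypothetical. During annealing, a move would be accepted or rejected based on the change in this cost.

```python
from dataclasses import dataclass

@dataclass
class Net:
    pins: list          # (x, y) locations of the blocks on this net
    activity: float     # estimated toggles per cycle
    criticality: float  # 0..1 timing criticality (hypothetical)

def bbox_length(net):
    """Half-perimeter of the bounding box enclosing the net's pins."""
    xs = [x for x, _ in net.pins]
    ys = [y for _, y in net.pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def placement_cost(nets, w_wire=1.0, w_timing=1.0, w_power=1.0):
    wire = sum(bbox_length(n) for n in nets)
    timing = sum(n.criticality * bbox_length(n) for n in nets)  # crude stand-in for a delay cost
    power = sum(n.activity * bbox_length(n) for n in nets)      # long, busy nets cost more
    return w_wire * wire + w_timing * timing + w_power * power

nets = [Net([(0, 0), (2, 3)], activity=0.5, criticality=0.0),
        Net([(1, 1), (1, 2)], activity=0.1, criticality=1.0)]
```

The reported numbers are also self-consistent. Energy is power × delay, and 0.93 × 1.04 ≈ 0.967, which corresponds to an energy reduction of about 3%.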

25 FPGA Routing Modifications for Power
The original routing cost function takes congestion b(n) and delay(n) into account. Augment it with a factor that takes net activity into account, so that the length of the most active nets is minimized even in the presence of congestion.
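One way to picture the augmented routing cost is a per-node cost where active nets pay extra for every wire segment they use. In this Python sketch, `node_cost`, the capacitance field `cap`, and `power_weight` are hypothetical. The b(n) and delay(n) terms follow the slide, but the exact combination is an assumption. The example compares a short, congested route with a longer, uncongested detour.

```python
def node_cost(delay, congestion, cap, activity, crit, power_weight=1.0):
    """Hypothetical power-aware cost of routing node n for one connection.

    delay: delay(n); congestion: b(n); cap: capacitance of the node (assumed field);
    activity: switching activity of the net; crit: timing criticality, 0..1.
    """
    timing = crit * delay
    routability = (1.0 - crit) * congestion
    power = power_weight * activity * cap   # busy nets pay more for every extra segment
    return timing + routability + power

def path_cost(path, activity, crit=0.0, power_weight=1.0):
    return sum(node_cost(d, b, c, activity, crit, power_weight) for d, b, c in path)

direct = [(1.0, 3.0, 1.0)] * 2   # short but congested: (delay, congestion, capacitance)
detour = [(1.0, 1.0, 1.0)] * 4   # longer but uncongested
```

A quiet net (activity 0) prefers the detour (cost 4 vs. 6). A highly active net (activity 2) prefers the short, congested route (10 vs. 12), which is the behavior the slide describes.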

26 FPGA Routing for Power Results
Potential benefits are somewhat limited by placement. Note that most nets have low activity. Power decreases by 6% but delay increases by 4%, for energy savings of about 3%.

27 FPGA Embedded Memory Blocks
Embedded memory blocks (EMBs) are important parts of FPGAs. They consume roughly 14% of Altera Stratix II dynamic power*, and this share is increasing in recent designs. (*Stratix II Low Power Applications Note, 2005)

28 Embedded Memory Block Port Internal View
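(Figure: internal view of one EMB port, showing the RAM cell array with bit lines and pre-charge, row decode, column mux, write buffers, sense amps, and read latch. The memory clock MClk, derived from Clk and the clock enable, drives pre-charge, write data, and write enable.) Reducing clocking saves dynamic power.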

29 Power Optimization #1
Convert EMB read-enable/write-enable signals into the associated read/write clock-enable signals. Limitations: each port must have a read or write enable control signal, and the embedded memory block must have a read enable input. (Figure: before the transform, Wren and Rden drive the write and read enable inputs while the write and read clock enables are tied to Vcc; after it, Wren and Rden drive the write and read clock enables while the write and read enable inputs are tied to Vcc.)

30 Implementation
Conversion mode: ties off the R/W enables and uses them to drive the RAM clock enables instead. The transform is not made if a clock enable is already present on the port. Combining mode: ANDs the user RAM clock enable with the derived R/W clock enable, which could impact performance. (Figure: Combined Write Clk Enable = Write Enable AND User-defined Write Clk Enable.)
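The effect of conversion and combining on clocking activity can be sketched by counting the cycles in which a port's memory clock fires. This Python sketch models one write port with hypothetical names (`port_clock_enable`, `mclk_pulses`) and a made-up 25% write pattern. It assumes that, before the transform, the clock enable is either tied to Vcc or driven only by the user clock enable.

```python
def port_clock_enable(write_enable, user_clock_enable=None):
    """Derived clock enable for one cycle after the transform.

    Conversion mode (no user clock enable): the write enable becomes the clock enable.
    Combining mode: the write enable is ANDed with the user-defined clock enable.
    """
    if user_clock_enable is None:
        return write_enable
    return write_enable and user_clock_enable

def mclk_pulses(write_enables, user_clock_enables=None, transformed=True):
    """Number of cycles in which the port's memory clock fires."""
    if user_clock_enables is None:
        user_clock_enables = [None] * len(write_enables)
    pulses = 0
    for we, ce in zip(write_enables, user_clock_enables):
        if transformed:
            fires = port_clock_enable(we, ce)
        else:
            fires = True if ce is None else ce   # original: CE tied to Vcc, or user CE only
        pulses += int(fires)
    return pulses

we = [True, False, False, False, True, False, False, False]  # hypothetical 25% write rate
user_ce = [True] * 4 + [False] * 4                           # hypothetical user clock enable
```

Without a user clock enable, the clock fires on all 8 cycles before conversion and on only 2 after. With the user enable, combining reduces 4 pulses to 1. The extra AND gate on the clock-enable path is where the possible performance impact comes from.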

31 FPGA RAM Processing
Flow: FIFO, shift register, and RAM specification → create logical memory (FIFOs and shift registers are converted into logical RAMs) → logical-to-physical RAM processing (logical RAMs are mapped to RAM blocks) → memory/logic placement → placed memory.

32 Mapping RAM to EMBs
The implementation choice can impact design area, performance, and power, and some mappings require multiple EMBs. (Figure: a user-defined (logical) memory, 4K deep × 4 wide or 16K bits, mapped onto physical EMBs: either four 4K-bit M4K blocks or one 512K-bit M-RAM.)

33 Memory Organization
Each EMB can be configured with different depths and widths (e.g. the Stratix II M4K): 4K words deep × 1 bit wide, 512 words deep × 8 bits wide, or 128 words deep × 32 bits wide. All of these hold 4K bits. Wider EMB configurations consume slightly less power (not including routing).

34 Area and Delay Optimal Mapping
Configure each EMB to be as deep as possible, so that the number of address bits on each EMB matches that of the logical memory. This is area and performance efficient, since no external logic is needed. However, it is power inefficient: all EMBs must be active during each logical RAM access. (Figure: vertical slicing of a 4K-word × 4-bit logical memory into four 4K-word × 1-bit EMBs, each receiving Addr[0:11] and one bit of Data[0:3]; all 4 EMBs are active during an access.)

35 Alternative Mapping
Configure each EMB to have the width of the logical RAM (e.g. 1K × 4). This allows some RAMs to be shut down each cycle but adds some logic: it saves RAM power while adding combinational logic and register power. (Figure: horizontal slicing of the 4K-word × 4-bit logical memory into four 1K-deep × 4-wide EMBs. Addr[0:9] goes to every EMB, while an address decoder on Addr[10:11] enables one of them, so only 1 EMB is active per access and the mapping is more power efficient.)
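The difference between the two slicings comes down to which EMBs are enabled for a given address. This Python sketch, with the hypothetical helper `active_embs`, models the 4K × 4 example: vertical slicing enables all four 4K × 1 EMBs, while horizontal slicing decodes Addr[10:11] to enable a single 1K × 4 EMB.

```python
def active_embs(addr, slicing):
    """Return the indices of EMBs enabled for a 12-bit address into a 4K x 4 memory."""
    if not 0 <= addr < 4096:
        raise ValueError("address out of range for a 4K-word memory")
    if slicing == "vertical":      # four 4K x 1 EMBs, one per data bit
        return [0, 1, 2, 3]        # every access uses all of them
    if slicing == "horizontal":    # four 1K x 4 EMBs, one per 1K address range
        return [addr >> 10]        # decode Addr[10:11]; Addr[0:9] goes to every EMB
    raise ValueError(f"unknown slicing: {slicing}")
```

For example, address 0xC05 falls in the top 1K range, so horizontal slicing enables only EMB 3. The price is the decoder, plus a read-data multiplexer selecting that EMB's output.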

36 RAM Slicing Example
Different slicings of a 4K × 32 memory offer different amounts of power reduction. (Figure: dynamic power (mW) vs. maximum EMB depth from 128 to 4K. Multiplexer power increases toward smaller depths, EMB power increases toward larger depths, and the best range lies in between.)
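The two opposing curves can be reproduced by counting EMBs for each maximum depth. This Python sketch assumes 4K-bit EMBs whose width is 4096 / depth, as in the M4K configurations above. `slicing_plan` is a hypothetical helper, and it counts resources only, not milliwatts.

```python
EMB_BITS = 4096   # EMB capacity used on the slides (4K bits)

def slicing_plan(logical_depth, logical_width, max_depth):
    """EMB count, EMBs active per access, and read-mux inputs for a given max EMB depth."""
    emb_width = EMB_BITS // max_depth              # e.g. 128 deep -> 32 wide
    columns = -(-logical_width // emb_width)       # ceil: EMBs side by side to cover the width
    rows = -(-logical_depth // max_depth)          # ceil: EMBs stacked to cover the depth
    return {
        "embs": rows * columns,
        "active_per_access": columns,   # only one row of EMBs is enabled per access
        "mux_inputs": rows,             # output multiplexer selects among the rows
    }

for depth in (128, 256, 512, 1024, 2048, 4096):
    print(depth, slicing_plan(4096, 32, depth))
```

For the 4K × 32 memory, the total is always 32 EMBs. At a maximum depth of 128, only 1 EMB is active per access but a 32-input mux is needed. At 4K, all 32 EMBs are active and no mux is needed. This is why EMB power and multiplexer power move in opposite directions.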

37 Power Optimization #2: Power-aware RAM Partitioning
Flow: FIFO and shift register specification → create logical memory → power-aware physical RAM processing (using a power library) → insert decode and mux logic → memory/logic placement → completed placement. The algorithm considers the possible logical-to-physical RAM mappings.
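A power-aware partitioner of this kind can be sketched as an exhaustive search over candidate EMB depths, scoring each mapping with a power library. In this Python sketch, `choose_mapping` and the flat per-EMB and per-mux-input power figures are hypothetical. A real power library would, for instance, reflect that wider EMB configurations draw slightly less power. The model is crude: it charges every row as a mux input, even when only one row exists.

```python
def choose_mapping(logical_depth, logical_width, emb_power, mux_power_per_input,
                   depths=(128, 256, 512, 1024, 2048, 4096), emb_bits=4096):
    """Return (best_depth, estimated_power) over the candidate EMB depths.

    emb_power: dict mapping EMB depth -> power per active EMB (hypothetical library).
    mux_power_per_input: power per read-mux input (hypothetical library entry).
    """
    best = None
    for d in depths:
        width = emb_bits // d
        columns = -(-logical_width // width)      # EMBs active per access
        rows = -(-logical_depth // d)             # read-mux inputs
        power = columns * emb_power[d] + rows * mux_power_per_input
        if best is None or power < best[1]:
            best = (d, power)
    return best

library = {d: 1.0 for d in (128, 256, 512, 1024, 2048, 4096)}  # made-up flat EMB power
best_depth, best_power = choose_mapping(4096, 32, library, mux_power_per_input=0.5)
```

With these made-up numbers, the 4K × 32 memory is best mapped with 512-deep EMBs (4 active EMBs plus an 8-input mux). Different library values would shift the optimum, which is why the partitioner consults a power library rather than a fixed rule.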

38 Experimental Approach
40 designs were evaluated with Quartus 5.1, each mapped to the smallest possible device and targeted at maximum frequency. The designs were simulated with test vectors, and power was analyzed with PowerPlay.

39 Memory Power
21.0% average memory power reduction for all techniques combined (9.7% with enable convert/combine alone).

40 Overall Core Dynamic Power
6.8% average power reduction for all techniques combined (2.6% with enable convert/combine alone). (Figure: % dynamic power reduction for each of the 40 designs, for enable convert/combine and for enable convert/combine + memory partitioning.)

41 Design Performance
1.0% average performance loss for all techniques combined (0.1% for enable convert/combine alone). (Figure: % clock frequency improvement for each design, for enable convert/combine and for enable convert/combine + memory partitioning.)

42 Results Summary
Almost 7% core dynamic power reduction across all designs; some designs benefit more than others. Most designs see only a minimal clock frequency hit.

Change                  Enable convert   Enable convert/combine   Enable convert/combine + Mem partition
Core dynamic power      -1.8%            -2.6%                    -6.8%
Memory dynamic power    -6.3%            -9.7%                    -21.0%
Max clk freq            -0.1%            -0.2%                    -1.0%
LUT count                0.0%             0.1%                     0.7%

43 Impact of Multiple Embedded Memory Blocks
The 40 designs were rerun, but only one type of target EMB was allowed for each mapping. All designs were targeted to the Stratix II EP2S180. Compared with the unrestricted EP2S180 target, this restriction has a significant power impact for most designs:

Change vs. unrestricted   M512 only   M4K only   M-RAM only
Designs completed         23          38         4
Core dynamic power        40.4%       6.6%       47.3%
Memory power              279.5%      33.3%      754.0%
Max clk freq.             -2.2%       0.6%       -1.0%
LUT count                 0.4%        -0.5%      0.0%

44 Summary
The key to reducing RAM power is keeping clocks disabled. Moving read/write enables to clock enables limits dynamic activity. The power-aware RAM partitioner attempts to select the power-optimal mapping, combined with the clock-enable enhancement. Overall, the techniques achieve about 21% average memory power reduction (10% with enable convert/combine alone) and about 7% average dynamic power reduction (3% with enable convert/combine alone). Diversity of EMBs reduces power by 33%.

45 Summary
FPGA power consumption is under consideration at numerous levels: architecture, circuit, CAD, and physical. FPGA companies are just now embracing power-aware CAD, with power-aware architectures on the way. Many circuit-level techniques are still possible, and RTL CAD synthesis techniques provide a promising area for exploration.

