Objectives
After completing this module, you will be able to:
– Explain how power depends on the HDL coding style you use
– Describe how your design's power consumption depends on your use of control signals
– Explain how some common design techniques can improve your design's power consumption
– Show how some common design techniques can improve your design's power consumption

Look at Enable Rates on BRAM
Enable rate is the dominant factor in BRAM power consumption; toggle activity is secondary
– Only enable BRAM during an active read or write cycle
Use smart architecting for multiple BRAM blocks
Consider LUTRAM (distributed memory) for small memory blocks

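As a minimal sketch of the "only enable BRAM during an active read or write cycle" guideline, the module below gates every memory access with an enable derived from the request signals. The module name, the port names (rd_req, wr_req) and the 2k x 36 size are assumptions chosen for illustration; whether a given synthesis tool maps this to a block RAM with its enable pin connected depends on the tool and its settings.

```verilog
// Hypothetical single-port memory; names and sizes are illustrative only.
module ram_sp_en #(
    parameter ADDR_WIDTH = 11,   // 2k entries (assumed)
    parameter DATA_WIDTH = 36
) (
    input  wire                  clk,
    input  wire                  rd_req,   // assumed high only for a real read
    input  wire                  wr_req,   // assumed high only for a real write
    input  wire [ADDR_WIDTH-1:0] addr,
    input  wire [DATA_WIDTH-1:0] din,
    output reg  [DATA_WIDTH-1:0] dout
);
    reg [DATA_WIDTH-1:0] mem [0:(1<<ADDR_WIDTH)-1];

    // The memory is enabled only while an access is actually requested.
    wire en = rd_req | wr_req;

    always @(posedge clk) begin
        if (en) begin
            if (wr_req)
                mem[addr] <= din;
            dout <= mem[addr];   // read-first; dout holds its value while en is low
        end
    end
endmodule
```

When neither request is active, nothing inside the clocked block executes, so the memory sees no read or write on that cycle.
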
BRAM Power Optimization
Xilinx Block Memory Generator tool
– Allows selection between high performance and low power
[Figure: two conceptual 2k x 36 arrays. Minimum Area: 2k x 36 array. Low power: 2k x 36 array with one BRAM enabled at a time.]

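The "one BRAM enabled at a time" idea can be sketched as a small enable decoder: the upper address bits select a bank, and only that bank's enable is asserted during an access. This is an illustrative sketch, not the Block Memory Generator's actual implementation; the module name, port names and the four-bank split are assumptions.

```verilog
// Hypothetical bank-enable decoder for a memory split depth-wise into banks.
module bank_enable_decode #(
    parameter BANK_BITS = 2                       // 4 banks (assumed)
) (
    input  wire                      access,      // high during an active read or write
    input  wire [BANK_BITS-1:0]      bank_sel,    // upper address bits
    output wire [(1<<BANK_BITS)-1:0] bank_en      // one-hot enable, or all zeros
);
    genvar i;
    generate
        for (i = 0; i < (1<<BANK_BITS); i = i + 1) begin : g_bank
            // A bank is enabled only when it is addressed during an access.
            assign bank_en[i] = access && (bank_sel == i);
        end
    endgenerate
endmodule
```

Each bank_en bit would drive the enable of one memory block, so at most one block is active on any clock cycle.
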
LUTRAM vs. BRAM
Use LUTRAM for small storage for lowest power
– XPE example, 16 sets of 2k bits (BRAM vs. LUTRAM): 85% power savings with LUTRAM
– XPE example, 16 sets of 18k bits (LUTRAM vs. BRAM): 28% power savings with BRAM
– These are sample outputs from the Xilinx Power Estimator (XPE) spreadsheet

Resets
Two kinds of resets:
– Global: usually used to reset after configuration
    This is done by default after configuration of the FPGA and does not need to be coded into the design
    Access to this net is through the GSR port of the Startup component (only necessary if you wish to perform a global reset a second time)
    Note: if you code a global reset into your HDL, you are actually coding a second reset
    Some ASIC technologies require at most an initialization when they power up, but FPGAs do not require a reset
– Local: used as a standard part of some components' behavior (FSMs, counters, etc.)

Global Reset Net (GSR)
The GSR input is an active-high global set/reset net that is active at the end of configuration
– It uses a dedicated routing resource for signal distribution
    Saves general interconnect
– It can also be used to restore the initial state of the FFs in the FPGA at any time
    The initial state is communicated with an INIT attribute
    It drives the output FFs for each block RAM, but does not affect the contents of each memory or SRL
– It is connected to all synchronous elements through a wired OR gate
    This allows a local reset to also drive the FF's set/reset port

Inferring an Initialization (XST only)
If you have a reset, you can initialize all registers in VHDL / Verilog code
– GSR will cause the flip-flop to be set to the state inferred here
– Inference is supported only for data types std_logic, bit_vector, and bit, but NOT integer
– This is helpful for RTL simulation of the design
    If it functions during simulation, it should function on the FPGA
– Note: if you design without a reset, you still get a free global reset
VHDL: signal my_register : std_logic_vector(7 downto 0) := (others => '0');
Verilog: reg [7:0] my_register = 8'h00;

No Reset is Best
Synthesis can infer SRL-based shift registers
– But only if no reset is inferred on the component (otherwise flip-flops are wasted)
– Or, the synthesis tool can emulate the reset
    However, this uses extra resources and takes extra clock cycles to set up (not what you want)

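A shift register coded without any reset, relying on an initial value instead, might look like the sketch below. The module name, port names and the depth of 16 are assumptions for illustration; whether a particular tool actually maps this to an SRL primitive depends on the tool version and its settings.

```verilog
// Hypothetical resetless shift register (DEPTH >= 2 assumed).
module shift_reg_srl #(
    parameter DEPTH = 16
) (
    input  wire clk,
    input  wire ce,      // clock enable
    input  wire din,
    output wire dout
);
    // The initial value stands in for a reset; there is no set/reset term below.
    reg [DEPTH-1:0] sr = {DEPTH{1'b0}};

    always @(posedge clk)
        if (ce)
            sr <= {sr[DEPTH-2:0], din};

    assign dout = sr[DEPTH-1];
endmodule
```

A bit presented on din appears on dout after DEPTH enabled clock edges. Adding a reset branch to the always block would force each stage to be individually resettable, which is what prevents SRL inference.
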
Use a Synchronous Reset
Each DSP slice effectively has more than 250 registers
– None have an asynchronous reset
Many designs that run out of slices are not fully utilizing their DSP slice resources
– Synthesis tools will infer the DSP slice resources for multipliers, but they are not smart enough to infer other functions
    Can control synthesis use with attributes, but NOT if an asynchronous reset is used
Block RAMs obtain minimum clock-to-output time by using their output register
– Output registers only have synchronous resets
Many designs that run out of slices are not fully utilizing the block RAM resources
– Synthesis tools are not yet smart enough to infer less obvious functions

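The synchronous-reset style can be sketched with a registered multiplier, where the reset is sampled only on the clock edge so that the registers remain candidates for the DSP slice's internal registers. The module name, port names and the 18 x 18 operand width are assumptions for illustration; actual register packing depends on the synthesis tool and target device.

```verilog
// Hypothetical pipelined multiplier with a synchronous, active-high reset.
module mult_sync_rst (
    input  wire               clk,
    input  wire               rst,   // synchronous: not in the sensitivity list
    input  wire signed [17:0] a,
    input  wire signed [17:0] b,
    output reg  signed [35:0] p
);
    reg signed [17:0] a_r, b_r;      // input registers

    always @(posedge clk) begin
        if (rst) begin
            a_r <= 18'sd0;
            b_r <= 18'sd0;
            p   <= 36'sd0;
        end else begin
            a_r <= a;
            b_r <= b;
            p   <= a_r * b_r;        // product register, two-cycle latency overall
        end
    end
endmodule
```

An asynchronous version would add rst to the sensitivity list (always @(posedge clk or posedge rst)), and that is the form the slide warns against.
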
Clock Enable
Control the use of clock enables from the code
– Code them only when needed
– If a low-fanout CE is necessary, use synthesis attributes to control the use of control signals at the signal or module level
    Do not use global switches to turn off the use of CEs
    Results in an average 25-percent LUT increase
– Consider using alternative coding methods for low-fanout clock enables
Standard coding (this will map the CE to the control port):
    VHDL: if (CE = '1') then Q <= A; end if;
    Verilog: if (CE == 1'b1) Q <= A;
Alternative coding (this will map the CE to a LUT input):
    VHDL: Q <= (CE and A) or ((not CE) and Q);
    Verilog: Q <= (CE & A) | (~CE & Q);

Summary
Code properly to minimize power
– Only enable BRAM during active read or write cycles
– Use low-power architecting for multiple block RAM arrays (use CORE Generator)
– Build small memory blocks (< 4k bits) with LUTs
– Minimize local resets, if possible
– No reset is best, since the FPGA gets a global reset automatically
    Allows inference of SRL
– Design with synchronous resets
    Enables inference of DSP slice and block RAM output registers
– Do not build a reset into your design for simulation purposes
    Instead, code for the INIT behavior
– Control the use of clock enables
