French 207 MAPLD 2005 Slide 1
Integrated Tool Suite for Post-Synthesis FPGA Power Consumption Analysis
Matthew French, Li Wang, University of Southern California, Information Sciences Institute
Tyler Anderson, Michael Wirthlin, Brigham Young University
Slide 2: FPGA Power Trends & Needs
Figure: Internal power consumption; axes: number of flip-flops, power (mW), clock frequency (MHz), voltage (V). Power calculated assuming 80% device utilization, 80% of peak clock frequency, and a 12.5% toggle rate; internal logic only, no I/O.
– The number of logic blocks and the maximum operating frequency track Moore's Law
– Voltage reduction is slower
– The resulting power increase is exponential
Power needs to be a first-class design constraint.
Limited power tools are available:
– Spreadsheets
   – Manual entry
   – Prone to guess-timation
– XPower (post-routing)
   – At the end of the design cycle
   – Profiled after timing simulation
   – Time-intensive, with unwieldy file sizes
   – Limited reporting: only total power consumed, no ability to capture power transients
Limited design path if specifications are not met: routing tools optimize only throughput.
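The slide's assumptions (utilization, fraction of peak clock frequency, toggle rate) plug into the usual CMOS dynamic-power relation P = 0.5 · C · V² · f · α. The sketch below is a minimal illustration, assuming a single hypothetical lumped capacitance per flip-flop (`cap_per_ff_pf`, standing in for the logic and routing each flip-flop drives). It is not the model used to produce the chart.

```python
def estimate_internal_power_mw(num_ffs, cap_per_ff_pf, vdd, f_peak_mhz,
                               utilization=0.8, clock_fraction=0.8,
                               toggle_rate=0.125):
    """Rough internal dynamic power in mW: P = 0.5 * C * V^2 * f * alpha.

    cap_per_ff_pf is a hypothetical lumped capacitance per flip-flop;
    the default fractions mirror the slide's stated assumptions.
    """
    active_ffs = num_ffs * utilization
    f_mhz = f_peak_mhz * clock_fraction
    # pF * V^2 * MHz = 1e-6 W = 1e-3 mW, hence the 1e-3 factor.
    return 0.5 * active_ffs * cap_per_ff_pf * vdd ** 2 * f_mhz * toggle_rate * 1e-3


if __name__ == "__main__":
    # Hypothetical device: 10,000 FFs, 10 pF each, 1.5 V core, 200 MHz peak.
    print(estimate_internal_power_mw(10_000, 10.0, 1.5, 200.0))
```

Because power scales with V² but only linearly with flip-flop count and frequency, a supply voltage that falls more slowly than the other two grow leaves total power climbing.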
Slide 3: Power Tools: Goals
Push power analysis, visualization, and optimization to the front of the tool chain:
– Analyze power consumption at logic simulation with two levels of accuracy:
   – Pre-place-and-route, using heuristic estimates based on fanout
   – Back-annotated with precise post-place-and-route RC data
– Visualize: provide intuitive views that help the designer rapidly find and correct inefficient circuits, operating modes, data patterns, etc.
– Optimize: automatically identify problem paths and suggest improvements
Benefits:
– Closer to the logical level and design entry
– Power profiling during functional simulation
– Early estimation before place and route
– Automatic power details for specific resource utilization
– Facilitates exploration of high-level design alternatives
Figure: FPGA tool flow, marking the proposed power tool entry point and the current power tool entry point.
Slide 4: Tool Backbone: JHDL & EDIF Parser
Leverage the JHDL simulation environment, with the EDIF Parser for circuit manipulation.
JHDL:
– Java-based structural design tool for FPGAs
– Circuits described by creating Java classes
– Design libraries provided for several FPGA families
– JHDL design aids:
   – Logic simulator & waveform viewer
   – Circuit schematic & hierarchy browser
   – Module generators
– The circuit designer does not need to know Java!
EDIF Parser:
– Supports multiple EDIF files
– Virtex2 libraries and memory initialization
– Support for "black boxes"
– No JHDL wrapper required
– Verified with: Synplicity, Synplicity Pro, Coregen, System Generator, Chipscope
Figure: data flow among 3rd-party tools, the EDIF netlist, the EDIF Parser, the EDIF data structure and its manipulation tools, the JHDL data structure, and the JHDL environment.
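EDIF netlists are S-expressions, so circuit manipulation of the kind the EDIF Parser enables starts with reading that nested structure. The sketch below is a minimal, hypothetical Python illustration, not the BYU EDIF Parser (which is Java) or its API. It tokenizes a netlist, builds nested lists, and counts instances per referenced library cell, which is the component list a post-synthesis power model needs. It ignores `rename` constructs, properties, and hierarchy flattening.

```python
import re
from collections import Counter

# Tokens: parentheses, double-quoted strings, or bare atoms.
TOKEN_RE = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def parse_sexpr(text):
    """Parse EDIF's S-expression syntax into nested Python lists."""
    stack = [[]]
    for tok in TOKEN_RE.findall(text):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


def _head(node):
    """Lower-cased keyword of a list node (EDIF keywords are case-insensitive)."""
    if isinstance(node, list) and node and isinstance(node[0], str):
        return node[0].lower()
    return None


def find_cell_ref(node):
    """Return the first cellRef name found anywhere inside node, or None."""
    if _head(node) == "cellref":
        return node[1]
    if isinstance(node, list):
        for child in node:
            found = find_cell_ref(child)
            if found is not None:
                return found
    return None


def count_primitives(tree):
    """Count instances by the name of the cell they reference."""
    counts = Counter()

    def walk(node):
        if not isinstance(node, list):
            return
        if _head(node) == "instance":
            ref = find_cell_ref(node)
            if ref is not None:
                counts[ref] += 1
            return
        for child in node:
            walk(child)

    walk(tree)
    return counts


SAMPLE = """
(edif top
  (library work
    (cell top (cellType GENERIC)
      (view netlist (viewType NETLIST)
        (contents
          (instance q0 (viewRef netlist (cellRef FDRE (libraryRef UNISIMS))))
          (instance q1 (viewRef netlist (cellRef FDRE (libraryRef UNISIMS))))
          (instance add0 (viewRef netlist (cellRef LUT4 (libraryRef UNISIMS)))))))))
"""

if __name__ == "__main__":
    print(count_primitives(parse_sexpr(SAMPLE)))
```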
Slide 5: Power Tool Flow: Timing-Level
Figure: Xilinx tool flow (source code in VHDL, Verilog, or JHDL; synthesis; map; place & route; bitgen; to target), with XPower producing a .pwr file from the .ncd, alongside the power tools (EDIF Parser, JHDL, power analysis & visualization, routed circuit model) fed by the EDIF netlist.
Event model restructured:
– Tool interoperability
– Cross-probing enabled
Support for dynamic insertion of 3rd-party (power) tools:
– Circuit APIs in place
– Graphical user interface (GUI) support
Slide 6: Power Visualization Tool
Two views:
– Instantaneous vs. cumulative power consumption over time
– Sorted tree view of "worst offenders"
Integrated "cross-probing" with existing JHDL tools:
– Unified environment
– Allows experimentation
– Smart re-use of CPU memory
Helps rapidly identify inefficient circuits and operating modes
Per-cell / per-bit granularity
Simulation trigger on power specification
Figure: cross-probing between views.
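The two views and the power-specification trigger reduce to simple operations on a per-cycle power trace. The sketch below assumes a hypothetical trace format, a dict mapping each cell name to its power (mW) in every simulated cycle; it is not the JHDL tool's data model. It computes the instantaneous total, cumulative energy, a sorted worst-offender list, and the first cycle that violates a power spec.

```python
from itertools import accumulate


def instantaneous_power(trace):
    """Per-cycle total power (mW) from a {cell: [mW per cycle]} trace."""
    cycles = len(next(iter(trace.values())))
    return [sum(samples[i] for samples in trace.values()) for i in range(cycles)]


def cumulative_energy_pj(totals_mw, period_ns):
    """Running energy consumed so far; mW * ns = pJ."""
    return list(accumulate(p * period_ns for p in totals_mw))


def worst_offenders(trace, top=3):
    """Cells sorted by summed power, largest first."""
    totals = ((cell, sum(samples)) for cell, samples in trace.items())
    return sorted(totals, key=lambda item: item[1], reverse=True)[:top]


def first_violation(totals_mw, spec_mw):
    """Index of the first cycle whose power exceeds spec_mw, else None."""
    for cycle, power in enumerate(totals_mw):
        if power > spec_mw:
            return cycle
    return None


if __name__ == "__main__":
    trace = {"mult0": [5.0, 6.0, 9.0], "ctr": [1.0, 1.0, 1.0], "fifo": [2.0, 0.5, 3.0]}
    totals = instantaneous_power(trace)           # [8.0, 7.5, 13.0]
    print(cumulative_energy_pj(totals, 20.0))     # [160.0, 310.0, 570.0]
    print(worst_offenders(trace))                 # mult0, fifo, ctr
    print(first_violation(totals, spec_mw=10.0))  # 2
```

In a simulator, `first_violation` would be checked each cycle so that simulation can halt at the offending cycle, and the per-cell sums give the worst-offender tree view per-cell or per-bit granularity.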
Slide 7: Post-Synthesis Level Power Modeling
Power modeling:
– Quiescent power based on total circuit size
– Dynamic power:
   – Toggle rates (data-dependent)
   – Components used
   – Routing interconnect
– Actual quiescent and dynamic power are not known until the circuit is placed and routed
Leverage the existing JHDL tool environment:
– Toggle rates derived from the simulator (glitching information will be lost)
– Components known from EDIF or JHDL primitives; component capacitance imported from XPower
– How to model routing interconnect?
   – Exact routing information is not available at synthesis
   – Routing tools can pick a different route each iteration, so interconnect length and combinations vary

XPower component capacitance
Component   Cap (pF)      Component   Cap (pF)
FF          1.21          LUT         1.0
SRL         3.0           LD          1.0
INV         1.0           AND         1.0
RAM         1.0           MULT        17.2
DLL         40.0          IBUF        1.0
BUFG        6.0           BRAM        59.0

XPower interconnect capacitance
Interconnect     Cap (pF)
Long Line        11.8
Hex Line         0.59
Double Line      0.44
Direct Connect   0.29
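Combining the XPower component capacitances above with simulator toggle counts gives a post-synthesis estimate of logic power. The sketch below is a minimal illustration, assuming each listed capacitance is charged or discharged once per toggle and using P = 0.5 · C · V² · f · α. The netlist tuples, the 1.5 V core voltage, and the clock rate in the example are hypothetical inputs. Quiescent power and interconnect are left out; interconnect needs a separate wire model.

```python
# Per-component capacitances from the slide's XPower table (pF).
COMPONENT_CAP_PF = {
    "FF": 1.21, "LUT": 1.0, "SRL": 3.0, "LD": 1.0, "INV": 1.0, "AND": 1.0,
    "RAM": 1.0, "MULT": 17.2, "DLL": 40.0, "IBUF": 1.0, "BUFG": 6.0, "BRAM": 59.0,
}


def component_power_mw(kind, toggles, cycles, vdd, f_mhz):
    """Dynamic power of one component, 0.5 * C * V^2 * f * alpha, in mW.

    alpha = toggles per clock cycle as counted by a logic simulator,
    so glitches are not seen.
    """
    alpha = toggles / cycles
    # pF * V^2 * MHz = 1e-6 W = 1e-3 mW
    return 0.5 * COMPONENT_CAP_PF[kind] * vdd ** 2 * f_mhz * alpha * 1e-3


def logic_power_mw(instances, cycles, vdd, f_mhz):
    """Sum component power over (name, kind, toggle_count) tuples."""
    return sum(component_power_mw(kind, toggles, cycles, vdd, f_mhz)
               for _name, kind, toggles in instances)


if __name__ == "__main__":
    # Hypothetical simulation: 1,000 cycles at 100 MHz, 1.5 V core.
    netlist = [("q0", "FF", 250), ("m0", "MULT", 100)]
    print(logic_power_mw(netlist, cycles=1000, vdd=1.5, f_mhz=100.0))
```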
Slide 8: Wire Power Model Analysis
Developed power tools to analyze these relationships. Capacitance can be plotted vs.:
– Fanout
– Programmable interconnect points
– Wire length
– Total number of nets
– Total number of components
Which relationships maintain correlation from synthesis to place and route?
– The optimizer removes components and nets
The tools can also be used to judge routing quality:
– Identify outliers
– The information is available to do power-weighted placement and routing:
   – Use placement macros in JHDL
   – Use UCF placement and/or timing constraints
Figure: capacitance plot with outliers marked as optimization candidates.
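One way to ask which relationships "maintain correlation" is to compute, for each candidate metric, the Pearson correlation between its synthesis-time value and the post-route net capacitance, then rank metrics by strength. The sketch below assumes hypothetical per-net records with keys `fanout`, `pips`, `length`, and `cap_pf`. Pearson correlation is one reasonable choice here, not necessarily the statistic the authors used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant data")
    return cov / sqrt(var_x * var_y)


def rank_metrics(nets, cap_key="cap_pf", metrics=("fanout", "pips", "length")):
    """Rank candidate metrics by |r| against post-route capacitance."""
    scores = {m: pearson([n[m] for n in nets], [n[cap_key] for n in nets])
              for m in metrics}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical per-net measurements.
NETS = [
    {"fanout": 1, "pips": 3, "length": 5, "cap_pf": 0.5},
    {"fanout": 2, "pips": 1, "length": 2, "cap_pf": 1.0},
    {"fanout": 3, "pips": 4, "length": 6, "cap_pf": 1.5},
    {"fanout": 4, "pips": 2, "length": 3, "cap_pf": 2.0},
]

if __name__ == "__main__":
    for metric, r in rank_metrics(NETS):
        print(f"{metric}: r = {r:+.3f}")
```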
Slide 9: Low-Fanout Capacitance Variance
Not all routes are created equal:
– Up to 60% variance on routes of the "same" length
– East-west vs. north-south bias
– Switches sometimes use doubles instead of direct connects
Measured routes:
2.45 pF (#2727) YQ -> F2 (omux-B3)
2.37 pF (#4791) YQ -> G4 (omux-B4)
1.46 pF (#2768) YQ -> F4 (omux-A2)
0.75 pF (#131) YQ -> F2 (omux-A7)
Figure: direct connect vs. double wire routing through the switch logic.
Slide 10: Capacitance vs. Fanout
– The fanout model is well correlated
– A secondary fit line corresponds to placement macros
– High variance at low fanout
– Achieving 4.3% average error, 16% variance
– Device utilization models were explored as well
Figure: capacitance vs. fanout, with the placement-macro fit line labeled.
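A fanout-based wire model can be sketched as an ordinary least-squares line, capacitance ≈ slope · fanout + intercept, fitted to post-route nets and then applied at synthesis time. The slide does not define its "average error" metric, so the sketch below assumes mean absolute percentage error. The data in the example is hypothetical. A separate fit over macro-placed nets would correspond to the secondary line.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_pct_error(xs, ys, slope, intercept):
    """Average |predicted - actual| / actual, in percent."""
    errors = [abs(slope * x + intercept - y) / y for x, y in zip(xs, ys)]
    return 100.0 * sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical post-route data: fanout vs. measured net capacitance (pF).
    fanout = [1, 1, 2, 3, 5, 8]
    cap_pf = [0.9, 1.6, 1.4, 2.1, 3.0, 4.4]
    slope, intercept = fit_line(fanout, cap_pf)
    print(f"cap ~ {slope:.3f} * fanout + {intercept:.3f} pF")
    print(f"mean error {mean_abs_pct_error(fanout, cap_pf, slope, intercept):.1f}%")
```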
Slide 11: Resulting Power Tool Flow
Figure: the timing-level flow of Slide 5 with a Virtex II power model added to the power tools (EDIF Parser, JHDL, power analysis & visualization, Virtex II power model, routed circuit model), alongside the Xilinx tool flow (source code in VHDL, Verilog, or JHDL; synthesis; map; place & route; XPower; bitgen; to target; .ncd and .pwr files).
Slide 12: Power Optimization Approach
Influence the Xilinx place & route tools toward power efficiency:
– Minimize clock/wire lengths of high-power nets
Use the power analysis tools to identify hot spots and generate constraints:
– Timing constraints on non-clock signals
– Location constraints on the sink flip-flops of clock signals
Verify the power optimization approaches:
– Use the final circuit timing model to verify power savings
Figure: optimization and verification flow. The EDIF Parser and power tools emit timing constraints (ns) and placement constraints (X,Y) in a .ucf file used by ngdbuild & map and place & route (.ncd, bitgen); XPower, driven by a .vcd from ModelSim simulation of a .vhd model, verifies the result.
Slide 13: Timing Constraint Power Optimization
Wire power is optimized by reducing wire length:
– A MAXDELAY constraint in the UCF file sets the maximum delay allowed on a net
The power tools contain a wire table database:
– Sortable by average power, toggle rate, fanout, and load
– Constraints are applied from the table
Default constraints (constraint frequency 50 MHz, operating frequency 50 MHz): poor power efficiency
Power timing constraints (constraint frequency 100 MHz, operating frequency 50 MHz): better power efficiency
Figure: wire table.
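Turning wire-table rows into UCF constraints can be sketched as: sort by a chosen column, take the top rows, and emit a net-level `MAXDELAY` for each, using the UCF form `NET "name" MAXDELAY = value ns;`. The slide does not say how the per-net bound is derived from the constraint frequency; the sketch below assumes it is simply that frequency's clock period. The row fields and net names are hypothetical.

```python
def maxdelay_ns(constraint_freq_mhz):
    """Clock period (ns) at the constraint frequency."""
    return 1000.0 / constraint_freq_mhz


def ucf_maxdelay_lines(wire_table, top_n, constraint_freq_mhz, sort_key="avg_power_mw"):
    """Emit UCF MAXDELAY constraints for the top_n nets ranked by sort_key."""
    ranked = sorted(wire_table, key=lambda w: w[sort_key], reverse=True)[:top_n]
    delay = maxdelay_ns(constraint_freq_mhz)
    return [f'NET "{w["name"]}" MAXDELAY = {delay:g} ns;' for w in ranked]


# Hypothetical wire-table rows.
WIRE_TABLE = [
    {"name": "a", "avg_power_mw": 0.8, "toggle_rate": 0.30, "fanout": 3},
    {"name": "b", "avg_power_mw": 1.2, "toggle_rate": 0.45, "fanout": 2},
    {"name": "c", "avg_power_mw": 0.1, "toggle_rate": 0.05, "fanout": 9},
]

if __name__ == "__main__":
    # Constrain at 100 MHz (10 ns) while the design runs at 50 MHz.
    for line in ucf_maxdelay_lines(WIRE_TABLE, top_n=2, constraint_freq_mhz=100.0):
        print(line)
```

Over-constraining in this way pushes the router toward shorter, lower-capacitance routes on the selected nets, while the design still only has to meet its 50 MHz operating target.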
Slide 14: Timing Constraint Power Optimization: Preliminary Results
– Total power change ranges from -1.4% (a slight increase) to an 11.8% reduction
– More constraints are not necessarily better
– The amount of timing by which nets are constrained can also be varied
– Circuits still meet the original timing specification requirements

Timing-constraint results (total power = clock + signal; percentages are reductions relative to the baseline)
Constraint set              % of total nets constrained   Total power reduction
Baseline, no constraints    N/A                           N/A
All nets constrained        12.5%                         -1.4%
Fanout < 10 constrained     11.1%                         9.6%
Fanout < 4 constrained      10.6%                         8.4%
Top 25% constrained         4.1%                          11.8%
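The four constraint sets in the table can be expressed as selection rules over the wire table. The sketch below is hypothetical in its field names and its reading of "top 25%" as the highest-average-power quarter of the candidate nets. The slide also does not define how "all nets" relates to its 12.5% figure. Here the candidates are assumed to be a subset of the design's nets, and the percentage is taken against the design's total net count.

```python
import math


def select_nets(candidates, strategy):
    """Pick candidate nets to constrain, one rule per table row."""
    if strategy == "all":
        return list(candidates)
    if strategy == "fanout<10":
        return [n for n in candidates if n["fanout"] < 10]
    if strategy == "fanout<4":
        return [n for n in candidates if n["fanout"] < 4]
    if strategy == "top25":
        k = math.ceil(0.25 * len(candidates))
        return sorted(candidates, key=lambda n: n["avg_power_mw"], reverse=True)[:k]
    raise ValueError(f"unknown strategy: {strategy}")


def pct_of_total(selected, total_nets):
    """Share of all design nets that received a constraint, in percent."""
    return 100.0 * len(selected) / total_nets


# Hypothetical candidates: 8 nets out of a 64-net design.
FANOUTS = [1, 2, 3, 5, 8, 12, 20, 3]
CANDIDATES = [{"name": f"n{i}", "fanout": f, "avg_power_mw": 0.1 * (i + 1)}
              for i, f in enumerate(FANOUTS)]

if __name__ == "__main__":
    for strategy in ("all", "fanout<10", "fanout<4", "top25"):
        chosen = select_nets(CANDIDATES, strategy)
        print(strategy, [n["name"] for n in chosen], pct_of_total(chosen, total_nets=64))
```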
Slide 15: Location Constraint Power Optimization
Power optimization guidelines:
– Minimize clock zone utilization
– Group flip-flops as tightly as possible
– Group flip-flops closer to the clock trunks
Constraining flip-flop locations shortens clock paths, reducing clock capacitance and power.
Figure: less power-efficient vs. more power-efficient flip-flop placement.
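A quick way to compare placements against these guidelines is to measure how spread out a clock net's sink flip-flops are. The sketch below uses half-perimeter wirelength (HPWL), a standard placement proxy for routed length, plus the mean horizontal distance to a clock trunk column. Both are assumptions. The Virtex-II clock network is a dedicated global structure rather than a general route, so these numbers only rank placements roughly. Coordinates and `trunk_x` are hypothetical slice-grid positions.

```python
def hpwl(points):
    """Half-perimeter of the bounding box around (x, y) sink locations."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def mean_distance_to_trunk(points, trunk_x):
    """Average horizontal distance from each sink to a vertical clock trunk."""
    return sum(abs(x - trunk_x) for x, _ in points) / len(points)


# Hypothetical flip-flop locations for the same four-sink clock net.
SCATTERED = [(0, 0), (10, 3), (4, 12), (7, 7)]
GROUPED = [(4, 4), (5, 4), (4, 5), (5, 5)]

if __name__ == "__main__":
    for label, pts in (("scattered", SCATTERED), ("grouped", GROUPED)):
        print(label, hpwl(pts), mean_distance_to_trunk(pts, trunk_x=5))
```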
Slide 16: Location Constraint Power Optimization Interface
– The clock table can be sorted by power, number of flip-flops, etc.
– Users can select the locations of flip-flops:
   – How tightly the flip-flops are placed
   – The area where the flip-flops are placed
– The tool checks the validity of constraint areas
– Users can select which flip-flop groups receive the constraints
Figure: clock table.
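The interface's choices (how tightly to pack, where to place, which groups to constrain) can be sketched as sizing a square slice region for a flip-flop group, checking it against the device grid, and emitting UCF `AREA_GROUP` constraints of the form `INST "name" AREA_GROUP = "group";` and `AREA_GROUP "group" RANGE = SLICE_X#Y#:SLICE_X#Y#;`. The sketch assumes two flip-flops per Virtex-II slice and treats `density` as the fraction of those flip-flops used. Its validity check covers grid bounds only; the slide does not say what else the tool verifies. Grid limits and instance names are hypothetical.

```python
import math

FFS_PER_SLICE = 2  # a Virtex-II slice holds two flip-flops


def region_for(n_ffs, x0, y0, density=1.0):
    """Smallest square slice region anchored at (x0, y0) that fits n_ffs."""
    if not 0 < density <= 1:
        raise ValueError("density must be in (0, 1]")
    slices_needed = math.ceil(n_ffs / (FFS_PER_SLICE * density))
    side = math.ceil(math.sqrt(slices_needed))
    return (x0, y0, x0 + side - 1, y0 + side - 1)


def is_valid(region, max_x, max_y):
    """True if the region lies inside a slice grid of SLICE_X0Y0..max."""
    x0, y0, x1, y1 = region
    return 0 <= x0 <= x1 <= max_x and 0 <= y0 <= y1 <= max_y


def area_group_ucf(group, ff_names, region):
    """UCF lines tying the flip-flops to one AREA_GROUP range."""
    x0, y0, x1, y1 = region
    lines = [f'INST "{name}" AREA_GROUP = "{group}";' for name in ff_names]
    lines.append(f'AREA_GROUP "{group}" RANGE = SLICE_X{x0}Y{y0}:SLICE_X{x1}Y{y1};')
    return lines


if __name__ == "__main__":
    region = region_for(30, x0=2, y0=4)  # 15 slices, so a 4 x 4 square
    if is_valid(region, max_x=47, max_y=55):
        ffs = [f"ctr_q{i}" for i in range(30)]
        print("\n".join(area_group_ucf("AG_clk0", ffs, region)))
```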
Slide 17: Location Constraint Power Optimization: Preliminary Results
Location-constraint results (mW; percentages are changes relative to the baseline with no constraints, where positive values are reductions)
Constraint set                      Clock (mW)        Signal (mW)      Logic (mW)      Total: clock + signal + logic (mW)
All FFs placed                      (33.6%)           27.6 (-38.8%)    (10.6%)         (22.9%)
IOs in IOBs, all other FFs placed   356.251 (19.5%)   21.909 (-10%)    285.787 (0%)    663.947 (11.3%)
– Individual clock net improvement ranged from -4% to 57%
– Up to 22.9% total power improvement achieved
– Circuits still meet timing requirements if IO buffer flip-flops are left in IOBs
– Power could be reduced further if IO buffer flip-flops are not constrained to be within IOBs
Figure: unconstrained vs. constrained placement.
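The percentages in this table can be sanity-checked with a little arithmetic, assuming each is (baseline − constrained) / baseline, so that negative values mean power went up. The sketch reads the "IOs in IOBs" row's 356,251, 21,909, 285,787 and 663,947 as mW values with a decimal comma. That is an assumption, supported by the fact that the three components then sum exactly to the total. It back-computes the baselines those percentages imply and checks that they agree with each other. The implied baselines are derived here and were not reported on the slide.

```python
def pct_reduction(baseline, constrained):
    """Percent power saved; negative means power went up."""
    return 100.0 * (baseline - constrained) / baseline


def implied_baseline(constrained, pct):
    """Invert pct_reduction: the baseline that yields `constrained` at `pct`."""
    return constrained / (1.0 - pct / 100.0)


if __name__ == "__main__":
    # "IOs in IOBs" row, read as mW with a decimal comma (assumption).
    clock, signal, logic, total = 356.251, 21.909, 285.787, 663.947
    print(abs(clock + signal + logic - total) < 1e-9)  # components sum to total
    parts = [implied_baseline(clock, 19.5), implied_baseline(signal, -10.0),
             implied_baseline(logic, 0.0)]
    # The component baselines should roughly add up to the total baseline.
    print(sum(parts), implied_baseline(total, 11.3))
    # Both rows should imply roughly the same signal-power baseline.
    print(implied_baseline(signal, -10.0), implied_baseline(27.6, -38.8))
```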
Slide 18: Conclusions
Post-synthesis level power modeling is feasible:
– Some accuracy trade-offs are inevitable
– Quicker power results enable:
   – Determining power specifications early in the design flow
   – Feedback on the power ramifications of design-level circuit choices
   – A tighter feedback loop to the designer, allowing more design iterations
Optimization:
– Preliminary results are encouraging
– The tools do not alter the original circuit functionality and use COTS inputs
– Optimization algorithms & routines are under development
Tools are open source:
This research was made possible by a grant from the NASA Earth-Sun System Technology Office.