System-level power analysis and estimation
September 20, 2006
Chong-Min Kyung
Power Estimation & Analysis
Power calculation needs three models: architecture, component, and activity.
[Figure: clock & power network; lower-level specification; architecture (component allocation); scheduling operations]
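The sketch below shows one minimal way the three models can combine. It assumes the common CMOS dynamic-power form P = α·C·Vdd²·f per component (some texts include a factor of 1/2). The component model supplies the switched capacitance C, the activity model supplies the switching activity α, and the architecture model decides which components are allocated. The class, the function and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """Component model: effective switched capacitance per cycle (farads)."""
    name: str
    switched_cap_f: float


def dynamic_power(architecture, activity, vdd, freq_hz):
    """Sum alpha * C * Vdd^2 * f over the allocated components.

    architecture: list of allocated Component objects (architecture model)
    activity: dict mapping component name -> switching activity alpha
              (activity model)
    """
    total = 0.0
    for comp in architecture:
        alpha = activity.get(comp.name, 0.0)  # inactive components add nothing
        total += alpha * comp.switched_cap_f * vdd ** 2 * freq_hz
    return total


# Hypothetical two-component architecture and activity profile
arch = [Component("cpu", 1.0e-9), Component("sram", 0.5e-9)]
act = {"cpu": 0.2, "sram": 0.1}
print(f"{dynamic_power(arch, act, vdd=1.2, freq_hz=100e6):.4f} W")  # prints 0.0360 W
```

Keeping the three models separate means each one can be refined independently. For example, a better activity model can be swapped in without touching the capacitance figures.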
Estimation vs. Analysis
Analysis
– for a given structure, i.e., a netlist of components
Estimation (= design prediction followed by analysis)
– when the information on the structure of the design is incomplete
– used to explore different design alternatives and find the best one
Example: to estimate the interconnect power, one needs a floorplan prediction that includes the clock and power network.
– In exploring the alternatives, it is often enough that the prediction preserves the relative order of the alternatives found in the actual implementation.
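The point about relative order can be made concrete with a small check. The sketch below tests whether an estimator ranks every pair of alternatives in the same order as the actual implementations. When it does, choosing the best alternative from the estimate alone gives the same answer, even if the absolute values are off. The function name and the power figures are hypothetical.

```python
from itertools import combinations


def preserves_ranking(predicted, actual):
    """Return True if every pair of alternatives is ordered the same way
    by the predicted and the actual power values.

    predicted, actual: dicts mapping alternative name -> power.
    Ties (a zero difference on either side) are not counted as swaps.
    """
    for a, b in combinations(predicted, 2):
        pred_diff = predicted[a] - predicted[b]
        act_diff = actual[a] - actual[b]
        if pred_diff * act_diff < 0:  # opposite signs: this pair is swapped
            return False
    return True


# Hypothetical figures (mW): the estimate is biased low but keeps the order.
predicted = {"A": 40.0, "B": 55.0, "C": 70.0}
actual = {"A": 52.0, "B": 68.0, "C": 91.0}

print(preserves_ranking(predicted, actual))  # True
best = min(predicted, key=predicted.get)     # "A", chosen from the estimate only
```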
System-level power analysis
System-level design process:
– 1) allocation of components
– 2) partitioning the system's tasks onto these components (or subsystems)
– 3) organizing cooperation among the components to which the tasks are bound
System-level design inputs:
– Specification, e.g., a CDFG (control/data flow graph)
– Environmental constraints, e.g., performance, power, cost, form factor, TTM (time to market), number/load of I/Os
– Design space restrictions, e.g., mandated use of certain cores, available chip area, bus structure, etc.
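Steps 1 and 2 can be sketched as an exhaustive search on a toy problem. Two components are allocated (a processor and an accelerator). Every binding of tasks to components is enumerated, and the one with the lowest energy that still meets a performance constraint is kept. The task names, the cost table and the sequential-execution assumption are all hypothetical simplifications. A real flow would also model communication and concurrency (step 3).

```python
from itertools import product

# Hypothetical per-task (time_ms, energy_mJ) on each allocated component.
COST = {
    "fft":    {"cpu": (4.0, 8.0), "acc": (1.0, 1.5)},
    "filter": {"cpu": (2.0, 3.0), "acc": (0.5, 1.0)},
    "ctrl":   {"cpu": (1.0, 1.0), "acc": (3.0, 2.5)},
}


def best_binding(cost, deadline_ms):
    """Enumerate every task-to-component binding and return
    (energy, latency, binding) for the lowest-energy one that meets the
    deadline, or None if no binding does.

    Simplifying assumption: tasks run one after another, so latency is the
    sum of task times; communication and concurrency are ignored.
    """
    tasks = list(cost)
    components = sorted({c for t in tasks for c in cost[t]})
    best = None
    for choice in product(components, repeat=len(tasks)):
        binding = dict(zip(tasks, choice))
        if any(binding[t] not in cost[t] for t in tasks):
            continue  # some task cannot run on the chosen component
        latency = sum(cost[t][binding[t]][0] for t in tasks)
        energy = sum(cost[t][binding[t]][1] for t in tasks)
        if latency <= deadline_ms and (best is None or energy < best[0]):
            best = (energy, latency, binding)
    return best


print(best_binding(COST, deadline_ms=5.0))
```

Exhaustive enumeration grows exponentially with the number of tasks, so it only suits very small examples like this one. Larger problems would need a heuristic.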
Implementation model
Should be used when an execution model is not available; typically built as a spreadsheet.
Usually starts with a platform: a HW platform and a SW platform.
Basically three components:
– COTS (commercial off-the-shelf) components: maybe only a single figure is available from the vendor, e.g., for a processor; otherwise, guess based on experience and know-how.
– Customer-specific module: needs estimation based on the predicted number of gates, activity factor, and technology scaling factor. The power consumption of this module may be insignificant, but using it can replace a power-hungry processor.
– Communication power: data transfer between blocks, clock power, cross-coupling.
What is ignored: software structure, data.
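Below is the spreadsheet approach written as code. It has one row per component: a vendor figure for a COTS part, a gate-count estimate for a custom module, and a guessed communication figure. The custom-module formula (gates × activity × capacitance per gate × scaling × Vdd² × f) is an assumed first-order form. The `tech_scale` parameter and every number are hypothetical placeholders.

```python
def custom_module_power(gates, activity, cap_per_gate_f, vdd, freq_hz, tech_scale=1.0):
    """First-order estimate for a customer-specific module (watts).

    tech_scale: hypothetical factor that scales a per-gate capacitance
    characterized in one technology to the target technology
    (1.0 = same technology).
    """
    return gates * activity * cap_per_gate_f * tech_scale * vdd ** 2 * freq_hz


# Spreadsheet rows in watts; all figures are hypothetical.
rows = {
    "processor (COTS, vendor figure)": 0.250,
    "custom accelerator (estimated)": custom_module_power(
        gates=50_000, activity=0.15, cap_per_gate_f=5e-15,
        vdd=1.2, freq_hz=100e6, tech_scale=0.7),
    "communication (bus + clock, guessed)": 0.040,
}
total = sum(rows.values())

for name, watts in rows.items():
    print(f"{name:40s} {watts * 1e3:8.2f} mW")
print(f"{'total':40s} {total * 1e3:8.2f} mW")
```

This kind of model says nothing about software structure or data dependence. That limitation is the one noted above.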
Execution model
Typically given as a program in C, an HDL, SystemC, or some heterogeneous combination of these.
Allows more detailed power analysis, since the dynamic system behavior is simulated. This requires:
– a component power model,
– the system architecture, and
– the component activation pattern.
For example, a BFM (bus functional model) and activity information for each processor component, such as the issue queue, branch prediction unit, execution unit, cache, and register file, are needed.
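A minimal sketch of how the activation pattern and the component power model combine. Per-unit access counts, as a simulation run might report them, are multiplied by assumed per-access energies and summed. The unit names, energies, counts and run time are all hypothetical, and leakage is ignored.

```python
# Hypothetical per-access energies in nJ (component power model).
ENERGY_PER_ACCESS_NJ = {
    "issue_queue": 0.05,
    "branch_predictor": 0.02,
    "alu": 0.10,
    "dcache": 0.40,
    "register_file": 0.08,
}


def energy_from_trace(access_counts, energy_per_access=ENERGY_PER_ACCESS_NJ):
    """Combine activation counts from a simulation run (the activation
    pattern) with per-access energies.

    Returns (total_nJ, per-unit breakdown in nJ).
    """
    breakdown = {unit: n * energy_per_access[unit] for unit, n in access_counts.items()}
    return sum(breakdown.values()), breakdown


# Hypothetical counts, as an instrumented simulator might report them.
counts = {"issue_queue": 1000, "branch_predictor": 200, "alu": 800,
          "dcache": 300, "register_file": 1500}

total_nj, per_unit = energy_from_trace(counts)
run_time_s = 10e-6                           # assumed duration of the simulated run
avg_power_w = total_nj * 1e-9 / run_time_s
```

The per-unit breakdown shows which component dominates; here it is the cache and register file. Raw totals alone would not reveal that.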
Memory model
DTSE (data transfer and storage exploration) work by Catthoor
– Assumes that memory is the dominant power-consuming part in signal processing applications
– Hence, memory optimization for power should be done first
– Objectives: increase data locality, suppress memory accesses, optimize the memory hierarchy
– Steps: global loop and control flow transformations, data reuse analysis, storage cycle budget distribution, memory allocation and assignment, in-place optimization
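The effect of data reuse on memory accesses can be illustrated with a sliding-window filter. Without reuse, every output re-reads its whole window from the large background memory. With a small foreground buffer, each input sample is fetched only once. This is a generic illustration of the data-reuse idea, not the DTSE methodology itself. `CountingMemory` and `fir` are hypothetical names.

```python
from collections import deque


class CountingMemory:
    """Wraps a list and counts reads; stands in for a large background memory."""

    def __init__(self, data):
        self.data = data
        self.reads = 0

    def read(self, addr):
        self.reads += 1
        return self.data[addr]


def fir(mem, n, coeffs, reuse):
    """Sliding-window filter over mem[0..n-1]; one output per full window."""
    taps = len(coeffs)
    out = []
    buf = deque(maxlen=taps)  # small foreground buffer, used only when reuse=True
    for i in range(taps - 1, n):
        if reuse:
            if not buf:                            # prime the buffer once
                for j in range(i - taps + 1, i + 1):
                    buf.append(mem.read(j))
            else:
                buf.append(mem.read(i))            # oldest sample drops out automatically
            window = list(buf)
        else:
            window = [mem.read(j) for j in range(i - taps + 1, i + 1)]
        out.append(sum(c * x for c, x in zip(coeffs, window)))
    return out


data = list(range(1000))
coeffs = [1] * 16
m_plain, m_reuse = CountingMemory(data), CountingMemory(data)
y_plain = fir(m_plain, len(data), coeffs, reuse=False)
y_reuse = fir(m_reuse, len(data), coeffs, reuse=True)
print(m_plain.reads, m_reuse.reads)  # 15760 1000
```

Both variants produce identical outputs. Only the number of background-memory accesses differs.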
Memory model
Memory chip: a power model is available in the data sheet.
Compiled memory core:
– The power model should be parameterized, at least in terms of size. This requires a simulation model, but because of the flat hierarchy, simulating a memory takes too long.
– Therefore, an abstract model is needed. A capacitance model is difficult to obtain, as it reveals critical information of the memory vendor; a functional model that does not disclose any internal cell structure is acceptable.
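A functional model of this kind can be sketched as a black box. It exposes only energy per access as a function of size, fitted to a few characterization points instead of full simulation. The form E(words) = a + b·√words is an assumption, loosely motivated by bit-line length growing with the number of rows of a roughly square array. The characterization data are hypothetical.

```python
import math


class MemoryPowerModel:
    """Black-box power model of a compiled memory: energy per access as a
    function of size only, with no internal cell structure exposed.

    Assumed form: E(words) = a + b * sqrt(words), fitted by ordinary least
    squares to characterization points (hypothetical data below).
    """

    def __init__(self, words, energy_pj):
        xs = [math.sqrt(w) for w in words]
        n = len(xs)
        mx, my = sum(xs) / n, sum(energy_pj) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, energy_pj))
        self.b = sxy / sxx              # least-squares slope
        self.a = my - self.b * mx       # least-squares intercept

    def energy_per_access_pj(self, words):
        return self.a + self.b * math.sqrt(words)


# Hypothetical characterization runs: (size in words, energy per access in pJ)
model = MemoryPowerModel([256, 1024, 4096], [10.0, 18.0, 34.0])
e_2k = model.energy_per_access_pj(2048)  # interpolated for an uncharacterized size
```

The vendor would only need to publish the fitted coefficients. The cell structure and capacitances stay hidden.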
Other things to include in the execution model
Interconnect power model
– Input: physical layout and material properties
– Built from measurement and simulation
– However, on-chip interconnect is difficult to model, especially when complex bus encoding is used.
Models for power management policy
– Hardware for DPM (dynamic power management)
– Software
– RTOS
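As an example of a power-management policy model, the sketch below evaluates a fixed-timeout DPM policy over a trace of idle periods. The device stays idle for `timeout_s`, then sleeps until the next request and pays a wake-up energy. Transition latency is ignored. The function name, the power figures and the idle trace are hypothetical.

```python
def idle_energy(idle_periods_s, timeout_s, p_idle_w, p_sleep_w, e_wakeup_j):
    """Energy (J) spent during idle periods under a fixed-timeout DPM policy.

    The device stays in idle mode for timeout_s, then sleeps until the next
    request, paying e_wakeup_j to wake up. Transition latency is ignored.
    """
    energy = 0.0
    for t in idle_periods_s:
        if t <= timeout_s:
            energy += p_idle_w * t                        # never went to sleep
        else:
            energy += (p_idle_w * timeout_s
                       + p_sleep_w * (t - timeout_s)
                       + e_wakeup_j)
    return energy


# Hypothetical device and idle trace
P_IDLE, P_SLEEP, E_WAKE = 0.5, 0.05, 0.9
periods = [0.5, 1.0, 10.0, 30.0]
break_even_s = E_WAKE / (P_IDLE - P_SLEEP)  # sleeping pays off only for longer idle periods

never = idle_energy(periods, float("inf"), P_IDLE, P_SLEEP, E_WAKE)
eager = idle_energy(periods, 0.0, P_IDLE, P_SLEEP, E_WAKE)
at_break_even = idle_energy(periods, break_even_s, P_IDLE, P_SLEEP, E_WAKE)
```

Which timeout is best depends on the idle-period distribution. This is why the policy has to be simulated together with the workload rather than fixed in isolation.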
Algorithm-Level Power Estimation in Orinoco
Activity estimation:
– Code instrumentation: inserts protocol statements to capture the activity during execution
Architecture estimation:
– High-level synthesis: scheduling, allocation, binding
– Physical planning: floorplanning, clock tree generation
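Code instrumentation can be sketched by hand-inserting "protocol" calls into an algorithm. The calls record how often each operation executes; the resulting counts could then feed architecture estimation (e.g., how many multiplications must be scheduled). This is a generic illustration, not Orinoco's actual interface. `protocol`, `activity` and the operation names are hypothetical.

```python
from collections import Counter

activity = Counter()  # collects the protocol of executed operations


def protocol(op, n=1):
    """Protocol statement inserted by instrumentation: record n executions of op."""
    activity[op] += n


def dot(a, b):
    acc = 0
    for x, y in zip(a, b):
        protocol("mem_read", 2)  # inserted: two operand fetches
        protocol("mul")          # inserted
        protocol("add")          # inserted
        acc += x * y
    return acc


result = dot([1, 2, 3, 4], [5, 6, 7, 8])
print(result, dict(activity))  # 70 {'mem_read': 8, 'mul': 4, 'add': 4}
```

Because the counts come from actual execution, they reflect data-dependent behavior such as loop trip counts. A static inspection of the code would miss that.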
Algorithmic-level power estimation and analysis
Algorithmic-level design
– Objective: optimize in terms of performance, cost, and power
– Means: selection of the algorithm performing the requested function; optimization of the algorithm; partitioning of the algorithm into HW and SW
Algorithm selection: select the most power-efficient one
– The comparison is based on the most power-efficient realization of each algorithm, without actual implementation.
Optimization:
– Reducing the number of control statements, e.g., by loop unrolling, local statement reordering, memory access reordering
– Floating-point for SW vs. fixed-point arithmetic for HW
Partitioning:
– Trade-off analysis between HW and SW implementations
– SW-to-HW transformation: moving the computational kernels of the algorithm to power-optimized application-specific hardware. Compared to a processor core, such hardware:
– needs no consecutive control steps to perform a single instruction
– needs no memory access to find out what to do next
– uses a minimal datapath just for performing the given task
– can exploit maximal concurrency
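Loop unrolling reduces control statements, and the sketch below makes that visible by counting loop iterations. Each iteration stands for one compare, one branch and one index update in compiled code or on a processor. Python has no such explicit control steps, so the `iterations` counter is an assumed stand-in. Both function names are hypothetical.

```python
def rolled_sum(data):
    """Plain loop: loop control is paid once per element."""
    total, iterations = 0, 0
    for i in range(len(data)):
        total += data[i]
        iterations += 1  # one compare/branch/index update per element
    return total, iterations


def unrolled4_sum(data):
    """Loop unrolled by 4: the body is copied, loop control is paid once per 4 elements."""
    n = len(data)
    main = n - n % 4
    total, iterations = 0, 0
    for i in range(0, main, 4):
        total += data[i] + data[i + 1] + data[i + 2] + data[i + 3]  # body copied 4 times
        iterations += 1
    for i in range(main, n):  # epilogue for the n % 4 leftover elements
        total += data[i]
        iterations += 1
    return total, iterations


data = list(range(102))
print(rolled_sum(data), unrolled4_sum(data))  # (5151, 102) (5151, 27)
```

The arithmetic work is unchanged; only the control overhead shrinks, at the cost of a larger code body.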