Quantifying Sources of Error in McPAT and Potential Impacts on Architectural Studies
Sam Xi*+, Hans Jacobson+, Pradip Bose+, Gu-Yeon Wei*, and David Brooks*
*Harvard University, +IBM T.J. Watson Research Center
Distribution Statement A: Approved for Public Release, Distribution Unlimited

What is architectural power modeling?
Inputs: CPU structural model, benchmark performance statistics.
Outputs: power and area estimates.

Power model accuracy is important
Power models are hard to validate, especially in academia. There is a stark contrast between the amount of time spent validating power models and the amount of time spent using them: thousands of studies use tools like Wattch and McPAT. Accurate power modeling is highly relevant for all platforms, especially power- and area-constrained ones like mobile SoCs.
Problems: power models are hard to validate, and we don't firmly understand their accuracy.
Coarse-grained validation
Relative or absolute?
Don't expect absolute accuracy. Even relative accuracy is not well established, because of the coarse-grained validation methods.

An example of coarse-grained validation
Validation chart of McPAT in published literature. First problem with this validation: it validates total core power instead of validating the models for, say, the fetch unit and the load/store unit separately. This is true for three out of four cores. The other problem…
Li, Sheng, et al. McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures. In MICRO, 2009.

An example of coarse-grained validation
Peak power validation uses hard-coded duty cycle parameters.
No insight into application-specific behavior.
Fine-grained validation matters because many successive generations of commercial microprocessors make relatively small tweaks to existing designs. McPAT is certainly not the only model validated like this. In this work, we evaluate in detail the accuracy of McPAT's models for the IBM POWER7 multicore server chip.
Li, Sheng, et al. McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures. In MICRO, 2009.

This work will: This work will not:
This work will: identify important sources of error in architectural power models; provide potential solutions.
This work will not: add new component models; assess uncore component models (core only); improve technology models.
First, I'd like to set some expectations here. This work is not about giving everyone a magical McPAT that fixes all your problems. It is about identifying the most important sources of error and suggesting potential solutions, which may vary for…

Power modeling approaches
Analytical: power is calculated from capacitance equations. Empirical models or fudge factors are used for irregular components.
Empirical: power is calculated from pre-characterized power data and equations using existing designs.
This work analyzes the tradeoff between two different power modeling approaches.
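For reference, capacitance-based models usually start from the textbook CMOS switching-power relation. The slides do not say which equations McPAT uses, so the form below is a generic sketch with conventional symbols, not McPAT's exact formulation.

```latex
P_{\text{dynamic}} = \alpha \, C \, V_{dd}^{2} \, f
```

Here \alpha is the activity factor, C the switched capacitance, V_{dd} the supply voltage, and f the clock frequency.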

Power modeling approaches
McPAT: flexible and fast; suited to design space exploration. Example statistics: total cycles, total instructions, total branches.
DPM*: very accurate for the design; fairly rigid and inflexible. Example statistics: total cycles fetch stage 1 was active, total cycles where 5 instructions were decoded in 1 cycle. Validated to within 5% of circuit simulations.
*Detailed power model
POWER7 is not that different from the other Intel/AMD server-class processors that are used with McPAT.

No model captures every part of the system.
How much of POWER7 does McPAT model?

McPAT only models a subset of the core
[Figure: bar chart of how much of the core McPAT models, with two annotated averages: 60% and 35%.]

McPAT only models a subset of the core
Modeled: SRAM structures and flip-flop structures (caches, arrays, buffers, registers).
Unmodeled: control logic (prefetcher logic, instruction age tracking).

True composition (area roughly to scale)
[Figure: true composition of the fetch and schedule units, area roughly to scale. Labeled components: register renaming tables and free lists, instruction age tracking, ROB, unified issue queue, dispatch control, commit/flush control, other.]

True composition (area roughly to scale)
[Figure: the same fetch and schedule composition diagram, with X marks over several components.]

Some good news
[Figure: power CDF over macros. Annotations: 30% of total unit power; ≈ 20% macros modeled.]
Focus on just a fraction of the missing components. This power data comes from DPM.

Can we use fudge factors for logic?
Assume that modeled and unmodeled macros are correlated in power:
total_power = modeled_power * FUDGE_FACTOR

True composition (area roughly to scale)
[Figure: the same composition diagram with X marks, now labeled by unit: fetch and decode (IFU) and scheduling (ISU).]

Can we use fudge factors for logic?
Assume that modeled and unmodeled macros are correlated in power:
total_power = modeled_power * FUDGE_FACTOR
[Figure: covariance of modeled and unmodeled macro power.]
So this method could work. The next question: what are the fudge factors?
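A minimal sketch of how one might test the correlation assumption and fit a single FUDGE_FACTOR. The sample values are hypothetical, not POWER7 data, and the least-squares fit is one reasonable choice rather than the method used in the talk.

```python
import math

# Hypothetical per-interval power samples (arbitrary units), not measured data.
modeled_power = [1.0, 1.5, 2.0, 2.5, 3.0]
unmodeled_power = [2.1, 2.9, 4.2, 4.8, 6.0]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fit_fudge_factor(modeled, unmodeled):
    """Least-squares FUDGE_FACTOR for total_power = modeled_power * FUDGE_FACTOR."""
    total = [m + u for m, u in zip(modeled, unmodeled)]
    return sum(m * t for m, t in zip(modeled, total)) / sum(m * m for m in modeled)


r = pearson(modeled_power, unmodeled_power)
fudge_factor = fit_fudge_factor(modeled_power, unmodeled_power)
print(f"correlation = {r:.3f}, FUDGE_FACTOR = {fudge_factor:.3f}")
```

A high correlation only says that a linear scaling is plausible for these samples; it does not say the same factor carries over to other workloads.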

Fudge factors are hard to apply
The ratio of modeled to unmodeled power varies greatly within a single benchmark and across benchmarks.
[Figure: fudge factor.]
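To make the variation concrete, the sketch below computes the fudge factor each sample would need. The benchmark names and numbers are invented for illustration only.

```python
# Hypothetical (modeled_power, unmodeled_power) samples per benchmark, arbitrary units.
samples = {
    "bench_a": [(1.0, 2.0), (2.0, 2.2), (3.0, 2.4)],
    "bench_b": [(1.0, 0.5), (2.0, 3.0), (3.0, 1.5)],
}


def fudge_factors(pairs):
    """Per-sample factor such that total_power = modeled_power * factor."""
    return [(modeled + unmodeled) / modeled for modeled, unmodeled in pairs]


for name, pairs in samples.items():
    factors = fudge_factors(pairs)
    spread = max(factors) - min(factors)
    print(name, [round(f, 2) for f in factors], "spread:", round(spread, 2))
```

If the factors were stable, the spread would be close to zero; a large spread means any single constant misestimates some intervals badly.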

To recap
[Figure: the composition diagram of fetch and decode (IFU) and scheduling (ISU) with X marks, shown again as a recap. Labeled components: register renaming tables and free lists, instruction age tracking, ROB, unified issue queue, dispatch control, commit/flush control, other.]

Architectural power models must model more control logic.
This brings me to my last major takeaway from this report. In this work we have emphasized the need for fine-grained validation and shown how fine-grained validation identifies and eliminates many sources of error, which can be very important for architectural studies like voltage noise.

Instruction fetch unit
[Figure: instruction fetch unit power.]
Initial power error: 150%

Categories of error
Abstraction error: insufficient modeling detail.
Modeling assumption error: assumptions about the underlying implementation differ from the CPU at hand.
Input error: incorrectly specified parameters.
Coding error: programming mistakes.
(Listed in order of importance.)

Insufficient modeling detail often leads to worst case approximations
Example parameter: peak_issue_width
McPAT assumes 2 read ports and 1 write port on the register file per instruction.
POWER7: 6 instructions per cycle → McPAT makes an 18-ported register file (incorrect for P7!).
Solution: add more parameters!
This assumption is not inherently wrong, but for high-performance cores there are other factors at play. peak_issue_width is too high-level to model POWER7's instruction issue rules. The solution may look simple, but getting there was certainly not.
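The arithmetic behind the 18-port figure can be sketched as follows. The function and constant names are illustrative and are not McPAT identifiers; only peak_issue_width comes from the slide.

```python
# Per-instruction port assumption stated on the slide.
READ_PORTS_PER_INSTRUCTION = 2
WRITE_PORTS_PER_INSTRUCTION = 1


def worst_case_register_file_ports(peak_issue_width):
    """Ports implied when every issue slot gets its own read and write ports."""
    ports_per_instruction = READ_PORTS_PER_INSTRUCTION + WRITE_PORTS_PER_INSTRUCTION
    return peak_issue_width * ports_per_instruction


def explicit_register_file_ports(read_ports, write_ports):
    """Hypothetical alternative: take the port counts as separate parameters."""
    return read_ports + write_ports


print(worst_case_register_file_ports(6))  # 18
```

Exposing port counts as their own parameters is one way to read "add more parameters": the model then no longer has to derive them from the issue width.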

The SMT model
Each structure is classified as shared, partitioned, or duplicated across hardware threads. Structures listed: ALUs/FPUs, issue queues, ITLB, DTLB, load/store buffers, register renaming tables, instruction buffers.
Naturally, different core designs will have a different set of these assumptions.

Adjusting the SMT model can be tricky
[Table: shared / partitioned / duplicated classification of ALUs/FPUs, issue queues, ITLB, DTLB, load/store buffers, register renaming tables, and instruction buffers.]

Fixing SMT for register renaming tables
RAT.local.area = <invoke CACTI for one table>;
RAT.total_area = RAT.local.area * number_hardware_threads;
[Figure: a RAT for 100 registers is modeled as one RAT for 100 registers per hardware thread.]
POWER7 is a four-way SMT processor. Because the RAT is shared across threads on POWER7, the modeled area in McPAT is 4x larger than it should be. There are two ways to fix this: one is to modify the source code, and the other is to compensate for the duplication through the parameters, which is what we were told to do in some cases.
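A minimal sketch of the source-code style of fix, assuming a hypothetical shared_across_threads flag (not an existing McPAT parameter). The area of one table is a normalized stand-in for a CACTI result.

```python
def rat_total_area(local_area, number_hardware_threads, shared_across_threads):
    """Total RAT area from the area of a single table.

    shared_across_threads is a hypothetical parameter: when True, one table
    serves all hardware threads; when False, the table is duplicated per thread.
    """
    if shared_across_threads:
        return local_area
    return local_area * number_hardware_threads


local_area = 1.0  # normalized area of one table (stand-in for a CACTI result)
duplicated = rat_total_area(local_area, 4, shared_across_threads=False)
shared = rat_total_area(local_area, 4, shared_across_threads=True)
print(duplicated / shared)  # 4.0
```

With four hardware threads, the duplicated model reports four times the area of the shared one, which is the overestimate described for POWER7.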

Can we compensate through the parameters?
Specify 1/4 of the number of physical registers: the RAT for 100 registers becomes a RAT for 25 registers per thread = 100 total regs!
One potentially simple way to fix this problem without modifying source code is… (in our discussions with the McPAT authors, we found that sometimes this is the intended approach)

Dividing by 4 does not always work
[Figure: RAT for 25 registers.]
Changing the number of physical registers affects other structures (like the physical register files).
Register renaming table power comes mostly from the dependency logic model, not from array accesses.
Solution: add more parameters!
The dependency logic model was tricky to identify as the main contributor to power.
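The side effect can be shown with a toy model in which two structures are sized from the same register-count parameter. The function and its dictionary keys are hypothetical stand-ins and do not reflect McPAT's internals.

```python
def modeled_sizes(num_physical_registers, number_hardware_threads):
    """Toy stand-in for a model that sizes two structures from one parameter."""
    return {
        # One renaming table per hardware thread (the duplication being compensated).
        "rat_entries_total": num_physical_registers * number_hardware_threads,
        # The physical register file is sized from the same parameter.
        "register_file_entries": num_physical_registers,
    }


THREADS = 4
intended = {"rat_entries_total": 100, "register_file_entries": 100}
compensated = modeled_sizes(100 // THREADS, THREADS)

for name, want in intended.items():
    got = compensated[name]
    status = "ok" if got == want else "MISMATCH"
    print(f"{name}: intended={want} modeled={got} {status}")
```

Dividing the parameter by four restores the total number of RAT entries but shrinks the register file to a quarter of its intended size, which is why a separate parameter is the cleaner fix.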

<param> … </param>
Takeaway (X): Architectural power models must model more control logic.
Takeaway (<param> … </param>): Microarchitectural component models need more detail.
We added many parameters in McPAT to address errors and assumptions which were incorrect for POWER7. These new parameters caused a significant drop in power error. Isolating the correct parameters was quite tricky.

Other affected structures
I-cache, D-cache; cache prefetch buffers; I-TLB, D-TLB; reorder buffer; instruction buffers; branch history tables; physical register files; load/store queues.
The point is that at first glance we wouldn't necessarily notice anything wrong with the existing models; it is only by examining each and every model very closely that we find these small details are responsible for a lot of power error.

Improvement in power estimates
[Figure: instruction fetch unit power, built up in three steps: before fixes, after fixes, and DPM.]
150% → 20% error

Example of fixing error sources in action
[Figure: inductive noise on POWER7 under three models: before fixes, after fixes, and ground truth.]
We analyzed inductive noise on POWER7 using three models… We care about voltage noise because we want it to be as small as possible. After adding a lot of parameters and fixing assumptions in McPAT, the inductive noise swing was much more muted and more closely matches what our ground truth model says.

Power error drops → voltage noise error drops
[Figure: distribution of supply voltage noise, before fixes.]
This is the distribution of supply voltage noise BEFORE our fixes to McPAT.

[Figure: distribution of supply voltage noise, before fixes and after fixes.]
This is the distribution of supply voltage noise AFTER our fixes to McPAT.

[Figure: distribution of supply voltage noise, before fixes, after fixes, and the DPM prediction.]
The final distribution shows what the ground truth model says about inductive noise. The reduction in voltage noise error can be very significant. I've been talking about sources of error in McPAT's models up to this point. Now let's switch to the flip side: missing models in McPAT.
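As a rough illustration of why power error carries over into noise error, the sketch below uses a first-order L * dI/dt estimate. The power trace, supply voltage, inductance, and time step are all hypothetical, and a real power delivery network is more complex than a single inductance.

```python
def inductive_noise(power_trace_w, vdd_v, inductance_h, dt_s):
    """First-order L * dI/dt estimate (in volts) for each step of a power trace."""
    current = [p / vdd_v for p in power_trace_w]
    return [inductance_h * (b - a) / dt_s for a, b in zip(current, current[1:])]


# Hypothetical reference trace (watts) and a model that overestimates it by 150%.
reference_power = [10.0, 10.0, 30.0, 30.0, 10.0]
overestimated_power = [2.5 * p for p in reference_power]

ref_noise = inductive_noise(reference_power, vdd_v=1.0, inductance_h=1e-11, dt_s=1e-9)
est_noise = inductive_noise(overestimated_power, vdd_v=1.0, inductance_h=1e-11, dt_s=1e-9)

worst_ref = max(abs(v) for v in ref_noise)
worst_est = max(abs(v) for v in est_noise)
print(f"worst-case swing: reference {worst_ref:.2f} V, overestimate {worst_est:.2f} V")
```

In this simplified setting the noise swing scales directly with the power estimate, so a 150% power error becomes a 150% error in the predicted swing.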

Takeaway (X): Architectural power models must model more control logic.
Takeaway (<param> … </param>): Microarchitectural component models need more detail.
Takeaway (IFU, ISU, LSU, FXU, VSU): Power models must be validated at the unit level or lower.
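A small sketch of why unit-level validation matters: per-unit errors can cancel in the total. The unit names come from the slide; every power number is invented for illustration.

```python
def percent_error(estimate, reference):
    """Absolute error of an estimate relative to a reference, in percent."""
    return abs(estimate - reference) / reference * 100.0


# Hypothetical per-unit power (arbitrary units), not POWER7 data.
reference = {"IFU": 10.0, "ISU": 20.0, "LSU": 15.0, "FXU": 5.0, "VSU": 10.0}
estimate = {"IFU": 25.0, "ISU": 8.0, "LSU": 12.0, "FXU": 6.0, "VSU": 9.0}

total_error = percent_error(sum(estimate.values()), sum(reference.values()))
unit_errors = {unit: percent_error(estimate[unit], reference[unit]) for unit in reference}

print(f"total core error: {total_error:.0f}%")
for unit, error in unit_errors.items():
    print(f"{unit}: {error:.0f}%")
```

Here the total core power matches exactly while individual units are off by as much as 150%, so a total-power check alone would report a perfect model.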

Thank you!
Dean Tullsen, UCSD; Michael Healy, IBM; Sheng Li, Intel; Thomas Strach, IBM; Victor Zyuban, IBM; Runjie Zhang, UVA; Norm Jouppi, Google; Chris Batten, Cornell; anonymous reviewers
Our power models should be used for core dynamic power studies on POWER7-like chips only; we make no guarantees about their accuracy on any other platform. They may be downloaded at:

Funding acknowledgements
This work has been partially sponsored by the Defense Advanced Research Projects Agency (DARPA), Microsystems Technology Office (MTO), under contract no. HR C. The views expressed are those of the authors and do not reflect the official policy or position of the Department of Defense or the U.S. Government. This document is: Approved for Public Release, Distribution Unlimited.
