Performance Counters on Intel® Core™ 2 Duo Xeon® Processors Michael D’Mello


1 Performance Counters on Intel® Core™ 2 Duo Xeon® Processors Michael D’Mello michael.d’mello@intel.com

2 Agenda
- Basic data collection mechanisms
- Some architectural considerations
- Cycle accounting methodology
- Use the VTune™ Performance Analyzer to identify micro-architectural bottlenecks in software running on Intel® Core™ 2 Duo Xeon® processors
- Address the performance bottleneck for Intel® Core™ 2 Duo Xeon® processors
Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

3 Basic data collection mechanisms
- Deterministic interrupts
  - Processor interrupted at regular time intervals
- Interrupts based on pre-assigned metric
  - A performance counter increments on the CPU every time an event occurs
  - A sample of the execution context is recorded every time a performance counter overflows
  - Events = samples * sample-after value
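The relation "Events = samples * sample-after value" can be illustrated with a short Python sketch. The function name and the numbers below are hypothetical and chosen only for illustration; the result is an estimate, since a sample is recorded only when the counter overflows.

```python
def estimate_event_count(samples: int, sample_after_value: int) -> int:
    """Estimate total events from the number of samples collected.

    Each sample is taken when the performance counter overflows, which
    happens once every `sample_after_value` events, so the estimate is
    simply the product of the two.
    """
    if samples < 0 or sample_after_value <= 0:
        raise ValueError("samples must be >= 0 and sample_after_value > 0")
    return samples * sample_after_value


if __name__ == "__main__":
    # Hypothetical run: 1,250 samples with a sample-after value of 2,000,000.
    print(estimate_event_count(1_250, 2_000_000))
```

A smaller sample-after value gives finer resolution at the cost of more frequent interrupts.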

4 Architecture Block and Instruction Flow
[Block diagram. Stages: Fetch / Decode, Execute, Retire. Labels: Branch Target Buffer; Next IP; 32 KB Instruction Cache; Instruction Decode (4 issue); Microcode Sequencer; Register Allocation Table (RAT); Re-Order Buffer (ROB), 96 entry; Reservation Stations (RS), 32 entry; Scheduler / Dispatch Ports; execution ports for FP Add, FP Div/Mul, Integer Shift/Rotate, SIMD Integer Arithmetic, Load, Store Addr, and Store Data; Memory Order Buffer (MOB); 32 KB Data Cache; Bus Unit, to L2 Cache/Memory; IA Register Set.]
Disclaimer: This block diagram is for example purposes only. Significant hardware blocks have been arranged or omitted for clarity. Some resources (Bus Unit, L2 Cache, etc.) are shared between cores.

5 Simpler abstraction – OOO engine
Diagram blocks: Fetch & Branch prediction, Decode, Reservation Station, Execution Units, Re-Order Buffer, Retirement, Writeback
Notes:
- uops wait in the RS until their inputs are available
- uops wait in the ROB to be retired

6 Accounting for cycles - 1
- For simplicity, select the micro-op dispatch point to begin analysis
- Decompose Total Cycles into the sum of two “input” parts:
  - Time spent issuing micro-ops to the execution units
  - Time spent not issuing micro-ops (i.e. execution stalls)
- Decompose Total Cycles into three “output” components:
  - Cycles during which executed micro-ops are retired
  - Cycles during which executed micro-ops are not retired
  - Stalls
- Use simple balance equations to dig deeper: micro-ops dispatched/executed = # retired + # not retired
- Convert to units of cycles using the micro-op dispatch rate

7 Accounting for cycles - 2
Use VTune® Sampling to track selected events:
i. CPU_CLK_UNHALTED.CORE tracks Total Cycles
ii. RS_UOPS_DISPATCHED tracks the total number of micro-ops dispatched
iii. RS_UOPS_DISPATCHED:C=1 tracks cycles during which micro-ops are dispatched
iv. RS_UOPS_DISPATCHED_NONE (same as RS_UOPS_DISPATCHED:C=1:I=1) gives the second term of the input equation
v. UOPS_RETIRED.ANY + UOPS_RETIRED.FUSED gives an approximate total of micro-ops retired
vi. The micro-op dispatch rate is obtained by dividing (ii) by (iii)
vii. The number of cycles spent executing micro-ops that are not ultimately retired is given by the difference (ii) – (v) divided by the dispatch rate (vi)
viii. Using (i), (vi), and (vii), obtain Stalls
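The accounting steps above can be sketched in Python as follows. This is a minimal illustration, not VTune output: the function, the dataclass, and the sample counts are hypothetical, and the sketch assumes that micro-op counts are converted to cycles by dividing by the dispatch rate, as the "convert to units of cycles" step on the previous slide suggests.

```python
from dataclasses import dataclass


@dataclass
class CycleBreakdown:
    dispatch_rate: float       # micro-ops per dispatching cycle, step (vi)
    retired_cycles: float      # cycles spent on micro-ops that retire
    non_retired_cycles: float  # cycles spent on micro-ops that never retire
    stall_cycles: float        # remaining cycles with nothing dispatched


def account_for_cycles(total_cycles: float,
                       uops_dispatched: float,
                       dispatch_cycles: float,
                       uops_retired: float) -> CycleBreakdown:
    """Split total cycles into retired, non-retired, and stall components.

    Arguments are event totals (samples * sample-after value):
      total_cycles     - CPU_CLK_UNHALTED.CORE                     (i)
      uops_dispatched  - RS_UOPS_DISPATCHED                        (ii)
      dispatch_cycles  - RS_UOPS_DISPATCHED:C=1                    (iii)
      uops_retired     - UOPS_RETIRED.ANY + UOPS_RETIRED.FUSED     (v)
    """
    if uops_dispatched <= 0 or dispatch_cycles <= 0:
        raise ValueError("dispatch counts must be positive")
    rate = uops_dispatched / dispatch_cycles                     # (vi)
    retired_cycles = uops_retired / rate
    non_retired_cycles = (uops_dispatched - uops_retired) / rate  # (vii)
    stall_cycles = total_cycles - retired_cycles - non_retired_cycles
    return CycleBreakdown(rate, retired_cycles, non_retired_cycles, stall_cycles)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(account_for_cycles(1_000_000, 1_600_000, 800_000, 1_400_000))
```

Under these assumptions the retired and non-retired components add up to the dispatching cycles (iii), so the stall component equals (i) minus (iii), which can be cross-checked against event (iv).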

8 Recap
Achieve the basic objective (minimize Total Cycles) as follows:
- Minimize the Stall component by removing memory and other bottlenecks
- Minimize the Non-Retired component by reducing branch mispredictions
- Minimize the Retired component by reducing the instruction count (SSEx)

9 Characterizing Stalls & Branch Mispredictions
- Percentage of time stalled: (RS_UOPS_DISPATCHED_CYCLES_NONE / CPU_CLK_UNHALTED.CORE) * 100
- Fractions of useful & wasted work:
  1. Count the number of uops dispatched: use RS_UOPS_DISPATCHED
  2. Count the number of executed uops that are eventually retired: use (UOPS_RETIRED.ANY + UOPS_RETIRED.FUSED)
  3. Count the number of executed uops that are not retired: the difference between the amount dispatched and the amount retired
  4. Compute the fractions
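The two characterizations on this slide reduce to simple ratios, sketched below in Python. The function names and the counts are hypothetical; the inputs are assumed to be event totals already scaled by the sample-after value.

```python
def stall_percentage(cycles_none_dispatched: float, total_cycles: float) -> float:
    """Percentage of unhalted core cycles in which no micro-op was dispatched."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return cycles_none_dispatched / total_cycles * 100


def work_fractions(uops_dispatched: float,
                   uops_retired_any: float,
                   uops_retired_fused: float) -> tuple[float, float]:
    """Return (useful, wasted) fractions of dispatched micro-ops.

    Useful work is the retired share; wasted work is the share that was
    dispatched but never retired (for example after a branch misprediction).
    """
    if uops_dispatched <= 0:
        raise ValueError("uops_dispatched must be positive")
    retired = uops_retired_any + uops_retired_fused
    useful = retired / uops_dispatched
    wasted = (uops_dispatched - retired) / uops_dispatched
    return useful, wasted


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(stall_percentage(250_000, 1_000_000))
    print(work_fractions(1_600_000, 1_200_000, 200_000))
```

Because the retired count is described as approximate, the two fractions are best read as estimates rather than exact figures.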

10 Characterizing FSB Usage/Saturation
- Method: [(Core Frequency) * 64 * BUS_TRANS_BURST.BOTH_CORES.ALL_AGENTS] / CPU_CLK_UNHALTED.CORE
- Always useful to run a calibration test case
- Further analysis via the BUS_TRANS_ set of events
- Use the VTune® Help facility for an explanation of each event
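The method above can be written out as a small Python sketch. It assumes the factor 64 is the number of bytes moved per burst transaction (one cache line) and that the core frequency is given in Hz, so the result is in bytes per second; both are interpretations rather than statements from the slide. The function names, the calibration comparison, and all numbers are hypothetical.

```python
BYTES_PER_BURST = 64  # assumption: one 64-byte cache line per burst transaction


def fsb_bandwidth_bytes_per_sec(core_frequency_hz: float,
                                bus_trans_burst: float,
                                unhalted_core_cycles: float) -> float:
    """Estimate front-side bus traffic in bytes per second.

    bus_trans_burst      - BUS_TRANS_BURST.BOTH_CORES.ALL_AGENTS total
    unhalted_core_cycles - CPU_CLK_UNHALTED.CORE total
    Dividing cycles by frequency gives elapsed time; bytes over time is bandwidth.
    """
    if unhalted_core_cycles <= 0:
        raise ValueError("unhalted_core_cycles must be positive")
    return core_frequency_hz * BYTES_PER_BURST * bus_trans_burst / unhalted_core_cycles


def fraction_of_calibrated_peak(measured: float, calibrated_peak: float) -> float:
    """Compare a measurement with the bandwidth seen in a calibration run."""
    if calibrated_peak <= 0:
        raise ValueError("calibrated_peak must be positive")
    return measured / calibrated_peak


if __name__ == "__main__":
    # Hypothetical 2 GHz core, 10 million bursts over 1 billion cycles.
    bw = fsb_bandwidth_bytes_per_sec(2.0e9, 10_000_000, 1_000_000_000)
    print(bw, fraction_of_calibrated_peak(bw, 5.12e9))
```

Comparing against a calibration run, rather than a theoretical peak, follows the slide's advice to calibrate on the actual platform.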

11 Other characterizations of Stalls
- Instruction starvation component may be monitored via RESOURCE_STALLS.ANY
- ROB & RS must be purged of incorrect executions
  - Approximate via RESOURCE_STALLS.BR_MISS_CLEAR
  - Units of this event are cycles
- Other resource-limited stalls:
  - RESOURCE_STALLS.ROB_FULL (96 instructions in ROB)
  - RESOURCE_STALLS.LD_ST (all store or load buffers in use)
  - RESOURCE_STALLS.RS_FULL (32 instructions waiting for inputs in Reservation Station)
- For more information, see the paper on Cycle Accounting by David Levinthal (available through www.intel.com)
