Published by Mercy Atkins. Modified over 8 years ago.
1
Hardware Performance Counters for Detailed Runtime Power and Thermal Estimations: Experiences & Proposals
Canturk Isci, Gilberto Contreras, Margaret Martonosi
Parapet Research Group, Princeton University EE
Workshop on Hardware Performance Monitor Design and Functionality, HPCA-11, Feb 13, 2005
2
Hardware Performance Counters (HPCs) Go Beyond Performance
Several research avenues have been explored:
  Runtime power/thermal estimation
  Dynamic management
  Workload phases and application behavior prediction
HPCs provide value beyond simulation:
  Long timescales
  Real-system behavior
3
Hardware Performance Counters (HPCs) Go Beyond Performance
Runtime power:
  Isci & Martonosi [MICRO 2003]
  Contreras & Martonosi [Submitted 2005]
Runtime thermal:
  Lee & Skadron [HP-PAC in IPDPS 2005]
Dynamic power management:
  Choi et al. [ISLPED 2004]
  Weißel & Bellosa [CASES 2002]
Dynamic thermal management:
  Bellosa et al. [COLP 2003]
Workload phases and application behavior prediction:
  Isci & Martonosi [WWC 2003]
  Duesterwald et al. [PACT 2003]
4
High-Performance Corner: P4 Power Estimation
Idea:
  $\text{Power}[I] = \text{MaxPower}[I] \times \text{ArchScaling}[I] \times \text{AccessRate}[I] + \text{NonGatedPower}[I]$, where $I$ is a chip component
Motivation:
  Fast (real-time)
  Estimated view of on-chip detail (per physical component)
Design:
  Developed heuristics using 24 events to approximate access rates for 22 chip components
  Used 15 counters with 4 rotations to collect all event data
Validation:
  Real-time estimates against real-time measured power
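The per-component model on this slide can be sketched in a few lines. This is only an illustration of the formula, not the authors' estimator: the component names, the wattages, the scaling factors, and the access rates below are all made-up values, and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """Static parameters of one chip component (all values here are hypothetical)."""
    name: str
    max_power_w: float        # MaxPower[I]
    arch_scaling: float       # ArchScaling[I]
    non_gated_power_w: float  # NonGatedPower[I]


def component_power(c: Component, access_rate: float) -> float:
    """Power[I] = MaxPower[I] x ArchScaling[I] x AccessRate[I] + NonGatedPower[I]."""
    return c.max_power_w * c.arch_scaling * access_rate + c.non_gated_power_w


def total_power(components, access_rates) -> float:
    """Sum the per-component estimates into a chip-level estimate."""
    return sum(component_power(c, access_rates[c.name]) for c in components)


# Illustrative inputs: in a real setup the access rates would come from
# counter-based heuristics, one per physical component.
components = [
    Component("trace_cache", max_power_w=4.0, arch_scaling=1.0, non_gated_power_w=0.5),
    Component("int_alu", max_power_w=2.0, arch_scaling=0.5, non_gated_power_w=0.25),
]
access_rates = {"trace_cache": 0.5, "int_alu": 0.25}

per_component = {c.name: component_power(c, access_rates[c.name]) for c in components}
total = total_power(components, access_rates)
```

Keeping the per-component values separate, rather than only the total, is what gives the "estimated view of on-chip detail" that the slide lists as a motivation.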
5
P4 Power Estimator Results
Average difference: ~5% among all benchmarks
SPEC CPU2000 & other applications
[Figure: measured vs. modeled power for gcc, gzip, vpr, vortex, gap, and crafty]
6
Embedded Corner: PXA255 Power Estimation
Idea:
  $\text{CPUPower}_{n \times 1} = \text{PerformanceEvents}_{n \times 5} \times \text{LinearParameters}_{5 \times 1} + \text{IdlePower}$
  $\text{MemPower}_{n \times 1} = \text{PerformanceEvents}_{n \times 2} \times \text{LinearParameters}_{2 \times 1} + \text{IdlePower}$
Motivation:
  Runtime power optimizations under DVFS
Design:
  Parameter estimation (OLS) using dominant counter readings and live power measurements
  Power estimation at various CPU configurations
Validation:
  Comparison between estimates and real-time measured power
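The linear model and its OLS parameter estimation can be illustrated with a small pure-Python sketch. It makes several assumptions that are not in the slides: idle power is fitted as the intercept rather than measured separately, the two-event shape mirrors the memory model only for brevity, and the sample data is synthetic (generated from parameters 0.2 and 0.05 with idle power 0.1). All function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the largest remaining pivot into place for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_ols(events, power):
    """Fit power ~ events . params + idle via the normal equations.

    Returns (params, idle). Treating idle power as the fitted intercept
    is an assumption of this sketch.
    """
    rows = [list(e) + [1.0] for e in events]  # trailing 1.0 is the intercept column
    m = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    xty = [sum(r[i] * p for r, p in zip(rows, power)) for i in range(m)]
    coeffs = solve_linear_system(xtx, xty)
    return coeffs[:-1], coeffs[-1]


def estimate_power(event_row, params, idle):
    """Apply the fitted model to one sample of counter readings."""
    return sum(e * w for e, w in zip(event_row, params)) + idle


# Synthetic training data: n = 5 samples of 2 event rates and measured power.
events = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (1.0, 3.0)]
measured = [0.30, 0.15, 0.35, 0.55, 0.45]

params, idle = fit_ols(events, measured)
predicted = estimate_power((3.0, 2.0), params, idle)
```

Once the parameters are fitted offline against live power measurements, the runtime cost of an estimate is one multiply-add per event, which is what makes the approach attractive for DVFS decisions.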
7
PXA255 Results
5% average error across 3 domains: Java CDC, Java CLDC, SPEC2000
[Figure: power estimation results; surviving panel label: DB CDC Java]
8
Proposals from Experiences
1. Track each physical unit individually for power & thermal:
  Ex: Trace Cache, μCode ROM, μop Queue, Allocate, Rename, Instr-n Queue1, Instr-n Queue2, Schedulers, MEM, EXE, Dispatch Ports: all tracked with in-flight μops written to μop queue
  Need individual utilization counts for each physical unit available on die for power and hotspot analyses
9
Proposals from Experiences
2. Need bitline activity counts
  Utilization is not complete information; power depends in part on the switching factor
  Not necessarily fully detailed counts:
    Accumulate bitwise XOR of current and previous input/output ports
    Sample RegFile ports/bit populations
[Figure: PXA255 processor at 400 MHz, 1.3 V; 30 mW (10%) power swing]
10
Proposals from Experiences
2. Need bitline activity counts (continued)
[Figure: PXA255 processor at 400 MHz, 1.3 V; 20 mW power swing for additions over operand sequences A (000…01 repeated) and B (111…11, 000…00, 011…11, 000…00, 001…11, 000…00, …, 000…11, 000…00, 000…01, 000…00)]
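The "accumulate bitwise XOR of current and previous ports" idea can be sketched in software as follows. This is a model of what such a counter might accumulate, not a description of any existing hardware: the function name, the 32-bit width, and the sample value streams are assumptions chosen to echo the alternating and constant operand patterns in the figure.

```python
def accumulate_bit_toggles(port_values, width=32):
    """Sum the number of bits that flip between consecutive port values.

    Each step XORs the current value with the previous one and adds the
    population count of the result, which is the number of bitlines that switched.
    """
    mask = (1 << width) - 1
    toggles = 0
    previous = None
    for value in port_values:
        current = value & mask
        if previous is not None:
            toggles += bin(previous ^ current).count("1")
        previous = current
    return toggles


# Two streams with identical utilization (8 accesses each) but very
# different switching activity.
alternating = [0xFFFFFFFF, 0x00000000] * 4  # every bit flips on every access
steady = [0x00000001] * 8                   # no bit ever flips

alternating_toggles = accumulate_bit_toggles(alternating)
steady_toggles = accumulate_bit_toggles(steady)
```

A plain access counter would report the same value for both streams, which is the gap the slide points at when it says utilization is not complete information.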
11
Proposals from Experiences
3. More detailed off-chip/memory access support in the embedded domain
  Mem power ~40% of system power
  Tracking memory hierarchy transactions may help render better memory power estimates:
    Main memory reads/writes
    Core + DMA
    Transaction length in bytes
  Activity factors can be shared with RegFile
[Figure: REX memory power consumption (one 16b bank)]
12
Proposals from Experiences
4. Metrics related to queue occupancy
  Modern processor ≡ several queues
  Depending on implementation, power ∝ queue occupancy
  Buyuktosunoglu et al. [ISLPED'02], "Tradeoffs in Power-Efficient Issue Queue Design"
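One way an occupancy metric could feed a power estimate is sketched below. Everything here is hypothetical: the slides do not specify a counter design, so the sketch assumes a counter that adds the current queue occupancy every cycle, a strictly proportional power model, and made-up capacity and wattage values.

```python
def average_occupancy(occupancy_sum, cycles):
    """Mean entries in use per cycle, from an accumulated per-cycle occupancy count."""
    return occupancy_sum / cycles if cycles else 0.0


def queue_power(avg_occupancy, capacity, max_power_w):
    """Assumed proportional model: power scales with the fraction of entries in use."""
    return max_power_w * avg_occupancy / capacity


# Hypothetical per-cycle occupancies of a 32-entry queue over 4 cycles.
per_cycle_occupancy = [0, 8, 16, 8]
occupancy_sum = sum(per_cycle_occupancy)  # what the assumed counter would hold

avg = average_occupancy(occupancy_sum, len(per_cycle_occupancy))
power = queue_power(avg, capacity=32, max_power_w=2.0)
```

Whether proportionality holds depends on the queue implementation, as the slide notes, so the second function is the part most likely to need replacing.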
13
Proposals from Experiences
5. General/aggregate metrics, in addition to specialized cases/breakdowns, simplify runtime sampling for unit accesses
  P4 ex1. MOB:
    Only event: MOB_load_replays
    Counts replays for unknown store addr./data, partial/unaligned addr. match
    No info for MOB entries/accesses/updates
  P4 ex2. FPU:
    Has 8 separate events (with 2 dedicated ESCRs)
    Need at least 4 rotations to collect
  P4 ex3. INT ALU:
    No dedicated event
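The FPU example (8 events, 2 dedicated ESCRs, at least 4 rotations) follows from simple grouping, which the sketch below makes explicit along with the usual extrapolation of time-multiplexed counts. It is a simplification under stated assumptions: each ESCR is taken to select one event at a time, real P4 event-to-counter constraints are ignored, and the event names are placeholders rather than actual P4 event names.

```python
def plan_rotations(events, slots):
    """Split events into groups of at most `slots` that can be counted together."""
    if slots < 1:
        raise ValueError("need at least one counting slot")
    return [events[i:i + slots] for i in range(0, len(events), slots)]


def extrapolate(sampled_count, sampled_cycles, total_cycles):
    """Scale a count observed during part of the interval to the whole interval.

    Assumes the event rate while the event was not being counted matches
    the rate while it was.
    """
    return sampled_count * total_cycles / sampled_cycles


# Placeholder names standing in for the 8 separate FPU events.
fpu_events = [f"fpu_event_{i}" for i in range(8)]

# Two events per rotation, one per dedicated ESCR (assumption of this sketch).
rotations = plan_rotations(fpu_events, slots=2)

# An event counted for 250 of 1000 cycles (one rotation out of four).
estimated_count = extrapolate(sampled_count=1000, sampled_cycles=250, total_cycles=1000)
```

A single aggregate FPU event would replace all four rotations with one, which is the simplification this proposal argues for.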
14
Additional Comments for HPC Design
General/aggregate metrics, in addition to specialized cases/breakdowns, simplify runtime sampling for unit accesses
Metrics related to RegFile accesses vs. forwarding
Semi-distributed implementations will always induce dependencies among simultaneously countable events:
  Higher parallelism among (power-oriented) metrics for minimal counter rotations at runtime
  Implementations that allow counter rotations without need for intermediate logging
    Partitioned / dual-mode / buffered counters
Different events for different types of accesses to the same unit whose power implications differ in magnitude
  e.g. branch scan < BHT update < BTA update
Different API/SW demands:
  Lightweight implementations for runtime analyses
  Per-thread for application profiling vs. global for real-time measurement comparisons and hotspots
15
Wishlist for Power/Thermal
1) For each physical unit on die, separate events to track utilization rates
  Sub-events for different types of accesses with different power costs
2) Bitline activity counters for switching units
3) Occupancy counters for related queues
4) Counter support for off-core memory accesses
5) High parallelism among power events for minimal counter rotations
16
Conclusions
New opportunities remain to be explored in future PMC designs for power and thermal studies:
  Direct correspondence to physical units
  Bitline and occupancy counters
We believe these additions are feasible given the continuing emphasis on counter design, as long as power is also considered a primary design target.