1
Reducing the Scheduling Critical Cycle using Wakeup Prediction
HPCA-10, Feb. 18, 2004
Todd Ehrhart and Sanjay Patel
Center for Reliable and High-Performance Computing, University of Illinois at Urbana-Champaign
2
Outline
- Overview
- Analysis
- Architecture
- Experiments
- Conclusions
3
Intuition
- Loops and other execution patterns may cause a steady state in machine delays (e.g., a repeating sequence ABCD ABCD ...)
- After a few iterations, the machine may reach a steady state
  - May have (near-)constant delays
4
Basic Observation
- Wakeup delay is highly invariant
  - Bias toward positive deviations
5
So...
- Wakeup times can be estimated based on the static IP
- Idea:
  - Ignore dependencies
  - Estimate wakeup times
  - Wake an instruction when its time expires
- Breaks the scheduling critical cycle
  - Thus, can reduce cycle time
- But, there are problems
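The idea above can be sketched as a timer-driven scheduler: rather than tracking dependencies, each instruction is given a wakeup timer from its static IP's prediction and issues when the timer expires. This is a minimal illustration, not the paper's hardware; the `predict` callback and instruction tuples are assumptions.

```python
import heapq

def self_schedule(instructions, predict):
    """Issue instructions when their predicted wakeup timers expire.

    instructions: list of (dispatch_cycle, static_ip) pairs.
    predict: maps a static IP to a predicted wakeup delay (assumed).
    """
    heap = []
    for dispatch_cycle, ip in instructions:
        # Ignore dependencies: just set a timer from the IP's prediction.
        heapq.heappush(heap, (dispatch_cycle + predict(ip), ip))
    issue_order = []
    while heap:
        wake_cycle, ip = heapq.heappop(heap)
        issue_order.append((wake_cycle, ip))  # would issue here; checked later
    return issue_order
```

Because no dependency check gates issue, a wrong prediction means the instruction executes with stale inputs and must replay, which is the problem the rest of the talk addresses.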
6
Outline
- Overview
- Analysis
- Architecture
- Experiments
- Conclusions
7
Architectural Flow
Predict Wakeup Time → Wait for Wakeup Timeout → Execute Instructions
If the wakeup time is wrong, the instruction must replay
8
Architectural Flow
Predict Wakeup Time → Wait for Wakeup Timeout → Execute Instructions → Determine Actual Wakeup Time → Check Feedback
Mis-speculated instructions are re-predicted
9
Fixing the Problems
- Replays
  - Cost-adjust the wakeup estimate by the probability of a replay and the cost of a replay
- Replay cost is unknown/unmeasurable
  - Make replay cost an adjustable parameter; it depends on machine load
  - Use load as the feedback value
  - Goal: maximize retire bandwidth
- Re-prediction
  - Exponential backoff
12
Cost-adjusted Wakeup Estimate
- Being close counts
13
Cost-adjusted Wakeup Estimate, II
- After some assumptions and math, the minimum cost occurs where:
  - F(d) = R·f(d) and f(d) > R·f′(d), where R is the replay-cost estimate
- f(d) is unknown, so use a gradient-descent technique
  - The resulting update looks like a running average
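One plausible form of such a gradient-descent update is sketched below: when the prediction was too early (a replay), push the estimate up by a step weighted by the replay cost R; otherwise drift it down. The step size and weighting are illustrative assumptions, not the paper's exact update, but like the slide says, the result behaves like a running average that settles at a cost-adjusted allowance above the true delay.

```python
def update_estimate(d, actual, alpha=0.25, R=4.0):
    """Stochastic-gradient update of the wakeup-time estimate d (sketch).

    If the prediction was too early (actual > d), a replay occurred,
    so raise the estimate, weighted by the replay-cost estimate R;
    otherwise shrink the allowance slowly. alpha and R are assumed values.
    """
    if actual > d:          # mis-speculated: replay happened
        return d + alpha * R
    return d - alpha        # safe: drift back toward the true delay
```

At equilibrium the up-steps and down-steps balance, so the estimate hovers where the replay probability is roughly 1/(1+R), mirroring the cost trade-off in the minimum-cost condition.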
14
Feedback-adjusted Replay Cost
- The cost of a replay changes during execution
  - Program phases, etc.
- Add a second feedback layer
  - Observe the load on each class of functional unit and adjust the replay cost accordingly
  - To prevent wild oscillations, adjust once every 1000 cycles
  - Cheap: needs a few accumulators, and is off the critical path
- r is the estimated cost of a single replay; R = r × count
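A minimal sketch of this second feedback layer, assuming simple high/low load thresholds and step sizes (the slide gives only the 1000-cycle window; everything else here is an assumption):

```python
class ReplayCostAdjuster:
    """Adjust the estimated per-replay cost r from functional-unit load.

    Once per 1000-cycle window, raise r when the functional units are
    heavily loaded (replays are expensive then) and lower it when they
    are idle. Thresholds and step size are illustrative assumptions.
    """
    def __init__(self, r=1.0, window=1000, high=0.8, low=0.4, step=0.1):
        self.r, self.window = r, window
        self.high, self.low, self.step = high, low, step
        self.busy = 0.0      # accumulator: one per FU class in hardware
        self.cycles = 0

    def observe(self, fu_busy_fraction):
        """Called each cycle with the observed FU utilization."""
        self.busy += fu_busy_fraction
        self.cycles += 1
        if self.cycles == self.window:   # adjust only at window boundaries
            load = self.busy / self.window
            if load > self.high:
                self.r += self.step
            elif load < self.low:
                self.r = max(0.0, self.r - self.step)
            self.busy = 0.0
            self.cycles = 0
```

Adjusting only at window boundaries is what damps oscillation, and since the accumulators are read once per 1000 cycles, the logic stays off the critical path.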
15
Re-prediction
- An observation (covers 99% of instances): the actual wakeup time falls under a slope-2 line against the predicted time
- Return the instruction to the Self-Schedule Array, but with twice its previous wakeup-time estimate
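The re-prediction rule is plain exponential backoff, sketched below; the cap is an assumption added so a pathological instruction cannot double forever.

```python
def repredict(prev_estimate, max_estimate=1 << 16):
    """Exponential backoff on a wakeup mis-speculation (sketch).

    The deck observes that doubling covers ~99% of re-prediction cases,
    so the replayed instruction re-enters the Self-Schedule Array with
    twice its old wakeup-time estimate. max_estimate is an assumed cap,
    not from the talk.
    """
    return min(2 * prev_estimate, max_estimate)
```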
16
Outline
- Overview
- Analysis
- Architecture
- Experiments
- Conclusions
17
High-Level Architecture
18
Scheduler Architecture
19
Predictor Architectures
- Local allowance
20
Predictor Architectures
- Global allowance
21
Predictor Architectures
- Problem: on a miss, cannot fall back on dependency-based wakeup
  - Cycle-time constraints
- Default Predictor
  - Used on a miss in the main predictor
  - Updated the same way as the Global Allowance predictor
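A sketch of the miss path, assuming an IP-indexed prediction table (the experiments mention a 128×4 geometry) backed by a single default prediction that is updated like a global running average; the table organization, update rule, and step size here are all assumptions:

```python
class WakeupPredictor:
    """IP-indexed wakeup-time predictor with a default predictor on miss.

    A miss cannot fall back on dependency-based wakeup (cycle-time
    constraints), so it returns the default prediction instead.
    """
    def __init__(self):
        self.table = {}      # static IP -> predicted wakeup time
        self.default = 1.0   # default predictor (global, assumed initial value)

    def predict(self, ip):
        return self.table.get(ip, self.default)

    def update(self, ip, actual):
        self.table[ip] = actual
        # The default predictor updates like the global-allowance
        # predictor; a simple running average over all instructions
        # is assumed here.
        self.default += 0.1 * (actual - self.default)
```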
22
Finding the Actual Wakeup Time
- Done in the register file; conceptually:
  - Each register entry holds a valid bit and a cycle count; the count is set to zero when the register is written and counts up each cycle
  - Source register numbers index the register file; an AND of the valid bits answers "ready to execute?", and the MIN of the cycle counts, combined with the instruction's wait time, gives the actual wakeup time
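The block diagram on this slide can be sketched as follows. The MIN counter tells how many cycles ago the last source was written, so subtracting it from the instruction's total wait gives when it could first have woken; the exact operand order of the subtraction in the original diagram is ambiguous, so this direction is an assumption, as are the interface names.

```python
def actual_wakeup_time(sources, regfile, wait_time):
    """Register-file computation of the actual wakeup time (sketch).

    regfile maps register number -> (valid, cycles_since_write); the
    counter is zeroed when the register is written and counts up each
    cycle. wait_time is how long the instruction has waited so far.
    """
    ready = all(regfile[s][0] for s in sources)       # AND of valid bits
    if not ready:
        return None                                    # not ready to execute
    youngest = min(regfile[s][1] for s in sources)     # MIN of cycle counts
    # The last source arrived `youngest` cycles ago, so the instruction
    # could have woken wait_time - youngest cycles after dispatch.
    return wait_time - youngest
```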
23
Outline
- Overview
- Analysis
- Architecture
- Experiments
- Conclusions
24
Setup
- x86 trace-driven simulator
  - Fetch timing effects simulated
  - 7 traces, 26M to 100M consecutive instructions, from SPECint
- 8-wide, 18-deep pipeline
- Configurations:
  - Baseline: 1-cycle wakeup/select
  - BasePipeSched: 2-cycle wakeup/select
  - WPLocal: local wakeup prediction (128x4 predictor), r=1
  - WPLocalAdj: WPLocal + feedback-adjusted r
  - WPGlobal: global wakeup prediction (128x4 predictor), r=1
  - WPGlobalAdj: WPGlobal + feedback-adjusted r
25
Results
- 7% IPC drop
26
Ideal Fetch
- Approximates high-bandwidth fetch
  - Trace cache, etc.
- Otherwise, same as before
27
Results: Ideal Fetch
- 7% IPC drop
28
Resource-constrained
- Half the number of functional units in each class
- Uses i-cache fetch (like the first experiment)
- Otherwise, same as the others
29
Results: Resource-constrained
- 9% IPC drop
30
Other Results
- Some leeway in prediction accuracy
  - Doubling the predictions results in a 27% IPC drop
- Works consistently in deep pipelines
  - Without pipelined wakeup:
  - With pipelined wakeup:
31
Outline
- Overview
- Analysis
- Architecture
- Experiments
- Conclusions
32
Conclusions
- Likely to increase performance
  - IPC drop is ~7%
  - The performance gain from the cycle-time decrease could exceed the loss from the IPC decrease
- Feedback paths are not critical
  - Simpler design process