Published by Baldwin Watts. Modified over 9 years ago.
1
Dan Ernst – ISCA-30 – 6/10/03 Advanced Computer Architecture Lab The University of Michigan
Cyclone: A Low-Complexity Broadcast-Free Dynamic Instruction Scheduler
Dan Ernst, Andrew Hamel, Todd Austin
Advanced Computer Architecture Lab, The University of Michigan
2
Challenges in High-Speed Dynamic Scheduling
Broadcast-based dynamic scheduler circuits:
– Are highly complex
– Are power-hungry
– Scale poorly
Global synchronization is becoming increasingly expensive
– More pipeline stages + slow, long wires + increasing clock speeds = difficult global signal design
– Example: pipeline stalling
Memory scheduling is a “second-class citizen”
– Non-deterministic latencies don’t fit well into the currently popular dynamic scheduling paradigm
3
Goals
1) Design a competitive, completely broadcast-free scheduler
– Minimize global synchronization
2) Address memory scheduling in a “first-class” way
3) Minimize “loose loops”
4
Difference in Approaches
From an instruction’s point of view, scheduling is just figuring out how long to wait.
Broadcast approach
– The instruction’s schedule is “recomputed” every cycle
– Polling (“Can I go now? How about now?”)
Cyclone approach
– Schedule based on a single timing computation
– The instruction is given an execution time once, so no re-computation is needed
– It is placed in a timed “router” that executes the schedule as best it can
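The contrast between polling and a single timing computation can be illustrated with a toy model. This is a minimal sketch, not the authors’ simulator: it assumes unlimited issue width, exact latencies (latency ≥ 1), and a program given in dependence order; the names `Instr`, `polling_schedule`, and `timed_schedule` are hypothetical. With exact latencies both models produce the same schedule, but the polling model re-checks waiting instructions every cycle while the timed model does one computation per instruction.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    srcs: list[str]   # names of producer instructions (assumed to exist in prog)
    latency: int      # cycles from issue until the result is usable (assumed >= 1)


def polling_schedule(prog):
    """Broadcast-style toy model: every cycle, each waiting instruction
    re-checks ("Can I go now?") whether all of its producers are done."""
    ready_at = {}     # producer name -> cycle its result becomes available
    issue = {}        # instruction name -> issue cycle
    checks = 0
    waiting = list(prog)
    cycle = 0
    while waiting:
        still_waiting = []
        for ins in waiting:
            checks += 1
            if all(s in ready_at and ready_at[s] <= cycle for s in ins.srcs):
                issue[ins.name] = cycle
                ready_at[ins.name] = cycle + ins.latency
            else:
                still_waiting.append(ins)
        waiting = still_waiting
        cycle += 1
    return issue, checks


def timed_schedule(prog):
    """Cyclone-style toy model: compute each issue time once, in program
    order, as the latest ready cycle among the instruction's producers."""
    ready_at = {}
    issue = {}
    for ins in prog:
        t = max((ready_at[s] for s in ins.srcs), default=0)
        issue[ins.name] = t
        ready_at[ins.name] = t + ins.latency
    return issue, len(prog)   # exactly one timing computation per instruction


prog = [
    Instr("I0", [], 3),
    Instr("I1", ["I0"], 1),
    Instr("I2", ["I1"], 1),
    Instr("I3", [], 1),
]
print(polling_schedule(prog))  # ({'I0': 0, 'I3': 0, 'I1': 3, 'I2': 4}, 11)
print(timed_schedule(prog))    # ({'I0': 0, 'I1': 3, 'I2': 4, 'I3': 0}, 4)
```

The timed model only matches the polling model while latency predictions are correct; handling mispredictions is what the replay mechanism on later slides is for.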
5
Conceptual Overview
[Figure: an instruction I passes through a Timing Predictor, which tags it with an execution time (I@tx), then through a Routing/Timing Network and a Dependence Check before reaching the FU; the labels I@tx+ and I@tx’ also appear in the network.]
6
Pre-scheduler Design
[Figure: (a) pre-scheduler stages PSCHED 0 and PSCHED 1, built from a timing table, max and + units, a dependence check with MUX control, and a “reschedule?” decision, processing instructions I0–I3; (b) example schedule.]
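The labels on this slide (timing table, max, +, dep check, MUX control) suggest the usual shape of such a computation: an instruction’s predicted issue time is the maximum of its operands’ predicted ready times, and its result’s ready time is that plus its latency. The sketch below is a hypothetical behavioral model under that assumption, not the circuit from the talk. `Op`, `preschedule_group`, and the dictionary-based timing table are illustrative names; the inner loop stands in for the dependence check and MUX that forward a time computed earlier in the same decode group, since parallel table reads cannot yet see it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Op:
    dest: Optional[str]   # destination register, or None if no register result
    srcs: tuple           # source registers
    latency: int          # predicted execution latency in cycles


def preschedule_group(group, timing_table, now):
    """Predict an issue cycle for each op in one decode group (hypothetical model).

    timing_table maps register -> cycle its value is predicted ready.
    All table reads for the group happen "in parallel", so they miss values
    produced earlier in the same group; an intra-group dependence check finds
    such producers and a mux substitutes the freshly computed ready time.
    """
    table_reads = [[timing_table.get(s, now) for s in op.srcs] for op in group]
    issue_times = []
    for i, op in enumerate(group):
        operand_times = []
        for j, src in enumerate(op.srcs):
            t = table_reads[i][j]                 # possibly stale table value
            for k in range(i - 1, -1, -1):        # dependence check: nearest earlier writer
                if group[k].dest == src:
                    t = issue_times[k] + group[k].latency   # mux: forward in-group time
                    break
            operand_times.append(t)
        issue_times.append(max([now, *operand_times]))      # "max": wait for latest operand
    for op, t in zip(group, issue_times):
        if op.dest is not None:
            timing_table[op.dest] = t + op.latency          # "+": ready after latency
    return issue_times


table = {}
group = [
    Op("r1", (), 3),            # e.g. a load predicted to hit in 3 cycles
    Op("r2", ("r1",), 1),       # depends on r1 within the same group
    Op("r3", ("r2", "r4"), 1),  # r4 was never written: treated as ready now
    Op(None, ("r1",), 1),       # e.g. a store reading r1
]
print(preschedule_group(group, table, now=0))  # [0, 3, 4, 3]
print(table)                                   # {'r1': 3, 'r2': 4, 'r3': 5}
```

Computed times are then carried with each instruction, which is what lets later stages avoid re-computing the schedule.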
7
Cyclone Scheduler
[Figure: components shown are the branch predictor, store set predictor, instruction pre-scheduler, countdown/replay queue, main queue (includes timing information), switchback datapaths, a REGEX/MEMSCHED stage with ready bits, the register file, bypass, functional units, and a “replay?” decision. An animation counts instruction I0 down toward execution, marking it “Not ready!” and then “Ready!”.]
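The countdown, ready-bit check, and “replay?” decision on this slide can be approximated behaviorally. In this minimal sketch a heap stands in for the countdown hardware: each instruction waits until its predicted cycle, then checks whether its operands are actually ready; if not, it is replayed a fixed `replay_delay` cycles later. The tuple format, `run_countdown`, and the fixed replay delay are hypothetical simplifications, not the queue organization or replay latency of the real design.

```python
import heapq


def run_countdown(prog, replay_delay=1, max_cycle=1000):
    """Behavioral sketch of countdown + replay (hypothetical model).

    prog: list of (name, srcs, predicted_issue_cycle, actual_latency).
    """
    ready_cycle = {}   # name -> cycle its result is actually available ("ready bits")
    pending = [(pred, idx) for idx, (_, _, pred, _) in enumerate(prog)]
    heapq.heapify(pending)
    issued, replays = {}, {}
    while pending:
        cycle, idx = heapq.heappop(pending)      # countdown for this instruction reached zero
        if cycle > max_cycle:
            raise RuntimeError("instruction never became ready")
        name, srcs, _, latency = prog[idx]
        if all(ready_cycle.get(s, cycle + 1) <= cycle for s in srcs):
            issued[name] = cycle                 # "Ready!": send to a functional unit
            ready_cycle[name] = cycle + latency
        else:
            replays[name] = replays.get(name, 0) + 1   # "Not ready!": replay
            heapq.heappush(pending, (cycle + replay_delay, idx))
    return issued, replays


# The pre-scheduler assumed the load hits (2 cycles), so it placed add at
# cycle 2 and sub at cycle 3; in this run the load actually takes 5 cycles.
prog = [
    ("ld",  [],      0, 5),
    ("add", ["ld"],  2, 1),
    ("sub", ["add"], 3, 1),
]
print(run_countdown(prog))  # ({'ld': 0, 'add': 5, 'sub': 6}, {'add': 3, 'sub': 3})
```

When the latency prediction is correct (load latency 2), no instruction replays; the misprediction case shows how the dependence chain is retried without any broadcast.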
8
Cyclone – Switchback Conflict
[Figure: the Cyclone scheduler diagram (pre-scheduler, countdown/replay queue, main queue, switchback datapaths, REGEX/MEMSCHED, functional units), animated to show a switchback conflict.]
9
Cyclone – Switchback Conflict (continued)
[Figure: next step of the switchback-conflict animation.]
10
Architectural Methodology
Baseline architectural model
– Derived from SimpleScalar 3.0
– More sophisticated scheduling support:
  Separated ROB and RS
  Variable-length pipelines
  Selective scheduler replay on memory latency misprediction
– Store set predictor
Cyclone model
– Replaced the scheduling portion of the pipeline with the Cyclone model
– Added timing information to the store set predictor
Simulated SPEC2000 (INT and FP)
11
Cyclone IPC
12
Circuit Timing and Area Methodology
Timing – SPICE models
– Critical paths of Cyclone
  Switchback paths were very fast
  The pre-scheduler dependence check was the critical path
– CAM-style broadcast windows
  Used models from last year’s ISCA (Tag Elimination)
– Both used the TSMC 0.18 µm process at 1.8 V
– Presented here as throughput (IPns)
Area Analysis – Register Bit Equivalent (RBE)
– Process-independent analytical model of chip area
– Assumed RAM/CAM area scaled quadratically with the number of ports
– Modeled scheduler structures and extra tables (also the RF)
– More information in Mulder, Quach, and Flynn [17]
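The two metrics on this slide reduce to simple arithmetic. Throughput in instructions per nanosecond is IPC divided by the clock period in nanoseconds, which is why a design that lowers IPC can still win if it shortens the cycle. The area function below applies only the stated quadratic port-scaling assumption; it is a simplified illustration, not the full RBE model from Mulder, Quach, and Flynn. The function names and all numbers in the usage lines are hypothetical, not results from the talk.

```python
def throughput_ipns(ipc, cycle_time_ns):
    """Instructions per nanosecond = instructions per cycle / nanoseconds per cycle."""
    return ipc / cycle_time_ns


def quadratic_port_area(base_area_rbe, base_ports, ports):
    """Scale a RAM/CAM area (in RBE) assuming area grows with the square
    of the port count. Simplified reading of the stated assumption only."""
    return base_area_rbe * (ports / base_ports) ** 2


# Hypothetical numbers for illustration:
print(throughput_ipns(2.0, 1.0))         # 2.0  (higher IPC, slower clock)
print(throughput_ipns(1.5, 0.5))         # 3.0  (lower IPC, faster clock)
print(quadratic_port_area(100.0, 4, 8))  # 400.0  (doubling ports quadruples area)
```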
13
4-wide Complexity Analysis
14
8-wide Complexity Analysis
15
Design Space Overview
16
Complexity Options
Run at a higher frequency
– Deeper pipelines
Make the total scheduler size larger
– Increase IPC
Run at the same frequency
– Much lower power
17
Conclusions
Competitive broadcast-free scheduling
– Allows high-speed circuits at the expense of IPC
– Saves chip area
Power savings…
Alternative to stalling
– Avoids broadcasting across stages by using the replay mechanism
18
Future Directions…
Close the IPC gap
– Wider queues?
Complete the power analysis
– Trade-off between size and activity rate
Further opportunities to pipeline the control system
– Global synchronization without fast global communication
20
Cyclone Extras
21
Current/Future Work
Pipelined Global Control – Low Power
Razor
– Average-case design opportunities
Simple and effective selective replay implementations (WDDD)
– Spawned from previous work (Tag Elimination – ISCA ’02)
Removing as much global control as possible from pipelines