StageNet: A Reconfigurable CMP Fabric for Resilient Systems
Shantanu Gupta, Shuguang Feng, Jason Blome, Scott Mahlke
University of Michigan, Electrical Engineering and Computer Science
2nd Workshop on Reconfigurable and Adaptable Architecture, Dec 1, 2007

Reliability Challenge
Increasing defect rates are a major challenge [ITRS’03]
↑ power density, ↓ feature sizes, ↑ failures in time (FIT)
Permanent faults
► Manufacturing defects
► Time dependent dielectric breakdown (TDDB)
► Negative bias temperature instability (NBTI)
► Electromigration (EM)
► ….
[Srinivasan, DSN‘04] For the 32nm technology node, an 8-core CMP would face ~30 faults in 4 years

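To put the ~30 faults in 4 years figure into FIT units, the following is a rough worked conversion. It uses the standard definition of FIT (failures per 10^9 device-hours) and assumes 365-day years and faults spread evenly over time and over the 8 cores; neither assumption comes from the slides, and the function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap years


def faults_to_fit(faults: float, years: float) -> float:
    """Convert a fault count over a period into FIT (failures per 1e9 hours)."""
    hours = years * HOURS_PER_YEAR
    return faults / hours * 1e9


# Figure quoted on the slide: ~30 faults in 4 years for an 8-core CMP.
chip_fit = faults_to_fit(30, 4)
# Assumption: faults are evenly distributed across the 8 cores.
per_core_fit = chip_fit / 8

print(f"chip: {chip_fit:,.0f} FIT, per core: {per_core_fit:,.0f} FIT")
```
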
Tolerating Permanent Faults
Traditional solutions
► TMR
► Tandem / HP NonStop
► Impractical for mainstream
  - Cost
  - Power
  - Low gain
Current approaches
► Detection/Prediction
  - Using sensors
  - Analytical models
  - Redundant execution
  - BIST
► Repair
  - Replacement
  - Reconfiguration
[Figure labels: K-pos DP-31/32, Teramac (1995)]

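As a reminder of why TMR is costly, here is a minimal sketch of its voting step, assuming three replicas each produce an integer result that is compared bit by bit. The function name is illustrative and not taken from the slides.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority vote over three replica outputs.

    Each output bit takes the value that at least two replicas agree on,
    so a fault confined to one replica is masked.
    """
    return (a & b) | (b & c) | (a & c)


# Replica c is faulty in its two upper bits; the vote still returns 0b1010.
voted = tmr_vote(0b1010, 0b1010, 0b0110)
```

The masking comes at the price of running three copies of the logic, which is the cost and power overhead the slide refers to.
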
Reconfiguration Granularity
Range of choices for the reconfiguration granularity: CORE level, STAGE level, MODULE level
[Figure: pipeline with FETCH, DEC, EXEC, MEM, WB stages]
Toward CORE level: lower design complexity, lower overheads
Toward MODULE level: better resource utilization
CORE level
- ElastIC, DT’06
- Reunion, MICRO’06
- Configurable Isolation, ISCA’07
MODULE level
- Online Diagnosis of Hard Faults, MICRO’05
- Ultra Low-Cost Defect Protection, ASPLOS’06

Mean Time to Failure Comparison
[Plot: MTTF increase (%) and area increase (%) for MODULE level, STAGE level, and CORE level reconfiguration]
CORE level
+ Easiest to do in practice
-- Poorest MTTF gains
STAGE level
+ Circuit/logical boundary
+ Improved MTTF gains
-- Architectural complexity
MODULE level
+ Best MTTF gains
-- Hardest to repair

Throughput Comparison
[Plot: throughput under failures, STAGE level vs. CORE level]
STAGE level reconfiguration allows significantly more graceful throughput degradation
Monte-Carlo study
► Randomly injected failures
► Assumes that stages are shared resources

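The kind of Monte-Carlo comparison described above can be sketched as follows. This is not the authors' simulator; it is a toy model under stated assumptions: faults hit distinct (core, stage) cells uniformly at random, a core-level design loses a whole core on any fault, and a stage-level design with shared stages can form as many pipelines as the scarcest stage type allows. All names and parameter values are illustrative.

```python
import random


def simulate(num_cores, num_stages, num_faults, trials, seed=0):
    """Return (core_level, stage_level) mean throughput, normalized to [0, 1]."""
    rng = random.Random(seed)
    cells = [(c, s) for c in range(num_cores) for s in range(num_stages)]
    core_total = 0
    stage_total = 0
    for _ in range(trials):
        # Assumption: each fault disables one distinct physical stage.
        failed = rng.sample(cells, num_faults)

        # CORE level: a core works only if all of its stages are healthy.
        dead_cores = {c for c, _ in failed}
        core_total += num_cores - len(dead_cores)

        # STAGE level: stages are shared, so the number of usable pipelines
        # is limited by the stage type with the fewest healthy instances.
        failed_per_stage = [0] * num_stages
        for _, s in failed:
            failed_per_stage[s] += 1
        stage_total += num_cores - max(failed_per_stage)

    denom = trials * num_cores
    return core_total / denom, stage_total / denom


core_tp, stage_tp = simulate(num_cores=4, num_stages=5, num_faults=3, trials=2000)
print(f"core level: {core_tp:.2f}, stage level: {stage_tp:.2f}")
```

In this model the stage-level figure can never fall below the core-level one, because faults of the same stage type necessarily sit in different cores.
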
Goal of this Research
Design a computing substrate
► Fault tolerant
► Graceful performance degradation with defects
► Highly reconfigurable
► Adaptable to the workload
A design that can meet the challenge of facing ~100s of faults while maintaining 70-80% throughput

CMP Fabric
[Figure: four cores (Core 0, Core 1, Core 2, Core 3), each a pipeline of Stage1, Stage2, Stage3, ..., StageN]

StageNet CMP Fabric
[Figure: four rows of Stage1, Stage2, Stage3, ..., StageN, with a Configuration Manager and an Allocator; one logical pipeline is highlighted]

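The slides do not specify how the allocator forms logical pipelines, so the following is only a hypothetical sketch of the idea: given a health map of the physical stages, pick one healthy instance of every stage type for each logical pipeline. The function name, data layout, and the simple in-order pairing policy are all assumptions.

```python
def allocate_pipelines(healthy):
    """Form logical pipelines from healthy physical stages.

    healthy[row][stage] is True if that physical stage still works.
    Returns a list of pipelines; pipeline[stage] is the row that supplies
    that stage. Assumption: any instance of a stage type can feed any
    instance of the next stage type through the interconnect.
    """
    num_stages = len(healthy[0])
    # For each stage type, collect the rows whose instance is healthy.
    pools = [
        [row for row, stages in enumerate(healthy) if stages[s]]
        for s in range(num_stages)
    ]
    # The scarcest stage type bounds the number of pipelines.
    count = min(len(pool) for pool in pools)
    return [[pool[i] for pool in pools] for i in range(count)]


# Three rows of three stages; every row has exactly one broken stage.
health = [
    [True, False, True],
    [False, True, True],
    [True, True, False],
]
pipelines = allocate_pipelines(health)
```

In this example no row is fully intact, so a core-level scheme would have nothing left to run on, while the sketch still assembles two logical pipelines from the surviving stages.
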
StageNet CMP Fabric - Benefits
[Figure: four rows of Stage1, Stage2, Stage3, ..., StageN, with a Configuration Manager]

StageNet CMP Fabric - Issues
Performance / Efficiency
► Scaling with number of stages
► Impact of router delay
  - Transmission delay (tdelay)
  - Congestion delay
Design overheads
► Area
► Power
Micro-architectural concerns
► Data forwarding logic
► Control flow handling
[Figure labels: Allocator, 256 bits, 64]

Experimental Setup
MiBench suite -> SimpleScalar -> benchmark statistics -> StageNet Model -> CPI Results
SimpleScalar: simulates an in-order core with default parameters
Statistics stored for the benchmarks
- No. of instructions
- No. of cycles
- Branch mis-predicts
- I/D cache misses
- ….
StageNet Model: parameterizable performance model for StageNet

Effect of varying pipeline depth
[Plot; fixed parameter: tdelay = 1]

Effect of varying transmission delay
[Plot; fixed parameter: stages = 10]

Router delay is the leading cause of the slowdown
Need some way to improve system utilization
Let us send macro-ops (MOP)
► MOP is an instruction bundle
  - Upper bound on length
  - Upper bound on live-ins / live-outs
  - No branches in between
► Advantages
  - Amortizes delay / contention
  - Increases resource utilization
  - Performance enhancement
[Figure: example instruction sequence (>>, ST, LD, +, /, &, <<) grouped into MOPs with max length 4 and max live-ins 2]

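The MOP constraints listed above can be illustrated with a greedy bundling pass. This is a hypothetical sketch, not the authors' formation algorithm: it enforces only the length and live-in bounds (live-outs would need liveness information about later code), and it assumes a branch closes the current bundle and is sent on its own. The `Instr` type and register names are made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    op: str
    dest: Optional[str]
    srcs: Tuple[str, ...] = ()
    is_branch: bool = False


def live_ins(bundle: List[Instr]) -> set:
    """Registers read by the bundle before being written inside it."""
    written, ins = set(), set()
    for i in bundle:
        ins.update(s for s in i.srcs if s not in written)
        if i.dest is not None:
            written.add(i.dest)
    return ins


def form_mops(instrs, max_len=4, max_live_ins=2):
    """Greedily pack instructions into MOPs (a lone instruction is always allowed)."""
    mops, current = [], []
    for ins in instrs:
        if ins.is_branch:
            # Assumption: no branch inside a bundle; it travels alone.
            if current:
                mops.append(current)
                current = []
            mops.append([ins])
            continue
        candidate = current + [ins]
        if len(candidate) > max_len or len(live_ins(candidate)) > max_live_ins:
            if current:
                mops.append(current)
            current = [ins]
        else:
            current = candidate
    if current:
        mops.append(current)
    return mops


program = [
    Instr("LD", "r1", ("r0",)),
    Instr(">>", "r2", ("r1",)),
    Instr("+", "r3", ("r2", "r1")),
    Instr("ST", None, ("r3", "r0")),
    Instr("LD", "r4", ("r5",)),
    Instr("&", "r6", ("r4", "r7")),
    Instr("<<", "r8", ("r6", "r9")),
    Instr("BR", None, ("r8",), is_branch=True),
]
mops = form_mops(program)
```

With the slide's limits (max length 4, max live-ins 2), the first four instructions share a single live-in and travel together, so the router delay is paid once for the bundle rather than once per instruction.
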
Effect of varying MOP size
[Plot; fixed parameters: tdelay = 4, stages = 10]

Conclusions
Reliability-aware architectures with finer-grained reconfiguration are desirable for:
► Better MTTF gains
► Graceful throughput degradation
StageNet, a potential solution, allows stage level reconfiguration and is:
► Easy to reconfigure
► Inherently redundant
► Potentially scalable in issue width
Using StageNet, significant reconfiguration flexibility can be gained at the cost of a small loss in performance

Future Work
Micro-architectural issues
► Data bypass handling
► Control flow handling
► Sharing state between pipeline stages
Network design
► Design of routers
► Design of interconnection
Simulation setup
► Validation of results using a cycle-accurate simulator

StageNet: A Reconfigurable CMP Fabric for Resilient Systems

Backup slides

Repair
- ElastIC, DT’06
- H. Qin, UC Berkeley
- F. Bower, Tolerating Hard Faults in Microprocessor Array Structures, DSN’04