University of Michigan Electrical Engineering and Computer Science 1 StageNet: A Reconfigurable CMP Fabric for Resilient Systems Shantanu Gupta Shuguang.

Presentation transcript:

University of Michigan Electrical Engineering and Computer Science
1 StageNet: A Reconfigurable CMP Fabric for Resilient Systems
Shantanu Gupta, Shuguang Feng, Jason Blome, Scott Mahlke
2nd Workshop on Reconfigurable and Adaptable Architecture, Dec 1, 2007

2 Reliability Challenge
Increasing defect rates are a major challenge [ITRS'03]: ↑ power density, ↓ feature sizes → ↑ failures in time (FIT)
Permanent faults
► Manufacturing defects
► Time-dependent dielectric breakdown (TDDB)
► Negative bias temperature instability (NBTI)
► Electromigration (EM)
► ...
[Srinivasan, DSN'04]
For the 32nm technology node, an 8-core CMP would face ~30 faults in 4 years
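As a rough sanity check on the "~30 faults in 4 years" figure, the sketch below converts it into a mean time between faults and an equivalent chip-level FIT rate. It assumes the standard definition of FIT (failures per 10^9 device-hours), a constant fault rate, and continuous operation at 8760 hours per year; the function names are illustrative and do not come from the slides.

```python
HOURS_PER_YEAR = 8760  # assumption: continuous operation, non-leap year


def mean_hours_between_faults(faults: float, years: float) -> float:
    """Average operating hours between consecutive faults (constant-rate assumption)."""
    return years * HOURS_PER_YEAR / faults


def fit_rate(faults: float, years: float) -> float:
    """Failures per 10^9 device-hours for the whole chip."""
    return faults / (years * HOURS_PER_YEAR) * 1e9


# Figures quoted on the slide: ~30 faults over 4 years for an 8-core CMP.
mtbf_hours = mean_hours_between_faults(30, 4)
chip_fit = fit_rate(30, 4)
print(f"one fault every {mtbf_hours:.0f} hours, chip FIT ~ {chip_fit:.0f}")
```

Under these assumptions the chip sees one new permanent fault roughly every 1168 hours, which is why a repair mechanism that can absorb many faults over the product lifetime matters.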

3 Tolerating Permanent Faults
Traditional solutions
► TMR
► Tandem / HP NonStop
► Impractical for mainstream: cost, power, low gain
Current approaches
► Detection/Prediction: using sensors, analytical models, redundant execution, BIST
► Repair: replacement, reconfiguration
K-pos DP-31/32, Teramac (1995)

4 Reconfiguration Granularity
Range of choices for the reconfiguration granularity: CORE level, STAGE level, MODULE level
[Figure: a pipeline (FETCH, DEC, EXEC, MEM, WB) shown across the three granularities, annotated with "Lower design complexity", "Lower overheads", and "Better resource utilization"]
- ElastIC, DT'06
- Reunion, MICRO'06
- Configurable Isolation, ISCA'07
- Online Diagnosis of Hard Faults, MICRO'05
- Ultra Low-Cost Defect Protection, ASPLOS'06

5 Mean Time to Failure Comparison
[Plot: MTTF increase (%) versus area increase (%) for MODULE-level, STAGE-level, and CORE-level reconfiguration]
CORE level
+ Easiest to do in practice
-- Poorest MTTF gains
STAGE level
+ Circuit/logical boundary
+ Improved MTTF gains
-- Architectural complexity
MODULE level
+ Best MTTF gains
-- Hardest to repair

6 Throughput Comparison
[Plot: STAGE level versus CORE level]
STAGE-level reconfiguration allows significantly more graceful throughput degradation
Monte-Carlo study: randomly injected failures; assumes that stages are shared resources
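The kind of Monte-Carlo comparison described on this slide can be sketched as follows. This is not the authors' experiment: the fabric size (8 cores, 5 stages each), the trial count, the rule that each fault disables one distinct stage, and the use of "number of complete pipelines" as the throughput measure are all assumptions made for illustration. Router delay is ignored.

```python
import random


def working_pipelines(faulty, n_cores, n_stages, stage_level):
    """Count usable pipelines given a set of faulty (core, stage) pairs."""
    if stage_level:
        # Stages are shared: the scarcest stage type limits the pipeline count.
        return min(
            sum((c, s) not in faulty for c in range(n_cores))
            for s in range(n_stages)
        )
    # Core-level repair: a core survives only if every one of its stages is healthy.
    return sum(
        all((c, s) not in faulty for s in range(n_stages))
        for c in range(n_cores)
    )


def simulate(n_faults, n_cores=8, n_stages=5, trials=2000, seed=0):
    """Return (core_level, stage_level) mean throughput, normalised to a fault-free chip."""
    rng = random.Random(seed)
    all_stages = [(c, s) for c in range(n_cores) for s in range(n_stages)]
    core_total = stage_total = 0
    for _ in range(trials):
        faulty = set(rng.sample(all_stages, n_faults))  # distinct random failures
        core_total += working_pipelines(faulty, n_cores, n_stages, False)
        stage_total += working_pipelines(faulty, n_cores, n_stages, True)
    norm = trials * n_cores
    return core_total / norm, stage_total / norm


for faults in (0, 5, 10, 20):
    core, stage = simulate(faults)
    print(f"{faults:2d} faults: core-level {core:.2f}, stage-level {stage:.2f}")
```

In this simplified model stage-level sharing can never do worse than core-level disabling, because every fully healthy core contributes one healthy copy of each stage type.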

7 Goal of this Research
Design a computing substrate
► Fault tolerant
► Graceful performance degradation with defects
► Highly reconfigurable
► Adaptable to the workload
A design that can meet the challenge of facing ~100s of faults while maintaining 70-80% throughput

8 CMP Fabric
[Diagram: four cores (Core 0 to Core 3), each a pipeline of Stage1, Stage2, Stage3, ..., StageN]

9 StageNet CMP Fabric
[Diagram: four rows of stages (Stage1, Stage2, Stage3, ..., StageN); labels: Configuration Manager, Allocator, Logical pipeline]
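One way to picture what the configuration manager does is the sketch below, which stitches logical pipelines together from whatever healthy stages remain. The slides do not specify the allocation policy; the greedy "take the next healthy copy of each stage" rule, the stage names (borrowed from the granularity slide, in an assumed order), and the function name are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Stage names taken from the granularity slide; the ordering is assumed.
STAGES = ["FETCH", "DEC", "EXEC", "MEM", "WB"]


def allocate_pipelines(n_rows: int, faulty: Set[Tuple[int, str]]) -> List[Dict[str, int]]:
    """Build logical pipelines by picking one healthy copy of every stage type.

    Each returned dict maps a stage name to the row (original core) that supplies it.
    """
    healthy = {
        stage: [row for row in range(n_rows) if (row, stage) not in faulty]
        for stage in STAGES
    }
    # The scarcest stage type bounds how many complete pipelines can be formed.
    n_pipelines = min(len(rows) for rows in healthy.values())
    return [
        {stage: healthy[stage][i] for stage in STAGES}
        for i in range(n_pipelines)
    ]


# Three faults spread over three different cores of a 4-core fabric.
faults = {(0, "EXEC"), (1, "FETCH"), (2, "EXEC")}
for pipeline in allocate_pipelines(4, faults):
    print(pipeline)
```

With these three faults only one core is entirely fault-free, yet two logical pipelines can still be formed because the healthy stages of the damaged cores are reused.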

10 StageNet CMP Fabric - Benefits
[Diagram: four rows of stages (Stage1, Stage2, Stage3, ..., StageN) with a Configuration Manager]

11 StageNet CMP Fabric - Issues
Performance / Efficiency
► Scaling with number of stages
► Impact of router delay: transmission delay (tdelay), congestion delay
Design overheads
► Area
► Power
Micro-architectural concerns
► Data forwarding logic
► Control flow handling
[Diagram labels: Allocator, 256 bits, 64]

12 Experimental Setup
MiBench suite → SimpleScalar → statistics → StageNet model → CPI results
SimpleScalar: simulates an in-order core with default parameters
Statistics: stored for the benchmarks
- No. of instructions
- No. of cycles
- Branch mis-predicts
- I/D cache misses
- ...
StageNet model: parameterizable performance model for StageNet
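The slides do not give the equations behind the StageNet performance model, so the following is only a hypothetical first-order sketch of how such a parameterizable model could turn baseline statistics into a CPI estimate. The two penalty terms (a longer refill after each branch mis-predict and a stall for each dependent instruction that can no longer use local forwarding) and the `dependent_fraction` parameter are assumptions, not the authors' model.

```python
from dataclasses import dataclass


@dataclass
class BenchStats:
    """Per-benchmark statistics collected from the baseline in-order simulation."""
    instructions: int
    cycles: int
    branch_mispredicts: int


def stagenet_cpi(stats: BenchStats, stages: int, tdelay: int,
                 dependent_fraction: float = 0.3) -> float:
    """Estimate CPI when every stage-to-stage hop costs `tdelay` extra cycles.

    Hypothetical model:
      - each mis-predict refills the pipeline across (stages - 1) router hops;
      - a fraction of instructions waits one hop for an operand that would
        otherwise have been forwarded.
    """
    refill_penalty = stats.branch_mispredicts * (stages - 1) * tdelay
    dependency_stalls = dependent_fraction * stats.instructions * tdelay
    return (stats.cycles + refill_penalty + dependency_stalls) / stats.instructions


# Made-up statistics, used only to exercise the model.
stats = BenchStats(instructions=1_000_000, cycles=1_500_000, branch_mispredicts=20_000)
for tdelay in (0, 1, 2, 4):
    print(tdelay, round(stagenet_cpi(stats, stages=5, tdelay=tdelay), 3))
```

Setting `tdelay=0` recovers the baseline CPI, and sweeping `stages` or `tdelay` mirrors the kind of sensitivity study the following slides report.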

13 Effect of varying pipeline depth
[Plot: tdelay = 1]

14 Effect of varying transmission delay
[Plot: stages = 10]

15
Router delay is the leading cause of the slowdown
Need some way to improve system utilization
Let us send macro-ops (MOP)
► A MOP is an instruction bundle: upper bound on length, upper bound on live-ins / live-outs, no branches in between
► Advantages: amortizes delay / contention, increases resource utilization, performance enhancement
[Example figure: instructions (>>, ST, LD, +, /, &, <<) grouped into MOPs with max length 4 and max live-ins 2]
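The MOP constraints listed above can be illustrated with a small greedy bundler. The slides do not describe the actual formation algorithm, so this sketch is an assumption throughout: it scans instructions in program order, closes a bundle when adding the next instruction would exceed the length or live-in limit, and treats a branch as the last instruction of its bundle. The live-out bound is omitted because it needs liveness information beyond the bundle. The `Instr` type and register names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Instr:
    op: str
    dst: Tuple[str, ...]   # registers written
    srcs: Tuple[str, ...]  # registers read
    is_branch: bool = False


def live_ins(bundle: List[Instr]) -> Set[str]:
    """Registers read by the bundle before any instruction in it defines them."""
    defined: Set[str] = set()
    needed: Set[str] = set()
    for ins in bundle:
        needed.update(s for s in ins.srcs if s not in defined)
        defined.update(ins.dst)
    return needed


def form_mops(instrs: List[Instr], max_len: int = 4, max_live_ins: int = 2) -> List[List[Instr]]:
    """Greedily pack instructions into macro-ops under the length and live-in bounds."""
    mops: List[List[Instr]] = []
    current: List[Instr] = []
    for ins in instrs:
        candidate = current + [ins]
        if current and (len(candidate) > max_len or len(live_ins(candidate)) > max_live_ins):
            mops.append(current)   # close the bundle that can no longer grow
            current = [ins]
        else:
            current = candidate
        if ins.is_branch:          # no branch may sit in the middle of a MOP
            mops.append(current)
            current = []
    if current:
        mops.append(current)
    return mops


program = [
    Instr("LD", ("r1",), ("r0",)),
    Instr(">>", ("r2",), ("r1",)),
    Instr("+", ("r3",), ("r2", "r4")),
    Instr("ST", (), ("r3", "r5")),
    Instr("&", ("r6",), ("r3", "r5")),
    Instr("BR", (), ("r6",), is_branch=True),
    Instr("+", ("r7",), ("r6", "r6")),
]
for mop in form_mops(program):
    print([ins.op for ins in mop])
```

A single instruction that by itself exceeds the live-in bound still forms its own one-instruction bundle in this sketch; a real implementation would need a policy for that case.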

16 Effect of varying MOP size
[Plot: tdelay = 4, stages = 10]

17 Conclusions
Reliability-aware architectures with finer-grained reconfiguration are desirable for:
► Better MTTF gains
► Graceful throughput degradation
StageNet, a potential solution, allows stage-level reconfiguration and is:
► Easy to reconfigure
► Inherently redundant
► Potentially scalable in issue width
Using StageNet, significant reconfiguration flexibility can be obtained in exchange for a small loss in performance

18 Future Work
Micro-architectural issues
► Data bypass handling
► Control flow handling
► Sharing state between pipeline stages
Network design
► Design of routers
► Design of interconnection
Simulation setup
► Validation of results using a cycle-accurate simulator

19 StageNet: A Reconfigurable CMP Fabric for Resilient Systems

20 Back up slides

21 Repair
ElastIC, DT'06
H. Qin, UC Berkeley
F. Bower, Tolerating Hard Faults in Microprocessor Array Structures, DSN'04