TEMPLATE DESIGN © 2008 www.PosterPresentations.com


Diagnosing Intermittent Faults Using Software Techniques
Layali Rashid, Karthik Pattabiraman and Sathish Gopalakrishnan
The University of British Columbia

[Figure: processor pipeline with PC, I-Cache, IF/ID stage, two decoders (DEC 1, DEC 2), ID/EXE stage, register file, instruction scheduler with enable lines, integer ALUs, EXE/MEM stage, D-Cache and MEM/WB stage]

Intermittent Faults
- Intermittent hardware faults are bursts of errors that occur at the same location and last from a few cycles to a few seconds.
- Intermittent faults will be a significant concern in future processors.
[Figure: a transient fault versus intermittent faults over program execution time, leading to failure]

Research Motivation
- Diagnosis is vital in guiding fine-grained recovery techniques (e.g., hardware reconfiguration), and hence in allowing the processor to keep operating with degraded performance.
- If core 8 malfunctions, two recovery options are available:
  1. Without fine-grained diagnosis, the whole of core 8 is disabled.
  2. With fine-grained diagnosis, only part of core 8 is disabled.

Diagnosis Technique Goals
- Requires no hardware support.
- Provides formal guarantees of correctness and completeness.
- Scalable.
- Few false positives.

Modeling the Impact of Intermittent Faults on Programs - Example
    mov  R1, #5
    mov  R2, #6
    mov  R3, #7
    ld   R4, R1, Array_Addr
    ld   R5, R2, Array_Addr
    ld   R6, R3, Array_Addr
    mult R7, R5, R4
[Figure labels: Array_Addr, #5, #6, #7, ...]

Modeling the Impact of Intermittent Faults on Programs - Results
- The dynamic dependency graph (DDG) model is more than two orders of magnitude faster than equivalent fault-injection experiments.
- For 89 to 93% of the faults, the crash distance (measured from the fault's start) is within 100 nodes.

Overview of the Diagnosis Approach
[Diagram: diagnosis steps, including "Isolate Fault-Prone Unit"]

Overview of the Diagnosis Approach - Example
Identify Erroneous Data
- An intermittent fault affected nodes 14-18.
- Crash instruction: node 27.
- Erroneous data: nodes 14, 16, 17, 19 and 21.
- Expected fault span: nodes 14-19, versus the actual span of nodes 14-18.
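To make the DDG model concrete, here is a minimal Python sketch that builds a register-only dynamic dependency graph for the seven-instruction example program and computes which nodes a corrupted value would propagate to. It is an illustration under stated assumptions, not the authors' model: the `Instr` record, `build_ddg` and `forward_slice` are hypothetical names, immediates and the memory contents at Array_Addr are ignored, and each node is one dynamic instruction numbered from 1 in execution order.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    op: str
    dest: str
    srcs: tuple  # source registers only; immediates and Array_Addr are ignored (assumption)


# The example program from the poster, one node per executed instruction.
PROGRAM = [
    Instr("mov", "R1", ()),
    Instr("mov", "R2", ()),
    Instr("mov", "R3", ()),
    Instr("ld", "R4", ("R1",)),
    Instr("ld", "R5", ("R2",)),
    Instr("ld", "R6", ("R3",)),
    Instr("mult", "R7", ("R5", "R4")),
]


def build_ddg(program):
    """Return {node: set of parent nodes}, linking each source register
    to the most recent instruction that wrote it."""
    last_writer = {}
    parents = {}
    for node, ins in enumerate(program, start=1):
        parents[node] = {last_writer[r] for r in ins.srcs if r in last_writer}
        last_writer[ins.dest] = node
    return parents


def forward_slice(parents, start):
    """All nodes that transitively consume the value produced by `start`.
    A single pass in node order suffices because parents always precede children."""
    affected = {start}
    for node in sorted(parents):
        if node > start and parents[node] & affected:
            affected.add(node)
    return affected
```

For instance, `forward_slice(build_ddg(PROGRAM), 1)` gives `{1, 4, 7}`: a fault in the first `mov` reaches the load into R4 and then the `mult`. A full model would also need memory dependencies, which this sketch deliberately omits.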
[Figure: DDG with nodes 1 to 7 and an intermittent error]

- Use a dynamic dependency graph (DDG).
- Of the intermittent faults that are non-benign, 95% result in a program crash.
- 91 to 95% of the faults cause the program to crash within 300 nodes of the fault's start.

Diagnosis Steps
1. Identify erroneous data: back trace erroneous data in the DDG.
2. Identify instructions that change erroneous data.
3. Isolate instructions first affected by the fault.

Operating Systems Directions
- Map tasks to cores based on each core's functioning units and each task's requirements.
- Modify a program on the fly to avoid using malfunctioning units.
- Provide feedback to the instruction scheduler about the malfunctioning units, so that minimal performance overhead is incurred.

Conclusions
- Diagnosis is vital in guiding fine-grained recovery.
- Diagnosing intermittent faults using software techniques is possible.
- Most intermittent faults cause the program to crash shortly after the fault's start.

Contact Information
Layali Rashid, PhD Candidate
Department of Electrical and Computer Engineering
The University of British Columbia
lrashid@ece.ubc.ca
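The diagnosis steps (back tracing erroneous data in the DDG, isolating the instructions first affected by the fault, and then isolating the fault-prone unit) can be sketched in Python as follows. This is a hypothetical illustration, not the authors' algorithm: it assumes the set of erroneous nodes is already known (for example, by comparison against a reference run, which the poster does not describe), and `backward_slice`, `first_affected`, `likely_faulty_unit`, `diagnose` and the toy unit assignment are all invented names and data.

```python
from collections import Counter


def backward_slice(parents, crash):
    """Nodes the crashing instruction transitively depends on (itself included)."""
    seen, stack = set(), [crash]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def first_affected(parents, erroneous):
    """Erroneous nodes whose inputs were all correct: the fault must have
    corrupted them directly rather than propagating from a parent."""
    return {n for n in erroneous if not (parents.get(n, set()) & erroneous)}


def likely_faulty_unit(first_nodes, unit_of):
    """Hypothetical heuristic: blame the unit that executed most of the
    first-affected instructions; None if there are none."""
    counts = Counter(unit_of[n] for n in first_nodes)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def diagnose(parents, crash, erroneous, unit_of):
    # Only erroneous values that the crash actually depends on are suspects.
    suspects = backward_slice(parents, crash) & erroneous
    first = first_affected(parents, suspects)
    return first, likely_faulty_unit(first, unit_of)


# Toy example (hypothetical DDG and unit assignment, not data from the poster).
TOY_PARENTS = {1: set(), 2: set(), 3: {1}, 4: {2}, 5: {3, 4}, 6: {5}}
TOY_UNITS = {3: "Integer ALU1", 4: "Integer ALU1", 5: "Integer ALU2", 6: "Integer ALU2"}
```

With the toy data, `diagnose(TOY_PARENTS, 6, {3, 4, 5, 6}, TOY_UNITS)` isolates nodes 3 and 4 as first affected (their inputs, nodes 1 and 2, are correct) and blames "Integer ALU1". The majority vote is only one possible way to map instructions to a unit; the poster does not specify this mapping.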

