Test Economics for Homogeneous Manycore Systems Lin Huang† and Qiang Xu†‡ †CUhk REliable computing laboratory.

Presentation transcript:

Test Economics for Homogeneous Manycore Systems
Lin Huang† and Qiang Xu†‡
†CUhk REliable computing laboratory (CURE), Department of Computer Science & Engineering, The Chinese University of Hong Kong
‡CAS-CUHK Shenzhen Institute of Advanced Integration Technology

Observations on Manufacturing Test Cost
Manufacturing test is responsible for achieving sufficiently high defect coverage
As technology advances …
– Test patterns that target more kinds of errors become essential
– Accelerated testing methods (e.g., burn-in test) become difficult
Manufacturing test cost is a large share of production cost
– In particular, burn-in cost can range from 5% to 40% of production cost
If we are able to relax the coverage requirement, manufacturing test cost can be dramatically reduced

Manycore Processor Era Provides Us an Opportunity
The integration of a large number of cores on a single silicon die is increasingly popular in the industry
Traditional yield-driven redundant cores aim to improve the manufacturing yield
We propose to introduce a few test cost-driven redundant cores, in addition to yield-driven spares, for test cost reduction
If test cost reduction exceeds the manufacturing cost increment, the total production cost can be reduced

Manycore Processor Era Provides Us an Opportunity
If test cost reduction exceeds the manufacturing cost increment, the total production cost can be reduced
Consider a 16-core processor
To guarantee that all 16 cores work well provided they pass manufacturing test, we need …
– Very high defect coverage to identify killer defects
– Sufficient burn-in to weed out chips with latent defects
With spare cores, manufacturing test is responsible for only 16 out of 20 cores (instead of all 20 cores) working
– Defect coverage requirement can be lowered
– Burn-in test can be reduced or eliminated
– Manufacturing cost increases

Agenda
Background
Test Economics with Partial/No Burn-In
Test Economics with Partial Manufacturing Test
Experimental Results
Conclusion

Basics in Yield Modeling
Defects on chip
– Negative-binomial distribution
Defect type
– Killer defects
– Latent defects
Bathtub curve
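
As a rough illustration of the negative-binomial defect model named on this slide, the sketch below evaluates the widely used negative-binomial yield formula Y = (1 + A·D0/α)^(−α), i.e. the probability that a block has zero defects. The parameter names (`area`, `defect_density`, `alpha`) and the numbers in the usage lines are assumptions for illustration only; the talk's own parameter values are not preserved in this transcript.

```python
def negative_binomial_yield(area: float, defect_density: float, alpha: float) -> float:
    """Probability that a block of the given area contains zero defects.

    Assumes the defect count is negative-binomially distributed with
    mean area * defect_density and clustering parameter alpha.
    """
    mean_defects = area * defect_density
    return (1.0 + mean_defects / alpha) ** (-alpha)


if __name__ == "__main__":
    # Hypothetical numbers: 0.5 cm^2 core, 0.4 defects/cm^2, alpha = 2.
    print(negative_binomial_yield(0.5, 0.4, 2.0))
```

A smaller `alpha` models stronger defect clustering; for the same mean defect count it gives a higher zero-defect probability.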

Problem 1 [Partial Burn-In]
Enable partial/no burn-in test only
Given the defect coverage requirement, we consider introducing redundant cores into a manycore system that functions if no fewer than a required number of cores are defect-free
– We fabricate more cores on a chip than the system requires
– Chips on which all cores pass the test are sold
– Eventually we need to guarantee that the required number of cores are defect-free at the end of infant mortality
Determine the number of burn-in driven spares and the burn-in time such that …
– The production cost per sold chip is minimized
– The product quality constraint is met

The Impact of Partial Burn-In
The reliability induced by latent defects follows a Weibull distribution with decreasing failure rate
Assume that all latent defects reveal themselves after the full burn-in time
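
To make the "Weibull distribution with decreasing failure rate" concrete, here is a minimal sketch of the standard two-parameter Weibull reliability and hazard functions. A shape parameter below 1 gives a failure rate that falls over time, which is the infant-mortality region of the bathtub curve. The function names and the example values of `scale` and `shape` are assumptions, not parameters from the talk.

```python
import math


def weibull_reliability(t: float, scale: float, shape: float) -> float:
    """R(t) = exp(-(t / scale) ** shape): probability of surviving past time t."""
    return math.exp(-((t / scale) ** shape))


def weibull_hazard(t: float, scale: float, shape: float) -> float:
    """h(t) = (shape / scale) * (t / scale) ** (shape - 1), defined for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)


if __name__ == "__main__":
    # Hypothetical parameters; shape < 1 means a decreasing failure rate.
    scale, shape = 100.0, 0.5
    for t in (1.0, 10.0, 100.0):
        print(t, weibull_reliability(t, scale, shape), weibull_hazard(t, scale, shape))
```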

Product Quality and Chip Test Yield
Product quality requirement
– The probability that a sold chip actually functions at the end of infant mortality should be higher than a threshold
Two events are considered
– no fewer than the required number of cores on a chip are defect-free at the end of infant mortality
– all cores on a chip pass manufacturing test after (partial) burn-in
Chip test yield
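
The "no fewer than a required number of cores are defect-free" event can be illustrated with a k-out-of-n probability. The sketch below is a deliberate simplification: it assumes every core is defect-free independently with the same probability `p_good`, whereas the talk models defects with a negative-binomial distribution. The function name and the value of `p_good` are hypothetical.

```python
from math import comb


def prob_at_least_k_good(k: int, n: int, p_good: float) -> float:
    """P(at least k of n cores are defect-free), assuming independent cores."""
    return sum(
        comb(n, i) * p_good**i * (1.0 - p_good) ** (n - i)
        for i in range(k, n + 1)
    )


if __name__ == "__main__":
    p = 0.98  # hypothetical per-core probability of being defect-free
    print(prob_at_least_k_good(16, 16, p))  # no spares: all 16 must be good
    print(prob_at_least_k_good(16, 20, p))  # 4 spares: any 16 of 20 suffice
```

Under this simplified model, adding spares raises the probability that enough good cores exist, which is what allows the test requirement on each core to be relaxed.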

Product Quality and Chip Test Yield

Define
– a given number out of the fabricated cores are initially defect-free
– a number of cores in that set remain defect-free after the burn-in time
We obtain

Cost Model
Simple yet effective cost model
– captures the key impact of introducing burn-in driven redundancy
Manufacturing cost
– normalized so that the manufacturing cost of each core for manycore chips without redundancy is 1 unit
ATE cost
– ATE cost per fabricated core is expressed in the same unit
Burn-in cost
– the cost of the full burn-in process is normalized to the same unit and assumed to be proportional to the burn-in time
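
One way to read this cost model is sketched below. It is an assumption-laden illustration, not the authors' exact formula: it sums manufacturing, ATE, and burn-in cost per fabricated chip and divides by the chip test yield to obtain a cost per sold chip. Whether burn-in cost is charged per chip or per core, and all parameter names and numbers, are assumptions introduced here.

```python
def cost_per_sold_chip(
    num_cores: int,
    ate_cost_per_core: float,
    full_burn_in_cost: float,
    burn_in_fraction: float,
    chip_test_yield: float,
    core_cost: float = 1.0,
) -> float:
    """Hypothetical production cost per sold chip, in normalized units.

    burn_in_fraction is the applied burn-in time divided by the full
    burn-in time; burn-in cost is taken as proportional to it.
    """
    if not 0.0 < chip_test_yield <= 1.0:
        raise ValueError("chip_test_yield must be in (0, 1]")
    manufacturing = num_cores * core_cost
    ate = num_cores * ate_cost_per_core
    burn_in = full_burn_in_cost * burn_in_fraction
    # Costs of rejected chips are spread over the chips that are sold.
    return (manufacturing + ate + burn_in) / chip_test_yield


if __name__ == "__main__":
    # Hypothetical comparison: no spares with full burn-in vs. 2 spares, no burn-in.
    print(cost_per_sold_chip(32, 0.5, 8.0, 1.0, 0.80))
    print(cost_per_sold_chip(34, 0.5, 8.0, 0.0, 0.78))
```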

Case Study on Partial/No Burn-In
Homogeneous manycore system that functions with no fewer than 32 defect-free cores
Product quality requirement is set to 500 DPPM
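
For reference, a DPPM (defective parts per million) limit translates directly into a minimum probability that a shipped chip is good: 500 DPPM corresponds to 1 − 500/1,000,000 = 0.9995. The helper names below are hypothetical and only illustrate this arithmetic.

```python
def dppm_to_min_quality(dppm: float) -> float:
    """Minimum probability that a shipped chip is good for a given DPPM limit."""
    return 1.0 - dppm / 1_000_000.0


def meets_quality(prob_good_given_shipped: float, dppm_limit: float) -> bool:
    """True if the shipped-chip quality satisfies the DPPM limit."""
    return prob_good_given_shipped >= dppm_to_min_quality(dppm_limit)


if __name__ == "__main__":
    print(dppm_to_min_quality(500))
    print(meets_quality(0.9996, 500))
```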

Problem 2 [Partial Burn-In & Relaxed Defect Coverage]
Not only enable partial/no burn-in test but also relax the defect coverage for core tests
– We introduce test cost-driven spares and yield-driven ones
– All cores on the chip are identical, spares included
– Chips containing no fewer than a required number of cores that pass the test are shipped
– Eventually we need to guarantee that the required number of cores are defect-free at the end of infant mortality

Problem 2 [Partial Burn-In & Relaxed Defect Coverage]
Determine the number of test cost-driven spares, the number of yield-driven spares, the defect coverage for core test, and the burn-in time such that …
– The production cost per sold chip is minimized
– The product quality constraint is met

The Impact of Test Decision Criterion
Ideally, a perfect manufacturing test is able to reject all bad cores while accepting all defect-free ones
In reality …
– Test escapes
– False rejects
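
The effect of test escapes and false rejects on a single core can be sketched with the law of total probability and Bayes' rule. Here `test_escape_rate` is taken to mean P(pass | bad core) and `false_reject_rate` to mean P(fail | good core); these definitions, the function names, and the numbers are assumptions for illustration and may not match the talk's notation.

```python
def core_pass_probability(
    p_good: float, false_reject_rate: float, test_escape_rate: float
) -> float:
    """P(core passes test) = P(good) * P(pass | good) + P(bad) * P(pass | bad)."""
    return p_good * (1.0 - false_reject_rate) + (1.0 - p_good) * test_escape_rate


def prob_good_given_pass(
    p_good: float, false_reject_rate: float, test_escape_rate: float
) -> float:
    """Bayes' rule: probability that a core that passed the test is really good."""
    p_pass = core_pass_probability(p_good, false_reject_rate, test_escape_rate)
    return p_good * (1.0 - false_reject_rate) / p_pass


if __name__ == "__main__":
    # Hypothetical: 90% of cores good, 1% false rejects, 10% test escapes.
    print(core_pass_probability(0.9, 0.01, 0.10))
    print(prob_good_given_pass(0.9, 0.01, 0.10))
```

Relaxing defect coverage raises the test escape rate, so more bad cores pass; spare cores are what keep the chip-level quality above the threshold despite that.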

Product Quality with False Rejects
Redefine
– no fewer than the required number of cores on a chip are defect-free at the end of infant mortality
– no fewer than the required number of cores among all cores on a chip pass manufacturing test after (partial) burn-in
Similarly, we have

Product Quality with False Rejects
Notations
– a given number out of the fabricated cores are initially defect-free
– a number of cores in that set remain defect-free after the burn-in time
– among the good cores on a chip, a number pass the test
– among the bad cores, a number pass the test
We have

Cost Model
Total production cost
ATE cost depends on defect coverage

Experimental Setup
Homogeneous manycore system that functions with no fewer than 32 defect-free cores
The best combination of decision variables in terms of production cost is determined by exploring the solution space
System parameters
Product quality requirement is set to 500 DPPM
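
"Exploring the solution space" can be pictured as an exhaustive search over candidate settings that keeps the cheapest one meeting the quality constraint. The sketch below assumes just two decision variables (number of spare cores and burn-in time fraction) and takes the cost and quality models as caller-supplied functions. The toy models in the usage section are invented for demonstration and are not the analytical models from the talk.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Tuple

Candidate = Tuple[int, float]  # (number of spare cores, burn-in time fraction)


def explore_solution_space(
    spare_options: Iterable[int],
    burn_in_options: Iterable[float],
    cost_fn: Callable[[int, float], float],
    quality_fn: Callable[[int, float], float],
    min_quality: float,
) -> Tuple[Optional[Candidate], float]:
    """Return the cheapest candidate whose quality is at least min_quality."""
    best: Optional[Candidate] = None
    best_cost = float("inf")
    for spares, burn_in in product(spare_options, burn_in_options):
        if quality_fn(spares, burn_in) < min_quality:
            continue  # violates the product quality constraint
        cost = cost_fn(spares, burn_in)
        if cost < best_cost:
            best, best_cost = (spares, burn_in), cost
    return best, best_cost


if __name__ == "__main__":
    # Toy models (hypothetical): spares and burn-in both cost money and raise quality.
    toy_cost = lambda s, b: 32 + s + 10 * b
    toy_quality = lambda s, b: min(1.0, 0.99 + 0.004 * s + 0.01 * b)
    print(explore_solution_space(range(5), [0.0, 0.5, 1.0], toy_cost, toy_quality, 0.9995))
```

Exhaustive enumeration is practical only when the number of candidate settings is small; if no candidate meets the constraint, the function returns `None` with infinite cost.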

Tradeoff between Burn-In Cost and ATE Cost under Product Quality Constraint
[Figure panels: High defect density; Low defect density]

Comparison between Traditional and Proposed Strategy 22.28%

Comparison between Traditional and Proposed Strategy 25.26%

Comparison between Traditional and Proposed Strategy

Conclusion
We propose to introduce spare cores into manycore systems
– Burn-in test time can be shortened
– Defect coverage requirement can be relaxed
– Without sacrificing quality of the shipped products
We develop novel analytical models to verify the effectiveness of the proposed strategy

Test Economics for Homogeneous Manycore Systems
Thank you for your attention!