Importance Sampling to Evaluate Real-time System Reliability - A Case Study
by Xiang Mao and Qin Chen

Outline
- Introduction
- Technical Background
  - Importance Sampling
  - The RAPIDS Simulator
- Implementation
- Experimental Results & Analysis
- Conclusion

Introduction
Applications of real-time systems:
- Aircraft control
- Traffic control
- Factory automation
- etc.
Requirement of high reliability: deliver critical outputs in a timely fashion, even in the presence of a few component failures.

Introduction
- Reliability in a real-time system is dynamic.
- Modeling vs. simulation
- Drawback of simulation: it requires very long computation times (for unreliability ≤ 10^-5, roughly days to weeks).
- A solution: Importance Sampling

Introduction
Main contributions of the paper:
- Implementing importance sampling in the RAPIDS simulator;
- Analyzing the expected behavior of such a scheme;
- Validating the implementation;
- Investigating the tunable parameters in the scheme and providing guidelines on their use.

Importance Sampling
- Motivating example to help understanding: an overheated processor
- Introduction: X is a random variable with density function p(x); the unreliability is the probability that X falls in the set of failed states.
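
To make the idea concrete, here is a minimal sketch (not from the paper) under a deliberately toy assumption: the time to failure X is exponential with rate lam, the failed set is {X ≤ t_miss}, and samples are drawn from a much faster biased rate lam_is, then re-weighted by the likelihood ratio p(x)/q(x). All names (lam, lam_is, t_miss) are illustrative.

```python
import math
import random

def unreliability_is(lam=1e-5, lam_is=1e-1, t_miss=10.0, n=100_000):
    """Estimate P(X <= t_miss) by sampling X from the biased density
    q(x) = lam_is * exp(-lam_is * x) and re-weighting by p(x)/q(x)."""
    total = 0.0
    for _ in range(n):
        x = random.expovariate(lam_is)           # draw from the biased density q
        if x <= t_miss:                          # indicator of the failed set
            lr = (lam * math.exp(-lam * x)) / (lam_is * math.exp(-lam_is * x))
            total += lr                          # weighted contribution
    return total / n

if __name__ == "__main__":
    print("IS estimate :", unreliability_is())
    print("exact value :", 1.0 - math.exp(-1e-5 * 10.0))
```

With plain Monte Carlo at the true rate, almost no runs would ever reach the failed set, which is exactly the "days to weeks" problem noted above.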

Importance Sampling
Implementation heuristics:
- Forcing (or acceleration): increase the rate at which state transitions occur.
- Balanced failure biasing: bias the system towards more faults.
Each of them has a likelihood ratio associated with it; the overall likelihood ratio is just the product of the individual ones.
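
A hedged sketch of these two heuristics under simplified assumptions (one aggregate fault rate and one aggregate repair rate; the bias value b and the name next_event are illustrative, not RAPIDS code): forcing draws the holding time from the exponential distribution conditioned on landing inside the remaining mission time, and failure biasing picks a fault with probability b instead of its natural probability; each step contributes a factor to the likelihood ratio.

```python
import math
import random

def next_event(lam_f, lam_r, t_remaining, b=0.5):
    """Return (dt, kind, likelihood_ratio) for one biased state transition."""
    total = lam_f + lam_r
    # Forcing: draw the holding time from Exponential(total) *conditioned* on
    # falling inside the remaining mission time; the likelihood ratio of this
    # change of measure is the probability mass the unconditioned draw had there.
    mass = 1.0 - math.exp(-total * t_remaining)
    dt = -math.log(1.0 - random.random() * mass) / total
    lr = mass
    # Failure biasing (collapsed to a single fault class here): pick a fault
    # with probability b instead of its natural probability lam_f/total, and
    # fold the ratio of true to biased probabilities into the LR.
    p_fault = lam_f / total
    if random.random() < b:
        kind, lr = "fault", lr * (p_fault / b)
    else:
        kind, lr = "repair", lr * ((1.0 - p_fault) / (1.0 - b))
    return dt, kind, lr
```

The product of these per-event factors over a simulated mission gives the overall likelihood ratio mentioned on the slide.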

The RAPIDS Simulator
http://www.ecs.umass.edu/ece/realtime
[Architecture diagram: PVM (Parallel Virtual Machine)]

Implementation: Event Generator
- Decide the time of the next system state transition.
- Implement forcing to accelerate the state changes.
- Decide whether the next transition is a fault arrival or a repair.
- Implement failure biasing to push the system towards more component faults.
- Calculate the likelihood ratio associated with each 'change of measure' and store this value along with the event.
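
A sketch of the bookkeeping this list describes (illustrative names, not the actual RAPIDS event generator): each biased transition is wrapped in an event record that carries its own likelihood ratio, so the analyzer can later combine them.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Event:
    time: float               # absolute simulation time of the transition
    kind: str                 # "fault" or "repair"
    likelihood_ratio: float   # LR of the change of measure for this event

def generate_events(draw_transition: Callable[[float], Tuple[float, str, float]],
                    system_failed: Callable[[List[Event]], bool],
                    mission_time: float,
                    max_events: int = 10_000) -> List[Event]:
    """Generate biased transitions until the mission ends, the (simulated)
    system fails, or a safety cap is hit; store the LR alongside each event."""
    events: List[Event] = []
    now = 0.0
    while len(events) < max_events and not system_failed(events):
        dt, kind, lr = draw_transition(mission_time - now)
        now += dt
        if now >= mission_time:      # mission completed before this event fired
            break
        events.append(Event(time=now, kind=kind, likelihood_ratio=lr))
    return events
```

The system_failed predicate and the event cap stand in for the simulated system's own stopping conditions; a real simulator stops when the system fails or the mission completes.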

Implementation: Analyzer
- Receive reports from the simulated system.
- If a report corresponds to one of the above-mentioned 'changes of measure', update the current simulation weight.
- If the system fails within the mission time, set the simulation output to the current value of the likelihood ratio; otherwise set the simulation output to zero.
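
The corresponding analyzer step, again as an illustrative sketch rather than the RAPIDS code: the running weight is the product of the per-event likelihood ratios, and the sample value is that weight if the run ended in system failure, zero otherwise.

```python
from typing import Iterable

def analyze_run(events: Iterable, failed_within_mission: bool) -> float:
    """Combine the per-event likelihood ratios into one simulation output."""
    weight = 1.0
    for ev in events:
        weight *= ev.likelihood_ratio   # update on each change of measure
    return weight if failed_within_mission else 0.0

# The unreliability estimate is the average of analyze_run(...) over all runs.
```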

Experiments
- Goal: identify the reliability region in which Importance Sampling works well.
- Performance metrics:
  - Simulation acceleration (comparing at the same sample variance)
  - Sample variance reduction (comparing at the same number of simulations)
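
As a rough way to quantify these two metrics (our own illustrative formulas, not the paper's): the number of runs needed for a target relative error grows with the sample variance, so an acceleration factor can be read off the ratio of runs needed, ignoring any difference in per-run cost.

```python
import statistics

def runs_needed(samples, rel_err=0.1, z=1.96):
    """Approximate runs required for a given relative confidence half-width
    (assumes a nonzero mean estimate and at least two samples)."""
    mean = statistics.fmean(samples)
    var = statistics.variance(samples)
    return (z * var ** 0.5 / (rel_err * mean)) ** 2

def acceleration_factor(standard_mc_samples, is_samples, rel_err=0.1):
    """AF > 1 means importance sampling needs fewer runs than standard MC
    to reach the same relative error."""
    return runs_needed(standard_mc_samples, rel_err) / runs_needed(is_samples, rel_err)
```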

System Configurations
- The system to be modeled can be viewed as a collection of computing nodes connected by a communication network.
- [Table of configurations with increasing system reliability; parameters include the transient/permanent processor failure rates]

Simulation Acceleration
The bigger the acceleration factor (AF), the greater the simulation speed-up.
1. Except for config 1, IS achieves simulation speed-up;
2. As the system reliability increases, so does the acceleration factor;
3. If system reliability is low, it is better to use normal simulation.

Bias Parameter
- Failure bias: how fast we push the system towards failure.
- Bad parameter: too low or too high → high sample variance.
- Good parameter: low sample variance → fewer samples needed / simulation speed-up!

Bias Parameter, cont'd
- Fixed system → fixed error rate + fixed number of simulations
- Change bias → different sample variance
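
A simple way to act on this observation, sketched with a hypothetical run_mission(bias=...) function standing in for one full IS simulation: sweep a few candidate bias values with the same number of runs each and keep the one with the smallest sample variance.

```python
import statistics

def sweep_bias(run_mission, biases=(0.1, 0.3, 0.5, 0.7, 0.9), n_runs=10_000):
    """Return {bias: sample variance of the IS outputs} for each candidate."""
    variances = {}
    for b in biases:
        outputs = [run_mission(bias=b) for _ in range(n_runs)]
        variances[b] = statistics.variance(outputs)   # lower is better
    return variances
```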

Conclusions
- Importance Sampling was implemented in a simulator testbed.
- It achieves simulation acceleration when the system reliability is high (unreliability ≤ 10^-3).
- Proper tuning of the bias parameter is needed to achieve sample variance reduction.
- It is not suitable for less reliable systems.

Questions?
The End