¹ Graduate Statistics Student, ² Undergraduate Computer Science Student, ³ Professor and Director of Statistical Consulting Collaboratory, ⁴ Chief Technology Officer, Integrien Corporation


Qi Zhang¹, Carlos J. Rendon², Daniel R. Jeske³, Veronica Montes de Oca¹, and Mazda Marvasti⁴

Introduction

Alive™ is the major software product of Integrien Corporation. It monitors, visually presents, and reports the health of a business information technology system. Integrien contacted the Statistical Consulting Collaboratory at the University of California, Riverside to develop a nonparametric statistical change-point detection procedure that could be applied to most types of univariate data. Our work extends the conventional CUSUM procedure to a nonparametric, timeslot-stationary context and is being implemented in the next release of Alive™.

Special Thanks To: the staff of Integrien Corporation; Pengyue James Lin (CTO, College of Humanities, Arts and Social Sciences at UCR); Dr. Huaying Karen Xu (Associate Director of the Statistical Consulting Collaboratory at UCR); Prof. Keh-Shin Lii (Department of Statistics at UCR); and the graduate students of the Spring 2006 offering of STAT 293.

Conventional CUSUM Procedure

Let $X_n$ denote the measurement of a univariate process at the $n$th time point. Assume that the $X_n$ are i.i.d. $N(\mu, \sigma^2)$ with $\mu$ and $\sigma^2$ known. If the process mean shifts upward or downward by more than K units, we say that there is a serious change. The CUSUM statistics are

$$S_n^+ = \max\{0,\ S_{n-1}^+ + X_n - (\mu + K)\}, \qquad S_n^- = \max\{0,\ S_{n-1}^- + (\mu - K) - X_n\}, \qquad S_0^+ = S_0^- = 0,$$

where K is generally called the reference value. If $S_n^+$ or $S_n^-$ exceeds a predetermined threshold H, we conclude that the mean has changed.

Monte Carlo Simulations for H

For each simulated in-control sample path of length N, compute $M = \max_{1 \le n \le N} \max(S_n^+, S_n^-)$. H is the 100(1 − α)th percentile of the empirical distribution function (EDF) of these values, where α is the nominal false alarm level.
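The following is a minimal Python sketch of the conventional two-sided CUSUM and the Monte Carlo rule for H described above. It assumes i.i.d. Gaussian in-control data with known μ and σ. The function names, the path length (168, one week of hourly points), K = 0.5σ, the number of simulated paths, and the order-statistic rule for the empirical percentile are illustrative choices, not taken from the poster.

```python
import random


def cusum(xs, mu, k):
    """Two-sided tabular CUSUM with S_0^+ = S_0^- = 0; returns (S^+, S^-) lists."""
    s_plus, s_minus = [], []
    sp = sm = 0.0
    for x in xs:
        sp = max(0.0, sp + x - (mu + k))  # accumulates excursions above mu + K
        sm = max(0.0, sm + (mu - k) - x)  # accumulates excursions below mu - K
        s_plus.append(sp)
        s_minus.append(sm)
    return s_plus, s_minus


def first_alarm(xs, mu, k, h):
    """Index of the first time point where S_n^+ or S_n^- exceeds H, else None."""
    s_plus, s_minus = cusum(xs, mu, k)
    for n, (sp, sm) in enumerate(zip(s_plus, s_minus)):
        if sp > h or sm > h:
            return n
    return None


def monte_carlo_h(mu, sigma, k, path_len, alpha, n_paths=2000, seed=0):
    """H = empirical 100(1 - alpha)th percentile of the per-path maximum of
    max(S_n^+, S_n^-) over simulated in-control paths, so that roughly a
    fraction alpha of in-control paths ever exceeds H."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_paths):
        xs = [rng.gauss(mu, sigma) for _ in range(path_len)]
        s_plus, s_minus = cusum(xs, mu, k)
        maxima.append(max(max(s_plus), max(s_minus)))
    maxima.sort()
    # Simple order-statistic estimate of the (1 - alpha) quantile of the EDF.
    idx = min(n_paths - 1, int((1 - alpha) * n_paths))
    return maxima[idx]


if __name__ == "__main__":
    # One week of hourly points, K = 0.5 sigma, 5% nominal false alarm level.
    h = monte_carlo_h(mu=0.0, sigma=1.0, k=0.5, path_len=168, alpha=0.05)
    rng = random.Random(1)
    xs = [rng.gauss(0.0, 1.0) for _ in range(100)] + [rng.gauss(2.0, 1.0) for _ in range(50)]
    print("H =", round(h, 2), "first alarm at index", first_alarm(xs, 0.0, 0.5, h))
```

Taking the maximum over a whole path ties H to the probability of any false alarm within that horizon. This is one way to express the ARL-style control the poster describes.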
The threshold H is chosen to control the average run length (ARL) between false alarms and is usually obtained from Monte Carlo simulations.

[Figure: target value and the level of change that is "serious".]

Nonparametric CUSUM Procedure

For non-Gaussian measurements, replace μ − K and μ + K with percentiles of each timeslot's distribution. These are $l_\tau$, the 100γth percentile, and $u_\tau$, the 100(1 − γ)th percentile of timeslot τ. The generalized CUSUM becomes

$$S_n^+ = \max\{0,\ S_{n-1}^+ + X_n - u_{\tau_n}\}, \qquad S_n^- = \max\{0,\ S_{n-1}^- + l_{\tau_n} - X_n\},$$

where $\tau_n \in \{1, 2, \dots, 168\}$ is the timeslot associated with the current hour (one timeslot for each hour of the week).

Data

Data from a real client were available. Data within each hourly timeslot were assumed to be i.i.d. The empirical distribution of each timeslot is estimated from a rolling window of 12 weeks of historical data.

[Figure: rolling window over Weeks 1–21; each of Cycles 1–9 predicts the week that follows its 12-week window.]

[Figure: attribute value by timeslot; a marker denotes the median of each timeslot distribution.]

CUSUM Screening of Historical Data

The data windows that cause the CUSUM procedure to alarm are assumed to be anomalous. A slope test is used to find the start and end points of each such window.

Start point: When the CUSUM statistic alerts, fit a backward sequence of lines to successive windows of v points of the CUSUM statistic. The predicted start point is the rightmost point of the first window for which a t-test does not reject the null hypothesis, that is, the fitted slope is not significantly positive.

End point: At the time the CUSUM statistic alerts, fit a forward sequence of lines, each to a window containing the previous v points. The predicted end point is the time at which the CUSUM attains its largest value within the first window for which a t-test does not reject the null hypothesis, that is, the slope is no longer significantly positive.

CUSUM with Resetting

Reset the CUSUM statistics after each alarm to eliminate the effect of the previous alarm. The end of each alarm is determined by the slope test.

[Figure: real example from Integrien data, showing threshold H and a reset point.]

Performance Evaluation

[Table (Study Based on Real Client Data): for each metric (Live, Active, Resp. Time, Oracle), the number of alarms, average detection time (min), false positives per cycle, false negatives per cycle, and computation time per cycle (min).]

Study Based on Simulated Data
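The timeslot procedure and the slope test can be sketched in Python in the same spirit. This is a minimal, stdlib-only illustration under stated assumptions, not the Alive™ implementation:

- Timeslots are indexed 0–167 rather than 1–168.
- Percentiles use linear interpolation between order statistics.
- Both statistics are reset immediately at each alarm. This simplifies the poster's resetting rule, which takes the alarm end from the slope test.
- Backward and forward windows move one point at a time.
- The one-sided t critical value `t_crit` is supplied by the caller.

All function names are hypothetical.

```python
import math


def percentile(values, q):
    """Empirical 100q-th percentile, interpolating between order statistics."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def timeslot_bounds(history, gamma):
    """history: (slot, value) pairs. Returns {slot: (l_tau, u_tau)}, the
    100*gamma-th and 100*(1 - gamma)-th percentiles of each timeslot."""
    by_slot = {}
    for slot, value in history:
        by_slot.setdefault(slot, []).append(value)
    return {s: (percentile(v, gamma), percentile(v, 1 - gamma)) for s, v in by_slot.items()}


def generalized_cusum(stream, bounds, h):
    """stream: (slot, value) pairs. Returns the indices where S^+ or S^-
    exceeds H; both statistics are reset to 0 after every alarm."""
    sp = sm = 0.0
    alarms = []
    for n, (slot, x) in enumerate(stream):
        lower, upper = bounds[slot]
        sp = max(0.0, sp + x - upper)
        sm = max(0.0, sm + lower - x)
        if sp > h or sm > h:
            alarms.append(n)
            sp = sm = 0.0  # reset so the next alarm is not carried by this one
    return alarms


def slope_t(ys):
    """OLS slope of ys against 0..v-1 and its t statistic (requires v >= 3)."""
    v = len(ys)
    xbar, ybar = (v - 1) / 2, sum(ys) / v
    sxx = sum((i - xbar) ** 2 for i in range(v))
    beta = sum((i - xbar) * (y - ybar) for i, y in enumerate(ys)) / sxx
    sse = sum((y - ybar - beta * (i - xbar)) ** 2 for i, y in enumerate(ys))
    if sse == 0:  # perfect fit: the t statistic is infinite in the slope's direction
        return beta, math.copysign(math.inf, beta) if beta else 0.0
    return beta, beta / math.sqrt(sse / (v - 2) / sxx)


def predicted_start(s, alarm, v, t_crit):
    """Walk windows of v CUSUM values backward from the alarm; return the
    rightmost index of the first window whose slope is not significantly > 0."""
    for end in range(alarm, v - 2, -1):
        if slope_t(s[end - v + 1:end + 1])[1] <= t_crit:
            return end
    return 0


def predicted_end(s, alarm, v, t_crit):
    """Walk windows ending at alarm, alarm + 1, ...; in the first window whose
    slope is not significantly > 0, return the index of the largest CUSUM value."""
    for m in range(max(alarm, v - 1), len(s)):
        window = s[m - v + 1:m + 1]
        if slope_t(window)[1] <= t_crit:
            return m - v + 1 + max(range(v), key=window.__getitem__)
    return len(s) - 1
```

In a full pipeline, `timeslot_bounds` would presumably be rebuilt from the rolling 12-week window at each cycle. `predicted_start` and `predicted_end` would then be applied to the CUSUM path around each alarm when screening historical data.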
Inject an event that shifts the timeslot distributions by 100X% during the second half of the week. Report the average number of samples between the start of the injected event and the point at which the CUSUM signals. Each average is based on 1,000 simulated sample paths per cycle.

Conclusion: If the shift is small, the average number of samples until detection is large. If the shift is large, the average number of samples until detection is small, so an alarm is signaled almost immediately.

Study Based on Real Client Data

The study uses 12 weeks of historical data and 9 new monitoring weeks (cycles). True alarms were determined by a subject matter expert.

Conclusion: The procedure performs well, with 0 false negatives per cycle, indicating that alarms are adequately detected.

Implementation Flow Chart

Off-line processing: Historical Data → Construct Timeslot Distributions → Determine H for Screening → Run CUSUM on Historical Data → Screen Alerts from Historical Data → Construct Screened Timeslot Distributions → Determine H for New Data.

Real-time processing: CUSUM on New Data → Monitor for Alerts.

[Figure: Generalized CUSUM for Live Sessions]

[Figure: Variability in the Mean and Std. Dev. of the # of Live Sessions on a Network Server]

[Figure: Illustrative Timeslot Distributions for # of Live Sessions on a Network Server]

[Figure: slope test on S⁺ and S⁻ over time t. An alarm is signaled where a statistic crosses H. The first backward window before the alarm where the slope is not positive gives the Predicted Start Time. The first forward window after the alarm where the slope is no longer positive gives the Predicted End Time.]

Poster title: A Nonparametric CUSUM Algorithm for Timeslot Sequences with Applications to Network Surveillance
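The simulated-data study can be imitated with a small Monte Carlo experiment. The sketch below rests on assumptions that do not come from the poster:

- Hourly in-control values in timeslot τ are Gaussian, with a hypothetical daily cycle `slot_mean(tau)` and standard deviation 10.
- A "100X% shift" is interpreted as multiplying each value by 1 + X.
- Only the upper statistic S⁺ is tracked, because the injected shift is upward.
- The CUSUM starts at the event rather than at the beginning of the week.
- A path that never signals counts the remaining length of the week.
- Delay is the number of samples observed up to and including the signaling sample.

```python
import math
import random

HOURS_PER_WEEK = 168


def slot_mean(tau):
    # Hypothetical daily cycle for the in-control level of hourly timeslot tau.
    return 100.0 + 20.0 * math.sin(2 * math.pi * tau / 24)


def upper_percentiles(rng, weeks=12, q=0.95, sd=10.0):
    """Empirical 100q-th percentile of each timeslot from simulated history."""
    bounds = []
    for tau in range(HOURS_PER_WEEK):
        xs = sorted(rng.gauss(slot_mean(tau), sd) for _ in range(weeks))
        pos = q * (weeks - 1)
        lo = int(pos)
        hi = min(lo + 1, weeks - 1)
        bounds.append(xs[lo] + (pos - lo) * (xs[hi] - xs[lo]))
    return bounds


def average_delay(x_shift, h, n_sims=1000, sd=10.0, seed=0):
    """Average number of samples from the injected event to the first S+ signal.

    The event scales every value by (1 + x_shift) over the second half of the
    week; paths with no signal before the week ends count the remaining length.
    """
    rng = random.Random(seed)
    bounds = upper_percentiles(rng, sd=sd)
    start = HOURS_PER_WEEK // 2
    total = 0
    for _ in range(n_sims):
        s_plus = 0.0
        delay = HOURS_PER_WEEK - start  # censored value if no signal occurs
        for tau in range(start, HOURS_PER_WEEK):
            x = (1 + x_shift) * rng.gauss(slot_mean(tau), sd)
            s_plus = max(0.0, s_plus + x - bounds[tau])
            if s_plus > h:
                delay = tau - start + 1
                break
        total += delay
    return total / n_sims


if __name__ == "__main__":
    # H = 50 is an arbitrary illustrative threshold, not a calibrated value.
    for x in (0.1, 0.25, 0.5, 1.0):
        print(f"X = {x}: average delay = {average_delay(x, h=50.0):.1f} samples")
```

Under these assumptions, a large shift (X = 1) should be flagged within one or two samples. A very small shift rarely crosses H before the week ends. This is the qualitative pattern stated in the simulated-data conclusion.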