
Ensembles of Models for Automated Diagnosis of System Performance Problems
Steve Zhang (steveyz@cs.stanford.edu), Armando Fox
In collaboration with: Ira Cohen, Moises Goldszmidt, Julie Symons of HP Labs

Motivation
- The complexity of deployed systems surpasses the ability of humans to diagnose, forecast, and characterize their behavior:
  - Scale and interconnectivity
  - Levels of abstraction
  - Lack of closed-form mathematical characterization
  - Changes in input, applications, and infrastructure
- Hence the need for automated diagnosis tools.

Metric Attribution
- Service Level Objectives (SLOs) are usually expressed in terms of high-level behaviors, e.g. a desired average response time or throughput.
- Metric attribution asks: which low-level metrics are most correlated with a particular SLO violation?
- Root-cause diagnosis usually requires domain-specific knowledge, but attribution often does point to the root cause.
- The technique can be applied to more than just performance problems.
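For concreteness, here is a minimal sketch of how raw measurements could be labeled as SLO compliance or violation against a response-time objective. The 4-second threshold, the per-epoch granularity, the 1/0 label convention (1 = violation), and the function name are illustrative assumptions, not details from the presentation.

```python
import numpy as np

def label_slo_state(avg_response_times, threshold=4.0):
    """Label each measurement epoch: 1 = SLO violation, 0 = compliance.
    Assumes an SLO of the form 'average response time must stay below `threshold` seconds'."""
    r = np.asarray(avg_response_times, dtype=float)
    return (r >= threshold).astype(int)

# Example: average response time (seconds) for five epochs.
print(label_slo_state([1.2, 3.8, 5.1, 0.9, 6.4]))  # -> [0 0 1 0 1]
```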

Background and Previous Work
- System behavior is modeled as a pattern classification problem: learn a function F that maps the state of the low-level metrics M to the SLO state (s+ / s-).
- Tree-Augmented Naïve Bayes (TAN) models are used to represent the probability distribution underlying F.
- Models are evaluated using Balanced Accuracy:
  BA = 0.5 * [ P(F(M) = s- | s-) + P(F(M) = s+ | s+) ]
- The first step is feature selection: find the subset of metrics that yields the F with the best BA, usually via heuristic (greedy) search. This avoids the curse of dimensionality as well as overfitting.
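The sketch below illustrates balanced accuracy and a greedy forward feature search of the kind described above. It uses scikit-learn's GaussianNB as a stand-in classifier (no TAN implementation is assumed to be available), and the 1/0 label convention and stopping rule are illustrative choices, so treat this as an approximation rather than the authors' code.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_predict

def balanced_accuracy(y_true, y_pred):
    """BA = 0.5 * (recall on violations + recall on compliance).
    Labels: 1 = violation, 0 = compliance; assumes both classes are present."""
    viol, comp = (y_true == 1), (y_true == 0)
    return 0.5 * (np.mean(y_pred[viol] == 1) + np.mean(y_pred[comp] == 0))

def greedy_feature_selection(X, y, max_features=8):
    """Greedily add the metric (column of X) that most improves cross-validated BA."""
    selected, best_ba = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        scores = {}
        for j in remaining:
            cols = selected + [j]
            pred = cross_val_predict(GaussianNB(), X[:, cols], y, cv=5)
            scores[j] = balanced_accuracy(y, pred)
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best_ba:   # stop when no additional metric helps
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best_ba = scores[j_best]
    return selected, best_ba
```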

Background (Cont.)
- TAN models have a key property, interpretability, which enables metric attribution:
  - A TAN model represents the joint probability estimate P(s+/-, M).
  - For each SLO violation, the joint estimate is inverted for each metric Mi to obtain P(Mi | s+) and P(Mi | s-); the metric is attributed to the violation if P(Mi | s+) - P(Mi | s-) > 0.
- The output is a list of metrics implicated in each SLO violation (drawn from the subset of metrics chosen during feature selection).
- This is not straightforward with many popular classifiers such as Support Vector Machines or Neural Networks.
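The following sketch applies that comparison to a fitted naive Bayes model, again as a stand-in for TAN (the tree-augmented dependencies are not modeled here). It interprets the slide's rule as attributing a metric when its observed value is more likely under the violation class, mapping s+/s- onto the assumed 1/0 labels; theta_ and var_ are scikit-learn's per-class Gaussian parameters.

```python
import numpy as np
from scipy.stats import norm
from sklearn.naive_bayes import GaussianNB

def attribute_metrics(model, x, metric_names):
    """Given a fitted GaussianNB and one violation sample x, return the metrics
    whose observed values are more likely under the violation class (label 1)."""
    classes = list(model.classes_)                  # e.g. [0, 1]
    i_viol, i_comp = classes.index(1), classes.index(0)
    attributed = []
    for j, name in enumerate(metric_names):
        # Per-metric class-conditional likelihoods from the fitted Gaussians.
        # (theta_ = per-class means, var_ = per-class variances; var_ was sigma_ before sklearn 1.0)
        p_viol = norm.pdf(x[j], model.theta_[i_viol, j], np.sqrt(model.var_[i_viol, j]))
        p_comp = norm.pdf(x[j], model.theta_[i_comp, j], np.sqrt(model.var_[i_comp, j]))
        if p_viol > p_comp:
            attributed.append(name)
    return attributed
```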

Key Results of Previous Work
- Combinations of metrics are more predictive of SLO violations than individual metrics:
  - Different combinations under different workload conditions
  - Different combinations under different SLO definitions
- A small number of metrics (3-8) is usually sufficient to predict SLO violations.
- The selected metrics yield insight into the cause of the problem.
- The relationships among the metrics of a model are simple enough to be captured by TAN.
- In most cases the models yield a BA of 90%-95%.

Why Multiple Models?
- Previous work focused on forensic analysis, after all the data was available.
- Experiments show that models built on data collected under one condition do not work well on data collected under different conditions.
- By using multiple models, we can incorporate new data as needed by learning new models.
- Multiple models also keep a history of past behavior, eliminating the need to re-learn conditions that occurred previously.

Managing an Ensemble of Models
- How much data is needed to build a model? Determined empirically using learning surfaces.
- When do we add a new model to the ensemble? When a new model does statistically significantly better over its training window than all current models.
- How do we fuse information from multiple models? Use the metric attribution from the model with the best Brier score. The Brier score is similar to Mean Squared Error (MSE) and offers a finer-grained evaluation of a model, which matters when using small evaluation windows.
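For reference, here is a minimal sketch of the Brier score in its standard form (mean squared difference between the predicted violation probability and the 0/1 outcome). The presentation does not spell out the exact variant used, so this is the textbook definition rather than the authors' formulation.

```python
import numpy as np

def brier_score(predicted_prob_violation, actual_violation):
    """Mean squared error between predicted violation probabilities (in [0, 1])
    and actual outcomes (1 = violation, 0 = compliance). Lower is better."""
    p = np.asarray(predicted_prob_violation, dtype=float)
    y = np.asarray(actual_violation, dtype=float)
    return float(np.mean((p - y) ** 2))

# Example over a small evaluation window:
print(brier_score([0.9, 0.2, 0.7], [1, 0, 1]))  # ~0.047
```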

Algorithm Outline
Start with empty TrainingWindow and empty Ensemble
for every new sample do
  Add sample to TrainingWindow
  if TrainingWindow has enough samples then
    Train new model M on TrainingWindow
    Compute acc(M) using cross-validation
    if acc(M) is significantly higher than the accuracy of all models in Ensemble then
      Add M to Ensemble
    end if
  end if
  Compute Brier score over TrainingWindow for all models
  if new sample is an SLO violation then
    Do metric attribution using the model with the best Brier score
  end if
end for
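A compact Python rendering of this loop is sketched below, assuming GaussianNB as the classifier, 1/0 labels for violation/compliance, and a fixed accuracy margin in place of the statistical-significance test that the outline leaves unspecified; the class and parameter names (ModelEnsemble, min_samples, margin) are illustrative.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import balanced_accuracy_score, brier_score_loss
from sklearn.model_selection import cross_val_predict

class ModelEnsemble:
    """Sketch of the ensemble-management loop. Labels: 1 = SLO violation, 0 = compliance."""

    def __init__(self, min_samples=60, margin=0.02):
        self.X_win, self.y_win = [], []   # training window of recent samples
        self.ensemble = []                # fitted models (GaussianNB as a TAN stand-in)
        self.min_samples = min_samples
        self.margin = margin              # crude stand-in for a statistical-significance test

    def process_sample(self, x, y):
        """Add one (metric vector, SLO label) sample; return the model to use for
        metric attribution if this sample is a violation, else None."""
        self.X_win.append(x)
        self.y_win.append(y)
        X, Y = np.array(self.X_win), np.array(self.y_win)

        # Train a candidate model once the window has enough samples of both classes.
        if len(Y) >= self.min_samples and min((Y == 0).sum(), (Y == 1).sum()) >= 5:
            preds = cross_val_predict(GaussianNB(), X, Y, cv=5)
            cand_acc = balanced_accuracy_score(Y, preds)
            old_accs = [balanced_accuracy_score(Y, m.predict(X)) for m in self.ensemble]
            if all(cand_acc > a + self.margin for a in old_accs):
                self.ensemble.append(GaussianNB().fit(X, Y))

        if not self.ensemble:
            return None

        # Score every model over the window; lower Brier score is better.
        briers = [brier_score_loss(Y, m.predict_proba(X)[:, list(m.classes_).index(1)])
                  for m in self.ensemble]
        best_model = self.ensemble[int(np.argmin(briers))]

        # On an SLO violation, metric attribution would be done with best_model
        # (e.g. the per-metric likelihood comparison sketched earlier).
        return best_model if y == 1 else None
```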

Experimental Setup
- Apache web server + WebLogic application server + Oracle database
- Petstore installed on the application server

Workloads
- System metric data was collected under 5 different workload conditions:
  - RAMP: gradual increase in concurrency and request rate
  - BURST: sudden, sustained bursts of increasingly intense workload against a backdrop of moderate activity
  - BUYSPREE: alternates between normal periods (1 hour) and periods of heavy buying (with the number of clients unchanged)
  - DBCPU & APPCPU: an external program takes 30% of the CPU on the DB or application server every other hour
- A combination of data from different workloads was used to evaluate our techniques.

Accuracy of the Ensemble
- The ensemble significantly outperforms a single model.
- It also does slightly better than a workload-specific approach, indicating that some workload conditions are too complex for a single model.

Accuracy During Training
[Figure: accuracy over time as data from successive workload phases arrives: DBCPU (1), APPCPU (2), BUYSPREE (3), RAMP (4), BURST (5)]

Metric Attribution
- Each instance of SLO violation may be indicated by several metrics.
- Which metrics indicate violations changes as the workload changes.
- [Figure: attribution matrix. The first column indicates SLO violation (white) or compliance; the other columns indicate whether a particular metric was attributed to the violation (white) or not.]

Learning Surfaces
[Figure: learning surfaces showing model accuracy as a function of the number of training samples from each SLO class]

Learning Surfaces (Cont.)
- Simpler workload conditions require fewer samples of each class.
- The accuracy of a model depends on the ratio of class samples, not just the number of samples; sensitivity to this ratio decreases once there are sufficient samples of each class.
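The presentation does not show how the learning surfaces were generated; one plausible way to reproduce such a surface empirically is to train on grids of (violation, compliance) sample counts and record the resulting accuracy per cell, roughly as below. GaussianNB again stands in for TAN, the 1/0 label convention is assumed, and the count grid and trial count are arbitrary.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import balanced_accuracy_score

def learning_surface(X, y, counts=(10, 25, 50, 100, 200), n_trials=10, seed=0):
    """Estimate held-out BA for each (n_violation, n_compliance) training mix.
    Assumes each class has at least max(counts) samples available."""
    rng = np.random.default_rng(seed)
    viol_idx, comp_idx = np.where(y == 1)[0], np.where(y == 0)[0]
    surface = np.zeros((len(counts), len(counts)))
    for i, n_v in enumerate(counts):
        for j, n_c in enumerate(counts):
            scores = []
            for _ in range(n_trials):
                tr = np.concatenate([rng.choice(viol_idx, n_v, replace=False),
                                     rng.choice(comp_idx, n_c, replace=False)])
                te = np.setdiff1d(np.arange(len(y)), tr)
                model = GaussianNB().fit(X[tr], y[tr])
                scores.append(balanced_accuracy_score(y[te], model.predict(X[te])))
            surface[i, j] = np.mean(scores)
    return surface  # rows: violation-sample counts, columns: compliance-sample counts
```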

Summary
- An ensemble of models performs better than single models or even workload-specific models.
- Our approach allows rapid adaptation to changing conditions.
- No domain-specific knowledge is required.
- Different workloads seem to be characterized by different metric-attribution "signatures" (future work).