Fingerprinting the Datacenter: Automated Classification of Performance Crises
Kenneth Wade, Ling Su

Reliable System
Backup Fingerprint system
Safe and tolerant structure
Any alarm when Fingerprint fails?

Evaluation
100 metrics are sampled over 15-minute periods, but only 3 key performance indicators (KPIs) are designated
  o a very small subset of the collected metrics
A crisis is declared when 10% of machines violate a KPI's service-level agreement (SLA)
  o if hundreds of machines are running 24x7, many machines may be violating a KPI before a crisis is declared
  o why not issue warnings as machines start to violate SLAs?
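The declaration rule on this slide can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name, the "at least 10%" reading of the rule, and the lower warning level (which only illustrates the slide's suggestion) are all assumptions.

```python
def crisis_status(violations, crisis_percent=10, warning_percent=5):
    """Classify one epoch from per-machine KPI SLA violations.

    violations: list of bools, True if that machine violates the KPI's SLA.
    crisis_percent: the 10% rule from the slide (read here as "at least 10%").
    warning_percent: hypothetical earlier warning level; not from the paper.
    """
    if not violations:
        return "ok"
    violating = sum(violations)
    total = len(violations)
    # Integer arithmetic avoids floating-point surprises at the boundary.
    if violating * 100 >= crisis_percent * total:
        return "crisis"
    if violating * 100 >= warning_percent * total:
        return "warning"
    return "ok"


if __name__ == "__main__":
    fleet = [True] * 7 + [False] * 93   # 7 of 100 machines violating
    print(crisis_status(fleet))
```

With 300 machines, 29 of them can be in violation without a crisis being declared, which is the gap the slide is pointing at.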

Evaluation
5 identifications are performed per crisis, starting when the crisis is detected and continuing for 4 subsequent 15-minute epochs
  o in each epoch, the identification is a known label or 'x'
  o a sequence is "stable" if it consists of 0 or more 'x's followed by 0 or more identical labels
  o accuracy is only determined if the sequence is "stable"
    - if the sequence is unstable, the accuracy of individual labels is not evaluated!
    - what about one crisis causing another?
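The stability criterion is easy to state as code. The sketch below follows the definition on the slide (zero or more 'x's, then zero or more identical labels); the function name and the example labels are made up for illustration.

```python
def is_stable(labels, unknown="x"):
    """Return True if labels is 0+ unknowns followed by 0+ identical labels."""
    i = 0
    # Skip the leading run of unknown identifications.
    while i < len(labels) and labels[i] == unknown:
        i += 1
    rest = labels[i:]
    # Everything after the unknowns must be the same known label.
    # rest[0] is never the unknown marker, so a later 'x' also fails here.
    return all(label == rest[0] for label in rest)


if __name__ == "__main__":
    print(is_stable(["x", "x", "A", "A", "A"]))  # unknown, then one label
    print(is_stable(["x", "A", "x", "A", "A"]))  # reverts to unknown
    print(is_stable(["A", "B", "B", "B", "B"]))  # label changes mid-crisis
```

The third sequence is the case the slide raises: if crisis A triggers crisis B, the sequence is unstable by definition and drops out of the accuracy numbers.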

Evaluation
Hot and cold quantile thresholds change over time due to changes in the workload and performance of the application, and are periodically recomputed based on new data
  o what if the workload or application performance changes significantly before new thresholds are computed?
  o how often are the thresholds recomputed?
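To make the threshold concern concrete, here is a rough sketch of how hot and cold thresholds might be derived from past values of one metric and then applied to a new value. The 2nd and 98th percentiles, the nearest-rank percentile method, the -1/0/+1 encoding, and both function names are assumptions for illustration; the slides do not state the actual values or method.

```python
def quantile_thresholds(history, cold_pct=2, hot_pct=98):
    """Return (cold, hot) thresholds from past metric values.

    cold_pct and hot_pct are hypothetical percentile choices.
    Uses a simple nearest-rank percentile over the sorted history.
    """
    ordered = sorted(history)
    last = len(ordered) - 1

    def pick(pct):
        # Integer rounding of pct/100 * last to the nearest index.
        return ordered[(pct * last + 50) // 100]

    return pick(cold_pct), pick(hot_pct)


def discretize(value, cold, hot):
    """Map a metric value to -1 (cold), 0 (normal), or +1 (hot)."""
    if value > hot:
        return 1
    if value < cold:
        return -1
    return 0


if __name__ == "__main__":
    past = list(range(101))               # stand-in for past metric values
    cold, hot = quantile_thresholds(past)
    print(cold, hot, discretize(150, cold, hot))
```

If the workload shifts so that typical values sit above the old hot threshold, every epoch reads as hot until the thresholds are recomputed, which is why the recomputation interval matters.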

Automation
Unidentified crises still require a lot of manual effort
  o their goal is to automate the crisis identification process
  o the title of the paper is "Automated Classification of Performance Crises"

Scaling
They did not mention anything about how Fingerprint scales
What if thousands of crises exist?
Time to compare against each crisis

Experiment Environment
Basic information about the data center
  o server type
  o crisis frequency
Configuration information for Fingerprint

Lack of Significance
By far the most referenced paper (14 times) was a 2005 paper called "Capturing, Indexing, Clustering, and Retrieving System History", whose authors include Armando Fox and Moises Goldszmidt
  o it presents a method for extracting an indexable signature that characterizes system state
  o they essentially adapted this signature work to a datacenter

Lack of Significance
This paper is 2 years old and has only been cited by 12 others
  o a quarter of the papers that cited this paper were also authored by Armando Fox (with one of those also authored with Moises Goldszmidt)
If this fingerprinting is actually useful for datacenter operators, wouldn't people be using this methodology by now?