Fingerprinting the Datacenter
Marcel Flores, Shih-Chi Chen
Motivation
- Large datacenters often encounter large, complex crises
- Crises typically surface as performance dipping below SLAs
- They are often difficult to diagnose
- They can be costly to operators
Approach
- Quantify the state of the datacenter in a compact manner
- The summary can be compared against past crises
- Allows easy identification and diagnosis of crises
Fingerprints
- Track quantiles for each metric
- Determine hot/normal/cold status for each metric
- Include only relevant metrics
- Use a similarity metric for comparison
Fingerprint details
- Track quantiles of each metric; quantiles are resistant to outliers
- Measure the 25th, 50th, and 95th quantiles
- Classify each measurement as Hot (>98th percentile), Cold (<2nd percentile), or Normal (sketched below)
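A minimal Python sketch of this fingerprinting step, assuming numpy; the helper names and data here are hypothetical illustrations, not the authors' code:

```python
import numpy as np

# Hypothetical sketch of the fingerprinting step, not the authors' code.
# `history` holds past samples of one metric (used to fit the percentile
# cutoffs); `epoch` holds that metric's samples for the current epoch.

def fit_cutoffs(history, cold_pct=2, hot_pct=98):
    """Fit the cold/hot cutoffs (2nd/98th percentiles) from history."""
    return np.percentile(history, cold_pct), np.percentile(history, hot_pct)

def fingerprint_metric(epoch, cold_cut, hot_cut, quantiles=(25, 50, 95)):
    """Summarize one metric for one epoch: label each tracked quantile
    as cold (-1), normal (0), or hot (+1)."""
    values = np.percentile(epoch, quantiles)
    return [-1 if v < cold_cut else 1 if v > hot_cut else 0 for v in values]

# Example: fit cutoffs on synthetic history, then fingerprint a hot epoch.
rng = np.random.default_rng(0)
history = rng.normal(100, 10, size=10_000)
cold_cut, hot_cut = fit_cutoffs(history)
print(fingerprint_metric(rng.normal(130, 5, size=60), cold_cut, hot_cut))
# -> [1, 1, 1]: all tracked quantiles sit above the 98th percentile
```

The full fingerprint is the concatenation of these per-metric labels across all relevant metrics.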
Relevant metrics
- Select metrics via feature selection and classification
- Technique from statistical machine learning (one possible instantiation is sketched below)
- Eliminates noise from the fingerprints
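The slides only name "feature selection and classification", so this is a hedged sketch of one common instantiation: L1-regularized logistic regression, whose zeroed coefficients discard irrelevant metrics. The data here is synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical sketch: L1-regularized logistic regression as one plausible
# feature-selection-plus-classification technique. Metrics whose learned
# weight is exactly zero are dropped from the fingerprint.

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 100))            # 500 epochs x 100 metric summaries
y = (X[:, 3] + X[:, 17] > 1).astype(int)   # synthetic crisis labels

clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X, y)
relevant = np.flatnonzero(clf.coef_[0])    # metrics with nonzero weight
print("relevant metric indices:", relevant)  # expected to include 3 and 17
```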
Identification
- Define a similarity metric
- Allows comparison between the current-state fingerprint and known crisis fingerprints
- An identification threshold determines when two fingerprints are considered the same (see the sketch below)
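The slides do not define the similarity metric, so the sketch below stands in Euclidean distance between fingerprint vectors as one plausible choice; the threshold value and crisis labels are illustrative only.

```python
import numpy as np

# Hypothetical sketch: Euclidean distance between fingerprint vectors as
# the similarity metric (smaller distance = more similar). A crisis is
# "identified" only if some known fingerprint falls within the threshold.

def identify(current, known_crises, threshold):
    """Return the label of the closest known crisis fingerprint, or
    "unknown" if none falls within the identification threshold."""
    best_label, best_dist = "unknown", threshold
    for label, fp in known_crises.items():
        dist = np.linalg.norm(np.asarray(current) - np.asarray(fp))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

known = {"overloaded frontend": [1, 1, 0, 0], "database stall": [0, -1, 1, 1]}
print(identify([1, 1, 0, -1], known, threshold=1.5))  # -> overloaded frontend
```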
Evaluation
- Used data gathered from a real, live datacenter of hundreds of servers
- 240 days of data
- About 100 metrics per server
Evaluation criteria
- Discrimination: when are two crises different?
- Identification stability: when does it provide a consistent suggestion?
- Identification accuracy: when does it provide the correct label?
Offline
- Uses all known data and attempts to recall the crises it has seen
- Provides a baseline: what is the best possible result if everything were known?
- Dominates existing methods; accuracy is near perfect
Quasi-online
- More realistic, but still computes the thresholds offline
- Does not know the future
- Known and unknown accuracy of 85%
Online
- Everything computed on the fly, including the identification threshold
- Both known and unknown accuracy reach 80% with 10 seeding crises
- 78% known and 74% unknown accuracy with only 2 seeding crises
- Does well even with a small seeding set!
A note on thresholds
- The hot/cold thresholds (2nd/98th percentiles) were selected arbitrarily
- Evaluations with values derived from other statistical methods showed reduced discriminative power (95%, down from 99%)
- Why mess with what works?