Published by Miles White. Modified over 8 years ago.
1
InteMon: Intelligent monitoring system for large clusters
Evan Hoke, Jimeng Sun and Christos Faloutsos
2
What is InteMon?
Monitoring software for large clusters, specifically targeted at the Data Storage System Center.
Monitors, analyzes and displays time value streams.
Detects abnormalities in the data and calls attention to them.
Greatly reduces the amount of human monitoring required.
3
How it Works
Three parts: monitoring, data analysis, presentation.
4
Monitoring
A list of host names and signals (MIBs) is stored in a database.
A daemon running on the server queries all hosts for all signals in the database via SNMP.
Querying is staggered over the course of a minute to reduce the load on the network.
Returned values are stored in the database, indexed by time, machine and signal type.
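The staggering step can be illustrated with a minimal sketch. It assumes the simplest possible policy, evenly spacing every (host, signal) query across the polling period; the slides do not say how InteMon actually spreads its queries. The function name, host names and the choice of MIB objects are illustrative only, and no SNMP request is actually sent.

```python
from itertools import product


def staggered_schedule(hosts, signals, period_s=60.0):
    """Give every (host, signal) pair an evenly spaced offset within one period.

    Assumption: even spacing. The real daemon may group or order queries differently.
    """
    pairs = list(product(hosts, signals))
    if not pairs:
        return []
    step = period_s / len(pairs)
    return [(i * step, host, signal) for i, (host, signal) in enumerate(pairs)]


# Hypothetical hosts; the signal names are standard MIB objects used as examples.
schedule = staggered_schedule(
    ["node01", "node02"],
    ["ifInOctets", "ifOutOctets", "sysUpTime"],
)
for offset, host, signal in schedule:
    print(f"t+{offset:4.1f}s  query {host} for {signal}")
```

With two hosts and three signals this yields six queries, ten seconds apart, instead of six simultaneous requests at the start of each minute.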
5
Database Design
The database has an entry for each stream to be monitored and the machine it belongs to.
Entries are grouped into “SPIRIT instances”, i.e. sets of streams that are analysed together.
Each “SPIRIT instance” is associated with a normalization function.
Each set of hidden variables and reconstructed data is associated with a “SPIRIT instance”.
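One possible layout for such a database is sketched below with SQLite. All table and column names are hypothetical; the slides describe the entities but not the actual schema. A membership table is assumed so that one stream can belong to more than one SPIRIT instance.

```python
import sqlite3

# Hypothetical schema: names and types are assumptions, not InteMon's real tables.
SCHEMA = """
CREATE TABLE stream (
    id     INTEGER PRIMARY KEY,
    host   TEXT NOT NULL,
    signal TEXT NOT NULL
);
CREATE TABLE spirit_instance (
    id            INTEGER PRIMARY KEY,
    label         TEXT NOT NULL,
    normalization TEXT NOT NULL          -- e.g. 'none' or 'log'
);
CREATE TABLE instance_member (           -- which streams are analysed together
    spirit_instance_id INTEGER NOT NULL REFERENCES spirit_instance(id),
    stream_id          INTEGER NOT NULL REFERENCES stream(id),
    PRIMARY KEY (spirit_instance_id, stream_id)
);
CREATE TABLE sample (                    -- raw values, indexed by stream and time
    stream_id INTEGER NOT NULL REFERENCES stream(id),
    ts        INTEGER NOT NULL,
    value     REAL NOT NULL,
    PRIMARY KEY (stream_id, ts)
);
CREATE TABLE hidden_variable (           -- analysis output per instance and time
    spirit_instance_id INTEGER NOT NULL REFERENCES spirit_instance(id),
    ts                 INTEGER NOT NULL,
    idx                INTEGER NOT NULL,
    value              REAL NOT NULL,
    PRIMARY KEY (spirit_instance_id, ts, idx)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO spirit_instance (id, label, normalization) "
    "VALUES (1, 'node01: all signals', 'log')"
)
conn.executemany(
    "INSERT INTO stream (id, host, signal) VALUES (?, ?, ?)",
    [(1, "node01", "ifInOctets"), (2, "node01", "ifOutOctets")],
)
conn.executemany(
    "INSERT INTO instance_member (spirit_instance_id, stream_id) VALUES (1, ?)",
    [(1,), (2,)],
)
rows = conn.execute(
    "SELECT s.host, s.signal FROM stream s "
    "JOIN instance_member m ON m.stream_id = s.id "
    "WHERE m.spirit_instance_id = 1 ORDER BY s.id"
).fetchall()
print(rows)
```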
6
Data Analysis / Abnormality Detection
Uses the SPIRIT algorithm [Papadimitriou05].
Data is analysed every minute.
Correlations are searched for across all signals on a given machine and across all signals of the same type on all machines.
Both the raw data and the data normalized by logarithms are analysed.
[Figure: raw data]
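The two groupings and the logarithmic normalization can be sketched as follows. The helper names are hypothetical, and the use of log(1 + x), chosen here so that zero-valued counters stay defined, is an assumption; the slides only say that the data is normalized by logarithms.

```python
import math
from collections import defaultdict


def build_groups(streams):
    """Group (host, signal) pairs two ways: per host and per signal type."""
    by_host = defaultdict(list)
    by_signal = defaultdict(list)
    for host, signal in streams:
        by_host[host].append((host, signal))
        by_signal[signal].append((host, signal))
    return dict(by_host), dict(by_signal)


def log_normalize(values):
    """Assumed normalization: log(1 + x), defined for x >= 0."""
    return [math.log1p(v) for v in values]


streams = [
    ("node01", "ifInOctets"), ("node01", "ifOutOctets"),
    ("node02", "ifInOctets"), ("node02", "ifOutOctets"),
]
by_host, by_signal = build_groups(streams)
print(sorted(by_host))     # one analysis group per machine
print(sorted(by_signal))   # one analysis group per signal type
```

Each resulting group would be analysed as its own set of streams, once on the raw values and once on the normalized ones.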
7
Data Analysis (Cont.)
Hidden variables that represent the correlations are calculated and stored in the database.
Data is reconstructed from the hidden variables.
A change in the number of hidden variables signifies that correlations have broken down, i.e. an abnormality.
The weights of the streams that contribute to a new hidden variable are stored.
[Figures: hidden variables; reconstruction]
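The detection rule described above, flagging any change in the number of hidden variables, can be sketched in a few lines. Both function names are hypothetical, and ranking streams by the absolute value of their weight is an assumption about how "contributing" streams might be picked.

```python
def flag_abnormalities(hidden_counts):
    """Return (tick, old_m, new_m) wherever the number of hidden variables changes."""
    pairs = zip(hidden_counts, hidden_counts[1:])
    return [(t, prev, cur) for t, (prev, cur) in enumerate(pairs, start=1) if prev != cur]


def top_contributors(weights, names, k=3):
    """Assumed ranking: streams with the largest absolute weight come first."""
    ranked = sorted(zip(names, weights), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:k]


# Number of hidden variables recorded at each one-minute tick (made-up values).
events = flag_abnormalities([1, 1, 2, 2, 1])
print(events)
print(top_contributors([0.1, -0.9, 0.4], ["cpu", "disk", "net"], k=2))
```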
8
How SPIRIT Works
At a given time, the stream values form a vector in n-dimensional space.
Calculate m-dimensional projections (hidden variables) of the vectors, with m < n, such that the sum of squared residuals is minimized.
The squared residuals are bounded by a minimum and a maximum energy: exceeding the maximum causes m to grow, and dropping below the minimum causes m to shrink.
[Figure: projection error for two temperature streams, axes Temperature T1 and Temperature T2, 20 °C to 30 °C]
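The rule for growing or shrinking m can be sketched as a comparison of the residual energy against two bounds. The function name and the bound values (2% and 5% of the total energy) are assumptions chosen for illustration; the slides do not state the thresholds used in InteMon.

```python
def adjust_m(m, n, total_energy, residual_energy, min_frac=0.02, max_frac=0.05):
    """Return the new number of hidden variables.

    Assumed bounds: the residual should stay between min_frac and max_frac
    of the total energy. m is kept in the range 1 <= m < n.
    """
    if total_energy <= 0:
        return m
    frac = residual_energy / total_energy
    if frac > max_frac and m + 1 < n:
        return m + 1   # too much error: add a hidden variable
    if frac < min_frac and m > 1:
        return m - 1   # very little error: a hidden variable can be dropped
    return m


print(adjust_m(2, 10, total_energy=100.0, residual_energy=10.0))
print(adjust_m(2, 10, total_energy=100.0, residual_energy=1.0))
print(adjust_m(2, 10, total_energy=100.0, residual_energy=3.0))
```

Each change of m reported by such a rule is what the analysis stage treats as a candidate abnormality.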
9
How SPIRIT Works (Cont.)
Each new vector is projected onto the hidden variable space and the error is calculated.
The projection matrix is updated by averaging in the error vector, scaled such that the effect of old data decreases exponentially.
The algorithm runs in O(n), where n is the number of streams, with no need to access old data.
[Figure: projection error for temperature streams, axis Temperature T1, 20 °C to 30 °C]
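A minimal sketch of one such update step is given below, written in plain Python so that it is self-contained. It follows the general shape of a projection-tracking update with exponential forgetting and should be read as an illustration, not as the authors' implementation. The function name, the forgetting factor lam = 0.96 and the initial values are assumptions.

```python
def spirit_update(W, d, x, lam=0.96):
    """Process one new vector x. W holds one weight vector per hidden variable,
    d holds one running energy per hidden variable. Both are updated in place.
    Returns (hidden variable values, residual vector)."""
    x = list(x)
    ys = []
    for i, w in enumerate(W):
        y = sum(wj * xj for wj, xj in zip(w, x))       # project onto w
        d[i] = lam * d[i] + y * y                      # old energy decays exponentially
        err = [xj - y * wj for xj, wj in zip(x, w)]    # projection error
        for j in range(len(w)):
            w[j] += y * err[j] / d[i]                  # nudge w toward the data
        for j in range(len(x)):
            x[j] -= y * w[j]                           # pass the remainder to the next w
        ys.append(y)
    return ys, x


# Two perfectly correlated streams: the second is always twice the first.
W = [[1.0, 0.0]]   # one hidden variable, arbitrary starting direction
d = [1e-3]         # small initial energy (assumed)
for t in range(50):
    a = 1.0 + (t % 5)
    spirit_update(W, d, [a, 2.0 * a])
print(W[0][1] / W[0][0])   # ratio of the two weights
```

Each hidden variable costs O(n) work per new vector, and only W and d are kept between steps, so no old data has to be read back. In this toy run the weight vector turns toward the direction in which the two streams move together.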
10
Front End
The main page displays abnormalities that have occurred in the last 24 hours, with links to the pertinent graphs.
Graph pages display raw data, normalized data, hidden variables and reconstructed data.
Abnormalities are marked with red boxes.
Links on the side lead to an analysis of each abnormality.
The abnormality analysis page displays the weights contributing to the abnormality.
11
Screen Shots
14
Acknowledgments
Spiros Papadimitriou, Jimeng Sun and Christos Faloutsos for their work on SPIRIT.
John Strunk and Greg Ganger for advice from the PDL side.
National Science Foundation, Pennsylvania Infrastructure Technology Alliance, Intel, NTT and Hewlett-Packard for funding.