1
Analyzing System Logs: A New View of What's Important. Sivan Sabato, Elad Yom-Tov, Aviad Tsherniak, Saharon Rosset (IBM Research). SysML07 (Second Workshop on Tackling Computer Systems Problems with Machine Learning Techniques). Presented by Hassan Wassel.
2
Introduction: System logs are a critical tool for system administrators. They are massive in volume, so we need to rank their messages according to importance. Previous work: ranking using expert rules; visualization; single-machine logs.
3
What is Important? This paper proposes that an important message is one that appears with a probability higher than expected. Represent all messages of the same type by one message type. Calculate the empirical distribution of probabilities and rank the messages by it. Systems are not homogeneous.
4
Algorithm: Use K-means clustering to divide system logs into classes. Estimate the empirical distribution of each class. Given a system log, identify its class and rank its messages according to its P
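The ranking step above can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's implementation: it assumes each log is a dictionary of message-type counts, that the log's class is already known, and that a message type's score is the fraction of logs in the class with a strictly smaller count for that type (an empirical CDF value). The names `empirical_scores`, `rank_messages`, and the sample data are hypothetical.

```python
from bisect import bisect_left

def empirical_scores(class_logs, log):
    """Score each message type in `log` by the fraction of logs in the class
    whose count for that type is strictly smaller (an empirical CDF value)."""
    scores = {}
    for msg_type, count in log.items():
        # Sorted counts of this message type across the class (0 if absent).
        counts = sorted(other.get(msg_type, 0) for other in class_logs)
        scores[msg_type] = bisect_left(counts, count) / len(counts)
    return scores

def rank_messages(class_logs, log):
    """Return message types ordered from most to least surprising."""
    scores = empirical_scores(class_logs, log)
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda m: (-scores[m], m))

# Hypothetical class of four logs, each mapping message type -> count.
class_logs = [
    {"disk_warning": 1, "login": 40},
    {"disk_warning": 0, "login": 35},
    {"disk_warning": 2, "login": 50},
    {"disk_warning": 1, "login": 45},
]
new_log = {"disk_warning": 9, "login": 42}

scores = empirical_scores(class_logs, new_log)
ranking = rank_messages(class_logs, new_log)
print(ranking)
```

Here `disk_warning` ranks first because its count exceeds that of every log in the class, while the `login` count is typical for the class.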
5
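Clustering: K-means tries to minimize the objective function $J = \sum_{j} \sum_{i} d^2(X_i, Z_j)$, where $Z_j$ is the center of cluster $j$ and the inner sum runs over the patterns $X_i$ assigned to that cluster. Inputs: number of clusters, distance matrix. Outputs: membership matrix, objective function value. [Figure: features, clusters, patterns]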
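As a rough illustration of that objective, here is a minimal K-means sketch (Lloyd's iteration) in plain Python. It assumes patterns are given as coordinate tuples with squared Euclidean distance, rather than as the precomputed distance matrix the slide lists as input, and it uses a naive deterministic initialization (the first k points). The function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance d^2 between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20):
    """Return (labels, centers, objective) after a fixed number of iterations."""
    centers = [list(p) for p in points[:k]]  # naive init: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    # J: sum of squared distances from each point to its assigned center.
    objective = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, objective

points = [(0, 0), (10, 10), (0, 1), (10, 11)]
labels, centers, objective = kmeans(points, k=2)
print(labels, centers, objective)
```

The returned `labels` play the role of the membership matrix and `objective` is the value of J. A production implementation would normally use a better initialization and a convergence check.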
6
Dimensionality Problem: The data consisted of 3000 system logs with 15,000 message types. However, it is sparse. Measuring distances over these 15,000 features is computationally intensive. Solution: dimensionality reduction.
7
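Feature Construction: Use the Spearman correlation between every pair of system logs: $\mathrm{Corr}(x, y) = 1 - \frac{6\,\lVert r_x - r_y \rVert^2}{N(N^2 - 1)}$, where $r_x$ and $r_y$ are the rank vectors of logs $x$ and $y$ and $N$ is their length. This maps the k × n matrix (k logs by n message types) to a k × k similarity matrix. Question: How to calculate rank vectors?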
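One possible answer to the rank-vector question, sketched under assumptions: rank the message-type counts within each log and give tied counts the average of their positions, which is a common convention but not necessarily the one the authors used. The sketch then applies the standard Spearman formula $1 - 6\sum d^2 / (N(N^2 - 1))$, which is exact only when there are no ties, to build the k × k similarity matrix. The function names and sample logs are hypothetical.

```python
def average_ranks(values):
    """1-based ranks of `values`; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation via squared rank differences (exact without ties)."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

def similarity_matrix(logs):
    """Map k logs (each a vector of n counts) to a k x k correlation matrix."""
    return [[spearman(a, b) for b in logs] for a in logs]

# Three hypothetical logs over four message types.
logs = [
    [5, 0, 2, 1],
    [50, 1, 20, 10],  # same ordering as the first log, different scale
    [0, 5, 1, 2],     # reversed ordering
]
S = similarity_matrix(logs)
print(S)
```

Because only ranks matter, the first two logs correlate perfectly despite their different message volumes, and each row of `S` can serve as a k-dimensional feature vector for a log.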
8
Evaluation: Compare Spearman correlation to other feature construction schemes. Histogram of pairwise distances. Maximal mutual information. Improvement in score.
9
Comment / Future Work: Correlation-based clustering. Feature extraction + choice of distance measure. Bi-clustering. Fuzzy clustering. Evaluation: use of human expertise to evaluate the ranking; clustering index.
10
Thank you! Pros and Cons!