Application-level logs: visualization and anomaly detection

1 Application-level logs: visualization and anomaly detection
Peter Bodík, UC Berkeley

2 Introduction
previous work:
  visualization of HTTP access logs
  automatic detection and localization of anomalies
can we extend this to application-level logs?
  preliminary work on application logs from Amazon.com

3 Overview
review work with Ebates.com
capturing application behavior at Amazon.com: application logs
visualization of application logs
anomaly detection

4 Work with Ebates.com
HTTP access logs
  analyzed the top 40 pages (98% of all traffic)
detection of anomalies
  compare current traffic with normal traffic
  Naïve Bayes, chi-square test
visualization
  easy to notice anomalies
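
The comparison of current traffic with normal traffic can be sketched with a Pearson chi-square statistic over per-page hit counts. This is a minimal sketch under stated assumptions, not the method actually used with Ebates.com: the function name `chi_square_by_page`, the page names in the usage note, and the add-one smoothing of the normal counts are all hypothetical. Turning the statistic into a significance value would additionally require the chi-square distribution, which is left out here.

```python
def chi_square_by_page(normal_counts, current_counts):
    """Compare current per-page hit counts against normal traffic.

    Returns (statistic, ranked_pages), where ranked_pages lists
    (page, contribution) pairs, most anomalous page first.
    """
    pages = sorted(set(normal_counts) | set(current_counts))
    normal_total = sum(normal_counts.values())
    current_total = sum(current_counts.values())

    contributions = {}
    for page in pages:
        # Assumption: add-one smoothing so that a page never seen in
        # normal traffic still gets a non-zero expected count.
        share = (normal_counts.get(page, 0) + 1) / (normal_total + len(pages))
        expected = current_total * share
        observed = current_counts.get(page, 0)
        contributions[page] = (observed - expected) ** 2 / expected

    statistic = sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    return statistic, ranked
```

Calling it with hypothetical counts such as `chi_square_by_page({"/x.jsp": 100, "/y.jsp": 100}, {"/x.jsp": 10, "/y.jsp": 190})` returns a large statistic together with the pages ordered by how much each contributes, which corresponds to a "most anomalous pages" list.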

5 Sample anomaly
warning #3:
detection time: Sun Nov 16 19:27:00 PST 2003
start: Sun Nov 16 19:24:00 PST 2003
end: Sun Nov 16 21:05:00 PST 2003
significance = 7.05
Most anomalous pages:
  /landing.jsp
  /landing_merchant.jsp
  /mall_ctrl.jsp
How long did it take you to read this?

6 Visualization of the same anomaly

7 Conclusion
Pros:
  anomaly detection worked
  visualization worked even better
  makes false positives “cheaper”
  able to detect/localize problems earlier
Cons:
  only looks at web pages
  won’t tell you enough about the problem
  night anomalies

8 Anomaly score in one dataset
night anomalies

9 Application-level logs
Can we use the same approach on app-level logs?
analyzed logs from 3 failures in Amazon.com

10 Amazon.com
[Diagram: a user request passes through services A, B, C, D, E, F, G, H, which produce the HTML page]

11 Application logs
request calls operation() in service B
service B makes remote calls to other services C, D, E
every request recorded in an application log

12 How to visualize application logs?
Is this similar to HTTP access logs?
HTTP logs:
  user requests a web page
  count number of hits to a page
application logs:
  request calls remote methods
  count number of calls to a method
  count number of requests to an operation

13 Summary of features
for every method M:
  count #requests that called M
  #calls to M per request
  average execution time of M
#requests for operation O
#requests to host H
average Time, UserTime, SystemTime
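
The per-method, per-operation, and per-host counts listed above could be computed along the following lines. This is only a sketch: the slides do not show the real log format, so the record layout (a dict with `operation`, `host`, and a `calls` list of `(method, exec_time)` pairs) and the function name `summarize` are hypothetical. "Calls to M per request" is interpreted here as calls divided by all requests in the window, which is also an assumption. The Time, UserTime, and SystemTime averages are omitted.

```python
from collections import Counter, defaultdict


def summarize(requests):
    """Compute per-method, per-operation and per-host features for one time window."""
    requests_calling = Counter()          # requests that called M at least once
    total_calls = Counter()               # all calls to M
    total_exec_time = defaultdict(float)  # summed execution time of M
    per_operation = Counter()
    per_host = Counter()

    for req in requests:
        per_operation[req["operation"]] += 1
        per_host[req["host"]] += 1
        seen = set()
        for method, exec_time in req["calls"]:
            total_calls[method] += 1
            total_exec_time[method] += exec_time
            seen.add(method)
        # Count each method once per request, however often it was called.
        requests_calling.update(seen)

    n = len(requests)
    methods = {
        m: {
            "requests_calling": requests_calling[m],
            "calls_per_request": total_calls[m] / n,
            "avg_exec_time": total_exec_time[m] / total_calls[m],
        }
        for m in total_calls
    }
    return {
        "methods": methods,
        "operations": dict(per_operation),
        "hosts": dict(per_host),
    }
```

Running `summarize` once per time interval yields one feature vector per interval, which is the shape of data a frequency-over-time plot or an anomaly detector would consume.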

14 Failure 1
from Amazon.com operators: “problem from 12:37 to 12:41, affected most services, applications recovered at 12:52”

15 Failure 2
from Amazon.com operators: “10 minute outage from 12:52 to 13:02”

16 Failure 3
misconfiguration
have logs only from one service
no anomalies visible

17 Modeling method frequencies
same as for page frequencies in HTTP logs
assumption:
  frequency of a page/method doesn’t change during the day
  frequencies are independent
model the frequency as a Gaussian
  compute mean and variance
anomaly score = negative log likelihood
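
The Gaussian model and the negative log likelihood score can be sketched as follows. The function names and the small variance floor are assumptions made for illustration; the slides only state that each frequency is modeled as a Gaussian with a computed mean and variance and that frequencies are treated as independent. Under that independence assumption, the joint negative log likelihood is the sum of the per-method terms.

```python
import math


def fit_gaussian(samples):
    """Estimate mean and variance of one method's frequency from normal data."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / n
    # Assumption: floor the variance so a constant series does not divide by zero.
    return mean, max(variance, 1e-9)


def negative_log_likelihood(x, mean, variance):
    """Negative log density of x under a Gaussian with the given mean and variance."""
    return 0.5 * math.log(2 * math.pi * variance) + (x - mean) ** 2 / (2 * variance)


def anomaly_score(observed, models):
    """Sum per-method scores; valid only if frequencies are independent.

    observed: {method: frequency in the current interval}
    models:   {method: (mean, variance)} fitted on normal data
    """
    return sum(
        negative_log_likelihood(observed[m], mean, variance)
        for m, (mean, variance) in models.items()
    )
```

A method whose frequency sits far from its mean relative to its variance dominates the sum. This also shows why a high-variance method or a drifting mean hurts: the same model either flags normal fluctuation or absorbs a real peak.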

18 Negative log likelihood/anomaly score over time

19 Anomalous method
[Plot: frequency vs. time [hours]; annotations: “anomaly”, “higher variance, causes false positives”]

20 ... another method
[Plot: frequency vs. time [hours]; annotation: “mean changes over time”]

21 … and another one
[Plot: frequency vs. time [hours]; annotation: “wouldn’t notice this peak, causes false negatives”]

22 Well ... this is just another source of anomalies!
How do we know what’s really broken?
sort anomalies by time of detection
  only the early anomalies are important
fine-grained anomalies detect problem earlier
  earlier warning
  likely root cause
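
Sorting anomalies by detection time and keeping only the early ones might look like the sketch below. The function name `earliest_anomalies`, the `(metric, detection_time)` tuple layout, and the two-minute window are hypothetical choices; the slides do not say how "early" is defined.

```python
from datetime import datetime, timedelta


def earliest_anomalies(detections, window=timedelta(minutes=2)):
    """Return detections within `window` of the first one, earliest first.

    detections: list of (metric, detection_time) pairs.
    """
    ordered = sorted(detections, key=lambda d: d[1])
    if not ordered:
        return []
    first_time = ordered[0][1]
    # Later anomalies are more likely consequences than root causes.
    return [d for d in ordered if d[1] - first_time <= window]
```

The metrics at the head of the returned list are the candidates for the likely root cause, since everything detected later may only be a downstream effect.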

23 Library of failures
signature of a failure:
  not just which metrics are anomalous
  but also when they became anomalous
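
One possible way to record such a signature is to store, for each anomalous metric, how long after the first detection it became anomalous. This is a hypothetical representation, not one described in the slides: the function name `failure_signature`, the metric names, and the choice of seconds as the unit are assumptions.

```python
from datetime import datetime, timedelta


def failure_signature(detections):
    """Map each anomalous metric to its onset, in seconds after the first detection.

    detections: list of (metric, detection_time) pairs; a metric may repeat.
    """
    if not detections:
        return {}
    start = min(t for _, t in detections)
    signature = {}
    for metric, t in detections:
        offset = (t - start).total_seconds()
        # Keep only the moment the metric first became anomalous.
        if metric not in signature or offset < signature[metric]:
            signature[metric] = offset
    return signature


# Hypothetical detections for illustration only.
t0 = datetime(2003, 1, 1, 12, 0)
example = failure_signature([
    ("calls to method M", t0),
    ("requests to host H", t0 + timedelta(seconds=90)),
])
```

Storing relative offsets instead of absolute timestamps keeps the timing information while making signatures from different incidents comparable.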

