Application-level logs: visualization and anomaly detection

Peter Bodík, UC Berkeley

Introduction
Previous work: visualization of HTTP access logs; automatic detection and localization of anomalies.
Can we extend this to application-level logs?
Preliminary work on application logs from Amazon.com.

Overview
Review of the work with Ebates.com.
Capturing application behavior at Amazon.com: application logs.
Visualization of application logs.
Anomaly detection.

Work with Ebates.com
HTTP access logs: analyzed the top 40 pages (98% of all traffic).
Detection of anomalies: compare current traffic with normal traffic (Naïve Bayes, chi-square test).
Visualization: anomalies are easy to notice.
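The chi-square half of "compare current traffic with normal traffic" can be sketched roughly as follows. This is a minimal sketch that assumes traffic is summarized as hit counts per page, once for the current window and once for a "normal" baseline window. The function name and all counts are hypothetical. Ranking pages by their per-page chi-square terms is one plausible way to produce a "most anomalous pages" list. It is not a claim about how the scores in the sample warning were computed.

```python
def chi_square_page_anomaly(
    current: dict[str, int], baseline: dict[str, int]
) -> tuple[float, list[tuple[str, float]]]:
    """Compare the current page-hit mix with a baseline ("normal") mix.

    Returns the chi-square statistic and the per-page contributions,
    sorted so the most anomalous pages come first. Only pages with a
    nonzero baseline count are scored; pages new in `current` are ignored.
    """
    total_current = sum(current.values())
    total_baseline = sum(baseline.values())
    if total_current == 0 or total_baseline == 0:
        return 0.0, []
    contributions = {}
    for page, base_count in baseline.items():
        if base_count == 0:
            continue
        # Hits we would expect in the current window if the page mix were unchanged.
        expected = total_current * base_count / total_baseline
        observed = current.get(page, 0)
        contributions[page] = (observed - expected) ** 2 / expected
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    return sum(contributions.values()), ranked
```

The first value is the usual chi-square statistic. If SciPy is available, `scipy.stats.chisquare` can turn the same observed/expected counts into a p-value.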

Sample anomaly
How long did it take you to read this?
warning #3:
  detection time: Sun Nov 16 19:27:00 PST 2003
  start: Sun Nov 16 19:24:00 PST 2003
  end: Sun Nov 16 21:05:00 PST 2003
  significance = 7.05
  Most anomalous pages:
    /landing.jsp 19.55
    /landing_merchant.jsp 19.50
    /mall_ctrl.jsp 3.69

Visualization of the same anomaly

Conclusion
Pros: anomaly detection worked; visualization worked even better and makes false positives “cheaper”; able to detect/localize problems earlier.
Cons: only looks at web pages; won’t tell you enough about the problem; night anomalies.

Anomaly score in one dataset (plot annotated: night anomalies)

Application-level logs
Can we use the same approach on app-level logs?
Analyzed logs from 3 failures at Amazon.com.

Amazon.com (diagram: a user request, services A through H, and the resulting HTML page)

Application logs
Every request is recorded in an application log.
The request calls operation() in service B; service B makes remote calls to other services C, D, E.

How to visualize application logs?
Is this similar to HTTP access logs?
HTTP logs: a user requests a web page; count the number of hits to each page.
Application logs: a request calls remote methods; count the number of calls to each method and the number of requests to each operation.

Summary of features
For every method M: number of requests that called M; number of calls to M per request; average execution time of M.
For every operation O: number of requests for operation O.
For every host H: number of requests to host H; average Time, UserTime, SystemTime.
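The per-method, per-operation, and per-host counts listed above can be computed from log records roughly as follows. This is a minimal sketch. The record layout (`Request`, `RemoteCall`, and their fields) is hypothetical and is not Amazon.com's log format. "Calls to M per request" is averaged here over the requests that called M, which is one of two reasonable readings. The host CPU-time features (Time, UserTime, SystemTime) are left out because the hypothetical record does not carry them.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class RemoteCall:
    method: str          # hypothetical naming, e.g. "C.lookup"
    exec_time_ms: float  # execution time of this one call


@dataclass
class Request:
    operation: str       # operation invoked in the front service
    host: str            # host that served the request
    calls: list[RemoteCall] = field(default_factory=list)


def extract_features(requests: list[Request]):
    """Aggregate one time window of hypothetical log records into counts."""
    requests_per_operation = Counter(r.operation for r in requests)
    requests_per_host = Counter(r.host for r in requests)
    requests_calling = Counter()     # requests that called M at least once
    total_calls = Counter()          # all calls to M across the window
    total_time = defaultdict(float)  # summed execution time of M
    for r in requests:
        seen = set()
        for c in r.calls:
            total_calls[c.method] += 1
            total_time[c.method] += c.exec_time_ms
            seen.add(c.method)
        requests_calling.update(seen)
    per_method = {
        m: {
            "requests_calling": requests_calling[m],
            "calls_per_request": total_calls[m] / requests_calling[m],
            "avg_exec_time_ms": total_time[m] / total_calls[m],
        }
        for m in total_calls
    }
    return requests_per_operation, requests_per_host, per_method
```

Each value returned for one time window becomes one point in a per-feature time series. Those series are what the plots and the anomaly model operate on.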

Failure 1
From Amazon.com operators: “problem from 12:37 to 12:41, affected most services, applications recovered at 12:52”

Failure 2
From Amazon.com operators: “10 minute outage from 12:52 to 13:02”

Failure 3
Misconfiguration; we have logs from only one service, and no anomalies are visible.

Modeling method frequencies
Same as for page frequencies in HTTP logs.
Assumptions: the frequency of a page/method doesn’t change during the day, and frequencies are independent.
Model each frequency as a Gaussian: compute its mean and variance.
Anomaly score = negative log likelihood.
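A minimal sketch of the model on this slide: fit one Gaussian per method from past frequencies, then score a new time window by its negative log likelihood. Because the frequencies are assumed independent, the per-method terms simply add up. The function names, the variance floor `min_var`, and the choice to treat a missing method as frequency 0 are assumptions made for illustration.

```python
import math
import statistics


def fit_gaussians(history: dict[str, list[float]], min_var: float = 1e-6) -> dict[str, tuple[float, float]]:
    """Fit an independent Gaussian (mean, variance) to each method's past frequencies."""
    params = {}
    for method, values in history.items():
        mu = statistics.fmean(values)
        # Floor the variance so a perfectly constant history cannot cause division by zero.
        var = max(statistics.pvariance(values, mu), min_var)
        params[method] = (mu, var)
    return params


def gaussian_nll(x: float, mu: float, var: float) -> float:
    """Negative log likelihood of observing x under N(mu, var)."""
    return 0.5 * math.log(2 * math.pi * var) + (x - mu) ** 2 / (2 * var)


def anomaly_score(current: dict[str, float], params: dict[str, tuple[float, float]]) -> tuple[float, dict[str, float]]:
    """Total and per-method NLL for one time window.

    Summing the per-method NLLs is justified only by the independence assumption.
    A method absent from `current` is treated as having frequency 0.
    """
    per_method = {m: gaussian_nll(current.get(m, 0.0), mu, var) for m, (mu, var) in params.items()}
    return sum(per_method.values()), per_method
```

The per-method dictionary is kept alongside the total so that a high score can be traced back to the methods responsible. This mirrors the "most anomalous pages" list in the HTTP-log warning.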

Negative log likelihood/anomaly score over time

Anomalous method (plot of frequency vs. time in hours): annotations mark an anomaly and a period of higher variance that causes false positives.

... another method (plot of frequency vs. time in hours): the mean changes over time.

... and another one (plot of frequency vs. time in hours): the model wouldn’t notice this peak, which causes false negatives.
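The following offers one plausible reading of these three plots. It is our interpretation rather than something the slides state. The Gaussian model assumes a method's frequency does not change during the day. When the mean does change, a single whole-day Gaussian absorbs the day/night swing into its variance. As a result, a peak that is large for its own time of day can still sit close to the global mean. The frequencies below are made up purely to show the effect.

```python
import statistics

# Hypothetical hourly frequencies for one method: high by day, low at night.
DAY = [100.0] * 12
NIGHT = [18.0, 22.0] * 6
PEAK = 60.0  # a night-time spike, three times the usual night level


def z_score(x: float, values: list[float]) -> float:
    """Distance of x from the mean of values, in population standard deviations."""
    return abs(x - statistics.fmean(values)) / statistics.pstdev(values)


# One Gaussian for the whole day: the day/night swing inflates the variance.
global_z = z_score(PEAK, DAY + NIGHT)
# A Gaussian fitted only to night-time hours.
night_z = z_score(PEAK, NIGHT)
print(f"whole-day model z = {global_z:.2f}, night-only model z = {night_z:.2f}")
# prints: whole-day model z = 0.00, night-only model z = 20.00
```

In this toy example, the whole-day model sees nothing unusual (z = 0), while a model of night-time traffic alone flags the same value strongly.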

Well... this is just another source of anomalies!
How do we know what’s really broken?
Sort anomalies by time of detection: only the early anomalies are important.
Fine-grained anomalies detect the problem earlier: an earlier warning and the likely root cause.
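Sorting anomalies by detection time can be sketched as follows. The sketch assumes each metric (method frequency, operation count, host count, and so on) already has a per-window anomaly score series and that a single fixed `threshold` marks a detection. Both the threshold rule and the metric names are hypothetical.

```python
def first_detections(scores: dict[str, list[float]], threshold: float) -> list[tuple[int, str]]:
    """(time index, metric) of each metric's first threshold crossing, earliest first."""
    detections = []
    for metric, series in scores.items():
        for t, score in enumerate(series):
            if score > threshold:
                detections.append((t, metric))
                break  # only the first crossing matters for ordering
    detections.sort()  # ties are broken alphabetically by metric name
    return detections
```

The head of the returned list holds the earliest-detected metrics, which the slide suggests are the ones worth inspecting as likely root causes. Metrics that never cross the threshold do not appear at all.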

Library of failures
Signature of a failure: not just which metrics are anomalous, but also when they became anomalous.
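One way such a signature might be represented and matched is shown below. This is a minimal sketch, and the representation and distance are our own illustration, not a method from the slides. A signature records each anomalous metric together with its onset relative to the first detection. A new failure is then matched to the closest entry in a library of known failures. The distance adds up onset differences for shared metrics and charges a fixed, hypothetical `missing_penalty` for each metric that appears in only one of the two signatures. All failure names are hypothetical.

```python
def signature(detections: list[tuple[int, str]]) -> dict[str, int]:
    """Map each anomalous metric to its onset time relative to the first detection."""
    if not detections:
        return {}
    t0 = min(t for t, _ in detections)
    return {metric: t - t0 for t, metric in detections}


def distance(sig_a: dict[str, int], sig_b: dict[str, int], missing_penalty: int = 10) -> int:
    """Hypothetical distance: timing mismatch on shared metrics plus a penalty per unshared metric."""
    shared = sig_a.keys() & sig_b.keys()
    timing = sum(abs(sig_a[m] - sig_b[m]) for m in shared)
    unshared = len(sig_a.keys() ^ sig_b.keys())
    return timing + missing_penalty * unshared


def closest_failure(sig: dict[str, int], library: dict[str, dict[str, int]]) -> str:
    """Name of the library failure whose signature is nearest to sig."""
    return min(library, key=lambda name: distance(sig, library[name]))
```

Including relative onset times means that two failures touching the same metrics in a different order get different signatures. This is the distinction the slide draws.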