Mining Anomalies in Network-Wide Flow Data
Anukool Lakhina, with Mark Crovella and Christophe Diot
NANOG35, Oct 23-25, 2005
2 My Talk in One Slide
Goal: A general system to detect & classify traffic anomalies in carrier networks
Network-wide flow data (e.g., via NetFlow) exposes a wide range of anomalies
  –Both operational & malicious events
I am here to seek your feedback
3 Network-Wide Traffic Analysis
Simultaneously analyze traffic flows across the network, e.g., using the traffic matrix
Network-wide data we use: traffic matrix views for Abilene and Géant in 10-minute bins
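A minimal sketch of how flow records might be aggregated into a traffic matrix with 10-minute bins. The record layout (timestamp, ingress PoP, egress PoP, byte count) and the name build_traffic_matrix are assumptions made for illustration; the slides do not describe the actual collection pipeline.

```python
import numpy as np

BIN_SECONDS = 600  # 10-minute bins, as on the slide


def build_traffic_matrix(records, start_ts, n_bins):
    """Aggregate flow records into a (time bins x OD flows) byte matrix.

    records: iterable of (timestamp, ingress_pop, egress_pop, byte_count).
    This tuple layout is hypothetical, not a NetFlow export format.
    """
    records = list(records)
    # One column per origin-destination (OD) pair seen in the data.
    od_pairs = sorted({(ingress, egress) for _, ingress, egress, _ in records})
    column = {od: j for j, od in enumerate(od_pairs)}

    X = np.zeros((n_bins, len(od_pairs)))
    for ts, ingress, egress, n_bytes in records:
        b = int((ts - start_ts) // BIN_SECONDS)
        if 0 <= b < n_bins:  # ignore records outside the window
            X[b, column[(ingress, egress)]] += n_bytes
    return X, od_pairs
```

Each row of the result is one time bin and each column one OD flow, which is the shape a network-wide analysis would consume.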
4 Power of Network-Wide Analysis
Distributed attacks are easier to detect at the ingress
Peak rate: 300 Mbps; attack rate ~ 19 Mbps/flow
[Figure: nodes labeled LA, HSTN, ATLA, NYC, IPLS]
5 But, This is Difficult!
How do we extract anomalies and normal behavior from noisy, high-dimensional data in a systematic manner?
6 The Subspace Method [LCD:SIGCOMM '04]
An approach to separate normal & anomalous network-wide traffic
Designate temporal patterns most common to all the OD flows as the normal patterns
Remaining temporal patterns form the anomalous patterns
Detect anomalies by statistical thresholds on anomalous patterns
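The separation described on this slide can be sketched with principal component analysis: the top k components of the traffic matrix stand in for the normal patterns, and whatever they do not explain is treated as the anomalous part. This is an illustrative sketch, not the authors' implementation. The names subspace_detect and make_demo_traffic, the choice of k, and the empirical-quantile threshold are all assumptions; the slide says only "statistical thresholds".

```python
import numpy as np


def subspace_detect(X, k, quantile=0.999):
    """Flag time bins whose residual traffic is unusually large.

    X: (time bins x OD flows) traffic matrix.
    k: number of principal components treated as "normal" (assumed choice).
    quantile: empirical cutoff on the residual energy (assumed stand-in
              for a formal statistical threshold).
    """
    Xc = X - X.mean(axis=0)                      # center each OD flow
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                                 # basis of the normal subspace
    residual = Xc - (Xc @ P) @ P.T               # part not explained by P
    spe = np.einsum("ij,ij->i", residual, residual)  # squared residual norm per bin
    threshold = np.quantile(spe, quantile)
    return spe, threshold, np.flatnonzero(spe > threshold)


def make_demo_traffic(n_bins=1008, n_flows=20, seed=0):
    """Synthetic data: a shared daily cycle plus noise (hypothetical)."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_bins)
    daily = 1.0 + 0.5 * np.sin(2 * np.pi * t / 144)  # 144 ten-minute bins per day
    base = rng.uniform(50, 100, n_flows)
    return np.outer(daily, base) + rng.normal(0.0, 1.0, (n_bins, n_flows))
```

For example, adding a spike to one OD flow in one bin of make_demo_traffic() and calling subspace_detect(X, k=1) should put that bin among the flagged indices. One caveat of this sketch: a large anomaly can itself pull a principal component toward it, so k should stay small relative to the number of shared patterns.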
7 An example user anomaly
One Src-Dst pair dominates: 32% of B (byte) traffic, 20% of P (packet) traffic
Cause: Bandwidth measurement using iperf by SLAC
8 An example operational anomaly
Multihomed customer CALREN reroutes around outage at LOSA
9 Summary of Anomaly Types Found [LCD:IMC04]
Anomaly types (from figure legend): Alpha, DOS, Scans, Flash Events, Unknown, False Alarms, Traffic Shift, Outage, Worm, Point-Multipoint
10 Automatically Classifying Anomalies [LCD:SIGCOMM05]
Goal: Classify anomalies without restricting yourself to a predefined set of anomalies
Approach: Leverage 4-tuple header fields: SrcIP, SrcPort, DstIP, DstPort
  –In particular, measure dispersion in fields
Then, apply off-the-shelf clustering methods
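One way to measure dispersion in a header field is the entropy of its value distribution: close to 0 when a single value dominates (concentrated), close to 1 when every observation differs (dispersed). The sketch below assumes entropy as the dispersion measure and normalizes by the log of the number of observations. Both choices, along with the names normalized_entropy, feature_vector, and the packet dictionary layout, are assumptions made for illustration.

```python
import math
from collections import Counter

FIELDS = ("src_ip", "src_port", "dst_ip", "dst_port")


def normalized_entropy(values):
    """Entropy of the empirical distribution, scaled to [0, 1].

    0.0 means fully concentrated (one value); 1.0 means fully dispersed
    (every observation distinct). Normalizing by log2(n) is an assumed choice.
    """
    counts = Counter(values)
    n = sum(counts.values())
    if n <= 1 or len(counts) <= 1:
        return 0.0
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h / math.log2(n)


def feature_vector(packets):
    """Map a set of packets to one dispersion value per 4-tuple field.

    packets: list of dicts keyed by FIELDS (hypothetical record layout).
    """
    return tuple(normalized_entropy([p[f] for p in packets]) for f in FIELDS)
```

Under these assumptions, a port scan from one host to one target would be dispersed only in DstPort, while a multi-source attack on one service would be dispersed in SrcIP and concentrated in DstIP and DstPort. Vectors like these are what a clustering routine would then group.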
11 Example of Anomaly Clusters
Summary: Correctly classified 292 of 296 injected anomalies
[Figure: anomaly clusters. Axis labels: (SrcIP), (DstIP), ranging between Concentrated and Dispersed. Legend: Code Red Scanning; Single source DOS attack; Multi source DOS attack]
12 Summary
Network-Wide Detection:
  –Broad range of anomalies with low false alarms
  –In papers: Highly sensitive detection, even when anomaly is 1% of background traffic
Anomaly Classification:
  –Feature clusters automatically classify anomalies
  –In papers: clusters expose new anomalies
Network-wide data and header analysis are promising for general anomaly diagnosis
13 More information
Ongoing Work: implementing algorithms in a prototype system
For more information, see papers & slides at:
Your feedback much needed & appreciated!