Detectability of Traffic Anomalies in Two Adjacent Networks Augustin Soule, Haakon Ringberg, Fernando Silveira, Jennifer Rexford, Christophe Diot
3 Anomaly detection in large networks Anomaly detection is complex for large network Network-wide analysis [Lakhina 04] is promising Validated against multiple networks at different time –Abilene 03, Geant 04, Sprint Europe 03 Features impacting the anomaly detection are unknown yet Compare the anomaly observed between two networks
4 Using entropy for anomaly detection Hypothesis : the distribution changes during an anomaly Entropy is a measure of the dispersion of the distribution 1.Minimum if the distribution is concentrated 2.Maximum if the distribution is spread Four features 1.Source IP distribution 2.Destination IP distribution 3.Source Port distribution 4.Destination port distribution Normal During a DOS attack
5 Detecting anomalies Kalman filter method [Soule 05] Method Overview 1.Use a model to predict the traffic 2.Innovation = Prediction error High threshold avoid false positive
6 Collected dataset Abilene and Geant monitoring Collected three month of data 1.BGP 2.IS-IS 3.NetFlow Isolate twenty consecutive days of complete measurement Connected through two peering links SamplingTemporal aggregation Anonymization Abilene1/1005 min11 bits Geant1/ min0 bits
7 Abilene and Geant Use routing information to isolate 1.Traffic from Abilene to Geant 2.Traffic from Geant to Abilene Detect anomalies inside each dataset using the same threshold parameter, but different data- reduction parametes
8 58 anomalies 10 anomalies 14 anomalies 78 anomalies Anomalies detected Compare the anomalies sent versus the anomalies observed 1.Expected for G2A and A 2.Surprising for G and A2G Amount of traffic ? Sampling ? Anonymization ? Threshold ? Method ? Model ?
9 Undetected anomalies Examples of anomalies detected in a network but undetected in the other. 1.Impact of Sampling & Method 2.Impact of customer’s Traffic Mix 3.Impact of anonymization
10 Geant No spike Example 1 : attack over Port 22 Abilene Spike > 8σ Sampling affects the perception of anomaly The effect depends on the type of anomaly
11 Example 2 : Alpha Flow Destination IP entropy Abilene Large file transfer between two hosts Observed in Geant Undetectable in Abilene In this Abilene the traffic is already concentrated by Web traffic The anomaly detectability is impacted by traffic Geant
12 Example 3 : Scan over an IP subnet Attacker doing a subnet scan One source host Multiple destination hosts 1.Concentration of source IP 2.Dispersion of destination IP But we observe concentration in the Destination IP entropy Anonymization can : 1.Help to detect anomalies 2.Impact the anomaly identification
13 Summary First synchronized observation of two networks for anomaly detection Identification of various features impacting anomaly detection 1.Sampling 2.Traffic mix 3.Anonymization Two anomalies are impacted differently by each features What impacts detectability ?
Thanks for listening !
backup
16 Collecting data from two networks GEANT and Abilene connected through two peering links (nov. 05) 20 consecutive days of traffic and routing data from GEANT and Abilene Using a Kalman filter on the entropy of the distribution of IP and ports Entropy increase => spread of the distribution Entropy decrease => concentration in the distribution
17 What impacts the detection of anomalies ? Objective : Identify important elements that influence anomaly detection Understand the source of false negatives Idea : observe the same anomalies on different networks using the same method parameters
18 Previous work – Multiple methods detecting statistical traffic anomalies : o Anomaly a priori o Input data o Complexity o Model type No a priori Netflow records Low complexity Network-wide diagnosis
19 Impact of the anomymization (contd)
20 Power of Network-Wide Analysis Introduced by [Lakhina 04] and [Soule 05] Learn and use natural correlation between links to model expected behavior Pros : 1.Accurately detects small or distributed events 2.False positive rate typically 10% This paper : 1.Understand why some anomalies are not detected Still to be done : 1.“Automatic” method calibration
21 Attacker Victim Power of Network-Wide Analysis Learn and use the existing correlations between flows Attacker Peak rate: 300Mbps; Attack rate ~ 19Mbps/flow Attack found by methods; hard to manually isolate Traffic (# of bytes) Time Figures from [Lakhina 05]
22 Example 3 : Scan over an IP subnet Attacker doing a subnet scan One source host Multiple destination hosts 1.Concentration of source IP 2.Dispersion of destination IP But we observe concentration in the Destination IP entropy Anonymization can : 1.Help to detect anomalies 2.Impact the anomaly identification Source IP entropy Destination IP entropy
23 Two neighbor observation Differences can be explained by: Network Operation Center 1. Sampling rate 2. Anonymization 3. Anomaly detection threshold Customer’s traffic 1.Traffic mix