Impact of Packet Sampling on Anomaly Detection Metrics
Daniela Brauckhoff*, Bernhard Tellenbach*, Arno Wagner*, Anukool Lakhina**, Martin May*
*ETH Zurich, **Boston University
IMC '06: Proceedings of the 6th ACM SIGCOMM Conference on Internet Measurement, New York, NY, USA, 2006
Citations: 226
Otto
Motivation
- The general opinion about sampling: valuable information is lost
- Sampling is needed anyway:
  - Size constraints
  - Unsampled NetFlow cannot be obtained from some routers
- Interesting questions arise:
  - How much information is actually lost?
  - Are all anomalies equally affected by sampling?
  - Are all detection metrics equally affected by sampling?
  - At which sampling rate is a given anomaly still detectable?
  - Can the original anomaly size be estimated from a sampled view?
Article Goal
- Study the impact of packet sampling on:
  - Blaster worm visibility
  - Anomaly detection metrics: bytes, packets, flows, traffic features, others
Dataset
- Unsampled NetFlow records
- One-week capture
- Backbone router of a national ISP
- Known Blaster outbreak in the data
Entropy as a Detection Metric
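Entropy-based metrics summarize how concentrated or dispersed a traffic feature (such as the destination IP) is within one time bin. The sketch below assumes the standard Shannon sample entropy in bits, H(X) = -sum_i p_i log2 p_i, where p_i is the fraction of flows (or packets) carrying the i-th distinct feature value; whether the authors normalize this value is not shown on the slides, and `feature_entropy` is a hypothetical name.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def feature_entropy(values: Iterable[Hashable]) -> float:
    """Sample entropy (in bits) of the empirical distribution of a traffic feature.

    `values` holds one feature value (e.g. a destination IP) per flow or
    packet observed in a single time bin. Normalization is NOT applied
    (assumption: the slides do not specify it).
    """
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # empty bin: define entropy as zero
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy
```

For example, four flows to one destination give 0 bits, while four flows to four distinct destinations give 2 bits; a scanning worm that contacts many distinct destinations therefore tends to raise destination-IP entropy.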
Variables Used
Sampling Methodology
- Individual packets are reconstructed from each flow record:
  - Packet size (bytes): packet_size = flow_size / num_packets (average packet size)
  - Timestamp: chosen randomly within the flow's start and end times
- Packets are then randomly sampled at rates of 1 in 10, 1 in 100, 1 in 250 and 1 in 1000
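The packet reconstruction and sampling steps can be sketched as follows. This is a minimal sketch under stated assumptions: `FlowRecord`, `Packet`, `expand_flow` and `sample_packets` are hypothetical names, and "random sampling at 1 in N" is modeled as keeping each packet independently with probability 1/N, which may differ from the exact random scheme the authors used.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class FlowRecord:
    # Hypothetical minimal NetFlow-like record; field names are illustrative.
    src_ip: str
    dst_ip: str
    dst_port: int
    protocol: str
    start: float       # flow start time (seconds)
    end: float         # flow end time (seconds)
    num_packets: int   # NetFlow records always contain at least one packet
    num_bytes: int


@dataclass
class Packet:
    src_ip: str
    dst_ip: str
    dst_port: int
    protocol: str
    timestamp: float
    size: float


def expand_flow(flow: FlowRecord, rng: random.Random) -> List[Packet]:
    """Recreate individual packets from one flow record.

    Every packet gets the flow's average packet size and a timestamp drawn
    uniformly at random between the flow's start and end times.
    """
    avg_size = flow.num_bytes / flow.num_packets
    return [
        Packet(flow.src_ip, flow.dst_ip, flow.dst_port, flow.protocol,
               rng.uniform(flow.start, flow.end), avg_size)
        for _ in range(flow.num_packets)
    ]


def sample_packets(packets: List[Packet], n: int, rng: random.Random) -> List[Packet]:
    """Keep each packet independently with probability 1/n (assumed scheme)."""
    return [p for p in packets if rng.random() < 1.0 / n]
```

Usage: expanding `FlowRecord("192.0.2.1", "198.51.100.7", 135, "TCP", 0.0, 10.0, 3, 144)` yields three 48-byte packets; calling `sample_packets` with n in (10, 100, 250, 1000) produces the four sampled views. Because sizes are averages, reconstructed packet sizes need not be integers.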
Baseline Methodology
- One baseline per metric and sampling rate
- Anomaly detection (AD) algorithms measure the distance between the (predicted) baseline and the (actual) observed metrics
- Each AD method uses its own algorithm to determine the baseline model
- Here the anomaly is known, so an "ideal baseline" can be constructed by removing all Blaster packets from the observed trace:
  - Destination port: TCP 135
  - Length: 40, 44 or 48 bytes
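The ideal-baseline construction amounts to a filter over the packet trace. A minimal sketch assuming the Blaster signature listed above (TCP, destination port 135, 40/44/48 bytes); `Packet`, `is_blaster`, `ideal_baseline_trace` and `packets_per_bin` are hypothetical names, and the bin size is left as a free parameter.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

BLASTER_PORT = 135
BLASTER_SIZES = {40, 44, 48}  # packet lengths in bytes


@dataclass
class Packet:
    # Hypothetical minimal packet representation.
    protocol: str
    dst_port: int
    size: float
    timestamp: float  # seconds


def is_blaster(pkt: Packet) -> bool:
    """Blaster signature: TCP, destination port 135, length 40/44/48 bytes."""
    return (pkt.protocol == "TCP"
            and pkt.dst_port == BLASTER_PORT
            and pkt.size in BLASTER_SIZES)


def ideal_baseline_trace(packets: Iterable[Packet]) -> List[Packet]:
    """Observed trace with every Blaster-signature packet removed."""
    return [p for p in packets if not is_blaster(p)]


def packets_per_bin(packets: Iterable[Packet], bin_seconds: float) -> Dict[int, int]:
    """Packet count per time bin, one example of a per-bin metric."""
    counts: Dict[int, int] = {}
    for p in packets:
        b = int(p.timestamp // bin_seconds)
        counts[b] = counts.get(b, 0) + 1
    return counts
```

To obtain one baseline per sampling rate, the filtered trace would be sampled at each rate and the metric recomputed per bin on the result.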
Baseline Methodology
Sampling Baseline: Flow Counts
[Figure: flow counts]
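Flow counts react to packet sampling differently from packet and byte counts: a flow only appears in the sampled view if at least one of its packets is sampled. Assuming each packet is kept independently with probability p, a flow of n packets survives with probability 1 - (1 - p)^n, so short flows are lost disproportionately. The sketch below computes this expectation; the function names are hypothetical and the independence assumption is ours.

```python
from typing import Iterable


def flow_survival_probability(num_packets: int, p: float) -> float:
    """Probability that a flow keeps at least one of its `num_packets`
    packets when each packet is sampled independently with probability p."""
    return 1.0 - (1.0 - p) ** num_packets


def expected_sampled_flow_count(flow_sizes: Iterable[int], p: float) -> float:
    """Expected number of flows still visible after packet sampling."""
    return sum(flow_survival_probability(n, p) for n in flow_sizes)
```

With p = 1/100, a single-packet flow survives with probability 0.01, while a 1000-packet flow survives with probability of about 0.99996.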
Sampling Baseline: Flow Destination IP Entropy
[Figure: flow destination IP entropy]
Sampling Comparison
Baselines
Anomaly Distance
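The distance between the observed metric and its baseline can be sketched per time bin. The slides do not reproduce the exact distance the authors use, so this minimal sketch assumes a relative absolute distance |observed - baseline| / |baseline|, averaged over a chosen set of bins (for example, the attack period); `relative_distance` and `mean_distance` are hypothetical names.

```python
from typing import Dict, Iterable


def relative_distance(observed: float, baseline: float) -> float:
    """|observed - baseline| relative to the baseline (assumed definition)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return abs(observed - baseline) / abs(baseline)


def mean_distance(observed: Dict[int, float], baseline: Dict[int, float],
                  bins: Iterable[int]) -> float:
    """Average relative distance over the given time bins."""
    bins = list(bins)
    if not bins:
        raise ValueError("need at least one bin")
    return sum(relative_distance(observed[b], baseline[b]) for b in bins) / len(bins)
```

Normalizing by the baseline keeps distances comparable across metrics with very different scales (e.g. byte counts vs. entropy values), which matters when asking which metric retains the most anomaly signal as the sampling rate drops.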
Distance vs Sampling Rate: During Attack
Scaling Methodology
- Identification of Blaster packets based on destination port, packet size and protocol (TCP)
- Amplification of the Blaster worm: insertion of new packets
  - Same source IP
  - Destination IP randomly selected from the SWITCH IP range
- Attenuation of the Blaster worm: randomly discarding some of the Blaster packets
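Amplification and attenuation can be sketched with a single scaling factor. This is a minimal sketch under stated assumptions: a factor above 1 adds copies of each anomalous packet that keep the source IP but get a random destination inside a target network, and a factor below 1 keeps each anomalous packet with probability equal to the factor. The SWITCH address range is not given on the slides, so the usage note uses the documentation prefix 198.51.100.0/24 as a stand-in; `Packet`, `random_ip` and `scale_anomaly` are hypothetical names.

```python
import ipaddress
import random
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Packet:
    # Hypothetical minimal packet representation.
    src_ip: str
    dst_ip: str
    dst_port: int
    protocol: str
    size: float
    timestamp: float


def random_ip(network: ipaddress.IPv4Network, rng: random.Random) -> str:
    """Uniformly random address inside `network`."""
    offset = rng.randrange(network.num_addresses)
    return str(network.network_address + offset)


def scale_anomaly(packets: List[Packet], factor: float,
                  is_anomalous: Callable[[Packet], bool],
                  target_net: ipaddress.IPv4Network,
                  rng: random.Random) -> List[Packet]:
    """Amplify (factor > 1) or attenuate (factor < 1) the anomalous packets.

    Non-anomalous packets are always kept unchanged.
    """
    out: List[Packet] = []
    for pkt in packets:
        if not is_anomalous(pkt):
            out.append(pkt)
            continue
        if factor < 1.0:
            # Attenuation: drop each anomalous packet with probability 1 - factor.
            if rng.random() < factor:
                out.append(pkt)
            continue
        out.append(pkt)
        extra = factor - 1.0
        # Integer part gives guaranteed copies; the fractional part adds
        # one more copy with that probability.
        copies = int(extra) + (1 if rng.random() < extra - int(extra) else 0)
        for _ in range(copies):
            # Same source IP, random destination inside the target network.
            out.append(replace(pkt, dst_ip=random_ip(target_net, rng)))
    return out
```

Usage: `scale_anomaly(trace, 3.0, blaster_filter, ipaddress.ip_network("198.51.100.0/24"), random.Random(1))` roughly triples the Blaster packets, while a factor of 0.5 keeps about half of them.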
Scaling
Conclusion
- Some metrics are more resilient to sampling than others
- Flow destination IP entropy is the most resilient metric for Blaster
Future Work
- Other types of anomalies and anomaly intensities
- Other distance metrics
- Different bin sizes
- Further anomaly metrics
- Anomaly detectability at different sampling rates