Balancing Risk and Utility in Flow Trace Anonymization

Martin Burkhart, ETH Zurich
Joint work with Daniela Brauckhoff, Elisa Boschi, Martin May

Motivation
Sharing of traffic measurements is crucial
  Only a limited set of sources available
  Reproducibility of results
  Dynamics / variability of traffic
  Get the big picture (e.g. Internet Storm Center)
  Keep up with globalized attacks (e.g. botnets)
More and more traces are collected but not shared
  Data protection legislation
  Security concerns
  Competitive advantage

State-Of-The-Art: Anonymization
Black Marking
Truncation
  E.g. last bits of IP addresses
Permutation
  Random
  (Partial) Prefix-preserving IP address permutation
Enumeration
  E.g. Timestamps: keep the logical order of events
Categorization
Randomization (data mining community)
K-Anonymity (data mining community)
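
Three of the techniques listed above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the "0.0.0.0" marker, and the choice to permute only the addresses actually observed in a trace are assumptions made for the example, not details from the talk. A deployed permutation scheme would typically use a keyed mapping over the whole address space.

```python
import random


def black_mark(_value, marker="0.0.0.0"):
    """Black marking: replace the field with a fixed value (all information lost)."""
    return marker


def random_permutation(addrs, seed=None):
    """Random permutation: map each distinct observed address to another one, 1:1.

    Simplification (assumption): the permutation is over the observed set only.
    """
    distinct = sorted(set(addrs))
    shuffled = distinct[:]
    random.Random(seed).shuffle(shuffled)
    return dict(zip(distinct, shuffled))


def enumerate_timestamps(timestamps):
    """Enumeration: replace each timestamp by its rank, keeping the logical order."""
    rank = {t: i for i, t in enumerate(sorted(set(timestamps)))}
    return [rank[t] for t in timestamps]


if __name__ == "__main__":
    trace = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3"]
    mapping = random_permutation(trace, seed=1)
    print([mapping[a] for a in trace])
    print(enumerate_timestamps([1200.5, 1199.0, 1200.5, 1300.25]))
```

Enumeration keeps which event came first but discards the actual time gaps, which is the intent stated on the slide.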

The Tradeoff in Anonymization
It's a trade-off
RU-Maps
  t: anonymization strength
  X-axis: Utility(t)
  Y-axis: Risk(t)
Not quantitatively studied, lack of metrics
Strongly dependent on the application / attacker model
[Figure: RU-map of Risk(t) against Utility(t) for "Algorithm X" at t = 0.1, 0.2, 0.4 and 0.7, with additional points for prefix-preserving and random permutation and a marked sweet spot]

A Case Study: IP Address Truncation
Techniques that permute IP addresses 1:1 are reversible
  Characteristic object sizes/frequencies, behavioral profiling, fingerprint active ports, exploit prefix structure
Apply IP address truncation and evaluate the risk and utility dimensions
  Lower risk: Hosts are aggregated to subnets
  Lower utility: Resolution of entities is reduced
Quantifying the tradeoff: How bad is it in numbers?
[Figure: an IP address in original form, with 8 bits truncated, and with 16 bits truncated]
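
As a rough illustration of how truncation aggregates hosts into subnets, the sketch below zeroes the last x bits of an IPv4 address and counts how many distinct values remain. The helper name `truncate_ip` and the sample addresses (taken from documentation ranges) are assumptions for the example.

```python
import ipaddress


def truncate_ip(addr: str, x: int) -> str:
    """Zero the last x bits of an IPv4 address, leaving a (32 - x)-bit prefix."""
    if not 0 <= x <= 32:
        raise ValueError("x must be between 0 and 32")
    value = int(ipaddress.IPv4Address(addr))
    mask = (0xFFFFFFFF << x) & 0xFFFFFFFF  # keep the upper 32 - x bits
    return str(ipaddress.IPv4Address(value & mask))


if __name__ == "__main__":
    hosts = ["192.0.2.17", "192.0.2.200", "192.0.77.5", "198.51.100.9"]
    for x in (0, 8, 16):
        prefixes = {truncate_ip(h, x) for h in hosts}
        # Fewer distinct values means hosts were merged: lower risk, lower resolution.
        print(x, len(prefixes), sorted(prefixes))
```

In this toy set the four hosts collapse to three distinct values at 8 bits and two at 16 bits, which is the loss of resolution the slide refers to.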

Internal vs. External Prefixes
Asymmetry in prefixes
Is this reflected in
  Risk reduction?
  Utility reduction?
[Figure: unique count (log scale) against prefix length (32-x) for external and internal (AS 559) addresses; annotations at x = 8 read "Factor 3" and "Factor 53"]

Measuring Utility of Truncated Data
Specific application: anomaly detection
Compare detection quality of scans and (D)DoS attacks in original and truncated data
Two IP-based metrics
  Unique address count
  Address entropy
3 weeks of NetFlow data
  ~ 43 billion flows
  SWITCH network
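
The two IP-based metrics can be computed per time interval roughly as follows. The talk does not spell out the exact definitions, so this sketch assumes the unique count is the number of distinct addresses in an interval and the entropy is the Shannon entropy (in bits) of the empirical address distribution over the flows in that interval.

```python
import math
from collections import Counter


def unique_count(addresses):
    """Number of distinct addresses seen in one time interval."""
    return len(set(addresses))


def address_entropy(addresses):
    """Shannon entropy (bits) of the address distribution in one time interval."""
    if not addresses:
        return 0.0
    counts = Counter(addresses)
    total = len(addresses)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    normal = ["10.0.0.1"] * 6 + ["10.0.0.2"] * 2
    scan = normal + [f"10.0.1.{i}" for i in range(1, 41)]  # many new destinations
    for name, flows in (("normal", normal), ("scan", scan)):
        print(name, unique_count(flows), round(address_entropy(flows), 3))
```

Under these assumptions, a scan that touches many destinations raises both metrics in the affected interval.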

Measuring Detection Quality
Ground truth: Manual identification of scans/(D)DoS attacks
Run a Kalman filter on metric timeseries
Utility measured by AUC (area under the ROC curve)
  Vary threshold
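
A minimal sketch of this evaluation pipeline is given below. It is not the authors' detector: the scalar random-walk Kalman model, the noise parameters `q` and `r`, and the use of the absolute prediction error as anomaly score are all assumptions chosen to keep the example short. The AUC is obtained by sweeping the detection threshold over the scores and integrating the resulting ROC curve with the trapezoidal rule.

```python
def kalman_residuals(series, q=1.0, r=1.0):
    """Scalar Kalman filter with a random-walk state model (assumed).

    Returns the absolute one-step prediction error for every sample.
    """
    estimate, variance = float(series[0]), 1.0
    residuals = [0.0]
    for z in series[1:]:
        variance += q                      # predict: same level, more uncertainty
        innovation = z - estimate          # prediction error for this sample
        gain = variance / (variance + r)
        estimate += gain * innovation      # update the state estimate
        variance *= 1.0 - gain
        residuals.append(abs(innovation))
    return residuals


def roc_auc(scores, labels):
    """Vary the threshold over all scores and return the area under the ROC curve."""
    positives = sum(1 for l in labels if l)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l)
        fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and not l)
        points.append((fp / negatives, tp / positives))
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    metric = [10, 10, 10, 50, 10]          # toy metric timeseries with one spike
    truth = [0, 0, 0, 1, 0]                # manually labeled anomaly
    print(roc_auc(kalman_residuals(metric), truth))
```

Running the same procedure on the original and on the truncated data, and comparing the two AUC values, gives the utility comparison described on the slide.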

Utility of Truncated Data
Internal metrics degrade faster than external metrics
Counts degrade faster than entropy

Approximating Risk of Host Identification
In general: Truncation of x bits leads to 2^(32-x) prefixes with 2^x addresses per prefix
But: only a fraction (A) of potential addresses is usually active
Hence, on average A*2^x addresses per prefix
[Figure: host addresses 1 to 255 within a prefix, only some of them active, e.g. A = 10%]
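
The approximation above is easy to evaluate numerically. In the sketch below, `active_per_prefix` follows the slide directly (A*2^x). The function `identification_risk` goes one step further and assumes, purely for illustration, that an attacker picks uniformly among the active hosts of a prefix, so the risk is 1 / (A*2^x) capped at 1; the slide text does not state the risk formula, so treat that part as an assumption.

```python
def active_per_prefix(x: int, active_fraction: float) -> float:
    """Average number of active addresses per prefix after truncating x bits."""
    return active_fraction * 2 ** x


def identification_risk(x: int, active_fraction: float) -> float:
    """Assumed risk model: chance of guessing one host among the active ones."""
    return min(1.0, 1.0 / active_per_prefix(x, active_fraction))


if __name__ == "__main__":
    A = 0.10  # example value from the slide
    for x in (0, 4, 8, 12, 16):
        print(x, active_per_prefix(x, A), identification_risk(x, A))
```

With A = 10% and x = 8, a prefix holds 25.6 active addresses on average. A smaller active fraction means fewer hosts to hide among at the same x.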

Risk of Truncated Data
[Figure legend: (total: 2.2 million), (total: 4.3 billion)]
Risk for external addresses is higher due to sparsity!
Constant offset:

The Risk-Utility Tradeoff
[Figure: RU-map with points for no truncation and for 4, 8, 12 and 16 bits truncated; the best tradeoff is marked]

Metric             x    Utility   Risk
internal entropy   8    0.94      0.035
internal entropy   12   0.87      0.002
external entropy   16   0.97      0.02

Conclusion
We made a quantitative evaluation of the risk-utility tradeoff in anonymization
Entropy is much more resistant to truncation than unique counts
Risk and utility degrade faster for internal addresses
For detection of scans and (D)DoS attacks, it is possible to get a good tradeoff with high utility and low risk

Thank You for the Attention
