Prefix-Preserving IP Address Anonymization: Measurement-based Security Evaluation and a New Cryptography-based Scheme. Jun Xu, Jinliang Fan, Mostafa Ammar (College of Computing, Georgia Tech), Sue Moon (Sprint ATL). Modified and presented by Zihui Ge.
Overview: Motivation. IP address anonymization: prefix-preserving. Prefix-preserving anonymization: canonical form, TCPdpriv, cryptography-based scheme. Attacks: models, analysis, evaluation.
Motivation: Traces have been collected; to share or not to share? Sharing raises concerns about clients' personal privacy and commercial confidentiality. IP address anonymization answers these with a consistent one-to-one mapping. What about prefix relationships among IP addresses? They are important for studying routing performance and the clustering of end systems. Prefix-preserving anonymization preserves the prefix correlation among addresses.
IP Address Anonymization. Basic anonymization: $a = a_1 a_2 \dots a_{32}$ is the original 4-byte IP address, $a' = a'_1 a'_2 \dots a'_{32}$ is the anonymized IP address, and $F$ is a one-to-one mapping function with $a' = F(a)$. Prefix-preserving anonymization: if $a$ and $b$ share a $k$-bit prefix ($a_1 = b_1, a_2 = b_2, \dots, a_k = b_k$, $a_{k+1} \neq b_{k+1}$), then $a' = F(a)$ and $b' = F(b)$ also share a $k$-bit prefix ($a'_1 = b'_1, a'_2 = b'_2, \dots, a'_k = b'_k$, $a'_{k+1} \neq b'_{k+1}$).
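The prefix-preserving property can be checked mechanically on a small mapping. The following is a minimal sketch; the helper names `common_prefix_len` and `is_prefix_preserving` are illustrative and do not come from the slides.

```python
def common_prefix_len(a: int, b: int, bits: int = 32) -> int:
    """Number of leading bits shared by two `bits`-wide addresses."""
    # The highest set bit of a XOR b marks the first position where they differ.
    return bits - (a ^ b).bit_length()


def is_prefix_preserving(mapping: dict, bits: int = 32) -> bool:
    """True if every pair shares exactly as many leading bits after mapping as before."""
    addrs = list(mapping)
    return all(
        common_prefix_len(a, b, bits)
        == common_prefix_len(mapping[a], mapping[b], bits)
        for i, a in enumerate(addrs)
        for b in addrs[i + 1:]
    )
```

For example, with 4-bit addresses, flipping every bit keeps all shared prefix lengths intact, while a mapping that sends 0000 to 0000 and 0001 to 1000 does not.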
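Canonical Form. Canonical construction of $F$ using a series of functions $f_i$: $a'_i = a_i \oplus f_{i-1}(a_1, a_2, \dots, a_{i-1})$, where each $f_i$ maps an $i$-bit prefix to a single bit and $f_0$ is a constant. Any $F$ built this way is a prefix-preserving anonymization function, and every prefix-preserving anonymization function necessarily takes this form. Different schemes use different $f_i$. The construction can be visualized as a tree.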
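The canonical form can be sketched in a few lines, reading it as: each output bit is the input bit XORed with a flip bit computed from the preceding input bits only. The function `toy_f` below is a made-up, insecure stand-in for the $f_i$, used only to show that any choice of flip functions yields a one-to-one, prefix-preserving mapping.

```python
from typing import Callable, Tuple


def anonymize(addr: int, f: Callable[[Tuple[int, ...]], int], bits: int = 32) -> int:
    """Apply the canonical construction: a'_i = a_i XOR f(a_1 .. a_{i-1})."""
    a = [(addr >> (bits - 1 - i)) & 1 for i in range(bits)]  # a[0] is the MSB
    out = 0
    for i in range(bits):
        flip = f(tuple(a[:i])) & 1      # depends only on the bits before position i
        out = (out << 1) | (a[i] ^ flip)
    return out


def toy_f(prefix: Tuple[int, ...]) -> int:
    """Hypothetical flip function (not secure): parity of the prefix plus its length."""
    return (sum(prefix) + len(prefix)) & 1


def common_prefix_len(a: int, b: int, bits: int = 32) -> int:
    """Number of leading bits shared by two `bits`-wide addresses."""
    return bits - (a ^ b).bit_length()
```

Because the flip bit for position $i$ is identical for two addresses that agree on the first $i-1$ bits, their outputs agree exactly as far as their inputs do and differ at the first bit where the inputs differ.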
Visualization: Address Space
Visualization: Original Address Tree
Visualization: Anonymization Function [Figure: address tree with nodes marked "Flip" or "Leaf Node"; node labels include the constant $f_0()$, $f_1(0) = 1$, $f_1(1) = 0$, $f_2(0,0) = 0$]
Visualization: Anonymized Address Tree
TCPdpriv: Sequentially scan IP addresses; look up the prefix in the “history” table; randomly choose a suffix; concatenate prefix and suffix; update the “history” table. [Diagram: addresses $a_1 a_2 \dots a_k a_{k+1} a_{k+2} \dots a_n$ and $a_1 a_2 \dots a_k a_{k+1} b_{k+2} \dots b_n$ mapped to $a'_1 a'_2 \dots a'_k a'_{k+1} \dots a'_n$ and $a'_1 a'_2 \dots a'_k a'_{k+1} b'_{k+2} \dots b'_n$, with suffixes produced by rand(...)]
TCPdpriv: Sequentially scan IP addresses; look up the prefix in the “history” table; randomly choose a suffix; concatenate prefix and suffix; update the “history” table. The mapping is trace-dependent. A table must be maintained to track previous mappings: the table size grows over time, and the lookup cost increases over time. Traces cannot be processed in parallel.
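The drawbacks listed for a table-driven approach can be illustrated with a simplified sketch. This is not TCPdpriv's actual code or data structure; the class `TableAnonymizer` and its `flip` dictionary are assumptions made for illustration. It draws a random flip bit the first time each prefix is seen and remembers it in a table.

```python
import random


class TableAnonymizer:
    """Random prefix-preserving mapping built lazily from the addresses seen so far."""

    def __init__(self, bits: int = 32, seed=None):
        self.bits = bits
        self.rng = random.Random(seed)
        self.flip = {}  # prefix (bit string) -> random flip bit; grows with the trace

    def anonymize(self, addr: int) -> int:
        s = format(addr, f"0{self.bits}b")
        out = 0
        for i, ch in enumerate(s):
            prefix = s[:i]
            if prefix not in self.flip:          # first time this prefix appears
                self.flip[prefix] = self.rng.getrandbits(1)
            out = (out << 1) | (int(ch) ^ self.flip[prefix])
        return out
```

The random bits are consumed in arrival order, so the resulting mapping generally depends on the trace, and two workers processing different parts of a trace would need to share the same table to stay consistent.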
New Cryptography-Based Algorithm: A deterministic $f_i$ function makes the mapping trace-independent. $f_{i-1}(a_1, a_2, \dots, a_{i-1}) := L(R(P(a_1 a_2 \dots a_{i-1}), K))$, where $L$ returns the least significant bit, $R$ is a pseudo-random function (PRF), $P$ is the secret padding, and $K$ is the secret key. Which PRF to use? Practical block ciphers, e.g., AES, can be modeled as pseudo-random permutations (PRPs).
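A keyed, table-free flip function of the form L(R(P(prefix), K)) can be sketched with the Python standard library. Two assumptions are made loudly here: HMAC-SHA256 is swapped in for R in place of the AES block cipher named on the slide, so the example needs no third-party package, and the padding P is a plain length-plus-value encoding rather than a secret pad. This illustrates the structure only and is not the authors' implementation.

```python
import hashlib
import hmac


def flip_bit(prefix_bits: str, key: bytes) -> int:
    """L(R(P(prefix), K)) with HMAC-SHA256 standing in for R (an assumption)."""
    # P (assumed, not secret): 1-byte prefix length + 4-byte prefix value,
    # so prefixes such as "0" and "00" encode differently. Supports up to 31 bits.
    padded = len(prefix_bits).to_bytes(1, "big") + int(prefix_bits or "0", 2).to_bytes(4, "big")
    digest = hmac.new(key, padded, hashlib.sha256).digest()
    return digest[-1] & 1  # L: least significant bit of the PRF output


def anonymize(addr: int, key: bytes, bits: int = 32) -> int:
    """Deterministic, trace-independent prefix-preserving mapping (bits <= 32)."""
    s = format(addr, f"0{bits}b")
    out = 0
    for i in range(bits):
        out = (out << 1) | (int(s[i]) ^ flip_bit(s[:i], key))
    return out
```

Since the result depends only on the address and the key, the same address maps to the same output in any trace, no history table is needed, and separate traces can be processed in parallel with a shared key.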
Attacks on Anonymization Schemes. Cryptographic attacks: scheme-specific; the vulnerability comes from the specific construction of $f_i$. TCPdpriv: not susceptible. Our scheme: provably secure. Semantic attacks: common to all schemes; the vulnerability comes from the canonical construction of $F$; their effectiveness should be measured.
Evaluation of Semantic Attacks: Metrics to measure the effect of attacks. Virtual (but theoretically interesting) attacks: a good measure of the resistance of a specific trace to semantic attacks in general, and good relative reference points for measuring the effectiveness of practical attacks. Practical attacks.
Metrics to Measure Effect of Attacks. Measures of attack severity: $U$: number of unknown uncompressed bits. $C$: number of unknown compressed bits. $K_i$: number of addresses with exactly $i$ known most significant bits.
If an address is compromised … $C = 9$, $U = 18$, $K_1 = 4$, $K_2 = 2$, $K_3 = 2$, $K_4 = 1$. [Figure: address tree with partially revealed addresses 000?, 0010, 1???, 000?, 01??, 1???]
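The values of U and K_i in this example can be reproduced with a short calculation. The sketch assumes that an address sharing a k-bit prefix with the compromised address has k+1 most significant bits revealed (the shared bits plus the first differing bit, which must be the complement), which is a reading of the prefix-preserving property rather than something stated on the slide. The nine 4-bit addresses in the usage example are hypothetical, chosen to be consistent with the slide's numbers; C is not computed.

```python
from collections import Counter


def known_bits(compromised: int, other: int, bits: int) -> int:
    """Most significant bits of `other` revealed once `compromised` is known."""
    shared = bits - (compromised ^ other).bit_length()
    return min(shared + 1, bits)  # shared prefix plus the first differing bit


def attack_metrics(compromised: int, addresses, bits: int = 32):
    """Return (U, K) where K[i] counts addresses with exactly i known bits."""
    k = Counter(known_bits(compromised, a, bits) for a in addresses)
    u = sum((bits - i) * n for i, n in k.items())  # total unknown uncompressed bits
    return u, dict(k)


# Hypothetical trace: the compromised address 0010 plus eight others.
trace = [0b0010, 0b0000, 0b0001, 0b0100, 0b0110, 0b1000, 0b1010, 0b1100, 0b1111]
u, k = attack_metrics(0b0010, trace, bits=4)
print(u, sorted(k.items()))  # 18 [(1, 4), (2, 2), (3, 2), (4, 1)]
```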
Evaluation on Real Traces: Measure the resistance of a specific trace to semantic attacks in general. Effect of compromising random addresses. Effect of compromising greedily generated addresses.
Effect of Compromising Random Addresses
Practical Attacks: Frequency Analysis DNS Server Address Tracing Others
Conclusions: Canonical form for constructing prefix-preserving anonymization functions. New cryptography-based scheme. Framework for measuring the resistance of traces and the effectiveness of attacks. Implementation.