Evaluation of Header Field Entropy for Hash-Based Packet Selection
Christian Henke, Carsten Schmoll, Tanja Zseby
Fraunhofer Institute FOKUS, Berlin, Germany
PAM 2008, Cleveland

Outline
1. Introduction: Multipoint Sampling
2. Problem Statement
3. Approach
4. Measurement Setup
5. Measurement Results
6. Conclusion

Introduction: Multipoint Sampling

Passive Multipoint Measurements
- at each observation point, a packet ID and a timestamp are exported for each packet
- the trace of a packet is observable from the occurrence of its packet ID
- delay = timestamp A - timestamp B for packets with equal ID

[Figure: Multipoint Collector with observation points A, B and C]
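
The delay computation on this slide can be sketched as follows. This is a minimal illustration, not the authors' collector: the function name and the `(packet_id, timestamp)` record layout are assumptions, and packet IDs are assumed to be unique within each export. The sign of the result depends on the direction in which the packet traverses the two points.

```python
def delays_between_points(records_a, records_b):
    """Return {packet_id: timestamp_a - timestamp_b} for IDs seen at both points.

    records_a, records_b: iterables of (packet_id, timestamp) pairs as
    exported by two observation points (hypothetical record layout).
    """
    seen_at_b = dict(records_b)  # packet_id -> timestamp at point B
    return {
        packet_id: ts_a - seen_at_b[packet_id]
        for packet_id, ts_a in records_a
        if packet_id in seen_at_b  # only packets observed at both points
    }
```

Packets whose ID appears at only one point are dropped from the result, which is why both points must select the same packets.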

Introduction: Multipoint Sampling

Challenge in Passive Multipoint Measurements
- immense amounts of measurement data
- high infrastructure costs: processing, storing, exporting

Random Packet Selection and Estimation
- random sampling (n-out-of-N, probabilistic) is unsuitable: it yields inconsistent samples at the observation points
- Duffield and Grossglauser propose hash-based packet selection in "Trajectory Sampling for Direct Traffic Observation"

Introduction: Multipoint Sampling

Hash-Based Packet Selection
[Figure: packet (IP header, transport header, payload) -> hash input -> hash function -> packet selected / packet not selected]
- the selected subset is consistent if the hash input x, the hash function h and the selection range S are equal at all observation points
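
A minimal sketch of the selection decision, assuming CRC32 from Python's `zlib` as a stand-in for the hash function h and a selection range S consisting of all hash values below a threshold. Neither choice comes from the slides; the function name and the `sampling_fraction` parameter are hypothetical.

```python
import zlib

HASH_RANGE = 2 ** 32  # zlib.crc32 returns an unsigned 32-bit value


def is_selected(hash_input: bytes, sampling_fraction: float) -> bool:
    """Select the packet if h(x) falls into the selection range S.

    Assumption: S = [0, sampling_fraction * 2**32), h = CRC32.
    """
    threshold = int(sampling_fraction * HASH_RANGE)
    return zlib.crc32(hash_input) < threshold
```

Because the decision depends only on the hash input, two observation points that use the same input bytes, the same function and the same threshold select the same packets.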

Problem Statement

Which packet content should be used as hash input?

Requirements for header fields
1. static between network nodes (excludes IP TTL and checksum, which change from hop to hop)
2. variable among packets

Challenge: hash-based selection (HBS) is deterministic, but the goal is to emulate random selection. The choice of hash input can introduce bias into the selection.

Problem Statement

How bias is introduced
- packets in a hash input collision have the same hash input
- their selection decisions are not independent
- the more packets in a collision, the more severe the bias
- using the whole packet is unsuitable because the hash calculation time increases with the hash input length

Approach
- packets differ more often in highly variable bytes
- entropy per byte is used to measure variability

Entropy and Information Efficiency
- p_i: probability that byte value i occurs
- H(B): entropy, dependent on the discrete distribution of byte values
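
The per-byte entropy can be sketched as below, using the Shannon entropy H(B) = -sum(p_i * log2(p_i)) over the observed byte values. The slide's own formulas did not survive extraction, so the definition of information efficiency used here (entropy divided by the 8-bit maximum of one byte) is an assumption, as are all function names.

```python
import math
from collections import Counter


def byte_entropy(values):
    """Shannon entropy in bits of the observed byte values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_efficiency(values):
    """Entropy normalised by the 8-bit maximum of a byte (assumed definition)."""
    return byte_entropy(values) / 8.0


def per_position_entropy(packets, length):
    """Entropy of each of the first `length` byte positions across packets."""
    return [
        byte_entropy(p[i] for p in packets if len(p) > i)
        for i in range(length)
    ]
```

A byte position that is constant across all packets has entropy 0, while a position whose 256 values occur equally often reaches the maximum of 8 bits.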

Measurement Setup

The evaluation depends on the analyzed traces:
- 6 IPv4 trace groups, 1 IPv6
- geographical locations (NZ, AUT, FR, NED, 2 LEO)
- network location (university, peering point, large ISP)
- application mix

Measurement Results

[Figure: Entropy, IPv4]

Measurement Results

High Entropy Header Fields
- IPv4: Identification, Length LSB, Src/Dst Address 2 LSB
- TCP: Checksum, SeqNo, AckNo, Src/Dst Port 2 LSB
- UDP: Checksum, Length LSB, Src/Dst Port 2 LSB
- ICMP: Checksum, Bytes 12, 13, 18, 19
- IPv6: Length LSB
  - more IPv6 traces required for further evaluation
  - addresses anonymized and no transport header; only 8 bytes could be evaluated

Recommended 8 Byte Configuration
IP ID field + 6 transport header bytes:
- TCP: Checksum, 2 LSB of SeqNo and AckNo
- UDP: Checksum, Source Port, LSB of Destination Port, LSB of Length
- ICMP: Checksum, Bytes 12, 13, 18, 19
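
As an illustration, the recommended configuration could be extracted from a raw IPv4 packet roughly as follows. The function name, the byte order within the resulting hash input and the error handling are assumptions. ICMP is left out because the slide does not state which offset the byte numbers 12, 13, 18, 19 are counted from.

```python
TCP, UDP = 6, 17  # IP protocol numbers


def recommended_hash_input(packet: bytes) -> bytes:
    """Return IP ID + 6 transport header bytes for a raw IPv4 packet.

    `packet` is assumed to start at the IPv4 header. Fields are in
    network byte order, so the LSB of a field is its last byte.
    """
    header_len = (packet[0] & 0x0F) * 4  # IHL counts 32-bit words
    protocol = packet[9]
    ip_id = packet[4:6]
    t = packet[header_len:]  # transport header
    if protocol == TCP:
        # checksum (16-17), 2 LSB of SeqNo (6-7), 2 LSB of AckNo (10-11)
        transport = t[16:18] + t[6:8] + t[10:12]
    elif protocol == UDP:
        # checksum (6-7), source port (0-1), LSB dest. port (3), LSB length (5)
        transport = t[6:8] + t[0:2] + t[3:4] + t[5:6]
    else:
        raise ValueError(f"protocol {protocol} not covered by this sketch")
    return ip_id + transport
```

For both protocols the result is 8 bytes long when the packet contains a complete transport header.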

Measurement Results

Empirical Hash Input Collisions Evaluation

4 configurations used:
1. whole IP and transport header (minimum reachable collisions)
2. only IP header (bad configuration)
3. 8 high entropy bytes
4. Molina's 16 bytes

Metric: sum of packets in the 20 largest collisions of each trace
- large collisions: all-or-none decision for all packets that have the same attributes
- small collisions: packets are equal within one collision but different between collisions
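
The collision metric above can be sketched as follows, assuming each packet has already been reduced to the bytes of one hash input configuration. The function name and the `top` parameter are hypothetical.

```python
from collections import Counter


def packets_in_largest_collisions(hash_inputs, top=20):
    """Sum the packet counts of the `top` largest hash input collisions.

    hash_inputs: iterable of bytes objects, one per packet. A collision
    is a hash input value shared by more than one packet.
    """
    counts = Counter(hash_inputs)
    collisions = [n for n in counts.values() if n > 1]
    return sum(sorted(collisions, reverse=True)[:top])
```

A lower sum for the same trace indicates that the configuration separates packets better.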

Measurement Results

Hash Input Collision Comparison
- the recommended 8 bytes perform better than Molina's 16 bytes
- the LEO2 traces include a large VPN traffic flow with UDP Checksum == 0; more high entropy bytes should be used

Conclusion

Outcome
- recommendation of 8 bytes for use as hash input for HBS
- the 8 recommended bytes are sufficient to gain unique hash inputs

Henke, Schmoll, Zseby, "Empirical Evaluation of Hash Functions for Multipoint Measurements"
- hash calculation time increases linearly with input length
- hash functions are able to select a representative subset based on 8 bytes

Future Work

Correlation between Bytes
- correlation between address bytes
- entropy of combined bytes expected to be the average of the entropy

IPv6
- entropy evaluation of IPv6 addresses and transport headers