A Resource-minimalist Flow Size Histogram Estimator
Bruno Ribeiro, Don Towsley (UMass Amherst)
Tao Ye (Sprint)
Flow size histogram
- Setting: an Internet core router carrying TCP flows; flow size is e.g. the number of packets in a TCP flow.
- Flow size histogram used for: traffic profiling, anomaly detection.
- Histogram is hard to obtain: hundreds of millions of flows per hour (OC-48 router).
Estimating flow size histograms
- Random packet sampling is inaccurate [Ribeiro et al. 2006].
- Flow sampling needs more memory, and an accurate tail needs packet sampling.
- Current data streaming methods have slow estimators.
Bruno Ribeiro, Tao Ye, Don Towsley, "A Resource-minimalist flow size histogram estimator"
Outline
- Related work
- Our resource-minimalist approach
- Experiment
- Conclusions
Related work [Kumar et al. 2004]
- Sketch phase (router): a universal hash function maps each packet to a counter, which counts the packets hashed to it. Distinct flows can land on the same counter (hash collision).
- Estimation phase (powerful backend server): estimates the flow size histogram from the counters, accounting for hash collisions.
- Estimation complexity: O((maximum flow size)^3).
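
The sketch phase described above can be illustrated with a minimal Python sketch. It is not the code of [Kumar et al. 2004]: the names `bucket`, `sketch` and `counter_histogram` are hypothetical, and `hashlib.blake2b` merely stands in for the universal hash function. It shows why the raw counter values differ from the true flow sizes once flows collide, which is what the estimation phase has to undo.

```python
import hashlib
from collections import Counter


def bucket(flow_id: str, num_counters: int) -> int:
    """Map a flow ID to a counter index (blake2b is a stand-in hash, an assumption)."""
    digest = hashlib.blake2b(flow_id.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_counters


def sketch(packets, num_counters):
    """Sketch phase: one increment per packet on the counter its flow hashes to."""
    counters = [0] * num_counters
    for flow_id in packets:
        counters[bucket(flow_id, num_counters)] += 1
    return counters


def counter_histogram(counters):
    """Histogram of non-zero counter values (the input to an estimation phase)."""
    return Counter(value for value in counters if value > 0)


# Three flows of sizes 3, 1 and 1 packets.
packets = ["a", "b", "a", "c", "a"]
# With a single counter every flow collides, so the sketch sees one "flow" of size 5.
collided = counter_histogram(sketch(packets, 1))
```

With more counters than flows, collisions become rare and the counter histogram approaches the flow size histogram; the cost is memory.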
Resource-minimalist approach
- Insight: there is no need to count every flow size exactly.
- Idea: group large flow sizes into bins.
  - Fine-grained flow histogram for flows of fewer than k packets.
  - Coarse-grained flow histogram for flows of more than k packets.
- Approach: probabilistic counting.
  - Reduces counters to 6 bits.
  - Requires a low collision probability (e.g. 2 counters per flow).
  - Result: O(k^3 + log(W)) estimator, e.g. k = 16 and W = 10^7.
- Problem: a low collision probability means more memory (2 counters per flow).
- Approach: counter folding.
  - Negligible increase in estimator error.
  - Requires one extra bit per counter.
  - Result: reduces the number of counters by half.
Grouping large flow sizes with probabilistic counting [Morris 78]
- Up to k, the hash counter counts every arrived packet: 1, 2, ..., k-1, k.
- From k on, counter increments are probabilistic: k to k+1 with probability p = 1/m_1, k+1 to k+2 with p = 1/m_2, and so on, so that m_1, m_2, ... packets are needed on average per increment.
- Counter value k corresponds to flow sizes in [k, k + m_1 - 1].
- Counter value k+1 corresponds to flow sizes in [k + m_1, k + m_1 + m_2 - 1].
- With m_a = 2^a, a 6-bit counter bins flow sizes up to W = 10^14.
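
A minimal Python sketch of the binning rule on this slide, assuming m_a = 2^a with a starting at 1 for counter value k, and a 6-bit counter that saturates at 63. The function names (`step`, `increment`, `size_range`) and the saturation behavior are illustrative assumptions, not taken from the talk.

```python
import random


def step(a: int) -> int:
    """Bin width m_a = 2**a (the choice of m_a given on the slide)."""
    return 2 ** a


def increment(counter: int, k: int, rng: random.Random, max_value: int = 63) -> int:
    """Process one packet: exact counting below k, probabilistic from k on."""
    if counter < k:
        return counter + 1          # fine-grained region: count every packet
    if counter >= max_value:
        return counter              # assumed behavior: a 6-bit counter saturates
    a = counter - k + 1             # counter value k uses m_1, k+1 uses m_2, ...
    if rng.random() < 1.0 / step(a):
        return counter + 1
    return counter


def size_range(counter: int, k: int):
    """Flow sizes represented by a counter value, as an inclusive (low, high) pair."""
    if counter < k:
        return (counter, counter)
    low = k + sum(step(a) for a in range(1, counter - k + 1))
    high = low + step(counter - k + 1) - 1
    return (low, high)


k = 16
first_bin = size_range(k, k)        # [k, k + m_1 - 1]
second_bin = size_range(k + 1, k)   # [k + m_1, k + m_1 + m_2 - 1]
last_bin_low = size_range(63, k)[0]
```

Under these assumptions the last bin of a 6-bit counter with k = 16 starts at 16 + (2^48 - 2), which is above 10^14 and so consistent with the W quoted on the slide.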
Counter folding: detecting some collisions
- Maximum hash value = M, but only M/2 counters.
- If hash(packet) < M/2, the packet is red and uses counter hash(packet).
- Otherwise the packet is blue and uses counter (hash(packet) mod M/2).
- A blue-red collision is detectable: 1 bit per counter is required.
- A collision between flows of the same color is undetectable.
- [Figure: flows 7, 8 and 9 hashed into M/2 counters, showing one detectable and one undetectable collision.]
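
The red/blue assignment above fits in a few lines of Python. This is a sketch under two assumptions: hash values lie in [0, M) and M is even. The function name `fold` is hypothetical.

```python
def fold(hash_value: int, M: int):
    """Return (counter index, color) for a packet hash in [0, M), using M/2 counters."""
    half = M // 2
    if hash_value < half:
        return hash_value, "red"        # lower half of the hash range
    return hash_value % half, "blue"    # upper half folds back onto the same counters


# Hashes 3 and 11 share counter 3 when M = 16, but carry different colors,
# so this collision can be detected with one color bit on the counter.
red = fold(3, 16)
blue = fold(11, 16)
```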
Counter folding
- Collision policy: "red flow cannot increment blue counter"; "blue flow overwrites red counter"; counters equal to 0 are red.
- [Figure: flows, counter values, and counter colors (the extra bit).]
- Result, e.g. with 1 counter per flow:
  - All red counters also serve as blue counters with value 0.
  - Virtually expands the hash table by ≈ 50% (virtual 2 counters per flow).
  - Blue counters evict red counters.
  - Flow sampling effect: discards 15% of flows at random.
- Folding, an interesting fact: number of foldings ≫ 1 with the policy "evict newest flow" (color = flow ID) gives flow sampling.
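
One possible reading of the collision policy, as a minimal Python sketch. The class `FoldedCounters` and its fields are hypothetical. Plain increments are used instead of probabilistic counting to keep the example short, and "overwrites" is interpreted as resetting the counter before the blue flow starts counting; both are assumptions.

```python
RED, BLUE = 0, 1


class FoldedCounters:
    """M/2 counters, each with one extra color bit (assumes M is even)."""

    def __init__(self, M: int):
        self.half = M // 2
        self.values = [0] * self.half
        self.colors = [RED] * self.half     # counters equal to 0 are red

    def add_packet(self, hash_value: int) -> None:
        """Apply the collision policy for a packet hash in [0, M)."""
        idx = hash_value % self.half
        color = RED if hash_value < self.half else BLUE
        if color == RED:
            if self.colors[idx] == BLUE:
                return                      # red flow cannot increment blue counter
            self.values[idx] += 1
        else:
            if self.colors[idx] == RED:
                self.values[idx] = 0        # blue flow overwrites red counter (assumed reset)
                self.colors[idx] = BLUE
            self.values[idx] += 1


table = FoldedCounters(16)
table.add_packet(3)     # red flow, counter 3
table.add_packet(3)     # same red flow, counter 3 is now 2
table.add_packet(11)    # blue flow folds onto counter 3 and evicts the red flow
table.add_packet(3)     # red packet is now discarded
```

Under this reading the evicted red flow is lost entirely, which is the flow sampling effect the slide refers to.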
Experiment
- Evaluated with simulations.
- Our worst result with Internet core traces: 9.5 million flows, 8 MB of memory, k = 16, W = 10^14.
- The same accuracy without counter folding requires 13 MB of memory.
Conclusions
- Insights
  - Group large flow sizes using probabilistic counters.
  - Counter folding: fast quasi-random sampling.
- Our estimator
  - Time complexity, sketch phase: universal hash cost, two additions, one subtraction.
  - Time complexity, estimation phase: O(k^3 + log(W)).
  - Space complexity: ≈ 1/4 of the memory usage of [Kumar et al. 2004].