Connect. Communicate. Collaborate

Using Temporal Locality for a Better Design of Flow-oriented Applications
Martin Žádník, CESNET
TNC 2007, Lyngby
Motivation
Optimize the performance of network applications that retrieve per-flow context on every packet arrival
Typical examples: passive monitoring applications such as NetFlow, IDS, …
So far, such applications scale by sampling
Memory limitation
Context must be stored in a memory which is either
–small and fast, or
–large and slow
What about a memory hierarchy?
Use a large memory with a cache, similarly to PC architecture
Works only if the locality of traffic is good
–spatial
–temporal
Steps
Find a network characteristic that captures locality
Apply it to real traffic samples
Analyze the results
Optimize the architecture
Optimize the performance
Focus on flow-oriented applications
Metric
The time characteristic depends on the speed of the link
Pseudo-time is therefore counted in the number of packets
We are not interested in time directly, but rather in sequence locality (what comes next)
Characteristic
Flow gap = the gap (measured in the number of packets of different flows) between two consecutive packets of the same flow
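The flow-gap characteristic can be computed in a single pass over a packet trace. A minimal sketch in Python (the original offline processing used Perl scripts; the flow keys and toy trace below are hypothetical, in practice a key would be the 5-tuple of addresses, ports and protocol):

```python
from collections import defaultdict

def flow_gaps(packets):
    """For every packet, record the number of packets seen since the
    previous packet of the same flow (first packets produce no gap)."""
    last_seen = {}           # flow key -> position of its last packet
    gaps = defaultdict(list)
    for pos, flow in enumerate(packets):
        if flow in last_seen:
            # gap = number of intervening packets of other flows
            gaps[flow].append(pos - last_seen[flow] - 1)
        last_seen[flow] = pos
    return dict(gaps)

# Toy trace of flow keys
trace = ["A", "B", "A", "C", "C", "A"]
print(flow_gaps(trace))   # {'A': [1, 2], 'C': [0]}
```

From these per-flow gap lists, the averages, maxima and cumulative histograms mentioned above follow directly.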
Measurement
Collecting data
–samples of 8–30 million packets
–tcpdump, headers only – :64540, :64510
Offline processing
–Perl scripts
–average gaps, maximum gaps
–cumulative histograms
Results
The distribution of flow gaps is exponential for common traffic
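An exponential gap distribution makes the cache-size estimate simple: roughly, a packet hits when the gap to the previous packet of its flow is smaller than the number of cached entries. A hedged sketch (the "gap < cache size" criterion is an approximation for an LRU-like fully associative cache, not an exact model; the mean gap below is a made-up figure):

```python
import math

def hit_rate(cache_size, mean_gap):
    """Estimated hit rate for exponentially distributed flow gaps:
    P(gap < cache_size) = 1 - exp(-cache_size / mean_gap)."""
    return 1.0 - math.exp(-cache_size / mean_gap)

# Example: assumed mean flow gap of 1000 packets
for size in (500, 1000, 4000):
    print(size, round(hit_rate(size, 1000.0), 3))
```

The exponential tail is what makes a modest cache attractive: doubling the cache size beyond the mean gap buys quickly diminishing returns.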
Applying the results
Estimate the size of the cache in a system combining a cache and a slow memory (DRAM)
Optimize the replacement policy
Estimate the speed-up
Case study on the FlowMon probe
Real world
On-chip cache latency: 1 clock cycle
External cache: 4 clock cycles
DRAM average latency: 16 clock cycles
Amdahl’s law
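Applied to memory accesses, Amdahl's law bounds the overall speed-up when only the cache-hit fraction of accesses is accelerated. In its standard form, with p the fraction of accesses served by the cache and S_cache the speed-up of those accesses:

```latex
S_{\text{overall}} = \frac{1}{(1 - p) + \dfrac{p}{S_{\text{cache}}}}
```

Even a very fast cache cannot push the overall speed-up past 1/(1 - p), which is why the hit rate, and hence traffic locality, dominates the design.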
FlowMon context – speed-up
Context: 8× 64-bit words
Internal cache: 9 cycles
External cache: 12 cycles
DRAM: 24 cycles
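With these cycle counts, the speed-up over a DRAM-only design can be estimated from the average memory access time. A sketch using the internal-cache and DRAM latencies above; the 66% hit rate is a hypothetical value chosen for illustration, not a measurement from the talk:

```python
def speedup(dram_cycles, cache_cycles, hit_rate):
    """Speed-up of a cache+DRAM hierarchy over DRAM alone,
    based on average memory access time (AMAT)."""
    amat = hit_rate * cache_cycles + (1.0 - hit_rate) * dram_cycles
    return dram_cycles / amat

# FlowMon case study latencies: internal cache 9 cycles, DRAM 24 cycles
print(round(speedup(24, 9, 0.66), 2))  # ~1.7x at an assumed 66% hit rate
```

This is the single-parameter knob behind the 1.7x figure in the conclusion: the achievable speed-up rises with the hit rate, which in turn depends on cache size and the gap distribution.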
Victim policy
LRU vs. Random
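The two victim policies can be compared with a small trace-driven simulation. A sketch assuming a fully associative cache of flow entries (the real implementation is hardware on the FlowMon probe; this is a software model only):

```python
import random
from collections import OrderedDict

def simulate(trace, size, policy):
    """Count cache hits for a fully associative cache of `size` flow
    entries, evicting either the least-recently-used entry or a
    random one."""
    cache = OrderedDict()    # flow key -> None, ordered by recency
    hits = 0
    for flow in trace:
        if flow in cache:
            hits += 1
            cache.move_to_end(flow)            # refresh recency
        else:
            if len(cache) >= size:
                if policy == "lru":
                    cache.popitem(last=False)  # evict least recent
                else:
                    victim = random.choice(list(cache))
                    del cache[victim]
            cache[flow] = None
    return hits

random.seed(1)
trace = [random.randrange(100) for _ in range(10_000)]
print("LRU:", simulate(trace, 32, "lru"),
      "Random:", simulate(trace, 32, "random"))
```

Replaying real traces through such a model shows how much the choice of victim policy matters for a given cache size.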
Entering policy
Sample&Hold [Estan, Varghese]
Targets elephant flows only
Makes sense only for a really small cache
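Sample&Hold admits a flow into the cache only after one of its packets is sampled, so short mouse flows rarely occupy entries while elephants are caught early and then counted exactly. A minimal sketch of the entering policy (the sampling probability and the toy trace are illustrative choices, not values from the talk; see Estan & Varghese for the full algorithm):

```python
import random

def sample_and_hold(trace, prob, seed=0):
    """Entering policy: a flow's counter is created only when one of
    its packets is sampled; once held, every packet is counted."""
    rng = random.Random(seed)
    counters = {}
    for flow in trace:
        if flow in counters:
            counters[flow] += 1      # already held: count every packet
        elif rng.random() < prob:
            counters[flow] = 1       # sampled: start holding the flow
    return counters

# Toy trace: a 2-packet mouse flow and a 1000-packet elephant flow
trace = ["mouse"] * 2 + ["elephant"] * 1000
held = sample_and_hold(trace, prob=0.01)
print(held.get("elephant", 0), held.get("mouse", 0))
```

With a small cache this filters out the bulk of short flows before they can evict anything, which is why the policy pays off mainly when cache entries are scarce.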
Conclusion
Pseudo-time locality of flows
Measurements on real traffic samples
So far, on-chip cache only
Speed-up of 1.7x: the memory architecture was described in VHDL and used for the FlowMon probe on COMBO6X cards
Future work:
–Correlation with timestamps
–Implement LRU or Sample&Hold