Slide 1: An Information Theoretic Approach to Network Trace Compression
Y. Liu, D. Towsley, J. Weng, and D. Goeckel
Slide 2: Outline
- network monitoring/measurement
- information theory & compression
- single point trace compression
- joint network trace compression
- future work
Slide 3: Motivation
- service providers, service users
- monitoring
- anomaly detection
- debugging
- traffic engineering
- pricing, peering, service level agreements
- architecture design
- application design
Slide 4: Network of Network Sensors
- network monitoring: sensing a network
  - embedded vs. exogenous
  - single point vs. distributed
- different granularities
  - full traffic trace: packet headers
  - flow-level record: timing, volume
  - summary statistics: byte/packet counts
- challenges
  - growing scale: high-speed links, large topologies
  - constrained resources: processing, storage, transmission (30G headers/hour at the UMass gateway)
- solutions
  - sampling: temporal/spatial
  - compression: marginal/distributed
Slide 5: Entropy & Compression (I)
- Shannon entropy of a discrete r.v. X: H(X) = - Σ_x p(x) log2 p(x)
- compression of an i.i.d. sequence by source coding
  - coding: map each outcome x to a codeword of length l(x)
  - expected code length: E[l(X)] = Σ_x p(x) l(x)
  - information-theoretic bound: E[l(X)] ≥ H(X)
- Shannon/Huffman coding
  - assigns short codewords to frequent outcomes
  - achieves the H(X) bound (to within one bit per symbol)
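To make the H(X) bound concrete, here is a minimal Python sketch (not from the talk; the example sequence is illustrative) that estimates the empirical entropy of a symbol sequence and compares it with the average codeword length of a Huffman code built from the same frequencies.

```python
import heapq
import math
from collections import Counter

def entropy(symbols):
    """Empirical Shannon entropy H(X) in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def huffman_lengths(symbols):
    """Codeword length for each symbol in a Huffman code built from empirical frequencies."""
    counts = Counter(symbols)
    if len(counts) == 1:                       # degenerate source: a single symbol
        return {next(iter(counts)): 1}
    # heap entries: (subtree weight, tie-break id, {symbol: current depth})
    heap = [(c, i, {s: 0}) for i, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)        # merge the two lightest subtrees
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

if __name__ == "__main__":
    seq = list("aaaaaaaabbbbccd")               # skewed source: frequent outcomes get short codewords
    lengths = huffman_lengths(seq)
    counts = Counter(seq)
    avg_len = sum(counts[s] * lengths[s] for s in counts) / len(seq)
    print(f"H(X)    = {entropy(seq):.3f} bits/symbol")
    print(f"E[l(X)] = {avg_len:.3f} bits/symbol (Huffman, within 1 bit of H(X))")
```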
Slide 6: Entropy & Compression (II)
- joint entropy H(X_1, ..., X_n)
- entropy rate of a stochastic process X: H(X) = lim_{n→∞} H(X_1, ..., X_n)/n
  - exploits auto-correlation
  - lower bound on # bits per sample of X
- compression ratio: H(X)/M, where M is the original size (bits) per sample
- Lempel-Ziv coding
  - asymptotically achieves the entropy rate of a stationary process
  - universal data compression algorithms: LZ77, gzip, winzip
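The auto-correlation point can be seen with any LZ77-based compressor. The illustrative Python snippet below (not part of the talk) compresses two byte sequences with the same marginal distribution, one i.i.d. and one strongly auto-correlated, using zlib (the DEFLATE/LZ77 engine behind gzip); the correlated sequence has a lower entropy rate and compresses far better.

```python
import random
import zlib

random.seed(0)
N = 100_000
ALPHABET = bytes(range(8))     # 8 equally likely symbols: 3 bits of marginal entropy

# i.i.d. sequence: entropy rate equals the marginal entropy.
iid = bytes(random.choice(ALPHABET) for _ in range(N))

# Markov sequence with the same (uniform) marginal but strong auto-correlation:
# repeat the previous symbol with probability 0.95, otherwise draw uniformly.
markov = bytearray([random.choice(ALPHABET)])
for _ in range(N - 1):
    if random.random() < 0.95:
        markov.append(markov[-1])
    else:
        markov.append(random.choice(ALPHABET))
markov = bytes(markov)

for name, data in [("i.i.d.", iid), ("auto-correlated", markov)]:
    ratio = len(zlib.compress(data, 9)) / len(data)
    print(f"{name:16s} compression ratio: {ratio:.3f}")
```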
Slide 7: Entropy & Compression (III)
- joint entropy rate of a set of stochastic processes
- joint data compression
  - exploits cross-correlation between sources
  - joint compression ratio: joint entropy rate relative to the total original size
- Slepian-Wolf coding
  - distributed compression: encode each process individually, yet achieve the joint entropy rate in the limit
  - requires knowledge of the cross-correlation structure
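For reference, the standard definitions behind these bullets, written out in LaTeX (the process labels X^1, ..., X^N are mine):

```latex
% Joint entropy rate of N jointly stationary processes:
\[
  H(\mathcal{X}^1,\dots,\mathcal{X}^N)
    = \lim_{n\to\infty} \frac{1}{n}\,
      H\!\left(X^1_{1:n},\dots,X^N_{1:n}\right)
\]
% Sub-additivity: joint coding never needs more bits than separate coding.
\[
  H(\mathcal{X}^1,\dots,\mathcal{X}^N) \;\le\; \sum_{i=1}^{N} H(\mathcal{X}^i)
\]
% Slepian-Wolf region (two correlated sources, separate encoders, one decoder):
\[
  R_1 \ge H(X^1 \mid X^2),\qquad
  R_2 \ge H(X^2 \mid X^1),\qquad
  R_1 + R_2 \ge H(X^1, X^2)
\]
```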
Slide 8: Network Trace Compression
- naive approach: treat the trace as a byte stream and compress with generic tools
  - gzip compresses UMass traces by a factor of 2
- network traces are highly structured data
  - multiple fields per packet: diversity in information richness, correlation among fields
  - multiple packets per flow: packets within a flow share information (temporal correlation)
  - multiple monitors traversed by a flow: most fields unchanged within the network (spatial correlation)
- network trace models
  - quantify the information content of network traces
  - serve as lower bounds/guidelines for compression algorithms
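The "naive" baseline is easy to reproduce; a small Python sketch (the trace path is a placeholder, not a file from the talk) that reports the gzip compression factor of a trace treated as a raw byte stream:

```python
import gzip
import os

TRACE_PATH = "trace.bin"   # placeholder: substitute any local packet-header trace

def gzip_factor(path: str) -> float:
    """Compression factor (original size / gzipped size) of the naive byte-stream approach."""
    with open(path, "rb") as f:
        raw = f.read()
    return len(raw) / len(gzip.compress(raw, compresslevel=9))

if __name__ == "__main__":
    if os.path.exists(TRACE_PATH):
        print(f"gzip compression factor: {gzip_factor(TRACE_PATH):.2f}")
    else:
        print(f"place a trace at {TRACE_PATH} to run this sketch")
```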
Slide 9: Packet Header Trace
Per-packet record (32-bit-wide header diagram, bit positions 0, 16, 31):
- Timing: time stamp (sec.), time stamp (sub-sec.)
- IP header: vers., HLen, ToS, total length, IP ID, flags, fragment offset, TTL, protocol, header checksum, source IP address, destination IP address
- TCP header: source port, destination port, data sequence number, acknowledgment number, Hlen, TCP flags, window size, checksum, urgent pointer
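As a concrete reading of this record format, here is a minimal Python sketch (mine, not the authors') that extracts exactly these fields from a raw IPv4/TCP packet; it assumes 20-byte IP and TCP headers (no options) for brevity.

```python
import struct

def parse_headers(pkt: bytes, ts_sec: int, ts_subsec: int) -> dict:
    """Pull the Slide 9 fields out of a raw IPv4+TCP packet (no IP/TCP options assumed)."""
    (ver_ihl, tos, total_len, ip_id, flags_frag,
     ttl, proto, ip_csum, src_ip, dst_ip) = struct.unpack("!BBHHHBBH4s4s", pkt[:20])
    (src_port, dst_port, seq, ack, hlen_resv,
     tcp_flags, window, tcp_csum, urgent) = struct.unpack("!HHLLBBHHH", pkt[20:40])
    return {
        # timing
        "ts_sec": ts_sec, "ts_subsec": ts_subsec,
        # IP header
        "version": ver_ihl >> 4, "hlen": ver_ihl & 0x0F, "tos": tos,
        "total_length": total_len, "ip_id": ip_id,
        "flags": flags_frag >> 13, "frag_offset": flags_frag & 0x1FFF,
        "ttl": ttl, "protocol": proto, "header_checksum": ip_csum,
        "src_ip": src_ip, "dst_ip": dst_ip,
        # TCP header
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack, "tcp_hlen": hlen_resv >> 4,
        "tcp_flags": tcp_flags, "window": window,
        "tcp_checksum": tcp_csum, "urgent_ptr": urgent,
    }
```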
Slide 10: Header Field Entropy
Same packet header layout as Slide 9, with fields annotated by role:
- Θ: flow id
- T: time
Slide 11: Single Point Compression
[Diagram: a packet stream (T0, Θ0), (T1, Θ1), (T3, Θ0), ..., (Tn, Θn), (Tm, Θ0), and the sub-stream of flow Θ0 extracted from it]
- temporal correlation introduced by flows
  - packets from the same flow are closely spaced in time
  - they share header information
  - packet inter-arrivals determine the per-packet timing cost; # bits per packet combines timing and flow-id information
- flow-based trace
  - flow record: flow ID (Θ0), flow size (K), arrival time (T0), packet inter-arrivals
  - # bits per flow id: H(Θ)
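The per-packet and per-flow bit counts on this slide did not survive extraction. The accounting below is my own back-of-the-envelope version, consistent with the flow-record fields above (the symbols T_0 for the flow arrival time and Δ for within-flow inter-arrivals are mine); it shows why a flow-based trace can be cheaper: the flow id Θ is paid once per flow instead of once per packet.

```latex
% Packet-level trace: every packet carries a flow id and an inter-arrival time.
\[
  \text{bits per packet} \;\approx\; H(\Theta) + H(T)
\]
% Flow-based trace: flow id, flow size K, and arrival time are paid once per flow,
% plus K-1 within-flow packet inter-arrivals.
\[
  \text{bits per flow} \;\approx\; H(\Theta) + H(K) + H(T_0) + (K-1)\,H(\Delta)
\]
% Amortized over the K packets of a flow, the H(\Theta) term shrinks by a factor of K.
```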
Slide 12: Flow Level Model
- Poisson flow arrivals; exponential flow inter-arrival times
- independent packet inter-arrival times within a flow
- K – flow length (number of packets)
- # bits per flow: flow id, flow length, arrival time, and the within-flow packet inter-arrivals
- # bits per second: H(Φ)
- marginal compression ratio
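Since the slide's formulas were lost in extraction, the following numeric sketch in Python uses the flow-level accounting from the previous slide; every parameter value, and the additive decomposition itself, is an illustrative assumption rather than a number from the paper.

```python
# Illustrative parameters (assumptions, not measurements from the talk).
flow_rate      = 1000.0   # Poisson flow arrivals per second
mean_flow_len  = 10.0     # E[K], packets per flow
H_flow_id      = 20.0     # H(Theta), bits per flow id
H_flow_len     = 4.0      # H(K), bits
H_arrival      = 18.0     # bits for the flow arrival time
H_interarrival = 8.0      # bits per within-flow packet inter-arrival
RAW_BITS_PER_PACKET = (40 + 8) * 8   # 40-byte IP+TCP header plus an 8-byte timestamp

# Bits per flow: pay the flow id, length, and arrival time once, plus K-1 inter-arrivals.
bits_per_flow = H_flow_id + H_flow_len + H_arrival + (mean_flow_len - 1) * H_interarrival

info_rate_bps = flow_rate * bits_per_flow                       # H(Phi), bits/sec
raw_rate_bps  = flow_rate * mean_flow_len * RAW_BITS_PER_PACKET # raw trace, bits/sec
print(f"H(Phi)                     ~ {info_rate_bps:,.0f} bits/sec")
print(f"raw trace rate             ~ {raw_rate_bps:,.0f} bits/sec")
print(f"marginal compression ratio ~ {info_rate_bps / raw_rate_bps:.3f}")
```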
Slide 13: Empirical Results: Single Point
- 1-hour UMass gateway traces, Sept. 22, 2004 to Oct. 23, 2004, collected at 1am, 10am, and 1pm
Slide 14: Distributed Network Monitoring
- a single flow is recorded by multiple monitors
- spatial correlation: traces collected at distributed monitors are correlated
- marginal node view: # bits/sec to represent the flows seen by one node; bound on single point compression
- network system view: # bits/sec to represent the flows crossing the network; bound on joint compression
- joint compression ratio: quantifies the gain of joint compression
Slide 15: Baseline Joint Entropy Model
- "perfect" network: fixed routes, constant link delays, no packet loss
- flow classes based on routes
  - per-class flow arrival rate
  - # of monitors traversed
  - # bits per flow record
- info. rate at node v
- network-view info. rate
- joint compression ratio
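The formulas named on this slide were also lost in extraction; the sketch below uses my own notation (flow classes c with arrival rate λ_c, route length n_c monitors, and H_c bits per flow record) to show one way the marginal and network-view rates fit together under the perfect-network assumption that every monitor on a route sees an identical copy of the flow record.

```latex
% Marginal information rate at node v: every class routed through v contributes.
\[
  R_v \;=\; \sum_{c \,:\, v \in \mathrm{route}(c)} \lambda_c\, H_c
\]
% Network-view information rate: each flow record is represented only once.
\[
  R_{\mathrm{net}} \;=\; \sum_{c} \lambda_c\, H_c
\]
% Joint compression ratio: the marginal views over-count each record n_c times.
\[
  \rho \;=\; \frac{R_{\mathrm{net}}}{\sum_v R_v}
        \;=\; \frac{\sum_c \lambda_c H_c}{\sum_c n_c\, \lambda_c H_c}
\]
```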
Slide 16: Joint Trace Compression
- results from synthetic networks
Slide 17: Open Issues
- how many more bits are needed for network characteristics: variable delay, loss, route changes
- distributed compression algorithms: lossless vs. lossy
- joint routing and compression in trace aggregation
Slide 18: Future Work
- develop compression algorithms
  - single point compression
  - distributed joint compression
  - different levels of detail: full packet traces, NetFlow data, SNMP data
- entropy-based applications
  - network monitor placement
  - network anomaly detection
Slide 19: Questions & Comments ???