
An Information Theoretic Approach to Network Trace Compression. Y. Liu, D. Towsley, J. Weng and D. Goeckel.


1 An Information Theoretic Approach to Network Trace Compression. Y. Liu, D. Towsley, J. Weng and D. Goeckel

2 Outline
- network monitoring/measurement
- information theory & compression
- single point trace compression
- joint network trace compression
- future work

3 Motivation
- service providers, service users
- monitoring
  - anomaly detection
  - debugging
  - traffic engineering
- pricing, peering, service level agreements
- architecture design
- application design

4 Network of Network Sensors
- network monitoring: sensing a network
  - embedded vs. exogenous
  - single point vs. distributed
- different granularities
  - full traffic trace: packet headers
  - flow level record: timing, volume
  - summary statistics: byte/packet counts
- challenges
  - growing scales: high-speed links, large topologies
  - constrained resources: processing, storage, transmission
  - 30G of headers/hour at the UMass gateway
- solutions
  - sampling: temporal/spatial
  - compression: marginal/distributed

5 Entropy & Compression (I)
- Shannon entropy of a discrete r.v. X: H(X) = -Σ_x p(x) log2 p(x)
- compression of an i.i.d. sequence by source coding
  - coding: map each outcome x to a binary codeword of length l(x)
  - expected code length: L = Σ_x p(x) l(x)
  - info. theoretic bound: L ≥ H(X)
- Shannon/Huffman coding
  - assigns short codewords to frequent outcomes
  - achieves the H(X) bound (within 1 bit per symbol)
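The entropy bound and the Huffman construction on this slide can be checked numerically. The following is an illustrative sketch (not from the slides): it estimates H(X) for a dyadic distribution, for which Huffman coding meets the entropy bound exactly.

```python
import heapq
import math

def shannon_entropy(probs):
    """H(X) = -sum_x p(x) log2 p(x)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def huffman_lengths(freqs):
    """Codeword length per symbol for a Huffman code built over freqs."""
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}  # one level deeper
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

# Dyadic source: p = 1/2, 1/4, 1/8, 1/16, 1/16 -- Huffman meets H(X) exactly.
freqs = {"a": 8, "b": 4, "c": 2, "d": 1, "e": 1}
total = sum(freqs.values())
lengths = huffman_lengths(freqs)

H = shannon_entropy([w / total for w in freqs.values()])
L = sum((freqs[s] / total) * lengths[s] for s in freqs)
print(f"H(X) = {H:.3f} bits, expected Huffman length = {L:.3f} bits")
# H(X) = 1.875 bits, expected Huffman length = 1.875 bits
```

The frequent symbol "a" gets a 1-bit codeword while the rare "d" and "e" get 4 bits, which is exactly the short-codeword-for-frequent-outcome idea the slide describes.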

6 Entropy & Compression (II)
- joint entropy: H(X1, ..., Xn)
- entropy rate of a stochastic process: H(X) = lim_{n→∞} H(X1, ..., Xn) / n
  - exploits auto-correlation
  - lower bound on # bits per sample of X
- compression ratio: H(X)/M, where M is the original size (bits) per sample
- Lempel-Ziv coding
  - asymptotically achieves the entropy rate of a stationary process
  - universal data compression algorithms: LZ77, gzip, winzip
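The auto-correlation point can be demonstrated with any LZ77-based compressor. This sketch (my own illustration, using Python's zlib rather than gzip) compares two byte streams with the same 50/50 marginal distribution: one i.i.d., one a sticky Markov chain with a much lower entropy rate.

```python
import random
import zlib

random.seed(1)
n = 100_000

# i.i.d. fair bits, one per byte: entropy rate 1 bit/symbol, no memory.
iid = bytes(48 + random.getrandbits(1) for _ in range(n))

# Symmetric two-state Markov chain: same 50/50 stationary marginal, but it
# flips state only with probability 0.05, so its entropy rate is roughly
# H(0.05) ~= 0.29 bits/symbol.
state, out = 0, bytearray()
for _ in range(n):
    if random.random() < 0.05:
        state ^= 1
    out.append(48 + state)
corr = bytes(out)

for name, data in [("i.i.d.", iid), ("correlated", corr)]:
    ratio = len(zlib.compress(data, 9)) / len(data)
    print(f"{name}: {ratio:.1%} of original size")
```

zlib compresses the correlated stream far more, because LZ77 exploits the long repeated runs that the auto-correlation produces; on the memoryless stream it cannot beat the ~1 bit/symbol entropy rate.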

7 Entropy & Compression (III)
- joint entropy rate of a set of stochastic processes
- joint data compression
  - exploits cross-correlation between sources
  - joint compression ratio
- Slepian-Wolf coding
  - distributed compression: encode each process individually, yet achieve the joint entropy rate in the limit
  - requires knowledge of the cross-correlation structure
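The gap that Slepian-Wolf coding closes is the difference between the sum of marginal entropies and the joint entropy. This toy sketch (not from the slides) estimates both for two correlated "monitor" observations, where Y is X flipped with probability 0.1:

```python
import math
import random
from collections import Counter

def entropy(samples):
    """Empirical entropy (bits/sample) of a sequence of hashable outcomes."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())

random.seed(0)
# Two monitors see correlated bits: Y is X corrupted with probability 0.1.
X = [random.getrandbits(1) for _ in range(100_000)]
Y = [x ^ (random.random() < 0.1) for x in X]

Hx, Hy, Hxy = entropy(X), entropy(Y), entropy(list(zip(X, Y)))
print(f"H(X) + H(Y) = {Hx + Hy:.3f} bits, H(X,Y) = {Hxy:.3f} bits")
```

Separate encoders that ignore the correlation need about H(X)+H(Y) ≈ 2 bits per pair; the Slepian-Wolf result says distributed encoders can approach H(X,Y) = 1 + H(0.1) ≈ 1.47 bits without ever communicating with each other, provided the correlation structure is known.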

8 Network Trace Compression
- naïve approach: treat the trace as a byte stream, compress with generic tools
  - gzip compresses UMass traces by a factor of 2
- network traces are highly structured data
  - multiple fields per packet: diversity in information richness, correlation among fields
  - multiple packets per flow: packets within a flow share information (temporal correlation)
  - multiple monitors traversed by a flow: most fields unchanged within the network (spatial correlation)
- network trace models
  - quantify the information content of network traces
  - serve as lower bounds/guidelines for compression algorithms
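One concrete way structure beats byte-stream compression is a field-aware transform before the generic compressor. This sketch (my own illustration, not the authors' algorithm) delta-encodes a column of 32-bit packet timestamps: the absolute values are large, but intra-flow gaps are tiny, so zlib does far better on the transformed column.

```python
import random
import struct
import zlib

random.seed(2)
# Synthetic column of 32-bit timestamps from one flow: closely spaced
# arrivals, so absolute values are large but consecutive gaps are tiny.
t, stamps = 1_000_000_000, []
for _ in range(10_000):
    t += random.randint(1, 300)          # small inter-arrival gaps
    stamps.append(t)

raw = b"".join(struct.pack(">I", s) for s in stamps)

# Field-aware transform: store the first stamp, then the deltas. The
# deltas fill only the low bytes, so a generic compressor does far better.
deltas = [stamps[0]] + [b - a for a, b in zip(stamps, stamps[1:])]
enc = b"".join(struct.pack(">I", d) for d in deltas)

for name, data in [("absolute", raw), ("delta-encoded", enc)]:
    print(f"{name}: {len(zlib.compress(data, 9))} bytes (from {len(data)})")
```

The same idea generalizes to the other header fields: exploit the known structure first, then hand the low-entropy residue to the generic tool.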

9 Packet Header Trace
[Figure: per-packet record layout, bits 0-31]
- Timing: time stamp (sec.), time stamp (sub-sec.)
- IP header: vers., HLen, ToS, total length; IPID, flags, fragment offset; TTL, protocol, header checksum; source IP address; destination IP address
- TCP header: source port, destination port; data sequence number; acknowledgment number; HLen, TCP flags, window size; checksum, urgent pointer

10 Header Field Entropy
[Figure: the same header layout, with fields annotated by information content]
- Θ: flow id
- T: time

11 Single Point Compression
[Figure: packet timeline (T0, Θ0), (T1, Θ1), ..., (Tn, Θn), and the sub-sequence of packets belonging to flow Θ0]
- temporal correlation introduced by flows
  - packets from the same flow are closely spaced in time
  - they share header information
- packet-level trace: per packet, a flow id and an inter-arrival time
  - # bits per flow id: H(Θ)
- flow-based trace: one record per flow
  - flow record: flow ID Θ0, flow size K, arrival time T0, packet inter-arrivals
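The packet-level to flow-based restructuring on this slide can be sketched in a few lines. This is an illustrative toy (flow ids and millisecond timestamps are invented): the flow id is stored once per flow instead of once per packet, and only the small intra-flow gaps are kept.

```python
from collections import defaultdict

# Packet-level trace: (flow id, arrival time in ms) per packet.
packets = [("A", 0), ("B", 10), ("A", 20), ("A", 50), ("B", 300)]

# Flow-based trace: flow id once per flow, plus the first arrival, the
# flow size, and the (small, low-entropy) intra-flow inter-arrival times.
arrivals = defaultdict(list)
for fid, t in packets:
    arrivals[fid].append(t)

flow_records = {
    fid: {"first": ts[0],
          "size": len(ts),
          "gaps": [b - a for a, b in zip(ts, ts[1:])]}
    for fid, ts in arrivals.items()
}
print(flow_records)
# {'A': {'first': 0, 'size': 3, 'gaps': [20, 30]},
#  'B': {'first': 10, 'size': 2, 'gaps': [290]}}
```

In entropy terms, the H(Θ) bits of flow id are paid once per flow rather than once per packet, which is where the flow-based representation wins.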

12 Flow Level Model
- Poisson flow arrivals with rate λ
- flow inter-arrivals: i.i.d. exponential with rate λ
- independent packet inter-arrivals within a flow; K is the flow length
- # bits per flow: H(Φ), the entropy of one flow record
- # bits per second: λ H(Φ)
- marginal compression ratio

13 Empirical Results: Single Point
- 1-hour UMass gateway traces
- Sept. 22, 2004 to Oct. 23, 2004
- 1am, 10am, 1pm

14 Distributed Network Monitoring
- a single flow is recorded by multiple monitors
- spatial correlation: traces collected at distributed monitors are correlated
- marginal node view: # bits/sec to represent the flows seen by one node; bound on single point compression
- network system view: # bits/sec to represent the flows crossing the network; bound on joint compression
- joint compression ratio: quantifies the gain of joint compression

15 Baseline Joint Entropy Model
- "perfect" network: fixed routes, constant link delays, no packet loss
- flow classes defined by routes
- flows in class c arrive with rate λc
- # of monitors traversed by a class: the length of its route
- # bits per flow record: the entropy of one record
- info. rate at node v: summed over the classes traversing v
- network view info. rate: summed over all classes, each counted once
- joint compression ratio: network view rate over the sum of node rates
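A toy instance makes the marginal-vs-network accounting concrete. Everything below is an illustrative assumption (three monitors on a line, three route-defined classes, 100 bits per flow record); it is not the paper's evaluation, and it ignores any extra bits needed to identify routes.

```python
# Toy "perfect network": monitors v1, v2, v3; flow classes defined by routes.
routes = {"c1": ("v1", "v2", "v3"), "c2": ("v2", "v3"), "c3": ("v3",)}
lam = {"c1": 10.0, "c2": 5.0, "c3": 20.0}      # flow arrival rate per class
H = {"c1": 100.0, "c2": 100.0, "c3": 100.0}    # bits per flow record (assumed)

# Marginal node view: every monitor on a flow's route records the flow.
node_rate = {}
for c, route in routes.items():
    for v in route:
        node_rate[v] = node_rate.get(v, 0.0) + lam[c] * H[c]

# Network system view: each flow only needs to be represented once.
net_rate = sum(lam[c] * H[c] for c in routes)

marginal_total = sum(node_rate.values())
print(f"sum of node info. rates: {marginal_total:.0f} bits/s")
print(f"network view info. rate: {net_rate:.0f} bits/s")
print(f"joint compression ratio: {net_rate / marginal_total:.3f}")
```

Here every multi-hop flow is recorded redundantly by each monitor it crosses, so the network view (3500 bits/s) is well below the sum of the node views (6000 bits/s); the ratio quantifies the spatial redundancy that joint compression could remove.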

16 Joint Trace Compression
- results from synthetic networks

17 Open Issues
- how many more bits are needed for real network characteristics: variable delay, loss, route changes
- distributed compression algorithms
- lossless vs. lossy compression
- joint routing and compression in trace aggregation

18 Future Work
- develop compression algorithms
  - single point compression
  - distributed joint compression
- different levels of detail
  - full packet traces
  - NetFlow data
  - SNMP data
- entropy-based applications
  - network monitor placement
  - network anomaly detection

19 Questions & Comments

