Modeling Network Traffic as Images
Seong Soo Kim and A. L. Narasimha Reddy
Computer Engineering, Department of Electrical Engineering, Texas A&M University
{skim,
Seong Soo Kim and A. L. Narasimha Reddy, Texas A&M University, ICC
Contents
- Introduction and Motivation
- Network Traffic as Images
  - Visual Representation
- Requirements for Representing Network Traffic as Images
  - Sampling Rates
  - Visual modeling
- Network Traffic as Images (normal traffic, semi-random attacks, random attacks)
- Image Processing for Network Traffic
  - Validity of intra-frame DCT
  - Inter-frame differential coding
- Conclusion
Attack / Anomaly
- Bandwidth attacks/anomalies, flash crowds
- DoS (Denial of Service): UDP flooding, TCP SYN flooding, ICMP flooding
- Typical types:
  - Single attacker (DoS)
  - Multiple attackers (DDoS)
  - Multiple victims (Worm)
- Aggregate packet header data as signals
- Signal/image-based anomaly/attack detectors
Motivation (1)
- Previous studies looked at the behavior of individual flows
  - Partial state
  - RED-PD
- These become ineffective with DDoS
- Aggregate link speeds are increasing
  - currently at Gb/s, soon to be at 10~100 Gb/s
- Need simple, effective mechanisms that can be implemented at line speeds
- Look at aggregate information of traffic
  - Use sampling to reduce the cost of processing
- Process aggregate data to detect anomalies
Motivation (2)
- Signature (rule)-based approaches are tailored to known attacks
  - Look for packets with port number 1434 (SQL Slammer)
  - Become ineffective when traffic patterns or attacks change
- New threats are constantly emerging
- Do not want to rely on attack-specific information
- Most current monitoring/policing tools work off-line
  - FlowScan, FlowAnalyzer, AutoFocus
- Quick identification of network anomalies is necessary to contain threats
- Can we design generic (and generalized) mechanisms for attack detection and containment?
- Measurement (network)-based real-time detection
Packet Header
- Packet headers carry a rich set of information
  - Data: packet counts, byte counts, number of flows
  - Domain: source/destination address, source/destination port numbers, protocol numbers
- An image/video can represent each kind of data in each domain
- Image processing/video analysis deciphers the patterns of traffic
  - single source -> multiple destinations (Worm): horizontal lines
  - multiple sources -> single destination (DDoS): vertical lines
Domain Size Reduction (1)
- Header fields may have large domain spaces
  - IPv4 addresses: 2^32, IPv6 addresses: 2^128
- Need to minimize storage and processing complexity for real-time processing
- Employ "domain folding"
  - Example: a 2-dimensional array count[i][j] records the packet count for value j in the i-th field of the IP address
- Effects
  - A 32-bit address is split into four 8-bit fields
  - Smaller memory: 2^32 (4G) -> 4*256 (1K) counters
  - Running time: O(n) -> O(lg n)
  - A form of hashing
- Advantages
  - The hashing can be reversed, to a limited extent, to identify the target IP address
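A minimal Python sketch of the folded count[i][j] structure described above. The function names (new_counters, record, candidate_address) and the example addresses are illustrative assumptions, not part of the original slides. The per-field argmax is only one simple way to read "reversing the hashing"; it suggests a candidate address rather than proving one.

```python
def fold_address(ip):
    """Split a dotted-quad IPv4 string into its four 8-bit fields."""
    parts = [int(p) for p in ip.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError("not a dotted-quad IPv4 address: %r" % ip)
    return parts


def new_counters():
    """count[i][j]: traffic seen for value j in the i-th address field."""
    return [[0] * 256 for _ in range(4)]


def record(count, ip, weight=1):
    """Update the four counters touched by one address (4 updates per packet)."""
    for i, j in enumerate(fold_address(ip)):
        count[i][j] += weight


def candidate_address(count):
    """Restricted reversal: take the heaviest value in each field."""
    fields = [max(range(256), key=lambda j: row[j]) for row in count]
    return ".".join(str(f) for f in fields)


# Hypothetical traffic: one heavy destination plus background addresses.
count = new_counters()
record(count, "192.0.2.7", 10)
record(count, "192.0.2.9", 2)
record(count, "198.51.100.7", 1)
print(candidate_address(count))
```

Because each field is counted independently, the candidate combines the heaviest byte of every field and can name an address that never appeared when several heavy addresses overlap. That is the sense in which the reversal is only possible restrictively.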
Data structure for reducing domain size (2)
Simple example
- IP 1 = , No. of Flows = 3
- IP 2 = , No. of Flows = 2
- IP 3 = , No. of Flows = 1
- IP 4 = , No. of Flows = 10
- IP 5 = , No. of Flows = 2
Visual Representation
Image-based analysis
- Generate useful signals from traffic images
  - Treat the traffic data as images
  - Apply image-processing-based analysis
- Enables applying image/video processing to the analysis of network traffic
  - Some attacks become clearly visible to the human eye
  - Video compression techniques lead to data reduction
  - Scene change analysis leads to anomaly detection
  - Motion prediction leads to attack prediction
  - Pattern recognition leads to anomaly identification
Impacts of Design Factors for Presenting Network Traffic as Images (1)
- Sampling rates
  - To discriminate the current traffic situation based on its stationary property, we should select the sampling frequency that yields the most stable images
  - The periodicity of traffic
Impacts of Design Factors for Presenting Network Traffic as Images (2)
- Sampling rates
  - In normal times the traffic is stationary, so the choice of sampling period is not crucial
  - During attacks the traffic changes dynamically with time, so the sampling period is a crucial factor
  - 30 ~ 120 sec. sampling
Flow-based Network Traffic Images
- Visual representation based on the number of flows
  - The number of flows in the (source/destination) address domain
  - Black dots/lines indicate more concentrated traffic intensity
  - The analysis is effective for revealing flood-type attacks
- The image reveals the characteristics of traffic
  - Normal behavior mode
  - A single target (DoS)
  - Semi-random target: the subnet is fixed and the remaining portion of the address varies (prefix-based attacks)
  - Random target: horizontal (Worm) and vertical (DDoS) scans
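One way such flow-count images could be built is sketched below in Python. Several choices here are assumptions made for illustration and are not stated in the slides: the record layout (timestamp, source, destination), treating each distinct source/destination pair as one flow, using a single address byte per axis (byte_index), and putting sources on rows and destinations on columns so that a single source scanning many targets shows up as a horizontal line.

```python
from collections import defaultdict


def build_frames(records, period=60, byte_index=3):
    """Group records into sampling periods and build one 256x256 image each.

    records: iterable of (timestamp_seconds, src_ip, dst_ip) tuples.
    Returns {frame_number: image}, where image[s][d] is the number of
    distinct (src, dst) pairs whose chosen address bytes are s and d.
    """
    pairs_per_frame = defaultdict(set)
    for ts, src, dst in records:
        pairs_per_frame[int(ts // period)].add((src, dst))

    frames = {}
    for frame_no, pairs in pairs_per_frame.items():
        image = [[0] * 256 for _ in range(256)]
        for src, dst in pairs:
            s = int(src.split(".")[byte_index])  # row: source byte
            d = int(dst.split(".")[byte_index])  # column: destination byte
            image[s][d] += 1
        frames[frame_no] = image
    return frames


# Hypothetical records spanning two 60-second sampling periods.
records = [
    (0, "10.0.0.1", "10.0.0.5"),
    (10, "10.0.0.1", "10.0.0.5"),   # same pair, counted once per frame
    (20, "10.0.0.1", "10.0.0.6"),
    (70, "10.0.0.2", "10.0.0.5"),
]
frames = build_frames(records)
print(sorted(frames))
```

The period argument corresponds to the sampling period discussed on the preceding slides; values in the 30 to 120 second range are the ones the slides suggest.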
Network traffic as images: normal network traffic
- Signal: standard deviation of the most significant DCT coefficients of the images
  - reflects the energy distribution of the number of flows over the address domain
- In the normal traffic state, this signal sits at a middle level between the two anomalous cases that follow
- Legitimate flows do not form any regular shape because they are randomly distributed over the address domain
Network traffic as images: semi-random targeted attacks
- The difference between attackers (or victims) and legitimate users is remarkable
  - higher variance than normal traffic
- A specific area of the data structure appears in a darker shade
  - traffic is concentrated on an (aggregated) single destination or a subnet
Network traffic as images: random targeted attacks
- Worm propagation type attack
- DDoS propagation type attack
- All of the addresses are exploited in hostscan attacks
  - Uniform intensity -> low variance
- The whole region of the image has uniform intensity
- Horizontal/vertical lines indicate anomalies in the 2D image
- Random (sequential, dictionary scan) attacks
  - Horizontal scan: from the same source aimed at multiple targets (Worm propagation)
  - Vertical scan: from several machines (in a subnet) to a single destination (DDoS)
Summary of visual representation of traffic data
- Worm attacks: horizontal line in 2D image
- DDoS attacks: vertical line in 2D image
  - Line detection algorithm
- Visual images look different in different traffic modes
- Motion prediction can lead to attack prediction
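The slides name a line detection algorithm without detailing it. As a simple stand-in, the Python sketch below flags rows and columns in which a large fraction of pixels is active. The function name detect_lines and the min_fraction threshold are assumptions for illustration, and the sketch assumes rows index sources and columns index destinations.

```python
def detect_lines(image, min_fraction=0.5):
    """Return (rows, cols): indices whose share of non-zero pixels
    is at least min_fraction of the row or column length."""
    n_rows = len(image)
    n_cols = len(image[0])
    rows = [
        r for r in range(n_rows)
        if sum(1 for v in image[r] if v > 0) >= min_fraction * n_cols
    ]
    cols = [
        c for c in range(n_cols)
        if sum(1 for r in range(n_rows) if image[r][c] > 0) >= min_fraction * n_rows
    ]
    return rows, cols


# Hypothetical 8x8 frame: one source (row 2) contacting every destination.
worm_like = [[0] * 8 for _ in range(8)]
worm_like[2] = [1] * 8
print(detect_lines(worm_like))
```

Under the assumed orientation, a flagged row suggests worm-like scanning from one source, and a flagged column suggests many sources converging on one destination. A fixed occupancy threshold is crude; it would likely need tuning against normal traffic before use.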
Generation of useful signals: scene change analysis with the DCT
- We can apply various image processing techniques
- From the generated images, we can derive useful signals through the DCT (Discrete Cosine Transform)
- The DCT is effective for storage reduction and for approximating the energy distribution in an image
- Signal: variance of the leading DCT coefficients in 8-by-8 blocks
- Instead of all DCT coefficients, we can retain only the dominant coefficients
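A pure-Python sketch of this signal follows: an orthonormal 2D DCT-II applied to each 8-by-8 block, then the variance of one coefficient per block. Taking the DC coefficient at position [0][0] as the "leading" coefficient is an assumption; the slides do not say which coefficients are retained. The function names are illustrative.

```python
import math


def dct_1d(v):
    """Orthonormal DCT-II of a sequence."""
    n = len(v)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(
            v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)))
    return out


def dct_2d(block):
    """Separable 2D DCT: transform the rows, then the columns."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def leading_coefficient_variance(image, block=8):
    """Variance across blocks of the DC coefficient (assumed 'leading')."""
    dcs = []
    for r in range(0, len(image) - block + 1, block):
        for c in range(0, len(image[0]) - block + 1, block):
            tile = [row[c:c + block] for row in image[r:r + block]]
            dcs.append(dct_2d(tile)[0][0])
    mean = sum(dcs) / len(dcs)
    return sum((d - mean) ** 2 for d in dcs) / len(dcs)


# Hypothetical 16x16 frames: uniform traffic vs. traffic in one block.
uniform = [[1] * 16 for _ in range(16)]
concentrated = [[0] * 16 for _ in range(16)]
for r in range(8):
    for c in range(8):
        concentrated[r][c] = 4
print(leading_coefficient_variance(uniform))
print(leading_coefficient_variance(concentrated))
```

The uniform frame gives a variance near zero and the concentrated frame a much larger one, which matches the qualitative picture in the slides: uniform intensity for random scans, higher variance for concentrated targets.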
Impact of Selecting DCT Coefficients (1)
- TCG (G_T): Transformation Coding Gain
  - TCG measures the amount of energy packed into the low-frequency (leading) coefficients
  - A higher TCG leads to a smaller intra-frame MSE and higher compression
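For reference, the usual textbook definition of transform coding gain is the ratio of the arithmetic mean to the geometric mean of the coefficient variances. The slides do not show their formula, so it is an assumption that the authors use exactly this form; the symbols N and \sigma_i^2 are introduced here for illustration.

```latex
G_T \;=\; \frac{\dfrac{1}{N}\displaystyle\sum_{i=0}^{N-1}\sigma_i^{2}}
               {\left(\displaystyle\prod_{i=0}^{N-1}\sigma_i^{2}\right)^{1/N}}
```

Here \sigma_i^2 is the variance of the i-th transform coefficient and N is the number of coefficients. By the arithmetic-geometric mean inequality G_T >= 1, with equality when all variances are equal, that is, when the transform packs no energy into a few coefficients.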
Impacts of Selecting DCT Coefficients (2)
- Intra-frame DCT
  - Random traffic can be packed within fewer coefficients than semi-random traffic
- Using inter-frame differential coding, we can improve G_T
  - For a fixed target MSE, the number of required coefficients drops from 42 to 3
  - TCG increases 2.6 times
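The idea behind inter-frame differential coding can be sketched in a few lines of Python: transform the difference between consecutive frames instead of each frame on its own. When successive frames are similar, the residual carries far less energy. The helper names and the tiny 2-by-2 frames are illustrative assumptions, not the authors' implementation.

```python
def frame_difference(prev, curr):
    """Pixel-wise residual between two consecutive frames."""
    return [[c - p for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]


def energy(image):
    """Sum of squared pixel values."""
    return sum(v * v for row in image for v in row)


# Hypothetical consecutive frames that differ in a single pixel.
prev = [[5, 5], [5, 5]]
curr = [[5, 6], [5, 5]]
residual = frame_difference(prev, curr)
print(energy(curr), energy(residual))
```

A sudden jump in residual energy between consecutive frames is also a natural candidate for a scene-change signal, although the slides do not specify the exact statistic used.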
Impacts of Design Factors for Presenting Network Traffic as Images
- Effect of sampling rates on DCT coefficients
  - A sampling period of 60 seconds maintains the minimum intra-frame MSE over the entire range of retained DCT coefficients
  - We can choose 30 ~ 120 sec. as an appropriate sampling period
Attack Estimation (1) - Motion prediction
- Step 1: complexity reduction
  - Pixels below a mean packet count
  - Normalized absolute difference similarity
- Step 2: find a block of addresses
Attack Estimation (2) - Motion prediction
- Step 3: calculate the quantitative components
  - Starting position
  - Motion vector
- Step 4: compensate for errors
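The steps above can be sketched roughly in Python as follows. This is a simplified reading, not the authors' algorithm: Step 1 is interpreted as zeroing pixels below the frame mean, the "block of addresses" is taken to be the bounding box of the remaining pixels, and the motion vector is the displacement of that box's top-left corner. The normalized absolute difference similarity and the Step 4 error compensation are not implemented. All function names are hypothetical.

```python
def threshold(image):
    """Step 1 (assumed reading): zero out pixels below the frame mean."""
    n = len(image) * len(image[0])
    mean = sum(sum(row) for row in image) / n
    return [[v if v >= mean else 0 for v in row] for row in image]


def active_block(image):
    """Step 2: bounding box (top, left, bottom, right) of non-zero pixels."""
    coords = [(r, c) for r, row in enumerate(image)
              for c, v in enumerate(row) if v > 0]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols))


def motion_vector(prev, curr):
    """Step 3: starting position and displacement of the active block."""
    a = active_block(threshold(prev))
    b = active_block(threshold(curr))
    if a is None or b is None:
        return None
    return (a[0], a[1]), (b[0] - a[0], b[1] - a[1])


def predict_next(prev, curr):
    """Extrapolate the block position one frame ahead (linear motion)."""
    mv = motion_vector(prev, curr)
    if mv is None:
        return None
    (r, c), (dr, dc) = mv
    return (r + 2 * dr, c + 2 * dc)


def frame_with_block(top, left, size=8, value=10):
    """Hypothetical test frame with one 2x2 block of heavy traffic."""
    img = [[0] * size for _ in range(size)]
    for r in range(top, top + 2):
        for c in range(left, left + 2):
            img[r][c] = value
    return img


prev = frame_with_block(1, 1)
curr = frame_with_block(1, 3)
print(motion_vector(prev, curr), predict_next(prev, curr))
```

A bounding box works only when a single concentrated region dominates the frame; with several hot spots, a per-block matching search would be the more usual choice in video coding.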
Advantages
- Not looking for specific known attacks
- Generic mechanism
- Works in real time
  - Latencies of a few samples
  - Simple enough to be implemented inline
Conclusion
- We studied the feasibility of analyzing packet header data through image and DCT analysis for detecting traffic anomalies
- We evaluated the effectiveness of our approach using network traffic
- Can rely on many tools from the signal/image processing area
  - More robust offline analysis possible
  - Concise for logging and playback
- Real-time resource accounting is feasible
- Real-time traffic monitoring is feasible
  - Simple enough to be implemented inline
Thank you!
Processing and memory complexity
- Two samples of packet header data: 2*P, where P is the size of the sample data
- Summary information (DCT coefficients etc.) over samples: S
- Total space requirement: O(P+S)
  - P: 2^32 -> 4*256 = 1024 (1D); 2^64 -> 256K (2D)
  - S: 32*32 -> 16
  - Memory requirement: 258K
- Processing: O(P+S)
  - Update 4 counters per domain
  - Per-packet data-plane cost is low