A research work by: Charles V. Wright, Scott E. Coull, Fabian Monrose


1 Traffic Morphing: An Efficient Defense Against Statistical Traffic Analysis
A research work by: Charles V. Wright, Scott E. Coull, Fabian Monrose
Presenter: Chandra Bhatt
CSCE715, Mar 15th 2017

2 Agenda
Introduction
Why Traffic Morphing?
Related Works
Morphing Matrix
Convex Optimization technique
Morphing Constraints
Potential Pitfalls
Proposal Evaluation
In the context of encrypted Voice over IP
In the context of web page identification
Conclusion

3 Introduction
Properties of encrypted traffic such as packet sizes and timing can reveal significant information about the traffic contents. The research work proposes a novel method for thwarting traffic analysis algorithms by optimally morphing one class of traffic to look like another class. It uses convex optimization to decide how to modify packets so that the accuracy of traffic classifiers drops. Network traffic analysis is an increasingly common means of identifying security threats, and encryption does not help much against it. The identities of web pages can be inferred by examining the sizes of packets in encrypted connections, which breaches user privacy: adversaries learn about user activities, web page visits and other personal information. In the case of VoIP (Voice over Internet Protocol), traffic analysis can reveal the language spoken.

4 Introduction
Traffic classifiers: algorithms that detect or categorize traffic contents on the basis of features such as packet size, source and destination. Padding technique: the conventional countermeasure, which pads packets to a uniform size so that packet sizes reveal nothing about the data. This technique is inefficient because it incurs a large bandwidth overhead, and it hampers the efficiency and performance of the underlying network protocols. Padding every packet to the maximum transmission unit (MTU) doubles the amount of data sent. In practical applications, network performance is usually preferred over privacy. This paper proposes a new method that optimally balances privacy and efficiency using mathematical programming, specifically convex optimization.
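As a rough illustration of why padding is costly, the sketch below computes the byte overhead of padding every packet to the MTU. The trace and the 1500-byte MTU are assumed values chosen for illustration, not figures from the paper, and `padding_overhead` is a hypothetical helper name.

```python
def padding_overhead(packet_sizes, mtu=1500):
    """Return (original_bytes, padded_bytes, overhead_fraction) when every
    packet is padded up to the MTU. Assumes no packet exceeds the MTU."""
    if any(size > mtu for size in packet_sizes):
        raise ValueError("packet larger than MTU")
    original = sum(packet_sizes)
    padded = mtu * len(packet_sizes)
    return original, padded, (padded - original) / original


# Hypothetical trace: a mix of small and full-size packets.
sizes = [100, 400, 1500, 1500, 250, 750]
original, padded, overhead = padding_overhead(sizes)
print(f"{original} bytes become {padded} bytes ({overhead:.0%} overhead)")
```

For this particular trace the padded stream is twice the size of the original; the actual factor depends on how the packet sizes are distributed.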

5 Why Traffic Morphing?
If a user on an internal or external network visits a web page, say WebMD, for personal health-related needs, adversaries could track the user's activity and breach the user's privacy. Traffic morphing modifies the properties of packets on the network as desired. With morphing, the user modifies the packets so that they closely resemble the packets of, say, ESPN.com. Here the user downloads a page from WebMD.com, but the morphed traffic appears to the adversary's traffic analysis as a visit to ESPN.com.

6 Related Works
Related research on network traffic analysis shows that the inter-arrival times of packets in SSH connections can be used to infer information about the user's keystrokes, which can help an attacker recover login passwords. Polymorphic blending: this malicious technique lets attackers evade detection by altering byte sequences to look like normal, non-malicious packets. In contrast to these related works, the proposed traffic morphing technique can handle arbitrary input traffic, in particular real-time streaming traffic that encodes input from the user.

7 Traffic Morphing
Traffic morphing aims to make the traffic of the source web page closely resemble that of the target web page, for the desired features. Naively padding packets to increase their sizes, or slicing large packets, induces immense overhead. The main aim of traffic morphing is to provide privacy with minimal overhead. For each pair of source and target processes, the user applies optimization techniques to derive a morphing matrix that confuses the classifier with the least overhead, by padding or splitting packets. The paper evaluates morphing against two traffic classifiers: a VoIP language classifier and a web page classifier. Morphing greatly reduces the accuracy of both. An efficient traffic morphing technique must be able to handle arbitrary traffic input.

8 Morphing Matrix
A morphing matrix guides the alteration required to make the source process match the target process. Matrices are generated offline; at run time the morphing step acts as a proxy between the application and the network stack. According to the matrix, packets are padded or split into several smaller packets. The altered data is then passed to the network stack and continues its normal transfer over the network. Entry $a_{ij}$ of matrix $A$ is the probability of morphing a packet of size $s_j$ to size $s_i$. Convex optimization can be used to find the matrix $A$ that morphs the source process while minimizing the cost function (number of bytes of overhead). After sampling a target size $s_i$ for a packet of size $s_j$, the algorithm pads the packet if $s_i > s_j$ or splits the data into multiple packets if $s_i < s_j$. The only operation that adds latency is generating random numbers for sampling.

9 Morphing Matrix contd..
To morph $X$ to $Y$, the user must find a non-negative $n \times n$ matrix $A = [a_{ij}]$ such that $Y = AX$, where $X$ and $Y$ are column vectors over the $n$ packet sizes. The source distribution is $X = [x_1, x_2, \ldots, x_n]^T$, where $x_i$ is the probability of the $i$th largest packet size, and the target distribution is $Y = [y_1, y_2, \ldots, y_n]^T$. $A$ is the morphing matrix, and each column of $A$ is a valid probability mass function over the $n$ packet sizes. To morph a packet of size $s_j$, the algorithm samples a target size from column $j$ of $A$. It first sums the probabilities into a cumulative distribution function, so that the cumulative probability of target size $s_i$ is the sum of the probabilities for all sizes $\le s_i$. Then it runs a pseudorandom number generator to get a random number $r \in [0, 1]$ and selects the first target size with a cumulative probability greater than or equal to $r$.
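The sampling step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the packet sizes and the example column are hypothetical, sizes are assumed to be sorted in ascending order, and the leftover bytes after a split are assumed to be morphed again in a later step.

```python
import random
from itertools import accumulate


def sample_target_size(column, sizes, rng=random):
    """Sample a target size from one column of the morphing matrix.

    column[i] is the probability of morphing the current packet to sizes[i];
    sizes is assumed to be sorted in ascending order.
    """
    r = rng.random()  # uniform in [0, 1)
    for size, cumulative in zip(sizes, accumulate(column)):
        # Strict comparison: since r can be exactly 0.0, ">" avoids picking
        # a size whose probability is zero.
        if cumulative > r:
            return size
    return sizes[-1]  # floating-point round-off left the total just below 1


def morph_step(source_len, target_len):
    """Return (bytes_sent, padding_bytes, leftover_bytes) for one packet."""
    if target_len >= source_len:
        return target_len, target_len - source_len, 0  # pad up to target
    return target_len, 0, source_len - target_len      # split off the rest


sizes = [100, 200, 300]        # hypothetical packet sizes
column = [0.2, 0.5, 0.3]       # hypothetical column of A for a 200-byte packet
target = sample_target_size(column, sizes, random.Random(7))
print(morph_step(200, target))
```

Passing a seeded `random.Random` instance keeps the sketch reproducible; a deployed morpher would need to consider the quality of its random source.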

10 Convex Optimization
The matrix $A$ has $n^2$ unknowns. It must satisfy $Y = AX$, and each column of $A$ must be a probability mass function:

$$\sum_{j=1}^{n} a_{ij} x_j = y_i \quad \forall i \in [1, n]$$

$$\sum_{i=1}^{n} a_{ij} = 1 \quad \forall j \in [1, n]$$

Each of these corresponds to $n$ equations, for a total of $2n$ equations. These two equality constraints, together with $a_{ij} \ge 0$, ensure that $A$ is a valid morphing matrix.

11 Convex Optimization contd..
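The goal is to minimize the cost function $f_0(A)$, the expected number of additional bytes that a user must transmit while using the morphing matrix $A$. Taking $|s_i - s_j|$ as the cost of morphing a packet of size $s_j$ to size $s_i$, the optimization problem can be represented as:

$$\begin{aligned} \text{minimize} \quad & f_0(A) = \sum_{j=1}^{n} \sum_{i=1}^{n} x_j \, a_{ij} \, |s_i - s_j| \\ \text{subject to} \quad & \sum_{j=1}^{n} a_{ij} x_j = y_i, \quad \forall i \in [1, n] \\ & \sum_{i=1}^{n} a_{ij} = 1, \quad \forall j \in [1, n] \\ & a_{ij} \ge 0, \quad \forall i, j \in [1, n] \end{aligned}$$

[Example cost matrix shown on the original slide]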
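To make the constraints and the cost concrete, the following sketch checks whether a candidate matrix is a valid morphing matrix and computes its expected overhead. It assumes the cost of morphing a packet of size s_j to size s_i is |s_i - s_j| bytes; the function names and the two-size example are hypothetical, and no solver is involved. It only compares two hand-picked feasible matrices.

```python
def is_valid_morphing_matrix(A, x, y, tol=1e-9):
    """Check a_ij >= 0, every column sums to 1, and A x = y."""
    n = len(x)
    nonneg = all(A[i][j] >= -tol for i in range(n) for j in range(n))
    columns_ok = all(
        abs(sum(A[i][j] for i in range(n)) - 1.0) <= tol for j in range(n)
    )
    maps_ok = all(
        abs(sum(A[i][j] * x[j] for j in range(n)) - y[i]) <= tol
        for i in range(n)
    )
    return nonneg and columns_ok and maps_ok


def expected_overhead(A, x, sizes):
    """Expected extra bytes per packet, assuming cost |s_i - s_j|."""
    n = len(x)
    return sum(
        x[j] * A[i][j] * abs(sizes[i] - sizes[j])
        for i in range(n)
        for j in range(n)
    )


sizes = [100, 200]      # hypothetical packet sizes
x = [0.5, 0.5]          # source distribution
y = [0.25, 0.75]        # target distribution

# Trivial solution: every column equals y (always feasible, rarely cheap).
naive = [[0.25, 0.25], [0.75, 0.75]]
# Cheaper solution: leave 200-byte packets alone, pad half of the 100-byte ones.
better = [[0.5, 0.0], [0.5, 1.0]]

for name, A in (("naive", naive), ("better", better)):
    print(name, is_valid_morphing_matrix(A, x, y), expected_overhead(A, x, sizes))
```

Both matrices satisfy the constraints, but their costs differ, which is the gap the optimization is meant to close.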

12 Additional Morphing Constraints
Over-specifying the morphing with additional constraints carries a potential pitfall: the matrix equation may have no solution. Convex optimization optimally balances the efficiency of morphing while satisfying the specified constraints. Another approach to optimal morphing under such constraints is multi-level programming.

13 Handling Additional Constraints
Multi-level programming can efficiently handle over-specified morphing requirements and still derive an appropriate morphing matrix. A first matrix ($A'$) is used to create a morphed distribution that is close to the target distribution and satisfies all the constraints; a second matrix ($A$) is then used to minimize the overhead. With multi-level programming, an optimized matrix can be derived efficiently for any number of cost functions and several constraints. Convex optimization is typically used to derive $A'$.

14 Problem with Large Sample Spaces
The complexity of determining the morphing matrix grows rapidly with $n$, since the matrix $A$ has $n \times n$ entries. For large $n$, the growth in the number of unknowns and constraints can make the optimization unmanageable for the solver, so morphing fails. Divide and conquer strategy: handle large sample spaces by finding morphing matrices for sub-spaces rather than a single global morphing matrix. This process can be applied recursively to handle arbitrarily large sample spaces.

15 Potential Pitfalls
Potential pitfalls that can be encountered when morphing techniques are applied to a broad class of network traffic include:
Short Network Sessions
Variation in Source Distribution
Reducing Packet Sizes
Short network sessions: some protocols are not guaranteed to generate a sufficient number of packets (e.g. HTTP, where web pages have a relatively low number of packets). When few packets are generated, deterministic padding is suggested instead of morphing. For many network protocols, such as streaming video, Voice over IP, or file sharing, it is safe to assume that the number of packets generated will be quite large. VoIP, for instance, can generate hundreds of packets in only a few seconds.

16 Potential Pitfalls contd..
Variation in source distribution: morphing only produces the correct target distribution if the actual source distribution closely resembles the one used to create the morphing matrix, so variation in the source distribution naturally affects the resulting morphed traffic. Reducing packet sizes: another pitfall arises when large packets must be sliced into multiple smaller packets. Slicing matches the target distribution but degrades the quality (degree of resolution) of the data carried in these packets.

17 Proposal Evaluation
The proposed morphing technique is evaluated against two traffic classifiers:
Encrypted Voice over IP
Web Page Identification
The evaluation focuses on how well morphing deceives a binary classifier. The proposed morphing algorithm proves successful in misleading the binary classifiers in web page identification.

18 Encrypted Voice over IP
In the encrypted Voice over IP setting, an observer in the network can identify the language spoken in a connection. The evaluation covers 22 languages, and the packets take 9 distinct sizes. White box morphing: all modifications to packet size occur in the codec, after a bit rate has been selected for the packet but before the actual compression has been performed. The size of the packet can be increased or decreased by modifying the encoder's bit rate.

19 Encrypted Voice over IP contd..
Black box morphing is performed after compression has completed but before the packet is encrypted. Packet sizes can only be increased, by padding, so the morphing matrices must be solved with the constraint $a_{ij} = 0$ for $s_i < s_j$. Defeating the original classifier: on morphed traffic, the classifier's detection accuracy is significantly reduced, to about 45%. White box morphing achieves a larger accuracy reduction than black box morphing for the bigram and trigram classifiers in Voice over IP. In fact, white box morphing offers nearly the same privacy as padding all packets to the same size, but with less than half as much overhead. Black box morphing, on the other hand, is less effective at providing indistinguishability, though its overhead is substantially lower than that of padding.
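The padding-only constraint of black box morphing can be checked mechanically. The sketch below is an illustration with hypothetical names and example matrices: rows index target sizes, columns index source sizes, and the matrix is accepted only if every entry that would shrink a packet is zero.

```python
def is_padding_only(A, sizes, tol=1e-12):
    """True if A never maps a packet to a smaller size,
    i.e. a_ij = 0 whenever sizes[i] < sizes[j]."""
    n = len(sizes)
    return all(
        abs(A[i][j]) <= tol
        for i in range(n)
        for j in range(n)
        if sizes[i] < sizes[j]
    )


sizes = [100, 200, 300]  # hypothetical packet sizes, ascending

# Every column only moves probability to equal or larger sizes.
pad_only = [
    [0.5, 0.0, 0.0],
    [0.3, 0.6, 0.0],
    [0.2, 0.4, 1.0],
]
# Maps every packet to the smallest size, which requires shrinking packets.
shrinking = [
    [1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]

print(is_padding_only(pad_only, sizes))
print(is_padding_only(shrinking, sizes))
```

With sizes in ascending order the constraint amounts to requiring a lower-triangular matrix.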

20 Encrypted Voice over IP contd..
Evaluating indistinguishability: here traffic morphing is evaluated against classifiers that are aware of the morphing and how it works. These classifiers are trained on morphed traffic, that is, source languages that have been morphed to resemble the target distributions. Using knowledge of the morphing technique and a higher-order model than the one used by the morpher, the trigram classifier recognizes languages in morphed traffic as accurately as in unmodified data.

21 White box morphing includes all modifications to packet size that occur after a bit rate has been selected for the packet but before the actual compression has been performed.

22 Black Box morphing is performed after the compression has completed but before the packet is encrypted.

23 Web Page Identification
Web page identification classifiers: a web page classifier can accurately identify a web page using only the sizes of the packets and the direction in which they travel. Padding is a poor defense against these classifiers, since limiting the information they can extract requires an inefficient amount of padding. For each (direction, size) entry, the classifier keeps a count of the number of packets of that type sent during a single download, called an instance. Several such instances are used to train a naïve Bayes classifier.
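The per-instance feature described above can be sketched as a simple counter. The direction labels ("in", "out"), the helper name and the example trace are assumptions made for illustration; the classifier itself is not reproduced here.

```python
from collections import Counter


def instance_features(packets):
    """Count packets per (direction, size) pair for one page download.

    packets is an iterable of (direction, size) tuples.
    """
    return Counter((direction, size) for direction, size in packets)


# Hypothetical trace of a single download.
trace = [
    ("out", 310),
    ("in", 1500),
    ("in", 1500),
    ("out", 52),
    ("in", 640),
    ("in", 1500),
]
features = instance_features(trace)
print(features[("in", 1500)], features[("out", 52)])
```

One such counter per download would form a training instance; pairs that never occur simply count as zero.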

24 Web Page Identification contd..
Defeating the original classifier: the web page identification classifier achieves an accuracy of 98.4% on unaltered data. The proposed morphing method reduces this accuracy to just 4.5%. Evaluating indistinguishability: the evaluation uses a web page classifier trained on both the morphed data and the unaltered data. With the proposed method the accuracy is 63.4% with 39.9% overhead, while with the padding technique it is 86.2% and % overheads.

25 Web Page Identification contd..
The direction of the packets gives the classifier useful information for identifying web pages, since it distinguishes client traffic from server traffic. The proposed technique morphs both sides of the connection independently and provides robust protection against these classifiers.

26 Conclusion
Traffic morphing chooses the best way to alter the features of a packet, optimally balancing the privacy provided to the user against the amount of overhead incurred. The proposed technique keeps overhead low in terms of both latency and bytes of additional data sent. Privacy and efficiency are balanced through the use of convex optimization techniques. The technique works in real time and makes no assumptions about the traffic classifier. It was evaluated against several traffic classifiers that use packet sizes as a feature.

27 Thank You.

