Published by Loren Ball. Modified over 6 years ago.
1
Statistical Identification of Encrypted Web-Browsing Traffic
Qixiang Sun (Stanford University); Daniel R. Simon, Yi-Min Wang, Wilf Russell, Venkata N. Padmanabhan, Lili Qiu (Microsoft Research)
2
Outline: Motivation & Problem; Intuition; Hypothetical Attacker; Attacker’s Success Rate; Countermeasures; Conclusion
3
Anonymous Web Browsing
Protect personal information from an attacker’s inference: medical (online support group), questionable activities. Question: Is this REALLY anonymous? [Figure: traffic relayed through routers R1, R2, R3, R4]
4
What’s Different?
In anonymous Web browsing: The chain of routers is used for both sending and receiving data, so HTTP requests and responses can be linked! The target Web pages are publicly accessible, so the responses are known! Implication: The first link/router is an exploitable weakness.
5
What Information is Available?
Observable between the browser and the 1st router, from the HTTP GET requests and responses: number of objects, object sizes, ordering of the objects, delay between packets. [Figure: browser and routers R1, R2, R3, R4]
6
Intuition: Number of objects and object sizes are sufficient to identify a Web page! On average, a Web page has 11 objects, with each object yielding 8.4 bits of information: $8.4 \times 11 - \log_2(11!) \approx 67$ bits, or roughly $10^{20}$ possibilities!! Currently, there are about $10^{9}$ Web pages.
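The arithmetic on this slide can be checked with a short sketch. The two inputs (11 objects, 8.4 bits per object) come from the slide; the function name `pattern_entropy_bits` is only illustrative. Subtracting log2(n!) reflects that the n! orderings of the same objects are not counted as distinct patterns.

```python
import math

AVG_OBJECTS = 11        # average number of objects per page (from the slide)
BITS_PER_OBJECT = 8.4   # information carried by one object size (from the slide)

def pattern_entropy_bits(n_objects, bits_each):
    # Total bits from the object sizes, minus log2(n!) because the
    # n! orderings of the same multiset of sizes are not distinguished.
    return n_objects * bits_each - math.log2(math.factorial(n_objects))

bits = pattern_entropy_bits(AVG_OBJECTS, BITS_PER_OBJECT)
possibilities = 2 ** bits

print(f"{bits:.1f} bits, about {possibilities:.1e} distinct patterns")
```

The result is on the order of 10^20 patterns, many orders of magnitude more than the roughly 10^9 Web pages quoted on the slide.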
7
A Hypothetical Attacker
[Diagram: List of target sensitive-site URLs → Programmatic access to URL & traffic recording → Traffic pattern construction & database update → Traffic Pattern Database. At R1: Traffic recording & pattern construction → Traffic pattern → Similarity score calculation → Decision module → Negative / Positive. The diagram also carries the label "Browser History".]
8
Guts of the Pattern Matching
Given two multisets of object sizes $S_1$ and $S_2$: $\mathrm{Sim}(S_1, S_2) = |S_1 \cap S_2| \,/\, |S_1 \cup S_2|$. The decision module uses an absolute threshold. For example: $S_1 = \{3\text{KB}, 3\text{KB}, 5\text{KB}\}$, $S_2 = \{3\text{KB}, 5\text{KB}, 5\text{KB}\}$, $\mathrm{Sim}(S_1, S_2) = |\{3\text{KB}, 5\text{KB}\}| \,/\, |\{3\text{KB}, 3\text{KB}, 5\text{KB}, 5\text{KB}\}| = 2/4 = 0.5$.
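A minimal sketch of this similarity score, using Python's `collections.Counter` as the multiset type. The function name `sim`, the use of `>=` for the threshold comparison, and the 0.5 value in `THRESHOLD` are assumptions for illustration; the slide only says that an absolute threshold is used.

```python
from collections import Counter

def sim(s1, s2):
    """Multiset Jaccard similarity: |S1 ∩ S2| / |S1 ∪ S2|."""
    a, b = Counter(s1), Counter(s2)
    intersection = sum((a & b).values())  # per-size minimum of the counts
    union = sum((a | b).values())         # per-size maximum of the counts
    return intersection / union if union else 0.0

# Example from the slide, object sizes in KB.
s1 = [3, 3, 5]
s2 = [3, 5, 5]
score = sim(s1, s2)

THRESHOLD = 0.5                 # assumed value, for illustration only
is_match = score >= THRESHOLD   # assumed comparison direction

print(score, is_match)
```

For the slide's example the intersection is {3KB, 5KB} and the union is {3KB, 3KB, 5KB, 5KB}, so the score is 2/4.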
9
Experiment Setup Approximately 100,000 Web pages in total (URLs obtained from the Open Directory Project). The hypothetical attacker chooses about 2200 pages as target pages. Goal: Can these 2200 pages be identified without causing many false positives?
10
What is a Success and Failure?
Successful Identification: A target page passes the similarity threshold and is not confused with other pages in the target set. False Positive: A non-target page is incorrectly identified as one of the target pages. Potential False Positive: A page passes the similarity threshold when compared with a single selected target page.
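One possible reading of these definitions, as a sketch. The helper names (`best_target`, `outcome`), the toy URLs and the tie-breaking rule are hypothetical, and "not confused with other pages in the target set" is interpreted here as "the highest-scoring target that passes the threshold is the correct one"; the slides do not spell this out.

```python
from collections import Counter

def sim(s1, s2):
    """Multiset Jaccard similarity of two lists of object sizes."""
    a, b = Counter(s1), Counter(s2)
    union = sum((a | b).values())
    return sum((a & b).values()) / union if union else 0.0

def best_target(observed, targets, threshold=0.5):
    """Return the best-scoring target URL at or above the threshold, else None."""
    passing = [(sim(observed, pattern), url) for url, pattern in targets.items()]
    passing = [(score, url) for score, url in passing if score >= threshold]
    return max(passing)[1] if passing else None

def outcome(true_url, observed, targets, threshold=0.5):
    guess = best_target(observed, targets, threshold)
    if true_url in targets:
        # Target page: success only if it is matched to itself.
        return "success" if guess == true_url else "missed"
    # Non-target page: any match is a false positive.
    return "false positive" if guess is not None else "correct rejection"

# Hypothetical toy database: URL -> object sizes in KB.
targets = {"a.example/page": [3, 3, 5], "b.example/page": [10, 20, 30]}

print(outcome("a.example/page", [3, 5, 5], targets))
print(outcome("other.example", [3, 3, 5], targets))
print(outcome("other.example", [7, 8], targets))
```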
11
Attacker’s Success Rate
A threshold of 0.5 is sufficient. [Chart: annotations read "80.4%" and "2.1%", with the question "Is this small enough?"]
12
A Detailed Look Inside
False positives are NOT generated uniformly! HTTP 404s; common-looking pages; 0-identifiable pages.
13
Dynamism in Web Pages Most pages are relatively static
One-day-old pattern database is sufficient
14
Countermeasures
Padding: individual objects; add random-sized objects. Morphing: pipelining the HTTP GET requests; pre-fetching. Mimicking: common templates or Web-hosting services.
15
Padding Object Size Linear – Nearest multiple of padding size
Exponential – Nearest power of 2
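The two padding schemes can be sketched as follows. The slide says "nearest"; this sketch assumes rounding up, since padding can only add bytes. The function names and the 1024-byte padding size in the example are illustrative, not taken from the slides.

```python
def pad_linear(size, pad_size):
    """Round size up to the next multiple of pad_size (assumed: round up)."""
    return -(-size // pad_size) * pad_size  # integer ceiling division

def pad_exponential(size):
    """Round size up to the next power of 2 (assumed: round up)."""
    if size <= 1:
        return 1
    return 1 << (size - 1).bit_length()

# Example with sizes in bytes; 1024 is an arbitrary padding size.
for size in (3000, 4096, 5000):
    print(size, pad_linear(size, 1024), pad_exponential(size))
```

Both schemes map many distinct object sizes to the same padded size, which reduces the information each object contributes to the traffic pattern. Exponential padding collapses more sizes together at the cost of more overhead.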
16
Padding Random Objects
17
Two-chunk Pipelining Approximately 36% of the target pages are 0-identifiable. Very close to the theoretical limit of 1/e (assuming traffic patterns are random) Implication: Can harness the total entropy in the Web page traffic patterns.
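For reference, 1/e is about 0.368, so the observed 36% is indeed close. The slide does not state the model behind the limit. One standard setting that yields 1/e, assumed here purely for illustration, is n items thrown uniformly at random into n bins, where the probability that a given bin stays empty is (1 - 1/n)^n. The value n = 2200 mirrors the size of the target set.

```python
import math

n = 2200                           # size of the target set (from the slides)
limit = math.exp(-1)               # 1/e
finite_n = (1 - 1 / n) ** n        # assumed balls-in-bins model, finite n
observed = 0.36                    # fraction reported on the slide

print(f"1/e = {limit:.4f}, (1 - 1/n)^n = {finite_n:.4f}, observed = {observed:.2f}")
```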
18
One-chunk Pipelining
19
Conclusion
Encrypted Web browsing can be identified by the target page’s “unique” traffic pattern.
21
Linear Padding
22
Exponential Padding
23
Pad Random Objects