Practical Censorship Evasion Leveraging Content Delivery Networks Presented by Nicole Hippolite
Motivation
- Internet censorship
  - Limits access to websites
- Censorship circumvention
  - Tor: onion routing
  - Psiphon: VPN, SSH, and HTTP proxies
  - VPNs: give access to private networks as if you were connected directly
- All of the above circumvention systems use proxies and can be detected and blocked by censors
- CDNBrowsing: makes use of CDN platforms, whose customers share a set of IP addresses
  - Can still be detected and blocked by censors
  - Website fingerprinting attacks on CDNBrowsing allow censors to block circumvention attempts
Background: Content Delivery Network
Background: CDNBrowsing
- CDN: content delivery network
- CDNBrowsing
  - IP filtering is of no use to censors
  - Can only browse content hosted on CDNs
  - No proxies, just edge servers
  - Location of edge servers
- CDNBrowsing advantages
  - Better QoS
  - Lower cost of operation
  - Better sustainability
  - Doesn't rely on third-party entities to run proxies
  - Ease of deployment
Background: Circumvention detection methods
- IP filtering (doesn't usually work on CDNBrowsing)
- DNS interference (mitigated by using an arbitrary edge server to serve the web page)
- Deep packet inspection (DPI)
  - Keyword/URL filtering
  - Generally stopped by HTTPS encryption, but leakage is covered on the next slide
Problem
- Content publishers need to delegate TLS certificates to edge servers
- Destination leakage in CDN HTTPS deployments leads to detection
- Shared TLS certificates
  - CDN domain certificate: the CDN provider obtains a certificate for its wildcard domain; a CDN customer just needs to use a subdomain to publish content
  - SAN (Subject Alternative Name) certificate: uses an X.509 extension that allows multiple domain names to be included on a single certificate
- Individual TLS certificates
  - SNI (Server Name Indication): an extension to the TLS protocol that allows a web server hosting multiple HTTPS domain names to return an individual TLS certificate to a client requesting one of these domains
  - Dedicated IP addresses: the CDN assigns specific IP addresses to each customer, allowing it to serve individual certificates for that customer's content
- Leakage
  - TLS certificates returned by an edge server may reveal the customer's domain name, which allows censors to use DPI
  - The SNI field carries the domain name in plain text, so censorship of forbidden domain names works
  - Dedicated IPs allow censors to identify forbidden CDNBrowsing connections based on the mapping between IP addresses and forbidden customers (IP address filtering)
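To make the certificate leakage concrete, here is a minimal sketch of the check a censor could run on a certificate seen in a TLS handshake. It assumes the certificate has already been parsed into the dictionary shape returned by Python's `ssl.SSLSocket.getpeercert()`; the function names and the example domains are hypothetical and are not taken from the paper.

```python
def cert_names(cert: dict) -> set:
    """Collect DNS names from a dict shaped like ssl.SSLSocket.getpeercert()."""
    names = {value.lower()
             for kind, value in cert.get("subjectAltName", ())
             if kind == "DNS"}
    # The subject is a tuple of RDNs, each a tuple of (key, value) pairs.
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                names.add(value.lower())
    return names


def leaks_domain(cert: dict, forbidden: str) -> bool:
    """True if any name on the certificate points at the forbidden domain."""
    forbidden = forbidden.lower()
    for name in cert_names(cert):
        if name.startswith("*."):
            name = name[2:]  # "*.forbidden.example" still names the domain
        if name == forbidden or name.endswith("." + forbidden):
            return True
    return False


# A CDN wildcard certificate says nothing about the customer being visited.
shared_cert = {
    "subject": ((("commonName", "*.cdn-provider.example"),),),
    "subjectAltName": (("DNS", "*.cdn-provider.example"),),
}
# A SAN certificate lists every customer domain it covers.
san_cert = {
    "subjectAltName": (("DNS", "allowed.example"),
                       ("DNS", "www.forbidden.example")),
}
```

With these inputs, `leaks_domain(shared_cert, "forbidden.example")` is false and `leaks_domain(san_cert, "forbidden.example")` is true. This matches the slide's point: a certificate that names the customer is enough for DPI-based blocking.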
Problem
- Domain-based website fingerprinting
  - Webpages consist of CSS, JavaScript objects, and advertisements
  - The browser makes multiple HTTP requests to load a webpage
  - The fingerprint is based on the number of packets the browser exchanges with various domains
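The fingerprinting idea can be sketched as a per-domain packet count compared against known page profiles. This is an illustrative nearest-match classifier and not the attack evaluated in the paper; the L1 distance, the function names, and the example domains are assumptions.

```python
from collections import Counter


def fingerprint(packets):
    """Sum packet counts per domain from (domain, n_packets) observations."""
    fp = Counter()
    for domain, n_packets in packets:
        fp[domain] += n_packets
    return fp


def distance(a: Counter, b: Counter) -> int:
    """L1 distance between two fingerprints (missing domains count as 0)."""
    return sum(abs(a[d] - b[d]) for d in set(a) | set(b))


def classify(observed: Counter, known: dict) -> str:
    """Return the known site whose fingerprint is closest to the observation."""
    return min(known, key=lambda site: distance(observed, known[site]))


# Hypothetical profiles a censor has collected in advance.
known_sites = {
    "news.example": Counter({"news.example": 40, "ads.example": 12}),
    "blog.example": Counter({"blog.example": 15, "fonts.example": 5}),
}
observed = fingerprint([("news.example", 20),
                        ("ads.example", 10),
                        ("news.example", 18)])
guess = classify(observed, known_sites)  # closest profile is "news.example"
```

Because the match depends only on which domains are contacted and how much traffic each receives, encrypting the content does not prevent it.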
Solution: CDNReaper
- New CDNBrowsing system that protects against discovery
- Rserver: local proxy
- ProxyServer: MITM interception
  - Locally generated, privately stored trusted certificate
- Resolver: replaces the domain name with an IP address
- Scrambler: drops or adds traffic
- Bootstrapper: obtains CDNBrowsing info
- Local Database: caches how to deal with each connection
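One way the Resolver, Bootstrapper, and Local Database might fit together is sketched below. The slide only names the components, so the interaction shown here (bootstrap once per domain, cache the result, hand the edge IP to the Resolver) is an assumption, as are the `Plan` record, the class and function names, and the documentation-range IP address.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Plan:
    """Hypothetical record of how to reach one domain."""
    edge_ip: Optional[str]  # None means no usable edge server is known
    note: str


class LocalDatabase:
    """Caches one Plan per domain so the Bootstrapper runs once per domain."""

    def __init__(self, bootstrap: Callable[[str], Plan]):
        self._bootstrap = bootstrap
        self._cache: Dict[str, Plan] = {}

    def plan_for(self, domain: str) -> Plan:
        if domain not in self._cache:
            self._cache[domain] = self._bootstrap(domain)
        return self._cache[domain]


def resolve(db: LocalDatabase, domain: str) -> Optional[str]:
    """Resolver step: return the edge IP to dial in place of the domain name."""
    return db.plan_for(domain).edge_ip


# Stand-in Bootstrapper that records how often it is called.
calls = []

def fake_bootstrap(domain: str) -> Plan:
    calls.append(domain)
    if domain == "forbidden.example":
        return Plan(edge_ip="192.0.2.10", note="shared edge server")
    return Plan(edge_ip=None, note="unavailable")


db = LocalDatabase(fake_bootstrap)
first = resolve(db, "forbidden.example")
second = resolve(db, "forbidden.example")  # served from the cache
```

In this sketch a domain with no usable edge server is cached as unavailable instead of being looked up again on every request.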
Solution
CDNReaper applies one or more of the following techniques, depending on the hosting CDN:
- If the CDN's shared edge servers accept HTTPS requests for arbitrary customer websites, request the forbidden content from an arbitrary edge server. The edge server will respond with a CDN wildcard domain certificate
- If the dedicated edge servers accept HTTPS requests for other customer websites, contact the dedicated IP address of a non-forbidden domain to request content for the forbidden domain
- If the edge servers allow connections with empty SNI fields, remove the SNI entry from forbidden HTTPS connections
- If the edge servers allow non-matching SNI entries, replace a forbidden connection's SNI with a non-forbidden domain name
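The four conditions above amount to a capability check per CDN. The sketch below shows that selection logic only; the `EdgeBehavior` fields and the technique labels are hypothetical names chosen for illustration, not identifiers from CDNReaper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EdgeBehavior:
    """Observed behavior of a CDN's edge servers (field names are assumed)."""
    shared_accepts_any_customer: bool
    dedicated_accepts_other_customers: bool
    allows_empty_sni: bool
    allows_mismatched_sni: bool


def applicable_techniques(cdn: EdgeBehavior) -> List[str]:
    """List every technique from the slide that this CDN's behavior permits."""
    techniques = []
    if cdn.shared_accepts_any_customer:
        techniques.append("request via arbitrary shared edge server")
    if cdn.dedicated_accepts_other_customers:
        techniques.append("request via dedicated IP of a non-forbidden domain")
    if cdn.allows_empty_sni:
        techniques.append("remove SNI")
    if cdn.allows_mismatched_sni:
        techniques.append("replace SNI with a non-forbidden domain")
    return techniques


example_cdn = EdgeBehavior(
    shared_accepts_any_customer=True,
    dedicated_accepts_other_customers=False,
    allows_empty_sni=False,
    allows_mismatched_sni=True,
)
chosen = applicable_techniques(example_cdn)
```

For `example_cdn`, two techniques apply: the arbitrary shared edge server and SNI replacement. A CDN that permits none of them yields an empty list, meaning none of the four techniques can be used for sites hosted there.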
Solution
- Removing HTTPS leakage
  - Not one size fits all
  - SNI leakage: CDNReaper replaces forbidden domain names with non-forbidden names
  - Deterministic IP addresses: an HTTPS website can be accessed through multiple edge servers, so CDNReaper picks the edge server IP address
  - TLS leakage: the mitigations above also mitigate TLS certificate leakage
- Defeating traffic analysis
  - Scrambler modifies traffic by injecting decoy requests or removing redundant traffic from other domains
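A minimal sketch of the Scrambler idea follows: drop requests to domains judged redundant and insert decoy requests at random positions. The request representation, the function name, and the example domains are assumptions; the sketch does not model how CDNReaper actually chooses decoys.

```python
import random
from typing import Iterable, List, Optional, Set, Tuple

Request = Tuple[str, str]  # (domain, path)


def scramble(requests: Iterable[Request],
             droppable_domains: Set[str],
             decoys: Iterable[Request],
             seed: Optional[int] = None) -> List[Request]:
    """Drop requests to droppable domains, then insert decoys at random spots."""
    rng = random.Random(seed)
    out = [r for r in requests if r[0] not in droppable_domains]
    for decoy in decoys:
        # randint is inclusive on both ends, so a decoy may land at the end.
        out.insert(rng.randint(0, len(out)), decoy)
    return out


page_load = [
    ("news.example", "/"),
    ("ads.example", "/banner.js"),
    ("news.example", "/style.css"),
    ("stats.example", "/ping"),
]
scrambled = scramble(
    page_load,
    droppable_domains={"ads.example", "stats.example"},
    decoys=[("decoy.example", "/a")],
    seed=1,
)
```

The real requests keep their relative order, so the page still loads, while the set of contacted domains and the per-domain packet counts no longer match the page's usual fingerprint.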
Solution
- Extend CDNBrowsing
- Classifying Internet sites
  - 6% Class 1: full CDN, protected HTTPS
  - 7% Class 2: full CDN, leaking HTTPS
  - 15% Class 3: full CDN, HTTPS only
  - 64% Class 4: partial CDN
  - Class 5: private CDN
  - 7% Class 6: non-CDN
- Supporting partial-CDN webpages
  - Content wrappers
  - Dynamic mirroring of dynamic non-CDN content
  - Designed "MirrorMySite"
  - User creates a Heroku account and enters the URL to be mirrored
Solution
- Supporting private CDNs
  - Use sibling content publishers that share a private CDN: the non-censored publishers are used to access the forbidden content publishers
Criticism
- Overall, very good results for censorship circumvention
- Bootstrapper for CDNReaper
  - ISSUE: About 6% of webpages still have no content hosted on a shared CDN, or are hosted on a private CDN. Such a webpage is stored in the Local Database as unavailable, as is any page for which there is insufficient information.
  - IMPROVEMENT: Perhaps once such a webpage has been requested a certain number of times, analytics could be sent to its publisher to show how many users it is losing due to the lack of CDN hosting or information
- Dropping traffic
  - ISSUE: Advertisement and analytics requests have little impact on user experience, so they could be dropped to combat domain-based website fingerprinting. Could this cause websites to reduce their CDN hosting?
  - IMPROVEMENT: The client can modify the list of advertisements and analytics that cannot be dropped, so this list should be updated regularly, with a lower limit on the number of advertisements that must stay on the website
- Available for Chrome and Firefox
  - Future work would be to implement CDNReaper as a plugin for other browsers
- Comments on the article
  - Limitation of the proposed idea and how to overcome it
  - Limitation should be my own, not one that's already listed in the article
Thank you
Questions?