Side-Channel Leaks in Web Applications: a Reality Today, a Challenge Tomorrow By Kalyan Manda Chang Seok Bae
INTRODUCTION Network SaaS -> Software as a Service
Side-channel information leaks To protect the information in critical applications against network sniffing, a common practice is to encrypt their network traffic. However, as discovered in our research, serious information leaks are still a reality. Even though the communications generated during an application's state transitions are protected by HTTPS, their observable attributes, such as packet sizes and timings, can still give away information about the user’s selection.
Contributions – Analysis of the side-channel weakness in web applications: we present a model to analyze the weakness and attribute the problem to prominent design features of these applications. – Concrete vulnerabilities in several high-profile, popular web applications, which disclose different types of sensitive information through various application features. These studies lead to the conclusion that side-channel information leaks are likely to be fundamental to web applications. – In-depth study of the challenges in mitigating the threat: we evaluated the effectiveness and the overhead of common mitigation techniques such as packet padding. Our research shows that effective solutions to the side-channel problem have to be application-specific, relying on an in-depth understanding of the application being protected. This suggests that the current practice for developing web applications needs significant improvement.
Related Work In the context of encrypted communications, it has been shown that side-channel information, such as packet timing and sizes, allows a network eavesdropper to: – break cryptographic systems or infer keystrokes in SSH – infer spoken phrases in VoIP – identify movie titles in video-streaming systems – fingerprint web pages by their side-channel characteristics, then eavesdrop on the victim user’s encrypted traffic to identify which web pages the user visits.
MOTIVATION Our work is motivated by these anonymity studies, but is different in a number of major aspects: (1) our study focuses on web applications and the sensitive user data leaked out from them, rather than the identifiability of individual web pages; (2) application state-transitions and semantics are the focal point of our analyses, while the prior studies are agnostic to them; (3) our target audience is the developers of sensitive web applications, while the natural audience of the web-anonymity research is the providers of anonymity channels, as their objective is directly confronted by the anonymity issue studied in the prior research.
FUNDAMENTALS OF WEB APPLICATION INFORMATION LEAKS Ambiguity Set Reduction Process: If the ambiguity set (U) can be reduced to 1/ℜ of its original size, we say that log₂ ℜ bits of entropy of the data are lost.
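The definition above can be checked with a few lines of arithmetic. The following is a minimal sketch; the function name and the example set sizes are illustrative assumptions, not values from the slides.

```python
import math


def entropy_lost_bits(original_size: int, reduced_size: int) -> float:
    """Bits of entropy lost when the ambiguity set shrinks.

    The reduction factor is original_size / reduced_size, and the
    entropy lost is its base-2 logarithm.
    """
    if reduced_size <= 0 or reduced_size > original_size:
        raise ValueError("reduced_size must be in 1..original_size")
    reduction_factor = original_size / reduced_size
    return math.log2(reduction_factor)


# Hypothetical example: 1024 candidate inputs narrowed down to 4.
print(entropy_lost_bits(1024, 4))  # 8.0
```

A reduction factor of 256 corresponds to 8 bits lost; no reduction at all corresponds to 0 bits.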
Model Abstraction A web application can be modeled as a quintuple (S, ∑, ∂, f, V): S -> the set of program states that describe application data on both the browser and the server. ∑ -> the set of inputs the application accepts. ∂: S x ∑ -> S -> models a state transition from one state to another, driven by the input the former receives. f: S x ∑ -> V -> maps a state and its input to the observable attributes, such as packet sizes and number of packets, that characterize them. V -> the set of web flow vectors that describe the observable characteristics of the encrypted traffic.
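To make the quintuple concrete, here is a toy sketch of the model. The state names, inputs, and packet sizes are all hypothetical; only the roles of S, ∑, ∂, f, and V follow the slide.

```python
# Toy instance of the (S, ∑, ∂, f, V) model. Every name and size is hypothetical.
INPUTS = ("cardiology", "dermatology", "orthopedics")  # ∑

# f: S x ∑ -> V. Each web flow vector is (request bytes, response bytes).
FLOW_VECTORS = {
    ("search_form", "cardiology"): (310, 1420),
    ("search_form", "dermatology"): (310, 980),
    ("search_form", "orthopedics"): (310, 1420),
}


def delta(state: str, user_input: str) -> str:
    """∂: S x ∑ -> S, the state reached after the input is processed."""
    return f"results_{user_input}"


def f(state: str, user_input: str) -> tuple:
    """f: S x ∑ -> V, the traffic attributes an eavesdropper can observe."""
    return FLOW_VECTORS[(state, user_input)]


def candidate_inputs(state: str, observed_vector: tuple) -> set:
    """Inputs consistent with the vector seen on the encrypted channel."""
    return {a for a in INPUTS if f(state, a) == observed_vector}


print(candidate_inputs("search_form", (310, 980)))
print(sorted(candidate_inputs("search_form", (310, 1420))))
```

In this toy case one observed vector pins down a single input, while the other still leaves two candidates, which is exactly the ambiguity set the attacker tries to shrink.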
Inference of Sensitive Points The attacker is able to identify one of the k/(αβ) input sets, which the actual input belongs to.
Threat Analysis over Web Application Properties Some prominent features of application design making this threat realistic are: Low entropy inputs for better interactions. Stateful Communications. Significant traffic distinctions.
Significant traffic distinctions
Actual Information Leaks in High Profile Applications - OnlineHealth 1) Add Health Records:
Input by mouse selecting – a caveat of hierarchical organization of user choices
Find a Doctor Selection from the drop-down list gives a very-low-entropy input: there are only 94 specialties. We tested all the specialties in “south bend, IN”, and found that x was within [596, 1660], i.e., density = 0.089, and every specialty is uniquely identifiable.
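The slide quotes a density without defining it. The sketch below assumes density means the number of candidate items divided by the width of the observed size range; that formula and the function name are assumptions, not something the slide states.

```python
def density(item_count: int, min_size: int, max_size: int) -> float:
    """Assumed definition: candidates per unit of observable size range."""
    if max_size <= min_size:
        raise ValueError("max_size must exceed min_size")
    return item_count / (max_size - min_size)


# Figures from the slide: 94 specialties, sizes within [596, 1660].
d = density(94, 596, 1660)
print(f"{d:.4f}")
```

Under this assumed formula the value comes out near 0.088, slightly below the 0.089 quoted, so the original computation may differ in detail. Either way, a density well below 1 means there is far more room in the size range than there are items, which is why each specialty can end up with its own distinct size.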
Remaining Topics Remaining leak cases in applications – OnlineTax – OnlineInvest – Online Search Mitigating side-channel threats – Evaluation of mitigation efforts – Impacts on application development process
OnlineTax Leaks Tax-preparation wizard – calculate Adjusted Gross Income Inference – High entropy – Facts User’s eligibilities for credits/deductions, which can be identified from tax laws Such eligibilities affect state transitions
OnlineTax: Symmetric Path Based on the value of b, transition can be determined to one of three possible states According to tax law, taxpayer’s AGI falls into one of three ranges
OnlineTax: Asymmetric Path If the user is partially or fully qualified, the path of state transitions will be longer than that for those not eligible
OnlineInvest Leak Graphical visualization of data Mutual fund list – Low entropy – Sizes of chart and detail pages are distinct (type d=0.044, detailed page d=0.010)
OnlineInvest: Fund Allocation Higher entropy – State transitions over a multiple-day period can give significant reduction power
OnlineInvest: Fund Allocation Algorithm – Function to calculate today’s percentage, given yesterday’s allocation The lossless compression algorithm in GIF gives the same compressed data Experimental result – Given initial ambiguity set contains – Pie chart size reduces by more than one order of magnitude (4 different days)
Query Word Leaks Web flow vector – Auto-suggestions are easy to identify Ambiguity set size – The number of possible query words is huge, but the number of guesses is only the number of queries Caching effect – A character missing due to caching leaves only 2-letter combinations, 26x26 – In some cases of Google, caching disappears
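The gap between the size of the ambiguity set and the number of guesses can be illustrated with a toy simulation. This sketch assumes that each keystroke triggers one observable auto-suggestion response, so the attacker tests 26 letters per keystroke instead of all 26^n words. The `suggestion_size` function is a hypothetical stand-in for real response sizes, chosen so that sizes never collide.

```python
import string


def suggestion_size(prefix: str) -> int:
    """Hypothetical response size for the suggestion list of a prefix."""
    return 500 + sum((i + 1) * (ord(c) - 96) for i, c in enumerate(prefix))


def recover_query(observed_sizes: list) -> tuple:
    """Rebuild the query letter by letter from the observed response sizes."""
    prefix, guesses = "", 0
    for size in observed_sizes:
        match = None
        for ch in string.ascii_lowercase:
            guesses += 1
            if suggestion_size(prefix + ch) == size:
                match = ch
        if match is None:
            raise ValueError("no letter matches the observed size")
        prefix += match
    return prefix, guesses


word = "flu"
observed = [suggestion_size(word[: i + 1]) for i in range(len(word))]
recovered, guesses = recover_query(observed)
print(recovered, guesses, 26 ** len(word))  # flu 78 17576
```

For a three-letter query the attacker makes 78 comparisons against 17576 possible words. If caching hides one response, two letters must be guessed together, which is the 26x26 case mentioned above.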
Universal Mitigation Policies See if the problem can be solved without analyzing individual applications Application-agnostic manner: Padding – Rounding: – Random padding: – Average overhead: – Given, reduction power being calculated after padding
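The two padding policies named above can be sketched as follows. The slide does not give their exact definitions, so this assumes rounding means padding each packet up to the next multiple of a block size, and random padding means adding a uniformly random number of bytes below that block size. The function names and the block size of 128 are assumptions.

```python
import random


def round_up(size: int, block: int) -> int:
    """Rounding: pad the size up to the nearest multiple of block."""
    return -(-size // block) * block


def random_pad(size: int, block: int, rng: random.Random) -> int:
    """Random padding: add between 0 and block - 1 extra bytes."""
    return size + rng.randrange(block)


def overhead(original: list, padded: list) -> float:
    """Extra bytes sent, as a fraction of the original traffic."""
    return (sum(padded) - sum(original)) / sum(original)


sizes = [596, 700, 1660]  # hypothetical response sizes in bytes
rounded = [round_up(s, 128) for s in sizes]
rng = random.Random(0)  # seeded only to make the sketch repeatable
randomized = [random_pad(s, 128, rng) for s in sizes]

print(rounded, f"{overhead(sizes, rounded):.3f}")
print(randomized, f"{overhead(sizes, randomized):.3f}")
```

Rounding makes sizes that fall in the same block indistinguishable, while sizes in different blocks remain distinct. That is one reason an application-agnostic block size may reduce the attacker's power without removing it.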
Mitigation: OnlineHealth Leaks Rounding Mitigation incurs a large overhead, one third of the bandwidth Still, cannot fully subdue the information leak (mouse-selection) Leveraged information can achieve bigger reduction power (probabilistic correlations among conditions)
Mitigation: OnlineTax Leaks Rounding Superfluous packets, 21074% overhead Merge multiple states to be indistinguishable Remaining 7 identifiable ranges, due to asymmetric path
Mitigation: Search Engine Leaks Hard to mitigate – Huge-volume search traffic makes the cost of mitigating the entire traffic unsustainably high – Application scenarios Proxy vs. non-proxy network – Proxy requires traffic to be decompressed Further considerations – Random padding – When to pad (before or after the compression) – Where to pad (header or body)
Mitigation: OnlineInvest Leaks Two leak problems cannot be addressed without padding Random padding comparison Each generating random numbers, in case 7 above, degrade to Enforcement difficulty Price history charts are fetched from public website
Application Development Practice Identifying Vulnerabilities – Stateful communication with user input or stored data – Information flow and background knowledge Specifying Mitigation Policies – Necessary collaborative effort Vendors of browsers and web servers Communicate to protocol layer about policies
Policy specification and enforcement Preliminary Infrastructure
Future Development Process
Conclusion Web application – Low entropy input, stateful communication, and significant traffic distinction Mitigation policies have to be application-specific – Identification of vulnerabilities – Specify mitigation policy Application semantics, information flow, network traffic pattern, and public domain knowledge