PRIVACY-PRESERVING COLLABORATIVE NETWORK ANOMALY DETECTION
Haakon Ringberg

Unwanted network traffic
- Problem
  - Attacks on resources (e.g., DDoS, malware)
  - Lost productivity (e.g., instant messaging)
  - Costs USD billions every year
- Goal: detect & diagnose unwanted traffic
  - Scale to large networks by analyzing summarized data
  - Greater accuracy via collaboration
  - Protect privacy using cryptography

Challenges with detection
- Data volume
  - Some commonly used algorithms analyze IP packet payload info
  - Infeasible at the edge of large networks

Challenges with detection
- Data volume
- Attacks deliberately mimic normal traffic
  - e.g., SQL-injection, application-level DoS 1
(Figure: an anomaly detector at the network edge, unsure whether to let the suspicious host "Beasty" in)
1 [Srivatsa TWEB ’08], 2 [Jung WWW ’02]

Challenges with detection
- Data volume
- Attacks deliberately mimic normal traffic
  - e.g., SQL-injection, application-level DoS 1
- Is it a DDoS attack or a flash crowd? 2
  - A single network in isolation may not be able to distinguish
1 [Srivatsa TWEB ’08], 2 [Jung WWW ’02]

Collaborative anomaly detection
- “Bad guys tend to be around when bad stuff happens”
(Figure: CNN.com and FOX.com are each individually unsure about the suspicious host "Beasty")

Collaborative anomaly detection
- “Bad guys tend to be around when bad stuff happens”
- Targets (victims) could correlate attacks/attackers 1
- “Fool us once, shame on you. Fool us, we can’t get fooled again!” 2
1 [Katti IMC ’05], [Allman HotNets ’06], [Kannan SRUTI ’06], [Moore INFOCOM ’03]
2 George W. Bush

Corporations demand privacy
- Corporations are reluctant to share sensitive data
  - Legal constraints
  - Competitive reasons
(Figure: CNN.com: “I don’t want FOX to know my customers”)

Common practice
- Every network for themselves!
(Figure: AT&T and Sprint each detect anomalies in isolation)

System architecture
- Snort-like system at each network: greater scalability; can be provided as a service
- Collaboration infrastructure: for greater accuracy; protects privacy
- N.B. collaboration could also be performed between stub networks
(Figure: AT&T and Sprint networks joined by the collaboration infrastructure)

Dissertation Overview

  Chapter                         Providing                                Technologies               Venue
  Collaboration infrastructure    Privacy of participants and suspects     Cryptography               Submitted, ACM CCS ’09
  Detection at a single network   Scalable Snort-like IDS system           Machine Learning           Presented, IEEE INFOCOM ’09
  Collaboration effectiveness     Quantifying benefits of collaboration    Analysis of measurements   To be submitted

Chapter I: Scalable signature-based detection at individual networks
Work with AT&T Labs: Nick Duffield, Patrick Haffner, Balachander Krishnamurthy

Background: packet & rule IDSes
- Intrusion Detection Systems (IDSes)
  - Protect the edge of a network
  - Leverage known signatures of traffic
  - e.g., Slammer worm packets contain “MS-SQL” (say) in the payload
  - or AOL IM packets use specific TCP ports and application headers
(Figure: a packet — IP header, TCP header, app header, payload — inspected at the enterprise edge)

Background: packet and rule IDSes
- A predicate is a boolean function on a packet feature, e.g., TCP port = 80
- A signature (or rule) is a set of predicates
- Benefits
  - Leverage existing community
  - Many rules already exist (CERT, SANS Institute, etc.)
  - Classification “for free”
  - Accurate (?)
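As a minimal illustration of this predicate/rule model (a sketch, not the Snort rule language; the rule and port shown are only loosely modeled on the Slammer example used later):

    # Sketch: a predicate is a boolean function on a packet feature, and a rule
    # fires only when all of its predicates hold. Not the Snort rule language.
    from dataclasses import dataclass

    @dataclass
    class Packet:
        dst_port: int
        payload: bytes = b""

    def dst_port_is(port):
        return lambda pkt: pkt.dst_port == port          # header predicate

    def payload_contains(pattern):
        return lambda pkt: pattern in pkt.payload        # DPI predicate

    class Rule:
        def __init__(self, name, predicates):
            self.name, self.predicates = name, predicates

        def matches(self, pkt):
            return all(p(pkt) for p in self.predicates)

    # Slammer-like toy rule: MS-SQL resolution port and a payload pattern
    slammer = Rule("MS-SQL Slammer", [dst_port_is(1434), payload_contains(b"Slammer")])
    print(slammer.matches(Packet(1434, b"...Slammer...")))   # True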

Background: packet and rule IDSes
- Drawbacks
  - Too many packets per second
  - Packet inspection at the edge requires deployment at many interfaces

Background: packet and rule IDSes
- Drawbacks
  - Too many packets per second
  - Packet inspection at the edge requires deployment at many interfaces
  - DPI (deep-packet inspection) predicates can be computationally expensive
    - Example: packet has port number X, Y, or Z; contains pattern “foo” within the first 20 bytes; contains pattern “bar” within the last 40 bytes

Our idea: IDS on IP flows
How well can signature-based IDSes be mimicked on IP flows?
- Efficient
  - Only fixed-offset predicates
  - Flows are more compact
- Flow collection infrastructure is ubiquitous
- IP flows capture the concept of a connection
(Table: a flow record — src IP, dst IP, src port, dst port, duration, # packets — e.g., A → B, 5 min, 36 packets)

Idea
1. IDSes associate a “label” with every packet
2. An IP flow is associated with a set of packets
3. Our system associates the labels with flows
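A toy sketch of this three-step idea, transferring packet-level labels onto flow records keyed by the 5-tuple (field names are illustrative, not the NetFlow schema):

    # Sketch: collect the IDS labels of a flow's packets onto the flow itself,
    # keyed by the usual 5-tuple. Field names here are illustrative only.
    from collections import defaultdict

    def flow_key(pkt):
        return (pkt["src_ip"], pkt["dst_ip"], pkt["src_port"], pkt["dst_port"], pkt["proto"])

    def label_flows(packets, ids_label):
        """ids_label(pkt) returns a rule name when the IDS alarms on the packet, else None."""
        labels = defaultdict(set)
        for pkt in packets:
            rule = ids_label(pkt)
            if rule is not None:
                labels[flow_key(pkt)].add(rule)
        return labels          # 5-tuple -> set of rule labels seen on that flow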

Snort rule taxonomy
- Header-only: inspect only the IP flow header (e.g., port numbers)
- Meta-information: inexact correspondence (e.g., TCP flags)
- Payload-dependent: inspect packet payload (e.g., “contains abc”); relies on features that cannot be exactly reproduced in the IP flow realm

Simple translation
- Our system associates the labels with flows
- Simple rule translation would capture only flow predicates
  - Low accuracy or low applicability
- Example (Slammer worm)
  - Snort rule: dst port = MS-SQL and payload contains “Slammer”
  - Only flow predicates: dst port = MS-SQL

Machine Learning (ML)
- Our system associates the labels with flows
- Leverage ML to learn a mapping from “IP flow space” to label
  - e.g., IP flow space = src port × # packets × flow duration
  - Label for a flow: whether the Snort rule raised an alarm on its packets or not
(Figure: labeled flows plotted over src port and # packets)

Boosting
- Boosting combines a set of weak learners to create a strong learner
- Weak learners h_1, h_2, h_3, … are combined into H_final(x) = sign(Σ_t α_t · h_t(x))
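An illustrative stand-in for this step, using scikit-learn's AdaBoost with decision stumps as the weak learners (the flow features and tiny training set are synthetic, not the dissertation's data):

    # Boosted decision stumps over flow features; synthetic data for illustration.
    # The final classifier is H_final(x) = sign(sum_t alpha_t * h_t(x)).
    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier

    # Flow features: src_port, # packets, duration (seconds)
    X = np.array([[1434, 1, 0.01], [80, 12, 3.2], [1434, 1, 0.02], [443, 40, 10.5]])
    y = np.array([1, -1, 1, -1])                # +1 = Snort rule fired on the flow's packets

    clf = AdaBoostClassifier(n_estimators=50)   # default weak learner is a depth-1 stump
    clf.fit(X, y)
    print(clf.predict([[1434, 1, 0.015]]))      # expected: [1]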

Benefit of Machine Learning (ML)
- ML algorithms discover new predicates to capture a rule
  - Latent correlations between predicates
  - Capturing the same subspace using different dimensions
- Example (Slammer worm)
  - Snort rule: dst port = MS-SQL and payload contains “Slammer”
  - Only flow predicates: dst port = MS-SQL
  - ML-generated rule: dst port = MS-SQL and packet size = 404 and a flow-duration predicate

Evaluation
- Border router on an OC-3 link
- Used the Snort rules in place
- Unsampled NetFlow v5 and packet traces
- Statistics
  - One month, 2 MB/s average, 1 billion flows
  - 400k Snort alarms

Accuracy metrics
- Receiver Operating Characteristic (ROC)
  - Full FP vs TP tradeoff
  - But we need a single number
- Area Under the Curve (AUC)
- Average Precision (AP)
  - An AP of p corresponds to (1 − p) / p FP per TP
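For concreteness, both summary numbers can be computed directly from classifier scores, as in this sketch with synthetic labels and scores:

    # Sketch: AUC and Average Precision from classifier scores (synthetic data).
    from sklearn.metrics import roc_auc_score, average_precision_score

    y_true  = [1, 0, 1, 1, 0, 0, 1, 0]                     # 1 = flow Snort would alarm on
    y_score = [0.9, 0.2, 0.8, 0.65, 0.3, 0.45, 0.7, 0.1]   # classifier confidence

    print("AUC:", roc_auc_score(y_true, y_score))
    print("AP :", average_precision_score(y_true, y_score))
    # An average precision of p corresponds to roughly (1 - p) / p FP per TP.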

Classifier accuracy
- Training on week 1, testing on week n
- Minimal drift within a month
- High degree of accuracy for header and meta-information rules
(Table: accuracy per rule class — header rules, meta-information, payload — when training on week 1 and testing on weeks 2, 3, and 4; callouts mark operating points of 5 FP per 100 TP and 43 FP per 100 TP)

Variance within the payload group
- Accuracy is a function of the correlation between flow- and packet-level features

  Rule                      Average Precision
  MS-SQL version overflow   1.00
  ICMP PING speedera        0.82
  NON-RFC HTTP DELIM        0.48

Computational efficiency
Our prototype can support OC-48 (2.5 Gbps) speeds:
1. Machine learning (boosting): 33 hours per rule for one week of OC-48 data
2. Classification of flows: 57k flows/sec on a 1.5 GHz Itanium 2 — line-rate classification for OC-48

Chapter II: Evaluating the effectiveness of collaborative anomaly detection
Work with: Matthew Caesar, Jennifer Rexford, Augustin Soule

Methodology
1. Identify attacks in IP flow traces
2. Extract attackers
3. Correlate attackers across victims

Identifying anomalous events
- Use existing anomaly detectors 1
  - IP scans, port scans, DoS
  - e.g., an IP scan is more than n IP addresses contacted
- Minimize false positives
  - Correlate with a DNS blacklist (DNS BL)
  - IP addresses exhibiting open-proxy or spambot behavior
1 [Allan IMC ’07], [Kompella IMC ’04]
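A sketch of the kind of threshold detector described above (the threshold n and flow field names are illustrative):

    # Sketch: flag a source as an IP scanner when it contacts more than n distinct
    # destination addresses; then intersect with a DNS blacklist to cut false positives.
    from collections import defaultdict

    def ip_scanners(flows, n=100):
        dests = defaultdict(set)
        for f in flows:
            dests[f["src_ip"]].add(f["dst_ip"])
        return {src for src, d in dests.items() if len(d) > n}

    def confirmed_attackers(flows, dnsbl, n=100):
        return ip_scanners(flows, n) & dnsbl        # dnsbl: set of blacklisted IPs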

Cooperative blocking
- A set ‘S’ of victims agree to participate
- Beasty is blocked following an initial attack
- Subsequent attacks by Beasty on members of ‘S’ are deemed ineffective

DHCP lease issues
- Dynamic address allocation
  - IP address first owned by Beasty
  - Then owned by innocent Tweety
- Should not block Tweety’s innocuous queries

DHCP lease issues
- Dynamic address allocation
  - IP address first owned by Beasty
  - Then owned by innocent Tweety
- Should not block Tweety’s innocuous queries
- Update the DNS BL hourly
- Block IP addresses for a period shorter than most DHCP leases 1
1 [Xie SIGCOMM ’07]

Methodology  IP flow traces from Géant  DNS BL to limit FP  Cooperative blocking of attackers for Δ hours  Metric is fraction of potentially mitigated flows 35

Blacklist duration parameter Δ
- Collaboration between all hosts
- The majority of the benefit can be had with a small Δ

Number of participating victims
- Randomly selecting n victims to collaborate in the scheme
- Reported numbers are the average of 10 random selections

Number of participating victims
- Collaboration between the most victimized hosts
- Attackers are more likely to continue to engage in bad action “x” than a random other action

Chapter conclusion
- Repeat attacks often occur within one hour
  - Substantially less than the average DHCP lease
- Collaboration can be effective
  - Attackers contact a large number of victims
  - 10k random hosts could mitigate 50%
- Some hosts are much more likely victims
  - Subsets of victims can see great improvement

Chapter III: Privacy-preserving collaborative anomaly detection
Work with: Benny Applebaum, Matthew Caesar, Michael J. Freedman, Jennifer Rexford

Privacy-Preserving Collaboration
- Protect the privacy of
  - Participants: do not reveal who suspected whom
  - Suspects: only reveal suspects upon correlation
(Figure: CNN, FOX, and Google each send encrypted suspect lists, E(·), to a secure correlation service)

System sketch
- Trusted third party is a point of failure
  - Single rogue employee
  - Inadvertent data leakage
  - Risk of subpoena
(Figure: CNN, FOX, Google, and MSFT all report to one secure-correlation service)

System sketch
- Trusted third party is a point of failure
  - Single rogue employee
  - Inadvertent data leakage
  - Risk of subpoena
- Fully distributed is impractical
  - Poor scalability
  - Liveness issues
(Figure: CNN, FOX, Google, MSFT)

Split trust
- Proxy and DB managed by separate organizational entities
- Honest-but-curious proxy, DB, and participants (clients)
- Secure as long as the proxy and DB do not collude
Recall: participant privacy; suspect privacy
(Figure: CNN and FOX send to the proxy, which forwards to the DB)

Protocol outline
1. Clients send suspect IP addresses (x), e.g., x =
2. DB releases IPs above a threshold
But this violates suspect privacy!
(Figure: the client sends x to the proxy, which forwards it to the DB; the DB tallies counts per x)
Recall: participant privacy; suspect privacy

Protocol outline
1. Clients send suspect IP addresses (x), transmitted as the hash H(x)
2. DB releases IPs above a threshold
Still violates suspect privacy! (the IP address space is small enough that an unkeyed hash can be inverted by brute force)
(Figure: the DB tallies counts per H(x))
Recall: participant privacy; suspect privacy

Protocol outline
1. Clients send suspect IP addresses (x)
2. IP addresses blinded with F_s(x), the keyed hash of the IP address
   - F_s is a keyed hash function (PRF)
   - Key s held only by the proxy
3. DB releases IPs above a threshold
Still violates suspect privacy!
(Figure: the DB tallies counts per F_s(x), e.g., F_s(1) → 23, F_s(3) → 2)
Recall: participant privacy; suspect privacy

Protocol outline
1. Clients send suspect IP addresses (x)
2. IP addresses blinded with E_DB(F_s(x)), the encrypted keyed hash of the IP address
   - F_s is a keyed hash function (PRF)
   - Key s held only by the proxy
3. DB releases IPs above a threshold
But how do clients learn E_DB(F_s(x))?
(Figure: the DB tallies counts per F_s(x), e.g., F_s(1) → 23, F_s(3) → 2)
Recall: participant privacy; suspect privacy

Protocol outline
1. Clients send suspect IP addresses (x)
2. IP addresses blinded with E_DB(F_s(x))
   - F_s is a keyed hash function (PRF)
   - Key s held only by the proxy
3. E_DB(F_s(x)) learned through secure function evaluation
4. DB releases IPs above a threshold
   - Possible to reveal the IP addresses at the end
(Figure: the client, holding x, runs secure function evaluation with the proxy, holding s, to obtain E_DB(F_s(x)); the DB tallies counts per F_s(x))
Recall: participant privacy; suspect privacy

Protocol summary
- Clients send suspect IPs
  - Each client learns F_s(x) using secure function evaluation
- Proxy forwards to the DB
  - Randomly shuffles suspects
  - Re-randomizes encryptions
- DB correlates using F_s(x)
- DB forwards bad IPs to the proxy, which unblinds them (e.g., D_s(F_s(3)) = 3)
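A toy sketch of the DB-side correlation step only (the blinding, shuffling, and re-randomization performed by clients and proxy are omitted; the threshold is illustrative):

    # Sketch of the DB side only: count distinct participants per blinded key
    # F_s(x) and release keys reported by at least `threshold` participants.
    from collections import defaultdict

    def correlate(reports, threshold=3):
        """reports: iterable of (participant_id, blinded_key) pairs."""
        reporters = defaultdict(set)
        for participant, key in reports:
            reporters[key].add(participant)
        # The proxy later unblinds the released keys to recover actual IP addresses.
        return [k for k, who in reporters.items() if len(who) >= threshold]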

Architecture  Proxy split into client-facing and decryption oracles  Proxies and DB are fully parallelizable Clients Client-Facing Proxies Proxy Decryption Oracles Front-End DB Tier Back-End DB Storage 51

Evaluation  All components implemented  ~5000 lines of C++  Utilizing GnuPG, BSD TCP sockets, and Pthreads  Evaluated on custom test bed  ~2 GHz (single, dual, quad-core) Linux machines 52 AlgorithmParameterValue RSA / ElGamalkey size1024 bits Oblivious Transferk80 AESkey size256

Scalability w.r.t. # IPs
- Single CPU core for the DB and proxy each

Scalability w.r.t. # clients
- Four CPU cores for the DB and proxy each

Scalability w.r.t. # CPU cores
- n CPU cores for the DB and proxy each

Summary  Collaboration protocol protects privacy of  Participants: do not reveal who suspected whom  Suspects: only reveal suspects upon agreement  Novel composition of crypto primitives  One-way function hides IPs from DB; public key encryption allows subsequent revelation; secure function evaluation  Efficient implementation of architecture  Millions of IPs in hours  Scales linearly with computing resources 56

Conclusion
1. Speed: ML-based architecture supports accurate and scalable Snort-like classification on IP flows
2. Accuracy: collaborating against mutual adversaries
3. Privacy: a novel cryptographic protocol supports efficient collaboration in a privacy-preserving manner

Future Work Highlights
1. ML-based Snort-like architecture
   - Cross-site: train on site A and test on site B
   - Performance on sampled flow records
2. Measurement study
   - Biased correlation results due to a biased DNSBL (ongoing)
   - Rate at which information must be exchanged
   - Who should cooperate: end-points or ISPs?
3. Privacy-preserving collaboration
   - Other applications, e.g., Viacom-vs-YouTube concerns

THANK YOU! Collaborators: Jennifer Rexford, Benny Applebaum, Matthew Caesar, Nick Duffield, Michael J Freedman, Patrick Haffner, Balachander Krishnamurthy, and Augustin Soule

Difference in rule accuracy
- Accuracy is a function of the correlation between flow- and packet-level features

  Rule                      Overall accuracy
  MS-SQL version overflow   1.00
  ICMP PING speedera        0.82
  NON-RFC HTTP DELIM        0.48
(The table also includes columns for accuracy without the dst port feature and without the mean packet size feature)

Choosing an operating point
- X = alarms we want raised; Z = alarms that are raised; Y = alarms in both X and Z
- Precision = Y / Z (exactness)
- Recall = Y / X (completeness)
- AP is a single number, but not the most intuitive
- Precision & recall are useful for operators
  - “I need to detect 99% of these alarms!”
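A worked instance of these definitions, with X and Z as sets of alarm identifiers:

    # Worked example: precision and recall from the sets defined above.
    X = {"a1", "a2", "a3", "a4"}    # alarms we want raised
    Z = {"a2", "a3", "a5"}          # alarms that are raised
    Y = X & Z                       # alarms both wanted and raised

    precision = len(Y) / len(Z)     # exactness    -> 2/3
    recall    = len(Y) / len(X)     # completeness -> 2/4
    print(precision, recall)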

Choosing an operating point
- AP is a single number, but not the most intuitive
- Precision & recall are useful for operators
  - “I need to detect 99% of these alarms!”

  Rule                       Precision w/ recall = 1.00   Precision w/ recall = 0.99
  MS-SQL version overflow    1.00
  ICMP PING speedera
  CHAT AIM receive message

Quantifying the benefit of collaboration
Effectiveness of collaboration is a function of
1. Whether different victims see the same attackers
2. Whether all victims are equally likely to be targeted

IP address blinding
- DB requires an injective and one-way function on IPs
  - Cannot use a simple hash
- F_s(x) is a keyed hash function (PRF) on IPs
  - Key s held only by the proxy
(Figure: the client sends E_DB(F_s(x)) toward the DB)

Secure Function Evaluation
- IP address blinding can be split into a per-IP-bit (x_i) problem
- Client must learn E_DB(F_s(x_i))
  - Client must not learn s
  - Proxy must not learn x_i
- Oblivious Transfer (OT) accomplishes this 1,2
- Amortized OT makes the asymptotic performance equal to matrix multiplication 3
1 [Naor et al. SODA ’01], 2 [Freedman et al. TCC ’05], 3 [Ishai et al. CRYPTO ’03]
(Figure: the client, holding x, and the proxy, holding s, jointly compute E_DB(F_s(x)))

Public key encryption
- Clients encrypt suspect IPs (x)
  - First with the proxy’s pubkey, then with the DB’s pubkey: E_DB(E_PX(x))
- Forwarded by the proxy
  - Does not learn the IPs
- Decrypted by the DB (recovering E_PX(x))
  - Does not learn the IPs
  - Does not allow for DB correlation, due to padding (e.g., OAEP)

How the client learns F_s(x)
- Client must learn F_s(x)
  - Client must not learn ‘s’
  - Proxy must not learn ‘x’
- Naor-Reingold PRF
  - s = { s_i | 1 ≤ i ≤ 32 }
  - F_s(x) = g^(∏_{x_i = 1} s_i)
- Add randomness u_i to obscure s_i from the client (message = u_i · s_i)

How the client learns F_s(x)
- For each bit x_i of the IP, the client learns
  - u_i · s_i, if x_i is 1
  - u_i, if x_i is 0
- The client also learns ∏ u_i
- Example: for x = (x_0 = 0, x_1 = 1, …, x_31 = 1) and key s = (s_0, s_1, …, s_31), the client receives u_0, u_1·s_1, …, u_31·s_31

How the client learns F_s(x)
- The client multiplies together all received values:
  ∏_{x_i=1} (u_i · s_i) · ∏_{x_i=0} u_i = (∏ u_i) · ∏_{x_i=1} s_i
- Divides out ∏ u_i:
  (∏ u_i) · ∏_{x_i=1} s_i / (∏ u_i) = ∏_{x_i=1} s_i
- Acquires F_s(x) = g^(∏_{x_i=1} s_i) without having learned ‘s’
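A numeric sketch of this blinding arithmetic with toy parameters (p = 23, q = 11, g = 2, 8-bit "addresses"); in the real protocol the masked per-bit values come from oblivious transfer, the client is told only ∏ u_i, and the group parameters are of cryptographic size:

    # Toy sketch of the masked Naor-Reingold evaluation described above.
    # Toy group (p = 23, q = 11, g = 2) and 8-bit "addresses" for brevity.
    import random

    p, q, g = 23, 11, 2
    bits = 8
    s = [random.randrange(1, q) for _ in range(bits)]        # proxy's secret key s_i

    def prf(x):        # reference value: F_s(x) = g^(product of s_i over bits set to 1)
        e = 1
        for xi, si in zip(x, s):
            if xi:
                e = (e * si) % q
        return pow(g, e, p)

    def blinded_eval(x):
        # Proxy side: mask each s_i with a fresh u_i; the client receives u_i*s_i
        # or u_i per bit (via OT in the real protocol) plus the product of all u_i.
        u = [random.randrange(1, q) for _ in range(bits)]
        received = [(u[i] * s[i]) % q if x[i] else u[i] for i in range(bits)]
        prod_u = 1
        for ui in u:
            prod_u = (prod_u * ui) % q
        # Client side: multiply everything and divide out prod(u_i) mod q.
        prod = 1
        for r in received:
            prod = (prod * r) % q
        exponent = (prod * pow(prod_u, -1, q)) % q            # = product of s_i where x_i = 1
        return pow(g, exponent, p)

    x = [0, 1, 1, 0, 1, 0, 0, 1]
    assert blinded_eval(x) == prf(x)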

How the client learns F_s(x)
- The client multiplies together all values, divides out ∏ u_i, and acquires F_s(x) without having learned ‘s’
- But how does the client learn u_i · s_i (if x_i is 1) or u_i (if x_i is 0) without the proxy learning the IP x?

Oblivious Transfer (details)
1. Client sends f(x=0) and f(x=1)
   - Proxy doesn’t learn x
2. Proxy sends
   - v(0) = E_{g(f(0))}(1 + r)
   - v(1) = E_{g(f(1))}(s + r)
3. Client decrypts v(x) with g(f(x))
   - Can calculate g(f(x))
   - Cannot calculate g(f(1−x))
(Public: f(·) and g(·); the client holds x and obtains g(f(x)); the proxy holds s)

Oblivious Transfer (more details)
Preprocessing:
- Proxy chooses random c and r (at startup) and publishes c and g^r
- Client chooses a random k (for each bit)
Client:
1. Key_x = g^k, Key_{1−x} = c · g^{−k}; the client sends Key_0
2. Key_x^r = (g^r)^k, used to decrypt y_x
Proxy:
1. Computes Key_0^r from Key_0, and Key_1^r = c^r / Key_0^r
2. Sends y_0 = AES_{Key_0^r}(u) and y_1 = AES_{Key_1^r}(s · u)

Oblivious Transfer (more details)
- The proxy never learns x
- The client can calculate Key_x^r = (g^r)^k easily, but cannot calculate c^r (due to lack of r), which is needed for Key_{1−x}^r = c^r · (g^r)^{−k}
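A toy sketch of this 1-out-of-2 OT, using the same toy group as the earlier sketch and a hash-derived one-time pad standing in for AES; illustrative only, not the dissertation's implementation:

    # Toy sketch of the 1-out-of-2 OT above: the client learns exactly one of two
    # messages, and the proxy never learns the choice bit x. Toy parameters only.
    import hashlib, random

    p, q, g = 23, 11, 2                    # toy group: g has order q modulo p

    def mask(key_elem, msg):               # hash-based one-time pad standing in for AES
        pad = int.from_bytes(hashlib.sha256(str(key_elem).encode()).digest()[:2], "big")
        return msg ^ pad

    # Setup (proxy): random c in the group and random exponent r; publish c and g^r.
    c = pow(g, random.randrange(1, q), p)
    r = random.randrange(1, q)
    g_r = pow(g, r, p)

    def client_choose(x):                  # x is the choice bit
        k = random.randrange(1, q)
        key_x = pow(g, k, p)               # client knows the discrete log of Key_x only
        key_0 = key_x if x == 0 else (c * pow(key_x, -1, p)) % p   # Key_{1-x} = c * g^{-k}
        return k, key_0

    def proxy_respond(key_0, m0, m1):      # m0 = u, m1 = s*u in the protocol above
        key_0_r = pow(key_0, r, p)
        key_1_r = (pow(c, r, p) * pow(key_0_r, -1, p)) % p          # Key_1^r = c^r / Key_0^r
        return mask(key_0_r, m0), mask(key_1_r, m1)

    def client_recover(x, k, y0, y1):
        key_x_r = pow(g_r, k, p)           # (g^r)^k; the other key needs c^r, i.e. r
        return mask(key_x_r, (y0, y1)[x])

    x = 1
    k, key_0 = client_choose(x)
    y0, y1 = proxy_respond(key_0, m0=7, m1=9)
    assert client_recover(x, k, y0, y1) == (7, 9)[x]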

Other usage scenarios
1. Cross-checking certificates
   - e.g., Perspectives 1
   - Clients = end users
   - Keys = hash of certificates received
2. Distributed ranking
   - e.g., Alexa Toolbar 2
   - Clients = Web users
   - Keys = hash of web pages
1 [Wendlandt USENIX ’08], 2 [