1/40 Quantifying and Preventing Privacy Threats in Wireless Link Layer Protocols Thesis Proposal Jeffrey Pang.


1 1/40 Quantifying and Preventing Privacy Threats in Wireless Link Layer Protocols Thesis Proposal Jeffrey Pang

2 2/40 They leave “digital fingerprints” that reveal who we are –And thus where we’ve been and what we’ve been doing “Bob@Intel” Why is Bob over there? Motivation: The Mobile Wireless Landscape

3 3/40 Motivation: The Mobile Wireless Landscape → Location privacy is a growing concern: Wireless Privacy Protection Act [U.S. House Bill ’05]; Geographic location/privacy working group [IETF]

4 4/40 Motivation: The Mobile Wireless Landscape A well-known technical problem –Devices have unique and consistent addresses –e.g., 802.11 devices have MAC addresses → fingerprinting them is trivial! tcpdump over time: MAC address now: 00:0E:35:CE:1F:59; MAC address later: 00:0E:35:CE:1F:59

5 5/40 Motivation: The Mobile Wireless Landscape The widely proposed technical solution –Pseudonyms: Change addresses over time 802.11: Gruteser ’05, Hu ’06, Jiang ’07 Bluetooth: Stajano ’05 RFID: Juels ‘04 GSM: already employed MAC address now: 00:0E:35:CE:1F:59 MAC address later: 00:AA:BB:CC:DD:EE tcpdump ? time
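
A pseudonym in this sense can be as simple as a freshly drawn random MAC address. The sketch below is an illustration only and is not taken from the cited proposals: the function name is hypothetical, and setting the locally administered bit is an assumption about how a device would avoid clashing with vendor-assigned addresses.

```python
import secrets


def random_pseudonym_mac() -> str:
    """Draw a random 48-bit address to use as a temporary pseudonym."""
    octets = bytearray(secrets.token_bytes(6))
    # Assumption: mark the address as locally administered (bit 0x02 set)
    # and unicast (bit 0x01 cleared) in the first octet.
    octets[0] = (octets[0] | 0x02) & 0xFE
    return ":".join(f"{b:02X}" for b in octets)


if __name__ == "__main__":
    print("MAC address now:  ", random_pseudonym_mac())
    print("MAC address later:", random_pseudonym_mac())
```

Changing the address this way hides only the explicit identifier; everything else the device transmits is unchanged.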

6 6/40 Motivation: The Mobile Wireless Landscape Thesis: Pseudonyms are not enough –Implicit identifiers: identifying characteristics of traffic –Can reveal identity, history, associations, and more –New techniques are needed to hide implicit identifiers. tcpdump over time: 00:0E:35:CE:1F:59 sends Probe: “Bob’s Home Network” and Packets → Intel Email Server; later, 00:AA:BB:CC:DD:EE sends Probe: “Bob’s Home Network” and Packets → Intel Email Server

7 7/40 Thesis Outline Completed Work Proposed Work Implicit identifiers in 802.11 –What causes implicit identifiers? –How accurate are implicit identifiers? Private service discovery –How can a client find a service privately? Framework for masking side-channels –How can we prevent packet sizes and timing from revealing private information? [Mobicom 2007] [HotNets 2007 Submitted to Mobisys 2008]

8 8/40 Completed Work Proposed Work Thesis Outline Implicit identifiers in 802.11 –What causes implicit identifiers? –How accurate are implicit identifiers? Private service discovery –How can a client find a service privately? Framework for masking side-channels –How can we prevent packet sizes and timing from revealing private information?

9 9/40 Implicit Identifier Summary (802.11 networks: Public, Home, Enterprise)
–Network destinations: e.g. email, vpn server, browser homepage
–SSIDs in probes: e.g. home, work 802.11 network names
–Broadcast pkt sizes: e.g. NetBIOS, mDNS queries (e.g., iTunes)
–MAC header fields: e.g. supported rates, auth algs.
Example (tcpdump, same values observed at two different times): email packet: 169.45.73.45; 802.11 SSID: djw; mDNS packet sizes: 245, 239; 802.11 rates: 11, 2, 1Mbps

10 10/40 Implicit Identifier Summary (802.11 networks: Public, Home, Enterprise) Network destinations; SSIDs in probes; Broadcast pkt sizes; MAC header fields. Visible even with WEP/WPA TKIP encryption. More implicit identifiers exist → Results we present establish a lower bound. Identifying even if devices have identical drivers

11 11/40 Tracking 802.11 Users Many potential tracking applications: –Was user X here today? –Where was user X today? –When was user X here? –Etc.

12 12/40 Tracking 802.11 Users Tracking scenario: –Every user changes pseudonyms every hour –Adversary monitors some locations → One hourly traffic sample from each user in each location (e.g., traffic at 2-3PM, 3-4PM, 4-5PM). Build a profile from training samples: First collect some traffic known to be from user X and from random others. Then classify observation samples: is each hourly sample from user X?

13 13/40 Sample Classification Algorithm Core question: –Did traffic sample s come from user X? A simple approach: naïve Bayes classifier –Derive probabilistic model from training samples –Given s with features F, answer “yes” if: Pr[ s from user X | s has features F ] > T for a selected threshold T. –F = feature set derived from implicit identifiers
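
The decision rule on this slide can be sketched as follows. This is a minimal illustration, assuming each feature contributes one likelihood under "from user X" and one under "from someone else", that features are treated as independent (the "naïve" part), and that likelihoods are smoothed so none is zero. The function names and the prior are hypothetical, not from the slides.

```python
import math


def posterior_from_user(lik_x, lik_other, prior_x):
    """Pr[s from X | F] for per-feature likelihoods under each hypothesis."""
    # Work in log space so that many small likelihoods do not underflow.
    log_x = math.log(prior_x) + sum(math.log(p) for p in lik_x)
    log_o = math.log(1.0 - prior_x) + sum(math.log(p) for p in lik_other)
    # Normalise the two joint probabilities into a posterior.
    m = max(log_x, log_o)
    ex, eo = math.exp(log_x - m), math.exp(log_o - m)
    return ex / (ex + eo)


def is_from_user(lik_x, lik_other, prior_x, threshold):
    """Answer "yes" only if the posterior exceeds the chosen threshold T."""
    return posterior_from_user(lik_x, lik_other, prior_x) > threshold


if __name__ == "__main__":
    # Two features that are each more likely under "user X" than "other".
    print(posterior_from_user([0.8, 0.5], [0.1, 0.2], prior_x=0.1))
    print(is_from_user([0.8, 0.5], [0.1, 0.2], prior_x=0.1, threshold=0.5))
```

Raising T trades true positives for fewer false positives, which is the knob the evaluation fixes.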

14 14/40 Deriving features F from implicit identifiers Set similarity (Jaccard Index), weighted by frequency: IR_Guest SAMPLE FROM OBSERVATION Sample Classification Algorithm linksys djw SIGCOMM_1 PROFILE FROM TRAINING Rare Common w(e) = low w(e) = high
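
A frequency-weighted Jaccard index between a training profile and an observed sample might look like the sketch below. It assumes that rare names receive high weight and common names low weight, and it uses w(e) = 1 - (fraction of users who probe for e) purely for illustration; the slides do not give the exact weighting, and the frequencies in the example are made up.

```python
def weighted_jaccard(profile, sample, weights):
    """Sum of weights in the intersection divided by the sum in the union."""
    union = profile | sample
    if not union:
        return 0.0
    shared = profile & sample
    total = sum(weights.get(e, 1.0) for e in union)
    return sum(weights.get(e, 1.0) for e in shared) / total


if __name__ == "__main__":
    # Hypothetical fraction of users that probe for each network name.
    frequency = {"linksys": 0.5, "djw": 0.01, "IR_Guest": 0.05}
    # Assumed weighting: rare names count more than common ones.
    weights = {name: 1.0 - f for name, f in frequency.items()}

    profile = {"linksys", "djw"}   # built from training samples
    sample = {"djw", "IR_Guest"}   # seen in one observation sample
    print(weighted_jaccard(profile, sample, weights))
```

With all weights equal, the function reduces to the ordinary Jaccard index.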

15 15/40 Evaluating Classification Effectiveness Simulate tracking scenario –SIGCOMM, UCSD wireless traces Question: Is observation sample s from user X? Evaluation metrics: –True positive rate (TPR) Fraction of user X’s samples classified correctly –False positive rate (FPR) Fraction of other samples classified incorrectly Fix T for FPR = 0.01 Measure TPR
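
The "fix T for FPR = 0.01, measure TPR" procedure can be sketched as below. It assumes the classifier emits one score per sample and answers "yes" when the score is strictly greater than T. The function names and the toy scores are hypothetical.

```python
import math


def threshold_for_fpr(negative_scores, target_fpr):
    """Lowest threshold whose false positive rate stays within target_fpr."""
    ranked = sorted(negative_scores, reverse=True)
    allowed = math.floor(target_fpr * len(ranked))  # negatives allowed above T
    if allowed >= len(ranked):
        return float("-inf")
    # At most `allowed` negatives are strictly greater than this score.
    return ranked[allowed]


def rate_above(scores, threshold):
    """Fraction of scores classified as "from user X"."""
    return sum(s > threshold for s in scores) / len(scores)


def tpr_at_fixed_fpr(positive_scores, negative_scores, target_fpr=0.01):
    t = threshold_for_fpr(negative_scores, target_fpr)
    return t, rate_above(positive_scores, t)


if __name__ == "__main__":
    others = [i / 100 for i in range(100)]   # scores of other users' samples
    user_x = [0.5, 0.985, 0.99, 0.999]       # scores of user X's samples
    t, tpr = tpr_at_fixed_fpr(user_x, others)
    print("T =", t, "FPR =", rate_above(others, t), "TPR =", tpr)
```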

16 16/40 Results: Individual Feature Accuracy TPR  60% TPR  30% Individual implicit identifiers give evidence of identity 1.0

17 17/40 Results: Multiple Feature Accuracy Users with TPR >50%: Public: 63%, Home: 31%, Enterprise: 27%. We can identify many users in all environments. [Figure: results for Public, Home, and Enterprise networks; features: netdests, ssids, bcast, fields]

18 18/40 What Causes Implicit Identifiers? Not all link layer protocol information is encrypted –Straw man: encrypt everything? –Challenge: efficient message processing (e.g., for service discovery) Encryption does not prevent traffic analysis –Straw man: uniform cover traffic? –Challenge: shared medium  large performance hit Implementation and configuration variation –Straw man: standardize? –Challenge: implementation flexibility

19 19/40 Completed Work Proposed Work Thesis Outline Implicit identifiers in 802.11 –What causes implicit identifiers? –How accurate are implicit identifiers? Private service discovery –How can a client find a service privately? Framework for masking side-channels –How can we prevent packet sizes and timing from revealing private information?

20 20/40 Local Service Discovery Used to find: 802.11 networks consumer electronics local services other applications

21 21/40 How Discovery is Done Today Services send announcements and/or clients send probes (typically via unencrypted broadcast) Important properties: Plug-and-play networking –Can proceed automatically without user input Disconnected operation –Requires only communication medium between client and service. Example messages: iTunes “Bob” is here; Is iTunes “Bob” here?; “Bob” is here; Connect to “Bob”

22 22/40 Discovery is Not Authenticated Authentication occurs only after discovery –Problem: Anyone can elicit a response, even if they are not trusted to access the service. Messages shown: Is “Alice’s Network” here?; “Alice’s Network” is here; Associate; Prove I am “Alice” (e.g., credential); Prove I am “Alice’s Network”; Association succeeded; Association denied

23 23/40 Solution Requirements Constraints: –Clients and services may want to hide from third parties –Plug-and-play networking, disconnected operation. Example messages: Is iTunes “Bob” here?; “Bob” is here; Connect to “Bob”; iTunes “Bob” is here

24 24/40 Solution Requirements Goal: Procedure F such that A sends to B: msg_private = F (A, B, msg_orig) Desired security properties –Unlinkability: only A and B can link msg_privates sent at different times to same sender and receiver –Authenticity: B can verify A created msg_private –Confidentiality: only A and B can determine msg_orig –Integrity: B can verify msg_orig not modified. Example: Is iTunes “Bob” here? is sent as F (Alice, Bob, Is iTunes “Bob” here?); “Bob” is here is sent as F (Bob, Alice, “Bob” is here)

25 25/40 Public Key Protocol (based on [Abadi ’04]) Client sends Probe “Alice” to Service. Client: Sign: timestamp with K^-1_Bob; key-private encryption (e.g., ElGamal) with K_Alice. Service: Try to decrypt with K^-1_Alice; Check signature with K_Bob

26 26/40 Problems Must obtain public keys for new services/clients –May be disconnected during discovery –Don’t want to involve extra user action Must try to decrypt every message –Public key decryption is slow (>100ms on typical AP) → delays processing of other messages → susceptible to low-rate denial-of-service attacks

27 27/40 Observations Must obtain public keys for new services/clients –Existing pairing techniques for initial key exchange –We present two “automated” mechanisms [HotNets ‘07] Must try to decrypt every message –Common case is to rediscover known services –Can negotiate a secret symmetric key the first time

28 28/40 Symmetric Key Protocol Probe “Alice” ClientService Symmetric encryption (e.g., AES w/ random IV) Check MAC: MAC:K’ Shared K Shared K’ Shared timestamp Try to decrypt with each shared key K Shared1 K Shared2 K Shared3 …
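
The receiver-side cost of this protocol comes from trying every shared key in turn. The sketch below illustrates only that trial step, using a MAC check per key. It is a simplification under stated assumptions: HMAC-SHA256 stands in for the MAC, the AES encryption of the payload is omitted (Python's standard library has no AES), so the timestamp travels in the clear here, and all function names and the 30-second freshness window are hypothetical.

```python
import hashlib
import hmac

TAG_LEN = 32  # HMAC-SHA256 output size in bytes


def make_probe(mac_key: bytes, payload: bytes, timestamp: int) -> bytes:
    """Sender: attach a timestamp and a MAC computed with the shared key."""
    body = timestamp.to_bytes(8, "big") + payload
    return body + hmac.new(mac_key, body, hashlib.sha256).digest()


def find_sender(probe: bytes, known_keys: dict, now: int, max_age: int = 30):
    """Receiver: try each shared key until one verifies the MAC."""
    if len(probe) < 8 + TAG_LEN:
        return None
    body, tag = probe[:-TAG_LEN], probe[-TAG_LEN:]
    # Reject stale probes so that a recorded probe cannot be replayed later.
    if abs(now - int.from_bytes(body[:8], "big")) > max_age:
        return None
    for name, mac_key in known_keys.items():  # linear in the number of keys
        expected = hmac.new(mac_key, body, hashlib.sha256).digest()
        if hmac.compare_digest(tag, expected):
            return name
    return None


if __name__ == "__main__":
    keys = {"alice": b"a" * 32, "carol": b"c" * 32}
    probe = make_probe(keys["alice"], b"is the service here?", timestamp=1000)
    print(find_sender(probe, keys, now=1005))
```

Each candidate key costs one symmetric operation, so the work per message still grows with the number of known peers.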

29 29/40 Observations Must obtain public keys for new services/clients –Existing pairing techniques for initial key exchange –We present two “automated” mechanisms [HotNets ‘07] Must try to decrypt every message –Common case is to rediscover known services –Can negotiate a secret symmetric key the first time –Linkability at short timescales is usually OK –Can use temporary unlinkable addresses

30 30/40 Tryst Protocol Probe “Alice” ClientService Symmetric encryption (e.g., AES w/ random IV) Check MAC: MAC:K’ Shared K Shared K’ Shared ATAT ATAT K Shared Lookup A T in a table to get K Shared timestamp Try to decrypt with each shared key K Shared1 K Shared2 K Shared3 … A T-1 A0A0 ATAT AES K’’ Shared (0) A T+1 …… AES K’’ Shared (T-1)AES K’’ Shared (T)AES K’’ Shared (T+1)
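
The table lookup that replaces trial decryption can be sketched as follows. The slide derives the address for interval T by applying AES under a shared key to T. The sketch keeps that structure but substitutes HMAC-SHA256 truncated to 48 bits as the keyed function, because Python's standard library has no AES. The function names, the key values, and the one-interval tolerance window are assumptions for illustration.

```python
import hashlib
import hmac


def address_for_epoch(addr_key: bytes, epoch: int) -> bytes:
    """Temporary address A_T derived from a shared key and the interval T."""
    # Stand-in for AES under the shared key applied to T (assumption).
    digest = hmac.new(addr_key, epoch.to_bytes(8, "big"), hashlib.sha256).digest()
    return digest[:6]  # 48 bits, the size of an 802.11 MAC address


def build_table(addr_keys: dict, epoch: int, window: int = 1) -> dict:
    """Precompute address -> peer for the current and neighbouring intervals."""
    table = {}
    for name, key in addr_keys.items():
        for t in range(max(0, epoch - window), epoch + window + 1):
            table[address_for_epoch(key, t)] = name
    return table


if __name__ == "__main__":
    keys = {"alice": b"a" * 16, "carol": b"c" * 16}
    table = build_table(keys, epoch=100)
    incoming = address_for_epoch(keys["alice"], 100)
    # One dictionary lookup identifies which shared key to use.
    print(table.get(incoming))
```

The receiver rebuilds the table once per interval, so per-message work is a single lookup regardless of how many peers it knows.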

31 31/40 Evaluation Implementation –SlyFi “identifier-free” wireless link layer –Linux kernel driver –Software crypto Evaluation –Deployed on Soekris low-power devices –Measure link setup time link setup

32 32/40 Results: Link Setup Time Tryst has link setup times comparable to WPA [Figure: link setup time for public key, symmetric key, wpa, tryst, wifi-open]

33 33/40 Completed Work Proposed Work Thesis Outline Implicit identifiers in 802.11 –What causes implicit identifiers? –How accurate are implicit identifiers? Private service discovery –How can a client find a service privately? Framework for masking side-channels –How can we prevent packet sizes and timing from revealing private information?

34 34/40 Side Channel Privacy Leaks Packet sizes and timing can reveal sensitive information –Passwords used [Song ’01] –Webpages visited [Sun ’02] –Videos watched [Saponas ’07] –Languages spoken (over VoIP) [Wright ’07] –Identity (e.g., broadcast packet sizes) [Pang ’07] Example: Broadcast packet sizes used as a fingerprint [Figure: two panels of broadcast transmission sizes over time; sizes shown: 300, 250, 200, 100, 500, 120]

35 35/40 Previous Work
[Figure: prior approaches compared by information prevented from leaking (none to all potential), application transparency (code modification, knowledge of traffic patterns, opaque), and overhead]
–Status quo
–Naïve cover traffic
–Trace-based cover traffic [Newman-Wolfe ‘92, Guan ‘01]
–Specific attack countermeasures [Timmerman ’99, Smart ‘00]
–Language-based information flow security [Volpano ’96, Agat ’00, Meyers ‘99]
Proposal: Framework to implement select countermeasures –Enable overhead / privacy protection trade-off –Similar to signature-based anti-virus and IDS

36 36/40 Part I: Rule-based Masking Example: masking packet sizes  time Input transmissions 300 250 200 100 120 time Output transmissions 400  Input transmissions Masking rules: “output size independent of input size” Performance constraints: “minimize delay”
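
One way to satisfy a rule like "output size independent of input size" is to pad, and if necessary fragment, every transmission to a fixed size. The sketch below illustrates that single rule, not the proposed framework. The function names are hypothetical, and the 400-byte output size simply mirrors the value that appears in the slide's example.

```python
def mask_sizes(input_sizes, output_size):
    """Map each input transmission to one or more fixed-size outputs."""
    out = []
    for size in input_sizes:
        # Ceiling division: fragments needed, with at least one per input.
        fragments = max(1, -(-size // output_size))
        out.extend([output_size] * fragments)
    return out


def padding_overhead(input_sizes, output_size):
    """Extra bytes sent, as a fraction of the original bytes."""
    sent = sum(mask_sizes(input_sizes, output_size))
    original = sum(input_sizes)
    return (sent - original) / original


if __name__ == "__main__":
    inputs = [300, 250, 200, 100, 120]
    print(mask_sizes(inputs, 400))
    print(padding_overhead(inputs, 400))
```

Inputs larger than the output size still change the number of fragments, so a size-only rule of this kind would need to be combined with a timing or spacing rule to hide that as well.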

37 37/40 System Overview  definition Masking rules Perf. constraints  output Output traffic profile

38 38/40 Primary Challenges  definition: masking rule language –Must be flexible enough for real countermeasures Describe packet size, inter-packet spacing Describe sequences, frequencies, periodicity Describe multiple time granularities –Must be uniform enough to enable rule composition e.g. break up all packets so they have uniform size  express all rules in terms of inter-packet spacing  output: satisfying multiple masking rules –Must handle infeasible constraints gracefully Allow the rule language to describe some slack e.g. “make output as independent as possible of input”

39 39/40 Part II: Learning Masking Rules APs learn location dependent rule parameters –Traffic profiles become location rather than user dependent –Mimic local traffic patterns to customize overhead Challenges: –How to learn parameters over time –How to minimize performance impact of adversarial clients learner input traffic profiles home masking rules learner input traffic profiles airport masking rules learner input traffic profiles starbucks masking rules

40 40/40 Summary and Thesis Timeline Completed Work Proposed Work Implicit identifiers in 802.11 –What causes implicit identifiers? –How accurate are implicit identifiers? Private service discovery –How can a client find a service privately? Framework for masking side-channels –How can we prevent packet sizes and timing from revealing private information? [Mobicom 2007] [HotNets 2007 Submitted to Mobisys 2008] Timeline (2008, Jan to Dec): Proposed Part I; Proposed Part II; Write Thesis; Slack. Milestones marked: CCS/OSDI, NDSS/S&P

41 41/40 === BACKUP SLIDES: II ===

42 42/40 Implicit Identifier Related Work Other Implicit Identifiers –Device driver fingerprints [Franklin ’06] –Clock-skew fingerprints [Kohno ’05] –Click-prints [Padmanabhan ’06] –RF antenna fingerprints [Hall ’04] Our work: –802.11 fingerprints for individual users –Tracking with only commodity hardware/software –Better coverage than some previous work –Procedure to combine implicit identifiers

43 43/40 Wireless Traces Simulate tracking scenario with wireless traces: –Split each trace into training and observation phases –Simulate pseudonym changes for each user X
Trace                          Duration   Profiled Users   Total Users
SIGCOMM conf. (2004)           4 days     377              465
UCSD office building (2006)    1 day      153              615
Apartment building (2006)      14 days    39               196

44 44/40 Evaluating Classification Effectiveness

45 45/40 Results: Multiple Feature Accuracy Some users much more distinguishable than others. Public networks: ~20% users identified >90% of the time. [Figure: results for Public, Home, and Enterprise networks; features: netdests, ssids, bcast, fields]

46 46/40 One Application Question: Was user X here today? More difficult to answer: –Suppose N users present each hour –Over an 8 hour day, 8N opportunities to misclassify  Decide user X is here only if multiple samples are classified as his Revised: Was user X here today for a few hours?
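
The effect of 8N chances to misclassify can be made concrete with a binomial calculation. The sketch assumes each sample from another user is misclassified independently with probability equal to the false positive rate. N = 25 is a made-up value, and the function name is hypothetical.

```python
from math import comb


def prob_at_least(k, n, p):
    """Probability of k or more false positives among n independent samples."""
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - below


if __name__ == "__main__":
    n_users, hours, fpr = 25, 8, 0.01
    samples = hours * n_users  # the 8N opportunities to misclassify
    for k in (1, 2, 3):
        print(k, prob_at_least(k, samples, fpr))
```

With these numbers, a single false positive somewhere in the day is likely (about 0.87), while requiring several matching samples lowers the chance of wrongly deciding that user X was present. This is the motivation for the revised question.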

47 47/40 Results: Individual Feature Accuracy Individual implicit identifiers give evidence of identity TPR  50% Other implicit identifiers distinguish groups of users

48 48/40 Results: Tracking with 90% Accuracy Majority of users can be identified if active long enough Of 268 users (71%): 75% identified with ≤4 samples 50% identified with ≤3 samples 25% identified with ≤2 samples

49 49/40 Results: Tracking with 90% Accuracy Many users can be identified in all environments

50 50/40 === BACKUP SLIDES: SD ===

51 51/40 SD Related work SmokeScreen [Cox ‘07] – access control for discovering friends –Similar to Tryst protocol –No authentication, address computation more expensive –Uses online social network to exchange secret keys SSDS [Czerwinski ‘00] – secure service discovery architecture –Relies on trusted infrastructure –Not necessarily confidential Broadcast Encryption [Fiat ‘93] – encrypt message to many users –Making this private is an open problem JFK [Aiello ‘93] – efficient Internet key exchange –No service privacy … –… or not resilient to man-in-the-middle attacks

52 52/40 ? New services in trusted domains Trusted Trusts: alice@mac.com “alice@mac.com/ds” “alice@mac.com/laptop” “bob@gmail.com/zune” “bob@gmail.com/psp” “bob@gmail.com/laptop” Anonymous Identity Based Encryption public key=“alice@mac.com/ipod” Key provider preloads devices = private key

53 53/40 Anonymous identity based encryption [Boneh & Franklin ’01]
PublicParams, MasterSecret = Setup () (PublicParams: publicly published; MasterSecret: known only by key provider)
K_AliceiPod = “alice@mac.com/ipod”
K^-1_AliceiPod = Extract(“alice@mac.com/ipod”, MasterSecret)
ciphertext = Encrypt (K_AliceiPod, PublicParams, plaintext)
plaintext = Decrypt (K^-1_AliceiPod, PublicParams, ciphertext)
Some assumptions over traditional public key crypto
–Alice and Bob trust key provider not to reveal secret keys to third parties: Can instead trust that no t of n providers collude (use threshold crypto); May also be able to act as their own key providers (anonymity unproven)
–Revoking my public key implies changing my identity since identity = key: Can instead use temporary identities (“alice@mac.com/ipod.nov.2007”)
–Only need to use protocol until first discovery

54 54/40 New services transitively trusted “Alice’s Home” Trust Transitive Trust Alice trusts bob.laptop Alice’s secret Alice trusts “Alice’s Home” Alice’s secret Find networks that Alice trusts Attestation

55 55/40 SD is widely used Example 1: Application Protocols (OSDI 2006) Example 2: 85% devices send WiFi discovery probes (SIGCOMM 2004)

56 56/40 Problem: SD Reveals History Probes can reveal services you have used Problem: Network names can be correlated with location (e.g., using a wardriving database) http://www.wigle.net Is 802.11 network “djw” here?

57 57/40 Problem: SD Reveals History Probes can reveal services you have used Problem: Network names can be correlated with location (e.g., using a wardriving database) 23% of devices at SIGCOMM 2004 probed for a name that WiGLE isolates to one city All 4 known home networks located to within 2 blocks

58 58/40 Key problem: messages can be linked Consistent naming enables correlation of SD messages Opaque names prevent some problems… but not all: –Example: location can be correlated with other databases Is “Juvenile Detention Classroom” here? Is “010294859” here?

59 59/40 Tryst Packet Format

60 60/40 Background Probing Rates

61 61/40 Address Recomputation Time

62 62/40 === OLD SLIDES ===

63 63/40 Solution Requirements Goal: Procedure F such that A sends to B: msg_private = F (A, B, msg_orig) Desired security properties –Unlinkability: only A and B can link msg_privates sent at different times to same sender and receiver –Authenticity: B can verify A created msg_private –Confidentiality: only A and B can determine msg_orig –Integrity: B can verify msg_orig not modified Primary challenge –Efficient message processing

64 64/40 Summary and Thesis Timeline Implicit identifiers can reveal sensitive information –Analysis shows that they are often sufficient to track users We propose efficient ways to mask the two main types: –Names exchanged during service discovery –Characteristics exposed via packet size/timing side-channels Thesis timeline (2008, Jan to Dec): Proposed Part I; Proposed Part II; Write Thesis; Slack. Milestones marked: CCS/OSDI, NDSS/S&P

