Published by Brenda Black. Modified over 9 years ago.
1
ReCon: Revealing and Controlling PII Leaks in Mobile Network Systems
David Choffnes, Northeastern University
Jingjing Ren, Northeastern University
Ashwin Rao, University of Helsinki
Martina Lindorfer, Vienna Univ. of Technology
Arnaud Legout, INRIA Sophia-Antipolis
DTL Workshop, Nov. 2015
Sponsored by:
2
Motivation
Mobile devices: rich sensors, ubiquitous connectivity
Key questions:
What personal information is transmitted?
To whom does it go?
What can average users do about it?
3
How Frequently Is PII Leaked?
Basic tracking is common.
Very personal information is leaked by a significant fraction of apps, across all platforms.
PII leakage is pervasive!
[Figure: Fraction of top 100 apps leaking PII (tested in September 2015)]
4
How to Detect PII Leaks on Mobile Devices?
At the OS: information flow analysis (static/dynamic/hybrid)
Reasonable solutions, but not perfect or easily deployable
In the network: independent of the OS and app store
Easy to detect if you know what PII to search for
What if you don't know the PII a priori?
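To illustrate why the in-network check is easy when the PII is known in advance, here is a minimal Python sketch of a substring search over a flow. The function name and the choice of encodings (raw and percent-encoded only) are assumptions for illustration; a value that is hashed or otherwise transformed would slip past it, and the approach needs the PII list up front, which is exactly the limitation raised in the last point.

```python
from urllib.parse import quote

def find_known_pii(flow_text: str, known_pii: dict[str, str]) -> set[str]:
    """Return the PII categories whose values appear in flow_text.

    Hypothetical helper: checks each value as-is and percent-encoded,
    since values often appear URL-encoded in query strings.
    """
    found = set()
    lowered = flow_text.lower()
    for category, value in known_pii.items():
        candidates = {value, quote(value, safe="")}
        if any(c.lower() in lowered for c in candidates):
            found.add(category)
    return found

flow = "GET /track?email=alice%40example.com&model=iPhone7 HTTP/1.1"
pii = {"email": "alice@example.com", "zip": "02115"}
print(sorted(find_known_pii(flow, pii)))  # ['email']
```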
5
ReCon: Automatically Identifying PII Leaks
Hypothesis: PII leaks have distinguishing characteristics
Is it just simple key/value pairs (e.g., “user=R3C0N”)? Nope, this leads to high false-positive/false-negative (FP/FN) rates
Need to learn the structure of PII leaks
Approach: build ML classifiers to reliably detect leaks
Does not require knowing PII in advance
Resilient to changes in PII leak formats over time
We built ReCon:
Machine learning to reveal PII leaks from mobile devices
Software middleboxes to intercept and control leaks
Works on all major platforms (iOS, Android, Windows Phone)
6
ReCon: Viewing Detected Leaks
[Screenshot: detected leaks grouped by PII category (Device Identifiers, Contact Information, User Identifiers, Credentials), with user feedback options (Correct, Incorrect, Not sure, Not about me)]
7
Where They Know You’ve Been
Location information is hard to digest using text alone
Where They Know You’ve Been (WTKYB) shows just how pervasive location tracking is
Creepiness factor to help users care more about privacy(?)
8
Mitigating PII Leaks
ReCon gives users control over leaks
Example simple strategies: block PII, modify PII, randomize identifiers, coarsen locations
Advanced mitigation (under development): mock user profiles, provide k-anonymity
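The simple strategies above can be pictured as small rewriting functions applied to a flow before it leaves the proxy. This is a minimal Python sketch under our own assumptions (plain string replacement for block/modify, hex-style identifiers, rounding as the coarsening rule); all function names are hypothetical and this is not ReCon's implementation.

```python
import secrets

HEX = "0123456789abcdef"

def block_pii(flow_text: str, value: str) -> str:
    """Block: strip the PII value from the flow entirely."""
    return flow_text.replace(value, "")

def modify_pii(flow_text: str, value: str, replacement: str) -> str:
    """Modify: substitute a user-chosen value for the real one."""
    return flow_text.replace(value, replacement)

def randomize_identifier(identifier: str) -> str:
    """Randomize: swap each hex digit for a random one, keeping separators and case."""
    out = []
    for c in identifier:
        if c.lower() in HEX:
            d = secrets.choice(HEX)
            out.append(d.upper() if c.isupper() else d)
        else:
            out.append(c)  # keep '-' and other separators so the format still parses
    return "".join(out)

def coarsen_location(lat: float, lon: float, decimals: int = 1) -> tuple[float, float]:
    """Coarsen: round coordinates (1 decimal place of latitude is about 11 km)."""
    return round(lat, decimals), round(lon, decimals)

print(coarsen_location(42.3398, -71.0892))  # (42.3, -71.1)
print(modify_pii("email=alice@example.com", "alice@example.com", "nobody@example.org"))
```

Keeping the identifier's length and separators is a deliberate choice in this sketch: a receiver that validates the format is less likely to reject the rewritten request.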
9
How Does ReCon Work?
Key challenges for ML-based PII detection:
Which classifier do we use?
The C4.5 decision tree is the best trade-off between speed and accuracy
How do we train the classifier?
Use traces from real users and controlled experiments
Break flows into separate words that may indicate a leak
Feature selection for scalability
How well are we doing?
Controlled experiments
In the wild: only the users themselves know for sure!
Crowdsourced reinforcement
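The slide names C4.5 but not the details of the training pipeline (a later backup slide lists Weka, whose C4.5 implementation is J48). As a sketch of the core idea only, here is C4.5's split criterion, gain ratio, computed over binary "word present in flow" features. The toy data and function names are made up; pruning, the average-gain filter, and full tree construction are omitted.

```python
import math
from collections import Counter

def entropy(labels: list[int]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def gain_ratio(flows: list[set[str]], labels: list[int], word: str) -> float:
    """Gain ratio for splitting flows on whether `word` is present."""
    present = [y for f, y in zip(flows, labels) if word in f]
    absent = [y for f, y in zip(flows, labels) if word not in f]
    parts = [p for p in (present, absent) if p]
    n = len(labels)
    # Information gain: entropy before the split minus weighted entropy after it.
    info_gain = entropy(labels) - sum(len(p) / n * entropy(p) for p in parts)
    # Split information penalizes splits that fragment the data.
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in parts)
    return info_gain / split_info if split_info > 0 else 0.0

def best_split_word(flows: list[set[str]], labels: list[int]) -> str:
    """Word with the highest gain ratio (ties broken alphabetically)."""
    vocab = sorted(set().union(*flows))
    return max(vocab, key=lambda w: gain_ratio(flows, labels, w))

# Toy, made-up data: each flow is its set of words; 1 = contains PII.
flows = [{"email", "x"}, {"email", "y"}, {"x", "y"}, {"y"}]
labels = [1, 1, 0, 0]
print(best_split_word(flows, labels))  # email
```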
10
Key Results: ReCon Accuracy
How accurate is ReCon?
99% overall accuracy in controlled experiments
False positive rate (FPR): 2.2%; false negative rate (FNR): 3.5%
Why?
Per-domain classifiers
The decision tree captures non-trivial cases
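Overall accuracy is reported alongside FPR and FNR because most flows are clean, so accuracy alone can look high even when many leaks are missed. A short Python sketch of the standard definitions; the counts are hypothetical and are not ReCon's data.

```python
def error_rates(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard confusion-matrix metrics; 'positive' means the flow contains PII."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "fpr": fp / (fp + tn),  # clean flows wrongly flagged as leaks
        "fnr": fn / (fn + tp),  # real leaks that were missed
    }

# Hypothetical counts: 100 leaking flows among 1,000.
rates = error_rates(tp=90, fp=5, tn=895, fn=10)
print(rates["accuracy"], rates["fnr"])  # 0.985 0.1
```

Here accuracy is 98.5%, yet one leak in ten goes undetected, which only the FNR reveals.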
11
Key Results: ReCon Has Good Coverage
How does it compare to other solutions?
ReCon finds significantly more PII than information flow analysis (IFA) solutions
ReCon successfully identifies previously missed leaks after retraining
[Figure: Fraction of total leaks found]
12
Key Results: User Study
IRB-approved user study: 24 iOS and 13 Android devices
20 of 26 responses: system useful and behavior change
165 cases of credential leaks, 94 verified
Average number of leaks: iOS > Android
Unexpected, suspicious leaks:
Recipe/cooking app tracks location
Video/game/news apps leak gender
And more… Check out http://recon.meddle.mobi
13
Summary
ReCon provides transparency and control over PII leaks
Relies only on access to network traffic (OS independent)
Machine learning to automatically identify PII leaks
Crowdsourced reinforcement with user feedback
Works today! Check out http://recon.meddle.mobi
Sponsor:
Questions? David Choffnes, choffnes@ccs.neu.edu
14
Backups
15
Encryption and ReCon
What is your answer to the increasing use of encryption?
ReCon needs access only to plaintext flows, e.g., via:
mcTLS, BlindBox
Routing to a trusted middlebox that can do MITM: works for most apps, but usually not logins
Haystack (on the Android device)
16
Encryption: What Is Leaked?
Leaks over SSL (not much): apps sending PII to trackers over SSL (out of 100 apps per device): 6 iOS, 2 Android, 1 Windows
Problems with SSL: certificate pinning; not working with VPN enabled
Obfuscation: little evidence in controlled experiments using IFA
17
Other Applications of ReCon
K-anonymity
Explicit sharing: allow users to control how much is shared with third parties
Obfuscation: retrain classifiers to identify obfuscated leaks; use static/dynamic analysis tools that are resilient to evasion techniques
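The slides list k-anonymity without describing how ReCon would provide it. As a reminder of the property itself: a set of records is k-anonymous over some quasi-identifiers if every combination of their values is shared by at least k records. A minimal Python sketch, with hypothetical attribute names and records, shows how each extra attribute leaked to a third party can break it.

```python
from collections import Counter

def is_k_anonymous(records: list[dict[str, str]], quasi_ids: list[str], k: int) -> bool:
    """True if every combination of quasi-identifier values appears in at least k records."""
    groups = Counter(tuple(rec[q] for q in quasi_ids) for rec in records)
    return all(count >= k for count in groups.values())

# Hypothetical records a third party might observe, with coarsened attributes.
records = [
    {"zip3": "021", "gender": "F", "device": "iPhone"},
    {"zip3": "021", "gender": "F", "device": "Android"},
    {"zip3": "021", "gender": "M", "device": "iPhone"},
    {"zip3": "021", "gender": "M", "device": "iPhone"},
]
print(is_k_anonymous(records, ["zip3", "gender"], k=2))            # True
print(is_k_anonymous(records, ["zip3", "gender", "device"], k=2))  # False
```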
18
Deployment Models
ReCon only needs access to network flows
VPN proxy (current deployment): tunnel to a proxy server
Currently supported by all mobile OSes
Can run VMs anywhere in the world
Raspberry Pi in the home network
Enables HTTPS decryption with minimal additional risk
On device: Haystack on Android
In network: Awazza and other APN/middlebox deployment models
19
Methodology Details
Controlled experiments as ground truth
Text classification approaches
Problem: given a network flow, does it contain PII or not?
Feature extraction: bag-of-words model
Example: example.com/someevent?x=1&y=2 {"z":"xx@y"}
Words: someevent, x, 1, y, 2, z, xx@y
Per-domain classifiers (e.g., Google Analytics)
Faster (compared to one-for-all)
More accurate
Library: Weka
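The bag-of-words step and per-domain dispatch described above can be sketched in Python. The delimiter set is our assumption (chosen so the slide's example yields exactly its word list), and the classifiers are hypothetical stand-ins for trained decision trees; `general` plays the role of a one-for-all fallback.

```python
import re
from typing import Callable

# Delimiters: URL and JSON punctuation plus whitespace (an assumption).
_DELIMS = re.compile(r'[/?&=:,{}\[\]"\s]+')

def extract_words(path: str, body: str = "") -> list[str]:
    """Bag-of-words tokens from a flow's path/query and body."""
    return [w for w in _DELIMS.split(path + " " + body) if w]

Classifier = Callable[[set[str]], bool]

def classify_flow(domain: str, path: str, body: str,
                  per_domain: dict[str, Classifier], general: Classifier) -> bool:
    """Use the destination domain's classifier if one exists, else the general one."""
    words = set(extract_words(path, body))
    return per_domain.get(domain, general)(words)

print(extract_words("/someevent?x=1&y=2", '{"z":"xx@y"}'))
# ['someevent', 'x', '1', 'y', '2', 'z', 'xx@y']

# Hypothetical stand-in classifiers.
per_domain = {"example.com": lambda words: any("@" in w for w in words)}

def general(words: set[str]) -> bool:
    return False

print(classify_flow("example.com", "/someevent?x=1&y=2", '{"z":"xx@y"}', per_domain, general))  # True
```

The domain is used to pick a classifier rather than as a word, which keeps each per-domain model small.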
20
Why Run ReCon?
User incentives:
Control over data leaks!
Blocking unwanted content
k-anonymity for increased privacy