Privacy in P2P-based Data Sharing Muhammad Nazmus Sakib CSCE 824 April 17, 2013
Outline Problem Description Background ◦ Privacy ◦ P2P Types of Privacy ◦ Location based ◦ Content based Summary
Problem Description Privacy concerns in P2P networks ◦ User’s ability to control disclosure of personal information Our Goal ◦ Assess the current privacy exposures in existing networks ◦ Discuss the existing solutions to counter them
Privacy The right of individuals to determine for themselves when, how, and to what extent information about them is communicated to others (Alan Westin, Columbia University)
Overview of P2P Distributed application architecture that partitions tasks and workloads among peers; peers are both suppliers and consumers; little or no centralized control Types ◦ Structured: uses a DHT (Distributed Hash Table); example: Kad ◦ Unstructured: peers connect in an ad hoc fashion; examples: Freenet, Gnutella
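To make the "structured" case concrete: Kad is a Kademlia-based DHT, and Kademlia places a key on the peers whose identifiers are closest to it under the XOR metric. The sketch below illustrates only that lookup rule. The 128-bit identifier width, the use of SHA-256 to derive identifiers, and the names `node_id`, `xor_distance`, and `closest_peers` are assumptions made for illustration and are not taken from any client's code.

```python
import hashlib


def node_id(label: str, bits: int = 128) -> int:
    """Derive a fixed-width integer identifier from a label (illustrative only)."""
    digest = hashlib.sha256(label.encode("utf-8")).digest()
    # Keep the top `bits` bits of the 256-bit digest.
    return int.from_bytes(digest, "big") >> (256 - bits)


def xor_distance(a: int, b: int) -> int:
    """Kademlia-style distance between two identifiers."""
    return a ^ b


def closest_peers(key: int, peers: dict, k: int = 2) -> list:
    """Return the names of the k peers whose identifiers are XOR-closest to key."""
    return sorted(peers, key=lambda name: xor_distance(peers[name], key))[:k]
```

For example, `closest_peers(5, {"a": 4, "b": 1, "c": 7})` ranks the peers by `4 ^ 5 = 1`, `1 ^ 5 = 4`, and `7 ^ 5 = 2`, so the key would be stored on `a` and `c`. An unstructured network has no such rule, which is why it has to locate content by flooding or random walks.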
Types of Privacy Location Privacy ◦ Controlling disclosure of IP address, geographic location, identity, etc. Content Privacy ◦ Controlling disclosure of personal data files and user behavior.
Location Privacy The problem ◦ Gnutella, eDonkey ◦ Kazaa ◦ Skype + BitTorrent Solutions ◦ Freenet ◦ OneSwarm ◦ I2P
Location Privacy: Problem Gnutella/eDonkey ◦ The change from protocol V.0.4 to V.0.6 increased privacy vulnerability ◦ Users can be monitored by IP address, DNS name, software version, shared files, and queries
Location Privacy: Problem Kazaa ◦ No support for anonymity Skype + BitTorrent ◦ It is possible to determine the IP address and file-sharing usage of a particular user (Le Blond et al.)
Skype + BitTorrent Finding the IP address ◦ Find the target person’s Skype ID ◦ Inconspicuously call this person ◦ Extract callee’s IP address from packet headers ◦ Skype privacy settings fail to protect against this scheme ◦ Observe mobility of the Skype users
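The "extract from packet headers" step works because every IPv4 packet carries its source and destination addresses in the clear, at fixed offsets in the header (bytes 12 to 19). The sketch below only shows that header layout; it does not capture traffic, and the function name `ipv4_addresses` is hypothetical rather than anything taken from the study.

```python
import socket
import struct


def ipv4_addresses(packet: bytes) -> tuple:
    """Return (source, destination) as dotted-quad strings from a raw IPv4 packet."""
    if len(packet) < 20:
        raise ValueError("packet shorter than the minimum IPv4 header")
    if packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    # Source address occupies bytes 12-15, destination address bytes 16-19.
    src, dst = struct.unpack("!4s4s", packet[12:20])
    return socket.inet_ntoa(src), socket.inet_ntoa(dst)
```

Because the addresses sit below the application layer, an application-level privacy setting cannot hide them once packets flow directly between the two peers.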
Skype + BitTorrent Linking internet usage ◦ Skype tracker uses ten tracking clients to collect the IP addresses of 100,000 users daily ◦ Infohash crawler determines the infohashes (file IDs) of the 50,000 most popular BitTorrent swarms ◦ BitTorrent crawler collects the IP addresses participating in those 50,000 swarms ◦ Verifier attempts to initiate P2P communication with both applications to confirm that the same user is indeed running both of them
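The linking step amounts to a join on IP address between the two crawled data sets. The sketch below is a minimal illustration under assumed data shapes: `skype_ips` maps a Skype ID to the set of addresses seen for it, and `swarm_ips` maps an infohash to the set of addresses seen in that swarm. Both names, and the function `candidate_links`, are hypothetical.

```python
def candidate_links(skype_ips: dict, swarm_ips: dict) -> dict:
    """Map each Skype ID to the infohashes whose swarms contain one of its IPs."""
    links = {}
    for user, ips in skype_ips.items():
        # A swarm is a candidate if it shares at least one address with the user.
        hashes = {h for h, members in swarm_ips.items() if ips & members}
        if hashes:
            links[user] = hashes
    return links
```

A match here is only a candidate: several users may share one public address, for instance behind a NAT, which is why the pipeline above ends with a separate verifier step.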
Location Privacy: Solutions Freenet ◦ Protects anonymity of both producers and consumers ◦ Identical nodes collectively pool their storage space to store data files ◦ Dynamically replicated files are referred to in a location-independent manner ◦ Infeasible to discover the true origin or destination of a passing file
Location Privacy: Solutions Freenet ◦ Weaknesses: the TTL value of packets can be used to gain knowledge about the source of a file; an attacker who surrounds a node with malicious nodes can monitor its incoming and outgoing packets; performance is slower than in traditional P2P networks
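The TTL weakness can be illustrated with a deliberately simplified model. Assume every request starts with a known initial TTL and each forwarding node decrements it by exactly one, so the originator's own message still carries the initial value. Both the model and the names below are assumptions for illustration, not a description of Freenet's actual hops-to-live handling.

```python
def hops_travelled(observed_ttl: int, initial_ttl: int) -> int:
    """Number of forwarding hops implied by the observed TTL (simplified model)."""
    if not 0 <= observed_ttl <= initial_ttl:
        raise ValueError("observed TTL must lie between 0 and the initial TTL")
    return initial_ttl - observed_ttl


def neighbour_is_likely_origin(observed_ttl: int, initial_ttl: int) -> bool:
    """True if the message has not been forwarded yet under the simplified model."""
    return hops_travelled(observed_ttl, initial_ttl) == 0
```

Under this model a node that receives a request with an undiminished TTL can conclude that its neighbour originated it. Randomizing the initial value or the decrement blurs that inference, at the price of less predictable routing.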
Location Privacy: Solutions OneSwarm ◦ Makes a trade-off between performance and anonymity: better performance than Freenet, better privacy than BitTorrent ◦ Users control their own privacy settings ◦ Data is transferred through disposable addresses ◦ Prevents monitoring of user behavior
OneSwarm
OneSwarm Weaknesses ◦ A timing attack is possible with only two attacking nodes ◦ With 15% attacking peers, 90% of peers are vulnerable ◦ Thwarting these attacks would raise response times above Freenet's ◦ With 25% attackers, 98% of peers can be monitored ◦ A TCP-based attack with only one attacker can identify the source of data
Location Privacy: Solutions I2P (Invisible Internet Project) ◦ Network layer that allows pseudonymous communication ◦ Implemented through I2P routers ◦ End-to-end encryption ◦ P2P implementations: BitTorrent over I2P, iMule (Invisible eMule), I2Phex
I2P Attacks ◦ Timpanaro et al. developed a large-scale monitoring architecture ◦ It shows that large-scale monitoring can compromise I2P's anonymity ◦ Still a better choice than Tor or Freenet
Content Privacy Kazaa Kad Personal Health Information
Content Privacy Kazaa ◦ Good et al. conducted experiments to find out whether users are sharing personal files and whether the shared files are downloaded ◦ Results (24-hour period): 156 distinct users shared their inbox; 19 out of 20 users shared files; 9 users shared web browser cache; 5 users shared word processing documents; 2 users shared financial documents; shared dummy files were downloaded by 4 distinct users
Content Privacy Kad Network ◦ Dragonfly monitoring system: passively monitors sharing and downloading events ◦ Within 2 weeks: 5000 private files related to 10 distinct keywords ◦ Honey files: 192 distinct attackers tried to download them; 45 attackers tried to hack into the honey accounts 125 times ◦ Solution: eMule plugin, Numen
Content Privacy Personal Health Information (PHI) ◦ El Emam et al. designed a system to download files from P2P networks ◦ Results: 0.4% of Canadian IP addresses had PHI; 0.6% of US IP addresses had PHI Personal Financial Information (PFI) ◦ Same experiment: 1.7% of Canadian IP addresses had PFI; 4.7% of US IP addresses had PFI Experiments performed over ◦ FastTrack (Kazaa) ◦ Gnutella ◦ eDonkey
Summary A considerable number of privacy exposures exist in current P2P systems, affecting both location and content privacy. Several solutions have been proposed to provide anonymity, but very few address content privacy. The existing solutions themselves have flaws.
Questions?