The Index Poisoning Attack in P2P File Sharing Systems Keith W. Ross Polytechnic University
Joint work with: Jian Liang, Naoum Naoumov
Internet Traffic (cf. CacheLogic)
File Distribution Systems: 2005
Attacks on P2P: Decoying
Two types:
– File corruption: pollution
– Index poisoning
Investigated in two networks:
FastTrack/Kazaa
– Unstructured P2P network
Overnet
– Structured (DHT) P2P network
– Part of eDonkey
File Pollution [Figure: pollution company; original content and polluted content]
File Pollution [Figure: pollution company, four pollution servers, file sharing network]
File Pollution: Unsuspecting users spread pollution! Yuck
Index Poisoning [Figure: file sharing network with an index of (title, location) records for the titles bigparty, smallfun, heyhey]
Index Poisoning [Figure: the same index before and after; the second index additionally lists the title bighit]
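The effect of index poisoning can be illustrated with a toy index. This is only a sketch under stated assumptions: the `Index` class, the `try_download` helper, the addresses, and the record counts are all hypothetical and do not describe any real client. It shows how bogus (title, location) records dilute search results so that most download attempts fail.

```python
class Index:
    """Toy title -> locations index (hypothetical, not a real client's format)."""

    def __init__(self):
        self.records = {}

    def advertise(self, title, location):
        # Nothing is verified here: anyone can insert any location for any title.
        self.records.setdefault(title, set()).add(location)

    def search(self, title):
        return sorted(self.records.get(title, set()))


def try_download(location, live_hosts):
    """A download succeeds only if the advertised location really serves the file."""
    return location in live_hosts


index = Index()
live_hosts = {"10.0.0.5", "10.0.0.9"}  # assumed peers that really hold "bighit"
for host in live_hosts:
    index.advertise("bighit", host)

# The attacker floods the index with bogus locations for the targeted title.
for i in range(1, 99):
    index.advertise("bighit", f"192.0.2.{i}")  # documentation-range addresses

results = index.search("bighit")
working = [loc for loc in results if try_download(loc, live_hosts)]
success_rate = len(working) / len(results)
print(f"{len(results)} advertised locations, success rate {success_rate:.2f}")
```

No file is ever corrupted in this sketch; only the index is attacked, which is what distinguishes poisoning from pollution.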
Overnet: DHT
(version_id, location) stored in nodes with ids close to version_id
(hash_title, version_id) stored in nodes with ids close to hash_title
First search hash_title, get version_id and metadata
Then search version_id, get location
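The two-step lookup can be sketched with a toy DHT. Everything here is an assumption made for illustration: the `ToyDHT` class, the 32-bit id space, SHA-1 as the id function, XOR distance as the notion of "close", the replica count, and the sample location string. Metadata is left out for brevity.

```python
import hashlib

ID_BITS = 32  # toy id space; real systems use much longer ids


def make_id(text):
    """Map a string to an integer id (SHA-1 is an illustrative choice)."""
    digest = hashlib.sha1(text.encode()).digest()
    return int.from_bytes(digest[: ID_BITS // 8], "big")


class ToyDHT:
    def __init__(self, node_names, replicas=2):
        # node id -> that node's local key/value store
        self.nodes = {make_id(name): {} for name in node_names}
        self.replicas = replicas

    def closest(self, key):
        # "Close" is taken to mean small XOR distance (an assumption).
        return sorted(self.nodes, key=lambda nid: nid ^ key)[: self.replicas]

    def publish(self, key, value):
        for nid in self.closest(key):
            self.nodes[nid].setdefault(key, set()).add(value)

    def search(self, key):
        found = set()
        for nid in self.closest(key):
            found |= self.nodes[nid].get(key, set())
        return found


dht = ToyDHT([f"node{i}" for i in range(16)])
hash_title = make_id("bigparty")
version_id = make_id("bigparty-version-1-content")

dht.publish(hash_title, version_id)        # (hash_title, version_id) record
dht.publish(version_id, "10.0.0.5:4662")   # (version_id, location) record

versions = dht.search(hash_title)                              # step 1
locations = {loc for v in versions for loc in dht.search(v)}   # step 2
print(versions, locations)
```

Because `publish` accepts any record, the sketch also makes the attack surface visible: bogus records can be inserted at either level of the lookup.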
Overnet Publish Query Download
FastTrack Overlay: Each SN maintains a local index (ON = ordinary node, SN = super node)
FastTrack Query (ON = ordinary node, SN = super node)
FastTrack Download: HTTP request for hash value (ON = ordinary node, SN = super node)
FastTrack Download: P2P file transfer (ON = ordinary node, SN = super node)
Attacks: How Effective?
For a given title, what fraction of the copies are
– Clean?
– Poisoned?
– Polluted?
Brute-force approach:
– Attempt to download all versions
– For those versions that download, listen to/watch each one
How do we determine pollution levels without downloading?
Titles, versions, hashes & copies
The title is the title of a song/movie/software
A given title can have thousands of versions
Each version has its own hash
Each version can have thousands of copies
A title can also have non-existent versions, each identified by a hash
Definition of Pollution and Poisoning Levels
(t, t+Δ): investigation interval
V: set of all versions of title T
V_1, V_2, V_3: sets of poisoned, polluted, clean versions
C_v: number of advertised copies of version v
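The formula itself did not survive on this slide. One natural reading, consistent with the earlier question "what fraction of the copies are clean, poisoned, polluted", is that each level is the share of advertised copies falling in the corresponding version set during the investigation interval. This is a reconstruction under that assumption, not a quotation from the slide:

```latex
\[
\text{poisoning level} = \frac{\sum_{v \in V_1} C_v}{\sum_{v \in V} C_v},
\qquad
\text{pollution level} = \frac{\sum_{v \in V_2} C_v}{\sum_{v \in V} C_v},
\qquad
\text{clean level} = \frac{\sum_{v \in V_3} C_v}{\sum_{v \in V} C_v}
\]
```

If V_1, V_2, V_3 partition V, the three levels sum to 1.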
How to Estimate?
Need C_v, v ∈ V
Need V_1, V_2, V_3
– Don't want to download and listen to files!
Solution: Harvest C_v, v ∈ V, and copy locations
– Overnet: Insert node, receive publish msgs
– FastTrack: Crawl
Heuristic for V_1, V_2, V_3
Copies at Users: FastTrack, Overnet
Heuristic
Identify heavy and light publishers
H_h = set of hashes from heavy publishers
H_l = set of hashes from light publishers
[Figure: diagram of the sets H_h and H_l with regions labeled polluted versions, clean versions, poisoned versions]
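The classification can be sketched in a few lines. The mapping of regions to labels is an assumption about how the diagram reads: hashes advertised only by heavy publishers are taken as poisoned, hashes advertised by both heavy and light publishers as polluted, and hashes advertised only by light publishers as clean. The rule for what makes a publisher "heavy" (more than `heavy_threshold` versions of one title) and all names below are likewise hypothetical.

```python
def classify_versions(published, heavy_threshold):
    """Classify version hashes of one title without downloading anything.

    published: dict mapping publisher id -> set of version hashes it advertises.
    heavy_threshold: a publisher advertising more versions than this is "heavy"
                     (an assumed criterion).
    """
    heavy = {p for p, hashes in published.items() if len(hashes) > heavy_threshold}
    h_heavy = set().union(*(published[p] for p in heavy))
    h_light = set().union(*(h for p, h in published.items() if p not in heavy))
    return {
        "poisoned": h_heavy - h_light,  # only heavy publishers advertise these
        "polluted": h_heavy & h_light,  # spread from heavy to ordinary users
        "clean": h_light - h_heavy,     # only light publishers advertise these
    }


published = {
    "attacker": {"h1", "h2", "h3", "h4", "h5"},
    "alice": {"h5", "h6"},
    "bob": {"h6"},
}
labels = classify_versions(published, heavy_threshold=3)
print(labels)
```

The intuition behind this reading is that a poisoned version cannot be downloaded, so it never reaches ordinary users, whereas a polluted version can be downloaded and is then re-shared by unsuspecting light publishers.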
Heuristic: More
Evaluation   # Download   # Success   Accuracy   False
Polluted     8,450        8,…         …%         0.6% (positive)
Poisoned     33,186       1,…         …%         3.5% (negative)
Heuristic is accurate & does not involve any downloading!
FastTrack Versions
FastTrack Copies
Overnet Copies
Blacklisting
Assign reputations to /n subnets
– Bad reputation to subnets with a large number of advertised copies of any title
Obtain reputations locally; share with a distributed algorithm
Locally blacklist /n subnets with bad reputations
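The local part of this scheme can be sketched with Python's standard `ipaddress` module. The function names, the `max_copies` threshold, and the sample addresses are assumptions for illustration; the distributed sharing of reputations is not modelled.

```python
import ipaddress
from collections import Counter


def subnet_of(ip, prefix_len):
    """Return the /prefix_len subnet containing ip (host bits are masked off)."""
    return ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)


def blacklist(adverts, prefix_len, max_copies):
    """adverts: iterable of (title, ip) pairs, one per advertised copy.

    A subnet gets a bad reputation if it advertises more than max_copies
    copies of any single title (the threshold is an assumed parameter).
    """
    counts = Counter((subnet_of(ip, prefix_len), title) for title, ip in adverts)
    return {subnet for (subnet, _title), n in counts.items() if n > max_copies}


def is_blocked(ip, blocked, prefix_len):
    return subnet_of(ip, prefix_len) in blocked


adverts = [("bighit", f"192.0.2.{i}") for i in range(1, 51)]
adverts += [("bighit", "198.51.100.7"), ("heyhey", "198.51.100.8")]

blocked = blacklist(adverts, prefix_len=24, max_copies=10)
print(blocked)
print(is_blocked("192.0.2.200", blocked, 24), is_blocked("198.51.100.7", blocked, 24))
```

Blacklisting whole subnets rather than single addresses means an attacker cannot evade the filter just by moving to a neighbouring address, at the cost of possibly blocking innocent hosts in the same range.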
Blacklisting: More
The Inverse Attack: Besides attacks on P2P systems, P2P systems can also be exploited for DDoS attacks against an innocent host.
Summary & Thank You!