Published by Joel Walsh. Modified over 9 years ago.
1
SPAM DETECTION IN P2P SYSTEMS
Team Matrix: Abhishek Ghag, Darshan Kapadia, Pratik Singh
2
OVERVIEW
P2P Basics
Spam
The Spam Detection Problem
Approaches to the Spam Detection Problem
Proposal
References
3
P2P Basics
P2P networks connect nodes (machines) through large ad hoc connections.
There is no fixed client or server role. All nodes, or peers, are equal, and each peer functions as both client and server.
Classification of P2P:
Centralized P2P network – Napster.
Decentralized P2P network – KaZaA.
Structured P2P network – CAN.
Unstructured P2P network – Gnutella.
Hybrid P2P network – JXTA.
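The sketch below shows the dual client/server role in a minimal in-memory Python form. The `Peer` class and its `serve`, `search` and `download` methods are hypothetical names chosen for illustration. There is no networking, and matching is plain substring search on descriptors.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical in-memory peer; no networking, just the two roles."""

    name: str
    shared: dict[str, str] = field(default_factory=dict)  # key -> descriptor

    def serve(self, query: str) -> list[tuple[str, str]]:
        """Server role: return (key, descriptor) pairs whose descriptor contains every query term."""
        terms = query.lower().split()
        return [(k, d) for k, d in self.shared.items() if all(t in d.lower() for t in terms)]

    def search(self, peers: list["Peer"], query: str) -> list[tuple[str, str]]:
        """Client role: send the query to every other peer and collect the answers."""
        results: list[tuple[str, str]] = []
        for p in peers:
            if p is not self:
                results.extend(p.serve(query))
        return results

    def download(self, key: str, descriptor: str) -> None:
        """After downloading, share the file: this peer now serves it as well."""
        self.shared[key] = descriptor
```

After `b.download(...)`, peer `b` answers queries for the file it fetched. This is the "client becomes the server" behaviour that P2P file sharing depends on.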
4
Advantages of P2P:
All peers provide resources such as bandwidth, storage space and computing power (CPU cycles).
Replicating data over multiple peers eliminates the single point of failure.
Applications of P2P:
File sharing.
Internet telephony, e.g. Skype.
Streaming media files.
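The benefit of replication can be quantified under a simplifying assumption: each peer holding a copy is online independently with probability p. With k replicas, the file is reachable with probability 1 - (1 - p)^k. The hypothetical helper below computes this.

```python
def availability(p_online: float, replicas: int) -> float:
    """Probability that at least one of `replicas` copies is reachable,
    assuming each hosting peer is online independently with probability p_online."""
    if not 0.0 <= p_online <= 1.0:
        raise ValueError("p_online must be a probability")
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    # Complement of "every replica's host is offline".
    return 1.0 - (1.0 - p_online) ** replicas
```

For example, with p = 0.5 a single copy is reachable half the time, while three replicas raise this to 0.875.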
5
Figure source: http://www.acm.org/crossroads/xrds9-4/gfx/GamestateFidelity1.jpg
6
Spam
Spam is any file that is deliberately misrepresented.
It is a well-known problem in P2P file sharing systems.
Spammers use it to manipulate established retrieval and ranking techniques.
P2P systems are anonymous, decentralized and dynamic in nature.
8
Spam
Figure taken from the ACM research paper "Malware Prevalence in the KaZaA File-Sharing Network".
9
Viruses in P2P
Figure taken from the ACM research paper "Malware Prevalence in the KaZaA File-Sharing Network".
10
Why is Spam Harmful?
It degrades the user's search experience.
It assists the propagation of viruses in the network: more than 200 viruses use P2P as a propagation vector.
It increases the traffic load on the network.
11
Spam
It is hard to detect spam automatically because:
Insufficient and biased information is returned in response to a user query.
P2P systems are anonymous, decentralized and dynamic.
The naïve detection technique is to download each file and check it manually.
12
Approaches to the Spam Detection Problem
There are two main approaches to the spam detection problem.
Detection after downloading the file: the user compares the file against known databases of genuine files and filters it out so that other users don't get the spammed copy.
Detection before downloading the file: rigid trust, web of trust, reputation systems, or blocking IP addresses.
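Detection after downloading can be sketched as a hash lookup: hash the downloaded bytes and check the digest against a database of known genuine files. SHA-256 and the in-memory set that stands in for the database are assumptions made for illustration. Real file sharing systems use their own file identifiers.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 hex digest; the choice of hash is an illustrative assumption."""
    return hashlib.sha256(data).hexdigest()


def check_download(data: bytes, known_genuine: set[str]) -> bool:
    """True if the downloaded bytes match an entry in the (hypothetical)
    database of genuine files; False means the copy should be filtered out."""
    return digest(data) in known_genuine
```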
13
Object Reputation:
Users vote for a file either positively or negatively.
Based on the vote tally and the voting protocol, the file is regarded as genuine or spam.
Disadvantages (users must download a file before they can vote on it):
It consumes time and labor.
It wastes bandwidth and computing resources.
Users risk opening malware.
Thus there is a need for an effective automatic spam detection technique.
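Below is a minimal sketch of object reputation under a simple assumed protocol. Each user casts one vote (+1 or -1) per file key, and a later vote replaces an earlier one. A file is labeled only after a minimum number of votes. `ReputationBoard`, `min_votes` and the majority rule are hypothetical, not the voting protocol of any specific system.

```python
from collections import defaultdict


class ReputationBoard:
    """Hypothetical object-reputation tally: one vote per (user, file key)."""

    def __init__(self, min_votes: int = 3):
        self.min_votes = min_votes
        self.votes: dict[str, dict[str, int]] = defaultdict(dict)  # key -> {user: +1/-1}

    def vote(self, user: str, key: str, positive: bool) -> None:
        # A repeat vote from the same user overwrites the earlier one.
        self.votes[key][user] = 1 if positive else -1

    def verdict(self, key: str) -> str:
        ballots = self.votes.get(key, {})
        if len(ballots) < self.min_votes:
            return "unknown"  # not enough evidence yet
        # Simple majority; a tie is treated conservatively as spam.
        return "genuine" if sum(ballots.values()) > 0 else "spam"
```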
14
Goal
Automatic detection of spam files.
16
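Query Processing
The client writes a query.
The server compares the query against its shared files.
The server returns each matching result with its system identifier (key) and descriptor.
The client groups the individual results by key.
The client ranks the groups.
After downloading a file, the client becomes a server for it.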
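The client's grouping and ranking steps can be sketched as follows. Results are assumed to arrive as (peer, key, descriptor) triples. Groups are ranked by group size, meaning the number of results that share a key. That ranking rule is an assumption for illustration; it is chosen because group size is the quantity that replica-based spamming tries to inflate.

```python
from collections import defaultdict


def group_and_rank(results: list[tuple[str, str, str]]) -> list[tuple[str, int, list[str]]]:
    """Group (peer, key, descriptor) results by key and rank groups by size.

    Ranking by group size is an illustrative assumption: a larger group
    means more replicas of that key were returned.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for _peer, key, descriptor in results:
        groups[key].append(descriptor)
    # sorted() is stable, so equal-sized groups keep their arrival order.
    ranked = sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True)
    return [(key, len(descs), descs) for key, descs in ranked]
```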
17
Spamming: steps 1, 3 and 5.
Object reputation acts on step 1.
Feature-based spam detection acts on steps 3 and 5.
18
Feature-Based Spam Detection
Characterize spam.
Characterize spammers.
Then implement techniques that use this characterization to rank the query results.
19
Classification of Spam
Type 1: Files whose replicas have semantically different descriptors.
The spammer might name a file after a currently popular song, or might give multiple names to the same file (the same key).
E.g., different song titles for the same key 26NZUBS655CC66COLKMWHUVJGUXRPVUF:
“12 days after christmas.mp3”
“i want you thalia.mp3”
“come on be my girl.mp3”
…
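One way to detect Type 1 spam is to measure how much the descriptors of a key's replicas disagree. The sketch below approximates "semantically different" with term overlap: it computes 1 minus the mean pairwise Jaccard similarity of the descriptors' word sets. The tokenization, the Jaccard proxy and the 0.8 threshold are all assumptions, not the method from the cited papers.

```python
import re
from itertools import combinations


def terms(descriptor: str) -> set[str]:
    """Lower-case word tokens, ignoring the file extension."""
    stem = descriptor.rsplit(".", 1)[0]
    return set(re.findall(r"[a-z0-9]+", stem.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 1.0


def descriptor_inconsistency(descriptors: list[str]) -> float:
    """1 - mean pairwise Jaccard similarity of the distinct descriptors' term sets."""
    sets = [terms(d) for d in sorted(set(descriptors))]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 0.0  # a single distinct descriptor is trivially consistent
    return 1.0 - sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def looks_like_type1(descriptors: list[str], threshold: float = 0.8) -> bool:
    """Hypothetical threshold rule: flag keys whose replicas barely share terms."""
    return descriptor_inconsistency(descriptors) > threshold
```

The three titles listed above share no terms, so their inconsistency is 1.0 and the key would be flagged.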
20
Classification of Spam
Type 2: Files with long descriptors.
The spammer attaches a single long descriptor to the file, packed with many popular terms so that it matches many different queries.
E.g., a single replica descriptor for key 1200473A4BB17724194C5B9C271F3DC4:
“Aerosmith, Van Halen, Quiet Riot, Kiss, Poison, Acdc, Accept, Def Leappard, Boney M, Megadeth, Metallica, Offspring, Beastie Boys, Run Dmc, Buckcherry, Salty Dog Remix.mp3”
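Type 2 spam can be flagged by counting the terms in a descriptor. The 15-term cutoff below is a hypothetical value; a real system would tune it against the descriptor lengths it actually observes.

```python
import re


def term_count(descriptor: str) -> int:
    """Number of word tokens in a descriptor, ignoring the file extension."""
    stem = descriptor.rsplit(".", 1)[0]
    return len(re.findall(r"[a-z0-9]+", stem.lower()))


def looks_like_type2(descriptor: str, max_terms: int = 15) -> bool:
    """Flag descriptors stuffed with many terms; max_terms is a hypothetical cutoff."""
    return term_count(descriptor) > max_terms
```

The Aerosmith descriptor above has 24 terms once the `.mp3` extension is dropped, so it would be flagged.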
21
Classification of Spam
Type 3: Files whose descriptors contain none of the query terms.
A server that wants to spread a file may return it regardless of whether it matches the query.
E.g., “Can you afford 0.09 www.BuyLegalMP3.com.mp3”
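Type 3 spam is the simplest to test for: check whether the descriptor shares any term with the query that produced it. The tokenization below is an assumption made for illustration.

```python
import re


def tokens(text: str) -> set[str]:
    """Lower-case alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def looks_like_type3(descriptor: str, query: str) -> bool:
    """Flag a result whose descriptor shares no term with the query that produced it."""
    return not (tokens(query) & tokens(descriptor))
```

The BuyLegalMP3 descriptor above shares no term with a query such as "come on be my girl", so that result would be flagged.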
22
Classification of Spam
Type 4: Files that are highly replicated on a single peer.
Normal users do not create multiple replicas of the same file on a single server.
This is aimed at manipulating the group size (the number of results sharing a key), which influences how results are ranked.
It also hampers query routing techniques designed to find hard-to-find data.
E.g., 177 replicas of the file DY2QXX3MYW75SRCWSSUG6GY3FS7N7YC shared on a single peer.
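Type 4 spam can be spotted by counting replicas of each key per peer within a result set. The sketch also computes a group size that counts each peer at most once, which is one possible way to blunt replica stuffing. `max_replicas` and all function names are hypothetical.

```python
from collections import Counter


def replicas_per_peer(results: list[tuple[str, str]]) -> Counter:
    """Count how many results for each key came from each peer.

    `results` holds (peer_id, key) pairs from one query's result set.
    """
    return Counter(results)


def type4_suspects(results: list[tuple[str, str]], max_replicas: int = 3) -> set[tuple[str, str]]:
    """(peer, key) pairs replicated more than max_replicas times on one peer.

    max_replicas is a hypothetical tolerance, not a value from the source.
    """
    return {pair for pair, n in replicas_per_peer(results).items() if n > max_replicas}


def distinct_peer_group_size(results: list[tuple[str, str]], key: str) -> int:
    """Group size counting each peer at most once, so replica stuffing adds nothing."""
    return len({peer for peer, k in results if k == key})
```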
23
Proposal
We plan to implement the feature-based spam detection technique, which characterizes spam based on various features. It includes a probing technique that aggregates more descriptive information about result files, together with peer statistics and ranking functions. Our implementation requires little new functionality in existing P2P file sharing systems, so it can easily be combined with other existing techniques.
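The sketch below shows how such features could feed a ranking function. Each group's score is its number of distinct peers, multiplied by a penalty factor for every spam feature the group exhibits. The `Group` fields, the multiplicative penalty and its 0.5 value are assumptions for illustration, not the proposal's actual ranking functions.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """One query-result group (all replicas sharing a key); flags are hypothetical feature outputs."""

    key: str
    distinct_peers: int
    type1: bool = False  # inconsistent descriptors across replicas
    type2: bool = False  # overly long descriptor
    type3: bool = False  # descriptor shares no query term
    type4: bool = False  # heavy replication on a single peer

    def spam_features(self) -> int:
        return sum((self.type1, self.type2, self.type3, self.type4))


def rank(groups: list[Group], penalty: float = 0.5) -> list[Group]:
    """Score = distinct_peers * penalty ** (number of spam features); highest score first.

    The multiplicative penalty is an illustrative assumption.
    """
    return sorted(groups, key=lambda g: g.distinct_peers * penalty ** g.spam_features(), reverse=True)
```

A multiplicative penalty lets a heavily replicated but clean file still outrank a small one, while each additional spam feature halves a group's score.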
24
References
Author: Dongmei Jia
Title: Cost Effective Spam Detection Techniques in P2P File Sharing Systems
Conference: Proceedings of the 2008 ACM Workshop on Large-Scale Distributed Systems for Information Retrieval
Date: October 2008
Publisher: ACM
URL: http://portal.acm.org.ezproxy.rit.edu/results.cfm?coll=portal&dl=ACM&CFID=14901064&CFTOKEN=96029385
25
Authors: Dongmei Jia, Wai Gen Yee, Ophir Frieder
Title: Spam Characterization and Detection in Peer to Peer File Sharing Systems
Conference: Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM 2008)
Date: October 2008
Publisher: ACM
URL: http://portal.acm.org.ezproxy.rit.edu/citation.cfm?id=1458082.1458128&coll=portal&dl=ACM&CFID=14901064&CFTOKEN=96029385
26
Authors: Jian Liang, Rakesh Kumar, Yongjian Xi, Keith W. Ross
Title: Pollution in P2P File Sharing Systems
Conference: Proceedings IEEE INFOCOM 2005, 24th Annual Joint Conference of the IEEE Computer and Communications Societies
Date: March 2005
Publisher: IEEE
URL: http://ieeexplore.ieee.org.ezproxy.rit.edu/stamp/stamp.jsp?arnumber=1498344&isnumber=32100
27
SOURCES http://en.wikipedia.org/wiki/Peer-to-peer
28
Questions?