SPAM DETECTION IN P2P SYSTEMS
Team Matrix: Abhishek Ghag, Darshan Kapadia, Pratik Singh

OVERVIEW: P2P Basics; Spam; The Spam Detection Problem; Approaches to the Spam Detection Problem; Proposal; References.

P2P Basics: P2P networks connect nodes (machines) through large ad hoc connections. There are no dedicated clients or servers: all nodes (peers) are equal, and each peer functions as both a client and a server. Classification of P2P networks:- Centralized P2P network – Napster. Decentralized P2P network – KaZaA. Structured P2P network – CAN. Unstructured P2P network – Gnutella. Hybrid P2P network – JXTA.

Advantages of P2P:- All peers contribute resources such as bandwidth, storage space, and computing power (CPU cycles). Replicating data over multiple peers eliminates any single point of failure. Applications of P2P:- File sharing. Internet telephony, e.g., Skype. Streaming media.

Spam: any file that is deliberately misrepresented. It is a well-known problem in P2P file-sharing systems, where it is used to manipulate established retrieval and ranking techniques. Spammers benefit from the anonymous, decentralized, and dynamic nature of P2P systems.

Spam [Figure taken from the ACM research paper “Malware Prevalence in the KaZaA File-Sharing Network”]

Viruses in P2P [Figure taken from the ACM research paper “Malware Prevalence in the KaZaA File-Sharing Network”]

Why is Spam Harmful? It degrades the user's search experience. It helps viruses propagate through the network: more than 200 viruses use P2P networks as a propagation vector. It increases the traffic load on the network.

Spam is hard to detect automatically because:- The information returned for a user query is insufficient and biased. P2P systems are anonymous, decentralized, and dynamic. The naïve spam detection technique is to download each file and check it manually.

Approaches to the Spam Detection Problem: There are two main approaches. Detection after downloading the file:- The user compares the file with known databases of genuine files. The user filters out the file so that other users don't get the spammed copy. Detection before downloading the file:- Rigid trust. Web of trust. Reputation systems. Blocking IP addresses.
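Detection after downloading can be sketched as a hash lookup. This is a minimal illustration, assuming the "known database of genuine files" is simply a set of SHA-1 digests; KNOWN_GENUINE, is_known_genuine and filter_downloads are hypothetical names, not part of any real P2P client. Note that the file has to be fully downloaded before it can be checked.

```python
import hashlib

# Hypothetical database: SHA-1 digests of files known to be genuine.
KNOWN_GENUINE = {
    hashlib.sha1(b"genuine song bytes").hexdigest(),
}

def is_known_genuine(content: bytes, known: set[str]) -> bool:
    """Return True if the downloaded bytes hash to a digest in the known set."""
    return hashlib.sha1(content).hexdigest() in known

def filter_downloads(downloads: dict[str, bytes], known: set[str]) -> dict[str, bytes]:
    """Keep only verified files, so spammed copies are not re-shared to other users."""
    return {name: data for name, data in downloads.items()
            if is_known_genuine(data, known)}

downloads = {
    "song.mp3": b"genuine song bytes",
    "fake.mp3": b"something else entirely",
}
print(sorted(filter_downloads(downloads, KNOWN_GENUINE)))  # → ['song.mp3']
```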

Object Reputation:- Users vote on a file, either positively or negatively. Based on the vote evaluation and the voting protocol, the file is regarded as genuine or spam. Disadvantages:- It consumes time and labor. It wastes bandwidth and computing resources. Voters risk opening malware. Thus there is a need for an effective, automatic spam detection technique.
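A naive version of object reputation can be sketched as a vote tally per file key. This assumes +1/-1 votes and a simple net-score threshold; tally_votes, classify and the threshold of 0 are hypothetical choices, not the voting protocol of any particular system.

```python
from collections import defaultdict

def tally_votes(votes):
    """Sum +1 (genuine) / -1 (spam) votes per file key."""
    scores = defaultdict(int)
    for key, vote in votes:
        if vote not in (1, -1):
            raise ValueError(f"vote must be +1 or -1, got {vote}")
        scores[key] += vote
    return dict(scores)

def classify(scores, threshold=0):
    """Label a file 'spam' if its net score is below the threshold, else 'genuine'."""
    return {key: ("spam" if s < threshold else "genuine") for key, s in scores.items()}

votes = [("K1", 1), ("K1", 1), ("K2", -1), ("K2", -1), ("K2", 1)]
print(classify(tally_votes(votes)))  # → {'K1': 'genuine', 'K2': 'spam'}
```

A file nobody has voted on never appears in the result, which reflects the disadvantage above: someone must download and inspect a file before its reputation exists.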

Goal: Automatic detection of spam files.

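Query Processing: The client issues a query. The server compares the query against the descriptors of its shared files. Each result carries a system identifier (key) and a descriptor. The client groups the individual results by key. The groups are ranked. After downloading, the client becomes a server for that file.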
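The grouping and ranking steps can be sketched as follows. This assumes groups are ranked by group size (number of replicas), a common heuristic and the quantity that Type 4 spam below tries to inflate; Result, group_by_key, rank_groups and the peer field are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Result:
    key: str         # system identifier of the file (e.g. a content hash)
    descriptor: str  # file name / metadata returned by the server
    peer: str        # hypothetical field: the server that returned this replica

def group_by_key(results: list[Result]) -> dict[str, list[Result]]:
    """Group individual results that share the same key."""
    groups = defaultdict(list)
    for r in results:
        groups[r.key].append(r)
    return dict(groups)

def rank_groups(groups: dict[str, list[Result]]) -> list[str]:
    """Order keys by group size (replica count), largest first; ties broken by key."""
    return sorted(groups, key=lambda k: (-len(groups[k]), k))

results = [
    Result("K1", "song a.mp3", "peer1"),
    Result("K2", "song b.mp3", "peer2"),
    Result("K1", "Song A.mp3", "peer3"),
]
print(rank_groups(group_by_key(results)))  # → ['K1', 'K2']
```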

Spamming: Spamming occurs at steps 1, 3 and 5. Object reputation acts on step 1; feature-based spam detection acts on steps 3 and 5.

Feature-Based Spam Detection: First characterize spam, then characterize spammers, then implement techniques that use this characterization to rank the query results.

Classification of Spam, Type 1:- Files whose replicas have semantically different descriptors. The spammer might name a file after a currently popular song, or give multiple descriptors to the same file. E.g., different song titles for the same key 26NZUBS655CC66COLKMWHUVJGUXRPVUF: “12 days after christmas.mp3”, “i want you thalia.mp3”, “come on be my girl.mp3”, …
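One way to test for Type 1 spam is to measure how similar the descriptors of a key's replicas are. This sketch assumes Jaccard similarity over lowercase terms and a hypothetical threshold of 0.2; the tokenizer, terms, looks_like_type1 and the threshold are illustrative choices, not the method of the cited papers.

```python
import re
from itertools import combinations

def terms(descriptor: str) -> set[str]:
    """Lowercase, drop a trailing file extension, split on non-alphanumerics."""
    stem = re.sub(r"\.[a-z0-9]{2,4}$", "", descriptor.lower())
    return {t for t in re.split(r"[^a-z0-9]+", stem) if t}

def jaccard(a: set[str], b: set[str]) -> float:
    """Size of intersection over size of union (1.0 for two empty sets)."""
    return len(a & b) / len(a | b) if (a or b) else 1.0

def mean_pairwise_similarity(descriptors: list[str]) -> float:
    """Average Jaccard similarity over all pairs of distinct descriptors."""
    sets = [terms(d) for d in set(descriptors)]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 1.0  # a single distinct descriptor is trivially consistent
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

def looks_like_type1(descriptors: list[str], threshold: float = 0.2) -> bool:
    """Flag a key whose replicas carry mostly unrelated descriptors."""
    return mean_pairwise_similarity(descriptors) < threshold

print(looks_like_type1(["12 days after christmas.mp3",
                        "i want you thalia.mp3",
                        "come on be my girl.mp3"]))  # → True
print(looks_like_type1(["aerosmith dream on.mp3",
                        "Aerosmith - Dream On.mp3"]))  # → False
```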

Classification of Spam, Type 2:- Files with long descriptors. Here the spammer attaches a single long descriptor to the file. E.g., a single replica descriptor for key A4BB C5B9C271F3DC4: “Aerosmith, Van Halen, Quiet Riot, Kiss, Poison, Acdc, Accept, Def Leappard, Boney M, Megadeth, Metallica, Offspring, Beastie Boys, Run Dmc, Buckcherry, Salty Dog Remix.mp3”
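Type 2 spam can be flagged with a simple length check. This is a sketch assuming descriptor length is measured in terms and that anything above a hypothetical limit of 20 terms is suspicious; term_count, looks_like_type2 and the limit are illustrative, and a real system would tune the limit on observed descriptors.

```python
import re

def term_count(descriptor: str) -> int:
    """Count lowercase alphanumeric terms, ignoring a trailing file extension."""
    stem = re.sub(r"\.[a-z0-9]{2,4}$", "", descriptor.lower())
    return len([t for t in re.split(r"[^a-z0-9]+", stem) if t])

def looks_like_type2(descriptor: str, max_terms: int = 20) -> bool:
    """Flag descriptors that are much longer than a typical file name."""
    return term_count(descriptor) > max_terms

example = ("Aerosmith, Van Halen, Quiet Riot, Kiss, Poison, Acdc, Accept, "
           "Def Leappard, Boney M, Megadeth, Metallica, Offspring, Beastie Boys, "
           "Run Dmc, Buckcherry, Salty Dog Remix.mp3")
print(term_count(example), looks_like_type2(example))  # → 24 True
```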

Classification of Spam, Type 3:- Files whose descriptors contain no query terms. A server that wants to spread a file may return it regardless of whether it matches the query. E.g., “Can you afford …
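Because the client knows its own query, Type 3 spam can be checked locally by testing whether a result's descriptor shares any term with the query. This is a minimal sketch; the tokenizer and the names terms and looks_like_type3 are hypothetical.

```python
import re

def terms(text: str) -> set[str]:
    """Lowercase alphanumeric terms; a trailing file extension is dropped."""
    stem = re.sub(r"\.[a-z0-9]{2,4}$", "", text.lower())
    return {t for t in re.split(r"[^a-z0-9]+", stem) if t}

def looks_like_type3(query: str, descriptor: str) -> bool:
    """True if the returned descriptor contains none of the query terms."""
    return terms(query).isdisjoint(terms(descriptor))

print(looks_like_type3("come on be my girl", "12 days after christmas.mp3"))  # → True
print(looks_like_type3("thalia i want you", "i want you thalia.mp3"))         # → False
```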

Classification of Spam, Type 4:- Files that are highly replicated on a single peer. Normal users do not create multiple replicas of the same file on a single server; spammers do so to manipulate the group size. This also hinders query-routing techniques used to find hard-to-find data. E.g., 177 replicas of the file DY2QXX3MYW75SRCWSSUG6GY3FS7N7YC shared on a single peer.
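Type 4 spam can be detected by counting replicas per (peer, key) pair, and its effect on ranking can be neutralized by counting distinct peers instead of raw replicas. This sketch assumes results arrive as hypothetical (peer, key) tuples and uses a limit of one replica per peer, following the observation that normal users do not duplicate files on one server.

```python
from collections import Counter, defaultdict

def type4_suspects(results, max_replicas=1):
    """Return (peer, key) pairs that appear more than max_replicas times."""
    counts = Counter(results)  # each result is a (peer, key) tuple
    return {pk for pk, n in counts.items() if n > max_replicas}

def distinct_peer_group_sizes(results):
    """Group size counted as the number of distinct peers sharing each key."""
    peers_by_key = defaultdict(set)
    for peer, key in results:
        peers_by_key[key].add(peer)
    return {key: len(peers) for key, peers in peers_by_key.items()}

results = [("peerA", "K1")] * 177 + [("peerB", "K1"), ("peerC", "K2")]
print(type4_suspects(results))                    # → {('peerA', 'K1')}
print(sum(1 for _, key in results if key == "K1"))  # raw group size → 178
print(distinct_peer_group_sizes(results))         # → {'K1': 2, 'K2': 1}
```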

Proposal: We plan to implement the feature-based spam detection technique, which characterizes spam based on various features. It includes a probing technique that gathers more descriptive information about result files, together with peer statistics, for use in the ranking functions. The implementation requires little new functionality in existing P2P file-sharing systems, so it can easily be combined with other existing techniques.
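To show how the four spam characterizations could feed a ranking function, here is a hypothetical sketch: each group of replicas is checked for the four features, groups with fewer flags rank higher, and ties are broken by the number of distinct peers. The thresholds, the flag names and the functions spam_flags and rank are assumptions for illustration; this is not the probing technique or ranking function from the referenced papers.

```python
import re
from itertools import combinations

def terms(text: str) -> set[str]:
    """Lowercase alphanumeric terms; a trailing file extension is dropped."""
    stem = re.sub(r"\.[a-z0-9]{2,4}$", "", text.lower())
    return {t for t in re.split(r"[^a-z0-9]+", stem) if t}

def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if (a or b) else 1.0

def spam_flags(query, replicas, sim_threshold=0.2, max_terms=20):
    """replicas: list of (peer, descriptor) for one key; returns the set of flags raised."""
    flags = set()
    sets = [terms(d) for d in {d for _, d in replicas}]
    pairs = list(combinations(sets, 2))
    # Type 1: replicas carry mostly unrelated descriptors.
    if pairs and sum(jaccard(a, b) for a, b in pairs) / len(pairs) < sim_threshold:
        flags.add("type1")
    # Type 2: some descriptor has too many distinct terms.
    if any(len(s) > max_terms for s in sets):
        flags.add("type2")
    # Type 3: no descriptor shares a term with the query.
    q = terms(query)
    if all(q.isdisjoint(s) for s in sets):
        flags.add("type3")
    # Type 4: some peer returned more than one replica of this key.
    peers = [p for p, _ in replicas]
    if len(peers) > len(set(peers)):
        flags.add("type4")
    return flags

def rank(query, groups):
    """groups: key -> list of (peer, descriptor). Fewer flags first, then more distinct peers."""
    def sort_key(key):
        replicas = groups[key]
        return (len(spam_flags(query, replicas)), -len({p for p, _ in replicas}), key)
    return sorted(groups, key=sort_key)

query = "i want you thalia"
groups = {
    "K1": [("p1", "i want you thalia.mp3"), ("p2", "Thalia - I Want You.mp3")],
    "K2": [("p3", "12 days after christmas.mp3"), ("p3", "i want you thalia.mp3"),
           ("p3", "come on be my girl.mp3")],
    "K3": [("p4", "i want you thalia live.mp3")],
}
print(sorted(spam_flags(query, groups["K2"])))  # → ['type1', 'type4']
print(rank(query, groups))                      # → ['K1', 'K3', 'K2']
```

Demoting rather than deleting flagged groups keeps the sketch conservative: a false positive only lowers a genuine file in the list instead of hiding it.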

References
Papers.
Author – Dongmei Jia. Title – Cost Effective Spam Detection Techniques in P2P File Sharing Systems. Conference – Proceedings of the 2008 ACM Workshop on Large-Scale Distributed Systems for Information Retrieval. Date – October. Publisher – ACM. URL – M&CFID= &CFTOKEN=

Author – Dongmei Jia, Wai Gen Yee, Ophir Frieder. Title – Spam Characterization and Detection in Peer-to-Peer File-Sharing Systems. Conference – Proceedings of the 17th ACM Conference on Information and Knowledge Management. Date – October. Publisher – ACM. URL – &coll=portal&dl=ACM&CFID= &CFTOKEN=

Author – Jian Liang, Rakesh Kumar, Yongjian Xi, Keith W. Ross. Title – Pollution in P2P File Sharing Systems. Conference – INFOCOM: Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings IEEE. Date – March. Publisher – IEEE. URL – arnumber= &isnumber=32100

SOURCES

Questions?