Detecting Spam Blogs: A Machine Learning Approach
Pranam Kolari, Akshay Java, Tim Finin, Tim Oates, Anupam Joshi

What are Spam Blogs (Splogs)?
- Blogs hosting machine-generated posts, each post contributing to web spam
- Posts featuring content hijacked from other blogs and/or stuffed keywords
- Posts with links interspersed between random text
- Blogs with placement of context-based ads only to fool users into clicking ads (See 3)

[1]
A Case of Content Plagiarism
(1) Original Post by Ebiquity
(2) Infiltration in Search Results
(3) A splog result
[2] [3]

Why are Splogs a Problem?
- Splogs undermine ranking algorithms (See 6) (Source: MSR Search Defender Report)
- Splogs water down search results (See 6)
- Splogs threaten the Web advertising model (See 3)
- Splogs indulge in “plagiarism” (See 1, 2, 3)
- Splogs skew results of social research tools (See 4)
- Splogs stress the Blogosphere infrastructure of ping servers, blog search engines, etc. (See 4, 5)

$197
“Holy Grail Of Advertising...”
“Easy Dominate Any Market, Any Search Engine, Any Keyword”
[4]

This Work
- Formalizes the Splog Detection Problem
- Supervised Machine Learning Technique
- Training set of hand-labeled examples: 700 positive and 700 negative
- Effectiveness of Specialized features
- Local Models for Fast Splog Detection
- Global link-based models effective for (delayed) Splog Detection
- Precision/Recall of 87% for bag-of-words (See 5)

[6]
Blog Features: We, what, was, my, org, flickr, paper, words, me, thank, go, archives
Splog Features: Find, info, news, website, best, articles, perfect, Products, uncategorized, hot, Resources, inc, copyright

Partially supported by NSF award ITR-IIS and ITR-IDM and IBM
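
The supervised bag-of-words setup listed under "This Work" can be illustrated with a minimal sketch. The poster does not name the classifier, so a multinomial naive Bayes model is used here purely as a stand-in. The four training texts are invented toy examples assembled from the blog and splog feature words listed on the poster; they are not the authors' 700+700 hand-labeled set. The function names (`train`, `predict`, `precision_recall`) are hypothetical.

```python
import math
from collections import Counter

# Toy training data (an assumption): texts built from the poster's feature
# words, not the authors' hand-labeled corpus.
TOY_EXAMPLES = [
    ("find best info news website articles", "splog"),
    ("perfect products hot resources inc copyright", "splog"),
    ("we thank my flickr paper words", "blog"),
    ("what was me go archives org", "blog"),
]


def tokenize(text):
    """Lowercase, whitespace-split bag-of-words tokenizer."""
    return text.lower().split()


def train(examples):
    """Count words per label; returns (word_counts, doc_counts, vocab)."""
    word_counts = {}
    doc_counts = Counter()
    vocab = set()
    for text, label in examples:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in tokenize(text):
            counts[tok] += 1
            vocab.add(tok)
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        counts = word_counts[label]
        # Laplace (add-one) smoothing over the shared vocabulary.
        denom = sum(counts.values()) + len(vocab)
        score = math.log(doc_counts[label] / total_docs)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((counts[tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def precision_recall(gold, predicted, positive="splog"):
    """Precision and recall for the positive (splog) class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


model = train(TOY_EXAMPLES)
```

Calling `predict(model, "best products news")` classifies a new text, and `precision_recall` computes the kind of metric the poster reports, given gold and predicted label lists. Because this is a local, content-only model, it needs nothing beyond the text of the blog itself, which is what makes such models suitable for fast detection.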