Published by Samson Hardy. Modified over 9 years ago.
1
Detecting Web Spam through Backward Propagation of Distrust
CS315 - Web Search and Mining
2
And Now For Something Completely(?) Different
Propaganda: Attempt to modify human behavior, and thus influence people's actions, in ways beneficial to propagandists
Theory of Propaganda: Developed by the Institute for Propaganda Analysis, 1938-42
Propagandistic Techniques (and ways of detecting propaganda):
  Word games (Glittering Generalities, Name Calling) - associate good/bad concept with social entity
  Transfer - use special privileges (e.g., office) to breach trust
  Testimonial - famous non-experts' claims
  Plain Folk - people like us think this way
  Bandwagon - everybody's doing it, jump on the wagon
  Card Stacking - use of bad logic
3
Web Spammers as Propagandists
Web spammers can be seen as employing propagandistic techniques in order to modify the Web graph
There is a pattern to how they spam!
4
Anti-Spam Lessons from Society
What would you do if you realized that you should not trust a member of your trust network?
[Figure: a trust network around YOU, with nodes Mom, Partner, Famous Actress, Prof. X, NYTimes, Rev. Y, Joe (a plumber), US Pres., Democracy, and Your Boss. Other labels in the figure: The Coffee Joint, an X, and several question marks.]
5
Anti-Propagandistic Lessons for Web
How do you deal with propaganda in real life?
Backwards propagation of distrust: the recommender of an untrustworthy message becomes untrustworthy
Can you transfer this technique to the web?
6
An Anti-Propagandistic Algorithm
Start from an untrustworthy site s
S = {s}
Using BFS for depth D do:
  Find the set U of sites linking to sites in S (using the Google API, for up to B backlinks per site)
  Ignore blogs, directories, and .edu sites
  S = S ∪ U
Find the bi-connected component BCC of U that includes s
BCC shows multiple paths to boost the reputation of s
7
Backwards Propagation of Distrust
Start from an untrustworthy site s
S = {s}
Using BFS for depth D do:
  Find the set U of sites linking to sites in S (using the Google API, for up to B backlinks per site)
  Ignore blogs, directories, and .edu sites
  S = S ∪ U
Find the bi-connected component BCC of U that includes s
BCC shows multiple paths to boost the reputation of s
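The steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the implementation behind the slides: the Google API lookup is replaced by a hypothetical `backlinks(site)` callable, the names `backward_bfs`, `biconnected_components`, and `distrust_neighborhood` are invented here, links are treated as undirected when computing bi-connected components, and when s lies in several components the largest one with at least three sites is taken.

```python
def backward_bfs(seed, backlinks, depth, max_backlinks, ignore=lambda site: False):
    """Follow links backwards from `seed` for `depth` levels (steps 1 to 3)."""
    explored = {seed}          # the set S from the slide
    frontier = [seed]
    edges = set()              # (linking site, linked site)
    for _ in range(depth):
        next_frontier = []
        for target in frontier:
            # `backlinks` is a hypothetical stand-in for the backlink query.
            for source in backlinks(target)[:max_backlinks]:
                if source == target or ignore(source):
                    continue   # skip self-links and filtered site types
                edges.add((source, target))
                if source not in explored:
                    explored.add(source)
                    next_frontier.append(source)
        frontier = next_frontier
    return explored, edges


def biconnected_components(adj):
    """Vertex sets of the bi-connected components of an undirected graph.

    Recursive DFS with an edge stack (Hopcroft-Tarjan); fine for small
    neighborhoods, but deep graphs may hit Python's recursion limit.
    """
    index, low = {}, {}
    edge_stack, components = [], []

    def visit(v, parent):
        index[v] = low[v] = len(index)
        for w in adj[v]:
            if w == parent:
                continue
            if w not in index:                 # tree edge
                edge_stack.append((v, w))
                visit(w, v)
                low[v] = min(low[v], low[w])
                if low[w] >= index[v]:         # v separates w's subtree
                    component = set()
                    while True:
                        a, b = edge_stack.pop()
                        component.update((a, b))
                        if (a, b) == (v, w):
                            break
                    components.append(component)
            elif index[w] < index[v]:          # back edge to an ancestor
                edge_stack.append((v, w))
                low[v] = min(low[v], index[w])

    for v in adj:
        if v not in index:
            visit(v, None)
    return components


def distrust_neighborhood(seed, backlinks, depth, max_backlinks, ignore=lambda site: False):
    """Return (BCC containing seed, remaining explored sites)."""
    sites, edges = backward_bfs(seed, backlinks, depth, max_backlinks, ignore)
    adj = {site: set() for site in sites}
    for source, target in edges:               # assumption: ignore link direction
        adj[source].add(target)
        adj[target].add(source)
    # A single link is trivially "bi-connected"; require at least 3 sites
    # so that the component really offers multiple paths to the seed.
    candidates = [c for c in biconnected_components(adj) if seed in c and len(c) >= 3]
    bcc = max(candidates, key=len) if candidates else {seed}
    return bcc, sites - bcc


# Toy data (made up): site -> sites that link to it.
toy_backlinks = {"s": ["a", "b"], "b": ["a"], "a": ["c"], "c": ["d"]}
bcc, periphery = distrust_neighborhood(
    "s", lambda site: toy_backlinks.get(site, []), depth=3, max_backlinks=10
)
```

In the toy data, `a` and `b` both link to `s` and `a` also links to `b`, so the three sites form a cycle and end up in the BCC, while `c` and `d` reach `s` only through `a` and are left outside it. The `ignore` predicate is where a filter for blogs, directories, and .edu sites would go.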
8
BCC vs Periphery
Since the BCC reveals multiple paths to boost the reputation of s, we expect it to contain a higher percentage of untrustworthy sites
The Periphery of the BCC, on the other hand, should have a significantly lower percentage of untrustworthy sites
[Figure: the BCC and its Periphery]
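One way to check this expectation is to compare the share of untrustworthy sites in the two sets. The sketch below assumes each site has been judged by hand and stored in a hypothetical `labels` mapping; the function name, the label strings, and the toy data are made up for illustration and are not results from the slides.

```python
def untrustworthy_share(sites, labels):
    """Fraction of the judged sites in `sites` labelled untrustworthy.

    Sites without a label are left out; returns None if none were judged.
    """
    judged = [labels[site] for site in sites if site in labels]
    if not judged:
        return None
    return judged.count("untrustworthy") / len(judged)


# Made-up judgments for a toy neighborhood ("d" was never judged).
toy_labels = {
    "s": "untrustworthy",
    "a": "untrustworthy",
    "b": "trustworthy",
    "c": "trustworthy",
}
toy_bcc = {"s", "a", "b"}
toy_periphery = {"c", "d"}

bcc_share = untrustworthy_share(toy_bcc, toy_labels)
periphery_share = untrustworthy_share(toy_periphery, toy_labels)
print(f"BCC: {bcc_share:.2f}, Periphery: {periphery_share:.2f}")
```

Leaving unjudged sites out of the denominator is a design choice made here; counting them as trustworthy would lower both percentages.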
9
Explored neighborhoods
10
Evaluated Experimental Results
The trustworthiness of the starting site is a very good predictor of the trustworthiness of the BCC sites
The BCC is significantly more predictive of untrustworthiness than the Periphery
[Figure: results for the BCC and the Periphery]
11
Link Farms vs MAS