Natural Language Processing Lab
National Taiwan University

The Splog Detection Task and A Solution Based on Temporal and Link Properties
Yu-Ru Lin et al., NEC America
TREC 2006 (Blog session)
Presenter: Chun-Yuan Teng

Splog characteristics
Machine-generated content
No value addition
–No unique information for readers
Hidden agenda, usually an economic goal
–Commercial intention

Uniqueness of splogs
Dynamic content
–Unlike web spam, a splog generates fresh content to drive traffic
Non-endorsement links
–On the web, a hyperlink is usually an endorsement of the target page
–Spammers can create hyperlinks in normal blogs, so a link in a blog is not necessarily an endorsement

Features for detecting splogs
Traditional features
–Tokenized URL, blog and post titles, homepage content, and post content
Temporal regularity
–Temporal content regularity / temporal structural regularity
Link regularity
–Consistency of the target websites

Temporal Content Regularity

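The figure for this slide is not reproduced here. As a rough illustration of what "content regularity over time" could mean, the sketch below averages the similarity between consecutive posts, so a blog that keeps repeating the same vocabulary scores high. The function names and the choice of Jaccard similarity are assumptions made for this example, not the measure defined in the paper.

```python
def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def content_regularity(posts):
    """Mean similarity between consecutive posts (hypothetical measure).

    posts: list of post texts in chronological order.
    Returns a value in [0, 1]; higher means more repetitive content.
    """
    # Whitespace tokenization is a simplification for the sketch.
    token_sets = [set(p.lower().split()) for p in posts]
    if len(token_sets) < 2:
        return 0.0
    sims = [jaccard(x, y) for x, y in zip(token_sets, token_sets[1:])]
    return sum(sims) / len(sims)
```

A score near 1 would be one possible signal of machine-generated content; on its own it is unlikely to be decisive.
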
Temporal Structural Regularity

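The figure for this slide is not reproduced here. One plausible way to quantify regularity in posting times is the entropy of the gaps between consecutive posts: posts that arrive at near-constant intervals give low entropy. The sketch below assumes timestamps in seconds and a hypothetical one-hour bin width; it illustrates the idea and is not the estimator used in the paper.

```python
import math
from collections import Counter


def interval_entropy(post_times, bin_seconds=3600):
    """Shannon entropy (bits) of binned gaps between consecutive posts.

    post_times: posting timestamps in seconds, in any order.
    bin_seconds: hypothetical bin width used to group gaps.
    Low entropy means posts arrive at near-constant intervals.
    """
    times = sorted(post_times)
    gaps = [b - a for a, b in zip(times, times[1:])]
    if not gaps:
        return 0.0
    counts = Counter(int(g // bin_seconds) for g in gaps)
    total = len(gaps)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```
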
Link Regularity estimation

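The figure for this slide is not reproduced here. The deck describes link regularity as consistency in the target website. A minimal sketch of that idea, assuming the outgoing links of a blog have already been extracted as absolute URLs, is the share of links that point to the single most common host. The function name and the scoring rule are assumptions for illustration, not the paper's estimation procedure.

```python
from collections import Counter
from urllib.parse import urlparse


def target_consistency(links):
    """Fraction of outgoing links that point to the most common host.

    links: list of absolute URLs extracted from a blog's posts.
    Returns 0.0 when there are no usable links.
    """
    hosts = [urlparse(u).netloc.lower() for u in links]
    hosts = [h for h in hosts if h]  # drop relative or malformed URLs
    if not hosts:
        return 0.0
    most_common_count = Counter(hosts).most_common(1)[0][1]
    return most_common_count / len(hosts)
```

A value close to 1 means nearly every link drives traffic to the same site, which fits the "hidden agenda" described earlier in the deck.
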
Two kinds of spam detection
Offline detection
–Traditional measurement
Online detection
–Detect spam online

Experimental results (Offline)

Experimental results (Offline)

Online indexing in a blog search engine

Online test

Online test in this paper

Experimental results

Conclusion and contributions
Modeling the splog problem
–The uniqueness of splogs
Regularity-based detection
–Content and post time
Evaluation
–Online evaluation