1 Review Spam Detection via Temporal Pattern Discovery Sihong Xie, Guan Wang, Shuyang Lin, Philip S. Yu Department of Computer Science University of Illinois at Chicago

2 What are review spams? Fake reviews created on review websites to create positive impressions of bad products/stores and to profit from misled customers. They are harmful: they lead to poor customer experience and ruin the reputation of good stores. Guidelines exist to help humans spot fake reviews, but spotting them is hard for machines ([1]). (Speaker note: give some examples of spam reviews with brief descriptions.) http://consumerist.com/2010/04/how-you-spot-fake-online-reviews.html

3 Human-friendly clues of spam: language features [5]
1. All praise
2. Says nothing about the product
3. Red-flag words
4. Mentions the product name a lot
Too hard for machines: involves natural language processing.

4 Machine-friendly clues of spam: similar reviews (texts and ratings) on one product or product group within a short time [1,2]

5 Machine-friendly clues of spam: groups of spammers who frequently wrote reviews together on the same set of products/stores [3]. (Figure: Reviewer 1, Reviewer 2 and Reviewer 3 all reviewing the same stores.)

6 How to play the spamming game (Player 1: detection systems vs. Player 2: spammers)
- Spammers post duplicated reviews (two reviews that are almost the same) → easy to detect with shingling (a minimal shingling sketch follows below)
- Spammers switch to more sophisticated writing under one id → detection based on statistics of the reviewer is easy [3]
- Spammers use different reviewer ids → detection turns to group spamming: a group of reviewers who frequently write reviews together; other kinds: targeting the same product, similar texts/ratings by one id
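A minimal Python sketch of the shingling check mentioned above; the shingle size and similarity threshold are illustrative assumptions, not values from the paper.

def shingles(text, w=3):
    # Return the set of w-word shingles of a review text.
    words = text.lower().split()
    return {" ".join(words[i:i + w]) for i in range(max(len(words) - w + 1, 1))}

def jaccard(a, b):
    # Jaccard similarity between two shingle sets.
    return len(a & b) / len(a | b) if a and b else 0.0

def near_duplicate(r1, r2, threshold=0.8):
    # Flag two reviews as near-duplicates when their shingle sets largely overlap.
    return jaccard(shingles(r1), shingles(r2)) >= threshold

# Example: compare two suspiciously similar reviews posted under different ids.
print(jaccard(shingles("great store fast shipping great prices highly recommended"),
              shingles("great store fast shipping great prices highly recommend")))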

7 Failed machine-friendly clues of spam: if these reviews had been posted by the same id, they would have been easy to detect.

8 Failed machine-friendly clues of spam: when the same id writes multiple reviews, detection is easy; smart spammers avoid this. (Figure: the same reviews spread across Reviewer 1, Reviewer 2 and Reviewer 3.)

9 Spammers like singleton spam. Strong motivations for singleton spam:
1. Need to boost the rating in a short time
2. Need to avoid being caught
3. So they post reviews with high ratings under different names in a short time

10 Singleton reviews. A singleton reviewer id contributes only one review, for one store only; a physical person can register many reviewer ids. (Figure: a spammer registering many ids, each posting a single positive review to a store, contrasted with a normal, non-singleton reviewer.) A small sketch of how singletons can be counted follows.
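A minimal sketch of how singleton reviewer ids could be identified from a list of review records; the 'reviewer_id' field name is a hypothetical assumption about the data layout.

from collections import Counter

def singleton_stats(reviews):
    # reviews: iterable of dicts, each with at least a 'reviewer_id' key.
    counts = Counter(r["reviewer_id"] for r in reviews)
    singleton_ids = {rid for rid, c in counts.items() if c == 1}
    singleton_reviews = [r for r in reviews if r["reviewer_id"] in singleton_ids]
    return singleton_ids, singleton_reviews

# Example: three ids, two of which wrote exactly one review each.
reviews = [{"reviewer_id": "a"}, {"reviewer_id": "b"},
           {"reviewer_id": "c"}, {"reviewer_id": "c"}]
ids, srs = singleton_stats(reviews)
print(len(ids), len(srs))   # 2 singleton reviewers, 2 singleton reviews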

11 Facts of singleton reviews. They constitute a large portion of all the reviews: over 90% of the reviewers in our dataset wrote only one review (singleton reviews make up 76% of all reviews); a similar situation holds in another dataset [4]. They are more influential, hence more harmful.

12 The challenges: traditional clues and their shortcomings
- Review features (bag of words, ratings, brand-name references) [4]: hard for humans, not to mention machines
- Reviewer features (rating behaviors) [1]: poor if the reviewer wrote only one review
- Product/store features [4]: tell little about individual reviews
- Review/reviewer/store reinforcement [6]: fails on a large number of spam reviews with consistent ratings
- Group spamming [2,3]: not applicable to singleton reviews
- Singleton review detection [7]*: finds suspicious hotels, cannot find individual singleton spam
* [7] is a supervised method, and our conclusions contrast with theirs

13 The proposed method. Recall the motivations of singleton review spam: boost the ratings in a short time and avoid being caught. The result: within a short time, many reviewers each write only one review with a very high rating. The correlation between rating and volume of (singleton) reviews is therefore the key feature of singleton review spamming (a toy illustration follows).
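A toy illustration of that correlation signal, assuming NumPy is available; this is a sketch of the intuition, not the exact statistic used in the paper.

import numpy as np

# One value per time window: average rating, review volume, singleton-review ratio.
avg_rating = np.array([3.1, 3.0, 3.2, 4.8, 4.9, 3.1])
volume     = np.array([10,  12,   9,  60,  55,  11 ])
sr_ratio   = np.array([0.3, 0.4, 0.3, 0.9, 0.95, 0.35])

# In an attack window all three burst together, so their pairwise correlations are high.
print(np.corrcoef(np.vstack([avg_rating, volume, sr_ratio])))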

14 Detected burst of singleton spams. (Figure: average rating, number of reviews and ratio of singleton reviews plotted over time, with a suspicious time window where all three burst together.)

15 The algorithm
1. For each store:
   A. split the whole period into small time windows
   B. compute the average rating, total number of reviews and percentage of singleton reviews in each window
   C. form a three-dimensional time series
   D. detect windows with correlated burst patterns
   E. for each detected window, repeat steps A-D until the window size becomes too small
A sketch of these steps is given after this list.
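A minimal Python sketch of steps A-E; the field names ('time', 'rating', 'reviewer_id') and the burst thresholds are placeholder assumptions, and the simple threshold test stands in for the paper's actual correlated-burst detection.

import numpy as np

def window_features(reviews, singleton_ids):
    # Step B: average rating, volume and singleton-review ratio of one window.
    ratings = [r["rating"] for r in reviews]
    n_single = sum(1 for r in reviews if r["reviewer_id"] in singleton_ids)
    return np.mean(ratings), len(reviews), n_single / len(reviews)

def detect(store_reviews, singleton_ids, start, end, size, min_size):
    # Steps A, C, D, E: slide a window over [start, end); recurse into suspicious windows.
    suspicious = []
    t = start
    while t < end:
        in_win = [r for r in store_reviews if t <= r["time"] < t + size]
        if in_win:
            avg, vol, sr = window_features(in_win, singleton_ids)
            if avg >= 4.5 and vol >= 30 and sr >= 0.8:        # placeholder burst test (step D)
                if size / 2 >= min_size:                      # step E: zoom into the detected window
                    suspicious += detect(store_reviews, singleton_ids,
                                         t, t + size, size / 2, min_size)
                else:
                    suspicious.append((t, t + size))
        t += size
    return suspicious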

16 The algorithm: a worked example. Reviews are sorted by posting time and divided into groups (windows); each window yields one point of the multi-dimensional time series:
- window 1: ratings 1, 3, 2 → average rating 2, review volume 3, SR ratio 1/3
- window 2: ratings 4, 5, 5, 4, 5 → average rating 4.6, review volume 5, SR ratio 5/5 (the correlated burst)
- window 3: ratings 1, 3, 2 → average rating 2, review volume 3, SR ratio 3/3
The computation is sketched below.
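The same computation on illustrative data chosen to reproduce the per-window numbers on this slide (the exact reviews behind the original figure are not recoverable here).

windows = [
    {"ratings": [1, 3, 2],       "singletons": 1},   # avg 2.0, volume 3, SR ratio 1/3
    {"ratings": [4, 5, 5, 4, 5], "singletons": 5},   # avg 4.6, volume 5, SR ratio 5/5
    {"ratings": [1, 3, 2],       "singletons": 3},   # avg 2.0, volume 3, SR ratio 3/3
]
for w in windows:
    avg = sum(w["ratings"]) / len(w["ratings"])
    print(avg, len(w["ratings"]), w["singletons"] / len(w["ratings"]))
# The middle window, where all three features peak together, is the correlated burst.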

17 Dataset. A snapshot of a review website*: 408,469 reviews from 343,629 reviewers; 310,499 reviewers (> 90%) wrote only one review; 76% of the reviews are singleton reviews. We focus on the top 53 stores with over 1,000 reviews each. * www.resellerratings.com

18 Experimental results. 29 stores are regarded as suspicious by at least 2 out of 3 human evaluators. The proposed algorithm labeled 39 stores as suspicious (recall = 75.86%, precision = 61.11%).

19 Case studies. With a time window size of 30 days, correlated bursts are detected; the period containing the detected burst is then enlarged with a window size of 15 days, pin-pointing the exact time and shape of the bursts. Reported window statistics: volume of reviews 57 and 154, ratio of SRs 61% and 83%, average rating 4.56 and 4.79.

20 Case studies (cont.) Text features: the ratio of reviews talking about "customer service/support"; more than 80% of the singleton reviews are related to "customer service". "Hurry reviewers": those who wrote their only review at the same time as their id registration; most of the later reviews are written by such reviewers. Human validation: reading the reviews, we found a reviewer who disclosed being solicited for a 5-star review.

21 References
[1] Detecting Product Review Spammers Using Rating Behaviors
[2] Finding Unusual Review Patterns Using Unexpected Rules
[3] Spotting Fake Reviewer Groups in Consumer Reviews
[4] Opinion Spam and Analysis
[5] Finding Deceptive Opinion Spam by Any Stretch of the Imagination
[6] Review Graph Based Online Store Review Spammer Detection
[7] Merging Multiple Criteria to Identify Suspicious Reviews

22 the end

23 Examples (of spam reviews)
- All praise
- Posted in a short time
- Say nothing about the product
- Similar ratings
- Red-flag words
- Mention the product name a lot

24 Types of review spams
- Duplicates (easy: string matching)
- Advertisements
- Other easy-to-detect spam (all symbols, numbers, empty, etc.)
- Untruthful reviews (very hard: machines need to understand the intention of the review)

25 Feature-based methods. Paradigm: define features, build a training set and pick a classifier. Keys to success: good features, large training data and a powerful classifier. (A minimal sketch of this paradigm follows.)
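A minimal sketch of this paradigm, assuming scikit-learn is available and a labeled training set of reviews exists; the toy data, features and model choice are illustrative only.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts  = ["amazing product buy now best ever", "works as described, a bit slow"]  # toy training data
labels = [1, 0]                                                                    # 1 = spam, 0 = genuine

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())  # define features, pick a classifier
clf.fit(texts, labels)                                         # train on the labeled set
print(clf.predict(["best product ever buy now"]))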

26 Common but not fully investigated. Previous methods cannot catch singleton spam: each reviewer id has only one review, so many features used by previous methods simply become meaningless.

27 Traditional methods (cont.) [6] uses a graph to describe reinforcement relationships between entities: good/bad reviews influence their authors, who in turn influence the stores, which in turn influence their reviews. (Figure: reviews - reviewers - stores graph.) A simplified sketch of this reinforcement loop follows.
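A simplified Python sketch of the reinforcement loop described above; the update rules are placeholders for illustration and are not the exact formulas of [6].

def reinforce(reviews, n_iter=20):
    # reviews: list of dicts with 'id', 'reviewer', 'store' and 'rating' (1-5) keys.
    honesty = {r["id"]: 1.0 for r in reviews}           # review scores
    trust   = {r["reviewer"]: 1.0 for r in reviews}     # reviewer scores
    reliab  = {r["store"]: 0.5 for r in reviews}        # store scores in [0, 1]
    for _ in range(n_iter):
        # A review is honest if its rating agrees with its store's current reliability.
        for r in reviews:
            honesty[r["id"]] = 1.0 - abs(r["rating"] / 5.0 - reliab[r["store"]])
        # A reviewer is trusted if her reviews are honest.
        for rev in trust:
            rs = [r for r in reviews if r["reviewer"] == rev]
            trust[rev] = sum(honesty[r["id"]] for r in rs) / len(rs)
        # A store is reliable if trusted reviewers rate it highly.
        for s in reliab:
            rs = [r for r in reviews if r["store"] == s]
            w = sum(trust[r["reviewer"]] for r in rs)
            reliab[s] = sum(trust[r["reviewer"]] * r["rating"] / 5.0 for r in rs) / w if w else 0.5
    return honesty, trust, reliab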

28 Traditional methods (cont.) The weakness of the reinforcement approach: if spam reviews are posted in a short period with consistent ratings, they reinforce one another and the store will be regarded as a good one. (Figure: reviews - reviewers - stores graph.)

29 Traditional methods: what can you conclude from these features?
- Review features: rating 4; bag of words {switch, nook, kindle, why, what, learn}
- Reviewer features: name KKX; number of reviews 1; average rating 4
- Store/product features: Kindle; average rating 4/5 stars; price $199
- Group spamming: KKX wrote one review only, so it fails the frequency test

