Antisocial Behavior in Online Discussion Communities. Authors: Justin Cheng, Cristian Danescu-Niculescu-Mizil, Jure Leskovec. Presented by: Ananya Subburathinam.

Introduction
- Characterizing antisocial behavior in online communities: trolling, flaming, bullying, harassment, etc.
- Purpose: identify and understand antisocial behavior
- Important to community maintainers

Data Used
- CNN.com, Breitbart.com, IGN.com
- 18 months of activity, 1.7 million users
- 40 million posts, 100 million votes
- Permanent ban lists
- Timestamped user activity

Characterizing Antisocial Behavior
Compare Never-Banned Users (NBUs) and Future-Banned Users (FBUs). FBU characteristics:
- Write less similarly to other users
- Harder to understand
- Less positive language, more profanity
- Concentrate more of their posts in individual threads

Typology of Antisocial Users
High-deletion-rate FBUs vs. low-deletion-rate FBUs:
- High-deletion-rate FBUs write less similarly to others and post more in each discussion they join
- Low-deletion-rate FBUs spread their posts across many discussions and attract less attention
Who gets worse with time, and who gets better?

Predicting Future Banning
Features used: post content, user activity, community response, moderators' actions.
Reliable prediction from a user's first 5–10 posts, with over 80% AUC.

Related Work
- What counts as "antisocial"? Trolling ("negatively marked online behavior"), flaming, griefing
- This paper studies antisocial users and their evolution over time via large-scale, data-driven analysis
- Goals: quantitative insights, early detection of trolls
- Other work: detecting Wikipedia vandalism, bad behavior in online games

Data Preparation: Measuring Undesired Behavior
- Possible signals: downvotes, comments, reports, deletions, bans
- Deletions and bans are the most precise
- Temporary bans and link spammers are excluded

Data Preparation: Matching FBUs and NBUs
FBUs post significantly more than NBUs, causing a large difference between the groups, so each FBU is matched with a similar NBU.
Measuring text quality:
- Dictionary-based approaches and classifiers trained on deleted posts have flaws
- Human judgment: 6,000 random posts labeled on a 1–5 scale; a rating of 3 or lower indicates an inappropriate post
- A logistic regression model trained on these labels achieves an AUC of 0.70
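The text-quality model can be sketched as logistic regression over simple surface features of a post. The following is a minimal pure-Python illustration, not the authors' pipeline: the tiny corpus, the lexicons, and the two features (profanity rate, positive-word rate) are hypothetical stand-ins for the crowd-labeled data.

```python
import math

# Hypothetical mini-corpus of (post_text, crowd_rating); per the paper's
# labeling scheme, ratings of 3 or lower count as inappropriate.
POSTS = [
    ("thanks for the thoughtful writeup", 5),
    ("interesting point that i had not considered", 4),
    ("you are an idiot and this is garbage", 1),
    ("shut up nobody asked you", 2),
    ("what a stupid take go away", 1),
    ("great discussion i learned a lot", 5),
]

SWEAR_WORDS = {"idiot", "stupid", "garbage"}                     # toy lexicons,
POSITIVE_WORDS = {"thanks", "great", "interesting", "thoughtful"}  # not LIWC

def features(text):
    words = text.split()
    return [
        sum(w in SWEAR_WORDS for w in words) / len(words),       # profanity rate
        sum(w in POSITIVE_WORDS for w in words) / len(words),    # positivity rate
    ]

def train_logreg(X, y, lr=1.0, epochs=3000):
    """Plain stochastic gradient descent on the logistic loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1 / (1 + math.exp(-z))
            g = p - yi                                           # dLoss/dz
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def inappropriate_prob(w, b, text):
    z = sum(wj * xj for wj, xj in zip(w, features(text))) + b
    return 1 / (1 + math.exp(-z))

X = [features(t) for t, r in POSTS]
y = [1 if r <= 3 else 0 for t, r in POSTS]  # 1 = inappropriate (rating <= 3)
w, b = train_logreg(X, y)
```

The real model uses far richer features and data; the point is only that a linear model over such signals already separates civil from uncivil posts reasonably well.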

Understanding Antisocial Behavior
FBUs:
- Write less similarly to others
- Have higher ARI scores (less readable text)
- Use fewer positive words and more swearing
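ARI, the Automated Readability Index, is a standard surface-level readability formula computed from character, word, and sentence counts; higher scores mean harder-to-read text. A minimal sketch of the metric:

```python
import re

def automated_readability_index(text):
    """ARI = 4.71*(chars/words) + 0.5*(words/sentences) - 21.43,
    where chars counts letters and digits only. Higher = less readable."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = text.split()
    chars = sum(len(re.sub(r"[^A-Za-z0-9]", "", w)) for w in words)
    return 4.71 * (chars / len(words)) + 0.5 * (len(words) / sentences) - 21.43
```

On this measure, long words and long sentences push the score up, which is why FBUs' harder-to-understand posts score higher than matched NBUs' posts.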

How do FBUs generate activity around themselves?
- Community dependent: on Breitbart and IGN they are more likely to respond to posts; on CNN they are more likely to start discussions
- They get more responses than average users
- They post more per thread than average users

Evolution Over Time
How do FBUs change over time? Their degrading text quality increases animosity from the community.

Evolution Over Time
The first 10% and last 10% of posts from 200 random NBUs and FBUs were rated 1–5 for appropriateness by Mechanical Turk workers.

Evolution Over Time
Effect of time: compare posts of similar text quality, one from the first 10% and one from the last 10% of a user's life. Time has a significant effect on post deletion.
Effect of censorship: form two groups, users with 4 or more vs. 1 or fewer of their first five posts deleted. Match users from the two groups on the text quality of those posts, then compare the quality of subsequent posts. A significant difference is observed.

Types of Antisocial Users
Post deletion rates among FBUs are bimodal: Hi-FBUs and Lo-FBUs.
- Hi-FBUs: get more responses, lower post quality, fewer threads, more posts per thread
- Lo-FBUs: lower post deletion rate, but it rises in the second half of their life; first half spent in more threads, second half with more posts in fewer threads

Antisocial Behavior in Two Phases
Fit a linear regression over the post deletion rates in each half of a user's life; the slopes of the two fitted lines are m1 and m2.
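The two-phase fit can be sketched as follows; the deletion-rate trajectory below is hypothetical, standing in for a real FBU's lifetime bucketed into ten equal slices.

```python
# Hypothetical per-bucket deletion rates over ten equal slices of a user's life.
deletion_rates = [0.05, 0.06, 0.08, 0.10, 0.14, 0.18, 0.24, 0.30, 0.38, 0.45]

def slope(xs, ys):
    """Least-squares slope of y on x (the simple linear-regression coefficient)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

half = len(deletion_rates) // 2
m1 = slope(range(half), deletion_rates[:half])   # first half of life
m2 = slope(range(half), deletion_rates[half:])   # second half of life
# m2 > m1: this user's posts are being deleted at an accelerating rate.
```

Comparing m1 and m2 is what lets the paper separate users whose behavior worsens over time from those who improve.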

Identifying Antisocial Users
Feature groups:
- Post features: undesirable content, less readable text
- Activity features: time spent in individual threads, posts per day, votes
- Community features: who interacts with the user
- Moderator features: posts deleted, slopes and intercepts of deletion rates

Predicting Antisocial Behavior
- Prediction from a user's first ten posts achieves an accuracy of 0.74
- The classifier is robust even without moderator features (AUC of 0.79)
- Using only the proportion of deleted posts: AUC of 0.73
- Baseline bag-of-words model: AUC of 0.69 (lower when transferred across communities)
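AUC, the evaluation metric quoted throughout, is the probability that a randomly chosen positive example (an FBU) receives a higher classifier score than a randomly chosen negative one (an NBU), with ties counting half. A minimal sketch with made-up scores:

```python
def auc(labels, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical ban-classifier scores: label 1 = eventually banned (FBU), 0 = NBU.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
# auc(labels, scores) -> 0.75: three of the four FBU/NBU pairs are ranked correctly.
```

A perfect ranking gives an AUC of 1.0 and a random one about 0.5, which is why the paper's 0.80 from only ten posts is a strong result.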

Possible Improvements Suggested by the Authors
- More fine-grained user labeling
- Analysis of user interactions within posts
- Effect of groups of users

Questions
- Is the term "antisocial behavior/user" specific to a given social platform? If so, how would the authors' approach scale to other social platforms?
- What characteristics or features must be taken into account to detect more deceptive forms of antisocial behavior?
- The authors claim that "the more posts a user eventually makes, the more difficult it is to predict whether they will eventually get banned," but this is unclear from the paper. Why is prediction harder for users who post more?
- If a person has an emotional outburst on a particular topic and is classified as an FBU, their every post will be scrutinized more closely. Is that really necessary?

Questions
- How could the authors have added to the features used to identify antisocial behavior?
- How does the automatic tool for identifying antisocial users from their first 10 posts work if a user posts positive comments initially but turns negative later?
- Since the analysis is limited to permanently banned users, can this classifier guarantee that an innocent user will not be banned?
- If a user posts antisocial content only once in a while, how can that kind of user be identified?