Finding High-Quality Content in Social Media chenwq 2011/11/26.



Authors
Eugene Agichtein, Emory University
Research: Intelligent Information Access Lab (IRLab)
News: our team wins the "Best Paper" award at SIGIR 2011.

Abstract
Since the early 2000s, user-generated content has become popular on the web.
The quality of user-generated content varies drastically, from excellent to abuse and spam.
Goal: to separate high-quality content from the rest automatically.
Graph-based framework
–combines the different sources of evidence in a classification formulation

Contents
–Related work
–Content quality analysis
–Modeling content quality
–Experiment & conclusion

Related work
–Link analysis in social media
–Propagating reputation
–Question/answering portals and forums
–Expert finding
–Text analysis for content quality
–Implicit feedback for ranking

Related work
Link analysis in social media
–G = (V, E)
–V corresponds to the users of a question/answer system
–a directed edge e = (u, v) ∈ E goes from a user u ∈ V to a user v ∈ V if user u has answered at least one question of user v
–G’ = (V, E’)
PageRank, ExpertiseRank, HITS
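
The answer graph defined above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the damping factor of 0.85 and the toy user names are not from the slides, and only plain PageRank is shown (ExpertiseRank and HITS are not covered here).

```python
from collections import defaultdict

def build_answer_graph(answers):
    """answers: iterable of (answerer, asker) pairs. Returns u -> set of v."""
    graph = defaultdict(set)
    for answerer, asker in answers:
        graph[answerer].add(asker)  # edge (u, v): u answered a question of v
        graph[asker]                # touch the key so the asker is a vertex too
    return dict(graph)

def transpose(graph):
    """Reverse every edge of the graph."""
    rev = {u: set() for u in graph}
    for u, outs in graph.items():
        for v in outs:
            rev[v].add(u)
    return rev

def pagerank(graph, damping=0.85, iterations=50):
    """Power iteration; the mass of vertices without out-edges is spread uniformly."""
    nodes = list(graph)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        dangling = sum(rank[u] for u in nodes if not graph[u])
        new = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            for v in graph[u]:
                new[v] += damping * rank[u] / len(graph[u])
        rank = new
    return rank

# Hypothetical toy data: alice and carol both answered questions asked by bob.
g = build_answer_graph([("alice", "bob"), ("carol", "bob"), ("alice", "carol")])
scores = pagerank(g)
```

Calling `pagerank(transpose(g))` runs the same iteration with every edge reversed.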


CONTENT QUALITY ANALYSIS: Intrinsic content quality
As a baseline, we use textual features only: all word n-grams up to length 5 that appear in the collection more than 3 times are used as features.
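
A minimal sketch of such a baseline feature extractor follows. The lowercasing, whitespace tokenization and counting of total occurrences (rather than documents) are assumptions, and the function names are hypothetical.

```python
from collections import Counter

def word_ngrams(text, max_n=5):
    """Yield all word n-grams of length 1..max_n (assumed whitespace tokenization)."""
    tokens = text.lower().split()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])

def build_vocabulary(collection, max_n=5, min_count=4):
    """Keep n-grams seen more than 3 times (count >= 4) in the whole collection."""
    counts = Counter()
    for text in collection:
        counts.update(word_ngrams(text, max_n))
    return sorted(g for g, c in counts.items() if c >= min_count)

def featurize(text, vocabulary, max_n=5):
    """Map a text to a vector of n-gram counts, one entry per vocabulary n-gram."""
    counts = Counter(word_ngrams(text, max_n))
    return [counts[g] for g in vocabulary]

# Hypothetical toy collection.
collection = ["how do i fix this"] * 4 + ["totally different text"]
vocab = build_vocabulary(collection)
vector = featurize("how do you", vocab)
```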

CONTENT QUALITY ANALYSIS: Intrinsic content quality
Punctuation and typos
1. Punctuation
2. Capitalization
3. Spacing density
4. Character-level entropy
5. Spelling mistakes
6. Out-of-vocabulary words
Syntactic and semantic
1. Average number of syllables per word
2. Entropy of word lengths
3. Readability measures
Grammaticality
1. Part-of-speech sequences
2. Formality score
3. Distance between the text's (trigram) language model and several given language models
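
A few of the surface features listed above can be sketched as follows. The exact definitions (densities as fractions of characters, base-2 entropy) are assumptions made for illustration. Spelling, readability and part-of-speech features need external resources and are left out.

```python
import math
import string
from collections import Counter

def shannon_entropy(items):
    """Base-2 Shannon entropy of the empirical distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def surface_features(text):
    """Assumed definitions of some punctuation/typo and syntactic features."""
    words = text.split()
    n_chars = max(len(text), 1)
    letters = [ch for ch in text if ch.isalpha()]
    return {
        "punctuation_density": sum(ch in string.punctuation for ch in text) / n_chars,
        "capitalization_ratio": sum(ch.isupper() for ch in letters) / max(len(letters), 1),
        "spacing_density": sum(ch.isspace() for ch in text) / n_chars,
        "char_entropy": shannon_entropy(text),
        "word_length_entropy": shannon_entropy(len(w) for w in words),
    }

features = surface_features("WHY is my answer not showing up???")
```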

CONTENT QUALITY ANALYSIS: User relationships
Graph of items and users: user u answers question q
User-user graph: an edge from u to v means that u has answered a question from user v

CONTENT QUALITY ANALYSIS: Usage statistics
–The number of clicks on an item
–The dwell time on an item

CONTENT QUALITY ANALYSIS: Classification framework
We cast the problem of quality ranking as a binary classification problem
–support vector machines
–log-linear classifiers
–stochastic gradient boosted trees
Our goal is to discover interesting, well-formulated and factually accurate content
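
To make the "ranking as binary classification" idea concrete, here is a small sketch that uses logistic regression as the simplest log-linear classifier. It is not the authors' setup: the two features, the learning rate and the toy labels are invented for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=500):
    """Stochastic gradient descent on the logistic loss; returns (weights, bias)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = p - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def quality_score(x, w, b):
    """Estimated probability that an item belongs to the high-quality class."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Hypothetical features: [answerer reputation, typo rate]; label 1 = high quality.
X = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.9], [0.1, 0.7]]
y = [1, 1, 0, 0]
w, b = train_logistic(X, y)

# Ranking by quality: sort items by the classifier's score, best first.
ranked = sorted(X, key=lambda x: quality_score(x, w, b), reverse=True)
```

Sorting by the predicted probability is what turns the binary classifier into a quality ranker.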


MODELING CONTENT QUALITY: user relationships
Our dataset can be viewed as a graph, as illustrated in Figure 1

MODELING CONTENT QUALITY: user relationships
The relationships between questions, users asking and answering questions, and answers can be captured by a tripartite graph, outlined in Figure 2

MODELING CONTENT QUALITY: user relationships
The unique characteristics of the community question/answering domain

MODELING CONTENT QUALITY: user relationships
Question subtree
–Q: features from the question being answered
–QU: features from the asker of the question being answered
–QA: features from the other answers to the same question

MODELING CONTENT QUALITY: user relationships
User subtree
–UA: features from the answers of the user
–UQ: features from the questions of the user
–UV: features from the votes of the user
–UQA: features from answers received to the user’s questions
–U: other user-based features
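
One way to picture the question subtree and user subtree is as a function that, for a given answer, walks to the related questions, users and answers and aggregates simple statistics. The sketch below is an assumption-laden illustration: the record layout, the choice of counts and mean lengths as features, and every feature name are hypothetical, and the UV, UQA and U groups are omitted.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0

def answer_features(answer_id, answers, questions):
    """answers: id -> {"user", "question", "length"}; questions: id -> {"user", "length"}."""
    ans = answers[answer_id]
    q = questions[ans["question"]]
    asker, answerer = q["user"], ans["user"]
    # Question subtree: other answers to the same question.
    others = [a for i, a in answers.items()
              if a["question"] == ans["question"] and i != answer_id]
    # User subtree: everything the answerer has answered or asked.
    by_answerer = [a for a in answers.values() if a["user"] == answerer]
    asked_by_answerer = [x for x in questions.values() if x["user"] == answerer]
    return {
        "Q_length": q["length"],
        "QU_questions_asked": sum(1 for x in questions.values() if x["user"] == asker),
        "QA_other_answers": len(others),
        "QA_mean_length": mean(a["length"] for a in others),
        "UA_answers_given": len(by_answerer),
        "UA_mean_length": mean(a["length"] for a in by_answerer),
        "UQ_questions_asked": len(asked_by_answerer),
    }

# Hypothetical toy records.
questions = {"q1": {"user": "bob", "length": 12},
             "q2": {"user": "bob", "length": 8},
             "q3": {"user": "alice", "length": 5}}
answers = {"a1": {"user": "alice", "question": "q1", "length": 40},
           "a2": {"user": "carol", "question": "q1", "length": 20},
           "a3": {"user": "alice", "question": "q2", "length": 10}}
features = answer_features("a1", answers, questions)
```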

MODELING CONTENT QUALITY: user relationships
Question features

MODELING CONTENT QUALITY: user relationships
Implicit user-user relations
G = (V, E)
–E = E_a ∪ E_b ∪ E_v ∪ E_s ∪ E_+ ∪ E_−
G_x = (V, E_x)
–h_x: the vector of hub scores on the vertices V
–a_x: the vector of authority scores
–p_x: the vector of PageRank scores
–p'_x: the vector of PageRank scores in the transposed graph
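
For one edge type x, the hub and authority vectors h_x and a_x can be computed with the standard HITS iteration. The sketch below assumes L2 normalization and a fixed number of iterations, and the toy edge list is hypothetical. PageRank on G_x and on its transpose is not shown here.

```python
import math

def hits(edges, iterations=50):
    """HITS on a directed edge list; returns (hub, authority) score dicts."""
    nodes = sorted({u for e in edges for u in e})
    hub = {u: 1.0 for u in nodes}
    auth = {u: 1.0 for u in nodes}
    for _ in range(iterations):
        # Authority of v: sum of hub scores of the vertices pointing to v.
        auth = {u: 0.0 for u in nodes}
        for u, v in edges:
            auth[v] += hub[u]
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {u: a / norm for u, a in auth.items()}
        # Hub of u: sum of authority scores of the vertices u points to.
        new_hub = {u: 0.0 for u in nodes}
        for u, v in edges:
            new_hub[u] += auth[v]
        norm = math.sqrt(sum(h * h for h in new_hub.values())) or 1.0
        hub = {u: h / norm for u, h in new_hub.items()}
    return hub, auth

# Hypothetical edge sets, one per edge type x; each yields its own (h_x, a_x).
edges_by_type = {
    "a": [("alice", "bob"), ("carol", "bob"), ("alice", "dave")],
    "v": [("bob", "alice")],
}
scores_by_type = {x: hits(e_x) for x, e_x in edges_by_type.items()}
```

Each user then contributes one hub score and one authority score per edge type to the feature vector.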

MODELING CONTENT QUALITY: user relationships
Implicit user-user relations

MODELING CONTENT QUALITY: user relationships
Content features for QA
–goal: to identify the most salient features for the specific tasks of question or answer quality classification
–the KL-divergence between the language models of the two texts
–their non-stopword overlap
–the ratio between their lengths
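
The three pairwise features can be sketched as below, assuming the two texts are a question and one of its answers. The unigram models with add-alpha smoothing, the Jaccard form of the overlap and the tiny stopword list are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Illustrative stopword list only; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "of", "to", "in", "and", "i", "it", "do", "how"}

def tokenize(text):
    return text.lower().split()

def kl_divergence(text_p, text_q, alpha=1.0):
    """KL(P || Q) between add-alpha smoothed unigram models over the joint vocabulary."""
    p_counts, q_counts = Counter(tokenize(text_p)), Counter(tokenize(text_q))
    vocab = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + alpha * len(vocab)
    q_total = sum(q_counts.values()) + alpha * len(vocab)
    kl = 0.0
    for w in vocab:
        p = (p_counts[w] + alpha) / p_total
        q = (q_counts[w] + alpha) / q_total
        kl += p * math.log(p / q)
    return kl

def nonstopword_overlap(text_a, text_b):
    """Jaccard overlap of the two texts' non-stopword vocabularies."""
    a = set(tokenize(text_a)) - STOPWORDS
    b = set(tokenize(text_b)) - STOPWORDS
    return len(a & b) / len(a | b) if a | b else 0.0

def length_ratio(text_a, text_b):
    """Ratio of token counts; the denominator is clamped to avoid division by zero."""
    return len(tokenize(text_a)) / max(len(tokenize(text_b)), 1)

question = "how do i reset the router"
answer = "hold the reset button on the router for ten seconds"
pair_features = (kl_divergence(question, answer),
                 nonstopword_overlap(question, answer),
                 length_ratio(question, answer))
```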

MODELING CONTENT QUALITY: user relationships
Usage features for QA
–number of item views (clicks)
–metadata of the question: how long ago the question was posted
–derived statistics: the expected number of views for a given category; the deviation from the expected number of views
–other second-order statistics: the click frequency
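
These usage features can be derived from a simple view log, as in the sketch below. It assumes that the expected number of views for a category is the mean over items in that category and that click frequency means views per unit of time since posting. Both definitions, and the record fields, are hypothetical.

```python
from collections import defaultdict

def usage_features(items, now):
    """items: list of dicts with "id", "category", "views", "posted_at" (same time unit as now)."""
    views_by_category = defaultdict(list)
    for it in items:
        views_by_category[it["category"]].append(it["views"])
    # Assumed definition: expected views = mean views of items in the category.
    expected = {c: sum(v) / len(v) for c, v in views_by_category.items()}
    features = {}
    for it in items:
        age = max(now - it["posted_at"], 1)  # clamp to avoid division by zero
        features[it["id"]] = {
            "views": it["views"],
            "age": age,
            "expected_views": expected[it["category"]],
            "view_deviation": it["views"] - expected[it["category"]],
            "click_frequency": it["views"] / age,
        }
    return features

# Hypothetical toy log.
items = [
    {"id": "q1", "category": "cars", "views": 30, "posted_at": 0},
    {"id": "q2", "category": "cars", "views": 10, "posted_at": 50},
    {"id": "q3", "category": "pets", "views": 5, "posted_at": 90},
]
feats = usage_features(items, now=100)
```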


Experiment & Conclusions: EXPERIMENTAL SETTING
Dataset
Edges induced from the whole dataset.

Experiment & Conclusions: EXPERIMENTAL SETTING
Dataset statistics


Thanks for your attention!