1
Finding High-Quality Content in Social Media chenwq 2011/11/26
2
Authors
Eugene Agichtein, Emory University
Research: Intelligent Information Access Lab (IRLab)
News: our team wins the "Best Paper" award at SIGIR 2011.
3
Abstract
Since the early 2000s, user-generated content has become popular on the web. The quality of user-generated content varies drastically, from excellent to abuse and spam.
To separate high-quality content from the rest automatically
Graph-based framework
– combine the different sources of evidence in a classification formulation
4
Contents
1. Related work
2. Content quality analysis
3. Modeling content quality
4. Experiment & Conclusion
5
Related work
– Link analysis in social media
– Propagating reputation
– Question/answering portals and forums
– Expert finding
– Text analysis for content quality
– Implicit feedback for ranking
6
Related work: Link analysis in social media
– G = (V, E)
– V corresponds to the users of a question/answer system
– a directed edge e = (u, v) ∈ E goes from a user u ∈ V to a user v ∈ V if user u has answered at least one question of user v
– G’ = (V, E’)
PageRank, ExpertiseRank, HITS
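The graph definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the helper names `build_answer_graph` and `pagerank`, the damping factor 0.85, the fixed iteration count, and the sample user names are all assumptions made for the example.

```python
from collections import defaultdict

def build_answer_graph(answers):
    """Build G = (V, E) from (answerer, asker) pairs.

    An edge u -> v is added when user u answered a question of user v.
    """
    graph = defaultdict(set)
    for answerer, asker in answers:
        graph[answerer].add(asker)
        graph[asker]  # touch the key so the asker is a vertex even with no out-edges
    return dict(graph)

def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a dict of out-neighbour sets."""
    nodes = list(graph)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Rank held by vertices without out-edges is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not graph[u])
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            for v in graph[u]:
                new_rank[v] += damping * rank[u] / len(graph[u])
        rank = new_rank
    return rank

# Hypothetical data: alice and bob answered carol, carol answered dave.
sample_answers = [("alice", "carol"), ("bob", "carol"), ("carol", "dave")]
ranks = pagerank(build_answer_graph(sample_answers))
```

With this edge direction, rank flows from answerers to askers; running the same function on the reversed edges gives the opposite view.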
8
CONTENT QUALITY ANALYSIS: Intrinsic content quality
As a baseline, we use textual features only, with all word n-grams up to length 5 that appear in the collection more than 3 times used as features
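A minimal sketch of the n-gram baseline follows. It assumes whitespace tokenization, lowercasing, and that "appear in the collection more than 3 times" refers to total occurrences across all documents; none of these details are stated on the slide, and the function names are hypothetical.

```python
from collections import Counter

def word_ngrams(text, max_n=5):
    """Yield every word n-gram of length 1..max_n from a text."""
    tokens = text.lower().split()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])

def build_vocabulary(documents, max_n=5, min_count=3):
    """Keep the n-grams that occur more than min_count times in the collection."""
    counts = Counter()
    for doc in documents:
        counts.update(word_ngrams(doc, max_n))
    return {gram for gram, count in counts.items() if count > min_count}

def featurize(text, vocabulary, max_n=5):
    """Map a text to counts of the vocabulary n-grams it contains."""
    return Counter(g for g in word_ngrams(text, max_n) if g in vocabulary)
```

The resulting sparse count vectors can be fed to any of the standard classifiers.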
9
CONTENT QUALITY ANALYSIS: Intrinsic content quality
Punctuation and typos
1. Punctuation
2. Capitalization
3. Spacing density
4. Character-level entropy
5. Spelling mistakes
6. Out-of-vocabulary words
Syntactic and semantic
1. Average number of syllables per word
2. Entropy of word lengths
3. Readability measures
Grammaticality
1. Part-of-speech sequences
2. Formality score
3. Distance between the text's (trigram) language model and several given language models
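A few of the surface features listed above can be computed directly from the raw text. The sketch below is illustrative only: the exact definitions (densities as fractions of all characters, capitalization as the share of uppercase letters, Shannon entropy in bits) are assumptions, since the slide gives only the feature names.

```python
import math
import string
from collections import Counter

def shannon_entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def surface_features(text):
    """Compute a handful of punctuation, spacing, and entropy features."""
    n_chars = len(text)
    letters = [ch for ch in text if ch.isalpha()]
    n_punct = sum(ch in string.punctuation for ch in text)
    n_space = sum(ch.isspace() for ch in text)
    n_upper = sum(ch.isupper() for ch in letters)
    return {
        "punctuation_density": n_punct / n_chars if n_chars else 0.0,
        "space_density": n_space / n_chars if n_chars else 0.0,
        "capital_ratio": n_upper / len(letters) if letters else 0.0,
        "char_entropy": shannon_entropy(text),
        "word_length_entropy": shannon_entropy(len(w) for w in text.split()),
    }
```

Features such as spelling mistakes, readability, or part-of-speech sequences need external resources (dictionaries, taggers) and are left out of this sketch.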
10
CONTENT QUALITY ANALYSIS: User relationships
– Graph of items and users
– User-user graph: an edge u → v means that user u has answered a question from user v
11
CONTENT QUALITY ANALYSIS: Usage statistics
– The number of clicks on an item
– The dwell time on an item
12
CONTENT QUALITY ANALYSIS: Classification framework
We cast the problem of quality ranking as binary classification
– support vector machines
– log-linear classifiers
– stochastic gradient boosted trees
Our goal is to discover interesting, well-formulated, and factually accurate content
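The slide names three classifier families without further detail. As a rough illustration of the log-linear option only, the sketch below trains a logistic regression with stochastic gradient descent; the learning rate, epoch count, function names, and toy data are assumptions, and this is not presented as the authors' setup.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logistic(X, y, lr=0.1, epochs=200):
    """Fit weights w and bias b by SGD on the logistic loss.

    X is a list of feature vectors, y a list of 0/1 quality labels.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the loss with respect to the score
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict_proba(w, b, x):
    """Probability that item x belongs to the high-quality class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

Thresholding `predict_proba` at 0.5 gives a binary decision, while the raw probability can be used to rank items by estimated quality.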
14
MODELING CONTENT QUALITY: User relationships
Our dataset, viewed as a graph, is illustrated in Figure 1
15
MODELING CONTENT QUALITY: User relationships
The relationships between questions, users asking and answering questions, and answers can be captured by a tripartite graph outlined in Figure 2
16
MODELING CONTENT QUALITY: User relationships
The unique characteristics of the community question/answering domain
17
MODELING CONTENT QUALITY: User relationships
Question subtree
– Q: Features from the question being answered
– QU: Features from the asker of the question being answered
– QA: Features from the other answers to the same question
18
MODELING CONTENT QUALITY: User relationships
User subtree
– UA: Features from the answers of the user
– UQ: Features from the questions of the user
– UV: Features from the votes of the user
– UQA: Features from answers received to the user’s questions
– U: Other user-based features
19
MODELING CONTENT QUALITY: User relationships
Question features
20
MODELING CONTENT QUALITY: User relationships
Implicit user-user relations
G = (V, E)
– E = E_a ∪ E_b ∪ E_v ∪ E_s ∪ E_+ ∪ E_−
G_x = (V, E_x)
– h_x: the vector of hub scores on the vertices V
– a_x: the vector of authority scores
– p_x: the vector of PageRank scores
– p´_x: the vector of PageRank scores in the transposed graph
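Hub and authority scores for one edge type x can be obtained with the standard HITS iteration. The sketch below assumes a plain edge list per edge type, L2 normalization, and a fixed number of iterations; the function names `hits` and `transpose` are hypothetical. Running a PageRank routine on `transpose(edges)` would give the scores in the transposed graph.

```python
import math

def transpose(edges):
    """Reverse every edge, giving the transposed graph."""
    return [(v, u) for u, v in edges]

def _normalize(scores):
    """Scale a score dict to unit L2 norm (left unchanged if all zero)."""
    norm = math.sqrt(sum(s * s for s in scores.values())) or 1.0
    return {u: s / norm for u, s in scores.items()}

def hits(edges, iterations=50):
    """Return (hub, authority) score dicts for a list of directed edges."""
    nodes = {u for edge in edges for u in edge}
    hub = {u: 1.0 for u in nodes}
    auth = {u: 1.0 for u in nodes}
    for _ in range(iterations):
        # Authority of v: sum of hub scores of vertices pointing to v.
        auth = {u: 0.0 for u in nodes}
        for u, v in edges:
            auth[v] += hub[u]
        auth = _normalize(auth)
        # Hub of u: sum of authority scores of vertices u points to.
        hub = {u: 0.0 for u in nodes}
        for u, v in edges:
            hub[u] += auth[v]
        hub = _normalize(hub)
    return hub, auth
```

Repeating the computation once per edge set E_x yields one group of per-user scores for each relation type, which can then be used as classifier features.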
21
MODELING CONTENT QUALITY: User relationships
Implicit user-user relations
22
MODELING CONTENT QUALITY: User relationships
Content features for QA
– to identify the most salient features for the specific tasks of question or answer quality classification
  – the KL-divergence between the language models of the two texts
  – their non-stopword overlap
  – the ratio between their lengths
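The three question-answer content features above can be sketched as follows. The choices here are assumptions for illustration: unigram language models with add-one smoothing over the joint vocabulary, whitespace tokenization, a tiny hand-written stopword list, overlap measured as a count of shared word types, and the ratio taken as answer length over question length.

```python
import math
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "is", "of", "to", "and", "in", "what", "how"}

def unigram_model(tokens, vocabulary, alpha=1.0):
    """Smoothed unigram distribution over a shared vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / total for w in vocabulary}

def kl_divergence(p, q):
    """KL(p || q) for two distributions defined on the same keys."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

def qa_content_features(question, answer):
    """Compute KL-divergence, non-stopword overlap, and length ratio."""
    q_tokens = question.lower().split()
    a_tokens = answer.lower().split()
    vocabulary = set(q_tokens) | set(a_tokens)
    p = unigram_model(q_tokens, vocabulary)
    q = unigram_model(a_tokens, vocabulary)
    shared = (set(q_tokens) - STOPWORDS) & (set(a_tokens) - STOPWORDS)
    return {
        "kl_divergence": kl_divergence(p, q),
        "nonstopword_overlap": len(shared),
        "length_ratio": len(a_tokens) / len(q_tokens) if q_tokens else 0.0,
    }
```

Smoothing keeps the divergence finite when a word occurs in only one of the two texts.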
23
MODELING CONTENT QUALITY: User relationships
Usage features for QA
– number of item views (clicks)
– metadata of the question
  – how long ago the question was posted
– derived statistics
  – the expected number of views for a given category
  – the deviation from the expected number of views
– other second-order statistics
  – the click frequency
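The derived usage statistics above could be computed along these lines. The slide does not define them precisely, so this sketch assumes that the expected number of views is the mean view count within a category and that click frequency means views per day since posting; the record fields `category`, `views`, and `age_days` are hypothetical.

```python
from collections import defaultdict

def expected_views_by_category(items):
    """Mean number of views per category (assumed meaning of 'expected')."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for item in items:
        totals[item["category"]] += item["views"]
        counts[item["category"]] += 1
    return {cat: totals[cat] / counts[cat] for cat in totals}

def usage_features(item, expected):
    """Raw and derived usage features for a single question."""
    exp = expected[item["category"]]
    age = item["age_days"]
    return {
        "views": item["views"],
        "age_days": age,
        "expected_views": exp,
        "deviation": item["views"] - exp,
        # Views per day since posting; 0.0 for items posted today.
        "click_frequency": item["views"] / age if age > 0 else 0.0,
    }
```

Normalizing by category matters because popular categories receive more views regardless of item quality.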
25
Experiment & Conclusions: EXPERIMENTAL SETTING
Dataset
– Edges induced from the whole dataset.
26
MODELING CONTENT QUALITY: EXPERIMENTAL SETTING
Dataset statistics
34
Thanks for your attention!