Published by May Lee. Modified over 9 years ago.
1
IR, IE and QA over Social Media
Social media (blogs, community QA, news aggregators):
  Complementary to "traditional" news sources (e.g., Rathergate)
  Growing faster than "traditional" web content, and the gap is widening
  Traditional/published content: 4 Gb/day; social media: 10 Gb/day [from Andrew Tomkins/Yahoo!, "Future of Web Search", May 2007]
Research challenges:
  Lower-quality content
  More dynamic content
  User interactions are crucial: ratings, comments, and link structure can be used both to retrieve documents and to evaluate extracted information
2
Finding High-Quality Content for IE/QA
Goal: find high-quality content (accurate and well-presented)
Setting: community QA (Yahoo! Answers)
Classifying social media content (e.g., cQA) is substantially different from classifying documents
Sources of information:
  Content analysis
  Usage data (page views, etc.)
  Community ratings and link analysis
Contribution: a general framework for quality estimation in social media
  Graph-based model of contributor relationships, combined with content and usage analysis
  Identifies high-quality items with accuracy close to the level of agreement between human judges
E. Agichtein, C. Castillo, D. Donato, A. Gionis, G. Mishne, "Finding High-Quality Content in Social Media", in Proc. of WSDM 2008
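The slide does not spell out the graph-based model. One common link-analysis signal for community QA is HITS run over a user graph. The sketch below assumes that an edge points from an asker to a user who answered that asker's question, so users who answer questions from many active askers receive high authority scores. The function name `hits` and the user names are hypothetical. This is plain HITS offered as one possible input feature, not the paper's full model.

```python
from math import sqrt
from typing import Dict, List, Tuple


def hits(edges: List[Tuple[str, str]], n_iter: int = 50) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Plain HITS on a directed user graph.

    Assumption (illustrative): edge (u, v) means user u asked a question
    that user v answered. Returns (hub, authority) scores per user.
    """
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(n_iter):
        # Authority of v: sum of hub scores of the askers whose questions v answered.
        auth = {n: 0.0 for n in nodes}
        for u, v in edges:
            auth[v] += hub[u]
        norm = sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {n: a / norm for n, a in auth.items()}
        # Hub of u: sum of authority scores of the users who answered u.
        hub = {n: 0.0 for n in nodes}
        for u, v in edges:
            hub[u] += auth[v]
        norm = sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {n: h / norm for n, h in hub.items()}
    return hub, auth
```

In a quality classifier, the authority score of an answer's author could be appended to content and usage features. That step is an assumption about how such a feature might be used, not a description of the paper's pipeline.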
3
Finding Relevant Content for IE/QA
Goal: given a query, rank social content (cQA) by expected relevance and quality
Approach: learn ranking functions specifically for social media retrieval
Features:
  Textual content: relevance, stylistics, language models
  User interactions: link structure, discussion threads
  User ratings: user-provided content ratings
Method: gradient boosting (GBrank)
  Developed a new objective function for learning a ranking function from (noisy) preference data
Results:
  Outperforms the Yahoo! default ranking and naïve ranking by user votes
  Can be made robust to ratings spam [same authors, to appear in AIRWeb 2008]
J. Bian, Y. Liu, E. Agichtein and H. Zha, "Finding the Right Facts in the Crowd: Factoid Question Answering over Social Media", to appear in Proc. of WWW 2008
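The slide names GBrank but does not show the training loop. Below is a minimal sketch of the basic GBrank iteration as it is commonly described. Each preference pair (x, y) means x should rank above y. For every pair that the current scorer h fails to separate by a margin tau, x is regressed toward h(y) + tau and y toward h(x) - tau. A regression model g_k is fitted to those targets and blended in with the normalized update h_k = (k * h_{k-1} + eta * g_k) / (k + 1). Assumptions: least-squares regression stumps stand in for full gradient-boosted trees, h_0 = 0, and the names `gbrank`, `fit_stump`, `tau` and `eta` are illustrative. This is not the modified objective for noisy preference data that the slide's authors developed.

```python
from typing import Callable, List, Sequence, Tuple

Features = Sequence[float]
Scorer = Callable[[Features], float]


def fit_stump(xs: List[Features], ys: List[float]) -> Scorer:
    """Least-squares regression stump (a depth-1 regression tree)."""
    mean_all = sum(ys) / len(ys)
    # (sse, feature, threshold, left_value, right_value); default is a constant model.
    best = (float("inf"), 0, float("inf"), mean_all, mean_all)
    for f in range(len(xs[0])):
        values = sorted({x[f] for x in xs})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for x, y in zip(xs, ys) if x[f] <= t]
            right = [y for x, y in zip(xs, ys) if x[f] > t]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
            if sse < best[0]:
                best = (sse, f, t, ml, mr)
    _, f, t, ml, mr = best
    return lambda x: ml if x[f] <= t else mr


def gbrank(pairs: List[Tuple[Features, Features]], n_iter: int = 10,
           tau: float = 1.0, eta: float = 1.0) -> Scorer:
    """Basic GBrank loop; each pair (x, y) means x should rank above y."""
    stumps: List[Scorer] = []

    def h(x: Features) -> float:
        # Replays h_k = (k * h_{k-1} + eta * g_k) / (k + 1), starting from h_0 = 0.
        score = 0.0
        for k, g in enumerate(stumps, start=1):
            score = (k * score + eta * g(x)) / (k + 1)
        return score

    for _ in range(n_iter):
        xs: List[Features] = []
        ys: List[float] = []
        for x, y in pairs:
            hx, hy = h(x), h(y)
            if hx < hy + tau:  # pair violated or separated by less than tau
                xs += [x, y]
                ys += [hy + tau, hx - tau]
        if not xs:  # every pair already separated by at least tau
            break
        stumps.append(fit_stump(xs, ys))
    return h
```

The returned scorer can be used to sort candidate answers for a query. In the slide's setting, the feature vectors would hold the textual, interaction and rating features listed above, and the preference pairs would come from (noisy) user judgments. Both of those mappings are assumptions here.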