Finding high-Quality contents in Social media BY : APARNA TODWAL GUIDED BY : PROF. M. WANJARI.

Slides:



Advertisements
Similar presentations
Data Mining and the Web Susan Dumais Microsoft Research KDD97 Panel - Aug 17, 1997.
Advertisements

Web Mining.
A Human-Centered Computing Framework to Enable Personalized News Video Recommendation (Oh Jun-hyuk)
Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki
1 Evaluation Rong Jin. 2 Evaluation  Evaluation is key to building effective and efficient search engines usually carried out in controlled experiments.
Multimedia Answer Generation for Community Question Answering.
COLLABORATIVE FILTERING Mustafa Cavdar Neslihan Bulut.
Vote Calibration in Community Question-Answering Systems Bee-Chung Chen (LinkedIn), Anirban Dasgupta (Yahoo! Labs), Xuanhui Wang (Facebook), Jie Yang (Google)
1 Learning User Interaction Models for Predicting Web Search Result Preferences Eugene Agichtein Eric Brill Susan Dumais Robert Ragno Microsoft Research.
Asking Questions on the Internet
The Complex Dynamics of Collaborative Tagging Harry Halpin University of Edinburgh Valentin Robu CWI, Netherlands Hana Shepherd Princeton University WWW.
SFU, CMPT 741, Fall 2009, Martin Ester 418 Outlook Outline Trends in KDD research Graph mining and social network analysis Recommender systems Information.
Evaluating Search Engine
Finding High-Quality Content in Social Media chenwq 2011/11/26.
Presented by Li-Tal Mashiach Learning to Rank: A Machine Learning Approach to Static Ranking Algorithms for Large Data Sets Student Symposium.
Expertise Networks in Online Communities: Structure and Algorithms Jun Zhang Mark S. Ackerman Lada Adamic University of Michigan WWW 2007, May 8–12, 2007,
6/16/20151 Recent Results in Automatic Web Resource Discovery Soumen Chakrabartiv Presentation by Cui Tao.
Computing Trust in Social Networks
Web Projections Learning from Contextual Subgraphs of the Web Jure Leskovec, CMU Susan Dumais, MSR Eric Horvitz, MSR.
The Social Web: A laboratory for studying s ocial networks, tagging and beyond Kristina Lerman USC Information Sciences Institute.
Web Search – Summer Term 2006 VII. Selected Topics - PageRank (closer look) (c) Wolfgang Hürst, Albert-Ludwigs-University.
Overview of Web Data Mining and Applications Part I
Query session guided multi- document summarization THESIS PRESENTATION BY TAL BAUMEL ADVISOR: PROF. MICHAEL ELHADAD.
The 2nd International Conference of e-Learning and Distance Education, 21 to 23 February 2011, Riyadh, Saudi Arabia Prof. Dr. Torky Sultan Faculty of Computers.
Attention and Event Detection Identifying, attributing and describing spatial bursts Early online identification of attention items in social media Louis.
CS598CXZ Course Summary ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign.
Crowdsourcing Predictors of Behavioral Outcomes. Abstract Generating models from large data sets—and deter¬mining which subsets of data to mine—is becoming.
Web Usage Mining with Semantic Analysis Date: 2013/12/18 Author: Laura Hollink, Peter Mika, Roi Blanco Source: WWW’13 Advisor: Jia-Ling Koh Speaker: Pei-Hao.
Extracting Key Terms From Noisy and Multi-theme Documents Maria Grineva, Maxim Grinev and Dmitry Lizorkin Institute for System Programming of RAS.
Advanced Software Engineering PROJECT. 1. MapReduce Join (2 students)  Focused on performance analysis on different implementation of join processors.
User Browsing Graph: Structure, Evolution and Application Yiqun Liu, Yijiang Jin, Min Zhang, Shaoping Ma, Liyun Ru State Key Lab of Intelligent Technology.
Know your Neighbors: Web Spam Detection Using the Web Topology Presented By, SOUMO GORAI Carlos Castillo(1), Debora Donato(1), Aristides Gionis(1), Vanessa.
Predicting Content Change On The Web BY : HITESH SONPURE GUIDED BY : PROF. M. WANJARI.
A Simple Unsupervised Query Categorizer for Web Search Engines Prashant Ullegaddi and Vasudeva Varma Search and Information Extraction Lab Language Technologies.
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
1 Discovering Authorities in Question Answer Communities by Using Link Analysis Pawel Jurczyk, Eugene Agichtein (CIKM 2007)
 Text Representation & Text Classification for Intelligent Information Retrieval Ning Yu School of Library and Information Science Indiana University.
Internet Information Retrieval Sun Wu. Course Goal To learn the basic concepts and techniques of internet search engines –How to use and evaluate search.
Hao Wu Nov Outline Introduction Related Work Experiment Methods Results Conclusions & Next Steps.
Binxing Jiao et. al (SIGIR ’10) Presenter : Lin, Yi-Jhen Advisor: Dr. Koh. Jia-ling Date: 2011/4/25 VISUAL SUMMARIZATION OF WEB PAGES.
Exploiting Context Analysis for Combining Multiple Entity Resolution Systems -Ramu Bandaru Zhaoqi Chen Dmitri V.kalashnikov Sharad Mehrotra.
LOGO Finding High-Quality Content in Social Media Eugene Agichtein, Carlos Castillo, Debora Donato, Aristides Gionis and Gilad Mishne (WSDM 2008) Advisor.
Evaluating Web Pages Techniques to apply and questions to ask.
IR, IE and QA over Social Media Social media (blogs, community QA, news aggregators)  Complementary to “traditional” news sources (Rathergate)  Grow.
Algorithmic Detection of Semantic Similarity WWW 2005.
Jiafeng Guo(ICT) Xueqi Cheng(ICT) Hua-Wei Shen(ICT) Gu Xu (MSRA) Speaker: Rui-Rui Li Supervisor: Prof. Ben Kao.
ANALYZING THE SOCIAL WEB an introduction 1. OUTLINE 1.Introduction 2.Network Structure and Measures 3.Social Information Filtering 2.
Automatic Video Tagging using Content Redundancy Stefan Siersdorfer 1, Jose San Pedro 2, Mark Sanderson 2 1 L3S Research Center, Germany 2 University of.
Of 33 lecture 1: introduction. of 33 the semantic web vision today’s web (1) web content – for human consumption (no structural information) people search.
Harvesting Social Knowledge from Folksonomies Harris Wu, Mohammad Zubair, Kurt Maly, Harvesting social knowledge from folksonomies, Proceedings of the.
Liangjie Hong and Brian D. Davison Department of Computer Science and Engineering Lehigh University SIGIR 2009.
A Classification-based Approach to Question Answering in Discussion Boards Liangjie Hong, Brian D. Davison Lehigh University (SIGIR ’ 09) Speaker: Cho,
ASSOCIATIVE BROWSING Evaluating 1 Jinyoung Kim / W. Bruce Croft / David Smith for Personal Information.
Finding the Right Facts in the Crowd: Factoid Question Answering over Social Media J. Bian, Y. Liu, E. Agichtein, and H. Zha ACM WWW, 2008.
Evaluating Web Pages Techniques to apply and questions to ask.
KAIST TS & IS Lab. CS710 Know your Neighbors: Web Spam Detection using the Web Topology SIGIR 2007, Carlos Castillo et al., Yahoo! 이 승 민.
Kendra Hunter & Charde Johnson EDUC Dr. M. Kariuki.
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
Mining Tag Semantics for Social Tag Recommendation Hsin-Chang Yang Department of Information Management National University of Kaohsiung.
CiteData: A New Multi-Faceted Dataset for Evaluating Personalized Search Performance CIKM’10 Advisor : Jia-Ling, Koh Speaker : Po-Hsien, Shih.
General Architecture of Retrieval Systems 1Adrienn Skrop.
Lecture-6 Bscshelp.com. Todays Lecture  Which Kinds of Applications Are Targeted?  Business intelligence  Search engines.
WEB STRUCTURE MINING SUBMITTED BY: BLESSY JOHN R7A ROLL NO:18.
Search User Behavior: Expanding The Web Search Frontier
Over 1,000 books, journals, videos and reference material
E-Commerce Theories & Practices
A Comparative Study of Link Analysis Algorithms
Postdoc, School of Information, University of Arizona
Web Mining Department of Computer Science and Engg.
AGMLAB Information Technologies
Presentation transcript:

Finding high-Quality contents in Social media BY : APARNA TODWAL GUIDED BY : PROF. M. WANJARI

 Introduction  Main challenges  What we are targeting?  Papers read  Discussion on findings of the base paper  Further plans OUTLINE:

INTRODUCTION:  In the early 1990s, the majority of web users were consumers of content, created by a relatively small amount of publishers.  In the early 2000s, user-generated content has become increasingly popular on the web: more and more users participate in content creation, rather than just consumption.

 Popular user generated content domains include web forums, photo and video sharing communities, as well as social networking platforms such as Facebook and MySpace.  Community-driven question/answering portals are a particular form of user-generated content that is gaining a large audience in recent years.  These portals, in which users answer questions posed by other users, provide an alternative channel for obtaining information on the web: rather than browsing results of search engines.

 The main challenge posed by content in social media sites is the fact that the distribution of quality has high variance: from very high-quality items to low-quality, sometimes abusive content.  This makes the tasks of filtering and ranking in such systems more complex than in other domains.  Our main task is identifying high quality content in community-driven question/answering sites, As a test case, we focus on Yahoo! Answers. MAIN CHALLENGES:

“ Yahoo! Answers ”  Yahoo! Answers, a large community question/answering portal that is particularly rich in the amount and types of content where people ask and answer questions on any topic.  A user can vote for answers of other users, gives rating “thumbs up” or “thumbs down”, mark interesting questions, and even report abusive behavior.

 Each user has a threefold role : asker, answerer & evaluator.  The central element of the Yahoo! Answers system are questions. Each question has a lifecycle :  It starts in an “open” state where it receives answers.  After automatic timeout in the system, the question is considered “closed”  Once a best answer is chosen, the question is “resolved”  For the community question/answering domain, we show that our system is able to separate high-quality items from the rest.

? What are the elements of social media that can be used to facilitate automated discovery of high-quality content ? ? How are these different factors related ? Is content alone enough for identifying high-quality items ? ? Can community feedback approximate judgments of specialists ? WHAT WE ARE TARGETING?

Sr. No. Paper NameAuthorYearConclusion 1 Finding High-Quality Content in Social Media Carlos Castillo2008A general classification framework for quality estimation in social media. 2Internet-scale collection of human-reviewed data. W. C. Baker.2007Conducted experiments analysing data quality of an Internet-scale system for human-reviewed data. 3Automated essay scoring using bayes. L. M. Rudner and T. Liang. 2002Presented a Bayesian approach to essay scoring based on the well developed text classification literature. PAPERS READ:

 Related work  Link analysis in social media. Link-based ranking algorithms is successful in estimating the quality of web pages. It uses User Answers Graph. where - Graph is represented as G. - Vertex is represented as V corresponding to the users of a ques /ans system. - Directed edge e = (u, v) ∈ E from a user u ∈ V to a user v ∈ V if user u has answered to at least one question of user v. The two most prominent link-based ranking algorithms are PageRank and HITS. Discussion On Findings Of Base Paper:

 Text analysis for content quality Some of the features employed in systems are lexical, such as word length, measures of vocabulary irregularity via repetitiveness, Other features take into account usage of punctuation and detection of common grammatical error.  Implicit feedback for ranking Implicit feedback from millions of web users has been shown to be a valuable source of result quality and ranking information.

 CONTENT QUALITY ANALYSIS IN SOCIAL MEDIA Evaluation of content quality is an essential module for performing more advanced information - retrieval tasks on the question/answering system.  Punctuation and typos : A particular form of low visual quality are misspellings and typos; additional features in our set quantify the number of spelling mistakes, as well as the number of out-of-vocabulary words.  Grammaticality : The content is annotated with part-of-speech (POS) tags and use the tag n-grams. This allows us to capture, to some degree, the level of “correctness” of the grammar used.

MODELING APPROACH: Application-specific user relationships :

 The relationships between questions, users asking and answering questions, & answers can be captured by a tripartite.  Since a user is not allowed to answer his/her own questions, there are no triangles in the graph

EXPERIMENTAL SETTING:  Let Q 0 and A 0 be the sets of questions and answers, respectively, included in the evaluation dataset.  Let U 1 be the set of users who have made a question in Q 0 or given an answer in A 0.  Additionally we select Q 1 to be the set of all questions asked by all users in U 1.

EXPERIMENTAL SETTING:  Similarly we select A 1 to be the set of answers given by users in U 1 and A 2 to be the set of all the answers to questions in Q1.  Obviously Q 0 ⊆ Q 1 and A 0 ⊆ A 1.  Our dataset is then defined by the nodes ( Q 1, A 1 ∪ A 2, U 1) and the edges induced from the whole dataset.

There is a positive correlation between question quality and answer quality. Good answers are much more likely to be written in response to good questions, and bad questions are the ones that attract more bad answers.

FURTHER PLAN: Further plan is to refine the analysis of high quality contents. We can integrate the results from all question/answer sites for concluding the best contents amongst them to provide users with more efficient services. Future plan is to more specifically explore the relationships and usage features to automatically identify malicious users.

REFERENCES:  E. Agichtein, E. Brill, S. T. Dumais, and R. Ragno. Learning user interaction models for predicting web search result preferences. In SIGIR, pages 3–10,  K. Ali and M. Scarr. Robust methodologies for modeling web click distributions. In www, pages 511–520,  E. B. Page. Computer grading of student prose, using modern concepts and software. Journal of Experimental Education, 62(2), 1994.

THANK YOU !