To trust or not, is hardly the question! Sai Moturu.

Presentation transcript:


Trust, Quality, Popularity, Reach

Review two articles; briefly summarize other publications.

What are the hallmarks of consistently good information? Objectivity: unbiased information. Completeness: self-contained and fully explanatory. Pluralism: not restricted to a particular viewpoint. These hallmarks are used to define propositions of trust.

Six macro-areas: quality of users, user distribution and leadership, stability, controllability, quality of editing, and importance of the article. Using the ten propositions, 50 sources of trust evidence are identified.

It is necessary to interpret each trust factor in relation to the others:
IF stability is high AND (length is short OR edit activity is low OR importance is low) THEN warning
IF leadership is high AND dictatorship is high THEN warning
IF length is high AND importance is low THEN warning
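The combination rules above can be sketched as simple predicates. The numeric thresholds for "high" and "low" and the warning labels are illustrative assumptions, not values from the paper.

```python
# Sketch of the three warning rules. Thresholds and labels are invented
# for illustration; all factors are assumed normalized to [0, 1].
def warnings(stability, length, edits, importance, leadership, dictatorship,
             high=0.7, low=0.3):
    """Return the warning rules that fire for one article's trust factors."""
    fired = []
    # High stability with little substance (short, rarely edited, unimportant)
    if stability > high and (length < low or edits < low or importance < low):
        fired.append("stable-but-thin")
    # A single dominant contributor controlling the article
    if leadership > high and dictatorship > high:
        fired.append("single-author-dominance")
    # A long article on an unimportant topic
    if length > high and importance < low:
        fired.append("long-but-unimportant")
    return fired
```

For example, a very stable but very short article triggers only the first rule: `warnings(0.9, 0.1, 0.2, 0.2, 0.5, 0.5)` returns `["stable-but-thin"]`.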

Featured articles vs. Standard articles

Basic: the better the authors, the better the article quality.
PeerReview: assumes that a contributor reviews the content before modifying it, thereby approving the content that he/she does not edit.

ProbReview: improves the assumption: a contributor may not review the entire article before modifying it. The farther a word is from one the author has written, the lower the probability that he/she has read it; this probability is modeled as a monotonically decaying function of the distance between the words. When probabilities conflict, the higher one is used.
Naïve: the longer the article, the better its quality; used as a baseline for comparison.
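The decaying review probability can be sketched as below. The exponential form, the `rate` parameter, and the function names are illustrative assumptions; the paper evaluates several decay schemes.

```python
import math

# Sketch of ProbReview's review probability: the chance that an author
# read word j decays monotonically with its distance from a word the
# author wrote at position i. The exponential decay is an assumption.
def review_prob(i, j, rate=0.01):
    """Probability the author reviewed word j, given they wrote word i."""
    return math.exp(-rate * abs(i - j))

# When several authored words are candidates, the conflict is resolved
# by taking the higher probability (i.e., the nearest authored word).
def best_review_prob(authored_positions, j, rate=0.01):
    return max(review_prob(i, j, rate) for i in authored_positions)
```

A word the author wrote is reviewed with probability 1, and the probability shrinks as the distance grows.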

1. Initialize all quality and authority values equally.
2. For each iteration:
- Use authority values from the previous iteration to compute quality.
- Use quality values to compute authority.
- Normalize all quality and authority values.
3. Repeat step 2 until convergence (alternatively, until the change becomes very small or a maximum number of iterations is reached).
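A minimal sketch of this mutual-reinforcement loop, assuming a contribution matrix `C` (articles × authors) whose entries measure how much each author contributed to each article; the function name and the L2 normalization are illustrative choices.

```python
import numpy as np

# Sketch of the iterative quality/authority computation: quality is
# computed from authority, authority from quality, both are normalized,
# and the loop repeats until the change is very small.
def quality_authority(C, max_iters=100, tol=1e-9):
    n_articles, n_authors = C.shape
    quality = np.ones(n_articles)      # step 1: equal initial values
    authority = np.ones(n_authors)
    for _ in range(max_iters):         # step 2
        new_quality = C @ authority    # quality from previous authority
        new_authority = C.T @ new_quality  # authority from quality
        new_quality /= np.linalg.norm(new_quality)    # normalize
        new_authority /= np.linalg.norm(new_authority)
        if np.allclose(new_quality, quality, atol=tol):
            break                      # step 3: convergence
        quality, authority = new_quality, new_authority
    return quality, authority
```

On a toy matrix where article 0 has two contributors and article 1 only one, article 0 ends up with the higher quality score.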

Evaluation uses a set of articles on countries that have been assigned quality labels by Wikipedia's editorial team. Preprocessing: bot revisions were removed, and consecutive edits by the same user were collapsed so that only the final edit was kept.

Normalized discounted cumulative gain at top k (NDCG@k): suited for ranked articles that have multiple levels of assessment.
Spearman's rank correlation: measures the agreement between two rankings of the same set of objects.
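Both measures can be sketched as follows. The linear gain in NDCG and the no-ties Spearman formula are simplifying assumptions for illustration.

```python
import math

# NDCG@k: discounted cumulative gain of the produced ranking, normalized
# by the gain of the ideal (label-sorted) ranking. Gains here are the
# graded labels themselves (a common variant uses 2**label - 1).
def ndcg_at_k(ranked_labels, k):
    def dcg(labels):
        return sum(l / math.log2(i + 2) for i, l in enumerate(labels[:k]))
    ideal = dcg(sorted(ranked_labels, reverse=True))
    return dcg(ranked_labels) / ideal if ideal > 0 else 0.0

# Spearman's rho via the classic formula 1 - 6 * sum(d^2) / (n(n^2 - 1)),
# valid when there are no tied values.
def spearman_rho(xs, ys):
    n = len(xs)
    def ranks(v):
        r = [0] * n
        for rank, i in enumerate(sorted(range(n), key=lambda i: v[i])):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

A ranking that already sorts articles by label achieves NDCG@k of 1; identical rankings have Spearman correlation 1, exactly reversed rankings have -1.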

ProbReview works best with decay scheme 2 or 3. Article length appears to be correlated with article quality: adding it to the Basic and PeerReview models showed some improvement, but ProbReview did not benefit.

A revision trust model may help address article trust, fragment trust, and author trust. A dynamic Bayesian network is used to model the evolution of article trust over revisions. Wikipedia featured, clean-up, and normal articles are used for evaluation.

Uses revision history as well as the reputation of the contributing authors; assigns trust to text.

Proposes a trust tab in Wikipedia. Link-ratio: the ratio between the number of cited (linked) occurrences and the number of non-cited occurrences of the encyclopedia term. Evaluation: compare link-ratio values for featured, normal, and clean-up articles.
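The link-ratio itself is a simple quotient; a minimal sketch with hypothetical counts (in practice the counts come from scanning Wikipedia pages for linked versus plain-text mentions of the term):

```python
# Sketch of the link-ratio: linked mentions of a term divided by its
# unlinked mentions. The counts passed in are hypothetical.
def link_ratio(cited_occurrences, non_cited_occurrences):
    if non_cited_occurrences == 0:
        return float("inf") if cited_occurrences else 0.0
    return cited_occurrences / non_cited_occurrences
```

A term linked 30 times and left unlinked 10 times has a link-ratio of 3.0; the expectation being tested is that featured articles tend to score higher than clean-up articles.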

Proposes a content-driven reputation system for authors. Authors gain reputation when their work is preserved by subsequent authors and lose reputation when their edits are undone or quickly rolled back. Evaluation: low-reputation authors have a larger-than-average probability of producing edits that are judged to be of poor quality by human observers and that are undone by later editors.
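A toy sketch in the spirit of this idea: reward text that survives later revisions and penalize text that is removed. The update amounts (`gain`, `loss`) and the word counts are invented for illustration, not taken from the paper.

```python
from collections import defaultdict

# Sketch of content-driven reputation: preserved words raise an author's
# reputation, removed (undone) words lower it. Update weights are invented.
def update_reputation(reputation, author, words_added, words_surviving,
                      gain=1.0, loss=2.0):
    words_removed = words_added - words_surviving
    reputation[author] += gain * words_surviving - loss * words_removed
    return reputation

rep = defaultdict(float)
update_reputation(rep, "alice", words_added=100, words_surviving=95)
update_reputation(rep, "bob", words_added=50, words_surviving=5)
```

Here "alice", whose text mostly survives, ends with a much higher reputation than "bob", whose text is mostly undone.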

A different question: which articles are controversial? Uses edit and collaboration history. Two models: Basic and Contributor Rank. The Contributor Rank model tries to distinguish disputes arising from the article itself from those arising from the aggressiveness of the contributors; only the former should be measured. Evaluation: identification of articles labeled as controversial.

This is an interesting area to work on, with different angles to consider and different questions to ask. Data is easily available and has many relevant features, and the articles classified by Wikipedia's editorial team aid evaluation. There is great scope for more work in this area; I want to look at it from the health perspective.

Feb 29, 2008