Truth Discovery with Multiple Conflicting Information Providers on the Web. Xiaoxin Yin, Jiawei Han, Philip S. Yu. Industrial and Government Track short paper.

Similar presentations
Date: 2013/1/17 Author: Yang Liu, Ruihua Song, Yu Chen, Jian-Yun Nie and Ji-Rong Wen Source: SIGIR12 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Adaptive.

Minimizing Seed Set for Viral Marketing Cheng Long & Raymond Chi-Wing Wong Presented by: Cheng Long 20-August-2011.
Measuring Reliability in Wikipedia Wen-Yuan Zhu
TrustRank Algorithm Srđan Luković 2010/3482
Variance reduction techniques. 2 Introduction Simulation models should be coded such that they are efficient. Efficiency in terms of programming ensures.
Sequence Clustering and Labeling for Unsupervised Query Intent Discovery Speaker: Po-Hsien Shih Advisor: Jia-Ling Koh Source: WSDM’12 Date: 1 November,
Autocorrelation and Linkage Cause Bias in Evaluation of Relational Learners David Jensen and Jennifer Neville.
1 Truth Validation and Veracity Analysis with Information Networks Jiawei Han Data Mining Group, Computer Science University of Illinois at Urbana-Champaign.
An Approach to Evaluate Data Trustworthiness Based on Data Provenance Department of Computer Science Purdue University.
Mining Long Sequential Patterns in a Noisy Environment Jiong Yang, Wei Wang, Philip S. Yu, and Jiawei Han SIGMOD 2002 Presented by: Eddie Date: 2002/12/23.
Section 2: Science as a Process
Sampling : Error and bias. Sampling definitions  Sampling universe  Sampling frame  Sampling unit  Basic sampling unit or elementary unit  Sampling.
Cluster based fact finders Manish Gupta, Yizhou Sun, Jiawei Han Feb 10, 2011.
Crowdsourcing Predictors of Behavioral Outcomes. Abstract Generating models from large data sets—and deter¬mining which subsets of data to mine—is becoming.
Tag Clouds Revisited Date : 2011/12/12 Source : CIKM’11 Speaker : I- Chih Chiu Advisor : Dr. Koh. Jia-ling 1.
1 Context-Aware Search Personalization with Concept Preference CIKM’11 Advisor : Jia Ling, Koh Speaker : SHENG HONG, CHUNG.
Leveraging Conceptual Lexicon : Query Disambiguation using Proximity Information for Patent Retrieval Date : 2013/10/30 Author : Parvaz Mahdabi, Shima.
Chapter 1-4: Properties Commutative Property: the order in which you add or multiply numbers does not change the sum or product Ex = * 8.
PARAMETRIC STATISTICAL INFERENCE
Ranking Queries on Uncertain Data: A Probabilistic Threshold Approach Wenjie Zhang, Xuemin Lin The University of New South Wales & NICTA Ming Hua,
1 University of Qom Information Retrieval Course Web Search (Link Analysis) Based on:
CIKM’09 Date:2010/8/24 Advisor: Dr. Koh, Jia-Ling Speaker: Lin, Yi-Jhen 1.
Introduction Algorithms and Conventions The design and analysis of algorithms is the core subject matter of Computer Science. Given a problem, we want.
A Probabilistic Graphical Model for Joint Answer Ranking in Question Answering Jeongwoo Ko, Luo Si, Eric Nyberg (SIGIR ’ 07) Speaker: Cho, Chin Wei Advisor:
MINING FREQUENT ITEMSETS IN A STREAM TOON CALDERS, NELE DEXTERS, BART GOETHALS ICDM2007 Date: 5 June 2008 Speaker: Li, Huei-Jyun Advisor: Dr. Koh, Jia-Ling.
Understanding and Predicting Personal Navigation Date : 2012/4/16 Source : WSDM 11 Speaker : Chiu, I- Chih Advisor : Dr. Koh Jia-ling 1.
Theory and Application of Database Systems A Hybrid Approach for Extending Ontology from Text He Wei.
Retrieval Models for Question and Answer Archives Xiaobing Xue, Jiwoon Jeon, W. Bruce Croft Computer Science Department University of Massachusetts, Google,
Date: 2013/8/27 Author: Shinya Tanaka, Adam Jatowt, Makoto P. Kato, Katsumi Tanaka Source: WSDM’13 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Estimating.
Partially Supervised Classification of Text Documents by Bing Liu, Philip Yu, and Xiaoli Li Presented by: Rick Knowles 7 April 2005.
ON THE SELECTION OF TAGS FOR TAG CLOUDS (WSDM11) Advisor: Dr. Koh. Jia-Ling Speaker: Chiang, Guang-ting Date:2011/06/20 1.
NUMERICAL ERROR Student Notes ENGR 351 Numerical Methods for Engineers Southern Illinois University Carbondale College of Engineering Dr. L.R. Chevalier.
Date: 2012/4/23 Source: Michael J. Welch. al(WSDM’11) Advisor: Jia-ling, Koh Speaker: Jiun Jia, Chiou Topical semantics of twitter links 1.
1 Chapter 10: Introduction to Inference. 2 Inference Inference is the statistical process by which we use information collected from a sample to infer.
Characterizing the Uncertainty of Web Data: Models and Experiences Lorenzo Blanco, Valter Crescenzi, Paolo Merialdo, Paolo Papotti Università degli Studi.
Truth Discovery with Multiple Conflicting Information Providers on the Web KDD 07.
Introduction to Earth Science Section 2 Section 2: Science as a Process Preview Key Ideas Behavior of Natural Systems Scientific Methods Scientific Measurements.
Date : 2013/03/18 Author : Jeffrey Pound, Alexander K. Hudek, Ihab F. Ilyas, Grant Weddell Source : CIKM’12 Speaker : Er-Gang Liu Advisor : Prof. Jia-Ling.
Academic Research Academic Research Dr Kishor Bhanushali M
LOGO 1 Corroborate and Learn Facts from the Web Advisor : Dr. Koh Jia-Ling Speaker : Tu Yi-Lang Date : Shubin Zhao, Jonathan Betz (KDD '07 )
Finding Experts Using Social Network Analysis 2007 IEEE/WIC/ACM International Conference on Web Intelligence Yupeng Fu, Rongjing Xiang, Yong Wang, Min.
1 AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery Advisor : Dr. Koh Jia-Ling Speaker : Tu Yi-Lang Date : Hong.
LOGO Identifying the Influential Bloggers in a Community Nitin Agarwal, Huan Liu, Lei Tang and Philip S. Yu WSDM 2008 Advisor : Dr. Koh Jia-Ling Speaker.
Meeting 15 Introduction to Numerical Methods Error Analysis.
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology A modified version of the K-means algorithm with a distance.
A Classification-based Approach to Question Answering in Discussion Boards Liangjie Hong, Brian D. Davison Lehigh University (SIGIR ’ 09) Speaker: Cho,
Date: 2012/08/21 Source: Zhong Zeng, Zhifeng Bao, Tok Wang Ling, Mong Li Lee (KEYS’12) Speaker: Er-Gang Liu Advisor: Dr. Jia-ling Koh 1.
Date: 2013/6/10 Author: Shiwen Cheng, Arash Termehchy, Vagelis Hristidis Source: CIKM’12 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Predicting the Effectiveness.
1 Page Quality: In Search of an Unbiased Web Ranking Presented by: Arjun Dasgupta Adapted from slides by Junghoo Cho and Robert E. Adams SIGMOD 2005.
Date: 2012/11/29 Author: Chen Wang, Keping Bi, Yunhua Hu, Hang Li, Guihong Cao Source: WSDM’12 Advisor: Jia-ling, Koh Speaker: Shun-Chen, Cheng.
LINDEN : Linking Named Entities with Knowledge Base via Semantic Knowledge Date : 2013/03/25 Resource : WWW 2012 Advisor : Dr. Jia-Ling Koh Speaker : Wei.
IMRank: Influence Maximization via Finding Self-Consistent Ranking
Date: 2013/4/1 Author: Jaime I. Lopez-Veyna, Victor J. Sosa-Sosa, Ivan Lopez-Arevalo Source: KEYS’12 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang KESOSD.
CONTEXTUAL SEARCH AND NAME DISAMBIGUATION IN USING GRAPHS EINAT MINKOV, WILLIAM W. COHEN, ANDREW Y. NG SIGIR’06 Date: 2008/7/17 Advisor: Dr. Koh,
1 Finding Spread Blockers in Dynamic Networks (SNAKDD08)Habiba, Yintao Yu, Tanya Y., Berger-Wolf, Jared Saia Speaker: Hsu, Yu-wen Advisor: Dr. Koh, Jia-Ling.
SEMANTIC VERIFICATION IN AN ONLINE FACT SEEKING ENVIRONMENT DMITRI ROUSSINOV, OZGUR TURETKEN Speaker: Li, HueiJyun Advisor: Koh, JiaLing Date: 2008/5/1.
1 NAME_________________________________ LIBRARY ORIENTATION--DAY EIGHT CRITICAL EVALUATION OF SOURCES “All researchers, students as well as professional.
A Brief Maximum Entropy Tutorial Presenter: Davidson Date: 2009/02/04 Original Author: Adam Berger, 1996/07/05
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology 1 Mining Advisor-Advisee Relationships from Research Publication.
1 Blog Cascade Affinity: Analysis and Prediction 2009 ACM Advisor : Dr. Koh Jia-Ling Speaker : Chou-Bin Fan Date :
Semantic Alignment Spring 2009 Ben-Gurion University of the Negev.
Finding similar items by leveraging social tag clouds Speaker: Po-Hsien Shih Advisor: Jia-Ling Koh Source: SAC 2012’ Date: October 4, 2012.
CiteData: A New Multi-Faceted Dataset for Evaluating Personalized Search Performance CIKM’10 Advisor : Jia-Ling, Koh Speaker : Po-Hsien, Shih.
On the Optimality of the Simple Bayesian Classifier under Zero-One Loss Pedro Domingos, Michael Pazzani Presented by Lu Ren Oct. 1, 2007.
Mining Data Streams with Periodically changing Distributions Yingying Tao, Tamer Ozsu CIKM’09 Supervisor Dr Koh Speaker Nonhlanhla Shongwe April 26,
Truth Discovery and Veracity Analysis
PageRank and Markov Chains
Discovery of Blog Communities based on Mutual Awareness
28th September 2005 Dr Bogdan L. Vrusias
Preference Based Evaluation Measures for Novelty and Diversity
Presentation transcript:

Truth Discovery with Multiple Conflicting Information Providers on the Web
Xiaoxin Yin, Jiawei Han, Philip S. Yu
Industrial and Government Track short paper
Advisor : Dr. Koh Jia-Ling
Speaker : Che-Wei Liang
Date :

Outline
Introduction
Problem Definitions
Computational Model
  – Web Site Trustworthiness and Fact Confidence
  – Iterative Computation
Empirical Study
Conclusions

Introduction
World-wide web
  – A necessary part of our lives.
  – Ex: Amazon.com, ShopZilla.com.
Is the world-wide web always trustworthy?
  – There is no guarantee for the correctness of information on the web.

Introduction
Example 1: Authors of books
  – The author information that different web sites give for the same book can be incomplete or incorrect.

Introduction
Ranking web pages
  – According to authority based on hyperlinks.
  – Ex: Authority-Hub analysis, PageRank, and more general link-based analysis.
Does the authority or popularity of a web site lead to accuracy of its information?

Introduction
Veracity problem
  – Discover the true fact about each object.

Problem Definitions
Definition 1: Confidence of facts.
  – The probability of a fact f being correct, denoted by s(f).
Definition 2: Trustworthiness of web sites.
  – The expected confidence of the facts provided by a web site w, denoted by t(w).

Problem Definitions
Facts may conflict with or support each other.
  – Ex: “Jennifer Widom”, “J. Widom”
Concept of implication
  – imp(f1 → f2): f1’s influence on f2’s confidence.

Basic heuristic
1. Usually there is only one true fact for a property of an object.
2. This true fact appears to be the same or similar on different web sites.

Basic heuristic (cont.)
3. The false facts on different web sites are less likely to be the same or similar.
4. In a certain domain, a web site that provides mostly true facts for many objects will likely provide true facts for other objects.

Web Site Trustworthiness and Fact Confidence
Trustworthiness t(w)
  – t(w) = ( Σ_{f ∈ F(w)} s(f) ) / |F(w)|, the average confidence of the facts w provides, where F(w) is the set of facts provided by w.
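
Reading Definition 2 literally, t(w) is the mean confidence of the facts in F(w). The following is a minimal Python sketch of that reading; the function name, the dictionary layout, and the sample confidences are illustrative assumptions, not part of the slides.

```python
def trustworthiness(facts_of_site, confidence):
    """Average confidence s(f) over F(w), the facts provided by one web site."""
    if not facts_of_site:
        raise ValueError("a web site must provide at least one fact")
    return sum(confidence[f] for f in facts_of_site) / len(facts_of_site)


# Hypothetical confidences s(f) for three facts provided by one site.
s = {"f1": 0.9, "f2": 0.6, "f3": 0.3}
t_w = trustworthiness(["f1", "f2", "f3"], s)
print(t_w)
```

A site that provides only high-confidence facts gets a trustworthiness near 1, which is what heuristic 4 relies on.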

Web Site Trustworthiness and Fact Confidence
It is more difficult to estimate the confidence of a fact.

Web Site Trustworthiness and Fact Confidence
Simple case
  – f1 is the only fact about object o1.
  – Assume w1 and w2 are independent.
Confidence s(f)
  – s(f) = 1 − Π_{w ∈ W(f)} (1 − t(w)), where W(f) is the set of web sites providing f.
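
Under the independence assumption of the simple case, a fact is wrong only if every site providing it is wrong, which gives s(f) = 1 − Π (1 − t(w)) over the sites in W(f). A small Python sketch of that calculation follows; the function name and the sample trust values are hypothetical.

```python
import math


def fact_confidence(trust_of_providers):
    """s(f) = 1 - prod(1 - t(w)) over the sites w in W(f), assuming independence."""
    return 1.0 - math.prod(1.0 - t for t in trust_of_providers)


# Two hypothetical sites w1 and w2 that both provide fact f1.
s_f1 = fact_confidence([0.9, 0.8])
print(s_f1)
```

With these values the fact ends up more credible than either site alone, because both sites would have to be wrong at once.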

Web Site Trustworthiness and Fact Confidence
Trustworthiness score of a web site
  – τ(w) = −ln(1 − t(w))
  – τ(w) is between 0 and +∞ and better characterizes how accurate w is.
  – Ex: t(w1) = 0.9, t(w2) = 0.99, so t(w2) = 1.1 × t(w1) but τ(w2) = 2 × τ(w1).
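
The example on this slide is consistent with a logarithmic score of the form τ(w) = −ln(1 − t(w)). The short Python check below uses that form as an assumption and reproduces the factor of two; the function name is illustrative.

```python
import math


def trust_score(t):
    """tau(w) = -ln(1 - t(w)); maps t in [0, 1) onto [0, +inf)."""
    return -math.log(1.0 - t)


tau1 = trust_score(0.9)   # error rate 0.1
tau2 = trust_score(0.99)  # error rate 0.01, ten times smaller
ratio = tau2 / tau1
print(tau1, tau2, ratio)  # the ratio is close to 2
```

The score reacts to the error rate 1 − t(w), so a site that is ten times less likely to be wrong is clearly separated from one that is merely good.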

Web Site Trustworthiness and Fact Confidence
Confidence score of a fact
  – σ(f) = −ln(1 − s(f))
  – Property: σ(f) = Σ_{w ∈ W(f)} τ(w)
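
The additive property can be derived in a few lines. The sketch below assumes that the confidence score is defined analogously to the trustworthiness score, σ(f) = −ln(1 − s(f)) and τ(w) = −ln(1 − t(w)), and that s(f) = 1 − Π (1 − t(w)) over the independent sites in W(f).

```latex
\begin{align*}
\sigma(f) &= -\ln\bigl(1 - s(f)\bigr) \\
          &= -\ln \prod_{w \in W(f)} \bigl(1 - t(w)\bigr) \\
          &= \sum_{w \in W(f)} -\ln\bigl(1 - t(w)\bigr) \\
          &= \sum_{w \in W(f)} \tau(w)
\end{align*}
```

Working with scores therefore turns a product of probabilities into a plain sum over the sites that provide the fact.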

Web Site Trustworthiness and Fact Confidence
Adjusted confidence score of a fact f: σ*(f)
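
The slide does not preserve the formula for the adjusted score. One plausible reading, combining the implication imp(f1 → f2) with the weight ρ that appears in the experiments, is σ*(f) = σ(f) + ρ · Σ σ(f') · imp(f' → f) over the other facts f' about the same object. The Python sketch below implements that assumed form; every name and number in it is hypothetical.

```python
def adjusted_score(f, sigma, imp, rho):
    """Assumed form: sigma*(f) = sigma(f) + rho * sum(sigma(f2) * imp(f2 -> f))."""
    influence = sum(
        sigma[f2] * imp.get((f2, f), 0.0)  # missing pairs contribute nothing
        for f2 in sigma
        if f2 != f
    )
    return sigma[f] + rho * influence


# Hypothetical confidence scores for three facts about the same object.
sigma = {"Jennifer Widom": 3.0, "J. Widom": 1.0, "J. Ullman": 2.0}
# Hypothetical implication values: positive = supportive, negative = conflicting.
imp = {
    ("J. Widom", "Jennifer Widom"): 0.5,
    ("J. Ullman", "Jennifer Widom"): -1.0,
}
sigma_star = adjusted_score("Jennifer Widom", sigma, imp, rho=0.5)
print(sigma_star)
```

A supportive fact raises the adjusted score and a conflicting fact lowers it, so the adjusted score can become negative when the conflicting facts are strong enough.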

Web Site Trustworthiness and Fact Confidence
Compute the confidence of f based on σ*(f) in the same way as computing it based on σ(f).
Assuming that different web sites are independent is incorrect.
  – Add a dampening factor γ, 0 < γ < 1.

Web Site Trustworthiness and Fact Confidence
Negative-confidence problem
  – A fact f may conflict with some facts provided by trustworthy web sites.
  – Then σ*(f) < 0 and s*(f) < 0, which is unreasonable.
  – If γ · σ*(f) > 0, s(f) is very close to s*(f).
  – If γ · σ*(f) < 0, s(f) is close to zero but still positive.
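
The two bullet points about s(f) describe the behaviour of a logistic function. The sketch below assumes s*(f) = 1 − exp(−γ · σ*(f)) and s(f) = 1 / (1 + exp(−γ · σ*(f))), which match that description; the function names are illustrative and γ = 0.3 is an arbitrary value inside 0 < γ < 1, not a value taken from the slides.

```python
import math


def naive_confidence(sigma_star, gamma):
    """s*(f) = 1 - exp(-gamma * sigma*(f)); negative whenever sigma*(f) < 0."""
    return 1.0 - math.exp(-gamma * sigma_star)


def final_confidence(sigma_star, gamma):
    """Logistic form s(f) = 1 / (1 + exp(-gamma * sigma*(f))); always in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-gamma * sigma_star))


gamma = 0.3  # arbitrary illustrative value
for sigma_star in (20.0, -5.0):
    print(sigma_star,
          naive_confidence(sigma_star, gamma),
          final_confidence(sigma_star, gamma))
```

For a strongly positive score the two functions nearly coincide, while for a negative score only the logistic form stays a valid probability.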

Iterative Computation
TruthFinder: iterative method
  – Initially, TruthFinder has little information about the web sites and the facts.
  – In each iteration, it improves its knowledge about trustworthiness and confidence.
  – It stops when the computation reaches a stable state.
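
The iteration can be sketched as a fixed-point loop that alternates between fact confidence and site trustworthiness. The Python code below is a simplified illustration, not the authors' implementation: it ignores implications between facts, starts every site at an arbitrary trustworthiness of 0.9, and treats "stable state" as a maximum change below a tolerance. All names, the data, and γ = 0.3 are assumptions.

```python
import math


def truth_finder(site_facts, gamma, init_trust=0.9, tol=1e-6, max_iter=100):
    """Alternate between fact confidence and site trustworthiness until stable."""
    facts = {f for fs in site_facts.values() for f in fs}
    providers = {f: [w for w, fs in site_facts.items() if f in fs] for f in facts}
    trust = {w: init_trust for w in site_facts}
    conf = {}
    for _ in range(max_iter):
        # tau(w) = -ln(1 - t(w)); clamp t(w) just below 1 so the log is defined.
        tau = {w: -math.log(1.0 - min(t, 1.0 - 1e-12)) for w, t in trust.items()}
        # Fact confidence from the summed scores of its providers (logistic form).
        conf = {
            f: 1.0 / (1.0 + math.exp(-gamma * sum(tau[w] for w in providers[f])))
            for f in facts
        }
        # Site trustworthiness = average confidence of the facts it provides.
        new_trust = {w: sum(conf[f] for f in fs) / len(fs)
                     for w, fs in site_facts.items()}
        change = max(abs(new_trust[w] - trust[w]) for w in trust)
        trust = new_trust
        if change < tol:  # assumed stopping rule for "stable state"
            break
    return trust, conf


# Hypothetical data: fact "x" is provided by two sites, "y" and "z" by one each.
site_facts = {"A": {"x", "y"}, "B": {"x"}, "C": {"z"}}
trust, conf = truth_finder(site_facts, gamma=0.3)
print(trust)
print(conf)
```

In this toy run the fact backed by two sites ends with a higher confidence than the facts backed by one, and the site that provides only that fact ends with the highest trustworthiness.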

Empirical Study
Compare with VOTING
  – VOTING chooses the fact that is provided by the most web sites.
Setup: Intel PC with a 1.66GHz dual-core processor, 1GB memory, Windows XP Professional.
Parameters: ρ = 0.5 and γ =
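
The VOTING baseline is simple enough to write down directly: for one object, count how many distinct web sites provide each fact and return the most frequent one. The Python sketch below is an illustration with made-up site names and claims.

```python
from collections import Counter


def voting(claims):
    """Return the fact provided by the most web sites for one object.

    claims: iterable of (web_site, fact) pairs; duplicates are counted once.
    """
    unique_claims = dict.fromkeys(claims)  # de-duplicate, keep first-seen order
    counts = Counter(fact for _site, fact in unique_claims)
    return counts.most_common(1)[0][0]


# Hypothetical claims about the author of one book.
claims = [
    ("site-a.example", "Jennifer Widom"),
    ("site-b.example", "Jennifer Widom"),
    ("site-c.example", "J. Ullman"),
]
winner = voting(claims)
print(winner)
```

VOTING treats every site as equally reliable, which is the assumption that the trustworthiness-based approach drops. In this sketch a tie goes to the fact seen first.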

Conclusions
Introduce and formulate the veracity problem
  – Resolving conflicting facts from multiple web sites.
  – Finding the true facts among them.
Propose TruthFinder
  – Utilizes web site trustworthiness and fact confidence to find trustworthy web sites and true facts.
Experiments show that TruthFinder achieves high accuracy.