Adaptive entity resolution with human computation


Adaptive entity resolution with human computation
龚赛赛 (Saisai Gong)
2015-10-26

Contents
- Background and Motivation
- Related Work
- Model Overview

Background
- ER (entity resolution): identify entities that refer to the same real-world object
- Hybrid human-machine approaches leverage human intelligence
- Improving the quality of user-judged results is important
- Users usually have diverse competence in resolving different entities
  - Users have different backgrounds and domain knowledge
  - Unfamiliar resolution tasks -> low accuracy
- So it is necessary to assign tasks to competent users

Motivation
- In the Semantic Web community, few works consider users’ diverse competence across resolution tasks.
- In other communities (e.g. ML and DB), several works on crowdsourcing tasks propose approaches that estimate users’ competence and adaptively assign tasks.
- However, these approaches do not fully exploit the characteristics of Linked Data and need to be tailored.
- Our goal: estimate user competence based on similar completed tasks and adaptively assign tasks to competent users.

Related Work
- Use crowdsourcing to acquire user contributions with various goals
  - Infer true labels from the crowd when unreliable users are present
  - Reduce expenditure, e.g.
    - Ref. [1] (VLDB'12) uses clustering algorithms to reduce verification tasks for crowd ER
    - Ref. [2] (VLDB'13) selects the best question for crowd ER
  - Others
- Common algorithms: EM
  - Other ones such as minimax entropy
- Two important components (unified in a single model or separated)
  - Estimating user competence (/expertise/quality)
  - Adaptive task assignment
- Heterogeneous vs. homogeneous

Related Work
- Estimating user competence
  - Based on prior knowledge of the probability distribution
  - Based on similar tasks
  - Based on ground truth
    - The fraction of tasks with ground truth that the user labeled correctly
- Adaptive task assignment
  - Assign tasks to the most competent users
    - For a task, assign it to the users with the highest estimated competence values
  - Global optimization
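
The two simplest options above (competence from ground truth, then assignment to the most competent users) can be sketched as follows. This is a minimal illustration, not code from any of the cited works; the function names and the toy answers are assumptions made for the example.

```python
from collections import defaultdict

def competence_from_gold(answers, gold):
    """answers: iterable of (user, task, label); gold: dict task -> true label.
    Returns dict user -> fraction of gold tasks the user labeled correctly."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for user, task, label in answers:
        if task in gold:  # only tasks with known ground truth count
            total[user] += 1
            correct[user] += int(label == gold[task])
    return {u: correct[u] / total[u] for u in total}

def top_k_users(competence, k):
    """Return the k users with the highest competence (ties broken by name)."""
    return sorted(competence, key=lambda u: (-competence[u], u))[:k]

# Hypothetical toy data: label 1 = "same entity", 0 = "different entities".
answers = [
    ("ann", "t1", 1), ("ann", "t2", 0),
    ("bob", "t1", 1), ("bob", "t2", 1),
    ("cat", "t1", 0), ("cat", "t2", 1),
]
gold = {"t1": 1, "t2": 0}

competence = competence_from_gold(answers, gold)
chosen = top_k_users(competence, k=2)
```

With this toy data, `ann` answers both gold tasks correctly, `bob` one, and `cat` none, so the top-2 selection is `ann` and `bob`.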

Related Work
Estimating user competence based on prior knowledge of the probability distribution (Ref. [3], ICDM'12)
- X: instance feature vector
- N: number of instances
- Z: true label
- Y: user label
- A: user competence
- M: number of users
- i: instance index; j: user index

Related Work
Estimating user competence based on prior knowledge of the probability distribution (cont.)
- E step
- M step
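
The E-step and M-step formulas of Ref. [3] are not preserved in this transcript. To show the general shape of the alternation, here is a sketch of EM for a much simpler, swapped-in model: binary labels, a uniform prior on the true label, and a single accuracy value per user (a "one-coin" model). It ignores the instance features X, so it is not the model of Ref. [3]; all names and the toy labels are assumptions.

```python
import math

def em_one_coin(labels, n_iter=50, init_acc=0.7):
    """labels: dict mapping (task, user) -> 0 or 1.
    Returns (post, acc): post[task] = P(true label = 1),
    acc[user] = estimated probability that the user answers correctly."""
    tasks = sorted({t for t, _ in labels})
    users = sorted({u for _, u in labels})
    acc = {u: init_acc for u in users}  # start slightly better than chance
    post = {}
    for _ in range(n_iter):
        # E step: posterior of each true label given the current accuracies.
        for t in tasks:
            log_odds = 0.0  # log P(z=1 | labels) - log P(z=0 | labels)
            for (ti, u), y in labels.items():
                if ti != t:
                    continue
                a = min(max(acc[u], 1e-6), 1 - 1e-6)  # avoid log(0)
                step = math.log(a / (1 - a))
                log_odds += step if y == 1 else -step
            post[t] = 1.0 / (1.0 + math.exp(-log_odds))
        # M step: accuracy = expected fraction of answers matching the truth.
        for u in users:
            agree = [post[t] if y == 1 else 1 - post[t]
                     for (t, uj), y in labels.items() if uj == u]
            acc[u] = sum(agree) / len(agree)
    return post, acc

# Hypothetical toy data: u1 and u2 always agree, u3 disagrees on t0 and t1.
labels = {}
for task, (y1, y2, y3) in {"t0": (1, 1, 0), "t1": (0, 0, 1),
                           "t2": (1, 1, 1), "t3": (0, 0, 0)}.items():
    labels[(task, "u1")] = y1
    labels[(task, "u2")] = y2
    labels[(task, "u3")] = y3

post, acc = em_one_coin(labels)
```

On this data the two agreeing users end up with a higher estimated accuracy than `u3`, and the posteriors follow the majority.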

Related Work
Estimating user competence based on similar tasks (Ref. [4], SIGMOD'15)
- Similar tasks have similar estimated accuracies
- The estimated accuracy p needs to be similar to the real accuracy q
- Solved with PageRank
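
The exact formulation of Ref. [4] is not reproduced on the slide. The sketch below only illustrates the two stated requirements with a PageRank-style fixed-point iteration over a task similarity graph: each task's estimate p is pulled toward its neighbors' estimates, and, where an accuracy q was observed, toward q. The function name, the `damping` parameter, and the toy graph are assumptions.

```python
def propagate_accuracy(sim, observed, damping=0.5, n_iter=100):
    """sim: n x n list of lists with task similarities (zero diagonal).
    observed: list of length n; a user's observed accuracy q on completed
    tasks, or None for tasks the user has not done.
    Returns the list p of estimated accuracies for all n tasks."""
    n = len(sim)
    # Row-normalize similarities so each row is a weighted average.
    trans = []
    for row in sim:
        s = sum(row)
        trans.append([w / s if s > 0 else 0.0 for w in row])
    p = [0.0 if q is None else q for q in observed]
    for _ in range(n_iter):
        new_p = []
        for i in range(n):
            neighbor_avg = sum(trans[i][j] * p[j] for j in range(n))
            if observed[i] is None:
                # No observation: rely on similar tasks only.
                new_p.append(neighbor_avg)
            else:
                # Observation available: stay close to q, smooth with neighbors.
                new_p.append(damping * neighbor_avg + (1 - damping) * observed[i])
        p = new_p
    return p

# Hypothetical toy graph: task 1 is very similar to task 0 (done well)
# and only slightly similar to task 2 (done badly).
sim = [
    [0.0, 0.9, 0.0],
    [0.9, 0.0, 0.1],
    [0.0, 0.1, 0.0],
]
observed = [1.0, None, 0.0]
p = propagate_accuracy(sim, observed)
```

The unobserved task inherits most of its estimate from the task it resembles most, which is the behavior the first bullet asks for.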

Related Work
Estimating user competence based on similar tasks (Ref. [6], AAAI'14)
- X': a high-level representation of instances learned via transfer learning

Related Work
Adaptive task assignment with global optimization (Ref. [4], SIGMOD'15)
- More sophisticated ones, e.g. the online primal-dual technique (Ref. [5])
- Optimization target:
- Greedy assignment: Maximize
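
The optimization target and the quantity being maximized are missing from this transcript. As a hedged illustration of what a greedy assignment can look like, the sketch below assumes the goal is to maximize the total estimated accuracy of the assigned (user, task) pairs, with `k` users per task and a per-user `capacity`; these constraints and all names are assumptions, not the formulation of Ref. [4].

```python
def greedy_assign(acc, k, capacity):
    """acc: dict (user, task) -> estimated accuracy of that user on that task.
    k: number of users wanted per task.
    capacity: maximum number of tasks per user.
    Returns dict task -> list of assigned users."""
    load = {}      # user -> number of tasks assigned so far
    assigned = {}  # task -> users assigned so far
    # Visit candidate pairs from the highest estimated accuracy downward.
    for (user, task), _ in sorted(acc.items(), key=lambda kv: (-kv[1], kv[0])):
        if len(assigned.get(task, [])) < k and load.get(user, 0) < capacity:
            assigned.setdefault(task, []).append(user)
            load[user] = load.get(user, 0) + 1
    return assigned

# Hypothetical toy accuracies.
acc = {
    ("ann", "t1"): 0.9, ("ann", "t2"): 0.8,
    ("bob", "t1"): 0.6, ("bob", "t2"): 0.7,
}
plan = greedy_assign(acc, k=1, capacity=1)
```

Unlike picking the best users per task independently, the capacity constraint couples the tasks: `ann` is the best user for both tasks but can take only one, so `t2` goes to `bob`.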

Model overview

Model overview
Task similarity
- Tasks <ei, ej> and <ea, eb>
  - Sim_ea = max(sim(ea, ei), sim(ea, ej)) (Sim_eb is defined analogously)
  - Task similarity = (Sim_ea + Sim_eb) / 2
- Entity similarity sim(ea, ei)
  - Same data source and similar types
    - I-Sub of class names or shortest path length
  - Overlap of properties used in description
    - I-Sub of property names
  - Connected neighbors
    - Random walk
  - Semantic based, e.g. owl:sameAs
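
The task similarity formula above can be written out directly. In this sketch the entity similarity `sim` is a stand-in: a Jaccard overlap of the property sets used to describe each entity, which covers only one of the listed signals. The property sets and function names are hypothetical.

```python
def side_similarity(entity, other_task, sim):
    """Sim_e: best match of one entity against both entities of the other task."""
    ei, ej = other_task
    return max(sim(entity, ei), sim(entity, ej))

def task_similarity(task_a, task_b, sim):
    """task_a = (ea, eb), task_b = (ei, ej); returns (Sim_ea + Sim_eb) / 2."""
    ea, eb = task_a
    return (side_similarity(ea, task_b, sim) + side_similarity(eb, task_b, sim)) / 2

# Hypothetical property sets used in the entity descriptions.
props = {
    "ea": {"name", "birthDate", "label"},
    "eb": {"name", "label"},
    "ei": {"name", "label", "birthDate"},
    "ej": {"title", "year"},
}

def property_overlap(x, y):
    """Stand-in entity similarity: Jaccard overlap of property sets."""
    union = props[x] | props[y]
    return len(props[x] & props[y]) / len(union) if union else 0.0

score = task_similarity(("ea", "eb"), ("ei", "ej"), property_overlap)
```

Here Sim_ea is 1 (ea and ei use the same properties) and Sim_eb is 2/3, so the task similarity is 5/6. Any combination of the listed signals could be plugged in as `sim` instead.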

Model overview
- Task selector: selects tasks by uncertainty
  - Entropy based
- Task assigner
  - The top-k users with the highest competence
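
A minimal sketch of an entropy-based task selector, assuming each candidate task carries a current estimate of the probability that its two entities match. The slide does not say which distribution the entropy is taken over, so the binary match probability used here is an assumption, as are the names and numbers.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a yes/no outcome with probability p of 'yes'."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def select_most_uncertain(match_prob, n):
    """match_prob: dict task -> estimated probability that the pair matches.
    Returns the n tasks with the highest entropy (ties broken by task name)."""
    return sorted(match_prob, key=lambda t: (-binary_entropy(match_prob[t]), t))[:n]

# Hypothetical estimates from an automatic matcher.
match_prob = {"t1": 0.95, "t2": 0.5, "t3": 0.7}
selected = select_most_uncertain(match_prob, n=2)
```

Tasks the machine is already confident about (`t1`) are skipped, and the selected tasks would then go to the top-k users with the highest estimated competence.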

References
1. Wang, J., Kraska, T., Franklin, M. J., Feng, J.: CrowdER: Crowdsourcing entity resolution. VLDB, 5(11):1483-1494, 2012
2. Whang, S. E., Lofgren, P., Garcia-Molina, H.: Question selection for crowd entity resolution. VLDB, 6(6):349-360, 2013
3. Fang, M., et al.: Self-taught active learning from crowds. In: ICDM, pp. 858-863, 2012
4. Fan, J., Li, G., Ooi, B. C., Tan, K., Feng, J.: iCrowd: An adaptive crowdsourcing framework. In: SIGMOD, pp. 1015-1030, 2015
5. Ho, C. J., Jabbari, S., Vaughan, J. W.: Adaptive task assignment for crowdsourced classification. In: ICML, pp. 534-542, 2013
6. Fang, M., Yin, J., Tao, D.: Active learning for crowdsourcing using knowledge transfer. In: AAAI, pp. 1809-1815, 2014