Task assignment of interactive entity resolution. Saisai Gong (龚赛赛), 2015-12-14.

Presentation transcript:


Contents
- Background and Motivation
- Related Work
- Model Overview

Background
- Entity resolution (ER): identify entities that refer to the same real-world object
- Hybrid human-machine approaches leverage human intelligence
- Improving the quality of user-judged results is important
- Users usually differ in their competence at resolving different entities
  - Users have different backgrounds and domain knowledge
  - Unfamiliar resolution tasks -> low accuracy
- It is therefore necessary to assign tasks to competent users

Motivation
- In the Semantic Web community, few works consider users' diverse competence across resolution tasks.
- In other communities (e.g. ML and DB), several works on crowdsourcing tasks propose approaches that estimate users' competence and adaptively assign tasks.
- However, these approaches do not fully exploit the characteristics of Linked Data and need to be tailored.
- Our goal: estimate user competence based on similar completed tasks and adaptively assign tasks to competent users.

Related Work
- Crowdsourcing is used to acquire user contributions with various goals
  - Infer true labels from the crowd when unreliable users exist
  - Reduce expenditure, e.g.
    - Ref. [1] (VLDB'12) uses clustering algorithms to reduce verification tasks for crowd ER
    - Ref. [2] (VLDB'13) selects the best question for crowd ER
  - Others
- Common algorithms: EM; other ones such as minimax entropy
- Two important components (unified in a single model or separated)
  - Estimating user competence (expertise/quality)
  - Adaptive task assignment
- Heterogeneous vs. homogeneous

Related Work
- Estimating user competence
  - Based on prior knowledge of the probability distribution
  - Based on similar tasks
  - Based on ground truth: the fraction of tasks with ground truth that the user labeled correctly
- Adaptive task assignment
  - Assign tasks to the most competent users: for a task, assign it to the users with the highest estimated competence values
  - Global optimization
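The two simplest options on this slide, ground-truth-based competence and "assign to the most competent users", can be sketched as follows. This is a minimal illustration, not code from the talk or the cited papers; the function names and the `answers` / `gold` data layout are assumptions.

```python
from collections import defaultdict


def competence_from_gold(answers, gold):
    """Fraction of gold-labeled tasks each user answered correctly.

    answers: {(user, task): label}  (hypothetical layout)
    gold:    {task: true_label}, only for tasks with known ground truth
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for (user, task), label in answers.items():
        if task in gold:  # tasks without ground truth are ignored
            total[user] += 1
            correct[user] += int(label == gold[task])
    return {user: correct[user] / total[user] for user in total}


def top_k_users(competence, k):
    """Users with the k highest competence values (ties broken by name)."""
    return sorted(competence, key=lambda u: (-competence[u], u))[:k]
```

With answers from three users on two gold-labeled tasks, `competence_from_gold` returns one accuracy per user, and `top_k_users(competence, 2)` picks the two users a new task would be routed to.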

Related Work: estimating user competence based on prior knowledge of the probability distribution (Ref. [3], ICDM'12)
- X: instance feature vector
- Z: true label
- Y: user label
- A: user competence
- N: number of instances
- M: number of users
- i: instance index; j: user index

Related Work: estimating user competence based on prior knowledge of the probability distribution (cont.)
- E step
- M step
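To make the E step / M step alternation concrete, here is a minimal sketch of EM for a much simpler "one-coin" model with binary labels, in which each user j has a single accuracy that plays the role of the competence A. It is not the model of Ref. [3], which also conditions on the instance features X; the function name and data layout are assumptions for illustration.

```python
def one_coin_em(labels, n_items, n_users, iters=20, eps=1e-6):
    """EM for binary labels where user j is correct with probability acc[j].

    labels: {(item, user): 0 or 1}; every item needs at least one label.
    Returns (mu, acc): mu[i] is the posterior that the true label Z_i is 1.
    """
    by_item = [[] for _ in range(n_items)]
    for (i, j), y in labels.items():
        by_item[i].append((j, y))

    # Initialise the posteriors with a soft majority vote.
    mu = [sum(y for _, y in votes) / len(votes) for votes in by_item]
    acc = [0.5] * n_users

    for _ in range(iters):
        # M step: expected fraction of each user's labels that agree with Z.
        agree = [0.0] * n_users
        count = [0] * n_users
        for i, votes in enumerate(by_item):
            for j, y in votes:
                agree[j] += mu[i] if y == 1 else 1.0 - mu[i]
                count[j] += 1
        acc = [min(max(agree[j] / count[j], eps), 1 - eps) if count[j] else 0.5
               for j in range(n_users)]
        prior = min(max(sum(mu) / n_items, eps), 1 - eps)

        # E step: posterior of Z_i = 1 given the labels and current accuracies.
        for i, votes in enumerate(by_item):
            like1, like0 = prior, 1.0 - prior
            for j, y in votes:
                like1 *= acc[j] if y == 1 else 1.0 - acc[j]
                like0 *= 1.0 - acc[j] if y == 1 else acc[j]
            mu[i] = like1 / (like1 + like0)
    return mu, acc
```

The soft majority vote initialisation is a design choice that keeps the sketch deterministic and avoids the label-switching solution in which every user is treated as systematically wrong.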

Related Work: estimating user competence based on similar tasks (Ref. [4], SIGMOD'15)
- Similar tasks have similar estimated accuracies
- The estimated accuracy p needs to be close to the real accuracy q
- Solved with PageRank
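The two requirements on this slide (similar tasks get similar estimates p, and p stays close to the observed accuracy q) can be illustrated with a simple iterative propagation over a task similarity graph, in the spirit of PageRank-style random walks. This is a hedged sketch and not the exact formulation of Ref. [4]; the function name, the `damping` parameter and the update rule are assumptions.

```python
def propagate_accuracy(sim, observed, damping=0.5, iters=50):
    """Estimate one user's per-task accuracy p from a few observed values q.

    sim:      n x n symmetric, non-negative task similarity matrix
    observed: {task_index: observed accuracy q} for completed tasks
    Observed tasks mix their own q with their neighbours' estimates;
    unobserved tasks take the similarity-weighted mean of their neighbours.
    """
    n = len(sim)
    # Row-normalise the similarities, ignoring self-similarity.
    weights = []
    for i in range(n):
        total = sum(sim[i][j] for j in range(n) if j != i)
        weights.append([sim[i][j] / total if total > 0 and j != i else 0.0
                        for j in range(n)])

    p = [observed.get(i, 0.5) for i in range(n)]  # 0.5 = uninformed start
    for _ in range(iters):
        new_p = []
        for i in range(n):
            neighbours = sum(weights[i][j] * p[j] for j in range(n))
            if i in observed:
                new_p.append((1 - damping) * observed[i] + damping * neighbours)
            else:
                new_p.append(neighbours)
        p = new_p
    return p
```

For a user who did well on task 0 and badly on task 2, an unobserved task 1 that is very similar to task 0 ends up with an estimate close to that of task 0.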

Related Work: estimating user competence based on similar tasks (Ref. [6], AAAI'14)
- X': a high-level representation of the instances, learned through transfer learning

Related Work: adaptive task assignment with global optimization (Ref. [4], SIGMOD'15)
- More sophisticated approaches exist, e.g. the online primal-dual technique (Ref. [5])
- Optimization target:
- Greedy assignment: maximize
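A greedy assignment can be sketched as follows. The objective used here (the sum of estimated competence over assigned user-task pairs, with k users per task and a per-user capacity) is an assumption chosen for illustration, since the slide's formula is not in the transcript; it is not the exact objective of Ref. [4].

```python
def greedy_assign(competence, k, capacity):
    """Greedily pick (user, task) pairs in order of decreasing competence.

    competence: {(user, task): estimated competence}  (hypothetical layout)
    k:          maximum number of users per task
    capacity:   maximum number of tasks per user
    Returns {task: [users]}.
    """
    load = {}      # tasks already given to each user
    assigned = {}  # users already chosen for each task
    ranked = sorted(competence.items(), key=lambda kv: (-kv[1], kv[0]))
    for (user, task), _score in ranked:
        users = assigned.setdefault(task, [])
        if len(users) < k and load.get(user, 0) < capacity:
            users.append(user)
            load[user] = load.get(user, 0) + 1
    return assigned
```

Greedy selection is cheap, but it is not guaranteed to reach the optimum of such an objective, which is one reason more sophisticated techniques are mentioned on the slide.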

Model overview

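Model (plate diagram; node and plate labels on the slide: β, θ, Z, α, T, M, L, Z, t, S)
- $\theta \sim \mathrm{Dir}(\beta)$
- $Z \mid \theta \sim \mathrm{Mult}(\theta)$
- $T \sim \mathrm{Mult}(Z)$
- $L \sim (1 + \exp(-\alpha))^{-1}$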

Task similarity
- Data publisher similarity: Jaccard(pld(e1)+pld(e2), pld(e3)+pld(e4))
- Class similarity
  1. For each entity, get its declared types and find the maximal classes of these types, excluding owl:Thing and rdfs:Resource
  2. Union the maximal classes of the entity pair, denoted as X
  3. Find a label for each class in X and union the labels as the set Y
  4. Compute the vector cosine similarity between pair 1's label set and pair 2's label set
- Property similarity
  - Get the labels of each entity's properties, excluding owl:sameAs, rdf:type and rdfs:seeAlso, and union the labels for the entity pair into a vector
  - Compute the vector cosine similarity
- Combination: linear combination, 0.7*classSim + 0.2*DataPublish + 0.1*PropSim
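The combination above can be sketched as follows, assuming each task (an entity pair) has already been reduced to three bags: the publisher domains of both entities, the labels of their maximal classes, and the labels of their properties. The dictionary keys and function names are hypothetical; only the weights 0.7, 0.2 and 0.1 come from the slide.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def cosine(tokens_a, tokens_b):
    """Cosine similarity of two bags of labels (term-frequency vectors)."""
    va, vb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def task_similarity(task1, task2):
    """Linear combination of class, data publisher and property similarity."""
    class_sim = cosine(task1["class_labels"], task2["class_labels"])
    publisher_sim = jaccard(task1["plds"], task2["plds"])
    prop_sim = cosine(task1["property_labels"], task2["property_labels"])
    return 0.7 * class_sim + 0.2 * publisher_sim + 0.1 * prop_sim
```

Two tasks with identical bags score (up to rounding) 1.0, and two tasks that share no publisher, class label or property label score 0.0.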

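Process: for each unassigned entity, a label is selected to annotate it (in practice, the ground-truth label is read); the algorithm is then trained, and the process ends once all entities have been assigned.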

Results
- Organization: our method 0.64; compared method:
- People: our method: ; compared method:
- Location: our method: ; compared method:
- Arts: our method: ; compared method 0.598

References
1. Wang, J., Kraska, T., Franklin, M. J., Feng, J.: CrowdER: Crowdsourcing entity resolution. VLDB, 5(11)
2. Whang, S. E., Lofgren, P., Garcia-Molina, H.: Question selection for crowd entity resolution. VLDB, 6(6)
3. Fang, M., et al.: Self-taught active learning from crowds. ICDM
4. Fan, J., Li, G., Ooi, B. C., Tan, K., Feng, J.: iCrowd: An adaptive crowdsourcing framework. In: SIGMOD
5. Ho, C. J., Jabbari, S., Vaughan, J. W.: Adaptive task assignment for crowdsourced classification. In: ICML
6. Fang, M., Yin, J., Tao, D.: Active learning for crowdsourcing using knowledge transfer. In: AAAI, 2014