
1 Task Assignment for Interactive Entity Resolution 龚赛赛 2015-12-14

2 Contents
Background and Motivation
Related Work
Model Overview

3 Background
ER: identify entities referring to the same real-world objects
Hybrid human-machine approaches leverage human intelligence
Improving the quality of user-judged results is important
Users usually have diverse competence w.r.t. resolving different entities:
- Users have different backgrounds and domain knowledge
- Unfamiliar resolution tasks -> low accuracy
So it is necessary to assign tasks to competent users

4 Motivation
In the Semantic Web community, few works consider users' diverse competence across resolution tasks.
In other communities (e.g., ML and DB), several works on crowdsourcing propose approaches to estimate users' competence and adaptively assign tasks.
However, these approaches do not fully exploit the characteristics of Linked Data and need to be tailored.
Our goal: estimate user competence from similar completed tasks and adaptively assign tasks to competent users.

5 Related Work
Use crowdsourcing to acquire user contributions, with various goals:
- Infer true labels from the crowd when unreliable users exist
- Reduce expenditure, e.g., Ref. [1] (VLDB'12) uses clustering algorithms to reduce verification tasks for crowd ER; Ref. [2] (VLDB'13) selects the best question for crowd ER
- Others
Common algorithms: EM; others such as minimax entropy
Two important components (unified in a single model or separated):
- Estimating user competence (expertise/quality)
- Adaptive task assignment
Heterogeneous vs. homogeneous tasks

6 Related Work
Estimating user competence:
- Based on prior knowledge of a probabilistic distribution
- Based on similar tasks
- Based on ground truth: the fraction of ground-truth tasks the user labeled correctly
Adaptive task assignment:
- Assign each task to the most competent users, i.e., the users with the highest estimated competence values for it
- Global optimization

7 Related Work
Estimating user competence based on prior knowledge of a probabilistic distribution (Ref. [3], ICDM'12):
X: instance feature vector (i: instance index; N: number of instances)
Y: user label (j: user index; M: number of users)
Z: true label
A: user competence

8 Related Work
Estimating user competence based on prior knowledge of a probabilistic distribution (cont.): fit with EM
E step: infer the posterior of the true labels Z given the current competences
M step: re-estimate the competence parameters A given the label posteriors
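As an illustration of this EM alternation, here is a minimal sketch under simplifying assumptions: binary labels, a single accuracy parameter per user (a one-coin model), and a uniform prior on Z. Variable names follow the slide (Y user labels, Z true labels, A competences); this is not Ref. [3]'s exact model, which also conditions on instance features X.

```python
import numpy as np

def em_competence(Y, n_iter=50):
    """Estimate per-user accuracies A and true-label posteriors p from a
    binary answer matrix Y (N instances x M users, entries in {0, 1}).
    One-coin model: user j answers correctly with probability A[j]."""
    N, M = Y.shape
    A = np.full(M, 0.8)                      # initial competence guess
    for _ in range(n_iter):
        # E step: posterior P(Z_i = 1 | Y, A) under a uniform prior on Z
        log1 = (Y * np.log(A) + (1 - Y) * np.log(1 - A)).sum(axis=1)
        log0 = ((1 - Y) * np.log(A) + Y * np.log(1 - A)).sum(axis=1)
        p = 1.0 / (1.0 + np.exp(log0 - log1))
        # M step: competence = expected fraction of answers agreeing with Z
        A = (p[:, None] * Y + (1 - p[:, None]) * (1 - Y)).mean(axis=0)
        A = np.clip(A, 1e-3, 1 - 1e-3)       # keep the logs finite
    return A, p
```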

9 Related Work
Estimating user competence based on similar tasks (Ref. [4], SIGMOD'15):
- Similar tasks should have similar estimated accuracies
- The estimated accuracy p needs to stay close to the real accuracy q observed on completed tasks
- Solved with a PageRank-style iteration
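A rough sketch of this idea, assuming a task-similarity matrix S and a vector q of accuracies measured on the user's completed tasks; the personalized-PageRank-style smoothing below is a simplification, not Ref. [4]'s exact formulation.

```python
import numpy as np

def propagate_accuracy(S, q, known, damping=0.85, n_iter=100):
    """Smooth measured accuracies over the task-similarity graph.
    S: (T x T) nonnegative task-similarity matrix
    q: accuracy on completed tasks (0 for the rest)
    known: boolean mask marking the completed tasks"""
    W = S / np.maximum(S.sum(axis=1, keepdims=True), 1e-12)  # row-normalize
    p = q.astype(float).copy()
    for _ in range(n_iter):
        p = damping * (W @ p) + (1 - damping) * q  # personalized-PageRank step
        p[known] = q[known]                        # clamp observed accuracies
    return p
```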

10 Related Work
Estimating user competence based on similar tasks (Ref. [6], AAAI'14):
X' is a high-level representation of the instances, learned via transfer learning

11 Related Work
Adaptive task assignment with global optimization (Ref. [4], SIGMOD'15); more sophisticated variants exist, e.g., the online primal-dual technique of Ref. [5] (ICML'13).
Optimization target: maximize the total estimated accuracy over all assignments
Greedy assignment: repeatedly pick the feasible (task, user) pair with the highest estimated accuracy
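The slide's formulas were lost in extraction; a plausible reconstruction under the usual setup, where x_{ij} = 1 iff task i is assigned to user j, each task needs k answers, each user takes at most c tasks, and \hat{q}_{ij} is the estimated accuracy (k and c are assumed symbols):

```latex
\max_{x} \sum_{i}\sum_{j} \hat{q}_{ij}\, x_{ij}
\quad \text{s.t.} \quad
\sum_{j} x_{ij} = k \;\;\forall i, \qquad
\sum_{i} x_{ij} \le c \;\;\forall j, \qquad
x_{ij} \in \{0,1\}
```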

12 Model overview

13 Model (plate diagram; variables β, θ, Z, α, T, L, S over M users and t tasks)
θ ~ Dir(β)
Z | θ ~ Mult(θ)
T ~ Mult(Z)
L ~ (1 + exp(-α))^(-1)
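A minimal sketch that samples from this generative process as written, assuming K discrete label values and reading L as the probability that the user's label matches the true answer (a logistic function of competence α); the deterministic T = Z step and the uniform error model are assumptions where the slide is ambiguous.

```python
import numpy as np

rng = np.random.default_rng(0)
K, N = 3, 100                        # K label values, N tasks (assumed sizes)
beta = np.ones(K)                    # Dirichlet hyperparameter
alpha = 1.5                          # user competence parameter

theta = rng.dirichlet(beta)          # theta ~ Dir(beta)
Z = rng.choice(K, size=N, p=theta)   # Z | theta ~ Mult(theta)
T = Z.copy()                         # T ~ Mult(Z), read here as T following Z
acc = 1.0 / (1.0 + np.exp(-alpha))   # L ~ (1 + exp(-alpha))^(-1): P(label correct)
correct = rng.random(N) < acc
L = np.where(correct, T, rng.integers(0, K, size=N))  # wrong answers drawn uniformly
```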

14 Task similarity
Data-publisher similarity: Jaccard(pld(e1)+pld(e2), pld(e3)+pld(e4))
Class similarity:
1. For each entity, get its declared types and find the maximal classes of those types, excluding owl:Thing and rdfs:Resource
2. Union the maximal classes of the entity pair, denoted X
3. Find a label for each class in X; union the labels into the set Y
4. Vector cosine similarity between Pair1's label set and Pair2's label set
Property similarity:
- Get the labels of each entity's properties, excluding owl:sameAs, rdf:type, rdfs:seeAlso; union the labels for the entity pair into a vector
- Vector cosine similarity
Combination: linear, 0.7*classSim + 0.2*dataPublisherSim + 0.1*propSim (a sketch follows below)
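A hedged sketch of the combined similarity, with toy helpers for the three components; the function names and the pre-extracted input fields (plds, class_labels, prop_labels) are illustrative, not from the slides.

```python
import numpy as np

def jaccard(a, b):
    """Jaccard similarity of two collections of publisher domains."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def cosine(x, y):
    """Cosine similarity of two bag-of-words label lists."""
    vocab = sorted(set(x) | set(y))
    vx = np.array([x.count(w) for w in vocab], float)
    vy = np.array([y.count(w) for w in vocab], float)
    denom = np.linalg.norm(vx) * np.linalg.norm(vy)
    return float(vx @ vy) / denom if denom else 0.0

def task_similarity(pair1, pair2):
    """Each pair is a dict with 'plds' (publisher domains), 'class_labels',
    and 'prop_labels' already extracted for the entity pair."""
    pub = jaccard(pair1["plds"], pair2["plds"])
    cls = cosine(pair1["class_labels"], pair2["class_labels"])
    prp = cosine(pair1["prop_labels"], pair2["prop_labels"])
    return 0.7 * cls + 0.2 * pub + 0.1 * prp   # weights from the slide
```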

15 Process
For each unassigned entity, pick a label to annotate (in fact, the true label is read from the gold standard), then the algorithm is retrained; the process ends once all entities have been assigned.
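A sketch of this simulated loop (all names hypothetical): pick an unassigned task, route it to the user with the highest estimated competence, read the gold label in place of a real answer, and retrain.

```python
def run_simulation(tasks, users, gold, model):
    """Simulated interactive assignment; gold labels stand in for user answers."""
    unassigned = set(tasks)
    while unassigned:
        task = unassigned.pop()
        # route to the most competent user under the current estimates
        user = max(users, key=lambda u: model.estimate_competence(u, task))
        label = gold[task]               # read the true label, per the slide
        model.update(user, task, label)  # retrain with the new observation
    return model
```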

16 Results
Dataset       Our method   Compared
Organization  0.64         0.627
People        0.61792      0.607
Location      0.602        0.591
Arts          0.608        0.598

17 References
1. Wang, J., Kraska, T., Franklin, M. J., Feng, J.: CrowdER: Crowdsourcing entity resolution. VLDB, 5(11):1483-1494, 2012
2. Whang, S. E., Lofgren, P., Garcia-Molina, H.: Question selection for crowd entity resolution. VLDB, 6(6):349-360, 2013
3. Fang, M. et al.: Self-taught active learning from crowds. In: ICDM, pp. 858-863, 2012
4. Fan, J., Li, G., Ooi, B. C., Tan, K., Feng, J.: iCrowd: An adaptive crowdsourcing framework. In: SIGMOD, pp. 1015-1030, 2015
5. Ho, C. J., Jabbari, S., Vaughan, J. W.: Adaptive task assignment for crowdsourced classification. In: ICML, pp. 534-542, 2013
6. Fang, M., Yin, J., Tao, D.: Active learning for crowdsourcing using knowledge transfer. In: AAAI, pp. 1809-1815, 2014

