Published by Ami Hamilton. Modified over 8 years ago.
1
Task Assignment for Interactive Entity Resolution
龚赛赛, 2015-12-14
2
Contents
- Background and Motivation
- Related Work
- Model Overview
3
Background
- ER (entity resolution): identifying entities that refer to the same real-world object
- Hybrid human-machine approaches leverage human intelligence
- Improving the quality of user-judged results is important
- Users usually differ in competence when resolving different entities:
  - Users have different backgrounds and domain knowledge
  - Unfamiliar resolution tasks -> low accuracy
- It is therefore necessary to assign tasks to competent users
4
Motivation
- In the Semantic Web, few works consider users' diverse competence across resolution tasks.
- In other communities (e.g., ML and DB), several works on crowdsourcing estimate users' competence and assign tasks adaptively.
- However, these approaches do not fully exploit the characteristics of Linked Data and need to be tailored.
- Our goal: estimate user competence from similar completed tasks and adaptively assign tasks to competent users.
5
Related Work
Crowdsourcing is used to acquire user contributions for various goals:
- Inferring true labels from the crowd in the presence of unreliable users
- Reducing expenditure, e.g.:
  - Ref. [1] (VLDB'12) uses clustering algorithms to reduce the number of verification tasks for crowd ER
  - Ref. [2] (VLDB'13) selects the best question to ask for crowd ER
- Others
Common algorithms: EM; others such as minimax entropy
Two important components (unified in a single model or handled separately):
- Estimating user competence (expertise/quality)
- Adaptive task assignment
Heterogeneous vs. homogeneous
6
Related Work
Estimating user competence:
- Based on prior knowledge of a probabilistic distribution
- Based on similar tasks
- Based on ground truth: the fraction of ground-truth tasks that a user labeled correctly
Adaptive task assignment:
- Assign tasks to the most competent users: each task goes to the users with the highest estimated competence
- Global optimization
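The ground-truth estimate and the per-task "most competent users" rule above can be sketched in a few lines. This is a minimal illustration, not code from any of the cited works: the names `ground_truth_competence` and `assign_top_k`, the dictionary layout, and the neutral default of 0.5 for users without ground-truth answers are all assumptions.

```python
from typing import Dict, List


def ground_truth_competence(answers: Dict[str, Dict[str, int]],
                            gold: Dict[str, int],
                            default: float = 0.5) -> Dict[str, float]:
    """Competence = fraction of ground-truth tasks the user labeled correctly."""
    competence: Dict[str, float] = {}
    for user, labels in answers.items():
        graded = [task for task in labels if task in gold]
        if not graded:
            # Assumption: users with no ground-truth answers get a neutral default.
            competence[user] = default
            continue
        correct = sum(1 for task in graded if labels[task] == gold[task])
        competence[user] = correct / len(graded)
    return competence


def assign_top_k(competence: Dict[str, float], k: int) -> List[str]:
    """Pick the k users with the highest estimated competence (ties broken by name)."""
    return sorted(competence, key=lambda u: (-competence[u], u))[:k]


# Hypothetical example: t3 has no ground truth, so it does not affect the estimate.
answers = {"alice": {"t1": 1, "t2": 0, "t3": 1}, "bob": {"t1": 0, "t2": 0}}
gold = {"t1": 1, "t2": 0}
competence = ground_truth_competence(answers, gold)
print(competence)                   # {'alice': 1.0, 'bob': 0.5}
print(assign_top_k(competence, 1))  # ['alice']
```

Here a single global competence value per user is used; the works above instead make competence task-dependent, which is what the similarity-based estimates address.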
7
Related Work
Estimating user competence based on prior knowledge of a probabilistic distribution (Ref. [3], ICDM'12)
Notation:
- X: instance feature vector
- N: number of instances
- Z: true label
- Y: user label
- A: user competence
- M: number of users
- i: instance index; j: user index
8
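Related Work
Estimating user competence based on prior knowledge of a probabilistic distribution (cont.)
Parameters are learned with EM:
- E step: compute the posterior over the true labels Z given the current parameters
- M step: re-estimate the model parameters, including the user competence A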
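To make the E/M alternation concrete, here is a sketch of EM for a deliberately simplified binary model: each user j has a single competence a_j = P(y_ij = z_i), and there are no instance features X. This "one-coin" Dawid-Skene-style model is a swapped-in simplification, not the model of Ref. [3]; the function name `em_one_coin`, the initial competence 0.7, and the clamping constant are assumptions.

```python
import math
from typing import Dict, List, Tuple


def em_one_coin(labels: Dict[Tuple[str, str], int], iters: int = 50,
                init_acc: float = 0.7, eps: float = 1e-6):
    """labels maps (item, user) -> 0/1. Returns (posterior P(z=1) per item, competence per user)."""
    by_item: Dict[str, List[Tuple[str, int]]] = {}
    by_user: Dict[str, List[Tuple[str, int]]] = {}
    for (item, user), y in labels.items():
        by_item.setdefault(item, []).append((user, y))
        by_user.setdefault(user, []).append((item, y))
    acc = {u: init_acc for u in by_user}  # a_j = P(user label == true label)
    prior = 0.5                           # P(z_i = 1)
    post: Dict[str, float] = {}
    for _ in range(iters):
        # E step: posterior of each true label given current competences (in log space).
        for item, obs in by_item.items():
            log1, log0 = math.log(prior), math.log(1.0 - prior)
            for user, y in obs:
                a = acc[user]
                log1 += math.log(a if y == 1 else 1.0 - a)
                log0 += math.log(a if y == 0 else 1.0 - a)
            m = max(log1, log0)
            p1, p0 = math.exp(log1 - m), math.exp(log0 - m)
            post[item] = p1 / (p1 + p0)
        # M step: competence = expected fraction of the user's labels that match z.
        for user, obs in by_user.items():
            agree = sum(post[i] if y == 1 else 1.0 - post[i] for i, y in obs)
            acc[user] = min(max(agree / len(obs), eps), 1.0 - eps)
        prior = min(max(sum(post.values()) / len(post), eps), 1.0 - eps)
    return post, acc


# Hypothetical data: u1 and u2 always agree with the truth, u3 always disagrees.
truth = {"t1": 1, "t2": 0, "t3": 1, "t4": 0}
labels = {(t, u): (z if u != "u3" else 1 - z)
          for t, z in truth.items() for u in ("u1", "u2", "u3")}
post, acc = em_one_coin(labels)
```

Initializing competence above 0.5 encodes the usual assumption that users are better than random; without it, EM cannot tell a reliable majority from a consistently wrong one.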
9
Related Work
Estimating user competence based on similar tasks (Ref. [4], SIGMOD'15)
- Similar tasks should have similar estimated accuracies
- The estimated accuracy p should be close to the real accuracy q
- Solved with PageRank
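One way to picture the two requirements above (smoothness over similar tasks, closeness to the observed accuracy q) is a random-walk-with-restart iteration p <- alpha * P p + (1 - alpha) * q, where P is the row-normalized task-similarity matrix. This is an assumed formulation in the spirit of the PageRank-based solution, not the exact objective of Ref. [4]; `propagate_accuracy`, `alpha`, and using 0.5 as q for a task the user has not done are hypothetical choices.

```python
from typing import List


def propagate_accuracy(sim: List[List[float]], observed: List[float],
                       alpha: float = 0.8, iters: int = 200) -> List[float]:
    """Smooth a user's per-task accuracy over a task-similarity graph."""
    n = len(sim)
    trans: List[List[float]] = []
    isolated: List[bool] = []
    for i in range(n):
        # Row-normalize similarities (self-loops ignored): P[i][j] = w_ij / sum_k w_ik.
        row_sum = sum(sim[i][j] for j in range(n) if j != i)
        isolated.append(row_sum == 0)
        trans.append([sim[i][j] / row_sum if row_sum > 0 and j != i else 0.0
                      for j in range(n)])
    p = list(observed)
    for _ in range(iters):
        new_p = []
        for i in range(n):
            if isolated[i]:
                new_p.append(observed[i])  # no similar tasks: keep the observed value
            else:
                neighbour_avg = sum(trans[i][j] * p[j] for j in range(n))
                new_p.append(alpha * neighbour_avg + (1 - alpha) * observed[i])
        p = new_p
    return p


# Tasks 0 and 1 are very similar; task 2 is only weakly related to both.
sim = [[0.0, 1.0, 0.01], [1.0, 0.0, 0.01], [0.01, 0.01, 0.0]]
q = [0.9, 0.5, 0.3]  # 0.5: hypothetical neutral value for a task the user has not done
p = propagate_accuracy(sim, q)
```

Because P is row-stochastic and alpha < 1, each update is a convex combination, so the estimates stay within the range of the observed values and the iteration converges.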
10
Related Work
Estimating user competence based on similar tasks (Ref. [6], AAAI'14)
- X': a high-level representation of instances learned via transfer learning
11
Related Work
Adaptive task assignment with global optimization (Ref. [4], SIGMOD'15)
- More sophisticated approaches exist, e.g., the online primal-dual technique (Ref. [5])
- Optimization target: greedy assignment that maximizes it
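A greedy global assignment can be sketched as follows, under stated assumptions: the objective is taken to be the sum of estimated accuracies of the chosen (worker, task) pairs, each worker receives at most one task, and each task accepts at most k workers. The function `greedy_assign` and these constraints are illustrative, not the exact formulation of Ref. [4].

```python
from typing import Dict, List, Tuple


def greedy_assign(acc: Dict[str, Dict[str, float]], k: int) -> Dict[str, str]:
    """acc[worker][task] = estimated accuracy. Returns worker -> assigned task."""
    pairs: List[Tuple[float, str, str]] = [
        (a, w, t) for w, row in acc.items() for t, a in row.items()]
    # Highest estimated accuracy first; names break ties deterministically.
    pairs.sort(key=lambda x: (-x[0], x[1], x[2]))
    assignment: Dict[str, str] = {}
    load: Dict[str, int] = {}
    for _, w, t in pairs:
        if w in assignment or load.get(t, 0) >= k:
            continue  # worker already busy or task already has k workers
        assignment[w] = t
        load[t] = load.get(t, 0) + 1
    return assignment


acc = {"w1": {"t1": 0.9, "t2": 0.8},
       "w2": {"t1": 0.85, "t2": 0.6},
       "w3": {"t1": 0.7, "t2": 0.65}}
print(greedy_assign(acc, 1))  # {'w1': 't1', 'w3': 't2'}
```

Greedy is only a heuristic: on this toy input it reaches a total of 0.9 + 0.65 = 1.55, while assigning w2 to t1 and w1 to t2 gives 0.85 + 0.8 = 1.65.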
12
Model overview
13
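[Figure: graphical model in plate notation over β, θ, Z, α, T, L]
$\theta \sim \mathrm{Dir}(\beta)$
$Z \mid \theta \sim \mathrm{Mult}(\theta)$
$T \sim \mathrm{Mult}(Z)$
$L \sim (1 + \exp(-\alpha))^{-1}$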
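The generative process above can be simulated forward to see what each variable does. The slide does not fix the semantics, so this sketch assumes: Z is a latent topic of a task, T is an observed task feature drawn from a topic-specific distribution `phi[Z]` (a hypothetical parameter standing in for "Mult(Z)"), and L is a 0/1 correctness label that equals 1 with probability sigmoid(alpha_Z), with alpha indexed by topic (also an assumption). All names are hypothetical.

```python
import math
import random
from typing import List, Tuple


def sample_dirichlet(beta: List[float], rng: random.Random) -> List[float]:
    """Dirichlet draw via normalized Gamma(beta_k, 1) variables."""
    draws = [rng.gammavariate(b, 1.0) for b in beta]
    total = sum(draws)
    return [d / total for d in draws]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sample_tasks(beta: List[float], phi: List[List[float]], alpha: List[float],
                 n_tasks: int, seed: int = 0) -> Tuple[List[float], List[Tuple[int, int, int]]]:
    rng = random.Random(seed)
    theta = sample_dirichlet(beta, rng)                         # theta ~ Dir(beta)
    samples = []
    for _ in range(n_tasks):
        z = rng.choices(range(len(theta)), weights=theta)[0]    # Z | theta ~ Mult(theta)
        t = rng.choices(range(len(phi[z])), weights=phi[z])[0]  # T ~ Mult(phi_Z), assumed
        l = 1 if rng.random() < sigmoid(alpha[z]) else 0        # P(L = 1) = sigmoid(alpha_Z), assumed
        samples.append((z, t, l))
    return theta, samples


theta, samples = sample_tasks(beta=[1.0, 1.0], phi=[[0.9, 0.1], [0.2, 0.8]],
                              alpha=[50.0, 50.0], n_tasks=5)
```

With very large alpha the logistic term is effectively 1, so every sampled L is 1; smaller or negative alpha values model topics on which the user is less reliable.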
14
Task similarity
- Data publisher: Jaccard(pld(e1) ∪ pld(e2), pld(e3) ∪ pld(e4)), where (e1, e2) and (e3, e4) are the entity pairs of the two tasks and pld(e) is the pay-level domain of e
- Class similarity:
  1. For each entity, get its declared types and find the maximal classes of those types, excluding owl:Thing and rdfs:Resource
  2. Take the union of the maximal classes of the entity pair, denoted X
  3. Find a label for each class in X; the union of these labels is the set Y
  4. Compute the vector cosine similarity between Pair 1's label set and Pair 2's label set
- Property similarity:
  - Get the label of each property of each entity, excluding owl:sameAs, rdf:type and rdfs:seeAlso, and take their union for the entity pair as a vector
  - Compute the vector cosine similarity
- Combination: a linear combination, 0.7 × classSim + 0.2 × publisherSim + 0.1 × propSim
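The three similarity components and their 0.7/0.2/0.1 combination can be sketched as below. Assumptions, stated loudly: each entity is a hypothetical dictionary holding its URI, the labels of its maximal classes (steps 1-3 already done, since they need RDF/ontology access), and a map from property URI to property label; `pld` crudely takes the last two host labels instead of consulting the Public Suffix List; label sets are treated as binary vectors, so cosine becomes |A ∩ B| / sqrt(|A| · |B|).

```python
import math
from typing import Any, Dict, Set, Tuple
from urllib.parse import urlparse

Entity = Dict[str, Any]  # hypothetical record: "uri", "class_labels", "props"

EXCLUDED_PROPS = {
    "http://www.w3.org/2002/07/owl#sameAs",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    "http://www.w3.org/2000/01/rdf-schema#seeAlso",
}


def pld(uri: str) -> str:
    """Crude pay-level domain: last two labels of the host name (assumption)."""
    host = urlparse(uri).hostname or ""
    return ".".join(host.split(".")[-2:])


def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b) if (a or b) else 0.0


def set_cosine(a: Set[str], b: Set[str]) -> float:
    """Cosine similarity of the binary vectors encoding two label sets."""
    return len(a & b) / math.sqrt(len(a) * len(b)) if (a and b) else 0.0


def class_labels(pair: Tuple[Entity, Entity]) -> Set[str]:
    return {lab for e in pair for lab in e["class_labels"]}


def prop_labels(pair: Tuple[Entity, Entity]) -> Set[str]:
    return {lab for e in pair for p, lab in e["props"].items() if p not in EXCLUDED_PROPS}


def task_similarity(pair1: Tuple[Entity, Entity], pair2: Tuple[Entity, Entity],
                    w_class: float = 0.7, w_pub: float = 0.2, w_prop: float = 0.1) -> float:
    pub = jaccard({pld(e["uri"]) for e in pair1}, {pld(e["uri"]) for e in pair2})
    cls = set_cosine(class_labels(pair1), class_labels(pair2))
    prop = set_cosine(prop_labels(pair1), prop_labels(pair2))
    return w_class * cls + w_pub * pub + w_prop * prop


# Hypothetical entities on example domains.
a = {"uri": "http://data.example.org/r/1", "class_labels": ["organisation"],
     "props": {"http://example.org/p/name": "name",
               "http://www.w3.org/2002/07/owl#sameAs": "same as"}}
b = {"uri": "http://kb.example.com/r/2", "class_labels": ["company"],
     "props": {"http://example.com/p/label": "label"}}
c = {"uri": "http://data.example.org/r/3", "class_labels": ["person"],
     "props": {"http://example.org/p/name": "name"}}
print(task_similarity((a, b), (c, c)))  # 0.2*0.5 + 0.1*(1/sqrt(2)), about 0.171
```

The weights are parameters so that other combinations can be tried; the defaults mirror the slide.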
15
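Procedure: for each entity that has not yet been assigned, select a label to annotate it (in practice, this reads the real recorded label), then train the algorithm; the process ends once all entities have been assigned.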
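The simulation loop described above might look like the following sketch. Everything here is an assumption for illustration: tasks are processed in a fixed order; only users who have a recorded label for a task are candidates; "training" is modeled as appending (task, correct?) to the chosen user's history, where correctness is checked against ground truth; and competence is the similarity-weighted accuracy on past tasks. Names such as `simulate` and `estimated_competence` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

History = Dict[str, List[Tuple[str, bool]]]  # user -> [(task, was the label correct?)]


def estimated_competence(user: str, task: str, history: History,
                         sim: Callable[[str, str], float], default: float = 0.5) -> float:
    """Similarity-weighted accuracy of the user on previously assigned tasks."""
    done = history.get(user, [])
    weight = sum(sim(task, t) for t, _ in done)
    if weight == 0:
        return default  # assumption: neutral score for users without evidence
    return sum(sim(task, t) * (1.0 if ok else 0.0) for t, ok in done) / weight


def simulate(tasks: List[str], recorded: Dict[Tuple[str, str], int],
             gold: Dict[str, int], sim: Callable[[str, str], float]) -> Dict[str, str]:
    history: History = {}
    assignment: Dict[str, str] = {}
    for task in tasks:  # loop until every task has been processed
        candidates = sorted({u for (u, t) in recorded if t == task})
        if not candidates:
            continue
        # max() keeps the first maximum, so ties go to the alphabetically first user.
        best = max(candidates, key=lambda u: estimated_competence(u, task, history, sim))
        label = recorded[(best, task)]  # "annotate" = read the recorded label
        history.setdefault(best, []).append((task, label == gold[task]))  # update step
        assignment[task] = best
    return assignment


recorded = {("alice", "t1"): 0, ("bob", "t1"): 1, ("alice", "t2"): 1, ("bob", "t2"): 1}
gold = {"t1": 1, "t2": 1}
assignment = simulate(["t1", "t2"], recorded, gold, sim=lambda x, y: 1.0)
print(assignment)  # {'t1': 'alice', 't2': 'bob'}
```

Alice gets t1 by tie-breaking, labels it wrongly, and her estimated competence drops to 0, so t2 goes to Bob.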
16
Results
Domain          Our method   Compared
Organization    0.64         0.627
People          0.61792      0.607
Location        0.602        0.591
Arts            0.608        0.598
17
References
1. Wang, J., Kraska, T., Franklin, M. J., Feng, J.: CrowdER: Crowdsourcing entity resolution. VLDB, 5(11):1483-1494, 2012
2. Whang, S. E., Lofgren, P., Garcia-Molina, H.: Question selection for crowd entity resolution. VLDB, 6(6):349-360, 2013
3. Fang, M., et al.: Self-taught active learning from crowds. In: ICDM, pp. 858-863, 2012
4. Fan, J., Li, G., Ooi, B. C., Tan, K., Feng, J.: iCrowd: An adaptive crowdsourcing framework. In: SIGMOD, pp. 1015-1030, 2015
5. Ho, C. J., Jabbari, S., Vaughan, J. W.: Adaptive task assignment for crowdsourced classification. In: ICML, pp. 534-542, 2013
6. Fang, M., Yin, J., Tao, D.: Active learning for crowdsourcing using knowledge transfer. In: AAAI, pp. 1809-1815, 2014