Adaptive entity resolution with human computation. 龚赛赛 (Saisai Gong), 2015-10-26
Contents: Background and Motivation; Related Work; Model Overview
Background: Entity resolution (ER) identifies entities that refer to the same real-world objects. Hybrid human-machine approaches leverage human intelligence, so improving the quality of user-judged results is important. Users usually have diverse competence in resolving different entities, because they have different backgrounds and domain knowledge, and their accuracy is low on unfamiliar resolution tasks. It is therefore necessary to assign tasks to competent users.
Motivation: In the Semantic Web, few works consider users' diverse competence across resolution tasks. In other communities (e.g., ML and DB), several works on crowdsourcing estimate users' competence and assign tasks adaptively. However, these approaches do not fully exploit the characteristics of Linked Data and need to be tailored to it. Our goal: estimate user competence from similar completed tasks and adaptively assign tasks to competent users.
Related Work: Crowdsourcing has been used to acquire user contributions for various goals: (1) inferring true labels from crowds that include unreliable users; (2) reducing expenditure, e.g., Ref. [1] (VLDB'12) uses clustering to reduce the number of verification tasks for crowd ER, and Ref. [2] (VLDB'13) selects the best questions to ask for crowd ER; (3) others. Common algorithms include EM and alternatives such as minimax entropy. Two important components, either unified in a single model or kept separate: estimating user competence (also called expertise or quality) and adaptive task assignment. Approaches also differ in whether they treat tasks as heterogeneous or homogeneous.
Related Work: Estimating user competence can be based on (a) prior knowledge of probability distributions, (b) similar tasks, or (c) ground truth, i.e., the fraction of ground-truth tasks that a user labeled correctly. Adaptive task assignment gives tasks to the most competent users, either per task (each task goes to the users with the highest estimated competence) or via global optimization.
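As a minimal sketch of the ground-truth option, assuming answers arrive as hypothetical (user, task, label) triples and gold labels are kept in a dict, a user's competence is simply the fraction of gold tasks they answered correctly:

```python
from typing import Dict, Hashable, Iterable, Tuple


def gold_accuracy(
    answers: Iterable[Tuple[Hashable, Hashable, int]],  # (user, task, label)
    gold: Dict[Hashable, int],                           # task -> true label
) -> Dict[Hashable, float]:
    """Fraction of ground-truth tasks each user labeled correctly.

    Users who answered no gold task are omitted from the result.
    """
    correct: Dict[Hashable, int] = {}
    seen: Dict[Hashable, int] = {}
    for user, task, label in answers:
        if task not in gold:
            continue  # only tasks with known ground truth count
        seen[user] = seen.get(user, 0) + 1
        correct[user] = correct.get(user, 0) + (label == gold[task])
    return {u: correct[u] / seen[u] for u in seen}
```

This yields one number per user regardless of the task, which is the limitation that the similar-task approaches address.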
Related Work: Estimating user competence based on prior knowledge of probability distributions (Ref. [3], ICDM'12). Notation: X: instance feature vector; Z: true label; Y: user label; A: user competence; N: number of instances; M: number of users; i: instance index; j: user index.
Related Work: Estimating user competence based on prior knowledge of probability distributions (cont.): the model is learned with EM, alternating an E step and an M step.
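To make the E/M alternation concrete, the following sketch uses a much simpler model than Ref. [3]: the one-coin Dawid-Skene model, in which each user j has a single accuracy a_j that does not depend on the instance features X. Labels are assumed binary (1 = match, 0 = non-match); the function name, initial values, and clamping constant are illustrative choices.

```python
from typing import List, Tuple


def one_coin_em(
    labels: List[Tuple[int, int, int]],  # (task index i, user index j, label y in {0, 1})
    n_tasks: int,
    n_users: int,
    n_iter: int = 50,
) -> Tuple[List[float], List[float]]:
    """One-coin EM: returns (P(z_i = 1) per task, accuracy a_j per user)."""
    eps = 1e-3             # keep probabilities away from 0 and 1
    acc = [0.8] * n_users  # initial accuracy guess (assumed better than chance)
    prior = 0.5            # initial P(z = 1)
    post = [prior] * n_tasks
    for _ in range(n_iter):
        # E step: posterior P(z_i = 1 | labels) under the current parameters
        like1 = [prior] * n_tasks
        like0 = [1.0 - prior] * n_tasks
        for i, j, y in labels:
            like1[i] *= acc[j] if y == 1 else 1.0 - acc[j]
            like0[i] *= acc[j] if y == 0 else 1.0 - acc[j]
        post = [l1 / (l1 + l0) for l1, l0 in zip(like1, like0)]
        # M step: a_j = expected fraction of user j's labels that are correct
        num = [0.0] * n_users
        den = [0.0] * n_users
        for i, j, y in labels:
            num[j] += post[i] if y == 1 else 1.0 - post[i]
            den[j] += 1.0
        acc = [min(max(n / d, eps), 1 - eps) if d else 0.5
               for n, d in zip(num, den)]
        prior = min(max(sum(post) / n_tasks, eps), 1 - eps)
    return post, acc
```

Replacing the scalar a_j with a function of x_i (or of a learned representation) is what turns this into an instance-dependent competence model.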
Related Work: Estimating user competence based on similar tasks (Ref. [4], SIGMOD'15): similar tasks should have similar estimated accuracies, and the estimated accuracy p should be close to the observed accuracy q. The resulting problem is solved with a PageRank-style iteration.
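One way to write the two requirements as an objective is to minimize (1/2) Σ_ij w_ij (p_i - p_j)^2 + λ Σ_i (p_i - q_i)^2 for one user, where w_ij is the similarity of tasks i and j. We assume this quadratic form for illustration; Ref. [4]'s exact formulation may differ. Setting the gradient to zero gives the Jacobi update p_i ← (Σ_j w_ij p_j + λ q_i) / (d_i + λ), with d_i = Σ_j w_ij. This is a random-walk step over similar tasks mixed with a restart to the observed accuracy, the PageRank-like structure mentioned above.

```python
import numpy as np


def propagate_accuracy(sim: np.ndarray, q: np.ndarray, lam: float = 1.0,
                       n_iter: int = 200, tol: float = 1e-10) -> np.ndarray:
    """Smooth one user's observed per-task accuracies q over task similarity.

    sim: (n, n) symmetric, non-negative task-similarity matrix.
    q:   (n,) observed accuracies (a neutral value where there is no evidence).
    """
    w = np.array(sim, dtype=float)
    np.fill_diagonal(w, 0.0)   # ignore self-similarity
    q = np.asarray(q, dtype=float)
    d = w.sum(axis=1)          # weighted degree of each task
    p = q.copy()
    for _ in range(n_iter):
        # random-walk step over neighbors, restarted toward q with weight lam
        p_new = (w @ p + lam * q) / (d + lam)
        if np.max(np.abs(p_new - p)) < tol:
            return p_new
        p = p_new
    return p
```

Because the system matrix is strictly diagonally dominant when λ > 0, the iteration converges, and a task with no similar neighbors keeps p_i = q_i.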
Related Work: Estimating user competence based on similar tasks (Ref. [6], AAAI'14): X′ is a high-level representation of instances learned via transfer learning.
Related Work: Adaptive task assignment with global optimization (Ref. [4], SIGMOD'15): an optimization target is defined, and tasks are assigned greedily to maximize it. More sophisticated techniques also exist, e.g., the online primal-dual technique of Ref. [5].
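A minimal sketch of greedy global assignment, under assumptions that are ours rather than Ref. [4]'s: the objective is the total estimated accuracy of the chosen (user, task) pairs, each task needs k answers, and each user can take at most a given number of tasks. Pairs are considered in decreasing order of estimated accuracy and kept while both constraints allow it.

```python
from typing import Dict, Hashable, List, Tuple


def greedy_assign(
    acc: Dict[Tuple[Hashable, Hashable], float],  # (user, task) -> estimated accuracy
    k: int,                                        # answers needed per task
    capacity: Dict[Hashable, int],                 # max number of tasks per user
) -> Dict[Hashable, List[Hashable]]:
    """Greedily pick (user, task) pairs in decreasing estimated accuracy."""
    load = {u: 0 for u in capacity}
    assigned: Dict[Hashable, List[Hashable]] = {}
    for (user, task), _ in sorted(acc.items(), key=lambda kv: kv[1], reverse=True):
        users = assigned.setdefault(task, [])
        # keep the pair only if the task still needs answers and the user has room
        if len(users) < k and load.get(user, 0) < capacity.get(user, 0):
            users.append(user)
            load[user] = load.get(user, 0) + 1
    return assigned
```

Greedy selection is simple but not guaranteed to find the optimal assignment.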
Model overview
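Model overview: Task similarity. For tasks $\langle e_i, e_j \rangle$ and $\langle e_a, e_b \rangle$, let $\mathrm{Sim}_{e_a} = \max(\mathrm{sim}(e_a, e_i), \mathrm{sim}(e_a, e_j))$ and $\mathrm{Sim}_{e_b} = \max(\mathrm{sim}(e_b, e_i), \mathrm{sim}(e_b, e_j))$; the task similarity is $(\mathrm{Sim}_{e_a} + \mathrm{Sim}_{e_b})/2$. The entity similarity $\mathrm{sim}(e_a, e_i)$ can be estimated from: same data source and similar types (I-Sub of class names, or shortest path length between the classes); overlap of the properties used in the descriptions (I-Sub of property names); connected neighbors (random walk); and semantics-based evidence, e.g., owl:sameAs.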
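The task-similarity formula above can be sketched as follows. The entity similarity is passed in as a function. The property_overlap helper is a hypothetical stand-in that uses exact Jaccard overlap of property sets instead of I-Sub over property names, and it ignores the type, neighbor, and owl:sameAs signals.

```python
from typing import Callable, Dict, Hashable, Set, Tuple

Task = Tuple[Hashable, Hashable]                    # a candidate pair <e_i, e_j>
EntitySim = Callable[[Hashable, Hashable], float]


def task_similarity(t1: Task, t2: Task, sim: EntitySim) -> float:
    """Similarity of tasks <e_i, e_j> and <e_a, e_b>."""
    ei, ej = t1
    ea, eb = t2
    sim_ea = max(sim(ea, ei), sim(ea, ej))  # best match of e_a in the other task
    sim_eb = max(sim(eb, ei), sim(eb, ej))  # best match of e_b in the other task
    return (sim_ea + sim_eb) / 2


def property_overlap(props: Dict[Hashable, Set[str]]) -> EntitySim:
    """Hypothetical entity similarity: Jaccard overlap of property names."""
    def sim(a: Hashable, b: Hashable) -> float:
        pa, pb = props.get(a, set()), props.get(b, set())
        union = pa | pb
        return len(pa & pb) / len(union) if union else 0.0
    return sim
```

Note that this measure is not symmetric in general: it asks how well each entity of the second task is matched by some entity of the first.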
Model overview: The task selector chooses tasks by uncertainty (entropy based). The task assigner gives each selected task to the top-k users with the highest competence.
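A sketch of the two components, assuming that each candidate pair's current match probability and each user's estimated competence for a task are already available (the dict inputs and function names are hypothetical). We assume uncertainty is measured as the binary entropy of the match probability.

```python
import math
from typing import Dict, Hashable, List


def binary_entropy(p: float) -> float:
    """Entropy in bits of a match probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_tasks(match_prob: Dict[Hashable, float], n: int) -> List[Hashable]:
    """Task selector: the n tasks whose match probability is most uncertain."""
    return sorted(match_prob, key=lambda t: binary_entropy(match_prob[t]),
                  reverse=True)[:n]


def top_k_users(competence: Dict[Hashable, float], k: int) -> List[Hashable]:
    """Task assigner: the k users with the highest competence for one task."""
    return sorted(competence, key=lambda u: competence[u], reverse=True)[:k]
```

Tasks near p = 0.5 are selected first, since a human judgment is most informative where the current estimate is least certain.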
References
1. Wang, J., Kraska, T., Franklin, M. J., Feng, J.: CrowdER: Crowdsourcing entity resolution. VLDB, 5(11):1483-1494, 2012.
2. Whang, S. E., Lofgren, P., Garcia-Molina, H.: Question selection for crowd entity resolution. VLDB, 6(6):349-360, 2013.
3. Fang, M., et al.: Self-taught active learning from crowds. In: ICDM, pp. 858-863, 2012.
4. Fan, J., Li, G., Ooi, B. C., Tan, K., Feng, J.: iCrowd: An adaptive crowdsourcing framework. In: SIGMOD, pp. 1015-1030, 2015.
5. Ho, C. J., Jabbari, S., Vaughan, J. W.: Adaptive task assignment for crowdsourced classification. In: ICML, pp. 534-542, 2013.
6. Fang, M., Yin, J., Tao, D.: Active learning for crowdsourcing using knowledge transfer. In: AAAI, pp. 1809-1815, 2014.