Saisai Gong, Wei Hu, Yuzhong Qu

Slides:

Advertisements

Similar presentations

A Comparison of Implicit and Explicit Links for Web Page Classification Dou Shen 1 Jian-Tao Sun 2 Qiang Yang 1 Zheng Chen 2 1 Department of Computer Science.

Advertisements

Context-based object-class recognition and retrieval by generalized correlograms by J. Amores, N. Sebe and P. Radeva Discussion led by Qi An Duke University.

Mustafa Cayci INFS 795 An Evaluation on Feature Selection for Text Clustering.

ECG Signal processing (2)

Random Forest Predrag Radenković 3237/10

Combining Classification and Model Trees for Handling Ordinal Problems D. Anyfantis, M. Karagiannopoulos S. B. Kotsiantis, P. E. Pintelas Educational Software.

Large-Scale Entity-Based Online Social Network Profile Linkage.

Comparing Twitter Summarization Algorithms for Multiple Post Summaries David Inouye and Jugal K. Kalita SocialCom May 10 Hyewon Lim.

SLIQ: A Fast Scalable Classifier for Data Mining Manish Mehta, Rakesh Agrawal, Jorma Rissanen Presentation by: Vladan Radosavljevic.

1 Lecture 5: Automatic cluster detection Lecture 6: Artificial neural networks Lecture 7: Evaluation of discovered knowledge Brief introduction to lectures.

Ensemble Learning: An Introduction

Learning Object Identification Rules for Information Integration Sheila Tejada Craig A. Knobleock Steven University of Southern California.

1 Ensembles of Nearest Neighbor Forecasts Dragomir Yankov, Eamonn Keogh Dept. of Computer Science & Eng. University of California Riverside Dennis DeCoste.

Robust Real-Time Object Detection Paul Viola & Michael Jones.

Retrieval Evaluation. Introduction Evaluation of implementations in computer science often is in terms of time and space complexity. With large document.

Rotation Forest: A New Classifier Ensemble Method 交通大學電子所蕭晴駿 Juan J. Rodríguez and Ludmila I. Kuncheva.

Selective Sampling on Probabilistic Labels Peng Peng, Raymond Chi-Wing Wong CSE, HKUST 1.

Graph-based consensus clustering for class discovery from gene expression data Zhiwen Yum, Hau-San Wong and Hongqiang Wang Bioinformatics, 2007.

Transfer Learning From Multiple Source Domains via Consensus Regularization Ping Luo, Fuzhen Zhuang, Hui Xiong, Yuhong Xiong, Qing He.

Comparing the Parallel Automatic Composition of Inductive Applications with Stacking Methods Hidenao Abe & Takahira Yamaguchi Shizuoka University, JAPAN.

Machine Learning1 Machine Learning: Summary Greg Grudic CSCI-4830.

 An important problem in sponsored search advertising is keyword generation, which bridges the gap between the keywords bidded by advertisers and queried.

UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.

Presenter: Lung-Hao Lee ( 李龍豪 ) January 7, 309.

BOF Trees Visualization  Zagreb, June 12, 2004 BOF Trees Visualization  Zagreb, June 12, 2004 “BOF” Trees Diagram as a Visual Way to Improve Interpretability.

Automatic Image Annotation by Using Concept-Sensitive Salient Objects for Image Content Representation Jianping Fan, Yuli Gao, Hangzai Luo, Guangyou Xu.

Combining multiple learners Usman Roshan. Bagging Randomly sample training data Determine classifier C i on sampled data Goto step 1 and repeat m times.

Today Ensemble Methods. Recap of the course. Classifier Fusion

Exploiting Context Analysis for Combining Multiple Entity Resolution Systems -Ramu Bandaru Zhaoqi Chen Dmitri V.kalashnikov Sharad Mehrotra.

CLASSIFICATION: Ensemble Methods

Automatic Detection of Social Tag Spams Using a Text Mining Approach Hsin-Chang Yang Associate Professor Department of Information Management National.

Enhancing Cluster Labeling Using Wikipedia David Carmel, Haggai Roitman, Naama Zwerdling IBM Research Lab (SIGIR’09) Date: 11/09/2009 Speaker: Cho, Chin.

BEHAVIORAL TARGETING IN ON-LINE ADVERTISING: AN EMPIRICAL STUDY AUTHORS: JOANNA JAWORSKA MARCIN SYDOW IN DEFENSE: XILING SUN & ARINDAM PAUL.

Data Reduction via Instance Selection Chapter 1. Background KDD  Nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable.

INTERACTIVELY BROWSING LARGE IMAGE DATABASES Ronald Richter, Mathias Eitz and Marc Alexa.

Ivica Dimitrovski 1, Dragi Kocev 2, Suzana Loskovska 1, Sašo Džeroski 2 1 Faculty of Electrical Engineering and Information Technologies, Department of.

Consensus Group Stable Feature Selection

Iterative similarity based adaptation technique for Cross Domain text classification Under: Prof. Amitabha Mukherjee By: Narendra Roy Roll no: Group:

Random Forests Ujjwol Subedi. Introduction What is Random Tree? ◦ Is a tree constructed randomly from a set of possible trees having K random features.

Divided Pretreatment to Targets and Intentions for Query Recommendation Reporter: Yangyang Kang /23.

Notes on HW 1 grading I gave full credit as long as you gave a description, confusion matrix, and working code Many people’s descriptions were quite short.

Probabilistic Equational Reasoning Arthur Kantor

Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:

Experience Report: System Log Analysis for Anomaly Detection

Hyunghoon Cho, Bonnie Berger, Jian Peng Cell Systems

Boosted Augmented Naive Bayes. Efficient discriminative learning of

Bag-of-Visual-Words Based Feature Extraction

Cross Domain Distribution Adaptation via Kernel Mapping

Websoft Research Group

Introduction Feature Extraction Discussions Conclusions Results

Dieudo Mulamba November 2017

Lecture 9: Entity Resolution

Reducing Training Time in a One-shot Machine Learning-based Compiler

Department of Computer Science University of York

Property consolidation for entity browsing

Rui Yang, Wei Hu and Yuzhong Qu

iSRD Spam Review Detection with Imbalanced Data Distributions

MEgo2Vec: Embedding Matched Ego Networks for User Alignment Across Social Networks Jing Zhang+, Bo Chen+, Xianming Wang+, Fengmei Jin+, Hong Chen+, Cuiping.

An Interactive Approach to Collectively Resolving URI Coreference

Consensus Partition Liang Zheng 5.21.

Block Matching for Ontologies

Authors: Wai Lam and Kon Fan Low Announcer: Kyu-Baek Hwang

Ensemble learning Reminder - Bagging of Trees Random Forest

Nearest Neighbors CSC 576: Data Mining.

Leverage Consensus Partition for Domain-Specific Entity Coreference

Actively Learning Ontology Matching via User Interaction

Learning to Rank with Ties

Extracting Why Text Segment from Web Based on Grammar-gram

Hyunghoon Cho, Bonnie Berger, Jian Peng Cell Systems

Advisor: Dr.vahidipour Zahra salimian Shaghayegh jalali Dec 2017

Presentation transcript:

Leveraging Distributed Human Computation and Consensus Partition for Entity Coreference Saisai Gong, Wei Hu, Yuzhong Qu Websoft Research Group, Nanjing University, China

Contents Introduction Overview of approach Consensus partition Ensemble learning Evaluation Conclusion

Introduction Human knowledge is valuable for entity coreference Crowdsourcing and other systems used for acquiring human contribution to coreference Quality of user-judged results is a problem Mistakes and outliers Omissions …

Introduction In this paper, we propose coCoref which Acquire human contribution from web browsing activities Improve quality by aggregating individual results into more robust and comprehensive ones Alleviate user involvement by automatically identifying coreferent entities not judged

Overview of approach

Overview of approach

Overview of approach

Overview of approach

Overview of approach

Overview of approach

Overview of approach

Example NewYorkCity, NY, TheBigApple Manhattan, NewYorkCounty NewYorkCity, TheBigApple, NewYorkCounty NY, Manhattan NewYorkCity, TheBigApple NY, Manhattan

Example Consensus partition NewYorkCity, NY, TheBigApple Manhattan, NewYorkCounty NewYorkCity, TheBigApple, NewYorkCounty NY, Manhattan NewYorkCity, TheBigApple NY, Manhattan Consensus partition NewYorkCounty NewYorkCity, TheBigApple NY, Manhattan

Consensus partition Compute a consensus partition that minimize the number of disagreements with input partitions Obtain more robust and comprehensive coreference results Measure disagreement between partitions using symmetric difference distance NP-complete

Consensus partition Approximation Algorithm 3-approximation algorithm time complexity:

Ensemble learning Given training examples, learn a classifier to identify whether two entities are coreferent Training examples from consensus partition Positive: entity pairs in the same equivalence class Negative: entity pairs across different equivalence classes Feature: property-value similarity All property pairs for feature vector

Ensemble learning Choose suitable learning algorithms when Training examples are relatively small. Still a small amount of mistakes or outliers exist. Potentially important property pairs may not be characterized by training data. We use ensemble learning model Base learner: decision tree Random forest Bagging More suitable for relatively small training data randomness of input features Potential important property pairs can be also used

Evaluation Integration in SView SView finds candidates using owl:sameAs and web services Light-weighted user interaction Just accept or reject a candidate Coreferent entities’ data integrated in browsing immediately Visit SView at http://ws.nju.edu.cn/sview/

Evaluation on SView dataset Target: evaluate how coCoref improves result quality using consensus partition Based on SView dataset (Oct. to Dec. 2013) 36 users’ individual partitions 1,489 entities from 76 distributed data sources (URI namespace) Building reference partition Comparison using Precision, Recall, F-measure, Rand Index, NMI (normalized mutual information) Baseline: average measure values of 36 users’ individual partitions Compare consensus partition with the best individual ones

Evaluation on SView dataset Results Consensus partition is more comprehensive Highest values for Recall, F-measure, Rand Index, NMI Consensus partition is more robust Improve low average accuracy of 36 users’ results by 21% (0.807 to 0.665 ) Achieve close precision to the best (0.863)

Evaluation on OAEI NYT Target: evaluate the effectiveness of consensus partition and ensemble learning algorithm on a large scale Based on OAEI New York Times benchmark Leveraging coreference results of ObjectCoref, Zhishi.links, Knofuss Total 33,914 entities Cluster coreferent entities of each tool’s dataset to construct it partition

Evaluation on OAEI NYT Results (1) Highest accuracy of consensus partition

Evaluation on OAEI NYT Ensemble learning based on consensus partition Use 10% of training data 10-fold cross validation Compare ensemble learning and consensus partition

Evaluation on OAEI NYT Results (2) Identify a similar size of coreferent entities as consensus partition using only 10% of training data

Conclusion Our contributions Consensus partition for improving accuracy of user-judged results Ensemble learning to alleviate user involvement Integration with browsing activities

Thank you for your attention! Supported in part by the National Natural Science Foundation of China under Grant Nos. 61370019 and 61170068, and in part by the Natural Science Foundation of Jiangsu Province under Grant Nos. BK2011189 and BK2012723