Leverage Consensus Partition for Domain-Specific Entity Coreference


Saisai Gong (龚赛赛), saisaigong@gmail.com, 2013-12-11

Contents
- Introduction
- Overview of Approach
- Improve quality of labeled data from user feedback
- Schema-independent learning approach
- Conclusion

Introduction
Human intelligence is valuable for entity coreference, for example to:
- manually resolve the coreference of given URIs;
- collect training examples;
- …
However, the quality of labeled data collected from user feedback is not always satisfactory. To improve it, we use consensus partition.

Introduction
Modeling entity coreference at a high level is worth considering. There are two challenges:
- Usually, property matchings are not available, or not sufficient, for entity resolution.
- In many cases, the datasets needed for computing statistics are not available.
To deal with these challenges, we build a classifier for entity coreference trained on the improved labeled data:
- The labeled data can be relatively small compared to the datasets.
- A classifier built on weak features can be enhanced.

Related work
- Improving labeled data
  - Detecting good and bad workers
- Modeling entity resolution
  - Rule-based
  - Graph-based
  - Learning-based
  - …

Overview of Approach

Overview of Approach: Running Example
- Mike browses u1 and labels <u1,u2> and <u1,u3> as coreferent. He then browses u4 and labels <u4,u5> as coreferent. His partition is therefore {u1 u2 u3 | u4 u5}.
- Similarly, Tom browses u1 and labels <u1,u3> as coreferent, then browses u5 and labels <u5,u1> as coreferent. He also browses u2 and labels <u2,u4> as coreferent. His partition becomes {u1 u3 u5 | u2 u4}.
- Alice's partition is {u1 u3 | u2 u4}.
Finally, the consensus partition is {u1 u3 | u2 u4 | u5}: all three users put u1 and u3 together, Tom and Alice put u2 and u4 together, and every pair containing u5 is supported by only one of the three users.
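The step from a user's labeled pairs to that user's partition can be sketched with a union-find structure. This is a minimal sketch, assuming that labeled coreferent pairs are closed transitively and that any URI a user never linked stays in a cluster of its own; the function name `partition_from_pairs` is hypothetical, not taken from the slides.

```python
def partition_from_pairs(items, pairs):
    """Merge every labeled coreferent pair with union-find and return the
    resulting clusters as a set of frozensets (assumed transitive closure)."""
    parent = {x: x for x in items}

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Group every item under its root; unlinked items become singletons.
    blocks = {}
    for x in items:
        blocks.setdefault(find(x), set()).add(x)
    return {frozenset(block) for block in blocks.values()}


URIS = ["u1", "u2", "u3", "u4", "u5"]
mike = partition_from_pairs(URIS, [("u1", "u2"), ("u1", "u3"), ("u4", "u5")])
tom = partition_from_pairs(URIS, [("u1", "u3"), ("u5", "u1"), ("u2", "u4")])
```

With Mike's and Tom's labels from the running example, this reproduces {u1 u2 u3 | u4 u5} and {u1 u3 u5 | u2 u4}. Alice's individual labels are not given, so her partition is not rebuilt here.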

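Improve quality of labeled data from user feedback
- Compute a consensus partition that minimizes the disagreement between the input partitions.
- Disagreement is measured with the symmetric difference distance, i.e., the number of URI pairs placed in the same cluster by one partition but not by the other.
- Maximizing agreement: hierarchical clustering (average link).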
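A sketch of the consensus step on the running example follows. It assumes that the similarity of two URIs is the fraction of input partitions that put them in the same cluster, and that average-link merging stops once no pair of clusters has average similarity above 0.5 (a hypothetical majority threshold; the slides do not state one). Alice's u5 is treated as a singleton, which is also an assumption. Average-link clustering is a heuristic, so in general it is not guaranteed to find the partition with the smallest total symmetric difference.

```python
from itertools import combinations


def co_clustered_pairs(partition):
    """Unordered pairs (a, b), a < b, that share a cluster in the partition."""
    pairs = set()
    for block in partition:
        for a, b in combinations(sorted(block), 2):
            pairs.add((a, b))
    return pairs


def symdiff_distance(p, q):
    """Symmetric difference distance: pairs on which two partitions disagree."""
    return len(co_clustered_pairs(p) ^ co_clustered_pairs(q))


def consensus_average_link(items, partitions, threshold=0.5):
    """Average-link agglomerative clustering over co-association similarity.
    threshold=0.5 is a hypothetical majority rule, not from the slides."""
    pair_sets = [co_clustered_pairs(p) for p in partitions]

    def sim(a, b):
        key = (a, b) if a < b else (b, a)
        return sum(key in s for s in pair_sets) / len(partitions)

    clusters = [frozenset([x]) for x in items]
    while len(clusters) > 1:
        best, best_score = None, threshold
        for i, j in combinations(range(len(clusters)), 2):
            ci, cj = clusters[i], clusters[j]
            # Average link: mean similarity over all cross-cluster pairs.
            score = sum(sim(a, b) for a in ci for b in cj) / (len(ci) * len(cj))
            if score > best_score:
                best, best_score = (i, j), score
        if best is None:  # no merge exceeds the threshold
            break
        i, j = best
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return set(clusters)


URIS = ["u1", "u2", "u3", "u4", "u5"]
mike = [{"u1", "u2", "u3"}, {"u4", "u5"}]
tom = [{"u1", "u3", "u5"}, {"u2", "u4"}]
alice = [{"u1", "u3"}, {"u2", "u4"}, {"u5"}]  # assumption: u5 is a singleton
consensus = consensus_average_link(URIS, [mike, tom, alice])
total = sum(symdiff_distance(consensus, p) for p in (mike, tom, alice))
```

On this input the sketch yields {u1 u3 | u2 u4 | u5} with a total disagreement of 6 pairs (4 against Mike, 2 against Tom, 0 against Alice), lower than choosing Tom's partition (8) or Mike's (10) as the consensus.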

Schema-independent learning approach
- Learning model: random forest (bagging with decision trees), which
  - handles noise,
  - handles imbalanced training data,
  - enhances weak learners.
- Feature: property pair value similarity (schema independent)
  - URIs: vsim = 1 iff they are identical or in the same equivalence class.
  - Numeric literals: vsim = 1 iff their difference is less than a threshold.
  - Boolean literals: vsim = 1 iff the values are equal.
  - Other literals: Jaccard similarity.
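The vsim rules can be sketched as a single function. Several details are assumptions not given in the slides: values are tagged by Python type (a hypothetical `URI` wrapper for URIs), equivalence classes come as a dict from URI to class id, the numeric threshold is 0.01, Jaccard similarity is taken over lower-cased word tokens, and values of different kinds score 0. The hypothetical `pair_features` helper shows one possible schema-independent layout, one feature per property pair (p, q) scored by the best vsim over their values; such a feature vector could then be fed to any random-forest implementation.

```python
import re
from dataclasses import dataclass

NUMERIC_THRESHOLD = 0.01  # hypothetical value; the slides do not specify one


@dataclass(frozen=True)
class URI:
    """Hypothetical wrapper marking a value as a URI, not a string literal."""
    iri: str


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of lower-cased word-token sets (assumed tokenizer)."""
    ta = set(re.findall(r"\w+", a.lower()))
    tb = set(re.findall(r"\w+", b.lower()))
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def is_number(v) -> bool:
    # bool is a subclass of int in Python, so it is excluded explicitly.
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def vsim(v1, v2, equiv_class=None, num_threshold=NUMERIC_THRESHOLD) -> float:
    equiv_class = equiv_class or {}
    if isinstance(v1, URI) and isinstance(v2, URI):
        # URIs: 1 iff identical or mapped to the same equivalence class.
        if v1 == v2:
            return 1.0
        c1, c2 = equiv_class.get(v1.iri), equiv_class.get(v2.iri)
        return 1.0 if c1 is not None and c1 == c2 else 0.0
    if is_number(v1) and is_number(v2):
        return 1.0 if abs(v1 - v2) < num_threshold else 0.0
    if isinstance(v1, bool) and isinstance(v2, bool):
        return 1.0 if v1 == v2 else 0.0
    if isinstance(v1, str) and isinstance(v2, str):
        return jaccard(v1, v2)
    return 0.0  # values of different kinds: assumed dissimilar


def pair_features(desc1, desc2, **kwargs):
    """Hypothetical layout: feature (p, q) = best vsim over all value pairs."""
    return {
        (p, q): max((vsim(a, b, **kwargs) for a in vals1 for b in vals2), default=0.0)
        for p, vals1 in desc1.items()
        for q, vals2 in desc2.items()
    }


e1 = {"name": ["Saisai Gong"], "year": [2013], "homepage": [URI("http://example.org/a")]}
e2 = {"label": ["Gong Saisai"], "date": [2013.0], "page": [URI("http://example.org/b")]}
feats = pair_features(e1, e2, equiv_class={"http://example.org/a": 0, "http://example.org/b": 0})
```

Because features are indexed by property pairs rather than by a predefined property alignment, no schema matching is needed up front; the classifier is left to learn which property pairs are informative.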

Conclusion
Possible contributions:
- Propose a new method for improving labeled data from user feedback based on consensus partition.
- Propose a novel, schema-independent approach for entity coreference with high accuracy.
- Evaluate the effect of consensus partition on the quality of labeled data.
- Evaluate the performance of our learning approach compared with other approaches:
  - information gain based
  - SVM based

Thank you for your attention!