Relational Duality: Unsupervised Extraction of Semantic Relations between Entities on the Web. Danushka Bollegala, Yutaka Matsuo, Mitsuru Ishizuka. International World Wide Web Conference 2010.


Relational Duality: Unsupervised Extraction of Semantic Relations between Entities on the Web. Danushka Bollegala, Yutaka Matsuo, Mitsuru Ishizuka. International World Wide Web Conference 2010, Raleigh, North Carolina, USA.

Relation Extraction on the Web
Problem definition: given a crawled corpus of Web text, identify all the different semantic relations that exist between entities mentioned in the corpus.
Challenges:
- The number and types of relations that exist in the corpus are not known in advance.
- Creating training data is costly, if not impossible.
- Entity name variants must be handled (Will Smith vs. William Smith vs. Fresh Prince, ...).
- Paraphrases of surface forms must be handled (acquired by, purchased by, bought by, ...).
- Multiple relations can exist between a single pair of entities.

Relational Duality
The same relation (e.g., ACQUISITION) admits two dual definitions:
- Extensional definition: the set of entity pairs between which the relation holds, e.g., (Microsoft, Powerset), (Google, YouTube), ...
- Intensional definition: the set of lexico-syntactic patterns that express the relation, e.g., "X acquires Y", "X buys Y for $", ...
DUALITY: either definition identifies the relation.
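The duality above can be sketched as two interchangeable views of one relation. The data below (entity pairs and patterns) comes from the slide's ACQUISITION example; the `Relation` class itself is only an illustrative assumption, not a structure from the paper.

```python
# A toy illustration of the dual representation: the same ACQUISITION
# relation is described extensionally (by the entity pairs it holds
# between) and intensionally (by the lexical patterns that express it).
from dataclasses import dataclass, field

@dataclass
class Relation:
    name: str
    pairs: set = field(default_factory=set)      # extensional definition
    patterns: set = field(default_factory=set)   # intensional definition

acquisition = Relation(
    "ACQUISITION",
    pairs={("Microsoft", "Powerset"), ("Google", "YouTube")},
    patterns={"X acquires Y", "X buys Y for $"},
)

# Either view is a handle on the same relation:
holds_extensionally = ("Google", "YouTube") in acquisition.pairs
holds_intensionally = "X acquires Y" in acquisition.patterns
```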

Overview of the proposed method
Web crawler → text corpus → sentence splitter → POS tagger → NP chunker → pattern extractor (lexical patterns and syntactic patterns) → entity-pairs vs. patterns matrix (rows such as (Google, YouTube), (Microsoft, Powerset), ...; columns such as X acquires Y, X buys Y, ...) → sequential co-clustering algorithm → entity pair clusters and lexico-syntactic pattern clusters → cluster labeler (L1-regularized multi-class logistic regression).

Lexico-Syntactic Pattern Extraction
- Replace the two entities in a sentence by X and Y.
- Generate subsequences (over tokens and POS tags) subject to:
  - a subsequence must contain both X and Y
  - a subsequence may contain at most L tokens
  - a single skip may not exceed g tokens
  - the total number of tokens skipped must not exceed G
  - negation contractions are expanded and are never skipped
- Example
  - Sentence: ... merger/NN is/VBZ software/NN maker/NN [Adobe/NNP Systems/NN] acquisition/NN of/IN [Macromedia/NNP]
  - Lexical patterns: X acquisition of Y; software maker X acquisition of Y
  - Syntactic patterns: X NN IN Y; NN NN X NN IN Y
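A minimal sketch of the constrained subsequence generation described above, over lexical tokens only. The parameter names L, g, G follow the slide; the brute-force enumeration via `itertools.combinations`, the default values, and the example sentence are assumptions for illustration, not the paper's implementation.

```python
# Generate subsequence patterns containing both X and Y, with at most L
# tokens, at most g skipped tokens between consecutive kept tokens, and
# at most G skipped tokens in total.
from itertools import combinations

def extract_patterns(tokens, L=5, g=2, G=4):
    n = len(tokens)
    patterns = set()
    for size in range(2, L + 1):
        for idx in combinations(range(n), size):
            if not {"X", "Y"} <= {tokens[i] for i in idx}:
                continue  # must contain both entity slots
            gaps = [idx[k + 1] - idx[k] - 1 for k in range(size - 1)]
            if any(gap > g for gap in gaps) or sum(gaps) > G:
                continue  # violates the skip constraints
            patterns.add(" ".join(tokens[i] for i in idx))
    return patterns

pats = extract_patterns("software maker X acquisition of Y".split(), L=6)
```

With L=6, this yields both slide examples, "X acquisition of Y" and "software maker X acquisition of Y", among other subsequences; running the same procedure over the POS-tag sequence gives the syntactic patterns.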

Entity pairs vs. lexico-syntactic pattern matrix  Select the most frequent entity pairs and patterns, and create an entity-pair vs. pattern matrix.
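The matrix construction above can be sketched as follows; the occurrence list is hypothetical toy data, and the `top_pairs`/`top_patterns` cut-offs stand in for whatever frequency thresholds the paper actually used.

```python
# Build the entity-pair vs. pattern co-occurrence matrix: count how often
# each pattern is extracted for each entity pair, then keep only the most
# frequent rows and columns.
from collections import Counter

occurrences = [  # (entity pair, pattern) tuples from the extraction step
    (("Google", "YouTube"), "X acquired Y"),
    (("Google", "YouTube"), "X buys Y for $"),
    (("Microsoft", "Powerset"), "X acquired Y"),
    (("Jobs", "Apple"), "Y CEO X"),
    (("Jobs", "Apple"), "Y CEO X"),
]

def build_matrix(occ, top_pairs=100, top_patterns=100):
    pair_freq = Counter(p for p, _ in occ)
    pat_freq = Counter(q for _, q in occ)
    pairs = [p for p, _ in pair_freq.most_common(top_pairs)]
    pats = [q for q, _ in pat_freq.most_common(top_patterns)]
    cell = Counter(occ)
    return pairs, pats, [[cell[(p, q)] for q in pats] for p in pairs]

pairs, pats, M = build_matrix(occurrences)
```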

Sequential Co-clustering Algorithm
1. Input: a data matrix, and row and column clustering thresholds.
2. Sort the rows and columns of the matrix in descending order of their total frequencies.
3. For each row (column), in order:
   - compute the similarity between the current row (column) and each existing row (column) cluster;
   - if the maximum similarity < the row (column) clustering threshold, create a new row (column) cluster with the current row (column);
   - otherwise, assign the current row (column) to the cluster with the maximum similarity;
   - repeat until all rows and columns are clustered.
4. Return the row and column clusters.
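The sequential clustering step can be sketched as below, applied here to the rows of the matrix (the columns are handled by running the same routine on the transpose). Taking the similarity to a cluster as cosine similarity with the cluster centroid is our assumption; the slide only says "similarity". The toy matrix mirrors the worked example from the following slides.

```python
# Greedy sequential clustering: visit rows in descending order of total
# frequency; each row joins the most similar existing cluster if that
# similarity reaches the threshold, otherwise it starts a new cluster.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def sequential_cluster(rows, threshold):
    order = sorted(range(len(rows)), key=lambda i: -sum(rows[i]))
    clusters = []  # each cluster is a list of row indices
    for i in order:
        sims = [cosine(rows[i], centroid([rows[j] for j in c]))
                for c in clusters]
        if not sims or max(sims) < threshold:
            clusters.append([i])                       # new cluster
        else:
            clusters[sims.index(max(sims))].append(i)  # join best cluster
    return clusters

rows = [            # entity pairs x patterns (hypothetical counts)
    [4, 0, 4, 0, 0],   # (Google, YouTube)
    [3, 0, 4, 0, 0],   # (Microsoft, Powerset)
    [0, 5, 0, 3, 3],   # (Ballmer, Microsoft)
    [0, 6, 0, 4, 3],   # (Jobs, Apple)
]
clusters = sequential_cluster(rows, threshold=0.5)
```

On this data the acquisition pairs and the leadership pairs end up in separate clusters, matching the walk-through on the next slides.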

Sequential Co-clustering Algorithm: worked example. [A sequence of animation frames stepped through the algorithm on a 4 x 5 matrix with rows (Google, YouTube), (Microsoft, Powerset), (Ballmer, Microsoft), (Jobs, Apple), columns X acquired Y, Y CEO X, X buys Y for $, X of Y, Y head X, and row and column clustering thresholds of 0.5. Rows and columns are visited in descending order of total frequency; each either joins the most similar existing cluster (similarity > 0.5) or starts a new one.]

Sequential Co-clustering Algorithm: final result.
Entity pair clusters: [(Google, YouTube), (Microsoft, Powerset)] and [(Ballmer, Microsoft), (Jobs, Apple)].
Lexico-syntactic pattern clusters: [X acquired Y, X buys Y for $] and [Y CEO X, X of Y, Y head X].
Properties of the algorithm:
- a greedy clustering algorithm that alternates between rows and columns
- complexity O(n log n) in the data size
- common relations are clustered first
- the number of clusters is not required in advance
- two thresholds must be determined

Estimating the Clustering Thresholds
- Ideally, each cluster should represent a unique semantic relation, so the number of clusters equals the number of semantic relations; but the number of semantic relations is unknown.
- The thresholds can either be estimated via cross-validation (which requires training data) or approximated from the similarity distribution:
  - the similarity distribution is approximated by a Zeta distribution (Zipf's law);
  - in an ideal clustering, inter-cluster similarity is 0 and intra-cluster similarity equals the mean;
  - with a large number of data points, the average similarity within a cluster ≥ the threshold, so threshold ≈ the distribution mean.
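The heuristic above reduces to a one-liner once similarities are observed; this sketch uses the empirical mean directly, and the similarity values are hypothetical (the slide fits a Zeta distribution and uses its mean, which this simplification does not reproduce).

```python
# Approximate the clustering threshold by the mean of the observed
# similarity distribution, per the slide's argument that inter-cluster
# similarity is ~0 while intra-cluster similarity is ~the mean.
from statistics import mean

def estimate_threshold(similarities):
    return mean(similarities)

sims = [0.9, 0.8, 0.05, 0.1, 0.85, 0.0, 0.95, 0.15]
threshold = estimate_threshold(sims)
```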

Measuring Relational Similarity
- Empirically evaluate the clusters produced.
- Use the clusters to measure relational similarity (Bollegala, WWW 2009).
- Distance = [formula not preserved in the transcript]
- ENT dataset: 5 relation types, 100 instances.
- Task: query using each entity pair and rank candidates by relational distance.

Self-supervised Relation Detection
- What is the relation represented by a cluster? Label each cluster with a lexical pattern selected from that cluster.
- Given entity pair clusters C1, C2, ..., Ck, train an L1-regularized multi-class logistic regression model (MaxEnt) to discriminate the k classes, and select the highest-weighted lexical patterns from each class as labels.
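The labelling idea can be sketched as below with a tiny one-vs-rest logistic regression trained by plain subgradient descent with an L1 penalty. The data matrix, patterns, and all hyper-parameters are toy assumptions; the paper's actual MaxEnt training is not reproduced here.

```python
# Train one L1-penalised binary logistic regressor per cluster, then pick
# the highest-weighted lexical pattern of each class as the cluster label.
import math

def train_l1_logreg(X, y, lr=0.5, lam=0.01, epochs=500):
    n_feat, classes = len(X[0]), sorted(set(y))
    weights = {c: [0.0] * n_feat for c in classes}
    for c in classes:
        w = weights[c]
        t = [1.0 if label == c else 0.0 for label in y]
        for _ in range(epochs):
            for xi, ti in zip(X, t):
                z = sum(wj * xj for wj, xj in zip(w, xi))
                p = 1.0 / (1.0 + math.exp(-z))
                for j, xj in enumerate(xi):
                    # loss gradient plus L1 subgradient
                    sub = 1.0 if w[j] > 0 else -1.0 if w[j] < 0 else 0.0
                    w[j] -= lr * ((p - ti) * xj + lam * sub)
    return weights

patterns = ["X acquired Y", "X buys Y for $", "Y CEO X", "X of Y"]
X = [[1, 1, 0, 1],   # one row per entity pair; 1 = pattern co-occurs
     [1, 0, 0, 0],
     [0, 1, 1, 1],
     [0, 0, 1, 0]]
y = [0, 0, 1, 1]     # cluster id of each entity pair

W = train_l1_logreg(X, y)
labels = {c: patterns[max(range(len(w)), key=w.__getitem__)]
          for c, w in W.items()}
```

On this toy data the most discriminative pattern per class surfaces as the label: the acquisition cluster gets "X acquired Y" and the leadership cluster gets "Y CEO X".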

Subjective Evaluation of Relation Labels
- Baseline: select the most frequent lexical pattern in a cluster as its label.
- Three human judges assign grades: A: baseline is better; B: proposed method is better; C: both equally good; D: both bad.

Open Information Extraction
- Sent500 dataset (Banko and Etzioni, ACL 2008): 500 sentences, 4 relation types.
- 947 lexical patterns and 384 syntactic patterns extracted.
- Co-clustering produced 4 row clusters and 14 column clusters.

Classifying Relations in a Social Network spysee.jp

Relation Classification
- Dataset: 790,042 nodes (people), 61,339,833 edges (relations).
- Randomly select 50,000 edges and manually classify them into 53 classes.
- 11,193 lexical patterns, 383 pattern clusters, 664 entity pair clusters.
- [A results table reported precision, recall, and F-scores per relation (colleagues, friends, alumni, co-actors, fan, teacher, husband, wife, brother, sister) together with micro- and macro-averages; the numeric values were not preserved in the transcript.]

Conclusions
- The dual representation of semantic relations leads to a natural co-clustering algorithm.
- Clustering entity pairs and lexico-syntactic patterns simultaneously helps overcome data sparseness in both dimensions.
- The co-clustering algorithm scales as O(n log n) in the data size.
- The clusters produced can be used to:
  - measure relational similarity with performance comparable to supervised approaches;
  - perform open information extraction tasks;
  - classify relations found in a social network.
