
Relation Extraction with Matrix Factorization and Universal Schemas
Sebastian Riedel, Limin Yao, Andrew McCallum, Benjamin M. Marlin (NAACL 2013)
Presented by Rachit Saluja (rsaluja@seas.upenn.edu), 03/20/2019

Problem & Motivation This paper tries to tackle the problem of Relation Extraction. Relation extraction involves determining relations between predicates. Text: Mark works in the history department at Harvard. Relation extraction: [Mark, Relation: is a historian, Harvard] More Traditional Techniques use Supervised Learning which is very time consuming to annotate. Multiclass classification on a closed set of relations.

Problem and Motivation (cont.) Why do these algorithms not perform well? The dataset used to train the supervised learner may not even contain the needed context (e.g., "works in the history dept." → "is a professor"). And even if we could enumerate all such relations, it would be impossible to annotate all of them. This paper tries to solve these problems!

Contents: Previous approaches: how did we get here?; Matrix factorization and universal schemas; Contributions of this work; What does the matrix look like?; Objective function and matrix factorization; Data used; Evaluation; Shortcomings; Conclusions and future work.

Previous approaches: how did we get here? Approach 1: supervised learning over a fixed schema. A predefined, finite, fixed schema of relation types is chosen, some textual data is labeled, and a supervised classifier is trained. Problem: labeling is very difficult and time-consuming, and the result doesn't generalize. Example: Culotta and Sorensen (2004) use SVMs over dependency-tree similarity to detect relations on the Automatic Content Extraction (ACE) corpus.

Previous approaches: how did we get here? Approach 2: distant supervision. Existing database records are aligned with sentences to create labels in a semi-supervised (distantly supervised) fashion, and then supervised learning is applied. Example (Mintz et al., 2009): for each pair of entities that appears in some relation in a large semantic database (Freebase), find all sentences containing those entities in a large unlabeled corpus and extract textual features to train a relation classifier. /location/location/contains → (Paris, Montmartre)
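A minimal sketch of the distant-supervision labeling loop, assuming toy stand-ins for the database and the corpus (the real pipeline also extracts lexical and syntactic features from each matched sentence):

```python
# Toy stand-ins; the real work uses Freebase and a large news corpus.
freebase_facts = {("Paris", "Montmartre"): "/location/location/contains"}
corpus = [
    "Montmartre is a hill in the north of Paris.",
    "Paris hosted the 1900 Summer Olympics.",
]

training_examples = []
for (e1, e2), relation in freebase_facts.items():
    for sentence in corpus:
        # Any sentence mentioning both entities is (noisily) labeled
        # with the database relation; features of these sentences then
        # train an ordinary supervised relation classifier.
        if e1 in sentence and e2 in sentence:
            training_examples.append((sentence, e1, e2, relation))

print(training_examples)
# [('Montmartre is a hill in the north of Paris.', 'Paris', 'Montmartre',
#   '/location/location/contains')]
```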

Previous approaches: how did we get here? Problems with distant supervision: 1. It does not use surface patterns. 2. Large databases are hard to get. 3. It can only capture relations for entity pairs that are already present in the database. Surface patterns such as "Mozart was born in 1756." and "Gandhi (1869–1948)..." generalize to "<NAME> was born in <BIRTHDATE>", but are missed.

Previous approaches: how did we get here? Approach 3: open information extraction (Etzioni et al., 2008) [getting better]. The need for pre-existing datasets can be avoided by using language itself: surface patterns between mentions of concepts serve as the relations (OpenIE). Problem: OpenIE extracts facts mentioned in the text, but does not predict potential facts that are not mentioned. For example, OpenIE may find NAME–historian-at–HARVARD but does not know NAME–is-a-professor-at–HARVARD, because that fact was never explicitly stated.

Previous approaches: how did we get here? Approach 4: (Yao et al., 2011) A further improvement is to cluster textual surface forms with similar meaning, guided by a given database. Example: a cluster could contain (historian-at, professor-at, scientist-at, worked-at). But scientist-at does not necessarily imply professor-at, and worked-at certainly does not imply scientist-at.

Matrix Factorization and Universal Schemas Step 1: Define the schema as the union of all source schemas: the original input forms, e.g. variants of surface patterns as in OpenIE, as well as the relations in the schemas of many available pre-existing structured databases. The schema is thus defined by the database plus the corpus that the system uses to extract relations.

Matrix Factorization and Universal Schemas Step 2: Represent the probabilistic knowledge base as a matrix with entity pairs in the rows and relations in the columns. The cell probabilities are obtained by applying the logistic function to the score computed for each tuple-relation pair.
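A toy sketch (not the paper's code) of this matrix: rows are entity-pair tuples, columns are relations drawn from both surface patterns and database schemas; observed facts are 1s, and every other cell is unknown and must be predicted:

```python
import numpy as np

tuples = [("Mark", "Harvard"), ("Paris", "Montmartre")]
relations = ["historian-at", "professor-at", "/location/location/contains"]

observed = np.zeros((len(tuples), len(relations)))
observed[0, relations.index("historian-at")] = 1.0                  # from text
observed[1, relations.index("/location/location/contains")] = 1.0  # from Freebase

def fact_probability(theta):
    # The paper maps a real-valued score theta(r, t) to a probability
    # with the logistic function.
    return 1.0 / (1.0 + np.exp(-theta))
```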

Matrix Factorization and Universal Schemas Step 3: Matrix factorization and collaborative filtering. The intuition behind matrix factorization is to recover the missing relations that previous models can't capture, just as recommender systems predict ratings for unseen user-item pairs.
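A minimal sketch of the collaborative-filtering view, assuming a plain low-rank factorization (sizes and rank are illustrative): each tuple and each relation gets a latent vector, and their dot product scores every cell, including the unobserved ones:

```python
import numpy as np

rng = np.random.default_rng(0)
n_tuples, n_relations, rank = 2, 3, 10
V = rng.normal(scale=0.1, size=(n_tuples, rank))      # tuple embeddings
A = rng.normal(scale=0.1, size=(n_relations, rank))   # relation embeddings

scores = V @ A.T                                # theta(r, t) for every cell
probabilities = 1.0 / (1.0 + np.exp(-scores))   # logistic, as in the paper
print(probabilities.shape)  # (2, 3): one prediction per (tuple, relation)
```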

What does the matrix look like? The rows are entity pairs (like users and their ratings). The columns are the relations (like movie names). We try to predict which relation (movie) would apply to (be liked by) a given pair (user).

Contributions of this work Latent features in matrix factorization: capture missing relationships (like missing values in recommender systems). Neighborhood approach: captures features analogous to genre, e.g. (historian-at, professor-at, scientist-at, worked-at).

Contributions of this work Entity model: captures the fact that in (p1 – relation – p2), only a small set of entities can plausibly fill p1 and p2, which helps learning. [(Name) – scientist-at – Penn] is plausible, but it can never be [(Place) – scientist-at – Penn].

Objective Function and Matrix Factorization The score for a relation r and a tuple t is the sum of three components:
Latent features: \theta^F_{r,t} = \sum_k a_{r,k} v_{t,k}
Neighborhood features: \theta^N_{r,t} = \sum_{r' \in N(r,t)} w_{r,r'} (one weight per other relation observed for the same tuple)
Entity model: \theta^E_{r,t} = \sum_i d_{r,i} \cdot t_{e_i} (per-argument relation embeddings dotted with entity embeddings)
The probability of a fact is p(y_{r,t} = 1) = \sigma(\theta^F_{r,t} + \theta^N_{r,t} + \theta^E_{r,t}), where r = relation, t = tuple.
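A toy numeric sketch of the combined score; all shapes, names, and neighborhood weights below are illustrative assumptions, not the paper's data:

```python
import numpy as np

rank = 10
rng = np.random.default_rng(0)
a_r = rng.normal(size=rank)            # latent vector of relation r
v_t = rng.normal(size=rank)            # latent vector of tuple t
w = {"professor-at": 0.8}              # neighborhood weights w_{r, r'}
seen = {"professor-at"}                # other relations observed for t
d_r = [rng.normal(size=rank) for _ in range(2)]   # one per argument slot
ents = [rng.normal(size=rank) for _ in range(2)]  # entity embeddings of t

theta_F = a_r @ v_t                                   # latent features
theta_N = sum(w.get(rp, 0.0) for rp in seen)          # neighborhood
theta_E = sum(d @ e for d, e in zip(d_r, ents))       # entity model
p = 1.0 / (1.0 + np.exp(-(theta_F + theta_N + theta_E)))  # fact probability
print(p)
```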

Objective Function The model is trained with Bayesian Personalized Ranking (BPR) via stochastic gradient descent: rather than scoring cells independently, observed (positive) tuple-relation cells are ranked above unobserved (negative) ones, which effectively gives more weight to positive examples and less to negative ones.
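A hedged sketch of one BPR-style SGD step for the latent-feature component (hyperparameters, sizes, and the negative-sampling indices are illustrative, not the paper's): sample an observed cell (t_pos, r) and an unobserved cell (t_neg, r), then push the positive score above the negative one:

```python
import numpy as np

rng = np.random.default_rng(0)
n_tuples, n_relations, rank = 100, 20, 10
V = rng.normal(scale=0.1, size=(n_tuples, rank))      # tuple embeddings
A = rng.normal(scale=0.1, size=(n_relations, rank))   # relation embeddings
lr, reg = 0.05, 0.01

def bpr_step(r, t_pos, t_neg):
    # Gradients are computed from the pre-update values.
    a, vp, vn = A[r].copy(), V[t_pos].copy(), V[t_neg].copy()
    x = a @ (vp - vn)             # score(positive) - score(negative)
    g = 1.0 / (1.0 + np.exp(x))   # gradient weight of -log sigmoid(x)
    A[r]     += lr * (g * (vp - vn) - reg * a)
    V[t_pos] += lr * (g * a - reg * vp)
    V[t_neg] += lr * (-g * a - reg * vn)

bpr_step(r=3, t_pos=7, t_neg=42)
```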

Data They use Freebase together with the NYT corpus. Freebase makes it easy to obtain relations and entity pairs, and therefore to build the matrix for matrix factorization. The NYT corpus contains a lot of text, which helps to build a very big dataset.

Data and Preprocessing Freebase evaluation: NYT articles from 2000 onward are used as the training corpus, and articles from 1990 to 1999 as the test corpus. Freebase facts are split 50/50 into train and test facts, with their corresponding tuples split into train and test tuples, coupled together: roughly 200k training tuples and 200k test tuples (10k used for evaluation). Surface patterns: lexicalized dependency paths are extracted between entity mentions to produce about 4k more examples. The final schema is the union of both.

Evaluation on Freebase Dataset Lorem MAP Weighted MAP MI09 0.32 0.48 YA11 0.42 0.52 SU12 0.56 0.57 N 0.45 F 0.61 0.66 NF 0.67 NFE 0.63 0.69

Evaluation on Surface Patterns

Shortcomings: No comparative analysis against other algorithms is given for surface patterns. No explicit dataset was released for future competitions and benchmarks, even though the dataset is very large. The authors claim the algorithm moves closer to generalization, but don't explicitly define what generalization means. Transitivity of relations is not discussed.

Conclusions and Important Contributions Populating a database using matrix factorization. Well-designed experiments show how the neighborhood and entity models affect accuracy. A great tool for information extraction, because it captures more relations between predicates than any other algorithm at the time. Code is available. Surface patterns are used to improve accuracy; this is key, since it helps capture relations like: "Mozart was born in 1756." "Gandhi (1869–1948)..." → "<NAME> was born in <BIRTHDATE>"

Future Work Matrix factorization could be used for textual entailment too, since graphs can be written as adjacency matrices. Autoencoders instead of traditional matrix factorization? Could it use scalable ILPs (Berant et al.)?