
Global Inference for Entity and Relation Identification via a Linear Programming Formulation
By Dan Roth and Wen-tau Yih
PowerPoint by: Reno Kriz, CIS 700-006

Problem Setup Goal: identify both the named entities in a sentence and the relations between them. Previous work attempted to make these decisions locally, which can lead to globally inconsistent predictions. This paper discusses formulating the problem as a linear program. Types of named entities: person, location, organization, etc. Types of relations between named entities: born_in, spouse_of, works_for, etc.

Example Sentence: "Bernie's wife, Jane, is a native of Brooklyn," with entities E1 (Bernie), E2 (Jane), E3 (Brooklyn) and relations R12, R23. The local classifier scores are:
E1: per 0.85, loc 0.10, other 0.05
E2: per 0.60, loc 0.30, other 0.10
E3: per 0.50, loc 0.45, other 0.05
R12: born_in 0.50, spouse_of 0.45, irrelevant 0.05
R23: born_in 0.85, spouse_of 0.05, irrelevant 0.10
In this example, there are three entities: Bernie, Jane, and Brooklyn. This means there are six total relations (one for each ordered pair of entities), but let's consider two of them for now. From Dan Roth's Introductory Lecture

Objective Function From Yih (2004)

Objective Function x_{a,k} is a binary variable: x_{a,k} = 1 when variable a is assigned label k, and x_{a,k} = 0 otherwise. From Yih (2004)

Objective Function c_{a,k} is the confidence score of variable a having label k. L_E is the set of entity labels, and L_R is the set of relation labels. From Yih (2004)
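From these definitions, the objective (shown only as an image in the slides) can plausibly be reconstructed as:

```latex
\max \;\; \sum_{e \in E} \sum_{k \in L_E} c_{e,k}\, x_{e,k} \;+\; \sum_{r \in R} \sum_{k \in L_R} c_{r,k}\, x_{r,k}
```

Here E and R denote the sets of entity and relation variables, following the notation above.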

Constraints Definition: a constraint is a function that maps a relation label and an entity label to either 0 or 1; constraints can also be defined over the labels of two relation variables. A value of 0 means the labels contradict the constraint, and 1 means they satisfy it. As an example, we can define constraints on the mutual relation spouse_of:
(spouse_of, spouse_of) = 1
(spouse_of, l_r) = 0, for l_r ≠ spouse_of
(l_r, spouse_of) = 0, for l_r ≠ spouse_of
spouse_of is a mutual relation because if Barack Obama is the spouse of Michelle Obama, then Michelle Obama is necessarily the spouse of Barack Obama. This does not always hold; for example, born_in is not a mutual relation.
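As a minimal sketch of how such a constraint function might look in code (illustrative only; the paper does not specify an implementation, and the function name is mine):

```python
# Illustrative sketch of a constraint for the mutual relation spouse_of.
# It maps a pair of relation labels (for R_ij and R_ji) to 1 if the pair
# satisfies the constraint and 0 if it contradicts it.
def mutual_spouse_of(label_ij: str, label_ji: str) -> int:
    if "spouse_of" in (label_ij, label_ji):
        # spouse_of in one direction forces spouse_of in the other
        return 1 if label_ij == label_ji else 0
    return 1  # the constraint says nothing about other label pairs

assert mutual_spouse_of("spouse_of", "spouse_of") == 1
assert mutual_spouse_of("spouse_of", "born_in") == 0
assert mutual_spouse_of("born_in", "irrelevant") == 1
```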

Example Integer Linear Program From Yih (2004)

Example Integer Linear Program The first two constraints simply force the x's to be binary variables. From Yih (2004)

Example Integer Linear Program The second two constraints ensure the model selects exactly one label for each entity and relation. From Yih (2004)

Example Integer Linear Program The final two sets of constraints impose global consistency. The first set says that spouse_of can occur only when both argument entities are labeled person. The second set says that born_in can occur only when the first argument is labeled person and the second is labeled location. From Yih (2004)
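Putting the annotated pieces together, the toy ILP plausibly has the following shape (a reconstruction; the exact encoding in Yih (2004) may differ, e.g. it may use equalities over sums of compatible labels rather than the ≤ form shown here):

```latex
\begin{aligned}
\max \quad & \sum_{a}\sum_{k} c_{a,k}\, x_{a,k} \\
\text{s.t.} \quad & x_{a,k} \in \{0,1\} && \text{(binary variables)} \\
& \sum_{k \in L_E} x_{E_i,k} = 1, \qquad \sum_{k \in L_R} x_{R_{ij},k} = 1 && \text{(exactly one label each)} \\
& x_{R_{12},\,\text{spouse\_of}} \le x_{E_1,\,\text{per}}, \qquad x_{R_{12},\,\text{spouse\_of}} \le x_{E_2,\,\text{per}} && \text{(spouse\_of needs two persons)} \\
& x_{R_{12},\,\text{born\_in}} \le x_{E_1,\,\text{per}}, \qquad x_{R_{12},\,\text{born\_in}} \le x_{E_2,\,\text{loc}} && \text{(born\_in needs person, location)}
\end{aligned}
```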

Overall Cost Function For the general ILP, the paper converts the problem to one of minimizing a cost.

Overall Cost Function For the general ILP, the paper converts the problem to one of minimizing a cost. c_u(f_u) is the cost associated with labeling variable u with label f_u: c_u(f_u) = -log(p), where p is the posterior probability the model assigns to labeling u with f_u. d^1() is the cost function for a relation's first argument, and d^2() is the cost function for its second argument. Minimizing this cost function directly is computationally intractable, so the problem is first cast as an integer linear program, which can then be attacked via its linear programming relaxation.

Overall Cost Function For the general ILP, the paper converts the problem to one of minimizing a cost. d^1 and d^2 are the cost functions for the first and second arguments of a relation: 0 if the constraint is obeyed, and ∞ otherwise.
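In LaTeX, a reconstruction of these constraint costs from the description above (the slide shows the formula only as an image):

```latex
d^{j}\!\left(f_r,\, f_{N^{j}(r)}\right) =
\begin{cases}
0 & \text{if } \left(f_r,\, f_{N^{j}(r)}\right) \text{ satisfies the constraint,} \\
\infty & \text{otherwise,}
\end{cases}
\qquad j \in \{1, 2\}.
```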

Full Objective This is just the expanded objective function.
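A plausible reconstruction of the expanded objective (the slide shows it only as an image), combining the label costs with the argument-constraint costs:

```latex
\min_{f} \;\; \sum_{u \in E \cup R} c_u(f_u) \;+\; \sum_{r \in R} \left[ d^{1}\!\left(f_r,\, f_{N^{1}(r)}\right) + d^{2}\!\left(f_r,\, f_{N^{2}(r)}\right) \right]
```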

Full Constraints These are all the constraints we saw in the toy example, just generalized.

Full Constraints The first two constraints again require that each variable be assigned exactly one label.

Full Constraints The next three make sure that the assignment to each variable is consistent with the assignment to its neighboring variables.

Full Constraints If we have two entities Ei, Ej with relation Rij, then Ei = N1(Rij) and Ej = N2(Rij); that is, N1 and N2 pick out the first and second arguments of a relation.

Full Constraints The last three constraints restrict each variable to be binary.
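A plausible generalized form of these constraints, reconstructed from the annotations above rather than copied from the slide image (the compatibility sets come from the constraint functions defined earlier):

```latex
\begin{aligned}
& \sum_{k \in L_E} x_{e,k} = 1 \;\; \forall e \in E, \qquad \sum_{k \in L_R} x_{r,k} = 1 \;\; \forall r \in R && \text{(one label each)} \\
& x_{r,k} \;\le\; \sum_{k' :\, (k,\,k')\ \text{compatible}} x_{N^{1}(r),\,k'} \qquad \forall r \in R,\ k \in L_R && \text{(first-argument consistency)} \\
& x_{r,k} \;\le\; \sum_{k' :\, (k,\,k')\ \text{compatible}} x_{N^{2}(r),\,k'} \qquad \forall r \in R,\ k \in L_R && \text{(second-argument consistency)} \\
& x_{e,k},\; x_{r,k} \in \{0,1\} && \text{(binary variables)}
\end{aligned}
```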

Linear Program Relaxation To solve the ILP (an NP-hard problem in general), we can relax the integrality constraints, replacing x ∈ {0, 1} with 0 ≤ x ≤ 1. If the relaxed linear program returns an integer solution, then it is also the optimal solution to the original ILP.
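To make the relaxation concrete, here is a minimal sketch (my own, not the paper's code) that solves the relaxed LP for a simplified two-entity version of the earlier example, using SciPy's HiGHS solver; the scores come from the example slide, but the variable layout and the ≤-style consistency encoding are assumptions:

```python
# A minimal sketch of the LP relaxation on a simplified Bernie/Jane example.
import numpy as np
from scipy.optimize import linprog

# Variable order:
# 0: x[E1,per]  1: x[E1,loc]  2: x[E2,per]  3: x[E2,loc]
# 4: x[R12,spouse_of]  5: x[R12,born_in]  6: x[R12,irrelevant]
scores = np.array([0.85, 0.10, 0.60, 0.30, 0.45, 0.50, 0.05])
c = -scores  # linprog minimizes, so negate to maximize the total score

# Each entity/relation variable gets exactly one label.
A_eq = np.array([
    [1, 1, 0, 0, 0, 0, 0],   # E1 labels sum to 1
    [0, 0, 1, 1, 0, 0, 0],   # E2 labels sum to 1
    [0, 0, 0, 0, 1, 1, 1],   # R12 labels sum to 1
])
b_eq = np.ones(3)

# Consistency: spouse_of needs (per, per); born_in needs (per, loc).
A_ub = np.array([
    [-1, 0,  0,  0, 1, 0, 0],  # x[R,spouse] <= x[E1,per]
    [ 0, 0, -1,  0, 1, 0, 0],  # x[R,spouse] <= x[E2,per]
    [-1, 0,  0,  0, 0, 1, 0],  # x[R,born]   <= x[E1,per]
    [ 0, 0,  0, -1, 0, 1, 0],  # x[R,born]   <= x[E2,loc]
])
b_ub = np.zeros(4)

# Relaxation: 0 <= x <= 1 instead of x in {0, 1}.
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, 1)] * 7, method="highs")
print(res.x)
```

For this instance the solver lands on the integral vertex x = [1, 0, 1, 0, 1, 0, 0], i.e., E1 = per, E2 = per, R12 = spouse_of, consistent with the observation below that the relaxation returned integer solutions in practice.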

Unimodular Theorem: Let A be an (m, n)-integral matrix with full row rank m. Then the polyhedron {x | x ≥ 0, Ax = b} has only integral vertices for every integral vector b if and only if A is unimodular. Definition: A matrix A is unimodular if all entries of A are integers and the determinant of every square submatrix of A of order m is 0, +1, or -1. This theorem will come up in more detail in a paper we will cover later, but it is also noted here. In this paper, A was not proven to be unimodular; however, in practice the linear program still returned integer solutions in every case the paper tested.
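As a small worked example (mine, not from the slides): the matrix A below is unimodular, since it is integral with full row rank 2 and every 2×2 submatrix has determinant 0 or ±1:

```latex
A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}, \qquad
\det\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = 1, \quad
\det\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = 1, \quad
\det\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} = 1.
```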

Experiments - Data This paper annotated the named entities and relations in sentences from TREC documents, choosing 1,437 sentences that contain at least one active relation. Overall statistics: 5,336 entities and 19,048 pairs of entities. An active relation is a relation not classified as other; the other class outnumbers all active relations because most pairs of entities are unrelated.

Experiments - Data

Entity Type     Occurrences
person          1,685
location        1,968
organization    978
other           705

Relation Type   Occurrences
located_in      406
work_for        394
orgBased_in     451
live_in         521
kill            268
other           17,007

Approaches First approach: train the entity and relation classifiers separately. The learning algorithm uses a regularized variant of the Winnow update rule from SNoW, a multi-class classifier tailored for large-scale learning tasks. This paper uses the raw activation values that SNoW outputs to estimate the posterior probabilities. Entity classifier: extracts a set of features from the words around the target phrase, including words, POS tags, and bigrams/trigrams of the mixture of words and tags. Relation classifier: uses features extracted from the two argument entities of the relation (similar to the entity classifier), plus conjunctions of those features and patterns extracted from the sentence.

Approaches Also tested several pipeline models: E → R, R → E, and R ⇔ E.
E → R: first trains the entity classifier, then includes its predictions on the two entity arguments of a relation as features for the relation classifier.
R → E: instead trains the relation classifier first.
R ⇔ E: uses the entity classifier from the R → E model and the relation classifier from the E → R model.

Entity Results This paper evaluates using 5-fold cross-validation. Omniscient: trains the two classifiers separately, but assumes the entity classifier knows the correct relation labels and the relation classifier knows the correct entity labels.


Entity Results We can see across all of these models that using global inference improves accuracy. Interestingly, even the Omniscient model, which might be able to learn the constraints from the data, still sees improvement from employing inference.

Relation Results


Additional Experiments Quality of the decisions: when the inference procedure is not applied, 5-25% of the predictions are incoherent; the global inference procedure never generates incoherent predictions. A coherent prediction occurs when, for a relation variable and its two corresponding entity variables, the labels of all three variables are predicted correctly and the relation is active. Quality is the number of coherent predictions divided by the sum of coherent and incoherent predictions. Forced decision test: assumes the system knows at decision time which sentences contain the "kill" relation (though not which pair of entities holds it) and adds a constraint enforcing this (see the reconstruction below). This improves the F1 score of the "kill" relation to 86.2%.
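The added constraint appears only as an image in the slides; a plausible reconstruction, given that the system knows the sentence contains a kill relation but not which pair of entities carries it, is:

```latex
\sum_{r \in R} x_{r,\,\text{kill}} \;\ge\; 1
```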

References
D. Roth and W. Yih. A Linear Programming Formulation for Global Inference in Natural Language Tasks. CoNLL 2004.
W. Yih. Notes on Global Inference as Integer Linear Programming. 2004.