Global Learning of Typed Entailment Rules
Jonathan Berant, Ido Dagan, Jacob Goldberger
June 21st, 2011
Task: Entailment (Inference) Rules between Predicates
Binary predicates specify a relation between a pair of arguments:
– X cause an increase in Y
– X treat Y
Entailment rules:
– Y is a symptom of X → X cause Y
– X cause an increase in Y → X affect Y
– X's treatment of Y → X treat Y
Local Learning
Sources of information (monolingual corpus):
1. Lexicographic: WordNet (Szpektor and Dagan, 2009), FrameNet (Ben Aharon et al., 2010)
2. Pattern-based (Chklovski and Pantel, 2004)
3. Distributional similarity (Lin and Pantel, 2001; Sekine, 2005; Bhagat et al., 2007; Yates and Etzioni, 2009; Poon and Domingos, 2010; Schoenmackers et al., 2010)
Input: a pair of predicates (p1, p2)
Question: p1 → p2? (or p1 ↔ p2?)
Global Graph Learning
Nodes: predicates (e.g., X treat Y, X affect Y, X lower Y, X reduce Y)
Edges: entailment rules
Entailment is transitive; strongly connected components represent "equivalence" classes.
The DIRT rule base (Lin and Pantel, 2001) uses only pairwise information.
Local Entailment Classifier – Step 1
Input: a corpus of extracted tuples (P, a1, a2) and WordNet.
Distant supervision: WordNet supplies positive pairs (e.g., treat → affect) and negative pairs (e.g., raise → lower).
Each candidate pair is represented by distributional-similarity features:
1. DIRT
2. BInc
3. Cos
4. SR
Example feature vectors:
– treat → affect: (DIRT=0.3, BInc=0.1, Cos=0.9, …)
– raise → lower: (DIRT=0.0, BInc=0.05, Cos=0.01, …)
A classifier is trained on these examples.
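The step above can be sketched as follows. This is a minimal stdlib-only stand-in for the actual classifier: the feature values echo the slide's illustrative numbers, the extra training pairs are invented for the example, and a hand-rolled logistic regression replaces whatever learner the authors used.

```python
import math

# Hypothetical training data for distant supervision: WordNet synonym/hypernym
# pairs give positives, clearly non-entailing pairs give negatives. Each pair
# is described by the distributional-similarity features from the slide.
train = [
    ({"DIRT": 0.3, "BInc": 0.1, "Cos": 0.9, "SR": 0.4}, 1),   # treat -> affect
    ({"DIRT": 0.5, "BInc": 0.4, "Cos": 0.8, "SR": 0.6}, 1),   # invented positive
    ({"DIRT": 0.0, "BInc": 0.05, "Cos": 0.01, "SR": 0.1}, 0),  # raise -> lower
    ({"DIRT": 0.1, "BInc": 0.0, "Cos": 0.05, "SR": 0.0}, 0),   # invented negative
]
FEATURES = ["DIRT", "BInc", "Cos", "SR"]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(data, epochs=500, lr=0.5):
    """Stochastic gradient updates on the logistic loss (stdlib-only sketch)."""
    w = {f: 0.0 for f in FEATURES}
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(b + sum(w[f] * x[f] for f in FEATURES))
            err = y - p
            b += lr * err
            for f in FEATURES:
                w[f] += lr * err * x[f]
    return w, b

def entailment_score(w, b, x):
    """Local score for a candidate rule p1 -> p2 given its feature vector."""
    return sigmoid(b + sum(w[f] * x[f] for f in FEATURES))

w, b = train_logreg(train)
```

These per-pair scores are exactly the local signal that the global step (below in the talk) combines under transitivity.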
Global Learning of Edges – Step 2
Input: a set of nodes V and a weighting function f: V×V → R
Output: a set of directed edges E that respects transitivity and maximizes the sum of edge weights
The problem is NP-hard:
– Reduction from "Transitive Subgraph" (Yannakakis, 1978)
Integer Linear Programming (ILP) formulation:
– Optimal solution (not an approximation)
– Often an LP relaxation already yields an integer solution
Integer Linear Program (ACL 2010)
An indicator variable X_uv for every ordered pair of nodes.
The objective function maximizes the sum of edge scores.
Transitivity and background knowledge provide constraints, e.g., X_uv + X_vw − X_uw ≤ 1 for every triple: if u → v and v → w are selected, then u → w must be selected as well.
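To make the objective and the transitivity constraint concrete, here is a toy exact solver. Brute-force enumeration stands in for the ILP solver (so it only works on tiny graphs); the node names and scores are invented for illustration.

```python
from itertools import combinations, product

def solve_transitive_max(nodes, f):
    """Exact solver for the global objective on a tiny graph: choose a set of
    directed edges E maximizing sum of f(u, v) subject to transitivity
    (u->v and v->w in E implies u->w in E). In the real formulation each
    ordered pair gets an indicator X_uv with X_uv + X_vw - X_uw <= 1."""
    pairs = [(u, v) for u in nodes for v in nodes if u != v]
    best, best_score = set(), 0.0
    for r in range(len(pairs) + 1):
        for subset in combinations(pairs, r):
            E = set(subset)
            # Skip edge sets that violate transitivity.
            if any((u, v) in E and (v, w) in E and (u, w) not in E
                   for u, v, w in product(nodes, repeat=3)
                   if len({u, v, w}) == 3):
                continue
            score = sum(f(u, v) for u, v in E)
            if score > best_score:
                best, best_score = E, score
    return best, best_score
```

Note how global reasoning differs from local thresholding: with f(a,b)=1.0, f(b,c)=1.0 and f(a,c)=-0.5, the optimal transitive solution includes the locally negative edge a → c, because keeping both a → b and b → c is worth it.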
Typed Entailment Graphs
Example (place, country) graph nodes: province of (place,country), become part of (place,country), annex (country,place), invade (country,place)
Example (drug, drug) graph nodes: be relate to, be derive from, be process from, be convert into
A graph is defined for every pair of types.
"Single-type" graphs contain both "direct-mapping" and "transposed-mapping" edges.
Problems:
– How to represent "single-type" graphs
– Hard to solve graphs with >50 nodes
ILP for "Single-Type" Graphs
Example nodes: relate to (drug,drug), convert to (drug,drug), derive from (drug,drug)
The functions f_x and f_y provide local scores for the direct and reversed argument mappings.
This cuts the size of the ILP in half compared to the naïve solution.
Decomposition
Sparsity: most predicates do not entail one another.
Proposition: if the nodes can be partitioned into sets U and W such that f(u, w) < 0 for every u ∈ U and w ∈ W, then no edge (u, w) crossing the partition appears in the optimal solution.
Decomposition Algorithm
Input: nodes V and function f: V×V → R
1. Insert an undirected edge for every (u, v) such that f(u, v) > 0
2. Find the connected components V1, …, Vk
3. For i = 1…k: Ei = ILP-solve(Vi, f)
Output: E1, …, Ek – guaranteed to be optimal
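Steps 1–2 of the algorithm can be sketched directly; the node names in the test are invented, and a real system would then pass each component to the ILP solver (step 3).

```python
def decompose(nodes, f):
    """Partition nodes into connected components of the undirected graph that
    links u and v whenever f(u, v) > 0 or f(v, u) > 0. By the sparsity
    proposition, no entailment edge crosses components in the optimal
    solution, so each component can be solved by a separate (much smaller) ILP."""
    adj = {u: set() for u in nodes}
    for u in nodes:
        for v in nodes:
            if u != v and (f(u, v) > 0 or f(v, u) > 0):
                adj[u].add(v)
                adj[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, stack = [], [start]
        seen.add(start)
        while stack:  # iterative depth-first search
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(sorted(comp))
    return components
```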
Incremental ILP
Given a good classifier, most transitivity constraints are not violated.
Idea: add constraints only when they are violated.
Incremental ILP Algorithm
Input: nodes V and function f: V×V → R
1. ACT, VIO = ∅
2. Repeat:
   a) E = ILP-solve(V, f, ACT)
   b) VIO = violated(V, E)
   c) ACT = ACT ∪ VIO
3. Until |VIO| = 0
Output: E – guaranteed to be optimal
The violated() check needs to be efficient.
Empirically, the algorithm converges within 6 iterations and reduces the number of constraints from ~10^6.
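The loop above is a cutting-plane scheme, which can be sketched as follows. As before, brute-force enumeration stands in for ILP-solve (so this only runs on toy graphs), and the scores in the test are invented; the point is the structure of the ACT/VIO loop, not the inner solver.

```python
from itertools import combinations, product

def incremental_solve(nodes, f):
    """Cutting-plane sketch of the Incremental ILP: start with no transitivity
    constraints, solve, add only the constraints the solution violates, and
    repeat until none are violated. The returned solution satisfies *all*
    transitivity constraints even though most were never added."""
    pairs = [(u, v) for u in nodes for v in nodes if u != v]
    triples = [t for t in product(nodes, repeat=3) if len(set(t)) == 3]
    active = set()  # ACT: transitivity constraints enforced so far
    while True:
        # "ILP-solve(V, f, ACT)": best edge set respecting only active constraints.
        best, best_score = set(), 0.0
        for r in range(len(pairs) + 1):
            for subset in combinations(pairs, r):
                E = set(subset)
                if any((u, v) in E and (v, w) in E and (u, w) not in E
                       for u, v, w in active):
                    continue
                score = sum(f(u, v) for u, v in E)
                if score > best_score:
                    best, best_score = E, score
        # "violated(V, E)": transitivity triples the current solution breaks.
        violated = {(u, v, w) for u, v, w in triples
                    if (u, v) in best and (v, w) in best and (u, w) not in best}
        if not violated:
            return best, best_score
        active |= violated  # ACT = ACT ∪ VIO
```

On the three-node example from earlier, the first (unconstrained) solve picks only the two positive edges, one violated triple is added, and the second solve returns the fully transitive optimum.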
Experiment 1 – Transitivity
1 million TextRunner tuples over 10,672 typed predicates and 156 types, comprising ~2,000 typed entailment graphs.
10 gold-standard graphs of sizes 7, 14, 22, 30, 38, 53, 59, 62, 86, and 118.
Evaluation:
– F1 on the set of edges vs. the gold standard
– Area Under the Curve (AUC)
Results
Micro-averaged R/P/F1 (taken at the point of maximal micro-F1) and AUC, comparing ILP against the local classifier (Clsf), Sherlock, SR, DIRT, and BInc (no AUC reported for Sherlock).
Transitivity improves rule learning over typed predicates.
Experiment 2 – Scalability
Run ILP with and without Decomposition + Incremental ILP over the ~2,000 graphs.
Compare, for various sparsity parameters:
– Number of unlearned graphs
– Number of learned rules
Results
Per sparsity setting: the number of unlearned graphs and the number of learned rules, with vs. without the scaling techniques, and the percentage reduction in unsolved graphs.
The scaling techniques add 3,500 new rules to the best configuration, corresponding to a 13% increase in relative recall.
Conclusions
– An algorithm for learning entailment rules from local information under a global transitivity constraint
– An ILP formulation for learning entailment rules
– Algorithms that scale ILP to larger graphs
– An application: hierarchical summarization of information for query concepts
– A resource of 30,000 domain-independent typed entailment rules