
INFERENCE Shalmoli Gupta, Yuzhe Wang

Outline: Introduction, Dual Decomposition, Incremental ILP, Amortizing Inference

What is Inference?

Difficulty in Inference

Lagrangian Relaxation. The linear constraints are what make the problem NP-hard; move them into the objective with Lagrangian multipliers and solve the resulting dual with the subgradient algorithm.
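A sketch of the standard Lagrangian relaxation and its subgradient update, in generic notation (the symbols f, y, A, b, u are mine, not the slides'):

```latex
% Original problem: maximize f(y) over y in Y subject to the "hard"
% linear constraints Ay <= b. The relaxation moves them into the objective:
\[
L(u) \;=\; \max_{y \in \mathcal{Y}} \; f(y) + u^{\top}(b - Ay), \qquad u \ge 0
\]
% L(u) upper-bounds the optimum for every u >= 0. The dual, min_{u>=0} L(u),
% is solved with projected subgradient steps:
\[
y^{(k)} = \arg\max_{y \in \mathcal{Y}} f(y) + {u^{(k)}}^{\top}(b - Ay), \qquad
u^{(k+1)} = \max\!\bigl(0,\; u^{(k)} - \alpha_k \, (b - A y^{(k)})\bigr)
\]
```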

Dual Decomposition. A coupling constraint specifies that the two sub-models' output vectors must be coherent (agree with each other); this linear constraint is relaxed with Lagrangian multipliers, splitting the problem into independently solvable pieces.

Dual Decomposition
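A sketch of the standard dual decomposition setup, in generic notation following the Rush & Collins tutorial cited below (f, g, y, z are my symbols, not the slides'):

```latex
% Two models f and g over the same output, coupled by an agreement constraint:
\[
\max_{y \in \mathcal{Y},\, z \in \mathcal{Z}} \; f(y) + g(z)
\quad \text{s.t.} \quad y_i = z_i \;\; \forall i
\]
% Relaxing the coupling constraint with multipliers u yields two independent
% subproblems, each decodable with its own specialized algorithm:
\[
L(u) \;=\; \max_{y \in \mathcal{Y}} \bigl(f(y) + u^{\top} y\bigr)
         \;+\; \max_{z \in \mathcal{Z}} \bigl(g(z) - u^{\top} z\bigr)
\]
```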

Non-Projective Dependency Parsing [figure: dependency tree for "John saw a bird yesterday, which was a Parrot"; the arc from "bird" to the relative clause crosses the arc to "yesterday", so the tree is non-projective]

Arc-Factored Model [figure: same example sentence, with each dependency arc scored independently]

Sibling Model [figure: same example sentence, with arcs scored together with their adjacent siblings]

Optimal Modifiers for each Head-word [figure: same example sentence]

A Simple Algorithm [figure: same example sentence]

Parsing via Dual Decomposition: combine the Sibling Model (decoded independently per head-word) with the Arc-Factored Model (decoded as a maximum spanning tree, MST).

Parsing : Dual Decomposition

Algorithm Outline
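Below is a sketch of the standard dual-decomposition subgradient loop from the Rush & Collins tutorial, specialized to this parsing setup; decode_sibling and decode_mst are hypothetical stand-ins for the two sub-model decoders:

```python
def dual_decomposition(decode_sibling, decode_mst, n_arcs, K=5000, alpha0=1.0):
    """Subgradient loop: decode each sub-model under the current multipliers,
    then nudge the multipliers toward agreement on every candidate arc."""
    u = [0.0] * n_arcs                      # one multiplier per candidate arc
    for k in range(K):
        y = decode_sibling(u)               # argmax of sibling score + u.y (0/1 arc vector)
        z = decode_mst(u)                   # argmax of arc-factored score - u.z (via MST)
        if y == z:                          # sub-models agree: provably optimal
            return y
        alpha = alpha0 / (k + 1)            # decaying step size
        u = [ui - alpha * (yi - zi) for ui, yi, zi in zip(u, y, z)]
    return z                                # no agreement within K rounds: heuristic answer
```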

Comparison LP/ILP (per-method accuracy, fraction of integral solutions, and runtime; the cell values were lost in transcription)

Method        | Accuracy | Integral Solution | Time
LP (M)        |          |                   |
ILP (Gurobi)  |          |                   |
DD (K = 5000) |          |                   |
DD (K = 250)  |          |                   |

References:
- Alexander M. Rush and Michael Collins. A Tutorial on Dual Decomposition and Lagrangian Relaxation for Inference in Natural Language Processing. JAIR, 2012.
- Terry Koo, Alexander M. Rush, Michael Collins, Tommi Jaakkola, and David Sontag. Dual Decomposition for Parsing with Non-Projective Head Automata. EMNLP, 2010.

INFERENCE Part 2

Incremental Integer Linear Programming. 1. Many problems and models can be reformulated as an ILP (e.g., HMM decoding). 2. The ILP formulation makes it easy to incorporate additional constraints.
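As a concrete instance of point 1, here is a sketch of how Viterbi decoding for an HMM can be posed as a 0-1 ILP (a standard flow formulation; the notation is mine, and initial-state terms are omitted for brevity):

```latex
% z_{t,s,s'} = 1 iff the state path moves from s at position t to s' at t+1
\[
\max_{z \in \{0,1\}} \;\; \sum_{t,s,s'} z_{t,s,s'} \,
  \log\bigl(P(s' \mid s)\, P(x_{t+1} \mid s')\bigr)
\]
% Exactly one transition per step, and paths must be connected (flow):
\[
\sum_{s,s'} z_{t,s,s'} = 1 \;\;\forall t, \qquad
\sum_{s} z_{t,s,s'} \;=\; \sum_{s''} z_{t+1,s',s''} \;\;\forall t, s'
\]
```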

Non-projective Dependency Parsing. Example: "come" is a head; "I'll" is a child of "come"; "subject" is the label on the dependency between "come" and "I'll". Non-projective: dependency arcs may cross.

Model. Objective function: s(x, y) = sum over (i,j,l) in y of w . f(i,j,l), where x is the sentence, y is a set of labelled dependencies, and f(i,j,l) is the feature vector for a dependency from token i to token j with label l.

Constraints:
T1: For every non-root token in x there exists exactly one head; the root token has no head.
T2: There are no cycles.
A1: Heads are not allowed to have more than one outgoing edge labeled l, for all l in a set of labels U.
C1: In a symmetric coordination there is exactly one argument to the right of the conjunction and at least one argument to the left.
C4: Arguments of a coordination must have compatible word classes.
P1: Two dependencies must not cross if one of their labels is in a set of labels P.

Reformulate as an ILP. Labelled edges: binary variables e_{i,j,l} in {0,1}, one per possible dependency from token i to token j with label l. Existence of a dependency between tokens i and j: binary variables d_{i,j} in {0,1}. Objective function: maximize the sum over i, j, l of s(i,j,l) . e_{i,j,l}, where s(i,j,l) = w . f(i,j,l).

Reformulate the constraints. Only one head (T1): sum over i, l of e_{i,j,l} = 1 for every non-root token j. Label uniqueness (A1): sum over j of e_{i,j,l} <= 1 for every head i and every label l in U. Symmetric coordination (C1): expressed as linear equalities/inequalities over the edge variables around each conjunction. No cycles (T2): for every possible cycle C, sum over (i,j) in C of d_{i,j} <= |C| - 1; since there are exponentially many such constraints, they are added incrementally.

Algorithm. Notation: B_x = constraints added in advance, O_x = objective function, V_x = variables, W = violated constraints. Solve the ILP with only B_x, collect the constraints W that the solution violates, add them, and re-solve until W is empty (see the sketch below).
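A minimal sketch of this incremental (cutting-plane) loop; solve_ilp and find_violated are hypothetical helpers, e.g. find_violated could run cycle detection on the proposed parse:

```python
def incremental_ilp(B_x, O_x, V_x, solve_ilp, find_violated):
    """Start from the cheap constraints B_x only; re-solve, adding the
    constraints the current solution violates, until none remain."""
    constraints = list(B_x)
    while True:
        y = solve_ilp(O_x, V_x, constraints)   # solve the current relaxation
        W = find_violated(y)                   # e.g. cycles in the proposed parse
        if not W:
            return y                           # y satisfies everything: done
        constraints.extend(W)                  # tighten and repeat
```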

Experiment results. LAC: labelled accuracy; UAC: unlabelled accuracy; LC: percentage of sentences with 100% labelled accuracy; UC: percentage of sentences with 100% unlabelled accuracy.

Method | LAC   | UAC   | LC    | UC
bl     | 84.6% | 88.9% | 27.7% | 42.2%
cnstr  | 85.1% | 89.4% | 29.7% | 43.8%

Runtime Evaluation (average solve time per sentence, binned by sentence length in tokens)

Tokens          | <=10   | 11-20  | 21-30 | 31-40 | 41-50 | >50
Count           |        |        |       |       |       |
Avg. ST (bl)    | 0.27ms | 0.98ms | 3.2ms | 7.5ms | 14ms  | 23ms
Avg. ST (cnstr) | 5.6ms  | 52ms   | 460ms | 1.5s  | 7.2s  | 33s

Amortizing Inference. Idea: under some conditions, we can re-use earlier solutions for new instances. Why does this help? 1. Only a small fraction of the large space of possible structures actually occurs. 2. The distribution of observed structures is heavily skewed toward a small number of them. In short: # of examples >> # of distinct ILPs >> # of distinct solutions.

Amortizing Inference: Conditions. Consider solving a 0-1 ILP. Idea of Theorem 1: for every inference variable that is active (equal to 1) in the solution, increasing the corresponding objective coefficient will not change the optimal assignment; for every variable whose value in the solution is 0, decreasing its objective coefficient will likewise not change the optimal solution.

Amortizing Inference: Conditions. Theorem 1. Let p denote an inference problem posed as an integer linear program belonging to an equivalence class [P], and let q in [P] be another inference instance in the same equivalence class. Define d(c) = c^q - c^p to be the difference of the objective coefficients of the two ILPs. Then y^p is an optimal solution of q if, for each i in {1, ..., n_p}, we have (2 y^p_i - 1) d(c_i) >= 0.
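A sketch of the resulting test; it assumes the cached solution y_p and both coefficient vectors are indexed identically:

```python
def same_solution_by_theorem1(y_p, c_p, c_q):
    """Return True if Theorem 1 guarantees that the cached solution y_p of
    problem p is also optimal for q: every active variable's coefficient may
    only grow, every inactive variable's coefficient may only shrink."""
    return all((2 * y - 1) * (cq - cp) >= 0
               for y, cp, cq in zip(y_p, c_p, c_q))
```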

Amortizing Inference: Conditions. Idea of Theorem 2: suppose y* is an optimal solution of both p and q, i.e., c^p . y <= c^p . y* and c^q . y <= c^q . y* for every feasible y. Then (x_1 c^p + x_2 c^q) . y <= (x_1 c^p + x_2 c^q) . y* for any x_1, x_2 >= 0. Theorem 2: y* is also an optimal solution of any problem whose objective coefficients are x_1 c^p + x_2 c^q with x_1, x_2 >= 0.

Amortizing Inference: Conditions. Theorem 3. Let p_1, ..., p_m be previously solved problems in the same equivalence class that share the optimal solution y^p. Define D(c, x) = c^q - sum over j of x_j c^{p_j}. If there is some x >= 0 such that, for every i, (2 y^p_i - 1) D(c, x)_i >= 0, then q has the same optimal solution.
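A sketch of how the Theorem 3 condition can be checked as a feasibility LP over x; the use of scipy here is my choice for illustration, not the paper's:

```python
import numpy as np
from scipy.optimize import linprog

def theorem3_applies(y_p, c_q, C_p):
    """Feasibility LP for Theorem 3: is there x >= 0 with
    (2*y_p[i] - 1) * (c_q[i] - sum_j x[j]*C_p[j][i]) >= 0 for all i?
    C_p: (num_cached_problems x num_vars) array of coefficient vectors."""
    sign = 2 * np.asarray(y_p) - 1             # +1 for active vars, -1 otherwise
    # Rearranged per variable i: sign_i * sum_j x_j C_p[j,i] <= sign_i * c_q[i]
    A_ub = sign[:, None] * np.asarray(C_p).T   # shape (num_vars, num_problems)
    b_ub = sign * np.asarray(c_q)
    res = linprog(c=np.zeros(len(C_p)), A_ub=A_ub, b_ub=b_ub)  # x >= 0 by default
    return res.success                         # feasible => reuse cached solution
```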

Conditions

Approximation schemes: 1. Most frequent solution. 2. Top-K approximation. 3. Relaxed versions of the conditions in Theorems 1 and 3. A caching sketch for scheme 1 follows.
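A sketch of how scheme 1 might be wired into an inference cache; the equivalence-class key and the API are illustrative assumptions, not the paper's implementation:

```python
from collections import Counter, defaultdict

class AmortizedInference:
    """Per-equivalence-class cache of past ILP solutions; scheme 1 returns the
    class's most frequent past solution instead of invoking the solver."""
    def __init__(self, solve_ilp):
        self.solve_ilp = solve_ilp              # exact solver fallback
        self.history = defaultdict(Counter)     # class key -> Counter of solutions

    def infer(self, key, c, approximate=False):
        seen = self.history[key]
        if approximate and seen:
            return seen.most_common(1)[0][0]    # scheme 1: most frequent solution
        y = tuple(self.solve_ilp(c))            # exact inference
        seen[y] += 1
        return y
```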

Experimental Results

Thank you!