This is an example of a bad talk. (Disclaimer: the paper that should have been presented in this talk is a classic in the field and a great paper; this talk, not the paper, is rotten.)

On the Foundations of Relaxation Labeling Processes
By An Anonymous Student

Overview
Motivation
I. Introduction to Labeling Problems
II. Continuous Relaxation Labeling Processes
III. Consistency
IV. Overview of Results
V. Average Local Consistency
VI. Geometric Structure of Assignment Space
VII. Maximizing Average Local Consistency
VIII. The Relaxation Labeling Algorithm
IX. A Local Convergence Result
X. Generalizations to Higher Order Compatibilities
XI. Comparisons with Standard Relaxation Labeling Updating Schemes
XII. Summary and Conclusions
Appendix A

Motivation
Two concerns:
– The decomposition of a complex computation into a network of simple "myopic", or local, computations
– The requisite use of context in resolving ambiguities

Motivation
Relaxation operations: used to solve systems of linear equations, etc.
Relaxation labeling:
– An extension of relaxation operations
– Solutions involve symbols rather than functions
– Weights are attached to labels
Main difference: labels do not necessarily have a natural ordering.

Motivation
Algorithm:
– Parallel
– Each process makes use of the context to assist in a labeling decision
Goal:
– Provide a formal foundation
– Characterize what the algorithm is doing, so that the cause of a failure can be attributed to an inadequate theory

Motivation
Treatment: abstract
– Relate discrete relaxation to a description of the usual relaxation labeling schemes
– Develop a theory of consistency
– Formalize its relationship to optimization
– Several mathematical results

I. Introduction to Labeling Problems
In a labeling problem, one is given:
– A set of objects
– A set of labels for each object
– A neighbor relation over the objects
– A constraint relation over labels at pairs (or n-tuples) of neighboring objects
Solution: an assignment of labels to each object in a manner which is consistent with respect to the constraint relation.

I. Introduction to Labeling Problems
Notation:
– λ: variable used either to denote a label or to serve as an index through a set of labels
– Λ_i: set of labels attached to node i
– Λ_ij: constraint relation listing all pairs (λ, λ') such that λ at i is consistent with λ' at j
– m: number of labels in Λ_i
– n: number of nodes in G
– S_i(λ): support for label λ at i from a discrete labeling (counts the number of neighbors of object i that have a label compatible with λ at i)
The max is used because more than one label can have value 1 at j.
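The support formula itself was an image in the original slide; a plausible reconstruction consistent with the wording above (with p_j(λ') ∈ {0,1} indicating whether λ' is currently assigned at j, and Λ_ij(λ, λ') read as a 0/1 indicator) is:

```latex
S_i(\lambda) \;=\; \sum_{j} \;\max_{\lambda'} \; \Lambda_{ij}(\lambda,\lambda')\, p_j(\lambda'),
\qquad p_j(\lambda') \in \{0,1\}.
```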

I. Introduction to Labeling Problems
Discrete relaxation (label discarding rule): discard a label λ at a node i if there exists a neighbor j of i such that every label λ' currently assigned to j is incompatible with λ at i.
– A label is retained if at every neighboring node there exists at least one compatible label.

II. Continuous Relaxation Labeling Processes
Limitation of the discrete model in I:
– Pairs of labels are either compatible or completely incompatible
– Cannot express a preference or relative dislike
Solution:
– Continuous relaxation labeling
– Weighted values representing relative preferences

II. Continuous Relaxation Labeling Processes
Compatibility r_ij(λ, λ'): the relative support for label λ at object i that arises from label λ' at object j.
– Positive: locally consistent pair
– Negative: implied inconsistency
– The magnitude of r_ij(λ, λ') is proportional to the strength of the constraint
– If i and j are not neighbors: r_ij(λ, λ') = 0

II. Continuous Relaxation Labeling Processes
Difficulty: formulating what a consistent labeling is.
– A consistent labeling is one in which the constraints are satisfied
– With logical constraints replaced by weighted assertions, a new foundation is required to describe the structural framework and the precise meaning of the goal of consistency

II. Continuous Relaxation Labeling Processes
Structural frameworks that have been attempted:
– Define consistency as the stopping points of the algorithm: circular, gives no clue about what is being computed
– Regard the label weights as probabilities and use Bayesian analysis, statistical quantities, etc.: unsuccessful, various independence assumptions required
– Optimization theory: a vector composed of the current label weights and an evidence vector involving each label's neighborhood weights; the authors extend this view
– Linear programming: constraints are obtained from arithmetical equivalents, and preferences can be incorporated only by adding new labels; different and interesting, but not incompatible with the authors' development

II. Continuous Relaxation Labeling Processes
Prototype (original) algorithm:
– An iterative, parallel procedure analogous to the label discarding rule used in discrete relaxation
– For each object and each label, a support value is computed from the current assignment values p_i(λ); new assignment values are then defined according to the update below
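Both formulas were images in the slide; the standard prototype formulas from the relaxation labeling literature, offered here as a reconstruction rather than a quotation, are:

```latex
s_i(\lambda) \;=\; \sum_{j}\sum_{\lambda'} r_{ij}(\lambda,\lambda')\, p_j(\lambda'),
\qquad
p_i^{\,t+1}(\lambda) \;=\;
\frac{p_i^{\,t}(\lambda)\,\bigl(1 + s_i^{\,t}(\lambda)\bigr)}
     {\sum_{\mu} p_i^{\,t}(\mu)\,\bigl(1 + s_i^{\,t}(\mu)\bigr)} .
```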

III. Consistency
Requirements:
– A system of inequalities
– Permit the logical constraints to be ordered, or weighted
– Allow an analytic, rather than logical or symbolic, study
Consistency is defined:
– For unambiguous labelings
– For weighted labeling assignments

III. Consistency
Unambiguous labeling assignment: a mapping from the set of objects into the set of labels, associating exactly one label with each object.
Space of unambiguous labelings K*:
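The definition of K* was an image; the usual formulation (a reconstruction, writing p_i(λ) for the weight of label λ at object i) is:

```latex
K^{*} \;=\; \Bigl\{\, p :\; p_i(\lambda) \in \{0,1\},\;\;
\sum_{\lambda \in \Lambda_i} p_i(\lambda) = 1,\;\; i = 1,\dots,n \Bigr\}.
```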

III. Consistency
Weighted labeling assignments: replace the 0/1 condition by the condition that the weights are nonnegative and sum to one at each object.
K is simply the convex hull of K*.
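Reconstructing the corresponding formula for K under the same notation:

```latex
K \;=\; \Bigl\{\, p :\; p_i(\lambda) \ge 0,\;\;
\sum_{\lambda \in \Lambda_i} p_i(\lambda) = 1,\;\; i = 1,\dots,n \Bigr\}.
```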

III. Consistency
Consistency depends on constraints between labels, expressed by the compatibility matrix, whose elements indicate both positive and negative constraints.
Definition 3.1: on the labeling spaces, the max of the discrete support function in Section I is replaced by a sum, so that the support becomes linear in the label weights (the sum-based support shown in Section II above).

III. Consistency
Higher order combinations of object labels:
– Multidimensional matrix of compatibilities
– Support at object i for label λ built from it
Definition 3.2: the unambiguous labeling is consistent providing the assigned label at each object receives maximal support there.
Consistency in K* corresponds to satisfying a system of inequalities:
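A reconstruction of the missing inequalities, with λ_i denoting the label the unambiguous labeling assigns to object i (this matches the statement restated on the next slide):

```latex
s_i(\lambda_i) \;\ge\; s_i(\lambda)
\qquad \text{for all } \lambda \in \Lambda_i,\;\; i = 1,\dots,n .
```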

III. Consistency
At a consistent unambiguous labeling, the support at each object for the assigned label is the maximum support at that object.
Given a set of objects, labels, and support functions, there may be many consistent labelings.
Condition for consistency in K* (restated): the system of inequalities above.

III. Consistency
Definition 3.3: condition for consistency of a weighted labeling assignment.
Definition 3.4: condition for strict consistency.
An unambiguous assignment that is consistent in K will also be consistent in K*, since K* ⊂ K. The converse is also true (Proposition 3.5).
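The defining conditions were images; a reconstruction consistent with the usual statements is: p ∈ K is consistent if, for each object i,

```latex
\sum_{\lambda} p_i(\lambda)\, s_i(\lambda; p) \;\ge\; \sum_{\lambda} v_i(\lambda)\, s_i(\lambda; p)
\qquad \text{for all } v \in K,
```

and strictly consistent (Definition 3.4) if, roughly, the inequality is strict for every competing assignment v ≠ p.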

III. Consistency
Proposition 3.5: an unambiguous labeling which is consistent in K* is also consistent in K.

IV. Overview of Results
Algorithm for converting a given labeling into a consistent one:
– Two approaches: optimization theory, and a finite variational calculus
– Both lead to the same algorithm
Achieving consistency is equivalent to solving a variational inequality:
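The variational inequality itself did not survive the transcript; the form usually quoted for Theorem 4.1 (a reconstruction) is: find p ∈ K such that

```latex
\sum_{i}\sum_{\lambda} s_i(\lambda; p)\,\bigl(v_i(\lambda) - p_i(\lambda)\bigr) \;\le\; 0
\qquad \text{for all } v \in K .
```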

IV. Overview of Results
Two paths to study consistency and to derive algorithms for achieving it.

V. Average Local Consistency
Goal: update a nearly consistent labeling to a consistent one. At each object the local consistency Σ_λ p_i(λ) s_i(λ) should be large, so the average local consistency should be large.
Two problems:
– Maximizing a sum does not necessarily maximize each individual term
– The individual components s_i(λ) depend on the assignment p̄, which varies during the maximization process
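The average local consistency functional was an image; its standard definition (a reconstruction, ignoring the 1/n normalization) is:

```latex
A(\bar p) \;=\; \sum_{i}\sum_{\lambda} \bar p_i(\lambda)\, s_i(\lambda;\bar p)
\;=\; \sum_{i}\sum_{\lambda}\sum_{j}\sum_{\lambda'} r_{ij}(\lambda,\lambda')\,\bar p_i(\lambda)\,\bar p_j(\lambda') .
```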

V. Average Local Consistency
Maximizing A(p̄) is the same as maximizing the average (1/n)A(p̄), but it is not the same as maximizing the n quantities Σ_λ p̄_i(λ) s_i(λ) individually.

V. Average Local Consistency
Special case: when the compatibility matrix is symmetric, maximizing A leads to consistent labeling assignments.
General case: the compatibility matrix is not symmetric; Section VIII works out the algorithm for this case.
– Locally maximizing A gives the same result as if the matrix were first symmetrized.

V. Average Local Consistency
Gradient ascent: finds local maxima of a smooth functional by successively moving the current assignment p̄ a small step to a new assignment. The amount of increase in A is related to the directional derivative of A in the direction of the step. The gradient:
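The gradient formula was an image; differentiating the reconstructed A(p̄) above gives:

```latex
q_i(\lambda) \;=\; \frac{\partial A}{\partial \bar p_i(\lambda)}
\;=\; \sum_{j}\sum_{\lambda'} \bigl[\, r_{ij}(\lambda,\lambda') + r_{ji}(\lambda',\lambda) \,\bigr]\, \bar p_j(\lambda') .
```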

V. Average Local Consistency
When the compatibilities are symmetric, the gradient reduces (up to a factor of 2) to the support vector of Definition 3.1; this vector q serves as the intermediate updating "direction".
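That is, under the symmetry r_ij(λ, λ') = r_ji(λ', λ):

```latex
q_i(\lambda) \;=\; 2\, s_i(\lambda;\bar p) .
```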

VI. Geometric Structure of Assignment Space
Goal: to discuss gradient ascent on K, and to visualize the more general updating algorithms.
A simple example: n = 2 objects, with m = 3 possible labels for each object; each object's weight vector lies on a 2-simplex (a triangle).

VI. Geometric Structure of Assignment Space
The assignment vector p̄: two points, each lying in a copy of the triangular space shown in Fig. 2.
K: the set of all pairs of points in two copies of the triangular space of Fig. 2.
K with n objects, each with m labels:
– Space: n copies of an (m-1)-simplex
– K: the set of all n-tuples of points, each point lying in a copy of the (m-1)-dimensional surface
– A weighted labeling assignment is a point in the assignment space K
– An unambiguous labeling is one of the "corners"
– Each simplex has m corners

VI. Geometric Structure of Assignment Space
Tangent space: the set of all directions at a given point; a copy of it placed at that point lies "tangent" to the surface.
– The tangent directions at a point of K are those along which one can initially move and remain in K
– In the interior of the surface, the tangent set is an entire vector space
– On the boundary of the surface, it is only a convex subset of a vector space

VI. Geometric Structure of Assignment Space
p̄: a labeling assignment in K
v: any other assignment in K
Difference vector (direction): v - p̄

VI. Geometric Structure of Assignment Space
Set of all tangent vectors at p̄ (as v roams around K):
At an interior point, the set of tangent vectors consists of an entire subspace:
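The two missing sets, reconstructed in the notation above (the exact form is an assumption):

```latex
T_{\bar p} \;=\; \bigl\{\, u = \alpha\,(v - \bar p) \;:\; v \in K,\; \alpha \ge 0 \,\bigr\},
\qquad
T_{\bar p}^{\text{interior}} \;=\; \Bigl\{\, u \;:\; \sum_{\lambda} u_i(\lambda) = 0,\;\; i = 1,\dots,n \Bigr\}.
```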

VI. Geometric Structure of Assignment Space lies on a boundary of K: a proper subset of above space:

VII. Maximizing Average Local Consistency
To find a consistent labeling:
– Compatibilities symmetric: gradient ascent
– Compatibilities not symmetric: the same algorithm still applies (Section VIII)
The increase in A due to a small step of length α in the direction ū (with ||ū|| = 1) is approximately α times the directional derivative, so the greatest increase in A can be expected if the step is taken in the tangent direction ū of steepest ascent.
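In symbols (a reconstruction of the approximation being described):

```latex
A(\bar p + \alpha \bar u) - A(\bar p) \;\approx\; \alpha \,\nabla A \cdot \bar u
\;=\; \alpha \sum_i \sum_\lambda q_i(\lambda)\, \bar u_i(\lambda),
\qquad \|\bar u\| = 1 .
```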

VII. Maximizing Average Local Consistency
To find the direction of steepest ascent, the inner product of the gradient with the tangent direction should be maximized (a solution always exists):
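The resulting maximization problem, presumably Problem 7.1 of the paper (reconstruction):

```latex
\text{find } u \in T_{\bar p},\; \|u\| \le 1,\;
\text{ maximizing } \sum_i \sum_\lambda q_i(\lambda)\, u_i(\lambda) .
```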

VII. Maximizing Average Local Consistency
Lemma 7.3: if p̄ lies in the interior of K, then the following algorithm solves Problem 7.1.
– It may fail when p̄ is a boundary point of K (that case is handled by the algorithm in Appendix A).
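The algorithm was an image; for an interior point it presumably amounts to projecting the gradient onto the per-object zero-sum subspace and normalizing (a reconstruction; the exact statement of Lemma 7.3 may differ):

```latex
w_i(\lambda) \;=\; q_i(\lambda) \;-\; \frac{1}{m}\sum_{\mu \in \Lambda_i} q_i(\mu),
\qquad
u \;=\; \begin{cases} w / \|w\| & \text{if } w \ne 0,\\[2pt] 0 & \text{if } w = 0. \end{cases}
```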

Appendix A. Updating Direction Algorithm
Gives an algorithm to replace the updating formulas in common use in relaxation labeling processes.
Gives a projection operator (a finite iterative algorithm) based on the consistency theory and permitting proofs of convergence results.
Solution to the projection problem: the returned vector u.
Normalization: ||ū|| = 1 (or ū = 0).
Step length: α_i
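Stated abstractly (a reconstruction), the projection problem the appendix addresses is

```latex
u \;=\; \arg\min_{w \,\in\, T_{\bar p}} \; \lVert\, w - q \,\rVert ,
```

followed by normalization to ||ū|| = 1, with ū = 0 when the projection vanishes.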

VII. Maximizing Average Local Consistency
Algorithm 7.4: finds consistent labelings when the matrix of compatibilities is symmetric.
– Successive iterates are obtained by moving a small step in the direction of the projection of the gradient
– The algorithm stops when the projection equals 0

VII. Maximizing Average Local Consistency
Proposition 7.5: suppose p̄ is a stopping point of Algorithm 7.4; then, if the matrix of compatibilities is symmetric, p̄ is consistent.

VIII. The Relaxation Labeling Algorithm
The entire preceding analysis of average local consistency relies on the assumption of symmetric compatibilities.
Example of asymmetric compatibilities: constraints between letters in English.
Theorem 4.1 (the variational inequality) is general and does not require symmetry.

VIII. The Relaxation Labeling Algorithm
Observation 8.1: with the updating direction q defined as above, the variational inequality is equivalent to the statement that a labeling is consistent iff q points away from all tangent directions.
Algorithm 8.2 (The Relaxation Labeling Algorithm): see the sketch below.
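The updating formulas behind Algorithm 8.2 were images in the slides. Below is a minimal runnable Python sketch of the overall scheme, assuming second-order compatibilities stored as an array r[i, λ, j, λ'] and an assignment matrix p[i, λ]; the helper names (support, project_tangent, relaxation_labeling), the clamping-based projection, the step size alpha, and the stopping tolerance are illustrative assumptions, not the paper's exact Appendix A procedure.

```python
# A sketch, not the authors' implementation: second-order compatibilities only.
import numpy as np

def support(p, r):
    """s_i(lam) = sum_j sum_lam' r[i, lam, j, lam'] * p[j, lam'].
    p has shape (n, m); r has shape (n, m, n, m)."""
    return np.einsum('iajb,jb->ia', r, p)

def project_tangent(q, p, eps=1e-12):
    """Project q onto the tangent cone of the assignment space K at p:
    per object, the components sum to zero, and components where p == 0
    must stay nonnegative.  Solved by iterative clamping, one object at a time."""
    n, m = q.shape
    u = np.empty_like(q, dtype=float)
    for i in range(n):
        free = np.ones(m, dtype=bool)        # coordinates not clamped to zero
        boundary = p[i] <= eps               # zero-weight labels at object i
        while True:
            mean = q[i, free].mean()
            cand = np.where(free, q[i] - mean, 0.0)
            bad = boundary & free & (cand < -eps)
            if not bad.any():                # all constraints satisfied
                u[i] = cand
                break
            free &= ~bad                     # clamp offending coordinates, retry
    return u

def relaxation_labeling(p, r, alpha=0.05, tol=1e-6, max_iter=1000):
    """Step in the normalized projected-support direction until the
    projection (essentially) vanishes, i.e., at a consistent labeling."""
    for _ in range(max_iter):
        q = support(p, r)
        u = project_tangent(q, p)
        norm = np.linalg.norm(u)
        if norm < tol:
            break
        p = p + alpha * (u / norm)           # small step along the unit tangent direction
        p = np.clip(p, 0.0, None)            # guard against round-off leaving K
        p = p / p.sum(axis=1, keepdims=True)
    return p
```

A typical call would start from uniform weights, e.g. p0 = np.full((n, m), 1.0 / m), and run relaxation_labeling(p0, r).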

VIII. The Relaxation Labeling Algorithm
Proposition 8.3: suppose p̄ is a stopping point of Algorithm 8.2; then p̄ is consistent.
Questions:
– Are there any consistent labelings for the relaxation labeling algorithm to find? (answered by Proposition 8.4)
– Assuming such points exist, will the algorithm find them? (answered in Section IX)
– Even if a relaxation labeling process converges to a consistent labeling, is the final labeling better than the initial assignment? (not well defined)

VIII. The Relaxation Labeling Algorithm
Example of English letter constraints (continued).
Proposition 8.4: the variational inequality of Theorem 4.1 always has at least one solution. Thus consistent labelings always exist, for arbitrary compatibility matrices. Usually, more than one solution will exist.

IX. A Local Convergence Result
As the step size of relaxation labeling Algorithm 7.4 or 8.2 becomes infinitesimal, these discrete algorithms approximate a dynamical system.
Hypothesis of Theorem 9.1: the labeling at every object is close to the consistent assignment.
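In the small-step limit, the iteration p̄ ← p̄ + α u(p̄) is presumably read as the continuous-time system (a reconstruction):

```latex
\frac{d\bar p}{dt} \;=\; u(\bar p),
\qquad u(\bar p) = \text{the projection of } q(\bar p) \text{ onto } T_{\bar p} .
```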

IX. A Local Convergence Result
Assume that p̄ is strictly consistent in order to prove that it is a local attractor of the relaxation labeling dynamical system.
If p̄ is consistent, but not strictly consistent, it may be:
– A local attractor of the dynamical system
– A saddle point
– An unstable stopping point

X. Generalizations to Higher Order Compatibilities
Consistency can be defined using support functions that depend on compatibilities of arbitrary order:
– 1st-order compatibilities
– 3rd-order compatibilities
Symmetry condition:
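The missing formulas, reconstructed by analogy with the second-order support used throughout (index conventions are assumptions):

```latex
\text{1st order:}\quad s_i(\lambda) = r_i(\lambda);
\qquad
\text{3rd order:}\quad s_i(\lambda) = \sum_{j,k}\,\sum_{\lambda',\lambda''}
   r_{ijk}(\lambda,\lambda',\lambda'')\, p_j(\lambda')\, p_k(\lambda'') .
```

The symmetry condition presumably requires r_ij(λ, λ') = r_ji(λ', λ) in the second-order case and, for the third-order case, invariance of r_{ijk} under simultaneous permutation of its object/label argument pairs.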

X. Generalizations to Higher Order Compatibilities
– k-th order compatibilities, and the corresponding symmetry condition:
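Reconstruction of the general case:

```latex
s_i(\lambda) \;=\; \sum_{j_1,\dots,j_{k-1}} \;\sum_{\lambda_1,\dots,\lambda_{k-1}}
   r_{i\,j_1\cdots j_{k-1}}(\lambda,\lambda_1,\dots,\lambda_{k-1})\,
   p_{j_1}(\lambda_1)\cdots p_{j_{k-1}}(\lambda_{k-1}) ,
```

with the symmetry condition requiring invariance of r under simultaneous permutation of its object/label index pairs.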

X. Generalizations to Higher Order Compatibilities
Compatibilities higher than second order, or non-polynomial compatibilities:
– Difficulty: combinatorial growth in the number of required computations
– Most implementations of relaxation labeling processes have limited the computations to second-order compatibilities

XI. Comparisons with Standard Relaxation Labeling Updating Schemes
Algorithm 8.2 updates weighted labeling assignments by computing q and then updating in the direction defined by the projection of q onto the tangent set at the current assignment.
Two other standard formulas for relaxation labeling appear in the literature; one is examined on the next slide.

XI. Comparisons with Standard Relaxation Labeling Updating Schemes
– The denominator is a normalization term
– The numerator can be rewritten as:
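Assuming the standard formula in question is the prototype normalization update from Section II (with q in place of s), the rewrite referred to is presumably the elementary one:

```latex
p_i^{\,t}(\lambda)\,\bigl(1 + q_i^{\,t}(\lambda)\bigr)
\;=\; p_i^{\,t}(\lambda) \;+\; p_i^{\,t}(\lambda)\, q_i^{\,t}(\lambda) ,
```

i.e., the current weight plus a step in the direction p_i(λ) q_i(λ), before normalization.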

XII. Summary and Conclusions
Relaxation labeling processes: mechanisms for employing context and constraints in labeling problems.
Background: lacking a proper model characterizing the process and its stopping points, the choice of the coefficient values and of the updating formula was subject only to empirical justification.
Achievement: develops the foundations of a theory in which consistency explains what relaxation labeling accomplishes, and leads to a relaxation algorithm whose updating formula uses a projection operator.

XII. Summary and Conclusions
Discrete relaxation: a label is discarded if it is not supported by the local context of assigned labels.
Weighted label assignments: an unambiguous labeling is consistent if the support for the instantiated label at each object is greater than or equal to the support for all other labels at that object.
The relaxation labeling process defined by Algorithm 8.2, with the projection operator specified in Appendix A, stops at consistent labelings.
The dynamic process will converge to a consistent labeling if one begins sufficiently near a consistent labeling.

XII. Summary and Conclusions
Symmetry properties: under symmetric compatibilities, the relaxation labeling algorithm is equivalent to gradient ascent on the average local consistency functional.
Future work:
– Efficient implementations of the projection operator
– Choice of the step size
– Normalization methods

Thank you