Continuous optimization: problems and successes


Continuous optimization: problems and successes
Tijl De Bie
Intelligent Systems Laboratory, MVSE, University of Bristol, United Kingdom
tijl.debie@bristol.ac.uk

Motivation
- Back-propagation algorithm for training neural networks (gradient descent)
- Support vector machines
- The convex optimization 'boom' (NIPS, also ICML, KDD...)
What explains this success? (Is it really a success?)
(Mainly for CP-ers not familiar with continuous optimization)

(Convex) continuous optimization
Convex optimization: minimize a convex objective function over a convex feasible set.
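In standard form (the notation is the usual one, e.g. from Boyd & Vandenberghe; the original slide's formula did not survive transcription), a convex program reads:

```latex
\begin{aligned}
\min_{x \in \mathbb{R}^n} \quad & f_0(x) \\
\text{s.t.} \quad & f_i(x) \le 0, \quad i = 1,\dots,m, \\
& a_j^\top x = b_j, \quad j = 1,\dots,p,
\end{aligned}
```

where $f_0, \dots, f_m$ are convex functions, so the feasible set is convex.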

Convex optimization

Convex optimization
General approach: start with a guess and iteratively improve it until the optimum is found, e.g. gradient descent, conjugate gradient, Newton's method, etc.
For constrained convex optimization: interior point methods
- Provably efficient (worst case; typical case even better)
- Iteration count: polynomial in the problem size and in the required accuracy
- Complexity per iteration: polynomial
Out-of-the-box tools exist (SeDuMi, SDPT3, MOSEK...): purely declarative
Book: Convex Optimization (Boyd & Vandenberghe)
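The "start with a guess and iteratively improve" idea can be sketched with plain gradient descent on a small convex quadratic; the matrix A, vector b, step size, and iteration count below are arbitrary choices for the example:

```python
import numpy as np

# Minimize the convex quadratic f(x) = 0.5 * x^T A x - b^T x
# by gradient descent (A and b are made up for illustration).
A = np.array([[3.0, 1.0], [1.0, 2.0]])    # symmetric positive definite
b = np.array([1.0, 1.0])

def grad(x):
    # Gradient of f at x
    return A @ x - b

x = np.zeros(2)                            # initial guess
step = 0.1                                 # fixed step size < 2 / L
for _ in range(500):
    x = x - step * grad(x)

x_star = np.linalg.solve(A, b)             # closed-form optimum
print(np.allclose(x, x_star, atol=1e-6))   # prints: True
```

For a well-conditioned strongly convex problem like this one, the iterates contract geometrically toward the unique optimum.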

Convex optimization
A hierarchy of model classes: LP ⊂ QP ⊂ SOCP ⊂ SDP ⊂ cone programming; geometric programming and logdet optimization are further convex classes.

Linear Programming (LP)
- Linear objective
- Linear inequality constraints
- Affine equality constraints
Applications: relaxations of integer LPs; classification: linear support vector machines (SVM), some forms of boosting (lots outside DM/ML)
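The three components listed above combine into the standard LP form (the slide's own formula was lost in transcription; this is the usual textbook notation):

```latex
\begin{aligned}
\min_{x} \quad & c^\top x \\
\text{s.t.} \quad & Gx \le h \quad \text{(linear inequalities)}, \\
& Ax = b \quad \text{(affine equalities)}.
\end{aligned}
```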

Convex Quadratic Programming (QP)
- Convex quadratic objective and constraints
- LP is the special case where the quadratic terms vanish
Applications: classification/regression (SVM); novelty detection (minimum volume enclosing hypersphere); regression with feature selection (lasso); structured prediction problems
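A convex QP adds a positive semi-definite quadratic term to the LP objective (again, standard notation in place of the slide's lost formula):

```latex
\begin{aligned}
\min_{x} \quad & \tfrac{1}{2}\, x^\top Q x + c^\top x \qquad (Q \succeq 0) \\
\text{s.t.} \quad & Gx \le h, \quad Ax = b,
\end{aligned}
```

with LP recovered as the special case $Q = 0$. The hard-margin SVM primal, $\min \tfrac{1}{2}\|w\|^2$ subject to $y_i (w^\top x_i + b) \ge 1$, is an instance of this class.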

Second-Order Cone Programming (SOCP)
- Second-order cone constraints
- Convex QCQP is a special case
Applications: metric learning; the Fermat-Weber problem (find a point in the plane minimizing the sum of distances to a given set of points); robust linear programming
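A second-order cone constraint bounds a Euclidean norm of an affine expression by another affine expression (standard notation, standing in for the slide's lost formula):

```latex
\|A_i x + b_i\|_2 \;\le\; c_i^\top x + d_i, \qquad i = 1,\dots,m,
```

combined with a linear objective. Convex quadratic constraints can be rewritten in this form, which is why convex QCQP is a special case of SOCP.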

Semi-Definite Programming (SDP)
- Constraints requiring a matrix to be positive semi-definite
- SOCP is a special case
Applications: metric learning; low-rank matrix approximations (dimensionality reduction); very tight relaxations of graph labeling problems (e.g. max-cut); semi-supervised learning; approximate inference in difficult graphical models
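In standard form, an SDP optimizes a linear functional of a symmetric matrix variable under affine and positive semi-definiteness constraints (standard notation; the slide's own formula was lost):

```latex
\begin{aligned}
\min_{X \in \mathbb{S}^n} \quad & \langle C, X \rangle \\
\text{s.t.} \quad & \langle A_i, X \rangle = b_i, \quad i = 1,\dots,m, \\
& X \succeq 0 .
\end{aligned}
```

The max-cut relaxation mentioned above is of exactly this shape, with the extra constraints $X_{ii} = 1$.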

Geometric programming
Objective and constraints are posynomials; after a log transformation the problem becomes convex.
Applications: maximum entropy modeling with moment constraints; maximum likelihood fitting of exponential family distributions
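A posynomial (the objective/constraint class referred to above, in standard notation since the slide's formula did not survive) is a positive combination of monomials with arbitrary real exponents:

```latex
f(x) \;=\; \sum_{k=1}^{K} c_k \, x_1^{a_{1k}} x_2^{a_{2k}} \cdots x_n^{a_{nk}}, \qquad c_k > 0,\; x_j > 0 .
```

The substitution $x_j = e^{y_j}$ followed by taking logarithms turns such problems into convex ones.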

Log Determinant Optimization (Logdet)
Objective is the log determinant of a matrix: log det X is the log of the volume of the parallelepiped spanned by the columns of X.
Applications: novelty detection (minimum volume enclosing ellipsoid); experimental design / active learning (which labels for which data points are likely to be most informative)
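The minimum-volume enclosing ellipsoid mentioned above is the classic example: an ellipsoid $\{u : \|Au + b\|_2 \le 1\}$ has volume proportional to $\det A^{-1}$, giving (in the usual formulation) the logdet problem

```latex
\begin{aligned}
\min_{A \succ 0,\; b} \quad & \log \det A^{-1} \\
\text{s.t.} \quad & \|A x_i + b\|_2 \le 1, \quad i = 1,\dots,N .
\end{aligned}
```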

Eigenvalue problems
Eigenvalue problems are not convex optimization problems. Still, they can be solved relatively efficiently with global convergence, and they are a useful primitive for:
- Dimensionality reduction (PCA)
- Finding relations between datasets (CCA)
- Spectral clustering
- Metric learning
- Relaxations of combinatorial problems
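As an illustration of PCA as an eigenvalue problem, the sketch below (on synthetic, deliberately anisotropic data; the scales are made up for the example) recovers the principal directions as eigenvectors of the sample covariance matrix:

```python
import numpy as np

# PCA as an eigenvalue problem: principal directions are the
# eigenvectors of the sample covariance matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.1])  # anisotropic cloud

Xc = X - X.mean(axis=0)                   # center the data
cov = (Xc.T @ Xc) / (len(X) - 1)          # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order

# Top principal component = eigenvector of the largest eigenvalue
pc1 = eigvecs[:, -1]
print(eigvals[::-1])                      # variances along principal axes
```

With these scales the first coordinate dominates, so `pc1` aligns (up to sign) with the first axis.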

The hype
- Very popular in conferences like NIPS, ICML, KDD
- These model classes are sufficiently rich to do sophisticated things:
  - Sparsity: L1 norm / linear constraints → feature selection
  - Low rank of matrices: SDP constraints and the trace norm (sparse PCA, labeling problems...)
- Declarative nature: little expertise needed
- Computational complexity is easy to understand
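As a concrete illustration of how the L1 norm induces feature selection, here is a minimal lasso solver using iterative soft-thresholding (ISTA) on synthetic data; the design matrix, true coefficients, and regularization strength are all invented for the example:

```python
import numpy as np

# Lasso (L1-regularized least squares) via ISTA: a gradient step on the
# smooth loss, then soft-thresholding, which zeroes out small
# coefficients -- this is the feature-selection effect of the L1 norm.
rng = np.random.default_rng(1)
A = rng.normal(size=(50, 10))             # design matrix
x_true = np.zeros(10)
x_true[[0, 3]] = [2.0, -1.5]              # only two informative features
y = A @ x_true                            # noiseless observations

lam = 0.5                                 # regularization strength
L = np.linalg.norm(A, 2) ** 2             # Lipschitz constant of the gradient
x = np.zeros(10)
for _ in range(1000):
    g = A.T @ (A @ x - y)                 # gradient of 0.5 * ||Ax - y||^2
    z = x - g / L
    x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold

print(np.nonzero(np.abs(x) > 1e-3)[0])    # indices of selected features
```

The recovered vector is sparse: only the two informative coordinates survive the thresholding, with a small shrinkage bias of roughly lam divided by the per-feature signal energy.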

After the hype
But:
- Polynomial time often means a high exponent in practice (e.g. SDP solvers scale poorly with problem size)
- Convex constraints can be too limiting
Hence a tendency toward other paradigms:
- Convex-concave programming (few guarantees, but works well in practice)
- Submodular optimization (approximation guarantees, and works well in practice)

CP vs convex optimization
"CP: Choosing the best model is an art" (Helmut)
"CP requires skill and ingenuity" (Barry)
I understand that in CP there is a hierarchy of propagation methods, but:
- Is there a hierarchy of problem complexities?
- How hard is it to tell whether a constraint will propagate well?
- Does it depend on the implementation?
- ...