Machine learning optimization Usman Roshan

Machine learning
Two components:
– Modeling
– Optimization
Modeling:
– Generative: we assume a probabilistic model and optimize the model parameters on the data with maximum likelihood
– Discriminative: we select a model guided by the data
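A concrete sketch of the generative route (illustrative, not part of the slides): fit one Gaussian per class by maximum likelihood, which reduces to the per-class sample means and variances, and then classify with the Bayes rule. All function names here are chosen for illustration:

    import numpy as np

    def fit_gaussian_ml(X, y):
        # Maximum-likelihood fit of one diagonal Gaussian per class:
        # the ML estimates are simply the per-class sample means and variances.
        params = {}
        for c in np.unique(y):
            Xc = X[y == c]
            params[c] = (Xc.mean(axis=0), Xc.var(axis=0) + 1e-9, len(Xc) / len(X))
        return params

    def predict(params, X):
        # Classify each row of X by the class with the highest log prior + log likelihood.
        classes = sorted(params)
        scores = []
        for c in classes:
            mu, var, prior = params[c]
            ll = -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mu) ** 2 / var, axis=1)
            scores.append(ll + np.log(prior))
        return np.array(classes)[np.argmax(np.stack(scores), axis=0)]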

Optimization
Most machine learning problems are actually NP-hard.
Unsupervised learning:
– Cluster data into two or more sets
– NP-hard
– K-means local search heuristic
Supervised learning:
– Separating hyperplane with minimum error

Clustering
Suppose we want to cluster n vectors in R^d into two groups. Define C_1 and C_2 as the two groups. Our objective is to find C_1 and C_2 that minimize

    \sum_{i=1}^{2} \sum_{x \in C_i} \lVert x - m_i \rVert^2

where m_i is the mean of class C_i.
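A small illustrative sketch (not from the slides) of computing this within-cluster sum of squares for a given assignment of points to groups:

    import numpy as np

    def clustering_objective(X, labels):
        # Sum of squared distances of each point to the mean of its own cluster.
        total = 0.0
        for c in np.unique(labels):
            Xc = X[labels == c]
            m = Xc.mean(axis=0)          # m_i: the mean of cluster C_i
            total += np.sum((Xc - m) ** 2)
        return total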

Clustering
NP-hard even for 2-means
NP-hard even in the plane
K-means heuristic:
– Popular and hard to beat
– Introduced in the 1950s and 1960s
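The k-means local search (Lloyd's algorithm) alternates between assigning each point to its nearest mean and recomputing the means. A minimal sketch, assuming Euclidean distance and a fixed iteration cap; parameter names are illustrative:

    import numpy as np

    def kmeans(X, k, n_iter=100, seed=0):
        rng = np.random.default_rng(seed)
        # Initialize the means with k randomly chosen data points.
        means = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            # Assignment step: each point goes to its nearest mean.
            dists = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
            labels = dists.argmin(axis=1)
            # Update step: recompute each mean; keep the old mean if a cluster empties.
            new_means = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else means[j] for j in range(k)])
            if np.allclose(new_means, means):
                break
            means = new_means
        return labels, means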

Supervised learning
Also NP-complete (see the paper by Ben-David, Eiron, and Long)
Even NP-complete to approximate within polynomial factors (Learning with Kernels, Schölkopf and Smola)

Approximate approach
To overcome NP-hardness we modify the problems to achieve convexity
Convexity guarantees that any local optimum is a global optimum
Convexity also gives us a gradient descent solution
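For example, the NP-hard problem of finding a hyperplane with the fewest misclassifications is typically replaced by a convex surrogate such as the hinge-loss objective of a linear SVM (shown here for illustration, with regularization parameter \lambda):

    \min_{w}\ \frac{1}{n}\sum_{i=1}^{n} \max\bigl(0,\ 1 - y_i\, w^{\top} x_i\bigr) \;+\; \lambda \lVert w \rVert^2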

Gradient descent
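A minimal gradient descent sketch (illustrative; the least-squares objective, step size, and names are assumptions, not taken from the slides). At each step the weights move against the gradient of a convex objective:

    import numpy as np

    def gradient_descent_least_squares(X, y, lr=0.01, n_iter=1000):
        # Minimize f(w) = (1/n) * ||Xw - y||^2 by stepping against its gradient.
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(n_iter):
            grad = (2.0 / n) * X.T @ (X @ w - y)   # gradient of the mean squared error
            w -= lr * grad                          # move downhill
        return w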

Local search
Standard approach in computer science for solving hard problems
At a high level it is a simple method:
– Start with a random solution
– Compute its neighborhood
– Select the point in the neighborhood that best improves the objective
– Continue until reaching a local minimum
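A generic local search loop along these lines might look as follows (an illustrative sketch; the neighborhood and objective functions are assumed to be supplied by the caller):

    def local_search(initial, neighbors, objective, max_iter=1000):
        # Minimize `objective` by greedy moves to the best neighboring solution.
        current = initial
        current_cost = objective(current)
        for _ in range(max_iter):
            candidates = neighbors(current)
            if not candidates:
                break
            best = min(candidates, key=objective)
            best_cost = objective(best)
            if best_cost >= current_cost:
                break  # local minimum reached: no neighbor improves the objective
            current, current_cost = best, best_cost
        return current, current_cost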

Recent new approaches
Direct optimization:
– Recently published work
– Our own work: iterated local search
Stochastic gradient descent
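Stochastic gradient descent replaces the full gradient with the (sub)gradient on one randomly chosen example per step. A minimal sketch for a regularized hinge-loss linear classifier (illustrative, not the authors' implementation; labels are assumed to be in {-1, +1}):

    import numpy as np

    def sgd_hinge(X, y, lr=0.01, lam=0.001, n_epochs=10, seed=0):
        # Stochastic subgradient descent for the regularized hinge loss:
        # f(w) = (1/n) * sum_i max(0, 1 - y_i * w.x_i) + lam * ||w||^2
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(n_epochs):
            for i in rng.permutation(n):
                margin = y[i] * (X[i] @ w)
                grad = 2 * lam * w
                if margin < 1:
                    grad -= y[i] * X[i]   # subgradient of the hinge term for this example
                w -= lr * grad
        return w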