Mathematical formulation XIAO LIYING

Mathematical formulation

A common choice to find the model parameters is by minimizing the regularized training error given by
$$E(w, b) = \frac{1}{n} \sum_{i=1}^{n} L\bigl(y_i, f(x_i)\bigr) + \alpha R(w)$$
where $L$ is a loss function that measures model fit, $R$ is a regularization term (aka penalty) that penalizes model complexity, and $\alpha > 0$ is a non-negative hyperparameter.
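As a concrete illustration, here is a minimal NumPy sketch of this objective for a linear model $f(x) = w^T x + b$. The function and variable names are my own, and the squared loss and L2 penalty used as placeholders are defined on the following slides.

```python
import numpy as np

def regularized_training_error(w, b, X, y, loss, penalty, alpha):
    """E(w, b) = (1/n) * sum_i L(y_i, f(x_i)) + alpha * R(w) for a linear model f(x) = w.x + b."""
    preds = X @ w + b                      # f(x_i) for every training example
    data_term = np.mean(loss(y, preds))    # average loss over the n examples
    return data_term + alpha * penalty(w)  # plus the weighted complexity penalty

# Placeholder choices for L and R (both discussed on the following slides):
squared_loss = lambda y, p: 0.5 * (y - p) ** 2
l2_penalty = lambda w: 0.5 * np.sum(w ** 2)

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([1.0, -1.0, 1.0])
print(regularized_training_error(np.zeros(2), 0.0, X, y, squared_loss, l2_penalty, alpha=0.01))
```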

Mathematical formulation
Different choices for L entail different classifiers:
Hinge, $L(y_i, f(x_i)) = \max(0, 1 - y_i f(x_i))$: (soft-margin) Support Vector Machines.
Log, $L(y_i, f(x_i)) = \log\bigl(1 + e^{-y_i f(x_i)}\bigr)$: Logistic Regression.
Least-Squares, $L(y_i, f(x_i)) = \frac{1}{2}\bigl(y_i - f(x_i)\bigr)^2$: Ridge Regression.
Epsilon-Insensitive, $L(y_i, f(x_i)) = \max(0, |y_i - f(x_i)| - \varepsilon)$: (soft-margin) Support Vector Regression.
All of the above loss functions can be regarded as an upper bound on the misclassification error (zero-one loss), as shown in the Figure below.

Mathematical formulation
[Figure: the loss functions above plotted against the decision value, compared with the zero-one loss]
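For concreteness, here is a brief NumPy sketch of these loss functions; the function names and the epsilon default are my own, only the formulas follow the slide.

```python
import numpy as np

# Each loss takes the label y and the prediction f = f(x_i).
def hinge(y, f):                      # (soft-margin) SVM
    return np.maximum(0.0, 1.0 - y * f)

def log_loss(y, f):                   # logistic regression
    return np.log(1.0 + np.exp(-y * f))

def least_squares(y, f):              # ridge regression
    return 0.5 * (y - f) ** 2

def eps_insensitive(y, f, eps=0.1):   # (soft-margin) SVR
    return np.maximum(0.0, np.abs(y - f) - eps)

def zero_one(y, f):                   # misclassification error, for comparison
    return (y * f <= 0).astype(float)

# Evaluate the losses for a positive example (y = +1) over a range of predictions.
f = np.linspace(-2.0, 2.0, 5)
for name, fn in [("hinge", hinge), ("log", log_loss),
                 ("least-squares", least_squares), ("zero-one", zero_one)]:
    print(f"{name:>13}: {np.round(fn(1.0, f), 3)}")
```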

Popular choices for the regularization term R include:
L2 norm: $R(w) = \frac{1}{2} \sum_{j=1}^{m} w_j^2$
L1 norm: $R(w) = \sum_{j=1}^{m} |w_j|$, which leads to sparse solutions.
Elastic Net: $R(w) = \frac{\rho}{2} \sum_{j=1}^{m} w_j^2 + (1 - \rho) \sum_{j=1}^{m} |w_j|$, a convex combination of L2 and L1, where $\rho$ is given by 1 - l1_ratio.
The Figure below shows the contours of the different regularization terms in the parameter space when $R(w) = 1$.

[Figure: contours of the L2, L1, and Elastic Net regularization terms in the parameter space where R(w) = 1]
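The same penalties in a short NumPy sketch; the function names and the l1_ratio default are illustrative.

```python
import numpy as np

def l2_penalty(w):
    return 0.5 * np.sum(w ** 2)

def l1_penalty(w):
    return np.sum(np.abs(w))          # encourages sparse solutions

def elastic_net_penalty(w, l1_ratio=0.15):
    rho = 1.0 - l1_ratio              # rho = 1 - l1_ratio, as on the slide
    return rho * 0.5 * np.sum(w ** 2) + (1.0 - rho) * np.sum(np.abs(w))

w = np.array([0.5, -1.5, 0.0, 2.0])
print(l2_penalty(w), l1_penalty(w), elastic_net_penalty(w))
```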

SGD
Stochastic gradient descent is an optimization method for unconstrained optimization problems. In contrast to (batch) gradient descent, SGD approximates the true gradient of $E(w, b)$ by considering a single training example at a time. The model parameters are updated according to
$$w \leftarrow w - \eta \left( \alpha \frac{\partial R(w)}{\partial w} + \frac{\partial L(w^{T} x_i + b, y_i)}{\partial w} \right)$$
where $\eta$ is the learning rate, which controls the step-size in the parameter space. The intercept $b$ is updated similarly but without regularization.
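A minimal NumPy sketch of this update rule, using the hinge loss and L2 penalty as an example with a constant learning rate; production implementations add learning-rate schedules, data shuffling, and stopping criteria, so this is only an illustration.

```python
import numpy as np

def sgd_epoch(X, y, w, b, alpha=1e-4, eta=0.01):
    """One SGD pass using the hinge loss and the L2 penalty R(w) = 0.5 * ||w||^2."""
    for x_i, y_i in zip(X, y):
        margin = y_i * (x_i @ w + b)
        # Subgradient of the hinge loss max(0, 1 - margin); zero once the margin exceeds 1.
        dL_dw = -y_i * x_i if margin < 1.0 else np.zeros_like(w)
        dL_db = -y_i if margin < 1.0 else 0.0
        # w <- w - eta * (alpha * dR/dw + dL/dw); the intercept b is not regularized.
        w = w - eta * (alpha * w + dL_dw)
        b = b - eta * dL_db
    return w, b

X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = np.zeros(2), 0.0
for _ in range(20):
    w, b = sgd_epoch(X, y, w, b)
print(w, b, np.sign(X @ w + b))   # the signs should match the training labels
```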