Regularization (Additional)

Presentation transcript:

Regularization (Additional) Soft Computing Week 4

The problem of overfitting

Example: Linear regression (housing prices)
[Figure: three plots with Size on the horizontal axis]
Overfitting: If we have too many features, the learned hypothesis may fit the training set very well (training cost $J(\theta) \approx 0$), but fail to generalize to new examples (for instance, when predicting prices for houses it has not seen).

Example: Logistic regression ($g$ = sigmoid function)

Addressing overfitting. Options:
1. Reduce the number of features.
   - Manually select which features to keep.
   - Use a model selection algorithm (later in course).
2. Regularization.
   - Keep all the features, but reduce the magnitude/values of the parameters $\theta_j$.
   - Works well when we have a lot of features, each of which contributes a bit to predicting $y$.

Regularization Cost function

Intuition: Suppose we penalize $\theta_3$ and $\theta_4$ and make them really small.
[Figure: two plots of Price against Size of house]
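The intuition can be sketched in a few lines of Python. This is only an illustration under assumptions: the hypothesis is taken to be a fourth-order polynomial in the house size (the slide's own formula did not survive extraction), and every parameter value below is made up. It shows that once the two highest-order parameters are close to zero, the hypothesis behaves almost exactly like a quadratic.

```python
def hypothesis(theta, x):
    """Polynomial hypothesis: theta[0] + theta[1]*x + ... + theta[k]*x**k."""
    return sum(t * x ** k for k, t in enumerate(theta))

# Hypothetical parameter values, chosen only for illustration.
quartic = [1.0, 2.0, -0.5, 0.3, -0.1]     # all five terms active
shrunk = [1.0, 2.0, -0.5, 1e-6, 1e-6]     # two highest-order parameters pushed to ~0
quadratic = [1.0, 2.0, -0.5]              # plain quadratic, for comparison

for x in (0.0, 1.0, 2.0, 3.0):
    # The last two columns are nearly identical for every x.
    print(x, hypothesis(quartic, x), hypothesis(shrunk, x), hypothesis(quadratic, x))
```

The shrunk model still has five parameters, but two of them no longer influence the prediction in any visible way, which is the sense in which small parameters give a "simpler" hypothesis.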

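Regularization. Small values for the parameters $\theta_0, \theta_1, \ldots, \theta_n$ give:
- a "simpler" hypothesis
- a hypothesis less prone to overfitting
Housing: Features: $x_1, x_2, \ldots, x_n$. Parameters: $\theta_0, \theta_1, \ldots, \theta_n$.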

Regularization. [Figure: Price against Size of house]

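Regularization. In regularized linear regression, we choose $\theta$ to minimize

$$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$$

What if $\lambda$ is set to an extremely large value (perhaps far too large for our problem, say $\lambda = 10^{10}$)?
- The algorithm works fine; setting $\lambda$ to be very large can't hurt it.
- The algorithm fails to eliminate overfitting.
- The algorithm results in underfitting (it fails to fit even the training data well).
- Gradient descent will fail to converge.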
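A minimal Python sketch of this cost may help make the terms concrete. It assumes the usual form of the regularized squared-error cost, $J(\theta) = \frac{1}{2m}\left[\sum_i (h_\theta(x^{(i)}) - y^{(i)})^2 + \lambda \sum_{j \ge 1} \theta_j^2\right]$, in which $\theta_0$ is not penalized. The function names and the toy data are hypothetical and not part of the slides.

```python
def predict(theta, x):
    """Linear hypothesis: sum_j theta[j] * x[j], where x[0] is always 1."""
    return sum(t * xj for t, xj in zip(theta, x))

def regularized_cost(theta, X, y, lam):
    """Squared-error cost plus an L2 penalty on theta[1:], both scaled by 1/(2m)."""
    m = len(y)
    squared_error = sum((predict(theta, xi) - yi) ** 2 for xi, yi in zip(X, y))
    penalty = lam * sum(t ** 2 for t in theta[1:])  # theta[0] is not penalized
    return (squared_error + penalty) / (2 * m)

# Toy data (made up): each row is [1, size]; targets follow y = 1 + 2 * size.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [3.0, 5.0, 7.0]
theta = [1.0, 2.0]

print(regularized_cost(theta, X, y, lam=0.0))  # perfect fit, no penalty
print(regularized_cost(theta, X, y, lam=3.0))  # same fit, penalty now dominates
```

Even with a perfect fit, a nonzero $\lambda$ makes the cost positive, so the minimizer trades some training error for smaller parameters.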

Regularization. When $\lambda$ is extremely large, the penalty drives $\theta_1, \ldots, \theta_n$ toward zero, so the hypothesis reduces to roughly $h_\theta(x) \approx \theta_0$: a flat line that underfits even the training data.
[Figure: Price against Size of house]

Regularized linear regression


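Gradient descent for regularized linear regression. Repeat until convergence, updating all parameters simultaneously:

$$\theta_0 := \theta_0 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$$

$$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right] \qquad (j = 1, \ldots, n)$$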
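The loop can be sketched in plain Python as follows. This is an illustration rather than the course's implementation: it assumes the standard update in which each $\theta_j$ with $j \ge 1$ receives an extra $\frac{\lambda}{m}\theta_j$ term in its gradient while $\theta_0$ does not, and the function names, learning rate, iteration count, and toy data are all assumptions.

```python
def predict(theta, x):
    """Linear hypothesis: sum_j theta[j] * x[j], where x[0] is always 1."""
    return sum(t * xj for t, xj in zip(theta, x))

def gradient_descent(X, y, lam, alpha=0.02, iterations=20000):
    """Batch gradient descent for L2-regularized linear regression."""
    m, n_params = len(y), len(X[0])
    theta = [0.0] * n_params
    for _ in range(iterations):
        errors = [predict(theta, xi) - yi for xi, yi in zip(X, y)]
        new_theta = []
        for j in range(n_params):
            grad = sum(e * xi[j] for e, xi in zip(errors, X)) / m
            if j > 0:                      # theta[0] is not regularized
                grad += (lam / m) * theta[j]
            new_theta.append(theta[j] - alpha * grad)
        theta = new_theta                  # simultaneous update of all parameters
    return theta

# Toy data (made up): each row is [1, size]; targets follow y = 1 + 2 * size.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [3.0, 5.0, 7.0]

unregularized = gradient_descent(X, y, lam=0.0)
heavily_regularized = gradient_descent(X, y, lam=100.0)
print(unregularized)        # intercept and slope close to the true 1 and 2
print(heavily_regularized)  # slope close to 0, intercept close to the mean of y
```

With the large penalty the fitted line is nearly flat, which is the underfitting behaviour expected when $\lambda$ is set far too high.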

Regularized logistic regression

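Regularized logistic regression.
[Figure: plot with axes $x_1$ and $x_2$]
Cost function:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$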
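As a rough sketch, the regularized logistic cost can be computed like this in Python. It assumes the standard cross-entropy cost with an added $\frac{\lambda}{2m}\sum_{j \ge 1}\theta_j^2$ penalty and a sigmoid hypothesis; the function names and the four toy examples are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_cost(theta, X, y, lam):
    """Cross-entropy cost plus (lam / (2m)) * sum of theta[j]^2 for j >= 1."""
    m = len(y)
    total = 0.0
    for xi, yi in zip(X, y):
        h = sigmoid(sum(t * xj for t, xj in zip(theta, xi)))
        total += -yi * math.log(h) - (1 - yi) * math.log(1 - h)
    penalty = (lam / (2 * m)) * sum(t ** 2 for t in theta[1:])  # skip theta[0]
    return total / m + penalty

# Toy data (made up): each row is [1, x1, x2]; labels are 0 or 1.
X = [[1.0, 0.5, 1.0], [1.0, 1.5, 2.0], [1.0, -1.0, -0.5], [1.0, -2.0, -1.5]]
y = [1, 1, 0, 0]

print(logistic_cost([0.0, 0.0, 0.0], X, y, lam=5.0))  # all-zero theta: penalty is zero
print(logistic_cost([0.0, 1.0, 1.0], X, y, lam=0.0))  # data term only
print(logistic_cost([0.0, 1.0, 1.0], X, y, lam=2.0))  # data term plus penalty
```

The difference between the last two values is exactly the penalty term, which depends only on $\lambda$, $m$, and the non-intercept parameters.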

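Gradient descent for regularized logistic regression. Repeat until convergence, updating all parameters simultaneously, with $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$:

$$\theta_0 := \theta_0 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$$

$$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right] \qquad (j = 1, \ldots, n)$$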
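A small Python sketch of this loop, under the assumption that the update has the same shape as in the linear case but with a sigmoid hypothesis. The names, step size, iteration count, and the one-feature toy data are illustrative choices, not taken from the slides.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_gradient_descent(X, y, lam, alpha=0.5, iterations=500):
    """Batch gradient descent for L2-regularized logistic regression."""
    m, n_params = len(y), len(X[0])
    theta = [0.0] * n_params
    for _ in range(iterations):
        errors = [sigmoid(sum(t * xj for t, xj in zip(theta, xi))) - yi
                  for xi, yi in zip(X, y)]
        new_theta = []
        for j in range(n_params):
            grad = sum(e * xi[j] for e, xi in zip(errors, X)) / m
            if j > 0:                      # theta[0] is not regularized
                grad += (lam / m) * theta[j]
            new_theta.append(theta[j] - alpha * grad)
        theta = new_theta                  # simultaneous update
    return theta

# Toy, linearly separable data (made up): each row is [1, x1].
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]

unregularized = logistic_gradient_descent(X, y, lam=0.0)
regularized = logistic_gradient_descent(X, y, lam=1.0)
print(unregularized)  # weight on x1 keeps growing with more iterations
print(regularized)    # weight on x1 settles at a smaller, finite value
```

On separable data the unregularized weight has no finite optimum, so the penalty is what keeps the parameter small here.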

Advanced optimization

function [jVal, gradient] = costFunction(theta)
    jVal = [ code to compute J(theta) ];
    gradient(1) = [ code to compute dJ/dtheta_0 ];
    gradient(2) = [ code to compute dJ/dtheta_1 ];
    gradient(3) = [ code to compute dJ/dtheta_2 ];
    % ...
    gradient(n+1) = [ code to compute dJ/dtheta_n ];
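For readers working outside Octave, here is a hedged Python analogue of the costFunction template: one function that returns both the cost and its gradient, which is the shape an off-the-shelf optimizer typically expects. The choice of regularized logistic regression as the cost, the extra arguments `X`, `y`, and `lam`, and the finite-difference checker are assumptions added for illustration. Note that `gradient[j]` here corresponds to `gradient(j+1)` in the 1-indexed Octave template.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def cost_function(theta, X, y, lam):
    """Return (j_val, gradient) for L2-regularized logistic regression."""
    m = len(y)
    h = [sigmoid(sum(t * xj for t, xj in zip(theta, xi))) for xi in X]
    j_val = sum(-yi * math.log(hi) - (1 - yi) * math.log(1 - hi)
                for hi, yi in zip(h, y)) / m
    j_val += (lam / (2 * m)) * sum(t ** 2 for t in theta[1:])
    gradient = []
    for j in range(len(theta)):
        g = sum((hi - yi) * xi[j] for hi, yi, xi in zip(h, y, X)) / m
        if j > 0:                          # theta[0] is not regularized
            g += (lam / m) * theta[j]
        gradient.append(g)
    return j_val, gradient

def numerical_gradient(theta, X, y, lam, eps=1e-6):
    """Central-difference estimate of the gradient, used only as a sanity check."""
    estimate = []
    for j in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[j] += eps
        down[j] -= eps
        estimate.append((cost_function(up, X, y, lam)[0]
                         - cost_function(down, X, y, lam)[0]) / (2 * eps))
    return estimate

# Toy data (made up): each row is [1, x1, x2]; labels are 0 or 1.
X = [[1.0, 0.5, 1.0], [1.0, 1.5, 2.0], [1.0, -1.0, -0.5], [1.0, -2.0, -1.5]]
y = [1, 1, 0, 0]
theta = [0.1, -0.2, 0.3]

j_val, gradient = cost_function(theta, X, y, lam=1.0)
print(j_val, gradient)
print(numerical_gradient(theta, X, y, lam=1.0))  # should closely match gradient
```

Comparing the analytic gradient against a numerical estimate is a cheap way to catch a missing or misplaced regularization term before handing the function to an optimizer.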