Regularization (Additional)


Regularization (Additional) Soft Computing Week 4

The problem of overfitting

Example: Linear regression (housing prices). [Three plots of price vs. size: a straight line that underfits the data, a quadratic that fits well, and a high-order polynomial that passes through every point but overfits.] Overfitting: if we have too many features, the learned hypothesis may fit the training set very well (J(θ) ≈ 0), but fail to generalize to new examples (predict prices on new examples).
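
A minimal Octave sketch of this effect (the house sizes and prices below are made-up illustration data, not from the slides): a degree-4 polynomial fit to five points has essentially zero training error, yet its prediction for a new size can be far off, while a straight-line fit generalizes more sensibly.

% Made-up data: house size (1000s of sq. ft.) and price (1000s of dollars)
sizes  = [1.0; 1.5; 2.0; 2.5; 3.0];
prices = [200; 230; 290; 305; 380];
p1 = polyfit(sizes, prices, 1);   % straight line: may underfit, but generalizes
p4 = polyfit(sizes, prices, 4);   % degree 4: passes through all 5 points (overfits)
disp(polyval(p1, 4.0))            % reasonable prediction for a 4000 sq. ft. house
disp(polyval(p4, 4.0))            % typically an implausible prediction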

Example: Logistic regression (g = sigmoid function), with hypothesis h_θ(x) = g(θ_0 + θ_1·x_1 + θ_2·x_2 + …). [Three plots of decision boundaries: underfitting, a good fit, and overfitting.]

Addressing overfitting. Options: (1) Reduce the number of features: manually select which features to keep, or use a model selection algorithm (later in the course). (2) Regularization: keep all the features, but reduce the magnitude/values of the parameters θ_j. This works well when we have a lot of features, each of which contributes a bit to predicting y.

Regularization Cost function

Intuition. Take the quartic hypothesis h_θ(x) = θ_0 + θ_1·x + θ_2·x² + θ_3·x³ + θ_4·x⁴ and suppose we penalize θ_3 and θ_4 (e.g. by adding 1000·θ_3² + 1000·θ_4² to the cost) and make them really small. [Two plots of price vs. size of house: the unpenalized quartic wiggles through the data, while with θ_3 ≈ 0 and θ_4 ≈ 0 the fit is essentially a smooth quadratic.]

Regularization. Small values for the parameters θ_0, θ_1, …, θ_n give a “simpler” hypothesis that is less prone to overfitting. Housing example: features x_1, x_2, …, x_100; parameters θ_0, θ_1, …, θ_100. Since we do not know in advance which parameters to shrink, we penalize all of them and minimize J(θ) = (1/2m)·[ Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))² + λ·Σ_{j=1}^{n} θ_j² ], where λ is the regularization parameter (by convention θ_0 is not penalized).
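
A short Octave sketch of this cost (a hypothetical helper, not code from the slides), assuming X is the m×(n+1) design matrix whose first column is all ones, y the m×1 targets, and theta the (n+1)×1 parameter vector:

function J = regularizedCost(X, y, theta, lambda)
  m = length(y);
  h = X * theta;                              % predictions h_theta(x^(i))
  sqErr = sum((h - y) .^ 2);                  % squared-error term
  penalty = lambda * sum(theta(2:end) .^ 2);  % penalize theta_1..theta_n, not theta_0
  J = (sqErr + penalty) / (2 * m);
end

Note that theta(1) (i.e. θ_0) is excluded from the penalty, following the convention above.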

Regularization. [Plot of price vs. size of house: with the regularization term added, the fitted curve becomes smoother.]

Regularization. In regularized linear regression, we choose θ to minimize J(θ) = (1/2m)·[ Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))² + λ·Σ_{j=1}^{n} θ_j² ]. What if λ is set to an extremely large value (perhaps too large for our problem, say λ = 10^10)? (a) The algorithm works fine; setting λ to be very large can't hurt it. (b) The algorithm fails to eliminate overfitting. (c) The algorithm results in underfitting (fails to fit even the training data well). (d) Gradient descent will fail to converge.

Regularization. In regularized linear regression, we choose θ to minimize the same J(θ). What if λ is set to an extremely large value (say λ = 10^10)? Then the penalty term dominates, so θ_1, θ_2, …, θ_n are all driven toward 0 and h_θ(x) ≈ θ_0: a horizontal line that underfits the data. [Plot of price vs. size of house showing the flat fit.]

Regularized linear regression

Regularized linear regression: we now apply gradient descent to this regularized cost J(θ).

Gradient descent. Repeat { θ_0 := θ_0 − α·(1/m)·Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))·x_0^(i); θ_j := θ_j − α·[ (1/m)·Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))·x_j^(i) + (λ/m)·θ_j ] for j = 1, 2, …, n }. Equivalently, θ_j := θ_j·(1 − α·λ/m) − α·(1/m)·Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))·x_j^(i): each step shrinks θ_j slightly before applying the usual gradient step.
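
As a rough Octave sketch of this loop (hypothetical helper name, same X/y/theta conventions as above):

function theta = gradientDescentReg(X, y, theta, alpha, lambda, num_iters)
  m = length(y);
  for iter = 1:num_iters
    h = X * theta;                        % h_theta(x^(i)) for all examples
    grad = (X' * (h - y)) / m;            % unregularized gradient
    reg = (lambda / m) * theta;           % regularization term
    reg(1) = 0;                           % theta_0 is not regularized
    theta = theta - alpha * (grad + reg); % simultaneous update of all theta_j
  end
end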

Regularized logistic regression

Regularized logistic regression. [Plot in the (x_1, x_2) plane: a highly non-linear decision boundary that overfits the training data.] Cost function: J(θ) = −(1/m)·Σ_{i=1}^{m} [ y^(i)·log h_θ(x^(i)) + (1 − y^(i))·log(1 − h_θ(x^(i))) ] + (λ/2m)·Σ_{j=1}^{n} θ_j².
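
A corresponding Octave sketch (again a hypothetical helper, not code from the slides; same X, y, theta conventions as above):

function [J, grad] = logisticCostReg(X, y, theta, lambda)
  m = length(y);
  h = 1 ./ (1 + exp(-(X * theta)));               % sigmoid hypothesis
  J = (-y' * log(h) - (1 - y)' * log(1 - h)) / m; % unregularized cost
  J = J + (lambda / (2 * m)) * sum(theta(2:end) .^ 2);
  grad = (X' * (h - y)) / m;                      % unregularized gradient
  grad(2:end) = grad(2:end) + (lambda / m) * theta(2:end);
end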

Gradient descent. Repeat { θ_0 := θ_0 − α·(1/m)·Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))·x_0^(i); θ_j := θ_j − α·[ (1/m)·Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))·x_j^(i) + (λ/m)·θ_j ] for j = 1, 2, …, n }. The update looks identical to regularized linear regression, but here h_θ(x) = 1/(1 + e^(−θᵀx)).

Advanced optimization. Write a routine that returns the cost and its gradient:

function [jVal, gradient] = costFunction(theta)
  jVal = [code to compute J(θ)];
  gradient(1) = [code to compute ∂J(θ)/∂θ_0];
  gradient(2) = [code to compute ∂J(θ)/∂θ_1];
  gradient(3) = [code to compute ∂J(θ)/∂θ_2];
  ...
  gradient(n+1) = [code to compute ∂J(θ)/∂θ_n];
end
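
As a hedged sketch of how such a cost function is typically handed to Octave's built-in fminunc optimizer (the logisticCostReg helper and the initial values are assumptions carried over from the earlier sketches, not part of the slides):

% Assumed setup: X, y, lambda defined as above
options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(size(X, 2), 1);
costFn = @(t) logisticCostReg(X, y, t, lambda);   % reuse the helper sketched earlier
[optTheta, functionVal, exitFlag] = fminunc(costFn, initialTheta, options);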