Linear regression by gradient descent (with thanks to Prof. Ng’s machine learning course)


1 Linear regression by gradient descent (with thanks to Prof. Ng’s machine learning course)

2 Extending the single-variable case to multivariate linear regression
Single variable: h_Θ(x) = Θ_0 + Θ_1 x
Multivariate: h_Θ(x) = Θ_0 + Θ_1 x_1 + Θ_2 x_2 + Θ_3 x_3 + … + Θ_n x_n
e.g. start with house price versus sq ft, then move to house price versus sq ft, number of bedrooms, and age of the house
With x_0 = 1: h_Θ(x) = Θ_0 x_0 + Θ_1 x_1 + Θ_2 x_2 + Θ_3 x_3 + … + Θ_n x_n = Θ^T x
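A minimal Octave/MATLAB sketch of the vectorized form Θ^T x, computing predictions for every training example at once (the numbers and variable names are illustrative, not from the slides):

    X_raw = [2100 3 15; 1600 2 30; 2400 4 5];    % sq ft, bedrooms, age for 3 houses
    theta = [80; 0.1; 25; -2];                   % Θ_0 … Θ_3, arbitrary values
    X = [ones(size(X_raw, 1), 1), X_raw];        % prepend the x_0 = 1 column
    h = X * theta                                % h_Θ(x) = Θ^T x for each row of X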

3 Cost function
J(Θ) = (1/2m) Σ_{i=1..m} (h_Θ(x^(i)) − y^(i))²
Gradient descent: Repeat { Θ_j := Θ_j − α ∂J(Θ)/∂Θ_j } for all j simultaneously
Working out the partial derivative, the update for each j is
Θ_j := Θ_j − (α/m) Σ_{i=1..m} (h_Θ(x^(i)) − y^(i)) x_j^(i)
so in particular
Θ_0 := Θ_0 − (α/m) Σ_{i=1..m} (h_Θ(x^(i)) − y^(i)) x_0^(i)    (with x_0^(i) = 1)
Θ_1 := Θ_1 − (α/m) Σ_{i=1..m} (h_Θ(x^(i)) − y^(i)) x_1^(i)
Θ_2 := Θ_2 − (α/m) Σ_{i=1..m} (h_Θ(x^(i)) − y^(i)) x_2^(i)
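Stacking the m terms of each sum, the same simultaneous update can be written in matrix form (this is the form the gradientDescentMulti function later in this deck implements):

    Θ := Θ − (α/m) X^T (XΘ − y)

where X is the m×(n+1) matrix whose rows are the training examples and y is the m×1 vector of targets.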

4 What the Equations Mean
The matrices: y and x. Here y is the PRICE column, and each row of x is (x_0 = 1, SQFT, AGE, FEATS):

    PRICE   x_0   SQFT   AGE   FEATS
    2050    1     2650   13    7
    2150    1     2664    6    5
    2150    1     2921    3    6
    1999    1     2580    4    4
    1900    1     2580    4    4
    1800    1     2774    2    4
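A minimal Octave/MATLAB sketch of loading this table into the two matrices, assuming the column layout above (PRICE as y, the remaining columns as x):

    data = [2050 1 2650 13 7;
            2150 1 2664  6 5;
            2150 1 2921  3 6;
            1999 1 2580  4 4;
            1900 1 2580  4 4;
            1800 1 2774  2 4];
    y = data(:, 1);        % PRICE
    X = data(:, 2:end);    % [x_0  SQFT  AGE  FEATS]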

5 Feature Scaling
Would like all features to fall roughly into the range −1 ≤ x_i ≤ +1.
Replace each x_i with (x_i − µ_i)/s_i, where µ_i is the mean and s_i is the range; alternatively, use the mean and standard deviation.
Don’t scale x_0.
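A minimal sketch of the range-based variant (the featureNormalize function shown later in this deck uses the standard deviation instead; the numbers are illustrative):

    Xf = [2650 13 7; 2664 6 5; 2921 3 6; 2580 4 4; 2580 4 4; 2774 2 4];   % features only, no x_0 column
    mu = mean(Xf);                                        % per-feature mean
    s  = max(Xf) - min(Xf);                               % per-feature range
    Xf_scaled = (Xf - repmat(mu, size(Xf,1), 1)) ./ repmat(s, size(Xf,1), 1)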

6 Converting results back
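A hedged sketch of this step, assuming Θ was learned on features scaled with featureNormalize's mu and sigma (shown later in this deck): either scale each new input the same way before predicting, or convert Θ back to unscaled units (variable names are illustrative):

    % Option 1: scale a new example with the stored mu and sigma, then predict.
    x_new = [2800 10 5];                                  % SQFT, AGE, FEATS (illustrative)
    price = [1, (x_new - mu) ./ sigma] * theta;
    % Option 2: convert theta itself back to the original, unscaled units.
    theta_unscaled        = theta;
    theta_unscaled(2:end) = theta(2:end) ./ sigma';
    theta_unscaled(1)     = theta(1) - sum(theta(2:end) .* mu' ./ sigma');
    price = [1, x_new] * theta_unscaled;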

7 Learning Rate and Debugging
With a small enough α, J should decrease on each iteration: this is the first test.
An α that is too large can overshoot the minimum and climb the other side of the curve; an α that is too small makes convergence too slow.
Try a series of α values roughly 3× apart, say 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, …, and plot J against the iteration number for each, as sketched below.
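A minimal sketch of that debugging loop, assuming X_norm (scaled features with the x_0 = 1 column added) and y are already built, and using the gradientDescentMulti function shown later in this deck:

    alphas = [0.001 0.003 0.01 0.03 0.1 0.3 1];
    num_iters = 400;
    figure; hold on;
    for k = 1:length(alphas)
        theta0 = zeros(size(X_norm, 2), 1);          % restart from zero for each alpha
        [~, J_history] = gradientDescentMulti(X_norm, y, theta0, alphas(k), num_iters);
        plot(1:num_iters, J_history);                % J should decrease on every iteration
    end
    xlabel('iteration'); ylabel('J(\theta)'); legend(num2str(alphas'));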

8 MATLAB Implementation

9 Feature Normalization
function [X_norm, mu, sigma] = featureNormalize(X)
    % Scale every column of X to zero mean and unit standard deviation,
    % returning mu and sigma so that new examples can be scaled the same way.
    mu = mean(X);                       % 1 x n row of column means
    sigma = std(X);                     % 1 x n row of column standard deviations
    m = size(X, 1);                     % number of training examples
    A = repmat(mu, m, 1);
    X_norm = X - A;                     % subtract each feature's mean
    A = repmat(sigma, m, 1);
    X_norm = X_norm ./ A;               % divide by each feature's standard deviation
end

10 Gradient Descent
function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
    m = length(y);                           % number of training examples
    J_history = zeros(num_iters, 1);         % cost recorded at every iteration
    for iter = 1:num_iters
        A = (X*theta - y);                   % m x 1 vector of prediction errors
        deltatheta = (alpha/m)*(A'*X);       % 1 x (n+1) step: alpha times the gradient
        theta = theta - deltatheta';         % simultaneous update of all theta_j
        J_history(iter) = computeCostMulti(X, y, theta);
    end
end

11 Cost Function
function J = computeCostMulti(X, y, theta)
    m = length(y);                % number of training examples
    A = (X*theta - y);            % m x 1 vector of prediction errors
    J = (1/(2*m))*(A'*A);         % J(theta) = (1/2m) * sum of squared errors
end
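A minimal end-to-end sketch of how the three functions above fit together; the data, the learning rate 0.1, and the 400 iterations are illustrative choices, not values from the slides:

    Xf = [2650 13 7; 2664 6 5; 2921 3 6; 2580 4 4; 2580 4 4; 2774 2 4];   % SQFT, AGE, FEATS
    y  = [2050; 2150; 2150; 1999; 1900; 1800];                            % PRICE
    [Xf_norm, mu, sigma] = featureNormalize(Xf);     % scale the features, not x_0
    X = [ones(size(Xf_norm, 1), 1), Xf_norm];        % add the x_0 = 1 column
    theta = zeros(size(X, 2), 1);
    [theta, J_history] = gradientDescentMulti(X, y, theta, 0.1, 400);
    x_new = ([2800 10 5] - mu) ./ sigma;             % scale a new house the same way
    price = [1, x_new] * theta                       % predicted PRICE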

12 Polynomials
h_Θ(x) = Θ_0 + Θ_1 x + Θ_2 x² + Θ_3 x³
Treat the powers as features: replace x with x_1, x² with x_2, and x³ with x_3, and it becomes multivariate linear regression again.
Scale the x, x², x³ values (see the sketch below).
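A minimal sketch of building and scaling those polynomial features (the input values are illustrative):

    x = (1:10)';                                   % single input variable
    Xf = [x, x.^2, x.^3];                          % x_1 = x, x_2 = x^2, x_3 = x^3
    [Xf_norm, mu, sigma] = featureNormalize(Xf);   % the powers have very different ranges
    X = [ones(length(x), 1), Xf_norm];             % add x_0 = 1, then run gradient descent as before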

13 Normal Equations
Θ = (A^T A)^(−1) A^T y
Building the design matrix A for a polynomial and solving:
A(:,n+1) = ones(length(x), 1, class(x));    % rightmost column is x^0 = 1
for j = n:-1:1
    A(:,j) = x .* A(:,j+1);                 % column j holds x^(n+1-j)
end
W = A'*A;
Y = A'*y;
theta = W\Y;                                % solve W*theta = Y rather than inverting W
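For the plain (non-polynomial) housing example, the same normal-equation solve is a couple of lines; unlike gradient descent it needs no feature scaling and no choice of α (a hedged sketch reusing the illustrative table values from above):

    Xf = [2650 13 7; 2664 6 5; 2921 3 6; 2580 4 4; 2580 4 4; 2774 2 4];
    y  = [2050; 2150; 2150; 1999; 1900; 1800];
    A = [ones(size(Xf, 1), 1), Xf];     % unscaled features are fine here
    theta = (A'*A) \ (A'*y)             % the solution gradient descent converges toward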

