Regression. UC Berkeley Fall 2004, E77. Copyright 2005, Andy Packard. This work is licensed under the Creative Commons Attribution-ShareAlike License. To view a copy of this license, visit the Creative Commons website or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.

Info
–Midterm next Friday (11/5), 1-2, if you actually enrolled in my section. Check BlackBoard to see the room location. If you have a conflict (as before), bring a letter, schedule printout, etc. to class next Monday so that we can make arrangements.
–Review session Wednesday 11/3 evening. Check BlackBoard.
–HW and Lab due this Friday (10/29).
–Mid-course evaluation on BlackBoard. Do it by Thursday, 11/4 at noon. We can't see your answers, but we know if you've done it. Get an extra point.

Regression: Curve-fitting with minimum error
Given (x,y) data pairs (x_1,y_1), (x_2,y_2), …, (x_N,y_N), from a prespecified collection of "simple" functions (for example, all linear functions), find one that explains the data pairs with minimum error. For a given function f, the mismatch (error) at each data pair is defined as
e_1 = f(x_1) – y_1
e_2 = f(x_2) – y_2
…
e_k = f(x_k) – y_k
…
e_N = f(x_N) – y_N
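A minimal MATLAB sketch of this error definition; the data values and the candidate function below are made up purely for illustration:

Xdata = [0; 1; 2; 3; 4];             % hypothetical x values
Ydata = [0.1; 2.1; 3.9; 6.2; 8.1];   % hypothetical y values
f = @(x) 2*x;                        % one candidate "simple" function
e = f(Xdata) - Ydata;                % e(k) = f(x_k) - y_k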

[Figure: fitting data with a linear function. X-Y plot showing the data points, the fitted linear function, and the positive and negative errors e_i.]

Straight-line functions
How does it work if the function f is to be of the form f(x) = ax + b for to-be-chosen parameters a and b? Given (x,y) data pairs (x_1,y_1), (x_2,y_2), …, (x_N,y_N), for fixed values of a and b the mismatch (error) is
e_1 = ax_1 + b – y_1
e_2 = ax_2 + b – y_2
…
e_N = ax_N + b – y_N
Goal: fit the data by choosing a and b to make this mismatch small.

Measuring the "amount" of mismatch
There are several ways to quantify the amount of mismatch. All have the property that if one component of the mismatch is "big", then the measure-of-mismatch is big. For convenience, we pick the sum-of-squares of the errors (equivalently, the Euclidean norm of the error vector). This choice is motivated for a few reasons:
–It leads to least squares problems, to which we have already been exposed. And it makes sense. And…
–By making "reasonable" assumptions about the cause of the mismatch (independent, random, zero-mean, identically distributed, Gaussian additive "noise" in observing y), it is the best measure of how likely a candidate function is to have led to the observed data.

Euclidean norms of vectors
If v is an m-by-1 column (or row) vector, the "norm of v", denoted ||v||, is defined as
||v|| = sqrt(v_1^2 + v_2^2 + … + v_m^2),
the square root of the sum-of-squares of the components, generalizing the Pythagorean theorem. The norm of a vector is a measure of its length. Some facts:
–||v|| = 0 if and only if every component of v is zero
–||v + w|| ≤ ||v|| + ||w||
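These facts are easy to check in MATLAB with the built-in norm function; the specific vectors below are made up:

v = [3; 4];  w = [1; -2];
norm(v)                              % sqrt(3^2 + 4^2) = 5
sqrt(sum(v.^2))                      % same value, computed from the definition
norm(v + w) <= norm(v) + norm(w)     % triangle inequality: returns 1 (true)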

Straight-line functions
Given (x,y) data pairs (x_1,y_1), (x_2,y_2), …, (x_N,y_N), stack the errors into the "e" vector,
e = [x_1 1; x_2 1; … ; x_N 1]*[a; b] – [y_1; y_2; … ; y_N],
and minimize ||e||. This says: "By choice of a and b, minimize the Euclidean norm of the mismatch."
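In MATLAB the stacked error vector for fixed a and b can be formed directly; the data and parameter values here are hypothetical:

Xdata = [0; 1; 2; 3];  Ydata = [0.2; 1.1; 1.9; 3.2];
a = 1;  b = 0;                  % some fixed candidate parameters
N = length(Xdata);
A = [Xdata ones(N,1)];          % N-by-2 matrix with rows [x_k 1]
e = A*[a; b] - Ydata;           % e_k = a*x_k + b - y_k
norm(e)                         % the quantity to be minimized over a and b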

The "Least Squares" Problem
If A is an n-by-m array and b is an n-by-1 vector, let c* be the smallest possible (over all choices of m-by-1 vectors x) mismatch between Ax and b (i.e., pick x to make Ax as much like b as possible):
c* := min over all m-by-1 vectors x of ||Ax – b||.
Here ":=" means "is defined as", and ||Ax – b|| is the length (i.e., norm) of the difference/mismatch between Ax and b.
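A tiny worked instance of this definition (numbers made up): with A = [1; 1] and b = [0; 2], ||Ax – b||^2 = x^2 + (x – 2)^2, which is smallest at x = 1, so c* = sqrt(2).

A = [1; 1];  b = [0; 2];
x = 1;                      % the minimizing choice for this A and b
cstar = norm(A*x - b)       % sqrt(2), about 1.4142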

Four cases for Least Squares
Recall the least squares formulation c* = min_x ||Ax – b||. There are 4 scenarios:
c* = 0: the equation Ax = b has at least one solution
–only one x vector achieves this minimum
–many different x vectors achieve the minimum
c* > 0: the equation Ax = b has no solutions (in regression, this is almost always the case)
–only one x vector achieves this minimum
–many different x vectors achieve the minimum
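Two of these cases can be illustrated with a small made-up example; the same 3-by-2 matrix A is consistent with one right-hand side and inconsistent with another:

A = [1 0; 0 1; 1 1];
b1 = [1; 2; 3];              % consistent: x = [1;2] gives Ax = b1, so c* = 0
b2 = [1; 2; 10];             % inconsistent: no x solves Ax = b2, so c* > 0
norm(A*(A\b1) - b1)          % essentially zero
norm(A*(A\b2) - b2)          % strictly positive (the typical regression case)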

The backslash operator
If A is an n-by-m array and b is an n-by-1 vector, then
>> x = A\b
solves the "least squares" problem. Namely:
–If there is an x which solves Ax = b, then this x is computed.
–If there is no x which solves Ax = b, then an x which minimizes the mismatch between Ax and b is computed.
In the case where many x satisfy one of the criteria above, a smallest (in terms of vector norm) such x is computed. So, mismatch is handled first; among all equally suitable x vectors that minimize the mismatch, choose a smallest one.
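A short usage sketch of backslash on an overdetermined system (numbers made up); this is exactly the situation that arises in regression:

A = [1 1; 2 1; 3 1; 4 1];    % 4 equations, 2 unknowns
b = [1.1; 1.9; 3.2; 3.9];    % no exact solution exists
x = A\b;                     % least-squares solution
cstar = norm(A*x - b)        % the minimized mismatch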

Straight-line functions
Given (x,y) data pairs (x_1,y_1), (x_2,y_2), …, (x_N,y_N), the "e" vector is
e = [x_1 1; x_2 1; … ; x_N 1]*[a; b] – [y_1; y_2; … ; y_N],
and the goal is to minimize ||e||. This says: "By choice of a and b, minimize the Euclidean norm of the mismatch."

Linear Regression Code

function [a,b] = linreg(Xdata,Ydata)
% Fits a linear function Y = aX + b
% to the data given by Xdata, Ydata

% Verify Xdata and Ydata are column vectors of same length
if size(Xdata,2)~=1 || size(Ydata,2)~=1 || length(Xdata)~=length(Ydata)
    error('Xdata and Ydata must be column vectors of the same length');
end

N = length(Xdata);
optpara = [Xdata ones(N,1)]\Ydata;   % least-squares solve for [a;b]
a = optpara(1);
b = optpara(2);
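A hypothetical call to linreg on synthetic data (the data-generating line and noise level are made up):

Xdata = (0:0.5:5)';
Ydata = 2*Xdata - 1 + 0.1*randn(size(Xdata));   % noisy samples of y = 2x - 1
[a,b] = linreg(Xdata,Ydata);                    % a should be near 2, b near -1
plot(Xdata,Ydata,'o',Xdata,a*Xdata+b,'-')       % data and fitted line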

Quadratic functions
How does it work if the function f is to be of the form f(x) = ax^2 + bx + c for to-be-chosen parameters a, b and c? For fixed values of a, b and c, the error at (x_k,y_k) is
e_k = ax_k^2 + bx_k + c – y_k
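Following the same pattern as linreg, a quadratic fit only needs an extra column of x_k^2 in the matrix; a sketch on made-up data:

Xdata = (0:0.5:4)';
Ydata = Xdata.^2 - 3*Xdata + 2 + 0.1*randn(size(Xdata));  % noisy quadratic
N = length(Xdata);
abc = [Xdata.^2 Xdata ones(N,1)]\Ydata;   % least-squares [a; b; c]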

Polynomial functions
How does it work if the function f is to be of the form f(x) = a_1 x^n + a_2 x^(n-1) + … + a_n x + a_{n+1} for to-be-chosen parameters a_1, a_2, …, a_{n+1}? For fixed values of a_1, a_2, …, a_{n+1}, the error at (x_k,y_k) is
e_k = a_1 x_k^n + a_2 x_k^(n-1) + … + a_n x_k + a_{n+1} – y_k

Polynomial Regression Pseudo-Code

function p = polyreg(Xdata,Ydata,nOrd)
% Fits an nOrd'th order polynomial
% to the data given by Xdata, Ydata
N = length(Xdata);
RM = zeros(N,nOrd+1);          % regressor matrix, one column per power of x
RM(:,end) = ones(N,1);         % last column: x.^0
for i=1:nOrd
    RM(:,end-i) = RM(:,end-i+1).*Xdata;   % column for x.^i
end
p = RM\Ydata;    % least-squares coefficients, highest power first
p = p.';         % return as a row, like polyfit
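A hypothetical check of polyreg against MATLAB's built-in polyfit, which solves the same least-squares problem (the data are made up):

Xdata = (0:0.25:2)';
Ydata = sin(Xdata);            % fit a cubic to samples of sin
p1 = polyreg(Xdata,Ydata,3);
p2 = polyfit(Xdata,Ydata,3);   % should agree with p1 (up to roundoff)
yfit = polyval(p1,Xdata);      % evaluate the fitted polynomial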

General "basis" functions
How does it work if the function f is to be of the form
f(x) = a_1 b_1(x) + a_2 b_2(x) + … + a_n b_n(x)
for fixed functions b_i (called "basis" functions) and to-be-chosen parameters a_1, a_2, …, a_n? For fixed values of a_1, a_2, …, a_n, the error at (x_k,y_k) is
e_k = a_1 b_1(x_k) + a_2 b_2(x_k) + … + a_n b_n(x_k) – y_k
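A minimal sketch of regression with general basis functions; the particular basis (1, sin x, cos x) and the data are made up for illustration:

Xdata = (0:0.2:6)';
Ydata = 1 + 2*sin(Xdata) - 0.5*cos(Xdata) + 0.05*randn(size(Xdata));
% One column of the regressor matrix per basis function b_i(x)
RM = [ones(size(Xdata)) sin(Xdata) cos(Xdata)];
a = RM\Ydata;                 % least-squares coefficients a_1, a_2, a_3
yfit = RM*a;                  % fitted values f(x_k)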