Least squares CS1114

2 Robot speedometer
• Suppose that our robot can occasionally report how far it has traveled (mileage)
  – How can we tell how fast it is going?
• This would be a really easy problem if:
  – The robot never lied, i.e., its mileage is always exactly correct
  – The robot travels at the same speed
• Unfortunately, the real world is full of lying, accelerating robots
  – We're going to figure out how to handle them

3 The ideal robot

4 The real (lying) robot

5 Speedometer approach
• We are (as usual) going to solve a very general version of this problem
  – And explore some cool algorithms
  – Many of which you will need in future classes
• The velocity of the robot at a given time is the change in mileage w.r.t. time
  – For our ideal robot, this is the slope of the line
  – The line fits all our data exactly
• In general, if we know mileage as a function of time, velocity is the derivative
  – The velocity at any point in time is the slope of the mileage function

6 Estimating velocity
• So all we need is the mileage function
• We have as input some measurements
  – Mileage, at certain times
• A mileage function takes as input something we have no control over
  – Input (time): independent variable
  – Output (mileage): dependent variable
[Figure: dependent variable (mileage) plotted against independent variable (time)]

7 Basic strategy
• Based on the data, find mileage function
  – From this, we can compute:
    Velocity (1st derivative)
    Acceleration (2nd derivative)
• For a while, we will only think about mileage functions which are lines
• In other words, we assume lying, non-accelerating robots
  – Lying, accelerating robots are much harder

8 Models and parameters
• A model predicts a dependent variable from an independent variable
  – So, a mileage function is actually a model
  – A model also has some internal variables that are usually called parameters
• In our line example, the parameters are m and b
[Figure: dependent variable (mileage) vs. independent variable (time), with line parameters (m, b)]
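
To make the model/parameter distinction concrete, here is a minimal Python sketch (not from the slides); the function name and the sample parameter values are made up for illustration.

    # A line model: predicts the dependent variable (mileage) from the
    # independent variable (time), given the parameters m and b.
    def mileage_model(t, m, b):
        return m * t + b

    # Example: a made-up robot that covers 2.5 miles per hour of elapsed
    # time, starting from mile 10.
    print(mileage_model(4.0, m=2.5, b=10.0))   # -> 20.0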

9 Linear regression
• Simplest case: fitting a line

10 Linear regression
• Simplest case: just 2 points
[Figure: two data points (x1, y1) and (x2, y2)]

11 Linear regression
• Simplest case: just 2 points
• Want to find a line y = mx + b that maps x1 to y1 and x2 to y2
• This forms a linear system:
    y1 = m·x1 + b
    y2 = m·x2 + b
• x's and y's are knowns; m and b are unknowns
• Very easy to solve
[Figure: the line through the two points (x1, y1) and (x2, y2)]
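
As a concrete sketch (in Python, with made-up point coordinates), the two-point system can be written in matrix form and solved directly:

    import numpy as np

    # Two made-up data points (x1, y1) and (x2, y2)
    x1, y1 = 1.0, 3.0
    x2, y2 = 4.0, 9.0

    # The system  y1 = m*x1 + b,  y2 = m*x2 + b  written as  A @ [m, b] = y
    A = np.array([[x1, 1.0],
                  [x2, 1.0]])
    y = np.array([y1, y2])

    m, b = np.linalg.solve(A, y)
    print(m, b)   # m = 2.0, b = 1.0: the line y = 2x + 1 passes through both points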

12 Linear regression, > 2 points
• The line won't necessarily pass through any data point
[Figure: data points scattered around a candidate line y = mx + b]

13 Some new definitions
• No line is perfect – we can only find the best line out of all the imperfect ones
• We'll define an objective function Cost(m,b) that measures how far a line is from the data, then find the best line
  – I.e., the (m,b) that minimizes Cost(m,b)

14 Line goodness
• What makes a line good versus bad?
  – This is actually a very subtle question

15 Residual errors
• The difference between what the model predicts and what we observe is called a residual error (i.e., a left-over)
  – Consider the data point (x, y)
  – The model (m, b) predicts (x, mx + b)
  – The residual is y – (mx + b)
• For 1D regressions, residuals can be easily visualized
  – Vertical distance to the line
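
A short sketch of the residuals for a candidate line, using made-up data and parameter values:

    import numpy as np

    x = np.array([0.0, 1.0, 2.0, 3.0])        # made-up measurement times
    y = np.array([1.1, 2.9, 5.2, 6.8])        # made-up mileages
    m, b = 2.0, 1.0                           # a candidate line

    residuals = y - (m * x + b)               # vertical distances to the line
    print(residuals)                          # [ 0.1 -0.1  0.2 -0.2]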

16 Least squares fitting
• A natural cost: add up how far the line is from each data point, e.g. the size of each residual, |y – (mx + b)|
• This is a reasonable cost function, but we usually use something slightly different

17 Least squares fitting
• We prefer to make this a squared distance:
    Cost(m, b) = sum over all data points of [y – (mx + b)]²
• Called "least squares"
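
Continuing the sketch above (same made-up data), the least-squares cost is just the sum of the squared residuals:

    import numpy as np

    x = np.array([0.0, 1.0, 2.0, 3.0])
    y = np.array([1.1, 2.9, 5.2, 6.8])

    def cost(m, b):
        residuals = y - (m * x + b)
        return np.sum(residuals ** 2)         # sum of squared residuals

    print(cost(2.0, 1.0))                     # 0.01 + 0.01 + 0.04 + 0.04 = 0.10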

18 Why least squares?
• There are lots of reasonable objective functions
• Why do we want to use least squares?
• This is a very deep question
  – We will soon point out two things that are special about least squares
  – The full story probably needs to wait for graduate-level courses, or at least next semester

19 Gradient descent
• Basic strategy:
  1. Start with some guess for the minimum
  2. Find the direction of steepest descent (gradient)
  3. Take a step in that direction (making sure that you get lower; if not, adjust the step size)
  4. Repeat until taking a step doesn't get you much lower
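
Here is a minimal gradient-descent sketch for the least-squares line fit, following the four steps above; the data, step size, and stopping tolerance are made-up choices for illustration, not part of the slides.

    import numpy as np

    x = np.array([0.0, 1.0, 2.0, 3.0])
    y = np.array([1.1, 2.9, 5.2, 6.8])

    def cost(m, b):
        return np.sum((y - (m * x + b)) ** 2)

    def gradient(m, b):
        r = y - (m * x + b)                          # residuals
        return -2 * np.sum(r * x), -2 * np.sum(r)    # dCost/dm, dCost/db

    m, b = 0.0, 0.0                                  # 1. start with some guess
    step = 0.01
    for _ in range(10000):
        gm, gb = gradient(m, b)                      # 2. steepest-descent direction
        new_m, new_b = m - step * gm, b - step * gb  # 3. take a step downhill
        if cost(new_m, new_b) > cost(m, b):
            step /= 2                                #    went up, so shrink the step
            continue
        if cost(m, b) - cost(new_m, new_b) < 1e-12:
            break                                    # 4. stop when progress stalls
        m, b = new_m, new_b

    print(m, b)   # close to the least-squares answer, roughly (1.94, 1.09)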

20 Gradient descent, 1D quadratic
• There is some magic in setting the step size
[Figure: gradient descent steps on a 1D quadratic]

21 Some error functions are easy
• A (positive) quadratic is a convex function
  – The set of points above the curve forms an (infinite) convex set
  – The previous slide shows this in 1D, but it's true in any dimension
• A sum of convex functions is convex
• Thus, the sum of squared errors is convex
• Convex functions are "nice"
  – They have a single global minimum
  – Rolling downhill from anywhere gets you there

22 Consequences
• Our gradient descent method will always converge to the right answer
  – By slowly rolling downhill
  – It might take a long time; it's hard to predict exactly how long (see CS3220 and beyond)

23 Why is an error function hard?
• An error function where we can get stuck if we roll downhill is a hard one
  – Where we get stuck depends on where we start (i.e., initial guess/conditions)
  – An error function is hard if the area "above it" has a certain shape: nooks and crannies – in other words, not convex!
  – Non-convex error functions are hard to minimize
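
A tiny illustration (not from the slides) of why non-convexity hurts: gradient descent on the non-convex function f(x) = x^4 - 3x^2 + x ends up in different minima depending on where it starts.

    def df(x):
        # derivative of f(x) = x**4 - 3*x**2 + x
        return 4 * x**3 - 6 * x + 1

    def descend(x, step=0.01, iters=5000):
        for _ in range(iters):
            x -= step * df(x)        # roll downhill from the starting point
        return x

    print(descend(-2.0))   # settles near x = -1.30 (the global minimum)
    print(descend(+2.0))   # settles near x = +1.13 (a worse local minimum)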

24 What else about LS?
• Least squares has an even more amazing property than convexity
  – Consider the linear regression problem
• There is a magic formula for the optimal choice of (m, b)
  – You don't need to roll downhill, you can "simply" compute the right answer

25 Closed-form solution!
• This is a huge part of why everyone uses least squares
• Other functions are convex, but have no closed-form solution

26 Closed form LS formula
• The derivation requires linear algebra
  – Most books use calculus also, but it's not required (see the "Links" section on the course web page)
  – There's a closed form for any linear least-squares problem
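
The formula itself appears as an image in the slides and isn't reproduced in this transcript; one standard closed form for the best-fit line, which can be derived from the normal equations, looks like this in Python (data values made up):

    import numpy as np

    x = np.array([0.0, 1.0, 2.0, 3.0])
    y = np.array([1.1, 2.9, 5.2, 6.8])
    n = len(x)

    # Closed-form solution for the least-squares line y = m*x + b
    m = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
    b = (np.sum(y) - m * np.sum(x)) / n

    print(m, b)   # (1.94, 1.09) -- no rolling downhill required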

27 Linear least squares
• Any formula where the residual is linear in the variables
• Example (linear regression): [y – (mx + b)]²
• Non-example: [x′ – abc·x]² (variables: a, b, c)
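
In general, any problem whose residual is linear in the unknowns can be stacked into a matrix A and solved in one shot; here is a sketch using NumPy's least-squares solver on the line-fit example (same made-up data as above):

    import numpy as np

    x = np.array([0.0, 1.0, 2.0, 3.0])
    y = np.array([1.1, 2.9, 5.2, 6.8])

    # Minimize ||A @ p - y||^2, where each row of A is [x_i, 1] and p = [m, b]
    A = np.column_stack([x, np.ones_like(x)])
    p, *_ = np.linalg.lstsq(A, y, rcond=None)
    print(p)   # [1.94, 1.09], the same answer as the explicit formula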

28 Linear least squares
• Surprisingly, fitting the coefficients of a quadratic is still linear least squares
• The residual is still linear in the coefficients β1, β2, β3
[Figure: quadratic curve fit to data points – Wikipedia, "Least squares fitting"]
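
A sketch of the quadratic case (coefficient names and data values are made up): the model is quadratic in x, but the residual is linear in the coefficients, so the same matrix formulation works.

    import numpy as np

    x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
    y = np.array([4.2, 0.9, 0.1, 1.1, 3.8])          # roughly y = x**2 plus noise

    # Fit y = b1*x**2 + b2*x + b3; each row of A is [x_i**2, x_i, 1]
    A = np.column_stack([x**2, x, np.ones_like(x)])
    b1, b2, b3 = np.linalg.lstsq(A, y, rcond=None)[0]
    print(b1, b2, b3)                                # approximately 0.99, -0.06, 0.05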

29 Optimization
• Least squares is another example of an optimization problem
• Optimization: define a cost function and a set of possible solutions, find the one with the minimum cost
• Optimization is a huge field

30 Sorting as optimization
• Set of allowed answers: permutations of the input sequence
• Cost(permutation) = number of out-of-order pairs
• Algorithm 1: Snailsort
• Algorithm 2: Bubble sort
• Algorithm 3: ???
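
A brute-force sketch of this cost function in Python; the example permutations are made up.

    def cost(perm):
        # Number of out-of-order pairs (i < j but perm[i] > perm[j])
        n = len(perm)
        return sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])

    print(cost([3, 1, 2]))   # 2: the pairs (3, 1) and (3, 2) are out of order
    print(cost([1, 2, 3]))   # 0: the sorted permutation has minimum cost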