Least squares fitting. Prof. Ramin Zabih, http://cs100r.cs.cornell.edu

Administrivia: Assignment 4 out, due next Friday. Quiz 6 will be Thursday, Oct 25.

LS fitting and residuals

How to find the LS line fit. LS-hillclimbing algorithm: start with some initial (m,b); make a small change to (m,b); see if the figure of merit Good(m,b) improves, i.e., gives a smaller "sum of squared errors"; if so, make that your new (m,b) and continue; you are done when small changes don't improve (m,b).
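
As a minimal sketch of this idea in Python (not code from the course; the data, step size, and iteration count here are made up for illustration), the search might look like this:

import random

def sum_squared_error(m, b, xs, ys):
    # Figure of merit Good(m,b): a smaller sum of squared errors is better.
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

def ls_hillclimb(xs, ys, step=0.01, iters=100000):
    # Start with some initial (m,b).
    m, b = 0.0, 0.0
    best = sum_squared_error(m, b, xs, ys)
    for _ in range(iters):
        # Make a small change to (m,b).
        dm, db = random.uniform(-step, step), random.uniform(-step, step)
        trial = sum_squared_error(m + dm, b + db, xs, ys)
        # If the figure of merit improves, make that the new (m,b) and continue.
        if trial < best:
            m, b, best = m + dm, b + db, trial
    return m, b

# Hypothetical noisy points lying near y = 2x - 0.5:
xs = [0, 1, 2, 3, 4]
ys = [-0.4, 1.6, 3.4, 5.6, 7.5]
print(ls_hillclimb(xs, ys))

This sketch stops after a fixed number of tries rather than detecting that no small change helps, which keeps the code short.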

Optimization methods. This is a simple example of an optimization method: there are some possible answers (i.e., lines) and a way to rate each answer (i.e., Good), and we want the best answer; equivalently, we minimize the sum of squared errors. The algorithm we just described is sometimes called "hill climbing". There are many smart variants, including "gradient descent", that we won't cover. Need to leave a few things for the rest of CS…
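
Gradient descent itself is not covered in these slides, but for the curious, a rough sketch of the idea (my own illustration for this problem, not course material) is to step along the downhill direction given by the derivatives instead of trying random changes:

def gradient_descent(xs, ys, rate=0.01, iters=5000):
    # Follow the downhill direction of E(m,b) = sum of (y - (m*x + b))^2
    # instead of trying random small changes.
    m, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iters):
        grad_m = sum(-2 * x * (y - (m * x + b)) for x, y in zip(xs, ys)) / n
        grad_b = sum(-2 * (y - (m * x + b)) for x, y in zip(xs, ys)) / n
        m -= rate * grad_m
        b -= rate * grad_b
    return m, b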

Why least squares? As many of you observed, there are lots of sensible ways to compute a figure of merit, so why do we always use least squares? This is too deep a topic to cover completely in a freshman course, but we will point out two things that are special about least squares. The full story probably needs to wait for graduate-level courses, or you can ask RDZ some evening…

The LS-hillclimbing method. The method we have described for finding the best (m,b) is very general, but it isn't very smart; it's only slightly smarter than snailsort. Snailsort: guess an answer; prove it correct. LS hillclimbing: guess an initial answer; try to improve your guess. It's not at all clear how we could prove that we had the optimal (m,b)! Only that small changes won't improve it.

Fix m=2, change b

Fix b=-0.5, change m

Some error functions are easy. You will notice that the squared error has a single minimum value. The plots I just showed you demonstrate this is true in 1D, but it's true in 2D also.
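
One way to see the single minimum for yourself (a small experiment of my own on made-up data, not the slide's actual plots) is to sweep one parameter while holding the other fixed, as in the "Fix m=2, change b" slide:

# Made-up points near y = 2x - 0.5, reused for illustration.
xs = [0, 1, 2, 3, 4]
ys = [-0.4, 1.6, 3.4, 5.6, 7.5]

def sweep_b(xs, ys, m=2.0):
    # Hold m fixed and vary b: the printed errors fall to a single
    # lowest value and then rise again, i.e., one valley, no bumps.
    for i in range(-20, 21):
        b = i / 10.0
        err = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
        print(f"b = {b:5.1f}   error = {err:8.3f}")

sweep_b(xs, ys)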

Consequences. For error functions like this, if we get an answer where a small change doesn't improve it, we know we are right! There are some issues with what "small" means that will need to wait for CS322. Moreover, our LS-hillclimbing method will always converge to the right answer, by slowly rolling downhill. It's actually quite hard to analyze how long it will take (see CS322 and beyond).

Easy and hard error measures. What makes this error function easy? It has a single minimum, and we can get to it from anywhere by rolling downhill. Not all error functions have this property, but least squares does (proof: CS322).

Why is an error function hard? An error function where we can get stuck if we roll downhill is a hard one; where we get stuck depends on where we start (i.e., the initial guess/conditions). An error function is hard if the area "above it" has nooks and crannies, in other words, if it is not convex. Non-convex error functions are hard to minimize.
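
To make the "getting stuck" point concrete, here is a tiny made-up example (mine, not from the slides): hill climbing on a non-convex 1D function ends up in whichever valley is closest to the starting guess.

import random

def hillclimb_1d(f, x0, step=0.01, iters=100000):
    # The same rolling-downhill idea, in one dimension.
    x, best = x0, f(x0)
    for _ in range(iters):
        trial = x + random.uniform(-step, step)
        if f(trial) < best:
            x, best = trial, f(trial)
    return x

# A non-convex function with two valleys.
f = lambda x: x**4 - 3 * x**2 + x

print(hillclimb_1d(f, -2.0))  # reaches the deeper valley, near x = -1.3
print(hillclimb_1d(f, +2.0))  # gets stuck in the shallower valley, near x = +1.1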

What else about LS? Least squares has an even more amazing property than convexity. Suppose we have a collection of data points (x_i, y_i) and we want to find the (m,b) that minimizes the squared error. If (m,b) were precisely right, we would have y_i = m x_i + b for every data point.
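
In symbols (my reconstruction of the standard setup, since the slide's own equation did not survive the transcript): the ideal case is an overdetermined linear system, and least squares asks for the (m,b) that comes closest.

% Ideal case: every data point satisfies the line exactly,
%   y_i = m x_i + b   for i = 1, ..., n,
% which in matrix form is the overdetermined system
\begin{pmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{pmatrix}
\begin{pmatrix} m \\ b \end{pmatrix}
=
\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix},
% and least squares chooses the (m,b) that minimizes
E(m,b) = \sum_{i=1}^{n} \bigl( y_i - (m x_i + b) \bigr)^2 .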

Closed form solution! There is a magic formula for the optimal choice of (m,b): you don't need to search, you can simply compute the right answer. Shades of snailsort versus quicksort… This is a huge part of why everyone uses least squares; the other piece is too hard to explain.

Closed form LS formula. The derivation requires linear algebra. Most books use calculus also, but it's not required (see the "Links" section on the course web page).
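
The slides leave the formula itself to the course web page; as a stand-in, here is the usual textbook closed form (my addition, the standard result rather than necessarily the slides' exact presentation). On the made-up data above it agrees with what the hill-climbing sketch converges to, with no search at all.

def closed_form_ls(xs, ys):
    # Standard closed-form least-squares line fit:
    #   m = (n*sum(x*y) - sum(x)*sum(y)) / (n*sum(x^2) - sum(x)^2)
    #   b = (sum(y) - m*sum(x)) / n
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - m * sx) / n
    return m, b

# Same made-up points as before:
xs = [0, 1, 2, 3, 4]
ys = [-0.4, 1.6, 3.4, 5.6, 7.5]
print(closed_form_ls(xs, ys))  # just compute the answer, no search needed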