Least squares fitting. Prof. Ramin Zabih, http://cs100r.cs.cornell.edu

Administrivia: Assignment 4 out, due next Friday. Quiz 6 will be Thursday, Oct 25.

LS fitting and residuals

How to find the LS line fit. LS-hillclimbing algorithm: start with some initial (m,b); make a small change to (m,b); see if the figure of merit Good(m,b) improves, i.e., gives a smaller "sum of squared errors"; if so, make that your new (m,b) and continue; you are done when small changes don't improve (m,b).
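
As a minimal sketch of this idea in Python (not code from the course; the data, step size, and iteration count here are made up for illustration), the search might look like this:

import random

def sum_squared_error(m, b, xs, ys):
    # Figure of merit Good(m,b): a smaller sum of squared errors is better.
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

def ls_hillclimb(xs, ys, step=0.01, iters=100000):
    # Start with some initial (m,b).
    m, b = 0.0, 0.0
    best = sum_squared_error(m, b, xs, ys)
    for _ in range(iters):
        # Make a small change to (m,b).
        dm, db = random.uniform(-step, step), random.uniform(-step, step)
        trial = sum_squared_error(m + dm, b + db, xs, ys)
        # If the figure of merit improves, make that the new (m,b) and continue.
        if trial < best:
            m, b, best = m + dm, b + db, trial
    return m, b

# Hypothetical noisy points lying near y = 2x - 0.5:
xs = [0, 1, 2, 3, 4]
ys = [-0.4, 1.6, 3.4, 5.6, 7.5]
print(ls_hillclimb(xs, ys))

This sketch stops after a fixed number of tries rather than detecting that no small change helps, which keeps the code short.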

Optimization methods. This is a simple example of an optimization method: there are some possible answers (i.e., lines) and a way to rate each answer (i.e., Good), and we want the best answer; equivalently, we minimize the sum of squared errors. The algorithm we just described is sometimes called "hill climbing". There are many smart variants, including "gradient descent", that we won't cover. Need to leave a few things for the rest of CS…
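
Gradient descent itself is not covered in these slides, but for the curious, a rough sketch of the idea (my own illustration for this problem, not course material) is to step along the downhill direction given by the derivatives instead of trying random changes:

def gradient_descent(xs, ys, rate=0.01, iters=5000):
    # Follow the downhill direction of E(m,b) = sum of (y - (m*x + b))^2
    # instead of trying random small changes.
    m, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iters):
        grad_m = sum(-2 * x * (y - (m * x + b)) for x, y in zip(xs, ys)) / n
        grad_b = sum(-2 * (y - (m * x + b)) for x, y in zip(xs, ys)) / n
        m -= rate * grad_m
        b -= rate * grad_b
    return m, b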

Why least squares? As many of you observed, there are lots of sensible ways to compute a figure of merit, so why do we always use least squares? This is too deep a topic to cover completely in a freshman course, but we will point out two things that are special about least squares. The full story probably needs to wait for graduate-level courses, or you can ask RDZ some evening…

The LS-hillclimbing method. The method we have described for finding the best (m,b) is very general, but it isn't very smart; it's only slightly smarter than snailsort. Snailsort: guess an answer; prove it correct. LS hillclimbing: guess an initial answer; try to improve your guess. It's not at all clear how we could prove that we had the optimal (m,b)! Only that small changes won't improve it.

Fix m=2, change b

Fix b=-0.5, change m

Some error functions are easy. You will notice that the squared error has a single minimum value. The plots I just showed you demonstrate this is true in 1D, but it's true in 2D also.
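
One way to see the single minimum for yourself (a small experiment of my own on made-up data, not the slide's actual plots) is to sweep one parameter while holding the other fixed, as in the "Fix m=2, change b" slide:

# Made-up points near y = 2x - 0.5, reused for illustration.
xs = [0, 1, 2, 3, 4]
ys = [-0.4, 1.6, 3.4, 5.6, 7.5]

def sweep_b(xs, ys, m=2.0):
    # Hold m fixed and vary b: the printed errors fall to a single
    # lowest value and then rise again, i.e., one valley, no bumps.
    for i in range(-20, 21):
        b = i / 10.0
        err = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
        print(f"b = {b:5.1f}   error = {err:8.3f}")

sweep_b(xs, ys)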

Consequences. For error functions like this, if we get an answer where a small change doesn't improve it, we know we are right! There are some issues with what "small" means that will need to wait for CS322. Moreover, our LS-hillclimbing method will always converge to the right answer, by slowly rolling downhill. It's actually quite hard to analyze how long it will take (see CS322 and beyond).

Easy and hard error measures. What makes this error function easy? It has a single minimum, and we can get to it from anywhere by rolling downhill. Not all error functions have this property, but least squares does (proof: CS322).

Why is an error function hard? An error function where we can get stuck if we roll downhill is a hard one; where we get stuck depends on where we start (i.e., the initial guess/conditions). An error function is hard if the area "above it" has nooks and crannies, in other words, if it is not convex. Non-convex error functions are hard to minimize.
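
To make the "getting stuck" point concrete, here is a tiny made-up example (mine, not from the slides): hill climbing on a non-convex 1D function ends up in whichever valley is closest to the starting guess.

import random

def hillclimb_1d(f, x0, step=0.01, iters=100000):
    # The same rolling-downhill idea, in one dimension.
    x, best = x0, f(x0)
    for _ in range(iters):
        trial = x + random.uniform(-step, step)
        if f(trial) < best:
            x, best = trial, f(trial)
    return x

# A non-convex function with two valleys.
f = lambda x: x**4 - 3 * x**2 + x

print(hillclimb_1d(f, -2.0))  # reaches the deeper valley, near x = -1.3
print(hillclimb_1d(f, +2.0))  # gets stuck in the shallower valley, near x = +1.1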

What else about LS? Least squares has an even more amazing property than convexity. Suppose we have a collection of data points (x_i, y_i) and we want to find the (m,b) that minimizes the squared error. If (m,b) were precisely right, we would have y_i = m x_i + b for every data point.
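
In symbols (my reconstruction of the standard setup, since the slide's own equation did not survive the transcript): the ideal case is an overdetermined linear system, and least squares asks for the (m,b) that comes closest.

% Ideal case: every data point satisfies the line exactly,
%   y_i = m x_i + b   for i = 1, ..., n,
% which in matrix form is the overdetermined system
\begin{pmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{pmatrix}
\begin{pmatrix} m \\ b \end{pmatrix}
=
\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix},
% and least squares chooses the (m,b) that minimizes
E(m,b) = \sum_{i=1}^{n} \bigl( y_i - (m x_i + b) \bigr)^2 .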

Closed form solution! There is a magic formula for the optimal choice of (m,b): you don't need to search, you can simply compute the right answer. Shades of snailsort versus quicksort… This is a huge part of why everyone uses least squares; the other piece is too hard to explain.

Closed form LS formula. The derivation requires linear algebra. Most books use calculus also, but it's not required (see the "Links" section on the course web page).
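
The slides leave the formula itself to the course web page; as a stand-in, here is the usual textbook closed form (my addition, the standard result rather than necessarily the slides' exact presentation). On the made-up data above it agrees with what the hill-climbing sketch converges to, with no search at all.

def closed_form_ls(xs, ys):
    # Standard closed-form least-squares line fit:
    #   m = (n*sum(x*y) - sum(x)*sum(y)) / (n*sum(x^2) - sum(x)^2)
    #   b = (sum(y) - m*sum(x)) / n
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - m * sx) / n
    return m, b

# Same made-up points as before:
xs = [0, 1, 2, 3, 4]
ys = [-0.4, 1.6, 3.4, 5.6, 7.5]
print(closed_form_ls(xs, ys))  # just compute the answer, no search needed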