Linear Regression review


Linear Regression review

Videos:
- http://youtu.be/GAmzwIkGFgE
- http://youtu.be/ocGEhiLwDVc
- http://youtu.be/qPga0OBV-O8
- http://youtu.be/MwokVxy5tvg

Search and LR

- LR minimizes the sum of the squared errors between the regression line and the data points.
- LR finds the values of A and B in y = Ax + B that minimize the sum of squared errors.
- Are there other ways of "finding" A and B? Yes.
- Do they guarantee minimizing the sum of squared errors?
- Suppose the relationship is not linear?
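To make this concrete, here is a minimal pure-Python sketch of LR's one-step solution, using the standard closed-form least-squares formulas; the data points are made up for illustration:

    x = [1.0, 2.0, 3.0, 4.0]
    y = [2.1, 3.9, 6.2, 8.1]
    n = len(x)

    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Closed-form least-squares solution for y = A*x + B:
    #   A = sum((xi - x_mean) * (yi - y_mean)) / sum((xi - x_mean)^2)
    #   B = y_mean - A * x_mean
    A = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
         / sum((xi - x_mean) ** 2 for xi in x))
    B = y_mean - A * x_mean

No search is needed: the formulas land on the minimizing A and B directly.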

Problem solving as search

Through the lens of search, all problems look the same:
- There is a space of candidate solutions.
- There is a candidate-solution generator.
- There is a way to measure progress, so you can tell when you have found a "good solution".
- You can compare two candidates and tell which is better.
- Every candidate has a cost (to minimize) or a utility (to maximize) that can guide progress.

Generate and test

    repeat:
        candidate = generate()
        if test(candidate) == "good solution":
            break
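A runnable sketch of generate and test for the line-fitting problem. The random candidate range, the error threshold standing in for "good solution", and the attempt cap are arbitrary assumptions, not part of the slide:

    import random

    x = [1.0, 2.0, 3.0, 4.0]
    y = [2.1, 3.9, 6.2, 8.1]

    def error(A, B):
        # Sum of squared errors between the line y = A*x + B and the data.
        return sum((A * xi + B - yi) ** 2 for xi, yi in zip(x, y))

    def generate():
        # Candidate = a random (A, B) pair from an assumed range.
        return random.uniform(-10, 10), random.uniform(-10, 10)

    solution = None
    for _ in range(1_000_000):       # cap the number of attempts
        A, B = generate()
        if error(A, B) < 0.5:        # arbitrary "good enough" threshold
            solution = (A, B)
            break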

Search

How is linear regression like generate and test? Linear regression has an exceptionally good generator: it produces a "good solution" in a single iteration. But it only works on linearly related data, just as quadratic regression only works on quadratically related data.
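As an illustration (not part of the original slides), NumPy's polyfit is one such one-shot generator: degree 1 gives linear regression, degree 2 quadratic regression, and each assumes its own model form. The data here is made up:

    import numpy as np

    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = np.array([1.2, 2.1, 5.3, 10.2, 16.9])   # roughly quadratic data

    A, B = np.polyfit(x, y, 1)       # one-shot linear fit: y = A*x + B
    a, b, c = np.polyfit(x, y, 2)    # one-shot quadratic fit: y = a*x**2 + b*x + c
    # The straight line underfits this data; the parabola fits well.
    # Each generator is only "very good" for data matching its model form.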

Poorly understood data

- Stock markets
- GDP
- Cancer
- Car buying
- Aisle stocking
- Recommendations
- Images, videos

Poorly understood data

Visualization can help when the data is two- or three-dimensional (maybe up to 10 dimensions), but this is still an art. Generate and test might be slow. Consider using a simple generator for least-squares regression: one that generates all possible values of A and B within [-1024.00 ... +1024.00], as sketched below. Now suppose we have 100 dimensions?
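A sketch of that naive generator, with two assumptions not in the slide: the grid must be discretized (step 1.0 here, since "all possible values" over the reals is impossible), and the data is made up. Even so, it is painfully slow:

    x = [1.0, 2.0, 3.0, 4.0]
    y = [2.1, 3.9, 6.2, 8.1]

    def error(A, B):
        return sum((A * xi + B - yi) ** 2 for xi, yi in zip(x, y))

    # Step 1.0 gives 2049 values per parameter: ~4.2 million candidates
    # for just two parameters. This already takes seconds to run.
    steps = [i - 1024.0 for i in range(2049)]
    best = (float("inf"), None, None)
    for A in steps:
        for B in steps:
            err = error(A, B)
            if err < best[0]:
                best = (err, A, B)
    # With 100 parameters the grid would have 2049**100 points:
    # exhaustive generate-and-test does not scale.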

Can we use gradients?

A gradient is a local slope. If we can tell which of two candidates is better, can we make progress towards a solution? Think about the Connect 4 learner: if one set of weights is better than another, can we make progress towards the "best" set of weights?
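One way to act on that idea is plain gradient descent on the sum of squared errors. This sketch is illustrative only; the starting point, learning rate, and iteration count are assumptions:

    x = [1.0, 2.0, 3.0, 4.0]
    y = [2.1, 3.9, 6.2, 8.1]

    A, B = 0.0, 0.0            # arbitrary starting candidate
    lr = 0.01                  # assumed learning rate
    for _ in range(5000):
        # Gradient of the sum of squared errors with respect to A and B.
        grad_A = sum(2 * (A * xi + B - yi) * xi for xi, yi in zip(x, y))
        grad_B = sum(2 * (A * xi + B - yi) for xi, yi in zip(x, y))
        A -= lr * grad_A       # step downhill along the local slope
        B -= lr * grad_B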

Search algorithm

    solution_old = generate()
    solution_new = generate()
    repeat:
        if evaluate(solution_new) < evaluate(solution_old):
            solution_old = solution_new
            solution_new = modify(solution_new)
        else:
            solution_new = modify(solution_old)
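A runnable version of this loop for line fitting. The generate() range, the perturbation size in modify(), and the iteration budget are assumptions made for illustration:

    import random

    x = [1.0, 2.0, 3.0, 4.0]
    y = [2.1, 3.9, 6.2, 8.1]

    def evaluate(s):
        A, B = s
        return sum((A * xi + B - yi) ** 2 for xi, yi in zip(x, y))

    def generate():
        return (random.uniform(-10, 10), random.uniform(-10, 10))

    def modify(s):
        # Perturb the candidate by a small random step.
        A, B = s
        return (A + random.gauss(0, 0.1), B + random.gauss(0, 0.1))

    solution_old = generate()
    solution_new = generate()
    for _ in range(10_000):
        if evaluate(solution_new) < evaluate(solution_old):
            solution_old = solution_new       # keep the better candidate
            solution_new = modify(solution_new)
        else:
            solution_new = modify(solution_old)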

Issues

- Time versus quality
- Limiting the search space
- Discretizing the search space
- Susceptibility to local optima