Introduction to Linear Regression – Predicting Quality of Wine

1 Introduction to Linear Regression – Predicting Quality of Wine
CSCI 200 Data Mining Introduction to Linear Regression – Predicting Quality of Wine

2 Predicting Quality of Wine
Linear regression is a simple and powerful method to analyze data and make predictions. Bordeaux is a region in France popular for producing wine. There are differences in price and quality from year to year that are sometimes very significant. Bordeaux wines are widely believed to taste better when they are older, so there is an incentive to store young wines until they are mature.

3 Predicting Quality of Wine
The main issue: it is hard to determine the quality of a wine just by tasting it while it is still young, since the taste will change significantly by the time it is consumed. Wine tasters and experts taste the young wine and then predict which ones will be the best later. Question: can we model this process and make stronger predictions?

4 Predicting Quality of Wine
On March 4, 1990, the New York Times announced that Princeton Professor of Economics Orley Ashenfelter can predict the quality of Bordeaux wine without tasting a single drop. Ashenfelter's predictions have nothing to do with assessing the aroma of the wine. They are the results of a mathematical model. Ashenfelter used a method called linear regression.

5 Linear Regression
The method predicts an outcome variable, also called the dependent variable, using a set of independent variables. Dependent variable: a typical auction price for Bordeaux wine, which approximates quality. Independent variables: the age of the wine (older wines are more expensive) and weather-related information.

6 Linear Regression
Four independent variables: the age of the wine, the average growing season temperature, the harvest rain, and the winter rain.

7 Quality of Wine – Linear Regression
Professor Ashenfelter believed that his predictions are more accurate than those of the world's most influential wine critic, Robert Parker. Robert M. Parker Jr., generally regarded as the most influential wine critic in America, calls Professor Ashenfelter's research "ludicrous and absurd."

8 Predicting Quality of Wine - Links

9 One-Variable Linear Regression
This method uses one independent variable to predict the dependent variable. Independent variable: average growing season temperature (AGST). Dependent variable: wine price. The goal of linear regression is to create a predictive line through the data. There are many different lines that could be drawn to predict wine price using average growing season temperature.

10 Simple Prediction - Average
The equation for this line is y = 7.07, the average price. This model predicts 7.07 regardless of the temperature.

11 Better Prediction
The equation for this line is y = 0.5*AGST - 1.25. This linear regression model predicts a higher price when the temperature is higher.

12 General Equation Y = A*X + B – the model
X is the independent variable (in our case AGST). Y is the dependent variable (in our case Price). Using this equation we calculate the predicted values. The model makes errors, so for the observed data we write Y = A*X + B + E. The error term, E, is also often called a residual.

13 Y[i] = A*X[i] + B + E[i]
For each observation i, we have data for the dependent variable Y[i] and for the independent variable X[i]. Using this equation we make a prediction, A*X[i] + B, for each data point. This prediction is hopefully close to the true outcome Y[i]. Since the coefficients have to be the same for all data points, we usually make a small error E[i]. The best model (choice of A and B) has the smallest errors.
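
The relationship between predictions and residuals can be sketched in a few lines of Python. This is only an illustration: the four observations are made up for the example (they are not the wine data from the slides), the helper names `predict` and `residuals` are hypothetical, and the coefficients A = 0.5 and B = -1.25 are borrowed from the "Better Prediction" slide.

```python
# Hypothetical observations (NOT the slide data): temperature and price.
agst = [15.0, 16.0, 17.0, 18.0]
price = [6.2, 6.9, 7.1, 7.9]


def predict(a, b, x):
    """Prediction of the model Y = A*X + B for a single value of X."""
    return a * x + b


def residuals(a, b, xs, ys):
    """Error E[i] = Y[i] - (A*X[i] + B) for every observation i."""
    return [y - predict(a, b, x) for x, y in zip(xs, ys)]


# Residuals of the line y = 0.5*AGST - 1.25 on the made-up data.
errors = residuals(0.5, -1.25, agst, price)
for x, y, e in zip(agst, price, errors):
    print(f"AGST={x:.1f} actual={y:.2f} "
          f"predicted={predict(0.5, -1.25, x):.2f} error={e:+.2f}")
```

A positive error means the model predicted too low a price for that observation, a negative error means it predicted too high.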

14 SSE – Sum of Squared Errors
SSE for Average Line SSE for 0.5*AGST-1.25
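
As a rough sketch of how the two SSE values are compared, the snippet below computes the sum of squared errors for an average line and for the line 0.5*AGST - 1.25. The data points are invented for illustration and are not the values behind the slide's plots, and the function name `sse` is an assumption of this example.

```python
# Hypothetical observations (NOT the slide data): temperature and price.
agst = [15.0, 16.0, 17.0, 18.0]
price = [6.2, 6.9, 7.1, 7.9]


def sse(predictions, actuals):
    """Sum of squared errors between predicted and actual values."""
    return sum((y - p) ** 2 for p, y in zip(predictions, actuals))


# Average line: predicts the mean price for every observation.
average = sum(price) / len(price)
sse_average = sse([average] * len(price), price)

# Sloped line from the "Better Prediction" slide.
sse_line = sse([0.5 * x - 1.25 for x in agst], price)

print(f"SSE for average line:    {sse_average:.4f}")
print(f"SSE for 0.5*AGST - 1.25: {sse_line:.4f}")
```

On this made-up data the sloped line has the smaller SSE, which is the sense in which it is the better of the two candidate lines.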

15 Better Measures for Regression Quality
Root Mean Squared Error (RMSE): RMSE = SQRT(SSE/N), where N is the total number of data points. R squared (R2): R2 compares the best model to a baseline model. The baseline model does not use any variables: it predicts the average value of the dependent variable regardless of the value of the independent variable.
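
A minimal sketch of the RMSE formula follows, again on invented data rather than the slide's wine data. The function name `rmse` is hypothetical. Because RMSE divides by N and takes a square root, it is expressed in the same units as the dependent variable.

```python
import math

# Hypothetical observations (NOT the slide data): temperature and price.
agst = [15.0, 16.0, 17.0, 18.0]
price = [6.2, 6.9, 7.1, 7.9]


def rmse(predictions, actuals):
    """Root mean squared error: SQRT(SSE / N)."""
    n = len(actuals)
    sse = sum((y - p) ** 2 for p, y in zip(predictions, actuals))
    return math.sqrt(sse / n)


# Baseline model: always predict the average price.
average = sum(price) / len(price)
rmse_baseline = rmse([average] * len(price), price)

# Sloped line from the "Better Prediction" slide.
rmse_line = rmse([0.5 * x - 1.25 for x in agst], price)

print(f"RMSE baseline: {rmse_baseline:.4f}")
print(f"RMSE line:     {rmse_line:.4f}")
```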

16 R2
The sum of squared errors for the baseline model is also known as the total sum of squares, commonly referred to as SST. In our example, SST = 10.15. R2 = 1 - SSE/SST. SSE >= 0 and SST >= 0. SSE <= SST, because setting A = 0 and B to the average in Y = A*X + B gives the baseline model. On the data used to fit it, the linear regression model will therefore never be worse than the baseline model. R2 = 1: perfect predictive model. R2 = 0: no improvement over the baseline.
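
The definition R2 = 1 - SSE/SST can be checked with a short sketch. The data are hypothetical (so SST here is not the 10.15 quoted on the slide), the function name `r_squared` is an assumption of this example, and the code assumes the actual values are not all identical, since SST would then be zero.

```python
# Hypothetical observations (NOT the slide data): temperature and price.
agst = [15.0, 16.0, 17.0, 18.0]
price = [6.2, 6.9, 7.1, 7.9]


def r_squared(predictions, actuals):
    """R2 = 1 - SSE/SST, where SST is the SSE of the average (baseline) model."""
    mean = sum(actuals) / len(actuals)
    sse = sum((y - p) ** 2 for p, y in zip(predictions, actuals))
    sst = sum((y - mean) ** 2 for y in actuals)
    return 1 - sse / sst  # assumes sst > 0


average = sum(price) / len(price)
r2_baseline = r_squared([average] * len(price), price)       # no improvement
r2_line = r_squared([0.5 * x - 1.25 for x in agst], price)   # sloped line
r2_perfect = r_squared(price, price)                         # perfect model

print(f"R2 baseline: {r2_baseline:.4f}")
print(f"R2 line:     {r2_line:.4f}")
print(f"R2 perfect:  {r2_perfect:.4f}")
```

The baseline and the perfect model reproduce the two reference points from the slide, R2 = 0 and R2 = 1.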

17 R2
R2 is unitless, so it can be interpreted the same way across problems.
However, it can still be hard to compare between problems. Good models for easy problems will have an R2 close to 1. But good models for hard problems can still have an R2 close to zero.

18 Regression Model Result
The line that gives the minimum sum of squared errors is the line that the regression model will find. Formula for the Linear Regression Model: Y = *AGST R2 = SSE =
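
The slides do not say how the minimizing line is computed. One standard way for a single independent variable is the closed-form least-squares solution, sketched below under that assumption. The data are invented, so the resulting coefficients are not the ones from the wine model, and the function name `fit_line` is hypothetical.

```python
# Hypothetical observations (NOT the slide data): temperature and price.
agst = [15.0, 16.0, 17.0, 18.0]
price = [6.2, 6.9, 7.1, 7.9]


def fit_line(xs, ys):
    """Least-squares slope A and intercept B for the model Y = A*X + B."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # A = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    a = s_xy / s_xx  # assumes the x values are not all identical
    b = y_mean - a * x_mean
    return a, b


def sse(a, b, xs, ys):
    """Sum of squared errors of the line Y = A*X + B on the data."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


a, b = fit_line(agst, price)
print(f"Fitted line: Y = {a:.3f}*AGST + {b:.3f}")
print(f"SSE of fitted line:      {sse(a, b, agst, price):.4f}")
print(f"SSE of 0.5*AGST - 1.25:  {sse(0.5, -1.25, agst, price):.4f}")
```

On this data the fitted line has an SSE no larger than the hand-picked line 0.5*AGST - 1.25, which is what minimizing the sum of squared errors guarantees.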
