Published by Barnaby Alexander. Modified over 8 years ago.
1
Introduction
2
We want to see whether there is any relationship between exam results and the number of hours spent studying.

Person      A   B   C   D   E   F   G   H   I   J
Hours/day   4   5   6   7   5   3   5   7   8   8
Result     20  25  22  35  15  14  22  30  37  39
4
Probability Theory
Random variable, X
Models for X
Probability distributions: Bernoulli, Binomial, Normal, Uniform, ...
Parameters: μ, σ², p, ...
5
Inference
We estimate the parameters of the population by taking a sample and computing the corresponding sample statistics, such as the sample mean, the sample variance and the sample proportion.
6
From the sample we can draw conclusions about the parameters in the population of interest.
– Point estimation
– Confidence intervals
– Hypothesis tests
7
Example 8.2 Keller: One variable
Y = return on investment
Model: the return is normally distributed with a mean of 10% and a standard deviation of 5%.
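A minimal sketch of what such a one-variable model lets us compute, using only Python's standard library. The question asked here (the probability of a negative return) is an illustrative assumption and is not stated on the slide.

```python
from statistics import NormalDist

# Model from the slide: return Y is Normal with mean 10 (%) and sd 5 (%).
returns = NormalDist(mu=10.0, sigma=5.0)

# Illustrative question (an assumption, not from the slide): P(Y < 0).
p_loss = returns.cdf(0.0)

# Central interval that contains 95% of the returns under this model.
low = returns.inv_cdf(0.025)
high = returns.inv_cdf(0.975)

print(f"P(return < 0%) = {p_loss:.4f}")
print(f"95% of returns lie between {low:.1f}% and {high:.1f}%")
```

Because a return of 0% lies two standard deviations below the mean, the probability of a loss under this model is small.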
8
Two or more variables
Y = return
x1 = share price
x2 = interest
x3 = inflation
9
Why regression?
1. Forecast a dependent variable (Y) from the values of the independent variables (x1, x2, …, xk).
2. Analyze specific relations between Y and x1, x2, …, xk. How is Y related to x1, x2, …, xk?
10
Simple linear regression
Model for one independent variable X and one dependent variable Y.
X → Y: we think X has an effect on Y.
11
Correlation
X ↔ Y: X and Y vary together, without any theory that one affects the other.
Example: Y = price of houses, X = price of apartments
12
Scatterplots are used to describe the relationship between two variables and for making relevant choices of models.
14
Coefficient of correlation The coefficient of correlation r is a measure of linear relationship between two variables x and y. The coefficient of correlation can take values between –1 and +1. Be aware that r is a measure of linear relationships. Even if r = 0 there can be a nonlinear relationship between x and y.
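The definition above can be sketched in a few lines of Python. The first data set is the hours/result table from the introduction; the second (a parabola) is a made-up illustration of a perfect nonlinear relationship with r = 0. The helper name `pearson_r` is hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson coefficient of correlation between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Study hours per day and exam results for persons A..J (from the introduction).
hours = [4, 5, 6, 7, 5, 3, 5, 7, 8, 8]
results = [20, 25, 22, 35, 15, 14, 22, 30, 37, 39]
r_study = pearson_r(hours, results)

# Made-up example: y = x^2 is an exact nonlinear relationship, yet r is 0.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]
r_parabola = pearson_r(xs, ys)

print(f"r (hours vs result) = {r_study:.3f}")
print(f"r (parabola)        = {r_parabola:.3f}")
```

The study data give a strong positive r, while the parabola gives r = 0 even though y is completely determined by x.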
15
Scatterplot: babies from VK
16
Correlation between weight and length for newborn babies

Correlations
                               LENGTH   WEIGHT
LENGTH  Pearson Correlation    1        .765**
        Sig. (2-tailed)        .        .000
        N                      35       35
WEIGHT  Pearson Correlation    .765**   1
        Sig. (2-tailed)        .000     .
        N                      35       35
** Correlation is significant at the 0.01 level (2-tailed).
17
Linear models are used when we think there is a linear relationship between two variables.
18
Regression models
Linear models for relationships between two variables X (e.g. length) and Y (e.g. weight).
Y is called the response variable (dependent variable).
X is called the explanatory variable (independent variable). X can often be controlled in experiments.
19
Most lots sell for $25,000.
Building a house costs about $75 per square foot.
House cost = 25000 + 75(Size)
[Figure: house cost plotted against house size]
The model has a deterministic and a probabilistic component.
20
House cost = 25000 + 75(Size)
Most lots sell for $25,000. However, house costs vary even among houses of the same size!
Since costs behave unpredictably, we add a random component.
[Figure: house cost plotted against house size]
21
The Model
The first-order linear model:
$$y = \beta_0 + \beta_1 x + \varepsilon$$
y = dependent variable
x = independent variable
β0 = y-intercept
β1 = slope of the line
ε = error variable, normally distributed around 0.
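A minimal simulation sketch of the first-order linear model, using the house cost numbers from the earlier slides as β0 and β1. The error standard deviation and the house size are assumptions chosen only for illustration; the slides give neither.

```python
import random

BETA_0 = 25_000.0  # lot price from the slides (y-intercept)
BETA_1 = 75.0      # building cost per square foot from the slides (slope)
SIGMA = 10_000.0   # ASSUMED sd of the error variable; not given in the slides

def house_cost(size_sqft, rng):
    """One draw from y = beta_0 + beta_1 * x + epsilon."""
    deterministic = BETA_0 + BETA_1 * size_sqft  # deterministic component
    epsilon = rng.gauss(0.0, SIGMA)              # random component, mean 0
    return deterministic + epsilon

rng = random.Random(0)  # fixed seed so the sketch is repeatable
size = 2_000            # hypothetical house size in square feet
expected = BETA_0 + BETA_1 * size
costs = [house_cost(size, rng) for _ in range(5)]

print(f"deterministic part: {expected:,.0f}")
for cost in costs:
    print(f"simulated cost:     {cost:,.0f}")
```

Houses of the same size share the deterministic part and differ only through the error term, which is what the scatter around the line represents.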
22
Estimation of β0 and β1
β0 and β1 are unknown population parameters and are therefore estimated from the data. We get the estimated model
$$\hat{y} = b_0 + b_1 x$$
23
The estimates are determined by
– drawing a sample from the population of interest,
– calculating sample statistics,
– producing a straight line that cuts into the data.
[Figure: scatterplot of y against x]
24
Least Squares Method
As the criterion for "the best" line we use the line that minimizes the sum of the squared distances between the observations and the line. The method is called the least squares method. We get the least squares (regression) line
$$\hat{y} = b_0 + b_1 x$$
25
The least squares estimates are found by minimizing
$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
y_i = observed value of the dependent variable for the i-th observation
ŷ_i = estimated value of the dependent variable for the i-th observation
26
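The LS estimators are
$$b_1 = \frac{s_{xy}}{s_x^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$
where s_xy is the sample covariance of x and y, and s_x^2 is the sample variance of x.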
27
”Shortcut formula” - p. 582
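The formulas on this slide did not survive in the transcript. What follows is a sketch of the shortcut form usually given in textbooks, assuming s_xy denotes the sample covariance and s_x^2 the sample variance of x; the exact notation on page 582 may differ.

$$s_{xy} = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i y_i - \frac{\sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n}\right], \qquad s_x^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]$$

$$b_1 = \frac{s_{xy}}{s_x^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$

These expressions need only the four sums of x_i, y_i, x_i y_i and x_i^2, which is why the example table carries columns for x_i y_i and x_i^2.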
28
Example In order to analyze the relationship between advertising and sales, the manager of a fast food company recorded the advertising budget ($thousands) and the sales ($millions) during one year for a sample of the company’s restaurants (of equal size).
30
Example, continued

Restaurant    x_i     y_i
1             276    115.2
2             552    135.6
3             720    153.6
4             648    117.6
5             336    106.8
6             396    150.0
7            1056    164.4
8            1188    190.8
9             372    136.8

The remaining columns of the table, x_i y_i and x_i^2, are computed from these values.
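A sketch of how the least squares estimates can be computed from the table using the column sums. The variable names are illustrative. The result should land close to the values b1 = 0.0681 and b0 = 99.3 quoted in the interpretation slide.

```python
# Advertising budget x ($ thousands) and sales y ($ millions), nine restaurants.
x = [276, 552, 720, 648, 336, 396, 1056, 1188, 372]
y = [115.2, 135.6, 153.6, 117.6, 106.8, 150.0, 164.4, 190.8, 136.8]
n = len(x)

# Column sums of the example table.
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

# Sample covariance and sample variance of x via the shortcut form.
s_xy = (sum_xy - sum_x * sum_y / n) / (n - 1)
s_x2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)

# Least squares estimates of slope and intercept.
b1 = s_xy / s_x2
b0 = sum_y / n - b1 * sum_x / n

print(f"b1 = {b1:.4f}, b0 = {b0:.1f}")
```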
31
Example, Output in SPSS
33
Interpretation of b0 and b1
b1 = 0.0681: when the advertising budget increases by one thousand dollars, yearly sales increase by $68,100 on average.
b0 = 99.3: [when the advertising budget is zero, yearly sales are about $99.3 million.] This is only meaningful if x = 0 lies within the range of the sample.
34
Extrapolation
Statements about predicted values (ŷ) of the dependent variable outside the observed interval of the independent variable (x). Often not meaningful!
35
Assessing the Model
The least squares method will produce a regression line whether or not there is a linear relationship between x and y. Consequently, it is important to assess how well the linear model fits the data. Several methods are used to assess the model. All are based on the sum of squares for error, SSE.
36
Residuals
To evaluate the regression model, we use the residuals:
$$e_i = y_i - \hat{y}_i$$
The e_i are observations of the error variable ε_i.
37
SSE - Sum of squares for error
$$\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
38
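In the example: advertising
Fitted line: ŷ_i = 99.3 + 0.068 x_i

Restaurant    x_i     y_i
1             276    115.2
2             552    135.6
3             720    153.6
4             648    117.6
5             336    106.8
6             396    150.0
7            1056    164.4
8            1188    190.8
9             372    136.8

The remaining columns of the table are the fitted value ŷ_i, the residual y_i - ŷ_i and the squared residual (y_i - ŷ_i)^2.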
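A sketch of how the remaining columns and SSE could be filled in for the advertising example. It refits the line from the data rather than using the rounded coefficients 99.3 and 0.068, so individual values may differ slightly from a hand calculation with the rounded line. All names are illustrative.

```python
# Advertising budget x ($ thousands) and sales y ($ millions), nine restaurants.
x = [276, 552, 720, 648, 336, 396, 1056, 1188, 372]
y = [115.2, 135.6, 153.6, 117.6, 106.8, 150.0, 164.4, 190.8, 136.8]
n = len(x)

# Least squares fit (unrounded estimates).
mean_x = sum(x) / n
mean_y = sum(y) / n
b1 = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
      / sum((xi - mean_x) ** 2 for xi in x))
b0 = mean_y - b1 * mean_x

# Fitted values, residuals e_i = y_i - y_hat_i, and their squares.
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
sse = sum(e ** 2 for e in residuals)

print("restaurant  x_i    y_i    y_hat    e_i     e_i^2")
for i, (xi, yi, fi, e) in enumerate(zip(x, y, fitted, residuals), start=1):
    print(f"{i:>10}  {xi:>4}  {yi:>5.1f}  {fi:>6.1f}  {e:>6.1f}  {e * e:>8.1f}")
print(f"SSE = {sse:.1f}")
```

With an intercept in the model the residuals sum to zero, which is a quick check that the fit was computed correctly.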