1
1 732G21/732A35/732G28
2
732G21 Sambandsmodeller: http://www.ida.liu.se/~732G21
One semester: regression analysis plus analysis of variance (teacher: Lotta Hallberg)
732G28 Regression methods: http://www.ida.liu.se/~732G28
Half a semester: regression analysis
732A35 Linear statistical models: http://www.ida.liu.se/~732A22
Almost one semester: regression analysis plus analysis of variance (teacher: Lotta Hallberg)
3
Course language: English, but you may use Swedish
We use It's learning (accessed via the Student portal) (show…)
9 lectures
8 computer labs; deadlines are around 5 days after each lab ends
8 lessons: I solve problems on the whiteboard, plus lab discussion
One written final exam
Course book: Kutner, M.H., Nachtsheim, C.J., Neter, J. and Li, W., Applied Linear Statistical Models with Student Data CD, 5th edition, ISBN 0073108742.
4
Linear statistical models are widely used in
◦ Business
◦ Economics
◦ Engineering
◦ Social and biological sciences
◦ Etc.
Example: A database contains the prices of houses sold in Linköping in 2009, together with their age, size and other parameters.
◦ Given the parameters of a new house, determine its approximate market price
◦ Determine reasonable price bounds
5
Analysis of databases
Observations (records, cases) in rows
Variables in columns
◦ Explanatory variables (predictors, inputs) $X_i$
◦ Response $Y$; we assume $Y = f(X_1, \dots, X_n)$
In this lecture: models with only one explanatory variable

No   Area ($X_1$)   Age ($X_2$)   Price ($Y$)
1    320            14            2,530,000
2    210            1             1,800,000
…    …              …             …
6
Real data can seldom be represented exactly as $Y = \beta X$ (observation errors, missing inputs, etc.)
Example: Age and salary for a sample of eight persons from a company.

Age   Salary
21    17
32    30
40    27
56    35
61    44
55    38
39    36
33    25

[Figure: scatterplot of salary against age]
7
The relation shown is almost linear
Linear regression analysis: find a linear function that is as close as possible to the data
8
For each $X$, there is a probability distribution $P(Y = y \mid X = x)$ of $Y$
The aim is to find the regression function $E(Y \mid X = x)$
9
Construction of regression models
◦ Selection of prediction variables (variance reduction)
◦ Functional form (from theory, approximation)
◦ Domain of the model
Software
◦ MINITAB
◦ SAS
◦ SPSS
◦ Matlab
◦ Excel
10
Formal statement: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
◦ $Y_i$ is the $i$-th response value
◦ $\beta_0$, $\beta_1$ are the model parameters, or regression parameters (intercept, slope)
◦ $X_i$ is the $i$-th predictor value
◦ $\varepsilon_i$ are i.i.d. random variables with expectation zero and variance $\sigma^2$
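To make the formal statement concrete, here is a minimal sketch that draws responses from the model. The slide only requires errors with mean zero and variance $\sigma^2$; normally distributed errors, the helper name `simulate_responses`, and the parameter values 8.0, 0.5 and 4.0 are assumptions chosen for illustration.

```python
import random


def simulate_responses(xs, beta0, beta1, sigma, rng):
    """Draw Y_i = beta0 + beta1 * X_i + eps_i for each predictor value.

    Assumption: eps_i are independent N(0, sigma^2) draws; the model itself
    only asks for mean zero and constant variance.
    """
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


# Predictor values borrowed from the age column of the salary example.
xs = [21, 32, 40, 56, 61, 55, 39, 33]

# Hypothetical parameter values, not estimates from the course data.
rng = random.Random(0)
ys = simulate_responses(xs, beta0=8.0, beta1=0.5, sigma=4.0, rng=rng)

for x, y in zip(xs, ys):
    print(f"X = {x:2d}  Y = {y:6.2f}")
```

Each run with a different seed gives a different sample scattered around the same line $E(Y) = \beta_0 + \beta_1 X$.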
11
Features (show…)
◦ Any two responses $Y_i$ and $Y_j$, $i \neq j$, are uncorrelated
Meaning of the regression parameters
◦ $\beta_0$: mean response at $X = 0$
◦ $\beta_1$: change in $E(Y)$ per unit increase in $X$
12
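Given a data set $(X_1, Y_1), \dots, (X_n, Y_n)$
Method of least squares:
◦ Observed response: $Y_i$
◦ Estimated response: $\beta_0 + \beta_1 X_i$
◦ Deviation: $Y_i - (\beta_0 + \beta_1 X_i)$
The regression fit is good when the deviations are small overall (see picture), so we minimize the sum of squares
$Q = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2$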
13
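How to find the minimum of $Q$? Set the partial derivatives of $Q$ with respect to $\beta_0$ and $\beta_1$ equal to zero and solve.
Estimators of $\beta_0$ and $\beta_1$:
$b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$
$b_0 = \bar{Y} - b_1 \bar{X}$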
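The least squares estimators can also be computed directly from their standard closed-form expressions. The sketch below applies them to the age and salary sample from the earlier slide; the function name `least_squares_fit` is a label chosen here, not something from the course material.

```python
def least_squares_fit(xs, ys):
    """Return (b0, b1) for simple linear regression of ys on xs.

    b1 = sum((X_i - X_bar) * (Y_i - Y_bar)) / sum((X_i - X_bar)^2)
    b0 = Y_bar - b1 * X_bar
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Age and salary for the sample of eight persons.
age = [21, 32, 40, 56, 61, 55, 39, 33]
salary = [17, 30, 27, 35, 44, 38, 36, 25]

b0, b1 = least_squares_fit(age, salary)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```

The printed coefficients can be compared with the MINITAB regression output for the same data.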
14
Exercise (for the salary data, MINITAB):
1. Make a scatterplot ("Scatterplot…", with and without the regression line)
2. Perform the regression using "Regression…"
3. Perform the regression using "Fitted line plot…"
4. Calculate the coefficients by hand
16
Gauss-Markov theorem
The estimators $b_0$ and $b_1$ are unbiased and have minimum variance among all unbiased linear estimators
Unbiased: bias $= E(b_0) - \beta_0 = 0$, that is, $E(b_0) = \beta_0$
Analogously, $E(b_1) = \beta_1$
Show illustration…
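Unbiasedness can be illustrated with a small simulation: generate many data sets from a model with known parameters, fit each one, and average the estimates. This is only a sketch under assumptions: normal errors, hypothetical true values $\beta_0 = 8$, $\beta_1 = 0.5$, $\sigma = 4$, and the helper names used here. It illustrates $E(b_0) = \beta_0$ and $E(b_1) = \beta_1$, not the minimum-variance part of the theorem.

```python
import random


def least_squares_fit(xs, ys):
    """Return the least squares estimates (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def average_estimates(xs, beta0, beta1, sigma, replications, seed):
    """Average b0 and b1 over many simulated samples with fixed X values."""
    rng = random.Random(seed)
    total_b0 = 0.0
    total_b1 = 0.0
    for _ in range(replications):
        # Assumption: normal errors with mean 0 and standard deviation sigma.
        ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
        b0, b1 = least_squares_fit(xs, ys)
        total_b0 += b0
        total_b1 += b1
    return total_b0 / replications, total_b1 / replications


xs = [21, 32, 40, 56, 61, 55, 39, 33]
mean_b0, mean_b1 = average_estimates(
    xs, beta0=8.0, beta1=0.5, sigma=4.0, replications=2000, seed=1
)
print(f"average b0 = {mean_b0:.3f} (true value 8.0)")
print(f"average b1 = {mean_b1:.3f} (true value 0.5)")
```

Individual estimates vary from sample to sample, but their averages should settle close to the true parameter values as the number of replications grows.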
17
Mean (expected) response: $E(Y) = \beta_0 + \beta_1 X$
Point estimator of the mean response (fitted value): $\hat{Y}_i = b_0 + b_1 X_i$
Residuals: $e_i = Y_i - \hat{Y}_i$
18
Plot of residuals (obtain it with MINITAB)
19
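Properties of residuals, with $\hat{Y}_i = b_0 + b_1 X_i$ and $e_i = Y_i - \hat{Y}_i$
1. $\sum e_i = 0$ (because the least squares solution satisfies $\sum (Y_i - b_0 - b_1 X_i) = 0$)
2. $\sum e_i^2$ is the minimum possible
3. $\sum Y_i = \sum \hat{Y}_i$ (because of 1)
4. $\sum X_i e_i = 0$, $\sum \hat{Y}_i e_i = 0$ (can be shown)
5. The regression line always goes through $(\bar{X}, \bar{Y})$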
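These properties can be checked numerically on the age and salary sample. The sketch below refits the line with the standard least squares formulas and evaluates each quantity; the sums are zero only up to floating-point rounding, and the variable names are choices made for this example.

```python
age = [21, 32, 40, 56, 61, 55, 39, 33]
salary = [17, 30, 27, 35, 44, 38, 36, 25]

n = len(age)
x_bar = sum(age) / n
y_bar = sum(salary) / n

# Least squares estimates of slope and intercept.
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, salary)) / sum(
    (x - x_bar) ** 2 for x in age
)
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in age]
residuals = [y - f for y, f in zip(salary, fitted)]


def sum_of_squares(c0, c1):
    """Sum of squared deviations of the data from the line c0 + c1 * X."""
    return sum((y - c0 - c1 * x) ** 2 for x, y in zip(age, salary))


sum_e = sum(residuals)                                  # property 1
sse = sum_of_squares(b0, b1)                            # property 2
sum_fitted = sum(fitted)                                # property 3
sum_xe = sum(x * e for x, e in zip(age, residuals))     # property 4
sum_fe = sum(f * e for f, e in zip(fitted, residuals))  # property 4
line_at_x_bar = b0 + b1 * x_bar                         # property 5

print(f"sum of residuals        = {sum_e:.2e}")
print(f"sum Y, sum of fitted    = {sum(salary)}, {sum_fitted:.6f}")
print(f"sum X*e, sum fitted*e   = {sum_xe:.2e}, {sum_fe:.2e}")
print(f"line at X_bar vs Y_bar  = {line_at_x_bar:.6f}, {y_bar:.6f}")
print(f"SSE at (b0, b1)         = {sse:.4f}")
print(f"SSE at (b0 + 0.5, b1)   = {sum_of_squares(b0 + 0.5, b1):.4f}")
```

Moving either coefficient away from the least squares values, as in the last line, can only increase the sum of squares.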
20
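Estimate of the variance of a single population (sample variance): $s^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}$
In regression, we compute $s^2$ using the residuals (look at the residual plot):
$SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$, $s^2 = MSE = \frac{SSE}{n - 2}$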
21
Why divide by $n - 2$? Because then $E(MSE) = \sigma^2$
Important: in general, $s^2 = \frac{SSE}{n - d}$ is unbiased, where $d$ is the number of model parameters (the degrees of freedom used up in estimating them)
Example: Compute the residuals, SSE and MSE, and find them in the MINITAB output
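As a sketch of the example, SSE and MSE for the age and salary sample can be computed as follows. The function name `regression_mse` and its `n_params` argument are hypothetical; with two estimated parameters the divisor is $n - 2$.

```python
def regression_mse(xs, ys, n_params=2):
    """Fit a straight line by least squares and return (sse, mse).

    mse = sse / (n - n_params); for simple linear regression n_params = 2
    because the intercept and the slope are both estimated.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    b0 = y_bar - b1 * x_bar
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    sse = sum(e ** 2 for e in residuals)
    return sse, sse / (n - n_params)


age = [21, 32, 40, 56, 61, 55, 39, 33]
salary = [17, 30, 27, 35, 44, 38, 36, 25]

sse, mse = regression_mse(age, salary)
print(f"SSE = {sse:.3f}")
print(f"MSE = {mse:.3f}  (SSE divided by n - 2 = {len(age) - 2})")
print(f"s   = {mse ** 0.5:.3f}")
```

The value of `s` corresponds to the estimated error standard deviation, which regression software typically reports alongside the coefficients.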
22
Minitab
◦ Graph → Scatterplot
◦ Stat → Regression
◦ Stat → Fitted Line Plot
23
Course book, Ch. 1 up to page 27.