Charles University - STAKAN III

Charles University, Institute of Economic Studies, Faculty of Social Sciences (FSV UK). Econometrics (STAKAN III), Fifth Lecture. Jan Ámos Víšek. Tuesday, 14.00 - 15.20.

Schedule of today's talk: Recalling the last lemma of the previous lecture, we shall discuss how to apply it. How can we form an idea about the magnitude of the impact of a given explanatory variable on the response? What is the role of the intercept in the model? The coefficient of determination - an overall characteristic of the model. The distribution connected with the coefficient of determination - the Fisher-Snedecor F.

Recalling the lemma proved in the previous lecture; we are going to show how to employ it.

Assumptions: Let $e_1, \dots, e_n$ be i.i.d. r.v.'s, $e_i \sim N(0, \sigma^2)$. Moreover, let $X^{\mathsf T}X$ be regular. Put $\hat{\beta} = (X^{\mathsf T}X)^{-1}X^{\mathsf T}Y$.
Assertions: Then $\hat{\beta}_j \sim N\bigl(\beta_j, \sigma^2 [(X^{\mathsf T}X)^{-1}]_{jj}\bigr)$.

Assumptions: Put $s^2 = \frac{1}{n-p} \sum_{i=1}^{n} (Y_i - X_i^{\mathsf T}\hat{\beta})^2$ and $T_j = \dfrac{\hat{\beta}_j - \beta_j}{s \sqrt{[(X^{\mathsf T}X)^{-1}]_{jj}}}$; this transformation is called studentization.
Assertions: Then $T_j \sim t_{n-p}$, i.e. $T_j$ is distributed as Student with $n - p$ degrees of freedom.

Given the data, we find an estimate of the model, say
Time total = -3.62 + 1.27 * Weight - 0.53 * Puls - 0.51 * Strength + 3.90 * Time per ¼-mile
Is WEIGHT significant for the explanation of the data or not? This cannot be indicated by the magnitude of the estimate of the coefficient! Assume that WEIGHT was given in kilograms and imagine that WEIGHT will instead be given in grams. Then the above model changes to
Time total = -3.62 + 0.00127 * Weight - 0.53 * Puls - 0.51 * Strength + 3.90 * Time per ¼-mile
although both models are identical.
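The unit-change argument can be checked numerically. A minimal numpy sketch with hypothetical data (the variable name and the generating coefficients are invented for illustration): rescaling the regressor from kilograms to grams divides its estimated coefficient by 1000 but leaves its t-statistic, and hence its significance, untouched.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
weight_kg = rng.normal(75, 10, n)                 # hypothetical regressor
y = 2.0 + 1.27 * weight_kg + rng.normal(0, 5, n)  # hypothetical response

def t_stat(x, y):
    # OLS of y on [1, x]; returns (slope estimate, its t-statistic)
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - 2)             # studentizing variance
    var_beta = s2 * np.linalg.inv(X.T @ X)
    return beta[1], beta[1] / np.sqrt(var_beta[1, 1])

b_kg, t_kg = t_stat(weight_kg, y)
b_g,  t_g  = t_stat(weight_kg * 1000, y)          # same data, now in grams
print(b_kg, b_g)   # slope shrinks by the factor 1000
print(t_kg, t_g)   # t-statistics coincide
```

The design choice here is deliberate: significance is judged from the studentized ratio, which is invariant under rescaling of a regressor, exactly as the slide argues.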

Assume that WEIGHT is not significant, i.e. the model
Time total = -3.62 - 0.53 * Puls - 0.51 * Strength + 3.90 * Time per ¼-mile
can be accepted as well. Under $H_0: \beta_{\text{WEIGHT}} = 0$, the studentized statistic $T$ is distributed as Student $t_{n-p}$. Fix some significance level $\alpha$. If the p-value exceeds $\alpha$, $H_0$ cannot be rejected on the significance level $\alpha$; in fact, it cannot be rejected on any level smaller than the p-value. It means that the corresponding explanatory variable can be (surely not "should be") excluded from the model.

Sometimes one can read in a textbook: "We cannot claim that $\beta_j \neq 0$ with probability larger than ..." or some similar statement. Any probabilistic statement about $\beta_j$ is of course false: $\beta_j$ is a constant, not a random variable. The only justifiable conclusion is that, if $H_0$ holds, an event of probability p has appeared. If the p-value is at most $\alpha$, $H_0$ can be rejected on the significance level $\alpha$. It means that the corresponding explanatory variable should be included in the model.

What is a p-value? [Figure: Student density with n - p degrees of freedom; the blue tail area beyond the observed |t| is equal to the p-value.] If the blue area is at least 0.05, say, we "delete" the corresponding explanatory variable from the model.
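The decision rule on this slide can be sketched numerically. A minimal Python sketch, assuming a large number of degrees of freedom so that the standard-normal approximation to the Student density is adequate (for small n - p one would use the exact Student distribution, e.g. scipy.stats.t.sf); the t-values below are invented for illustration.

```python
import math

def p_value_two_sided(t):
    # Two-sided p-value for a t-statistic, using the standard-normal
    # approximation to the Student density (adequate for large n - p).
    return math.erfc(abs(t) / math.sqrt(2))

alpha = 0.05  # the fixed significance level from the slide
for t in (0.8, 2.5):
    p = p_value_two_sided(t)
    verdict = "reject H0" if p < alpha else "cannot reject H0"
    print(f"t = {t}: p-value = {p:.4f} -> {verdict}")
```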

On a previous slide: "It (i.e. the significance of a given explanatory variable) cannot be indicated by the magnitude of the estimate of the coefficient!" In other words, the size (extent) of the influence of a given explanatory variable on the response cannot be concluded from the magnitude of (the estimate of) its coefficient. The remedy: sum the model over $i = 1, \dots, n$ and divide by $n$, subtract the result from the original model, and divide by the sample standard deviations.

Assume the regression model for the transformed (standardized) data, where of course $\tilde{Y}_i = (Y_i - \bar{Y})/s_Y$ and $\tilde{X}_{ij} = (X_{ij} - \bar{X}_j)/s_{X_j}$. Notice that, if the original model had an intercept, it vanishes under this transformation, i.e. the transformed model is without the intercept. So we may consider a model which runs through the origin. Moreover, all variables have sample variance equal to 1, so in this model the magnitude of the estimate of a regression coefficient indicates its impact on the response variable.
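A minimal numpy sketch of this standardization, with hypothetical data (the spreads and generating coefficients are invented): the raw coefficient of the wide-spread regressor looks tiny, while the standardized coefficients are directly comparable.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(0, 1, n)        # small spread
x2 = rng.normal(0, 100, n)      # large spread
y = 5.0 + 3.0 * x1 + 0.02 * x2 + rng.normal(0, 1, n)

# Raw OLS: coefficient sizes reflect units, not impact.
X = np.column_stack([np.ones(n), x1, x2])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

# Standardize: subtract means, divide by sample standard deviations;
# the transformed model has no intercept and unit sample variances.
zs = lambda v: (v - v.mean()) / v.std(ddof=1)
Z = np.column_stack([zs(x1), zs(x2)])
beta_std = np.linalg.lstsq(Z, zs(y), rcond=None)[0]

print(beta[1:])    # raw coefficients: roughly [3.0, 0.02]
print(beta_std)    # standardized: comparable magnitudes
```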

A comment on the role of the intercept in the model: when the data are far away from the origin, a small shift of one observation may cause a large change of the intercept. Hence a slightly atypical random fluctuation of one observation may make the intercept seem to be zero, i.e. it may appear insignificant although it is not. Moreover, by not including the intercept in the model we force the regression to run through the origin, with the consequences explained on one of the previous slides.

Conclusion: We already know how to estimate the coefficients of the regression model and how to decide which of them are significant for the model to explain the data. Next step: We are going to learn how to decide whether the estimated model is "acceptable" as a model for the given data.

COEFFICIENT of DETERMINATION. Let us look for an inspiration: what about considering the sum of squared residuals? [Figure: two scatterplots.] In the first, the "red" and the "green" model do not have very different sums of squares. In the second, the "red" and the "green" model have considerably different sums of squares.

COEFFICIENT of DETERMINATION. Definition: Let the regression model include the intercept. Then put $TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$, where $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$. Then the coefficient of determination is given as
(1)  $R^2 = 1 - \dfrac{RSS}{TSS}$, where $RSS = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$.
If the regression model does not include the intercept, put $TSS = \sum_{i=1}^{n} Y_i^2$. Then the coefficient of determination is again given by (1). Let us try a direct interpretation: $R^2$ is the fraction of the total variability of the response explained by the model.
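A minimal numpy sketch of definition (1) under both conventions for TSS, on hypothetical data lying far from the origin; it also illustrates numerically that dropping the intercept can inflate the coefficient of determination dramatically.

```python
import numpy as np

def r_squared(X, y, intercept=True):
    # Coefficient of determination per definition (1):
    # TSS = sum (y_i - ybar)^2 with intercept, TSS = sum y_i^2 without.
    if intercept:
        X = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    rss = np.sum((y - X @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2) if intercept else np.sum(y ** 2)
    return 1.0 - rss / tss

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 100)
y = 50.0 + 0.1 * x + rng.normal(0, 1, 100)   # data far from the origin

r2_with = r_squared(x.reshape(-1, 1), y, intercept=True)
r2_without = r_squared(x.reshape(-1, 1), y, intercept=False)
print(r2_with)      # small: x explains little of the variation around ybar
print(r2_without)   # misleadingly large: TSS = sum y_i^2 is huge
```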

Geometric interpretation of the COEFFICIENT of DETERMINATION. [Figure: the response vector $Y$, its projection $\hat{Y}$ onto the column space of $X$, and the mean vector $\bar{Y}\mathbf{1}$; TSS and RSS appear as squared lengths, with $R^2 = \|\hat{Y} - \bar{Y}\mathbf{1}\|^2 / \|Y - \bar{Y}\mathbf{1}\|^2$.]

COEFFICIENT of DETERMINATION - Comments. 1) The model includes the intercept: our model is compared with the model $Y_i = \beta_1 + e_i$ (whose least squares fit is $\hat{Y}_i = \bar{Y}$). 2) The model does not include the intercept: our model is compared with the model $Y_i = e_i$ (i.e. $\hat{Y}_i = 0$). Warning: excluding the intercept from the model may cause a dramatic increase of the coefficient of determination BUT ....... see the next slide!

COEFFICIENT of DETERMINATION. [Figure: two fits.] Left: the sum of "green" squares is compared with the sum of "red" squares; they will not be too different, hence the estimated model will not be accepted. Right: the sum of "green" squares is again compared with the sum of "red" squares; the "green" sum will be considerably smaller than the "red" one, hence the model will be accepted, ALTHOUGH THE "RIGHT" MODEL IS WORSE THAN THE "LEFT" ONE!!!!

COEFFICIENT of DETERMINATION - Comments. What values of the coefficient of determination are acceptable? In exact and technical sciences: 0.6 and more. In social sciences: 0.2 and more. REMEMBER - econometrics is already an exact science!!!! (It is a part of mathematics, of course.) The coefficient of determination is only one of a set of indicators of the "quality" of a model.

Coefficient of determination: what will change if we include one additional explanatory variable? Recalling $R^2 = 1 - RSS/TSS$, consider the model with one more regressor. Since the larger model cannot have a larger residual sum of squares, $RSS_{p+1} \le RSS_p$, we conclude that the coefficient of determination does not decrease with an increasing number of explanatory variables. Adjusted coefficient of determination: $\bar{R}^2 = 1 - \dfrac{n-1}{n-p}\,(1 - R^2)$. Notice that $\bar{R}^2$ penalizes additional explanatory variables and can decrease when an added variable contributes too little.
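A minimal numpy sketch with hypothetical data: ten irrelevant regressors push R² up, never down, while the adjusted version penalizes them (here p counts all estimated coefficients including the intercept, an assumption matching the notation above).

```python
import numpy as np

def fit_r2(X, y):
    # Plain R^2 of an OLS fit with intercept.
    X = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    rss = np.sum((y - X @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1 - rss / tss

def adjusted_r2(r2, n, p):
    # p = number of estimated coefficients, intercept included
    return 1 - (1 - r2) * (n - 1) / (n - p)

rng = np.random.default_rng(3)
n = 60
x = rng.normal(size=(n, 1))
junk = rng.normal(size=(n, 10))      # irrelevant regressors
y = 1 + 2 * x[:, 0] + rng.normal(0, 2, n)

r2_small = fit_r2(x, y)
r2_big = fit_r2(np.hstack([x, junk]), y)
adj_small = adjusted_r2(r2_small, n, 2)
adj_big = adjusted_r2(r2_big, n, 12)
print(r2_big >= r2_small)            # R^2 never decreases
print(adj_small, adj_big)            # the adjustment pays for the junk
```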

FISHER-SNEDECOR F. Definition: Let the regression model include the intercept. Then put $F = \dfrac{R^2/(p-1)}{(1-R^2)/(n-p)}$, where $R^2$ is the coefficient of determination. If the regression model does not include the intercept, put $F = \dfrac{R^2/p}{(1-R^2)/(n-p)}$. $F$ is usually called the Fisher-Snedecor F.

Lemma. Assumptions: Let $e_1, \dots, e_n$ be i.i.d. r.v.'s, $e_i \sim N(0, \sigma^2)$; moreover, let $X^{\mathsf T}X$ be regular. If the regression model does not include the intercept and $H_0: \beta_1 = \dots = \beta_p = 0$ holds,
Assertions: then $F \sim F_{p,\,n-p}$, i.e. $F$ is distributed as Fisher-Snedecor with $p$ and $n-p$ degrees of freedom.
Assumptions: If the regression model includes the intercept and $H_0: \beta_2 = \dots = \beta_p = 0$ holds,
Assertions: then $F \sim F_{p-1,\,n-p}$.
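The F statistic of the definition above can be computed directly from R². A minimal sketch with hypothetical values (R² = 0.4, n = 50 observations, p = 5 coefficients including the intercept); the resulting value would be compared with the F(4, 45) critical value.

```python
def f_statistic(r2, n, p, intercept=True):
    # F from the coefficient of determination:
    # with intercept, F ~ F(p-1, n-p) under H0: beta_2 = ... = beta_p = 0;
    # without intercept, F ~ F(p, n-p) under H0: beta_1 = ... = beta_p = 0.
    if intercept:
        return (r2 / (p - 1)) / ((1 - r2) / (n - p))
    return (r2 / p) / ((1 - r2) / (n - p))

F = f_statistic(0.4, n=50, p=5)
print(F)   # (0.4/4) / (0.6/45) = 7.5
```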

Proof (sketch). Write $TSS = \|Y - \bar{Y}\mathbf{1}\|^2$ and $RSS = \|Y - \hat{Y}\|^2$; with the intercept in the model, $TSS = RSS + \|\hat{Y} - \bar{Y}\mathbf{1}\|^2$. Under $H_0$ the two quadratic forms on the right, divided by $\sigma^2$, are independent and $\chi^2_{n-p}$- and $\chi^2_{p-1}$-distributed (Fisher-Cochran theorem applied to the corresponding idempotent projection matrices), and the ratio of the forms divided by their degrees of freedom has the Fisher-Snedecor distribution.

Geometric "proof": [Figure: $Y$, its projection $\hat{Y}$ onto the column space of $X$, and $\bar{Y}\mathbf{1}$; the Pythagorean decomposition of TSS into RSS and the explained part.] From the decomposition we also obtain $F = \dfrac{\|\hat{Y} - \bar{Y}\mathbf{1}\|^2/(p-1)}{RSS/(n-p)}$. Q.E.D.

Let us summarize: We already know how to estimate the model, how to find which explanatory variables are significant, and how to decide whether the model is acceptable as such. What we should do next: learn how to find all the "results" (given in the frame above) in the output from a statistical package, and learn how to verify whether the assumptions we have used really hold, so that we are entitled to employ the theory explained so far.

What is to be learnt from this lecture for the exam? The significance of a given explanatory variable - whether it is to be included in the model. The impact of a given explanatory variable on the response. The role of the intercept in the model - can its significance be judged from the t-statistic? The coefficient of determination - its distribution, role and importance for the model. All that you need is on http://samba.fsv.cuni.cz/~visek/