Statistics for Education Research, Lecture 9: MANOVA, Linear Regression & Multiple Regression. Instructor: Dr. Tung-hsien He


1 Statistics for Education Research, Lecture 9: MANOVA, Linear Regression & Multiple Regression. Instructor: Dr. Tung-hsien He (the@tea.ntue.edu.tw)

2 MANOVA: Multivariate ANalysis Of VAriance
1. Features:
(a) Multivariate Analysis: MANOVA is a type of multivariate analysis because it tests multiple dependent variables simultaneously.
(b) GLM: MANOVA is also a general linear model (GLM). Thus, the assumptions behind ANCOVA also apply.

3 2. Reasons for Using MANOVA:
(a) Repeating ANOVA or ANCOVA inflates the α level beyond the desired value. For instance, if ANOVA is performed 4 times to analyze 4 different dependent variables, the overall α is inflated to as much as about 4 × α (more precisely, 1 - (1 - α)^4 when the tests are independent), increasing the chance of making a Type I error.
(b) Including all dependent variables in a single MANOVA model maintains the desired α level.

4 (c) Although Bonferroni adjustment techniques can be used to correct the inflated α, most researchers employ MANOVA instead.
3. Always Check:
a. Dependent variables should NOT be highly correlated with each other.
b. Every dependent variable must be moderately related to the independent variables.

5 c. Box's Test for Equality of Variance-Covariance: a nonsignificant result is favored; or
d. Levene's Test for Error Variance: a nonsignificant result is favored.

6 Note:
(a) Box's Test is favored over Levene's test when MANOVA is used.
(b) With equal n in every sample, the analysis is robust to violations of the equality of variance-covariance. Thus, there is no need to report either Box's test or Levene's test.
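The α inflation described under "Reasons for Using MANOVA" can be checked with a few lines of arithmetic. The sketch below assumes the repeated tests are independent; the function names are illustrative only, and α = .05 with 4 tests mirrors the example in the slides.

```python
def familywise_alpha(alpha: float, k: int) -> float:
    """Chance of at least one Type I error across k independent tests."""
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_alpha(alpha: float, k: int) -> float:
    """Per-test alpha that keeps the familywise rate at or below alpha."""
    return alpha / k


alpha, k = 0.05, 4                         # illustrative values
inflated = familywise_alpha(alpha, k)      # about 0.1855, below the 4 * alpha = 0.20 bound
per_test = bonferroni_alpha(alpha, k)      # 0.0125
corrected = familywise_alpha(per_test, k)  # back under 0.05

print(f"familywise alpha for {k} tests: {inflated:.4f}")
print(f"Bonferroni per-test alpha: {per_test:.4f}")
print(f"familywise alpha after Bonferroni: {corrected:.4f}")
```

The value 4 × α is an upper bound on the inflated rate, which is why it is a convenient rule of thumb.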

7 4. Report:
a. Wilks' Lambda: a significant value is favored.
b. Only when Wilks' Lambda is significant should the univariate F ratios be reported.
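For readers who want to see where Wilks' Lambda comes from, here is a minimal sketch for a one-way design with two dependent variables. It assumes the usual definition Λ = |W| / |T|, where W is the pooled within-group sums of squares and cross-products (SSCP) matrix and T is the total SSCP matrix. The scores and function names are hypothetical, and the significance test that SPSS reports alongside Λ is not computed here.

```python
def centroid(rows):
    """Mean (reading, writing) vector of a list of score pairs."""
    n = len(rows)
    return (sum(r for r, _ in rows) / n, sum(w for _, w in rows) / n)


def sscp(rows, centre):
    """2x2 sums of squares and cross-products about the given centre."""
    s_rr = sum((r - centre[0]) ** 2 for r, _ in rows)
    s_ww = sum((w - centre[1]) ** 2 for _, w in rows)
    s_rw = sum((r - centre[0]) * (w - centre[1]) for r, w in rows)
    return [[s_rr, s_rw], [s_rw, s_ww]]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def wilks_lambda(groups):
    """Lambda = |W| / |T| for one factor and two dependent variables."""
    all_rows = [row for rows in groups.values() for row in rows]
    total = sscp(all_rows, centroid(all_rows))
    within = [[0.0, 0.0], [0.0, 0.0]]
    for rows in groups.values():
        g = sscp(rows, centroid(rows))
        for i in range(2):
            for j in range(2):
                within[i][j] += g[i][j]
    return det2(within) / det2(total)


# Hypothetical (reading, writing) scores for three teaching methods, unequal n.
scores = {
    "method_1": [(60, 55), (65, 58), (70, 66), (62, 59)],
    "method_2": [(72, 70), (75, 68), (80, 77)],
    "method_3": [(85, 80), (88, 86), (90, 84), (83, 79), (87, 85)],
}
lam = wilks_lambda(scores)
print(f"Wilks' Lambda = {lam:.3f}")
```

Λ lies between 0 and 1. Values near 0 mean the group centroids differ a great deal relative to the within-group spread; a value of 1 means the groups do not differ at all.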

8 5. Example:
(a) Scenario: A researcher is interested in studying the effects of teaching methods on subjects' reading and writing achievement. To this end, the researcher selects 31 subjects and randomly, but not evenly, assigns them to three different teaching methods (Methods 1, 2, & 3). The researcher wants to know whether the three teaching methods have significant effects on subjects' reading and writing achievement.

9 (b) Conditions:
(1) One Independent Variable: Teaching Methods
(2) 3 Samples: 3 Levels of the Independent Variable
(3) Two Dependent Variables: Reading and Writing achievement
(c) Technique: MANOVA
(d) SPSS Procedures:

10 Linear Regression
1. Basic Concept: Using the correlation between two variables and a straight line to predict the score on the Y variable from the score on the X variable (using X to predict Y).
2. Terms:
a. Criterion Variable: the Y variable
b. Predictor Variable: the X variable
c. Regression Line: the straight line that represents how change in the X variable is associated with change in the Y variable

11 3. Important Feature: Regression does not show that changes in X cause changes in Y (i.e., a causal relation); at most, it suggests that such a relation may exist.

12 Regression Line
1. A straight line;
2. In predicting Y from X, the equation for this line is Ŷ = bX + a, where
Ŷ = predicted score;
b = slope, or regression coefficient (how steep the line is);
a = Y intercept, or regression constant (i.e., the point where the regression line intersects the Y axis).

13 4. The regression line is obtained by the method of least squares: see Figure 16.3, p. 429.
5. See Figure 16.1, p. 426: Ŷ = 0.5X + 2.0
6. Hands-on Exercise: Table 16.1, p. 427. How should the regression equation be written? Ŷ = 0.654X + 5.231 (B value = 0.654; Constant = 5.231)
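The method of least squares can be sketched in a few lines. The example assumes the standard formulas b = S_xy / S_xx and a = mean(Y) - b × mean(X). The data are invented for illustration (the first set is simply chosen to fall on the line Ŷ = 0.5X + 2.0) and are not the textbook's Table 16.1.

```python
def fit_line(xs, ys):
    """Return (b, a) for the least-squares line Y-hat = b * X + a."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx              # slope (regression coefficient)
    a = mean_y - b * mean_x      # Y intercept (regression constant)
    return b, a


def predict(x, b, a):
    """Predicted score Y-hat for a given X."""
    return b * x + a


# Invented points that lie exactly on Y-hat = 0.5 X + 2.0.
xs = [0, 2, 4, 6, 8]
ys = [2, 3, 4, 5, 6]
b, a = fit_line(xs, ys)
print(b, a)  # 0.5 2.0

# Invented noisy points: the fitted line minimises the sum of squared errors.
noisy_ys = [2.5, 2.6, 4.4, 4.7, 6.3]
b2, a2 = fit_line(xs, noisy_ys)
residuals = [y - predict(x, b2, a2) for x, y in zip(xs, noisy_ys)]
sse = sum(e ** 2 for e in residuals)
print(f"b = {b2:.3f}, a = {a2:.3f}, SSE = {sse:.3f}")
```

For a least-squares line the residuals always sum to zero, and any other slope or intercept gives a larger sum of squared errors.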

14 7. R, R² & Adjusted R² (R² and Adjusted R² are referred to as coefficients of determination):
(a) The value of R indicates how successfully a regression model predicts (i.e., how closely the predicted values match the observed values).
(b) R² and Adjusted R² indicate the proportion of variance in Y that can be explained by the predictor or, if two or more predictors are involved, by the combined variance of the predictors.

15 8. Differences between R² & Adjusted R²: R² tends to overestimate the relation in the population (i.e., when the sample R² is used to estimate the magnitude of the relation in the population, the value tends to be too large). Adjusted R² corrects this overestimate by taking out the bias (i.e., by taking out some of the variance).
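The adjustment can be made concrete with the commonly used formula Adjusted R² = 1 - (1 - R²)(n - 1) / (n - k - 1), where n is the sample size and k the number of predictors. This sketch assumes that formula; the values R² = .50, n = 20, and k = 2 are made up for illustration.

```python
def r_squared(ys, predicted):
    """Proportion of variance in Y explained by the predictions."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Shrink R^2 according to sample size n and number of predictors k."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


r2, n, k = 0.50, 20, 2            # hypothetical values
adj = adjusted_r_squared(r2, n, k)
print(round(adj, 3))  # 0.441
```

The shrinkage grows as n gets smaller or k gets larger, which is why Adjusted R² is usually preferred for small samples with several predictors.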

16 Further Concepts
1. Errors in Prediction: e = (Y - Ŷ); e is assumed to be normally distributed.
2. Standard error of estimate: the standard deviation of the distribution of e; see Formulas 16.10 & 16.11, p. 434.
3. The higher the correlation between Y & X, the greater the accuracy of prediction.
4. Hypothesis Testing: H0: B = 0; Ha: B ≠ 0
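The standard error of estimate and the test of H0: B = 0 can be sketched as follows. The textbook's Formulas 16.10 and 16.11 are not reproduced here; this sketch assumes the common form s_est = sqrt(SS_residual / (n - 2)) and the t statistic t = b / SE(b) with n - 2 degrees of freedom. The data and function name are hypothetical, and the p value would still have to be looked up in a t table or taken from SPSS output.

```python
import math


def slope_t_test(xs, ys):
    """Return (b, s_est, t, df) for simple linear regression of Y on X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    ss_res = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
    s_est = math.sqrt(ss_res / (n - 2))   # standard error of estimate
    se_b = s_est / math.sqrt(s_xx)        # standard error of the slope
    return b, s_est, b / se_b, n - 2


# Hypothetical scores.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 5, 4, 6, 7]
b, s_est, t, df = slope_t_test(xs, ys)
print(f"b = {b:.3f}, s_est = {s_est:.3f}, t({df}) = {t:.3f}")
```

A smaller standard error of estimate means the observed scores lie closer to the regression line, which is the sense in which a higher correlation gives more accurate prediction.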

17 Multiple Regression (MR)
1. There is more than one predictor (more than one X variable);
2. The Y variable remains one;
3. Multiple Regression Line:
Equation (raw scores): Ŷ = b_1 X_1 + b_2 X_2 + ... + a
Equation (standardized scores): Z_Ŷ = β_1 Z_1 + β_2 Z_2 + ... + β_K Z_K
4. Predictor Variable Selection: predictors should be highly correlated with Y but only weakly correlated with each other.
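To show how the raw-score and standardized equations relate, here is a minimal sketch for the two-predictor case. It assumes the least-squares normal equations on mean-centred scores and the usual conversion β_k = b_k × (s_Xk / s_Y). The data and the function name are hypothetical; with more than two predictors a statistics package would normally be used.

```python
import math


def fit_two_predictors(x1, x2, y):
    """Least-squares b1, b2, a and standardized beta1, beta2."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    syy = sum((t - my) ** 2 for t in y)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (t - my) for u, t in zip(x1, y))
    s2y = sum((v - m2) * (t - my) for v, t in zip(x2, y))
    det = s11 * s22 - s12 ** 2    # zero when the predictors are perfectly collinear
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    a = my - b1 * m1 - b2 * m2
    beta1 = b1 * math.sqrt(s11 / syy)
    beta2 = b2 * math.sqrt(s22 / syy)
    return {"b1": b1, "b2": b2, "a": a, "beta1": beta1, "beta2": beta2}


# Hypothetical scores on two predictors and one criterion.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [3, 1, 4, 2, 6, 5]
y = [4, 5, 8, 8, 13, 13]
fit = fit_two_predictors(x1, x2, y)
print({name: round(value, 3) for name, value in fit.items()})
```

The standardized equation has no constant because every Z variable has a mean of 0. The determinant in the sketch also shows why strongly related predictors are a problem: as their correlation approaches 1, the determinant approaches 0 and the coefficients become unstable.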

18 Residuals: the actual value of the dependent variable minus the value predicted by the regression equation.
Indexes of Fitness of MR Model
1. Nonautocorrelation (autocorrelation test):
a. Measure: Durbin-Watson, ranging from 0 to 4; when D = 2, nonautocorrelation is retained -> a favored result!
b. It simply indicates whether the residuals are NOT highly related to each other.
c. Because residuals should NOT be highly related, there should be nonautocorrelation.

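19 d. Reasons: The residuals of the independent variables (predictor variables: x) must not be correlated with one another. A residual here is the number left over when the actually measured y value is subtracted from the new y value obtained after entering variable x_1 into the regression. If the analysis shows that the residuals of two predictors are autocorrelated, a basic assumption of multiple regression analysis is violated. Mathematical formulas and empirical statistical evidence show that when the residuals of two predictors are highly correlated, one of the two predictors' regression coefficients (B values) becomes extremely large and the other correspondingly becomes extremely small, making the regression model inaccurate.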

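20 e. Interpretation: When two independent variables are highly correlated, their overlapping component is very large, and they can be regarded as variables with the same characteristics. Variables this homogeneous can logically be classified as the same type of variable. Once one variable of a given type has been entered into the regression analysis, there is no need to enter another variable with the same characteristics, just as a researcher would not enter the same predictor twice in a single regression analysis. Therefore, if the autocorrelation test for two independent variables is significant, the two are logically regarded as the same variable, and entering both into the regression analysis at the same time is illogical.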
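The Durbin-Watson statistic itself is simple to compute. This sketch assumes the standard definition, D = Σ(e_t - e_(t-1))² / Σ e_t², applied to the residuals in case order. The residual sequences are invented purely to show the three typical outcomes.

```python
def durbin_watson(residuals):
    """Durbin-Watson D from a sequence of residuals in case order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


print(durbin_watson([1.0, -1.0, -1.0, 1.0]))  # 2.0 (no autocorrelation)
print(durbin_watson([2.0, 1.0, -1.0, -2.0]))  # 0.6 (positive autocorrelation)
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0 (negative autocorrelation)
```

Values near 2 are the favored result; values toward 0 or toward 4 indicate that neighbouring residuals move together or in opposite directions.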

21 2. Collinearity (collinearity test):
a. Meaning: It simply indicates the extent to which predictors are highly correlated with one another.
b. Because predictors should NOT be highly related to each other, there should not be collinearity.
c. Measures:
(1) VIF (Variance Inflation Factor): the higher the VIF, the more serious the collinearity and the less appropriate the model.
Cutting Score: VIF > 10: serious collinearity

22 (2) CI (Condition Index): the higher the CI, the more serious the collinearity and the less appropriate the model.
Cutting Scores:
CI around 10: minor collinearity
CI 30-100: moderate collinearity
CI > 100: serious collinearity

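23 d. Meaning: The reason is the same as for the autocorrelation test.
e. Interpretation: Imagine entering two almost identical independent variables into the regression. Whenever the observed values change, one independent variable responds and the other responds along with it. As a result, the b values of the independent variables fluctuate greatly, making the regression model unstable.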
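The VIF can be illustrated for the simplest case of two predictors. The sketch assumes the standard definition VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on the other predictors; with only two predictors, R_j² is just the squared correlation between them. The scores are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model has exactly two."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical predictor scores.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
vif = vif_two_predictors(x1, x2)
print(round(vif, 2))  # 3.08
```

Uncorrelated predictors give a VIF of 1. Under the cutting score above, a VIF of about 3 would not count as serious collinearity, even though these two predictors correlate at roughly .82.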

24 3. Homoscedasticity (variance homogeneity): Predictors must have homoscedasticity.
a. It is a requirement for Pearson r.
b. Two Components:
(1) First Component:
(a) The variance of the X variable around μ_x must be constant regardless of the Y values: the variance of the error term of the independent variable is fixed at a constant.

25 (b) It simply indicates that in Ŷ = b_1 X_1 + b_2 X_2 + ... + a there should be a constant so that, no matter how large the observed values of X_1 and how small the observed values of X_2 are, the differences can be balanced out. Thus, the constant is used to "balance" the original weights of the predictors.
(c) This concept (equal error) is similar to homogeneity for ANOVA/ANCOVA, sphericity for repeated measures, and variance-covariance for MANOVA.

26 (2) Second Component:
(a) The variance of the Y variable around μ_y must be the same regardless of the X values.
(b) This simply means that μ_y must be normalized, because in any normal distribution the unit of its standard deviation, i.e., the standard error, is equal.
(c) This component simply states normality.

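27 c. Example: No matter how high the observed values of the predictor are, the standard error, that is, the standard deviation of the sampling distribution of means, must be the same size with respect to μ_y, the population mean behind the Y variable. Whatever the value of Y, the closer the predicted Ŷ is to Y, the more accurate the model. When Y - Ŷ, that is, the residual, equals 0, the model's predictive accuracy is at its best. However, the observed values of the independent variables may differ greatly because they are measured on different scales, so a constant is needed when predicting Ŷ to adjust for these differences and make the residual Y - Ŷ as small as possible. Without a constant to adjust for them, larger observed values of an independent variable would naturally yield a larger b value.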

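28 Imagine that the smallest observed value of independent variable M is 10,000 and the largest observed value of independent variable W is 10. If their differences with respect to the y value are not set as a constant, the b value of M will naturally be far larger than the b value of W, making the regression model absurd.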

29 d. Measure: Casewise Diagnostics; Standardized Residual; Studentized Residual; Scatterplot of Regression Standardized Residual
Cutting Scores:
(1) Standardized Residual & Studentized Residual: +/- 2 or 3 standard errors
(2) Scatterplot of Regression Standardized Residual: horizontal distribution. This is the scatterplot of standardized residuals against predicted values; if the points form a horizontal, random scatter, the regression model meets the assumption of homoscedasticity.
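A casewise check against the cutting scores above can be sketched as follows. The sketch assumes that a standardized residual is the raw residual divided by the standard error of estimate; studentized residuals, which also take each case's leverage into account, are not computed. The data are invented, with one deliberately deviant case.

```python
import math


def standardized_residuals(xs, ys):
    """Residuals from a simple regression, divided by the standard error of estimate."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    residuals = [y - (b * x + a) for x, y in zip(xs, ys)]
    s_est = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    return [e / s_est for e in residuals]


def flag_cases(z_scores, cutoff=2.0):
    """Zero-based indexes of cases whose standardized residual exceeds the cutoff."""
    return [i for i, z in enumerate(z_scores) if abs(z) > cutoff]


# Invented data: Y = X + 1 for every case except the sixth, which is 8 points too high.
xs = list(range(1, 11))
ys = [2, 3, 4, 5, 6, 15, 8, 9, 10, 11]
z = standardized_residuals(xs, ys)
print(flag_cases(z))        # [5]
print(flag_cases(z, 3.0))   # []
```

The same case is flagged at +/- 2 but not at +/- 3, which shows why the choice of cutting score should be reported.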

30 4. Normality (normality test): Normal P-P Plot of Regression Standardized Residuals (normal probability plot)
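The idea behind a normal P-P plot can be sketched without any plotting library. Each standardized residual contributes one point: its observed cumulative proportion against the cumulative probability expected under a normal distribution. The plotting position (rank - 0.5) / n is an assumption of this sketch, and SPSS may use a slightly different formula. The residuals are hypothetical.

```python
from statistics import NormalDist


def pp_plot_points(z_scores):
    """Pairs of (observed cumulative proportion, expected normal probability)."""
    n = len(z_scores)
    std_normal = NormalDist()             # mean 0, standard deviation 1
    points = []
    for rank, z in enumerate(sorted(z_scores), start=1):
        observed = (rank - 0.5) / n       # plotting position (assumed convention)
        expected = std_normal.cdf(z)
        points.append((observed, expected))
    return points


# Hypothetical standardized residuals.
z_scores = [-1.6, -0.9, -0.4, -0.1, 0.2, 0.5, 0.8, 1.5]
points = pp_plot_points(z_scores)
largest_gap = max(abs(observed - expected) for observed, expected in points)
print(f"largest distance from the diagonal: {largest_gap:.3f}")
```

If the residuals are normally distributed, the points fall close to the diagonal line from (0, 0) to (1, 1), so the largest gap stays small.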

31 1. Scenario: A researcher is interested in knowing whether logical reasoning predicts students' creativity. Thus, the researcher selects 20 high school students and tests their reasoning and creativity.
2. Conditions:
a. Predictor Variable: Reasoning Score
b. Criterion Variable: Creativity
3. Proper Stat Technique: Regression Analysis
4. SPSS Procedures:

32 1. Scenario: A researcher is interested in knowing whether teachers' ages and perceptions of environments predict their reactions to rating.
2. Conditions:
a. Predictor Variables: Ages & Perceptions
b. Criterion Variable: Reaction
3. Proper Stat Technique: Multiple Regression (MR) Analysis
4. SPSS Procedures:

33 3P
PP: Age may have considerable effects on EFL learning.
IP: However, "the younger, the better" may not be true for EFL learners.
PP: It seems more important to prolong EFL learning at the university level than to invest educational effort at the elementary school level.
Literature Review: Focusing on age issues and their effects on EFL learning attainment.

34 1. Broad Research Question (?)
2. Specific Research Questions (?)
3. Studies related to the Method
Method
1. Method Design: Experimental (?)
2. Subjects: 803 college freshmen (Randomly selected?)
3. Grouping Criteria: questionnaire (demographic questionnaire) & telephone interviews.

35 4. Instruments:
a. NJCEE: Reading & Writing (Validity of NJCEE for Reading?)
b. Tunghai Placement Exam (Listening Test): Reliability/Item Difficulty & Discriminability (Validity?)
c. Pronunciation Test: Interrater Reliability
d. Questionnaires: (?)
4. Statistical Techniques: Multiple Regressions

36 a. Criterion Variables: Reading Score, Writing Score, Listening Score, & Pronunciation Score
b. Predictors: age, aptitude, EFEI.. (See p. 426)
c. Age is a significant predictor of listening: p. 426. (Effects of Ages are referred?)
5. Implications:
a. Five rationales for prolonging English education throughout university: p. 428, last paragraph. (Do the data show these implications?)
b. Can MRs be used to pinpoint any causality?

37 c. Is this study well designed?

