Unit 6a: Motivating Principal Components Analysis
© Andrew Ho, Harvard Graduate School of Education (Unit 6a, Slide 1)

Course Roadmap: Unit 6a (Today's Topic Area)
© Andrew Ho, Harvard Graduate School of Education (Unit 6a, Slide 2)

Today's topics:
- Interitem correlations and reliability... and multilevel modeling, revisited (AHHH!)
- Transition to PCA, by VVV: Visualizing Variables as Vectors

The course roadmap branches out from Multiple Regression Analysis (MRA):
- Do your residuals meet the required assumptions? Test for residual normality; use influence statistics to detect atypical data points.
- If your residuals are not independent, replace OLS by GLS regression analysis: use individual growth modeling, or specify a multilevel model.
- If time is a predictor, you need discrete-time survival analysis.
- If your outcome is categorical, you need to use binomial logistic regression analysis (dichotomous outcome) or multinomial logistic regression analysis (polytomous outcome).
- If your outcome vs. predictor relationship is non-linear, use non-linear regression analysis, or transform the outcome or predictor.
- If you have more predictors than you can deal with, create taxonomies of fitted models and compare them, or form composites of the indicators of any common construct: conduct a Principal Components Analysis, use Factor Analysis (EFA or CFA?), or use Cluster Analysis.

Multiple Indicators of a Common Construct
© Andrew Ho, Harvard Graduate School of Education (Unit 6a, Slide 3)

Here's a dataset in which teachers responded to what the investigators believed were multiple indicators/predictors of a single underlying construct of Teacher Job Satisfaction. The data are described in TSUCCESS_info.pdf.

Dataset: TSUCCESS.txt
Overview: Responses of a national sample of teachers to six questions about job satisfaction.
Source: Administrator and Teacher Survey of the High School and Beyond (HS&B) dataset, 1984 administration, National Center for Education Statistics (NCES). All NCES datasets are also available free from the EdPubs on-line supermarket.
Sample size: 5,269 teachers (4,955 with complete data).
More info: HS&B was established to study the educational, vocational, and personal development of young people, beginning in their elementary or high school years and following them over time as they began to take on adult responsibilities. The HS&B survey included two cohorts: (a) the 1980 senior class, and (b) the 1980 sophomore class. Both cohorts were surveyed every two years through 1986, and the 1980 sophomore class was also surveyed again in 1992.

© Andrew Ho, Harvard Graduate School of Education (Unit 6a, Slide 4)

Col 1, X1: "You have high standards of teacher performance." (1 = strongly disagree, 2 = disagree, 3 = slightly disagree, 4 = slightly agree, 5 = agree, 6 = strongly agree)
Col 2, X2: "You are continually learning on the job." (1 = strongly disagree, ..., 6 = strongly agree)
Col 3, X3: "You are successful in educating your students." (1 = not successful, 2 = a little successful, 3 = successful, 4 = very successful)
Col 4, X4: "It's a waste of time to do your best as a teacher." (1 = strongly agree, 2 = agree, 3 = slightly agree, 4 = slightly disagree, 5 = disagree, 6 = strongly disagree)
Col 5, X5: "You look forward to working at your school." (1 = strongly disagree, ..., 6 = strongly agree)
Col 6, X6: "How much of the time are you satisfied with your job?" (1 = never, 2 = almost never, 3 = sometimes, 4 = always)

As is typical of many datasets, TSUCCESS contains multiple variables, or "indicators," that record teachers' responses to the survey items. These multiple indicators are intended to provide teachers with replicate opportunities to report their job satisfaction ("teacher job satisfaction" being the focal "construct" in the research).

To incorporate these multiple indicators successfully into subsequent analysis, whether as outcome or predictor, you must deal with several issues:
1. You must decide whether each of the indicators should be treated as a separate variable in subsequent analyses, or whether they should be combined to form a "composite" measure of the underlying construct of teacher job satisfaction.
2. To form such a composite, you must be able to confirm that the multiple indicators actually "belong together" in a single composite.
3. If you can confirm that the multiple indicators do indeed belong together in a composite, you must decide on the "best way" to form that composite.

Always know your items. Read each one. Take the test.

© Andrew Ho, Harvard Graduate School of Education (Unit 6a, Slide 5)

(This slide repeats the item table from Slide 4.)

Different indicators have different metrics:
i. Indicators X1, X2, X4, and X5 are measured on 6-point scales.
ii. Indicators X3 and X6 are measured on 4-point scales.
iii. Does this matter, and how do we deal with it in the compositing process?
iv. Is there a "preferred" scale length?

Some indicators "point" in a positive direction and some in a negative direction:
i. Notice the coding direction of X4, compared to the directions of the rest of the indicators.
ii. When we composite the indicators, what should we do about this?

Coding indicators on the "same" scale does not necessarily mean that they have the same "value" at the same scale points:
i. Compare scale point "3" for indicators X3 and X6, for instance.
ii. How do we deal with this, in compositing?

Indicators are not created equally:
- Different scales
- Positive or negative wording/direction/"polarity"
- Different variances on similar scales
- Different means on similar scales (difficulty)
- Different associations with the construct (discrimination)

Always know the scale of your items. Score your test.

© Andrew Ho, Harvard Graduate School of Education (Unit 6a, Slide 6)

* Input the raw dataset, name and label the variables and selected values.

* Input the target dataset:
infile X1-X6 using "C:\My Documents\ … \Datasets\TSUCCESS.txt"

* Label the variables:
label variable X1 "Have high standards of teaching"
label variable X2 "Continually learning on job"
label variable X3 "Successful in educating students"
label variable X4 "Waste of time to do best as teacher"
label variable X5 "Look forward to working at school"
label variable X6 "Time satisfied with job"

* Label the values of the variables:
label define lbl1 1 "Strongly Disagree" 2 "Disagree" 3 "Slightly Disagree" ///
    4 "Slightly Agree" 5 "Agree" 6 "Strongly Agree"
label values X1 X2 X5 lbl1
label define lbl2 1 "Strongly Agree" 2 "Agree" 3 "Slightly Agree" ///
    4 "Slightly Disagree" 5 "Disagree" 6 "Strongly Disagree"
label values X4 lbl2
label define lbl3 1 "Not Successful" 2 "Somewhat Successful" ///
    3 "Successful" 4 "Very Successful"
label values X3 lbl3
label define lbl4 1 "Never" 2 "Almost Never" 3 "Sometimes" 4 "Always"
label values X6 lbl4

Standard data-input and indicator-naming statements. Label items descriptively, ideally with item stems/prompts. Make absolutely sure that your item scales are oriented in the same direction: positive should mean something similar on every item.

Look at your data:
- Every row is a person: a person-by-item matrix, a standard data representation in psychometrics.
- Note that we have some missing data.
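These data arrive with X4's polarity already reversed (see Slide 17). When an item does need reversing before compositing, here is a minimal Stata sketch; the variable name X4R and the recode are illustrative assumptions, not the deck's own code:

* Reverse a 1-6 item so that higher values mean more satisfaction (hypothetical X4R):
gen X4R = 7 - X4    // maps 1<->6, 2<->5, 3<->4
label variable X4R "Waste of time to do best as teacher (reversed)"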

Exploratory Data Analysis for Item Responses
© Andrew Ho, Harvard Graduate School of Education (Unit 6a, Slide 7)

Are these items on the same "scale"?
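The slide's item-level graphics are not reproduced in the transcript. A minimal Stata sketch of the corresponding exploratory checks, assuming X1-X6 are in memory:

summarize X1-X6         // means and SDs expose the differing item metrics
tab1 X1-X6              // one-way frequency tables, one per item
histogram X3, discrete  // bar-style histogram suits a 4-point item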

Missing Data and Pairwise Correlations: Pairwise Deletion vs. Casewise/Listwise Deletion
© Andrew Ho, Harvard Graduate School of Education (Unit 6a, Slide 8)

- Diagonals of correlation matrices are always 1 (or left blank). Under pairwise deletion, the n-count on each diagonal is the number of teachers who responded to that item; for X1, it is the number who responded to Question 1.
- Complete data vs. casewise/listwise deletion (keep if NMISSING==0; drop if NMISSING>0). Note the differing n-counts across variables in the full dataset but not in the casewise/listwise-deleted data.
- We'll proceed with listwise deletion here, but keep in mind the assumption that data are missing at random from the population. If missing data are few, no worries. Otherwise, explicitly state your assumptions and your approach, and consider advanced techniques like "multiple imputation."
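A sketch of how the two approaches look in Stata, using the items above:

pwcorr X1-X6, obs   // pairwise deletion: each correlation uses all available pairs; obs reports each pairwise n
corr X1-X6          // listwise deletion: only teachers with complete data on all six items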

Pairwise Correlations and the Argument for a "Construct"
© Andrew Ho, Harvard Graduate School of Education (Unit 6a, Slide 9)

Bivariate correlations estimated under pairwise deletion, with pairwise n in parentheses (the slide also reports the corresponding correlations estimated under listwise deletion, n = 4,955):

       X1           X2           X3           X4           X5
X2   0.55 (5058)
X3   0.16 (5069)  0.16 (5082)
X4   0.21 (5071)  0.23 (5079)  0.30 (5094)
X5   0.25 (5069)  0.27 (5070)  0.36 (5088)  0.45 (5091)
X6   0.19 (5060)  0.22 (5069)  0.44 (5094)  0.40 (5082)  0.55 (5081)

Legend: X1 = Have high standards of teaching; X2 = Continually learning on the job; X3 = Successful in educating students; X4 = Waste of time to do best as teacher; X5 = Look forward to working at school; X6 = Time satisfied with job.

The sample inter-correlations among the indicators are all positive (thankfully!), but of small to moderate magnitude, and they differ widely (unfortunately!).

To justify forming a single composite, you must argue that all indicators measure the same construct:
- Here, the generally positive inter-correlations support a "uni-dimensional" view.
- But the small and heterogeneous values of the indicator inter-correlations also suggest either that there is considerable measurement error in each indicator, or that some or all of the indicators may also measure other, unrelated constructs.
- This is bad news for the "internal consistency" (reliability) of the ultimate composite.

Three Ways of Looking at "Reliability"
© Andrew Ho, Harvard Graduate School of Education (Unit 6a, Slide 10)

Three definitions of reliability:
1. Reliability is the correlation between two sets of observed scores from a replication of a measurement procedure.
2. Reliability is the proportion of "observed score variance" that is accounted for by "true score variance."
3. Reliability is like an average of pairwise interitem correlations, "scaled up" according to the number of items on the test (because averaging over more items decreases error variance).

Three necessary intuitions:
1. Any observed score is one of many possible replications.
2. Any observed score is the sum of a "true score" (the average over all theoretical replications) and an error term.
3. Averaging over replications gives us better estimates of "true" scores by averaging over error terms.

1) Correlation Between Two Replications of a Measurement Procedure
© Andrew Ho, Harvard Graduate School of Education (Unit 6a, Slide 11)

Robert Brennan, reliability guru, likes to use this aphorism: a person with one watch knows what time it is; a person with two watches is never quite sure.

2) Proportion of Observed Score Variance Accounted for by True Score Variance
© Andrew Ho, Harvard Graduate School of Education (Unit 6a, Slide 12)

The reliability of a measure (or composite) is a population parameter that describes how much of the observed variance in the measure (or composite) is actually true variance. With observed score $X = T + E$ (true score plus error),

$\text{reliability} = \frac{\sigma^2_T}{\sigma^2_X} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}$
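A quick numeric illustration (our numbers, not the slide's): if $\sigma^2_T = 0.75$ and $\sigma^2_E = 0.25$, then $\sigma^2_X = 1.00$ and reliability $= 0.75/1.00 = 0.75$, so three-quarters of the observed score variance is true-score signal.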

Interlude: To Standardize or Not to Standardize
© Andrew Ho, Harvard Graduate School of Education (Unit 6a, Slide 13)

For an additive composite of "standardized" indicators: first, each indicator is standardized to a mean of 0 and a standard deviation of 1,

$z_{ki} = \frac{X_{ki} - \bar{X}_k}{SD(X_k)}$

and then the standardized indicator scores are summed:

$C_i = z_{1i} + z_{2i} + \dots + z_{6i}$

For an additive composite of "raw" indicators: each indicator remains in its original metric, and composite scores are the sum of the scores on the raw indicators for each person in the sample:

$C_i = X_{1i} + X_{2i} + \dots + X_{6i}$

where $X_{1i}$ is the raw score of the i-th teacher on the 1st indicator, and so on.
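A minimal Stata sketch of both composites, assuming X1-X6 in memory (the names RAWCOMP, STD1-STD6, and STDCOMP are our illustrative choices):

* Raw composite: sum the six indicators on their original metrics.
gen RAWCOMP = X1 + X2 + X3 + X4 + X5 + X6
* Standardized composite: standardize each item to mean 0, SD 1, then sum.
foreach k of numlist 1/6 {
    egen STD`k' = std(X`k')
}
gen STDCOMP = STD1 + STD2 + STD3 + STD4 + STD5 + STD6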

3) Average Interitem Covariance, Scaled Up
© Andrew Ho, Harvard Graduate School of Education (Unit 6a, Slide 14)

alpha: our straightforward Stata command to obtain Cronbach's alpha, an "internal consistency" estimate of population reliability. Running it on the standardized variables STD1-STD6 (or running it on the unstandardized variables with the std option) gives us "standardized coefficient alpha."

- Recall that covariance is an "unstandardized" correlation, so a covariance computed on standardized variables is a correlation.
- The reported average interitem covariance is thus the straight average of our interitem correlations from Slide 9.

The long-run average of errors is zero, so the correlation between averages will rise, and the proportion of observed score variance that is true-score variance will rise as error variance drops.
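The claim that a covariance of standardized variables is a correlation is easy to verify in Stata (a sketch, assuming the STD1-STD6 variables from the Slide 13 sketch):

corr X1 X2            // the interitem correlation
corr STD1 STD2, cov   // the covariance of the standardized items: the same number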

The Spearman-Brown "Prophecy" Formula
© Andrew Ho, Harvard Graduate School of Education (Unit 6a, Slide 15)

Provided each indicator in a composite is measuring the same underlying construct, the more indicators you include in the composite, the higher the reliability of the composite, because:
- Measurement errors in each indicator are random, and cancel out in the composite.
- Any true variation in each indicator combines and surfaces through the noise.
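The formula itself appears on the slide as an image; its standard form, for a composite of k items with average interitem correlation $\bar{r}$ (equivalently, standardized alpha), is

$\rho_k = \frac{k\,\bar{r}}{1 + (k-1)\,\bar{r}}$

As a rough check (our arithmetic, not the slide's): the pairwise correlations on Slide 9 average about $\bar{r} \approx 0.32$, so a six-item composite prophesies $\rho_6 \approx 6(0.32)/(1 + 5(0.32)) \approx 0.74$.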

Three Ways of Looking at "Reliability" (Recap)
© Andrew Ho, Harvard Graduate School of Education (Unit 6a, Slide 16)

This slide repeats Slide 10's three definitions of reliability and three necessary intuitions.

A Baseline Reliability Analysis
© Andrew Ho, Harvard Graduate School of Education (Unit 6a, Slide 17)

- We use the unstandardized items but include the std option, to standardize.
- Casewise deletion leads to 4,955 observations across all items.
- All items carry positive signage because we already reversed the polarity of X4.
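A sketch of the baseline command (the item option is our addition; it prints each item's item-rest correlation and the alpha that would result if the item were dropped):

alpha X1-X6, std item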

Reliability from a Multilevel Modeling Perspective: Reshaping Data
© Andrew Ho, Harvard Graduate School of Education (Unit 6a, Slide 18)

- The data in wide format: every participant is a row; every item is a column.
- The data in long format: every item score is a row; a single column holds all score replications.

Think it might be possible to consider teachers as grouping variables for item scores? xtset ID?
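A minimal reshape sketch, assuming a teacher identifier ID and items X1-X6 (the index name item is our choice):

reshape long X, i(ID) j(item)   // wide -> long: one row per teacher-item response, scores stacked in X
xtset ID                        // declare teachers as the grouping (panel) variable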

Reliability from a Multilevel Modeling Perspective: Intraclass Correlation
© Andrew Ho, Harvard Graduate School of Education (Unit 6a, Slide 19)
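The slide's model output is not reproduced in the transcript. One way such an intraclass correlation might be estimated in Stata (a sketch of the approach, not the slide's own commands): fit a random-intercept model of item scores nested within teachers, then ask for the ICC, the share of score variance attributable to between-teacher differences.

* After reshaping to long format (see the Slide 18 sketch):
mixed X || ID:   // random-intercept model: item scores nested within teachers
estat icc        // ICC = between-teacher variance / (between + within)

On this reading, the ICC plays the role of a single item's reliability, which the Spearman-Brown formula (Slide 15) would scale up to the reliability of the six-item composite.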