CHAPTER 5 Test Scores as Composites
This chapter is about the quality of the items in a test.
Test Scores as Composites
What is a composite test score? A composite test score is a total test score created by summing two or more subtest scores. For example, the WAIS-IV Full Scale IQ is the sum of four index scores: (1) Verbal Comprehension, (2) Perceptual Reasoning, (3) Working Memory, and (4) Processing Speed. Doctoral qualifying examinations and the EPPP are also composite test scores.
Item Scoring Schemes or Systems
There are two different scoring systems:
1. Dichotomous scores: restricted to 0 and 1, such as scores on true-false and multiple-choice questions.
2. Non-dichotomous scores: not restricted to 0 and 1; they can take a range of possible points (1, 2, 3, 4, 5, ...), as in essays.
Dichotomous Scheme Examples
1. The space between nerve cell endings is called the
a. Dendrite
b. Axon
c. Synapse
d. Neutron
(In this item, responses a, b, and d are scored 0; response c is scored 1.)
2. Teachers in public school systems should have the right to strike.
a. Agree
b. Disagree
(In this item, Agree is scored 1; Disagree is scored 0. A True/False format could also be used.)
(A scoring sketch follows.)
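To make the scheme concrete, here is a minimal Python sketch of dichotomous scoring; the answer key and responses are hypothetical, not from the chapter:

```python
# Hypothetical answer key for the two items above: item 1 keys "c";
# item 2 scores "agree" as 1 (attitude items have no "correct" answer).
ANSWER_KEY = {1: "c", 2: "agree"}

def score_item(item: int, response: str) -> int:
    """Dichotomous scoring: 1 if the response matches the key, else 0."""
    return 1 if response.lower() == ANSWER_KEY[item] else 0

print(score_item(1, "c"))      # 1
print(score_item(1, "d"))      # 0
print(score_item(2, "Agree"))  # 1
```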
Practical Implications for Test Construction
Variance and covariance measure the quality of the items in a test; reliability and validity measure the quality of the entire test. σ² = SS/N (computed on a single set of data). Variance is the degree to which scores vary about the mean.
Practical Implications for Test Construction
Correlation is based on a statistic called covariance (COVxy or Sxy). COVxy = SP/(N − 1), computed on two sets of data. Covariance is a number that reflects the degree to which two variables vary together. r = SP/√(SSx·SSy).
Variance
Population: σ² = SS/N
Sample: s² = SS/(n − 1) = SS/df
SS = Σx² − (Σx)²/N, or SS = Σ(x − μ)² (the sum of squared deviations from the mean)
Covariance
Covariance is a number that reflects the degree to which two variables vary together (original data: X and Y).
Covariance
COVxy = SP/(N − 1)
Two ways to calculate SP:
SP = Σxy − (Σx·Σy)/N
SP = Σ(x − μx)(y − μy)
SP requires two sets of data; SS requires only one set of data.
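As an illustration, a small Python sketch of these formulas on made-up X and Y data:

```python
import math

X = [2, 4, 5, 7]   # hypothetical data, not from the slides
Y = [3, 5, 4, 8]
N = len(X)

# SS = Sum(x^2) - (Sum(x))^2 / N  (one set of data)
SSx = sum(x * x for x in X) - sum(X) ** 2 / N
SSy = sum(y * y for y in Y) - sum(Y) ** 2 / N
# SP = Sum(xy) - (Sum(x) * Sum(y)) / N  (two sets of data)
SP = sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y) / N

variance_x = SSx / N            # population variance, SS/N
cov_xy = SP / (N - 1)           # COVxy = SP/(N - 1)
r = SP / math.sqrt(SSx * SSy)   # r = SP / sqrt(SSx * SSy)
print(variance_x, cov_xy, round(r, 3))  # 3.25 4.0 0.889
```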
Descriptive Statistics for Dichotomous Data
Total Score Variance
As the proportion of total variance attributable to true score variance increases, the degree of reliability increases.
Descriptive Statistics for Dichotomous Data: Item Variance & Covariance
Descriptive Statistics for Dichotomous Data
P = item difficulty: P = (number of examinees who answered the item correctly) / (total number of examinees), i.e., P = f/N (see handout). The higher the P value, the easier the item.
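A brief sketch of the computation, using a hypothetical 0/1 response matrix:

```python
# Rows = examinees, columns = items, entries = dichotomous scores (0/1).
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]
N = len(responses)  # total number of examinees

# P = f/N for each item: the proportion who answered it correctly.
P = [sum(item_scores) / N for item_scores in zip(*responses)]
print(P)  # [0.75, 0.75, 0.25] -- the higher the P, the easier the item
```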
Relationship between Item Difficulty P and σ²
[Figure: item variance σ² (quality) plotted against item difficulty P, from difficult (P near 0) to easy (P near 1); variance peaks at medium difficulty.]
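The plotted relationship follows from the variance of a dichotomously scored item, σ² = p(1 − p); a quick check:

```python
# Variance of a 0/1 item is p * (1 - p): near zero for very hard (p ~ 0)
# or very easy (p ~ 1) items, and maximal (0.25) at p = .50.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"P = {p:.1f}  item variance = {p * (1 - p):.2f}")
```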
Non-dichotomous Score Examples
1. Write a grammatically correct German sentence using the first person singular form of the verb verstehen. (A maximum of 3 points may be awarded, and partial credit may be given.)
2. An intellectually disabled person is a nonproductive member of society.
5. Strongly agree
4. Agree
3. No opinion
2. Disagree
1. Strongly disagree
(Scores can range from 1 to 5 points, with high scores indicating a positive attitude toward intellectually disabled citizens.)
Descriptive Statistics for Non-dichotomous Variables
Variance of a Composite “σ²C”
σ² = SS/N, so σ²a = SSa/Na and σ²b = SSb/Nb, and σ²C = σ²a + σ²b.
Ex. from the WAIS-III: FSIQ = VIQ + PIQ.
With more than 2 subtests: σ²C = σ²a + σ²b + σ²c + …
Calculate the variance for each subtest and add them up. (Ex. next.)
Calculate the composite variance for this test and the next one, using σ²C = σ²a + σ²b.
Calculate the composite variance for this test and the previous one, using σ²C = σ²a + σ²b.
Variance of a Composite “σ²C”
With more than 2 subtests, e.g., the WAIS-IV Full Scale IQ, which consists of (a) the Verbal Comprehension Index, (b) the Perceptual Reasoning Index, (c) the Working Memory Index, and (d) the Processing Speed Index: σ²C = σ²a + σ²b + σ²c + σ²d.
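A minimal sketch of this additive rule, with hypothetical subtest variances (the slides' formula simply sums the subtest variances; covariance terms among correlated subtests are left aside here, as in the slides):

```python
# Hypothetical variances for the four WAIS-IV index scores.
subtest_variances = {"VCI": 2.0, "PRI": 3.0, "WMI": 4.0, "PSI": 3.5}

# Composite variance = sum of subtest variances, per the slide's formula.
composite_variance = sum(subtest_variances.values())
print(composite_variance)  # 12.5
```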
*Suggestions to Increase the Total Score Variance of a Test
1. Increase the number of items in the test.
2. Keep item difficulties (p) in the medium range.
3. Items with similar content have higher correlations and higher covariances.
4. Item score and total score variances alone are not indices of test quality (reliability and validity).
*1. Increase the Number of Items in a Test (How to Calculate the Test Variance)
The variance of a 25-item test is higher than the variance of a 20-item test.
σ²test = N·σ²x + N(N − 1)·COVx, where
COVx = the average item covariance (0.10 in this example)
σ²x = the average item variance (0.20)
N = the number of items in the test
First try N = 20: the test variance is 42. Then try N = 25: the test variance is 65.
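A worked version of the formula reproduces the slide's numbers:

```python
def test_variance(n_items, item_var=0.20, item_cov=0.10):
    """Test variance = N*item_var + N*(N - 1)*item_cov, with the slide's
    average item variance (0.20) and average item covariance (0.10)."""
    return n_items * item_var + n_items * (n_items - 1) * item_cov

print(test_variance(20))  # 42.0
print(test_variance(25))  # 65.0 -- more items, larger total score variance
```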
2. Item Difficulties
Item difficulties should be roughly equal across items, and difficulty levels should be in the medium range.
3. Items with Similar Content Have Higher Correlations & Higher Covariances
4. Item Score & Total Score Variances Alone Are Not Indices of Test Quality
Variance and covariance are important and necessary; however, they are not sufficient to determine test quality. To establish a higher standard of test quality, we use reliability and validity.
UNIT II RELIABILITY CHAP 6: RELIABILITY AND THE CLASSICAL TRUE SCORE MODEL CHAP 7: PROCEDURES FOR ESTIMATING RELIABILITY CHAP 8: INTRODUCTION TO GENERALIZABILITY THEORY CHAP 9: RELIABILITY COEFFICIENTS FOR CRITERION-REFERENCED TESTS
CHAPTER 6 Reliability and the Classical True Score Model
Reliability (ρ): a measure of consistency or dependability; a test measures the same thing more than once and yields the same outcome (e.g., a bathroom scale). Reliability refers to the consistency of examinees' performance over repeated administrations of the same test or parallel forms of the test (Linda Crocker text).
The Contemporary Model
*Types of Reliability
Test-Retest (2 administrations): a measure of stability over time. Administer the same test/measure at two different times to the same group of participants. Coefficient: r(test1·test2). Ex. IQ test.
Parallel/Alternate (Interitem/Equivalent) Forms (2 administrations): a measure of equivalence. Administer two different forms of the same test to the same group of participants. Coefficient: r(testA·testB). Ex. stats test.
Test-Retest with Alternate Forms: a measure of stability and equivalence. On Monday, administer Form A to the first half of the group and Form B to the second half; on Friday, administer Form B to the first half and Form A to the second half.
Inter-Rater (1 administration): a measure of agreement. Have two raters rate behaviors, then determine the amount of agreement between them. Coefficient: percentage of agreement.
Internal Consistency (1 administration): a measure of how consistently each item measures the same underlying construct (e.g., depression). Correlate performance on each item with overall performance across participants. Coefficients: Cronbach's alpha, Kuder-Richardson, split-half, and Hoyt's methods.
Test-Retest: Class IQ Scores
[Table: students John, Jo, Mary, Kathy, and David, with X = scores from the 1st administration (Monday) and Y = scores from the 2nd administration (Friday).]
Parallel/Alternate Forms
[Table: scores of John, Jo, Mary, Kathy, and David on two forms (A and B) of a stats test.]
Test-Retest with Alternate Forms
On Monday, administer Form A to the first half of the group and Form B to the second half. On Friday, administer Form B to the first half and Form A to the second half.
Form A, 1st group (Mon): David, Mary, Jo, John, Kathy
Form B, 2nd group (Mon): Mark, Jane, George, Mona, Maria
(Continued on the next slide.)
Test-Retest with Alternate Forms
On Friday, administer Form B to the first half of the group and Form A to the second half.
Form B, 1st group (Fri): David, Mary, Jo, John, Kathy
Form A, 2nd group (Fri): Mark, Jane, George, Mona, Maria
B. Inter-Rater Reliability
Inter-rater reliability is a measure of consistency from rater to rater: the degree of agreement between raters.
Procedures Requiring 1 Test Administration
B. Inter-Rater Reliability
[Table: items scored by Rater 1 and Rater 2.] First compute r(rater1·rater2), then multiply by 100.
Procedures Requiring 1 Test Administration
B. Inter-Rater Reliability with more than 2 raters (Raters 1, 2, and 3):
Calculate r for raters 1 & 2 = .6
Calculate r for raters 1 & 3 = .7
Calculate r for raters 2 & 3 = .8
Mean = .7; × 100 = 70% (see the sketch below)
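The same procedure as a Python sketch; the ratings are hypothetical, and r uses the chapter's SP/√(SSx·SSy) formula:

```python
from itertools import combinations
from statistics import mean

def pearson_r(x, y):
    """r = SP / sqrt(SSx * SSy), as defined in Chapter 5."""
    n = len(x)
    sp = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    ssx = sum(a * a for a in x) - sum(x) ** 2 / n
    ssy = sum(b * b for b in y) - sum(y) ** 2 / n
    return sp / (ssx * ssy) ** 0.5

# Hypothetical ratings of the same five behaviors by three raters.
ratings = {1: [4, 3, 5, 2, 4], 2: [4, 2, 5, 3, 4], 3: [5, 3, 4, 2, 4]}

# Average r over all rater pairs, then express it as a percentage.
rs = [pearson_r(ratings[a], ratings[b]) for a, b in combinations(ratings, 2)]
print(f"mean r = {mean(rs):.2f} -> {mean(rs) * 100:.0f}% agreement")
```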
How Reliability Is Measured
Reliability is measured using a correlation coefficient, r(test1·test2) or r(x·y).
Reliability coefficients indicate how scores on one test change relative to scores on a second test. They can range from 0.00 (no reliability) to 1.00 (perfect reliability).
The Classical Model
A Conceptual Definition of Reliability: The Classical Model
Observed Score = True Score ± Error Score, i.e., X = T ± E, where the error comes from two sources: method error and trait error.
Classical Test Theory: The Observed Score, X = T + E
X is the score you actually record or observe on a test.
The True Score, T = X − E: the difference between the observed score and the error score. T reflects the examinee's true knowledge.
The Error Score, E = X − T: the difference between the observed score and the true score. E comprises the factors that cause the true score and the observed score to differ.
A Conceptual Definition of Reliability
Observed Score (X): the score that is actually observed; X = T ± E. It consists of two components, the true score and the error score.
A Conceptual Definition of Reliability
True Score (T): T = X − E. A perfect reflection of the true value for an individual; a theoretical score.
A Conceptual Definition of Reliability
Method error is due to characteristics of the test or testing situation; trait error is due to characteristics of the individual. Conceptually,
Reliability = True Score / Observed Score = True Score / (True Score + Error Score)
The reliability of the observed score rises as error is reduced.
A Conceptual Definition of Reliability (cont.)
Error Score (E): E = X − T, the difference between the observed and true scores. X = T ± E; e.g., 95 = 90 + 5 or 85 = 90 − 5. The difference between T and X is 5 points, so E = ±5.
The Classical True Score Model
X = T ± E
X = the observed test score
T = the individual's true score (true knowledge)
E = the random error component
*Classical Test Theory
What makes up the error score? E = X − T, and the error score consists of (1) method error and (2) trait error.
1. Method error: the difference between true and observed scores resulting from the test or the testing situation.
2. Trait error: the difference between true and observed scores resulting from characteristics of the examinee. (See next slide.)
What Makes Up the Error Score?
Expected Value of True Score
Definition of the True Score: the true score is defined as the expected value (the mean of the observed scores) of an examinee's test scores over many repeated testings with the same test.
Definition of the Error Score
The expected value of an examinee's error scores over many repeated testings is zero:
E(Ej) = E(Xj) − Tj = Tj − Tj = 0
E(Ej) = the expected value of the error; Tj = the examinee's true score. (Ex. next.)
Error Score
X − E = T: the difference between the observed score and the error score is the true score (scores from the same examinee). Writing X ± E = T:
88 + 2 = 90
80 + 10 = 90
100 − 10 = 90
Each observed score returns the same true score of 90, and the error scores average to 0 over repeated testings.
*Increasing the Reliability of a Test (i.e., Decreasing Error): 7 Steps
1. Increase the sample size (n).
2. Eliminate unclear questions.
3. Standardize testing conditions.
4. Moderate the degree of difficulty of the tests (P).
5. Minimize the effects of external events.
6. Standardize instructions (directions).
7. Maintain consistent scoring procedures (use a rubric).
*Increasing the Reliability of Your Items in a Test
ρ = T/(T + E) = T/X
*Increasing Reliability (cont.)
How Reliability (p) is Measured for an Item/score
ρ = True Score / (True Score + Error Score), or ρ = T/(T + E), where 0 ≤ ρ ≤ 1.
Note: in this formula the error (the difference between T and X) is always added to the true score in the denominator as a positive quantity, whether E is positive or negative: ρ = T/(T + |E|).
Which Item Has the Highest Reliability?
ρ = T/(T + |E|), where the denominator is the maximum number of points for the question:
E = +2, T = 8: 8/10 = 0.80
E = −3, T = 6: 6/9 ≈ 0.67
E = +7, T = 1: 1/8 = 0.125
E = −1, T = 9: 9/10 = 0.90
E = +4, T = 6: 6/10 = 0.60
E = −4, T = 6: 6/10 = 0.60
E = +1, T = 7: 7/8 = 0.875
E = 0, T = 10: 10/10 = 1.00
E = −5, T = 4: 4/9 ≈ 0.444
E = +6, T = 3: 3/9 ≈ 0.333
The greater the error, the less reliable the item.
How Classical Reliability (ρ) Is Measured for a Test
X = T + E; ρ = T/X for an essay item/score.
Examinees: X1 = t1 + e (ex. 10 = 7 + 3), X2 = t2 + e (ex. 8 = …), X3 = t3 + e (ex. 6 = …).
Then calculate σ²X = 4 and σ²T = 2.33.
How Classical Reliability (ρ) Is Measured for a Test
Reliability coefficient for all items: ρ(x1x2) = σ²T/σ²X.
For the previous example: ρ(x1x2) = 2.33/4.00 = 0.58. In general, ρk = σ²T/σ²X.
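A small simulation of this computation. The slide leaves two of the true scores blank; the T values below are one possibility consistent with its totals (σ²X = 4, σ²T = 2.33):

```python
from statistics import variance  # sample variance, SS/(n - 1)

T = [7, 6, 4]                      # assumed true scores (only t1 = 7 is given)
E = [3, 2, 2]                      # implied error scores
X = [t + e for t, e in zip(T, E)]  # observed scores: [10, 8, 6]

print(variance(X))                          # 4.0   = variance of X
print(round(variance(T), 2))                # 2.33  = variance of T
print(round(variance(T) / variance(X), 2))  # 0.58  = reliability coefficient
```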
How the Reliability Coefficient (ρ) Is Measured for a Test
[Table: each examinee's T, E, and X scores, with T ± E = X.]
ρ = σ²T/σ²X = 9.64/19.554 ≈ 0.493
Reliability Coefficient (ρ) for Parallel Test Forms
The reliability coefficient (ρ) is the correlation between scores on parallel test forms. (Next slide.)
Scores on Parallel Test Forms
[Table: eight examinees' scores on Test A (X) and Test B (Y); the Test B scores are 89, 86, 83, 87, 85, 80, 83, and 91.]
r = SP/√(SSx·SSy) = 0.882
*Reliability Coefficient and Reliability Index
The item-reliability index provides a measure of the test’s internal consistency.
*Reliability Coefficient and Reliability Index
Reliability coefficient: ρ(x1x2) = σ²T/σ²X. Reliability index: ρ(xt) = σT/σX.
Therefore ρ(x1x2) = (ρ(xt))², or ρ(xt) = √ρ(x1x2), just like the relationship between σ² and σ.
The higher the item-reliability index, the higher the internal consistency of the test.
*Reliability Coefficient and Reliability Index
ρ(x1x2) = σ²T/σ²X. The reliability coefficient is the correlation coefficient that expresses the degree of reliability of a test.
ρ(xt) = σT/σX. The reliability index is the correlation coefficient that expresses the degree of relationship between the true (T) and observed (X) scores of a test. It is the square root of the reliability coefficient.
Reliability of a Composite (C = a + b + … + k)
Two ways to determine/predict the reliability of composite test scores:
*1. The Spearman-Brown prophecy formula: allows us to estimate the reliability of a composite of parallel tests when the reliability of one of those tests is known. (Ex. next.)
*2. Cronbach's alpha (α), or coefficient alpha.
*Next week: the split-half reliability method, which is the Spearman-Brown prophecy formula with K = 2.
*1. The Spearman-Brown Prophecy Formula
ρcc' = K·ρxx' / (1 + (K − 1)·ρxx'), where K is the number of parallel components (or the factor by which test length changes) and ρxx' is the reliability of one component.
If N (or K) = 2, this is the split-half reliability method, used to measure internal-consistency reliability (see next chapter). The effect of changing test length can also be estimated with the Spearman-Brown prophecy formula, just as increasing the number of items in a test increases its variance (Chapter 5).
*The Spearman-Brown Prophecy Formula is used for:
a. Correcting for one half of the test by estimating the reliability of the whole test.
b. Determining how many additional items are needed to raise reliability to a given level.
c. Determining how many items can be eliminated without reducing reliability below a predetermined level.
(A sketch of uses a and b follows.)
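A minimal Python sketch of uses (a) and (b); the half-test reliability of .70 and the target of .90 are hypothetical values, not from the slides:

```python
def spearman_brown(rho, k):
    """Projected reliability when test length changes by factor k:
    rho_kk = k * rho / (1 + (k - 1) * rho)."""
    return k * rho / (1 + (k - 1) * rho)

# (a) Split-half correction: one half has r = .70; the whole test has k = 2.
print(round(spearman_brown(0.70, 2), 3))   # 0.824

# (b) Length factor k needed to reach a target reliability,
#     from solving the formula above for k.
def length_factor(rho, target):
    return target * (1 - rho) / (rho * (1 - target))

print(round(length_factor(0.70, 0.90), 2))  # 3.86 -- nearly 4x the items
```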
Reliability of a Composite (C = a + b + … + k)
*2. Cronbach's alpha (α), or coefficient alpha, is the preferred statistic. It allows us to estimate the reliability of a composite when we know the composite score variance and/or the covariances among all its components. (Next slide.)
Reliability of a Composite (C = a + b + … + k)
*2. Cronbach's alpha (α): α = ρcc' = (K/(K − 1))·(1 − Σσ²i/σ²C)
K = number of tests = 3
σ²i = the variance of each test (σ²ta = 2, σ²tb = 3, σ²tc = 4)
σ²C = composite score variance = 12
α = (3/2)·(1 − 9/12) = 0.375
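A direct computation of the example above, using the slide's numbers:

```python
K = 3                            # number of component tests
component_variances = [2, 3, 4]  # variance of each component test
composite_variance = 12          # composite score variance

# alpha = (K / (K - 1)) * (1 - sum of component variances / composite variance)
alpha = (K / (K - 1)) * (1 - sum(component_variances) / composite_variance)
print(alpha)  # 0.375
```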
The Standard Error of Measurement (σE or σM)
The standard error of measurement is a tool used to estimate or infer how far an observed score (X) deviates from a true score (T). It is the mean of the standard deviations (σ) of the errors (E = X − T) made by several examinees over repeated testings. (Next slide.)
The Standard Error of Measurement (σE or σM)
The SEM is the mean of the standard deviations (σ) of all the errors made by several examinees.
[Table: errors on four repeated tests, e.g., E = 95 − 90 = 5, E = 85 − 86 = −1, E = 90 − 95 = −5, E = 95 − 93 = 2; the σ of each test's errors is found (σ4 = 1.29), and their mean is 5.9/4 ≈ 1.47.]
σM = 1.47
*The Standard Error of Measurement (σE)
1. Find the σ of the errors (E) across all of the examinees' tests.
2. The mean of these σs is called the standard error of measurement.
σE = σx·√(1 − ρxx'), where ρxx' = r = the reliability coefficient (use ρx1x2 for parallel tests) and σx = the standard deviation of the set of observed scores (X).
*The Standard Error of Measurement (σE)
The SEM is a tool used to estimate or infer how far an observed score (X) deviates from a true score (T).
σE = σx·√(1 − ρxx'), with ρxx' = ρx1x2 = .91 for parallel tests and σx = 10, so σE = 10·√(1 − .91) = 3. (Next slide; a sketch of the computation follows.)
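A quick check of that arithmetic in Python (σx = 10 is the value implied by the slide's r = .91 and σE = 3):

```python
import math

r = 0.91        # reliability coefficient (parallel-forms correlation)
sigma_x = 10    # standard deviation of the observed scores (implied value)

sigma_e = sigma_x * math.sqrt(1 - r)  # SEM = sigma_x * sqrt(1 - r)
print(round(sigma_e, 2))              # 3.0 -- about 3 score points of error
```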
The Standard Error of Measurement (σE)
This means that the average difference between the true scores (T) and observed scores (X) is 3 points across all examinees; this average is the standard error of measurement. The standard error of measurement is inversely related to reliability: as σE increases, reliability decreases.
*Factors that Affect Reliability Coefficients
1. Group Homogeneity 2. Test length 3. Time limit
*Factors That Affect Reliability Coefficients
1. Group homogeneity: if a sample of examinees is highly homogeneous on the construct being measured, the reliability estimate will be lower than if the sample were more heterogeneous.
2. Test length: longer tests are more reliable than shorter tests. The effect of changing test length can be estimated with the Spearman-Brown prophecy formula.
3. Time limit: when a test has a rigid time limit, some examinees finish and others do not, which artificially inflates the test's reliability coefficient.