
1  Chapter 5: What is a Good Test? (©2013, The McGraw-Hill Companies, Inc. All Rights Reserved)

2  Chapter Objectives
After completing this chapter, you should be able to
1. Describe criterion-referenced and norm-referenced measurement and state when it is appropriate to use each.
2. Define validity and validity coefficient and describe how the validity coefficient is determined.
3. Define the three types of validity evidence for norm-referenced tests, provide examples of each type, and describe how each may be estimated.

3  Chapter Objectives
4. Describe how the criterion-referenced validity of a test can be determined through the use of behavioral objectives or testing prior to and after instruction.
5. Describe how domain-referenced validity and decision validity are used to determine criterion-referenced validity.
6. Define reliability and describe the four methods for estimating the reliability of norm-referenced tests.

4  Chapter Objectives
7. Define objectivity and describe how it may be estimated.
8. Describe the features of administrative feasibility that should be considered when selecting or constructing a test.

5  Criterion-Referenced Measurement
* Used when individuals are expected to perform at a specific level of achievement.
* An individual's level of performance is not compared with the performance of others.
* A minimum level of performance is referred to as criterion behavior.
* Behavioral objectives are used to describe the expected level of performance.

6  Criterion-Referenced Measurement
Examples:
1. To meet the good health standard, a twelve-year-old male should have a body fat percentage no greater than 25 percent.
2. For successful completion of a running fitness program, the individual must be able to run 2 miles in 14 minutes or less.
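A minimal Python sketch of how such criterion-referenced checks work: each score is compared against a fixed standard rather than against other people's scores. The standards below are the illustrative values from the two examples above, not official norms.

```python
# Criterion-referenced checks: pass/fail against a fixed standard.
# The default cutoffs are the illustrative values from the slide, not official norms.

def meets_body_fat_standard(percent_fat: float, max_percent: float = 25.0) -> bool:
    """Pass if body fat does not exceed the criterion (e.g., 25% for a 12-year-old male)."""
    return percent_fat <= max_percent

def meets_run_standard(minutes_for_2_miles: float, max_minutes: float = 14.0) -> bool:
    """Pass if the 2-mile run time is at or under the criterion time."""
    return minutes_for_2_miles <= max_minutes

if __name__ == "__main__":
    print(meets_body_fat_standard(22.5))   # True: 22.5% is within the 25% standard
    print(meets_run_standard(15.2))        # False: 15.2 minutes exceeds the 14-minute standard
```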

7  Criterion-Referenced Measurement
* May be used to determine grades; standards must be well planned.
* Limitations: if a pass/fail standard is used, it does not show how good or poor an individual's level of ability is; too often the standard for success is arbitrarily set.
* There are occasions when criterion-referenced measurement is appropriate, e.g., health-related physical fitness tests.

8  Norm-Referenced Measurement
* Used when you wish to compare an individual's performance on a test with the performance of other individuals.
* Comparison is done with the use of norms. Examples: z-scores, T-scores, percentiles.
* Norms may be used to establish criterion-referenced standards.
* Norms are usually reported by gender, weight, height, age, or grade level.
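As a rough illustration of how two of these norm-based scores are computed, here is a short Python sketch that converts a raw score to a z-score and a T-score using a hypothetical norm group's mean and standard deviation; the data are invented for the example.

```python
import statistics

def z_score(raw, norm_mean, norm_sd):
    """Standard score: how many standard deviations the raw score lies above the norm-group mean."""
    return (raw - norm_mean) / norm_sd

def t_score(raw, norm_mean, norm_sd):
    """T-score: the z-score rescaled to a mean of 50 and a standard deviation of 10."""
    return 50 + 10 * z_score(raw, norm_mean, norm_sd)

# Hypothetical norm-group scores (e.g., sit-ups) for one age group.
norms = [28, 31, 35, 36, 40, 42, 44, 47, 50, 52]
mean, sd = statistics.mean(norms), statistics.stdev(norms)

print(round(z_score(45, mean, sd), 2))  # positive value: above the norm-group average
print(round(t_score(45, mean, sd), 1))  # the same standing expressed as a T-score
```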

9  Norm-Referenced Measurement
The following factors must be considered when using norms:
1. The sample size used to determine the norms (more confidence in a large sample).
2. The population used to determine the norms (age and experience).
3. The date the norms were established.

10  Validity
* The most important criterion to consider when evaluating a test.
* Traditionally, validity refers to the degree to which a test actually measures what it claims to measure. More precisely, it refers to the agreement between what the test measures and the performance, skill, or behavior the test is designed to measure. This means validity is specific to a particular use and group.

11  Validity
* Evidence to support validity is reported as a validity coefficient.
* The coefficient is determined through a correlation technique.
* The closer the coefficient is to +1.00, the more valid the test.
* A test used as a substitute for another validated test should have a validity coefficient of 0.80 or higher.
* Predictive tests with validity coefficients of 0.60 have been accepted.
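A validity coefficient of this kind is typically a Pearson correlation between scores on the test and scores on the criterion measure. A small Python sketch, using invented data and assuming SciPy is available:

```python
from scipy.stats import pearsonr

# Invented data: scores on a new skills test and on an already-validated criterion test.
new_test  = [12, 15, 14, 18, 20, 22, 25, 24, 28, 30]
criterion = [40, 44, 43, 50, 55, 58, 62, 60, 66, 70]

r, p_value = pearsonr(new_test, criterion)
print(f"validity coefficient r = {r:.2f}")
# A coefficient of 0.80 or higher would usually be expected before substituting
# the new test for the validated criterion test.
```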

12  Validity of Criterion-Referenced Tests
* Directly related to predetermined behavioral objectives.
* Objectives must be stated in a clear, exact manner and be limited to small segments of instruction.
* Test items should be constructed to parallel the behavioral objectives.
* Several test items are written for each objective; validity is estimated by how well they measure the behavioral objective.

13  Validity of Criterion-Referenced Tests
* C-R validity may also be determined by testing prior to and after instruction; validity is accepted if there is significant improvement after instruction or if the behavioral objectives are mastered by an acceptable number of individuals.
* The success of C-R testing depends on the predetermined standard of success; it must be realistic, but high enough to require individuals to develop skill.

14  Validity of Criterion-Referenced Tests
Domain-referenced validity evidence - a technique used to validate C-R tests
* The word domain is used to represent the criterion behavior.
* If the test items represent the criterion behavior, the test has logical validity (referred to as domain-referenced validity).
Example:
1. The topspin tennis serve technique is analyzed.
2. The most important components of serve form are included in the criterion behavior.
3. Successful performance is defined - form, number of successful serves out of attempted serves, and placement and speed of serves.

15  Validity of Criterion-Referenced Tests
Decision validity
* Used to validate C-R tests when a test's purpose is to classify individuals as proficient or nonproficient.
* A cutoff score is identified, and individuals scoring above the cutoff score are classified as proficient.

16  Validity of Norm-Referenced Tests
* Three types of validity evidence are reported for norm-referenced tests. The validity of a test is better accepted if more than one type of strong validity evidence is reported.

17  Validity of Norm-Referenced Tests
Content Validity
* Related to how well a test measures all of the skills and subject matter that have been presented to individuals.
* To have content validity, a test must be related to the objectives of the class, presentation, etc. (that for which the group is responsible).
* The longer the test, the easier it is to have content validity.
* A realistic test (sample test) must represent the total content of a longer test.

18  Validity of Norm-Referenced Tests
Content Validity
* Skills tests must have content validity also.
* Ask yourself: Does the test measure what the group has been taught?
* Content validity evidence may be provided through the use of experts in the area that you are testing.

19  Validity of Norm-Referenced Tests
Content validity evidence is sometimes called logical or face validity. When possible, it is best to use content validity evidence together with other types of validity evidence.

20  Validity of Norm-Referenced Tests
Criterion Validity Evidence
Indicated by how well test scores correlate with a specific criterion (successful performance). May be subdivided into predictive validity (future performance) and concurrent validity (current performance).

21  Validity of Norm-Referenced Tests
Predictive Validity Evidence
* Used to estimate future performance.
* Generally, a predictor test is given and correlated with a criterion measure (a variable that has been defined as indicating successful performance of a trait).
* The criterion measure is obtained at a later date; this may be after an instructional unit or after a period of development.

22  Validity of Norm-Referenced Tests
Predictive Validity Evidence
* Successful performance is sometimes difficult to define.
* One method of determining successful performance is through a panel of experts; the experts' ratings are correlated with performance on the test.
* The SAT and ACT are predictors of success in college; the criterion measure is success in college.

23  Validity of Norm-Referenced Tests
Concurrent Validity
* Immediate predictive validity; indicates how well an individual currently performs a skill.
* Test results are correlated with a current criterion measurement.
* The test and the criterion measurement are administered at approximately the same time; this procedure is often used to estimate validity.

24  Validity of Norm-Referenced Tests
The choice of criterion measure is an important consideration in the estimation of criterion validity evidence (predictive and concurrent validity). Three criterion measures are used most often:
1. Expert ratings
2. Tournament play
3. Previously validated tests

25  Validity of Norm-Referenced Tests
Construct Validity Evidence
* Refers to the degree to which an individual possesses a trait (construct) presumed to be reflected in test performance.
* Anxiety, intelligence, and motivation are constructs.
* Examples: cardiovascular fitness and tennis skills.
* Construct validity can be demonstrated by comparing higher-skilled individuals with lesser-skilled individuals.
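One common way to gather this kind of evidence (often called the known-groups method) is to test whether higher-skilled performers score significantly better than lesser-skilled performers. A brief Python sketch with invented scores, assuming SciPy is available:

```python
from scipy.stats import ttest_ind   # independent-samples t-test

# Invented tennis-skills test scores for two known groups.
varsity_players = [82, 88, 90, 85, 91, 87, 89]
beginners       = [55, 60, 58, 62, 57, 64, 59]

t_stat, p_value = ttest_ind(varsity_players, beginners)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A clearly higher mean for the skilled group (with a small p-value) supports
# the claim that the test reflects the intended construct (tennis skill).
```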

26  Factors Affecting Validity
1. The characteristics of the individuals being tested - a test is valid only for individuals of gender, age, and experience similar to those on whom the test was validated.
2. The criterion measure selected (the variable that has been defined as indicating successful performance of a trait) - different criterion measures correlated with the same set of scores will produce different correlation coefficients (expert ratings, tournament play, previously validated tests).

27  Factors Affecting Validity
3. Reliability - a test must be reliable to be valid.
4. Administrative procedures - validity will be affected if unclear directions are given or if all individuals do not perform the test the same way.

28  Reliability
* Refers to the consistency of a test.
* A reliable test should obtain approximately the same results each time it is administered.
* Individuals may not obtain the same score on a second administration of a test (fatigue, motivation, environmental conditions, and measurement error may affect scores), but the order of the scores will be approximately the same if the test is reliable.

29  Reliability
* To have a high degree of validity, a test must have a high degree of reliability.
* Objective measures have higher reliability than subjective measures.

30  Reliability of Criterion-Referenced Tests
* Defined as consistency of classification (how consistently the test classifies individuals as masters or nonmasters).
* Determined in much the same way as the reliability of norm-referenced tests (test-retest, parallel forms, split-half, or K-R formulas).
* C-R reliability applies to a single cluster of items (each cluster is intended to measure the attainment of a different objective).
* A reliability coefficient is estimated for each cluster.

31  Reliability of Criterion-Referenced Tests
C-R reliability may also be estimated through the proportion of agreement coefficient.
* The test is administered to a group; based on the test scores, each individual is classified as a master or nonmaster. On another day, the group is administered the same test again, and each person is again classified as a master or nonmaster.
* The proportion of agreement is determined by how many group members receive the same classification (master or nonmaster) on both test days.
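A minimal Python sketch of the proportion of agreement coefficient, using invented master/nonmaster classifications from two test days:

```python
# Invented classifications for 10 people on two administrations of a C-R test
# (True = master, False = nonmaster).
day1 = [True, True, False, True, False, True, True, False, True, False]
day2 = [True, True, False, False, False, True, True, True, True, False]

# Proportion of agreement: fraction of people classified the same way on both days.
agreements = sum(a == b for a, b in zip(day1, day2))
p_agreement = agreements / len(day1)
print(f"proportion of agreement = {p_agreement:.2f}")   # 8 of 10 agree -> 0.80
```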

32  Methods of Estimating Reliability of Norm-Referenced Tests
Test-Retest Method
* Requires two administrations of the same test to the same group of individuals.
* Calculate the correlation coefficient between the two sets of scores (the intraclass correlation coefficient is best).
* The greatest source of error in this method is changes in the individuals being tested.
* The appropriate time interval between administrations of the test is sometimes difficult to determine.
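The slide recommends an intraclass correlation coefficient for test-retest data. As an illustration only, here is one common single-rating formulation, ICC(1,1) from a one-way ANOVA, coded directly from its definition; the data are invented and other ICC forms exist.

```python
import numpy as np

def icc_one_way(scores: np.ndarray) -> float:
    """ICC(1,1) from a one-way ANOVA: rows = subjects, columns = trials."""
    n, k = scores.shape
    grand_mean = scores.mean()
    subject_means = scores.mean(axis=1)

    # Between-subjects and within-subjects mean squares.
    ms_between = k * np.sum((subject_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((scores - subject_means[:, None]) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Invented test-retest data: 6 subjects, 2 administrations of the same test.
data = np.array([
    [20, 22],
    [25, 24],
    [30, 31],
    [18, 20],
    [27, 26],
    [35, 36],
])
print(f"test-retest reliability (ICC) = {icc_one_way(data):.2f}")
```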

33  Methods of Estimating Reliability of Norm-Referenced Tests
Parallel Forms Method
* Requires the administration of parallel or equivalent forms of a test to the same group and calculation of the correlation coefficient.
* Both forms of the test are administered during the same test period or in two sessions separated by a short time period.

34  Methods of Estimating Reliability of Norm-Referenced Tests
Parallel Forms Method
* The primary problem with this method is that it is difficult to construct two tests that are parallel in content and item characteristics.
* If both tests are administered within a short time of each other, learning, motivation, and testing conditions do not influence the correlation coefficient.
* The reliability of most standardized tests is estimated through this method.

35  Methods of Estimating Reliability of Norm-Referenced Tests
Split-Half Method
* The test is split into halves; the scores of the two halves are correlated.
* Requires only one administration of the test.
* Common practice is to correlate the odd-numbered items with the even-numbered items.
* The resulting reliability coefficient is for a test only half the length of the original test.
* Reliability usually increases as the length of a test increases, so the Spearman-Brown formula is often used to estimate the reliability of the full-length test.

36  Methods of Estimating Reliability of Norm-Referenced Tests
Spearman-Brown Formula
Reliability of full test = (2 x reliability of half test) / (1 + reliability of half test)
Example: reliability of the two halves of a test = .70
Reliability of full test = (2 x .70) / (1 + .70) = 1.4 / 1.7 = .82
* The split-half method may produce an inflated correlation coefficient, but it is frequently used to estimate reliability coefficients of knowledge tests.
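A short Python sketch of the whole split-half procedure (odd items versus even items, then the Spearman-Brown step above); the item scores are invented and SciPy's Pearson correlation is assumed to be available.

```python
from scipy.stats import pearsonr

def spearman_brown(half_test_r: float) -> float:
    """Step up a half-test correlation to the estimated full-test reliability."""
    return (2 * half_test_r) / (1 + half_test_r)

# Invented item scores (1 = correct, 0 = wrong): one row per person, one column per item.
items = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0, 1, 1],
]

odd_scores  = [sum(row[0::2]) for row in items]  # items 1, 3, 5, 7
even_scores = [sum(row[1::2]) for row in items]  # items 2, 4, 6, 8

half_r, _ = pearsonr(odd_scores, even_scores)
print(f"half-test r = {half_r:.2f}, full-test estimate = {spearman_brown(half_r):.2f}")

# Check against the worked example on the slide: r = .70 steps up to about .82.
print(round(spearman_brown(0.70), 2))   # 0.82
```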

37  Methods of Estimating Reliability of Norm-Referenced Tests
Kuder-Richardson Formula 21
* There are many ways to split a test into "half-test" scores for correlation purposes; each split would probably produce a different correlation coefficient.
* K-R 21 estimates the average correlation that might be obtained if all possible split-half combinations of a group of items were correlated.
* Basic assumptions:
1. Test items can be scored 1 for correct and 0 for wrong.
2. The total score is the sum of the item scores.

38  Methods of Estimating Reliability of Norm-Referenced Tests
Kuder-Richardson Formula 21
r_kr = [n / (n - 1)] x [1 - X(n - X) / (n x s²)]
where n = number of items, X = test mean (average number of items answered correctly), s² = test variance
Example: n = 50, X = 40, s² = 25
r_kr = (50 / 49) x [1 - 40(50 - 40) / (50 x 25)] = 1.02 x (1 - 400/1250) = 1.02 x .68 ≈ .69
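A small Python sketch of K-R 21, coded directly from the formula above; it reproduces the worked example (50 items, mean of 40, variance of 25).

```python
def kr21(n_items: int, mean: float, variance: float) -> float:
    """Kuder-Richardson formula 21 reliability estimate."""
    return (n_items / (n_items - 1)) * (1 - mean * (n_items - mean) / (n_items * variance))

# Worked example from the slide: 50 items, test mean of 40, test variance of 25.
print(round(kr21(50, 40, 25), 2))   # about 0.69
```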

39  Factors Affecting Reliability
1. Method of scoring - the more objective the test, the higher the reliability.
2. The heterogeneity of the group - reliability coefficients based on test scores from a group with a wide range of abilities will be overestimated.
3. The length of the test - the longer the test, the greater the reliability.
4. Administrative procedures - the directions must be clear; all individuals should be ready, motivated to do well, and perform the test in the same way; the testing environment should be favorable to good performance.

40  Objectivity
* A test has high objectivity when two or more persons can administer the same test to the same group and obtain approximately the same results.
* A specific form of reliability.
* Determined by a test-retest correlational procedure in which different individuals administer the test.
* Certain forms of measurement are more objective than others.

41  Objectivity
High objectivity is more likely with:
1. Complete and clear instructions for administration and scoring.
2. Administration of the test by trained administrators.
3. Use of simple measurement procedures.
4. Use of appropriate mechanical tools of measurement.
5. Numerical scores; phrases or terms are less likely to reflect objectivity.

42  Administrative Feasibility
Administrative considerations may determine which test you use:
1. Cost
2. Time
3. Ease of administration
4. Scoring
5. Norms
A good sports skills test will be similar to game performance.

