1 EPSY 546: LECTURE 1 INTRODUCTION TO MEASUREMENT THEORY George Karabatsos.

2 What is test theory?

3 WHAT IS A TEST? Test: A procedure for obtaining a sample of person behavior from a specified domain of items.

4 WHAT IS A TEST? Test: A procedure for obtaining a sample of person behavior from a specified domain of items. General: Exam, questionnaire, survey, judge-observed task, etc.

5 ITEM RESPONSE SCORING Test item responses are “scored”. Some Examples: Dichotomous : 1 = Correct, 0 = Incorrect (Scored from possibly a multiple choice test item)

6 ITEM RESPONSE SCORING Test item responses are “scored”. Some Examples: “Rating Scale”: 1 = Strongly Disagree 2 = Disagree 3 = Agree 4 = Strongly Agree

7 ITEM RESPONSE SCORING Test item responses are “scored”. Some Examples: Partial Credit: 1 = Completely incorrect 2 = Partially correct 3 = Completely correct
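The scoring schemes on the slides above can be sketched in a few lines of Python. This is only an illustration: the answer key, the example responses, and the function names are hypothetical and not part of the lecture; only the 1 to 4 rating-scale coding is taken from the slides.

```python
# Hypothetical answer key for a 4-item multiple-choice test (an assumption).
ANSWER_KEY = ["B", "D", "A", "C"]

# Rating-scale coding as listed on the slide.
RATING_CODES = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Agree": 3,
    "Strongly Agree": 4,
}


def score_dichotomous(responses, key):
    """Score each response 1 if it matches the key, otherwise 0."""
    return [1 if r == k else 0 for r, k in zip(responses, key)]


def score_rating(responses):
    """Convert rating-scale labels to their numeric codes."""
    return [RATING_CODES[r] for r in responses]


# One hypothetical respondent: three of four items answered correctly.
item_scores = score_dichotomous(["B", "A", "A", "C"], ANSWER_KEY)
total_score = sum(item_scores)
```

The total score is the sum of the item scores, which is the quantity the later slides on test score variance work with.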

8 WHAT TESTS DO Tests are designed to measure latent traits that manifest in the responses to the test items.

9 LATENT VARIABLES Some substantive examples of latent traits: –Exam: Ability on long division. –Attitude Questionnaire: Agreement towards capital punishment. –Survey: Frequency of drug use. –Survey: Quality of life.

10 LATENT VARIABLES Latent trait = latent variable = psychological trait/variable/attribute = unidimensional variable = construct

11 LATENT VARIABLES For measurement, latent variables are often numerically represented either: –by total test score (person or item), –or by parameters of “person ability” or “item difficulty”.

12 Some Challenges of latent trait measurement (5) 1. No single approach to the measurement of a latent trait is universally accepted.

13 Some Challenges of latent trait measurement (5) 1. No single approach to the measurement of a latent trait is universally accepted. ** Two theorists may possibly select different items to measure a particular latent trait (e.g., math ability).

14 Some Challenges of latent trait measurement (5) 2. Psychological measurements are usually based on limited samples of behavior.

15 Some Challenges of latent trait measurement (5) 2. Psychological measurements are usually based on limited samples of behavior. ** Practically impossible to confront respondents with all possible items that represent the latent trait (e.g., all long division items)

16 Some Challenges of latent trait measurement (5) 2. Psychological measurements are usually based on limited samples of behavior. ** N = 1, for each person on an item.

17 Some Challenges of latent trait measurement (5) 3. Latent trait measurement obtained is always subject to error.

18 Some Challenges of latent trait measurement (5) 3. Latent trait measurement obtained is always subject to error. Random: sampling error of respondents, and of items; inherent unreliability of respondents (e.g., boredom, lucky guess, carelessness).

19 Some Challenges of latent trait measurement (5) 3. Latent trait measurement obtained is always subject to error. Systematic: Cheating on exam; Response bias; item does not measure latent trait; misscoring; test form out of order.

20 Some Challenges of latent trait measurement (5) 4. Establishing measurement scales for the latent trait.

21 Some Challenges of latent trait measurement (5) 4. Establishing measurement scales for the latent trait. Stevens (1946): “the assignment of numerals to objects or events according to rules.” (NOT!)

22 Some Challenges of latent trait measurement (5) 4. Establishing measurement scales for the latent trait. Michell: Measurement requires tests of the hypothesis that the variable is quantitative. (Echoing Luce, Krantz, Suppes, and Tversky in the three Foundations of Measurement volumes.)

23 Some Challenges of latent trait measurement (5) 5. Latent traits must also demonstrate relationships to other important traits or observable phenomena.

24 Some Challenges of latent trait measurement (5) 5. Latent traits must also demonstrate relationships to other important traits or observable phenomena. **Measurements of latent traits have value when they can be related to other traits or events in the real world.

25 WHAT IS TEST THEORY? The study of the 5 pervasive measurement problems just described, and developing/applying methods for their resolution.

26 TEST THEORY COURSE Become aware of the logic and mathematical models that underlie practices in test use and construction.

27 TEST THEORY COURSE Awareness of these models, including their assumptions and limitations, should lead to an improved practice in test construction and more intelligent use of test information in decision making.

28 TEST THEORY COURSE Test theory provides a general framework for viewing the process of instrument development. Test theory is distinct from the more applied subject of educational and psychological assessment, which focuses on the administration and interpretation of specific tests.

29 Process of Test Construction

30 TEST CONSTRUCTION 10 steps can be followed to construct a test for the measurement of persons (and items). (C&A, Chapter 4)

31 TEST CONSTRUCTION 1. Identify the primary purpose(s) for which the test measurements will be used.

32 TEST CONSTRUCTION 1. Identify the primary purpose(s) for which the test measurements will be used. 2. Hypothesize items that define the latent trait of interest.

33 TEST CONSTRUCTION 3. Prepare a set of test specifications, delineating the proportion of items that should focus on each type of behavior identified in Step 2.

34 TEST CONSTRUCTION 3. Prepare a set of test specifications, delineating the proportion of items that should focus on each type of behavior identified in Step 2. 4. Construct an initial pool of items.

35 TEST CONSTRUCTION 5. Have items reviewed and revised.

36 TEST CONSTRUCTION 5. Have items reviewed and revised. 6. Hold preliminary item tryouts (and revise).

37 TEST CONSTRUCTION 5. Have items reviewed and revised. 6. Hold preliminary item tryouts (and revise). 7. Field test the items on a large sample representative of the examinee population for whom the test is intended. (PILOT STUDY)

38 TEST CONSTRUCTION 8. Determine statistical properties of the items, and when appropriate, eliminate items that do not meet pre-established criteria.

39 TEST CONSTRUCTION 8. Determine statistical properties of the items, and when appropriate, eliminate items that do not meet pre-established criteria. 9. Design and conduct reliability and validity studies for the final form of the test.

40 TEST CONSTRUCTION 10. Develop guidelines for administration, scoring, and interpretation of the test scores. (e.g., prepare norm tables, suggest recommended cutting scores or standards for performance, etc.)

41 Statistical Concepts for Test Theory

42 BASIC STATISTICS (C&A2) –Frequency tables and graphs. –Distribution. –Normal distribution (p.d.f., c.d.f.). –Central tendency: mode, median, mean. –Variability: variance, standard deviation. –Z-scores. (For infinite populations.)
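As a small worked example of the descriptive statistics listed above, the sketch below standardizes a set of scores into z-scores using the population mean and standard deviation. The score values and the function name are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Standardize scores using the population mean and standard deviation."""
    mu = mean(scores)
    sigma = pstdev(scores)  # population SD (divides by N, not N - 1)
    return [(x - mu) / sigma for x in scores]


# Hypothetical total test scores: mean 5, population SD 2.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_scores(scores)
```

A score of 9 is two standard deviations above the mean here, so its z-score is 2.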

43 BASIC STATISTICS (C&A2) Relationship between two variables: –Scatterplot. –Pearson’s correlation coefficient. –Ordinary linear regression. –Standard error of Y predictions, for a given regression equation.
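The bivariate quantities listed above can be computed directly from their definitions. The sketch below is an illustration under stated assumptions: the data are hypothetical, the population (divide-by-N) forms are used throughout, and the standard error of estimate is taken as $\sigma_{Y \cdot X} = \sigma_Y \sqrt{1 - \rho_{XY}^2}$.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson_r(x, y):
    """Pearson product-moment correlation (population form)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))


def regression_line(x, y):
    """Slope and intercept of the least-squares line predicting y from x."""
    slope = pearson_r(x, y) * pstdev(y) / pstdev(x)
    intercept = mean(y) - slope * mean(x)
    return slope, intercept


def standard_error_of_estimate(x, y):
    """Population standard error of Y predictions: sigma_y * sqrt(1 - r^2)."""
    return pstdev(y) * sqrt(1 - pearson_r(x, y) ** 2)


# Hypothetical paired scores.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
slope, intercept = regression_line(x, y)
see = standard_error_of_estimate(x, y)
```

For these data the fitted line is approximately Y' = 0.6X + 2.2, with a correlation of about 0.77.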

44 BASIC STATISTICS (C&A5) Statistics: Test Items –Mean and total score for an item, over respondents (item difficulty). –Variance of responses on a test item. –Inter-item correlation (Pearson’s product-moment correlation or phi correlation).
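The item statistics above can be sketched for a small matrix of dichotomous scores. The response matrix and helper names are hypothetical. For 0/1 items, the Pearson correlation computed here is the phi correlation.

```python
from math import sqrt
from statistics import mean, pvariance

# Hypothetical scored responses: rows are respondents, columns are items.
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]


def item_column(matrix, j):
    """All respondents' scores on item j."""
    return [row[j] for row in matrix]


def item_difficulty(matrix, j):
    """Mean item score; for 0/1 items, the proportion answering correctly."""
    return mean(item_column(matrix, j))


def item_variance(matrix, j):
    """Population variance of the responses to item j."""
    return pvariance(item_column(matrix, j))


def inter_item_correlation(matrix, i, j):
    """Pearson correlation between items i and j (phi for 0/1 items)."""
    x, y = item_column(matrix, i), item_column(matrix, j)
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / sqrt(pvariance(x) * pvariance(y))


difficulties = [item_difficulty(responses, j) for j in range(3)]
variances = [item_variance(responses, j) for j in range(3)]
phi_01 = inter_item_correlation(responses, 0, 1)
```

Note that a higher "difficulty" value means an easier item, because the statistic is the proportion of respondents who answered correctly.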

45 VARIANCE OF TEST SCORES AND TEST ITEMS Since tests are usually scored by the sum of the item scores, it follows that there should be some relationship between individual item variances and the variance of the total test scores.

46 VARIANCE OF TEST SCORES AND TEST ITEMS In fact, since the measurement of individual differences is a central goal of testing, one goal of test construction should be to maximize the variance of the total test scores. The reliability and validity of a test depend on this variance.

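47 VARIANCE OF TEST SCORES AND TEST ITEMS Covariance between items i and j: $\sigma_{ij} = \frac{1}{N} \sum_{n=1}^{N} (X_{ni} - \mu_i)(X_{nj} - \mu_j)$, where $X_{ni}$ is the score of respondent n on item i, N = number of respondents, J = number of items, and $\mu_i$ = population mean of item i.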

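48 VARIANCE OF TEST SCORES AND TEST ITEMS Variance-Covariance Matrix: a J × J matrix with the item variances on the diagonal and the covariances between pairs of items off the diagonal.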

49 VARIANCE OF TEST SCORES AND TEST ITEMS Total Test Score Variance = Sum of item variances + sum of item covariances
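In symbols, the decomposition above is $\sigma_X^2 = \sum_{i=1}^{J} \sigma_i^2 + \sum_{i \neq j} \sigma_{ij}$, where the second sum runs over all ordered pairs of distinct items, so each pair is counted twice. The sketch below checks this numerically on a small hypothetical response matrix. The data and variable names are assumptions made for illustration.

```python
from statistics import mean, pvariance

# Hypothetical scored responses: rows are respondents, columns are items.
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]


def covariance(x, y):
    """Population covariance between two lists of item scores."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


n_items = len(responses[0])
items = [[row[j] for row in responses] for j in range(n_items)]

# J x J variance-covariance matrix: variances on the diagonal.
cov_matrix = [
    [covariance(items[i], items[j]) for j in range(n_items)]
    for i in range(n_items)
]

sum_item_variances = sum(cov_matrix[i][i] for i in range(n_items))
sum_item_covariances = sum(
    cov_matrix[i][j]
    for i in range(n_items)
    for j in range(n_items)
    if i != j
)

# Variance of the total scores, computed directly for comparison.
total_scores = [sum(row) for row in responses]
total_score_variance = pvariance(total_scores)
```

Here the item variances sum to 0.625 and the off-diagonal covariances also sum to 0.625, matching the directly computed total score variance of 1.25.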

50 VARIANCE OF TEST SCORES AND TEST ITEMS Implications of Equation (first term): Total test score variance increases as the number of items (J) is increased (except when the added items have a non-positive correlation with the other items).

51 VARIANCE OF TEST SCORES AND TEST ITEMS Implications of Equation (second term): Test score variance increases when items are added that have positive covariances with the other test items.

52 VARIANCE OF TEST SCORES AND TEST ITEMS Implications of Equation: Test score variance is maximized when: –items are equal in difficulty (this increases item covariances), –and of “medium” difficulty (this increases item variances).
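The "medium difficulty" point can be illustrated for dichotomous items, where the variance of an item with proportion correct p is p(1 − p). The sketch below evaluates this over a few hypothetical difficulty values. The function name and the chosen values are assumptions.

```python
def dichotomous_item_variance(p):
    """Variance of a 0/1 item whose proportion of correct responses is p."""
    return p * (1 - p)


# Hypothetical item difficulties (proportion correct).
difficulties = [0.1, 0.3, 0.5, 0.7, 0.9]
variances = {p: dichotomous_item_variance(p) for p in difficulties}

# The difficulty value with the largest item variance.
most_variable = max(variances, key=variances.get)
```

The variance peaks at 0.25 when p = 0.5 and falls off symmetrically toward very easy or very hard items.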

53 Introduction To Scaling

54 4 SCALES OF MEASUREMENT 1. Nominal Scale: –Used for classification. –Assigns the same numbers to objects that are equivalent, and a different number to objects that are not.

55 4 SCALES OF MEASUREMENT 1. Nominal Scale: –Class of admissible transformations: class of one-to-one transformations, i.e., $n_i(x) = n_i(y)$ iff $n_j(x) = n_j(y)$ for all scales i, j, and objects x, y.

56 4 SCALES OF MEASUREMENT 2. Ordinal Scale: –With respect to some attribute, this scale orders objects in magnitude, but does not measure distances between the objects. –Example: Ranking

57 4 SCALES OF MEASUREMENT 2. Ordinal Scale: –Class of admissible transformations: class of increasing monotonic transformations, i.e., $n_i(x) > n_i(y)$ iff $n_j(x) > n_j(y)$ for all scales i, j, and objects x, y.

58 4 SCALES OF MEASUREMENT 3. Interval Scale: –Involves the numerical representation of a relation on the differences between entities with respect to some attribute (no absolute zero point). –Example: temperature measurement (Fahrenheit, Celsius).

59 4 SCALES OF MEASUREMENT 3. Interval Scale: –Class of admissible transformations: class of positive linear transformations, $n_j(x) = a[n_i(x)] + b$ for $a > 0$, e.g., $C = (5/9)F - (160/9)$.

60 4 SCALES OF MEASUREMENT 4. Ratio Scale: –Has properties of order, equal distance between units, and an absolute zero point. –Non-zero measurements on this scale may be expressed as ratios of one another. –Examples: Length, weight, etc.

61 4 SCALES OF MEASUREMENT 4. Ratio Scale: –Class of admissible transformations: class of multiplicative transformations, $n_i(x) = c[n_j(x)]$ for $c > 0$.
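The difference between interval and ratio scales shows up in what their admissible transformations preserve. The sketch below uses the Fahrenheit-to-Celsius conversion from the slides as a positive linear transformation, and inches-to-centimetres as a multiplicative one. The specific values and function names are hypothetical.

```python
def fahrenheit_to_celsius(f):
    """Positive linear transformation: C = (5/9)F - 160/9."""
    return (5 / 9) * f - 160 / 9


def inches_to_cm(x):
    """Multiplicative transformation with c = 2.54."""
    return 2.54 * x


# Interval scale: ratios of differences survive, ratios of values do not.
temps_f = [32.0, 50.0, 68.0, 104.0]
temps_c = [fahrenheit_to_celsius(t) for t in temps_f]

diff_ratio_f = (temps_f[3] - temps_f[2]) / (temps_f[1] - temps_f[0])
diff_ratio_c = (temps_c[3] - temps_c[2]) / (temps_c[1] - temps_c[0])

value_ratio_f = temps_f[2] / temps_f[1]
value_ratio_c = temps_c[2] / temps_c[1]

# Ratio scale: ratios of the values themselves survive.
lengths_in = [10.0, 20.0]
lengths_cm = [inches_to_cm(v) for v in lengths_in]
length_ratio_in = lengths_in[1] / lengths_in[0]
length_ratio_cm = lengths_cm[1] / lengths_cm[0]
```

68 °F is not "1.36 times as hot" as 50 °F in any meaningful sense, since the same two temperatures have a ratio of 2 in Celsius. A length of 20 inches is twice 10 inches in any unit.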

62 MEASUREMENT As mentioned earlier, establishing a measurement scale for a given variable requires hypothesis tests. The measurement of directly observable, physical phenomena is easily obtainable and verifiable.

63 MEASUREMENT However, this is not the case for the measurement of “latent” psychological phenomena (e.g., ability, intelligence, attitudes, beliefs, etc.), which are not directly observable.

64 CONJOINT MEASUREMENT The axioms of conjoint measurement can be tested to determine whether latent traits are measurable on an ordinal or interval scale.

65 INDEPENDENCE AXIOM (row)

66 Monotone Homogeneity (MH)

67 2PL:

68 3PL:

69 4PL:

70 INDEPENDENCE AXIOM (column)

71 ISOP (Scheiblechner 1995)

72 RASCH-1PL:
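The equations for the models named on the slides above (2PL, 3PL, 4PL, Rasch-1PL) did not survive in this transcript. As a hedged sketch, the function below uses the standard logistic form of the four-parameter model, $P(\theta) = c + (d - c) / (1 + e^{-a(\theta - b)})$, from which the other models follow by fixing parameters. The parameter names (theta, a, b, c, d) follow common convention and are assumptions, as is the omission of any scaling constant; the lecture's own parameterization may differ.

```python
from math import exp


def irf_4pl(theta, a=1.0, b=0.0, c=0.0, d=1.0):
    """Probability of a correct response under a 4PL logistic model.

    theta: person ability, a: discrimination, b: difficulty,
    c: lower asymptote (guessing), d: upper asymptote.
    Defaults (a=1, c=0, d=1) reduce the model to the Rasch form.
    """
    return c + (d - c) / (1.0 + exp(-a * (theta - b)))


def irf_3pl(theta, a, b, c):
    """3PL: upper asymptote fixed at 1."""
    return irf_4pl(theta, a=a, b=b, c=c, d=1.0)


def irf_2pl(theta, a, b):
    """2PL: no guessing, upper asymptote fixed at 1."""
    return irf_4pl(theta, a=a, b=b, c=0.0, d=1.0)


def irf_rasch(theta, b):
    """Rasch form: discrimination fixed at 1 for every item."""
    return irf_4pl(theta, a=1.0, b=b, c=0.0, d=1.0)


# Hypothetical abilities; the response probability rises with theta.
abilities = [-2.0, -1.0, 0.0, 1.0, 2.0]
probs_3pl = [irf_3pl(t, a=1.5, b=0.0, c=0.2) for t in abilities]
```

With a positive discrimination each curve is monotonically increasing in ability, and when ability equals the item difficulty the probability sits halfway between the two asymptotes.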

73 Thomsen condition (e.g.,double cancellation)

74

75 MH analysis ICC Crossings

76 DM analysis

77 Model Selection & Evaluation

78 Model Assessment: Detailed

79 Model Assessment: Detailed Person Fit Posterior Item Predictive Examinee Responses P-value