1 EPSY 546: LECTURE 1 SUMMARY George Karabatsos. 2 REVIEW.

Presentation transcript:

1 EPSY 546: LECTURE 1 SUMMARY George Karabatsos

2 REVIEW

3 Test (& types of tests) REVIEW

4 Test (& types of tests) Item response scoring paradigms REVIEW

5 Test (& types of tests) Item response scoring paradigms Data paradigm of test theory (typical) REVIEW

6 DATA PARADIGM

7 Latent Trait $\theta \in \mathrm{Re}$ (unidimensional) REVIEW: Latent Trait

8 Latent Trait $\theta \in \mathrm{Re}$ (unidimensional) Real Examples of Latent Traits REVIEW: Latent Trait

9 Item Response Function (IRF) REVIEW: IRF

10 Item Response Function (IRF) –Represents different theories about latent traits. REVIEW: IRF

11 Item Response Function (IRF) –Dichotomous response: $P_j(\theta) = \Pr[X_j = 1 \mid \theta] = \Pr[\text{Correct response to item } j \mid \theta]$ REVIEW: IRF

12 Item Response Function (IRF) –Polychotomous response: $P_{jk}(\theta) = \Pr[X_j > k \mid \theta] = \Pr[\text{Exceed category } k \text{ of item } j \mid \theta]$ REVIEW: IRF

13 Item Response Function (IRF) –Dichotomous or Polychotomous response: $E_j(\theta)$ = [Expected Rating for item $j$ | $\theta$], $0 < E_j(\theta) < K$ REVIEW: IRF

14 IRF: Dichotomous items

15 IRF: Polychotomous items

16 The unweighted total score $X_{+n}$ stochastically orders the latent trait $\theta$ (Huynh, 1994; Grayson, 1988) REVIEW: SCALES

17 4 Scales of Measurement –Conjoint Measurement REVIEW: SCALES

18 Conjoint Measurement –Row Independence Axiom REVIEW

19 Conjoint Measurement –Row Independence Axiom Property: Ordinal Scaling and unidimensionality of $\theta$ (test score) REVIEW

20 INDEPENDENCE AXIOM (row)

21 Conjoint Measurement –Row Independence Axiom Property: Ordinal Scaling and unidimensionality of $\theta$ (test score) IRF: Non-decreasing over $\theta$ REVIEW

22 Conjoint Measurement –Row Independence Axiom Property: Ordinal Scaling and unidimensionality of $\theta$ (test score) IRF: Non-decreasing over $\theta$ Models: MH, 2PL, 3PL, 4PL, True Score, Factor Analysis REVIEW

23 2PL:

24 3PL:

25 4PL:
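
The formulas on the 2PL, 3PL and 4PL slides were not recovered in this transcript. As a hedged sketch, the code below assumes the standard logistic parameterization, in which the 4PL nests the 3PL, the 2PL and the Rasch (1PL) model. The parameter names `a` (discrimination), `b` (difficulty), `c` (lower asymptote) and `d` (upper asymptote) are conventional labels chosen for illustration, not notation taken from the slides.

```python
import math


def irf_4pl(theta, a=1.0, b=0.0, c=0.0, d=1.0):
    """Four-parameter logistic IRF: c + (d - c) * logistic(a * (theta - b))."""
    logistic = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return c + (d - c) * logistic


def irf_3pl(theta, a, b, c):
    """3PL: the 4PL with the upper asymptote fixed at 1."""
    return irf_4pl(theta, a=a, b=b, c=c, d=1.0)


def irf_2pl(theta, a, b):
    """2PL: the 3PL with the lower asymptote fixed at 0."""
    return irf_4pl(theta, a=a, b=b, c=0.0, d=1.0)


def irf_rasch(theta, b):
    """Rasch (1PL): the 2PL with a common discrimination of 1."""
    return irf_4pl(theta, a=1.0, b=b, c=0.0, d=1.0)


# Evaluate one hypothetical item on a small grid of latent trait values.
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
curve = [irf_3pl(t, a=1.5, b=0.0, c=0.2) for t in grid]
```

With `a > 0` and `d > c`, each of these curves is non-decreasing over the latent trait, which is the property the row independence axiom requires of the IRF.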

26 Monotone Homogeneity (MH)

27 Conjoint Measurement –Column Independence Axiom (adding) REVIEW

28 Conjoint Measurement –Column Independence Axiom (adding) Property: Ordinal Scaling and unidimensionality of both $\theta$ (test score) and item difficulty (item score) REVIEW

29 INDEPENDENCE AXIOM (column)

30 Conjoint Measurement –Column Independence Axiom (adding) Property: Ordinal Scaling and unidimensionality of both $\theta$ (test score) and item difficulty (item score) IRF: Non-decreasing and non-intersecting over $\theta$ REVIEW

31 Conjoint Measurement –Column Independence Axiom (adding) Property: Ordinal Scaling and unidimensionality of both $\theta$ (test score) and item difficulty (item score) IRF: Non-decreasing and non-intersecting over $\theta$ Models: DM, ISOP REVIEW

32 DM/ISOP (Scheiblechner 1995)

33 Conjoint Measurement –Thomsen Condition (adding) REVIEW

34 Conjoint Measurement –Thomsen Condition (adding) Property: Interval Scaling and unidimensionality of both $\theta$ (test score) and item difficulty (item score) REVIEW

35 Thomsen condition (e.g., double cancellation)
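
The slide's illustration of double cancellation was not recovered. The sketch below uses one common statement of the condition for a persons-by-items matrix `m`: for any three rows `r1, r2, r3` and three columns `c1, c2, c3`, if `m[r2][c1] >= m[r1][c2]` and `m[r3][c2] >= m[r2][c3]`, then `m[r3][c1] >= m[r1][c3]`. The function name and both example matrices are hypothetical.

```python
from itertools import permutations


def satisfies_double_cancellation(m):
    """Check double cancellation over every ordered triple of rows and columns."""
    n_rows, n_cols = len(m), len(m[0])
    for r1, r2, r3 in permutations(range(n_rows), 3):
        for c1, c2, c3 in permutations(range(n_cols), 3):
            premise = m[r2][c1] >= m[r1][c2] and m[r3][c2] >= m[r2][c3]
            if premise and not m[r3][c1] >= m[r1][c3]:
                return False
    return True


# Additive matrix: each cell is a row effect plus a column effect.
row_effects = [0, 1, 3]
col_effects = [0, 2, 5]
additive = [[r + c for c in col_effects] for r in row_effects]

# Non-additive matrix constructed to break the condition.
non_additive = [
    [0, 1, 5],
    [2, 0, 3],
    [4, 3, 0],
]

additive_ok = satisfies_double_cancellation(additive)
non_additive_ok = satisfies_double_cancellation(non_additive)
```

Any matrix whose cells are a row effect plus a column effect passes the check, because adding the two premise inequalities yields the conclusion directly.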

36 Conjoint Measurement –Thomsen Condition (adding) Property: Interval Scaling and unidimensionality of both $\theta$ (test score) and item difficulty (item score) IRF: Non-decreasing and parallel (non-intersecting) over $\theta$ REVIEW

37 Conjoint Measurement –Thomsen Condition (adding) Property: Interval Scaling and unidimensionality of both $\theta$ (test score) and item difficulty (item score) IRF: Non-decreasing and parallel (non-intersecting) over $\theta$ Models: Rasch Model, ADISOP REVIEW

38 RASCH-1PL:

39 5 Challenges of Latent Trait Measurement REVIEW

40 5 Challenges of Latent Trait Measurement Test Theory attempts to address these challenges REVIEW

41 Test Construction (10 Steps) REVIEW

42 Test Construction (10 Steps) Basic Statistics of Test Theory REVIEW

43 Total Test Score ($X_+$) variance = Sum[Item Variances] + Sum[Item Covariances] REVIEW
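
The identity on this slide can be checked numerically. The sketch below uses a small made-up response matrix (rows are persons, columns are items) and sums the covariances over all ordered pairs of distinct items, so each unordered pair is counted twice. The data and helper names are illustrative assumptions.

```python
def covariance(x, y):
    """Population covariance of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n


# Hypothetical dichotomous responses: 5 persons by 3 items.
scores = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [1, 1, 1],
    [0, 0, 0],
]

items = list(zip(*scores))             # one tuple of responses per item
totals = [sum(row) for row in scores]  # total test score per person

total_variance = covariance(totals, totals)
sum_item_variances = sum(covariance(col, col) for col in items)
sum_item_covariances = sum(
    covariance(items[i], items[j])
    for i in range(len(items))
    for j in range(len(items))
    if i != j
)
```

Here `total_variance` equals `sum_item_variances + sum_item_covariances` up to floating-point rounding.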

44 EPSY 546: LECTURE 2 TRUE SCORE TEST THEORY AND RELIABILITY George Karabatsos

45 TRUE SCORE MODEL Theory: Test score is a random variable, $X_{+n} = T_n + e_n$. $X_{+n}$: observed test score of person $n$; $T_n$: true test score (unknown); $e_n$: random error (unknown).

46 TRUE SCORE MODEL The observed person test score $X_{+n}$ is a random variable (according to some distribution) with mean $T_n = E(X_{+n})$ and variance $\sigma^2(X_{+n}) = \sigma^2(e_n)$.

47 TRUE SCORE MODEL The observed person test score $X_{+n}$ is a random variable (according to some distribution) with mean $T_n = E(X_{+n})$ and variance $\sigma^2(X_{+n}) = \sigma^2(e_n)$. Random error $e_n = X_{+n} - T_n$ is distributed with mean $E(e_n) = E(X_{+n} - T_n) = 0$ and variance $\sigma^2(e_n) = \sigma^2(X_{+n})$.

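48 TRUE SCORE MODEL True Score: $T_n = E(X_n) = \sum_{s=0}^{S} s \, p_{ns}$, where $T_n$ is the true score of person $n$, $E(X_n)$ is the expected score of person $n$, $s \in \{0, 1, \ldots, S\}$ is a possible score, and $p_{ns} = \Pr[\text{Person } n \text{ has test score } s]$.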
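
As a minimal sketch, the true score can be computed as the expected value of the possible scores, weighting each score s by the probability that person n obtains it. The probability vector below is invented for illustration, and the function name is an assumption.

```python
def true_score(p_ns):
    """Expected test score, where p_ns[s] = Pr[person n has test score s], s = 0..S."""
    if abs(sum(p_ns) - 1.0) > 1e-9:
        raise ValueError("score probabilities must sum to 1")
    return sum(s * p for s, p in enumerate(p_ns))


# Hypothetical person on a test with possible scores 0, 1, 2, 3.
p_n = [0.1, 0.2, 0.4, 0.3]
t_n = true_score(p_n)  # 0*0.1 + 1*0.2 + 2*0.4 + 3*0.3
```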

49 TRUE SCORE MODEL 3 Assumptions: 1) Over the population of examinees, error has a mean of 0: $E[e] = 0$. 2) Over the population of examinees, true scores and error scores have 0 correlation: $\rho[T, e] = 0$.

50 TRUE SCORE MODEL 3 Assumptions: 3) For a set of persons, the correlation of the error scores between two testings is zero: $\rho[e_1, e_2] = 0$. –“Two testings”: when a set of persons take two separate tests, or complete two testing occasions with the same form. –The two sets of person scores are assumed to be randomly chosen from two independent distributions of possible observed scores.

51 TRUE SCORE ESTIMATION

52 TRUE SCORE ESTIMATION

53 TRUE SCORE ESTIMATION

54 TRUE SCORE ESTIMATION Test reliability: the proportion of variance of observed scores that is explained by the variance of the true scores.
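
The symbol used for reliability on this slide was not recovered. In standard classical test theory notation (an assumption here, not the slide's own notation), the definition can be written as follows, using the assumption that true scores and errors are uncorrelated over the population of examinees:

```latex
\begin{align*}
\sigma^2_X &= \sigma^2_T + \sigma^2_e + 2\,\mathrm{Cov}(T, e)
            = \sigma^2_T + \sigma^2_e
            && \text{since } \rho[T, e] = 0, \\
\text{reliability} &= \frac{\sigma^2_T}{\sigma^2_X}
            = 1 - \frac{\sigma^2_e}{\sigma^2_X}.
\end{align*}
```

Reliability therefore lies between 0 (all observed variance is error) and 1 (no error variance).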

55 TEST RELIABILITY [formula not recovered] is the error of measurement.

56 TEST RELIABILITY [formula not recovered] is the standard error of measurement. (random error)

57 TEST RELIABILITY [formula not recovered] is the standard error of measurement. (random error) Estimated $((1 - \alpha) \times 100)\%$ confidence interval around the test score: [formula not recovered]
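
The slide's formulas were not recovered. The sketch below assumes the textbook classical test theory expressions: the standard error of measurement is the standard deviation of observed scores times the square root of one minus the reliability, and the confidence interval is the observed score plus or minus a standard normal quantile times that standard error. Function names and the example values are hypothetical.

```python
import math
from statistics import NormalDist


def standard_error_of_measurement(sd_x, reliability):
    """SEM = sd_x * sqrt(1 - reliability), assuming 0 <= reliability <= 1."""
    return sd_x * math.sqrt(1.0 - reliability)


def score_confidence_interval(score, sd_x, reliability, alpha=0.05):
    """Normal-theory ((1 - alpha) * 100)% interval around an observed score."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * standard_error_of_measurement(sd_x, reliability)
    return score - half_width, score + half_width


# Hypothetical test: observed-score SD of 10 and reliability of 0.91.
sem = standard_error_of_measurement(10.0, 0.91)
lower, upper = score_confidence_interval(50.0, 10.0, 0.91, alpha=0.05)
```

With these values the standard error is about 3 score points, so the 95% interval spans roughly 44.1 to 55.9.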

58 TEST RELIABILITY It is desirable for a test to be Reliable.

59 TEST RELIABILITY Reliability – the degree to which the respondents’ test scores are consistent over repeated administrations of the same test.

60 TEST RELIABILITY Reliability – the degree to which the respondents’ test scores are consistent over repeated administrations of the same test. Indicates the precision of a set of test scores in the sample.

61 TEST RELIABILITY Reliability – the degree to which the respondents’ test scores are consistent over repeated administrations of the same test. Indicates the precision of a set of test scores in the sample. Random and systematic error can affect the reliability of a test.

62 TEST RELIABILITY Reliability – the degree to which the respondents’ test scores are consistent over repeated administrations of the same test. Test developers have a responsibility to demonstrate the reliability of scores obtained from their tests.

63 ESTIMATING RELIABILITY [formula not recovered] Terms: estimated item variance; estimated total test score variance.

64 ESTIMATING RELIABILITY [formula not recovered] Terms: estimated covariance between items $i$ and $j$; estimated total test score variance.
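
The estimator on these two slides was not recovered. The terms they list (item variances, inter-item covariances, total score variance) are consistent with coefficient alpha, so the sketch below assumes that is the intended statistic and computes it in both the variance form and the covariance form. The data and function names are illustrative.

```python
def covariance(x, y):
    """Sample covariance (n - 1 denominator) of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def alpha_from_variances(scores):
    """Coefficient alpha from item variances and total score variance."""
    items = list(zip(*scores))
    k = len(items)
    totals = [sum(row) for row in scores]
    item_variance_sum = sum(covariance(col, col) for col in items)
    return (k / (k - 1)) * (1.0 - item_variance_sum / covariance(totals, totals))


def alpha_from_covariances(scores):
    """Coefficient alpha from inter-item covariances and total score variance."""
    items = list(zip(*scores))
    k = len(items)
    totals = [sum(row) for row in scores]
    covariance_sum = sum(
        covariance(items[i], items[j])
        for i in range(k)
        for j in range(k)
        if i != j
    )
    return (k / (k - 1)) * covariance_sum / covariance(totals, totals)


# Hypothetical ratings: 5 persons by 3 items.
ratings = [
    [2, 3, 3],
    [1, 1, 2],
    [3, 3, 3],
    [0, 1, 1],
    [2, 2, 1],
]
alpha_v = alpha_from_variances(ratings)
alpha_c = alpha_from_covariances(ratings)
```

The two forms agree because the total score variance is the sum of the item variances plus the sum of the inter-item covariances.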

65 OTHER FORMS OF RELIABILITY Test-Retest Reliability: The correlation between persons’ test scores over two administrations of the same test.

66 OTHER FORMS OF RELIABILITY Split-Half Reliability (using Spearman-Brown correction for test length): [formula not recovered] $\rho_{AB}$: correlation between scores of Test A and Test B
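
The split-half formula itself was not recovered. A minimal sketch, assuming the usual Spearman-Brown correction for doubling test length, 2r / (1 + r), where r is the correlation between the two half-test scores. The odd-even split, the data and the function names are assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_ab):
    """Project the half-test correlation to the full-length test."""
    return 2.0 * r_ab / (1.0 + r_ab)


def split_half_reliability(scores):
    """Odd-even split: half A takes items 1, 3, ...; half B takes items 2, 4, ..."""
    half_a = [sum(row[0::2]) for row in scores]
    half_b = [sum(row[1::2]) for row in scores]
    return spearman_brown(pearson(half_a, half_b))


# Hypothetical dichotomous responses: 5 persons by 4 items.
responses = [
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
reliability = split_half_reliability(responses)
```

The same `pearson` helper applied to scores from two administrations of one test gives a test-retest estimate.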

67 TEST VALIDITY VALIDITY: A test is valid if it measures what it claims to measure. Types: Face, Content, Concurrent, Predictive, Construct.

68 Face validity: When the test items appear to measure what the test claims to measure. Content Validity: When the content of the test items, according to experts, adequately represents the latent trait that the test intends to measure. TEST VALIDITY

69 Concurrent validity: When the test, measuring a particular latent trait, correlates highly with another test that measures the same trait. Predictive validity: When the scores of the test predict some meaningful criterion. TEST VALIDITY

70 Construct validity: A test has construct validity when the results of using the test fit hypotheses concerning the nature of the latent trait. The higher the fit, the higher the construct validity. TEST VALIDITY

71 RELIABILITY & VALIDITY Up to a point, reliability and validity increase together, but then any further increase in reliability (over ~.96) decreases validity. For example, when there is perfect reliability (perfect correlations between items), the test items are essentially paraphrases of each other.

72 RELIABILITY & VALIDITY “If the reliability of the items were increased to unity, all correlations between items would also become unity, and a person passing one item would pass all items and another failing one item would fail all the other items. Thus all the possible scores would be a perfect score of one or zero…Is the dichotomy of scores the best that would be expected for items with equal difficulty?” (Tucker, 1946, on the attenuation paradox) (see also Loevinger, 1954)