Measurement
Joseph Stevens, Ph.D. © 2005
Measurement
- Measurement: the process of assigning quantitative or qualitative descriptions to some attribute
- Operational definitions
- Assessment: the collection of measurement information
  - Interpretation
  - Synthesis
  - Use
- Evaluation: value added to assessment information (e.g., good, poor, "ought", "needs improvement")
Assessment Decisions/Purposes
- Instructional
- Curricular
- Treatment/Intervention
- Placement/Classification
- Selection/Admission
- Administration/Policy-making
- Personal/Individual
- Personnel evaluation
Scaling
- The process of systematically translating empirical observations into a measurement scale
- Origin
- Units
- Information
- Types of scales
Score Interpretation
- Direct interpretation
- Need for analysis, relative interpretation
- Normative interpretation
- Anchoring/Standards
Frames of Reference for Interpretation
- Current versus future performance
- Typical versus maximum or potential
- Standard of comparison
  - To self
  - To others
  - To a standard
- Formative versus summative
Domains
- Cognitive
  - Ability/Aptitude
  - Achievement
  - Memory, perception, etc.
- Affective
  - Beliefs
  - Attitudes
  - Feelings, interests, preferences, emotions
- Behavior
Cognitive Level
- Knowledge
- Comprehension
- Application
- Analysis/Synthesis
- Evaluation
Assessment Tasks
- Selected response – MC, T-F, matching
- Restricted response – cloze, fill-in, completion
- Constructed response – essay
- Free response/performance assessments
  - Products
  - Performances
- Rating
- Ranking
- Magnitude estimation
CRT versus NRT
- Criterion Referenced Tests (CRT)
  - Comparison to a criterion/standard
  - Items that represent the domain
    - Relevance
    - Representativeness
- Norm Referenced Tests (NRT)
  - Comparison to a group
  - Items that discriminate one person from another
Kinds of Scores
- Raw scores
- Standard scores
- Developmental Standard Scores
- Percentile Ranks (PR)
- Normal Curve Equivalent (NCE)
- Grade Equivalent (GE)
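A small sketch relating several of these score types for a hypothetical norm group (the data and the use of SciPy are illustrative): z-scores standardize the raw scores, percentile ranks locate each score in the group, and the Normal Curve Equivalent rescales the normalized percentile rank to mean 50 and SD 21.06. Grade equivalents and developmental standard scores require published norm tables and are not shown.

```python
import numpy as np
from scipy import stats

# Hypothetical raw scores for a small norm group (illustrative values only).
raw = np.array([12, 15, 18, 20, 22, 25, 27, 30, 31, 35], dtype=float)

# Standard (z) scores: distance from the group mean in SD units.
z = (raw - raw.mean()) / raw.std(ddof=1)

# Percentile rank: percent of the group below a score, plus half of the ties.
pr = np.array([stats.percentileofscore(raw, x, kind="mean") for x in raw])

# Normal Curve Equivalent: normalize the percentile rank, then rescale to
# mean 50 and SD 21.06 so the units are equal-interval (unlike PR).
nce = 50 + 21.06 * stats.norm.ppf(pr / 100)

for r, zz, p, n in zip(raw, z, pr, nce):
    print(f"raw {r:4.0f}   z {zz:5.2f}   PR {p:5.1f}   NCE {n:5.1f}")
```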
Scoring Methods
- Objective
- Subjective
- Holistic
- Analytic
Aggregating Scores
- Total scores
- Summated scores
- Composite scores
- Issues
  - Intercorrelation of components
  - Variance
  - Reliability
Theories of Measurement
- Classical Test Theory (CTT): X = T + E
- Item Response Theory (IRT): http://work.psych.uiuc.edu/irt/tutorial.asp
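A brief sketch contrasting the two frameworks on simulated data with hypothetical parameters: under CTT the observed score X is the sum of a true score T and random error E, and reliability is the proportion of observed-score variance due to T; under IRT (here a two-parameter logistic item) the probability of a correct response is a function of examinee ability.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Classical Test Theory: X = T + E ---
T = rng.normal(loc=50, scale=10, size=1000)   # true scores
E = rng.normal(loc=0, scale=4, size=1000)     # random error, uncorrelated with T
X = T + E                                     # observed scores

# Reliability under CTT: true-score variance over observed-score variance.
print(f"simulated reliability ~ {T.var() / X.var():.2f}")

# --- Item Response Theory: a two-parameter logistic (2PL) item ---
def p_correct(theta, a=1.2, b=0.0):
    """P(correct) given ability theta, discrimination a, difficulty b (hypothetical)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

for theta in (-2, -1, 0, 1, 2):
    print(f"theta = {theta:+d}: P(correct) = {p_correct(theta):.3f}")
```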
Reliability
- Consistency
- Consistency of decisions
- Prerequisite to validity
- Errors in measurement
Reliability
- Sources of error
  - Variations in the physical and mental condition of the person measured
  - Changes in physical or environmental conditions
  - Tasks/items
  - Administration conditions
  - Time
  - Skill to skill
  - Raters/judges
  - Test forms
Estimating Reliability
- Reliability versus the standard error of measurement (SEM)
- Internal consistency (see the sketch below)
  - Cronbach's alpha
  - Split-half
- Test-retest
- Inter-rater
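A minimal sketch of Cronbach's alpha from its standard formula, alpha = k/(k - 1) x (1 - sum of item variances / total-score variance), on a hypothetical response matrix.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (examinees x items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                             # number of items
    item_vars = items.var(axis=0, ddof=1).sum()    # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)      # variance of the total score
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical responses: 6 examinees x 4 items scored 1-5.
scores = [[4, 5, 4, 5],
          [3, 4, 3, 4],
          [2, 2, 3, 2],
          [5, 5, 4, 5],
          [1, 2, 2, 1],
          [3, 3, 4, 3]]
print(round(cronbach_alpha(scores), 3))
```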
Estimating Reliability
- Correlations: rank order versus exact agreement
- Percent agreement = (number of agreements / number of scores) x 100
  - Exact versus close agreement
  - Problem of chance agreements
Estimating Reliability
- Kappa coefficient
  - Takes chance agreements into account
  - Calculate expected frequencies and subtract
  - Kappa ≥ .70 is acceptable
  - Examine the pattern of disagreements
- Example (see the table and sketch below): percent agreement = 63.8%, r = .509, kappa = .451
Example: cross-tabulation of two sets of ratings (36 cases)

            Below   Meets   Exceeds   Total
  Below         9       3         1      13
  Meets         4       8         2      14
  Exceeds       2       1         6       9
  Total        15      12         9      36
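A short sketch that reproduces the figures quoted on the previous slide from this table: percent agreement is the share of cases on the diagonal, the Pearson r treats the categories as ordered codes, and kappa subtracts the agreement expected by chance from the row and column margins.

```python
import numpy as np

# Cross-tabulation from the table above (Below, Meets, Exceeds).
table = np.array([[9, 3, 1],
                  [4, 8, 2],
                  [2, 1, 6]], dtype=float)

n = table.sum()
observed = np.trace(table) / n                       # proportion of exact agreements
print(f"percent agreement = {100 * observed:.1f}%")  # ~63.9% (reported as 63.8% on the slide)

# Pearson r between the two ratings, with categories coded 1, 2, 3.
rows, cols = np.indices(table.shape)
x = np.repeat(rows.ravel() + 1, table.ravel().astype(int))
y = np.repeat(cols.ravel() + 1, table.ravel().astype(int))
print(f"r = {np.corrcoef(x, y)[0, 1]:.3f}")          # ~0.509

# Kappa: remove the agreement expected by chance from the marginals.
expected = (table.sum(axis=1) * table.sum(axis=0)).sum() / n**2
kappa = (observed - expected) / (1 - expected)
print(f"kappa = {kappa:.3f}")                        # ~0.451
```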
Estimating Reliability
- Spearman-Brown prophecy formula: projects reliability when test length changes
- More is better
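A short sketch of the Spearman-Brown prophecy formula in its standard form, r_new = k·r / (1 + (k - 1)·r), where k is the factor by which the test is lengthened with comparable items; the values below are hypothetical.

```python
def spearman_brown(r, k):
    """Projected reliability when test length is multiplied by k."""
    return (k * r) / (1 + (k - 1) * r)

# A test with reliability .70, doubled in length with comparable items.
print(round(spearman_brown(0.70, 2), 3))   # ~0.824
```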
Reliability as Error
- Systematic error
- Random error
- Standard error of measurement: SEM = SD x √(1 - r_xx)
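As a worked example with hypothetical values: for a test with SD = 15 and reliability r_xx = .91, SEM = 15 x √(1 - .91) = 15 x .30 = 4.5 score points; roughly two-thirds of observed scores fall within ±1 SEM of the true score.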
Factors Affecting Reliability
- Time limits
- Test length
- Item characteristics
  - Difficulty
  - Discrimination
- Heterogeneity of the sample
- Number of raters, quality of subjective scoring
Validity
- Accuracy
- Unified view (Messick)
  - Use and interpretation
  - Evidential basis
    - Content
    - Criterion
    - Concurrent-Discriminant
    - Construct
  - Consequential basis
Validity
- Internal, structural
- Multitrait-Multimethod (Campbell & Fiske)
- Predictive
Test Development
- Construct representation
  - Content analysis
  - Review of research
  - Direct observation
  - Expert judgment (panels, ratings, Delphi)
  - Instructional objectives
Test Development
- Blueprint: Content x Process
- Domain sampling
- Item frames
- Matching item type and response format to purpose
- Item writing
- Item review (grammar, readability, cueing, sensitivity)
Test Development
- Writing instructions
- Form design (NAEP brown ink)
- Field and pilot testing
- Item analysis
- Review and revision
Equating
- Need to link across forms, people, or occasions
- Horizontal equating
- Vertical equating
- Designs
  - Common item
  - Common persons
Equating
- Equipercentile
- Linear
- IRT
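A minimal sketch of linear equating under hypothetical form statistics: a Form X score is placed on the Form Y scale so that it keeps the same z-score. Equipercentile and IRT equating require full score distributions or item parameters and are not shown.

```python
def linear_equate(x, mean_x, sd_x, mean_y, sd_y):
    """Map a Form X score to the Form Y scale by matching z-scores."""
    z = (x - mean_x) / sd_x
    return mean_y + sd_y * z

# Hypothetical statistics: Form X (mean 48, SD 9), Form Y (mean 52, SD 10).
print(linear_equate(57, 48, 9, 52, 10))   # a 57 on Form X ~ 62.0 on Form Y
```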
Bias and Sensitivity
- Sensitivity in item and test development
- Differential results versus bias
- Differential Item Functioning (DIF)
- Importance of matching, legal versus psychometric
- Understanding diversity and individual differences
Item Analysis
- Difficulty (p)
- Means and standard deviations
- Discrimination (point-biserial r)
- Omits
- Removing or revising "bad" items (see the sketch below)
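A small sketch of a classical item analysis on hypothetical 0/1 response data: difficulty p is the proportion answering each item correctly, and discrimination is the point-biserial correlation between the item and the total score (a corrected version would drop the item from the total).

```python
import numpy as np

# Hypothetical 0/1 responses: 8 examinees x 4 items.
responses = np.array([[1, 1, 0, 1],
                      [1, 0, 0, 1],
                      [1, 1, 1, 1],
                      [0, 0, 0, 1],
                      [1, 1, 0, 0],
                      [0, 1, 0, 1],
                      [1, 1, 1, 1],
                      [0, 0, 0, 0]])

total = responses.sum(axis=1)

# Difficulty: proportion correct per item.
p = responses.mean(axis=0)

# Discrimination: point-biserial correlation of each item with the total score.
r_pb = [np.corrcoef(responses[:, i], total)[0, 1] for i in range(responses.shape[1])]

for i, (diff, disc) in enumerate(zip(p, r_pb), start=1):
    print(f"item {i}: p = {diff:.2f}, r_pb = {disc:.2f}")
```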
Factor Analysis
- A method for evaluating structural validity and reliability
- Exploratory factor analysis (EFA) (see the sketch below)
- Confirmatory factor analysis (CFA)
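A minimal EFA-style sketch using scikit-learn's FactorAnalysis on simulated data with two latent factors; the data, the two-factor choice, and the library are illustrative. CFA, which tests a prespecified loading pattern, is normally run in SEM software (e.g., lavaan or Mplus) and is not shown.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)

# Simulated data: two latent factors, each driving three observed items.
n = 300
f1, f2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([f1 + rng.normal(scale=0.5, size=n) for _ in range(3)] +
                    [f2 + rng.normal(scale=0.5, size=n) for _ in range(3)])

fa = FactorAnalysis(n_components=2).fit(X)
print(np.round(fa.components_, 2))   # loadings: rows = factors, columns = items
```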