
Measurement Joseph Stevens, Ph.D. © 2005

- Measurement: the process of assigning quantitative or qualitative descriptions to some attribute
  - Operational definitions
- Assessment: the collection of measurement information
  - Interpretation
  - Synthesis
  - Use
- Evaluation: value added to assessment information (e.g., good, poor, "ought", "needs improvement")

Assessment Decisions/Purposes
- Instructional
- Curricular
- Treatment/Intervention
- Placement/Classification
- Selection/Admission
- Administration/Policy-making
- Personal/Individual
- Personnel Evaluation

Scaling
- The process of systematically translating empirical observations into a measurement scale
- Origin
- Units
- Information
- Types of scales

Score Interpretation
- Direct interpretation
- Need for analysis, relative interpretation
- Normative interpretation
- Anchoring/Standards

Frames of Reference for Interpretation
- Current versus future performance
- Typical versus maximum or potential performance
- Standard of comparison
  - To self
  - To others
  - To a standard
- Formative versus summative

Domains
- Cognitive
  - Ability/Aptitude
  - Achievement
  - Memory, perception, etc.
- Affective
  - Beliefs
  - Attitudes
  - Feelings, interests, preferences, emotions
- Behavior

Cognitive Level
- Knowledge
- Comprehension
- Application
- Analysis/Synthesis
- Evaluation

Assessment Tasks
- Selected Response: MC, T-F, matching
- Restricted Response: cloze, fill-in, completion
- Constructed Response: essay
- Free Response/Performance Assessments
  - Products
  - Performances
- Rating
- Ranking
- Magnitude Estimation

CRT versus NRT
- Criterion Referenced Tests (CRT)
  - Comparison to a criterion/standard
  - Items that represent the domain (relevance, representativeness)
- Norm Referenced Tests (NRT)
  - Comparison to a group
  - Items that discriminate one person from another

Kinds of Scores
- Raw scores
- Standard scores
- Developmental Standard Scores
- Percentile Ranks (PR)
- Normal Curve Equivalent (NCE)
- Grade Equivalent (GE)
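To make the relationships among these derived scores concrete, here is a minimal Python sketch (the raw scores are made-up; percentile ranks use the cumulative-frequency-plus-half-of-ties convention, and the NCE assumes the conventional mean of 50 and SD of 21.06):

```python
import numpy as np
from scipy import stats

raw = np.array([12, 15, 18, 22, 25, 27, 30, 31, 34, 38])  # hypothetical raw scores

z = (raw - raw.mean()) / raw.std(ddof=1)                   # standard (z) scores

# Percentile rank: percent of scores below, counting half of the ties
pr = np.array([(np.sum(raw < x) + 0.5 * np.sum(raw == x)) / raw.size * 100
               for x in raw])

# Normal Curve Equivalent: normal deviate of the PR, rescaled to mean 50, SD 21.06
nce = 50 + 21.06 * stats.norm.ppf(pr / 100)

for r, zz, p, n in zip(raw, z, pr, nce):
    print(f"raw={r:3d}  z={zz:+5.2f}  PR={p:5.1f}  NCE={n:5.1f}")
```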

Scoring Methods
- Objective
- Subjective
  - Holistic
  - Analytic

Aggregating Scores
- Total scores
- Summated scores
- Composite scores
- Issues
  - Intercorrelation of components
  - Variance
  - Reliability

Theories of Measurement
- Classical Test Theory (CTT): X = T + E
- Item Response Theory (IRT)
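A small simulation (made-up numbers, not from the slides) illustrates the CTT decomposition X = T + E: reliability is the ratio of true-score variance to observed-score variance, and it matches the expected correlation between parallel forms.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

true_score = rng.normal(50, 10, n)     # T: true scores (SD 10)
error_1 = rng.normal(0, 5, n)          # E: random error on form 1 (SD 5)
error_2 = rng.normal(0, 5, n)          # independent error on a parallel form

x1 = true_score + error_1              # X = T + E
x2 = true_score + error_2              # parallel form of the same test

reliability = true_score.var() / x1.var()      # var(T) / var(X)
parallel_r = np.corrcoef(x1, x2)[0, 1]         # correlation between parallel forms

print(f"theoretical reliability = {100 / (100 + 25):.2f}")
print(f"var(T)/var(X) = {reliability:.2f}, r(form 1, form 2) = {parallel_r:.2f}")
```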

Reliability
- Consistency
- Consistency of Decisions
- Prerequisite to validity
- Errors in measurement

Reliability
- Sources of error
  - Variations in the physical and mental condition of the person measured
  - Changes in physical or environmental conditions
  - Tasks/items
  - Administration conditions
  - Time
  - Skill to skill
  - Raters/judges
  - Test forms

Estimating Reliability
- Reliability versus the standard error of measurement (SEM)
- Internal Consistency
  - Cronbach's alpha
  - Split-half (example: see the sketch below)
- Test-Retest
- Inter-rater
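As a rough sketch (not the presenter's example), Cronbach's alpha and a Spearman-Brown-corrected split-half coefficient can be computed from a persons x items score matrix like this:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: persons x items matrix of item scores."""
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var_sum / total_var)

def split_half(items: np.ndarray) -> float:
    """Odd-even split-half correlation, stepped up with Spearman-Brown."""
    odd = items[:, 0::2].sum(axis=1)
    even = items[:, 1::2].sum(axis=1)
    r = np.corrcoef(odd, even)[0, 1]
    return 2 * r / (1 + r)

scores = np.array([[1, 1, 0, 1, 1, 0],      # hypothetical 0/1 item responses
                   [0, 1, 0, 0, 1, 0],
                   [1, 1, 1, 1, 1, 1],
                   [0, 0, 0, 1, 0, 0],
                   [1, 0, 1, 1, 1, 1]])
print(cronbach_alpha(scores), split_half(scores))
```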

Estimating Reliability
- Correlations: rank order versus exact agreement
- Percent Agreement
  - Exact versus close agreement: (number of agreements / number of scores) x 100
  - Problem of chance agreements
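A quick sketch of exact versus close percent agreement for two hypothetical raters scoring the same eight papers on a 1-4 rubric (the ratings are made-up):

```python
import numpy as np

rater_a = np.array([3, 2, 4, 1, 3, 2, 4, 3])   # hypothetical rubric scores
rater_b = np.array([3, 3, 4, 1, 2, 2, 4, 4])

exact = np.mean(rater_a == rater_b) * 100              # agreements / scores x 100
close = np.mean(np.abs(rater_a - rater_b) <= 1) * 100  # "close": within one point

print(f"exact agreement = {exact:.1f}%, close agreement = {close:.1f}%")
```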

Estimating Reliability
- Kappa coefficient
  - Takes chance agreements into account
  - Calculate expected frequencies and subtract
  - Kappa ≥ .70 acceptable
  - Examine the pattern of disagreements
- Example (table below): percent agreement = 63.8%, r = .509, Kappa = .451

Example: cross-classification of the same 36 cases by two sets of ratings:

            Below   Meets   Exceeds   Total
Below           9       3         1      13
Meets           4       8         2      14
Exceeds         2       1         6       9
Total          15      12         9      36
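The kappa reported on the slide can be reproduced from this table: observed agreement comes from the diagonal, expected (chance) agreement from the row and column totals.

```python
import numpy as np

# Agreement table from the slide: rows = one set of ratings, columns = the other
table = np.array([[9, 3, 1],
                  [4, 8, 2],
                  [2, 1, 6]])

n = table.sum()
observed = np.trace(table) / n                              # exact agreement
expected = (table.sum(axis=1) @ table.sum(axis=0)) / n**2   # chance agreement
kappa = (observed - expected) / (1 - expected)

print(f"percent agreement = {observed:.1%}, kappa = {kappa:.3f}")
# ~63.9% exact agreement (63.8% on the slide) and kappa = 0.451
```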

Estimating Reliability
- Spearman-Brown prophecy formula
- More items generally means higher reliability
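The prophecy formula itself is not shown on the slide; a common statement of it, as a sketch: if a test's reliability is r and its length is multiplied by k, the predicted reliability is kr / (1 + (k - 1)r).

```python
def spearman_brown(r: float, k: float) -> float:
    """Predicted reliability when test length is multiplied by k."""
    return k * r / (1 + (k - 1) * r)

# Doubling a test with reliability .70:
print(spearman_brown(0.70, 2))   # ~0.82 -- longer tests tend to be more reliable
```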

Reliability as error
- Systematic error
- Random error
- Standard error of measurement: SEM = SD × √(1 − r_xx)
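A short sketch applying the SEM formula (the SD, reliability, and obtained score below are made-up): the SEM can be used to put a confidence band around an obtained score.

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - r_xx)."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical test: SD = 15, reliability = .91
s = sem(15, 0.91)
obtained = 104
print(f"SEM = {s:.1f}; 68% band for an obtained score of {obtained}: "
      f"{obtained - s:.1f} to {obtained + s:.1f}")
```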

Factors affecting reliability
- Time limits
- Test length
- Item characteristics
  - Difficulty
  - Discrimination
- Heterogeneity of the sample
- Number of raters, quality of subjective scoring

Validity
- Accuracy
- Unified View (Messick): use and interpretation
  - Evidential basis
    - Content
    - Criterion
    - Concurrent-Discriminant
    - Construct
  - Consequential basis

Validity
- Internal, structural
- Multitrait-Multimethod (Campbell & Fiske)
- Predictive

Test Development
- Construct Representation
  - Content analysis
  - Review of research
  - Direct observation
  - Expert judgment (panels, ratings, Delphi)
  - Instructional objectives

Test Development
- Blueprint
  - Content × Process
  - Domain sampling
  - Item frames
  - Matching item type and response format to purpose
- Item writing
- Item review (grammar, readability, cueing, sensitivity)

Test Development
- Writing instructions
- Form design (NAEP brown ink)
- Field and pilot testing
- Item analysis
- Review and revision

Equating
- Need to link across forms, people, or occasions
- Horizontal equating
- Vertical equating
- Designs
  - Common item
  - Common persons

Equating
- Equipercentile
- Linear
- IRT
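As an illustration of linear equating only (equipercentile and IRT equating need more machinery), here is a sketch that places a Form X score on the Form Y scale by matching means and standard deviations; the scores are made-up and assume a common-persons design.

```python
import numpy as np

# Hypothetical scores for the same group of examinees on two forms
form_x = np.array([18, 22, 25, 27, 30, 33, 35, 38])
form_y = np.array([20, 23, 27, 30, 31, 36, 37, 40])

def linear_equate(x, ref_x, ref_y):
    """Place score x from Form X onto the Form Y scale (mean/SD matching)."""
    slope = ref_y.std(ddof=1) / ref_x.std(ddof=1)
    return ref_y.mean() + slope * (x - ref_x.mean())

print(linear_equate(28, form_x, form_y))   # a Form X score of 28 on the Y scale
```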

Bias and Sensitivity
- Sensitivity in item and test development
- Differential results versus bias
  - Differential Item Functioning (DIF)
  - Importance of matching; legal versus psychometric perspectives
  - Understanding diversity and individual differences
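One widely used DIF procedure after matching examinees on total score is the Mantel-Haenszel common odds ratio; the sketch below uses simulated data and is only illustrative (a ratio near 1 suggests the item behaves similarly for the two groups at the same score level).

```python
import numpy as np

def mh_odds_ratio(correct, group, matching_score):
    """Mantel-Haenszel common odds ratio across matched score levels.
    correct: 0/1 item responses; group: 0 = reference, 1 = focal;
    matching_score: total test score used as the stratifying variable."""
    num, den = 0.0, 0.0
    for level in np.unique(matching_score):
        idx = matching_score == level
        a = np.sum((group[idx] == 0) & (correct[idx] == 1))  # reference, right
        b = np.sum((group[idx] == 0) & (correct[idx] == 0))  # reference, wrong
        c = np.sum((group[idx] == 1) & (correct[idx] == 1))  # focal, right
        d = np.sum((group[idx] == 1) & (correct[idx] == 0))  # focal, wrong
        n = a + b + c + d
        if n == 0:
            continue
        num += a * d / n
        den += b * c / n
    return num / den

# Simulated example: the focal group is slightly disadvantaged on this item
rng = np.random.default_rng(1)
group = rng.integers(0, 2, 400)
score = rng.integers(0, 6, 400)                 # matched total-score levels
p = np.clip(0.3 + 0.1 * score - 0.05 * group, 0, 1)
correct = rng.binomial(1, p)
print(mh_odds_ratio(correct, group, score))     # > 1 favors the reference group
```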

Item Analysis
- Difficulty, p
- Means and standard deviations
- Discrimination, point-biserial r
- Omits
- Removing or revising "bad" items
- Example (see the sketch below)
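A minimal item-analysis sketch (hypothetical 0/1 responses, not the presenter's example data) computing the difficulty p and a corrected point-biserial discrimination for each item:

```python
import numpy as np

# Hypothetical 0/1 item responses: persons x items
resp = np.array([[1, 1, 0, 1],
                 [1, 0, 0, 1],
                 [1, 1, 1, 1],
                 [0, 0, 0, 1],
                 [1, 1, 0, 0],
                 [1, 0, 1, 1]])

total = resp.sum(axis=1)
p = resp.mean(axis=0)                       # item difficulty (proportion correct)

# Corrected point-biserial: item score vs. total score with the item removed
r_pb = np.array([np.corrcoef(resp[:, j], total - resp[:, j])[0, 1]
                 for j in range(resp.shape[1])])

for j, (diff, disc) in enumerate(zip(p, r_pb), start=1):
    print(f"item {j}: p = {diff:.2f}, r_pbis = {disc:.2f}")
```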

Factor Analysis
- Method of evaluating structural validity and reliability
- Exploratory (EFA): example
- Confirmatory (CFA): example
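A toy EFA sketch using scikit-learn (this assumes a scikit-learn version that supports the rotation argument; CFA would normally be fit with an SEM package such as lavaan or semopy instead): six simulated items load on two factors, and the estimated loadings are recovered.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

# Simulate 6 items driven by 2 latent factors (a made-up structure)
n = 500
factors = rng.normal(size=(n, 2))
loadings = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.1],
                     [0.0, 0.8], [0.1, 0.7], [0.0, 0.6]])
items = factors @ loadings.T + rng.normal(scale=0.5, size=(n, 6))

efa = FactorAnalysis(n_components=2, rotation="varimax").fit(items)
print(np.round(efa.components_.T, 2))   # estimated loadings: items x factors
```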