Tests and Measurements

46-320-01 Tests and Measurements Intersession 2006

Writing Items DeVellis (1991): define item pool; avoid long items; appropriate level of reading; avoid double-barreled items; mix positively and negatively worded items; cultural/ethnic sensitivity

Item Format Dichotomous format Two alternatives Pros: Ease of construction and scoring, absolute judgment Cons: memorization, chance of being correct

Item Format Polytomous format More than two alternatives Pros: less chance guessing, fast time, distractors Corrected scores: Guessing?
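
The "Corrected scores" bullet can be illustrated with the standard correction-for-guessing formula, corrected = R - W / (n - 1), where R is the number right, W the number wrong, and n the number of alternatives per item. This is a minimal sketch that assumes this formula is what the slide has in mind; the function name is hypothetical, and omitted items are simply left out of both counts.

```python
def corrected_score(right: int, wrong: int, n_options: int) -> float:
    """Correction for guessing: R - W / (n - 1).

    Omitted items count as neither right nor wrong.
    """
    if n_options < 2:
        raise ValueError("an item needs at least two alternatives")
    return right - wrong / (n_options - 1)


# Hypothetical test-taker: 40 right, 12 wrong on four-option items.
print(corrected_score(40, 12, 4))  # 36.0
```

With two alternatives the penalty is one point per wrong answer, which shows how strongly guessing can inflate scores on dichotomous items.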

Item Format Likert format: degree of agreement; five alternatives vs. six; reverse scoring. Category format: 10-point scale – why 10? Remember context. Visual analogue scale: 100 mm line
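
Reverse scoring, mentioned under the Likert format, can be sketched as follows. The example assumes a 1-to-5 agreement scale and that negatively worded items are identified by their index; both function names and the data are hypothetical.

```python
def reverse_score(response: int, low: int = 1, high: int = 5) -> int:
    """Flip a response on a low..high scale (1 <-> 5, 2 <-> 4, 3 stays 3)."""
    if not low <= response <= high:
        raise ValueError(f"response {response} is outside {low}..{high}")
    return low + high - response


def scale_total(responses, reversed_items, low=1, high=5):
    """Sum item responses, reverse-scoring items whose index is in reversed_items."""
    return sum(
        reverse_score(r, low, high) if i in reversed_items else r
        for i, r in enumerate(responses)
    )


# Items at index 1 and 3 are negatively worded (hypothetical data).
print(scale_total([5, 2, 4, 1], reversed_items={1, 3}))  # 18
```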

Item Format Checklist: usually adjectives. Q-sort: increases options (9); form normal distribution

Item Analysis Purpose: shorten a test and increase reliability and validity Item difficulty Proportion who get the item correct Probability of chance Optimum level Variable difficulty (0.3 to 0.7) Internal criteria = test score
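
Item difficulty and the "optimum level" can be sketched in a few lines. The sketch assumes the common rule of thumb that the optimum difficulty lies halfway between the chance success rate and 1.0; the slide does not spell out the rule, so treat that as an assumption. Function names and data are hypothetical.

```python
def item_difficulty(item_responses):
    """Proportion of test-takers who got the item correct (1 = correct, 0 = incorrect)."""
    if not item_responses:
        raise ValueError("need at least one response")
    return sum(item_responses) / len(item_responses)


def optimum_difficulty(n_options):
    """Halfway between the chance success rate (1 / n_options) and 1.0."""
    chance = 1 / n_options
    return (1 + chance) / 2


responses = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]  # hypothetical: 7 of 10 correct
print(item_difficulty(responses))  # 0.7
print(optimum_difficulty(4))       # 0.625
```

Under this rule a four-option item is best targeted at about 0.625, which sits inside the 0.3 to 0.7 range of variable difficulty named on the slide.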

Discriminability Extreme group method: discrimination index; negative discriminator. Point biserial method: small test n; higher correlation, better the item
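
The point biserial method correlates a right/wrong item with the total test score. A minimal sketch using the textbook formula r_pb = (M1 - M0) / s * sqrt(p * q), where M1 and M0 are the mean totals of those who passed and failed the item, s is the population standard deviation of the totals, and p is the proportion passing. The function name and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def point_biserial(item, totals):
    """Point-biserial correlation between a 0/1 item and total test scores."""
    passed = [t for i, t in zip(item, totals) if i == 1]
    failed = [t for i, t in zip(item, totals) if i == 0]
    if not passed or not failed:
        raise ValueError("item must have both correct and incorrect responses")
    s = pstdev(totals)
    if s == 0:
        raise ValueError("total scores have no variance")
    p = len(passed) / len(item)
    return (mean(passed) - mean(failed)) / s * sqrt(p * (1 - p))


item = [1, 1, 1, 0, 0, 0]      # hypothetical item responses
totals = [9, 8, 7, 4, 3, 5]    # hypothetical total scores
print(round(point_biserial(item, totals), 2))  # 0.93
```

A value near zero or below zero flags an item that does not separate high scorers from low scorers.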

Discrimination
Item   U (20)   M     L     Difficulty (U+M+L)   Discrimination (U-L)
1      15       9     7     31                   8
2      20       20    16    56                   4
3      19       18    9     46                   10
4      10       11    16    37                   -6
5      11       13    11    35                   0
6      16       14    9     39                   7
7      ...      ...   ...   ...                  ...

Table Explained Class n = 60. Discrimination: rough index = U – L. Item difficulty: U + M + L. Items: 2 = too easy; 7 = too difficult; 4 & 5 = negative or zero discriminative value
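
The two rough indices are simple arithmetic on the group counts. The sketch below applies them to Item 1 from the table (15, 9, and 7 correct in the upper, middle, and lower groups); the function name is hypothetical.

```python
def rough_indices(upper, middle, lower):
    """Return (difficulty, discrimination) from the number correct in each group.

    difficulty     = U + M + L  (total number passing the item)
    discrimination = U - L      (upper group minus lower group)
    """
    return upper + middle + lower, upper - lower


difficulty, discrimination = rough_indices(15, 9, 7)  # Item 1
print(difficulty, discrimination)  # 31 8
print(round(difficulty / 60, 2))   # 0.52, proportion of the class of 60 passing
```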

Further Item Analysis Response Options Item Group 1 2 3 4 5 Upper Lower 20 16 10 9 11 7 8

Discrimination Index: Percentages
       Percent Passing
Item   Upper   Lower   Index of Discrimination (D)
1      75      35      40
2      100     80      20
3      95      45      50
4      50      80      -30
5      55      55      0
6      80      45      35
7      25      ...     ...
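
Expressed in percentages, the index of discrimination D is the percent passing in the upper group minus the percent passing in the lower group. A short sketch using Item 1, assuming groups of 20 as in the "U (20)" column of the earlier table; function names are hypothetical.

```python
def percent_passing(n_correct, group_size):
    """Percentage of a criterion group answering the item correctly."""
    return 100 * n_correct / group_size


def discrimination_index(upper_pct, lower_pct):
    """D = percent passing (upper) - percent passing (lower)."""
    return upper_pct - lower_pct


upper = percent_passing(15, 20)  # Item 1, upper group
lower = percent_passing(7, 20)   # Item 1, lower group
print(upper, lower, discrimination_index(upper, lower))  # 75.0 35.0 40.0
```

Using percentages makes D comparable across groups of different sizes, which the raw U - L difference is not.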

Item Characteristic Curve X axis: total test score (trait estimate) Y axis: proportion of test-takers with the item correct Often use class intervals
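
An empirical item characteristic curve can be tabulated by grouping total scores into class intervals and computing the proportion correct in each. This is a sketch under the assumption of integer total scores and equal-width intervals; the function name and data are hypothetical.

```python
from collections import defaultdict


def empirical_icc(totals, item, width):
    """Map each class interval (by its lower bound) to the proportion passing the item."""
    counts = defaultdict(lambda: [0, 0])  # interval start -> [n correct, n test-takers]
    for total, correct in zip(totals, item):
        start = (total // width) * width
        counts[start][0] += correct
        counts[start][1] += 1
    return {start: c / n for start, (c, n) in sorted(counts.items())}


totals = [2, 3, 6, 7, 8, 12, 13, 14]  # hypothetical total test scores (X axis)
item = [0, 0, 0, 1, 1, 1, 1, 1]       # hypothetical 0/1 responses to one item
curve = empirical_icc(totals, item, width=5)
print({k: round(v, 2) for k, v in curve.items()})  # {0: 0.0, 5: 0.67, 10: 1.0}
```

Plotting these proportions against the interval bounds gives the curve described on the slide; a steadily rising curve indicates a discriminating item.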

Discriminability Best scenario

Item Response Theory Each item has an item characteristic curve Specific range of difficulty can be identified with a test characteristic curve Difficulty and discriminability Sample items Peaked conventional vs. rectangular conventional vs. adaptive
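
The slide does not name a specific model, so the following is only an illustration: one common way to write an item characteristic curve with a difficulty and a discriminability parameter is the two-parameter logistic form P(theta) = 1 / (1 + exp(-a * (theta - b))). The function name and parameter values are assumptions.

```python
from math import exp


def icc_2pl(theta, a, b):
    """Probability of a correct response at trait level theta.

    a: discrimination (slope of the curve at theta = b)
    b: difficulty (trait level at which the probability is 0.5)
    """
    return 1 / (1 + exp(-a * (theta - b)))


# Two hypothetical items with the same difficulty but different discrimination.
for theta in (-2, -1, 0, 1, 2):
    print(theta, round(icc_2pl(theta, a=0.5, b=0.0), 2), round(icc_2pl(theta, a=2.0, b=0.0), 2))
```

At theta equal to b both items give 0.5; the item with the larger a rises more steeply around that point, which is what higher discriminability means on the curve.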

Criterion-Referenced Tests Specify objectives – aids learning Give test to two groups Exposed vs. not Antimode – cutting score Any problems with this?
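
The antimode is the least frequent score in the valley between the peaks of the two groups' combined distribution. A minimal sketch, assuming integer scores, one clear mode per group, and that ties are resolved toward the lower score; the function name and data are hypothetical.

```python
from collections import Counter


def antimode_cutting_score(unexposed, exposed):
    """Least frequent combined score lying between the two group modes."""
    low_mode = Counter(unexposed).most_common(1)[0][0]
    high_mode = Counter(exposed).most_common(1)[0][0]
    if high_mode - low_mode < 2:
        raise ValueError("the two groups do not form separated peaks")
    combined = Counter(unexposed) + Counter(exposed)
    # Scores absent from the data count as frequency 0.
    return min(range(low_mode + 1, high_mode), key=lambda score: combined[score])


unexposed = [3, 3, 3, 3, 4, 4, 4, 5, 5, 6]  # hypothetical: not taught the unit
exposed = [7, 7, 8, 8, 8, 8, 9, 9, 9]       # hypothetical: taught the unit
print(antimode_cutting_score(unexposed, exposed))  # 6
```

When the two distributions overlap heavily there is no clear valley, which is one answer to the slide's "Any problems with this?".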

Test Manuals Proprietary - qualifications; Nonproprietary. Standards for Educational and Psychological Testing*: reflects changes in federal law and measurement trends affecting validity; testing individuals with disabilities or different linguistic backgrounds; and new types of tests as well as new uses of existing tests. *Taken from apa.org

Test Manuals Should include: how to administer (standard conditions); how to score; how to interpret; information on reliability, validity, norms. Be critical!

Base Rates and Hit Rates What does this test contribute beyond what is already known? Cutting score not necessarily correct decision. Hit rate vs. base rate comparison. False negatives and false positives
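
The comparison between hit rate and base rate can be made concrete by tallying decisions at a cutting score. In this sketch the hit rate is the proportion of correct decisions and the base rate is the proportion of cases that actually have the outcome; these definitions, the function name, and the data are assumptions made for illustration.

```python
def decision_summary(scores, outcomes, cutoff):
    """Tally decisions made by predicting the outcome whenever score >= cutoff."""
    tp = fp = tn = fn = 0
    for score, outcome in zip(scores, outcomes):
        predicted = score >= cutoff
        if predicted and outcome:
            tp += 1
        elif predicted and not outcome:
            fp += 1  # false positive: predicted, but outcome absent
        elif not predicted and outcome:
            fn += 1  # false negative: missed a true case
        else:
            tn += 1
    n = tp + fp + tn + fn
    return {
        "hit_rate": (tp + tn) / n,
        "base_rate": (tp + fn) / n,
        "false_positives": fp,
        "false_negatives": fn,
    }


scores = [10, 20, 30, 40, 50, 60, 70, 80]  # hypothetical test scores
outcomes = [0, 0, 1, 0, 1, 1, 0, 1]        # hypothetical criterion (1 = outcome present)
print(decision_summary(scores, outcomes, cutoff=45))
# {'hit_rate': 0.75, 'base_rate': 0.5, 'false_positives': 1, 'false_negatives': 1}
```

Moving the cutoff trades false positives against false negatives, which is why a cutting score alone does not guarantee a correct decision.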

Taylor-Russell Tables What does the test contribute beyond the base rate? Need: definition of success; base rate; selection ratio; test validity coefficient. Determines likelihood someone selected on basis of test will succeed

Taylor-Russell Tables Source: Fisher, Schoenfeldt, & Shaw (2003), Table 7.2

Taylor-Russell Tables Best: validity high, selection ratio low. Bad: validity low, selection ratio high. Useless: no validity. Selecting low scorers?
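
The pattern in the tables can be approximated by simulation rather than lookup. The sketch below is not the published table: it assumes test and criterion scores are bivariate normal with correlation equal to the validity coefficient, selects the top scorers on the test, and counts how many fall in the successful fraction on the criterion. The function name, sample size, and seed are arbitrary choices.

```python
import random
from math import sqrt


def success_rate_among_selected(base_rate, selection_ratio, validity, n=100_000, seed=0):
    """Simulated proportion of selected applicants who succeed."""
    if not (0 < base_rate < 1 and 0 < selection_ratio <= 1 and -1 <= validity <= 1):
        raise ValueError("rates must lie in (0, 1) and validity in [-1, 1]")
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        test = rng.gauss(0, 1)
        # Criterion correlates with the test at the given validity.
        criterion = validity * test + sqrt(1 - validity**2) * rng.gauss(0, 1)
        pairs.append((test, criterion))
    # "Success" = being in the top base_rate fraction on the criterion.
    criterion_cut = sorted(c for _, c in pairs)[int(n * (1 - base_rate))]
    # "Selected" = being in the top selection_ratio fraction on the test.
    selected = sorted(pairs, reverse=True)[: int(n * selection_ratio)]
    return sum(c >= criterion_cut for _, c in selected) / len(selected)


# Base rate .50, selection ratio .50, validity .50: roughly 0.67 succeed.
print(round(success_rate_among_selected(0.5, 0.5, 0.5), 2))
```

With validity set to 0 the result stays near the base rate (the "useless" case), and lowering the selection ratio at a fixed positive validity raises it (the "best" case).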

Incremental Validity Unique information from using a test Predicting future behavior and self-ratings Prediction should consider: Simpler method? Less expensive method? Less subject strain?

Mental Measurements Yearbook Test reviews