
SETTING & MAINTAINING EXAM STANDARDS Raja C. Bandaranayake

DEFINITIONS
Standard setting is the process of determining how much is good enough.
The standard, or criterion level of performance, is the point on the scale of measurement at which the separation of competence from incompetence occurs.
The cut-score (also cut-off score or passing score) represents this standard on a given test and is used for making decisions pertaining to the purpose for which the test was conducted, e.g., to certify competence.

ERROR IN MEASUREMENT
True score is a conceptual measure indicating the true extent of competence in a given subject, e.g., Anatomy.
Observed score is the score assigned as a result of taking a test, say in Anatomy.
The difference between true and observed scores is indicative of the amount of error in the measurement.
The reliability of a test and the associated standard error of measurement (SEM) are estimates of the amount of error in the measurement.
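A minimal sketch of the classical relation between reliability and measurement error, SEM = SD x sqrt(1 - reliability); the standard deviation and reliability values below are illustrative assumptions, not figures from the presentation:

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical SEM: the spread of observed scores around the true score,
    given the score standard deviation and a reliability coefficient."""
    return sd * math.sqrt(1.0 - reliability)

# Illustrative values: SD of 8 marks, reliability (e.g., Cronbach's alpha) of 0.85
sem = standard_error_of_measurement(sd=8.0, reliability=0.85)
print(f"SEM = {sem:.2f} marks")  # ~3.10; an observed score of 60 suggests a
                                 # true score of roughly 60 +/- 3.1 (68% band)
```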

DECISION ERRORS
False positive: passing an incompetent examinee.
False negative: failing a competent examinee.

NORM- & CRITERION-REFERENCED STANDARDS
NORM-REFERENCED: relative; based on peer performance; varies with each group; cut-off point not related to competence.
CRITERION-REFERENCED: absolute; not related to peer performance; standard set prior to the exam; referenced to a defined level of performance.

METHODS OF STANDARD SETTING
1. Test-centred methods: standards derived from hypothetical judgements about the test content, made before the test is answered.
2. Examinee-centred methods: standards derived from reviewing examinees' performance before deciding the cut-off score.
3. Compromise methods: provide flexibility for adjusting the standard based on the examinees' performance on the test.

NEDELSKY (1954) METHOD: Example
Consider N judges and n MCQ items of the 1-in-5 type.
Judge A identifies 2 options in item 1 as those which a minimally competent examinee should eliminate as incorrect. The MPL for that item for Judge A is MPL_A1 = 1/(5-2) = 1/3.
Similarly, in item 2 he identifies 3 options, giving MPL_A2 = 1/(5-3) = 1/2.
He repeats this process for each item.
The exam MPL for Judge A is the sum over items: MPL_A = MPL_A1 + MPL_A2 + MPL_A3 + ... + MPL_An.
Similarly, Judge B's MPL (MPL_B) is determined.
The MPL for the exam (the cut-off score) is the average across judges: (MPL_A + MPL_B + MPL_C + ... + MPL_N) / N.
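A minimal sketch of this calculation; the per-judge option counts are illustrative assumptions:

```python
N_OPTIONS = 5  # 1-in-5 MCQ items

# eliminated[judge][item] = options a minimally competent examinee should
# eliminate as incorrect, per that judge (illustrative data for 3 items)
eliminated = {
    "A": [2, 3, 1],
    "B": [1, 3, 2],
}

def judge_mpl(elim_counts):
    """Sum of item MPLs for one judge: each item contributes
    1 / (options remaining after elimination)."""
    return sum(1.0 / (N_OPTIONS - e) for e in elim_counts)

judge_totals = [judge_mpl(counts) for counts in eliminated.values()]
cutoff = sum(judge_totals) / len(judge_totals)  # average across judges
n_items = len(eliminated["A"])
print(f"Cut-off = {cutoff:.2f} of {n_items} items ({100 * cutoff / n_items:.1f}%)")
```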

ANGOFF (1971) METHOD: Example
N judges consider 100 minimally competent examinees taking an MCQ exam of n items.
Judge A estimates that, of these examinees, 50 should answer item 1 correctly, 20 item 2 correctly, 70 item 3 correctly, and so on to item n.
The MPL for Judge A is the average of these estimates: MPL_A = (50 + 20 + 70 + ... + x_n) / n = (say) A%, where x_n is his estimate for item n.
Similarly, for Judges B, C, D, E, ..., N, the MPLs would be B%, C%, D%, E%, ..., N%, respectively.
The MPL (cut-off score) for the exam is: (A% + B% + C% + D% + E% + ... + N%) / N.
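A minimal sketch; the per-item estimates are illustrative assumptions:

```python
# estimates[judge][item] = how many of 100 minimally competent examinees the
# judge expects to answer that item correctly (illustrative data for 3 items)
estimates = {
    "A": [50, 20, 70],
    "B": [60, 30, 65],
}

judge_mpls = {judge: sum(v) / len(v) for judge, v in estimates.items()}  # in %
cutoff = sum(judge_mpls.values()) / len(judge_mpls)  # average across judges
print(f"Per-judge MPLs: {judge_mpls}")
print(f"Cut-off = {cutoff:.1f}%")
```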

EBEL (1972) METHOD: Example
Assume that Judge A assigns the items in a 200-item MCQ test to the cells of a "relevance-by-difficulty" matrix, as follows. He then estimates the percentage of items in each cell of the matrix that a minimally competent examinee should be able to answer correctly (indicated within the cell). Each cell also includes the product of these two values.

              EASY               MEDIUM           HARD
ESSENTIAL     15 x 100% = 1500   ... x 80% = ...   10 x 60% = 600
IMPORTANT     20 x 80% = 1600    ... x 60% = ...   20 x 50% = 1000
ACCEPTABLE    10 x 50% = 500     ... x 40% = ...   5 x 10% = 50
QUESTIONABLE  10 x 30% = 300     ... x 20% = ...   ... x 0% = 0

EBEL (1972) METHOD - contd.
The MPL for Judge A is then the sum of the cell products divided by the number of items: MPL_A = (1500 + ... + 600 + 1600 + ... + 1000 + 500 + ... + 50 + 300 + ... + 0) / 200 = ...%.
Similarly, the MPLs for Judges B (MPL_B), C (MPL_C), D (MPL_D), ..., N (MPL_N) are determined.
The MPL for the exam (cut-off score) is: (MPL_A + MPL_B + MPL_C + MPL_D + ... + MPL_N) / N.
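A minimal sketch of the Ebel computation for one judge; the matrix below is an illustrative assumption, with counts chosen to total 200 items rather than taken from the slide's partially shown table:

```python
# (item count, expected % correct) for each relevance-by-difficulty cell;
# rows: essential, important, acceptable, questionable; cols: easy, medium, hard
cells = [
    (15, 100), (25, 80), (10, 60),   # essential
    (20, 80),  (30, 60), (20, 50),   # important
    (10, 50),  (25, 40), (5, 10),    # acceptable
    (10, 30),  (25, 20), (5, 0),     # questionable
]

n_items = sum(count for count, _ in cells)                 # 200 items
mpl = sum(count * pct for count, pct in cells) / n_items   # weighted mean %
print(f"Judge's MPL over {n_items} items = {mpl:.2f}%")
# As before, the exam cut-off is the average of such MPLs across all judges.
```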

PROPOSED EBEL MODIFICATION

              EASY              MEDIUM           HARD
ESSENTIAL     6 x 100% = 600    ... x 80% = ...   7 x 50% = 350
IMPORTANT     12 x 80% = 960    ... x 60% = ...   19 x 40% = 760
ACCEPTABLE    5 x 60% = 300     ... x 50% = ...   3 x 10% = 30

MPL = (600 + ... + 350 + 960 + ... + 760 + 300 + ... + 30) / 100 = 6000 / 100 = 60%

[Figure: HOFSTEE METHOD. Failure rate (%) plotted against cut-off score (%), showing the bounds f_min, f_max, c_min and c_max and the points A and B.]

Example
A plot of cut-off scores for a given exam against the resulting failure rates is given.
c_min = 40%, c_max = 45%, f_min = 10%, f_max = 20%
A = the point (c_min, f_max); B = the point (c_max, f_min)
Line AB intersects the curve at a cut-off score of 42.5%.
Thus, the operational cut-off score = 42.5%.
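A minimal sketch of the Hofstee compromise: intersect the line from A = (c_min, f_max) to B = (c_max, f_min) with the empirical failure-rate curve. The curve points below are illustrative assumptions chosen to reproduce the 42.5% result:

```python
import numpy as np

c_min, c_max = 40.0, 45.0   # lowest and highest acceptable cut-off scores (%)
f_min, f_max = 10.0, 20.0   # lowest and highest acceptable failure rates (%)

# Assumed empirical relation: failure rate produced by each candidate cut-off
cutoffs = np.linspace(35.0, 50.0, 151)
fail_rate = np.interp(cutoffs, [35, 40, 42.5, 45, 50], [5, 10, 15, 25, 40])

# Failure rate along line AB, falling from f_max at c_min to f_min at c_max
line_ab = f_max + (cutoffs - c_min) * (f_min - f_max) / (c_max - c_min)

# Operational cut-off: where the empirical curve crosses line AB
idx = np.argmin(np.abs(fail_rate - line_ab))
print(f"Operational cut-off ~ {cutoffs[idx]:.1f}%")  # 42.5%
```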

CUT-OFF SCORE FOR 1-IN-5 MCQ [FRACS PART 1]
Probability of guessing (1 in 5) = 20%
'Total ignorance' score = 20%
Maximum possible score = 100%
Effective range of scores = 20% to 100%
Mid-point of this range = 60%
Additional factor (as a PG exam) = 5%
Nominal cut-off score (60% + 5%) = 65%
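This guessing-corrected midpoint generalises to any number of options; a minimal sketch, with the slide's 5% postgraduate addition passed in as a parameter:

```python
def nominal_cutoff(n_options: int, increment: float = 5.0) -> float:
    """Midpoint of the effective score range (chance score to 100%),
    plus an additional increment in percentage points."""
    chance = 100.0 / n_options         # expected score from blind guessing
    midpoint = (chance + 100.0) / 2.0  # centre of the effective range
    return midpoint + increment

print(nominal_cutoff(5))  # 65.0, the 1-in-5 FRACS Part 1 example
print(nominal_cutoff(4))  # 67.5 for 1-in-4 items with the same 5% increment
```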

CUT-OFF SCORES: “MARKER QUESTIONS”
1. Comparison of exam scores
Mean score in this exam: 56.7%
Average exam mean score over the last 4 years: 59.4%
Thus the mean score in this exam is 2.7% lower.
Assuming this candidate group is of the same standard as in the last 4 years, this exam is 2.7% harder.

CUT-OFF SCORES: “MARKER QUESTIONS” - contd.
2. Comparison of “marker” scores
Mean score in this exam on previously used questions (N = 162): 62.5%
Mean score on the same questions when they were each last used: 60.5%
Thus, when compared with previous candidates, this group of candidates scored (62.5 - 60.5)% = 2.0% higher on these items.
Thus this group of candidates is 2.0% better than previous groups.

CUT-OFF SCORES: “MARKER QUESTIONS” - contd.
3. Estimating examination difficulty
It is thus expected that their mean score in this exam would be 2.0% higher.
But their mean score in this exam is 2.7% lower.
Thus this exam is really (2.0 + 2.7)% = 4.7% harder.

CUT-OFF SCORES: “MARKER QUESTIONS” - contd.
4. Determining the cut-off score
The cut-off level for an average exam is 65.0%.
Thus the cut-off level for this exam should be (65 - 4.7)% = 60.3%.
Cut-off score = 60.3%
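A minimal sketch chaining the four steps above, using the slide's figures:

```python
mean_this_exam = 56.7      # mean score in this exam (%)
mean_past_exams = 59.4     # average exam mean over the last 4 years (%)
markers_this_exam = 62.5   # mean on reused "marker" questions in this exam (%)
markers_last_use = 60.5    # mean on the same questions when last used (%)
average_cutoff = 65.0      # cut-off level for an average exam (%)

group_ability = markers_this_exam - markers_last_use  # +2.0: stronger cohort
score_shift = mean_this_exam - mean_past_exams        # -2.7: scores are lower
difficulty = group_ability - score_shift              # 4.7: exam is harder

cutoff = average_cutoff - difficulty
print(f"This exam is {difficulty:.1f}% harder; cut-off = {cutoff:.1f}%")  # 60.3%
```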

[Figure: HOFSTEE CURVE. Failure rate (%) plotted against cut-off score (%).]