
Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Defending Your Licensing Examination Programme Deborah Worrad Registrar and Executive Director College of Massage Therapists of Ontario

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Critical Steps
- Job Analysis Survey
- Blueprint for Examination
- Item Development & Test Development
- Cut Scores & Scoring/Analysis
- Security

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Subject Matter Experts Selection
Broadly representative of the profession:
- Specialties of practice
- Ethnicity
- Age distribution
- Education level
- Gender distribution
- Representation from newly credentialed practitioners
- Geographical distribution
- Urban vs. rural practice locations
- Practice settings

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Job Analysis Survey
- Provides the framework for examination development
- A critical element for ensuring that valid interpretations are made about an individual’s exam performance
- A link between what is done on the job and how candidates are evaluated for competency

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Job Analysis Survey
- Comprehensive survey of critical knowledge, skills and abilities (KSAs) required by an occupation
- Relative importance, frequency and level of proficiency of tasks must be established

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Job Analysis Survey
- Multiple sources of information should be used to develop the KSAs
- The survey must provide sufficient detail to yield enough data to support exam construction (the blueprint)

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Job Analysis Survey
- Good directions
- User-friendly, simple layout
- Demographic information requested from respondents
- Reasonable rating scale
- Pilot test

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Job Analysis Survey
- The survey is sent to either a representative sample (large profession) or all members (small profession)
- With computer technology the JAS can be done online, saving costs associated with printing and mailing
- Motivating members to complete the survey may be necessary

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Job Analysis Survey
- Statistical analysis of results must include elimination of outliers and respondents with personal agendas
- A final technical report with the data analysis must be produced
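As an illustration of outlier screening (not a method prescribed in the presentation), one simple approach is to drop respondents whose rating profile barely correlates with the group mean profile. The data layout and threshold below are assumptions:

```python
import numpy as np
import pandas as pd

def screen_respondents(ratings: pd.DataFrame, min_r: float = 0.30) -> pd.DataFrame:
    """Drop respondents whose task-rating profile correlates poorly with the
    group mean profile (a crude screen for outliers or personal agendas).

    ratings: respondents (rows) x tasks (columns); min_r is an illustrative cutoff.
    """
    mean_profile = ratings.mean(axis=0).values
    keep = []
    for _, row in ratings.iterrows():
        r = np.corrcoef(row.values, mean_profile)[0, 1]
        keep.append(bool(r >= min_r))  # NaN (flat-line raters) also get dropped
    return ratings.loc[keep]
```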

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Blueprint for Examination
- An examination based on a JAS provides the foundation for the programme’s content validity
- The data from the JAS on tasks and KSAs critical to effective performance are used to create the examination blueprint
- Subject Matter Experts review the blueprint to confirm the results of the data analysis

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Item Development
- Items must fit the test blueprint and be properly referenced
- Principles of item writing must be followed, and the writers trained to create items that will properly discriminate at an entry level
- The writers must be demographically representative of practitioners

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Item Development
- Item editing is completed by a team of Subject Matter Experts (SMEs) for content review and verification of accuracy
- Items are translated into a second language at this point if required
- Items should be pre-tested with large enough samples

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Examination Psychometrics Options
- Computer adaptive model
- Paper and pencil model with item response theory (IRT) and pre-testing
- Equipercentile equating using an embedded set of items on every form for equating and establishing a pass score
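To illustrate the equipercentile idea only: each new-form score is mapped to the base-form score with the same percentile rank. This sketch ignores the embedded anchor items and the presmoothing an operational programme would use, and assumes roughly equivalent candidate groups:

```python
import numpy as np

def equipercentile_table(new_form_scores, base_form_scores):
    """Unsmoothed observed-score equipercentile equating sketch.

    Returns a dict mapping each observed new-form score to the base-form
    score with the same (mid-)percentile rank.
    """
    new = np.asarray(new_form_scores, dtype=float)
    base = np.sort(np.asarray(base_form_scores, dtype=float))
    base_pr = (np.arange(1, base.size + 1) - 0.5) / base.size  # percentile ranks of base scores
    table = {}
    for x in np.unique(new):
        pr = ((new < x).mean() + (new <= x).mean()) / 2.0      # mid-percentile rank of x
        table[float(x)] = float(np.interp(pr, base_pr, base))
    return table
```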

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Test Development
- The relationship between test specifications and content must be logical and defensible
- Test questions are linked to the blueprint, which is linked to the JAS
- Exam materials must be secure

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Test Development
Elements of test development differ depending on the model you are using. Generally, develop a test form ensuring:
- Items selected meet statistical requirements
- Items match the blueprint
- No item cues another item
- No repetition of the same items

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Cut Scores
Use an approved method to establish minimal competence standards required to pass the examination. This establishes the cut score (pass level).

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Cut Scores
One method is the modified Angoff, in which an SME panel makes judgements about the minimally competent candidate’s ability to answer each item correctly. It is frequently used by testing programmes and does not take too long to complete.

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Cut Scores
- The SMEs provide an estimate of the proportion of minimally competent candidates who would respond correctly to each item
- This process is completed for all items and an average rating is established for each item
- Individual item rating data are analyzed to establish the passing score
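A minimal sketch of how the ratings roll up into a raw passing score; the ratings matrix below is purely illustrative:

```python
import numpy as np

# angoff: judges (rows) x items (columns); each cell is the judged proportion of
# minimally competent candidates expected to answer that item correctly.
angoff = np.array([
    [0.60, 0.75, 0.90, 0.55],
    [0.65, 0.70, 0.85, 0.50],
    [0.55, 0.80, 0.95, 0.60],
])

item_means = angoff.mean(axis=0)      # average rating per item
raw_cut = item_means.sum()            # expected raw score of a minimally competent candidate
pct_cut = 100 * item_means.mean()     # same cut expressed as a percentage of items

print(f"Raw passing score: {raw_cut:.2f} of {angoff.shape[1]} items ({pct_cut:.0f}%)")
```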

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Scoring
Scoring must be correct in all aspects:
- Scanning
- Error checks
- Proper key
- Quality control
- Reporting

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Scoring/Analysis
- Test item analysis of item difficulty and item discrimination must be conducted
- Adopt a model of scoring appropriate for your exam (IRT, equipercentile equating)
- Ensure that the passing scores are fair and consistent, eliminating the impact of varying difficulty among forms
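A minimal classical item analysis sketch (the data layout is an assumption, not specified in the presentation): difficulty as the proportion answering correctly, discrimination as the corrected item-rest correlation.

```python
import numpy as np

def item_analysis(responses: np.ndarray):
    """responses: candidates (rows) x items (columns), scored 0/1.

    Returns (difficulty, discrimination): the p-value for each item and the
    corrected point-biserial correlation of each item with the rest of the test.
    """
    n_items = responses.shape[1]
    difficulty = responses.mean(axis=0)
    total = responses.sum(axis=1)
    discrimination = np.empty(n_items)
    for j in range(n_items):
        rest = total - responses[:, j]   # exclude the item itself from the total
        discrimination[j] = np.corrcoef(responses[:, j], rest)[0, 1]
    return difficulty, discrimination
```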

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Scoring
- Adopting a scaled score for reporting results to candidates may be beneficial
- Scaling scores facilitates the reporting of any shifts in the passing point due to the ease or difficulty of a form
- Cut scores may vary depending on the test form, so scaling enables reporting on a common scale
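One simple way to do this is a linear conversion that pins each form’s raw cut score to the same reported value. This is a sketch: the 75/99 reporting scale is an assumption, and many programmes use a two-segment transformation rather than a single line.

```python
def to_scaled(raw, raw_cut, raw_max, scaled_cut=75.0, scaled_max=99.0):
    """Report scores on a common scale: the form's raw cut always maps to
    scaled_cut and the maximum raw score to scaled_max."""
    slope = (scaled_max - scaled_cut) / (raw_max - raw_cut)
    return scaled_cut + slope * (raw - raw_cut)

# A harder form (cut 132/200) and an easier form (cut 138/200) both report the cut as 75.
print(to_scaled(132, raw_cut=132, raw_max=200))   # 75.0
print(to_scaled(138, raw_cut=138, raw_max=200))   # 75.0
print(to_scaled(150, raw_cut=132, raw_max=200))   # a score above the cut on the harder form
```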

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Security
For all aspects of the work related to examinations, proper security procedures must be followed, including:
- Passwords and password maintenance
- Programme software security
- Back-ups
- Encryption for transmissions
- Confidentiality agreements

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Security
Exam administration security must include:
- Exam materials locked in fireproof vaults
- Security of delivery of exam materials
- Diligence in dealing with changes in technology if computer delivery of the exam is used

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Presentation Follow-up Please pick up a handout from this presentation -AND/OR- Presentation materials will be posted on CLEAR’s website

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Defending Your Licensing Examination Program With Data Robert C. Shaw, Jr., PhD

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 The Defense Triangle
[Diagram: a triangle with Content, Reliability, and Criterion at its corners, supporting Test Score Use]

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Content
Standard (1999) – “The content domain to be covered by a credentialing test should be defined clearly and justified in terms of the importance of the content...”
We typically evaluate tasks along:
- an importance dimension, or a significance dimension that incorporates importance and frequency
- an extent dimension

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Content
Task importance/significance scale points:
4. Extremely
3. Very
2. Moderately
1. Not
Task extent scale point:
0. Never performed

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Content
- We require each task to independently surpass the importance/significance and extent exclusion rules (a filtering sketch follows below)
- We do not composite task ratings
- We are concerned about diluting tests with relatively trivial content (high extent, low importance) or including content that may be unfair to test (low extent, high importance)
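A sketch of independent exclusion rules; the tasks, column names and thresholds below are illustrative assumptions, not values from the presentation:

```python
import pandas as pd

# One row per task: mean importance on the 1-4 scale shown earlier, and the
# proportion of respondents who report performing the task (extent).
task_stats = pd.DataFrame({
    "task": ["Assess client", "Document findings", "Order equipment"],
    "mean_importance": [3.4, 2.9, 1.6],
    "pct_performing": [0.95, 0.88, 0.22],
})

IMPORTANCE_CUT = 2.5   # illustrative thresholds; each rule is applied on its own,
EXTENT_CUT = 0.50      # with no composite of importance and extent

retained = task_stats[
    (task_stats["mean_importance"] >= IMPORTANCE_CUT)
    & (task_stats["pct_performing"] >= EXTENT_CUT)
]
print(retained["task"].tolist())   # tasks that survive both exclusion rules
```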

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Content
- Selecting a subset of tasks and labeling them critical is only defensible when the original list was reasonably complete
- We typically ask task inventory respondents how adequately the task list covered the job (completely, adequately, inadequately)
- We then calculate the percentage of respondents who selected each option

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Content
Evaluate task rating consistency:
- Were the people consistent? Intraclass correlation
- Were tasks consistently rated within each content domain? Coefficient alpha
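Minimal sketches of both indices, assuming rating matrices laid out as described in the comments; ICC(C,k) is the consistency form for the average of k raters:

```python
import numpy as np

def icc_consistency(ratings: np.ndarray) -> float:
    """ICC(C,k): consistency of the average rating across k raters.
    ratings: tasks (rows) x raters (columns)."""
    n, k = ratings.shape
    grand = ratings.mean()
    ss_rows = k * ((ratings.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((ratings.mean(axis=0) - grand) ** 2).sum()
    ss_error = ((ratings - grand) ** 2).sum() - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / ms_rows

def coefficient_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for the tasks in one content domain.
    scores: respondents (rows) x tasks (columns)."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)
```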

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Content

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Content
- We typically ask task inventory respondents in what percentages they would allocate items across content areas, to lend support to the structure of the outline
- I encourage a task force to explicitly follow these results or follow the rank order
- Because items are specified according to the outline, we feel these results demonstrate broader support for test specifications beyond the task force

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Content What percentage of items would you allocate to each content area?

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Reliability Test scores lack utility until one can show the measurement scale is reasonably precise

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Reliability
Test score precision is often expressed in terms of:
- Kuder-Richardson Formula 20 (KR-20) when items are dichotomously (i.e., 0 or 1) scored
- Coefficient Alpha when items are scored on a broader scale (e.g., 0 to 5)
- Standard Error of Measurement
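Minimal sketches of KR-20 and the standard error of measurement; the data layout is assumed, and the SEM here uses KR-20 as its reliability estimate:

```python
import numpy as np

def kr20(responses: np.ndarray) -> float:
    """Kuder-Richardson Formula 20. responses: candidates x items, scored 0/1."""
    k = responses.shape[1]
    p = responses.mean(axis=0)                      # item difficulties
    total_var = responses.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - (p * (1 - p)).sum() / total_var)

def sem(responses: np.ndarray) -> float:
    """Standard error of measurement: SD of total scores times sqrt(1 - reliability)."""
    sd = responses.sum(axis=1).std(ddof=1)
    return sd * np.sqrt(1 - kr20(responses))
```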

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Reliability Standard (1999) – “Estimates of the reliability of test-based credentialing decisions should be provided.” “Comment:... Other types of reliability estimates and associated standard errors of measurement may also be useful, but the reliability of the decision of whether or not to certify is of primary importance”

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Reliability
Decision Consistency Index
[Table: a theoretic cross-classification of pass/fail decisions on a first attempt and a second attempt]
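The index itself is simple once two classifications exist; in practice the second attempt is theoretic, estimated from a single administration (e.g., Subkoviak- or Livingston-Lewis-type methods). A toy sketch of the cross-classification with illustrative data:

```python
import numpy as np

def decision_consistency(first_pass: np.ndarray, second_pass: np.ndarray) -> float:
    """Proportion of candidates classified the same way (pass/pass or fail/fail)
    on two attempts or parallel forms. Inputs are boolean arrays."""
    return float((first_pass == second_pass).mean())

# Illustrative data: True = pass
first = np.array([True, True, False, True, False, True])
second = np.array([True, False, False, True, False, True])
print(decision_consistency(first, second))   # 5 of 6 candidates classified consistently
```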

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Criterion
The criterion to which test scores are related can be represented by two planks:
- Minimal competence expectation
- Criterion-related study

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Criterion Most programs rely on the minimal competence criterion expressed in a passing point study

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Criterion
- Judges’ expectations are expressed through text describing minimally competent practitioners and through item difficulty ratings
- We calculate an intraclass correlation to focus on the consistency with which judges gave ratings
- We find confidence intervals around the mean rating

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Criterion We use the discrimination value to look for aberrant behavior from judges

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Criterion
[Chart: the mean of judges’ ratings and the resulting passing score]

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Criterion
One of my clients was sued in 1975. In spite of evidence linking test content to a 1973 role delineation study, the court would not dismiss the case. Issues that required defense were:
- discrimination or adverse impact from test score use
- job-relatedness of test scores

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Criterion Only after a criterion-related validation study was conducted was the suit settled

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Criterion
Theoretic model of these studies:
[Diagram: critical content feeds both a supervisor rating inventory and the test; the evidence is the correlation of ratings and test scores]

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Criterion
Test Bias Study
- Compare regression lines predicting job performance from test scores for focal and comparator groups
- There are statistical procedures available to determine whether slopes and intercepts significantly differ (see the sketch below)
- Differences in mean scores are not necessarily a critical indicator
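One standard procedure is moderated regression: regress job performance on test score, group, and their interaction, then examine whether the group (intercept) and interaction (slope) terms differ significantly. A sketch using statsmodels with an illustrative data frame:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative data: one row per incumbent in the criterion-related study.
df = pd.DataFrame({
    "test_score":  [62, 70, 75, 80, 55, 68, 77, 83, 59, 73],
    "performance": [3.1, 3.5, 3.8, 4.2, 2.9, 3.4, 3.9, 4.4, 3.0, 3.7],
    "group": ["focal"] * 5 + ["comparator"] * 5,
})

# performance ~ test_score * group fits separate intercepts and slopes for the
# two groups; the group and test_score:group coefficients test whether the
# regression lines differ in intercept and slope.
model = smf.ols("performance ~ test_score * C(group)", data=df).fit()
print(model.summary())
```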

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 The Defense Triangle
[Diagram: a triangle with Content, Reliability, and Criterion at its corners, supporting Test Score Use]

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Presentation Follow-up Presentation materials will be posted on CLEAR’s website

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Defending Your Program: Strengthening Validity in Existing Examinations Ron Rodgers, Ph.D. Director of Measurement Services Continental Testing Services (CTS)

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 What Can Go Wrong?
1. Job/practice analysis & test specs
2. Item development & documentation
3. Test assembly procedures & controls
4. Candidate information: before & after
5. Scoring accuracy & item revalidation
6. Suspected cheating & candidate appeals
7. Practical exam procedures & scoring

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Job Analysis & Test Specs
- Undocumented (or no) job analysis
- Embedded test specifications
- Unrepresentative populations for job analysis or pilot testing
- Misuse of “trial forms” and data to support “live” examinations

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Item Development
- Do item authors and reviewers sign and understand non-disclosure agreements?
- How does each question reflect job analysis results and test specifications?
- Should qualified candidates be able to answer Qs correctly with information available during the examination?

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Item Development
- Do any questions offer cues that answer other questions on an exam?
- Do item patterns offer cues to marginally qualified, test-savvy candidates?
  - Is the longest answer always correct?
  - If “None of the above” or “All of the above” Qs are used, are these always correct?
  - True-false questions with clear patterns?
  - Do other detectable patterns cue answers?

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Item Documentation
- Are all Qs supported by references cited for and available to all candidates?
- Do any questions cite item authors or committee members as “references”?
- Are page references cited for each Q?
- Are citations updated as new editions of each reference are published?

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Candidate Information
- Are all references identified to and equally available to all candidates?
- Are content outlines for each test provided to help candidates prepare?
- Are sample Qs given to all candidates?
- Are candidates told what they must/may bring and use during the examination?

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Test Assembly Controls
- Are parallel forms assembled to be of approximately equal difficulty?
- Is the answer key properly balanced?
  - Approx. equal numbers of each option
  - Limit consecutive Qs with the same answer
  - Avoid repeated patterns of responses
  - Avoid long series of Qs without a given option

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Suspected Cheating
- Is potential cheating behavior at the test site clearly defined for onsite staff?
- Are candidates informed of possible consequences of suspected cheating?
- Are staff trained to respond fairly and appropriately to suspected cheating?
- Are procedures in place to help staff document/report suspected cheating?

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Scoring Controls
- How is the accuracy of the answer key verified?
- Do item analyses show any anomalies in candidate performance on the test?
- Are oddly performing Qs revalidated?
  - Identify ambiguities in sources or Qs
  - Verify that each Q has one right answer
  - Give credit to all candidates when needed
- Are scoring adjustments applied fairly?
- Are rescores/refunds issued as needed?

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Candidate Appeals
- How do candidates request rescoring?
- Do policies allow cancellation of scores when organized cheating is found?
  - Harvested Qs on websites, in print
- Are appeal procedures available?
- Are appeal procedures explained?
- How is test security protected during candidate appeal procedures?

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Practical Examinations
- Is the test uniform for all candidates?
- Is the passing score defensible?
- Are scoring controls in place to limit bias for or against individual candidates?
- Are scoring criteria well documented?
- Are judges well trained to apply scoring criteria consistently?
- Are scoring judgments easy to record?
- How are marginal scores resolved?

Presented at CLEAR’s 23rd Annual Conference Toronto, Ontario September, 2003 Presentation Follow-up Please pick up a handout from this presentation -AND/OR- Presentation materials will be posted on CLEAR’s website