What goes into developing a high-quality credentialing test?
Speakers
- Heidi Leenay: Director of Education, OptumHealth
- Anthony Marini, PhD: Founder/Psychometrician, Martek
- Chad Kinart, ATC, MBA: Director of Partner Services, Assessment Systems
Agenda
- The landscape of psychometrics and credentialing
- OptumHealth: How we started out
- The Test Development Cycle
- OptumHealth: How we achieved our goals
Landscape
What is our goal? Assessments that provide accurate information about candidate competence in a profession. You know the profession and the candidates. We're here to talk about what it means to "provide accurate information."
Landscape
First: Definitions
Terms: Accreditation, Credentialing, Certification, Licensure, Certificates, Education, Microcredentials
- Certification: Recognition of broad skills/knowledge with extensive assessment. Voluntary for the candidate. Example: Certified Ophthalmic Assistant.
- Licensure: Recognition of broad skills/knowledge with extensive assessment. Mandatory for the candidate (usually imposed by government). Example: Certified Ophthalmic Assistant, where required by law in State X.
- Certificates: Narrower recognition of skills/knowledge with moderate assessment. Voluntary for the candidate. Example: Certified LASIK Assistant (LASIK is a portion of ophthalmology).
- Education: Recognition only that you have completed learning, with no indication of mastery. Example: Completed a 6-month training program in ophthalmic assisting at a community college.
- Microcredentials: Very narrow recognition of skills/knowledge, often tied to specific products. Example: Took a course and passed a test to fit contact lenses from Brand X.
- Accreditation: See the next slide.
Landscape
More on Accreditation…
- A stamp of approval that you developed your credentialing program in alignment with best practices
- External to your organization (third party)
Landscape
Accreditation: Who makes the standards?
Additional standards:
- AERA/APA/NCME (Standards for Educational and Psychological Testing)
- ITC (International Test Commission)
- Government (in the USA, the Uniform Guidelines on Employee Selection Procedures)
Landscape
What is in these standards?
- Board governance
- Financials
- Test development
- Policies
- Recertification
- Eligibility
There is more than this, but we focus on one aspect today: test development. It is relevant even if you do not get accredited, because it means you are following industry standards. This makes your assessment much, much more defensible and effective.
OptumHealth: How We Started Heidi, tell your story here
The Test Development Cycle
So, how do we develop a credentialing test that is aligned with best practices? This section of the presentation provides an overview of the major steps in that process.
Item Development, Construction and Review
- Where do test items/tasks come from?
- Defining content for a high-stakes exam
Step 1: Define what the test covers
- Scope of practice
- How the role fits in the profession
- High-level KSAs (knowledge, skills, abilities) needed
- Sometimes defined for you? In some cases, licensure/government regulations define this ahead of time.
Step 1: Define what the test covers
Job Analysis
- You can't just pick the topics
- You need real data to drive decisions
- There are different ways to do this; a common one is Job Task Analysis
Step 1: Define what the test covers
Job Task Analysis
1. Define domains and the tasks within them
2. Survey incumbents on importance and time spent
3. Analyze which domains have more tasks and higher ratings
Step 1: Define what the test covers
Job Task Analysis: Example

| Domain | Tasks | Mean TxI | Sum TxI | Percent |
|--------|-------|----------|---------|---------|
| A      | 14    | 4.21     | 58.94   | 24.02   |
| B      | 11    | 3.64     | 40.04   | 16.32   |
| C      | 21    | 3.75     | 78.75   | 32.09   |
| D      | 17    | 3.98     | 67.66   | 27.57   |

Domain A doesn't have many tasks, but they are quite important or frequent, so it gets an edge. Domain C has the most tasks, so it gets the largest weight on the test.
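To make the arithmetic concrete, here is a minimal Python sketch (ours, not part of the slides) that reproduces the weights in the table, assuming "Sum TxI" is simply the domain's task count times its mean rating:

```python
# A minimal sketch reproducing the JTA example table above.
# Assumption: Sum TxI = task count x mean rating, and each domain's
# exam weight is its share of the total Sum TxI across domains.

domains = {
    # domain: (number of tasks, mean importance/time rating)
    "A": (14, 4.21),
    "B": (11, 3.64),
    "C": (21, 3.75),
    "D": (17, 3.98),
}

sum_txi = {d: tasks * rating for d, (tasks, rating) in domains.items()}
total = sum(sum_txi.values())

for d, txi in sum_txi.items():
    print(f"Domain {d}: Sum TxI = {txi:.2f}, weight = {100 * txi / total:.2f}%")
```

A 100-item exam blueprint would then allocate roughly that percentage of items to each domain.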
Item Development, Construction and Review
- Who develops the items/tasks for examinations, and how are they selected?
- How should writers be selected, and what characteristics should they possess?
- What conflict of interest issues should be considered in selecting item writers?
- What methods of compensation exist for acknowledging the work of writers?

Item Development, Construction and Review
- What is the nature of training for preparing item/task writers?
- What document resources are necessary to support writers?
- How do you enhance the sustainability of a writing team?

Item Development, Construction and Review
- What processes are available to help ensure the relevancy and quality of testing materials?
- What practices should be implemented to ensure the security of examination materials during the writing process?

Item Development, Construction and Review
- How can the principles of universal design help to ensure maximum accessibility?
- How can we ensure that important issues such as linguistic complexity are recognized in item development?
- If there is a need for a second language, what challenges does exam translation create?

Item Development, Construction and Review
- What contribution can technology make in supporting both item development and exam construction activities?
Evaluation
- Feedback from students
- Feedback from item analysis:
  - Difficulty index (proportion answering correctly)
  - Discrimination index (how well the item separates high and low scorers)
Item Analysis Example
Prop. Answering Correctly: 0.74 | Discrimination Index: 0.42 | Point Biserial: 0.40

| Alt.  | Total Ratio | Low (27%) | High (27%) | Point Biserial | Key |
|-------|-------------|-----------|------------|----------------|-----|
| A     | 0.74        | 0.44      | 0.86       | 0.40           | *   |
| B     | 0.09        | 0.17      | 0.03       | -0.13          |     |
| C     | 0.04        | 0.06      | 0.08       | -0.03          |     |
| D     | 0.13        | 0.33      | 0.03       | -0.38          |     |
| E     | 0.00        |           |            |                |     |
| Other |             |           |            |                |     |
Item Analysis Example
Prop. Answering Correctly: 0.18 | Discrimination Index: -0.03 | Point Biserial: -0.01

| Alt.  | Total Ratio | Low (27%) | High (27%) | Point Biserial | Key |
|-------|-------------|-----------|------------|----------------|-----|
| A     | 0.34        | 0.27      | 0.44       | 0.23           | ?   |
| B     | 0.18        | 0.22      | 0.19       | -0.01          | *   |
| C     | 0.12        | 0.06      | 0.17       | 0.10           |     |
| D     | 0.36        | 0.47      | 0.20       | -0.29          |     |
| E     | 0.00        |           |            |                |     |
| Other |             |           |            |                |     |

A problem item: the key (B) barely discriminates, while option A behaves like a correct answer; the "?" flags a likely miskey.
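For reference, here is a minimal sketch (ours, not from the presentation) of the classical statistics shown in these tables, for one dichotomously scored item; the function and variable names are our own, not from any particular package:

```python
# Classical item statistics for one 0/1-scored item.
import math

def item_stats(item_scores, total_scores, tail=0.27):
    """item_scores: 0/1 per candidate; total_scores: total test score per candidate."""
    n = len(item_scores)
    p = sum(item_scores) / n                      # difficulty: proportion correct

    # Upper-lower discrimination: P(correct | top 27%) - P(correct | bottom 27%)
    order = sorted(range(n), key=lambda i: total_scores[i])
    k = max(1, round(tail * n))
    p_low = sum(item_scores[i] for i in order[:k]) / k
    p_high = sum(item_scores[i] for i in order[-k:]) / k

    # Point-biserial: r_pb = (M1 - M) / SD * sqrt(p / (1 - p)),
    # where M1 is the mean total score of those answering correctly
    mean_t = sum(total_scores) / n
    sd_t = math.sqrt(sum((t - mean_t) ** 2 for t in total_scores) / n)
    if 0 < p < 1 and sd_t > 0:
        mean_1 = sum(t for x, t in zip(item_scores, total_scores) if x == 1) / sum(item_scores)
        r_pb = (mean_1 - mean_t) / sd_t * math.sqrt(p / (1 - p))
    else:
        r_pb = 0.0

    return {"difficulty": p, "discrimination": p_high - p_low, "point_biserial": r_pb}
```

Running this per option (scoring "chose this option" as 1) reproduces the option-level rows above; a near-zero or negative point-biserial on the key, as in the second example, is the classic miskey signal.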
Standard Setting
- It is legally indefensible to pick an arbitrary cutscore like 75%
- It is also indefensible to be quota-based (e.g., pass the top 50%)
- The cutscore should be criterion-referenced
Standard Setting
Some of the acceptable methods:
- Modified Angoff (most common)
- Nedelsky/Ebel (superseded by Angoff)
- Bookmark
- Contrasting Groups
- Borderline Group
- Hofstee
- Scalemark
Standard Setting
Modified Angoff: driven by an SME panel
1. Define the minimally competent candidate (MCC)
2. Rate each item on the percentage of MCCs that would answer it correctly
3. Discuss items with poor agreement
4. Re-rate
5. Evaluate the average rating: the average score you would expect from an MCC, which becomes the cutscore
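A minimal sketch (ours; the ratings are invented for illustration) of how the final Angoff ratings turn into a cutscore:

```python
# Turning Modified Angoff ratings into a cutscore. Each entry is a judged
# probability that a minimally competent candidate answers the item correctly.

ratings = [
    [0.70, 0.55, 0.80, 0.60],   # SME 1
    [0.65, 0.60, 0.85, 0.55],   # SME 2
    [0.75, 0.50, 0.80, 0.65],   # SME 3
]

n_smes, n_items = len(ratings), len(ratings[0])
item_means = [sum(sme[j] for sme in ratings) / n_smes for j in range(n_items)]

# The cutscore is the expected raw score of a minimally competent candidate
cutscore = sum(item_means)
print(f"Cutscore: {cutscore:.2f} of {n_items} items ({100 * cutscore / n_items:.1f}%)")
```

In practice, items where the SMEs disagree strongly (a large spread of ratings) would be flagged for the discussion and re-rating steps above before the final average is taken.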
Linking/Equating
- If you have more than one form (required for NCCA!), the forms must be statistically equated
- If scores on Form 2 are higher, is it because its items are easier, or because it happened to get smarter candidates?
Linking/Equating
100-item test with 20 anchor items.

Scenario 1: Easier items on Form A

| Form | Mean score | Mean score on anchors |
|------|------------|-----------------------|
| A    | 72         | 14                    |
| B    | 70         | 14                    |

Same score on the anchor items, so we know one group is NOT smarter than the other; Form A's higher mean must come from easier items.

Scenario 2: Smarter candidates on Form A

| Form | Mean score | Mean score on anchors |
|------|------------|-----------------------|
| A    | 72         | 15                    |
| B    | 70         | 14                    |

The Form A group also scores higher on the anchors, so those candidates are smarter.
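A small sketch (ours, not the presenters' code) of how the anchor scores separate the two explanations; the numbers mirror the scenarios above. Operational programs would follow this diagnosis with formal equating (e.g., Tucker, Levine, or IRT methods):

```python
# Using common (anchor) items to separate form difficulty from group ability.

def diagnose(total_a, total_b, anchor_a, anchor_b):
    """Compare two form-taking groups using only the shared anchor items."""
    if anchor_a == anchor_b:
        # Equal anchor performance: groups are equally able, so the
        # total-score gap must come from form difficulty.
        return "Groups equally able; Form A's unique items are easier."
    return (f"Anchor gap of {anchor_a - anchor_b:+.1f} points: "
            "the groups differ in ability, not (only) the forms in difficulty.")

# Scenario 1: 100-item forms, 20 anchors, same anchor performance
print(diagnose(72, 70, 14, 14))   # -> form difficulty explanation
# Scenario 2: Form A group also outscores Form B on the anchors
print(diagnose(72, 70, 15, 14))   # -> candidate ability explanation
```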
Feedback Reports
- What misconceptions exist?
- What do candidates want?
- What can we offer, and what disclaimers are required?
Feedback Report for MCQ
Feedback Report for OSCE
Accommodations
- Providing accommodations in accordance with the Americans with Disabilities Act (ADA) and the Accessibility for Ontarians with Disabilities Act (AODA)
- Documentation
- Common accommodations
- Other accommodations
- On-site requirements
Retake Policy
- The current landscape (limited vs. unlimited retakes)
- From the candidate's perspective
- From the examining board or regulatory perspective
- From the psychometric perspective
OptumHealth: How We Achieved Our Goals Heidi, tell your story here
Speaker Contact Information
Anthony Marini, PhD
Senior Faculty Developer, Carleton University
President, Martek Assessments Ltd.
Ottawa, Ontario, Canada
E-mail: mmartek@rogers.com