
1 Defending Your Licensing Examination Programme
Deborah Worrad, Registrar and Executive Director, College of Massage Therapists of Ontario
Presented at CLEAR’s 23rd Annual Conference, Toronto, Ontario, September 2003

2 Critical Steps
Job Analysis Survey
Blueprint for Examination
Item Development & Test Development
Cut Scores & Scoring/Analysis
Security

3 Subject Matter Experts: Selection
Broadly representative of the profession
Specialties of practice
Ethnicity
Age distribution
Education level
Gender distribution
Representation from newly credentialed practitioners
Geographical distribution
Urban vs. rural practice locations
Practice settings

4 Job Analysis Survey
Provides the framework for examination development
A critical element for ensuring that valid interpretations are made about an individual's exam performance
A link between what is done on the job and how candidates are evaluated for competency

5 Job Analysis Survey
Comprehensive survey of the critical knowledge, skills and abilities (KSAs) required by an occupation
Relative importance, frequency and level of proficiency of tasks must be established

6 Job Analysis Survey
Multiple sources of information should be used to develop the KSAs
The survey must be detailed enough to provide the data needed to support exam construction (the blueprint)

7 Job Analysis Survey
Good directions
User-friendly, simple layout
Demographic information requested from respondents
Reasonable rating scale
Pilot test

8 Job Analysis Survey
The survey is sent to either a representative sample (large profession) or all members (small profession)
With computer technology the JAS can be done online, saving the costs associated with printing and mailing
Motivating members to complete the survey may be necessary

9 Job Analysis Survey
Statistical analysis of results must include elimination of outliers and of respondents with personal agendas
A final technical report with the data analysis must be produced

10 Blueprint for Examination
An examination based on a JAS provides the foundation for the programme's content validity
The data from the JAS on tasks and KSAs critical to effective performance are used to create the examination blueprint
Subject Matter Experts review the blueprint to confirm the results of the data analysis

11 Item Development
Items must fit the test blueprint and be properly referenced
Principles of item writing must be followed, and the writers trained to create items that will properly discriminate at an entry level
The writers must be demographically representative of practitioners

12 Item Development
Item editing is completed by a team of Subject Matter Experts (SMEs) for content review and verification of accuracy
Items are converted to a second language at this point if required
Items should be pre-tested with large enough samples

13 Examination Psychometrics Options
Computer adaptive model
Paper-and-pencil model with item response theory (IRT) and pre-testing
Equipercentile equating, using an embedded set of items on every form for equating and establishing a pass score

14 Test Development
The relationship between test specifications and content must be logical and defensible
Test questions are linked to the blueprint, which is linked to the JAS
Exam materials must be secure

15 Test Development
The elements of test development differ depending on the model you are using
Generally, develop a test form ensuring that:
Items selected meet statistical requirements
Items match the blueprint
No item cues another item
The same items are not repeated

16 Cut Scores
Use an approved method to establish the minimal competence standard required to pass the examination
This establishes the cut score (pass level)

17 Cut Scores
One method is the modified Angoff, in which a SME panel makes judgements about the minimally competent candidate's ability to answer each item correctly
This method is frequently used by testing programmes and does not take long to complete

18 Cut Scores
The SMEs provide an estimate of the proportion of minimally competent candidates who would respond correctly to each item
This process is completed for all items and an average rating is established for each item
Individual item rating data are analyzed to establish the passing score
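
A minimal sketch of how a modified Angoff passing score might be derived from panel ratings; the panel size and rating values below are illustrative, not from the presentation:

```python
# Modified Angoff sketch: rows = SME judges, columns = items.
# Each value is a judge's estimate of the proportion of minimally
# competent candidates who would answer that item correctly.
ratings = [
    [0.60, 0.75, 0.40, 0.85],  # judge 1 (illustrative values)
    [0.55, 0.70, 0.50, 0.80],  # judge 2
    [0.65, 0.80, 0.45, 0.90],  # judge 3
]

num_judges = len(ratings)
num_items = len(ratings[0])

# Average rating per item across judges.
item_means = [sum(judge[i] for judge in ratings) / num_judges for i in range(num_items)]

# The raw passing score is the sum of the item means, i.e. the expected
# number-correct score of a minimally competent candidate on this form.
passing_score = sum(item_means)

print("Item means:", [round(m, 2) for m in item_means])
print(f"Recommended raw passing score: {passing_score:.1f} out of {num_items}")
```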

19 Scoring
Scoring must be correct in all aspects:
Scanning
Error checks
Proper key
Quality control
Reporting

20 Scoring/Analysis
Test item analysis on item difficulty and item discrimination must be conducted (see the sketch below)
Adopt a model of scoring appropriate for your exam (IRT, equipercentile equating)
Must ensure that the passing scores are fair and consistent, eliminating the impact of varying difficulty among forms
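
A minimal sketch of a classical item analysis, assuming dichotomously scored (0/1) responses; the response matrix is illustrative only, and statistics.correlation requires Python 3.10+:

```python
import statistics

# Rows = candidates, columns = items; 1 = correct, 0 = incorrect (illustrative data).
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]

num_candidates = len(responses)
num_items = len(responses[0])
totals = [sum(row) for row in responses]

for i in range(num_items):
    item_scores = [row[i] for row in responses]
    # Item difficulty (p-value): proportion of candidates answering correctly.
    p_value = sum(item_scores) / num_candidates
    # Item discrimination: correlation between the item score and the total
    # score with the item removed (a corrected point-biserial).
    rest_scores = [totals[c] - item_scores[c] for c in range(num_candidates)]
    discrimination = statistics.correlation(item_scores, rest_scores)
    print(f"Item {i + 1}: difficulty = {p_value:.2f}, discrimination = {discrimination:.2f}")
```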

21 Scoring
Adopting a scaled score for reporting results to candidates may be beneficial
Scaling scores facilitates the reporting of any shifts in the passing point due to the ease or difficulty of a form
Cut scores may vary depending on the test form, so scaling enables reporting on a common scale
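
One common way to do this is a linear transformation that pins the passing point to the same scaled value on every form. A minimal sketch, in which the scale range (0 to 800), scaled passing point (500) and form cut scores are illustrative assumptions:

```python
def to_scaled_score(raw, raw_cut, raw_max, scaled_cut=500, scaled_max=800):
    """Map a raw score to the reporting scale so the cut score always lands
    on the same scaled value, regardless of the difficulty of the form."""
    if raw >= raw_cut:
        # Map [raw_cut, raw_max] onto [scaled_cut, scaled_max].
        return scaled_cut + (raw - raw_cut) / (raw_max - raw_cut) * (scaled_max - scaled_cut)
    # Map [0, raw_cut) onto [0, scaled_cut).
    return raw / raw_cut * scaled_cut

# Form A is harder (raw cut = 62 of 100); Form B is easier (raw cut = 66 of 100).
print(to_scaled_score(62, raw_cut=62, raw_max=100))  # 500.0 - passing score on Form A
print(to_scaled_score(66, raw_cut=66, raw_max=100))  # 500.0 - passing score on Form B
print(to_scaled_score(70, raw_cut=62, raw_max=100))  # above the cut on Form A
```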

22 Security
For all aspects of the work related to examinations, proper security procedures must be followed, including:
Passwords and password maintenance
Programme software security
Back-ups
Encryption for email transmissions
Confidentiality agreements

23 Security
Exam administration security must include:
Exam materials locked in fireproof vaults
Security of delivery of exam materials
Diligence in dealing with changes in technology if computer delivery of the exam is used

24 Presentation Follow-up
Please pick up a handout from this presentation, and/or
Presentation materials will be posted on CLEAR’s website

25 Defending Your Licensing Examination Program With Data
Robert C. Shaw, Jr., PhD
Presented at CLEAR’s 23rd Annual Conference, Toronto, Ontario, September 2003

26 The Defense Triangle
Content
Reliability
Criterion
Test Score Use

27 Content
Standard 14.14 (1999) – “The content domain to be covered by a credentialing test should be defined clearly and justified in terms of the importance of the content...”
We typically evaluate tasks along:
an importance dimension, or a significance dimension that incorporates importance and frequency
an extent dimension

28 Content
Task importance/significance scale points:
4. Extremely
3. Very
2. Moderately
1. Not
Task extent scale point:
0. Never Performed

29 Content
We require each task to independently surpass the importance/significance and extent exclusion rules (see the sketch below)
We do not composite task ratings
We are concerned about diluting tests with relatively trivial content (high extent, low importance) or including content that may be unfair to test (low extent, high importance)
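
A minimal sketch of how such independent exclusion rules might be applied to mean task ratings; the thresholds and task data are illustrative assumptions, not the presenter's actual rules:

```python
# Mean ratings per task from the job analysis survey (illustrative values).
# "importance" uses the 1-4 scale above; "extent" is the proportion of
# respondents who perform the task at all (i.e., rated it above 0 = Never Performed).
tasks = [
    {"task": "Obtain health history",  "importance": 3.6, "extent": 0.97},
    {"task": "Maintain equipment log", "importance": 1.4, "extent": 0.88},  # trivial but common
    {"task": "Treat rare condition",   "importance": 3.2, "extent": 0.08},  # important but rarely performed
]

IMPORTANCE_CUTOFF = 2.5   # at least "Moderately" important on average
EXTENT_CUTOFF = 0.25      # performed by at least a quarter of respondents

# Each task must independently pass BOTH rules; ratings are not composited.
retained = [t for t in tasks
            if t["importance"] >= IMPORTANCE_CUTOFF and t["extent"] >= EXTENT_CUTOFF]

for t in retained:
    print("Retain:", t["task"])
```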

30 Content
Selecting a subset of tasks and labeling them critical is only defensible when the original list was reasonably complete
We typically ask task inventory respondents how adequately the task list covered the job (completely, adequately, inadequately)
We then calculate the percentages of respondents who selected each option

31 Content
Evaluate task rating consistency
Were the people consistent? Intraclass correlation
Were tasks consistently rated within each content domain? Coefficient alpha
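
A minimal sketch of the coefficient alpha calculation for task ratings within one content domain (respondents in rows, tasks in columns); the rating matrix is illustrative:

```python
import statistics

# Rows = survey respondents, columns = tasks in one content domain
# (importance ratings on the 1-4 scale; illustrative data).
ratings = [
    [4, 3, 4, 3],
    [3, 3, 4, 2],
    [4, 4, 4, 3],
    [2, 3, 3, 2],
    [4, 3, 4, 4],
]

k = len(ratings[0])                     # number of tasks in the domain
task_columns = list(zip(*ratings))      # one column of ratings per task
totals = [sum(row) for row in ratings]  # each respondent's total rating

task_variances = [statistics.variance(col) for col in task_columns]
total_variance = statistics.variance(totals)

# Cronbach's coefficient alpha for the domain.
alpha = (k / (k - 1)) * (1 - sum(task_variances) / total_variance)
print(f"Coefficient alpha for this domain: {alpha:.2f}")
```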

32 Content

33 Content
We typically ask task inventory respondents in what percentages they would allocate items across content areas, to lend support to the structure of the outline
I encourage a task force to explicitly follow these results or follow the rank order
Because items are specified according to the outline, we feel these results demonstrate broader support for the test specifications beyond the task force

34 Content
What percentage of items would you allocate to each content area?

35 Reliability
Test scores lack utility until one can show the measurement scale is reasonably precise

36 Reliability
Test score precision is often expressed in terms of:
Kuder-Richardson Formula 20 (KR-20) when items are dichotomously (i.e., 0 or 1) scored
Coefficient alpha when items are scored on a broader scale (e.g., 0 to 5)
Standard error of measurement
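
For reference, the standard forms of these indices, with k the number of items, p_i and q_i the proportions answering item i correctly and incorrectly, sigma_i^2 the variance of item i, sigma_X^2 the total-score variance, and r_XX the reliability estimate:

```latex
\mathrm{KR20} = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} p_i q_i}{\sigma_X^2}\right)
\qquad
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_i^2}{\sigma_X^2}\right)
\qquad
\mathrm{SEM} = \sigma_X \sqrt{1 - r_{XX}}
```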

37 Reliability
Standard 14.15 (1999) – “Estimates of the reliability of test-based credentialing decisions should be provided.”
“Comment: ... Other types of reliability estimates and associated standard errors of measurement may also be useful, but the reliability of the decision of whether or not to certify is of primary importance”

38 Reliability
Decision Consistency Index
(cross-classification of pass/fail decisions: First Attempt vs. Theoretic Second Attempt)
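
A minimal sketch of how a decision consistency index can be summarized from such a two-way classification; the counts are illustrative, and only the simple agreement proportion (plus kappa) is shown, not a specific single-administration estimator:

```python
# Cross-classification of pass/fail decisions:
# (first attempt, theoretic second attempt) -> count (illustrative).
table = {
    ("pass", "pass"): 820,
    ("pass", "fail"): 45,
    ("fail", "pass"): 40,
    ("fail", "fail"): 95,
}

total = sum(table.values())

# Decision consistency (p0): proportion classified the same way both times.
p0 = (table[("pass", "pass")] + table[("fail", "fail")]) / total

# Chance-corrected agreement (Cohen's kappa) for comparison.
pass_1 = (table[("pass", "pass")] + table[("pass", "fail")]) / total
pass_2 = (table[("pass", "pass")] + table[("fail", "pass")]) / total
p_chance = pass_1 * pass_2 + (1 - pass_1) * (1 - pass_2)
kappa = (p0 - p_chance) / (1 - p_chance)

print(f"Decision consistency (p0): {p0:.3f}")
print(f"Kappa: {kappa:.3f}")
```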

39 Criterion
The criterion to which test scores are related can be represented by two planks:
Minimal competence expectation
Criterion-related study

40 Criterion
Most programs rely on the minimal competence criterion expressed in a passing point study

41 Criterion
Judges' expectations are expressed through:
text describing minimally competent practitioners
item difficulty ratings
We calculate an intraclass correlation to focus on the consistency with which judges gave ratings
We find confidence intervals around the mean rating
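
A minimal sketch of a t-based confidence interval around the mean rating, here computed over the judges' mean ratings for a form; the rating values and this particular interpretation of "mean rating" are illustrative assumptions:

```python
import math
import statistics

# Each judge's mean item difficulty rating across the form (illustrative).
judge_means = [0.63, 0.58, 0.66, 0.61, 0.64, 0.59, 0.67, 0.62]

n = len(judge_means)
mean = statistics.mean(judge_means)
standard_error = statistics.stdev(judge_means) / math.sqrt(n)

# Two-sided 95% critical value of t with n - 1 = 7 degrees of freedom (from tables).
t_crit = 2.365

lower, upper = mean - t_crit * standard_error, mean + t_crit * standard_error
print(f"Mean rating: {mean:.3f}, 95% CI: ({lower:.3f}, {upper:.3f})")
```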

42 Criterion
We use the discrimination value to look for aberrant behavior from judges

43 Criterion
Mean of judges' ratings
Passing score

44 Criterion
One of my clients was sued in 1975
In spite of evidence linking test content to a 1973 role delineation study, the court would not dismiss the case
The issues that required defense were:
discrimination or adverse impact from test score use
job-relatedness of test scores

45 Criterion
Only after a criterion-related validation study was conducted was the suit settled

46 Criterion
Theoretic model of these studies (diagram):
Critical Content
Supervisor Rating Inventory
Test
Correlation of Ratings and Test Scores

47 Criterion
Test bias study
Compare the regression lines of job performance on test scores for focal and comparator groups (see the sketch below)
There are statistical procedures available to determine whether slopes and intercepts significantly differ
Differences in mean scores are not necessarily a critical indicator
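
A minimal sketch of that comparison, fitting separate job-performance-on-test-score regressions per group; the data are illustrative, the formal significance test of slope and intercept differences (e.g., via an interaction term and its standard error) is omitted, and statistics.linear_regression requires Python 3.10+:

```python
import statistics

# Illustrative (test score, job performance rating) pairs for two groups.
focal = [(60, 3.0), (65, 3.4), (70, 3.5), (75, 3.9), (80, 4.2)]
comparator = [(62, 3.2), (66, 3.4), (71, 3.8), (76, 4.0), (81, 4.3)]

def fit_line(pairs):
    """Ordinary least-squares slope and intercept for performance regressed on score."""
    scores, performance = zip(*pairs)
    result = statistics.linear_regression(scores, performance)
    return result.slope, result.intercept

slope_f, intercept_f = fit_line(focal)
slope_c, intercept_c = fit_line(comparator)

print(f"Focal:      performance = {intercept_f:.3f} + {slope_f:.4f} * score")
print(f"Comparator: performance = {intercept_c:.3f} + {slope_c:.4f} * score")
print(f"Slope difference: {slope_f - slope_c:.4f}")
print(f"Intercept difference: {intercept_f - intercept_c:.3f}")
```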

48 The Defense Triangle
Content
Reliability
Criterion
Test Score Use

49 Presentation Follow-up
Presentation materials will be posted on CLEAR’s website
rshaw@goamp.com

50 Defending Your Program: Strengthening Validity in Existing Examinations
Ron Rodgers, Ph.D., Director of Measurement Services, Continental Testing Services (CTS)
Presented at CLEAR’s 23rd Annual Conference, Toronto, Ontario, September 2003

51 What Can Go Wrong?
1. Job/practice analysis & test specs
2. Item development & documentation
3. Test assembly procedures & controls
4. Candidate information: before & after
5. Scoring accuracy & item revalidation
6. Suspected cheating & candidate appeals
7. Practical exam procedures & scoring

52 Job Analysis & Test Specs
Undocumented (or no) job analysis
Embedded test specifications
Unrepresentative populations for job analysis or pilot testing
Misuse of “trial forms” and data to support “live” examinations

53 Item Development
Do item authors and reviewers sign and understand non-disclosure agreements?
How does each question reflect job analysis results and test specifications?
Should qualified candidates be able to answer Qs correctly with the information available during the examination?

54 Item Development
Do any questions offer cues that answer other questions on an exam?
Do item patterns offer cues to marginally qualified, test-savvy candidates?
Is the longest answer always correct?
If None of the above or All of the above Qs are used, are these always correct?
Do true-false questions follow clear patterns?
Do other detectable patterns cue answers?

55 Item Documentation
Are all Qs supported by references cited for, and available to, all candidates?
Do any questions cite item authors or committee members as “references”?
Are page references cited for each Q?
Are citations updated as new editions of each reference are published?

56 Candidate Information
Are all references identified to, and equally available to, all candidates?
Are content outlines for each test provided to help candidates prepare?
Are sample Qs given to all candidates?
Are candidates told what they must/may bring and use during the examination?

57 Test Assembly Controls
Are parallel forms assembled to be of approximately equal difficulty?
Is the answer key properly balanced? (see the sketch below)
Approximately equal numbers of each option
Limit consecutive Qs with the same answer
Avoid repeated patterns of responses
Avoid long series of Qs without a given option as the key
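
A minimal sketch of automated checks along these lines (option counts, longest same-answer run, longest stretch without a given option); the answer key is illustrative:

```python
from collections import Counter
from itertools import groupby

# Illustrative answer key for one test form.
key = list("ACBDABCDABDCABCDDABCACBD")

# Approximately equal numbers of each option?
print("Option counts:", dict(Counter(key)))

# Longest run of consecutive questions keyed to the same answer.
longest_run = max(len(list(group)) for _, group in groupby(key))
print("Longest same-answer run:", longest_run)

# Longest series of questions in which a given option never appears as the key.
def longest_gap(option):
    gap = best = 0
    for answer in key:
        gap = 0 if answer == option else gap + 1
        best = max(best, gap)
    return best

for option in "ABCD":
    print(f"Longest series without {option} as the key:", longest_gap(option))
```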

58 Suspected Cheating
Is potential cheating behavior at the test site clearly defined for onsite staff?
Are candidates informed of the possible consequences of suspected cheating?
Are staff trained to respond fairly and appropriately to suspected cheating?
Are procedures in place to help staff document and report suspected cheating?

59 Scoring Controls
How is the accuracy of the answer key verified?
Do item analyses show any anomalies in candidate performance on the test?
Are oddly performing Qs revalidated?
Identify ambiguities in sources or Qs
Verify that each Q has one right answer
Give credit to all candidates when needed
Are scoring adjustments applied fairly?
Are rescores/refunds issued as needed?

60 Candidate Appeals
How do candidates request rescoring?
Do policies allow cancellation of scores when organized cheating is found?
Harvested Qs on websites, in print
Are appeal procedures available?
Are appeal procedures explained?
How is test security protected during candidate appeal procedures?

61 Practical Examinations
Is the test uniform for all candidates?
Is the passing score defensible?
Are scoring controls in place to limit bias for or against individual candidates?
Are scoring criteria well documented?
Are judges well trained to apply scoring criteria consistently?
Are scoring judgments easy to record?
How are marginal scores resolved?

62 Presentation Follow-up
Please pick up a handout from this presentation, and/or
Presentation materials will be posted on CLEAR’s website

