Standardization and Test Development Nisrin Alqatarneh MSc. Occupational therapy.


WHY SHOULD THERAPISTS USE STANDARDISED TESTS?
 Most testing errors occur because therapists:
 collected insufficient data
 collected inaccurate data
 used unsystematic data-collection methods
 obtained irrelevant data
 failed to verify the data collected
 obtained data contaminated by bias or prejudice (e.g. introduced by the person completing a self-report or proxy survey, or by the therapist)
 failed to communicate the collected data accurately

 A rigorously developed and psychometrically robust test will:
 involve the collection of sufficient data for the purpose of the test (e.g. to screen for a specific impairment)
 have established reliability, so data are collected accurately
 use a systematic data-collection method
 have established validity, so that the data obtained relate to the stated purpose and focus of the test
 provide information about confidence intervals, so the therapist can judge how likely it is that the test result gives a true picture of the person's abilities and/or deficits
 reduce the influence of bias or prejudice on test results
 have a record form for recording, analysing and communicating scores.
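
The confidence-interval point above can be made concrete: a test's standard error of measurement (SEM) is derived from the normative standard deviation and the test's reliability coefficient, and a band of roughly two SEMs either side of the observed score gives a 95% confidence interval. A minimal sketch in Python, with hypothetical test values (norm mean 100, SD 15, reliability 0.90):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SEM = SD * sqrt(1 - r)."""
    return sd * math.sqrt(1.0 - reliability)

def confidence_interval(observed, sd, reliability, z=1.96):
    """Band around an observed score (z = 1.96 for a 95% CI)."""
    margin = z * sem(sd, reliability)
    return observed - margin, observed + margin

# Hypothetical test: normative SD 15, test-retest reliability 0.90
low, high = confidence_interval(observed=85, sd=15, reliability=0.90)
print(f"True score likely between {low:.1f} and {high:.1f}")
```

The narrower the interval, the more confident the therapist can be that the observed score reflects the person's true ability rather than measurement error.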

 The major advantage of a statistically based assessment (and it is crucial to professional effectiveness and quality of decision making) is that it provides an objective source of data, unbiased by subjective feelings.
 Definition of objective: facts or findings that are clearly observable to, and verifiable by, others, as reflecting reality.
 Definition of subjective: an observation not rigidly reflecting measurable reality; it may imply observer bias and may not be verifiable by others in the same situation.

WHAT IS A STANDARDISED TEST?
 The AOTA defines the word standardised as:
 ‘made standard or uniform; to be used without variation; suggests an invariable way in which a test is to be used, as well as denoting the extent to which the results of the test may be considered to be both valid and reliable’

What makes a test standardised?
 A standardised test is a measurement tool that is published
 It has been designed for a specific purpose, for use with a particular population
 It should have detailed instructions explaining how and when it should be administered and scored, and how to interpret scores (a test protocol)
 It should also present the results of investigations evaluating the measure’s psychometric properties
 Details of any investigations of reliability and validity should also be given
 The conditions under which standardised tests are administered have to be exactly the same

WHAT IS AN UN-STANDARDISED ASSESSMENT?
 Un-standardised assessments provide the therapist with information but offer no precise comparison to a norm or a criterion.
 Some un-standardised tests are structured assessments, constructed and organised to provide guidelines for the content and process of the assessment, but their psychometric properties have not been researched and documented

STANDARDISATION
 A process of taking an assessment and:
 developing a fixed protocol for its administration and scoring
 conducting psychometric studies to evaluate whether the resultant assessment has acceptable levels of validity and reliability.
 There are two ways in which assessments can be standardised:
 1. In terms of procedures, materials and scoring
 2. In terms of normative standardisation

To standardise the test materials:
 The exact test materials should be listed or included (precise details of how to construct the test, with exact sizes, colours, fabric of materials etc., are required)
 OR, materials should be provided as part of a standardised test battery

To standardise the method of administration:
 The test conditions should be described in detail
 The number of people tested at any one time should be specified (individual or group)
 Information about the time required for administration should be given
 Detailed written instructions for the therapist should be provided
 The exact wording of any instructions to be given to the test taker is required

To standardise the scoring system:
 Clear guidelines need to be provided for scoring
 Scoring methods vary from test to test and may include the use of raw scores that are then converted to another type of score
 Information should be included to guide the therapist in the interpretation of scores
 Therapists should be aware that a number of factors may influence the person’s performance on the test, including test anxiety, fatigue, interruptions and distractions
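
As an illustration of converting raw scores to another type of score, the sketch below turns a raw score into a z-score and then into a standard score (mean 100, SD 15); the norm mean and SD used here are invented for the example:

```python
def to_z(raw, norm_mean, norm_sd):
    """Express a raw score as standard deviations from the norm mean."""
    return (raw - norm_mean) / norm_sd

def to_standard(raw, norm_mean, norm_sd, new_mean=100, new_sd=15):
    """Convert a raw score to a standard score (default scale: mean 100, SD 15)."""
    return new_mean + to_z(raw, norm_mean, norm_sd) * new_sd

# Hypothetical norms: raw scores in the norm group had mean 42, SD 6.
# A raw score of 48 is one SD above the mean, i.e. a standard score of 115.
print(to_standard(48, norm_mean=42, norm_sd=6))  # -> 115.0
```

The conversion table in a real manual does the same arithmetic once, for every possible raw score, so the therapist only has to look the value up.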

CONSTRUCTING A STANDARDISED TEST
 A 10-step test construction process:
1. Identify the primary purpose(s) for which test scores will be used.
2. Identify behaviours that represent the construct or define the domain.
3. Prepare a set of test specifications, delineating the proportion of items that should focus on each type of behaviour identified in step 2.
4. Construct an initial pool of items.
5. Have the items reviewed (and revise as necessary).
6. Hold preliminary tryouts (and revise as necessary).
7. Field-test the items on a large sample representative of the examinee population for whom the test is intended.
8. Determine the statistical properties of item scores and, where appropriate, eliminate items that do not meet pre-established criteria.
9. Design and conduct reliability and validity studies for the final form of the test.
10. Develop guidelines for test administration, scoring and the interpretation of test scores (e.g. prepare norm tables, suggest recommended cutting scores or standards for performance etc.).
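
Step 8 often relies on simple item statistics such as item difficulty, the proportion of examinees passing an item. The sketch below, using invented field-test data and arbitrary difficulty bounds, shows how items outside pre-established criteria might be flagged for elimination:

```python
def item_difficulty(responses):
    """Proportion of examinees passing the item (the item's p-value)."""
    return sum(responses) / len(responses)

def screen_items(item_responses, p_min=0.2, p_max=0.9):
    """Keep only items whose difficulty falls inside pre-set bounds."""
    kept = {}
    for item, responses in item_responses.items():
        p = item_difficulty(responses)
        if p_min <= p <= p_max:
            kept[item] = p
    return kept

# 1 = pass, 0 = fail, for five hypothetical examinees per item
data = {
    "item_1": [1, 1, 1, 1, 1],   # everyone passes: too easy, dropped
    "item_2": [1, 0, 1, 1, 0],   # p = 0.6: within bounds, kept
    "item_3": [0, 0, 0, 0, 0],   # no one passes: too hard, dropped
}
print(screen_items(data))  # -> {'item_2': 0.6}
```

Real test development also examines discrimination indices and other statistics, but the principle is the same: items are retained or dropped against criteria fixed before the field test.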

CRITERION-REFERENCED TESTS  Is a test that examines performance against pre- defined criteria and produces raw scores that are intended to have a direct, interpretable meaning  The person’s score on this type of test is interpreted by comparing the score with a pre-specified standard of performance or against specific content and/or skills  Therapists use this type of test to judge whether the person has mastered the required standard (for example to judge whether the person has sufficient independence to be safely discharged from hospital to live alone at home)  The person’s performance is NOT compared to the ability of other people

CRITERION-REFERENCED TESTS  Criterion-referenced tests usually have one of two main purposes, which are:  Estimation of the domain score (i.e. the proportion of items in the domain that the subject can pass correctly)  Mastery allocation (The observed test results are used to classify people into the mastery categories).  Criterion-referenced tests are important because therapists are concerned with desired outcomes  In a therapy assessment a score representing master might be labelled able or independent or observed,  while a score representing non-master might be labelled unable or dependent or not observed.

NORM-REFERENCED TESTS
 A person’s performance can be considered in the light of expected, or ‘normal’, performance by comparing it to the range of scores obtained by a representative group of people
 Norms are defined as ‘sets of scores from clearly defined samples’
 Norms indicate the average performance on the test and the varying degrees of performance above and below average
 Norms should be based on a large, representative sample of the people for whom the test was designed
 The normative sample should reflect the attributes of the client population
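
Comparing a person's score with a normative sample is often reported as a percentile rank: the percentage of the norm group scoring at or below the person. A minimal sketch with an invented (and unrealistically small) normative sample:

```python
def percentile_rank(score, norm_scores):
    """Percentage of the normative sample scoring at or below this score."""
    at_or_below = sum(1 for s in norm_scores if s <= score)
    return 100.0 * at_or_below / len(norm_scores)

# Hypothetical normative sample of 10 scores
norms = [35, 38, 40, 41, 42, 43, 44, 46, 49, 52]
print(percentile_rank(44, norms))  # -> 70.0
```

A published test would draw this comparison from norm tables built on hundreds or thousands of cases, stratified by attributes such as age and sex, rather than from raw sample scores.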

CRITERIA FOR JUDGING NORMS
 Norms should be relevant to your area of practice
 The normative sample should be representative of your client group
 Norms should be up to date
 Norms should be comparable (if you want to compare scores on tests with different norm groups)
 Norms should be adequately described

Any Questions? Assessment in OT