TESTS © LOUIS COHEN, LAWRENCE MANION & KEITH MORRISON.

STRUCTURE OF THE CHAPTER
– What are we testing?
– Parametric and non-parametric tests
– Norm-referenced, criterion-referenced and domain-referenced tests
– Commercially produced tests and researcher-produced tests
– Constructing a test
– Software for preparation of a test
– Devising a pre-test and post-test
– Ethical issues in testing
– Computerized adaptive testing

INITIAL CONSIDERATIONS
– What are we testing (e.g. achievement, aptitude, attitude, personality, intelligence, social adjustment etc.)?
– Are we dealing with parametric or non-parametric tests?
– Are they norm-referenced or criterion-referenced?
– Are they available commercially for researchers to use or will researchers have to develop home-produced tests?
– Do the test scores derive from a pre-test and post-test in the experimental method?
– Are they group or individual tests?
– Do they involve self-reporting or are they administered tests?

WHAT ARE WE TESTING?
– Handbook of Psychoeducational Assessment
– Handbook of Psychological and Educational Assessment of Children: Intelligence, Aptitude and Achievement
– The Eighteenth Mental Measurements Yearbook
– Tests in Print VII

PARAMETRIC AND NON-PARAMETRIC TESTS
Parametric tests:
– assume that scores in the population follow a normal curve of distribution
– assume continuous and equal intervals between test scores (interval data) or, for tests that have a true zero, ratio data
– use standardized scores
Non-parametric tests:
– make few or no assumptions about the distribution of the population or the characteristics of that population
– are useful for small samples

NORM-REFERENCED TESTS
Norm-referenced tests:
– compare students’ attainments relative to other students’ attainments
– are usually standardized to the curve of distribution
– provide the researcher with information on how well one student has achieved in comparison to another, enabling rank orderings of performance and achievement to be constructed

CRITERION-REFERENCED TESTS
Criterion-referenced tests:
– are not based on, or intended for, comparing student with student, but require the student to fulfil a given set of criteria, a predefined and absolute standard or outcome
– provide the researcher with information about exactly what a student has learned and can do
– A driving test is an example of a criterion-referenced test: if the candidate meets the requirements then s/he passes the test, regardless of, and without reference to, other candidates (i.e. s/he is not being compared to other candidates)
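
The contrast between norm-referenced and criterion-referenced interpretation of the same scores can be sketched in a few lines of Python. This is an illustration only: the student names, the marks, the pass mark of 60 and both function names are hypothetical and do not come from the slides.

```python
def norm_referenced_ranks(scores):
    """Rank each student relative to the others (1 = highest score)."""
    ordered = sorted(scores.values(), reverse=True)
    # Students with tied scores share the best rank available to them.
    return {name: ordered.index(score) + 1 for name, score in scores.items()}


def criterion_referenced_results(scores, pass_mark):
    """Judge each student against a fixed standard, ignoring other students."""
    return {name: score >= pass_mark for name, score in scores.items()}


# Hypothetical marks for four students.
scores = {"Ana": 72, "Ben": 58, "Cai": 85, "Dee": 58}

ranks = norm_referenced_ranks(scores)
passed = criterion_referenced_results(scores, pass_mark=60)

print(ranks)
print(passed)
```

The first function only has meaning in relation to the group, whereas the second gives the same verdict for a student whoever else sat the test, as in the driving test example.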

DOMAIN-REFERENCED TESTS
Domain-referenced tests:
– The domain to be assessed is specified clearly.
– A domain is the particular field or area of the subject that is being tested (e.g. light in science).
– The domain is set out in depth and breadth. Test items are then selected from this full domain, with careful attention to sampling to ensure representativeness of the wider field in the test items.
– The student’s achievements on the test are computed to yield a proportion of the maximum score possible. This is used as an index of the proportion of the overall domain that s/he has grasped.
– Inferences are being made from a limited number of items to the student’s achievements in the whole domain.
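
The proportion described above can be computed as follows. This is a minimal sketch: the function name, the one-mark-per-item default and the sample responses are assumptions made for illustration.

```python
def domain_mastery(item_scores, max_per_item=1):
    """Return the proportion of the maximum possible score achieved."""
    if not item_scores:
        raise ValueError("at least one item score is required")
    return sum(item_scores) / (max_per_item * len(item_scores))


# Hypothetical responses to eight items sampled from the domain
# (1 = correct, 0 = incorrect).
responses = [1, 0, 1, 1, 1, 0, 1, 1]
proportion = domain_mastery(responses)
print(proportion)
```

The result is only an estimate of mastery of the whole domain, and how far it can be trusted depends on how representative the sampled items are.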

COMMERCIALLY PRODUCED TESTS
– Are objective
– Have been piloted and refined
– Have been standardized across a named population
– Declare how reliable and valid they are
– Tend to be parametric
– Include instructions for administration
– Are straightforward and quick to administer and mark
– Usually include guides to the interpretation of the data in the manual
– Save researchers the task of having to devise, pilot and refine their own test

COMMERCIALLY PRODUCED TESTS
– Are expensive
– Are often targeted to special, rather than to general populations
– May not be exactly suited to the researcher’s specific purposes
– May be culturally/linguistically biased
– May have restricted release or availability

RESEARCHER PRODUCED TESTS
– Are cheap
– Are targeted to the population/sample in hand
– Fit the local context and situation
– Fit the researcher’s specific purposes (fitness for purpose)
– May be culturally/linguistically biased
– May have restricted release or availability

RESEARCHER PRODUCED TESTS
– Are time-consuming to devise, pilot, refine and administer
– Are unstandardized
– May require extensive procedures for validation and reliability testing
– Often yield non-parametric data
– Have limited generalizability

CONSTRUCTING A TEST
Step 1: Consider the basis of the test (classical test theory/item response theory)
Step 2: Consider the purposes of the test
Step 3: Consider the type of test
Step 4: Consider the objectives of the test
Step 5: Write the test specifications, items and content

CONSTRUCTING A TEST
Step 6: Construct the test, involving item analysis, item discriminability, item difficulty and distractors
Step 7: Plan the format, layout, form and timing of the test
Step 8: Pilot the test
Step 9: Address validity and reliability
Step 10: Devise the manual of instructions for the administration, scoring, marking, weighting and data treatment of the test
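
The item analysis mentioned in Step 6 can be illustrated with two commonly used indices: item difficulty as the proportion of test takers answering correctly, and an upper-lower discrimination index. The slides do not say which formulas are intended, so the definitions below, the split into top and bottom halves, the function names and the response data are all assumptions; distractor analysis is not covered.

```python
def item_difficulty(item_responses):
    """Proportion of test takers who answered the item correctly."""
    return sum(item_responses) / len(item_responses)


def item_discrimination(response_matrix, item_index):
    """Upper-lower discrimination index for one item.

    response_matrix holds one row per test taker, with 1 for a correct
    answer and 0 for an incorrect one.
    """
    # Order test takers by total score, highest first.
    ranked = sorted(response_matrix, key=sum, reverse=True)
    half = len(ranked) // 2
    if half == 0:
        raise ValueError("at least two test takers are required")
    upper, lower = ranked[:half], ranked[-half:]
    p_upper = sum(row[item_index] for row in upper) / half
    p_lower = sum(row[item_index] for row in lower) / half
    return p_upper - p_lower


# Hypothetical pilot data: six test takers, three items.
pilot = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]

for i in range(3):
    column = [row[i] for row in pilot]
    print(i, item_difficulty(column), item_discrimination(pilot, i))
```

An index near zero suggests that an item does not separate stronger from weaker test takers, and a negative value suggests that the item may be flawed.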

CONSTRUCTING A TEST
Address classical test theory or item response theory.
Classical test theory:
– assumes that there is a ‘true score’: the score that an individual would obtain on the test if the measurement were made without error, which is also the average score that the same test taker would obtain if s/he took the same test on an infinite number of occasions.
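
The idea of a true score can be illustrated with a small simulation in which each observed score is the true score plus random error. This is a sketch under stated assumptions: the error is taken to be normally distributed with a mean of zero, and the true score of 70, the error spread of 5 and the function name are hypothetical.

```python
import random
import statistics


def simulate_observed_scores(true_score, error_sd, n_occasions, seed=0):
    """Simulate repeated sittings: observed score = true score + random error."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [true_score + rng.gauss(0, error_sd) for _ in range(n_occasions)]


observed = simulate_observed_scores(true_score=70, error_sd=5, n_occasions=100_000)
mean_observed = statistics.mean(observed)
print(round(mean_observed, 2))
```

Any single observed score may lie several marks from 70, but the average over many sittings settles close to it, which is the sense in which the true score is defined over an infinite number of occasions.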

CONSTRUCTING A TEST
Item response theory assumes that:
– It is possible to measure single, specific traits, abilities, attributes that, themselves, are not observable
– It is possible to identify objective levels of difficulty of an item
– It is possible to devise items that discriminate between individuals
– An item can be described independently of any particular sample of people responding to it
– A testee’s proficiency can be described in terms of his/her achievement of an item of a known difficulty level

CONSTRUCTING A TEST
Item response theory assumes that:
– Traits are unidimensional and that single traits are specifiable
– A set of items can measure a common trait or ability
– A testee’s response to any one test item will not affect his/her response to another test item
– The probability of the correct response to an item does not depend on the number of testees who might be at the same level of ability
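
One simple model that is consistent with these assumptions is the one-parameter logistic (Rasch) model. The slides do not name a specific model, so its use here is an assumption made for illustration, as are the function name and the example values.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the one-parameter logistic model.

    ability and difficulty are expressed on the same (logit) scale.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# A testee whose ability equals the item's difficulty has an even chance.
print(rasch_probability(ability=0.0, difficulty=0.0))

# The same testee facing an easier and a harder item.
print(rasch_probability(ability=0.0, difficulty=-1.0))
print(rasch_probability(ability=0.0, difficulty=1.0))
```

The probability depends only on the testee's ability and the item's difficulty, not on who else took the test or how many testees share that ability level.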

SOFTWARE FOR PREPARING A TEST
– Software and online testing can remove some of the burden of layout, marking, data entry and analysis, as these can be done automatically.
– Optical mark scanners can read in marks from hard copy into a computer file.

DEVISING A PRE-TEST AND POST-TEST
– The pre-test may have questions which differ in form or wording from those of the post-test, though the two tests must test the same content, i.e. they are ‘alternate forms’ of a test.
– In an experiment the pre-test and post-test must be the same for the control and experimental groups.
– Care must be taken in the construction of a post-test to avoid making the test easier to complete by one group than another.
– The level of difficulty must be the same in both tests.

ETHICAL ISSUES IN TESTING
How ethical are these?
– Ensuring coverage of the objectives and program that will be tested
– Restricting the coverage of the program content and objectives to those only that will be tested
– Preparing students with ‘exam technique’
– Practice with past/similar papers
– Directly matching the teaching to specific test items, where each piece of teaching and contents is the same as each test item

ETHICAL ISSUES IN TESTING
How ethical are these?
– Practice on an exactly parallel form of the test
– Telling students in advance what will appear on the test
– Practice on, and preparation of, the identical test itself without teacher input
– Practice on, and preparation of, the identical test itself with teacher input, maybe providing sample answers
– Inflating or adjusting marks

ETHICAL ISSUES IN TESTING
– Tests must be valid and reliable
– The administration, marking and use of the test should only be undertaken by suitably competent/qualified people
– Access to test materials should be controlled
– Tests should benefit the testee (beneficence)
– Clear marking and grading protocols should operate
– Test results must be reported in a way that cannot be misinterpreted

ETHICAL ISSUES IN TESTING
– The privacy and dignity of individuals should be respected
– Individuals should not be harmed by the test or its results (non-maleficence)
– Informed consent to participate in the test should be sought

COMPUTERIZED ADAPTIVE TESTING
– The choice of which test items to administer is based on the testee’s responses to previous items, i.e. the test adapts to the testee’s performance on prior items: if an item proves too hard then the next item can be easier, and if the testee succeeds on an item then the next item can be harder.
– This avoids the problem of tests being too easy or too difficult.
– The first item is pitched in the middle of the assumed ability range; if the testee answers it correctly then it is followed by a more difficult item, and if the testee answers it incorrectly then it is followed by an easier item.

COMPUTERIZED ADAPTIVE TESTING
– The test is scored instantly.
– A large item pool must be developed for each area of the content domain, with sufficient numbers, variety and spread of difficulty.
– All items must measure a single aptitude or dimension.
– Items must be independent of each other, i.e. a person’s response to an item should not depend on that person’s response to another item.
