1 Item Analysis - Outline
1. Types of test items
   A. Selected response items
   B. Constructed response items
2. Parts of test items
3. Guidelines for writing test items

2 Item Analysis - Outline
4. Item Analysis
   A. Distracter measures
   B. Item difficulty measure P
   C. Item discrimination measures
5. Item Response Theory
   A. ICCs
   B. Adaptive testing

3 1. Types of test items
A. Selected response
   Multiple choice
   Likert scale
   Category
   Q-sort
B. Constructed response

4 A. Selected response
Multiple choice or forced choice
  - Task is to choose between set answers
  - Advantage: ease of scoring
  - Advantage: scoring requires little skill
  - Disadvantage: may test memory rather than comprehension

5 A. Selected response
Multiple choice or forced choice
  - The correct response must be distinct; the rest of the alternatives are distracters
  - Distracters should not be obvious or ambiguous
  - If the distracters are bad, adding more of them makes the test less reliable
  - Use 3-4 distracters per item

6 A. Selected response
Multiple choice or forced choice
Likert format
  - Test-taker chooses a point on a scale that expresses their attitude or belief
  - Data lend themselves to factor analysis

7 Likert scale example item
Parking costs at the university are fair
   strongly agree / agree / neutral / disagree / strongly disagree
Parking costs at the university are fair
   1 = strongly agree, 2 = agree, 3 = disagree, 4 = strongly disagree

8 A. Selected response
Multiple choice or forced choice
Likert format
Category
  - Similar to Likert but with more choices
  - Test-taker's commitment
  - Reliability depends on good instructions and the number of categories (≤ 10)
  - Scoring shows context effects

9 A. Selected response
Multiple choice or forced choice
Likert format
Category
Q-sort
  - A large set of cards, each with a statement referring to a "target"
  - Test-taker sorts the cards into piles according to how accurately the statements describe the target
  - Generally 9 piles

10 1. Types of test items
A. Selected response
B. Constructed response
   Free response
   Fill-in-the-blank
   Essay tests
   Portfolios
   In-basket technique

11 B. Constructed response items
Free response
  - Test-taker responds without constraint
  - Describes what is important to him or her

12 B. Constructed response items
Free response
Fill-in-the-blank
  - Used to test for knowledge or to find out about beliefs and attitudes

13 B. Constructed response items
Free response
Fill-in-the-blank
Essay tests
  - Preferred when you want to assess the test-taker's ability to think analytically, integrate ideas, and express himself

14 B. Constructed response items
Free response
Fill-in-the-blank
Essay tests
Portfolios
  - Not really a test
  - Collections of things the person being evaluated has produced
  - Let you evaluate things you can't assess with a selected response test

15 B. Constructed response items
Free response
Fill-in-the-blank
Essay tests
Portfolios
In-basket technique
  - Used in business
  - Job candidate says how he or she would deal with a set of "everyday" problems
  - Requires expert raters to grade responses

16 B. Constructed response items
Strengths
  - Assess higher-order skills
  - More useful feedback to the test-taker
  - Positive influence on study habits?
  - Easier to create items

17 B. Constructed response items
Weaknesses
  - Time-consuming to use
  - Possible subjectivity in scoring

18 2. Parts of test items
A. Stimulus or item stem
  - What the subject responds to

19 2. Parts of test items
A. Stimulus or item stem
B. Response format or method
  - Typically multiple choice or constructed response

20 2. Parts of test items
A. Stimulus or item stem
B. Response format or method
C. Conditions governing the response
  - e.g., time limits; allowing probes for ambiguous responses; how the response is recorded...

21 2. Parts of test items
A. Stimulus or item stem
B. Response format or method
C. Conditions governing the response
D. Procedures for scoring the response
  - Particularly important for constructed response items

22 2. Parts of test items
To some extent, your choices on each of these parts will be dictated by:
  - Precedent: what did you do last time?
  - Experience: did that work?
  - Practical considerations: how many people have to be tested? How much time is available?

23 3. Writing test items - guidelines
Define clearly
  - Why are you testing? What do you want to know?

24 3. Writing test items - guidelines
Define clearly
Generate a pool of potential items
  - The larger the pool of items you select from, the better the test
  - Selection from this pool is based on item analysis (see below)

25 3. Writing test items - guidelines
Define clearly
Generate a pool of potential items
Monitor reading level
  - If the level is too low, more sophisticated test-takers may get bored
  - If the level is too high, you're testing reading skill as well as the domain you think you're testing

26 3. Writing test items - guidelines
Define clearly
Generate a pool of potential items
Monitor reading level
Use unitary items
  - An item should ask about only one thing; then the meaning of the response is clear

27 3. Writing test items - guidelines
Define clearly
Generate a pool of potential items
Monitor reading level
Use unitary items
Avoid long items
  - Longer items are more likely to be misinterpreted by test-takers
  - Short items are more likely to be unitary

28 3. Writing test items - guidelines
Define clearly
Generate a pool of potential items
Monitor reading level
Use unitary items
Avoid long items
Break any response "set"
  - Use reverse-scored items to prevent test-takers from getting into a response set, such as just responding "5" to every item on a Likert scale

29 4. Item analysis
A. Multiple choice distracter measures
B. Item difficulty measure P
C. Item discrimination index D
D. Item-total correlation

30 A. Multiple choice distracter measures
How many people choose each distracter?
  - Distracters should be equally attractive
  - The correct choice should be based on knowledge
  - Those without knowledge should choose randomly
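A minimal Python sketch of the count this slide describes; the response strings and the keyed answer below are invented for illustration.

```python
from collections import Counter

# Hypothetical answers of 12 test-takers to one multiple-choice item.
# "B" is the keyed (correct) answer; "A", "C", and "D" are distracters.
responses = ["B", "A", "B", "C", "B", "B", "D", "A", "B", "C", "B", "A"]

counts = Counter(responses)
for option in ["A", "B", "C", "D"]:
    print(option, counts[option])
```

A distracter that almost nobody (or almost everybody) chooses is not pulling its weight and is a candidate for rewriting.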

31 B. Item Difficulty Measure P
Difficulty is determined by the item and the population tested
P(i) = (# who got item i correct) / (# taking the test)
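A short Python sketch of this calculation; the 0/1 scoring matrix is made up for the example.

```python
# Rows = test-takers, columns = items; 1 = correct, 0 = incorrect.
scores = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [1, 1, 1, 1],
]

n_takers = len(scores)
n_items = len(scores[0])

# P(i) = (# who got item i correct) / (# taking the test)
p = [sum(row[i] for row in scores) / n_takers for i in range(n_items)]
print(p)  # [0.75, 0.75, 0.25, 1.0] -- the last item separates nobody
```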

32 B. Item Difficulty Measure P
P = .50 is best
P = 0 or P = 1: such items do not distinguish ability levels

33 C. Discrimination Index D
Extreme groups method
U = # getting the item correct in the 'top' group
L = # getting the item correct in the 'bottom' group
n_U = # in the upper group
n_L = # in the lower group
D = U/n_U - L/n_L
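The same formula as a small Python function; the group sizes and counts in the example are hypothetical.

```python
def discrimination_index(upper_correct, n_upper, lower_correct, n_lower):
    """Extreme-groups discrimination index: D = U/n_U - L/n_L."""
    return upper_correct / n_upper - lower_correct / n_lower

# Hypothetical item: 18 of the 20 highest scorers got it right,
# but only 6 of the 20 lowest scorers did.
print(discrimination_index(18, 20, 6, 20))  # 0.6 -- the item discriminates well
```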

34 D. Item-Total Correlation
Good item: high correlation
  - People who get the item correct have a high score on the test
  - People who get the item wrong have a low score on the test
Poor item: low correlation
  - Look at the wording - the item may be testing reading skill
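One common way to compute an item-total correlation is the Pearson (point-biserial) correlation between the 0/1 item scores and the total test scores; the data below are invented. A sketch using the standard library (statistics.correlation requires Python 3.10+):

```python
from statistics import correlation  # Python 3.10+

# Hypothetical 0/1 scores on one item and total test scores for 8 test-takers.
item = [1, 1, 0, 1, 0, 0, 1, 0]
total = [38, 35, 20, 31, 24, 18, 33, 22]

print(correlation(item, total))  # large positive value: the item tracks the total score
```

In practice the item is often removed from the total before correlating (the corrected item-total correlation), so that an item is not correlated partly with itself.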

35 5. Item Response Theory (IRT)
A. Item characteristic curves
B. Problems with IRT
C. Adaptive testing using computers

36 A. Item characteristic curves
Most important idea: Item Characteristic Curves (ICCs)
  - One curve for each test item
  - X axis: test-taker ability (given by test score)
  - Y axis: probability of choosing a given answer (for an ability item, of getting it correct)

37 [Figure: item characteristic curves for Items 1, 2, and 3. X axis: ability (test score); Y axis: probability of a correct response, i.e., the % of people writing the test at that score who got the item correct. Each curve shows how the probability of getting the item correct changes with ability.]

38 A. Item Characteristic Curves
Slope: how quickly the curve rises
  - Indicates how well the item discriminates among persons of differing abilities
  - Like P(i) in Classical Test Theory, but sample-invariant
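The slides do not give an equation for an ICC, but a standard choice is the two-parameter logistic (2PL) model, in which b is the item's location (difficulty) and a its slope (discrimination). A sketch with made-up parameter values:

```python
import math

def icc_2pl(theta, a, b):
    """Two-parameter logistic ICC: probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Two hypothetical items with the same difficulty but different slopes.
for theta in (-2, -1, 0, 1, 2):
    shallow = icc_2pl(theta, a=0.5, b=0.0)
    steep = icc_2pl(theta, a=2.0, b=0.0)
    print(theta, round(shallow, 2), round(steep, 2))
# The steeper item (a = 2.0) separates abilities near theta = 0 much more sharply.
```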

39 A. ICCs are sample invariant
P(i) = (# who got the item correct) / (# taking the test)
P confounds the item and the group tested: ICCs separate item difficulty and test-taker ability and present both visually in one graph - that is, you can see item difficulty independently of test-taker ability

40 B. Problems with IRT
  - Obtaining stable estimates of IRT parameters requires rather large samples
  - Computationally complex
  - The IRT model assumes that the trait being measured is one-dimensional; it may not be

41 C. Adaptive Testing Using Computers
  - The computer selects harder or easier questions as the test-taker gets each question right or wrong
  - Lets you tailor the questions to each test-taker
  - The test-taker does not spend most of their time on questions that are too easy or too difficult

42 C. Adaptive Testing Using Computers
  - Facilitates testing of groups of varying ability
  - Output = the level of difficulty the test-taker can deal with
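A toy Python sketch of the branching idea in these last two slides; the item bank, step rule, and stopping rule are all invented for illustration and are much simpler than a real computerized adaptive test.

```python
def adaptive_test(item_bank, answers_correctly, n_items=5):
    """Give a harder item after a correct answer and an easier one after a miss.

    item_bank: item difficulties sorted from easiest to hardest.
    answers_correctly: callable taking a difficulty and returning True/False.
    Returns the difficulty level the test-taker ends up at.
    """
    index = len(item_bank) // 2               # start in the middle of the bank
    step = max(len(item_bank) // 4, 1)
    for _ in range(n_items):
        if answers_correctly(item_bank[index]):
            index = min(index + step, len(item_bank) - 1)   # move to a harder item
        else:
            index = max(index - step, 0)                    # move to an easier item
        step = max(step // 2, 1)              # take smaller steps as the test goes on
    return item_bank[index]

# Example: a test-taker who can handle items up to difficulty 0.6.
bank = [round(0.1 * k, 1) for k in range(1, 10)]   # difficulties 0.1 .. 0.9
print(adaptive_test(bank, lambda d: d <= 0.6))     # settles near the 0.6-0.7 boundary
```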