Test item analysis: When are statistics a good thing? Andrew Martin Purdue Pesticide Programs.

Presentation transcript:


What we’re going to do
• Review some simple, descriptive statistics.
• Discuss the concept of random error.
• Identify important item characteristics.
• Conduct an item analysis using real data and actual test items.

Measures of central tendency
• Mean = the sum of the test scores ÷ the number of test scores (i.e., an average)
• Median = the middle score when the scores are ordered from lowest to highest

Individual differences
• Range = highest score - lowest score
• Standard deviation (roughly!) = the range ÷ 5, or (more precisely) = the square root of the average squared deviation score [1]
• Variance = the standard deviation squared
[1] A deviation score is an individual’s score minus the group mean score.
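The definitions on the last two slides can be checked by hand on a small data set. The sketch below is illustrative only: the scores are hypothetical (not from the presentation), and it assumes the "average squared deviation" divides by the number of scores (the population form) rather than by that number minus one.

```python
import statistics

# Hypothetical test scores, made up for illustration.
scores = [62, 70, 75, 78, 80, 84, 85, 88, 90, 98]

mean = sum(scores) / len(scores)           # sum of scores / number of scores
median = statistics.median(scores)         # middle score of the ordered list
score_range = max(scores) - min(scores)    # highest score - lowest score

# A deviation score is an individual's score minus the group mean.
deviations = [s - mean for s in scores]

# Variance = average squared deviation (assumption: divide by N, not N - 1).
variance = sum(d ** 2 for d in deviations) / len(scores)
std_dev = variance ** 0.5                  # square root of the variance

# The slide's rule of thumb: standard deviation is roughly range / 5.
rough_std_dev = score_range / 5

print(mean, median, score_range, round(std_dev, 2), rough_std_dev)
```

For these scores the rule of thumb and the exact calculation give noticeably different values, which is why the slide marks the shortcut as rough.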

Score distribution

Score distribution (cont.)
• Licensure test scores are generally NOT normally distributed, as shown in the preceding slide.
• They are often left skewed (i.e., scores are concentrated on the right-hand side of the distribution).
• But it doesn’t matter. We’re going to treat score distributions AS IF they are normal.

Individual differences (variance) and random error
• Under classical measurement theory, individual score differences are the result of:
  1. true differences in achievement and
  2. random error
• We are interested in the former and want to minimize the latter.
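In symbols, this decomposition is usually written as below. The notation is the conventional classical test theory form, not something taken from the slides: X, T, and E are labels introduced here for the observed score, the true score, and random error, and the variance identity assumes that error is uncorrelated with the true score.

```latex
X = T + E, \qquad
\sigma_X^2 = \sigma_T^2 + \sigma_E^2, \qquad
\text{reliability} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}
```

Read this way, minimizing random error means shrinking the error variance relative to the total score variance.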

Estimating the influence of random error on score results
• A RELIABLE test generates scores that are reasonably free from the influence of random error (i.e., the test has a high degree of precision).
• A reliability coefficient indicates a test’s precision of measurement.
• A general index of reliability for items scored right or wrong is KR-20 (Kuder-Richardson Formula 20).

KR-20
• KR-20 can range in value from 0 (perfectly unreliable) to 1 (perfectly reliable).
• KR-20 values for licensure exams should be above .90.
• KR-20 values are affected by the number of items on the test and by how strongly the items relate to (or correlate with) one another. Shorter tests are generally LESS reliable than longer ones, and anything that restricts test score variance will also reduce the value of KR-20.
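The slides do not give the formula, so the following is a minimal sketch of the usual textbook form of KR-20: k / (k - 1) times (1 minus the sum of p × q over items, divided by the variance of total scores). The function name `kr20`, the 0/1 response matrix, and the choice of the population variance (dividing by N) are assumptions made for illustration; some software divides by N - 1 instead.

```python
def kr20(responses):
    """Kuder-Richardson Formula 20 for a matrix of 0/1 item scores.

    responses: list of rows, one row per test taker, one 0/1 entry per item.
    """
    n = len(responses)        # number of test takers
    k = len(responses[0])     # number of items

    # Total score for each test taker and the variance of those totals
    # (assumption: population variance, dividing by n).
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    if var_total == 0:
        raise ValueError("KR-20 is undefined when all total scores are equal")

    # Sum of item variances p * q, where p is the proportion correct.
    sum_pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in responses) / n
        sum_pq += p * (1 - p)

    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Hypothetical responses: 6 test takers, 5 items (made up for illustration).
data = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
print(round(kr20(data), 3))
```

Both influences named on the slide are visible in the formula: more items raise k, and a smaller total-score variance makes the subtracted ratio larger.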

Standard error of measurement (SEM)
• SEM offers another means of examining the influence of random error.
• It is an estimate of the standard deviation of test scores for any person resulting from repeated administrations of similar [parallel] test forms.
• With qualifications, SEM can be used to place confidence intervals around a person’s actual score.
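The slide does not state how SEM is computed. A common textbook estimate is the standard deviation of test scores times the square root of (1 - reliability), and the sketch below assumes that formula. The function names, the example values (standard deviation 10, reliability .91, observed score 75), and the normal-theory multiplier 1.96 for a roughly 95% band are all illustrative assumptions.

```python
import math


def standard_error_of_measurement(std_dev, reliability):
    """Assumed textbook estimate: SEM = SD * sqrt(1 - reliability)."""
    return std_dev * math.sqrt(1 - reliability)


def score_band(observed, sem, z=1.96):
    """Band of +/- z SEMs around an observed score (about 95% for z = 1.96)."""
    return observed - z * sem, observed + z * sem


# Hypothetical values, made up for illustration.
sem = standard_error_of_measurement(std_dev=10.0, reliability=0.91)
low, high = score_band(observed=75, sem=sem)
print(round(sem, 2), round(low, 2), round(high, 2))
```

The band inherits the slide's "with qualifications": it treats errors as normally distributed and equally large at every score level, which may not hold near the top or bottom of the score range.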

Item difficulty
• Item difficulty is estimated by the p-value. It is the proportion of test takers who correctly answer the item (often reported as a percentage).
• Items with p-values near .50 offer the greatest contribution to test reliability.
• p-values depend on the sample of test takers from which they were calculated, so they are potentially biased by that sample.
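As a rough illustration, item p-values can be computed column by column from a matrix of scored responses. The function name `item_difficulty` and the data below are hypothetical. The second list shows p × (1 - p), the variance of a right/wrong item, which is largest at p = .50; that is one way to see why mid-difficulty items can contribute most to score variance.

```python
def item_difficulty(responses):
    """Proportion of test takers answering each item correctly (the p-value).

    responses: list of rows, one row per test taker, one 0/1 entry per item.
    """
    n = len(responses)
    k = len(responses[0])
    return [sum(row[i] for row in responses) / n for i in range(k)]


# Hypothetical scored responses: 6 test takers, 5 items.
data = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

p_values = item_difficulty(data)
item_variances = [p * (1 - p) for p in p_values]  # peaks when p = .50

for number, (p, v) in enumerate(zip(p_values, item_variances), start=1):
    print(f"item {number}: p = {p:.2f}, p*(1-p) = {v:.3f}")
```

A different group of test takers would generally produce different p-values for the same items, which is the sample dependence the slide warns about.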

Item discrimination
• Item discrimination describes an item’s ability to differentiate between persons who are knowledgeable about item content and those who are not.
• Item discrimination is typically estimated by rpb (point-biserial correlation).
• rpb indicates the strength of relationship (correlation) between how individuals answer an item and their total score.

Item discrimination (cont.)
• High achievers are expected to answer an item correctly more frequently than low achievers. Consequently, an rpb should be positive.
• rpbs above .30 are highly discriminating (and offer the greatest contribution to test reliability).
• rpbs, like p-values, depend on the sample from which they were calculated, so they are potentially biased by that sample.
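A minimal sketch of the point-biserial calculation follows, using the standard form: (mean total of those answering correctly minus mean total of those answering incorrectly) divided by the standard deviation of totals, times the square root of p × q. The function name and data are hypothetical, and the total score here includes the item being analyzed; analysis software sometimes removes the item from the total first.

```python
def point_biserial(item, totals):
    """Point-biserial correlation between a 0/1 item and total test scores."""
    n = len(item)
    n_correct = sum(item)
    if n_correct == 0 or n_correct == n:
        raise ValueError("rpb is undefined when everyone answers the same way")

    p = n_correct / n         # proportion answering correctly
    q = 1 - p

    # Standard deviation of total scores (assumption: divide by n).
    mean_all = sum(totals) / n
    sd = (sum((t - mean_all) ** 2 for t in totals) / n) ** 0.5
    if sd == 0:
        raise ValueError("rpb is undefined when all total scores are equal")

    mean_correct = sum(t for x, t in zip(item, totals) if x == 1) / n_correct
    mean_incorrect = sum(t for x, t in zip(item, totals) if x == 0) / (n - n_correct)
    return (mean_correct - mean_incorrect) / sd * (p * q) ** 0.5


# Hypothetical data: total scores for 6 test takers and two items.
totals = [5, 4, 3, 2, 1, 0]
good_item = [1, 1, 1, 0, 0, 0]   # high scorers answer correctly
bad_item = [0, 0, 0, 1, 1, 1]    # low scorers answer correctly

print(round(point_biserial(good_item, totals), 3))
print(round(point_biserial(bad_item, totals), 3))
```

The second item yields a negative rpb, the pattern the slide flags as unexpected: low achievers answer it correctly more often than high achievers.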

Item omits
• Omits indicate the number of persons who failed to respond to an item.
• Numerous omits (assuming no correction for guessing) may indicate a problem with the amount of time allotted for the test.
• Extensive non-response is a threat to valid score interpretation and use.
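Counting omits is simple once unanswered items are recorded distinctly from wrong answers. The sketch below assumes a hypothetical layout in which each row holds one test taker's selected options and `None` marks an item left blank; the data are made up for illustration.

```python
# Hypothetical raw answers: one row per test taker, None = item omitted.
answers = [
    ["A", "C", "B", None],
    ["A", None, "B", None],
    ["B", "C", "D", "A"],
    ["A", "C", None, None],
]

n_takers = len(answers)
n_items = len(answers[0])

# Number of test takers who gave no response to each item.
omits_per_item = [
    sum(1 for row in answers if row[i] is None) for i in range(n_items)
]

# The same counts expressed as a share of all test takers.
omit_rates = [count / n_takers for count in omits_per_item]

print(omits_per_item, omit_rates)
```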

Resources for additional help
• Haladyna, T. (2004). Developing and Validating Multiple-Choice Test Items. Third Edition. Lawrence Erlbaum Associates, Inc. Publishers. Mahwah, NJ.
• Osterlind, S. (1998). Constructing Test Items: Multiple-Choice, Constructed-Response, Performance, and Other Formats. Second Edition. Kluwer Academic Publishers Group. Norwell, MA.
or
Contact your state land grant university’s college of education, department of psychology, or testing service about performing and interpreting item analysis reports.
or try: