Using statistics to evaluate your test Gerard Seinhorst

Presentation transcript:

Using statistics to evaluate your test Gerard Seinhorst STANAG 6001 Testing Workshop 2018 Workshop C2 Kranjska Gora, Slovenia

WORKSHOP OBJECTIVES Understand how to describe and analyze test data, and draw conclusions from it Learn how to calculate and interpret the B-Index Learn how to create a test summary report from raw test data Understand how quantitative data analysis can support your claims about the test

B-INDEX The B-Index is an item statistic that indicates the degree to which the Masters (those who passed, e.g., a Level 3 test) outperformed the Non-Masters (test takers who failed the Level 3 test) on each item. Calculation of the B-Index: (1) Determine the cut score for passing the test, e.g., 70%. (2) Split the test takers into a group of Masters (at least 70% correct on the test) and a group of Non-Masters. (3) For each item, subtract the FV (facility value) for the Non-Masters from the FV for the Masters. Interpretation of the B-Index is similar to that for the DI (item discrimination index).
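The three calculation steps can be sketched in Python. This is a minimal illustration, not the workshop's Excel procedure: it assumes each test taker's answers are stored as a row of 0/1 item scores, and the function names and the data are hypothetical.

```python
def facility_value(item_scores):
    """Proportion of a group that answered the item correctly (scores are 0 or 1)."""
    return sum(item_scores) / len(item_scores)


def b_index(responses, cut_fraction=0.70):
    """Return a list with one B-Index per item: FV(Masters) - FV(Non-Masters).

    responses    : one row per test taker, each row a list of 0/1 item scores
    cut_fraction : cut score as a fraction of the maximum score (0.70 = 70%)
    """
    n_items = len(responses[0])
    masters = [row for row in responses if sum(row) / n_items >= cut_fraction]
    non_masters = [row for row in responses if sum(row) / n_items < cut_fraction]
    if not masters or not non_masters:
        raise ValueError("Both groups must contain at least one test taker")
    return [
        facility_value([row[i] for row in masters])
        - facility_value([row[i] for row in non_masters])
        for i in range(n_items)
    ]


# Hypothetical data: 6 test takers, 4 items (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1, 1],  # 100% -> Master
    [1, 1, 1, 0],  # 75%  -> Master
    [1, 0, 1, 1],  # 75%  -> Master
    [1, 0, 0, 0],  # 25%  -> Non-Master
    [0, 1, 0, 0],  # 25%  -> Non-Master
    [1, 0, 0, 1],  # 50%  -> Non-Master
]
for item_number, value in enumerate(b_index(responses), start=1):
    print(f"Item {item_number}: B-Index = {value:+.2f}")
```

A positive value means the Masters did better on the item than the Non-Masters; a value near zero or below zero suggests the item does not separate the two groups at the cut score.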

SMALL-GROUP WORK Work in small groups (2-4 persons) Each group should have: a handout a flash drive with the data file a laptop with MS Excel (preferably in English!) at least one group member who is familiar with doing calculations in MS Excel The data file is named Workshop C2_Dataset Activity.xlsx and can be found on the flash drive The handout gives some guidance, but ask for help whenever needed Work on the activities until 11.45hrs At 11.45hrs discussion of findings in plenary

Remember… Numbers are like people: torture them enough and they’ll tell you anything. ANONYMOUS

ANALYSIS OF QUANTITATIVE TEST DATA TEST ANALYSIS - Describing/analyzing test results and test population Measures of Central Tendency Measures of Dispersion Reliability estimates ITEM ANALYSIS - Describing/analyzing individual item characteristics Item Difficulty Item Discrimination Distractor Efficiency Descriptive Statistics

TEST ANALYSIS: Describing/Analyzing test results Measures of Central Tendency Gives us an indication of the typical score on a test Answers questions such as: In general, how did the test takers do on the test? Was the test easy or difficult for this group? How many test takers passed the test? Statistics: Mean (average score) Mode (most frequent score) Median (the middle point in a rank-ordered set of scores)

TEST ANALYSIS: Describing/Analyzing test results Measures of Central Tendency When the mean, mode and median are all very similar, the scores are distributed roughly symmetrically, as in a “normal distribution” (bell-shaped curve). When they are not similar, the results are ‘skewed’.

MEAN, MEDIAN, MODE Which measure of central tendency should you use? It depends on your data: If there are no extreme scores, use the MEAN, e.g., {8, 9, 10, 10, 11, 11, 12, 13, 14}. If there are extreme scores, use the MEDIAN, e.g., {2, 9, 10, 10, 11, 12, 12, 12, 13}. If your data cannot be rank-ordered (nominal variables, e.g., gender or occupation), or if one score occurs substantially more often than any other score, use the MODE, e.g., {8, 10, 11, 12, 13, 13, 13, 13, 13}. Use the measure that best indicates the ‘typical’ score in your data set.
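To see how the three measures behave on the example sets above, they can be computed with Python's standard `statistics` module. This is only a sketch; the helper name `central_tendency` and the variable names are made up for illustration.

```python
import statistics


def central_tendency(scores):
    """Return the mean, the median and all modes of a list of scores."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "modes": statistics.multimode(scores),  # a list, because ties are possible
    }


# The three example data sets from the slide.
no_extremes = [8, 9, 10, 10, 11, 11, 12, 13, 14]
with_extreme = [2, 9, 10, 10, 11, 12, 12, 12, 13]
dominant_score = [8, 10, 11, 12, 13, 13, 13, 13, 13]

for name, scores in [
    ("no extreme scores", no_extremes),
    ("one extreme score", with_extreme),
    ("one dominant score", dominant_score),
]:
    result = central_tendency(scores)
    print(
        f"{name}: mean = {result['mean']:.2f}, "
        f"median = {result['median']}, modes = {result['modes']}"
    )
```

In the second set the single low score of 2 pulls the mean (about 10.1) below the median (11), which is why the median is the safer summary when extreme scores are present.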

TEST ANALYSIS: Describing/Analyzing test results Measures of Dispersion Gives us an indication of how similar or spread out the scores are Answers questions such as: How much difference is there between the highest and lowest score? How similar were the test takers’ results? Are there any extreme scores (‘outliers’)? Statistics: Range (difference between highest and lowest score) Standard Deviation (average distance of the scores from the mean)

TEST ANALYSIS: Describing/Analyzing test results Standard Deviation (SD, s.d. or σ) Small SD: scores are mostly close to the mean. Large SD: scores are spread out. Example: scores of test 1: 48, 49, 50, 51, 52; scores of test 2: 10, 20, 40, 80, 100. MEAN of both tests: 250 / 5 = 50. RANGE: test 1 (52 minus 48) = 4; test 2 (100 minus 10) = 90. STANDARD DEVIATION: test 1 = 1.58; test 2 = 38.73.
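The figures in this example can be reproduced with a few lines of Python. The sketch below uses the sample standard deviation (dividing by n - 1), which is what yields 1.58 and 38.73; the population formula (dividing by n) would give roughly 1.41 and 34.64 instead. The helper name `describe_spread` is hypothetical.

```python
import statistics


def describe_spread(scores):
    """Return (range, sample standard deviation) for a list of scores."""
    score_range = max(scores) - min(scores)
    sample_sd = statistics.stdev(scores)  # sample SD: divides by n - 1
    return score_range, sample_sd


test_1 = [48, 49, 50, 51, 52]
test_2 = [10, 20, 40, 80, 100]

for name, scores in [("test 1", test_1), ("test 2", test_2)]:
    score_range, sd = describe_spread(scores)
    print(
        f"{name}: mean = {statistics.mean(scores)}, "
        f"range = {score_range}, SD = {sd:.2f}"
    )
```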

VISUALIZING DATA – Bar Chart

VISUALIZING DATA – Box Plot [Figure: box plot of the test scores (n = 32), labelling the maximum score, the minimum score, the median, the mean and an outlier; each of the four sections of the plot contains 25% of the scores]

ITEM DISCRIMINATION (DI) The degree to which test takers with high overall test scores also got a particular item correct. Indicates how well an item distinguishes between high achievers and low achievers. Calculation: FVupper - FVlower, i.e., the FV of the top group (1/3 of test takers with the highest test scores) minus the FV of the bottom group (1/3 of test takers with the lowest test scores). Ranges from -1.00 to +1.00. Optimal values: .40 and above → very good item; .30 - .39 → reasonably good item, possibly room for improvement; .20 - .29 → acceptable, but needing improvement; <.20 → poor item, to be rejected or revised.
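As an illustration of the DI calculation, the sketch below ranks test takers by total score, takes the top and bottom thirds, and subtracts the two facility values. The data are invented, ties in total score are broken by input order (an arbitrary choice for this sketch), and both function names are hypothetical.

```python
def discrimination_index(total_scores, item_scores):
    """DI for one item: FV of the top third minus FV of the bottom third.

    total_scores : overall test score per test taker
    item_scores  : 0/1 score on the item, in the same test-taker order
    """
    # Rank test takers from highest to lowest total score.
    # Ties keep their original order (an arbitrary choice for this sketch).
    ranked = sorted(
        range(len(total_scores)), key=lambda i: total_scores[i], reverse=True
    )
    group_size = len(ranked) // 3
    if group_size == 0:
        raise ValueError("Need at least three test takers")
    upper = [item_scores[i] for i in ranked[:group_size]]
    lower = [item_scores[i] for i in ranked[-group_size:]]
    return sum(upper) / group_size - sum(lower) / group_size


def interpret_di(di):
    """Map a DI value to the bands listed on the slide."""
    if di >= 0.40:
        return "very good item"
    if di >= 0.30:
        return "reasonably good item, possibly room for improvement"
    if di >= 0.20:
        return "acceptable, but needing improvement"
    return "poor item, to be rejected or revised"


# Hypothetical data: 9 test takers, their total scores and their result on one item.
totals = [20, 18, 17, 15, 14, 12, 10, 8, 5]
item = [1, 1, 1, 1, 0, 1, 0, 1, 0]

di = discrimination_index(totals, item)
print(f"DI = {di:.2f} ({interpret_di(di)})")
```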

DISTRACTOR ANALYSIS Distractor Efficiency is the degree to which a distractor worked as intended, i.e., attracting the low achievers, but not the high achievers. The Distractor Efficiency is the number of test takers that selected that particular distractor, divided by the total number of test takers. Example: A distractor that is chosen by fewer than 7% of the test takers (less than 0.07) is normally not functioning well and should be revised. However, bear in mind that the easier the item, the lower the distractor efficiency will be.

Item #14      A *    B     C     D     Omitted
# selected    140    2     12    46    0
% selected    70%    1%    6%    23%   0%
(* = correct answer)
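The example for item 14 can be checked with a short Python sketch. It assumes option A (marked with an asterisk) is the key and that nobody omitted the item, which is inferred from the 0% in the table; the function name is hypothetical.

```python
def distractor_efficiency(option_counts, key):
    """Proportion of all test takers who chose each distractor (key excluded)."""
    total = sum(option_counts.values())
    return {
        option: count / total
        for option, count in option_counts.items()
        if option not in (key, "Omitted")
    }


# Counts for item 14 from the slide; the Omitted count of 0 is inferred from the 0% shown.
item_14 = {"A": 140, "B": 2, "C": 12, "D": 46, "Omitted": 0}

efficiency = distractor_efficiency(item_14, key="A")
weak = [option for option, value in efficiency.items() if value < 0.07]

for option, value in efficiency.items():
    print(f"Distractor {option}: {value:.2f}")
print("Below the 0.07 guideline:", weak)
```

Distractors B and C fall below the 0.07 guideline here. Since 70% of the test takers answered the item correctly, this is the situation the slide warns about: on an easy item there are fewer wrong answers to be shared among the distractors.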

OPTIMAL VALUES

STATISTIC                OPTIMAL VALUE   LIMITATION
TEST ANALYSIS
  Mean, mode, median     N/A *           Is affected by test taker ability; should be interpreted in relation to max. possible score
  Range                  N/A *
  SD                     N/A *
ITEM ANALYSIS
  FV                     0.30 - 0.70     Depends on test population, test type/purpose
  DI                     > 0.40          Is affected by range of test takers’ ability
  Distractor Efficiency  ≥ 0.07          Indicates only how often a distractor was chosen, not whether it was chosen by a high achiever or a low achiever

* Note: Descriptive statistics do not have an optimal value – they merely describe and summarize test or population characteristics without one value a priori being ‘better’ than another

TEST RELIABILITY (Alpha) Test score reliability is an estimate of the likelihood that scores would remain consistent over time if the same test were administered repeatedly to the same learners. A reliability coefficient of .85 indicates that 85% of the variation in observed scores is due to variation in the “true” scores, and that the remaining 15% cannot be accounted for and is called ‘error’ (owing to chance). Reliability coefficients range from .00 to 1.00. Ideal score reliabilities are >.80. Higher reliabilities = less measurement error.
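The slides do not give a formula for alpha. Assuming "Alpha" refers to Cronbach's alpha, a common internal-consistency estimate, it can be sketched as k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The data below are invented and the function name is hypothetical.

```python
import statistics


def cronbach_alpha(responses):
    """Cronbach's alpha for a score matrix (one row per test taker, one column per item)."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("Alpha needs at least two items")
    # Variance of each item across test takers.
    item_variances = [
        statistics.pvariance([row[i] for row in responses]) for i in range(k)
    ]
    # Variance of the total scores across test takers.
    total_variance = statistics.pvariance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("All total scores are identical; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: 5 test takers, 4 items scored 0/1.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(f"alpha = {cronbach_alpha(responses):.2f}")
```

Population variances are used for both the items and the totals; using sample variances throughout would give the same alpha, because the correction factor cancels in the ratio.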

STANDARD ERROR of MEASUREMENT (SEM) An obtained test score is an estimate of a person’s “true” test score. The “true” score is the score that a test taker would get if s/he took the test an infinite number of times. SEM indicates how accurate a test taker’s obtained score is. An obtained score is more accurate if it is closer to a test taker’s “true” score. The smaller the SEM, the less error and the greater the precision of the test score. As the reliability of a test increases, the SEM decreases. A test with a reliability coefficient of 1.00 has a SEM of zero – there is no error.
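The slides do not state how the SEM is computed. The sketch below assumes the standard classical test theory formula, SEM = SD * sqrt(1 - reliability), which reproduces the two properties mentioned above: the SEM shrinks as reliability rises and is zero at a reliability of 1.00. The SD of 10 is a hypothetical value.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), the classical test theory formula (assumed here)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Hypothetical test with a standard deviation of 10 score points.
for reliability in (0.69, 0.84, 0.95, 1.00):
    sem = standard_error_of_measurement(10, reliability)
    print(f"reliability = {reliability:.2f} -> SEM = {sem:.2f}")
```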

STANDARD ERROR of MEASUREMENT (SEM) In a normal distribution it can be expected that there is a 68% chance that the true score is between 1 SEM below or above the obtained score, and a 95% chance that the true score is between 2 SEMs below or above the obtained score.

STANDARD ERROR of MEASUREMENT (SEM) Example: obtained score = 70, SEM = 4 (SEMs are expressed in the same units as test scores). There is a 68% chance that the test taker’s true score is between 66 and 74 points (70 minus or plus 4 [-/+ 1 SEM]), and we can be 95% certain that the true score is between 62 and 78 points (70 minus or plus 8 [-/+ 2 SEMs]). If SEM = 2, there is a 68% chance that the true score is between 68 and 72 points (70 minus or plus 2 [-/+ 1 SEM]). [Figure: score scale from 62 to 78 centred on the obtained score of 70, marking the 68% band (- 1 SEM to + 1 SEM) and the 95% band (- 2 SEMs to + 2 SEMs)]
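The bands in this example are simply the obtained score minus and plus a whole number of SEMs, which a small helper makes explicit. The function name `score_band` is hypothetical.

```python
def score_band(obtained_score, sem, n_sems=1):
    """Return (low, high): the obtained score minus/plus n_sems * SEM."""
    margin = n_sems * sem
    return obtained_score - margin, obtained_score + margin


# Values from the slide: obtained score 70, SEM 4 (and SEM 2 for comparison).
print("SEM 4, 68% band:", score_band(70, 4, n_sems=1))
print("SEM 4, 95% band:", score_band(70, 4, n_sems=2))
print("SEM 2, 68% band:", score_band(70, 2, n_sems=1))
```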

STANDARD ERROR of MEASUREMENT (SEM) The SEM not only indicates how accurate the test is, but can also be used to adjust your cut score (pass point) based on that accuracy. Another example: 100-item test (max. obtainable score: 100); pass point: 70 (70%); reliability (alpha): 0.69; SEM: 3. Due to the comparatively low reliability, you can be less confident that the pass score truly represents the pass/fail point. There is a fair chance that a test taker with an obtained score of 69 has a “true” score of 70 or 71. Potentially this leads to a higher number of false negatives (Masters who fail). Dropping the pass point by 1 SEM would change the passing score to 67 (67%). This will reduce the number of false negatives, but increase the number of false positives.
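The effect of lowering the pass point by 1 SEM can be illustrated with a handful of scores around the cut. The cut score of 70 and the SEM of 3 come from the example above; the list of obtained scores is invented for this sketch and the function name is hypothetical.

```python
def count_passes(scores, cut_score):
    """Number of obtained scores at or above the cut score."""
    return sum(1 for score in scores if score >= cut_score)


cut_score = 70
sem = 3
adjusted_cut = cut_score - 1 * sem  # pass point lowered by 1 SEM

# Hypothetical obtained scores close to the pass point.
scores = [64, 66, 67, 68, 69, 70, 71, 75]

print(f"Passes at cut score {cut_score}: {count_passes(scores, cut_score)}")
print(f"Passes at cut score {adjusted_cut}: {count_passes(scores, adjusted_cut)}")
```

The additional passes (obtained scores of 67, 68 and 69) may be Masters who would otherwise have been false negatives, or Non-Masters who now become false positives; the obtained scores alone cannot tell the two apart.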