Automatic Essay Scoring is Here and Now Online. Welcome to CIT S234. Gary Greer, University of Houston Downtown & Michelle Overstreet, The College Board.

Presentation transcript:

1 Automatic Essay Scoring is Here and Now Online. Welcome to CIT S234. Gary Greer, University of Houston Downtown & Michelle Overstreet, The College Board. Tuesday, Oct 25, 2005 (9:15 AM - 10:15 AM), Coral Tower, Lobby Level

2 AES, AI, ACCUPLACER/WritePlacer When essays are scored by human experts, the scoring characteristics can be mapped by Artificial Intelligence (AI) and used in Automatic Essay Scoring (AES). AI is used to identify and internalize essay features into scoring models (algorithms). The algorithms are verified in simulation and subsequently on live essays. The algorithms are used by AES to score an essay.

3 Automatic Essay Scoring AI maps salient characteristics of about 300 freshman essays into a linear model of each score (for example, 6s, 8s, 10s, etc.). AES is carried out by mathematically matching live essays to these predetermined linear models to predict a score. AES algorithms specify whether an essay’s characteristics mathematically match the semantic space previously specified by human graders.
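The matching step on this slide can be sketched in a few lines. This is an illustration only: the three features, the training values, and the nearest-mean rule used to build each linear model are all hypothetical and are not taken from WritePlacer or its scoring engine.

```python
from statistics import mean

# Hypothetical training data: score level -> feature vectors of human-scored essays.
# The features (organization, development, mechanics, each in [0, 1]) are invented.
TRAINING = {
    6: [(0.3, 0.2, 0.4), (0.2, 0.3, 0.3)],
    8: [(0.6, 0.5, 0.6), (0.5, 0.6, 0.5)],
    10: [(0.9, 0.8, 0.9), (0.8, 0.9, 0.8)],
}


def fit_models(training):
    """Build one linear model (weights, bias) per score level.

    The weights are the mean feature vector of that score's essays and the
    bias is -0.5 * |weights|^2, so the largest match value corresponds to
    the nearest mean vector (an assumed rule, chosen for simplicity).
    """
    models = {}
    for score, essays in training.items():
        centroid = [mean(column) for column in zip(*essays)]
        bias = -0.5 * sum(c * c for c in centroid)
        models[score] = (centroid, bias)
    return models


def predict_score(models, features):
    """Return the score level whose linear model gives the largest match value."""
    def match(score):
        weights, bias = models[score]
        return sum(w * f for w, f in zip(weights, features)) + bias
    return max(models, key=match)


MODELS = fit_models(TRAINING)
print(predict_score(MODELS, (0.58, 0.5, 0.6)))
```

A production engine would use far more features (slide 5 mentions over 300) and a model fitted to real human-scored essays; the point here is only the shape of "match a live essay against one predetermined linear model per score".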

4 AES AES therefore emulates human raters by repeatedly evaluating characteristic essay features such as Structure, Content, Style, Syntax, Discourse, and Word choice to predict a maximum likelihood estimate of a score according to the algorithms copied from the 300 human-expert scored essays. AES’s performance has been verified in national level studies and now waits for users to conduct performance tests at local levels. We conducted our local performance study with ACCUPLACER/WritePlacer.

5 WritePlacer employs AI called IntelliMetric. WritePlacer infers and internalizes the rubric and pooled judgments of human scorers by analyzing over 300 semantic, syntactic and discourse features in five categories: Focus and Unity; Development and Elaboration; Organization and Structure; Sentence Structure; Mechanics and Conventions.

6 ACCUPLACER/WritePlacer is Online ACCUPLACER Online offers an option for AES called WritePlacer Plus. Delivery is online, testing time is reduced, reliability is enhanced, and scoring is immediate. At U. of Houston Downtown we asked whether this AES is the same as human-expert scoring. In other words, does this AES differ from human scoring?

7 We Conducted a local Study Research Question 1 What is the correlation between WritePlacer scores and human expert scores? Is it significant? Research Question 2 Do distributions of scores differ? (Are the medians equal?)

8 Our Hypotheses Hypothesis 1 A significant correlation exists between WritePlacer scores and human expert scores. (H0: correlation = 0) Hypothesis 2 The median WritePlacer score is equal to the median human expert score. (H0: medians are equal.)

9 Our Method Participants were 112 randomly selected college freshmen who took the essay test. Each essay was scored twice: first by WritePlacer’s AES and then by human experts. The correlation between the two sets of scores was obtained. To see whether the median scores differed, a nonparametric test statistic was obtained.
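Table 2 reports the correlation as rho, which suggests a rank correlation. A minimal sketch of Spearman's rho for paired scores follows, assuming that is the statistic used; the score lists are hypothetical placeholders, since the transcript does not include the 112 score pairs.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of 1-based positions in the tie
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired scores, not data from the study.
aes_scores = [6, 8, 8, 10, 7, 9]
human_scores = [6, 8, 9, 10, 8, 9]
print(round(spearman_rho(aes_scores, human_scores), 3))
```

Averaging the ranks of tied values matters here because essay scores fall on a short integer scale, so ties are the norm rather than the exception.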

10 Table 1 Frequencies of Differences
Difference | Frequency | Percent | Who scored higher?
-2 | 4 | 4% | AES
-1 | 7 | 6% | AES
0 | 67 | 60% | identical
1 | 28 | 25% | Human
2 | 6 | 5% | Human
Total | 112 | 100% |
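The agreement rates quoted for Table 1 can be recomputed from its frequencies. In this short sketch the row with 7 essays is taken to be the -1 difference, an assumption based on the row order; the variable names are illustrative.

```python
# Frequencies from Table 1, keyed by score difference.
# Assumption: the 7-essay row corresponds to a difference of -1.
table1 = {-2: 4, -1: 7, 0: 67, 1: 28, 2: 6}

total = sum(table1.values())
exact = table1[0]
within_one = sum(count for diff, count in table1.items() if abs(diff) <= 1)
two_apart = sum(count for diff, count in table1.items() if abs(diff) == 2)


def percent(count):
    """Whole-number percentage of all scored essays."""
    return round(100 * count / total)


print(total, percent(exact), percent(within_one), percent(two_apart))
```

Exact agreement and agreement within one point are the two figures usually quoted when comparing raters, which is why both are computed separately here.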

11 Table 2: Significance Tests
Medians test: columns n, Mean Rank, Sum of Ranks, Wilcoxon test statistic; rows AES and Human; p > .05
Correlation: rho = .724, p < .05
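The column headings in Table 2 (n, Mean Rank, Sum of Ranks, with one row per scorer) resemble the layout of a Wilcoxon rank-sum test. Assuming that reading, the sketch below shows how those columns would be computed; the scores are hypothetical and the function names are illustrative, so no value here reproduces the study's table.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of 1-based positions in the tie
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def rank_sum_table(aes, human):
    """Pool both groups, rank the pooled scores, and summarize ranks per group."""
    ranks = average_ranks(list(aes) + list(human))
    groups = {"AES": ranks[:len(aes)], "Human": ranks[len(aes):]}
    return {
        name: {
            "n": len(group),
            "mean_rank": sum(group) / len(group),
            "sum_of_ranks": sum(group),
        }
        for name, group in groups.items()
    }


# Hypothetical scores, not data from the study.
table = rank_sum_table([6, 8, 8], [7, 8, 10])
for name, row in table.items():
    print(name, row)
```

The rank-sum statistic W is then the sum of ranks for one of the groups; turning it into a p-value needs either exact tables or a normal approximation, which this sketch leaves out.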

12 Discussion of Tables Table 1 indicates that 91% of the paired scores were identical or agreed within 1 point and that 9% differed by 2 points. (The 10 essays (9%) that differed by 2 points were split 60%-40%: 6 where Human > AES and 4 where AES > Human.) Table 2 shows inferential statistics supporting a conclusion that AI scoring assigns the same scores to essays as human experts assign to the same essays.

13 Findings The correlation between WritePlacer scores and human-expert scores is significant: r = .72, p < .05. The distributions of WritePlacer scores and human-expert scores do not differ significantly: Wilcoxon W, p > .05.
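One common large-sample check that a correlation differs from zero converts it to a t statistic with n - 2 degrees of freedom. The slides do not say which test the authors ran, so the sketch below is an assumption; it uses only the reported rho = .724 and n = 112.

```python
import math


def correlation_t(r, n):
    """t statistic for testing H0: correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


t_value = correlation_t(0.724, 112)
degrees_of_freedom = 112 - 2

# The two-tailed .05 critical value for 110 degrees of freedom is roughly 1.98.
print(round(t_value, 2), degrees_of_freedom, t_value > 1.98)
```

A t value of about 11 is far beyond that critical value, which is consistent with the reported p < .05.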

14 Conclusions Scoring essays by AES (as implemented within ACCUPLACER/WritePlacer) is consistent with scoring essays by human experts. (Interrater reliability is significant.) AES scoring of essays is not subject to unreliability (inconsistency) due to fatigue. AES never gets tired! AES scoring is efficient and effective.

15 Additional Issues: 1. Measurement error is eliminated. 2. Essay supplemented by MC items = increased confidence about placement. 3. Efficiency: faculty freed for instruction. 4. GMAT/MCAT/SAT are adopting AES. 5. Deep Blue learned chess moves.

16 Thank you