Automatic Essay Scoring Evaluation of text coherence for electronic essay scoring systems (E. Miltsakaki and K. Kukich, 2004) Universität des Saarlandes.

Automatic Essay Scoring
Evaluation of text coherence for electronic essay scoring systems (E. Miltsakaki and K. Kukich, 2004)
Universität des Saarlandes, Computational Models of Discourse, Summer semester 2009
Israel Wakwoya, May 2009

Automatic Essay Scoring: Introduction
Why automatic essay scoring?
- To reduce laborious human effort
  - Software systems do the task fully automatically
  - Computer-generated scores match human accuracy
- To test theoretical hypotheses in NLP, e.g. what is the role of Rough-Shifts in Centering Theory?
- To explore practical solutions, e.g. is it possible to improve the systems' performance?

Essay scoring systems: Approaches
Length-based, indirect approach
- The fourth root of the number of words in an essay as an accurate measure (Page, 1966)
- Surface features used as proxies:
  - essay length in words
  - number of commas
  - number of prepositions
  - number of uncommon words
- Rationale: using direct measures is a computationally expensive task
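The proxy features listed on this slide are cheap to compute, which is the point of the indirect approach. A minimal sketch follows, assuming a simple regular-expression tokenizer and a tiny hand-picked preposition list; both are illustrative choices and not part of any system described in the slides. Counting uncommon words is left out because it would need a word-frequency list.

```python
import re

# Hypothetical, deliberately tiny preposition list; a real system would use a
# part-of-speech tagger or a fuller lexicon.
PREPOSITIONS = {"in", "on", "at", "for", "of", "with", "by", "from"}


def surface_features(essay: str) -> dict:
    """Compute a few length-based proxy features for one essay."""
    # Assumption: a "word" is any run of letters or apostrophes.
    words = re.findall(r"[A-Za-z']+", essay)
    n_words = len(words)
    return {
        "n_words": n_words,
        # Fourth root of essay length, the measure attributed to Page (1966).
        "fourth_root_length": n_words ** 0.25,
        "n_commas": essay.count(","),
        "n_prepositions": sum(w.lower() in PREPOSITIONS for w in words),
    }


features = surface_features(
    "He had frequented the store for many years, with his friends."
)
print(features)
```

None of these counts looks at what the essay actually says, which is why such measures are easy to deceive and hard to turn into feedback.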

Essay scoring systems: Approaches  Two main weaknesses of indirect measures Susceptible to deception, why? Lack explanatory power e.g: difficult to give instructional feed back to students  The need for more direct measures How do human experts evaluate an essay? Writing features ETS’s GMAT writing evaluation criteria Linguistic features

Essay scoring systems: Approaches
Intelligent Essay Assessor (IEA)
- Employs Latent Semantic Analysis
  - The degree to which vocabulary patterns reflect semantic and linguistic competence
  - Transitivity relations and collocation effects among vocabulary terms
  - Measures semantic relatedness of documents regardless of vocabulary overlap
- More closely represents the criteria used by human experts

Essay scoring systems: Approaches
Electronic Essay Rater (e-rater)
- Employs NLP techniques
  - Sentence parsing
  - Discourse structure evaluation
  - Vocabulary assessment, ...
- Writing features chosen from criteria defined for GMAT essay evaluation
  - Syntactic variety, argument development, logical organization and clear transitions, ...
- The GMAT test

Electronic Essay Rater (e-rater): Research Questions
- Coherence features are not explicitly represented
- Is it possible to enhance e-rater's performance by adding coherence features?
- What is the role of Rough-Shift transitions in Centering Theory?
- Is it possible to use Rough-Shift transitions as a potential measure of discourse incoherence?

The Centering Model
Discourse
- Sequence of textual segments
- Segments consist of utterances, U_i, ..., U_n
- Forward-looking Center, Cf(U_i)
- Preferred Center, Cp
- Backward-looking Center, Cb

The Centering Model Centering transitions  Four types: Continue, Retain, Smooth-shift, Rough shift  Transition Ordering Rule Continue > Retain > Smooth-Shift > Rough-Shift  Rules for computing transitions

The Centering Model
Centering transitions: Example
- John went to his favorite music store to buy a piano.
  Cb = ?, Cf = John > store > piano, Transition = none
- He had frequented the store for many years.
  Cb = (He = John), Cf = (He = John) > store, Transition = continue
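The slides name the four transitions without spelling out the rules for computing them. The sketch below uses the standard Centering Theory table: a transition is determined by whether Cb(U_i) equals Cb(U_i-1) and whether Cb(U_i) equals Cp(U_i). Two conventions here are assumptions rather than something the slides state: an undefined previous Cb counts as "unchanged" (which reproduces the "continue" in the example above), and a missing Cb in a non-initial utterance is labeled a Rough-Shift, following the later remark that Rough-Shifts result from absent Cbs. The function name and the tuple encoding are hypothetical.

```python
from typing import List, Optional, Tuple

# Each utterance is reduced to (Cb, Cp); Cb is None when it is undefined.
Utterance = Tuple[Optional[str], str]


def classify_transitions(utterances: List[Utterance]) -> List[str]:
    """Label every utterance with its Centering transition."""
    labels = []
    prev_cb: Optional[str] = None
    for i, (cb, cp) in enumerate(utterances):
        if i == 0:
            # The first utterance of a segment has no transition.
            labels.append("none")
        elif cb is None:
            # Assumption: an absent Cb is treated as a Rough-Shift.
            labels.append("rough-shift")
        else:
            # Assumption: an undefined previous Cb counts as unchanged.
            same_cb = prev_cb is None or cb == prev_cb
            if same_cb:
                labels.append("continue" if cb == cp else "retain")
            else:
                labels.append("smooth-shift" if cb == cp else "rough-shift")
        prev_cb = cb
    return labels


# The two-utterance example from the slide: Cb undefined, then Cb = Cp = John.
print(classify_transitions([(None, "John"), ("John", "John")]))
```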

The Centering Model
Cf ranking
- Preferred center = the highest-ranked member of the Cf set
- Ranking by salience status of entities in an utterance
- Cf ranking rule: M-subject > M-indirect object > M-direct object > M-QIS, Pro-ARB > S1-subject > S1-indirect object > S1-direct object > S1-other > S1-QIS, Pro-ARB > S2-subject > ...

The Centering Model
Cf ranking: Example
- John had a terrible headache.
  Cb = ?, Cf = John > headache, Transition = none
- When the meeting was over, he rushed to the pharmacy store.
  Cb = John, Cf = John > pharmacy store > meeting, Transition = continue
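The Cf ranking rule amounts to a two-level sort: main-clause entities come before those of subordinate clauses, and within a clause the grammatical role decides. A minimal sketch, assuming each entity has already been annotated with a clause index (0 for the main clause M, 1 for S1, and so on) and a role label. The tuple encoding and the function name are hypothetical, and ranking a main-clause "other" role the same way as S1-other is an assumption made by analogy, since the rule as shown lists "other" only for S1.

```python
from typing import List, Tuple

# Role order inside one clause, following the rule on the slide.
# Assumption: "other" is ranked in the main clause as it is in S1.
ROLE_ORDER = ["subject", "indirect object", "direct object", "other", "QIS/Pro-ARB"]

# (entity, clause index, role); clause index 0 = main clause, 1 = S1, 2 = S2, ...
Entity = Tuple[str, int, str]


def rank_cf(entities: List[Entity]) -> List[str]:
    """Return the entities ordered from most to least salient."""
    ranked = sorted(entities, key=lambda e: (e[1], ROLE_ORDER.index(e[2])))
    return [name for name, _clause, _role in ranked]


# "When the meeting was over, he rushed to the pharmacy store."
cf = rank_cf([
    ("meeting", 1, "subject"),       # subject of the subordinate clause
    ("John", 0, "subject"),          # "he", subject of the main clause
    ("pharmacy store", 0, "other"),  # prepositional phrase in the main clause
])
print(cf)  # the first element is the Preferred Center, Cp
```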

The Centering Model Cf Ranking  Modifications Pronominal I Penalize the use of I’s, why? Constructions containing verb to be Predicational case  E.g: John is happy/a doctor/ the President Specificational case  E.g: The cause of his illness is this virus here

The Centering Model Cf Ranking  Modifications Pronominal I Penalize the use of I’s, why? Constructions containing verb to be Predicational case  E.g: John is happy/a doctor/ the President Specificational case  E.g: The cause of his illness is this virus here  Another example of an individual who has achieved success in the business world through the use of conventional methods is Oprah Winfrey

The Centering Model
Cf ranking: Complex NPs
- Property: they evoke multiple discourse entities
  - E.g. his mother, software industry
  - Ordering from left to right
- Possessive constructions
  - Linearization according to the genitive construction
  - E.g. "The secret of TLP's success" becomes "TLP's success's secret", ranked from left to right

The role of Rough-Shift transitions
- Are Rough-Shifts valid transitions?
- Hypothesis: "the incoherence found in students' essays is not due to the processing load imposed on the reader to resolve anaphoric references"

The role of Rough-Shift transitions
- Incoherence is due to introducing too many undeveloped topics
- Rough-Shifts measure discourse continuity even when anaphora resolution is not an issue
- Rough-Shifts are the result of absent and extremely short-lived Cbs

Implementation
- Used a corpus of 100 essays randomly selected from a pool of GMAT essays
- The essays cover the full range of the scoring scale, where 1 is the lowest and 6 is the highest
- Applied the Centering algorithm to the corpus and calculated the percentage of Rough-Shifts in each essay
- Ran multiple regression to evaluate the contribution of Rough-Shifts to the performance of e-rater

Implementation
- Manually tagged co-referring expressions and Preferred Centers
- Automated discourse segmentation and the Centering algorithm
- Percentage of Rough-Shifts = number of Rough-Shifts / total number of identified transitions
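The metric on this slide is a simple ratio. A minimal sketch, assuming an essay's transitions are available as a list of string labels and that the label "none" (an utterance with no transition) does not count as an identified transition. The label strings, the function name, and the choice to return 0.0 for an essay with no transitions are illustrative assumptions.

```python
from typing import List


def rough_shift_percentage(transitions: List[str]) -> float:
    """Percentage of Rough-Shifts among the identified transitions of one essay."""
    # Assumption: "none" marks utterances without a transition and is excluded.
    identified = [t for t in transitions if t != "none"]
    if not identified:
        # Assumption: an essay with no identified transitions scores 0.0.
        return 0.0
    n_rough = sum(t == "rough-shift" for t in identified)
    return 100.0 * n_rough / len(identified)


# Hypothetical label sequence: 2 Rough-Shifts out of 4 identified transitions.
labels = ["none", "continue", "rough-shift", "retain", "rough-shift"]
print(rough_shift_percentage(labels))
```

A higher value means more abrupt topic changes and, under the hypothesis in these slides, a less coherent essay.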

An example of coherent text

Yet another company that strives for the “big bucks” through conventional thinking is Famous name’s Baby Food. This company does not go beyond the norm in their product line, product packaging or advertising. If they opted for an extreme market-place, they would be ousted. Just look who their market is! As new parents, the Famous name customer wants tradition, quality and trust in their product of choice. Famous name knows this and gives it to them by focusing on “all natural” ingredients, packaging that shows the happiest baby in the world and feel good commercials the exude great family values. Famous name has really stuck to the typical ways of doing things and in return has been awarded with a healthy bottom line.

An example of incoherent text

Study Results

Summary
- Essay scoring systems provide the opportunity to test theoretical hypotheses in NLP
- Local discourse coherence is a significant contributor to the evaluation of essays
- Centering Theory's Rough-Shift transitions capture the source of incoherence in essays
- Rough-Shifts reflect the incoherence perceived when identifying the topic of a discourse structure
- A Rough-Shift-based metric improves performance and provides the capability of instructional feedback

References
- E. Miltsakaki and K. Kukich: The Role of Centering Theory's Rough-Shift in the Teaching and Evaluation of Writing Skills. In: Proceedings of ACL 2000.
- E. Miltsakaki and K. Kukich: Evaluation of text coherence for electronic essay scoring systems. In: Natural Language Engineering 10:1, 2004.
- Hearst, M., Kukich, K., Hirschman, L., Breck, E., Light, M., Burge, J., Ferro, L., Landauer, T. K., Laham, D., and Foltz, P. W.: The Debate on Automated Essay Grading. In: IEEE Intelligent Systems (Sept/Oct 2000).

The End! Many thanks!!