BEA 2015 University of Pittsburgh

Slides:



Advertisements
Similar presentations
On-Demand Writing Assessment
Advertisements

Introduction to: Automated Essay Scoring (AES) Anat Ben-Simon Introduction to: Automated Essay Scoring (AES) Anat Ben-Simon National Institute for Testing.
Got Rubrics?. What is a “Good” Rubric? aka Scoring Guide.
Designing Scoring Rubrics. What is a Rubric? Guidelines by which a product is judged Guidelines by which a product is judged Explain the standards for.
About the ACT Essay 1. The ACT Essay On the ACT Writing Test, students have 30 minutes to read a short prompt and to plan and write an essay in response.
The CAHSEE Essay Ms. Stronks Lawndale High School.
Qualitative Grading Notes compiled by Mary D’Alleva January 18 th, 2005 Office of Faculty Development.
TUSD Scoring Extended Writing Using the PARCC Rubric as Framework September 2014.
English A Language and Literature Preparing for Paper Two What must you be able to do?
Education 3504 Week 3 reliability & validity observation techniques checklists and rubrics.
Predicting Text Quality for Scientific Articles Annie Louis University of Pennsylvania Advisor: Ani Nenkova.
Latent Semantic Analysis (LSA). Introduction to LSA Learning Model Uses Singular Value Decomposition (SVD) to simulate human learning of word and passage.
What do you think? Why do you think it?
Introduction.  Classification based on function role in classroom instruction  Placement assessment: administered at the beginning of instruction 
THE ESSAY WRITING PROCESS A. Introduction B. Body C. Conclusion.
Automated Essay Evaluation Martin Angert Rachel Drossman.
A Brief Overview to Writing A Comparison/Contrast Essay
Body Paragraphs Writing body paragraphs is always a T.R.E.A.T. T= Transition R= Reason/point from thesis/claim E= Evidence (quote from the text) A= Answer.
Preparing our students for the EAP English Prompt.
The California Writing Exam Grades 4 and 7
Katherine S. Holmes READ 7140 May 28, Georgia Writing Test – 5 th Grade GOAL: To assess the procedures to enhance statewide instruction in language.
What is Readability?  A characteristic of text documents..  “the sum total of all those elements within a given piece of printed material that affect.
TAKS Test CONSTRUCTION. Important WORD TRIPLET What is a triplet? Triplet… three Three reading selections linked by a common theme. Consists of –a literary.
Essay Writing Tips Presented by: Calumet College Student Peer Advisors Date: Thursday, January 27, 2011.
NTeQ: Designing an Integrated Lesson
Organizing ideas and writing the outline
Exam Orientation for Chair tutor: Cao Wen Chair tutor: Cao Wen English for Studying.
How to Evaluate Student Papers Fairly and Consistently.
INTERMEDIATE 2 ENGLISH What you need to do to pass.
Natural Language Processing for Writing Research: From Peer Review to Automated Assessment Diane Litman Senior Scientist, Learning Research & Development.
An unknown author once said, “In literature, evil often triumphs but never conquers.” In other words, bad things may win at first, but never overpowers.
*You have 25 minutes to write the essay. *You will be provided with a short excerpt and will be asked to present your views on the subject.
Content-Area Writing Chapter 10 Writing for Tests and Assessments Darcey Helmick EIWP 2013.
TOEFL iBT Writing Overview SectionContentTimeWordsScore WritingIntegrated: read, listen, and write Independent: knowledge and experience 20 minutes 30.
SAC 1 Informal Discourse Comparative Analysis. Analytical Commentary SAC 1: Analytical Commentary What is it? Linguistic analysis. Articulate your understanding.
ESSAY WRITING Topics, Rules, Issues, Evaluation. Research Essay. Grading: 25%  Approx words (with bibliography)  An independent analysis of any.
THE ACT ASSESSMENT This test is used by colleges to predict students’ success in college courses.
Rubrics for Complex Papers/Projects Academic Assessment Workshop May 14-15, 2009 Bea Babbitt, Ph.D.
DO NOW – 9/17/15 Write a 2-3 sentence response on Cornell Notes: 1) What are the Common Core State Standards, and why are they important to you as a student?
Assessing Writing Presenter: Sandra Brewer Language Arts Instructional Coach Muskogee Public Schools OWP-S. Brewer.
The Rubric. Teachers grade with a rubric for the following reasons: To ensure that there is a standard way of grading So students know why they got their.
Event-Based Extractive Summarization E. Filatova and V. Hatzivassiloglou Department of Computer Science Columbia University (ACL 2004)
10 th Grade English Wednesday 19 Feb Agenda: 4 Finish The Truman Show 4 Return Dove “Evolution” Analysis/Review How Graded 4 Introduce HSPE and the.
How to Write a Winning Argumentative Essay in 30 Minutes Palmetto Middle School ACTion for 8 th Grade Writing- Timed Persuasive Argumentative.
GCSE English Language 8700 GCSE English Literature 8702 A two year course focused on the development of skills in reading, writing and speaking and listening.
© 2015 The College Board The Redesigned SAT Essay Writing Oakland Schools.
INTERNAL ASSESSMENT ADVICE Or…how to get a 7 on your Internal Assessment.
* Statutory Assessment Tasks and Tests (also includes Teacher Assessment). * Usually taken at the end of Key Stage 1 (at age 7) and at the end of Key.
Standards That Count: Reading, Discussion, Writing, and Presentation.
Test Writing as Genre: How to Apply What the Students Already Know Presented by: Tara Falasco and Kathleen Masone.
Question Answering Passage Retrieval Using Dependency Relations (SIGIR 2005) (National University of Singapore) Hang Cui, Renxu Sun, Keya Li, Min-Yen Kan,
Welcome Parents! FCAT Information Session. O Next Generation Sunshine State Standards O Released Test Items O Sample Test.
Short Text Similarity with Word Embedding Date: 2016/03/28 Author: Tom Kenter, Maarten de Rijke Source: CIKM’15 Advisor: Jia-Ling Koh Speaker: Chih-Hsuan.
25 minutes long Must write in pencil Off topic or illegible score will receive a 0 Essay must reflect your original and individual work.
Charlton Kings Junior School INFORMATION EVENING FOR YEAR 6 PARENTS.
GWE Testing Tips GWE Testing Tips Student Engagement and Academic Success Student Engagement and Academic Success.
Automatic Writing Evaluation
Dr Anie Attan 26 April 2017 Language Academy UTMJB
ENG1120K Review Class.
Professor Diane Litman
9th Grade Literature & Composition
Method Separate subheadings for participants, materials, and procedure (3 marks in total) Participants (1 mark) Include all info provided in the assignment.
English Language GCSE.
What are the SATS tests? The end of KS2 assessments are sometimes informally referred to as ‘SATS’. SATS week across the country begins on 14th May 2018.
You are more than a score.
Deputy Commissioner Jeff Wulfson Associate Commissioner Michol Stapel
EBPS Year 6 SATs evening.
NC Tenth Grade Writing Test
WRITING A BALANCED ARGUEMENT
AP U.S. History Exam Details
Presentation transcript:

BEA 2015 University of Pittsburgh Incorporating Coherence of Topics as a Criterion in Automatic Response-to-Text Assessment of the Organization of Writing Zahra Rahimi Diane Litman Elaine Wang Richard Correnti zar10@pitt.edu dlitman@pitt.edu elw51@pitt.edu rcorrent@pitt.edu BEA 2015 University of Pittsburgh

Goals Automatic scoring of students’ writing Analytical text-based writing Quality of essays in terms of organization 6/4/2015

Outline Goals Response-to-Text Assessment (RTA) Focus of the Study Approach and Model Data Experiments and Results Conclusion and Future Work 6/4/2015

Response-to-Text Assessment (RTA) (Correnti et al., 2013) Analytical text-based writing Making claims Marshalling evidence from a source text to support a viewpoint Evaluated on five-traits rubric. Thinking about the text Skill at finding evidence to support their claims (Rahimi et al. 2014) Organization Style (Mechanics, Usage, Grammar, Spelling) 6/4/2015

Text and Writing Prompt Excerpt from Text (“A Brighter Future” by Hannah Sachs from Time for Kids) : The people of Sauri have made amazing progress in just four years. The Yala Sub-District Hospital has medicine, free of charge, for all of the most common diseases. Water is connected to the hospital, which also has a generator for electricity. Writing Prompt: The author provided one specific example of how the quality of life can be improved by the Millennium Villages Project in Sauri, Kenya. Based on the article, did the author provide a convincing argument that winning the fight against poverty is achievable in our lifetime? Explain why or why not with 3-4 examples from the text to support your answer. 6/4/2015

A Sample High Quality Essay they showed many example in the beginning and showed how it changed at theThis story convinced me that “winning the fight against poverty is achievable because end. One example they sued show a great amount oF change when they stated at first most people thall were ill just stayed in the hospital Not even getting treated either because of the cost or the hospital didnt have it, but at the end it stated they now give free medicine to most common deseases. Anotehr amazing change is in the beginning majority of the childrenw erent going to school because the parents couldn’t affford the school fee, and the kdis didnt like school because tehre was No midday meal, and Not a lot of book, pencils, and paper. Then in 2008 the perceNtage of kids going to school increased a lot because they Now have food to be served aNd they Now have more supplies. So Now theres a better chance of the childreN getting a better life The last example is Now they dont have to worry about their families starving because Now they have more water and fertalizer. They have made some excellent changes in sauri. Those chaNges have saved many lives and I think it will continue to change of course in positive ways Hospitals Hospitals (before) Hospitals (after) Schools Farming 6/4/2015

Outline Goals Response-to-Text Assessment (RTA) Focus of the Study Approach and Model Data Experiments and Results Conclusion and Future Work 6/4/2015

Focus of the Study Develop a task-dependent model that is consistent with the rubric criteria Ability to provide feedback that is better aligned with the task Organization as conceived by the RTA How well the pieces of evidence are organized to make a strong argument Coherence around the ordering of topics related to pieces of evidence. Assessment of coherence (Foltz et al., 1998; Higgins et al., 2004; Burstein et al., 2010; Somasundaran et al.,2014) Evaluate the writing of younger students in grades 5 through 8 In previous studies, assessments of text coherence have been task-independent, which means that these models are designed to be able to evaluate the coherence of the response to any writing task. Taskindependence is often the goal for automated scoring systems, but it is also important to measure the quality of students’ organization skills when they are responding to a task-dependent prompt. One advantage of task-dependent scores is the ability to provide feedback that is better aligned with the task. 6/4/2015

Lexical Cohesion is Insufficient Assess coherence using: Entity grids (Burstein et al., 2010) and lexical chains (Somasundaran et al., 2014) Repetition of identical or similar words according to external similarity sources Interested in evaluating the coherence around pieces of evidence, not just the lexical cohesion Hypothesis: existing models are not as well on short and noisy essays as on longer and better written essays “The hospitals were in bad situation. There was no electricity or water.” There would be no transition between these two sentences 6/4/2015

Outline Goals Response-to-Text Assessment (RTA) Focus of the Study Approach and Model Data Experiments and Results Conclusion and Future Work 6/4/2015

How to Model Coherence around Topics and Pieces of Evidence? By: Topic Grid and Topic Chains 6/4/2015

Example Topic and Pieces of Evidence Excerpt from the text The people of Sauri have made amazing progress in just four years. The Yala Sub-District Hospital has medicine, free of charge, for all of the most common diseases. Water is connected to the hospital, which also has a generator for electricity Pieces of evidence around the topic “hospitals” for the state “after” Yala sub-district hospital medicine medicine free charge water connected hospital hospital generator electricity Medicine common diseases Experts of the RTA manually extract the exhaustive list of topics discussed in the article Contrasting text and prompt: The conditions in a Kenyan village before and after the United Nations-intervention Extract topics that provided evidence for the “before” and “after” states 7 different topics; 4 of them have before and after states, resulting in 11 sub-topics in total 6/4/2015

Topic Grid One example they sued show a great amount oF change when they stated at first most people thall were ill just stayed in the hospital Not even getting treated either because of the cost or the hospital didnt have it, but at the end it stated they now give free medicine to most common deseases. 1 2 3 4 5 6 7 8 9 10 Hospitals.before - x - - - - - - - - Hospitals.after - - x - - - - - - - Education.before - - - x - - - - - - Education.after - - - - x x - - - - Farming.before - - - - - - x - - - Farming.after - - - - - - - x - - General x - - - - - - - x x Yala sub-district hospital medicine medicine free charge water connected hospital hospital generator electricity Medicine common diseases A sentence or a sub-sentence Long sentences can include more than one topic and Each text unit is automatically labeled with topics using a simple window-based algorithm Which relies on the presence and absence of topic-words in a sliding window and chooses the most similar topic to the window. (Several equally similar topics might be chosen). 6/4/2015

Topic Chain One chain for each topic 1 2 3 4 5 6 7 8 9 10 Hospitals.before - x - - - - - - - - Hospitals.after - - x - - - - - - - Education.before - - - x - - - - - - Education.after - - - - x x - - - - Farming.before - - - - - - x - - - Farming.after - - - - - - - x - - General x - - - - - - - x x One chain for each topic Each node carries two pieces of information: The index of the text unit it appears in Whether it is a before or after state Topic Chain Hospitals (b,2),(a,3) Education (b,4),(a,5),(a,6) Farming (b,7),(a,8) 6/4/2015

Features Surface Discourse structure Goal: design a small set of rubric-based features that performs acceptably and also models what is actually important in the rubric. Surface Discourse structure Local coherence and paragraph transitions Topic development Topic ordering and patterns The use of before and after information to extract features is based on the rubric and the na- ture of the prompt, and it can be generalized to other contrasting prompts. 6/4/2015

Features (Based on Literature) Surface Number of paragraphs Average sentence length Discourse structure HasBeginning HasEnding (based on general statements from the text and the prompt) LSA-similarity of 1 to 3 sentences from the beginning and ending of the essay with respect to the length of the essay. Local coherence and paragraph transitions The average LSA (Foltz et al., 1998) similarity of adjacent sentences Average LSA similarity of all paragraphs (Foltz et al., 1998) For one paragraph essays, we divide the essays into 3 equal parts and calculate the similarity of 3 parts. 6/4/2015

Topic-Based Features (Based on Literature) Average number of nodes in chains Max distance between chain’s nodes Sum of the distances between each pair of adjacent nodes Average number of nodes in chains divided by average chain length Number of topics covered in the essay divided by the length of the essay Count and percentage of discourse markers from each of the four groups adjacent to a topic 6/4/2015

Topic-Based Features (New) Number and percentage of chains which discusses both aspects (‘before’ and ‘after’) of that topic. Before-only, After-only Number of chains starting and ending inside another chain Levenshtein edit-distance 6/4/2015

Outline Goals Response-to-Text Assessment (RTA) Focus of the Study Approach and Model Data Experiments and Results Conclusion and Future Work 6/4/2015

Data (Correnti et al. 13) First dataset: Grades 5-6 Second dataset: Grades 6-8 Number of essays 1580 812 Number of doubly scored essays Around 600 802 Avg number of words 161.25 207.99 Avg number of unique words 93.27 113.14 Quadratic weighted kappa 0.68 0.69 6/4/2015

Distribution of Organization Scores Short, many spelling and grammatical errors, and not well-organized Score on a scale of 1-4 Dataset/score 1 2 3 4 5–6 grades 398 (25%) 714 (46%) 353 (22%) 115 (7%) 6–8 grades 128 (16%) 316 (39%) 246 (30%) 122 (15%) 6/4/2015

Outline Goals Response-to-Text Assessment (RTA) Focus of the Study Approach and Model Data Experiments and Results Conclusion and Future Work 6/4/2015

Does our rubric-based model perform better than the baselines? On grades (6-8): no significant difference between the rubric-based model and the baselines On grades (5-6): significantly higher performance than either baseline or the combination Model grades (5–6) grades (6-8) 1 EntityGridTT (Burstein et al. 2010) 0.42 0.49 2 LEX1 (Somasundaran et al. 2014) 0.45 0.53 3 EntityGridTT+LEX1 0.46 0.54 4 Rubric-based 0.51 5 EntityGridTT+Rubric-based 6 LEX1+Rubric-based 0.55 7 EntityGridTT+LEX1+Rubric-based 0.50 0.56 Baselines 6/4/2015 Quadratic Weighted Kappa

Is the new model generalizable across different grades? For both experiments: the rubric-based model performs at least as well as the baselines. Test on the shorter and noisier set of (5-6): the rubric-based model performs significantly better than the baselines. Model Train on(5–6) Test on (6-8) Train on(6-8) Test on (5-6) 1 EntityGridTT (Burstein et al. 2010) 0.51 0.43 2 LEX1 (Somasundaran et al. 2014) 0.41 3 EntityGridTT+LEX1 0.52 0.42 4 Rubric-based 0.56 0.47 5 EntityGridTT+LEX1+Rubric-based 0.58 0.45 Baselines 6/4/2015 Quadratic Weighted Kappa

How important are the topic-based features? we believe that the topic-based features are more substantive and potentially provide more useful information for students and teachers. Improve performance by enhancing the simple topic alignment of the sentences. Model (5-6) cross val (6-8) Train on(5–6) Test on (6-8) Train on(6-8) Test on (5-6) EntityGridTT+LEX1 0.46 0.54 0.52 0.42 3 Topic-Based 0.45 0.40 4 Surface 0.32 0.35 5 LocalCoherence+ParagraphTransition 0.20 0.21 0.23 0.18 6 DiscourseStrucutre 0.25 0.19 0.26 0.22 Baseline 6/4/2015 Quadratic Weighted Kappa

Outline Goals Response-to-Text Assessment (RTA) Focus of the Study Approach and Model Data Experiments and Results Conclusion and Future Work 6/4/2015

Conclusion We present the results for predicting the score of the Organization dimension of a response-to-text assessment. New task-dependent rubric-based model performs as well as either baseline on both datasets. On the shorter and noisier essays, the rubric-based model based on coarse- grained topic information outperforms state-of-the-art models based on syntactic and lexical information. In general, the rubric-based features can add value to the baselines. 6/4/2015

Future Work Use a more sophisticated method to annotate text units Test the generalizability of our model by using other texts and prompts from other response-to-text writing tasks Extract topics and words automatically, as our current approach requires these to be manually defined by experts Although this task needs to be only done once for each new text and prompt 6/4/2015

Thank you! 6/4/2015

Levenshtein Edit-Distance Edit-distance of the topic vector representations for “befores” and “afters” normalized by the number of topics in the essay Good organization of topics Cover both the before and the after examples on each discussed topic Come in a similar order The greater the value, the worse the pattern of discussed topics befores=[3,4,4,5] , afters=[3,6,5] befores=[3,4,5] , afters=[3,6,5] The normalized Levensthein = 1/4 6/4/2015

Can the lexical chaining baseline be improved with the use of topic information from the source document? Lexical chaining uses both external sources to measure semantic similarity and also our list of topics extracted from the source text Model grades (5–6) grades (6-8) 1 LEX1 0.45 0.53 2 LEX1+Topic 0.48 0.54 6/4/2015

Surface > TopicOrdering > LocalCoherence+ParagraphTransitions > DiscourseStructure > TopicDevelopment 6/4/2015

Related work on measuring coherence in student essays Vector-based similarity methods measure lexical relatedness between text segments (Foltz et al., 1998) Between discourse segments (Higgins et al., 2004) Centering theory (Grosz et al., 1995) addresses local coherence (Miltsakaki and Kukich, 2000) Entity-based essay representation (Burstein et al., 2010) Lexical chaining addresses (Somasundaran et al.,2014) Discourse structure is used to measure the organization of argumentative writing (Cohen, 1987; Burstein et al., 1998; Burstein et al., 2003) 6/4/2015

Lexical Cohesion Lexical chains (Somasundaran et al., 2014) and entity grids (Burstein et al., 2010) The continuity of lexical meaning Lexical chains are sequences of related words characterized by the relation between them, as well as by their distance and density within a given span. Entity grids capture how the same word appears in a syntactic role (Subject, Object, Other) across adjacent sentences. 6/4/2015