A Comparison of Features for Automatic Readability Assessment
Lijun Feng 1, Matt Huenerfauth 1, Martin Jansche 2, Noémie Elhadad 3
1 City University of New York, 2 Google, Inc., 3 Columbia University
Coling 2010: Poster Volume, pages 276–284, Beijing, August 2010
Presenter: 劉憶年, 2014/4/21

Outline Introduction Corpus Features Experiments and Discussion Conclusions

Motivation and Method Readability assessment quantifies the difficulty with which a reader understands a text. Automatic readability assessment enables the selection of appropriate reading material for readers of varying proficiency. We use grade levels, which indicate the number of years of education required to fully understand a text, as a proxy for reading difficulty. We treat readability assessment as a classification task and evaluate trained classifiers in terms of their prediction accuracy.

Related Work Many traditional readability metrics are linear models with a few (often two or three) predictor variables based on superficial properties of words, sentences, and documents. These traditional metrics are easy to compute and use, but they are not reliable, as demonstrated by several recent studies. With the advancement of natural language processing tools, a wide range of more complex text properties have been explored at various linguistic levels. In addition to lexical and syntactic features, several researchers have started to explore discourse-level features and examine their usefulness in predicting text readability.
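As an illustration of how superficial these traditional predictors are, here is a minimal sketch of one such two-variable metric, the Flesch-Kincaid Grade Level. The formula is standard; the code itself is ours, not part of the paper:

```python
def flesch_kincaid_grade(total_words: int, total_sentences: int,
                         total_syllables: int) -> float:
    """Flesch-Kincaid Grade Level: a classic linear readability metric.

    Uses only average sentence length and average syllables per word,
    the kind of superficial predictors the slide refers to.
    """
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59

# Example: 100 words, 5 sentences, 140 syllables -> about grade 8.7
print(flesch_kincaid_grade(100, 5, 140))
```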

Corpus We contacted the Weekly Reader corporation, an online publisher producing magazines for elementary and high school students, and were granted access in October 2008 to an archive of their articles. While pre-processing the texts, we found that many articles, especially those for lower grade levels, consist only of puzzles and quizzes, often in the form of simple multiple-choice questions. We discarded such texts and kept only 1433 full articles.

Entity-Density Features (1/2) Conceptual information is often introduced in a text by entities, which consist of general nouns and named entities, e.g. people’s names, locations, organizations, etc. These are important in text comprehension, because established entities form basic components of concepts and propositions, on which higher level discourse processing is based. We hypothesized that the number of entities introduced in a text relates to the working memory burden on their targeted readers – individuals with intellectual disabilities. We defined entities as a union of named entities and general nouns (nouns and proper nouns) contained in a text, with overlapping general nouns removed.

Entity-Density Features (2/2) We believe entity-density features may also relate to the readability of a text for a general audience.
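A minimal sketch of how such entity-density features might be computed, here with spaCy rather than the paper's toolchain; the specific feature names and the spaCy model are our assumptions, and only the definition of entities (named entities plus non-overlapping general nouns) follows the slide:

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # any English model with NER and a tagger

def entity_density_features(text: str) -> dict:
    """Entities = named entities, plus nouns/proper nouns not already
    covered by a named entity (overlapping general nouns removed)."""
    doc = nlp(text)
    ner_token_ids = {t.i for ent in doc.ents for t in ent}
    nouns = [t for t in doc
             if t.pos_ in ("NOUN", "PROPN") and t.i not in ner_token_ids]
    n_entities = len(doc.ents) + len(nouns)
    n_sents = max(1, len(list(doc.sents)))
    return {
        "entities_per_doc": n_entities,
        "entities_per_sentence": n_entities / n_sents,
        "entities_per_word": n_entities / max(1, len(doc)),
    }

print(entity_density_features("Barack Obama visited Paris. The city welcomed him."))
```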

Lexical Chain Features During reading, a more challenging task with entities is not just to keep track of them, but to resolve the semantic relations among them, so that information can be processed, organized, and stored in a structured way for comprehension and later retrieval. The length of a chain is the number of entities it contains; the span of a chain is the distance between the indices of its first and last entities. A chain is defined to be active for a word or an entity if the chain passes through its current location.
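A small sketch of the length, span, and active-chain statistics defined above. How the chains themselves are built (e.g. from semantic relations between entities) is assumed to happen upstream; each chain here is just the list of token positions where its entities occur:

```python
from typing import List

def chain_features(chains: List[List[int]], n_tokens: int) -> dict:
    """Length/span/active statistics for pre-computed lexical chains."""
    lengths = [len(c) for c in chains]
    spans = [max(c) - min(c) for c in chains]
    # A chain is active at position i if it passes through i.
    active_at = [sum(1 for c in chains if min(c) <= i <= max(c))
                 for i in range(n_tokens)]
    return {
        "avg_chain_length": sum(lengths) / len(chains),
        "avg_chain_span": sum(spans) / len(chains),
        "avg_active_chains_per_token": sum(active_at) / n_tokens,
    }

# Two chains over a 12-token text: one at positions 0,4,9 and one at 2,7.
print(chain_features([[0, 4, 9], [2, 7]], n_tokens=12))
```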

Coreference Inference Features Relations among concepts and propositions are often not stated explicitly in a text. Automatically resolving implicit discourse relations is a hard problem. The ability to resolve referential relations is important for text comprehension. Inference distance is the difference between the index of the referent and that of its pronominal reference. If the same referent occurs more than once in a chain, the index of the closest occurrence is used when computing the inference distance.
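The inference-distance definition above is simple enough to state directly in code; this sketch assumes referent and pronoun positions have already been produced by a coreference resolver:

```python
def inference_distance(referent_positions, pronoun_position):
    """Distance from a pronoun back to the closest prior occurrence of
    its referent; the closest occurrence is used when the referent
    appears more than once, as the slide specifies."""
    prior = [p for p in referent_positions if p < pronoun_position]
    if not prior:
        return None  # no antecedent before the pronoun
    return pronoun_position - max(prior)

# "John ... John ... he": referent at indices 0 and 5, pronoun at index 9.
print(inference_distance([0, 5], 9))  # -> 4
```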

Entity Grid Features Coherent texts are easier to read. Each text is abstracted into a grid that captures the distribution of entity patterns at the level of sentence-to-sentence transitions. The entity grid is a two-dimensional array, with one dimension corresponding to the salient entities in the text, and the other corresponding to each sentence of the text.
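A sketch of the grid construction in the style of Barzilay and Lapata's entity-grid model, which this feature set follows; the S/O/X role labels and the input format (per-sentence entity-to-role maps from a parser) are assumptions of this sketch:

```python
from collections import Counter
from typing import Dict, List

def build_entity_grid(sentences: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Rows are entities, columns are sentences; each cell holds the
    entity's grammatical role (S=subject, O=object, X=other) or '-'."""
    entities = {e for sent in sentences for e in sent}
    return {e: [sent.get(e, "-") for sent in sentences] for e in entities}

def transition_proportions(grid: Dict[str, List[str]], n: int = 2) -> dict:
    """Proportions of role transitions across adjacent sentences
    (e.g. 'S-' or 'OS'); these proportions serve as features."""
    counts = Counter()
    for roles in grid.values():
        for i in range(len(roles) - n + 1):
            counts["".join(roles[i:i + n])] += 1
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

grid = build_entity_grid([{"Obama": "S", "Paris": "O"}, {"Paris": "S"}])
print(transition_proportions(grid))  # {'S-': 0.5, 'OS': 0.5}
```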

Language Modeling Features We use grade levels to divide the whole corpus into four smaller subsets. In addition to implementing Schwarm and Ostendorf’s information-gain approach, we also built LMs based on three other types of text sequences for comparison purposes. These included: word-token-only sequence (i.e., the original text), POS-only sequence, and paired word-POS sequence. For each grade level, we use the SRI Language Modeling Toolkit (with Good-Turing discounting and Katz backoff for smoothing) to train 5 language models (1- to 5-gram) using each of the four text sequences, resulting in 4×5×4 = 80 perplexity features for each text tested.
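To make the feature layout concrete, here is a toy sketch of perplexity as a feature: a tiny add-one-smoothed unigram model stands in for the SRILM models (the paper used Good-Turing discounting with Katz backoff, and orders 1 through 5):

```python
import math
from collections import Counter

def train_unigram(tokens):
    """Add-one-smoothed unigram LM; a stand-in for one SRILM model."""
    counts = Counter(tokens)
    total, vocab = len(tokens), len(counts) + 1  # +1 slot for unseen words
    return lambda w: (counts[w] + 1) / (total + vocab)

def perplexity(prob, tokens):
    log_sum = sum(math.log(prob(w)) for w in tokens)
    return math.exp(-log_sum / len(tokens))

# One perplexity feature per (grade level, n-gram order, text sequence):
# 4 grade levels x 5 orders x 4 sequences = 80 features per test document.
grade_lm = train_unigram("the cat sat on the mat".split())
print(perplexity(grade_lm, "the dog sat".split()))
```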

Parsed Syntactic Features Our parsed syntactic features focus on clauses (SBAR), noun phrases (NP), verb phrases (VP) and prepositional phrases (PP).
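A sketch of extracting such counts from parser output, using an NLTK constituency tree; the paper's full feature set (e.g. normalized or averaged counts) goes beyond these raw counts:

```python
from nltk.tree import Tree

def phrase_counts(parse: Tree) -> dict:
    """Counts of the phrase types the parse features focus on."""
    labels = [t.label() for t in parse.subtrees()]
    return {tag: labels.count(tag) for tag in ("SBAR", "NP", "VP", "PP")}

tree = Tree.fromstring(
    "(S (NP (DT the) (NN cat))"
    " (VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))))"
)
print(phrase_counts(tree))  # {'SBAR': 0, 'NP': 2, 'VP': 1, 'PP': 1}
```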

POS-based Features Part-of-speech-based grammatical features were shown to be useful in readability prediction. To extend prior work, we systematically studied a number of common categories of words and investigated to what extent they are related to a text’s complexity. We focus primarily on five classes of words (nouns, verbs, adjectives, adverbs, and prepositions) and two broad categories (content words, function words). Content words include nouns, verbs, numerals, adjectives, and adverbs; the remaining types are function words.
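A minimal sketch of the per-class and content/function ratios, assuming tokens arrive as (word, POS) pairs from any tagger; the universal POS labels and feature names here are illustrative:

```python
CONTENT_POS = {"NOUN", "PROPN", "VERB", "ADJ", "ADV", "NUM"}  # per the slide

def pos_ratios(tagged):
    """POS-based features from (word, POS) pairs: one ratio per tag,
    plus content-word vs. function-word proportions."""
    n = len(tagged)
    counts = {}
    for _, pos in tagged:
        counts[pos] = counts.get(pos, 0) + 1
    feats = {f"{pos.lower()}_ratio": c / n for pos, c in counts.items()}
    content = sum(c for pos, c in counts.items() if pos in CONTENT_POS)
    feats["content_word_ratio"] = content / n
    feats["function_word_ratio"] = 1 - content / n
    return feats

print(pos_ratios([("the", "DET"), ("cat", "NOUN"), ("sat", "VERB")]))
```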

Shallow Features Shallow features, which are limited to superficial text properties, are computationally much less expensive than syntactic or discourse features.
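A sketch of typical shallow features; the naive period-based sentence splitter and the exact feature names are simplifications for illustration:

```python
def shallow_features(text: str) -> dict:
    """Surface-only features: no tagging, parsing, or discourse analysis."""
    sentences = [s for s in text.split(".") if s.strip()]  # naive splitter
    words = text.split()
    return {
        "avg_sentence_length": len(words) / len(sentences),
        "avg_word_length": sum(len(w.strip(".,")) for w in words) / len(words),
        "doc_length_in_words": len(words),
    }

print(shallow_features("The cat sat on the mat. It purred."))
```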

Other Features For comparison, we replicated 6 out-of-vocabulary features described in Schwarm and Ostendorf (2005). We also replicated the 12 perplexity features implemented by Schwarm and Ostendorf (2005).

Experiments and Discussion In our research, we have used various models, including linear regression; standard classification (Logistic Regression and SVM), which assumes no relation between grade levels; and ordinal regression/classification (provided by Weka, with Logistic Regression and SMO as base functions), which assumes that the grade levels are ordered. In this paper, we present the best results, which are obtained by standard classifiers. We evaluate classification accuracy using repeated 10-fold cross-validation on the Weekly Reader corpus. Classification accuracy is defined as the percentage of texts predicted with correct grade levels. We repeat each experiment 10 times and report the mean accuracy and its standard deviation.
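The evaluation protocol looks roughly like the following sketch, shown with scikit-learn as a stand-in for the paper's Weka setup; the random data and the grade range 2 to 5 are placeholders for the real feature matrix and labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# X: feature matrix (one row per article), y: grade labels; random here.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = rng.integers(2, 6, size=200)  # placeholder grade levels 2-5

# 10-fold cross-validation, repeated 10 times; report mean and std.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print(f"accuracy: {scores.mean():.4f} +/- {scores.std():.4f}")
```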

Discourse Features We first discuss the improvement made by extending our earlier entity-density features (Feng et al., 2009). With the earlier features only, the model achieves 53.66% accuracy; with our new features added, accuracy rises to 59.63%. Table 6 presents the classification accuracy of models trained with discourse features.

Language Modeling Features Table 7 compares the performance of models generated using our approach and our replication of Schwarm and Ostendorf’s (2005) approach.

Parsed Syntactic Features (1/2) Table 8 compares a classifier trained on the four parse features of Schwarm and Ostendorf (2005) to a classifier trained on our expanded set of parse features.

Parsed Syntactic Features (2/2) Table 9 shows a detailed comparison of particular parsed syntactic features.

POS-based Features The classification accuracy generated by models trained with various POS features is presented in Table 10.

Shallow Features We present some notable findings on shallow features in Table 11.

Comparison with Previous Studies Using the same experimental design, we train classifiers with three combinations of our features, as listed in Table 12.

Conclusions Discourse features do not seem to be very useful in building an accurate readability metric. The reason could lie in the fact that the texts in the corpus we studied exhibit relatively low complexity, since they are aimed at primary-school students. In future work, we plan to investigate whether these discourse features exhibit different discriminative power for texts at higher grade levels.