
An Analysis of Statistical Models and Features for Reading Difficulty Prediction
Michael Heilman, Kevyn Collins-Thompson, Maxine Eskenazi
Language Technologies Institute, Carnegie Mellon University

The Goal: To predict the readability of a page of text. Example texts at different reading levels:
Grade 3: "…From far out in space, Earth looks like a blue ball…"
Grade 7: "…Like the pioneers who headed west in covered wagons, Mir astronauts have learned to do the best they can with what they have…"
Grade 11: "…All the inner satellites and all the major satellites in the solar system have synchronous rotation and revolution because they are tidally coupled to their planets…"

Prior Work on Readability

Measure | Approx. Year | Lexical Features | Grammatical Features
Flesch-Kincaid | 1975 | Syllables per word | Sentence length
Lexile (Stenner et al.) | 1988 | Word frequency | Sentence length
Collins-Thompson & Callan | 2004 | Lexical unigrams | -
Schwarm & Ostendorf | 2005 | Lexical n-grams, … | Sentence length, distribution of POS, parse tree depth, …
Heilman, Collins-Thompson, Callan, & Eskenazi | 2007 | Lexical unigrams | Manually defined grammatical constructions
(this work) | 2008 | Lexical unigrams | Automatically defined, extracted syntactic subtree features

Outline: Introduction; Lexical & Grammatical Features; Scales of Measurement & Statistical Models; Experimental Evaluation; Results & Discussion

Lexical Features: relative frequencies of the 5,000 most common word unigrams, after morphological stemming and stopword removal.
Example: "…The continents look brown, like small islands floating in the huge, blue sea…" → continent, look, brown, small, island, float, huge, blue, sea
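As an illustration of this kind of lexical preprocessing, here is a minimal sketch using NLTK's Porter stemmer and English stopword list; the paper's actual tokenizer, stemmer, and stopword list are not specified here, so these are stand-ins:

```python
import re
from collections import Counter

from nltk.corpus import stopwords     # requires nltk.download("stopwords")
from nltk.stem import PorterStemmer

STEMMER = PorterStemmer()
STOPWORDS = set(stopwords.words("english"))

def content_stems(text):
    """Lowercased, stemmed content words; stopwords and non-alphabetic tokens are dropped."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [STEMMER.stem(t) for t in tokens if t not in STOPWORDS]

def lexical_features(text, vocabulary):
    """Relative frequency of each stem in a fixed vocabulary
    (e.g. the stems of the 5,000 most common words in the training corpus)."""
    counts = Counter(s for s in content_stems(text) if s in vocabulary)
    total = sum(counts.values()) or 1
    return {stem: counts[stem] / total for stem in sorted(vocabulary)}

example = "The continents look brown, like small islands floating in the huge, blue sea."
vocab = set(content_stems(example))    # toy vocabulary built from the example itself
print(lexical_features(example, vocab))
```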

Grammatical Features: Syntactic Subtrees
[Figure: example subtree features. A level 0 feature is a single node label such as ADJP; a level 1 feature is a node plus its children, such as (S NP VP); a level 2 feature expands two levels, e.g. a prepositional-phrase subtree over TO, NP, DT, JJ, and NN nodes containing the function word "to".]
Subtree features include grammatical function words but not content words.

Grammatical Features
The frequencies of the 1,000 most common subtrees were selected as features.

Level | Number Selected | Example
0 | 64 | PP
1 | 334 | (VP VB PP)
2 | 461 | (VP (TO to) (VP VB PP))
3 | 141 | (S (VP (TO to) (VP VB PP)))

Extracting Grammatical Feature Values
The input text (e.g., "…It was the first day of Spring. Stephanie loved spring. It was her favorite season of the year. It was a beautiful sunny afternoon. The sky was a pretty shade of blue. There were fluffy, white clouds in the sky…") is parsed, the subtree features are matched against the parse trees, and the relative frequency of each subtree feature is computed, e.g. (S NP VP) 0.022, (NP DET JJ NN) 0.033, (VP) 0.111.
[Figure: pipeline from INPUT TEXT → PARSE TREES → SUBTREE FEATURES → FREQUENCIES OF SUBTREES.]
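A rough sketch of how level-limited subtree patterns could be read off a constituency parse using NLTK's Tree class; the function-word whitelist and the exact pattern format are assumptions for illustration, not the paper's implementation:

```python
from collections import Counter
from nltk import Tree

FUNCTION_WORDS = {"to", "of", "that", "by"}   # illustrative whitelist; the real set is an assumption

def pattern(node, level):
    """Bracketed pattern for `node`, expanded `level` levels below its root.
    Function-word leaves are kept; content-word leaves are dropped."""
    if isinstance(node, str):                               # a leaf token
        return node.lower() if node.lower() in FUNCTION_WORDS else None
    if level == 0:
        return node.label()
    kids = [pattern(child, level - 1) for child in node]
    kids = [k for k in kids if k is not None]
    return "(%s %s)" % (node.label(), " ".join(kids)) if kids else node.label()

def subtree_patterns(parse, max_level=3):
    """All level-0..max_level patterns rooted at each node of a parse tree."""
    return [pattern(node, lvl) for node in parse.subtrees() for lvl in range(max_level + 1)]

tree = Tree.fromstring("(S (NP (NN Stephanie)) (VP (VBD loved) (NP (NN spring))))")
print(Counter(subtree_patterns(tree, max_level=2)))   # counts would be normalized over the whole text
```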

Outline: Introduction; Lexical & Grammatical Features; Scales of Measurement & Statistical Models; Experimental Evaluation; Results & Discussion

Scales of Measurement
Different statistical models are appropriate for different types of data (scales of measurement). What is the appropriate scale for readability?

Scales of Measurement

Scale | Natural ordering? | Evenly spaced? | Meaningful zero point? | Example
Nominal | No | No | No | apples and oranges
Ordinal | Yes | No | No | severity of illness: mild, moderate, severe, …
Interval | Yes | Yes | No | years on a calendar
Ratio | Yes | Yes | Yes | annual income

Statistical Modeling Approaches
Compared three standard statistical modeling approaches, for interval, ordinal, and nominal data, which differ in their assumptions and in the number of parameters and intercepts. More parameters allow more complex models, but may be harder to estimate.

Linear Regression
Well-suited for interval data: the predicted reading level is a linear function of the feature values, with a single set of feature parameters and a single intercept.
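The slide's equation is not preserved in the transcript; a standard linear regression of this form, with predicted grade level \(\hat{y}\), feature vector \(\mathbf{x}\), a single weight vector \(\boldsymbol{\beta}\), and a single intercept \(\beta_0\) (notation assumed here), would be:

\[
\hat{y} = \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}
\]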

Proportional Odds Model
A log-linear model for ordinal data, with an intercept for each level but a single set of feature parameters. The estimated probability of a text being at level j is the difference between the cumulative probabilities for levels j and j + 1.
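As a sketch of one standard cumulative-logit parameterization of a proportional odds model (the slide's exact notation is not preserved; \(\alpha_j\) is the per-level intercept and \(\boldsymbol{\beta}\) the single weight vector):

\[
\log\frac{P(Y \ge j \mid \mathbf{x})}{P(Y < j \mid \mathbf{x})} = \alpha_j + \boldsymbol{\beta}^{\top}\mathbf{x},
\qquad
P(Y = j \mid \mathbf{x}) = P(Y \ge j \mid \mathbf{x}) - P(Y \ge j+1 \mid \mathbf{x})
\]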

Multi-class Logistic Regression
A log-linear model for nominal data, with an intercept for each level and a separate set of feature parameters for each level but one; the probabilities are normalized by a sum over all levels.
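A sketch of the usual softmax form of multi-class logistic regression (assumed notation), with a per-level intercept \(\alpha_j\) and per-level weights \(\boldsymbol{\beta}_j\), one reference level's parameters fixed to zero for identifiability:

\[
P(Y = j \mid \mathbf{x}) = \frac{\exp\!\left(\alpha_j + \boldsymbol{\beta}_j^{\top}\mathbf{x}\right)}{\sum_{k}\exp\!\left(\alpha_k + \boldsymbol{\beta}_k^{\top}\mathbf{x}\right)}
\]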

Estimation and Regularization
Parameters were estimated using L2 regularization. The regularization hyper-parameter for each model was tuned with a simple grid search and cross-validation.
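A minimal sketch of this kind of hyper-parameter tuning, using scikit-learn's L2-regularized logistic regression and toy data as stand-ins; the authors' actual implementation and grid are not stated in the slides:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# X: document-by-feature matrix of lexical/grammatical relative frequencies (toy data here)
rng = np.random.default_rng(0)
X = rng.random((240, 50))
y = np.tile(np.arange(1, 13), 20)      # 20 toy documents per grade level, grades 1..12

# C is the inverse of the L2 regularization strength; tune it by grid search with 10-fold CV.
search = GridSearchCV(
    LogisticRegression(penalty="l2", max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0, 100.0]},
    cv=10,
    scoring="accuracy",
)
search.fit(X, y)
print(search.best_params_)
```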

Hypothesis
The proportional odds model using both lexical and grammatical features will perform best: the difference in reading ability between grades 1 and 2 should be larger than between grades 10 and 11 (suggesting an ordinal rather than interval scale), and both lexical and grammatical features play a role.

Outline: Introduction; Lexical & Grammatical Features; Scales of Measurement & Statistical Models; Experimental Evaluation; Results & Discussion

Evaluation Corpus
Source: content text from a set of Web pages.
Reading level labels for grades 1-12, indicated by the Web page or a link to it: half of the texts were authored by students, and half were labeled by teachers or authors.
289 texts, 150,000 words, on various topics, with an approximately even distribution across levels (+/- 3).
Adapted from previous work: Collins-Thompson & Callan (2005); Heilman, Collins-Thompson, Callan, & Eskenazi (2007).

Evaluation Metrics

Measure | Description | Details | Range
Pearson's correlation coefficient | Strength of the linear relationship between predictions and labels. | Measures trends, but not the degree to which values match in absolute terms. | [0, 1]
Adjacent accuracy | Proportion of predictions within 1 level of the label. | Intuitive; however, near-miss predictions are treated the same as predictions that are ten levels off. | [0, 1]
Root mean square error | Square root of the mean squared difference between predictions and labels. | Strongly penalizes bad errors; roughly the "average difference from the true level". | [0, ∞)
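The three metrics are straightforward to compute from predicted and true grade levels; a small sketch following the definitions above:

```python
import numpy as np
from scipy.stats import pearsonr

def evaluate(predicted, true):
    """Pearson correlation, adjacent accuracy, and RMSE for predicted vs. true grade levels."""
    predicted = np.asarray(predicted, dtype=float)
    true = np.asarray(true, dtype=float)
    correlation = pearsonr(predicted, true)[0]
    adjacent_accuracy = np.mean(np.abs(predicted - true) <= 1)
    rmse = np.sqrt(np.mean((predicted - true) ** 2))
    return correlation, adjacent_accuracy, rmse

print(evaluate([3, 5, 7, 10, 12], [4, 5, 6, 11, 9]))
```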

Evaluation Procedure
Randomly split the corpus into a training set (75%) and a test set (25%). Ten-fold stratified cross-validation on the training set was used for model selection and hyper-parameter tuning. For test set validation, each statistical model and feature set pair (or baseline) was compared to the hypothesized best model, the proportional odds model with the combined feature set.

Outline: Introduction; Lexical & Grammatical Features; Scales of Measurement & Statistical Models; Experimental Evaluation; Results & Discussion

Comparison of Feature Sets
[Chart comparing the proportional odds model with lexical, grammatical, and combined feature sets; * marks differences significant at p < .05.]

Comparison of Modeling Approaches
[Chart comparing linear regression, multi-class logistic regression, and the proportional odds model, all with combined features; * marks differences significant at p < .05.]

Comparison to Baselines
The proportional odds model with combined features was compared to Flesch-Kincaid, an implementation of Lexile, and Collins-Thompson and Callan's language modeling approach. The PO model performed as well as or better than the baselines in almost all cases.

Findings: Feature Sets
Grammatical features alone can be effective predictors of readability. Compared to Heilman et al. (2007), this work uses a more comprehensive and detailed set of grammatical features, and does not require extensive linguistic knowledge and effort to manually define them.

Findings: Modeling Approaches
The results suggest that reading grade levels lie on an ordinal scale of measurement. The proportional odds model for ordinal data led to the most effective predictions in general; the more complex multi-class logistic regression did not lead to better predictions.

Questions?

Proportional Odds Model Intercepts
PO intercepts estimate the log odds ratio of a text being at or above a level compared to below that level. Are the intercept values a linear function of grade levels? Is there value in the ability to model ordinal data?
[Table: model intercept for each grade level and the difference from the previous grade's intercept (N/A for grade 1); the values are not preserved in this transcript.]

Null Hypothesis Testing
Used the bias-corrected and accelerated (BCa) bootstrap (Efron & Tibshirani, 1993) to estimate 95% confidence intervals for the differences in evaluation metrics between each model and the PO model with combined features. The bootstrap randomly samples the held-out test dataset with replacement to create thousands of bootstrap replications, then computes the statistic of interest on each replication to estimate its distribution.
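A simplified sketch of the resampling step, using a plain percentile bootstrap for the difference in RMSE between two models; the BCa variant used in the paper adds bias-correction and acceleration adjustments on top of this basic scheme:

```python
import numpy as np

def bootstrap_rmse_difference(true, pred_a, pred_b, replications=10000, seed=0):
    """Percentile-bootstrap 95% CI for RMSE(model A) - RMSE(model B) on a held-out test set."""
    rng = np.random.default_rng(seed)
    true, pred_a, pred_b = map(np.asarray, (true, pred_a, pred_b))
    n = len(true)
    diffs = np.empty(replications)
    for i in range(replications):
        idx = rng.integers(0, n, size=n)            # resample test items with replacement
        rmse_a = np.sqrt(np.mean((pred_a[idx] - true[idx]) ** 2))
        rmse_b = np.sqrt(np.mean((pred_b[idx] - true[idx]) ** 2))
        diffs[i] = rmse_a - rmse_b
    return np.percentile(diffs, [2.5, 97.5])        # CI excluding 0 => reject the null

print(bootstrap_rmse_difference([3, 5, 7, 9], [4, 5, 8, 9], [3, 6, 9, 12]))
```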

Bootstrap Histogram Example
[Histogram: distribution of the difference in RMSE between the PO model with combined features and the implementation of Lexile. A difference of 0.0 corresponds to the null hypothesis.]

Comparison to Baselines
[Chart comparing the Lexile-like measure (Stenner et al., 1988), the language modeling approach (Collins-Thompson & Callan, 2005), Flesch-Kincaid, and the proportional odds model with combined features; * marks differences significant at p < .05.]

Simplified Linear Regression Example
[Figure: prototypical texts at each level j plotted in a two-feature space (frequency of embedded clauses vs. frequency of adverbial phrases) under the linear regression model.]

Simplified PO Model Example
[Figure: prototypical texts at each level j in the same two-feature space (frequency of embedded clauses vs. frequency of adverbial phrases) under the proportional odds model.]

Simplified Logistic Regression Example
[Figure: prototypical texts at each level j in the same two-feature space (frequency of embedded clauses vs. frequency of adverbial phrases) under multi-class logistic regression.]