Statistical Tools for Evaluating the Behavior of Rival Forms: Logistic Regression, Tree & Forest, and Naive Discriminative Learning

R. Harald Baayen, University of Tübingen / University of Alberta
Laura A. Janda, CLEAR-group (Cognitive Linguistics: Empirical Approaches to Russian), University of Tromsø

Other contributors: Anna Endresen, Svetlana Sokolova, Tore Nesset, Anastasia Makarova

All data and code are publicly available on this website:

Evaluating the behavior of rival forms

Rival forms:
- Languages often provide a choice between two or more forms, call them X vs. Y, with (approximately) the same meaning
- Linguists often want to know what factors influence the choice between rival forms

Traditional approach:
- Collect corpus data showing the distribution of the rival forms and the other factors
- Build a logistic regression model with the rival forms (form X vs. form Y) as the dependent variable and the other factors as independent (predictor) variables

What a logistic regression analysis does

A logistic regression model predicts the probability that a given value of the dependent variable (form X or, alternatively, form Y) will be used, taking the effects of the independent variables into account. It can tell us what the relationship is between the choice of rival forms and the other factors.

[figure: the logistic function]
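
A minimal sketch of the logistic function in R (an illustration, not from the handout): it maps the linear predictor, which can take any real value, onto a probability between 0 and 1.

    # The logistic (inverse-logit) function maps a real-valued linear
    # predictor eta = b0 + b1*x1 + ... onto a probability in (0, 1).
    logistic <- function(eta) 1 / (1 + exp(-eta))
    logistic(c(-4, 0, 4))  # 0.018 0.500 0.982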

Drawbacks of logistic regression

- It is a parametric model: it assumes that the data follow a particular distributional form, an assumption that corpus data usually violate
- It assumes that all combinations of all variables are represented in the dataset, but linguistic data often involve paradigmatic gaps

[figure: the normal distribution]

More drawbacks of logistic regression

- Building the optimal model can be a laborious task
- There is a danger of overfitting, and no validation (after corpus data are collected, there is usually no opportunity to collect an independent equivalent dataset)
- The results are a table of coefficients that can be hard to interpret

Alternatives to regression: tree & forest and naive discriminative learning

These models:
- are non-parametric (they do not assume a normal distribution)
- do not assume that all combinations of variable levels are represented
- do not require tedious model building and fine-tuning
- provide opportunities for validation
- yield results that can be easier to interpret (especially trees)

Drawbacks of tree & forest and naive discriminative learning

- Tree & forest models cannot handle random effects; mixed-effects logistic models are needed for that
- Tests of significance (p-values) are available for logistic regression and tree & forest, but not for naive discriminative learning
- From a cognitive perspective, naive discriminative learning makes sense only when the data represent a speaker's learning experience

Comparing regression, tree & forest, and naive discriminative learning

We have tested the performance of regression, tree & forest, and naive discriminative learning on four different datasets and found:
- The three models perform nearly identically in terms of classification accuracy, index of concordance, and assessment of variable importance

In other words, by most measures tree & forest and naive discriminative learning models perform just as well as regression.

Our four datasets (all Russian)

- The locative alternation
- The distribution of the prefixes pere- vs. pre-
- The distribution of the prefixes o- vs. ob-
- The loss of the -nu suffix in verbs

Different datasets may be better served by different models; see Baayen et al. (forthcoming).

The Locative Alternation in Russian

Theme-object construction:
  gruzit’ seno na telegu [load hay-ACC onto wagon-ACC] ‘load hay onto the wagon’
Goal-object construction:
  gruzit’ telegu senom [load wagon-ACC hay-INST] ‘load the wagon with hay’

Variables: VERB (prefixes), (passive) PARTICIPLE, REDUCED
- VERB: unprefixed gruzit’ or prefixed nagruzit’, zagruzit’, pogruzit’
- PARTICIPLE: theme-object: seno gruženo na telegu ‘hay is loaded onto the wagon’; goal-object: telega gružena senom ‘the wagon is loaded with hay’
- REDUCED: theme-object: gruzit’ seno ‘load the hay’; goal-object: gruzit’ telegu ‘load the wagon’

The Locative Alternation in Russian

RIVAL FORMS: the two constructions, theme-object vs. goal-object
DEPENDENT VARIABLE: CONSTRUCTION: theme-object vs. goal-object
INDEPENDENT VARIABLES:
- VERB: zero (for the unprefixed verb gruzit’) vs. na- vs. za- vs. po-
- PARTICIPLE: yes vs. no
- REDUCED: yes vs. no
DATA: 1920 sentences from the Russian National Corpus
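
A sketch of loading such a dataset in R; the actual data are on the website above, and the file name locative.csv is hypothetical.

    # Hypothetical file name; the real data and code are on the project website.
    loc <- read.csv("locative.csv", stringsAsFactors = TRUE)
    str(loc)  # expect 1920 observations of the four factors:
              # CONSTRUCTION, VERB, PARTICIPLE, REDUCED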

Logistic regression model

See handout. Optimal model:

  CONSTRUCTION ~ VERB + REDUCED + PARTICIPLE + VERB:PARTICIPLE

The model estimates how the log odds of the two constructions, log(p(theme) / p(goal)), depend on the predictors. A coefficient estimate (Coeff.) is POSITIVE if the combination of factors predicts more theme constructions, and NEGATIVE if it predicts more goal constructions.
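
A sketch of fitting this model in R with glm(), assuming the data frame loc from above. In R's formula syntax the interaction is written VERB:PARTICIPLE (the notation VERB*PARTICIPLE on the original slide expands to the same main effects plus interaction).

    # family = binomial gives logistic regression.
    loc.glm <- glm(CONSTRUCTION ~ VERB + REDUCED + PARTICIPLE + VERB:PARTICIPLE,
                   data = loc, family = binomial)
    summary(loc.glm)  # coefficients are on the log-odds scale:
                      # positive = more theme, negative = more goal
                      # (assuming theme is coded as the second factor level)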

Tree & forest

- A classification and regression tree (CART) uses recursive partitioning to yield a classification tree showing the sorting of the observations that best separates the values of the dependent variable
  - an optimal algorithm for predicting an outcome given the predictor values
- A random forest uses repeated bootstrap samples drawn with replacement from the dataset, such that in each repetition some observations are sampled and serve as the training set, while the others are not sampled and can serve to validate the model
  - predictor variables are also randomly withheld across repetitions, making it possible to measure variable importance
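
A minimal sketch in R using the party package (conditional inference trees and forests); other implementations such as rpart or randomForest would work similarly, and the original analyses may have used different settings.

    library(party)

    # Classification tree: recursive partitioning on the predictors.
    loc.ctree <- ctree(CONSTRUCTION ~ VERB + PARTICIPLE + REDUCED, data = loc)
    plot(loc.ctree)

    # Random forest: an ensemble of trees grown on resampled data; the
    # observations left out of each resample ("out of bag") provide validation.
    set.seed(1)  # resampling is random, so fix the seed for reproducibility
    loc.forest <- cforest(CONSTRUCTION ~ VERB + PARTICIPLE + REDUCED, data = loc)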

Tree & forest: CART

[figure: classification tree for the locative alternation data; the tree includes the intercept condition of the logistic regression model: VERB = na, PARTICIPLE = no]

Tree & forest: variable importance

[figure: variable importance as measured by the random forest]
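
Variable importance can be read off the forest; a sketch continuing from the loc.forest object above, using party::varimp().

    # Permutation-based variable importance: how much does prediction
    # accuracy drop when a predictor's values are randomly shuffled?
    loc.varimp <- varimp(loc.forest)
    dotchart(sort(loc.varimp), xlab = "variable importance")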

Naive discriminative learning

- Naive discriminative learning is a quantitative model of how choices are made, using a system of weights that are estimated with equilibrium equations
  - See Table 3 on the handout
- For validation, naive discriminative learning partitions the data into ten subsamples, nine of which serve as the training set, reserving the tenth for validation
  - This process is repeated ten times, so that each subsample is used for validation once
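
A sketch using the ndl package from CRAN, which implements naive discriminative learning with the equilibrium equations; argument names and defaults should be checked against the package documentation.

    library(ndl)

    # Cue-to-outcome weights are estimated with the equilibrium equations
    # of the Rescorla-Wagner learning model.
    loc.ndl <- ndlClassify(CONSTRUCTION ~ VERB + PARTICIPLE + REDUCED, data = loc)
    summary(loc.ndl)

    # Ten-fold cross-validation as described above: each tenth of the
    # data is held out once for validation.
    loc.cv <- ndlCrossvalidate(CONSTRUCTION ~ VERB + PARTICIPLE + REDUCED,
                               data = loc)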

Naive discriminative learning: variable importance

[figure: variable importance as measured by naive discriminative learning]

Comparison of performance

Logistic regression:           C (index of concordance) = 0.96, accuracy = 89%
Tree & forest:                 C (index of concordance) = 0.96, accuracy = 89%
Naive discriminative learning: C (index of concordance) = 0.96, accuracy = 88%
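
One way to compute these two measures for the logistic model in R, assuming the loc.glm fit from above; somers2() from the Hmisc package returns C, and the level label "theme" is an assumption about how the data are coded.

    library(Hmisc)
    probs <- predict(loc.glm, type = "response")  # fitted probabilities
    y <- as.numeric(loc$CONSTRUCTION == "theme")  # 1 = theme, 0 = goal (assumed labels)
    somers2(probs, y)["C"]    # index of concordance (equivalently, the AUC)
    mean((probs > 0.5) == y)  # classification accuracy at the 0.5 cutoff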

Conclusions

- Logistic regression, tree & forest, and naive discriminative learning give very similar results in terms of overall performance and assessment of factors
- In addition to the gruzit’ ‘load’ dataset, we have tested these tools on three other datasets and found remarkable convergence between the models
- Tree & forest and naive discriminative learning models can reliably be used to complement logistic regression
- These tools can empower cognitive linguists to find the structure in rival-forms data and to consider how these patterns are learned