Lexico-grammar: From simple counts to complex models

Slides:



Advertisements
Similar presentations
Contingency Tables Chapters Seven, Sixteen, and Eighteen Chapter Seven –Definition of Contingency Tables –Basic Statistics –SPSS program (Crosstabulation)
Advertisements

Logistic Regression.
Simple Logistic Regression
Chapter 11 Contingency Table Analysis. Nonparametric Systems Another method of examining the relationship between independent (X) and dependant (Y) variables.
Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. More About Categorical Variables Chapter 15.
Lecture 23: Tues., April 6 Interpretation of regression coefficients (handout) Inference for multiple regression.
Logistic Regression Biostatistics 510 March 15, 2007 Vanessa Perez.
Notes on Logistic Regression STAT 4330/8330. Introduction Previously, you learned about odds ratios (OR’s). We now transition and begin discussion of.
An Introduction to Logistic Regression
Chapter 7 Correlational Research Gay, Mills, and Airasian
Summary of Quantitative Analysis Neuman and Robson Ch. 11
Generalized Linear Models
Statistical hypothesis testing – Inferential statistics II. Testing for associations.
Categorical Data Prof. Andy Field.
Aaker, Kumar, Day Ninth Edition Instructor’s Presentation Slides
© 2008 McGraw-Hill Higher Education The Statistical Imagination Chapter 13: Nominal Variables: The Chi-Square and Binomial Distributions.
Repeated Measures  The term repeated measures refers to data sets with multiple measurements of a response variable on the same experimental unit or subject.
Social Science Research Design and Statistics, 2/e Alfred P. Rovai, Jason D. Baker, and Michael K. Ponton Pearson Chi-Square Contingency Table Analysis.
Logistic Regression Database Marketing Instructor: N. Kumar.
Linear correlation and linear regression + summary of tests
Assessing Binary Outcomes: Logistic Regression Peter T. Donnan Professor of Epidemiology and Biostatistics Statistics for Health Research.
Statistical test for Non continuous variables. Dr L.M.M. Nunn.
Going from data to analysis Dr. Nancy Mayo. Getting it right Research is about getting the right answer, not just an answer An answer is easy The right.
1 Follow the three R’s: Respect for self, Respect for others and Responsibility for all your actions.
1 G Lect 7a G Lecture 7a Comparing proportions from independent samples Analysis of matched samples Small samples and 2  2 Tables Strength.
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 11 Analyzing the Association Between Categorical Variables Section 11.2 Testing Categorical.
Week 13a Making Inferences, Part III t and chi-square tests.
Chapter 15 The Chi-Square Statistic: Tests for Goodness of Fit and Independence PowerPoint Lecture Slides Essentials of Statistics for the Behavioral.
Bivariate Association. Introduction This chapter is about measures of association This chapter is about measures of association These are designed to.
I. ANOVA revisited & reviewed
Introduction to Marketing Research
Cross Tabulation with Chi Square
Lecture #8 Thursday, September 15, 2016 Textbook: Section 4.4
32931 Technology Research Methods Autumn 2017 Quantitative Research Component Topic 4: Bivariate Analysis (Contingency Analysis and Regression Analysis)
BINARY LOGISTIC REGRESSION
Logistic Regression When and why do we use logistic regression?
EHS Lecture 14: Linear and logistic regression, task-based assessment
Logistic Regression APKC – STATS AFAC (2016).
Notes on Logistic Regression
Chapter 13 Nonlinear and Multiple Regression
Making Comparisons All hypothesis testing follows a common logic of comparison Null hypothesis and alternative hypothesis mutually exclusive exhaustive.
Basic Estimation Techniques
8. Association between Categorical Variables
Association between two categorical variables
Bi-variate #1 Cross-Tabulation
Categorical Data Aims Loglinear models Categorical data
Multiple Regression.
Generalized Linear Models
Multiple logistic regression
The Chi-Square Distribution and Test for Independence
Chi Square Two-way Tables
If we can reduce our desire,
Statistics in SPSS Lecture 9
Statistical Analysis using SPSS
Chapter 10 Analyzing the Association Between Categorical Variables
LEARNING OUTCOMES After studying this chapter, you should be able to
Logistic Regression.
Computing A Variable Mean
Introduction to Logistic Regression
Change over time: Working with diachronic data
BIVARIATE ANALYSIS: Measures of Association Between Two Variables
Analyzing the Association Between Categorical Variables
Register variation: correlation, clusters and factors
BIVARIATE ANALYSIS: Measures of Association Between Two Variables
Comparing Two Variables
Applied Statistics Using SPSS
One-Factor Experiments
Applied Statistics Using SPSS
Logistic Regression.
Nazmus Saquib, PhD Head of Research Sulaiman AlRajhi Colleges
Presentation transcript:

Lexico-grammar: From simple counts to complex models Brezina, V. (2018). Statistics in Corpus Linguistics: A Practical Guide. Cambridge: Cambridge University Press.

“If these words are not essential to the meaning of your sentence, use ‘which’ and separate the words with a comma” (Microsoft 2010).

Think about and discuss What is a grammatical rule? Do you think grammatical rules apply in all cases? Brezina, V. (2018). Statistics in Corpus Linguistics: A Practical Guide. Cambridge: Cambridge University Press.

Where to start? Brezina, V. (2018). Statistics in Corpus Linguistics: A Practical Guide. Cambridge: Cambridge University Press.

Lexico-grammar: Research design Linguistic feature design explanatory variables/predictors Linguistic/outcome variable Brezina, V. (2018). Statistics in Corpus Linguistics: A Practical Guide. Cambridge: Cambridge University Press.

Lexico-grammatical frame Opportunity of use (obligatory place). Place where variation happens in text. E.g. NOUN + which/that + relative clause Brezina, V. (2018). Statistics in Corpus Linguistics: A Practical Guide. Cambridge: Cambridge University Press.

Lexico-grammatical frame (cont.) Research question Outcome variable options Lexico-grammatical frame When do we use the passive construction? ACTIVE, PASSIVE All verb forms that can be used in passive i.e. transitive verbs. In what contexts do we use which and in what contexts that in relative clauses? which, that All relative clauses. When do speakers use that deletion? E.g. I think Ø this is good. that, Ø [no relativizer] All clauses where that occurs or is deleted. What is the difference between various modal expressions of strong obligation? must, have to, need to All contexts in which strong deontic modals occur. Brezina, V. (2018). Statistics in Corpus Linguistics: A Practical Guide. Cambridge: Cambridge University Press.

Lexico-grammatical vs. ‘ambient’ variables It's about time that was done [BNC, file: KBB]. Well, you know, it you see, time were, I don't know I suppose, I don't know but I never seemed to be afraid... [BNC, file: HDK]. Brezina, V. (2018). Statistics in Corpus Linguistics: A Practical Guide. Cambridge: Cambridge University Press.

Cross-tabulation and mosaic plot Presence of separator Relativizer   separator (, or –) no separator Total which 1,396 (63%) 804 (37%) 2,200 that 191 (3%) 7,281 (97%) 7,472 1,587 8,085 9,672 Brezina, V. (2018). Statistics in Corpus Linguistics: A Practical Guide. Cambridge: Cambridge University Press.

Cross-tabulation and mosaic plot (cont.) Presence of separator Relativizer   separato r (, or –) no separator Total which 1,396 (63%) 804 (37%) 2,200 that 191 (3%) 7,281 (97%) 7,472 1,587 8,085 9,672 Brezina, V. (2018). Statistics in Corpus Linguistics: A Practical Guide. Cambridge: Cambridge University Press.

Percentages and chi-squared test Percentages in a cross-tabulation table = cell value relevant total ×100 Chi-squared =Sum for all cells of (observed frequency −expected frequency) 2 expected frequency Assumptions: Independence of observations. Expected frequencies greater than 5 (In contingency tables larger than 2 × 2 at least 80% of expected frequencies greater than 5). Alternative tests: Log likelihood test (also known as likelihood ratio test or G test) or the Fisher exact test. Brezina, V. (2018). Statistics in Corpus Linguistics: A Practical Guide. Cambridge: Cambridge University Press.

Complex model: logistic regression  Variety  Separator (, or –)  Clause Syntax Relativizer  Total    that  which American NO Non-restrictive Object 3 1 4 Subject 11 2 13 Restrictive 18 20 126 5 131 YES 6  British 8 14 22 76 15 91 31 Total 256 104 360 Presence of separator Relativizer   separato r (, or –) no separator Total which 1,396 (63%) 804 (37%) 2,200 that 191 (3%) 7,281 (97%) 7,472 1,587 8,085 9,672 Brezina, V. (2018). Statistics in Corpus Linguistics: A Practical Guide. Cambridge: Cambridge University Press.

Logistic regression Outcome: nominal, (ordinal) Statistical model: Combination of predictors with different weights, linguistically: patterns/'rules' of lexico-grammar Category A e.g. that Category B e.g. which [...] Predictors: nominal, ordinal, scale Brezina, V. (2018). Statistics in Corpus Linguistics: A Practical Guide. Cambridge: Cambridge University Press.

Logistic regression: Dataset explanatory variables/predictors Linguistic/outcome variable nominal nominal nominal nominal scale nominal Brezina, V. (2018). Statistics in Corpus Linguistics: A Practical Guide. Cambridge: Cambridge University Press.

Logistic regression: output Model 1 with predictor variables (‘Variety’, ‘Separator’, ‘Clause type’, ‘Syntax’ and ‘Length’) is significant (LL: 222.31; p < .0001) and has outstanding classification properties (C-index: 0.91). Estimate (log odds) Standard Error Z value (Wald) p-value Estimate (odds) 95% CI lower 95% CI upper (Intercept) -3.354 0.563 -5.958 0.000 0.035 0.011 0.099 VarietyB_BR 1.667 0.397 4.195 5.296 2.511 12.080 SeparatorB_YES 3.985 0.825 4.832 53.795 12.876 376.448 ClauseB_Non_restr 2.046 0.446 4.588 7.733 3.235 18.812 SyntaxB_Subject -0.614 0.421 -1.460 0.144 0.541 0.240 1.260 Length 0.079 0.029 2.739 0.006 1.083 1.023 1.147 Brezina, V. (2018). Statistics in Corpus Linguistics: A Practical Guide. Cambridge: Cambridge University Press.

“If these words are not essential to the meaning of your sentence, use ‘which’ and separate the words with a comma” (Microsoft 2010). Was the computer right after all? If the suggestion by the computer were to be taken as a categorical rule, the answer is certainly ‘no’. There is a combination of multiple factors that favour or disfavour the use of which (and that) and these factors have to be interpreted as probabilities (or odds, to be precise), not certainty.

Things to remember When analysing lexico-grammatical variation we need to pay attention to individual linguistic contexts and define a lexico-grammatical frame. Cross-tabulation can be used for a simple analysis of categorical variables. In addition to frequencies, crosstab tables can also include percentages based on row totals (most useful for investigation of lexico-grammar), column totals and the grand total. The data in cross-tab tables can be effectively visualized using mosaic plots. We can test the statistical significance of the relationship between variables in a two-way crosstab table (i.e. a table with one linguistic and one explanatory variable) using the chi-squared test. The effect sizes reported are Cramer’s V (overall effect) and probability or odds ratios (individual effects). Logistic regression is a sophisticated multivariable method for analysing the effect of different predictors (both categorical and scale) on a categorical (typically binary) outcome variable. Brezina, V. (2018). Statistics in Corpus Linguistics: A Practical Guide. Cambridge: Cambridge University Press.