Russian Verbs Across Aspect and Genre

Slides:

Advertisements

Similar presentations

Mutidimensional Data Analysis Growth of big databases requires important data processing.  Need for having methods allowing to extract this information.

Advertisements

Estimation in Sampling

Statistical Tools for Evaluating the Behavior of Rival Forms: Logistic Regression, Tree & Forest, and Naive Discriminative Learning R. Harald Baayen University.

A quick introduction to the analysis of questionnaire data John Richardson.

Objective 1 Interpret graphs. Slide Linear Equations in Two Variables; The Rectangular Coordinate System.

Relationships Among Variables

Measures of Central Tendency

Cross Tabulation and Chi-Square Testing. Cross-Tabulation While a frequency distribution describes one variable at a time, a cross-tabulation describes.

Statistical Methods For Engineers ChE 477 (UO Lab) Larry Baxter & Stan Harding Brigham Young University.

This Week: Testing relationships between two metric variables: Correlation Testing relationships between two nominal variables: Chi-Squared.

Class Meeting #11 Data Analysis. Types of Statistics Descriptive Statistics used to describe things, frequently groups of people.  Central Tendency 

APPENDIX B Data Preparation and Univariate Statistics How are computer used in data collection and analysis? How are collected data prepared for statistical.

PSYCHOLOGY: Themes and Variations Weiten and McCann Appendix B : Statistical Methods Copyright © 2007 by Nelson, a division of Thomson Canada Limited.

QBM117 Business Statistics Descriptive Statistics Numerical Descriptive Measures.

T-TEST Statistics The t test is used to compare to groups to answer the differential research questions. Its values determines the difference by comparing.

I Introductory Material A. Mathematical Concepts Scientific Notation and Significant Figures.

Counseling Research: Quantitative, Qualitative, and Mixed Methods, 1e © 2010 Pearson Education, Inc. All rights reserved. Basic Statistical Concepts Sang.

Section 9-1: Inference for Slope and Correlation Section 9-3: Confidence and Prediction Intervals Visit the Maths Study Centre.

Copyright © Cengage Learning. All rights reserved. 13 Linear Correlation and Regression Analysis.

Testing hypotheses Continuous variables. H H H H H L H L L L L L H H L H L H H L High Murder Low Murder Low Income 31 High Income 24 High Murder Low Murder.

11/23/2015Slide 1 Using a combination of tables and plots from SPSS plus spreadsheets from Excel, we will show the linkage between correlation and linear.

Chapter 3 Section 1. Objectives 1 Copyright © 2012, 2008, 2004 Pearson Education, Inc. Linear Equations in Two Variables; The Rectangular Coordinate System.

Correlation & Regression Analysis

Hypothesis test flow chart frequency data Measurement scale number of variables 1 basic χ 2 test (19.5) Table I χ 2 test for independence (19.9) Table.

Data Analysis. Qualitative vs. Quantitative Data collection methods can be roughly divided into two groups. It is essential to understand the difference.

Copyright © 2016 Brooks/Cole Cengage Learning Intro to Statistics Part II Descriptive Statistics Intro to Statistics Part II Descriptive Statistics Ernesto.

Copyright © 2009 Pearson Education, Inc t LEARNING GOAL Understand when it is appropriate to use the Student t distribution rather than the normal.

Bivariate Association. Introduction This chapter is about measures of association This chapter is about measures of association These are designed to.

Chapter 13 Linear Regression and Correlation. Our Objectives  Draw a scatter diagram.  Understand and interpret the terms dependent and independent.

Slide 1 Copyright © 2004 Pearson Education, Inc.  Descriptive Statistics summarize or describe the important characteristics of a known set of population.

Predicting Russian Aspect: A corpus study and an experiment

Chapter 2 Linear regression.

Chapter 12 Understanding Research Results: Description and Correlation

Step 1: Specify a null hypothesis

How learnable is Russian aspect?

Multiple Regression Prof. Andy Field.

Travelling to School.

Tennessee Adult Education 2011 Curriculum Math Level 3

Correlation and Simple Linear Regression

Intro to Statistics Part II Descriptive Statistics

Statistics: The Z score and the normal distribution

The Metaphorical Shape of Actions: Verb Classifiers in Russian

Intro to Statistics Part II Descriptive Statistics

Hypothesis Testing and Confidence Intervals (Part 1): Using the Standard Normal Lecture 8 Justin Kern October 10 and 12, 2017.

CHAPTER 11 Inference for Distributions of Categorical Data

Understanding Standards Event Higher Statistics Award

POSC 202A: Lecture Lecture: Substantive Significance, Relationship between Variables 1.

Social Research Methods

Chapter 11: Inference for Distributions of Categorical Data

7.3 Best-Fit Lines and Prediction

CHAPTER 11 Inference for Distributions of Categorical Data

CHAPTER 11 Inference for Distributions of Categorical Data

Four Languages Verbs from the Bottom up

Product moment correlation

Volume 3, Issue 1, Pages (July 2016)

15.1 The Role of Statistics in the Research Process

CHAPTER 11 Inference for Distributions of Categorical Data

The Examination of Residuals

Facts from figures Having obtained the results of an investigation, a scientist is faced with the prospect of trying to interpret them. In some cases the.

CHAPTER 11 Inference for Distributions of Categorical Data

Constructing and Interpreting Visual Displays of Data

CHAPTER 11 Inference for Distributions of Categorical Data

CHAPTER 11 Inference for Distributions of Categorical Data

CHAPTER 11 Inference for Distributions of Categorical Data

CHAPTER 11 Inference for Distributions of Categorical Data

CHAPTER 11 Inference for Distributions of Categorical Data

MGS 3100 Business Analysis Regression Feb 18, 2016

REGRESSION ANALYSIS 11/28/2019.

Presentation transcript:

Russian Verbs Across Aspect and Genre Laura A. Janda UiT The Arctic University of Norway with: Hanne M. Eckhoff and Olga Lyashevskaya

Aspect in Russian (a crash course) All forms of all verbs obligatorily express perfective vs. imperfective aspect Perfective aspect: unique, complete events with crisp boundaries Pisatel’ na-pisal/na-pišet roman ‘The writer has written/will write a novel’ Imperfective aspect: ongoing or repeated events without crisp boundaries Pisatel’ pisal/pišet roman ‘The writer was writing/is writing a novel’ Morphological marking is very helpful, but not entirely reliable: bare verb: usually imperfective (pisat’ ‘write’), some biaspectual (ženit’sja ‘marry’), a few perfective (dat’ ‘give’) prefix + verb: usually perfective (pere-pisat’ ‘rewrite’), some imperfective (pre-obladat’ ‘prevail’, pere-xodit’ ‘walk across’) prefix + verb + suffix: imperfective (pere-pis-yva-t’ ‘rewrite’)

How do speakers know which verbs are perfective and which ones are imperfective? Can the aspect of individual verbs can be predicted based on the statistical distribution of their inflectional forms? How does this compare with prediction based on aspectual morphology? Is genre a factor? It is unclear how children acquire Russian aspect in L1 Generativist theory would assume that aspect is part of UG Gvozdev (1961), based on his diary of son Ženja, claimed Russian aspect was fully acquired early on, but re-analysis of his and other data (Stoll 2001, Gagarina 2004) has shown that L1 acquisition is far from complete even at age 6

Where the aspects do and do not compete Paradigmatically competing: Non-past (future if perfective, present if imperfective) Past Imperative Infinitive Paradigmatically non-competing: Past gerunds and participles are perfective Present gerunds and participles are imperfective We examine the cues available via the distribution of paradigmatic forms, also known as “grammatical profiles”

Aspect via Grammatical Profiles Janda & Lyashevskaya (2011) showed that, for paired verbs, perfective and imperfective verbs have in aggregate different grammatical profiles This was a top-down approach (we started out by segregating perfective from imperfective verbs) and was limited to paired verbs Can aspect be approached bottom-up? Is it possible to figure out the aspect of individual verbs of all types (not just paired verbs) based only on the distribution of their grammatical forms in a corpus? Goldberg (2006) gives evidence that children are sensitive to statistical tendencies in L1 acquisition Could children learn to distinguish between perfective and imperfective verbs by supplementing morphological cues with paradigmatic cues (the distributions of verb forms)?

What is a grammatical profile? Verbs have different forms: eat 749 M eats 121 M eating 514 M eaten 88.8 M ate 258 M Figures from google – this is NOT a scientific way to do grammatical profiles, but it gives us a rough approximation to illustrate The grammatical profile of eat

chi-squared = 947756 df = 3 p-value < 2.2e-16 effect size Janda & Lyashevskaya 2011 Grammatical Profiles of Russian Verbs Top-Down Nonpast Past Infinitive Imperative Imperfective 1,330,016 915,374 482,860 75,717 Perfective 375,170 1,972,287 688,317 111,509 chi-squared = 947756 df = 3 p-value < 2.2e-16 effect size (Cramer’s V) = 0.399 (medium-large) Based on verbs with 100 or more attestations in RNC

Can we turn this upside-down and go Bottom-Up? Janda & Lyashevskaya 2011 Grammatical Profiles of Russian Verbs Top-Down Nonpast Past Infinitive Imperative Imperfective 1,330,016 915,374 482,860 75,717 Perfective 375,170 1,972,287 688,317 111,509 Can we turn this upside-down and go Bottom-Up? Based on verbs with 100 or more attestations in RNC

Grammatical Profiles of Russian Verbs Bottom-Up Data extracted from the manually disambiguated Morphological Standard of the Russian National Corpus (approx. 6M words), 1991-2012 Stratified by genre, 0.4M word sample for each Genre # Verb Tokens # Verb Lemmas # Verb Lemmas Frequency >50 Journalistic 52,716 5,940 185 Fiction 78,084 8,665 225 Scientific-Technical 43,528 4,494 172

Correspondence Analysis of Journalistic Data Input: 185 vectors (1 for each verb) of frequencies for verb forms Each vector tells how many forms were found for each verbal category: indicative non-past, indicative past, indicative future, imperative, infinitive, non-past gerund, past gerund, non-past participle, past participle rows are verbs, columns are verbal categories Process: Matrices of distances are calculated for rows and columns and represented in a multidimensional space defined by factors that are mathematical constructs. Factor 1 is the mathematical dimension that accounts for the largest amount of variance in the data, followed by Factor 2, etc. Plot of the first two (most significant) Factors, with Factor 1 as x-axis and Factor 2 as the y-axis You can think of Factor 1 as the strongest parameter that splits the data into two groups (negative vs. positive values on the x-axis)

On the Following Slide… Results of correspondence analysis for Journalistic data Perfective verbs represented as “p” Imperfective verbs represented as “i” Remember that the program was not told the aspect of the verbs All it was told was the frequency distributions of grammatical forms All it was asked to do was to construct the strongest mathematical Factor that separates the data along a continuum from negative to positive (x-axis)

Factor 1 looks like aspect Perfective Imperfective Factor 1 looks like aspect

Factor 1 correctly predicts aspect 91. 3% (negative = perfective vs Factor 1 correctly predicts aspect 91.3% (negative = perfective vs. positive = imperfective) Of the 185 verbs: 87 perfectives 84 negative values, 3 positive values, so 96.6% correct 3 deviations are: obojtis’ ‘make do without’, smoč’ ‘manage’, prijtis’ ‘be necessary’ 96 imperfectives 83 positive values, 13 negative values, so 86.5% correct 13 deviations are: ezdit’ ‘ride’, rešat’ ‘decide’, xodit’ ‘walk’, prinimat’ ‘receive’, iskat’ ‘seek’, rassčityvat’ ‘estimate’, provodit’ ‘carry out’, ožidat’ ‘expect’, borot’sja ‘struggle’, platit’ ‘pay’, čitat’ ‘read’, učastvovat’ ‘participate’, smotret’ ‘look’ 2 biaspectuals with low negative values obeščat’ ‘promise’, ispol’zovat’ ‘use’

What is the basis for these statements? Interpretation of 0 (zero) Value for Factor 1 in Correspondence Analysis Kamphuis 2016: 78-79: “The first thing we notice is that the verbs that are shown in the scatter plot and in the corresponding table (Eckhoff & Janda 2014: 240-241) show a continuum and not a clear division into two groups. This makes the vertical line drawn at 0 (zero), dividing the lefties and the righties, look arbitrary. The zero does not have a clear meaning, nor does it seem to be a natural boundary in the scatter plot.” Kamphuis 2016: 152: “The lines are just as arbitrary as the line that Eckhoff & Janda (2014: 238) draw at zero and have no consequences for the final assessment of the aspect of the verbs in the group.” What is the basis for these statements?

R. Harald Baayen “the loadings on axis/factors/principal components/corresp. analysis dimensions are some form of correlation, that tell you to what extent your original variable is correlated with your new axis. So if you have a loading of zero, it means the original factor does not predict your current axis, they are "uncorrelated". A large positive loading indicates the original predictor axis and the new axis are very similar (highly correlated, small angle between their vectors), and a large negative loading indicates they point in opposite directions. ”

Natalia Levshina “The origin (zero point) represents the centre of gravity…, or, in other words, the centroid (average) of the column and row profiles. The further a point representing a row or a column from the origin, the greater the chi-squared distance from it to the centroid and the more it should contribute to the inertia (the CA parlance for variance). So, the zero point is by no means arbitrary or meaningless!”

Stefan Th. Gries “the 0-point is not arbitrary”

Plot for Fiction

Overview of sorting of verbs for Fiction All Verbs Perfective Verbs Imperfective Verbs Biaspectual Verbs Total # Verbs 225 114 110 1 # Deviations 19 6 13 NA Factor 1 Correctly Predicts Aspect 91.5% 94.7% 88.1%

Plot for Scientific-Technical Prose

Overview of sorting of verbs for Scientific-Technical Prose All Verbs Perfective Verbs Imperfective Verbs Biaspectual Verbs Total # Verbs 172 64 103 5 # Deviations 7 2 NA Factor 1 Correctly Predicts Aspect 95.8% 97.0% 95.2%

Comparison of Accuracy of Paradigmatic vs. Morphological Cues Chi-square tests reveal no significant differences between accuracy of paradigmatic cues and morphological cues Paradigmatic cues are just as reliable as morphological cues for predicting the aspect of verbs

e.g. provodit´ ‘conduct’ e.g. vyjti ‘exit’ e.g. dat´ ‘give’ e.g. napisat´ ‘write’ e.g. xodit´ ‘walk’ e.g. idti ‘walk’ e.g. pokazyvat´ ‘show’ e.g. obeščat´ ‘promise’ e.g. igrat´ ‘play’ Aspectual morphology and Factor 1 values for Journalistic prose

e.g. provodit´ ‘conduct’ e.g. vyjti ‘exit’ e.g. dat´ ‘give’ e.g. napisat´ ‘write’ e.g. xodit´ ‘walk’ Perfective verbs (aside from smoč´, prijtis´, obojtis´) are well-behaved e.g. idti ‘walk’ e.g. pokazyvat´ ‘show’ e.g. obeščat´ ‘promise’ e.g. igrat´ ‘play’ Aspectual morphology and Factor 1 values for Journalistic prose

Biaspectuals and indeterminate motion verbs e.g. provodit´ ‘conduct’ e.g. vyjti ‘exit’ e.g. dat´ ‘give’ e.g. napisat´ ‘write’ e.g. xodit´ ‘walk’ Biaspectuals and indeterminate motion verbs lie near the line, but with perfectives e.g. idti ‘walk’ e.g. pokazyvat´ ‘show’ e.g. obeščat´ ‘promise’ e.g. igrat´ ‘play’ Aspectual morphology and Factor 1 values for Journalistic prose

Other imperfectives are mostly well-behaved e.g. provodit´ ‘conduct’ e.g. vyjti ‘exit’ Other imperfectives are mostly well-behaved e.g. dat´ ‘give’ e.g. napisat´ ‘write’ e.g. xodit´ ‘walk’ e.g. idti ‘walk’ e.g. pokazyvat´ ‘show’ e.g. obeščat´ ‘promise’ e.g. igrat´ ‘play’ Aspectual morphology and Factor 1 values for Journalistic prose

Summary: Paradigmatic Perspective When we look at the distribution of verb forms, aspect (or a close approximation) emerges as the most important factor distinguishing verbs It is possible to sort high-frequency verbs as perfective vs. imperfective based only on the distribution of their forms with about 93% accuracy Individual verbs can deviate from overall patterns Alignment of paradigmatic and morphological cues is clearly demonstrated

Paradigmatic and morphological cues compared The two major classes of verbs with unambiguous aspectual morphology, namely the prefixed perfectives (PFperf) and secondary imperfectives (2Impv), also have their medians maximally far from the zero line. These two classes of verbs clearly behave as prototypes for the two aspects. Both of these classes allow considerable freedom in distributions of grammatical forms for individual verbs, which is reasonable since they are clearly marked morphologically. Grammatical profiles are most unambiguous (non-overlapping) precisely for the two classes of verbs which lack morphological marking: the simplex verbs that are either perfective or imperfective. Here the paradigmatic cues play a particularly important role, making it possible to distinguish perfective from imperfective verbs in the absence of morphological cues.