Linguistic Profiling Laura A. Janda


1 Linguistic Profiling Laura A. Janda
CLEAR (Cognitive Linguistics: Empirical Approaches to Russian) UiT The Arctic University of Norway

2 A Big Question perspective
Big Questions: transcend theory; interesting for all linguists
Theory: helps to focus Big Questions
Operationalization: facilitates quantitative methods
The flow is not necessarily unidirectional: you might notice something quantitatively and then realize it has theoretical or Big Question implications. The important thing is to keep perspective across all three.

3 Overview
1. Big Questions
2. Theoretical perspective
3. Operationalization
4. Portable
5. Multipurpose
6. Examples
7. Infrastructure
8. Applications

4 1. Some Big Questions What is the relationship between form and meaning? What is the relationship between lexicon and grammar? What is the structure of linguistic categories? What is the structure of linguistic constructions?

5 2. Theoretical perspective: Cognitive linguistics
Minimal Assumption: language can be accounted for in terms of general cognitive strategies
--no autonomous language faculty
--no strict division between grammar and lexicon
--no a priori universals
Usage-Based: generalizations emerge from language data
--no strict division between langue and parole
--no underlying forms
Meaning is Central: holds for all language phenomena
--no semantically empty forms
--differences in behavior are motivated (but not specifically predicted) by differences in meaning

6 Big Questions focused by Cognitive Linguistics
What is the relationship between form and meaning?
--How does form reflect meaning?
--Can we use difference in form as a measure of meaning?
What is the relationship between lexicon and grammar?
--How do we account for meaning in grammar?
--Can we use similar models for grammatical meanings?
What is the structure of linguistic categories?
--What is the relationship between prototype and periphery?
--Can we compare category structure across near synonyms?
What is the structure of linguistic constructions?
--Are constructions hierarchical or flat?
--What is the relationship between constructions and fillers?

7 3. Operationalization: Linguistic profiles
Focused subsets of behavioral profiles (Firth 1957, Harris 1970, Hanks 1996, Geeraerts et al. 1999, Speelman et al. 2003, Divjak & Gries 2006, Gries & Divjak 2009)
--Grammatical profiling: relationship between frequency distribution of forms and linguistic categories
--Semantic profiling: relationship between meanings (semantic tags) and forms
--Constructional profiling: relationship between frequency distribution of grammatical constructions and meaning
--Radial category profiling: differences in the frequency distribution of uses across two or more near-synonyms
--Collostructional profiling: relationship between a construction and the words that fill its slots
Relate also to Speelman et al.’s use of profiling...

8 4. Portable
Linguistic profiles are portable:
--across questions
--across theories
--across statistical models
--across languages
Linguistic profiles are a suite of methodological ideas that make it possible to approach Big Questions empirically from a variety of angles
Ideally, results are also portable across platforms: open source, open access, available to all researchers

9 5. Multipurpose Quantitatively measured results yield real gains in our understanding of languages These results can serve multiple purposes: resources for language learners and users (real, not statistical) machine translation documentation and revitalization for minority indigenous languages language policy

10 6. Examples
Grammatical Profiles: TAM in Russian
Semantic Profiles: “Empty” prefixes in Russian
Constructional Profiles: SADNESS in Russian
Radial Category Profiles: Ambipositions in North Saami
For each example we will identify:
--Big Questions
--Theoretical perspective
--Operationalization (Profiling) & statistical methods
--Portability
--Multipurpose applications

11 Grammatical Profiles: TAM in Russian
Janda, L. A. & Lyashevskaya, O “Grammatical profiles and the interaction of the lexicon with aspect, tense and mood in Russian”. Cognitive Linguistics 22:4 (2011),

12 Crash course in Russian TAM
Tense: Past vs. Non-Past
--Non-Past: Imperfective = Present vs. Perfective = Future
Aspect: Perfective (marked) vs. Imperfective (unmarked)
--All forms of all verbs express aspect
--“Aspectual pairs” = same lexical meaning, different aspect, e.g., pisat’ ‘write[imperfective]’ vs. napisat’ ‘write[perfective]’
--Aspectual pairs can be formed via both prefixation and suffixation (perepisat’ ‘rewrite[perfective]’ vs. perepisyvat’ ‘rewrite[imperfective]’)
--≈1400 imperfective base stems form ≈2000 perfective aspectual partners using 16 prefixes
--≈20K perfective stems form imperfective partners using 3 suffixes
--These affixes are traditionally assumed to be “empty”
Mood: imperative, infinitives in modal constructions

13 Grammatical Profiles: TAM in Russian
Big Questions: What is the relationship between form and meaning? ➜ between verb inflection and grammatical meaning of aspect? What is the relationship between lexicon and grammar? ➜ between lexical meaning of verbs and TAM?

14 Grammatical Profiles: TAM in Russian
Theoretical focus: Can we measure the expression of aspect according to distribution of inflected forms? Can we distinguish between prefixation vs. suffixation in formation of aspectual pairs? Can we measure the attraction of lexical classes to grammatical categories?

15 Grammatical Profiles: TAM in Russian
Operationalization: Grammatical profiles: frequency distribution of inflected forms ➜Distribution of Russian verb forms according to subparadigm ➜Distribution of Russian verbs according to subparadigm Data: Approx. 6M verb forms from the Russian National Corpus ( ) Statistics: Chi-square, Cramer’s V effect size, distribution plots

16 What is a grammatical profile?
Verbs have different forms:
eat M
eats 121 M
eating 514 M
eaten M
ate 258 M
The grammatical profile of eat
Figures from Google: this is NOT a scientific way to do grammatical profiles, but it gives us a rough approximation to illustrate
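
To make the idea concrete, here is a minimal sketch of a grammatical profile as a relative frequency distribution over inflected forms. The token list is a hypothetical mini-sample invented for illustration (it does not reproduce the Google figures on the slide), and the function name `grammatical_profile` is likewise an assumption.

```python
from collections import Counter


def grammatical_profile(forms):
    """Return the share of each inflected form among all attested forms."""
    counts = Counter(forms)
    total = sum(counts.values())
    return {form: count / total for form, count in counts.items()}


# Hypothetical mini-sample of corpus hits for the lemma "eat" (not real counts).
sample = ["eat"] * 4 + ["eats"] * 1 + ["eating"] * 2 + ["eaten"] * 1 + ["ate"] * 2

profile = grammatical_profile(sample)
for form, share in sorted(profile.items(), key=lambda item: -item[1]):
    print(f"{form:<8}{share:.0%}")
```

With real data, the list would come from a tagged corpus query rather than a hand-built sample; the profile itself is just the normalized count vector.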

17

18 Grammatical Profiles of Russian Verbs
              Nonpast     Past        Infinitive  Imperative
Imperfective  1,330,016   915,374     482,860     75,717
Perfective    375,170     1,972,287   688,317     111,509
chi-squared = …, df = 3, p-value < 2.2e-16
effect size (Cramer’s V) = 0.399 (medium-large)
Based on verbs with 100 or more attestations in RNC
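
The statistics on this slide can be recomputed from the eight counts alone. The sketch below is a plain-Python Pearson chi-square and Cramér's V, not the authors' own code (which is not shown in the slides); the helper names are assumptions. It also prints each aspect's profile as row percentages.

```python
import math

# Counts copied from the slide: rows = aspect, columns = subparadigm.
COLUMNS = ["Nonpast", "Past", "Infinitive", "Imperative"]
TABLE = {
    "Imperfective": [1_330_016, 915_374, 482_860, 75_717],
    "Perfective": [375_170, 1_972_287, 688_317, 111_509],
}


def chi_square(rows):
    """Pearson chi-square statistic, degrees of freedom, and sample size."""
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, df, n


def cramers_v(stat, n, n_rows, n_cols):
    """Cramér's V effect size for an n_rows x n_cols contingency table."""
    return math.sqrt(stat / (n * (min(n_rows, n_cols) - 1)))


rows = list(TABLE.values())
stat, df, n = chi_square(rows)
v = cramers_v(stat, n, len(rows), len(COLUMNS))

for aspect, counts in TABLE.items():
    shares = ", ".join(
        f"{col}: {count / sum(counts):.1%}" for col, count in zip(COLUMNS, counts)
    )
    print(f"{aspect}: {shares}")
print(f"chi-squared = {stat:.0f}, df = {df}, Cramer's V = {v:.3f}")
```

Run on these counts, the effect size rounds to 0.399 with df = 3, in agreement with the values reported on the slide.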

19 Distribution of Russian verb forms according to subparadigm
Prefixation (dark) vs. suffixation (light): statistically significant, BUT effect sizes too small (0.076 & 0.037)

20 Distribution of Russian verbs according to subparadigm:
Imperfective verbs and their attraction to imperative
Over 200 outliers
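
One way such outliers could be picked out is sketched below. It assumes the common boxplot convention (a verb is an outlier if its imperative share lies more than 1.5 interquartile ranges above the third quartile); the slides mention distribution plots but do not state the exact criterion, so this rule is an assumption. The verb labels and counts are hypothetical.

```python
from statistics import quantiles

# Hypothetical (imperative forms, all forms) counts per imperfective verb.
HYPOTHETICAL_COUNTS = {
    "verb_a": (2, 200), "verb_b": (4, 200), "verb_c": (4, 200),
    "verb_d": (6, 200), "verb_e": (6, 200), "verb_f": (6, 200),
    "verb_g": (8, 200), "verb_h": (8, 200), "verb_i": (10, 200),
    "verb_j": (10, 200), "verb_k": (60, 200), "verb_l": (90, 200),
}


def imperative_outliers(counts):
    """Return verbs whose imperative share exceeds the upper boxplot fence."""
    shares = {verb: imp / total for verb, (imp, total) in counts.items()}
    q1, _median, q3 = quantiles(list(shares.values()), n=4)
    upper_fence = q3 + 1.5 * (q3 - q1)
    return sorted(verb for verb, share in shares.items() if share > upper_fence)


outliers = imperative_outliers(HYPOTHETICAL_COUNTS)
print(outliers)
```

The same function applies to any subparadigm (nonpast, past, infinitive) by swapping in the corresponding counts.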

21 Imperfective imperative “be doing X!”
Polite: guest knows what to expect: razdevajtes’ ‘take off your coat’, sadites’ ‘sit down’ Insistence: hearer is hesitant: stupajte ‘get going’, gljadite ‘look’, zabirajte ‘take’ Insistence: hearer has not behaved properly (connection with negation): provalivaj ‘get out of here’, končaj ‘stop’, ne perebivaj ‘don’t interrupt’ Polite requests: vyručajte ‘help’ Kind wishes: vyzdoravlivajte ‘get well’ Idiomatic: davajte posmotrim ‘let’s take a look’ Idiomatic/culturally anchored: proščaj(te) ‘farewell’, soedinjajtes’ ‘unite’ (slogan), zapevaj ‘sing’ (army)

22 Grammatical Profiles: Findings
Perfective verbs behave differently than imperfective verbs “Verb pairs” behave the same regardless of which type of morphology (prefixation vs. suffixation) is used to mark aspect We can identify exactly the verbs that are most attracted to various TAM combinations.

23 Grammatical Profiles: Portability
Across issues: Grammatical profiling and gender stereotypes (Kuznetsova 2012)
Across languages: Gives 96% resolution of perfective vs. imperfective for Old Church Slavonic verbs, as compared with Dostál 1954 (Eckhoff & Janda 2013)
Planned study of grammatical profiles across 4 languages (Russian, Czech, N. Saami, Norwegian), comparing Morphological Aspect and Morphological Aktionsart
Across researchers: All outlier verbs listed in Janda & Lyashevskaya 2011, data and code for Eckhoff & Janda 2013 on website

24 Grammatical Profiles: Multipurpose Applications
Pedagogical implications: Strategic combinations of verbs and subparadigms

25 Semantic Profiles: “Empty” prefixes in Russian
Janda, L. A. & Lyashevskaya, O “Semantic Profiles of Five Russian Prefixes: po-, s-, za-, na-, pro-”. Journal of Slavic Linguistics 21:2,

26 Semantic Profiles: “Empty” prefixes in Russian
Big Questions:
What is the relationship between form and meaning? ➜ ...between prefixes and meanings of verbs?
Are there any “empty” forms? ➜ Are prefixes empty as claimed?

Imperfective base       Prefixed perfective
sovetovat’ ‘advise’     posovetovat’ ‘advise’
varit’ ‘cook’           svarit’ ‘cook’
pisat’ ‘write’          napisat’ ‘write’
tverdet’ ‘harden’       zatverdet’ ‘harden’
gremet’ ‘thunder’       progremet’ ‘thunder’

27 Semantic Profiles: “Empty” prefixes in Russian
Theoretical focus: Can we measure the relationship between prefixes and meanings of verbs? ➜ Distribution of prefixes vs. semantic groups of verbs How do we show that “empty” forms aren’t really empty? ➜ Show that prefixes have different semantic behaviors

28 Semantic Profiles: “Empty” prefixes in Russian
Operationalization: Semantic profiling: relationship between meanings (semantic tags) and forms ➜Distribution of Russian verb prefixes vs. semantic tags Data: 382 verbs with “empty” prefixes from the Exploring Emptiness database ( ), semantic tags independently assigned in the Russian National Corpus ( ) Statistics: Chi-square, Cramer’s V effect size, Fisher Test

29 chi-square = 248, df = 12, p = 2.2e-16; Cramer’s V effect-size = 0.8

30 Attractions and repulsions measured by Fisher Test
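
As an illustration of how attraction and repulsion can be read off a Fisher test, the sketch below collapses the data to a 2×2 table (one prefix vs. all others, one semantic class vs. all others) and computes a two-sided exact p-value. The cell counts are hypothetical, and the 2×2 collapsing and the function names are assumptions; the slides do not show how the original test was set up.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more probable than the observed one.
    total = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, total)


def direction(a, b, c, d):
    """'attraction' if the cell exceeds its expected count, else 'repulsion'."""
    expected = (a + b) * (a + c) / (a + b + c + d)
    return "attraction" if a > expected else "repulsion"


# Hypothetical counts: verbs with a given prefix and semantic tag (a), same
# prefix with other tags (b), other prefixes with the tag (c), and the rest (d).
a, b, c, d = 20, 40, 30, 292
p_value = fisher_exact_two_sided(a, b, c, d)
print(direction(a, b, c, d), p_value)
```

Repeating this for every prefix and semantic class pair yields a grid of attractions and repulsions of the kind the slide title describes.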

31 Semantic Profiles: Findings
Each prefix has a unique semantic profile Each prefix is attracted to and repulsed by a different set of semantic classes of verbs It is possible to establish meanings of prefixes and expectations for how prefixes combine with verbs

32 Semantic Profiles: Portability
All data, statistical code, lists of verbs available at:

33 Semantic Profiles: Multipurpose Applications
Pedagogical implications: We can design materials that reduce the burden of memorizing ≈2000 correct prefix-verb combinations

34 Constructional Profiles: SADNESS in Russian
Janda, L. A. & Solovyev, V “What Constructional Profiles Reveal About Synonymy: A Case Study of Russian Words for sadness and happiness”. Cognitive Linguistics 20:2,

35 Crash course in Russian case & SADNESS
Nouns are obligatorily case-marked 6 cases: Nominative, Accusative, Dative, Instrumental, Genitive, Locative All cases can appear with a preposition All cases except Locative can also appear without a preposition 70 constructions [(preposition) [NOUN]case] SADNESS: 6 near-synonyms, no “umbrella term” grust’, melanxolija, pečal’, toska, unynie, xandra

36 Constructional Profiles: SADNESS in Russian
Big Questions: What is the relationship between form and meaning? ➜What is the relationship between words and grammatical constructions? ➜What is the relationship between synonyms?

37 Constructional Profiles: SADNESS in Russian
Theoretical focus: Can we measure the difference between synonyms in terms of distribution in grammatical constructions?

38 Constructional Profiles: SADNESS in Russian
Operationalization: Constructional profiling: relationship between frequency distribution of grammatical constructions and meaning ➜SADNESS words vs. distribution in [(preposition) [NOUN]case] constructions Data: 500 sentences for each word from Russian National Corpus, Biblioteka Maksima Moškova Statistics: Chi-square, Cramer’s V effect size, Hierarchical Clustering (squared Euclidean distance)

39 Chi-square = 730.35, df = 30, p < 0.0001, Cramer’s V = 0.305

40 ‘Sadness’ Hierarchical Cluster
[Dendrogram over pečal’, toska, xandra, melanxolija, grust’, unynie]
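
A hierarchical clustering over constructional profiles with squared Euclidean distance, as named on the operationalization slide, might look like the sketch below. The linkage criterion is not stated in the slides, so average linkage is an assumption here. The four profiles are hypothetical three-construction vectors, not the SADNESS data.

```python
from itertools import combinations

# Hypothetical constructional profiles: share of each word's uses per construction.
PROFILES = {
    "word_a": [0.50, 0.30, 0.20],
    "word_b": [0.48, 0.32, 0.20],
    "word_c": [0.10, 0.20, 0.70],
    "word_d": [0.15, 0.15, 0.70],
}


def sq_euclidean(p, q):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def average_linkage(a, b, profiles):
    """Mean pairwise distance between the members of clusters a and b."""
    dists = [sq_euclidean(profiles[x], profiles[y]) for x in a for y in b]
    return sum(dists) / len(dists)


def cluster(profiles):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance)."""
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], profiles),
        )
        merges.append((a, b, average_linkage(a, b, profiles)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


merges = cluster(PROFILES)
for a, b, dist in merges:
    print(f"{a} + {b} at distance {dist:.4f}")
```

The order of merges corresponds to reading a dendrogram from the bottom up: words that merge early have the most similar constructional profiles.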

41 Constructional Profiles: Findings
Each synonym has a unique constructional profile Some synonyms are closer together, others are farther apart

42 Constructional Profiles: Portability
Across issues: Logistic regression analysis of Russian gruzit’ ‘load’ with 3 “empty” prefixes across Locative Alternation constructions (Sokolova 2012, Sokolova, Janda and Lyashevskaya 2012) Analysis of aspectual pairs formed by prefix pro- (Kuznetsova 2012) Across languages: North Saami anaphoric possessive constructions: reflexive pronoun vs. possessive suffix (forthcoming) Data published in Janda & Solovyev article; data and code for gruzit’ on website.

43 Constructional Profiles: Multipurpose Applications
Pedagogical implications: Teach relevant constructions with near-synonyms Possible implication for machine translation: Lexical selection informed by constructional profiles

44 Radial Category Profiles: Ambipositions in North Saami
Antonsen, L., Janda, L. A., & Baal, B. A. B. “Njealji davvisámi adposišuvnna geavahus” [“The Use of Four North Saami Adpositions”]. Sámi dieđalaš áigečála 2012, v pp.
Janda, L. A., Antonsen, L. & Baal, B. A. B. Forthcoming. “A Radial Category Profiling Analysis of North Sámi Ambipositions”. High Desert Linguistics Society Proceedings, Volume pp.

45 Crash course in North Saami ambipositions
An unusually large number of adpositions can appear as both prepositions and postpositions; they always take the Genitive case
1. a. miehtá dálvvi [over winter-G] b. dálvvi miehtá [winter-G over] ‘during the winter’
2. a. čađa áiggi [through time-G] b. áiggi čađa [time-G through] ‘through time’
3. a. rastá joga [across river-G] b. joga rastá [river-G across] ‘across the river’
4. a. maŋŋel soađi [after war-G] b. soađi maŋŋel [war-G after] ‘after the war’
5 = North Saami

46 Radial Category Profiles: North Saami ambipositions
Big Questions: What is the relationship between form and meaning? ➜What is the relationship between position (preposition vs. postposition) and meaning? What is the influence of majority languages (prepositional languages in West vs. postpositional languages in East)? Is there a relationship between frequency of ambipositions and their use to distinguish meaning?

47 Radial Category Profiles: North Saami ambipositions
Theoretical focus: Can we measure the difference between uses in preposition vs. postposition? Can we model the meanings in terms of a radial category? Can we measure dialectal differences?

48 Radial Category Profiles: North Saami ambipositions
Operationalization: Radial category profiling: differences in the frequency distribution of uses across two or more near-synonyms ➜Distribution across uses in radial category for preposition vs. postposition Data: 100+ sentences for each position from 10M word newspaper corpus, plus exx. from literature, Bible translation Statistics: Chi-square, Cramer’s V effect size

49 Radial categories: miehtá ‘over’ in newspapers
preposition: time 9%, extent 79%, motion 12%
postposition: time 95%, extent 5%
Statistically significant difference, large effect size; test on time vs. space (collapses motion and extent)
chi-squ = 170, df = 2, p < 2.2e-16; Cramer’s V = 0.85

50 Distribution of adpositions
χ² = 129.7, df = 2, p < 2.2e-16; Cramer’s V = 0.48

51 Radial Category Profiles: Findings
There is a relationship between meaning and position Prevailing trends in majority languages do influence use of position There seems to be a typological relationship between frequency of ambipositions and their use to distinguish meaning Languages with few ambipositions (Germanic, Russian) do not use position distinctively Languages with more ambipositions use them in more complex ways (North Saami > Finnish, Estonian)

52 Radial Category Profiles: Portability
Across issues and languages: Russian prefixes vy- vs. iz- (Nesset, Endresen, Janda 2011) Russian prefixes o-/ob-/obo- (Baydimirova [Endresen] 2010) Data and code published on website.

53 Radial Category Profiles: Multipurpose Applications
Pedagogical implications: Teach ambipositions with relevant meanings and nouns Improvements to constraint grammar analyzer: Improves linguistic analysis and language technology tools, these are crucial to preserving and revitalizing the language

54 7. Infrastructure Data management issues: Remember those problems with portability? --Data analyzed in proprietary programs --Data not publicly available or hard to navigate

55 TROLLing Tromsø Repository of Language and Linguistics
International archive of data and code All items open-source, open access Searchable metadata Verify results, see how to implement various statistical models Housed at UiT library Connected to CLARIN (Common Language Resources and Technology Infrastructure, a networked federation of European data repositories)

56 8. Applications
A model for applications: If we have a finding that is connected to a Big Question and is statistically robust, it should also be USEFUL
Language technology can implement useful results in:
--Disambiguation, Parsing
These feed into:
--Pedagogical applications
--Machine translation
--Corpus analysis tools
--Language revitalization
--Language proofing tools

