Peter Grzybek & Ernst Stadlober
Austrian Research Fund Project #15485: Quantitative Text Typology
… A Universe of Texts
Let's suppose there is …
Is the Universe Structured? Or Can We Structure It?
How Can the Text Universe Be Structured?
Corpus Analysis vs. Text Analysis
Corpus Analysis: "Text Mixture"; (Re-)Construction of a norm, of a standard of "language"
Text Analysis: Text as a Homogeneous Entity; Complete Text ("Quasi Text"); Self-Regulating System
What Is a Text?
Complete novel, composed of books?
Complete book of a novel, consisting of several chapters?
Individual chapters?
Dialogical vs. narrative sequences within a text?
Two Major Problems:
1. Data Homogeneity
2. Definition of Basic Analytical Units
Both problems are relevant for quantitative approaches.
WHY QUANTITATIVE APPROACHES?
ASSUMPTION: If a 'text' is governed by synergetic processes, these processes can and must be quantitatively described. The descriptive models obtained for each 'text' can be compared to each other, possibly resulting in one or more general model(s). Thus, a quantitative typology of texts can be obtained.
Synergetics in a Nutshell: Frequencies and Dependencies
WHY WORD LENGTH?
Word Length: Graphemes, Phonemes, Syllables, Morphemes, …
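Because word length can be counted in several units, a quantitative study first has to fix one. Assuming, for illustration, that length is counted in syllables, the sketch below turns a text into a list of word lengths. It rests on a loudly simplifying assumption: each vowel letter is taken to mark one syllable, which ignores syllabic "r" (as in Slovenian "vrt"). The helper names `syllable_count` and `word_lengths` are hypothetical, not taken from the study.

```python
import re

# Assumption for illustration: every vowel letter counts as one syllable
# nucleus. This ignores syllabic "r" (as in Slovenian "vrt"), so it is only
# a rough stand-in for a proper syllabification.
VOWELS = frozenset("aeiou")


def syllable_count(word: str) -> int:
    """Approximate the length of one word in syllables."""
    return sum(1 for ch in word.lower() if ch in VOWELS)


def word_lengths(text: str) -> list[int]:
    """Split text into letter-only tokens and return their syllable counts."""
    words = re.findall(r"[^\W\d_]+", text)
    return [syllable_count(w) for w in words]


print(word_lengths("Pišem ti v hiši."))  # [2, 1, 0, 2]
```

Note that the preposition "v" receives length 0, so a syllable-based count for Slovenian has to allow zero-syllable words.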
TYPES OF TEXT TYPOLOGIES
I. Qualitative
II. Quantitative-Qualitative
a. Tabula Rasa Principle (Clustering Methods)
b. A Priori / A Posteriori Principle (Discrimination Methods)
Structuring the Text Universe (Ia): Text Sorts
Structuring the Text Universe (Ib): Functional Styles
In a qualitative approach, the text universe is structured with regard to external (pragmatic) factors ("with reference to the world"), namely the general communicative functions of language (functional styles) and specific situational functions (text sorts).
Top-Down / Bottom-Up
First- and Second-Order Cross-Comparisons
Intended Emphasis on Letters
'Letter' as a Prototype of Language:
1. Located between Oral and Written Communication
2. Result of One Homogeneous Process of Text Generation
Text Basis (398 Slovenian Texts)
FUNCTIONAL STYLE | AUTHOR(S) | TEXT TYPE(S) | NUMBER
EVERYDAY LANGUAGE | Cankar, Jurčič | Private Letters | 61
PUBLIC STYLE | various (anon.) | Open Letters | 29
JOURNALISM | various (anon.) | Readers' Letters, Comments | 65
ARTISTIC STYLE: Prose | Cankar | Individual Chapters from Short Novels ("povest") |
ARTISTIC STYLE: Prose | Švigelj-Mérat / Kolšek | Letters from an Epistolary Novel |
ARTISTIC STYLE: Poetry | Gregorčič | Versified Poems | 40
ARTISTIC STYLE: Drama | Jančar | Individual Acts from Dramas | 42
A Small World of Texts
[Figure: Word Length Frequencies (in %) of Four Texts: Literary Prose Text (#256), Versified Poetic Text (#359), Journalistic Comment (#324), Private Letter (#1)]
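Each curve in such a comparison is a word length profile: the percentage of words of length 0, 1, 2, … syllables in one text. A short sketch of computing such a profile from a list of syllable counts follows; the function name `length_profile` and the toy input are hypothetical and not drawn from texts #256, #359, #324 or #1.

```python
from collections import Counter


def length_profile(lengths: list[int]) -> dict[int, float]:
    """Percentage of words having each length, ordered by length."""
    counts = Counter(lengths)
    total = len(lengths)
    return {k: 100.0 * counts[k] / total for k in sorted(counts)}


# Toy syllable counts, not taken from any text of the corpus.
print(length_profile([1, 2, 2, 3, 1, 2, 0, 2]))
# {0: 12.5, 1: 25.0, 2: 50.0, 3: 12.5}
```

Working with percentages rather than raw counts makes texts of very different sizes (a short letter vs. a novel chapter) directly comparable.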
Post-Hoc Tests (Text Sorts)
Groups without significant differences form "homogeneous subgroups":
a. Homogeneous subgroups do exist.
b. All four letter types fall into different subgroups!
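To illustrate how homogeneous subgroups can be read off post-hoc results, the sketch below assumes that a single critical difference is already known (for example, a Tukey HSD critical range for equal group sizes) and lists maximal runs of sorted group means whose spread does not exceed it. This is a simplified illustration, not necessarily the post-hoc test used in the study; the function name, group labels and numbers are invented.

```python
def homogeneous_subsets(means: dict[str, float], critical_diff: float) -> list[list[str]]:
    """List maximal runs of sorted group means whose range <= critical_diff."""
    ordered = sorted(means, key=means.get)
    subsets: list[list[str]] = []
    last_end = -1
    for i, first in enumerate(ordered):
        j = i
        # Extend the run while the spread stays within the critical difference.
        while j + 1 < len(ordered) and means[ordered[j + 1]] - means[first] <= critical_diff:
            j += 1
        # Runs ending no later than the previous one are contained in it; skip them.
        if j > last_end:
            subsets.append(ordered[i : j + 1])
            last_end = j
    return subsets


# Hypothetical group means (e.g. mean word length per group) and critical difference.
print(homogeneous_subsets({"A": 2.0, "B": 2.1, "C": 2.25, "D": 2.6}, 0.2))
# [['A', 'B'], ['B', 'C'], ['D']]
```

Subsets may overlap (B belongs to two of them), which is also how post-hoc tables typically present homogeneous subsets.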
Post-Hoc Analyses: Homogeneous Subgroups
Discriminant Analyses:
Cases are attributed to groups on the basis of specific predictor variables.
The variables are submitted to linear transformations in order to arrive at an optimal discrimination of the individual cases.
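For two groups and two predictor variables, the linear transformation can be illustrated with Fisher's linear discriminant, which projects each case onto w = S_W^-1 (mean_b - mean_a), where S_W is the pooled within-group scatter matrix. The sketch below is a self-contained illustration under these assumptions (two groups, two predictors, equal priors, toy data); it is not presented as the authors' exact procedure, and analyses with more groups, such as the eight text sorts, need several canonical discriminant functions, which are not covered here.

```python
Point = tuple[float, float]


def mean_vector(rows: list[Point]) -> list[float]:
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(2)]


def fisher_lda_2d(group_a: list[Point], group_b: list[Point]) -> tuple[list[float], float]:
    """Fisher's linear discriminant for two groups and two predictors.

    Returns weights w and a cut-off c; a case x is assigned to group b
    if w . x > c, otherwise to group a (equal priors assumed).
    """
    ma, mb = mean_vector(group_a), mean_vector(group_b)
    # Pooled within-group scatter matrix S_W (2 x 2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows, m in ((group_a, ma), (group_b, mb)):
        for r in rows:
            d = (r[0] - m[0], r[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    diff = (mb[0] - ma[0], mb[1] - ma[1])
    # w = S_W^-1 (mean_b - mean_a): the direction that best separates the
    # group means relative to the spread within the groups.
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Cut-off halfway between the two projected group means.
    c = sum(wk * (a + b) / 2 for wk, a, b in zip(w, ma, mb))
    return w, c


def classify(w: list[float], c: float, x: Point) -> str:
    return "b" if w[0] * x[0] + w[1] * x[1] > c else "a"


# Toy points standing in for cases described by two variables (e.g. m1 and p).
pts_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
pts_b = [(3.0, 3.0), (4.0, 3.0), (3.0, 4.0), (4.0, 4.0)]
w, c = fisher_lda_2d(pts_a, pts_b)
print(w, c)                          # [1.5, 1.5] 6.0
print(classify(w, c, (2.5, 2.5)))    # b
```

With more than two predictors, the closed-form 2 x 2 inverse would be replaced by a general linear solver; the rest of the logic stays the same.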
Discriminant Analysis: Eight Text Sorts
Discrimination variables: m1, m2, v, p1 (56.30%)
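The slides do not define the discrimination variables. One plausible reading, and an assumption here, is that m1 is the mean word length, m2 the second central moment (variance), v the coefficient of variation, and p_k the proportion of words with k syllables. The sketch computes these for one text from its word lengths; the function name `length_stats` and the `max_k` parameter are hypothetical, and p0 is included because Slovenian has vowelless one-letter words such as the preposition "v".

```python
import math
from collections import Counter


def length_stats(lengths: list[int], max_k: int = 3) -> dict[str, float]:
    """Assumed definitions of the discrimination variables for one text."""
    n = len(lengths)
    m1 = sum(lengths) / n                            # mean word length
    m2 = sum((x - m1) ** 2 for x in lengths) / n     # second central moment
    stats = {"m1": m1, "m2": m2, "v": math.sqrt(m2) / m1}  # v: coefficient of variation
    counts = Counter(lengths)
    for k in range(max_k + 1):
        stats[f"p{k}"] = counts[k] / n               # share of k-syllable words
    return stats


print(length_stats([1, 3, 1, 3]))
# {'m1': 2.0, 'm2': 1.0, 'v': 0.5, 'p0': 0.0, 'p1': 0.5, 'p2': 0.0, 'p3': 0.5}
```

Each text then becomes one case described by a small vector such as (m1, m2, v, p1), which is the kind of input a discriminant analysis works on.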
Discriminant Analysis: Four Letter Types (n=213)
{Private L.} {Ep. Novel} {Readers' L.} {Open L.}
Discrimination variables: m1, v
Discriminant Analysis: Three Letter Types (n=213)
{Private L., Ep. Novel} {Readers' L.} {Open L.}
Discrimination variables: m1, p
Is the Distinction of Literary Letters Irrelevant?
Discriminant Analysis: Private vs. Public Letters (n=213)
{Private L., Ep. Novel} {Readers' & Open L.}
Discrimination variables: m1, p
A Distinction of Private vs. Public Styles?
Discriminant Analysis: Private vs. Public Texts (n=248)
{Private L., Ep. Novel} {Readers' & Open L., Comments}
Discrimination variables: m1, p
Public vs. Private Styles?
Discriminant Analysis: Private/Oral vs. Public/Written Texts (n=290)
{Private L., Ep. Novel, Drama} {Readers' & Open L., Comments}
Discrimination variables: m1, p
Oral vs. Written Styles?
Towards a New Typology?
Discriminant Analysis: Three Text Types (n=330)
{Private / Oral} {Public / Written} {Verse}
Discrimination variables: m1, p2, v
Discriminant Analysis: Four Text Types (n=398)
{Private / Oral} {Public / Written} {Prose} {Verse}
Discrimination variables: m1, p2, v
Discriminant Analysis: Three Text Types (n=398)
{Private / Oral} {Public / Written / Prose} {Verse}
Discrimination variables: m1, p2, v
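The successive analyses regroup the original text sorts into coarser types. The sketch below encodes that relabelling for the final three-type grouping (Private L., Ep. Novel and Drama as private/oral; Readers' L., Open L., Comments and Prose as public/written/prose; Verse on its own) and computes a percentage of correctly classified cases. Reading percentages such as the 56.30% reported for the eight text sorts as classification rates is an assumption, since the slides do not say so explicitly. Dictionary keys, function names and the example predictions are hypothetical.

```python
# Mapping of text sorts to the three text types of the final analysis,
# following the groupings listed on the slides; the keys are shorthand labels.
THREE_TYPES = {
    "private letter": "private/oral",
    "epistolary novel": "private/oral",
    "drama": "private/oral",
    "readers' letter": "public/written/prose",
    "open letter": "public/written/prose",
    "comment": "public/written/prose",
    "prose": "public/written/prose",
    "verse": "verse",
}


def merge_labels(labels: list[str], mapping: dict[str, str] = THREE_TYPES) -> list[str]:
    """Replace each fine-grained text sort by its coarser text type."""
    return [mapping[label] for label in labels]


def percent_correct(true_labels: list[str], predicted: list[str]) -> float:
    """Share of cases assigned to their own group, in percent."""
    if len(true_labels) != len(predicted):
        raise ValueError("label lists must have equal length")
    hits = sum(t == p for t, p in zip(true_labels, predicted))
    return 100.0 * hits / len(true_labels)


# Hypothetical predictions for four cases, for illustration only.
truth = merge_labels(["private letter", "drama", "comment", "verse"])
pred = ["private/oral", "public/written/prose", "public/written/prose", "verse"]
print(percent_correct(truth, pred))  # 75.0
```

Keeping the grouping in one dictionary makes each alternative typology (two, three or four text types) a matter of swapping the mapping rather than relabelling the data by hand.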
This is the End …