Published by Adele Sarah Palmer. Modified over 9 years ago.
2 Modern Approaches to Corpus Linguistics
Dominique LONGRÉE, LASLA – Université de Liège and FUSL (Bruxelles)

- automatic taggers as heuristic tools
- multilevel approaches: the “motives”
- what do they have in common?
1. Automatic taggers as heuristic tools

A LASLA research project: testing various automatic recognition software tools, known as taggers.

Biber, 1993, Illouz, 1999, etc.: the quality of the output can vary significantly
- from one type of text to another
- from one tagger to another

Questions:
- are the results better with a tagger trained
  - on one author, or on a given text, when applied to another text
  - by the same author, or within the same type of discourse?
- what can we deduce from those results regarding
  - the tagger, or
  - the homogeneity of corpora?
The test texts:
- book 3 of The Gallic Wars by Caesar – BGall3 (3673 tokens)
- The Conspiracy of Catilina by Sallust – SalCat. (10688 tokens)
- book 3 of The History of Alexander the Great by Quintus Curtius – QC3 (7261 tokens)
- The First Oration Against Catilina by Cicero – CicCat1 (3333 tokens)
- poem 66 of Catullus – Catu66 (586 tokens)

Varying the nature of the training and evaluation corpora, in order to identify and measure the factors of variation:
- style of the work
- style of the author
- diachrony
- literary genre
- type of discourse
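The design described above (train a tagger on one text, evaluate it on another, and compare the scores) can be sketched as a cross-evaluation matrix. This is only a minimal illustration: the most-frequent-tag baseline tagger, the function names, and the toy tagged tokens are all assumptions made for the example, not the taggers or corpora used in the LASLA project.

```python
from collections import Counter, defaultdict


def train_unigram_tagger(tagged_tokens):
    """Baseline tagger (an assumption): each form gets its most frequent tag."""
    counts = defaultdict(Counter)
    for form, tag in tagged_tokens:
        counts[form][tag] += 1
    lexicon = {form: c.most_common(1)[0][0] for form, c in counts.items()}
    # Unseen forms fall back to the most frequent tag of the training text.
    default = Counter(tag for _, tag in tagged_tokens).most_common(1)[0][0]
    return lexicon, default


def accuracy(model, tagged_tokens):
    """Share of tokens whose predicted tag matches the reference tag."""
    lexicon, default = model
    correct = sum(
        1 for form, tag in tagged_tokens if lexicon.get(form, default) == tag
    )
    return correct / len(tagged_tokens)


def cross_evaluate(corpora):
    """Return {(train_name, test_name): accuracy} for every ordered pair."""
    models = {name: train_unigram_tagger(toks) for name, toks in corpora.items()}
    return {
        (train_name, test_name): accuracy(models[train_name], toks)
        for train_name in corpora
        for test_name, toks in corpora.items()
    }


# Hypothetical toy data, invented for illustration only.
text_a = [("cum", "CONJ"), ("Caesar", "NOUN"), ("uenit", "VERB"), ("cum", "CONJ")]
text_b = [("cum", "PREP"), ("Caesare", "NOUN"), ("Caesar", "NOUN"), ("uidit", "VERB")]

scores = cross_evaluate({"A": text_a, "B": text_b})
for (train_name, test_name), score in sorted(scores.items()):
    print(f"train={train_name} test={test_name} accuracy={score:.2f}")
```

Read this way, low off-diagonal scores relative to the diagonal would point to a gap between two texts, while high off-diagonal scores would suggest homogeneity, which is the heuristic use of taggers the slides have in view.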
In theoretical terms: taggers appear to have some value as heuristic instruments.

For instance, they highlight
- the homogeneity of the historical style over and above diachronic development
- the gap between narration and discourse (speeches)
- the gap between the styles of Caesar and Cicero
- a smaller gap between Catullus and Cicero, or between Catullus and Quintus Curtius/Tacitus, than between Catullus and Caesar, etc.
2. Multilevel approaches: the “motives”

Some indicators intuitively catalogued in Latin narrative prose:
- sequences of verb tenses
- lexical elements: repente, subito ‘suddenly’, ‘abruptly’
- syntactical structures / ‘linking clichés’:
  Quibus rebus cognitis ‘Those things being known’
  Quod ubi animaduertit ‘When he had noticed that’

Limits:
- no real analysis of them as indicators of text structure
- no study of their interaction
- little use made of them for characterising text genre and style
The Discourse Modes and Bases Approach
- Kroon, 2007, 2009; Adema, 2007, 2008, 2009
- a priori definition of typical features for each discourse mode
- in order to evaluate text homogeneity

The LASLA and BCL approach
- to develop endogenous exploratory methods
- to take text linearity into account
- to specify functional convergences between several indicators

Methods
- calling upon mathematical models (neighborhoods, bursts)
- combining
  - a small-scale qualitative approach
  - a large-scale quantitative analysis
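The slides do not define the neighborhood and burst models, so the following is only a rough sketch of what such measures could look like on a linear sequence of tokens. The definitions used here (a burst as a run of occurrences separated by at most `max_gap` tokens, a neighborhood as a window of `radius` tokens around an occurrence), the function names, and the positions are all assumptions made for illustration.

```python
def find_bursts(positions, max_gap, min_size=2):
    """Group token positions into bursts: runs whose consecutive gaps are <= max_gap."""
    bursts, current = [], []
    for pos in sorted(positions):
        # A gap larger than max_gap closes the current run.
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_size:
                bursts.append(current)
            current = []
        current.append(pos)
    if len(current) >= min_size:
        bursts.append(current)
    return bursts


def neighbors(positions_a, positions_b, radius):
    """For each occurrence of indicator A, list occurrences of B within +/- radius tokens."""
    return {
        a: [b for b in positions_b if abs(a - b) <= radius]
        for a in positions_a
    }


# Hypothetical token positions of two indicators in one text.
tense_shift_positions = [3, 5, 6, 40, 90, 92]
linking_cliche_positions = [4, 8, 200]

bursts = find_bursts(tense_shift_positions, max_gap=3)
close_pairs = neighbors([5, 90], linking_cliche_positions, radius=3)
print(bursts)
print(close_pairs)
```

Counting how often one indicator falls inside the neighborhood of another is one simple way to look for convergences between indicators while keeping the linear order of the text.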
3. What do these approaches have in common?

They take texts and discourses into account in both their dimensions:
- the multilevel nature of texts and of languages, from phonetics to pragmatics
- the fact that texts and discourses
  - are organized according to linearity
  - can be considered as topological entities