2 Modern Approaches to Corpus Linguistics. Dominique LONGRÉE, LASLA – Université de Liège et FUSL (Bruxelles)


2 Modern Approaches to Corpus Linguistics
Dominique LONGRÉE, LASLA – Université de Liège et FUSL (Bruxelles)
• automatic taggers as heuristic tools
• multilevel approaches: the motives
• what do they have in common?

1. Automatic taggers as heuristic tools
• A LASLA research project: testing various automatic recognition software tools, known as taggers.
• Biber, 1993, Illouz, 1999, etc.: the quality of the output can vary significantly
  - from one type of text to another
  - from one tagger to another.
• Questions:
  - are the results better when a tagger trained on one author or on a given text is applied to another text by the same author, or within the same discourse?
  - what can we deduce from those results regarding the tagger or the homogeneity of corpora?

1. Automatic taggers as heuristic tools
• The test texts:
  - book 3 of The Gallic Wars by Caesar – BGall3 (3673 tokens)
  - The Conspiracy of Catilina by Sallust – SalCat. (10688 tokens)
  - book 3 of The History of Alexander the Great by Quintus Curtius – QC3 (7261 tokens)
  - The First Oration Against Catilina by Cicero – CicCat1 (3333 tokens)
  - poem 66 of Catullus – Catu66 (586 tokens)
• Varying the nature of the training and evaluation corpora, in order to identify and measure the factors of variation:
  - style of the work
  - style of the author
  - diachrony
  - literary genre
  - type of discourse
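The protocol above (train on one text, evaluate on another, compare the scores) can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the slides do not name the taggers that were tested, so a trivial most-frequent-tag baseline stands in for them, and the two tiny corpora, their names and their tags are invented toy data, not LASLA material. The function names are hypothetical as well.

```python
from collections import Counter, defaultdict


def train_unigram_tagger(tagged_tokens):
    """Learn the most frequent tag per word form, plus a global fallback tag.

    Stand-in baseline only; not one of the taggers tested in the project.
    """
    per_form = defaultdict(Counter)
    all_tags = Counter()
    for form, tag in tagged_tokens:
        per_form[form.lower()][tag] += 1
        all_tags[tag] += 1
    lexicon = {form: tags.most_common(1)[0][0] for form, tags in per_form.items()}
    fallback = all_tags.most_common(1)[0][0]  # used for unseen forms
    return lexicon, fallback


def accuracy(model, tagged_tokens):
    """Share of tokens whose predicted tag matches the reference tag."""
    lexicon, fallback = model
    hits = sum(
        1 for form, tag in tagged_tokens
        if lexicon.get(form.lower(), fallback) == tag
    )
    return hits / len(tagged_tokens)


def cross_evaluate(corpora):
    """Train on each corpus and test on every other one.

    Returns {(training_name, test_name): accuracy}.
    """
    models = {name: train_unigram_tagger(toks) for name, toks in corpora.items()}
    return {
        (train, test): accuracy(models[train], corpora[test])
        for train in corpora
        for test in corpora
        if train != test
    }


# Invented toy data (hypothetical forms and tags, for illustration only).
toy_corpora = {
    "prose_toy": [("et", "CONJ"), ("hostes", "NOUN"),
                  ("castra", "NOUN"), ("uenit", "VERB")],
    "verse_toy": [("et", "CONJ"), ("uenit", "VERB"), ("amat", "VERB"),
                  ("amor", "NOUN"), ("dulcis", "ADJ")],
}

results = cross_evaluate(toy_corpora)
for (train, test), score in sorted(results.items()):
    print(f"trained on {train}, tested on {test}: {score:.2f}")
```

Read as a heuristic, an asymmetric or low score between two texts would suggest that they differ in vocabulary or in tag distribution; with real corpora, the same table would be filled for each pairing of BGall3, SalCat., QC3, CicCat1 and Catu66.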

1. Automatic taggers as heuristic tools
• In theoretical terms: taggers appear to have some value as heuristic instruments.
• For instance, they highlight:
  - the homogeneity of the historical style over and above diachronic development
  - the gap between narration and discourse (speeches)
  - the gap between the styles of Caesar and Cicero
  - a smaller gap between Catullus and Cicero, or between Catullus and Quintus Curtius/Tacitus, than between Catullus and Caesar, etc.

2. Multilevel approaches: the “motives”
• Some indicators intuitively catalogued in Latin narrative prose:
  - sequences of verb tenses
  - lexical elements: repente, subito ‘suddenly’, ‘abruptly’
  - syntactical structures / ‘linking clichés’: Quibus rebus cognitis ‘Those things being known’; Quod ubi animaduertit ‘When he had noticed that’
• Limits:
  - no thorough analysis of them as indicators of text structure
  - no study of their interaction
  - little use of them for characterising text genre and style

2. Multilevel approaches: the “motives”
• The Discourse Modes and Bases approach:
  - Kroon, 2007, 2009; Adema, 2007, 2008, 2009
  - a priori definition of typical features for each discourse mode
  - in order to evaluate text homogeneity
• The LASLA and BCL approach:
  - to develop endogenous exploratory methods
  - to take text linearity into account
  - to specify functional convergences between several indicators
• Methods:
  - calling upon mathematical models (neighborhoods, bursts)
  - combining a small-scale qualitative approach with large-scope quantitative analysis
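The slides do not define the "bursts" model, so the following Python sketch is only one simple reading of the idea: locate the positions of a few marker forms along the linear sequence of tokens, then group occurrences that fall close together. The function names, the `max_gap` and `min_size` parameters and the sample sentence are assumptions made for illustration, not the LASLA/BCL method.

```python
def marker_positions(tokens, markers):
    """Return the token indices at which one of the marker forms occurs."""
    return [i for i, tok in enumerate(tokens) if tok.lower() in markers]


def find_bursts(positions, max_gap, min_size=2):
    """Group sorted positions into bursts.

    Two successive occurrences belong to the same burst when they are at
    most `max_gap` tokens apart; groups smaller than `min_size` are dropped.
    Each burst is reported as (first_position, last_position, occurrences).
    """
    bursts, current = [], []
    for pos in positions:
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_size:
                bursts.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    if len(current) >= min_size:
        bursts.append((current[0], current[-1], len(current)))
    return bursts


# Hypothetical token sequence; only the two adverbs come from the slides.
tokens = "tum repente hostes subito impetum fecerunt".split()
positions = marker_positions(tokens, {"repente", "subito"})
print(positions)
print(find_bursts(positions, max_gap=3))
```

Because the grouping works on token positions, the same routine could be applied to indicators from other levels (tense sequences, linking clichés), which is one way of looking for the convergences between indicators mentioned above.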

3. What do these approaches have in common?
• They take texts and discourses into account in both their dimensions:
  - the multilevel nature of texts and of languages, from phonetics to pragmatics
  - the fact that texts and discourses are organized according to linearity and can be considered as topological entities.

