Corpus Linguistics: Counting words, texts or features
Mike Scott, University of Liverpool
Corpus Linguistics Summer Institute, June-July 2008
Aims: to identify what is in principle countable using CL techniques; to consider what it is in principle desirable to count, and why
No, not that kind of sentence
What have we got, anyway? Electronic texts. Is anything missing?
What is a text, anyway?
What we’re looking at
Words in Texts: sentences; paragraphs; sections; key words; etc.
Words in the Brain: memory, e.g. tip-of-the-tongue; word associations; enjoyment; priming
Words in the Language: lexicography; terminology, phraseology, etc.; patterns of “standard English”
Words in Culture: cultural key words; indicators of class and stance, bias, etc.
What is countable? Characters; word-forms; parts of speech; sentences; headings? paragraphs? lines? pages?; other divisions (section, chapter) if marked up; utterances; turns; grammatical sequences
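As a rough illustration of the surface units listed above, here is a minimal Python sketch that counts characters, word-forms (tokens and types), sentences, lines and paragraphs in a plain text. The function name `count_units`, the regular expressions and the sample text are all assumptions made for this example; in particular, splitting sentences on `.`, `!` or `?` and paragraphs on blank lines is a crude approximation, and it is not how WordSmith or any other tool necessarily does it.

```python
import re
from collections import Counter

def count_units(text):
    """Count a few surface units of a plain text (hypothetical helper)."""
    # Paragraphs: assumed to be separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Sentences: assumed to end in ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Word-forms: runs of letters, optionally joined by apostrophe or hyphen,
    # lower-cased so that "The" and "the" count as one type.
    tokens = re.findall(r"[^\W\d_]+(?:['’-][^\W\d_]+)*", text.lower())
    return {
        "characters": len(text),
        "tokens": len(tokens),
        "types": len(set(tokens)),
        "sentences": len(sentences),
        "lines": len(text.splitlines()),
        "paragraphs": len(paragraphs),
        "frequencies": Counter(tokens),
    }

sample = "The cat sat. The cat slept!\n\nWas the dog there?"
counts = count_units(sample)
for unit in ("characters", "tokens", "types", "sentences", "lines", "paragraphs"):
    print(unit, counts[unit])
print(counts["frequencies"].most_common(1))
```

Units such as headings, sections, utterances or turns cannot be recovered this way from raw text; as the slide notes, they are countable only if the text is marked up.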
What isn’t countable? Metaphors; semantic prosody; patterns; because these are abstractions
though we have to try … by seeking various markers, frames signalling these abstractions; recognising, however, that 1 form ≠ 1 function. Corpus Linguistics is all about pattern-seeking!
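One way to picture "seeking markers" is a small concordance (KWIC) search for surface expressions that sometimes signal figurative use. The sketch below is only illustrative: the function name `kwic`, the marker list in `MARKERS` and the sample sentences are assumptions invented for this example, not a validated inventory of metaphor signals.

```python
import re

def kwic(text, pattern, width=30):
    """Return (left context, match, right context) for each hit (hypothetical helper)."""
    rows = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        rows.append((left, m.group(0), right))
    return rows

# Illustrative marker list only; a real study would need to justify each item.
MARKERS = r"\b(?:so to speak|metaphorically|as it were|a kind of)\b"

sample = ("Prices went through the roof, so to speak. "
          "He is, as it were, a kind of walking dictionary. "
          "What kind of tea is this? It is a kind of oolong.")

hits = kwic(sample, MARKERS)
for left, node, right in hits:
    # Align the node word in a central column, concordance-style.
    print(f"{left:>30} [{node}] {right}")
```

The last hit ("a kind of oolong") is a literal classification, not a figurative one, although the form is identical to the marker in "a kind of walking dictionary". That is the 1 form ≠ 1 function problem: the search only retrieves candidates, and each concordance line still has to be read and interpreted.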
Why counting, anyway? Search for interpretations; understanding; re-defining categories via patterns. WordSmith
What should we count? The question of focus; the question of scope; pointfulness: the search for patterns; the POS-trap; metadata are used to forget the data (François Rastier)
Reference: Scott, M. & C. Tribble, Textual Patterns: keyword and corpus analysis in language education, Amsterdam: Benjamins. Chapters 1 & 2.