Networks of Key Words. Mike Scott, Aston University. INWWCT, Trondheim, October 3rd, 2011
Abstract
The notion of keyness is important for document retrieval, for language learning and for the study of the nature of text. Keyness, a textual not a linguistic quality, may be shared by certain words and phrases in one text, but its patterning is further distributed across text sets of various dimensions in associates (Scott, 1997) and clustering. This presentation considers the network patterns of keyness which can be investigated using quite simple software procedures, and the extent to which these patternings may relate to a user’s needs and interests.
Scott, M., 1997, "PC Analysis of Key Words -- and Key Key Words", System, Vol. 25, No. 1, pp. 1-13.
Key words (KWs). Issues: keyness; aboutness; distribution patterns of KWs … in texts and across corpora
complex pattern
or simple
fractal?
Fractal: a fractal is "a rough or fragmented geometric shape that can be split into parts, each of which is (at least approximately) a reduced-size copy of the whole," a property called self-similarity (Wikipedia, citing Mandelbrot, B.B. (1982). The Fractal Geometry of Nature. W.H. Freeman and Company).
Keyness: aboutness; importance; a textual category
KWs
frequencies
aboutness: what the text is about; what the message is; what it all means (picture from mindreadersdictionary.com)
importance centrality
PC Identification of KWs: simple verbatim repetition; no allowance for anaphora, synonymy, antonymy etc.; simple frequency threshold; one word, or more than one?
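The identification procedure can be sketched roughly as follows. This is a minimal illustration, not the procedure used in the talk: it counts verbatim word forms only, applies a simple frequency threshold, and then compares each word's frequency with a reference word list. The log-likelihood statistic, the threshold values and the function names (tokenize, log_likelihood, key_words) are all assumptions made for the example; the slides do not say which statistic was used.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Verbatim word forms only: no lemmatisation, and no allowance
    # for anaphora, synonymy or antonymy.
    return re.findall(r"[a-z']+", text.lower())

def log_likelihood(a, b, c, d):
    # a, b: frequency of the word in the text and in the reference corpus
    # c, d: total tokens in the text and in the reference corpus
    e1 = c * (a + b) / (c + d)  # expected frequency in the text
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def key_words(text, reference, min_freq=2, min_ll=6.63):
    # min_freq and min_ll are illustrative thresholds (assumptions).
    text_freq = Counter(tokenize(text))
    ref_freq = Counter(tokenize(reference))
    c, d = sum(text_freq.values()), sum(ref_freq.values())
    keys = {}
    for word, a in text_freq.items():
        if a < min_freq:
            continue  # simple frequency threshold
        b = ref_freq[word]
        if a / c <= b / d:
            continue  # keep only words unusually frequent in the text
        score = log_likelihood(a, b, c, d)
        if score >= min_ll:
            keys[word] = score
    # Highest-scoring (most "key") words first.
    return sorted(keys.items(), key=lambda kv: kv[1], reverse=True)
```

Single words are handled here; treating clusters of more than one word as candidates would mean counting n-grams instead of tokens.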
Corpus-based or corpus-driven? Machine-identified keyness is ideal for corpus-driven research. The researcher lets the PC suggest areas needing further chasing up. See recent work by McEnery, Baker, etc.
Dispersion within the text
Global KWs
Local KWs
middling burstiness: verbs (appears, begins, puts, observes, replies, continues, says, considers, etc.)
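One rough way to look at dispersion within a text is sketched below. The measure is an assumption made for illustration, not one named in the slides: the text is cut into equal segments and we ask in how many of them a word occurs. A global KW would be expected to appear in most segments, a local (bursty) KW in only a few. The function names positions and segment_coverage are hypothetical.

```python
def positions(tokens, word):
    # Relative positions (0.0 to just under 1.0) of each occurrence,
    # the raw material for a dispersion plot.
    n = len(tokens)
    return [i / n for i, t in enumerate(tokens) if t == word]

def segment_coverage(tokens, word, segments=8):
    # Fraction of equal-sized segments containing at least one occurrence.
    # Near 1.0 suggests a globally spread word; a low value suggests
    # a word concentrated in one stretch of the text.
    n = len(tokens)
    hit = {
        min(int(i * segments / n), segments - 1)
        for i, t in enumerate(tokens)
        if t == word
    }
    return len(hit) / segments
```

The number of segments is arbitrary; eight is used only as a default for the sketch.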
Distribution patterns across the corpus
Key Key Words A "key key-word" is one which is "key" in more than one of a number of related texts. The more texts it is "key" in, the more "key key" it is.
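The definition above amounts to a count over texts, which can be sketched as follows. The input format (a mapping from a text identifier to that text's key words) and the function name key_key_words are assumptions for the example.

```python
from collections import Counter

def key_key_words(key_words_by_text, min_texts=2):
    # key_words_by_text: {text_id: iterable of that text's key words}
    counts = Counter()
    for kws in key_words_by_text.values():
        counts.update(set(kws))  # count each word once per text
    # A word is "key key" if it is key in more than one text;
    # the higher the count, the more "key key" it is.
    return [(w, n) for w, n in counts.most_common() if n >= min_texts]
```

For example, with three texts whose key words are {climate, carbon}, {climate, waste} and {climate, carbon, university}, "climate" is key in three texts and "carbon" in two, while "waste" and "university" are key in one text each and so are not key key-words.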
Associates An "associate" of key-word X is another key-word (Y) which co-occurs with X in a number of texts. (It may or may not co-occur in proximity to key-word X.) Association strength is measured using a standard collocation statistic, here MI3.
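A minimal sketch of how associates might be scored is given below. It applies the usual cubed mutual information formula, MI3 = log2(O^3 / E), at the level of whole texts: O is the number of texts in which X and Y are both key, and E = f(X) * f(Y) / N is the number expected by chance among N texts. Applying the formula to text-level counts in this way, the min_texts cut-off and the function name associates are assumptions; the slides only state that MI3 was the statistic.

```python
import math
from collections import Counter

def associates(key_words_by_text, target, min_texts=2):
    # key_words_by_text: {text_id: iterable of that text's key words}
    sets = [set(kws) for kws in key_words_by_text.values()]
    n_texts = len(sets)
    # Number of texts in which each word is key.
    f = Counter(w for s in sets for w in s)
    # Number of texts in which each word is key alongside the target.
    co = Counter(w for s in sets if target in s for w in s if w != target)
    scores = {}
    for y, f_xy in co.items():
        if f_xy < min_texts:
            continue  # must co-occur in a number of texts
        expected = f[target] * f[y] / n_texts
        scores[y] = math.log2(f_xy ** 3 / expected)
    # Strongest associates first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Proximity plays no part here: two key words count as co-occurring whenever they are key in the same text, wherever they appear in it.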
Climate change: LexisNexis database, 9,444 stories, UK press, 2010
KKWs
Associates
waste
university
Conclusions: KW patterns within individual texts; within the corpus or sub-corpus. But early days, lots of questions: are any KW patternings fractal? do specialised corpora have specialised KKWs?