Introduction to Corpus Linguistics: Key Word Analysis
John Corbett & Wendy Anderson
Review
So far we have looked at:
- Word frequencies
- Manual interpretation of concordance lines
- Using statistics to measure collocation in terms of frequency and probability
- Colligation (= collocation of grammatical categories)
- Concordance/dispersion plots
But what if we want to ask whether a feature in a particular corpus/text is unusually frequent or infrequent? To answer this, we can compare corpora using key word analysis.
This session
- What is key word analysis and why is it useful?
- Comparison of corpora: specialised & reference
- Cultural key words
- Significance of key words
- Using AntConc (freeware)
Choosing corpora to compare
For key word analysis you need:
- A specialised corpus you want to explore (Corpus A)
- A (usually larger) reference corpus (Corpus B)
Let’s say Corpus A is a corpus of broadcast media reports, or works of fiction by a particular writer. What should Corpus B be?
Specialised and reference corpora…
The nature of the specialised and reference corpora will be determined by your research questions (and also by practical considerations like access).
- Comparing a specialised corpus of news broadcasts (Corpus A) to a balanced general reference corpus like the BNC or COCA will show you the significant lexis in Corpus A against the language in general (ie BrEng or AmEng).
- Comparing a specialised corpus of news broadcasts (Corpus A) to a reference corpus of the same genre (Corpus B) will show you the significant lexis in Corpus A against language in texts of the same type.
Doing it yourself…
Step 1: Build a specialised corpus (or even just look at a single text).
- For example, collect one or more political leaders’ speeches from the web: http://www.bbc.co.uk/news/uk-scotland-11560698
- Copy and save as a plain text file, eg ‘Alex Salmond Speech 1’
- Add more texts if you wish.
Doing it yourself…
Step 2: Find a reference corpus, eg political texts from the SCOTS corpus.
- Go to www.scottishcorpus.ac.uk
- Click on Advanced Search
- Click on Written > TextType and choose ‘Written record of speech’
Doing it yourself…
- Select some or all of the Scottish Parliament Body texts
- Click on Download and save as a zip file.
- Unzip the plain text files into a folder (you might call it ‘Parliamentary Reference Corpus’).
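Before opening AntConc, it can be useful to check what the unzipped folder contains. The following is a minimal sketch, not part of the original workflow: it assumes the texts are UTF-8 plain text files ending in .txt, and the names tokenize and load_corpus are hypothetical helpers. The tokeniser is deliberately crude (lowercase letters and internal apostrophes only) and will not match AntConc's own token definition exactly.

```python
import re
from collections import Counter
from pathlib import Path

# Crude token pattern (an assumption): runs of letters, allowing internal apostrophes.
TOKEN = re.compile(r"[a-z]+(?:['’][a-z]+)*")

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN.findall(text.lower())

def load_corpus(folder):
    """Build one word-frequency list from every .txt file in a folder."""
    counts = Counter()
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        counts.update(tokenize(text))
    return counts
```

For example, load_corpus("Parliamentary Reference Corpus") would return a Counter whose most_common(10) method lists the ten most frequent word forms in the reference folder.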
Choose a text analysis program, eg AntConc
Choose your specialised corpus by clicking File > Open File and browsing for your target text(s).
Click the Tool Preferences menu and choose Keyword Preferences.
In Keyword Preferences…
- Choose a statistical measure of ‘keyness’ (the most common is ‘log likelihood’ but you can choose ‘chi square’)
- Choose a threshold for the number of keywords to be displayed (eg top 100)
- Choose whether or not to display ‘negative keywords’ (ie those words in the specialised corpus that have an unusually low frequency compared to the reference corpus)
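To see what the two ‘keyness’ measures actually compute, here is a minimal sketch for a single word. It assumes the commonly used two-term log-likelihood formula and a plain Pearson chi-square without continuity correction; AntConc's exact implementation may differ in detail. The function names and the counts in the example are invented for illustration.

```python
import math

def log_likelihood(a, b, c, d):
    """Log-likelihood keyness for one word.

    a = frequency in the specialised corpus, b = frequency in the reference
    corpus, c and d = total number of tokens in each corpus.
    """
    # Expected frequencies if the word were spread evenly across both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    # A zero count contributes nothing to the sum.
    term1 = a * math.log(a / e1) if a else 0.0
    term2 = b * math.log(b / e2) if b else 0.0
    return 2 * (term1 + term2)

def chi_square(a, b, c, d):
    """Pearson chi-square on the 2x2 table (word vs other words, A vs B)."""
    n = c + d
    numerator = n * (a * (d - b) - b * (c - a)) ** 2
    denominator = (a + b) * (n - a - b) * c * d
    return numerator / denominator if denominator else 0.0

# Invented counts: 20 hits in 1,000 tokens (Corpus A) vs 10 hits in 1,000 (Corpus B).
print(round(log_likelihood(20, 10, 1000, 1000), 2))
print(round(chi_square(20, 10, 1000, 1000), 2))
```

With these invented counts both measures come out at roughly 3.4. A word that is equally frequent in both corpora scores 0 on either measure, and the score grows as the two relative frequencies move apart.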
In Keyword Preferences… Choose a reference corpus at the bottom of the Keyword Preferences menu. You can choose a ‘Directory’ ie a folder with a group of files. Choose the Parliamentary Reference Corpus folder as your reference corpus.
In Keyword Preferences… Click ‘Apply’
Back at the main screen
- Click ‘Words’ in the ‘Search Term’ option
- Click ‘Treat all data as lowercase’
- Keep ‘Sort by Keyness’
- Click Start
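The steps above (lowercase word counts, a keyness score per word, a list sorted by keyness) can be sketched outside AntConc as follows. This is an illustration under assumptions, not AntConc's own code: keyness here is a log-likelihood score given a minus sign when the word is relatively less frequent in the specialised corpus, so that ‘negative keywords’ fall out as a separate list. The function names and the two toy ‘corpora’ are invented.

```python
import math
from collections import Counter

def keyness(a, b, c, d):
    """Signed log likelihood: positive = over-used in A, negative = under-used."""
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a:
        ll += a * math.log(a / e1)
    if b:
        ll += b * math.log(b / e2)
    ll *= 2
    # Compare relative frequencies to decide the direction of the difference.
    return ll if a / c >= b / d else -ll

def keyword_lists(target, reference, top=100):
    """Return (positive, negative) keyword lists, strongest first."""
    c, d = sum(target.values()), sum(reference.values())
    scored = [(w, keyness(target[w], reference[w], c, d))
              for w in set(target) | set(reference)]
    positive = sorted((s for s in scored if s[1] > 0), key=lambda s: (-s[1], s[0]))
    negative = sorted((s for s in scored if s[1] < 0), key=lambda s: (s[1], s[0]))
    return positive[:top], negative[:top]

# Toy data, already lowercased (invented sentences, not the real texts).
speech = Counter("we will protect scotland and we will protect the nhs".split())
reference = Counter(
    "the committee will consider the cost and the finance of the bill".split())

positive, negative = keyword_lists(speech, reference)
print([w for w, _ in positive[:3]])
print([w for w, _ in negative[:3]])
```

On texts this small the scores are far too low to mean anything; the point is only to show how the ranking is put together.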
RESULTS!!!
Provisional interpretation of results
We are comparing one political speech with a reference corpus of Scottish political discourse.
Notice the unusual frequency of:
- ‘I/we’ – is this a charismatic leader and a rhetoric of inclusion? Notice the intensification of ‘I’ towards the end.
- ‘Scotland/Scottish/nation’ – is this a nationalist speech?
- ‘Labour’ – the main political rival in Scotland
- ‘protect/NHS’ – government as carer
- Personal names (‘Jimmy’ = ally; ‘Cameron’ = enemy)
Provisional interpretation of results
We are comparing one political speech with a reference corpus of Scottish political discourse.
Notice the unusual infrequency of:
- ‘finance’/‘cost’ – is this a speech that avoids economics?
- ‘problem’ – does the speech focus on upbeat topics?
The keyword analysis, however, should only act as a point of departure for broader analysis.
Keywords and concordance plots
It is sometimes interesting to look at how keywords are dispersed in a text. Simply load your text as you did for the keyword analysis, choose the ‘Concordance Plot’ tab, and run a word search. Here, I have chosen ‘games’ for the Alex Salmond speech. It occurs four times in the speech…but where? And ‘I’ occurs 77 times, but where? Run concordance plots to find out.
‘Concordance plot’ for ‘games’ in Salmond speech
‘Concordance plot’ for ‘I’ in Salmond speech
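The idea behind a concordance plot can be sketched in a few lines: find the token position of every hit, then mark each position on a bar that represents the whole text from start to finish. This is a rough text-only approximation, not AntConc's own method; the function names, the bar width and the crude tokeniser are all assumptions made for illustration.

```python
import re

def hit_positions(text, word):
    """Return the token index of every hit, plus the total token count."""
    tokens = re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())
    hits = [i for i, t in enumerate(tokens) if t == word.lower()]
    return hits, len(tokens)

def plot_line(hits, n_tokens, width=60):
    """Draw the text as a bar of `width` cells with '|' wherever a hit falls."""
    cells = [" "] * width
    for i in hits:
        # Scale the token index to a cell on the bar.
        cells[i * width // n_tokens] = "|"
    return "[" + "".join(cells) + "]"

# Invented mini-text: the search word appears at the very start and very end.
sample = "games " + "x " * 8 + "games"
hits, n = hit_positions(sample, "games")
print(plot_line(hits, n, width=10))
```

Run on the full speech, a bar like this would show whether the hits for a word such as ‘I’ cluster in one part of the text or are spread evenly through it.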
Take-home messages
- Key word analysis is used when comparing corpora.
- Statistical measures are used to identify words that appear unusually frequently or infrequently in one corpus compared with the other.
- This kind of analysis can tell us something interesting about the content, style and/or ideology of the corpus.
- We can combine key word analysis with other types of corpus search (eg concordance plots) to increase our understanding of the text.