Corpus Linguistics Richard Xiao lancsxiaoz@googlemail.com Corpus analysis (2)
Outline of the session. Lecture: keyword; reference corpus; key keyword. Practical: WST keyword; AntConc keyword; Wmatrix keyword / key concept. Extra: keyword analysis with CQPweb.
What is a keyword? Keywords are words whose frequency is exceptionally high (positive keywords) or low (negative keywords) in comparison with a reference corpus. The term usually refers to positive keywords, but negative keywords are equally interesting (see Xiao and McEnery 2005). In WordSmith, negative keywords appear at the very end of your listing in a different colour, and they are omitted automatically from a keywords database for key keyword analysis and from a keyword plot.
Why keyword analysis? Keywords indicate the ‘aboutness’ (Scott 1999) of a particular text or corpus, which serves content analysis and discourse analysis. They also reveal the salient features that are functionally related to a particular genre (Xiao and McEnery 2005), which serves genre analysis and stylistic analysis.
How to do keyword analysis: (1) Make a wordlist of the target corpus. (2) Locate or make a wordlist of a reference corpus; see Scott (2005), “In search of a bad reference corpus”, http://www.methodsnetwork.ac.uk/redist/pdf/es1_05scott.pdf. The reference corpus is usually larger than the target corpus, and its appropriateness depends on your research questions! (3) Compare the frequency of each item in the two wordlists to extract keywords (done automatically). (4) Analyse and interpret the keywords (you will do it!).
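The comparison in step (3) can be sketched in a few lines of Python. This is a minimal illustration, not the procedure any particular tool runs: it assumes the log-likelihood statistic as the keyness measure and a cut-off of 3.84 (p < 0.05, one degree of freedom). The function names and the toy token lists are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness score.

    a = frequency in target, b = frequency in reference,
    c = size of target corpus, d = size of reference corpus.
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in target
    e2 = d * (a + b) / (c + d)  # expected frequency in reference
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(target_tokens, reference_tokens, threshold=3.84):
    """Return (word, sign, score) tuples, highest score first.

    sign is "+" for positive keywords (relatively more frequent in the
    target) and "-" for negative keywords.
    """
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    c = sum(target.values())
    d = sum(reference.values())
    results = []
    for word in set(target) | set(reference):
        a, b = target[word], reference[word]
        score = log_likelihood(a, b, c, d)
        if score >= threshold:
            sign = "+" if a / c > b / d else "-"
            results.append((word, sign, round(score, 2)))
    return sorted(results, key=lambda r: r[2], reverse=True)


# Toy data (hypothetical): "tax" is over-represented in the target.
toy_target = ["tax"] * 20 + ["the"] * 80
toy_reference = ["tax"] * 2 + ["the"] * 98
toy_keywords = keywords(toy_target, toy_reference)
```

With real data, the token lists would come from tokenising the target text and the reference corpus; the interpretation in step (4) remains the analyst's job.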
Keywords in the party speeches. Target corpus (just one text): David Cameron's speech at the Conservative conference (10 October 2012, Manchester), http://www.bbc.co.uk/news/uk-politics-15189614. A local copy is available (David_speech, Unicode text); download and unzip the file into a folder: www.fass.lancs.ac.uk/projects/corpus/data/workshop3texts.zip. Reference corpus: the 100-million-word BNC; download and unzip (local copy available): www.lexically.net/downloads/version4/BNC_World.zip. Tool: WST Keyword.
Wordlist of David’s speech
Creating keyword list
Keyword extraction in progress. Warning: this can take time if you have loaded two large wordlists.
Keywords in David’s speech: what do these keywords tell us? Note the negative keyword.
Keyword: Plot view
What company do keywords keep?
Why “marriage”?
Key clusters: similar to word clusters, but only keywords are used.
Key keywords. A key keyword is one that is "key" in more than one of a number of related texts. The more texts it is "key" in, the more "key key" it is. This avoids extracting keywords that are unusually frequent in only a small number of files. Key keywords are created automatically and are as simple to extract as keywords. N.B. Negative keywords are omitted automatically from a key keyword list.
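The idea of a key keyword can be sketched as a simple count over per-text keyword lists. This is an illustrative sketch only: the function name, the `min_texts` parameter, and the sample keyword sets are hypothetical, and each set is assumed to hold positive keywords only, since negative keywords are left out.

```python
from collections import Counter


def key_keywords(keywords_by_text, min_texts=2):
    """Count in how many texts each word is key.

    keywords_by_text maps a text name to the set of its positive keywords.
    Returns (word, number_of_texts) pairs for words that are key in at
    least min_texts texts, most widespread first.
    """
    counts = Counter()
    for kws in keywords_by_text.values():
        counts.update(set(kws))  # each text counts once per word
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(word, n) for word, n in ranked if n >= min_texts]


# Hypothetical keyword sets for three related texts.
sample = {
    "speech_a": {"tax", "deficit", "britain"},
    "speech_b": {"tax", "nhs", "britain"},
    "speech_c": {"tax", "jobs"},
}
sample_key_keywords = key_keywords(sample)
```

Here "tax" is key in all three texts and "britain" in two, so both qualify, while words that are key in a single text drop out.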
Making a batch wordlist: specify a folder where you can write.
Batch making keyword lists
Batch making keyword lists: specify a folder where you can write.
Making a KW database
Key keywords: key coverage of the corpus. An "associate" is a keyword that appears in the same text.
Keyword in AntConc: target corpus and reference corpus.
Keyword in AntConc: keywords in David's speech (in relation to Ed's speech).
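The notion of an associate can be illustrated with a short sketch. It assumes per-text keyword sets are already available; the function name and sample data are hypothetical, and this is not a description of how WordSmith computes its associates.

```python
from collections import Counter


def associates(keywords_by_text, key_keyword):
    """List keywords that are key in the same texts as key_keyword.

    Returns (word, number_of_shared_texts) pairs, most frequent first.
    """
    counts = Counter()
    for kws in keywords_by_text.values():
        if key_keyword in kws:
            counts.update(w for w in set(kws) if w != key_keyword)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical keyword sets for three texts.
texts = {
    "t1": {"tax", "deficit"},
    "t2": {"tax", "deficit", "jobs"},
    "t3": {"nhs", "jobs"},
}
tax_associates = associates(texts, "tax")
```

In this toy data "deficit" shares two texts with "tax" and "jobs" shares one; "nhs" never co-occurs with it and is not an associate.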
Wmatrix: Keywords and key concepts. Wmatrix provides POS and semantic tagging. Keyword / key concept analysis of Cameron’s speech in comparison with Miliband’s speech: copy and paste the speeches into two separate text files (http://www.bbc.co.uk/news/uk-politics-15189614 and http://www.labour.org.uk/ed-milibands-speech-to-labour-party-conference) and save the two texts as David_speech.txt and Ed_speech.txt. Local copies: www.fass.lancs.ac.uk/projects/corpus/data/workshop3texts.zip
Wmatrix: Keywords and key concepts. Log in with your zhejiangxx account: http://ucrel.lancs.ac.uk/wmatrix3.html
Tagging Wizard
Tagging in progress
Tagging result
Labour frequency list
KWIC concordance
“My folders”: upload and tag Ed’s speech, then click on “My folders”. Warning: your folder view may look different!
Open the David_speech folder and select Ed_speech in the “Keyword compared to” dropdown box.
Keyword list to download!
Keyword cloud – even more interesting!
David’s key concepts (“Key concepts compared to”)
Keyword analysis in online corpora. Using Lancaster’s CQPweb to compare British English (LOB + FLOB) and American English (Brown + Frown): log in to CQPweb at http://cqpweb.lancs.ac.uk. A similar analysis can be done at BFSU’s CQPweb corpus hub (different corpora): http://124.193.83.252/cqp/ (account: ID=pass=test).
Creating subcorpora
Creating subcorpus BrE
Creating subcorpus AmE
Making wordlists
Wordlist available now
Computing keywords. You can adjust the statistical measure, cut-off point, and minimum frequency according to your research purposes.
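The effect of the cut-off point and minimum frequency can be sketched as a filter over already-scored words. The critical values below are the standard chi-square values for one degree of freedom, against which log-likelihood and chi-square keyness scores are usually compared. The function name, parameter names, and sample rows are hypothetical, and the sketch does not reproduce CQPweb's own options.

```python
# Chi-square critical values for 1 degree of freedom, keyed by p-value.
CRITICAL_VALUES = {0.05: 3.84, 0.01: 6.63, 0.001: 10.83, 0.0001: 15.13}


def filter_keywords(scored, p_cutoff=0.01, min_freq=3):
    """Keep rows that pass both the significance and frequency settings.

    scored is a list of (word, frequency_in_target, keyness_score) rows.
    """
    threshold = CRITICAL_VALUES[p_cutoff]
    return [
        (word, freq, score)
        for word, freq, score in scored
        if freq >= min_freq and score >= threshold
    ]


# Hypothetical scored rows.
rows = [
    ("colour", 40, 52.3),   # frequent and highly significant
    ("whilst", 2, 12.0),    # significant but too rare
    ("towards", 15, 4.2),   # frequent but only passes p < 0.05
]
strict = filter_keywords(rows, p_cutoff=0.01, min_freq=3)
lenient = filter_keywords(rows, p_cutoff=0.05, min_freq=1)
```

A stricter cut-off and a higher minimum frequency yield a shorter, more reliable list; looser settings suit exploratory work.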
Keywords in BrE and AmE