1
Corpus analysis (2)
Corpus Linguistics
Richard Xiao lancsxiaoz@googlemail.com
2
Outline of the session
Lecture
  Keyword
  Reference corpus
  Key keyword
Practical
  WST keyword
  AntConc keyword
  Wmatrix keyword / key concept
  Extra: keyword analysis with CQPweb
3
What is a keyword?
Keywords are those words whose frequency is exceptionally high (positive keywords) or low (negative keywords) in comparison with a reference corpus
Keywords usually refer to positive keywords
But negative keywords are equally interesting (see Xiao and McEnery 2005)
  They appear at the very end of your listing, in a different colour in WordSmith
  They are omitted automatically from a keywords database for key keyword analysis and a keyword plot
4
Why keyword analysis?
Indicating the ‘aboutness’ (Scott 1999) of a particular text or corpus
  Content analysis, discourse analysis
Also revealing the salient features which are functionally related to a particular genre (Xiao and McEnery 2005)
  Genre analysis, stylistic analysis
5
How to do keyword analysis
Make a wordlist of the target corpus
Locate or make a wordlist of a reference corpus
  Scott (2005) “In search of a bad reference corpus”
  The reference corpus is usually larger than the target corpus
  The appropriateness of a reference corpus depends on your research questions!
Compare the frequency of each item in the two wordlists to extract keywords – done automatically
Analyse and interpret keywords – you will do it!
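The comparison step that the tools run automatically can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not say which statistic is applied, so the sketch uses the log-likelihood (G2) measure, one of the keyness statistics commonly used in corpus linguistics. The function names `log_likelihood` and `keywords` and the toy frequencies are hypothetical and do not come from WordSmith or any other tool.

```python
import math


def log_likelihood(a, b, c, d):
    """G2 keyness score (an assumed measure, not necessarily the tool's).

    a = frequency of the word in the target corpus
    b = frequency of the word in the reference corpus
    c = total tokens in the target corpus
    d = total tokens in the reference corpus
    """
    # Expected frequencies if the word were equally common in both corpora
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(target, reference):
    """Compare two wordlists (dicts of word -> frequency).

    Returns (word, target_freq, reference_freq, score) tuples sorted so that
    positive keywords come first and negative keywords (negative score) last.
    """
    c = sum(target.values())
    d = sum(reference.values())
    rows = []
    for word in set(target) | set(reference):
        a = target.get(word, 0)
        b = reference.get(word, 0)
        score = log_likelihood(a, b, c, d)
        # Sign the score: relatively rarer in the target = negative keyword
        if a / c < b / d:
            score = -score
        rows.append((word, a, b, score))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


# Toy wordlists, invented purely for illustration
target_wordlist = {"britain": 20, "the": 80}
reference_wordlist = {"britain": 5, "the": 995}
for row in keywords(target_wordlist, reference_wordlist):
    print(row)
```

Signing the score keeps positive and negative keywords in one listing, with negative keywords at the very end, which mirrors how the listing is described for WordSmith.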
6
Keywords in the party speeches
Target corpus – just one text
  David Cameron's speech at the Conservative conference (10 October 2012, Manchester)
  Local copy available (David_speech Unicode text) - download and unzip the file into a file folder:
Reference corpus
  The 100-million-word BNC: download and unzip (local copy available)
Tool
  WST Keyword
7
Wordlist of David’s speech
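For readers who want to see what a wordlist amounts to outside WordSmith, a minimal Python sketch follows. It assumes a very simple tokenisation (lower-cased runs of ASCII letters, optionally with one internal straight apostrophe), which is a simplification and not the tokenisation rules WordSmith applies. The function name `make_wordlist` and the sample sentence are hypothetical.

```python
import re
from collections import Counter


def make_wordlist(text):
    """Return a frequency list (word -> count) for a text.

    Tokenisation is deliberately crude: lower-case ASCII letters only,
    with an optional internal apostrophe (e.g. "can't").
    """
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(tokens)


# Invented sample text, for illustration only
wordlist = make_wordlist("We can do it. We can't stop.")
for word, freq in wordlist.most_common():
    print(word, freq)
```

To process a real file, the text could be read with `open(path, encoding="utf-8").read()`; a file saved as "Unicode text" from Windows tools may need `encoding="utf-16"` instead.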
8
Creating keyword list
9
Keyword extraction in progress
Warning: It can take time if you have loaded two large wordlists
10
Keywords in David’s speech
What do these keywords tell us?
Negative keyword
11
Keyword: Plot view
12
What company do keywords keep?
13
Why “marriage”?
14
Key clusters Similar to word clusters, but only keywords are used.
15
Key keywords
A key keyword is one which is "key" in more than one of a number of related texts
The more texts it is "key" in, the more "key key" it is
Can avoid extracting keywords which are unusually frequent in only a small number of files
Can be created automatically and are as simple to extract as keywords
N.B. Negative keywords are omitted automatically from a key keyword list
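The idea of counting how many texts a word is "key" in can be sketched as follows. The sketch assumes that the positive keywords of each text have already been extracted into separate lists; the function name `key_keywords`, the `min_texts` parameter and the sample keyword lists are hypothetical and only illustrate the counting logic, not WordSmith's keywords database.

```python
from collections import Counter


def key_keywords(keyword_lists, min_texts=2):
    """Count in how many texts each word is a (positive) keyword.

    keyword_lists: one list of keywords per text.
    Returns (word, number_of_texts) pairs for words that are key in at
    least `min_texts` texts, most widespread first.
    """
    counts = Counter()
    for text_keywords in keyword_lists:
        # set() so that a word counts once per text
        counts.update(set(text_keywords))
    return [(w, n) for w, n in counts.most_common() if n >= min_texts]


# Invented keyword lists for three related texts
per_text_keywords = [
    ["britain", "deficit", "nhs"],
    ["britain", "nhs"],
    ["britain", "europe"],
]
print(key_keywords(per_text_keywords))
```

Words that are key in only one text ("deficit" and "europe" in the toy data) drop out, which is the filtering effect the slide describes.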
16
Making a batch wordlist
Specify a folder where you can write
17
Batch making keyword lists
18
Batch making keyword lists
Specify a folder where you can write
19
Making a KW database
20
Key keywords key coverage of the corpus
An "associate" is a keyword that is key in the same texts as a given key keyword
21
Keyword in AntConc
Target corpus
Reference corpus
22
Keyword in AntConc
Keywords in David's speech (in relation to Ed's speech)
23
Wmatrix: Keywords and key concepts
POS and semantic tagging
Keyword / key concept analysis in Cameron’s speech in comparison with Miliband’s speech
  Copy and paste the speeches into two separate text files
  Save the two texts as David_speech.txt and Ed_speech.txt
24
Wmatrix: Keywords and key concepts
Log in using your zhejiangxx account
25
Tagging Wizard
26
Tagging in progress
27
Tagging result
28
Labour frequency list
29
KWIC concordance
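A KWIC (keyword in context) display simply lines up every occurrence of a node word with a few words of context on each side. The Python sketch below illustrates the principle only; it is not how Wmatrix builds its concordances, and the function name `kwic`, the `width` parameter and the sample sentence are hypothetical.

```python
def kwic(tokens, node, width=4):
    """Return (left context, node, right context) for each hit.

    tokens: the text as a list of word tokens
    node:   the search word (matched case-insensitively)
    width:  number of context words kept on each side
    """
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Invented sentence, for illustration only
sample = "Labour will rebuild Britain and Labour will win".split()
for left, hit, right in kwic(sample, "labour", width=2):
    # Right-align the left context so the node words line up in a column
    print(f"{left:>20}  {hit}  {right}")
```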
30
“My folders”
Upload and tag Ed’s speech
…and click on “My folders”
Warning: Your folder view may look different!
31
Open the David_speech folder and select Ed_speech in the “Keyword compared to” dropdown box
32
Keyword list to download!
33
Keyword cloud – even more interesting!
34
David’s key concepts (“Key concepts compared to”)
35
Keyword analysis in online corpora
Using Lancaster’s CQPweb to compare British English (LOB+FLOB) and American English (Brown + Frown)
  Log in to CQPweb
Similar analysis can be done at BFSU’s CQPweb corpus hub (different corpora)
  Account: ID=pass=test
36
Creating subcorpora
37
Creating subcorpus BrE
38
Creating subcorpus AmE
39
Making wordlists
40
Wordlist available now
41
Computing keywords
You can adjust the statistical measure, cut-off point, and minimum frequency according to your research purposes.
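The effect of the cut-off point and the minimum frequency can be illustrated with a small Python sketch. It assumes the statistical measure is log-likelihood, whose scores are conventionally compared with chi-square critical values at one degree of freedom (3.84 for p < 0.05, 6.63 for p < 0.01, 10.83 for p < 0.001, 15.13 for p < 0.0001). The function name `filter_keywords`, its parameters, and the scores in the toy rows are hypothetical and are not CQPweb output.

```python
# Chi-square critical values (1 degree of freedom) used as LL cut-offs
LL_CRITICAL = {0.05: 3.84, 0.01: 6.63, 0.001: 10.83, 0.0001: 15.13}


def filter_keywords(rows, p=0.01, min_freq=3):
    """Keep rows whose score reaches the cut-off and whose frequency
    reaches the minimum.

    rows: (word, frequency, log_likelihood_score) tuples
    p:    significance level, must be a key of LL_CRITICAL
    """
    threshold = LL_CRITICAL[p]
    return [r for r in rows if r[2] >= threshold and r[1] >= min_freq]


# Invented scores, for illustration only
candidates = [
    ("colour", 120, 95.2),
    ("color", 2, 40.0),
    ("the", 5000, 1.1),
]
print(filter_keywords(candidates))                      # strict settings
print(filter_keywords(candidates, p=0.05, min_freq=1))  # looser settings
```

A stricter cut-off and a higher minimum frequency give a shorter, more conservative keyword list; looser settings let in rare items such as "color" in the toy data.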
42
Keywords in BrE and AmE