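The keyword analysis approach in corpus research and its extension (An extension to the keyword approach in corpus analysis). Liang Maocheng (梁茂成), National Research Centre for Foreign Language Education (中国外语教育研究中心).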


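1 The keyword analysis approach in corpus research and its extension. Liang Maocheng (梁茂成), National Research Centre for Foreign Language Education (中国外语教育研究中心). An extension to the keyword approach in corpus analysis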

2 Outline ☻ Keywords ☻ Applications of corpus comparison ☻ Limitations to the keyword approach ☻ Keywords+ ☻ Demo

3 Keywords ☻ Keywords: ☺ Keywords are words whose frequency is unusually high (or low) in comparison with some norm. (Scott, 2003)

4 Keywords ☻ Positive keywords: ☺ Words which occur more often than would be expected by chance in comparison with the reference corpus.

5 Keywords ☻ Negative keywords: ☺ Words which occur less often than would be expected by chance in comparison with the reference corpus.

6 Keywords ☻ Positive and negative keywords ☺ In a corpus of business English, words such as business, profit and companies are likely to be positive keywords if the corpus is to be compared with a general corpus.

7 Keywords ☻ Positive and negative keywords ☺ In a corpus of academic English, words such as morning, afternoon and evening are likely to be negative keywords if the corpus is to be compared with a general corpus.

8 Keywords ☻ Calculating keyness (Rayson et al. 2004, Oakes 1998) ☺ Chi-square

9 Keywords Chi-square
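The formula on this slide did not survive in the transcript. As a minimal sketch, assuming the standard Pearson chi-square for a 2x2 contingency table (the word versus all other words, in the study corpus versus the reference corpus), keyness could be computed as follows. The function and parameter names are illustrative, not taken from the slides.

```python
def chi_square(freq_study, size_study, freq_ref, size_ref):
    """Pearson chi-square (no continuity correction) for one word.

    2x2 table (names are assumptions for illustration):
        a = word in study corpus      b = word in reference corpus
        c = other words in study      d = other words in reference
    chi2 = N * (a*d - b*c)^2 / ((a+b) * (c+d) * (a+c) * (b+d))
    """
    a, b = freq_study, freq_ref
    c, d = size_study - freq_study, size_ref - freq_ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A row or column total is zero, so the statistic is undefined;
        # returning 0.0 here is a convenience choice, not a convention.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


# A word occurring 20 times in a 100-word study corpus
# and 10 times in a 100-word reference corpus.
score = chi_square(20, 100, 10, 100)
```

With one degree of freedom, a value above 3.84 is significant at p < 0.05.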

10 Keywords Chi-square with Yates's correction
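The formula on this slide is also missing from the transcript. The sketch below assumes the usual continuity-corrected form for a 2x2 table, in which N/2 is subtracted from |ad - bc| before squaring. Names are illustrative, and clamping the corrected term at zero is a common precaution rather than something stated in the slides.

```python
def chi_square_yates(freq_study, size_study, freq_ref, size_ref):
    """Chi-square with Yates's continuity correction for one word.

    chi2 = N * (|a*d - b*c| - N/2)^2 / ((a+b) * (c+d) * (a+c) * (b+d))
    """
    a, b = freq_study, freq_ref
    c, d = size_study - freq_study, size_ref - freq_ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # undefined table; convenience value (assumption)
    # Clamp at zero so the correction never turns a tiny difference
    # into a spuriously large score.
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    return n * corrected ** 2 / denom


score = chi_square_yates(20, 100, 10, 100)
```

The correction always lowers the statistic, which matters most when the frequencies involved are small.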

11 Keywords Log-likelihood Reference: http://ucrel.lancs.ac.uk/llwizard.html

12 Keywords ☻ Previous research (e.g. Rayson et al. 2004) suggests that log-likelihood is a more reliable measure than chi-square when comparing word frequencies across corpora.
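The slide's formula is not preserved in the transcript. The sketch below assumes the two-corpus log-likelihood calculation described on the page cited above: expected frequencies are derived from the two corpus sizes, and LL = 2 * (a * ln(a/E1) + b * ln(b/E2)). Function and parameter names are illustrative.

```python
import math


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood keyness of one word across two corpora.

    a, b = observed frequencies; c, d = corpus sizes (in tokens).
    E1 = c * (a + b) / (c + d),  E2 = d * (a + b) / (c + d)
    Assumes both corpora are non-empty.
    """
    total = freq_study + freq_ref
    e1 = size_study * total / (size_study + size_ref)
    e2 = size_ref * total / (size_study + size_ref)
    ll = 0.0
    # A zero observed frequency contributes nothing (0 * ln 0 is taken as 0).
    if freq_study > 0:
        ll += freq_study * math.log(freq_study / e1)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / e2)
    return 2 * ll


score = log_likelihood(20, 100, 10, 100)
```

As with chi-square, a value above 3.84 corresponds to p < 0.05 at one degree of freedom.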

13 Keywords ☻ Ways to find keywords: ☺ Top-down: corpus-based ☺ Bottom-up: corpus-driven
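A bottom-up (corpus-driven) pass can be sketched as follows: score every word type found in either corpus, keep those above a significance threshold, and mark each as a positive or negative keyword by comparing relative frequencies. This is only an illustration under stated assumptions: pre-tokenized, non-empty input, log-likelihood as the keyness measure, 3.84 as the cut-off, and hypothetical function names.

```python
import math
from collections import Counter


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Two-corpus log-likelihood for a single word."""
    total = freq_study + freq_ref
    e1 = size_study * total / (size_study + size_ref)
    e2 = size_ref * total / (size_study + size_ref)
    ll = 0.0
    if freq_study > 0:
        ll += freq_study * math.log(freq_study / e1)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / e2)
    return 2 * ll


def keyword_list(study_tokens, ref_tokens, critical=3.84):
    """Return (word, sign, keyness) tuples, highest keyness first.

    sign is "+" for positive keywords (relatively more frequent in the
    study corpus) and "-" for negative keywords.
    """
    study, ref = Counter(study_tokens), Counter(ref_tokens)
    n_study, n_ref = len(study_tokens), len(ref_tokens)
    rows = []
    for word in set(study) | set(ref):
        a, b = study[word], ref[word]  # Counter returns 0 for unseen words
        ll = log_likelihood(a, n_study, b, n_ref)
        if ll < critical:
            continue  # not significant at p < 0.05
        sign = "+" if a / n_study > b / n_ref else "-"
        rows.append((word, sign, round(ll, 2)))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Toy corpora, invented purely for illustration.
study = ["profit"] * 30 + ["the"] * 70
reference = ["profit"] * 2 + ["the"] * 70 + ["morning"] * 28
keywords = keyword_list(study, reference)
```

A top-down (corpus-based) search would instead start from a predefined list of items and look up only their scores.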

14 Applications of… ☺ Comparison across users ☺ Comparison across genres ☺ Comparison across time periods ☺ Comparison across (varieties of) languages

15 Applications of… ☺ Compiling a specialized dictionary ☺ Detecting the topic ☺ Genre analysis ☺ Contrastive Interlanguage Analysis ☺ ……

16 Limitations to… ☻ Keywords: ☺ Do keywords have to be single words? Phraseology seems more interesting! ☺ Do keywords have to be lexical words? POS tag sequences may also be interesting. ☺ Can we bring together the bottom-up approach and the top-down approach?

17 Limitations to… ☻ Top-down: the problem is I do not yet know what may be interesting.

18 Limitations to… ☻ Bottom-up: the problem is that I am given a long list of keywords in which the few interesting ones are buried among many that do not seem interesting at all.

19 Keywords+ ☻ Support multiword sequences ☻ Support online search ☻ Support POS tag sequences ☻ Support regex search
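The slides do not show how Keywords+ implements these features. Purely as an illustration of the idea, the sketch below extracts multiword sequences (n-grams) from a token list and keeps only those matching a regular expression. The same function would work on a list of POS tags instead of words. All names here are hypothetical.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token sequences, joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_matching_ngrams(tokens, n, pattern):
    """Count the n-grams that match `pattern` in full."""
    regex = re.compile(pattern)
    return Counter(g for g in ngrams(tokens, n) if regex.fullmatch(g))


# Invented example text; tokens could equally be POS tags.
tokens = "on the other hand it is on the one hand".split()
hits = count_matching_ngrams(tokens, 4, r"on the \w+ hand")
```

The resulting counts from a study corpus and a reference corpus could then be passed to a keyness measure in the same way as single-word frequencies, which combines a top-down pattern with bottom-up ranking.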

20 Demo ☻ demo

21 Thank you.

