Published by Terence Thornton. Modified over 9 years ago.
1
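The Keyword Analysis Approach in Corpus Research and Its Extension (An extension to the keyword approach in corpus analysis) National Research Centre for Foreign Language Education (中国外语教育研究中心) Liang Maocheng (梁茂成)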
2
Outline ☻ Keywords ☻ Applications of corpus comparison ☻ Limitations to the keyword approach ☻ Keywords+ ☻ Demo
3
Keywords ☻ Keywords: ☺ Keywords are words whose frequency is unusually high (or low) in comparison with some norm. (Scott, 2003)
4
Keywords ☻ Positive keywords: ☺ Words which occur more often than would be expected by chance in comparison with the reference corpus.
5
Keywords ☻ Negative keywords: ☺ Words which occur less often than would be expected by chance in comparison with the reference corpus.
6
Keywords ☻ Positive and negative keywords ☺ In a corpus of business English, words such as business, profit and companies are likely to be positive keywords if the corpus is to be compared with a general corpus.
7
Keywords ☻ Positive and negative keywords ☺ In a corpus of academic English, words such as morning, afternoon and evening are likely to be negative keywords if the corpus is to be compared with a general corpus.
8
Keywords ☻ Calculating keyness (Rayson et al. 2004, Oakes 1998) ☺ Chi-square
9
Keywords Chi-square
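The formula on this slide did not survive extraction. As a minimal sketch, assuming the usual 2×2 contingency table (word vs. all other words, study corpus vs. reference corpus), Pearson's chi-square can be computed as below. The parameter names `a`, `b`, `c`, `d` are an assumption chosen for illustration, not taken from the slide.

```python
def chi_square(a, b, c, d):
    """Pearson chi-square for one word in a 2x2 table.

    a: frequency of the word in the study corpus
    b: frequency of the word in the reference corpus
    c: total tokens in the study corpus
    d: total tokens in the reference corpus
    Assumes the word occurs at least once overall (a + b > 0).
    """
    n = c + d
    word_total = a + b            # row total: the word
    other_total = n - word_total  # row total: all other words
    observed = [a, b, c - a, d - b]
    expected = [
        word_total * c / n,
        word_total * d / n,
        other_total * c / n,
        other_total * d / n,
    ]
    # Sum of (O - E)^2 / E over the four cells
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts: 50 hits in 10,000 tokens vs. 20 hits in 10,000 tokens
score = chi_square(50, 20, 10_000, 10_000)
```

With one degree of freedom, a score above 3.84 corresponds to p < 0.05.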
10
Keywords Chi-square with Yates's correction
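The corrected formula is also missing from the extracted slide. A sketch under the same 2×2 assumption: Yates's continuity correction subtracts 0.5 from each absolute difference between observed and expected counts before squaring. Clamping the adjusted difference at zero is an implementation choice made here, and the parameter names are illustrative.

```python
def chi_square_yates(a, b, c, d):
    """Chi-square with Yates's continuity correction for a 2x2 table.

    a, b: word frequency in the study / reference corpus
    c, d: total tokens in the study / reference corpus
    Assumes the word occurs at least once overall (a + b > 0).
    """
    n = c + d
    word_total = a + b
    other_total = n - word_total
    observed = [a, b, c - a, d - b]
    expected = [
        word_total * c / n,
        word_total * d / n,
        other_total * c / n,
        other_total * d / n,
    ]
    # Subtract 0.5 from |O - E|, never letting the difference go negative
    return sum(
        max(0.0, abs(o - e) - 0.5) ** 2 / e
        for o, e in zip(observed, expected)
    )


# Hypothetical counts: 50 hits in 10,000 tokens vs. 20 hits in 10,000 tokens
corrected = chi_square_yates(50, 20, 10_000, 10_000)
```

The correction always gives a score no larger than the uncorrected statistic, which matters most when the counts are small.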
11
Keywords Log-likelihood Reference: http://ucrel.lancs.ac.uk/llwizard.html
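The formula image is missing here as well. The sketch below follows the two-corpus calculation as commonly described for the log-likelihood wizard referenced above: expected frequencies are derived from the corpus sizes, and only the two cells containing the word contribute. Treating a zero frequency as contributing nothing is an assumption of this sketch.

```python
import math


def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) keyness for one word.

    a: frequency of the word in corpus 1 (study corpus)
    b: frequency of the word in corpus 2 (reference corpus)
    c: total tokens in corpus 1
    d: total tokens in corpus 2
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in corpus 1
    e2 = d * (a + b) / (c + d)  # expected frequency in corpus 2
    ll = 0.0
    if a > 0:                   # a zero count contributes nothing
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


# Hypothetical counts: 50 hits in 10,000 tokens vs. 20 hits in 10,000 tokens
ll_score = log_likelihood(50, 20, 10_000, 10_000)
```

As with chi-square, a value above 3.84 is significant at p < 0.05 with one degree of freedom.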
12
Keywords ☻ Previous research has revealed that log-likelihood is a better measure than chi-square when comparing word frequencies in corpora.
13
Keywords ☻ Ways to find keywords: ☺ Top-down: corpus-based ☺ Bottom-up: corpus-driven
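To make the bottom-up route concrete, here is a minimal sketch that scores every word in two token lists with log-likelihood and labels each significant word as a positive or negative keyword. The function names, the 3.84 cut-off (p < 0.05, one degree of freedom) and the toy data are all assumptions for illustration. A top-down query would instead look up the score of a word chosen in advance.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness; a, b are word counts, c, d are corpus sizes."""
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keyword_list(study_tokens, reference_tokens, threshold=3.84):
    """Return (word, score, kind) tuples sorted by descending keyness."""
    study = Counter(study_tokens)
    ref = Counter(reference_tokens)
    c, d = sum(study.values()), sum(ref.values())
    rows = []
    for word in set(study) | set(ref):
        a, b = study[word], ref[word]
        ll = log_likelihood(a, b, c, d)
        if ll < threshold:
            continue  # not significantly different from the reference
        # Positive: relatively more frequent in the study corpus
        kind = "positive" if a / c > b / d else "negative"
        rows.append((word, round(ll, 2), kind))
    return sorted(rows, key=lambda r: (-r[1], r[0]))


# Toy data, invented for illustration only
study_tokens = ["profit"] * 30 + ["the"] * 70
reference_tokens = ["profit"] * 2 + ["morning"] * 28 + ["the"] * 70
keywords = keyword_list(study_tokens, reference_tokens)
```

In this toy comparison "profit" comes out as a positive keyword and "morning" as a negative one, while "the" is filtered out because its relative frequency is identical in both lists.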
14
Applications of… ☺ Comparison across users ☺ Comparison across genres ☺ Comparison across times ☺ Comparison across (varieties of) languages
15
Applications of… ☺ Compiling a specialized dictionary ☺ Detecting the topic ☺ Genre analysis ☺ Contrastive Interlanguage Analysis ☺ ……
16
Limitations to… ☻ Keywords: ☺ Do keywords have to be single words? Phraseology seems more interesting! ☺ Do keywords have to be lexical words? POS tag sequences may also be interesting. ☺ Can we bring together the bottom-up approach and the top-down approach?
17
Limitations to… ☻ Top-down: the problem is I do not yet know what may be interesting.
18
Limitations to… ☻ Bottom-up: the problem is that I have been given a long list of keywords, only some of which are interesting, buried among many others which do not seem interesting at all.
19
Keywords+ ☻ Support multiword sequences ☻ Support online search ☻ Support POS tag sequences ☻ Support regex search
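The slides do not show how Keywords+ implements these features, so the following is only a sketch of the general idea. The same n-gram counter can be run over word tokens or over POS tags, and a regular expression can narrow the results to the sequences of interest. All names, the sample sentence and the hand-assigned tags are hypothetical.

```python
import re
from collections import Counter


def ngrams(items, n):
    """Return all contiguous n-item sequences as space-joined strings."""
    return [" ".join(items[i:i + n]) for i in range(len(items) - n + 1)]


def count_sequences(items, n, pattern=None):
    """Count n-grams, optionally keeping only those matching a regex."""
    counts = Counter(ngrams(items, n))
    if pattern is None:
        return counts
    rx = re.compile(pattern)
    return Counter({seq: f for seq, f in counts.items() if rx.search(seq)})


# Hypothetical sample: word tokens and hand-assigned POS tags
tokens = "on the other hand we argue that on the other hand".split()
tags = "IN DT JJ NN PRP VBP IN IN DT JJ NN".split()

word_trigrams = count_sequences(tokens, 3)                 # multiword sequences
the_bigrams = count_sequences(tokens, 2, r"^the \w+$")     # regex search
tag_trigrams = count_sequences(tags, 3, r"^DT JJ NN$")     # POS tag sequences
```

The resulting counts for a study corpus and a reference corpus could then be passed to a keyness measure in the same way as single-word frequencies.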
20
Demo ☻ demo
21
Thank you.