Published by Nancy Bishop. Modified over 6 years ago.
1
keywords
the words (or n-word sequences) which are significantly more frequent in a specialised corpus than in a "reference corpus"
generally, the reference corpus will be a large mixed corpus (like the BNC)
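Before frequencies in the two corpora can be compared, words and n-word sequences have to be counted. A minimal sketch, assuming whitespace tokenisation and a made-up sample sentence; the helper name `ngram_counts` is illustrative only.

```python
from collections import Counter

def ngram_counts(tokens, n=1):
    """Count every sequence of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Toy text standing in for a corpus (an assumption for illustration).
tokens = "the cat sat on the mat and the cat slept".split()

unigrams = ngram_counts(tokens, 1)  # single words
bigrams = ngram_counts(tokens, 2)   # 2-word sequences
```

The same function would be run once over the specialised corpus and once over the reference corpus, giving the two frequency lists that are then compared.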
2
"keyness" the probability that the frequency found in the specialised corpus is due to chance eg 100 occurrences in specialised corpus of 100,000 words 1000 occurrences in reference corpus of 100,000,000 words occurrence is 100 times more frequent in specialised corpus p< less than 1% probability typically calculated using either chi squared log likelihood
3
stop lists
lists of very common words (typically "function words": articles, prepositions, conjunctions, pronouns, auxiliary verbs, negation elements ...)
excluded from keyword lists
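Applying a stop list amounts to a simple filter over the candidate keywords. A minimal sketch, assuming a tiny hand-made stop list and case-insensitive matching; real stop lists are much longer, and the names `STOP_WORDS` and `remove_stop_words` are illustrative only.

```python
# A very small illustrative stop list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "or", "not", "is", "it"}

def remove_stop_words(candidates, stop_words=STOP_WORDS):
    """Drop candidates that appear in the stop list, ignoring case."""
    return [word for word in candidates if word.lower() not in stop_words]

candidates = ["The", "corpus", "of", "keyness", "and", "tagging"]
filtered = remove_stop_words(candidates)
```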
4
POS tagging
TreeTagger:
another DETERMINER
word NOUN
XML version:
<w pos="det">another</w> <w pos="noun">word</w>
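The XML version can be read back with a standard XML parser. A minimal sketch using Python's `xml.etree.ElementTree`; the wrapping `<s>` element is an assumption added here so that the fragment has a single root, and is not part of the slide.

```python
import xml.etree.ElementTree as ET

# The two tagged words from the slide, wrapped in a hypothetical <s> root.
xml_text = '<s><w pos="det">another</w> <w pos="noun">word</w></s>'

root = ET.fromstring(xml_text)
# Collect (word, part-of-speech) pairs in document order.
tagged = [(w.text, w.get("pos")) for w in root.iter("w")]
nouns = [word for word, pos in tagged if pos == "noun"]
```

With the tags available as attributes, keyword lists can be restricted to particular parts of speech, for example nouns only.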