Keywords the words (or n word sequences) which are significantly more frequent in a specialised corpus than in a "reference corpus" generally, the reference.


1 keywords
the words (or n-word sequences) which are significantly more frequent in a specialised corpus than in a "reference corpus"
generally, the reference corpus will be a large mixed corpus (like the BNC)
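Before any comparison between corpora, each corpus has to be reduced to frequency counts of words or n-word sequences. A minimal sketch of that counting step, assuming the text is already tokenised into a list of strings (the function name `ngram_counts` is hypothetical, not from the slides):

```python
from collections import Counter


def ngram_counts(tokens, n=1):
    """Count every run of n consecutive tokens (n=1 gives plain word counts)."""
    return Counter(
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


if __name__ == "__main__":
    sample = "the reference corpus is a large mixed corpus".split()
    print(ngram_counts(sample, 1).most_common(3))
    print(ngram_counts(sample, 2).most_common(3))
```

The same function would be run once on the specialised corpus and once on the reference corpus, so that the two sets of counts can be compared.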

2 "keyness"
the probability that the frequency found in the specialised corpus is due to chance
eg 100 occurrences in a specialised corpus of 100,000 words
1000 occurrences in a reference corpus of 100,000,000 words
the word is 100 times more frequent, relative to corpus size, in the specialised corpus
p < 0.01: less than 1% probability that the difference is due to chance
typically calculated using either chi-squared or log-likelihood
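The figures on this slide can be checked with a short calculation. The sketch below is one possible formulation, not necessarily the one used in the presentation: the log-likelihood function uses a two-term form often seen in corpus linguistics, the chi-squared function is the ordinary Pearson statistic on a 2x2 table, and 6.63 is the standard critical value for p < 0.01 with one degree of freedom. All function and variable names are hypothetical.

```python
import math

CRITICAL_P01 = 6.63  # chi-squared critical value, 1 degree of freedom, p < 0.01


def relative_ratio(a, b, c, d):
    """How many times more frequent the item is in the specialised corpus.

    a, b: occurrences in the specialised / reference corpus
    c, d: total words in the specialised / reference corpus
    """
    return (a / c) / (b / d)


def log_likelihood(a, b, c, d):
    """Two-term log-likelihood: 2 * sum(observed * ln(observed / expected))."""
    e1 = c * (a + b) / (c + d)  # expected occurrences in the specialised corpus
    e2 = d * (a + b) / (c + d)  # expected occurrences in the reference corpus
    total = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # a zero count contributes nothing to the sum
            total += observed * math.log(observed / expected)
    return 2 * total


def chi_squared(a, b, c, d):
    """Pearson chi-squared on the 2x2 table (item / other words x corpus)."""
    n = c + d
    hits = a + b
    cells = [
        (a, c * hits / n),            # item, specialised corpus
        (c - a, c * (n - hits) / n),  # other words, specialised corpus
        (b, d * hits / n),            # item, reference corpus
        (d - b, d * (n - hits) / n),  # other words, reference corpus
    ]
    return sum((obs - exp) ** 2 / exp for obs, exp in cells)


if __name__ == "__main__":
    args = (100, 1000, 100_000, 100_000_000)  # the slide's example
    print("ratio:", relative_ratio(*args))
    print("log-likelihood:", log_likelihood(*args))
    print("chi-squared:", chi_squared(*args))
```

With the slide's numbers both statistics come out far above the 6.63 threshold, which matches the claim that the difference is significant at p < 0.01.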

3 stop lists
lists of very common words (typically "function words": articles, prepositions, conjunctions, pronouns, auxiliary verbs, negation elements ...) excluded from keyword lists
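Applying a stop list is a simple filtering step over the candidate keywords. A minimal sketch, assuming a tiny hand-made stop list (the words in `STOP_WORDS` are illustrative only; a real list would be much longer):

```python
# Illustrative stop list only: a handful of English function words.
STOP_WORDS = {"the", "a", "of", "in", "and", "it", "is", "not", "to"}


def remove_stop_words(candidates, stop_words=STOP_WORDS):
    """Drop candidates found in the stop list, ignoring case, keeping order."""
    return [word for word in candidates if word.lower() not in stop_words]


if __name__ == "__main__":
    print(remove_stop_words(["The", "corpus", "of", "keyness", "not"]))
```

Matching is done on the lowercased form so that sentence-initial "The" is excluded along with "the".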

4 POS tagging
TreeTagger:
another DETERMINER
word NOUN
XML version:
<w pos="det">another</w> <w pos="noun">word</w>
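The step from tagger output to the XML version can be sketched as follows. This assumes the tagger output has already been read into (word, tag) pairs; the mapping from long tag names to the short `pos` values is hypothetical, inferred only from the two examples on the slide, and the function does not call TreeTagger itself.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical mapping, inferred from the two examples on the slide.
TAG_ABBREVIATIONS = {"DETERMINER": "det", "NOUN": "noun"}


def to_xml(tagged_words):
    """Render (word, tag) pairs as <w pos="...">word</w> elements."""
    elements = []
    for word, tag in tagged_words:
        # Unknown tags fall back to their lowercased name.
        pos = TAG_ABBREVIATIONS.get(tag, tag.lower())
        # quoteattr adds the surrounding quotes; escape protects &, < and >.
        elements.append(f"<w pos={quoteattr(pos)}>{escape(word)}</w>")
    return " ".join(elements)


if __name__ == "__main__":
    print(to_xml([("another", "DETERMINER"), ("word", "NOUN")]))
```

Escaping the word and quoting the attribute keeps the output well-formed even when a token contains characters such as `&` or `<`.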

