Keywords: the words (or n-word sequences) which are significantly more frequent in a specialised corpus than in a "reference corpus". Generally, the reference corpus will be a large mixed corpus (like the BNC).

"Keyness": the probability that the frequency found in the specialised corpus is due to chance.
e.g. 100 occurrences in a specialised corpus of 100,000 words
1,000 occurrences in a reference corpus of 100,000,000 words
The word is 100 times more frequent in the specialised corpus.
p < 0.01 (less than 1% probability that the difference is due to chance)
Typically calculated using either chi-squared or log-likelihood.
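The arithmetic in the example above can be checked with a short sketch. It uses a common two-cell formulation of the log-likelihood statistic, which is an assumption here since the slide does not say which variant is intended. The function names are illustrative, and 6.63 is the chi-squared critical value for p < 0.01 with one degree of freedom.

```python
import math

def relative_frequency(count, corpus_size):
    """Occurrences per word of corpus."""
    return count / corpus_size

def log_likelihood(count_spec, size_spec, count_ref, size_ref):
    """Two-cell log-likelihood for one word in two corpora (assumed variant)."""
    total = count_spec + count_ref
    # Expected counts if the word were equally frequent in both corpora
    expected_spec = size_spec * total / (size_spec + size_ref)
    expected_ref = size_ref * total / (size_spec + size_ref)
    ll = 0.0
    for observed, expected in ((count_spec, expected_spec), (count_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll

# Figures from the slide: 100 in 100,000 words vs 1,000 in 100,000,000 words
ratio = relative_frequency(100, 100_000) / relative_frequency(1000, 100_000_000)
ll = log_likelihood(100, 100_000, 1000, 100_000_000)

CRITICAL_P01 = 6.63  # chi-squared critical value, 1 degree of freedom, p < 0.01
is_key = ll > CRITICAL_P01
print(f"ratio = {ratio:.0f}, LL = {ll:.1f}, key at p < 0.01: {is_key}")
```

With these figures the statistic is far above the threshold, so the word would count as a keyword at p < 0.01.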

Stop lists: lists of very common words (typically "function words": articles, prepositions, conjunctions, pronouns, auxiliary verbs, negation elements ...) which are excluded from keyword lists.
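A minimal sketch of applying a stop list before counting candidate keywords. The stop list and the sample sentence are hypothetical and far shorter than anything used in practice.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop list of function words
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "or", "it", "is", "not"}

def content_word_counts(tokens, stop_words=STOP_WORDS):
    """Count lower-cased tokens, skipping anything on the stop list."""
    return Counter(t.lower() for t in tokens if t.lower() not in stop_words)

tokens = "The corpus is a sample of the language and the corpus is not small".split()
counts = content_word_counts(tokens)
print(counts.most_common())
```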

POS tagging
TreeTagger: another DETERMINER, word NOUN
XML version: <w pos="det">another</w> <w pos="noun">word</w>
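As an illustration, the XML version can be read back into (word, tag) pairs with Python's standard library. The enclosing `<s>` element is an assumption added here so that the fragment has a single root. The `pos` attribute is also written with its closing quote, which well-formed XML requires.

```python
import xml.etree.ElementTree as ET

# The two tagged words from the slide, wrapped in a hypothetical <s> root
xml_fragment = '<s><w pos="det">another</w> <w pos="noun">word</w></s>'

root = ET.fromstring(xml_fragment)

# Collect (word, part-of-speech) pairs from every <w> element
tagged = [(w.text, w.get("pos")) for w in root.iter("w")]

# Example query: keep only the nouns
nouns = [word for word, pos in tagged if pos == "noun"]
print(tagged, nouns)
```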
