Lemma: canonical (citation) form of a lexeme, which conventionally represents the set of related words Lexeme: the set of related words But….

Presentation transcript:

1 Lemma: canonical (citation) form of a lexeme, which conventionally represents the set of related words
Lexeme: the set of related words
But….
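The lemma/lexeme distinction on this slide can be pictured as a small lookup table. The following is only a minimal sketch under stated assumptions: the lexicon entries, the `LEXEMES` table, and the helper names `build_form_index` and `lemma_of` are hypothetical and do not come from the slides.

```python
# Hypothetical toy lexicon: each lemma (citation form) maps to its lexeme,
# i.e. the set of related word forms. The entries are illustrative only.
LEXEMES = {
    "sing": {"sing", "sings", "sang", "sung", "singing"},
    "goose": {"goose", "geese"},
}


def build_form_index(lexemes):
    """Invert the lemma -> forms table into a form -> lemma lookup."""
    index = {}
    for lemma, forms in lexemes.items():
        for form in forms:
            index[form] = lemma
    return index


FORM_TO_LEMMA = build_form_index(LEXEMES)


def lemma_of(word):
    """Return the lemma for a known word form, or None if it is not listed."""
    return FORM_TO_LEMMA.get(word.lower())
```

Under these assumptions, `lemma_of("sang")` looks up the citation form that stands for the whole set of related words, while a form missing from the toy table simply yields `None`.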

2 Text Corpora:
British National Corpus: 100M words
Brown Corpus: 1M words
Hansards: 750K words
Wall Street Journal: 914K words
AP newswire: 620+M words
Penn Treebank: +1M words, bracketed syntactically, WSJ+

Speech Corpora:
London-Lund Corpus: 1M words
Call Home: lots
ATIS (7812 words)
Switchboard: 240h (+3M words)
Broadcast News: lots
TDT: h (Eng, Ara, Mand)
Communicator: 62h (317k words)

