Published by Kurt Pedersen. Modified over 6 years ago.
Lemma: the canonical (citation) form of a lexeme, which conventionally represents the set of related words
Lexeme: the set of related words
But…
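The lemma/lexeme distinction can be made concrete with a minimal sketch. It assumes a small hand-written lookup table (FORM_TO_LEMMA, a hypothetical name that does not come from the slides) in place of a real morphological analyzer. Each lexeme is the set of word forms observed in the input, and the lemma is the key that stands for that set.

```python
from collections import defaultdict

# Hypothetical toy lookup from inflected word form to lemma (illustrative only;
# a real system would use a morphological analyzer or a lemmatizer).
FORM_TO_LEMMA = {
    "sing": "sing", "sings": "sing", "sang": "sing",
    "sung": "sing", "singing": "sing",
    "corpus": "corpus", "corpora": "corpus",
}


def group_into_lexemes(tokens):
    """Group tokens into lexemes keyed by lemma.

    Forms missing from the lookup table are treated as their own lemma.
    """
    lexemes = defaultdict(set)
    for token in tokens:
        form = token.lower()
        lemma = FORM_TO_LEMMA.get(form, form)
        lexemes[lemma].add(form)
    return dict(lexemes)


tokens = ["Sang", "sings", "corpora", "sung", "corpus"]
lexemes = group_into_lexemes(tokens)
for lemma in sorted(lexemes):
    print(lemma, sorted(lexemes[lemma]))
```

Counting the keys gives the number of lemmas in the sample, while counting the distinct forms across all sets gives the number of wordform types. The two counts generally differ.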
Text Corpora:
British National Corpus: 100M words
Brown Corpus: 1M words
Hansards: 750K words
Wall Street Journal: 914K words
AP newswire: 620+M words
Penn Treebank: +1M words, bracketed syntactically, WSJ+
Speech Corpora:
London-Lund Corpus: 1M words
Call Home: lots
ATIS (7812 words)
Switchboard: 240h (+3M words)
Broadcast News: lots
TDT: h (Eng,Ara,Mand)
Communicator: 62h (317k words)