Published by May Paul. Modified over 9 years ago.
1
APPLYING VENDA TEXT TOWARDS THE DEVELOPMENT OF AN INTELLIGENT PARTS-OF-SPEECH TAGGER
BY TSHISHONGA AW 2859268
Supervisor: Professor I.M. Venter
Co-Supervisor: Mr Reg Dodds
11/04/08
2
INTRODUCTION
Part-of-speech (POS) tagging is the process of assigning each word its part-of-speech tag.
A part-of-speech tag is a label, e.g. noun, verb, adjective.
POS tagging is done by looking at a word's relationship with adjacent words.
A simplified form is taught to school children.
The Venda language has unique diacritics.
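To make the idea of using adjacent words concrete, here is a minimal sketch in Python. The lexicon, the tag names, and the single context rule are illustrative assumptions, not the project's data or method. English words are used only for readability.

```python
# Hypothetical lexicon: each word maps to the tags it may carry.
LEXICON = {
    "the": ["DET"],
    "a": ["DET"],
    "long": ["ADJ"],
    "dogs": ["NOUN"],
    "walk": ["VERB", "NOUN"],  # ambiguous: "dogs walk" vs "a long walk"
}


def tag_sentence(words):
    """Tag each word, using the previous word's tag to resolve ambiguity."""
    tagged = []
    prev_tag = None
    for word in words:
        candidates = LEXICON.get(word.lower(), ["UNKNOWN"])
        if len(candidates) == 1:
            tag = candidates[0]
        elif prev_tag in ("DET", "ADJ") and "NOUN" in candidates:
            # Illustrative rule: after a determiner or adjective, prefer a noun.
            tag = "NOUN"
        else:
            # Otherwise fall back to the first listed tag.
            tag = candidates[0]
        tagged.append((word, tag))
        prev_tag = tag
    return tagged
```

Calling `tag_sentence("a long walk".split())` tags "walk" as a noun, while `tag_sentence("the dogs walk".split())` tags it as a verb, because the tag of the preceding word differs.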
3
THE DEVELOPMENT PROCESS
A Venda translator: very ambitious
A generic tagger: still ambitious
Best solution: a parts-of-speech tagger that allows the user to change tags to resolve tag ambiguity
◦ Compute initial Hidden Markov Models (HMMs)
◦ Compute test data
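The slides do not show how the initial HMMs were computed. As a rough sketch of the general technique, the Python below estimates transition and emission counts from a small hand-tagged corpus and decodes a sentence with the Viterbi algorithm. The training sentences, the tag set, the `<s>` start marker, and the add-one smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical hand-tagged training data (English, for readability only).
TRAINING = [
    [("the", "DET"), ("dogs", "NOUN"), ("walk", "VERB")],
    [("a", "DET"), ("long", "ADJ"), ("walk", "NOUN")],
    [("the", "DET"), ("walk", "NOUN"), ("ends", "VERB")],
]


def train_hmm(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    transitions = defaultdict(Counter)  # previous tag -> counts of next tag
    emissions = defaultdict(Counter)    # tag -> counts of emitted word
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start marker (assumed convention)
        for word, tag in sentence:
            transitions[prev][tag] += 1
            emissions[tag][word] += 1
            vocab.add(word)
            prev = tag
    return transitions, emissions, vocab


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability of `key` given the counts."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, transitions, emissions, vocab):
    """Return the most probable tag sequence for `words`."""
    if not words:
        return []
    tags = list(emissions)
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra outcome for unseen words
    # best[i][tag] = (log score of best path ending in tag at i, previous tag)
    best = [{}]
    for tag in tags:
        score = (log_prob(transitions["<s>"], tag, n_tags)
                 + log_prob(emissions[tag], words[0], n_words))
        best[0][tag] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for tag in tags:
            emit = log_prob(emissions[tag], words[i], n_words)
            best[i][tag] = max(
                (best[i - 1][p][0] + log_prob(transitions[p], tag, n_tags) + emit, p)
                for p in tags
            )
    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]
```

A typical call is `viterbi(["the", "long", "walk"], *train_hmm(TRAINING))`. A tagger that lets the user correct tags, as described above, could feed the corrected sentences back into `train_hmm` to refine the counts.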
4
REQUIREMENT ANALYSIS
Research
◦ Abney, S. Part-of-speech tagging and partial parsing.
◦ Brill, E. A simple rule-based part-of-speech tagger.
◦ Pérez, L. C. IEEE Information Theory Society Newsletter.
◦ Samuelsson, C., and Voutilainen, A. Comparing a linguistic and a stochastic tagger. pp. 246–253.
◦ Shannon, C. E. A mathematical theory of communication.
User Requirements
Prototype
5
IMPLEMENTATION TOOLS
For all the code
For all the databases
6
PROBLEMS WITH THE PROTOTYPE
GUI was criticized
◦ Change GUI
Displaying diacritics on the GUI
◦ Use DejaVu fonts
MySQL database not writing diacritics
◦ Change the character encoding
◦ Use a flat file
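The slides do not say which character encoding was chosen for the flat file. As a minimal sketch, assuming UTF-8, the Python below writes text containing Venda diacritics (ḓ, ḽ, ṋ, ṱ, ṅ) to a flat file and reads it back. The function names and the file name are hypothetical. Normalising to NFC is an added assumption so that a letter typed as a base character plus a combining mark is stored in the same form as the precomposed letter.

```python
import os
import tempfile
import unicodedata


def save_text(path, text):
    """Write text to a flat file as UTF-8, normalised to composed (NFC) form."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(unicodedata.normalize("NFC", text))


def load_text(path):
    """Read a UTF-8 flat file back into a string."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


def roundtrip(text):
    """Save text to a temporary flat file and return what is read back."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "sample.txt")  # hypothetical file name
        save_text(path, text)
        return load_text(path)
```

Stating the encoding explicitly on both the write and the read avoids depending on the platform default, which is a common reason diacritics are lost.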
7
USABILITY TESTING
USER      OCCUPATION
USER 1    MSc Stats
USER 2    BSc Honors
USER 3    BSc Microbiology
USER 4    UCT MSc Sociology
9
USABILITY TESTING
10
USER’S GUIDE
First Screen
File Menu
◦ Open a file
◦ Exit
View Menu
◦ Word frequency
◦ Count words
Edit
◦ Clear
Help

Second Screen
File Menu
◦ Save a file
◦ Exit
View Menu
◦ Word frequency
◦ Count words
Edit
◦ Word model
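The "Word frequency" and "Count words" menu items could be backed by something like the following Python sketch. The function names and the tokenisation rule (runs of letters and digits, lower-cased) are assumptions, since the slides do not describe how words are split.

```python
import re
import unicodedata
from collections import Counter


def tokenize(text):
    """Split text into lower-cased words, keeping letters with diacritics."""
    # NFC first, so a base letter plus combining mark becomes one letter
    # that \w can match as part of the word.
    text = unicodedata.normalize("NFC", text)
    return re.findall(r"\w+", text.lower())


def count_words(text):
    """Total number of words in the text."""
    return len(tokenize(text))


def word_frequency(text):
    """Map each distinct word to the number of times it occurs."""
    return Counter(tokenize(text))
```

`word_frequency(text).most_common(10)` would give the ten most frequent words, which is one plausible way to present the result in a view menu.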
11
REFERENCES
[1] Abney, S. Part-of-speech tagging and partial parsing. In Corpus-Based Methods in Language and Speech (Dordrecht, 1996), K. Church, S. Young, and G. Bloothooft, Eds., Kluwer Academic Publishers.
[2] Brill, E. A simple rule-based part-of-speech tagger. In Proceedings of ANLP-92, 3rd Conference on Applied Natural Language Processing (Trento, IT, 1992), pp. 152–155.
12
REFERENCES
[3] Pérez, L. C. IEEE Information Theory Society Newsletter. ISSN 105 53, 04 (2003), pp. 1–10.
[4] Samuelsson, C., and Voutilainen, A. Comparing a linguistic and a stochastic tagger. In Proceedings of the Eighth Conference on European Chapter of the Association for Computational Linguistics (Morristown, NJ, USA, 1997), Association for Computational Linguistics, pp. 246–253.
[5] Shannon, C. E. A mathematical theory of communication. The Bell System Technical Journal (1948), pp. 1–12.
13
THE DEMO
Open a file
◦ View user manual
Tagging a file
◦ Search for multiple occurrences of a word
◦ Insert a diacritic
◦ Copy and paste
◦ Save a file
Exit the system
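Searching for multiple occurrences of a word, as in the demo, might look like the Python sketch below. The function name is hypothetical, and matching whole words case-insensitively is an assumption about the intended behaviour.

```python
import re


def find_occurrences(text, word):
    """Return the start offset of every whole-word match of `word` in `text`."""
    # The lookarounds stop "the" from matching inside "other".
    pattern = r"(?<!\w)" + re.escape(word) + r"(?!\w)"
    return [m.start() for m in re.finditer(pattern, text, flags=re.IGNORECASE)]
```

The returned offsets could be used to highlight each match so that the user can retag all occurrences of an ambiguous word in one pass.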