1
TelMore: Telugu Morphological Generator
Madhavi Ganapathiraju and Lori Levin
Language Technologies Institute, Carnegie Mellon University, Pittsburgh, USA
I am going to present a tool that can generate morphological forms of Telugu words.
ICUDL 2006: Second International Conference on Universal Digital Library, Alexandria, Egypt, November 17-19, 2006
2
UDL: machine translation, information retrieval, interface design, digital storage, summarization, OCR
A number of language processing tools have emerged from the research base created by the Universal Digital Library. The work I am presenting fits well with the machine translation work presented by Prof Balki yesterday.
19th Nov, 2006 ICUDL2006: TelMore - Telugu Morphological Generator
3
machine translation
Rani gave the book to my mother
1. Phrase match in EBMT
   Gave to <noun>: <noun> ki ichchaad’u
OR
1. Output from English lexical analysis
   gave: verb, past, root give
   the book: noun phrase, singular, neutral
   mother: noun, singular, feminine
   my: possessive, root I
   …
2. English-Telugu dictionary for root forms of nouns and verbs
   give: ichchut’a
   book: pustakamu
   mother: talli, amma
   I: neinu
3. TelMore: morphological generator for Telugu
   ichchut’a: ichchaad’u (past masc), ichchinadi (past fem), ... istun’di (future fem), istaad’u (future masc)
   pustakamu: pustakamu, pustakamutoo (with pustakamu), pustakamu loo (in pustakamu)…
   amma: ammaki (to amma), amma cheita (by amma)
   I: naa (possessive)
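The dictionary and generation steps on this slide can be sketched as two table lookups. This is a toy illustration only, not TelMore's implementation: the names `DICTIONARY`, `FORMS` and `translate_word` are hypothetical, the feature labels are copied from the glosses on the slide, and the lookup table merely stands in for the rule-based generator.

```python
# Step 2 on the slide: English root -> Telugu root (entries from the slide).
DICTIONARY = {
    "give": "ichchut’a",
    "book": "pustakamu",
    "mother": "amma",
    "I": "neinu",
}

# Step 3 on the slide: (Telugu root, feature) -> inflected form.
# A real generator would derive these by rule; here they are simply
# the examples listed on the slide, stored in a table.
FORMS = {
    ("ichchut’a", "past masc"): "ichchaad’u",
    ("ichchut’a", "past fem"): "ichchinadi",
    ("pustakamu", "with"): "pustakamutoo",
    ("pustakamu", "in"): "pustakamu loo",
    ("amma", "to"): "ammaki",
    ("amma", "by"): "amma cheita",
    ("neinu", "possessive"): "naa",
}


def translate_word(english_root, feature):
    """Look up the Telugu root, then the requested morphological form."""
    telugu_root = DICTIONARY[english_root]
    return FORMS[(telugu_root, feature)]


print(translate_word("mother", "to"))       # ammaki
print(translate_word("give", "past fem"))   # ichchinadi
```

The point of the split is that the bilingual dictionary only needs root forms; every inflected form comes from the generator.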
4
TelMore
Generates morphological forms for nouns and verbs when the root word is given
5
About Telugu
2nd largest spoken language in India (?)
70 M native speakers
World ranking 13-17, alongside Korean, Vietnamese, Marathi and Tamil
Recorded origin in the 7th century AD
Literary language in the 11th century AD
6
Parts of Speech: Noun
Number: singular, plural
Gender: male, female, neutral
Morphological forms (vibhaktulu): nominative, genitive, dative, accusative, vocative, instrumental and locative
14 forms for each noun
7
Plural formation
The general rule is to add “lu” as a suffix.
A series of rules is then applied to yield the final suffix: lu, llu, l’l’u or n’d’lu.
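The two-stage shape of plural formation (suffix first, then adjustment rules) can be sketched as below. The slide does not spell out the adjustment rules, so they are left as a caller-supplied placeholder; `pluralize` and `adjustments` are hypothetical names, not TelMore's API.

```python
def pluralize(root, adjustments=()):
    """Apply the general rule, then each adjustment rule in order.

    `adjustments` is a sequence of functions str -> str standing in for
    the follow-up rules, which are not given on the slide.
    """
    form = root + "lu"        # general rule: add "lu" as a suffix
    for rule in adjustments:  # placeholder for the later rewrite rules
        form = rule(form)
    return form


# With no adjustment rules, only the general rule is applied, so this
# is the unadjusted string and not necessarily the final plural.
print(pluralize("pustakamu"))   # pustakamulu
```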
8
Parts of Speech: Verb
Number: singular, plural
Gender: male, female, neutral
Person: 1st person, 2nd person, 3rd person
Morphological forms: present, past, future, aorist affirmative, aorist negative, imperative and prohibitive
Present participle, past participle: affirmative and negative
Number of forms: 2 x 3 x 3 x 7 + 4 = 130 forms for each verb
9
Features in TelMore (v.1)
Morphological form generation
   Nouns
   Verbs
System
   Library module for integration elsewhere
   Flat file input & output (plain text or html)
   User-interactive through command line
   Web interface for data addition with user validation
Web Interface
10
Current Data Size
words have been created by native speakers upon request
17
Linguistic Knowledge
The linguistic rules are taken from a book by C.P. Brown
Rules are demonstrated through examples
No formal description
18
Noun: First Declension Morphs
19
Noun: Second Declension
20
Noun: Third Declension
21
Noun: Third Declension: Irregular 2
22
Noun: Third Declension: Irregular 3
23
Noun: Third Declension: Irregular 4
24
Noun: Third Declension: Irregular 5
25
Verb: First Conjugation
26
Verb: Second Conjugation
27
Verb: Third Conjugation
28
Alternate dialects and spellings
Telugu is spoken in many dialects
Andhra Pradesh has long borders with 4 states, each of which speaks a different language, and one long coastal region
The dialect in each of these regions is different
The learned and the others speak different dialects
Urdu influence in Hyderabad due to Muslim rule
pure/poetic
formal/informal
Telugu is written the way it is spoken
Hence the different dialects result in different spellings of the words
29
Future work for this tool
Causative, middle and passive voices to be added
Morphology of adjectives, etc.
Integration of Om
Native font integration for flat file processing
Integration with English lexicon to be of real use in multilingual applications
30
Acknowledgements
Linguistics Advisor: Prof. Lori Levin
UDL Advisors: Prof. Raj Reddy, Prof. N. Balakrishnan
Web-interface creation: R. Harsha, Naveena Yanamala
Data Creation: … V. Mythili Shyam, G. Padmasree, V. Abhinay, B.V. Prashanth, G. Ramana Lakshmi, G. Padmavathy, V. Nava Mallika
31
http://linzer.blm.cs.cmu.edu/morph/ www.cs.cmu.edu/~madhavi