TelMore: Telugu Morphological Generator
Madhavi Ganapathiraju and Lori Levin
Language Technologies Institute, Carnegie Mellon University, Pittsburgh, USA
I am going to present a tool that can generate morphological forms of Telugu words.
ICUDL 2006: Second International Conference on Universal Digital Library, Alexandria, Egypt, November 17-19, 2006
UDL: machine translation, information retrieval, interface design, digital storage, summarization, OCR
A number of language processing tools have emerged from the research base created by the Universal Digital Library. The work I am presenting fits well into the machine translation work presented by Prof Balki yesterday.
19th Nov, 2006 ICUDL2006: TelMore - Telugu Morphological Generator
machine translation
Rani gave the book to my mother
1. Phrase match in EBMT
   Gave to <noun> -> <noun> ki ichchaad’u
OR
1. Output from English lexical analysis
   gave: verb, past, root give
   the book: noun phrase, singular, neutral
   mother: noun, singular, feminine
   my: possessive, root I
   …
2. English-Telugu dictionary for root forms of nouns and verbs
   give: ichchut’a
   book: pustakamu
   mother: talli, amma
   I: neinu
3. TelMore: morphological generator for Telugu
   ichchut’a: ichchaad’u (past masc), ichchinadi (past fem), ... istun’di (future fem), istaad’u (future masc)
   pustakamu: pustakamu, pustakamutoo (with pustakamu), pustakamu loo (in pustakamu)…
   amma: ammaki (to amma), amma cheita (by amma)
   I: naa (possessive)
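The dictionary and generator steps on this slide can be sketched as two table lookups. This is only an illustration built from the example entries shown here, not TelMore's actual implementation: the names `DICTIONARY`, `FORMS` and `translate_root` are hypothetical, the feature labels are simplified from the slide's glosses, and the romanization uses a plain ASCII apostrophe.

```python
# English root -> Telugu root(s), copied from the dictionary step on the slide.
DICTIONARY = {
    "give": ["ichchut'a"],
    "book": ["pustakamu"],
    "mother": ["talli", "amma"],
    "I": ["neinu"],
}

# (Telugu root, feature label) -> inflected form, copied from the generator step.
# Assumption: the possessive "naa" is keyed under "neinu"; the slide lists it under "I".
FORMS = {
    ("ichchut'a", "past masc"): "ichchaad'u",
    ("ichchut'a", "past fem"): "ichchinadi",
    ("ichchut'a", "future masc"): "istaad'u",
    ("pustakamu", "with"): "pustakamutoo",
    ("amma", "to"): "ammaki",
    ("amma", "by"): "amma cheita",
    ("neinu", "possessive"): "naa",
}


def translate_root(english_root, feature, choice=0):
    """Look up the Telugu root, then fetch the requested inflected form."""
    telugu_root = DICTIONARY[english_root][choice]
    return FORMS[(telugu_root, feature)]


if __name__ == "__main__":
    # Pieces of "Rani gave the book to my mother"
    print(translate_root("give", "past fem"))
    print(translate_root("mother", "to", choice=1))
    print(translate_root("I", "possessive"))
```

A real generator would compute the forms from rules rather than store them; the table only shows where such a generator sits between the dictionary and the translated sentence.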
TelMore
Generates morphological forms for nouns and verbs when the root word is given
About Telugu
2nd largest spoken language in India (?)
70 M native speakers
World ranking 13-17, with Korean, Vietnamese, Marathi and Tamil
Recorded origin in the 7th century AD
Literary language in the 11th century AD
Parts of Speech: Noun
Number: singular, plural
Gender: male, female, neutral
Morphological forms (vibhaktulu): nominative, genitive, dative, accusative, vocative, instrumental and locative
14 forms for each noun (2 numbers x 7 case forms)
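The count of 14 follows from crossing the two numbers with the seven case forms listed above. A minimal sketch of that enumeration, with the hypothetical names `NUMBERS`, `CASES` and `noun_slots` chosen only for illustration:

```python
from itertools import product

# Categories as listed on the slide.
NUMBERS = ["singular", "plural"]
CASES = [
    "nominative", "genitive", "dative", "accusative",
    "vocative", "instrumental", "locative",
]


def noun_slots():
    """Return every (number, case) slot a noun paradigm has to fill."""
    return list(product(NUMBERS, CASES))


if __name__ == "__main__":
    slots = noun_slots()
    print(len(slots))  # 2 numbers x 7 cases
    for number, case in slots:
        print(number, case)
```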
Plural formation
The general rule is to add “lu” as a suffix. A series of rules is then applied to yield the final form: lu, llu, l’l’u or n’d’lu.
Parts of Speech: Verb
Number: singular, plural
Gender: male, female, neutral
Person: 1st person, 2nd person, 3rd person
Morphological forms: present, past, future, aorist affirmative, aorist negative, imperative and prohibitive
Present participle, past participle: affirmative and negative
Number of forms: 2 x 3 x 3 x 7 + 4 = 130 forms for each verb
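The figure of 130 can be checked by enumerating the slots: number, gender, person and the seven finite forms are crossed, and the four participle forms are added on top. The sketch below assumes exactly that reading of the slide; the names `verb_slots`, `FINITE_FORMS` and `PARTICIPLES` are hypothetical.

```python
from itertools import product

# Categories as listed on the slide.
NUMBERS = ["singular", "plural"]
GENDERS = ["male", "female", "neutral"]
PERSONS = ["1st", "2nd", "3rd"]
FINITE_FORMS = [
    "present", "past", "future", "aorist affirmative",
    "aorist negative", "imperative", "prohibitive",
]
# Present and past participle, each affirmative and negative: 4 forms.
PARTICIPLES = list(
    product(["present participle", "past participle"],
            ["affirmative", "negative"])
)


def verb_slots():
    """Return all finite slots followed by the participle slots."""
    finite = list(product(NUMBERS, GENDERS, PERSONS, FINITE_FORMS))
    return finite + PARTICIPLES


if __name__ == "__main__":
    print(len(verb_slots()))  # 2 x 3 x 3 x 7 + 4
```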
Features in TelMore (v.1)
Morphological form generation
  Nouns
  Verbs
System
  Library module for integration elsewhere
  Flat file input & output (plain text or html)
  User-interactive through command line
  Web interface for data addition with user validation
Current Data Size
Words have been created by native speakers upon request
Linguistic Knowledge
The linguistic rules are taken from a book by C.P. Brown
Rules are demonstrated through examples
No formal description
Noun: First Declension Morphs
Noun: Second Declension
Noun: Third Declension
Noun: Third Declension: Irregular 2
Noun: Third Declension: Irregular 3
Noun: Third Declension: Irregular 4
Noun: Third Declension: Irregular 5
Verb: First Conjugation
Verb: Second Conjugation
Verb: Third Conjugation
Alternate dialects and spellings
Telugu is spoken in many dialects
Andhra Pradesh has long borders with 4 states, each of which speaks a different language, and one long coastal region
The dialect in each of these regions is different
The learned and the others speak different dialects
Urdu influence in Hyderabad due to Muslim rule
pure/poetic
formal/informal
Telugu is written the way it is spoken
Hence the different dialects result in different spellings of the words
Future work for this tool
Causative, middle and passive voices to be added
Morphology of adjectives, etc.
Integration of Om
Native font integration for flat file processing
Integration with English lexicon to be of real use in multilingual applications
Acknowledgements
Prof. Lori Levin: Linguistics Advisor
Prof. Raj Reddy, Prof. N. Balakrishnan: UDL Advisors
R. Harsha Naveena Yanamala Web-interface creation Data Creation … V. Mythili Shyam G. Padmasree V. Abhinay B.V. Prashanth G. Ramana Lakshmi G. Padmavathy V. Nava Mallika
http://linzer.blm.cs.cmu.edu/morph/
www.cs.cmu.edu/~madhavi