Published by Winfred Armstrong. Modified over 9 years ago.
1
DRAVIDIAN WORDNET
S. Arulmozi, Dravidian University
29 April 2013
2
Tamil Thesaurus
Preliminary work on lexical semantics.
Monumental work on the Tamil Thesaurus.
Ontological classification of Tamil vocabulary.
Rajendran, S. (2001). tamizhc coRkaLanjciyam (in Tamil). Tamil University Publication.
3
Domains in Tamil Thesaurus
Tamil vocabulary is classified into four major domains: Entities, Abstracts, Events and Relationals.
4
Lexical Hierarchy of the Domain 'Construction'
parumaippeyarkaL 'concrete nouns'
  aHRinaippeyarkaL 'irrational nouns'
    uyirillaatavai 'non-living beings'
      uruvaakkiya maRRum patananjceyta poruTkaL 'manufactured and processed items'
        kaTTappaTTavai 'constructed'
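The hierarchy on this slide can be read as a chain of hypernym links running from the most specific node up to the most general one. The following is a minimal Python sketch of that reading, assuming each term on the slide is nested directly under the term listed before it. The `PARENT` mapping and the `hypernym_path` function are illustrative names and are not taken from the Tamil Thesaurus or the Tamil WordNet software.

```python
# Illustrative only: child -> parent links, assuming each term on the slide
# is nested directly under the term listed before it.
PARENT = {
    "kaTTappaTTavai": "uruvaakkiya maRRum patananjceyta poruTkaL",
    "uruvaakkiya maRRum patananjceyta poruTkaL": "uyirillaatavai",
    "uyirillaatavai": "aHRinaippeyarkaL",
    "aHRinaippeyarkaL": "parumaippeyarkaL",
}


def hypernym_path(node, parent=PARENT):
    """Return the nodes from `node` up to the root of the hierarchy."""
    path = [node]
    seen = {node}
    while path[-1] in parent:
        nxt = parent[path[-1]]
        if nxt in seen:
            # Guard against malformed data that would loop forever.
            raise ValueError(f"cycle detected at {nxt!r}")
        path.append(nxt)
        seen.add(nxt)
    return path


if __name__ == "__main__":
    # Print the chain from 'constructed' up to 'concrete nouns'.
    for depth, term in enumerate(hypernym_path("kaTTappaTTavai")):
        print("  " * depth + term)
```

Storing only the child-to-parent link keeps the data small, and the full path to the root can be recomputed whenever it is needed.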
5
Nouns
Relations: Example
Synonymy: viiTu 'house' - illam 'house'
Hypernymy-Hyponymy: paLLi 'school' - kalviccaalai 'educational institution'
Hyponymy-Hypernymy: kalluuri 'college' - aracukkalluuri 'govt college'
Holonymy-Meronymy: ndaaRkaali 'chair' - kaal 'leg'
Meronymy-Holonymy: cakkaram 'wheel' - vaNTi 'cart'
Related Verb: paTittal 'reading' - paTi 'read'
Coordinate terms: kooyil 'temple' - macuuti 'mosque'
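Relation pairs like the ones on this slide are commonly stored as labelled links between words. The sketch below is one hedged way to do that in Python, using a few of the slide's examples as (relation, first word, second word) triples. The names `NOUN_RELATIONS`, `build_index` and `related` are hypothetical and do not describe how the Tamil WordNet actually encodes its relations.

```python
from collections import defaultdict

# (relation label, first word, second word), paired exactly as on the slide.
NOUN_RELATIONS = [
    ("synonymy", "viiTu", "illam"),
    ("holonymy-meronymy", "ndaaRkaali", "kaal"),
    ("meronymy-holonymy", "cakkaram", "vaNTi"),
    ("coordinate terms", "kooyil", "macuuti"),
]


def build_index(triples):
    """Group outgoing links by their first word."""
    index = defaultdict(list)
    for relation, source, target in triples:
        index[source].append((relation, target))
    return index


def related(index, word, relation=None):
    """Return words linked from `word`, optionally filtered by relation label."""
    return [
        target
        for label, target in index.get(word, [])
        if relation is None or label == relation
    ]


if __name__ == "__main__":
    idx = build_index(NOUN_RELATIONS)
    print(related(idx, "viiTu"))
    print(related(idx, "cakkaram", "meronymy-holonymy"))
```

The links are stored in one direction only. A fuller version would also add the inverse link for each pair, for example a meronymy link from kaal back to ndaaRkaali.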
6
Verbs
Relations: Example
Synonym: paTi 'read' - payilu 'read'
Hypernymy: cuvai 'taste' - uNar
Troponymy: keeL 'ask' - kenjcu 'plead'
Nominal: paruku 'drink' - parukutal 'drinking'
Related Noun: kaNTupiTi 'discover' - kaNTupiTippu 'discovery'
7
Tamil WordNet
Objective: To build a WordNet for Tamil to enhance machine translation
Resources: Tamil Thesaurus, Technical Glossaries (Tamil University Publications), Princeton English WordNet
Funding Agency: Tamil Software Development Fund, Tamil Virtual University - 4 lacs
Time Frame: 18 months
8
Details
Software used
Front-end: Java
Back-end: MySQL database
Project Deliverables
50k root words
Relationships coded
Stand-alone and web-based interface
Embedded morphological analyser
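The slide names a MySQL back-end but does not describe its schema. As a rough illustration of how words and coded relationships could sit in relational tables, here is a sketch that uses Python's built-in `sqlite3` module as a stand-in for MySQL. The table names, column names and helper functions are all assumptions made for this example and are not the project's actual design.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE word (
            id    INTEGER PRIMARY KEY,
            lemma TEXT NOT NULL UNIQUE,
            pos   TEXT NOT NULL
        );
        CREATE TABLE relation (
            source_id INTEGER NOT NULL REFERENCES word(id),
            target_id INTEGER NOT NULL REFERENCES word(id),
            kind      TEXT NOT NULL
        );
        """
    )
    # Verb examples taken from the earlier 'Verbs' slide.
    words = [("paTi", "verb"), ("payilu", "verb"), ("keeL", "verb"), ("kenjcu", "verb")]
    conn.executemany("INSERT INTO word (lemma, pos) VALUES (?, ?)", words)

    def word_id(lemma):
        row = conn.execute("SELECT id FROM word WHERE lemma = ?", (lemma,)).fetchone()
        return row[0]

    links = [("paTi", "payilu", "synonym"), ("keeL", "kenjcu", "troponym")]
    conn.executemany(
        "INSERT INTO relation (source_id, target_id, kind) VALUES (?, ?, ?)",
        [(word_id(s), word_id(t), kind) for s, t, kind in links],
    )
    conn.commit()
    return conn


def lookup(conn, lemma, kind):
    """Return lemmas linked from `lemma` by a relation of the given kind."""
    rows = conn.execute(
        """
        SELECT t.lemma
        FROM relation AS r
        JOIN word AS s ON s.id = r.source_id
        JOIN word AS t ON t.id = r.target_id
        WHERE s.lemma = ? AND r.kind = ?
        ORDER BY t.lemma
        """,
        (lemma, kind),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = build_demo_db()
    print(lookup(db, "keeL", "troponym"))
```

Keeping relations in their own table means a new relation type needs only a new `kind` value, not a schema change.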
9
Statistics
Total Words: 50497
Unique Senses: 41013
Nouns: 46710
Verbs: 2881
Adjectives: 416
Adverbs: 490
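The four part-of-speech counts on this slide add up to the stated total of 50497 words. The short Python check below confirms that and prints each category's share of the total. The variable and function names are illustrative only.

```python
# Figures copied from the slide.
COUNTS = {"Nouns": 46710, "Verbs": 2881, "Adjectives": 416, "Adverbs": 490}
TOTAL_WORDS = 50497


def pos_shares(counts):
    """Return each part of speech as a fraction of the summed counts."""
    total = sum(counts.values())
    return {pos: n / total for pos, n in counts.items()}


# The per-category counts should reproduce the stated total.
assert sum(COUNTS.values()) == TOTAL_WORDS

if __name__ == "__main__":
    for pos, share in pos_shares(COUNTS).items():
        print(f"{pos}: {share:.1%}")
```

Nouns account for about 92.5% of the entries.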
10
Total Words: 50497
Unique Senses: 41013
Project Completed (2004)
http://www.nrcfosshelpline.in/code/wiki/TamilWordnet
12
Standalone version: Tamil WordNet (Snapshot)
13
Standalone version: Tamil WordNet (Snapshot)
14
Web version: Tamil WordNet (Snapshot)
15
Web version: Tamil WordNet (Snapshot)
16
First Effort on Dravidian Languages
National Workshop on WordNet for Dravidian Languages, 2-3 June 2003
Organized by AU-KBC Research Centre, Chennai; Central Institute of Indian Languages, Mysore; and Tamil University.
Hands-on experience on a specified domain: construction
Report available on the Global WordNet website
17
MHRD Project
Creation of Machine Translation tools and resources for English to Dravidian languages: a pilot study to develop a Machine Translation (MT) system and the linguistic resources needed for English-Dravidian language pairs (Tamil, Malayalam, Telugu and Kannada). This would facilitate the creation of rich educational content in Indian languages.
The research effort aims to base all the tools and the translation system on machine learning methodologies, so that computer graduates and other non-linguists can immediately participate in the national mission on literacy by contributing additional tools for language translation.
18
Modules
Module 1: Machine Translation. Aims at developing teaching material corresponding to the tools developed, so that it can be delivered as part of the undergraduate computer science and engineering curriculum on data mining/machine learning. This will ensure the critical mass of manpower required to sustain the translation effort needed for the national mission on education.
Module 2: Training. Aims at training 500 faculty members selected from across the country on machine translation methodologies using machine learning techniques.
Module 3: Dravidian WordNet. Aims at developing a Dravidian WordNet required for translation.
19
Total Budget
IIT Bombay: 15 lacs
Amrita University: 40 lacs
Tamil University: 15 lacs
University of Hyderabad: 15 lacs
Dravidian University: 15 lacs
Time Frame: 12 months (March 30, 2009 to March 29, 2010)
20
Work done
Part of a one-year pilot project involving Tamil, Telugu, Malayalam and Kannada
Funding Agency: Ministry of HRD
Duration: 18 months (July 2009-Dec 2010)
Deliverable: 13k synsets
7k synsets linked to IndoWordNet, available at http://www.cfilt.iitb.ac.in/wordnet/webhwn/wn.php
21
Statistics on Dravidian WordNet
22
Publications
'Tamil WordNet', Proceedings of the Fifth Global WordNet Conference, IIT-Bombay, 31 Jan-4 Feb 2010 (S. Rajendran)
'Building a WordNet for Dravidian Languages', Proceedings of the Fifth Global WordNet Conference, IIT-Bombay, 31 Jan-4 Feb 2010 (S. Rajendran, S. Gopakumar, V. Dhanalakshmi)
'Representation of Kinship in WordNet', Proceedings of the 9th International Tamil Internet Conference, Coimbatore, 23-27 June 2010 (S. Arulmozi)
'Polysemy in Tamil and other Indian Languages', Proceedings of the Fifth Global WordNet Conference, IIT-Bombay, 31 Jan-4 Feb 2010 (S. Arulmozi & Panchanan Mohanty)
'Telugu WordNet', Proceedings of the Fifth Global WordNet Conference, IIT-Bombay, 31 Jan-4 Feb 2010 (S. Arulmozi)
23
First IndoWordNet Workshop
Amrita University, 11-14 June 2009
The necessity of developing linked WordNets for the different languages of India was stressed.
Challenges such as language divergence, lexical semantics, and embedding WordNet in MT and cross-lingual search applications can be addressed.
Participation from groups: Hindi, Marathi, Sanskrit, Nepali, Assamese, Bodo, Manipuri, Konkani, Kashmiri, Tamil, Telugu, Malayalam, Kannada
Proposal on Indhradhanush
24
Dravidian WordNet
Present project funded by DIT.
25
Links
Tamil WordNet (Open Source): http://www.nrcfosshelpline.in/code/wiki/TamilWordnet
VerbNet (English): http://verbs.colorado.edu/~mpalmer/projects/verbnet.html
Princeton English WordNet: http://wordnet.princeton.edu/
Global WordNet Association: http://www.globalwordnet.org/
WordNets in the World: http://www.globalwordnet.org/gwa/wordnet_table.htm
WordNet Bibliography: http://lit.csci.unt.edu/~wordnet/
IndoWordNet: http://www.cfilt.iitb.ac.in/wordnet/webhwn/wn.php
26
Thank you!