Indradhanush WordNet Project Consortium PRSG Meeting 30th April 2013, University of Mysore Indradhanush PRSG Report 2/25/2019
Contents
  Previous PRSG details
  Present Work Status
  Tools developed
  Utilities developed
  Websites and Computational Resources developed
  Financial Details
  Manpower Trained
  Equipment Purchased
  Future Work Plan
  Publications
Previous PRSG Recommendations (1/4)
First PRSG Meeting – 9th August 2011, Goa University
Sense marking to be done to find the WordNet coverage.
  Follow-up action taken: A newspaper corpus has been collected and used by all the institutes for sense marking.
All research papers published to be made available to DeitY for uploading on the TDIL Center.
  Proceedings of the Goa Workshop have been submitted to the Chairman of the PRSG and will be given to DeitY for uploading.
Previous PRSG Recommendations (2/4)
PRSG recommended release of the next installment of the GIA to the Consortium Leader, Goa University, subject to receipt of the Compiled Utilization Certificate (UC) for the released grants.
Follow-up action taken:
  Compiled UC submitted by the Consortium Leader (CL) on 31st January 2012.
  DeitY released the second-year funds to the CL on 11th April 2012.
  The CL released funds to all members depending on the utilization of their funds for the first year.
Previous PRSG Recommendations (3/4)
Second PRSG Meeting – 24th July 2012, Hyderabad University
PRSG recommended the extension of the project duration till 31st December 2012.
Follow-up action taken: The project duration was extended till 31st December 2012 and the new deliverables were set as under:
  Linking and validation of a minimum of 27,000 synsets by each member.
  Sense marking of a minimum of 1,00,000 words.
  Testing and documenting the tools and utilities developed.
Previous PRSG Recommendations (4/4)
PRSG recommended release of the balance amount to the Consortium Leader after submission of the Consolidated UC.
Follow-up action taken: There was sufficient overall balance available with the consortium members and also scope for enhancement of the WordNet work. Hence a request was made to extend the project till 31st March 2013 instead of 31st December 2012. The request was accepted.
Present Work Status (1/2)
Synset Linking Status
Language   Noun    Verb   Adjective  Adverb  Total
Hindi      28227   3098   6075       460     37860
Bengali    27281   2804   5815       445     36345
Gujarati   24896   2805   5828               33974
Kashmiri   17959   2354   6382       305     27000
Konkani    22976   2991   5689       474     32130
Odia       27216   2418   5273       377     35284
Punjabi    21625   2806   5786       442     30659
Urdu       21595   2800   5787       443     30625
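The Total column can be cross-checked against the four part-of-speech counts. The Python sketch below does this for the figures on this slide. The names `LINKING_STATUS` and `check_row` are illustrative only, and the value it infers for the Gujarati adverb count, which is absent from the slide, rests on the assumption that the reported Gujarati total (33974) is correct.

```python
# Synset linking counts from the slide: (noun, verb, adjective, adverb, total).
# None marks the Gujarati adverb count, which does not appear on the slide.
LINKING_STATUS = {
    "Hindi":    (28227, 3098, 6075, 460, 37860),
    "Bengali":  (27281, 2804, 5815, 445, 36345),
    "Gujarati": (24896, 2805, 5828, None, 33974),
    "Kashmiri": (17959, 2354, 6382, 305, 27000),
    "Konkani":  (22976, 2991, 5689, 474, 32130),
    "Odia":     (27216, 2418, 5273, 377, 35284),
    "Punjabi":  (21625, 2806, 5786, 442, 30659),
    "Urdu":     (21595, 2800, 5787, 443, 30625),
}


def check_row(counts):
    """Return (is_consistent, implied_missing_value) for one language row."""
    *parts, total = counts
    known = sum(p for p in parts if p is not None)
    gaps = sum(1 for p in parts if p is None)
    if gaps == 0:
        return known == total, None
    if gaps == 1:
        # Exactly one gap: infer it from the reported total (an assumption).
        return True, total - known
    # More than one gap cannot be resolved from the total alone.
    return False, None


for language, counts in LINKING_STATUS.items():
    ok, implied = check_row(counts)
    note = f" (implied missing count: {implied})" if implied is not None else ""
    print(f"{language}: {'consistent' if ok else 'MISMATCH'}{note}")
```

Under that assumption the implied Gujarati adverb count is 445; every other row sums to its reported total.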
Present Work Status (2/2)
Sense Marking Status
Language  Corpus name                                          Files collected / used  Total words  Sense-marked words  WordNet coverage
Bengali   Newspaper (Anandabazar Patrika)                      9 / 6                   92276        38637               41.87%
Gujarati  Gujarati News corpus                                 101                     337094       112884              33.49%
Kashmiri  ‘So:n Mira:s’ Kashmiri weekly newspaper              350                     98350        42290               43.00%
Konkani   ‘Sunaparant’ Konkani daily newspaper                 3433 / 625              213415       103456              48.48%
Odia      Newspaper (Sambad) and Articles                      135                     236125       100285              42.27%
Punjabi   Online Articles, News Text, Stories                  98                      216878       93279               43.01%
Urdu      Newspaper: “Jang Urdu”, “Nawai waqt” & “BBC Urdu”    240 / 10                110000       50171               45.61%
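The WordNet Coverage column appears to be the number of sense-marked words as a percentage of the total number of words in the corpus. That reading is an assumption, not something the slide states, although it reproduces the reported figure for most languages (for Odia the ratio works out to about 42.47% against the reported 42.27%). A minimal Python sketch, with the illustrative names `coverage_percent` and `CORPORA`:

```python
def coverage_percent(sense_marked_words, total_words):
    """Coverage as the percentage of corpus words that were sense marked.

    Assumed definition: 100 * sense-marked words / total words.
    """
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return 100.0 * sense_marked_words / total_words


# (total words, sense-marked words) as reported on the slide.
CORPORA = {
    "Bengali":  (92276, 38637),
    "Gujarati": (337094, 112884),
    "Kashmiri": (98350, 42290),
    "Konkani":  (213415, 103456),
    "Odia":     (236125, 100285),
    "Punjabi":  (216878, 93279),
    "Urdu":     (110000, 50171),
}

for language, (total, marked) in CORPORA.items():
    print(f"{language}: {coverage_percent(marked, total):.2f}%")
```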
Tools developed (1/5)
Synset Categorization Tool – by IIT Bombay
  To choose common linkable synsets across all languages by classifying them as Universal, Pan-Indian, etc.
  Released for use by consortium members after the 1st Indradhanush Consortium Workshop at DDU Nadiad.
Synset Creation Tool – by IIT Bombay
  An offline interface to create synsets by using Hindi synsets as reference.
  Released for use by consortium members to create WordNets using the Expansion Approach.
Tools developed (2/5)
Sense Marker Tool – by IIT Bombay
  To find the synset coverage of a WordNet.
  Released for use by consortium members to assist in the task of sense marking.
Generic Stemmer for Indian Languages – by IIT Bombay
  To find the possible stems of a given word.
  Released for use by consortium members.
  http://www.cfilt.iitb.ac.in/~bornali/generic_stemmer/index.php
Tools developed (3/5)
WordNet Linkage Tool
  Links Hindi WordNet and English WordNet; uses 13 different heuristics to automatically identify the top 5 English synsets for a given Hindi synset.
  Released for use by consortium members, but currently used mainly by IITB.
Word Sense Disambiguation Portal
  Provides a single access point to 9 different state-of-the-art word sense disambiguation algorithms.
  Released for use by consortium members.
Tools developed (4/5)
WordNet CMS – v1.0, v2.0 – by Goa University
  Web-based content management system to quickly develop customizable, interactive multilingual websites.
  Tested and documentation available.
  Released for use by consortium members.
  http://indradhanush.unigoa.ac.in/public/downloadTools/downloadTools.php
CSS Manager Tool v1.0 – by Goa University
  Centralized web-based tool to manage synset creation activity.
  Documentation available.
  http://indradhanush.unigoa.ac.in/conceptspace/
Tools developed (5/5)
Lexical Relation Creation Web Based Tool – by Thapar University, Patiala
  Tool to verify and create lexical relations in the WordNet.
  This tool is under development.
Utilities developed
Sense Marking Statistic Finder Utility – by Goa University
  Utility to find coverage statistics of the sense-marked corpus.
  Tested and documentation available.
Synset Merger Utility – by Goa University
  Utility to merge different synset files into one single file.
Websites and Computational Resources developed (1/2)
Indradhanush WordNet Consortium Website v1.0 (http://indradhanush.unigoa.ac.in/)
Bengali WordNet Website v1.0 (http://www.isical.ac.in/~lru/wordnetnew/)
Gujarati WordNet Website v1.0 (http://www.cfilt.iitb.ac.in/gujarati/)
Kashmiri WordNet Website v1.0 (http://indradhanush.unigoa.ac.in/kashmiriwordnet/)
Konkani WordNet Website v2.0 (http://konkaniwordnet.unigoa.ac.in/)
Odia WordNet Website v1.0 (http://indradhanush.unigoa.ac.in/odiawordnet)
Punjabi WordNet Website v1.0 (http://punjabiwordnet.com/)
Urdu WordNet Website v1.0 (http://indradhanush.unigoa.ac.in/urduwordnet)
Websites and Computational Resources developed (2/2)
IndoWordNet Database v1.0, v2.0, v3.0
  Relational database structure to store WordNet data and relationships.
  Tested and documentation available.
  Released for use by consortium members.
  http://indradhanush.unigoa.ac.in/public/downloadTools/downloadTools.php
IndoWordNet API – v1.0, v2.0, v3.0 – by Goa University
  The IndoWordNet Application Programming Interface (IWAPI) provides access to the WordNet resources independent of the underlying storage technology.
  Implemented in Java as well as in PHP.
  Tested and documentation available.
  http://indradhanush.unigoa.ac.in/public/downloadTools/downloadTools.php
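The idea of access that is "independent of the underlying storage technology" can be illustrated with a small interface sketch. The example below is written in Python purely for illustration; it is not the IWAPI itself (which the slide says is implemented in Java and PHP), and every name in it (`Synset`, `SynsetStore`, `InMemoryStore`, `describe`) as well as the sample data is hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    """Hypothetical, simplified synset record."""
    synset_id: int
    words: tuple
    gloss: str


class SynsetStore(ABC):
    """Hypothetical storage interface: callers never see the backend."""

    @abstractmethod
    def get_synset(self, synset_id):
        """Return the synset with the given id."""

    @abstractmethod
    def find_by_word(self, word):
        """Return all synsets that contain the given word."""


class InMemoryStore(SynsetStore):
    """One possible backend; a database or file backend would expose the same methods."""

    def __init__(self, synsets):
        self._by_id = {s.synset_id: s for s in synsets}

    def get_synset(self, synset_id):
        return self._by_id[synset_id]

    def find_by_word(self, word):
        return [s for s in self._by_id.values() if word in s.words]


def describe(store, word):
    """Client code depends only on the SynsetStore interface."""
    return [f"{s.synset_id}: {s.gloss}" for s in store.find_by_word(word)]


# Made-up sample data, for illustration only.
store = InMemoryStore([Synset(1, ("tree", "sapling"), "a woody plant")])
print(describe(store, "tree"))
```

Because `describe` only calls methods declared on the interface, the backend could be swapped for a relational database without changing client code, which is the property the slide attributes to the API.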
Financial Details (1/3)
Financial Status as on 2nd February 2013
Total funds received by Goa University from DeitY: Rs. 281,83,413
Total interest earned by all institutes on the received funds: Rs. 4,99,687
Total amount including interest earned: Rs. 286,83,100
Total amount spent by all institutes: Rs. 267,46,182
Total committed expenditure of all institutes: Rs. 5,63,673
Total amount spent including the committed expenditure: Rs. 273,09,855
Total balance with Consortium [Rs. 286,83,100 – Rs. 273,09,855]: Rs. 13,73,245
Total amount balance with DeitY [Rs. 299,52,000 – Rs. 286,83,100]: Rs. 12,68,900
Net balance with the Consortium (including the unreleased balance with DeitY): Rs. 26,42,145
Financial Details (2/3)
Estimated Financial Status as on 30th April 2013
Total funds received by Goa University from DeitY: Rs. 281,83,413
Total interest earned by all institutes on the received funds: Rs. 4,99,687
Total amount including interest earned: Rs. 286,83,100
Total amount spent by all institutes as on 2nd February 2013: Rs. 267,46,182
Total committed expenditure of all institutes: Rs. 5,63,673
Total amount spent including the committed expenditure: Rs. 273,09,855
Estimated total amount spent as on 30th April 2013, including the committed expenditure: Rs. 293,30,926
Estimated total balance with Consortium as on 30th April 2013 [Rs. 286,83,100 – Rs. 293,30,926]: - Rs. 6,47,826
Total amount balance with DeitY [Rs. 299,52,000 – Rs. 286,83,100]: Rs. 12,68,900
Estimated net balance with the Consortium as on 30th April 2013 (including the unreleased balance with DeitY): Rs. 6,21,074
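The balances on the two financial status slides follow from a few additions and subtractions, sketched below in Python. The constant and function names are illustrative. The slides do not name the Rs. 299,52,000 figure; treating it as the total outlay is an assumption, and the code simply mirrors the subtraction printed in the brackets on the slides.

```python
# Figures from the Financial Details slides, in rupees.
TOTAL_OUTLAY_ASSUMED = 29952000    # Rs. 299,52,000 (not named on the slides)
RECEIVED_FROM_DEITY = 28183413     # Rs. 281,83,413
INTEREST_EARNED = 499687           # Rs. 4,99,687
SPENT_AS_ON_2_FEB = 26746182       # Rs. 267,46,182
COMMITTED = 563673                 # Rs. 5,63,673
ESTIMATED_SPENT_30_APR = 29330926  # Rs. 293,30,926, already includes committed


def balances(total_spent):
    """Return (balance with consortium, balance with DeitY, net balance)."""
    available = RECEIVED_FROM_DEITY + INTEREST_EARNED
    with_consortium = available - total_spent
    # Mirrors the slides: Rs. 299,52,000 minus the amount including interest.
    with_deity = TOTAL_OUTLAY_ASSUMED - available
    return with_consortium, with_deity, with_consortium + with_deity


# Status as on 2nd February 2013 (spent plus committed expenditure).
print(balances(SPENT_AS_ON_2_FEB + COMMITTED))
# Estimated status as on 30th April 2013.
print(balances(ESTIMATED_SPENT_30_APR))
```

The first call reproduces the net balance of Rs. 26,42,145. The second reproduces the estimated shortfall of Rs. 6,47,826 with the Consortium and the estimated net balance of Rs. 6,21,074.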
Financial Details (3/3)
Fund Estimation for the proposed period till 31st July 2013
  Budget Head Wise Fund Estimate for the period till 31st July 2013 (Appendix A)
  Institution Wise Fund Estimate for the period till 31st July 2013 (Appendix B)
Manpower Trained
                            Number
Consortium Leader           1
Co-Consortium Leader
Principal Investigator      8
Co-Principal Investigator   9
Project Manager             2
Office Assistant            3
Senior Linguist             11
Lexicographer               32
Computer Scientist          23
Research Scholar            4
Consultant                  7
Total                       101
Institution Wise Manpower Details in Appendix C
Equipment Purchased
                       Number
Desktop                22
Laptop                 24
Netbook                2
Server                 1
Scanner
Printer                5
UPS
LCD Projector
Hard Disk
DVD Writer
Wi-Fi dongle
LCD Projector Screen
Adapter
KVM Switch
Total                  70
Future Work Plan (1/2)
An extension is requested for the period till 31st July 2013. The following additional deliverables will be submitted at the end of this period:
  Report on the preliminary study carried out to give a Semantic Web orientation to the Indradhanush WordNet and on Gamification for Language Learning (IITB and GU).
  Each member will create an additional 2,000 to 5,000 new synsets to increase the coverage of their WordNets (all members).
Future Work Plan (2/2)
  Each member will sense mark an additional 25,000 to 50,000 words from the newspaper corpus (all members).
  All tools will be documented, tested and uploaded on the Indradhanush WordNet Website http://indradhanush.unigoa.ac.in/ (beta version) hosted at Goa University, and a link to it will be put up on the TDIL Data Center.
  All WordNet papers published will be handed over to DeitY for uploading on the TDIL Center.
  The balance amount is requested from DeitY to meet the expenses for the project extension period till 31st July 2013.
Publications
The list of publications is placed in Appendix D.
Thank You