Automatic Lexicon Generation through WordNet
by Nitin Verma and Pushpak Bhattacharyya
CSE Department, I.I.T. Bombay
Jan 21, 2004


Slide 2: Introduction
- A lexicon is the heart of any natural language processing system.
- It is difficult to construct, requiring an enormous amount of time and manpower.
- Document-specific dictionary generation:
  - Given a document D and a word W in it, which sense S of W should be picked for that document?
  - Can one construct a document-specific dictionary in which a single sense of each word is stored?

Slide 3: Introduction (UW dictionary)
- The UW dictionary is an important machine-readable lexical resource used by the enconverter and deconverter software.
[Figure: Natural Language -> Enconverter (using the UW Dictionary and the Analysis Rules) -> UNL]

Slide 4: Introduction (UW dictionary)
- Format of dictionary entries:
  - Semantic attributes (derived from the ontology).
  - Syntactic attributes (POS, person, number, tense).
  - The attributes are used to fire the appropriate analysis rules.
- Example entry:
  [crane] "crane(icl>bird)" (N, ANIMT, FAUNA, BIRD);
  Here [crane] is the headword (HW), "crane(icl>bird)" is the UW with the restriction (icl>bird), and (N, ANIMT, FAUNA, BIRD) lists the attributes, both syntactic and semantic.
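A minimal sketch of splitting one entry into its parts, assuming the simplified layout `[HW] "UW" (ATTR, ...);` exactly as in the crane example. Real UW dictionary files may carry further fields; the names `parse_entry` and `UWEntry` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed entry layout, copied from the slide's example:
#   [headword] "UW" (ATTR1, ATTR2, ...);
ENTRY_RE = re.compile(
    r'^\[(?P<hw>[^\]]+)\]\s*'          # headword in square brackets
    r'"(?P<uw>[^"]+)"\s*'              # universal word in double quotes
    r'\((?P<attrs>[^)]*)\)\s*;\s*$'    # attribute list in parentheses, then ';'
)

@dataclass
class UWEntry:
    headword: str
    uw: str
    restriction: Optional[str]  # text inside the UW's parentheses, e.g. "icl>bird"
    attributes: List[str]       # syntactic and semantic attributes, in order

def parse_entry(line: str) -> UWEntry:
    """Parse one dictionary line; raise ValueError if it does not match the assumed layout."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a UW dictionary entry: {line!r}")
    uw = m.group("uw")
    restriction = re.search(r"\(([^)]*)\)", uw)
    attrs = [a.strip() for a in m.group("attrs").split(",") if a.strip()]
    return UWEntry(m.group("hw"), uw, restriction.group(1) if restriction else None, attrs)

entry = parse_entry('[crane] "crane(icl>bird)" (N, ANIMT, FAUNA, BIRD);')
print(entry.restriction, entry.attributes)  # icl>bird ['N', 'ANIMT', 'FAUNA', 'BIRD']
```

Keeping the attributes as an ordered list preserves the slide's convention of listing the POS first, followed by the semantic codes.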

Slide 5: Introduction (ontology*)
- Animate (ANIMT)
  - Flora (FLORA)
    - Shrubs (ANIMT, FLORA, SHRB), e.g. jasmine
    - Aquatic plants (ANIMT, FLORA, AQTC), e.g. lotus
    - ...
  - Fauna (FAUNA)
    - Mammals (MML)
    - Reptiles (ANIMT, FAUNA, RPTL), e.g. lizard
    - Birds (ANIMT, FAUNA, BIRD)
    - Fish (ANIMT, FAUNA, FISH)
    - Insects (ANIMT, FAUNA, INSCT), e.g. butterfly
    - ...
* Dictionary group, CFILT, IIT Bombay.
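The leaf categories above carry the full chain of codes from the root (Birds is ANIMT, FAUNA, BIRD). A minimal sketch, assuming that attributes are simply the path from the root, stores the listed fragment as a tree and reads the path back. The slide gives only (MML) for Mammals, so the path this code returns for it is an inference from the tree, not something the slide states. `ONTOLOGY` and `attribute_path` are hypothetical names.

```python
from typing import Dict, Optional, Tuple

# Fragment of the ontology from the slide: code -> (label, children).
# Branches shown as "..." on the slide are omitted.
Node = Dict[str, Tuple[str, "Node"]]
ONTOLOGY: Node = {
    "ANIMT": ("Animate", {
        "FLORA": ("Flora", {
            "SHRB": ("Shrubs", {}),
            "AQTC": ("Aquatic plants", {}),
        }),
        "FAUNA": ("Fauna", {
            "MML": ("Mammals", {}),
            "RPTL": ("Reptiles", {}),
            "BIRD": ("Birds", {}),
            "FISH": ("Fish", {}),
            "INSCT": ("Insects", {}),
        }),
    }),
}

def attribute_path(code: str, tree: Node = ONTOLOGY,
                   prefix: Tuple[str, ...] = ()) -> Optional[Tuple[str, ...]]:
    """Depth-first search returning the codes from the root down to `code`, or None."""
    for node_code, (_label, children) in tree.items():
        path = prefix + (node_code,)
        if node_code == code:
            return path
        found = attribute_path(code, children, path)
        if found is not None:
            return found
    return None

print(attribute_path("BIRD"))  # ('ANIMT', 'FAUNA', 'BIRD')
```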

Slide 6: English-UW dictionary generation

Slide 7: English-UW dictionary generation
- Resources used:
  - English WordNet, a WSD* system (a soft word sense disambiguation method), the UNL KB and an inferencer.
- Knowledge-based approach.
* G. Ramakrishnan and P. Bhattacharyya. Soft Word Sense Disambiguation, GWN 2004.

Slide 8: English-UW dictionary generation (method)
- Stage 1: the input document (Word1 word2 ...) is passed through the WSD* system, producing a POS- and sense-tagged document (Word1:N:1 Word2:N:3 ...), in which each token is written word:POS:sense.
- Stage 2: see slide 9.
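A minimal sketch of reading the stage 1 output back into (word, POS, sense) triples, assuming whitespace-separated tokens of the form word:POS:sense as on the slide. `TaggedWord` and `parse_tagged` are hypothetical names, not part of the described system.

```python
from typing import List, NamedTuple

class TaggedWord(NamedTuple):
    word: str
    pos: str    # e.g. "N" for noun, as on the slide
    sense: int  # sense number chosen by the WSD system

def parse_tagged(text: str) -> List[TaggedWord]:
    """Split a tagged document into triples; a token without two ':' separators raises ValueError."""
    result = []
    for token in text.split():
        # rsplit keeps the last two fields as tags, even if the word itself contains ':'
        word, pos, sense = token.rsplit(":", 2)
        result.append(TaggedWord(word, pos, int(sense)))
    return result

print(parse_tagged("crane:N:4 Word2:N:3"))
```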

Slide 9: English-UW dictionary generation (method)
[Figure: Tagged document (Word1:pos1:sense1 Word2:pos2:sense2 ...) -> Inference Engine, consulting a KB made up of WordNet, the database of rules and the UNL KB -> UW Dictionary, together with an Explanation]

Slide 10: UW generation for nouns

Slides 11-17: UW generation for nouns (an animation that builds up step by step; the final frame is transcribed here)
1. The inference engine takes a word from the tagged document (crane:N:4, Word2:pos2:sense2, ...), starting with crane:N:4.
2. A query is sent to WordNet (part of the KB) to collect semantic information.
3. WordNet returns the hypernym chain of crane: crane -> bird -> fauna, animal -> organism.
4. A query is sent to the UNL KB to collect the relevant rules.
5. The UNL KB returns:
     depth  word  relation  restriction
     6      bird  icl       animal
     5            icl       living thing
     4      null
6. The engine generates the UW crane(icl>bird).
7. An explanation of how the UW was derived is produced.
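The crane example suggests that the UW restriction is the nearest WordNet hypernym already known to the UNL KB (bird, listed there as bird(icl>animal)). A minimal sketch under that assumption follows; the slides do not spell out the rule, the depth column is not modelled, and both knowledge sources are replaced by hypothetical in-memory dicts (`HYPERNYMS`, `UNL_KB`).

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for WordNet: hypernym chain per (word, POS, sense),
# nearest synset first; "fauna, animal" is a single synset.
HYPERNYMS: Dict[Tuple[str, str, int], List[Tuple[str, ...]]] = {
    ("crane", "N", 4): [("bird",), ("fauna", "animal"), ("organism",)],
}

# Hypothetical stand-in for the UNL KB: word -> (relation, restriction),
# mirroring the row "6 bird icl animal".
UNL_KB: Dict[str, Tuple[str, str]] = {
    "bird": ("icl", "animal"),
}

def noun_uw(word: str, pos: str, sense: int) -> Optional[Tuple[str, str]]:
    """Return (UW, explanation) from the nearest hypernym found in the UNL KB, or None."""
    for synset in HYPERNYMS.get((word, pos, sense), []):
        for lemma in synset:
            if lemma in UNL_KB:
                relation, restriction = UNL_KB[lemma]
                uw = f"{word}(icl>{lemma})"
                why = (f"'{lemma}' is the nearest hypernym of '{word}' in the UNL KB, "
                       f"where it is listed as {lemma}({relation}>{restriction})")
                return uw, why
    return None  # no hypernym of this sense is known to the UNL KB

print(noun_uw("crane", "N", 4)[0])  # crane(icl>bird)
```

Returning the explanation alongside the UW mirrors step 7, where the system reports how each entry was derived.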

Slide 18: UW generation for verbs

Slide 19: UW generation for verbs
Given an input word w:
- If $\mathrm{hypernyms}(w) \cap \{\text{be}, \text{continue}, \ldots\} \neq \emptyset$, generate (icl>be), e.g. exist(icl>be).
- Otherwise, if $\mathrm{hypernyms}(\text{nominal form of } w) \cap \{\text{phenomenon}, \text{natural event}, \ldots\} \neq \emptyset$, generate (icl>occur), e.g. rain(icl>occur).
- Otherwise, generate (icl>do), e.g. make(icl>do).
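The decision flow above can be sketched as follows, assuming the caller has already fetched the verb's hypernyms and the hypernyms of its noun form. Both trigger sets are partial (the slide ends each with "etc"), `verb_uw` is a hypothetical name, and the hypernym lists in the usage lines are illustrative, not taken from WordNet.

```python
from typing import Iterable

# Partial trigger sets: the slide lists these members and then "etc".
BE_TRIGGERS = {"be", "continue"}
OCCUR_TRIGGERS = {"phenomenon", "natural event"}

def verb_uw(word: str, verb_hypernyms: Iterable[str], nominal_hypernyms: Iterable[str]) -> str:
    """Pick the UW restriction for a verb following the slide's decision flow."""
    if set(verb_hypernyms) & BE_TRIGGERS:        # intersection is non-empty
        return f"{word}(icl>be)"
    if set(nominal_hypernyms) & OCCUR_TRIGGERS:  # checked on the noun form of the verb
        return f"{word}(icl>occur)"
    return f"{word}(icl>do)"                     # default: an action

# Illustrative hypernym lists only:
print(verb_uw("exist", ["be"], []))                      # exist(icl>be)
print(verb_uw("rain", ["precipitate"], ["phenomenon"]))  # rain(icl>occur)
print(verb_uw("make", ["create"], []))                   # make(icl>do)
```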

Slide 20: UW generation for adjectives
Given an input word:
- Is the UW present in the UNL KB? If yes, pick that UW, e.g. broad(aoj>thing).
- If not, is IS_DEFINED (the is_a_value_of relation) true for the input word? If yes, generate (aoj>thing), e.g. good(aoj>thing).
- Otherwise, generate (mod>thing), e.g. green(mod>thing).
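A minimal sketch of this three-way decision, assuming the UNL KB is given as a word-to-UW dict and the is_a_value_of test as a set of words for which WordNet defines that relation. Both inputs in the usage lines are toy stand-ins; `adjective_uw` is a hypothetical name.

```python
from typing import Dict, Set

def adjective_uw(word: str, unl_kb: Dict[str, str], has_value_of: Set[str]) -> str:
    """Follow the slide's decision flow for adjectives."""
    if word in unl_kb:              # 1. UW already in the UNL KB: reuse it
        return unl_kb[word]
    if word in has_value_of:        # 2. is_a_value_of relation defined for the word
        return f"{word}(aoj>thing)"
    return f"{word}(mod>thing)"     # 3. otherwise a plain modifier

# Toy inputs standing in for the UNL KB and WordNet:
kb = {"broad": "broad(aoj>thing)"}
value_of = {"good"}
for adj in ("broad", "good", "green"):
    print(adjective_uw(adj, kb, value_of))
# broad(aoj>thing)
# good(aoj>thing)
# green(mod>thing)
```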

Slide 21: Semantic attribute generation (English-UW dictionary generation method)

Slides 22-27: Semantic attribute generation (an animation that builds up step by step; the final frame is transcribed here)
1. The inference engine takes a word from the tagged document (crane:N:4, Word2:pos2:sense2, ...), starting with crane:N:4.
2. A query is sent to WordNet (part of the KB) to collect semantic information.
3. WordNet returns the hypernym chain of crane: crane -> bird -> fauna, animal -> organism.
4. A query is sent to the database of rules to collect the relevant rules.
5. The database returns rules such as:
     IF hypernym = 'organism' THEN generate 'ANIMT' ELSE generate 'INANI';
     IF hypernym = 'fauna' THEN generate 'FAUNA';
     IF hypernym = 'bird' THEN generate 'BIRD';
     ...
6. The engine generates the attributes (N, ANIMT, FAUNA, BIRD).
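A minimal sketch of step 6, applying only the three rules shown above to a hypernym chain. It assumes the chain is flattened to lemmas (so the synset "fauna, animal" contributes both words) and that the POS tag "N" is prepended as the syntactic attribute; the real rule base holds many more rules, and `noun_attributes` is a hypothetical name.

```python
from typing import Iterable, List

# The two unconditional rules from the slide, in order; the 'organism'
# rule has an ELSE branch and is handled separately below.
NOUN_RULES = [("fauna", "FAUNA"), ("bird", "BIRD")]

def noun_attributes(hypernyms: Iterable[str]) -> List[str]:
    """Generate attributes for a noun from its (flattened) hypernym chain."""
    chain = set(hypernyms)
    attrs = ["N"]  # syntactic attribute taken from the POS tag
    # IF hypernym = 'organism' THEN 'ANIMT' ELSE 'INANI'
    attrs.append("ANIMT" if "organism" in chain else "INANI")
    for hypernym, attribute in NOUN_RULES:
        if hypernym in chain:
            attrs.append(attribute)
    return attrs

print(noun_attributes(["bird", "fauna", "animal", "organism"]))  # ['N', 'ANIMT', 'FAUNA', 'BIRD']
```

The ELSE branch is treated specially because it is the only rule that fires when its hypernym is absent.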

Slide 28: Database of rules (semantic attribute generation)
- No. of such rules: 4344

Table 1. Rules for nouns (96)
  HYPERNYM      ATTRIBUTE
  organism      ANIMT
  flora         FLORA
  fauna         FAUNA
  bird          BIRD

Table 2. Rules for verbs (405)
  HYPERNYM      ATTRIBUTE
  change        VOA, CHNG
  communicate   VOA, COMM
  move          VOA, MOTN
  complete      VOA, CMPLT

Table 3.1. Rules for adjectives (29)
  IS_A_VALUE_OF   ATTRIBUTE
  weight          DES, WT
  strength        DES, STRNGTH
  qual            DES, QUAL

Table 3.2. Rules for adjectives (3258)
  SYNONYMY OR ANTONYMY   ATTRIBUTE
  bright                 DES, APPR
  deep                   DES, DPTH
  shallow                DES, DPTH

Table 4. Rules for adverbs (556)
  SYNONYMY      ATTRIBUTE
  backward      DRCTN
  always        FREQ
  frequent      FREQ
  beautifully   MAN
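The per-table counts account exactly for the total quoted on the slide (and for the 4344 rule-base entries reported under the implementation details):

```latex
\underbrace{96}_{\text{nouns}} + \underbrace{405}_{\text{verbs}}
+ \underbrace{29 + 3258}_{\text{adjectives}} + \underbrace{556}_{\text{adverbs}} = 4344
```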

Slide 29: Experiments and results
$\text{Precision} = \dfrac{\text{no. of correct entries in the dictionary}}{\text{total no. of entries in the dictionary}}$
- Precision for nouns: 93.9%
- Precision for verbs: 84.4%
[Chart: precision per document (x-axis: document no.)]
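The precision formula can be sketched as follows. The slides do not say whether the per-POS figures pool counts over all documents or average per-document precision; this sketch shows the pooled variant as one assumption, and the counts in the usage line are hypothetical, not the paper's data.

```python
from typing import Iterable, Tuple

def precision(correct: int, total: int) -> float:
    """Precision = correct dictionary entries / total dictionary entries."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total

def pooled_precision(per_document: Iterable[Tuple[int, int]]) -> float:
    """Precision over several documents, summing (correct, total) counts before dividing."""
    pairs = list(per_document)
    return precision(sum(c for c, _ in pairs), sum(t for _, t in pairs))

print(pooled_precision([(45, 50), (28, 30)]))  # 0.9125 (hypothetical counts)
```

Pooling weights each entry equally, so large documents dominate; averaging per-document precision would instead weight each document equally.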

Slide 30: Experiments and results
$\text{Precision} = \dfrac{\text{no. of correct entries in the dictionary}}{\text{total no. of entries in the dictionary}}$
- Precision for adjectives: 90.06%
- Precision for adverbs: 86%
[Chart: precision per document (x-axis: document no.)]

Slide 31: Implementation details
- Subtasks identified:
  - A MySQL database is used for storing the rules and the UNL KB:
    - 7540 entries in the UNL KB.
    - 4344 entries in the rule base.
  - Inference engine in C++.
  - Web interface of the DDG in CGI & PHP.
  - Other utilities, such as the UNL KB organizer, the rule entry interface and the WSD integrator, are implemented in Perl.
  - LOC: 4761.
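A minimal sketch of how hypernym-to-attribute rules like those on slide 28 might be stored and queried relationally. The system uses MySQL; this sketch swaps in Python's built-in `sqlite3` module so it runs without a server, and the table name, columns and helper functions (`build_rule_base`, `attributes_for`) are hypothetical.

```python
import sqlite3
from typing import Iterable, List, Tuple

def build_rule_base(rows: Iterable[Tuple[str, str, str, str]]) -> sqlite3.Connection:
    """Create an in-memory rule table; the schema is a hypothetical illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE rules ("
        " pos TEXT NOT NULL,"         # part of speech the rule applies to
        " relation TEXT NOT NULL,"    # WordNet relation tested, e.g. HYPERNYM
        " value TEXT NOT NULL,"       # word the relation must reach, e.g. organism
        " attributes TEXT NOT NULL)"  # comma-separated attributes to emit
    )
    conn.executemany("INSERT INTO rules VALUES (?, ?, ?, ?)", rows)
    return conn

def attributes_for(conn: sqlite3.Connection, pos: str, relation: str,
                   related: List[str]) -> List[str]:
    """Collect the attributes of every rule whose value is in `related`, in insertion order."""
    if not related:
        return []
    marks = ",".join("?" for _ in related)
    sql = ("SELECT attributes FROM rules WHERE pos = ? AND relation = ? "
           f"AND value IN ({marks}) ORDER BY rowid")
    out: List[str] = []
    for (attrs,) in conn.execute(sql, (pos, relation, *related)):
        out.extend(a.strip() for a in attrs.split(","))
    return out

conn = build_rule_base([
    ("N", "HYPERNYM", "organism", "ANIMT"),
    ("N", "HYPERNYM", "fauna", "FAUNA"),
    ("N", "HYPERNYM", "bird", "BIRD"),
    ("V", "HYPERNYM", "move", "VOA,MOTN"),
])
print(attributes_for(conn, "N", "HYPERNYM", ["bird", "fauna", "animal", "organism"]))
# ['ANIMT', 'FAUNA', 'BIRD']
```

Placeholders (`?`) keep the looked-up words out of the SQL text, so hypernyms containing quotes cannot break the query.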

Slide 32: Demo

Slide 33: Hindi-UW dictionary generation (method)

Slide 34: Hindi-UW dictionary generation
1. The WordNet API is used to obtain all possible parts of speech and all possible senses of every word.
2. The Hindi WN is queried (using the Hindi WN API) to obtain the semantic attributes.

Slide 35: Hindi-UW dictionary generation (continued)
3. The Hindi UW dictionary database is queried (on the basis of the input word and its POS) to obtain an appropriate UW.
4. The lexicographer manually disables the irrelevant entries and corrects the incorrect ones.
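A minimal sketch of steps 1 to 4 as a candidate generator, assuming each knowledge source can be wrapped as a plain function or dict. Steps 1 to 3 fill in candidate entries, and each entry carries an `enabled` flag that the lexicographer can switch off in step 4. All names are hypothetical, and the Hindi word and its data in the usage lines are toy values, not output of the Hindi WN API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

@dataclass
class Candidate:
    word: str
    pos: str
    sense: int
    attributes: List[str]
    uw: Optional[str]      # None when the UW dictionary has no entry for (word, POS)
    enabled: bool = True   # step 4: the lexicographer may disable or edit the entry

def generate_candidates(
    words: List[str],
    senses_of: Callable[[str], List[Tuple[str, int]]],    # step 1: all (POS, sense) pairs
    attributes_of: Callable[[str, str, int], List[str]],  # step 2: semantic attributes
    uw_dictionary: Dict[Tuple[str, str], str],            # step 3: (word, POS) -> UW
) -> List[Candidate]:
    """Produce one candidate entry per (word, POS, sense) for manual review."""
    out = []
    for w in words:
        for pos, sense in senses_of(w):
            out.append(Candidate(w, pos, sense, attributes_of(w, pos, sense),
                                 uw_dictionary.get((w, pos))))
    return out

# Toy data: the Hindi word for "bird", with made-up lookups.
cands = generate_candidates(
    ["पक्षी"],
    lambda w: [("N", 1)],
    lambda w, p, s: ["ANIMT", "FAUNA", "BIRD"],
    {("पक्षी", "N"): "bird(icl>animal)"},
)
print(cands[0].uw, cands[0].attributes)
```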

Slide 36: Demo

Slide 37: Conclusion and future work
- The burden of lexicography has been reduced considerably.
- The system is being used routinely in our work on machine translation in a tri-language setting (English, Hindi and Marathi).
- Future work will be directed towards implementing a part-of-speech tagger and a word sense disambiguator for Hindi and Marathi.

