
Comparing Corpus Co-Occurrence, Dictionary and Wikipedia Entries as Resources for Semantic Relatedness Information
Michael Roth, Sabine Schulte im Walde


1 Comparing Corpus Co-Occurrence, Dictionary and Wikipedia Entries as Resources for Semantic Relatedness Information
Michael Roth, Sabine Schulte im Walde
Universität Stuttgart

2 Michael Roth, Comparing Corpus Co-Occurrence, Dictionary and Wikipedia Entries as Resources for Semantic Relatedness Information, May 30, 2008
Overview
Motivation / Introduction
▫Data-intensive lexical semantics
▫Corpus-based descriptions
▫Semantic associations
Our Work
▫Evaluation of data-driven models
▫Cross-comparison between resources
Summary / Conclusions

3 Data-intensive lexical semantics
Modelling word meaning
▫Using meaning aspects
▫Automatically obtainable
Goal: Determine the (dis)similarity of words
Applications:
▫Word sense discrimination
▫Anaphora resolution
▫...
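The slides do not say how the (dis)similarity of two words is computed from their descriptions. As a minimal sketch, assuming each word is described by a sparse vector of co-occurrence counts, cosine similarity is one common choice (dissimilarity can then be taken as 1 minus the cosine). The function name `cosine` and the toy counts are illustrative assumptions, not code or data from this work.

```python
import math
from collections import Counter

def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse co-occurrence count vectors."""
    shared = u.keys() & v.keys()              # only shared context words add to the dot product
    dot = sum(u[w] * v[w] for w in shared)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0                            # treat an empty description as unrelated
    return dot / (norm_u * norm_v)

# Invented toy counts (context word -> co-occurrence frequency), not corpus data.
klagen = Counter({"Gericht": 3, "Anwalt": 2, "laut": 1})
jammern = Counter({"laut": 2, "Gericht": 1})
print(round(cosine(klagen, jammern), 3))
```

The two toy vectors share the context words Gericht and laut, so the dot product is 3·1 + 1·2 = 5 and the call prints 0.598 (5 / sqrt(70)).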

4 Corpus-based Descriptions
Disadvantage: Corpus co-occurrence does not cover all aspects of word meaning
▫Especially world knowledge
Our question: Can we find complementary information in other resources?
▫Dictionaries?
▫Encyclopaedias?

5 Dictionary and Encyclopaedia
Consider other resources:
▫Dictionaries contain detailed information about word senses
▫Encyclopaedias are written compendiums of knowledge
How can we identify meaning aspects?
▫In our work, we rely on semantic associations

6 Semantic Associations
Definition:
▫We define semantic associations as concepts spontaneously called to mind by other concepts (stimuli)
Assumption:
▫Evoked words reflect highly salient linguistic and conceptual features

7 Data Collection: Verb Stimuli
Associates to verb stimuli
▫Web experiment
▫330 verb stimuli
▫30 seconds per verb
Example: klagen ‘complain, moan, sue’
Associate     Gloss              Frequency
Gericht       ‘court’            19
jammern       ‘moan’             18
weinen        ‘cry’              13
Anwalt        ‘lawyer’           11
Richter       ‘judge’             9
Klage         ‘complaint’         7
Leid          ‘suffering’         6
Trauer        ‘mourning’          6
Klagemauer    ‘Wailing Wall’      5
laut          ‘noisy’             5

8 Data Collection: Noun Stimuli
Associates to noun stimuli
▫Offline experiment
▫409 noun stimuli
▫3 associates per noun
Example: Schloss ‘castle, lock’
Associate     Gloss              Frequency
Schlüssel     ‘key’              51
Tür           ‘door’             15
Prinzessin    ‘princess’          8
Burg          ‘castle’            8
sicher        ‘safe’              7
Fahrrad       ‘bike’              7
schließen     ‘close’             7
Keller        ‘cellar’            7
König         ‘king’              7
Turm          ‘tower’             6
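The example lists on slides 7 and 8 give, for each associate, how often it was produced for the stimulus. A minimal sketch of how such a ranked list could be built from raw responses, assuming the answers for each stimulus arrive as one flat list of strings (for the noun experiment, the three associates per participant would simply be concatenated). The function `association_norms` and the sample answers are hypothetical.

```python
from collections import Counter

def association_norms(responses: dict[str, list[str]],
                      top_n: int = 10) -> dict[str, list[tuple[str, int]]]:
    """Map each stimulus to its top_n associates, ranked by response frequency."""
    norms = {}
    for stimulus, answers in responses.items():
        # Strip surrounding whitespace and drop empty answers; spelling is otherwise kept as given.
        cleaned = [a.strip() for a in answers if a.strip()]
        norms[stimulus] = Counter(cleaned).most_common(top_n)
    return norms

# Hypothetical raw answers for one stimulus (not the experiment's data).
raw = {"Schloss": ["Schlüssel", "Tür", "Schlüssel ", "Burg", "Schlüssel", "", "Tür"]}
print(association_norms(raw, top_n=2))
```

With this input the call prints {'Schloss': [('Schlüssel', 3), ('Tür', 2)]}: the empty answer is discarded and the trailing space in 'Schlüssel ' is stripped before counting.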

9 Knowledge Resources
Corpus data
▫German newspaper corpus
▫~200 million words
Dictionary: WDG (Wörterbuch der deutschen Gegenwartssprache, ‘Dictionary of Contemporary German’)
▫Freely available dictionary (130,000 entries)
▫Average of 840 words/entry
Encyclopaedia: Wikipedia
▫Free online encyclopaedia (650,000 articles)
▫Average of 1,164 words/article

10 Analysis: Procedure
Corpus data
▫Extract co-occurrence windows around the stimuli
▫Check the windows for associates
WDG / Wikipedia
▫Download the entries for the stimuli
▫Check the entry content for associates
Missing entries (noun / verb stimuli):
▫WDG: 7% / 0%
▫Wikipedia: 2% / 54%
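The corpus step looks for associates inside co-occurrence windows around each stimulus, while the WDG and Wikipedia step looks for them inside the stimulus's entry. A minimal sketch of both checks, assuming pre-tokenized, lemmatized text (so an inflected form such as "klagt" would already map to "klagen") and a symmetric window. The default window size of 20 tokens, the function names, and the example sentence and entry are hypothetical, since the slides specify none of them.

```python
import re

def cooccurrence_windows(tokens: list[str], stimulus: str, size: int = 20) -> list[list[str]]:
    """Tokens within `size` positions left and right of each occurrence of `stimulus`."""
    windows = []
    for i, tok in enumerate(tokens):
        if tok == stimulus:
            left = tokens[max(0, i - size):i]
            right = tokens[i + 1:i + 1 + size]
            windows.append(left + right)
    return windows

def associates_in_windows(windows: list[list[str]], associates: set[str]) -> set[str]:
    """Associates that occur in at least one co-occurrence window."""
    seen = {tok for window in windows for tok in window}
    return associates & seen

def associates_in_entry(entry_text: str, associates: set[str]) -> set[str]:
    """Associates that occur as a whole word in a dictionary or encyclopaedia entry."""
    words = set(re.findall(r"\w+", entry_text))  # \w is Unicode-aware, so umlauts and ß match
    return associates & words

# Hypothetical, pre-lemmatized example sentence and entry text.
tokens = "der Kläger will vor Gericht klagen und sein Anwalt stimmt zu".split()
print(associates_in_windows(cooccurrence_windows(tokens, "klagen", size=3),
                            {"Gericht", "Anwalt", "weinen"}))
entry = "Schloss: Vorrichtung zum Verschließen einer Tür mit einem Schlüssel"
print(associates_in_entry(entry, {"Schlüssel", "Tür", "Burg"}))
```

The two calls print the sets {'Gericht', 'Anwalt'} and {'Schlüssel', 'Tür'} (set order may vary). Shrinking the window to two tokens would drop Anwalt, which shows how strongly the corpus result can depend on the window size.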

11 Analysis: Resource Coverage
Noun + associate (all)
Resource     Types   Tokens   Tokens/Types
Corpus       70%     84%      1.2
WDG          12%     28%      2.3
Wikipedia    26%     46%      1.8

Verb + associate (all)
Resource     Types   Tokens   Tokens/Types
Corpus       67%     77%      1.2
WDG          12%     25%      2.0
Wikipedia     6%     10%      1.7

Resources differ in...
▫coverage per stimulus part of speech
▫token/type ratio
▫proportions per associate's part of speech (next slide)
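Assuming that type coverage counts each distinct stimulus-associate pair once and token coverage weights each pair by how many participants produced it, the token/type ratio on this slide is token coverage divided by type coverage (84% / 70% ≈ 1.2 for the corpus on noun stimuli). That ratio equals the mean response frequency of the pairs a resource finds divided by the mean frequency of all pairs, so a value above 1 means the resource tends to find the more frequently produced associates. A minimal sketch, with a hypothetical `coverage` function and a hypothetical set of found pairs over the klagen frequencies from slide 7:

```python
def coverage(pair_freqs: dict[tuple[str, str], int],
             found: set[tuple[str, str]]) -> tuple[float, float, float]:
    """Type coverage, token coverage and token/type ratio of the pairs a resource finds."""
    found_types = sum(1 for pair in pair_freqs if pair in found)
    found_tokens = sum(f for pair, f in pair_freqs.items() if pair in found)
    type_cov = found_types / len(pair_freqs)                    # each distinct pair counts once
    token_cov = found_tokens / sum(pair_freqs.values())         # pairs weighted by response frequency
    ratio = token_cov / type_cov if type_cov else float("nan")  # undefined if nothing is found
    return type_cov, token_cov, ratio

# Response frequencies for klagen (slide 7); the found set is hypothetical.
klagen = {"Gericht": 19, "jammern": 18, "weinen": 13, "Anwalt": 11, "Richter": 9,
          "Klage": 7, "Leid": 6, "Trauer": 6, "Klagemauer": 5, "laut": 5}
pairs = {("klagen", a): f for a, f in klagen.items()}
found = {("klagen", "Gericht"), ("klagen", "jammern")}
print([round(x, 2) for x in coverage(pairs, found)])
```

This prints [0.2, 0.37, 1.87]: the two hypothetical found pairs are the two most frequent responses (37 of 99), so the ratio is well above 1.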

12 Analysis: Resource Coverage (2)
Proportions per associate's part of speech (V = verb, N = noun, Adj = adjective, Adv = adverb):
Noun stimuli
▫Corpus: 88% V > 84% N > 83% Adj
▫WDG: 43% V > 31% Adj > 26% N
▫Wikipedia: 49% N > 39% Adj > 37% V
Verb stimuli
▫Corpus: 91% Adv > 79% V > 77% Adj > 76% N
▫WDG: 29% Adv > 28% V > 25% N > 24% Adj
▫Wikipedia: 12% N > 9% Adj/Adv > 6% V
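The rankings on this slide break coverage down by the associate's part of speech. A minimal sketch of such a breakdown, assuming coverage is computed over pair types and each associate carries a single POS tag. The function names are hypothetical, the tags given to the Schloss associates from slide 8 are assumptions, and the found set is invented for illustration.

```python
from collections import defaultdict

def coverage_by_pos(pair_pos: dict[tuple[str, str], str],
                    found: set[tuple[str, str]]) -> dict[str, float]:
    """Share of pair types found, grouped by the associate's part of speech."""
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for pair, pos in pair_pos.items():
        totals[pos] += 1
        if pair in found:
            hits[pos] += 1
    return {pos: hits[pos] / totals[pos] for pos in totals}

def ranking(cov: dict[str, float]) -> str:
    """Format coverage in the slide's notation, highest share first."""
    ordered = sorted(cov.items(), key=lambda item: item[1], reverse=True)
    return " > ".join(f"{round(100 * share)}% {pos}" for pos, share in ordered)

# Assumed POS tags for the Schloss associates (slide 8); the found set is hypothetical.
tags = {"Schlüssel": "N", "Tür": "N", "Prinzessin": "N", "Burg": "N", "sicher": "Adj",
        "Fahrrad": "N", "schließen": "V", "Keller": "N", "König": "N", "Turm": "N"}
pair_pos = {("Schloss", a): pos for a, pos in tags.items()}
found = {("Schloss", a) for a in ("Schlüssel", "Tür", "Burg", "Turm", "schließen")}
print(ranking(coverage_by_pos(pair_pos, found)))
```

This prints 100% V > 50% N > 0% Adj, in the same notation as the slide. With a single verb associate in the toy data, the 100% is of course not meaningful; real rankings need many pairs per part of speech.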

13 Analysis: Cross-Comparison
Noun + associate
          Corpus   WDG      Wiki
Corpus    -        55.0%    46.0%
WDG       0.8%     -        5.7%
Wiki      3.2%     18.1%    -

Verb + associate
          Corpus   WDG      Wiki
Corpus    -        45.8%    22.1%
WDG       0.7%     -        3.9%
Wiki      0.5%     3.6%     -

World knowledge?
▫Only in WDG / Wiki: carrot – orange, cry – tears, ...
▫Only in Corpus: igloo – eskimo, teach – school, ...
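The slides do not state exactly what each cell of the cross-comparison tables measures, and the base of the percentages may differ from the one used here. Reading rows as "found in this resource" and columns as "not found in that resource", as the "Only in ..." examples suggest, a minimal sketch computes the share of all stimulus-associate pairs found by the row resource but missed by the column resource. This interpretation, the function name `cross_comparison`, and the toy sets are assumptions.

```python
def cross_comparison(found: dict[str, set], all_pairs: set) -> dict[tuple[str, str], float]:
    """Percentage of all pairs found by the row resource but not by the column resource."""
    n = len(all_pairs)
    table = {}
    for row, row_found in found.items():
        for col, col_found in found.items():
            if row != col:
                only_row = (row_found - col_found) & all_pairs
                table[(row, col)] = 100 * len(only_row) / n
    return table

# Hypothetical toy data: four Schloss pairs and the resources that "find" them.
all_pairs = {("Schloss", a) for a in ("Schlüssel", "Burg", "König", "Keller")}
found = {
    "Corpus": {("Schloss", "Schlüssel"), ("Schloss", "Burg"), ("Schloss", "König")},
    "WDG": {("Schloss", "Schlüssel")},
    "Wiki": {("Schloss", "Burg"), ("Schloss", "Keller")},
}
table = cross_comparison(found, all_pairs)

def cell(row: str, col: str) -> str:
    """Format one cell; the diagonal is '-' as on the slide."""
    return "-" if row == col else f"{table[(row, col)]:.1f}%"

names = list(found)
print(" " * 8 + "".join(f"{c:>8}" for c in names))
for r in names:
    print(f"{r:<8}" + "".join(f"{cell(r, c):>8}" for c in names))
```

The printed matrix has the same shape as the slide's tables; in the toy data, for example, the Corpus row shows 50.0% under WDG (Burg and König are found only by the corpus) and the WDG row shows 0.0% under Corpus.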

14 Summary / Conclusions
Analysis of associations across resources
Results:
▫Different coverage per stimulus part of speech (noun vs. verb)
▫Different (predominant) parts of speech in word descriptions
▫Different strength of semantic relatedness
Resources complement each other
=> A combination of resources should be helpful for modelling word meaning and similarity

16 Questions?

