Measuring Semantic Similarity between Words Using Web Search Engines WWW 07.


1 Measuring Semantic Similarity between Words Using Web Search Engines WWW 07

2 Abstract
Semantic web-related applications
–Community mining
–Relation extraction
–Automatic meta data extraction
–Entity disambiguation
The proposed method combines four page-count-based similarity scores with lexico-syntactic patterns extracted automatically from text snippets.

3 Introduction – 1/2
Page counts and snippets are two useful information sources provided by most Web search engines.
Some problems
–Page-count analyses ignore the position of a word in a page: even when two words appear in the same page, they might not be related
–Polysemous words (words with multiple senses): the page count for "apple" mixes apple as a fruit and apple as a company

4 Introduction – 2/2
Lexico-syntactic patterns
–Express various semantic relations: "also known as", "is a", "part of", "is an example of"

5 Method (Page-count-based)
Page-count-based Similarity Scores (co-occurrence measures)
c = 5 (page-count threshold)
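The formulas on this slide did not survive in the transcript. The sketch below assumes the four scores are the usual Jaccard, Overlap, Dice, and PMI co-occurrence measures computed from page counts (slide 11 names WebOverlap, WebJaccard, and WebPMI; Dice as the fourth is an assumption), and that the constant 5 is a cutoff below which a co-occurrence count is treated as noise. The function names, the `n_docs` index-size parameter, and the example counts are hypothetical.

```python
import math

# h_p, h_q: page counts for the queries P and Q
# h_pq:     page count for the conjunctive query "P AND Q"
# c:        cutoff; co-occurrence counts at or below it score zero (assumed role of the slide's constant)

def web_jaccard(h_p, h_q, h_pq, c=5):
    if h_pq <= c:
        return 0.0
    return h_pq / (h_p + h_q - h_pq)

def web_overlap(h_p, h_q, h_pq, c=5):
    if h_pq <= c:
        return 0.0
    return h_pq / min(h_p, h_q)

def web_dice(h_p, h_q, h_pq, c=5):
    if h_pq <= c:
        return 0.0
    return 2 * h_pq / (h_p + h_q)

def web_pmi(h_p, h_q, h_pq, n_docs=10**10, c=5):
    # n_docs is an assumed size of the search engine's index.
    if h_pq <= c:
        return 0.0
    joint = h_pq / n_docs
    independent = (h_p / n_docs) * (h_q / n_docs)
    return math.log2(joint / independent)

# Hypothetical page counts, for illustration only.
print(web_jaccard(100, 50, 25), web_overlap(100, 50, 25), web_dice(100, 50, 25))
```

All four scores need only three page counts per word pair, which is why they are cheap to compute but blind to where the words occur on a page.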

6 Method (Lexico-Syntactic Patterns) – 1/4
Extracting Lexico-Syntactic Patterns from Snippets
–is a (X is a Y)
–and (X and Y)
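One way to obtain patterns such as "X is a Y" is to replace the two query words in a snippet with placeholders. The sketch below is a minimal illustration of that step, not the authors' implementation; the function name `mask_pair` and the sample snippet are hypothetical.

```python
import re

def mask_pair(snippet, word_x, word_y):
    """Replace whole-word occurrences of the two query words with X and Y."""
    masked = re.sub(r"\b" + re.escape(word_x) + r"\b", "X", snippet, flags=re.IGNORECASE)
    masked = re.sub(r"\b" + re.escape(word_y) + r"\b", "Y", masked, flags=re.IGNORECASE)
    return masked

# Hypothetical snippet, for illustration only.
print(mask_pair("An ostrich is a large flightless bird.", "ostrich", "bird"))
# prints: An X is a large flightless Y.
```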

7 Method (Lexico-Syntactic Patterns) – 2/4
Given a set S of synonymous word-pairs
–n-grams: n = 2, 3, 4, and 5
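The n-gram step can be sketched as follows, assuming the snippet has already been tokenized and the two words of the pair replaced by the placeholders X and Y. Keeping only n-grams that contain each placeholder exactly once is an assumption; the slide gives only the range of n. The function name `ngram_patterns` is hypothetical.

```python
def ngram_patterns(tokens, n_values=(2, 3, 4, 5)):
    """Collect n-grams (as strings) that contain X and Y exactly once each."""
    patterns = set()
    for n in n_values:
        for start in range(len(tokens) - n + 1):
            gram = tokens[start:start + n]
            if gram.count("X") == 1 and gram.count("Y") == 1:
                patterns.add(" ".join(gram))
    return patterns

print(sorted(ngram_patterns("X and Y are".split())))
# prints: ['X and Y', 'X and Y are']
```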

8 Method (Lexico-Syntactic Patterns) – 3/4
A set S of synonymous word-pairs
–5000 word pairs of synonymous nouns from WordNet
–4,562,471 unique patterns
–80% occur fewer than 10 times
A set of non-synonymous word-pairs
–5000 word pairs of non-synonymous nouns from WordNet
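The statistics above (number of unique patterns, share of rare ones) come from counting how often each pattern occurs across all word pairs. A minimal sketch of that bookkeeping, with hypothetical function names and toy data, could look like this:

```python
from collections import Counter

def pattern_frequencies(patterns_per_pair):
    """Count pattern occurrences over the pattern lists of all word pairs."""
    counts = Counter()
    for patterns in patterns_per_pair:
        counts.update(patterns)
    return counts

def rare_fraction(counts, min_count=10):
    """Fraction of unique patterns that occur fewer than min_count times."""
    if not counts:
        return 0.0
    rare = sum(1 for freq in counts.values() if freq < min_count)
    return rare / len(counts)

# Toy data, for illustration only: one list of extracted patterns per word pair.
toy = [["X is a Y", "X and Y"], ["X and Y"], ["X and Y", "X or Y"]]
counts = pattern_frequencies(toy)
print(len(counts), rare_fraction(counts, min_count=2))
```

With a long tail of rare patterns like the one reported on the slide, some form of frequency-based filtering or ranking is needed before patterns can serve as features.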

9 Method (Lexico-Syntactic Patterns) – 4/4

10 Integrating Patterns and Page Counts

11 Experiments
–WebOverlap (rank=18, weight=2.45)
–WebJaccard (rank=66, weight=0.618)
–WebPMI (rank=138, weight=0.0001)

12 Benchmark Dataset
–Miller-Charles: 28 word-pairs, a subset of the Rubenstein-Goodenough word-pairs

13 Experiments

14 Semantic Similarity

15 Taxonomy-Based Methods

16

17

18 Community Mining
50 personal names from 5 communities:
–tennis players, golfers, actors, politicians, scientists
–10 names from each community
–Evaluation measure: B-CUBED
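B-CUBED scores a clustering item by item: for each name, precision is the share of its cluster that belongs to the same true community, and recall is the share of its true community that landed in the same cluster. The sketch below uses this standard definition, which may differ in detail from the variant used in the paper; the function name and the toy names are hypothetical.

```python
from collections import Counter

def b_cubed(predicted, gold):
    """predicted, gold: dicts mapping each name to a cluster id / true community."""
    names = list(gold)
    cluster_sizes = Counter(predicted[n] for n in names)
    class_sizes = Counter(gold[n] for n in names)
    # Number of names sharing both the same cluster and the same community.
    overlap = Counter((predicted[n], gold[n]) for n in names)

    precision = sum(overlap[(predicted[n], gold[n])] / cluster_sizes[predicted[n]]
                    for n in names) / len(names)
    recall = sum(overlap[(predicted[n], gold[n])] / class_sizes[gold[n]]
                 for n in names) / len(names)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

# Toy example, for illustration only.
gold = {"a": "tennis", "b": "tennis", "c": "golf", "d": "golf"}
predicted = {"a": 0, "b": 0, "c": 0, "d": 1}
print(b_cubed(predicted, gold))
```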

19 Conclusion
Semantic web-related applications
–Community mining
–Relation extraction
–Automatic meta data extraction
–Entity disambiguation
The proposed method combines four page-count-based similarity scores with lexico-syntactic patterns extracted automatically from text snippets.

