Measuring Semantic Similarity between Words Using HowNet ICCSIT 2008 Liuling DAI, Yuning XIA, Bin LIU, ShiKun WU School of Computer Science, Beijing Institute.


1 Measuring Semantic Similarity between Words Using HowNet ICCSIT 2008 Liuling DAI, Yuning XIA, Bin LIU, ShiKun WU School of Computer Science, Beijing Institute of Technology

2 HowNet
W_C = 工夫
DEF = {Ability|能力:host={human|人}}
DEF = {Strength|力量:host={group|群體}{human|人}}
DEF = {time|時間}
Word: 工夫
Concept: {Ability|能力:host={human|人}}
Sememe: Ability|能力
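As a rough illustration of the word / concept / sememe layering on this slide, the sketch below pulls sememes out of a DEF string with a regular expression. The helper names `primary_sememe` and `all_sememes` are hypothetical and are not part of any HowNet API; the pattern only covers the `{English|中文:...}` shape shown on the slides.

```python
import re

# Matches "{English|中文" at the start of any braced unit in a DEF string.
# Assumption: sememes always have the form English|Chinese, as on the slides.
SEMEME = re.compile(r"\{\s*([^|{}:=]+?)\s*\|\s*([^:{}\s,]+)")


def all_sememes(definition: str) -> list[str]:
    """Return every 'English|中文' sememe in a DEF string, in order."""
    return [f"{en}|{zh}" for en, zh in SEMEME.findall(definition)]


def primary_sememe(definition: str) -> str:
    """Return the first (primary) sememe of a DEF string."""
    sememes = all_sememes(definition)
    if not sememes:
        raise ValueError(f"no sememe found in {definition!r}")
    return sememes[0]


# The three concepts listed for the word 工夫 on this slide.
defs = [
    "{Ability|能力:host={human|人}}",
    "{Strength|力量:host={group|群體}{human|人}}",
    "{time|時間}",
]
for d in defs:
    print(primary_sememe(d), all_sememes(d))
```

A regular expression is enough here because only the sememe labels are extracted; recovering the full nesting of a DEF would need a small recursive parser instead.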

3 Algorithms
– Similarity between sememes
– Similarity between concepts
– Similarity between words
– Amendment with thesaurus

4 Similarity between sememes
– Strategy 1
– Strategy 2
d: distance between sememes S1 and S2
h: depth of the first common parent node of the two sememes
α, β: parameters that adjust the contributions of d and h
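The formulas for Strategy 1 and Strategy 2 are not reproduced in this transcript, so the sketch below does not implement either of them. It only shows how d and h can be read off a sememe tree, and then plugs them into one well-known form, e^(-αd)·tanh(βh) from Li, Bandar and McLean (2003), purely as a stand-in. The toy hierarchy, the function names, the choice of root depth 1, and the default α and β are all assumptions made for illustration.

```python
import math

# Hypothetical toy hierarchy (child -> parent); NOT taken from HowNet.
PARENT = {
    "entity": None,
    "thing": "entity",
    "animate": "thing",
    "human": "animate",
    "animal": "animate",
    "time": "entity",
}


def ancestors(node):
    """Path from node up to the root, starting with node itself."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def distance_and_depth(s1, s2):
    """Return (d, h) for two sememes.

    d: number of edges on the path between s1 and s2.
    h: depth of their first common parent (assumption: the root has depth 1).
    """
    path1, path2 = ancestors(s1), ancestors(s2)
    steps_from_s2 = {node: i for i, node in enumerate(path2)}
    for steps_from_s1, node in enumerate(path1):
        if node in steps_from_s2:
            d = steps_from_s1 + steps_from_s2[node]
            h = len(ancestors(node))
            return d, h
    raise ValueError("sememes share no common parent")


def sememe_similarity(s1, s2, alpha=0.2, beta=0.16):
    """Stand-in score: decays with distance d, grows with depth h."""
    d, h = distance_and_depth(s1, s2)
    return math.exp(-alpha * d) * math.tanh(beta * h)


print(distance_and_depth("human", "animal"))
print(sememe_similarity("human", "animal"))
print(sememe_similarity("human", "time"))
```

Whatever the exact formula, the intended behaviour is the same: a shorter path and a deeper common parent should both push the score up.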

5 Similarity between concepts
Word "Doctor"
DEF = {human|人:{own|有:possession={Status|身分:domain={education|教育},modifier={HighRank|高等:degree={most|最}}},possessor={~}}}
– human → primary sememe
– Status, own, … → modifying sememes
– possession, domain, … → descriptors

6 Similarity between concepts
P, Q: two concepts; assume P has the fewer modifying sememes.
P_i, Q_j: the i-th and j-th modifying sememes of P and Q.
S, T: descriptor sets of P and Q.
α, β, γ: weights of the three parts.

7 Similarity between words
One word may have many concepts. The similarity of two words is taken from their most similar pair of concepts.
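The "most similar pair" rule can be sketched as a maximum over all concept pairs. The function name `word_similarity` is hypothetical, the concept-level measure is passed in as a parameter, and returning 0.0 for a word with no concepts is an assumption rather than something the slides state.

```python
from itertools import product
from typing import Callable, Sequence, TypeVar

C = TypeVar("C")


def word_similarity(
    concepts1: Sequence[C],
    concepts2: Sequence[C],
    concept_sim: Callable[[C, C], float],
) -> float:
    """Score two words by their most similar pair of concepts."""
    if not concepts1 or not concepts2:
        return 0.0  # assumption: a word without concepts matches nothing
    return max(concept_sim(p, q) for p, q in product(concepts1, concepts2))


def toy_concept_sim(p: str, q: str) -> float:
    """Toy stand-in: 1.0 for identical primary sememes, else 0.0."""
    return 1.0 if p == q else 0.0


# Primary sememes of the three concepts of 工夫, against a
# hypothetical second word that has a single 'time' concept.
gongfu = ["Ability|能力", "Strength|力量", "time|時間"]
other = ["time|時間"]
print(word_similarity(gongfu, other, toy_concept_sim))
```

Taking the maximum means that one shared sense is enough for two polysemous words to score as similar, even when their other senses are unrelated.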

8 Amendment with thesaurus
Some words are missing from HowNet, and some DEFs are too coarse.
Use the Chinese thesaurus Tongyici Cilin (同義詞詞林).
Note: this should be the extended edition of Tongyici Cilin from the Information Retrieval Lab (IR-Lab) of Harbin Institute of Technology.
d: distance between W1 and W2

9 Similarity between words
Sim1: Eq. 6 (similarity in HowNet)
Sim2: Eq. 7 (similarity in Tongyici Cilin)
α, β, γ, η: parameters that scale the weights of the two parts

10 Evaluation
Dataset
– RG-65: Rubenstein and Goodenough established synonymy judgments for 65 pairs of nouns. They invited 51 human judges to assign every pair a score between 0.0 and 4.0 indicating semantic similarity.
– MC-28: Miller and Charles followed this idea and restricted themselves to 30 pairs of nouns selected from Rubenstein and Goodenough's list, divided equally among words with high, intermediate, and low similarity.
To measure similarity between Chinese words, RG-65 was translated into Chinese manually.

11 Evaluation
Parameters
– Similarity between sememes
  Strategy 1: α = 1.6, β = 0.16
  Strategy 2: α = 0.2, β = 0.16
– Similarity between concepts
  α = 0.54, β = 0.36, γ = 0.1
– Similarity between words
  Chinese dataset: α = 0.95, β = 0.05, γ = 0.95, η = 0.05
  English dataset: α = 0.95, β = 0.05, γ = 0.45, η = 0.55

12 Result
– HAPI: HowNet_Get_Concept_Similarity in the HowNet API

13 Result
In addition, the authors compare their results with eight groups of measures that rely on WordNet.

Table 1. Correlation coefficients of the algorithms

Approach        RG-28    MC-28    RG-65
Hirst-St.Onge   0.671    0.682    0.732
Jiang           0.67     0.682    0.732
Leacock         0.801    0.82     0.852
Lin             0.773    0.814    0.834
Resnik          0.706    0.763    0.8
Yang            0.889    0.921    0.897
Li              0.8914   0.882    N/A
Alvarez         0.9      0.913    N/A
S1-English      0.9238   0.9074   0.8764
S2-English      0.9286   0.9056   0.8744
HAPI-English    0.5371   0.5113   0.6089
S1-Chinese      0.8617   0.8401   0.8958
S2-Chinese      0.8679   0.846    0.895
HAPI-Chinese    0.5328   0.5001   0.6752
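A correlation coefficient of the kind reported in Table 1 compares an algorithm's scores with the human ratings over the same word pairs. The slides do not say which coefficient was used; the sketch below assumes Pearson's r. The function name `pearson` is hypothetical, and the two short lists are made-up illustrative values, not data from RG-65 or MC-28.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up values for illustration only: human ratings on the 0.0-4.0
# scale and algorithm scores in [0, 1] for the same four word pairs.
human_ratings = [3.9, 3.1, 1.2, 0.4]
algorithm_scores = [0.95, 0.70, 0.35, 0.10]
print(pearson(human_ratings, algorithm_scores))
```

Because Pearson's r is unaffected by linear rescaling, the 0.0 to 4.0 human scale and the algorithm's own score range can be compared directly without normalisation.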

14 RG-65

15 MC-30 & RG-30

