1 You Can’t Beat Frequency (Unless You Use Linguistic Knowledge) – A Qualitative Evaluation of Association Measures for Collocation and Term Extraction Joachim Wermter and Udo Hahn Jena University ACL 2006 Regular Conference Paper

2 Objective Compare the performance of frequency, t-test, LSM, and LPM on collocation extraction and domain-specific automatic term recognition
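Of the four measures, frequency and the t-test are standard association measures. As a point of reference, here is a minimal sketch of the t-score for a word pair in its usual textbook form. The function name, argument names, and counts are illustrative assumptions, not taken from the slides, and the variance is approximated by the sample mean, as is common for rare events.

```python
import math


def t_score(f_xy: int, f_x: int, f_y: int, n: int) -> float:
    """t-score of a word pair (x, y) in a corpus of n bigram positions.

    f_xy: co-occurrence frequency of the pair
    f_x, f_y: frequencies of the individual words
    """
    observed = f_xy / n               # sample mean: relative pair frequency
    expected = (f_x / n) * (f_y / n)  # mean under the independence hypothesis
    variance = observed               # approximation: s^2 ~ p for small p
    return (observed - expected) / math.sqrt(variance / n)


# Illustrative counts only (hypothetical corpus).
print(t_score(f_xy=8, f_x=15828, f_y=4675, n=14307668))
```

A frequency baseline simply ranks candidates by `f_xy`; the t-score rescales the same count by how far it lies from the count expected under independence.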

3 Collocation Extraction Extract idioms, e.g. “kick the bucket”

4 Domain-Specific Term Extraction Extract domain-specific phrases, e.g. “mitochondrial inheritance”

5 Corpus

6 LSM (limited syntagmatic modifiability) A “linguistic knowledge-based” method for collocation extraction, proposed by the same authors in another paper. Assumes that idioms are less modifiable by supplements – e.g. “kick the beautiful bucket” is an unlikely variant of the idiom. Formula legend: probability of a PNV (preposition-noun-verb) triple having supplement Supp_k; f(x): frequency of x

7 LSM Modifiability of a PNV triple; probability of a PNV triple; collocation score
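The three LSM quantities named above can be sketched as follows. This is only an illustration under stated assumptions: the supplement probability is taken as a relative frequency (consistent with the legend "f(x): frequency of x"), the aggregation of supplement probabilities into a modifiability value is assumed to be a product, and the collocation score is assumed to be modifiability times triple probability. All names and the toy counts are hypothetical.

```python
from math import prod


def supplement_probs(supp_counts):
    """Relative frequency of each supplement observed with one PNV triple."""
    total = sum(supp_counts.values())
    return {supp: count / total for supp, count in supp_counts.items()}


def lsm_score(triple, counts_by_triple):
    """ASSUMED score: modifiability (product of supplement probabilities)
    multiplied by the relative frequency of the triple."""
    counts = counts_by_triple[triple]
    modifiability = prod(supplement_probs(counts).values())  # assumption
    f_triple = sum(counts.values())
    f_all = sum(sum(c.values()) for c in counts_by_triple.values())
    return modifiability * (f_triple / f_all)


# Hypothetical toy data: supplement type -> frequency, per triple.
# "" stands for "no supplement between the parts of the triple".
toy_counts = {
    "rigid_triple": {"": 9, "adjective": 1},
    "free_triple": {"": 4, "adjective": 3, "determiner": 3},
}

for name in toy_counts:
    print(name, lsm_score(name, toy_counts))
```

Under these assumptions, a triple whose occurrences concentrate on few supplements scores higher than an equally frequent triple that is modified freely.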

8 LPM (limited paradigmatic modifiability) A “linguistic knowledge-based” method for automatic term recognition, proposed by the same authors in another paper. Assumes that the words in a term are less interchangeable – e.g. “mitochondrion inheritance” vs. “money inheritance”. Modifiability of a phrase: mod_k(n-gram): replace k words; sel_i: a particular replacement

9 LPM Phrase Score:
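A rough sketch of how such a phrase score could be computed is given here. It assumes that a "replacement" sel_i wildcards k word slots of the n-gram, that mod_k is the product over all such selections of f(n-gram) / f(sel_i(n-gram)), and that the phrase score is the product of mod_k for k = 1..n. These definitions, the function names, and the toy counts are assumptions for illustration.

```python
from itertools import combinations
from math import prod


def pattern_freq(ngram, slots, ngram_counts):
    """Total frequency of n-grams that agree with `ngram` outside `slots`
    (the positions in `slots` act as wildcards)."""
    return sum(
        count
        for other, count in ngram_counts.items()
        if len(other) == len(ngram)
        and all(o == w for i, (o, w) in enumerate(zip(other, ngram))
                if i not in slots)
    )


def mod_k(ngram, k, ngram_counts):
    """ASSUMED mod_k: product over all ways of replacing k words."""
    f = ngram_counts[ngram]
    return prod(
        f / pattern_freq(ngram, set(slots), ngram_counts)
        for slots in combinations(range(len(ngram)), k)
    )


def p_mod(ngram, ngram_counts):
    """ASSUMED phrase score: product of mod_k for k = 1..n."""
    return prod(mod_k(ngram, k, ngram_counts)
                for k in range(1, len(ngram) + 1))


# Hypothetical bigram counts.
toy_ngrams = {
    ("mitochondrial", "inheritance"): 8,
    ("mitochondrial", "disease"): 2,
    ("new", "inheritance"): 2,
    ("new", "method"): 4,
    ("new", "disease"): 4,
}

for ngram in toy_ngrams:
    print(ngram, p_mod(ngram, toy_ngrams))
```

In this toy setting, the n-gram whose slots are rarely filled by other words receives the highest score, which is the behaviour the interchangeability assumption calls for.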

10 Evaluation Criteria Compared to the baseline frequency ranking, a good ranking function should have four characteristics: 1. Keep the true positives in the upper portion of the list 2. Keep the true negatives in the lower portion of the list 3. Demote true negatives from the upper portion 4. Promote true positives from the lower portion
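The four criteria can be turned into simple counts by comparing a measure's ranking with the frequency ranking. The sketch below is one possible operationalization, assuming a single cut-off index separating the "upper" from the "lower" portion; the cut-off, the function name, and the toy data are assumptions, not details from the slides.

```python
def criteria_counts(baseline, ranked, is_tp, cut):
    """Compare `ranked` against the frequency `baseline` (both best-first).

    is_tp maps each candidate to True (true positive) or False (true negative).
    cut is the ASSUMED index separating upper from lower portion.
    """
    base_upper, new_upper = set(baseline[:cut]), set(ranked[:cut])
    base_lower, new_lower = set(baseline[cut:]), set(ranked[cut:])
    return {
        # 1. TPs that were in the upper portion and stay there
        "kept_tp_upper": sum(is_tp[c] for c in base_upper & new_upper),
        # 2. TNs that were in the lower portion and stay there
        "kept_tn_lower": sum(not is_tp[c] for c in base_lower & new_lower),
        # 3. TNs moved from the upper to the lower portion
        "demoted_tn": sum(not is_tp[c] for c in base_upper & new_lower),
        # 4. TPs moved from the lower to the upper portion
        "promoted_tp": sum(is_tp[c] for c in base_lower & new_upper),
    }


# Hypothetical candidates: a, c, e are true positives.
toy_is_tp = {"a": True, "b": False, "c": True,
             "d": False, "e": True, "f": False}
toy_baseline = ["a", "b", "c", "d", "e", "f"]  # frequency ranking
toy_ranked = ["a", "e", "c", "b", "d", "f"]    # ranking by another measure

print(criteria_counts(toy_baseline, toy_ranked, toy_is_tp, cut=3))
```

A measure that barely reorders the frequency list scores well on criteria 1 and 2 but contributes nothing on criteria 3 and 4, which is why all four counts are reported together.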

11 Collocation Extraction Results

12 Automatic Term Recognition Results

13 Observations CE (collocation extraction) Criterion 1 – t-test and frequency perform similarly – LSM promotes some TPs (true positives) to the top 1/6 ATR (automatic term recognition) Criterion 1 – t-test and frequency perform similarly – LPM promotes a few TPs to the top 1/6

14 Observations CE Criterion 2 – LSM promotes many more TNs (true negatives) to the upper portion than the t-test does (undesirable) ATR Criterion 2 – same as for CE, with LPM in place of LSM

15 Observations CE Criterion 3 – LSM demotes many more TNs to the lower portion than the t-test does ATR Criterion 3 – same as for CE, with LPM in place of LSM

16 Observations CE Criterion 4 – LSM promotes more TPs to the upper portion than the t-test does ATR Criterion 4 – same as for CE, with LPM in place of LSM

18 Conclusion LSM and LPM perform better than the t-test and frequency. Purely statistical methods are worse than knowledge-based methods