You Can’t Beat Frequency (Unless You Use Linguistic Knowledge) – A Qualitative Evaluation of Association Measures for Collocation and Term Extraction Joachim Wermter and Udo Hahn Jena University ACL 2006 Regular Conference Paper
Objective Compare the performance of the frequency, t-test, LSM and LPM methods on collocation extraction and domain-specific automatic term recognition
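For reference, the t-test named in the objective is commonly computed for a word pair as sketched below. This is the standard textbook formulation and not necessarily the exact variant used in the paper; the function name `t_score` and its count arguments are illustrative assumptions.

```python
import math


def t_score(f_xy: int, f_x: int, f_y: int, n: int) -> float:
    """t-score of a word pair (x, y) from raw corpus counts.

    f_xy: frequency of the pair, f_x / f_y: frequencies of the single
    words, n: total number of tokens (all names are illustrative).
    """
    if f_xy == 0:
        return 0.0
    observed = f_xy / n               # sample mean: relative pair frequency
    expected = (f_x / n) * (f_y / n)  # mean expected if x and y were independent
    # The sample variance p(1 - p) is approximated by p, since p is tiny.
    return (observed - expected) / math.sqrt(observed / n)
```

Candidates are then ranked by this score in the same way that the frequency baseline ranks them by `f_xy` alone.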
Collocation Extraction Extract idioms, e.g. “kick the bucket”
Domain-Specific Term Extraction Extract domain-specific phrases, e.g. “mitochondrial inheritance”
Corpus
LSM A “linguistic knowledge-based” method for collocation extraction, proposed by the same authors in another paper. Assumes that idioms are less modifiable by supplements: e.g. “kick the beautiful bucket” is unlikely to occur. Quantities used: the probability of a PNV (preposition-noun-verb) triple having supplement Supp_k; f(x): frequency of x
LSM Quantities defined on this slide: modifiability of a PNV triple; probability of a PNV triple; collocation score
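The LSM formulas themselves are not reproduced here. As a rough illustration of the first ingredient only, the probability of a PNV triple having supplement Supp_k can be estimated as a relative frequency. The sketch below assumes that estimate; the function name, the input layout and the toy counts are hypothetical, and how the paper combines these probabilities into a modifiability score is deliberately left out.

```python
def supplement_probabilities(supp_counts: dict) -> dict:
    """Relative-frequency estimate of p(triple, Supp_k).

    supp_counts maps each supplement observed with one PNV triple to its
    frequency f(triple, Supp_k). The layout is an assumption of this sketch.
    """
    total = sum(supp_counts.values())
    if total == 0:
        return {}
    return {supp: count / total for supp, count in supp_counts.items()}


# Toy, made-up counts for a single triple (not data from the paper).
toy_counts = {"(no supplement)": 18, "beautiful": 1, "old": 1}
toy_probs = supplement_probabilities(toy_counts)
```

A triple whose probability mass sits on very few supplements is, in the sense of the slide, hard to modify.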
LPM A “linguistic knowledge-based” method for automatic term recognition, proposed by the same authors in another paper. Assumes that the words in a phrase are less interchangeable: e.g. for “mitochondrial inheritance”, variants such as “mitochondrion inheritance” or “money inheritance” are unlikely. Modifiability of a phrase: mod_k(n-gram) considers replacing k of the n words; sel_i: one particular replacement
LPM Phrase Score:
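To make the idea of replacing k words concrete, the sketch below enumerates every way of opening k slots in an n-gram and counts how often each resulting pattern occurs. The wildcard notation, the function names and the final ratio are assumptions made for illustration; the ratio is one plausible reading of "less interchangeable", not the phrase score from the slide.

```python
from itertools import combinations

WILDCARD = "*"  # marks a replaced word; notation chosen for this sketch


def selections(ngram: tuple, k: int):
    """Yield each pattern obtained by replacing k words of the n-gram."""
    for slots in combinations(range(len(ngram)), k):
        yield tuple(WILDCARD if i in slots else w for i, w in enumerate(ngram))


def pattern_frequency(pattern: tuple, ngram_counts: dict) -> int:
    """Summed frequency of all same-length n-grams matching the pattern."""
    return sum(
        count
        for ngram, count in ngram_counts.items()
        if len(ngram) == len(pattern)
        and all(p == WILDCARD or p == w for p, w in zip(pattern, ngram))
    )


def selection_ratio(ngram: tuple, pattern: tuple, ngram_counts: dict) -> float:
    """Share of the pattern's frequency contributed by the n-gram itself
    (hypothetical illustration, not the paper's formula)."""
    freq = pattern_frequency(pattern, ngram_counts)
    return ngram_counts.get(ngram, 0) / freq if freq else 0.0
```

A ratio close to 1 means that few other words fill the open slot, which is the behaviour the slide expects of a genuine term.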
Evaluation Criteria Compared to the baseline frequency ranking, a good ranking function should have four characteristics:
1. Keep the true positives in the upper portion of the list
2. Keep the true negatives in the lower portion of the list
3. Demote true negatives from the upper portion
4. Promote true positives from the lower portion
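The four criteria can be checked by comparing where each candidate sits in the baseline ranking and in the ranking under test. The sketch below is one way to count them, assuming both rankings contain the same items, best first, and that the upper portion is a fixed fraction of the list; the function name, the dictionary keys and the default fraction are hypothetical.

```python
def criteria_counts(baseline, candidate, true_positives, upper_fraction=0.5):
    """Count items matching each of the four criteria.

    baseline / candidate: the same items ranked by frequency and by the
    measure under test. Items not in true_positives count as true negatives.
    """
    cut = int(len(baseline) * upper_fraction)
    base_upper, cand_upper = set(baseline[:cut]), set(candidate[:cut])
    tps = set(true_positives)
    counts = {"tp_kept_upper": 0, "tn_kept_lower": 0,
              "tn_demoted": 0, "tp_promoted": 0}
    for item in baseline:
        is_tp = item in tps
        was_upper, now_upper = item in base_upper, item in cand_upper
        if is_tp and was_upper and now_upper:
            counts["tp_kept_upper"] += 1   # criterion 1
        elif not is_tp and not was_upper and not now_upper:
            counts["tn_kept_lower"] += 1   # criterion 2
        elif not is_tp and was_upper and not now_upper:
            counts["tn_demoted"] += 1      # criterion 3
        elif is_tp and not was_upper and now_upper:
            counts["tp_promoted"] += 1     # criterion 4
    return counts
```

Movements in the wrong direction (true positives demoted, true negatives promoted) fall outside these four counters and would need counters of their own if they are to be reported.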
Collocation Extraction Results
Automatic Term Recognition Results
Observations, Criterion 1. CE (collocation extraction): the t-test and frequency methods perform similarly; LSM promotes some TPs (true positives) into the top 1/6. ATR (automatic term recognition): the t-test and frequency methods perform similarly; LPM promotes a few TPs into the top 1/6.
Observations, Criterion 2. CE: LSM promotes far more TNs (true negatives) into the upper portion than the t-test does (undesirable). ATR: the same holds for LPM.
Observations, Criterion 3. CE: LSM demotes far more TNs into the lower portion than the t-test does. ATR: the same holds for LPM.
Observations, Criterion 4. CE: LSM promotes more TPs into the upper portion than the t-test does. ATR: the same holds for LPM.
Conclusion The LSM and LPM methods perform better than the t-test and frequency methods. Purely statistical methods perform worse than linguistic knowledge-based methods.