Published by Jonathan Davis. Modified over 9 years ago.
1
You Can’t Beat Frequency (Unless You Use Linguistic Knowledge) – A Qualitative Evaluation of Association Measures for Collocation and Term Extraction
Joachim Wermter and Udo Hahn
Jena University
ACL 2006 Regular Conference Paper
2
Objective
Compare the performance of the frequency, t-test, LSM, and LPM methods on collocation extraction and domain-specific automatic term recognition
3
Collocation Extraction
Extract idioms, e.g. “kick the bucket”
4
Domain-Specific Term Extraction
Extract domain-specific phrases, e.g. “mitochondrial inheritance”
5
Corpus
6
LSM (limited syntagmatic modifiability)
A “linguistic knowledge-based” method for collocation extraction, proposed by the same authors in another paper
Assumes that idioms are less modifiable by supplements
– e.g. “kick the beautiful bucket”
Probability of a PNV (preposition-noun-verb) triple having supplement Supp_k, where f(x) is the frequency of x
7
LSM
Modifiability of a PNV triple
Probability of a PNV triple
Collocation score
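A minimal Python sketch of how the three quantities named on this slide might fit together. It is an illustration under stated assumptions, not the authors' definition: it assumes the supplement probability is the share of a triple's occurrences that carry a given supplement (with the empty string standing for "no supplement"), that modifiability is the product of those shares, and that the collocation score is modifiability times the triple's relative frequency. The function name and the toy counts are hypothetical.

```python
from math import prod


def collocation_score_sketch(supp_counts, total_triples):
    """Illustrative LSM-style score for one PNV triple.

    supp_counts   -- dict mapping each observed supplement (""
                     for none) to its frequency with this triple
    total_triples -- total frequency of all PNV triples in the corpus

    ASSUMPTION: the three formulas below are one plausible reading
    of the slide, not a confirmed reproduction of the paper.
    """
    f_triple = sum(supp_counts.values())
    # Assumed: p(triple, supp_k) = f(triple, supp_k) / f(triple)
    p_supp = [count / f_triple for count in supp_counts.values()]
    # Assumed: modifiability = product of the supplement probabilities;
    # it is 1.0 when the triple only ever occurs in a single form.
    modifiability = prod(p_supp)
    # Assumed: probability of the triple = its relative frequency.
    p_triple = f_triple / total_triples
    # Assumed: collocation score = modifiability * probability.
    return modifiability * p_triple


# Toy counts (hypothetical): a rigid, idiom-like triple vs. a freely modified one.
rigid = {"": 18, "old": 2}
flexible = {"": 5, "big": 5, "new": 5, "red": 5}
print(collocation_score_sketch(rigid, 1000))
print(collocation_score_sketch(flexible, 1000))
```

Under these assumptions, two triples with the same frequency are separated by how concentrated their supplement distribution is: the rigid triple scores higher than the freely modified one.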
8
LPM (limited paradigmatic modifiability)
A “linguistic knowledge-based” method for automatic term recognition, proposed by the same authors in another paper
Assumes that words in a phrase are less interchangeable
– e.g. “mitochondrion inheritance”, “money inheritance”
Modifiability of a phrase:
mod_k(n-gram): replace k words
sel_i: a particular replacement
9
LPM
Phrase score:
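A minimal Python sketch of the replacement idea behind the phrase score. It is an illustration under stated assumptions rather than the authors' formula: it assumes that for every choice of k word positions (a selection), the n-gram's frequency is divided by the frequency of all n-grams that agree with it outside those positions, and that the phrase score is the product of these ratios over all selections and all k. The function names and the toy counts are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pattern_freq(ngram, slots, counts):
    """Total frequency of n-grams matching `ngram` outside the slot positions."""
    return sum(
        count
        for other, count in counts.items()
        if len(other) == len(ngram)
        and all(o == w for i, (o, w) in enumerate(zip(other, ngram)) if i not in slots)
    )


def phrase_score_sketch(ngram, counts):
    """Illustrative LPM-style score: high when the words are hard to swap out.

    ASSUMPTION: the ratio and the product over selections are one
    plausible reading of the slide, not a confirmed reproduction.
    """
    f_ngram = counts[ngram]
    if f_ngram == 0:
        return 0.0
    score = 1.0
    for k in range(1, len(ngram) + 1):
        # Each selection opens k positions for replacement.
        for slots in combinations(range(len(ngram)), k):
            score *= f_ngram / pattern_freq(ngram, set(slots), counts)
    return score


# Toy bigram counts (hypothetical).
counts = Counter({
    ("mitochondrial", "inheritance"): 8,
    ("mitochondrial", "dna"): 2,
    ("large", "amount"): 8,
    ("large", "number"): 8,
    ("small", "amount"): 8,
    ("large", "house"): 6,
})
print(phrase_score_sketch(("mitochondrial", "inheritance"), counts))
print(phrase_score_sketch(("large", "amount"), counts))
```

In this toy data both bigrams occur 8 times, but the words of “large amount” are replaced more freely, so it receives the lower score.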
10
Evaluation Criteria
Compared to the baseline frequency ranking, a good ranking function should have four characteristics:
1. Keep the true positives (TPs) in the upper portion of the list
2. Keep the true negatives (TNs) in the lower portion of the list
3. Demote true negatives from the upper portion
4. Promote true positives from the lower portion
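The four criteria can be checked mechanically by comparing where each candidate sits in the frequency baseline and in the ranking under test. The Python sketch below is a hedged illustration: the function name, the single `cutoff` separating the upper from the lower portion, and the toy data are assumptions, and the slides do not say exactly how the portions are delimited.

```python
def criteria_counts(baseline, candidate, gold, cutoff):
    """Count items per criterion for one ranking against the baseline.

    baseline, candidate -- the same items, ranked best-first
    gold                -- set of true positives (all other items are true negatives)
    cutoff              -- size of the "upper portion" (assumed split point)
    """
    base_upper = set(baseline[:cutoff])
    cand_upper = set(candidate[:cutoff])
    counts = {"tp_kept_upper": 0, "tn_kept_lower": 0,
              "tn_demoted": 0, "tp_promoted": 0}
    for item in baseline:
        is_tp = item in gold
        was_upper = item in base_upper
        now_upper = item in cand_upper
        if is_tp and was_upper and now_upper:
            counts["tp_kept_upper"] += 1      # criterion 1
        elif not is_tp and not was_upper and not now_upper:
            counts["tn_kept_lower"] += 1      # criterion 2
        elif not is_tp and was_upper and not now_upper:
            counts["tn_demoted"] += 1         # criterion 3
        elif is_tp and not was_upper and now_upper:
            counts["tp_promoted"] += 1        # criterion 4
    return counts


# Toy example (hypothetical data): six candidates, three of them true positives.
baseline = ["a", "b", "c", "d", "e", "f"]
candidate = ["a", "e", "c", "b", "d", "f"]
result = criteria_counts(baseline, candidate, gold={"a", "c", "e"}, cutoff=3)
print(result)
# {'tp_kept_upper': 2, 'tn_kept_lower': 2, 'tn_demoted': 1, 'tp_promoted': 1}
```

Items that move in the undesired direction (TPs demoted, TNs promoted) fall into none of the four counters, so a ranking that violates the criteria shows up as lower totals.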
11
Collocation Extraction Results
12
Automatic Term Recognition Results
13
Observations
CE Criterion 1
– t-test and frequency methods have similar performance
– LSM promotes some TPs to the top 1/6
ATR Criterion 1
– t-test and frequency methods have similar performance
– LPM promotes a few TPs to the top 1/6
14
Observations
CE Criterion 2
– LSM promotes a lot more TNs to the upper portion than the t-test method (bad…)
ATR Criterion 2
– Same as above
15
Observations
CE Criterion 3
– LSM demotes a lot more TNs to the lower portion than t-test
ATR Criterion 3
– Same as above
16
Observations
CE Criterion 4
– LSM promotes more TPs to the upper portion than t-test
ATR Criterion 4
– Same as above
18
Conclusion
LSM and LPM methods are better than t-test and frequency methods
Purely statistical methods are worse than knowledge-based methods