Towards the automatic identification of adjectival scales: clustering adjectives according to meaning Authors: Vasileios Hatzivassiloglou and Kathleen.


1 Towards the automatic identification of adjectival scales: clustering adjectives according to meaning. Authors: Vasileios Hatzivassiloglou and Kathleen R. McKeown. Presenter: Marian Olteanu

2 Introduction. Group adjectives according to their meaning: semantic relatedness holds between adjectives that describe the same property. Goal: adjectival scales. Method: statistical, augmented with linguistic information derived from the corpus.

3 Adjectival scales. Linguistic scale: a set of words of the same grammatical category that can be ordered by their semantic strength or degree of informativeness. Example: lukewarm, warm, hot. For adjectives, the elements of a scale can be partitioned into two groups (negative and positive degrees), and each group is totally ordered.

4 Adjectival scales. Tests for acceptance: Horn: “x even y”. Data sparseness: the test patterns are infrequent in real corpora. Scales vary across domains.

5 Methodology. Four stages: (1) Extract linguistic data (word pairs) from the parsed corpus; a morphological component processes this information to group together similar pairs. (2) Independent similarity modules, each producing a number between 0 and 1.

6 Methodology. Four stages (cont.): (3) A module that combines all the similarity measures into one dissimilarity measure. (4) A module that clusters adjectives into groups based on the dissimilarity measure. Linguistic data: adjective-noun pairs indicate that adjectives are related; adjective-adjective pairs indicate that adjectives are unrelated.

7 Methodology. Adjective-noun pairs: the distribution of nouns and their adjective modifiers. Expectation: similar adjectives tend to modify the same set of nouns. Adjective-adjective pairs: adjectives that describe the same property do not appear in the same minimal NP. Antithetical: hot cold, red black. Non-antithetical: hot warm. Adjectives that modify each other: light blue shirt.
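A minimal sketch of how the two kinds of pairs could be collected, assuming the parser output is already available as minimal noun phrases. The `(word, tag)` representation, the tag names, and the function `extract_pairs` are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from itertools import combinations

# Toy minimal noun phrases as (word, tag) lists; the tag set is hypothetical.
noun_phrases = [
    [("hot", "ADJ"), ("water", "NOUN")],
    [("cold", "ADJ"), ("water", "NOUN")],
    [("light", "ADJ"), ("blue", "ADJ"), ("shirt", "NOUN")],
]

def extract_pairs(phrases):
    """Return adjective-noun counts and the set of adjective pairs that co-occur in one NP."""
    adj_noun = Counter()
    adj_adj = set()
    for phrase in phrases:
        adjectives = [word for word, tag in phrase if tag == "ADJ"]
        nouns = [word for word, tag in phrase if tag == "NOUN"]
        # Evidence of relatedness: which nouns each adjective modifies.
        for adjective in adjectives:
            for noun in nouns:
                adj_noun[(adjective, noun)] += 1
        # Evidence of unrelatedness: adjectives sharing a minimal NP.
        for a, b in combinations(sorted(set(adjectives)), 2):
            adj_adj.add((a, b))
    return adj_noun, adj_adj

adj_noun, adj_adj = extract_pairs(noun_phrases)
```

Here "light" and "blue" end up in `adj_adj` because they share a noun phrase, while "hot" and "cold" both contribute counts for "water".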

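8 Computing similarity between adjectives. Adjective-noun pairs: a robust non-parametric method, Kendall's τ coefficient for two random variables with paired observations. $(X_i, Y_i)$ and $(X_j, Y_j)$ are two pairs of observations for adjectives X and Y on the nouns i and j. The pairs are concordant if $X_i > X_j$ and $Y_i > Y_j$, or $X_i < X_j$ and $Y_i < Y_j$. They are discordant if $X_i > X_j$ and $Y_i < Y_j$, or $X_i < X_j$ and $Y_i > Y_j$. $\tau = p_c - p_d$, where $p_c$ and $p_d$ are the probabilities that two pairs of observations are concordant and discordant. Unbiased estimator: $T = (C - Q) / \binom{n}{2}$, where C and Q are the numbers of concordant and discordant pairs among the $\binom{n}{2}$ pairs formed from n observations.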
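The concordant/discordant counting can be sketched directly in Python. The function name `kendall_tau` and the toy count vectors are assumptions for illustration; tied observations are counted as neither concordant nor discordant, which is one common convention and may differ from the treatment in the paper.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Estimate Kendall's tau as (concordant - discordant) / (n choose 2)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 observations")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1      # both adjectives rank nouns i and j the same way
        elif sign < 0:
            discordant += 1      # the adjectives rank nouns i and j in opposite ways
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical counts of how often two adjectives modify each of three nouns.
counts_x = [1, 2, 3]
counts_y = [1, 3, 2]
tau = kendall_tau(counts_x, counts_y)  # 2 concordant, 1 discordant pair -> 1/3
```

The result lies in [-1, 1]; how the similarity module maps it to the range between 0 and 1 is not shown in the slides.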

9 Methodology. Adjective-adjective pairs: reject pairs that occur in the same NP; high accuracy, low coverage. Combining similarity estimates: if the pair was rejected by the adjective-adjective module, dissimilarity = k (usually 10); else, dissimilarity = 1 − (similarity from the adjective-noun module).
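The combination rule on this slide is small enough to write out. The sketch below assumes similarities are stored in a dictionary keyed by alphabetically sorted adjective pairs and are already scaled to [0, 1]; the names `dissimilarity`, `similarity`, and `rejected_pairs`, and the example values, are hypothetical.

```python
def dissimilarity(a, b, similarity, rejected_pairs, k=10.0):
    """Combine both modules: k for rejected pairs, 1 - similarity otherwise."""
    pair = tuple(sorted((a, b)))       # order-independent key
    if pair in rejected_pairs:
        return k                       # the adjectives co-occurred in one NP
    return 1.0 - similarity[pair]      # similarity comes from the adjective-noun module

# Hypothetical module outputs.
similarity = {("hot", "warm"): 0.8}
rejected_pairs = {("blue", "light")}

d_related = dissimilarity("warm", "hot", similarity, rejected_pairs)     # about 0.2
d_rejected = dissimilarity("light", "blue", similarity, rejected_pairs)  # 10.0
```

Because ordinary dissimilarities stay at or below 1, a value of k = 10 acts as a strong penalty against putting a rejected pair in the same cluster.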

10 Clustering the adjectives. Goal: optimal partition. Algorithm: non-hierarchical; the number of clusters in the partition is an input parameter; exchange method (K-means is not applicable); minimizes the objective function Φ.

11 Clustering the adjectives. Algorithm (cont.): start from a random partition; compute the improvement obtained by moving an adjective to a different cluster; hill-climbing. Local minima: call the algorithm multiple times with different starting positions.
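A rough sketch of an exchange-style hill climber with random restarts follows. The slides do not give the formula for Φ, so the objective used here (for each cluster, the sum of pairwise dissimilarities divided by the cluster size) is an assumption, as are all function names and the toy dissimilarity.

```python
import random

def objective(clusters, d):
    """Assumed form of the objective: per-cluster pairwise dissimilarity sum / cluster size."""
    total = 0.0
    for members in clusters:
        if len(members) > 1:
            within = sum(d(a, b) for i, a in enumerate(members) for b in members[i + 1:])
            total += within / len(members)
    return total

def groups(items, assign, m):
    """Turn an item -> cluster index mapping into a list of m clusters."""
    out = [[] for _ in range(m)]
    for x in items:
        out[assign[x]].append(x)
    return out

def exchange_once(items, m, d, rng):
    """Hill-climb from one random partition by moving single items between clusters."""
    assign = {x: rng.randrange(m) for x in items}
    best = objective(groups(items, assign, m), d)
    improved = True
    while improved:
        improved = False
        for x in items:
            home = assign[x]
            for c in range(m):
                if c == home:
                    continue
                assign[x] = c
                score = objective(groups(items, assign, m), d)
                if score < best - 1e-12:   # accept only strict improvements
                    best, home, improved = score, c, True
            assign[x] = home               # keep the best cluster found for x
    return groups(items, assign, m), best

def cluster_adjectives(items, m, d, restarts=10, seed=0):
    """Run the hill climber from several random starts and keep the best partition."""
    rng = random.Random(seed)
    runs = [exchange_once(items, m, d, rng) for _ in range(restarts)]
    return min(runs, key=lambda run: run[1])

# Toy dissimilarity: 0.1 within a hypothetical true group, 0.9 across groups.
group_of = {"hot": 0, "warm": 0, "cold": 0, "big": 1, "large": 1, "huge": 1}
toy_d = lambda a, b: 0.1 if group_of[a] == group_of[b] else 0.9
clusters, score = cluster_adjectives(list(group_of), 2, toy_d)
```

Recomputing the full objective for every candidate move keeps the sketch short; a practical version would update the score incrementally.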

12 Results

13 Clusters #5 and #8: adjectives that indicate size. Clustering discourages large clusters; cluster #6: 5 words. Methods to increase the number of pairs: a larger corpus; more syntactic patterns.

14 Evaluation. 9 human judges manually created partitions (6 to 11 clusters). “Cross-validation” among the human judges: F-measure of 49% to 59%.

15 Evaluation. Lower bound: Monte Carlo analysis. F-measure: 1 in 20,000 trials. Fallout: 4.9%.
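The slides report F-measure and fallout without showing how two partitions are compared. One plausible reading, sketched below as an assumption, scores a system partition against a reference partition over adjective pairs: a pair is positive when both adjectives share a cluster. The function names and toy partitions are hypothetical.

```python
from itertools import combinations

def same_cluster_pairs(partition):
    """All unordered pairs of items that share a cluster."""
    pairs = set()
    for cluster in partition:
        for a, b in combinations(sorted(cluster), 2):
            pairs.add((a, b))
    return pairs

def pairwise_scores(system, reference):
    """Pair-based precision, recall, F-measure and fallout of system vs. reference."""
    n_items = sum(len(cluster) for cluster in reference)
    all_pairs = n_items * (n_items - 1) // 2
    sys_pairs = same_cluster_pairs(system)
    ref_pairs = same_cluster_pairs(reference)
    tp = len(sys_pairs & ref_pairs)    # pairs grouped together in both partitions
    fp = len(sys_pairs - ref_pairs)    # grouped by the system only
    fn = len(ref_pairs - sys_pairs)    # grouped by the reference only
    tn = all_pairs - tp - fp - fn      # separated in both partitions
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    fallout = fp / (fp + tn) if fp + tn else 0.0
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "fallout": fallout}

# Hypothetical partitions of five adjectives.
reference = [{"hot", "warm", "cold"}, {"big", "large"}]
system = [{"hot", "warm"}, {"cold", "big", "large"}]
scores = pairwise_scores(system, reference)
```

Under this reading, a Monte Carlo lower bound would score many random partitions the same way and compare their F-measures with the system's.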

