
Detecting compositionality using semantic vector space models based on syntactic context
Guillermo Garrido and Anselmo Peñas
NLP & IR Group at UNED, Madrid, Spain
{ggarrido,anselmo}@lsi.uned.es
Shared Task System Description
ACL-HLT 2011 Workshop on Distributional Semantics and Compositionality (DiSCo 2011)
June 24, Portland, US

UNED nlp.uned.es

Outline
1. About our participation
2. About the baselines

Hypotheses
1. Non-compositional compounds are units of meaning.
2. The meaning of a compound should differ from the meaning of its head.
Only partially true: this does not cover all cases of non-compositionality.
For similar approaches, see (Baldwin et al., 2003; Katz and Giesbrecht, 2006; Mitchell and Lapata, 2010).

Example
[image of a hot dog] ≠ [image of a dog]

Compositional example
[image not reproduced]

Approach
1. Lexico-syntactic contexts obtained from a large corpus (ukWaC)
2. A compound represented as a set of vectors in different vector spaces
3. A classifier that models compositionality
Participation restricted to adjective-noun relations in English

Lexico-syntactic contexts
- Dependency trees are matched against a set of pre-specified syntactic patterns, similarly to (Padó and Lapata, 2007)
- Each context is weighted by its frequency in the collection
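A minimal Python sketch of this kind of context extraction, assuming each parsed sentence arrives as lemma and POS lists plus (head, label, dependent) dependency edges. The label names in `PATTERNS`, the one-letter POS codes and `extract_contexts` are hypothetical illustrations, not the parser or the pattern set the authors used.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, List, Tuple

# A dependency edge: (head_index, relation_label, dependent_index).
Edge = Tuple[int, str, int]

# Hypothetical pattern table: parser label (with the compound's noun as the
# dependent) -> syntactic context type. Labels are illustrative only.
PATTERNS: Dict[str, str] = {
    "nsubj": "subj_of",
    "dobj": "obj_of",
    "iobj": "iobj_of",
    "nsubjpass": "pass_subj_of",
}


def extract_contexts(
    lemmas: List[str],
    pos: List[str],
    edges: List[Edge],
    counts: DefaultDict[str, Counter],
) -> None:
    """Add (context_type, context_word) counts for every adjective-noun pair."""
    for head, rel, dep in edges:
        # An adjective modifying a noun forms a candidate compound.
        if rel != "amod" or pos[dep] != "a" or pos[head] != "n":
            continue
        compound = f"{lemmas[dep]} {lemmas[head]}"
        # Match every edge in which the compound's noun is the dependent.
        for gov, rel2, dep2 in edges:
            if dep2 == head and rel2 in PATTERNS:
                context_word = f"{lemmas[gov]}:{pos[gov]}"
                counts[compound][(PATTERNS[rel2], context_word)] += 1


counts: DefaultDict[str, Counter] = defaultdict(Counter)
# "They eat a hot dog", tokens indexed 0..4.
extract_contexts(
    ["they", "eat", "a", "hot", "dog"],
    ["p", "v", "d", "a", "n"],
    [(1, "nsubj", 0), (1, "dobj", 4), (4, "det", 2), (4, "amod", 3)],
    counts,
)
print(counts["hot dog"])  # Counter({('obj_of', 'eat:v'): 1})
```

Summing these counts over the whole corpus gives frequency vectors of the kind shown in the hot dog example.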

Which contexts? (Adjective + Noun)
Syntactic dependencies between the compound and a context word:
- subject of
- object of
- indirect object of
- passive logical subject of
- passive subject of
- has prepositional complement
- modifies
- is modified by
- subject of "to be" with predicate
- predicate of "to be" with subject
- has possessive modifier
- is possessive modifier of
- and a few more…

A compound as a set of vectors
- One vector space for each syntactic dependency
- Each adjective-noun compound $a\,n$ has a vector in each space
- Each compound vector is compared to the vector of its complementary
Complementary of $a\,n$: the set of all adjective-noun pairs with the same noun $n$ but a different adjective $b$:
$a^{c}\,n = \{\, b\,n \mid b \neq a \,\}$
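One way to build the complementary vector in a single space, sketched in Python. It assumes the complementary set is aggregated by summing the context counts of all pairs that share the noun; the slides do not say how the set is combined. `complementary_vector` is a hypothetical helper, and apart from the hot dog counts (taken from the example table) the counts are toy values.

```python
from collections import Counter
from typing import Dict, Tuple

# One vector space: (adjective, noun) -> sparse context-count vector.
Space = Dict[Tuple[str, str], Counter]


def complementary_vector(space: Space, adjective: str, noun: str) -> Counter:
    """Sum (an assumed aggregation) the vectors of all pairs (b, noun), b != adjective."""
    total: Counter = Counter()
    for (adj, n), vector in space.items():
        if n == noun and adj != adjective:
            total.update(vector)  # Counter.update adds counts
    return total


# Toy object-of space; the "hot dog" counts follow the example table.
obj_space: Space = {
    ("hot", "dog"): Counter({"skewer:v": 26, "eat:v": 9}),
    ("big", "dog"): Counter({"walk:v": 3, "feed:v": 2}),
    ("old", "dog"): Counter({"walk:v": 1, "eat:v": 2}),
    ("hot", "day"): Counter({"enjoy:v": 4}),
}
hot_c_dog = complementary_vector(obj_space, "hot", "dog")
print(hot_c_dog)  # Counter({'walk:v': 4, 'feed:v': 2, 'eat:v': 2})
```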

Example of vectors: hot dog
Syntactic relation   Context word   Frequency
an_obj               skewer:v       26
                     eat:v          9
                     buy:v          4
                     get:v          4
                     sell:v         4
                     want:v         4
                     …
ann                  stand:n        14
                     NAME           11
                     stall:n        5

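Approach
In each vector space (Subj-of, Obj-of, …), compute the cosine between the compound's vector and the vector of its complementary, e.g. cos(hot dog, hot^c dog). Each compound becomes one row of cosine features, labelled with its compositionality value:
- hot dog: [cos_Subj-of(hot dog, hot^c dog), cos_Obj-of(hot dog, hot^c dog), …] → compositionality value 1
- blue chip: [cos_Subj-of(blue chip, blue^c chip), cos_Obj-of(blue chip, blue^c chip), …] → compositionality value 2
- …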
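The per-space features can be sketched in Python as follows. This assumes sparse count vectors stored as `Counter`s and raw frequencies, since the slides specify no weighting scheme. `cosine` and `feature_row` are hypothetical helpers, and returning 0.0 for an empty vector is our own choice.

```python
import math
from collections import Counter
from typing import Dict, List


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(count * v[key] for key, count in u.items())  # missing keys count as 0
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def feature_row(
    compound: Dict[str, Counter], complement: Dict[str, Counter], spaces: List[str]
) -> List[float]:
    """One cosine feature per vector space: cos(compound, complementary)."""
    return [cosine(compound.get(s, Counter()), complement.get(s, Counter())) for s in spaces]


spaces = ["subj_of", "obj_of"]
hot_dog = {"obj_of": Counter({"eat:v": 3, "buy:v": 4})}  # toy counts
hot_c_dog = {"obj_of": Counter({"eat:v": 4, "walk:v": 3}), "subj_of": Counter({"bark:v": 5})}
print(feature_row(hot_dog, hot_c_dog, spaces))  # [0.0, 0.48]
```

Stacking one such row per training compound, with its gold compositionality score as the target, yields the input for the learned model.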

Why?
We do not know a priori the weight of each syntactic position, so we let a learned model weight the per-space features. We can also study this as a feature selection problem.

Feature selection
A genetic algorithm was used for feature selection.
Discarded:
- prepositional complexes
- noun complexes
- indirect object
- subject or attribute of the verb "to be"
- governor of a possessive
Among those selected:
- subjects and objects of both active and passive constructions
- dependents of possessives
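A compact Python sketch of genetic-algorithm feature selection over the syntactic spaces. The operators (binary tournament selection, one-point crossover, bit-flip mutation, elitism), all parameter values and the toy `toy_fitness` function are assumptions. In a system like the one described, the fitness would presumably be the regression quality obtained with the selected spaces.

```python
import random
from typing import Callable, List, Tuple

Mask = Tuple[bool, ...]  # True = keep that feature (vector space)


def ga_select(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 20,
    generations: int = 30,
    mutation_rate: float = 0.1,
    seed: int = 0,
) -> Tuple[Mask, List[float]]:
    """Return the best mask found and the best fitness after each generation."""
    rng = random.Random(seed)
    population: List[Mask] = [
        tuple(rng.random() < 0.5 for _ in range(n_features)) for _ in range(pop_size)
    ]
    best = max(population, key=fitness)
    history = [fitness(best)]

    def pick() -> Mask:
        # Binary tournament selection from the current population.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children: List[Mask] = [best]  # elitism: keep the best mask so far
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: XOR each bit with a biased coin.
            child = tuple(bit != (rng.random() < mutation_rate) for bit in child)
            children.append(child)
        population = children
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate
        history.append(fitness(best))
    return best, history


# Toy fitness (hypothetical): reward features 0, 1 and 4, penalise the rest.
USEFUL = {0, 1, 4}


def toy_fitness(mask: Mask) -> float:
    gain = sum(1 for i in USEFUL if mask[i])
    cost = sum(1 for i, keep in enumerate(mask) if keep and i not in USEFUL)
    return gain - 0.5 * cost


best, history = ga_select(8, toy_fitness)
print([i for i, keep in enumerate(best) if keep])
```

Elitism guarantees that the best fitness never decreases from one generation to the next.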

Classifiers
- Numeric evaluation task: a regression model learned with an SVM
- Coarse scores: the numeric scores were binned by dividing the score range into three equally sized parts
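The coarse-score binning can be sketched as below. It assumes numeric scores on a 0-100 scale and the labels "low", "medium" and "high"; both the label names and `coarse_label` are assumptions for illustration.

```python
def coarse_label(score: float, lo: float = 0.0, hi: float = 100.0) -> str:
    """Map a numeric score to one of three equal-width bins over [lo, hi]."""
    width = (hi - lo) / 3
    if score < lo + width:
        return "low"
    if score < lo + 2 * width:
        return "medium"
    return "high"


print([coarse_label(s) for s in (10, 50, 68.4, 100)])  # ['low', 'medium', 'high', 'high']
```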

Results (numeric ADJ-N task)
Run               Avg. point distance   Spearman's ρ   Kendall's τ
UoY: Pro-Best     14.62                 0.33           0.23
UCPH-simple.en    14.93                 0.27           0.18
UoY: Exm-Best     15.19                 0.35           0.24
UoY: Exm          15.82                 0.26           0.18
(Not directly comparable: the correlations above are for all phrases, those below for ADJ_NN only.)
RUN-SCORE-1       17.016 [5th]          0.267 [8th]    0.179 [9th]
RUN-SCORE-2       17.180 [6th]          0.219 [11th]   0.145 [11th]
RUN-SCORE-3       17.289 [7th]          0.189 [12th]   0.129 [12th]
0-response        24.67                 –              –
Random            34.57                 (0.02)

Outline
1. About our participation
2. About the baselines

About the baselines
The training set is biased: average score = 68.4, standard deviation = 21.7.
A simple baseline can exploit this: output, for every sample, the average score over the training set.
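A small Python illustration of the training-average baseline and of the average point distance used to score it. The scores are toy values, and `average_point_distance` assumes the metric is the mean absolute difference between gold and predicted scores; both helper names are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def average_point_distance(gold: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute difference between gold and predicted scores."""
    return mean(abs(g - p) for g, p in zip(gold, predicted))


def training_average_baseline(train_scores: Sequence[float], n_items: int) -> List[float]:
    """Predict the mean training score for every test item."""
    avg = mean(train_scores)
    return [avg] * n_items


train = [80, 60, 70, 90, 50]  # toy training scores, mean 70
gold = [75, 65, 40]           # toy test scores
predictions = training_average_baseline(train, len(gold))
print(average_point_distance(gold, predictions))
```

The more the gold scores cluster around the training mean, the lower (better) the distance this constant predictor achieves.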

Results compared to the baselines
Run               Avg. point distance   Spearman's ρ   Kendall's τ
RUN-SCORE-1       17.016                0.267          0.179
RUN-SCORE-2       17.180                0.219          0.145
RUN-SCORE-3       17.289                0.189          0.129
Training average  17.370                –              –
0-response        24.67                 –              –
Random            34.57                 (0.02)

About the baselines
So, in addition to the paper baselines:
- 0-response: always return the score 0.5 (the midpoint of the scale, i.e. 50 on the 0-100 scale)
- Random: return a score drawn uniformly at random between 0 and 100
we propose:
- Training average: return the average of the scores available for training (68.412)
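The two paper baselines can be sketched in Python as follows. We assume scores on a 0-100 scale, so the 0-response score of 0.5 on a unit scale becomes 50. `zero_response` and `random_baseline` are hypothetical names, and the seed only makes runs repeatable.

```python
import random
from typing import List


def zero_response(n_items: int) -> List[float]:
    """Always return the mid-scale score (assumed 50 on a 0-100 scale)."""
    return [50.0] * n_items


def random_baseline(n_items: int, seed: int = 0) -> List[float]:
    """Draw each score uniformly at random from [0, 100]."""
    rng = random.Random(seed)
    return [rng.uniform(0.0, 100.0) for _ in range(n_items)]


print(zero_response(3))         # [50.0, 50.0, 50.0]
print(len(random_baseline(5)))  # 5
```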

Conclusions
- Modest results in the task: 5th best of a total of 17 valid systems in average point difference
- But only slightly better than the training-average baseline (17.016 vs. 17.370)
- Worse in terms of ranking correlation scores: we optimized for point difference
Did we learn anything? Did we confirm our hypotheses?
- Not all syntactic contexts participate in capturing meaning

Conclusions
- Average point difference has a strong baseline that exploits the sample bias.
- In hindsight, we believe the ranking correlation measures are more sensible quality measures than point difference for this particular task.
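To illustrate the point, the Python sketch below (our own illustration, with toy scores) computes Spearman's ρ as the Pearson correlation of average ranks, with ties sharing the same rank. A constant prediction such as the training average can reach a reasonable point distance, yet its ranking correlation is undefined because every item receives the same rank. `average_ranks` and `spearman` are hypothetical helpers.

```python
import math
from typing import List, Optional, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Spearman's rho; None when either ranking is constant (undefined)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return None
    return cov / math.sqrt(vx * vy)


gold = [75, 65, 40, 90]
print(spearman(gold, [68.4] * 4))        # None
print(spearman(gold, [70, 60, 30, 95]))  # 1.0
```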

Thanks! Got questions?

Photo credits
- Dog's face: http://schaver.com/?p=87
- Hot dog: http://www.flickr.com/photos/bk/3829486195/
- Hot-dog dog: http://gawker.com/5380716/hot-dogs-in-the-hallway-of-wealth

Numerical scores
System                Resp.  ρ      τ      all    ADJ    SBJ    OBJ
0-response baseline   0      –      –      23.42  24.67  17.03  25.47
random baseline       174    -0.02         32.82  34.57  29.83  32.34
UCPH-simple.en        174    0.27   0.18   16.19  14.93  21.64  14.66
UoY: Exm-Best         169    0.35   0.24   16.51  15.19  15.72  18.6
UoY: Pro-Best         169    0.33   0.23   16.79  14.62  18.89  18.31
UoY: Exm              169    0.26   0.18   17.28  15.82  18.18  18.6
SCSS-TCD: conf1       174    0.27   0.19   17.95  18.56  20.8   15.58
SCSS-TCD: conf2       174    0.28   0.19   18.35  19.62  20.2   15.73
Duluth-1              174    -0.01         21.22  19.35  26.71  20.45
JUCSE-1               174    0.33   0.23   22.67  25.32  17.71  22.16
JUCSE-2               174    0.32   0.22   22.94  25.69  17.51  22.6
SCSS-TCD: conf3       174    0.18   0.12   25.59  24.16  32.04  23.73
JUCSE-3               174    -0.04  -0.03  25.75  30.03  26.91  19.77
Duluth-2              174    -0.06  -0.04  27.93  37.45  17.74  21.85
Duluth-3              174    -0.08  -0.05  33.04  44.04  17.6   28.09
submission-ws         173    0.24   0.16   44.27  37.24  50.06  49.72
submission-pmi        96     –      –      –      –      52.13  50.46
UNED-1: NN            77     0.267  0.179  –      17.02  –      –
UNED-2: NN            77     0.219  0.145  –      17.18  –      –
UNED-3: NN            77     0.189  0.129  –      17.29  –      –
(Resp.: number of responses; ρ, τ: Spearman and Kendall correlations; all, ADJ, SBJ, OBJ: average point distance per phrase type.)
