Mono- and bilingual modeling of selectional preferences Sebastian Padó Institute for Computational Linguistics Heidelberg University (joint work with Katrin.


1 Mono- and bilingual modeling of selectional preferences Sebastian Padó Institute for Computational Linguistics Heidelberg University (joint work with Katrin Erk, Ulrike Pado, Yves Peirsman)

2 Some context
Computational lexical semantics: modeling the meaning of words and phrases
Distributional approach: observe the usage of words in corpora
Robustness: broad coverage, manageable complexity
Flexibility: corpus choice determines model
[Figure labels: Knowledge, Corpus]

3 Structure
Methods: Distributional semantics
Phenomena: Semantic relations in bilingual dictionaries
Application: Predictions of plausibility judgments

4 Plausibility of Verb-Relation-Argument Triples
Verb   Relation   Argument   Plausibility
eat    subject    customer   6.9
eat    object     customer   1.5
eat    subject    apple      1.0
eat    object     apple      6.4
Central aspect of language
Selectional preferences [Katz & Fodor 1963, Wilks 1975]
Generalization of lexical similarity
Incremental language processing [McRae & Matsuki 2009]
Disambiguation [Toutanova et al. 2005], applicability of inference rules [Pantel et al. 2007], SRL [Gildea & Jurafsky 2002]

5 Modelling Plausibility
Approximating plausibility by frequency
Two lexical variables: frequency of most triples is zero
Implausibility or sparse data?
Generalization based on an ontology (WordNet) [Resnik 1996]
Generalization based on vector space [Erk, Padó, and Padó 2010]
Counts in an English corpus:
(eat, obj, apple)       100   highly plausible
(eat, obj, hat)           1   somewhat plausible
(eat, obj, telephone)     0   ?
(eat, obj, caviar)        0   ?

6 Semantic Spaces
Characterization of word meaning through profile over occurrence contexts [Salton, Wang, and Yang 1974, Landauer & Dumais 1997, Schütze 1998]
Geometrically: vector in high-dimensional space
High vector similarity implies high semantic similarity
Nearest neighbors = synonyms
Example (Fr):
              cultiver   rouler
mandarine     5          1
clémentine    4          1
voiture       1          20
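A minimal sketch of the nearest-neighbor idea on this slide. It assumes cosine as the similarity measure (the slide does not name one) and uses toy co-occurrence counts over the two contexts cultiver and rouler, chosen for illustration only. The names `cosine`, `nearest_neighbors`, and `space` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy counts (assumed for illustration): [count with "cultiver", count with "rouler"]
space = {
    "mandarine": [5, 1],
    "clémentine": [4, 1],
    "voiture": [1, 20],
}


def nearest_neighbors(word, space):
    """Rank all other words in the space by cosine similarity to `word`."""
    target = space[word]
    scored = [(other, cosine(target, vec))
              for other, vec in space.items() if other != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for neighbor, similarity in nearest_neighbors("mandarine", space):
    print(neighbor, round(similarity, 3))
```

With these counts the two fruit words end up far closer to each other than either is to voiture, which is the geometric intuition the slide relies on.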

7 Similarity-based generalization [Pado, Pado & Erk 2010]
Plausibility is average vector space similarity to seen arguments
(v, r, a): verb – relation – argument head word triple
seenargs: set of argument head words seen in the corpus
wt: weight function
Z: normalization constant
sim: semantic (vector space) similarity
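One way to read the definitions on this slide is Pl(v, r, a) = sum over a' in seenargs(v, r) of wt(a') / Z * sim(a, a'). The sketch below implements that reading under stated assumptions: wt is the corpus frequency of the seen head word, Z is the sum of the weights, and sim is cosine similarity. None of these choices is confirmed by the transcript. The seen-argument counts for (eat, obj) reuse the corpus example from slide 5; the two-dimensional vectors are invented for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def plausibility(verb, relation, argument, seen_args, vectors):
    """Weighted average similarity of `argument` to the seen head words.

    seen_args maps (verb, relation) to {head word: weight}.
    Assumption: the weight is a corpus frequency and Z is the sum of weights.
    """
    weights = seen_args.get((verb, relation), {})
    z = sum(weights.values())
    if z == 0:
        return 0.0  # nothing observed for this verb and relation
    total = sum(wt * cosine(vectors[argument], vectors[seen])
                for seen, wt in weights.items())
    return total / z


# Seen objects of "eat" with counts taken from the slide 5 example.
seen_args = {("eat", "obj"): {"apple": 100, "hat": 1}}

# Hypothetical vectors, invented for this sketch.
vectors = {
    "apple": [9, 1],
    "hat": [2, 6],
    "caviar": [7, 2],
    "telephone": [1, 9],
}

for candidate in ("caviar", "telephone"):
    print(candidate, round(plausibility("eat", "obj", candidate, seen_args, vectors), 3))
```

Both candidates have a corpus frequency of zero, but the unseen word whose vector lies close to the frequent seen objects receives the higher score, which is how the model separates implausibility from sparse data.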

8 Geometrical interpretation
[Figure: vector space showing seen subjects of “eat” (Peter, husband, child), seen objects of “eat” (orange, apple, breakfast), and the unseen candidates caviar and telephone]

9 Choice of contexts
The vector space must have an appropriate topology
Word-based contexts: space represents “topical similarity”
Dependency-based contexts: space represents finer-grained “participation-based similarity”

10 Evaluation
Triples with human plausibility ratings [McRae et al. 1996]
Evaluation: correlation of model predictions with human judgments
Spearman’s ρ = 1: perfect correlation; ρ = 0: no correlation
Result: vector space model almost attains the quality of the “deep” model, at 98% coverage
Model                                 Coverage   Spearman’s rho
Resnik 1996 [ontology-based]          100%       0.123 n.s.
EPP [vector space-based]              98%        0.325 ***
U. Pado et al. 2006 [“deep” model]    78%        0.415 ***
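For readers who want to reproduce this kind of evaluation, here is a small self-contained sketch of Spearman's rank correlation, computed as the Pearson correlation of average ranks so that ties are handled. The human ratings are the four example triples from slide 4; the model scores are hypothetical numbers invented for the example, not output of any model in the talk.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Human ratings for the four triples on slide 4.
human = [6.9, 1.5, 1.0, 6.4]
# Hypothetical model scores for the same triples (invented for illustration).
model = [0.8, 0.3, 0.1, 0.9]

print(spearman_rho(human, model))
```

Because only ranks enter the computation, the model scores do not need to be on the same scale as the human ratings.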

11 From one to many languages…
Vector space model reduces the need for language resources to predict plausibility judgments
No ontologies
Still necessary: observations of triples, target words
Large, accurately parsed corpus
Problematic for basically all languages except English
Can we extend our strategy to new languages?
Resnik [Brockmann & Lapata 2002]   TIGER + GermaNet   ρ=.37
EPP [Pado & Peirsman 2010]         HGC                ρ=.33

12 Predicting plausibility for new languages
Transfer with a bilingual lexicon [Koehn and Knight 2002]
Cross-lingual knowledge transfer
Print dictionaries are problematic
Instead: acquire from distributional data
Example: (cultiver, Obj, pomme)
Lexicon: cultiver – grow, pomme – apple
English model (trained on English corpus): (grow, obj, apple) is highly plausible

13 Bilingual semantic space
Joint semantic space for words from both languages [Rapp 1995, Fung & McKeown 1997]
Dimensions are bilingual word pairs, can be bootstrapped
Frequencies observable from comparable corpora
Nearest neighbors: cross-lingual synonyms, i.e. translations
Dimensions: (cultiver, grow), (rouler, drive)
                  cultiver/grow   rouler/drive
mandarine (Fr)    5               1
mandarin (E)      4               2
car (E)           1               20

14 Bootstrapping dimensions

15 Nearest neighbors in bilingual space
Similar usages / context profiles do not necessarily indicate synonymy
Dimensions: (cultiver, grow), (rouler, drive)
              cultiver/grow   rouler/drive
pear (E)      5               1
pomme (Fr)    4               2
car (E)       1               20
Bilingual case: Peirsman & Pado (2011), lexicon extraction for EN/DE and EN/NL
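The point of this slide can be illustrated with a toy bilingual space. The sketch below ranks English words by cosine similarity to a French word over two bilingual dimensions, cultiver/grow and rouler/drive. The counts, the word lists, and the function name `cross_lingual_neighbors` are assumptions made for the example; cosine is likewise an assumed similarity measure.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy counts over the dimensions [cultiver/grow, rouler/drive] (illustrative only).
french = {"mandarine": [5, 1], "voiture": [1, 20]}
english = {"mandarin": [4, 2], "pear": [5, 1], "car": [1, 20]}


def cross_lingual_neighbors(word, source, target, n):
    """Return the n target-language words most similar to a source-language word."""
    vec = source[word]
    ranked = sorted(target, key=lambda t: cosine(vec, target[t]), reverse=True)
    return ranked[:n]


print(cross_lingual_neighbors("mandarine", french, english, 2))
print(cross_lingual_neighbors("voiture", french, english, 1))
```

In this toy space the closest English neighbor of mandarine is pear rather than its translation mandarin: a co-hyponym with a similar context profile can outrank the true translation.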

16 Evaluation against Gold Standard Evaluation of nearest cross-lingual neighbors against a translators’ dictionary

17 Analysis of 200 noun pairs (EN-DE)
Meta-Relation                Relation       Frequency   Example
Synonymy (50%)                              99          Verhältnis - relationship
Semantic similarity (16%)    Antonymy       1           Inneres - exterior
                             Co-Hyponymy    15          Straßenbahn - bus
                             Hyponymy       3           Kunstwerk - painting
                             Hypernymy      15          Dramatiker - poet
Semantic relatedness (19%)                  39          Kapitel - essay
Errors (14%)                                28          DDR-Zeit - trainee

18 Similarity by relation

19 How to proceed?
Classical reaction: focus on cross-lingual synonyms
Aggressive filtering of nearest-neighbor lists
Risk: sparse data issues
Our hypothesis (preliminary version): non-synonymous pairs still provide information about bilingual similarity
Should be exploited for cross-lingual knowledge transfer
Experimental validation: vary number of synonyms, observe effect on cross-lingual knowledge transfer

20 Varying the number of neighbors Nearest neighbors: 50% of synonyms Further neighbors: quick decline to 10% of synonyms

21 Experimental setup
(bagnole, subj, rouler)
Lexicon: rouler – drive; bagnole – jalopy, banger, car
English model (trained on English corpus)
Consider plausibilities for: (jalopy, subj, drive), (banger, subj, drive), (car, subj, drive)

22 Details
Model:
English model: trained on BNC as before
Bilingual lexicon extracted from BNC and Stuttgarter Nachrichtenkorpus HGC as comparable corpora
Prediction based on n nearest English neighbours for German argument
Evaluation: 90 German (v,r,a) triples with human plausibility ratings [Brockmann & Lapata 2003]
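The transfer step can be sketched as follows, using the French example from slide 21 rather than the German evaluation data. The slides do not say how the predictions for the n nearest neighbours are combined, so taking their mean is an assumption of this sketch. The English model is a stub returning hypothetical scores, and all names (`transfer_plausibility`, `verb_translation`, `arg_neighbors`, `toy_english_model`) are invented for the example.

```python
def transfer_plausibility(verb, relation, argument,
                          verb_translation, arg_neighbors, english_model, n):
    """Score a source-language triple through an English plausibility model.

    verb_translation: source verb -> English verb
    arg_neighbors:    source argument -> English neighbours, nearest first
    english_model:    callable (verb, relation, argument) -> plausibility
    Assumption: predictions for the n nearest neighbours are averaged.
    """
    english_verb = verb_translation[verb]
    candidates = arg_neighbors[argument][:n]
    if not candidates:
        raise ValueError("no English neighbours for " + argument)
    scores = [english_model(english_verb, relation, arg) for arg in candidates]
    return sum(scores) / len(scores)


# Lexicon entries from the slide 21 example.
verb_translation = {"rouler": "drive"}
arg_neighbors = {"bagnole": ["jalopy", "banger", "car"]}

# Hypothetical English scores, standing in for a trained model.
TOY_SCORES = {
    ("drive", "subj", "jalopy"): 0.6,
    ("drive", "subj", "banger"): 0.5,
    ("drive", "subj", "car"): 0.9,
}


def toy_english_model(verb, relation, argument):
    """Stub model: look up a made-up score, default to 0.0."""
    return TOY_SCORES.get((verb, relation, argument), 0.0)


for n in (1, 2, 3):
    score = transfer_plausibility("bagnole_placeholder" if False else "rouler", "subj", "bagnole",
                                  verb_translation, arg_neighbors,
                                  toy_english_model, n)
    print(n, round(score, 3))
```

Raising n brings in neighbours that are not exact translations, which is the variable the experiment on the following slides manipulates.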

23 Results – EN-DE
Model                     1 NN   2 NN   3 NN   4 NN   5 NN
Translated English EPP    0.34   0.41   0.44   0.46   0.40
Model                                Resources                      Spearman’s ρ
Resnik [Brockmann & Lapata 2002]     TIGER corpus, GermaNet         .37
EPP German [Pado & Peirsman 2010]    HGC corpus parsed with PCFG    .33
Result: transfer model significantly better than monolingual model, but only if non-synonymous neighbors are included

24 Results: Details
Model                       1 NN   2 NN   3 NN   4 NN   5 NN
English EPP (all)           0.34   0.41   0.44   0.46   0.40
English EPP (subjects)      0.53   0.51   0.56          0.55
English EPP (objects)       0.58   0.61          0.64   0.58
English EPP (pp objects)    0.33   0.45          0.46   0.42

25 Sources of the positive effect
Non-synonyms are in fact informative for plausibility translation
Semantically similar verbs: eat – munch – feast
Similar events, similar arguments [Fillmore et al. 2003, Levin 1993]
Semantically related verbs: peel – cook – eat
Schemas/narrative chains: shared participants [Schank & Abelson 1977, Chambers & Jurafsky 2009]

26 Our hypothesis with qualifications
Using non-synonymous translation pairs is helpful
1. if transferred knowledge is lexical
   Many infrequently observed datapoints
2. if knowledge is stable across semantically related/similar word pairs
   Counterexample: polarity/sentiment judgments (food – feast – grub)
   Parallel experiment: best results for single nearest neighbor

27 Summary
Plausibility can be modeled with fairly shallow methods
Seen head words plus generalization in vector space
Precondition: accurately parsed corpus
If unavailable: transfer from better-endowed language
Translation through automatically induced lexicons
Transfer of knowledge about certain phenomena can benefit from non-synonymous translations
Corresponding to monolingual results from QA [Harabagiu et al. 2000], paraphrases [Lin & Pantel 2001], entailment [Dagan et al. 2006], …

