Mono- and bilingual modeling of selectional preferences. Sebastian Padó, Institute for Computational Linguistics, Heidelberg University (joint work with Katrin Erk, Ulrike Pado, Yves Peirsman)

Some context. Computational lexical semantics: modeling the meaning of words and phrases. Distributional approach: observe the usage of words in corpora. Robustness: broad coverage, manageable complexity. Flexibility: corpus choice determines the model. [Diagram: Knowledge vs. Corpus]

Structure. Methods: distributional semantics. Phenomena: semantic relations in bilingual dictionaries. Application: prediction of plausibility judgments.

Plausibility of verb-relation-argument triples

Verb   Relation   Argument   Plausibility
eat    subject    customer   6.9
eat    object     customer   1.5
eat    subject    apple      1.0
eat    object     apple      6.4

Central aspect of language: selectional preferences [Katz & Fodor 1963, Wilks 1975], a generalization of lexical similarity. Relevant for incremental language processing [McRae & Matsuki 2009], disambiguation [Toutanova et al. 2005], applicability of inference rules [Pantel et al. 2007], and SRL [Gildea & Jurafsky 2002].

Modelling plausibility. Approximating plausibility by frequency: with two lexical variables, the frequency of most triples is zero. Implausibility or sparse data? Generalization based on an ontology (WordNet) [Resnik 1996], or based on vector space [Erk, Padó, and Padó 2010].

English corpus counts:
(eat, obj, apple)      100   highly plausible
(eat, obj, hat)          1   somewhat plausible
(eat, obj, telephone)    0   ?
(eat, obj, caviar)       0   ?

Semantic spaces. Characterization of word meaning through a profile over occurrence contexts [Salton, Wong, and Yang 1975, Landauer & Dumais 1997, Schütze 1998]. Geometrically: a vector in a high-dimensional space. High vector similarity implies high semantic similarity; nearest neighbors = synonyms.

[Example: French co-occurrence matrix with rows mandarine, clémentine, voiture and columns cultiver, rouler]
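A minimal sketch of the similarity computation over co-occurrence profiles, loosely following the French example above (all counts and the extra context verbs are invented for illustration):

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse co-occurrence vectors
    (dicts mapping context word -> count)."""
    dot = sum(c * v[k] for k, c in u.items() if k in v)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy French co-occurrence profiles over verb contexts (counts invented)
mandarine  = {"cultiver": 51, "manger": 12}
clementine = {"cultiver": 41, "manger": 9}
voiture    = {"rouler": 120, "conduire": 30}

# The two fruit nouns are near neighbors; the car noun is not
print(cosine(mandarine, clementine) > cosine(mandarine, voiture))  # True
```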

Similarity-based generalization [Erk, Pado & Pado 2010]. Plausibility is the weighted average vector space similarity to the seen arguments, where:
(v, r, a): verb – relation – argument head word triple
seenargs: set of argument head words seen in the corpus
wt: weight function
Z: normalization constant
sim: semantic (vector space) similarity
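The formula itself did not survive the transcript; a reconstruction from the definitions on this slide (the published model's weighting may differ in detail):

```latex
\text{Plaus}(v, r, a) \;=\; \frac{1}{Z_{v,r}} \sum_{a' \in \text{seenargs}(v, r)} wt(a') \cdot \text{sim}(a, a')
```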

Geometrical interpretation. [Figure: vector space showing seen subjects of "eat" (Peter, husband, child) and seen objects of "eat" (orange, apple, breakfast, caviar) as two clusters, with "telephone" far from both]

Choice of contexts. The vector space must have an appropriate topology.
Word-based contexts: space represents "topical similarity".
Dependency-based contexts: space represents finer-grained "participation-based similarity".

Evaluation. Triples with human plausibility ratings [McRae et al. 1996]. Evaluation: correlation of model predictions with human judgments; Spearman's ρ = 1: perfect correlation, ρ = 0: no correlation. Result: the vector space model attains almost the quality of the "deep" model, at 98% coverage.

Model                            Coverage   Spearman's ρ
Resnik 1996 [ontology-based]     100%       0.123 n.s.
EPP [vector space-based]         98%        0.325 ***
U. Pado et al. ["deep" model]    78%        0.415 ***
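The evaluation metric can be illustrated with a minimal Spearman implementation (assumes no tied values; a real evaluation would use a statistics library). The example data are invented:

```python
def spearman_rho(xs, ys):
    """Spearman rank correlation via the classic formula
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)); assumes no ties."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

model_scores  = [0.2, 0.9, 0.4, 0.7]   # invented model predictions
human_ratings = [1.0, 6.9, 1.5, 6.4]   # invented plausibility judgments
print(spearman_rho(model_scores, human_ratings))  # prints 1.0
```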

From one to many languages… The vector space model reduces the need for language resources to predict plausibility judgments: no ontologies. Still necessary: observations of triples and target words, i.e. a large, accurately parsed corpus. Problematic for basically all languages except English. Can we extend our strategy to new languages?

Resnik [Brockmann & Lapata 2002]   TIGER + GermaNet   ρ = .37
EPP [Pado & Peirsman 2010]         HGC                ρ = .33

Predicting plausibility for new languages. Cross-lingual knowledge transfer with a bilingual lexicon [Koehn and Knight 2002]. Print dictionaries are problematic; instead, acquire the lexicon from distributional data.

Example: (cultiver, obj, pomme) → lexicon: cultiver – grow, pomme – apple → English model (English corpus) → (grow, obj, apple): highly plausible

Bilingual semantic space. Joint semantic space for words from both languages [Rapp 1995, Fung & McKeown 1997]. Dimensions are bilingual word pairs and can be bootstrapped; frequencies are observable from comparable corpora. Nearest neighbors: cross-lingual synonyms, i.e. translations.

[Example: matrix with dimensions (cultiver, grow) and (rouler, drive); rows mandarine (Fr), mandarin (En), car (En) with counts 51, 42, 120]

Bootstrapping dimensions

Nearest neighbors in bilingual space. Similar usages / context profiles do not necessarily indicate synonymy.

[Example: matrix with dimensions (cultiver, grow) and (rouler, drive); rows pear (En), pomme (Fr), car (En) with counts 51, 42, 120]

Bilingual case: Peirsman & Pado (2011), lexicon extraction for EN/DE and EN/NL.

Evaluation against Gold Standard Evaluation of nearest cross-lingual neighbors against a translators’ dictionary

Analysis of 200 noun pairs (EN-DE)

Meta-relation               Relation      Frequency   Example
Synonymy (50%)                            99          Verhältnis - relationship
Semantic similarity (16%)   Antonymy      1           Inneres - exterior
                            Co-hyponymy   15          Straßenbahn - bus
                            Hyponymy      3           Kunstwerk - painting
                            Hypernymy     15          Dramatiker - poet
Semantic relatedness (19%)                39          Kapitel - essay
Errors (14%)                              28          DDR-Zeit - trainee

Similarity by relation

How to proceed? Classical reaction: focus on cross-lingual synonyms, i.e. aggressive filtering of nearest-neighbor lists. Risk: sparse data issues. Our hypothesis (preliminary version): non-synonymous pairs still provide information about bilingual similarity and should be exploited for cross-lingual knowledge transfer. Experimental validation: vary the number of synonyms and observe the effect on cross-lingual knowledge transfer.

Varying the number of neighbors. Among the nearest neighbors, 50% are synonyms; for further neighbors, the synonym rate quickly declines to 10%.

Experimental setup. Lexicon: rouler – drive; bagnole – jalopy, banger, car. German triple (bagnole, subj, rouler) → English model (English corpus) → consider plausibilities for: (jalopy, subj, drive), (banger, subj, drive), (car, subj, drive).
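The transfer step could be sketched as follows. The lexicon entries follow the slide; the English plausibility scores stand in for the trained English model and are invented:

```python
# Bilingual lexicon: source word -> English neighbors, nearest first
LEXICON = {
    "rouler": ["drive"],
    "bagnole": ["jalopy", "banger", "car"],
}

# Stand-in for the trained English EPP model (scores invented)
EN_PLAUSIBILITY = {
    ("jalopy", "subj", "drive"): 5.5,
    ("banger", "subj", "drive"): 4.0,
    ("car", "subj", "drive"): 6.8,
}

def transfer_plausibility(verb, rel, arg, n=3):
    """Translate the verb with its best neighbor, the argument with its
    n nearest neighbors, and average the English model's scores."""
    en_verb = LEXICON[verb][0]
    scores = [EN_PLAUSIBILITY[(a, rel, en_verb)] for a in LEXICON[arg][:n]]
    return sum(scores) / len(scores)

print(transfer_plausibility("rouler", "subj", "bagnole", n=3))  # mean of 5.5, 4.0, 6.8
```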

Details. Model: the English model is trained on the BNC as before; the bilingual lexicon is extracted from the BNC and the Stuttgart news corpus HGC as comparable corpora; prediction is based on the n nearest English neighbours of the German argument. Evaluation: 90 German (v, r, a) triples with human plausibility ratings [Brockmann & Lapata 2003].

Results – EN-DE

[Figure: Spearman's ρ of the translated English EPP model for 1-5 nearest neighbors]

Model                               Resources                     Spearman's ρ
Resnik [Brockmann & Lapata 2002]    TIGER corpus, GermaNet        .37
EPP German [Pado & Peirsman 2010]   HGC corpus parsed with PCFG   .33

Result: the transfer model is significantly better than the monolingual model, but only if non-synonymous neighbors are included.

Results: details. [Figure: Spearman's ρ for 1-5 nearest neighbors, for English EPP on all relations, subjects, objects, and PP objects]

Sources of the positive effect. Non-synonyms are in fact informative for plausibility translation:
Semantically similar verbs (eat – munch – feast): similar events, similar arguments [Fillmore et al. 2003, Levin 1993]
Semantically related verbs (peel – cook – eat): schemas/narrative chains with shared participants [Schank & Abelson 1977, Chambers & Jurafsky 2009]

Our hypothesis, with qualifications. Using non-synonymous translation pairs is helpful
1. if the transferred knowledge is lexical (many infrequently observed datapoints), and
2. if the knowledge is stable across semantically related/similar word pairs.
Counterexample: polarity/sentiment judgments (food – feast – grub); in a parallel experiment, the best results were obtained with the single nearest neighbor.

Summary. Plausibility can be modeled with fairly shallow methods: seen head words plus generalization in vector space. Precondition: an accurately parsed corpus. If unavailable: transfer from a better-endowed language, with translation through automatically induced lexicons. Transfer of knowledge about certain phenomena can benefit from non-synonymous translations, corresponding to monolingual results from QA [Harabagiu et al. 2000], paraphrases [Lin & Pantel 2001], entailment [Dagan et al. 2006], …