Lexical Acquisition: extending our information about words, particularly quantitative information.

Why lexical acquisition?
- "One cannot learn a new language by reading a bilingual dictionary" -- Mercer
  - Parsing 'postmen' requires context
- Quantitative information is difficult to collect by hand
  - e.g., priors on word senses
- Productivity of language
  - Lexicons need to be updated for new words and usages

Machine-readable lexicons contain...
Lexical vs. syntactic information:
- Word senses
  - Classifications, subclassifications
- Collocations
  - Arguments, preferences
  - Synonyms, antonyms
  - Quantitative information

Gray area between lexical and syntactic
The rules of grammar are syntactic:
- S ::= NP V NP
- S ::= NP [V NP PP]
But which one to use, and when? That depends on the words:
- The children ate the cake with their hands.   (the PP modifies the verb)
- The children ate the cake with blue icing.    (the PP modifies the object NP)

Outline of the chapter
- Verb subcategorization
  - Which arguments (e.g., infinitive, direct object) does a particular verb admit?
- Attachment ambiguity
  - What does a modifier refer to?
- Selectional preferences
  - Does a verb tend to restrict its object to a certain class?
- Semantic similarity between words
  - This new word is most like which known words?

Verb subcategorization frames
Assign to each verb the subcategorization frames (SFs) legal for it (see diagram). Crucial for parsing:
- She told the man where Peter grew up.   (NP NP S: the clause is an argument of "tell")
- She found the place where Peter grew up.   (NP NP: the clause is a relative clause modifying "place")
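A subcategorization lexicon can be represented very simply. The sketch below (frame labels and lexicon entries are hypothetical, not taken from the chapter) shows one way to store the frames per verb and reject a candidate parse whose argument pattern the verb does not admit:

```python
# Minimal sketch of a subcategorization lexicon (hypothetical frame labels).
SUBCAT = {
    "tell": {("NP", "NP", "S"), ("NP", "NP")},   # tell admits a clausal argument
    "find": {("NP", "NP")},                      # find does not
}

def parse_is_licensed(verb, argument_pattern):
    """Reject a candidate parse whose argument pattern the verb does not admit."""
    return tuple(argument_pattern) in SUBCAT.get(verb, set())

print(parse_is_licensed("tell", ["NP", "NP", "S"]))  # True
print(parse_is_licensed("find", ["NP", "NP", "S"]))  # False -> "where..." must be a relative clause
```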

Brent's method (1993)
Learn subcategorizations given a corpus, a lexical analyzer, and cues. A cue is a pair (L, SF):
- L is a star-free regular expression over lexemes, e.g. (OBJ | SUBJ-OBJ | CAP) (PUNC | CC)
- SF is a subcategorization frame, e.g. NP
Strategy: find the verb SFs for which the cues provide strong evidence.
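The cue above can be read as: the verb is followed by an object-only pronoun (OBJ), a pronoun that could be subject or object (SUBJ-OBJ), or a capitalized word (CAP), and then by punctuation or a coordinating conjunction. A rough sketch of such a matcher follows; the token categories and the categorization function are illustrative assumptions, not Brent's actual lexical analyzer:

```python
import re

# Rough sketch of matching a Brent-style cue against the tokens after a verb.
OBJ = {"me", "him", "her", "us", "them"}   # object-only pronouns
SUBJ_OBJ = {"you", "it"}                   # pronouns that could be subject or object
CC = {"and", "but", "or"}

def categorize(token):
    if token in OBJ: return "OBJ"
    if token in SUBJ_OBJ: return "SUBJ-OBJ"
    if token[:1].isupper(): return "CAP"
    if re.fullmatch(r"[.,;!?]", token): return "PUNC"
    if token in CC: return "CC"
    return "OTHER"

CUE_NP = re.compile(r"(OBJ|SUBJ-OBJ|CAP) (PUNC|CC)")  # cue paired with the frame "NP"

def cue_fires(tokens_after_verb):
    """Does the NP cue match the two tokens right after the verb?"""
    cats = " ".join(categorize(t) for t in tokens_after_verb[:2])
    return CUE_NP.fullmatch(cats) is not None

print(cue_fires(["him", ","]))       # True: strong evidence for an NP object
print(cue_fires(["that", "Peter"]))  # False
```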

Brent's method (cont'd)
Compute the error rate of the cue: E = Pr(false positive), the probability that the cue fires even though the frame is not present.
For each verb v and cue c = (L, SF), test the null hypothesis H0 that verb v does not admit SF. If v occurs n times in the corpus and m of those occurrences are marked by cue c, then
  p_E = Pr(cue fires m or more times by chance) = sum_{r=m}^{n} C(n, r) E^r (1 - E)^(n - r)
If p_E < a threshold, reject H0, i.e., conclude that v does admit SF.
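A minimal sketch of this hypothesis test, assuming the cue's error rate E is known; the counts and threshold below are made up for illustration:

```python
from math import comb

def p_value(n, m, error_rate):
    """P(cue fires m or more times out of n occurrences of v | v does not admit SF)."""
    return sum(comb(n, r) * error_rate**r * (1 - error_rate)**(n - r)
               for r in range(m, n + 1))

# Made-up example: verb seen 200 times, the NP cue fired 11 times, cue error rate 2%.
p = p_value(n=200, m=11, error_rate=0.02)
if p < 0.02:   # illustrative threshold
    print(f"reject H0 (p = {p:.4f}): the verb likely admits the NP frame")
else:
    print(f"cannot reject H0 (p = {p:.4f})")
```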

Subcategorization frames: ideas
- Hypothesis testing gives high precision but low recall.
- Unreliable cues are still necessary and helpful (under an independence assumption).
- Find SFs for verb classes rather than individual verbs, using an error-prone tagger. As long as the error estimates are incorporated into p_E, it works well; Manning (1993) did this and improved recall.

Attachment ambiguity: PPs
In "NP V NP PP", does the PP modify the V or the NP?
Assumption: there is only one meaningful parse for each sentence:
✗ The children ate the cake with a spoon.
✓ Bush sent 100,000 soldiers into Kuwait.
✓ Brazil honored their deal with the IMF.
Straw man: compare co-occurrence counts for the pairs (v, p) and (n, p).

Bias defeats simple counting
- P(into | send) > P(into | soldiers), so simple counting handles the Kuwait example.
- But sometimes the preposition is strongly associated with both the V and the NP:
  - Ford ended its venture with Fiat.
- In such cases there is a bias toward "low attachment" -- attaching the PP to the nearer referent, the NP -- which raw co-occurrence counts do not capture.
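As a concrete illustration of the straw man, a sketch with invented counts (a real system would estimate them from a bracketed corpus):

```python
# Invented co-occurrence counts, just to illustrate the straw-man comparison.
count_verb_prep = {("send", "into"): 86, ("end", "with"): 9}
count_verb = {"send": 300, "end": 220}
count_noun_prep = {("soldiers", "into"): 2, ("venture", "with"): 25}
count_noun = {"soldiers": 150, "venture": 80}

def straw_man_attachment(v, n, p):
    """Attach the PP to whichever head co-occurs more often (relatively) with the preposition."""
    p_given_v = count_verb_prep.get((v, p), 0) / count_verb[v]
    p_given_n = count_noun_prep.get((n, p), 0) / count_noun[n]
    return "verb" if p_given_v > p_given_n else "noun"

print(straw_man_attachment("send", "soldiers", "into"))  # verb (correct)
print(straw_man_attachment("end", "venture", "with"))
# "noun" here, but only because of the invented counts: the comparison itself
# has no way to encode the structural low-attachment bias.
```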

Hindle and Rooth (1993)
An elegant (?) method of quantifying the low-attachment bias:
- Express P(the first PP after the object attaches to the object) and P(the first PP after the object attaches to the verb) as functions of
  - P(NA) = P(there is some PP following the object that attaches to the object)
  - P(VA) = P(there is some PP following the object that attaches to the verb)
- Estimate P(NA) and P(VA) by counting.

Estimating P(NA) and P(VA)
Let (v, n, p) be a particular verb, noun, and preposition.
- P(VA_p | v) = (# times p attaches to v) / (# occurrences of v)
- P(NA_p | n) = (# times p attaches to n) / (# occurrences of n)
The two are treated as independent!

Attachment of the first PP
- P(Attach(p, n) | v, n) = P(NA_p | n)
  - Whenever there is a PP attaching to the noun, the first such PP attaches to the noun!
- P(Attach(p, v) | v, n) = P(not NA_p | n) · P(VA_p | v)
  - The first PP attaches to the verb only when no PP attaches to the noun AND some PP attaches to the verb...
  - ...because attachments cannot cross: *I (put the [book on the table) on WW2]
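A sketch of the resulting attachment decision using the slide's quantities. The attachment counts are invented; a real system would collect them from (partially) parsed text, as Hindle and Rooth did, and they compared the two probabilities via a log-likelihood ratio:

```python
from math import log2

# Invented attachment counts for illustration only.
attach_to_verb = {("end", "with"): 30}      # times "with" attached to the verb "end"
verb_count = {"end": 220}
attach_to_noun = {("venture", "with"): 25}  # times "with" attached to the noun "venture"
noun_count = {"venture": 80}

def p_va(v, p): return attach_to_verb.get((v, p), 0) / verb_count[v]
def p_na(n, p): return attach_to_noun.get((n, p), 0) / noun_count[n]

def decide(v, n, p):
    p_noun = p_na(n, p)                         # P(Attach(p,n) | v,n)
    p_verb = (1 - p_na(n, p)) * p_va(v, p)      # P(Attach(p,v) | v,n)
    llr = log2(p_verb / p_noun) if p_noun > 0 else float("inf")
    return ("verb" if llr > 0 else "noun"), llr

print(decide("end", "venture", "with"))
# p_noun = 25/80 = 0.3125; p_verb = (1 - 0.3125) * (30/220) ~= 0.094 -> attach to the noun
```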

Selectional preferences
Verbs prefer certain classes of subjects and objects:
- Objects of 'eat' tend to be food items
- Subjects of 'think' tend to be people
- Subjects of 'bark' tend to be dogs
Used to:
- disambiguate word senses
- infer the class of new words
- rank multiple parses

Disambiguating the class (Resnik)
- She interrupted the chair.   (a person or a piece of furniture?)
For each candidate noun class nc, measure its association with the verb v:
  A(nc) = P(nc | v) · log( P(nc | v) / P(nc) )
i.e., the contribution of nc to the relative entropy (Kullback-Leibler divergence) D(P(nc | v) || P(nc)).
For example:
  A(furniture) = P(furniture | interrupted) · log( P(furniture | interrupted) / P(furniture) )

Estimating P(nc | v)
- P(nc | v) = P(nc, v) / P(v)
- P(v) is estimated as the proportion of occurrences of v among all verbs.
- P(nc, v) is estimated as
    (1/N) Σ_{n in nc} C(v, n) / |classes(n)|
  i.e., each noun n observed as an argument of v contributes its count C(v, n), split evenly among the classes n belongs to.
- Now just take the class with the highest A(nc) as the most likely word sense.
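A small sketch of these estimates, with an invented miniature corpus and a hand-made noun-to-class map standing in for a real taxonomy such as WordNet:

```python
from math import log2
from collections import Counter

# Invented verb-object counts and noun classes, standing in for a corpus + WordNet.
C = Counter({("interrupt", "chair"): 8, ("interrupt", "speaker"): 40, ("sit_on", "chair"): 30})
classes = {"chair": {"furniture", "person"}, "speaker": {"person"}}
N = sum(C.values())

def p_joint(v, nc):
    """P(nc, v) ~ (1/N) * sum over nouns n in nc of C(v, n) / |classes(n)|"""
    return sum(c / len(classes[n]) for (verb, n), c in C.items()
               if verb == v and nc in classes[n]) / N

def p_class(nc):
    return sum(p_joint(v, nc) for v in {verb for verb, _ in C})

def association(v, nc):
    p_v = sum(c for (verb, _), c in C.items() if verb == v) / N
    p_nc_given_v = p_joint(v, nc) / p_v
    return p_nc_given_v * log2(p_nc_given_v / p_class(nc))

for nc in ("person", "furniture"):
    print(nc, round(association("interrupt", nc), 3))
# "person" scores higher, so "chair" in "She interrupted the chair" is read as a person.
```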

Semantic similarity
Uses:
- classifying a new word
- expanding queries in IR
When are two words similar?
- When they are used together? (IMF and Brazil)
- When they are about the same topic? (astronaut and spacewalking)
- When they function interchangeably? (Soviet and American)
- When they are synonymous? (astronaut and cosmonaut)

Cosine is no panacea
- For length-normalized vectors, cosine similarity corresponds to Euclidean distance between points.
- But should document-space vectors really be treated as points in Euclidean space?
- Alternative: treat them as probability distributions (after normalizing so the components sum to one).
- Then there is no particular reason to use cosine; why not try an information-theoretic measure?

Alternative (dis)similarity measures to cosine
- Cosine of square roots (Goldszmidt)
- L1 norm -- Manhattan distance
  - sum of the absolute values of the differences of the components
- KL divergence
  - D(p || q)
- Mutual information (why not?)
  - D(p(x, y) || p(x) p(y)), the divergence between the joint and the product of the marginals
- Information radius -- the information lost by describing both p and q by their midpoint m = (p + q)/2
  - IRad(p, q) = D(p || m) + D(q || m)
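A small sketch of the vector- and distribution-based measures above; the two count vectors are made up and are normalized into probability distributions before the information-theoretic measures are applied:

```python
import numpy as np

def cosine(p, q):
    return float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q)))

def l1(p, q):
    """Manhattan distance: sum of absolute component differences."""
    return float(np.abs(p - q).sum())

def kl(p, q):
    """D(p || q); assumes q is nonzero wherever p is."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def irad(p, q):
    """Information radius: D(p || m) + D(q || m), with m the midpoint of p and q."""
    m = (p + q) / 2
    return kl(p, m) + kl(q, m)

# Made-up co-occurrence counts for two words, normalized into distributions.
p = np.array([10.0, 5.0, 1.0, 0.0]); p /= p.sum()
q = np.array([8.0, 6.0, 2.0, 1.0]);  q /= q.sum()

print(cosine(p, q), l1(p, q), irad(p, q))
```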