Collocations David Guy Brizan Speech and Language Processing Seminar 26th October, 2006.

2 References  Lakoff: Women, Fire and Dangerous Things: What Categories Reveal about the Mind (1990)  Lakoff & Johnson: Metaphors We Live By (1980)  Manning & Schütze: Foundations of Statistical Natural Language Processing (1999)  Seretan: Induction of Syntactic Collocation Patterns from Generic Syntactic Relations (2005)  Stevenson et al.: Statistical Measures of the Semi-Productivity of Light Verb Constructions (2004)

3 Agenda  Introduction  Finding Candidates  Hypothesis Testing  Applications

4 Agenda  Introduction  Finding Candidates  Hypothesis Testing  Applications

5 Collocations: Definitions  Broad definition: Words strongly associated with each other Example: hospital, insurance  Narrow definition: Two or more (consecutive) words functioning as a (syntactic or semantic) unit Example: dressed to the nines Test: does a word-for-word translation into another language preserve the meaning?  Book (and I) will use a narrow-ish definition

6 Collocations: Features  Book’s definition: Non-Compositionality Non-Substitutability Non-Modifiability  Perhaps “limited” would be better than “non” in the above?

7 Definition: Non-Compositionality  The meaning of the collocation cannot easily be derived from the meanings of the individual words  Example: in broad daylight Literal meaning: during the day Subtle meaning: with no attempt to hide one’s actions  The whole unit must be considered in NLP tasks, such as Machine Translation

8 Definition: Non-Substitutability  Cannot substitute an equivalent word for any in the collocation  Example: in broad daylight Can never be * wide daylight

9 Definition: Non-Modifiability  The individual words in the collocation resist modification  Example: in broad daylight Never: * broad daylights

10 Example Classes  Names  Technical Terms  “Light” Verb Constructions  Phrasal verbs  Noun Phrases  Bromides, Stock Phrases, Idioms

11 Names  Example: City University of New York Different from: [a] university in the city of New York  Names are candidates for transliteration They are generally not subject to “normal” machine translation techniques  Important NLP tasks: Find all names Find all aliases for names  Example: CUNY = City University of New York

12 Technical Terms  Example: head gasket Part of a car’s engine  Important NLP tasks: Treat each occurrence of a technical term consistently Translate each occurrence in only one fashion

13 Light Verb Constructions  Example: to take a walk  Features: “Light” verb: do, give, have, make, take Verb adds little semantic content to the phrase  Important NLP tasks: Identify light verb constructions Generate light verbs correctly: elle fait une promenade = she takes a walk, not * she makes a walk

14 Phrasal Verbs  Example: we make up new stories Make: light verb Make up: invent  Features: long-distance dependencies Example: we made that story up  Important NLP tasks: Identify phrasal verbs (despite the distance!) Translate phrasal verbs as a unit: * hicimos una historia para arriba = we made a story up

15 Noun Phrases  Example: weapons of mass destruction Also: weapons of mass deception  Features: Long NPs often become TLAs (esp. in SAE) Possession follows a rule but is not obvious  Example: the weapon of mass destruction’s payload  Important NLP tasks: Identify noun phrases as collocations

16 Bromides, Stock Phrases, Idioms  Example: as blind as a bat French, German: as myopic as a mole  Features: All of the canonical features (non-compositionality, etc.)  Important NLP tasks: Identify idioms Translate them correctly The spirit is willing but the flesh is weak  Some idioms may be translated directly: my blood is boiling

17 Agenda  Introduction  Finding Candidates  Hypothesis Testing  Applications

18 Finding Candidates  Basic premise: If two (or more) words co-occur a lot, the co-occurrence is a collocation candidate Candidate may (must?) still be hypothesis-tested  Tools: Frequency Frequency with Tag Filters Mean and Variance

19 Frequency  Technique: unsmoothed bigram counts Count the number of times each bigram occurs Extract the top counts and report them as candidates  Results: Corpus: New York Times (August – November, 1990) Raw frequency alone is extremely uninteresting: the top pairs are dominated by function words (of the, in the, ...)
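A minimal sketch of the raw-frequency technique; the token list below is a made-up illustration, not the NYT data from the slide:

```python
# Count adjacent word pairs (unsmoothed bigram counts) and report the
# most frequent ones as collocation candidates.
from collections import Counter

def bigram_counts(tokens):
    return Counter(zip(tokens, tokens[1:]))

tokens = "the new york times reported that new york rents rose".split()
for bigram, count in bigram_counts(tokens).most_common(3):
    print(bigram, count)  # raw frequency mostly surfaces uninteresting pairs
```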

20 Frequency with Tag Filters Technique  Technique: unsmoothed N-gram counts Count the number of times each bigram occurs Tag candidates for POS Pass all candidates through a POS filter (e.g. adjective–noun, noun–noun), keeping only the ones matching the filter Extract the top counts and report them as candidates
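A sketch of the same counting step with a part-of-speech filter in the spirit of the adjective–noun / noun–noun patterns above. It assumes NLTK with its default tagger data is installed; the pattern set and example sentence are illustrative:

```python
import nltk
from collections import Counter

ALLOWED = {("JJ", "NN"), ("NN", "NN")}  # adjective-noun and noun-noun bigrams

def filtered_bigram_counts(tokens):
    tagged = nltk.pos_tag(tokens)  # [(word, Penn Treebank tag), ...]
    return Counter(
        (w1, w2)
        for (w1, t1), (w2, t2) in zip(tagged, tagged[1:])
        if (t1[:2], t2[:2]) in ALLOWED  # collapse NNS -> NN, JJR -> JJ, etc.
    )

print(filtered_bigram_counts("the prime minister read the new york times".split()))
```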

21 Frequency with Tag Filters Results

22 Mean and Variance Technique  Technique: unsmoothed bigram Produce all possible pairs within a window Consider all pairs in the window as candidates Record the offset (distance) of one word from the other Count the number of times each candidate occurs  Measures: Mean: average offset (possibly negative), indicating whether the two words are related to each other Variance / standard deviation of the offset: variability in the relative position of the two words

23 Mean and Variance Illustration  Candidate Generation example: Window: 3  Used to find collocations with long-distance relationships
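A sketch of the offset bookkeeping behind the mean/variance technique; the window size and example sentence are illustrative assumptions:

```python
from statistics import mean, pstdev

def offsets(tokens, w1, w2, window=3):
    """Signed distances from each occurrence of w1 to any w2 within the window."""
    found = []
    for i, tok in enumerate(tokens):
        if tok != w1:
            continue
        for d in range(-window, window + 1):
            j = i + d
            if d != 0 and 0 <= j < len(tokens) and tokens[j] == w2:
                found.append(d)
    return found

d = offsets("she knocked at the door and then knocked on the door".split(),
            "knocked", "door", window=3)
print(mean(d), pstdev(d))  # a low deviation would suggest a fixed relative position
```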

24 Mean and Variance Graphed Results  Derived data can be placed in a histogram Example: strong vs. for (“strong [business] support for”?)

25 Mean and Variance Collocations

26 Agenda  Introduction  Finding Candidates  Hypothesis Testing  Applications

27 Hypothesis Testing  Basic premise: Two (or more) words co-occur a lot Is a candidate a true collocation, or a (not-at-all-interesting) phantom?  Distribution tools: Mutual Information / Pointwise Mutual Information (I) t test χ² test Likelihood ratios (λ)

28 [Pointwise] Mutual Information (I)  Intuition: Given a collocation (w1, w2) and an observation of w1, I(w1; w2) indicates how much more likely it is that we will see w2 The same measure also works in reverse (observe w2)  Assumptions: Data is not sparse

29 Mutual Information Formula  Formula: I(w1; w2) = log2 [ P(w1 w2) / (P(w1) P(w2)) ] = log2 [ P(w2 | w1) / P(w2) ]  Measures: P(w1) = unigram probability P(w1 w2) = bigram probability P(w2 | w1) = probability of w2 given we have seen w1  Result: A number (in bits) indicating increased confidence that we will see w2 after w1
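A minimal sketch of pointwise mutual information computed from corpus counts; the counts and corpus size are made-up illustrative numbers:

```python
from math import log2

def pmi(c_w1, c_w2, c_bigram, n):
    """I(w1; w2) = log2[ P(w1 w2) / (P(w1) P(w2)) ], with probabilities from counts."""
    return log2((c_bigram / n) / ((c_w1 / n) * (c_w2 / n)))

# Hypothetical counts: c(strong)=1,000, c(tea)=500, c(strong tea)=80, N=1,000,000 bigrams
print(pmi(1_000, 500, 80, 1_000_000))  # about 7.3 bits of association
```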

30 Mutual Information Criticism  A better measure of the independence of two words than of the dependence of one word on another  Horrible on [read: misidentifies] sparse data

31 The t test Intuition  Intuition: Compute the rate of co-occurrence expected by chance and check that the observed rate is significantly higher Take all possible permutations of the words in the corpus: how much more frequent is the observed co-occurrence than these permutations would suggest?  Assumptions: H0 is the null hypothesis (chance occurrence): P(w1, w2) = P(w1) P(w2) The distribution is (approximately) normal

32 The t test Formula  Formula: t = (x̄ − μ) / √(s² / N)  Measures: x̄ = observed mean = bigram count / N μ = expected mean under H0 = P(w1) P(w2) s² ≈ x̄ (since p[1 − p] ≈ p for small p) N = total number of bigrams  Result: A number to look up in a t table Degree of confidence that the collocation is not created by chance α = the significance level at which one can reject H0
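A sketch of the t statistic using those measures, with the same made-up counts as the PMI example:

```python
from math import sqrt

def t_score(c_w1, c_w2, c_bigram, n):
    x_bar = c_bigram / n              # observed bigram probability
    mu = (c_w1 / n) * (c_w2 / n)      # expected probability under H0
    s2 = x_bar                        # p(1 - p) ~ p for small p
    return (x_bar - mu) / sqrt(s2 / n)

print(t_score(1_000, 500, 80, 1_000_000))  # roughly 8.9; compare against a t table
```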

33 The t test Sample Findings

34 The t test Criticism  Words are not normally distributed, so valid collocations can be rejected  Not good on sparse data  Not good on data with large probabilities

35 χ² Intuition  Pearson’s chi-square test  Intuition: Compare observed frequencies to the frequencies expected under independence Large difference = can reject H0  Assumptions: If the sample is not small, the distribution is approximately χ²

36 χ² General Formula  Formula: χ² = Σij (Oij − Eij)² / Eij  Measures: Eij = expected count of the bigram Oij = observed count of the bigram  Result: A number to look up in a table (like the t test) Degree of confidence (α) with which H0 can be rejected

37 χ² Bigram Method and Formula  Technique for bigrams: Arrange the bigram counts in a 2x2 contingency table (Oij: i = column, j = row)  Formula (2x2 shortcut): χ² = N (O11 O22 − O12 O21)² / [(O11 + O12)(O11 + O21)(O12 + O22)(O21 + O22)]
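A sketch of the 2x2 chi-square computation using the closed form above; the cell counts are the same illustrative numbers as before:

```python
def chi_square_2x2(o11, o12, o21, o22):
    """o11 = c(w1 w2), o12 = c(w1 ~w2), o21 = c(~w1 w2), o22 = c(~w1 ~w2)."""
    n = o11 + o12 + o21 + o22
    return n * (o11 * o22 - o12 * o21) ** 2 / (
        (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    )

# c(w1 w2)=80, c(w1)=1,000, c(w2)=500, N=1,000,000
print(chi_square_2x2(80, 920, 420, 998_580))  # compare against a chi-square table (1 dof)
```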

38 χ² Sample Findings  Comparing corpora Machine translation: comparison of (English) “cow” and (French) “vache” gives a large χ² value, identifying them as a translation pair Measuring the similarity of two corpora

39 χ² Criticism  Not good for small datasets

40 Likelihood Ratio (λ) Intuition  Intuition: How much more likely is a given collocation than occurrence by chance?  Assumptions: Formulate two hypotheses: H1 = the words in the collocation are independent H2 = the words are dependent

41 Likelihood Ratio (λ) Formula  Formula: log λ = log [ L(H1) / L(H2) ]; −2 log λ is asymptotically χ²-distributed  Measures: L(Hx) = likelihood of the observed counts under hypothesis Hx
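A sketch of a Dunning-style log-likelihood ratio for a bigram, using binomial likelihoods for the two hypotheses on the previous slide; the binomial form and the counts are illustrative assumptions:

```python
from math import log

def log_l(k, n, p):
    """Binomial log likelihood, dropping the constant C(n, k) term."""
    return k * log(p) + (n - k) * log(1 - p)

def llr(c1, c2, c12, n):
    p = c2 / n                   # P(w2), shared by both contexts under H1 (independence)
    p1 = c12 / c1                # P(w2 | w1) under H2 (dependence)
    p2 = (c2 - c12) / (n - c1)   # P(w2 | not w1) under H2
    return -2 * (log_l(c12, c1, p) + log_l(c2 - c12, n - c1, p)
                 - log_l(c12, c1, p1) - log_l(c2 - c12, n - c1, p2))

print(llr(1_000, 500, 80, 1_000_000))  # -2 log(lambda); large values reject independence
```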

42 Likelihood Ratio (λ) Sample Findings

43 Agenda  Introduction  Finding Candidates  Hypothesis Testing  Applications

44 Applications  Collocations are useful in: Comparison of Corpora Parsing New Topic Detection Computational Lexicography Natural Language Generation Machine Translation

45 Comparison of Corpora  Compare corpora for: Document clustering (for information retrieval) Plagiarism detection  Comparison techniques: Competing hypotheses: the documents are dependent vs. the documents are independent Compare the hypotheses using the likelihood ratio (λ), etc.

46 Parsing  When parsing, we may get more accurate data by treating a collocation as a unit (rather than as individual words) Using the CMU parser, things can go wrong Example: [ hand to hand ] should be treated as a unit, but is split in: (S (NP They) (VP engaged (PP in hand) (PP to (NP hand combat))))

47 New Topic Detection  When new topics are reported, the count of collocations associated with those topics increases When topics become old, the count drops

48 Computational Lexicography  As new multi-word expressions become part of the language, they can be detected Existing collocations can be acquired  Can also be used for cultural identification Examples:  My friend got an A in his class  My friend took an A in his class  My friend made an A in his class  My friend earned an A in his class

49 Natural Language Generation  Problem: Given two (or more) possible productions, which is more feasible? Productions usually involve synonyms or near-synonyms Languages generally favour one production

50 Machine Translation  Collocation-complete problem? Must find all used collocations Must parse collocation as a unit Must translate collocation as a unit In target language production, must select among many plausible alternatives

51 Thanks!  Questions?