Introduction to Natural Language Processing (600.465): Words and the Company They Keep. AI-lab, 2003.11.


1 Introduction to Natural Language Processing (600.465): Words and the Company They Keep. AI-lab, 2003.11.

2 Motivation
Environment:
–mostly "not a full analysis (sentence/text parsing)"
Tasks where "words & company" are important:
–word sense disambiguation (MT, IR, TD, IE)
–lexical entries: subdivision & definitions (lexicography)
–language modeling (generalization, [kind of] smoothing)
–word/phrase/term translation (MT, multilingual IR)
–NL generation ("natural" phrases) (Generation, MT)
–parsing (lexically-based selectional preferences)

3 Collocations
Collocation
–Firth: "a word is characterized by the company it keeps"; collocations of a given word are statements of the habitual or customary places of that word.
–non-compositionality of meaning: the meaning cannot be derived directly from the parts (heavy rain)
–non-substitutability in context for parts (red light)
–non-modifiability (& non-transformability): kick the yellow bucket; take exceptions to

4 Association and Co-occurrence; Terms
Does not fall under "collocation", but interesting just because the words often [or rarely] appear together or in the same (or similar) context:
(doctors, nurses), (hardware, software), (gas, fuel), (hammer, nail), (communism, free speech)
Terms:
–need not be > 1 word (notebook, washer)

5 Collocations of Special Interest
Idioms: really fixed phrases
–kick the bucket, birds-of-a-feather, run for office
Proper names: difficult to recognize even with lists
–Tuesday (person's name), May, Winston Churchill, IBM, Inc.
Numerical expressions
–containing "ordinary" words: Monday Oct ..., two thousand seven hundred fifty
Phrasal verbs
–separable parts: look up, take off

6 Further Notions
Synonymy: different form/word, same meaning: notebook / laptop
Antonymy: opposite meaning: new/old, black/white, start/stop
Homonymy: same form/word, different meaning:
–"true" (random, unrelated): can (aux. verb / can of Coke)
–related: polysemy; notebook, shift, grade,...
Other:
–Hyperonymy/Hyponymy: general vs. special: vehicle/car
–Meronymy/Holonymy: whole vs. part: body/leg

7 How to Find Collocations?
Frequency
–plain
–filtered
Hypothesis testing
–t test
–χ² test
Pointwise ("poor man's") Mutual Information
(Average) Mutual Information

8 Frequency
Simple
–count n-grams; high-frequency n-grams are candidates: mostly function words, frequent names
Filtered
–stop list: words/forms which (we think) cannot be part of a collocation: a, the, and, or, but, not,...
–part of speech (possible collocation patterns): A+N, N+N, N+of+N,...
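A minimal Python sketch of the filtered variant, assuming a POS-tagged corpus given as (word, tag) pairs; the tag set, stop list, and function name are illustrative assumptions, not part of the original slides:

```python
from collections import Counter

# Illustrative stop list and allowed POS patterns (A = adjective, N = noun).
STOP_LIST = {"a", "the", "and", "or", "but", "not"}
PATTERNS = {("A", "N"), ("N", "N")}

def bigram_candidates(tagged_corpus, min_count=3):
    """Count adjacent word pairs, keeping only those that pass the
    stop-list and POS-pattern filters and occur at least min_count times."""
    counts = Counter()
    for (w1, t1), (w2, t2) in zip(tagged_corpus, tagged_corpus[1:]):
        if w1.lower() in STOP_LIST or w2.lower() in STOP_LIST:
            continue                      # stop-list filter
        if (t1, t2) not in PATTERNS:
            continue                      # POS-pattern filter
        counts[(w1.lower(), w2.lower())] += 1
    return [(bg, c) for bg, c in counts.most_common() if c >= min_count]

# Tiny hand-tagged example:
corpus = [("heavy", "A"), ("rain", "N"), ("fell", "V"),
          ("and", "C"), ("heavy", "A"), ("rain", "N"), ("continued", "V")]
print(bigram_candidates(corpus, min_count=2))   # [(('heavy', 'rain'), 2)]
```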

9 Hypothesis Testing
Hypothesis
–something we test (against)
Most often:
–compare a possibly interesting thing vs. "random" chance
–"Null hypothesis": something occurs by chance (that's what we suppose).
Assuming this, show that the probability of the observed ("real world") data is then too low (typically < 0.05, also 0.005, 0.001)... therefore reject the null hypothesis (thus confirming that "interesting" things are happening!)
Otherwise, it's possible there is nothing interesting.

10 t test (Student's t test)
Significance of difference
–compute a "magic" number against the normal distribution (mean μ)
–using real-world data (x' real data mean, s² variance, N size):
t = (x' - μ) / √(s² / N)
–find in tables (see MS, p. 609):
d.f. = degrees of freedom (parameters which are not determined by other parameters)
percentile level p = 0.05 (or better)
–the bigger t is, the better the chance that the interesting feature we hope for is there (i.e. we can reject the null hypothesis)
t must be at least the value from the table(s)
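A small Python sketch of this statistic for bigrams, assuming the independence null hypothesis and the binomial variance approximation used on the next slide; the function name and argument layout are assumptions:

```python
import math

def t_statistic(c1, c2, c12, N):
    """t statistic for a bigram (w1, w2) under the null hypothesis
    of independence, mu = p(w1) * p(w2).

    c1, c2 -- unigram counts of w1 and w2
    c12    -- count of the bigram (w1, w2)
    N      -- corpus size (number of bigram positions)
    """
    x_bar = c12 / N                 # MLE of the joint probability
    mu = (c1 / N) * (c2 / N)        # expected joint probability under H0
    s2 = x_bar * (1 - x_bar)        # ~ x_bar for small probabilities
    return (x_bar - mu) / math.sqrt(s2 / N)
```

With large N the result is compared against normal-distribution critical values, e.g. 2.576 at the 0.005 level used on the next slide.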

11 t test on words
null hypothesis: independence; mean μ: p(w1) p(w2)
data estimates: x' = MLE of the joint probability from data
s² is p(1-p), i.e. almost p for small p; N is the data size
Example (d.f. ~ sample size):
–'general term' (homework corpus): c(general) = 108, c(term) = 40, c(general, term) = 2; expected p(general)p(term) = 8.8E-8
t = (9.0E-6 - 8.8E-8) / (9.0E-6 / N)^(1/2) = 1.40 (not > 2.576), thus 'general term' is not a collocation at this confidence level
–'true species' (84/1779/9): t = 2.77 > 2.576 !!
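These numbers can be reproduced with a self-contained sketch; the corpus size N = 221,097 used below is an assumption read off the 2×2 table on the χ² slide (9 + 75 + 1770 + 219243), not a value stated on this slide:

```python
import math

def t_statistic(c1, c2, c12, N):
    # t for a bigram under the independence null hypothesis (see slide 10).
    x_bar, mu = c12 / N, (c1 / N) * (c2 / N)
    return (x_bar - mu) / math.sqrt(x_bar * (1 - x_bar) / N)

N = 221_097                           # assumed corpus size
print(t_statistic(108, 40, 2, N))     # 'general term'  -> ~1.40 (not > 2.576)
print(t_statistic(84, 1779, 9, N))    # 'true species'  -> ~2.77 (> 2.576)
```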

12 Pearson's Chi-square test
χ² test (general formula): χ² = Σ_i,j (O_ij - E_ij)² / E_ij
–where O_ij / E_ij is the observed/expected count of events i, j
for two-outcomes-only events (2×2 table):
χ² = 221097 × (219243×9 - 75×1770)² / (1779 × 84 × 221013 × 219318) = 103.39 > 7.88
(at .005, thus we can reject the independence assumption)
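A sketch of the 2×2 case in Python, using the standard shortcut N(ad - bc)² over the product of the four marginals (equivalent to summing (O - E)²/E over the four cells); the table values are the 'true species' counts from the slide:

```python
def chi_square_2x2(o11, o12, o21, o22):
    """Pearson's chi-square for a 2x2 contingency table via the
    shortcut N * (ad - bc)^2 / (product of the row and column sums)."""
    n = o11 + o12 + o21 + o22
    num = n * (o11 * o22 - o12 * o21) ** 2
    den = (o11 + o12) * (o21 + o22) * (o11 + o21) * (o12 + o22)
    return num / den

# 9 = 'true species' together, 75 = 'true' only,
# 1770 = 'species' only, 219243 = neither.
print(chi_square_2x2(9, 75, 1770, 219243))   # ~103.4, well above 7.88 (alpha = .005, 1 d.f.)
```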

13 Pointwise Mutual Information
This is NOT the MI as defined in Information Theory
–(IT: the average of the following, not individual values)
...but it might be useful:
I'(a,b) = log2 (p(a,b) / (p(a) p(b))) = log2 (p(a|b) / p(a))
Example (same data):
I'(true, species) = log2 (4.1e-5 / (3.8e-4 × 8.0e-3)) = 3.74
I'(general, term) = log2 (9.0e-6 / (1.8e-4 × 4.9e-4)) = 6.68
measured in bits, but it is difficult to give it an interpretation
used for ranking (~ the null hypothesis tests)
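A corresponding Python sketch, again assuming the corpus size N = 221,097 taken from the χ² table above:

```python
import math

def pmi(c1, c2, c12, N):
    """Pointwise mutual information of a bigram, in bits:
    log2( p(w1,w2) / (p(w1) * p(w2)) )."""
    return math.log2((c12 / N) / ((c1 / N) * (c2 / N)))

N = 221_097                      # assumed corpus size
print(pmi(84, 1779, 9, N))       # I'(true, species)  -> ~3.7 bits
print(pmi(108, 40, 2, N))        # I'(general, term)  -> ~6.7 bits
```

Note that this ranking is roughly the reverse of the t-test result above: the rarer pair 'general term' gets the higher pointwise MI, which illustrates why the score is hard to interpret on its own and is mainly used for ranking.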