1 Natural Language Processing (3b) Zhao Hai 赵海 Department of Computer Science and Engineering Shanghai Jiao Tong University 2010-2011



2 Outline  Lexicons and Lexical Analysis  Collocation

3 Lexicons and Lexical Analysis (254) Collocation (35) Hypothesis Testing (1)  One difficulty that we have glossed over so far is that high frequency and low variance can be accidental.  For example, if the two constituent words of a frequent bigram like new companies are frequently occurring words (as new and companies are), then we expect the two words to co-occur a lot just by chance, even if they do not form a collocation.

4 Lexicons and Lexical Analysis (255) Collocation (36) Hypothesis Testing (2)  What we really want to know is whether two words occur together more often than chance.  Assessing whether or not something is a chance event is one of the classical problems of statistics. It is usually couched in terms of hypothesis testing.

5 Lexicons and Lexical Analysis (256) Collocation (37) Hypothesis Testing (3)  We formulate a null hypothesis $H_0$ that there is no association between the words beyond chance occurrences, and compute the probability p that the event would occur if $H_0$ were true.  We then reject $H_0$ if p is too low (typically if beneath a significance level of p < 0.05, 0.01, 0.005, or 0.001) and retain $H_0$ as possible otherwise.

6 Lexicons and Lexical Analysis (257) Collocation (38) Hypothesis Testing (4)  How can we apply the methodology of hypothesis testing to the problem of finding collocations?  We first need to formulate a null hypothesis which states what should be true if two words do not form a collocation.  For such a free combination of two words we will assume that each of the words $w_1$ and $w_2$ is generated completely independently of the other, and so their chance of coming together is simply given by: $P(w_1 w_2) = P(w_1) P(w_2)$

7 Lexicons and Lexical Analysis (258) Collocation (39) Hypothesis Testing (5)  The model implies that the probability of co-occurrence is just the product of the probabilities of the individual words.  This is a rather simplistic model, and not empirically accurate, but for now we adopt independence as our null hypothesis.

8 Lexicons and Lexical Analysis (259) Collocation (40) The T Test (1)  We need a statistical test that tells us how probable or improbable it is that a certain constellation will occur. A test that has been widely used for collocation discovery is the t test.  The t test looks at the mean and variance of a sample of measurements, where the null hypothesis is that the sample is drawn from a distribution with mean μ.

9 Lexicons and Lexical Analysis (260) Collocation (41) The T Test (2) $t = \frac{\bar{x} - \mu}{\sqrt{s^2 / N}}$ where $\bar{x}$ is the sample mean, $s^2$ is the sample variance, N is the sample size, and μ is the mean of the distribution.  If the t statistic is large enough we can reject the null hypothesis.

10 Lexicons and Lexical Analysis (261) Collocation (42) The T Test (3)  The t test looks at the difference between the observed ($\bar{x}$) and expected (μ) means, scaled by the variance of the data.  It tells us how likely one is to get a sample of that mean and variance (or a more extreme mean and variance) assuming that the sample is drawn from a normal distribution with mean μ.

11 Lexicons and Lexical Analysis (262) Collocation (43) The T Test (4)  For instance, our null hypothesis is that the mean height of a population of men is 158cm.  We are given a sample of 200 men with $\bar{x} = 169$ and $s^2 = 2600$ and want to know whether this sample is from the general population (the null hypothesis) or whether it is from a different population of taller men.

12 Lexicons and Lexical Analysis (263) Collocation (44) The T Test (5)  This gives us the following t according to the above formula: $t = \frac{169 - 158}{\sqrt{2600 / 200}} \approx 3.05$  We can find out exactly how large t has to be by looking up the table of the t distribution.

13 Lexicons and Lexical Analysis (264) Collocation (45) The T Test (6)  If we look up the value of t that corresponds to a confidence level of α = 0.005, we will find 2.576. Since the t we got is larger than 2.576, we can reject the null hypothesis with 99.5% confidence.  So we can say that the sample is not drawn from a population with mean 158cm, and our probability of error is less than 0.5%.
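The height example can be checked with a few lines of Python. This is a minimal sketch: the function name `t_statistic` and the constant `CRITICAL_T` are illustrative names, not part of the slides, and the critical value 2.576 is simply the one quoted above for α = 0.005.

```python
import math

def t_statistic(sample_mean, sample_variance, n, mu):
    """One-sample t statistic: (x_bar - mu) / sqrt(s^2 / N)."""
    return (sample_mean - mu) / math.sqrt(sample_variance / n)

# Values from the height example: mu = 158, x_bar = 169, s^2 = 2600, N = 200.
t_height = t_statistic(sample_mean=169, sample_variance=2600, n=200, mu=158)

# Critical value quoted in the slides for alpha = 0.005.
CRITICAL_T = 2.576

print(round(t_height, 2))     # 3.05
print(t_height > CRITICAL_T)  # True, so the null hypothesis is rejected
```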

14 Lexicons and Lexical Analysis (265) Collocation (46) The T Test (7)  To see how to use the t test for finding collocations, let us compute the t value for new companies.  We think of the text corpus as a long sequence of N bigrams, and the samples are then indicator random variables that take on the value 1 when the bigram of interest occurs, and are 0 otherwise.

15 Lexicons and Lexical Analysis (266) Collocation (47) The T Test (8)  Using maximum likelihood estimates, we can compute the probabilities of new and companies as follows.  In the corpus, new occurs 15,828 times, companies 4,675 times, and there are 14,307,668 tokens overall: $P(\text{new}) = \frac{15828}{14307668}$, $P(\text{companies}) = \frac{4675}{14307668}$

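16 Lexicons and Lexical Analysis (267) Collocation (48) The T Test (9)  The null hypothesis is that occurrences of new and companies are independent: $H_0: P(\text{new companies}) = P(\text{new}) P(\text{companies}) = \frac{15828}{14307668} \times \frac{4675}{14307668} \approx 3.615 \times 10^{-7}$  If the null hypothesis is true, then the process of randomly generating bigrams of words and assigning 1 to the outcome new companies and 0 to any other outcome can be treated as a Bernoulli trial with $p = 3.615 \times 10^{-7}$.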

17 Lexicons and Lexical Analysis (268) Collocation (49) The T Test (10)  The mean for this distribution is $\mu = p = 3.615 \times 10^{-7}$, and the variance is $\sigma^2 = p(1 - p)$, which is approximately p. The approximation holds since for most bigrams p is small.  It turns out that there are actually 8 occurrences of new companies among the bigrams in our corpus. So, for the sample, we have that the sample mean is: $\bar{x} = \frac{8}{14307668} \approx 5.591 \times 10^{-7}$

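18 Lexicons and Lexical Analysis (269) Collocation (50) The T Test (11)  Now we have everything we need to apply the t test: $t = \frac{\bar{x} - \mu}{\sqrt{s^2 / N}} \approx \frac{5.591 \times 10^{-7} - 3.615 \times 10^{-7}}{\sqrt{5.591 \times 10^{-7} / 14307668}} \approx 0.999932$  This t value of 0.999932 is not larger than 2.576, the critical value for α = 0.005. So we cannot reject the null hypothesis that new and companies occur independently and do not form a collocation.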
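The new companies calculation can be reproduced from the counts quoted in the slides. The sketch below assumes, as a simplification, that the number of bigrams equals the number of tokens, and it uses the approximation that the sample variance is about equal to the sample mean. The function name `bigram_t_score` is an illustrative choice.

```python
import math

def bigram_t_score(c_w1, c_w2, c_bigram, n):
    """t score for a bigram under the independence null hypothesis.

    c_w1, c_w2: counts of the two words; c_bigram: count of the bigram;
    n: corpus size (used both as token count and as number of bigrams).
    """
    mu = (c_w1 / n) * (c_w2 / n)   # expected bigram probability under H0
    x_bar = c_bigram / n           # observed bigram probability (sample mean)
    s2 = x_bar                     # Bernoulli variance p(1 - p), approximated by p
    return (x_bar - mu) / math.sqrt(s2 / n)

# Counts quoted in the slides for "new companies".
t_new_companies = bigram_t_score(c_w1=15_828, c_w2=4_675, c_bigram=8, n=14_307_668)

print(round(t_new_companies, 4))  # close to 1.0, well below the critical value 2.576
```

Because the score depends on the bigram count relative to the counts of its component words, the same function can rank bigrams that all have the same raw frequency.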

19 Lexicons and Lexical Analysis (270) Collocation (51) The T Test (12)

20 Lexicons and Lexical Analysis (271) Collocation (52) The T Test (13)  The above table shows t values for ten bigrams that occur exactly 20 times in the corpus.  For the top five bigrams, we can reject the null hypothesis that the component words occur independently for α= 0.005, so these are good candidates for collocations.  The bottom five bigrams fail the test for significance, so we will not regard them as good candidates for collocations.

21 Lexicons and Lexical Analysis (272) Collocation (53) The T Test (14)  Note that a frequency-based method would not be able to rank the ten bigrams since they occur with exactly the same frequency.  We can see that the t test takes into account the number of co-occurrences of the bigram relative to the frequencies of the component words.

22 Lexicons and Lexical Analysis (273) Collocation (54) The T Test (15)  If a high proportion of the occurrences of both words (Ayatollah Ruhollah, videocassette recorder) or at least a very high proportion of the occurrences of one of the words (unsalted) occurs in the bigram, then its t value is high.  This criterion makes intuitive sense.

23 Lexicons and Lexical Analysis (274) Collocation (55) The T Test (16)  The analysis in the table includes some stop words (Note: A stop word is a word that is common and frequently used, such as the, a, for, of, etc.) – without stop words, it is actually hard to find examples that fail significance.  It turns out that most bigrams attested in a corpus occur significantly more often than chance. For 824 out of the 831 bigrams that occurred 20 times in our corpus the null hypothesis of independence can be rejected.

24 Lexicons and Lexical Analysis (275) Collocation (56) The T Test (17)  But we would only classify a fraction as true collocations. The reason for this surprisingly high proportion of possibly dependent bigrams is that language – if compared with a random word generator – is very regular so that few completely unpredictable events happen.  The t test and other statistical tests are most useful as a method for ranking collocations. The level of significance itself is less useful.

25 Lexicons and Lexical Analysis (276) Collocation (57) Mutual Information (1)  The entropy (or self-information) is the average uncertainty of a single random variable: $H(p) = H(X) = -\sum_{x \in \mathcal{X}} p(x) \log_2 p(x)$ Note: Let p(x) be the probability mass function of a random variable X over a discrete set of symbols (or alphabet) $\mathcal{X}$: $p(x) = P(X = x), \; x \in \mathcal{X}$

26 Lexicons and Lexical Analysis (277) Collocation (58) Mutual Information (2)  Entropy measures the amount of information in a random variable. It is normally measured in bits (hence the log to the base 2), but using any other base yields only a linear scaling of results. For example, suppose you are reporting the result of rolling an 8-sided die. Then the entropy is: $H(X) = -\sum_{i=1}^{8} p(i) \log_2 p(i) = -\sum_{i=1}^{8} \frac{1}{8} \log_2 \frac{1}{8} = -\log_2 \frac{1}{8} = \log_2 8 = 3$ bits
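As a quick check of the die example, the entropy of a discrete distribution can be computed directly from its definition. This is a small sketch; the function name `entropy` is an illustrative choice.

```python
import math

def entropy(probs):
    """Entropy in bits: H = -sum p * log2(p), skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Fair 8-sided die: eight outcomes, each with probability 1/8.
die = [1 / 8] * 8
h_die = entropy(die)

print(h_die)  # 3.0
```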

27 Lexicons and Lexical Analysis (278) Collocation (59) Mutual Information (3)  The joint entropy of a pair of discrete random variables X, Y is the amount of information needed on average to specify both their values. It is defined as: $H(X, Y) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(x, y)$

28 Lexicons and Lexical Analysis (279) Collocation (60) Mutual Information (4)  The conditional entropy of a discrete random variable Y given another X, for X, Y ~ p(x, y), expresses how much extra information you still need to supply on average to communicate Y given that the other party knows X: $H(Y|X) = \sum_{x \in \mathcal{X}} p(x) H(Y|X = x) = \sum_{x \in \mathcal{X}} p(x) \left[ -\sum_{y \in \mathcal{Y}} p(y|x) \log p(y|x) \right] = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(y|x)$

29 Lexicons and Lexical Analysis (280) Collocation (61) Mutual Information (5)  There is a chain rule for entropy: $H(X, Y) = H(X) + H(Y|X)$ and, more generally, $H(X_1, \ldots, X_n) = H(X_1) + H(X_2|X_1) + \cdots + H(X_n|X_1, X_2, \ldots, X_{n-1})$  By this chain rule: $H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)$, therefore $H(X) - H(X|Y) = H(Y) - H(Y|X)$
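The chain rule can be verified numerically on a small example. The joint distribution below is hypothetical and chosen only for illustration; the helper names are likewise assumptions, not part of the slides.

```python
import math
from collections import defaultdict

# Hypothetical joint distribution p(x, y), for illustration only.
joint = {("a", 0): 0.25, ("a", 1): 0.25, ("b", 0): 0.5}

def entropy(probs):
    """Entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def marginal_x(joint):
    """Marginal p(x), obtained by summing p(x, y) over y."""
    px = defaultdict(float)
    for (x, _y), p in joint.items():
        px[x] += p
    return dict(px)

def conditional_entropy_y_given_x(joint):
    """H(Y|X) = -sum over (x, y) of p(x, y) * log2 p(y|x)."""
    px = marginal_x(joint)
    return -sum(p * math.log2(p / px[x]) for (x, _y), p in joint.items() if p > 0)

h_xy = entropy(joint.values())                      # joint entropy H(X, Y)
h_x = entropy(marginal_x(joint).values())           # H(X)
h_y_given_x = conditional_entropy_y_given_x(joint)  # H(Y|X)

print(h_xy, h_x + h_y_given_x)  # both sides of the chain rule
```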

30 Lexicons and Lexical Analysis (281) Collocation (62) Mutual Information (6)  This difference is called the mutual information between X and Y.  It is the reduction in uncertainty of one random variable due to knowing about another.  In other words, the amount of information one random variable contains about another.

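31 Lexicons and Lexical Analysis (282) Collocation (63) Mutual Information (7) [Figure: the relationship between the entropies H(X) and H(Y), the joint entropy H(X, Y), the conditional entropies H(X|Y) and H(Y|X), and the mutual information I(X; Y)]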

32 Lexicons and Lexical Analysis (283) Collocation (64) Mutual Information (8)  Mutual information is a symmetric, non-negative measure of the common information in the two variables.  People often think of mutual information as a measure of dependence between variables.  However, it is actually better to think of it as a measure of independence because:

33 Lexicons and Lexical Analysis (284) Collocation (65) Mutual Information (9)  It is 0 only when two variables are independent, but  For two dependent variables, mutual information grows not only with the degree of dependence, but also according to the entropy of the variables.  $I(X; Y) = H(X) - H(X|Y) = H(X) + H(Y) - H(X, Y) = \sum_{x \in \mathcal{X}} p(x) \log \frac{1}{p(x)} + \sum_{y \in \mathcal{Y}} p(y) \log \frac{1}{p(y)} + \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(x, y)$

34 Lexicons and Lexical Analysis (285) Collocation (66) Mutual Information (10) $I(X; Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log \frac{p(x, y)}{p(x) p(y)}$ Since $H(X|X) = 0$, note that: $H(X) = H(X) - H(X|X) = I(X; X)$  This illustrates both why entropy is also called self-information, and how the mutual information between two totally dependent variables is not constant but depends on their entropy.
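Both properties, that mutual information is 0 for independent variables and that I(X; X) equals H(X), can be checked on toy distributions. The sketch below uses two hypothetical joint distributions (two independent fair coins, and a uniform four-symbol variable paired with itself); the function name `mutual_information` is an illustrative choice.

```python
import math
from collections import defaultdict

def mutual_information(joint):
    """I(X; Y) = sum over (x, y) of p(x, y) * log2(p(x, y) / (p(x) p(y))), in bits."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Two independent fair coins: p(x, y) = p(x) p(y) for every pair.
independent = {(x, y): 0.25 for x in "HT" for y in "HT"}

# A uniform four-symbol variable paired with itself (total dependence).
identical = {(s, s): 0.25 for s in "abcd"}

mi_independent = mutual_information(independent)
mi_identical = mutual_information(identical)

print(mi_independent)  # 0.0
print(mi_identical)    # 2.0, which equals H(X) = log2(4)
```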

35 Lexicons and Lexical Analysis (286) Collocation (67) Mutual Information (11)  An information-theoretically motivated measure for discovering interesting collocations is pointwise mutual information (Church et al. (1991), Church & Hanks (1989) and Hindle (1990)).  Fano (1961) originally defined mutual information between particular events x’ and y’, in our case the event is occurrence of particular words.

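36 Lexicons and Lexical Analysis (287) Collocation (68) Mutual Information (12) $I(x', y') = \log_2 \frac{P(x' y')}{P(x') P(y')} = \log_2 \frac{P(x'|y')}{P(x')} = \log_2 \frac{P(y'|x')}{P(y')}$  This type of mutual information is roughly a measure of how much one word tells us about the other.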

37 Lexicons and Lexical Analysis (288) Collocation (69) Mutual Information (13)  These two types of mutual information are quite different creatures.  When we apply this definition to the 10 collocations from the previous table, we get the same ranking as with the t test. See the following table:

38 Lexicons and Lexical Analysis (289) Collocation (70) Mutual Information (14)

39 Lexicons and Lexical Analysis (290) Collocation (71) Mutual Information (15)  As usual, we use maximum likelihood estimates to compute the probabilities.  The mutual information measure tells us the number of bits by which the amount of information we have about the occurrence of Ayatollah at position i in the corpus increases if we are told that Ruhollah occurs at position i + 1.

40 Lexicons and Lexical Analysis (291) Collocation (72) Mutual Information (16)  In other words, we can be much more certain that Ruhollah will occur next if we are told that Ayatollah is the current word.  Unfortunately, this measure of “increased information” is in many cases not a good measure of what an interesting correspondence between two events is.  Consider the two examples in the following table of counts of word correspondences between French and English sentences in the Hansard corpus, an aligned corpus of debates of the Canadian parliament.

41 Lexicons and Lexical Analysis (292) Collocation (73) Mutual Information (17) Note: The χ² test is Pearson's chi-square test. The χ² statistic sums the squared differences between observed and expected values over all cells of the table, scaled by the magnitude of the expected values: $\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$
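As an illustration of Pearson's χ² statistic on a 2×2 table, the sketch below builds the table for new companies from the counts quoted in the t test slides, since the Hansard counts are not reproduced in this transcript. The function name `chi_square_2x2` and the cell layout are assumptions made for the example; 3.841 is the standard critical value for one degree of freedom at α = 0.05.

```python
def chi_square_2x2(o11, o12, o21, o22):
    """Pearson chi-square for a 2x2 table: sum of (O - E)^2 / E over all cells."""
    observed = ((o11, o12), (o21, o22))
    n = o11 + o12 + o21 + o22
    row_totals = (o11 + o12, o21 + o22)
    col_totals = (o11 + o21, o12 + o22)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n  # expected count under independence
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

N = 14_307_668                  # corpus size quoted in the slides
C_NEW, C_COMPANIES, C_BIGRAM = 15_828, 4_675, 8

chi2_new_companies = chi_square_2x2(
    C_BIGRAM,                           # new followed by companies
    C_NEW - C_BIGRAM,                   # new followed by another word
    C_COMPANIES - C_BIGRAM,             # companies preceded by another word
    N - C_NEW - C_COMPANIES + C_BIGRAM  # neither word
)

CRITICAL_CHI2 = 3.841  # one degree of freedom, alpha = 0.05
print(round(chi2_new_companies, 2), chi2_new_companies > CRITICAL_CHI2)
```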

42 Lexicons and Lexical Analysis (293) Collocation (74) Mutual Information (18)  The reason that house frequently appears in translations of French sentences containing chambre and communes is that the most common use of house is the phrase House of Commons which corresponds to Chambre de communes in French.  But it is easy to see that communes is a worse match for house than chambre since most occurrences of house occur without communes on the French side.

43 Lexicons and Lexical Analysis (294) Collocation (75) Mutual Information (19)  The χ 2 test is able to infer the correct correspondence whereas mutual information gives preference to the incorrect pair (communes, house).  The word communes in the French makes it more likely that house occurred in the English than chambre does.  The higher mutual information value for communes reflects the fact that communes causes a larger decrease in uncertainty.

44 Lexicons and Lexical Analysis (295) Collocation (76) Mutual Information (20)  In contrast, the χ 2 is a direct test of probabilistic dependence, which in this context we can interpret as the degree of association between two words and hence as a measure of their quality as translation pairs and collocations.  The next table shows a second problem with using mutual information for finding collocations.

45 Lexicons and Lexical Analysis (296) Collocation (77) Mutual Information (21)

46 Lexicons and Lexical Analysis (297) Collocation (78) Mutual Information (22)  We show ten bigrams that occur exactly once in the first 1000 documents of the reference corpus and their mutual information score based on the 1000 documents.  The right half of the table shows the mutual information score based on the entire reference corpus (about 23,000 documents).

47 Lexicons and Lexical Analysis (298) Collocation (79) Mutual Information (23)  The larger corpus of 23,000 documents makes some better estimates possible, which in turn leads to a slightly better ranking.  The bigrams marijuana growing and new converts (arguably collocations) have moved up and Reds survived (definitely not a collocation) has moved down.

48 Lexicons and Lexical Analysis (299) Collocation (80) Mutual Information (24)  However, what is striking is that even after going to a corpus about 23 times larger, 6 of the bigrams still occur only once. As a consequence, they have inaccurate maximum likelihood estimates and artificially inflated mutual information scores.  None of the 6 are collocations, and we would prefer a measure which ranks them accordingly.

49 Lexicons and Lexical Analysis (300) Collocation (81) Mutual Information (25)  None of the measures we have seen works very well for low-frequency events.  But there is evidence that sparseness is a particularly difficult problem for mutual information.  Consider two extreme cases: perfect dependence of the occurrences of the two words, and perfect independence of their occurrences.

50 Lexicons and Lexical Analysis (301) Collocation (82) Mutual Information (26)  For perfect dependence (they only occur together) we have: $I(x, y) = \log \frac{P(xy)}{P(x) P(y)} = \log \frac{P(x)}{P(x) P(y)} = \log \frac{1}{P(y)}$ That is, among perfectly dependent bigrams, as they get rarer, their mutual information increases.

51 Lexicons and Lexical Analysis (302) Collocation (83) Mutual Information (27)  For perfect independence (the occurrence of one does not give us any information about the occurrence of the other) we have: $I(x, y) = \log \frac{P(xy)}{P(x) P(y)} = \log \frac{P(x) P(y)}{P(x) P(y)} = \log 1 = 0$
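The two extreme cases can be made concrete with a short numerical sketch. The probabilities below are hypothetical values chosen for illustration, and `pmi_from_probs` is an illustrative name.

```python
import math

def pmi_from_probs(p_xy, p_x, p_y):
    """Pointwise mutual information in bits from probabilities."""
    return math.log2(p_xy / (p_x * p_y))

# Perfect dependence: the words only occur together, so P(xy) = P(x) = P(y) = p.
# The score is then log2(1 / p), which grows as the bigram gets rarer.
dependent_scores = [pmi_from_probs(p, p, p) for p in (1e-3, 1e-5, 1e-7)]

# Perfect independence: P(xy) = P(x) P(y), so the score is 0 whatever the frequency.
p_x, p_y = 0.01, 0.02
independent_score = pmi_from_probs(p_x * p_y, p_x, p_y)

print([round(s, 2) for s in dependent_scores])
print(independent_score)  # 0.0
```

The first list increases as p shrinks, which is the frequency dependence that makes the score a poor measure of dependence.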

52 Lexicons and Lexical Analysis (303) Collocation (84) Mutual Information (28)  We can say that mutual information is a good measure of independence. Values close to 0 indicate independence (independent of frequency).  But it is a bad measure of dependence because for dependence the score depends on the frequency of the individual words.

53 Lexicons and Lexical Analysis (304) Collocation (85) Mutual Information (29)  Other things being equal, bigrams composed of low-frequency words will receive a higher score than bigrams composed of high-frequency words.  That is the opposite of what we would want a good measure to do, since higher frequency means more evidence and we would prefer a higher rank for bigrams for whose interestingness we have more evidence.

54 Lexicons and Lexical Analysis (305) Collocation (86) Mutual Information (30)  One solution that has been proposed for this is to use a cutoff and to only look at words with a frequency of at least 3. However, such a move does not solve the underlying problem, but only ameliorates its effects.  Since pointwise mutual information does not capture the intuitive notion of an interesting collocation very well, it is often not used in practical applications even when it is made available.

55 Lexicons and Lexical Analysis (306) Collocation (87) Mutual Information (31)  The definition of mutual information used here is common in corpus linguistic studies, but is less common in Information Theory.  It is important to check what a mathematical concept is a formalization of.  As we have seen, pointwise mutual information is of limited utility for acquiring the types of linguistic properties we have looked at here.

56 Lexicons and Lexical Analysis (307) Collocation (88) Summary (1)  There are actually different definitions of the notion of collocation.  For instance, a sequence of two or more consecutive words, that has characteristics of a syntactic and semantic unit, and whose exact and unambiguous meaning or connotation cannot be derived directly from the meaning or connotation of its components (Choueka, 1988).

57 Lexicons and Lexical Analysis (308) Collocation (89) Summary (2)  The following criteria are typical of linguistic treatments of collocations. Non-compositionality is the main one we have relied on here.  Non-compositionality. The meaning of a collocation is not a straightforward composition of the meanings of its parts. Either the meaning is completely different from the free combination (such as idioms) or there is a connotation or added element of meaning that cannot be predicted from the parts.

58 Lexicons and Lexical Analysis (309) Collocation (90) Summary (3)  Non-substitutability. We cannot substitute near-synonyms for the components of a collocation. For example, we can't say yellow wine instead of white wine even though yellow is as good a description of the color of white wine as white is (it is kind of a yellowish white).

59 Lexicons and Lexical Analysis (310) Collocation (91) Summary (4)  Non-modifiability. Many collocations cannot be freely modified with additional lexical material or through grammatical transformations. This is especially true for frozen expressions like idioms. For example, we can't modify frog in to get a frog in one's throat (to have throat discomfort) into to get an ugly frog in one's throat, although usually nouns like frog can be modified by adjectives like ugly.

60 Lexicons and Lexical Analysis (311) Collocation (92) Summary (5)  A nice way to test whether a combination is a collocation is to translate it into another language.  If we cannot translate the combination word by word, then that is evidence that we are dealing with a collocation.  For example, translating make a decision into French one word at a time we get faire une décision which is incorrect.

61 Lexicons and Lexical Analysis (312) Collocation (93) References
K. W. Church and P. Hanks. Word Association Norms, Mutual Information and Lexicography. Computational Linguistics, Vol. 16, No. 1.
T. Fontenelle et al. Survey of Collocation Extraction Tools. Technical Report, University of Liege, Liege, Belgium.
J. Hodges et al. An Automated System that Assists in the Generation of Document Indexes. Natural Language Engineering, No. 2.

62 Lexicons and Lexical Analysis (313) Assignment (8) 1. As we pointed out previously, almost all bigrams occur significantly more often than chance if a stop list is used for prefiltering. Verify that there is a large proportion of bigrams that occur less often than chance if we do not filter out function words. Note: A function word is a word which has no lexical meaning, and whose sole function is to express grammatical relationships, such as prepositions, articles, and conjunctions.

63 Lexicons and Lexical Analysis (314) Assignment (8) 2. What is the difference between I(x, y) and I(X; Y)?