1 Lecture 8 Measures of association: chi square test, mutual information, binomial distribution and log likelihood ratio
2 Experiments in Multidocument Summarization (SNM’02). A summarization system based on a range of features. Raises issues we have not discussed up to now: non-extractive techniques and ordering of information.
3 Lead values feature. Lead sentences of news articles can often make excellent brief summaries, but for multi-document summaries there are several first sentences, so it is difficult to choose! They are information dense. Can we find very informative words based on this observation? Used Binomial test to decide if
4 Sample lead words
5 Verb specificity. Compare “arrest” with “do” or “be”. Often, given subjects are very strongly associated with a verb: actors appear in movies; singers release an album. Compute associations between subject nouns and verbs. Use mutual association measure.
6 Concept sets. Word frequencies are not that reliable, even when stemming is used. Synonyms, hypernyms and hyponyms from WordNet.
7 Other features
  Location: a negative value that penalizes sentences that appear late in the document.
  Publication date: additional value for the most recent documents, on the assumption that users will want the most up-to-date information.
  Target: indicates the presence of the central personage in the document cluster, if one exists.
  Length: a penalty for sentences that are below a minimum (15 words) or above a maximum (30 words). Short sentences often require some introduction or reference resolution, or else are a kind of interjection. Long sentences can cover multiple thoughts that are often found elsewhere in the document cluster in single sentences.
  Others: indicates the presence of any named entity, weighted by the frequency of that entity across all documents.
  Pronoun: a negative value on sentences that have pronouns at the beginning of the sentence.
8 Other issues. Sentence ordering: how to present the selected information? Even good choices might be hard to understand if they are presented in the wrong order. Imagine a newspaper article with all sentences randomly permuted. Noun phrases: depend on the context.
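One way to make the subject-verb association concrete is pointwise mutual information, PMI(s, v) = log2( P(s, v) / (P(s) · P(v)) ). The slide does not say which formula was used, so treat the sketch below as an assumption; the function name `pmi_scores` and the toy (subject, verb) pairs are made up for illustration.

```python
import math
from collections import Counter


def pmi_scores(pairs):
    """Pointwise mutual information for each observed (subject, verb) pair.

    `pairs` is a list of (subject, verb) tuples; probabilities are
    estimated by relative frequency over that list.
    """
    total = len(pairs)
    pair_counts = Counter(pairs)
    subject_counts = Counter(subject for subject, _ in pairs)
    verb_counts = Counter(verb for _, verb in pairs)

    scores = {}
    for (subject, verb), count in pair_counts.items():
        p_pair = count / total
        p_subject = subject_counts[subject] / total
        p_verb = verb_counts[verb] / total
        scores[(subject, verb)] = math.log2(p_pair / (p_subject * p_verb))
    return scores


# Toy data, invented for this example (not taken from the lecture).
pairs = [
    ("actor", "appear"), ("actor", "appear"), ("actor", "be"),
    ("singer", "release"), ("singer", "release"), ("singer", "be"),
    ("film", "be"), ("album", "be"),
]
scores = pmi_scores(pairs)
```

In this toy data, a specific verb such as "appear" scores higher with "actor" than the generic "be" does, which is the intuition behind the verb specificity feature.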
9 Extractive summary
10 Partly modified summary
11 Measures of association. For supervised learning, they can help us determine which features are predictive of the distinctions we want to make. Chi-square test from last lecture. Words that are likely to appear in the first sentence rather than anywhere else. Verbs that are strongly associated with a given subject. A variety of measures are defined in the Chapter 5 reading.
12 χ² statistic (CHI) (pronounced “kai square”). A commonly used method of comparing proportions. Measures the lack of independence between a term and a category.
13 χ² statistic (CHI)
Is “jaguar” a good predictor for the “auto” class?
                Term = jaguar    Term ≠ jaguar
Class = auto          2               500
Class ≠ auto          3              9500
We want to compare the observed distribution above with the null hypothesis that jaguar and auto are independent.
14 χ² statistic (CHI)
Under the null hypothesis (jaguar and auto independent): how many co-occurrences of jaguar and auto do we expect?
If independent: Pr(j, a) = Pr(j) · Pr(a)
So there would be N · Pr(j, a), i.e. N · Pr(j) · Pr(a), co-occurrences of “jaguar” and “auto”.
Pr(j) = (2 + 3)/N; Pr(a) = (2 + 500)/N; N = 2 + 500 + 3 + 9500 = 10005
N · (5/N) · (502/N) = 2510/N = 2510/10005 ≈ 0.25
                Term = jaguar    Term ≠ jaguar
Class = auto          2               500
Class ≠ auto          3              9500
15 χ² statistic (CHI)
Under the null hypothesis (jaguar and auto independent): how many co-occurrences of jaguar and auto do we expect?
                Term = jaguar    Term ≠ jaguar
Class = auto        2 (0.25)          500
Class ≠ auto        3                9500
f_o = observed count; f_e = expected count (in parentheses)
16 χ² statistic (CHI)
Under the null hypothesis (jaguar and auto independent): how many co-occurrences of jaguar and auto do we expect?
                Term = jaguar    Term ≠ jaguar
Class = auto        2 (0.25)       500 (502)
Class ≠ auto        3 (4.75)      9500 (9498)
f_o = observed count; f_e = expected count (in parentheses)
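17 χ² statistic (CHI)
χ² is interested in (f_o − f_e)² / f_e summed over all table entries:
χ² = Σ (f_o − f_e)² / f_e = (2 − 0.25)²/0.25 + (500 − 502)²/502 + (3 − 4.75)²/4.75 + (9500 − 9498)²/9498 ≈ 12.9
The null hypothesis is rejected with confidence .999, since 12.9 > 10.83 (the value for .999 confidence).
                Term = jaguar    Term ≠ jaguar
Class = auto        2 (0.25)       500 (502)
Class ≠ auto        3 (4.75)      9500 (9498)
f_o = observed count; f_e = expected count (in parentheses)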
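The calculation on this slide can be sketched in a few lines of Python. The function names `expected_counts` and `chi_square` are hypothetical, and the table layout (rows are classes, columns are term present/absent) is an assumption made for the example.

```python
def expected_counts(observed):
    """Expected cell counts under independence for a table of observed counts.

    Each expected count is (row total * column total) / N.
    """
    n = sum(sum(row) for row in observed)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Sum of (f_o - f_e)^2 / f_e over all cells of the table."""
    expected = expected_counts(observed)
    return sum(
        (f_o - f_e) ** 2 / f_e
        for obs_row, exp_row in zip(observed, expected)
        for f_o, f_e in zip(obs_row, exp_row)
    )


# Rows: class = auto, class != auto. Columns: term = jaguar, term != jaguar.
observed = [[2, 500], [3, 9500]]
expected = expected_counts(observed)
statistic = chi_square(observed)
```

With unrounded expected counts the statistic comes out near 12.85 rather than 12.9; the slide's figure uses the rounded expected counts from the table. Either way it exceeds 10.83, the .999 critical value for one degree of freedom.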
18 χ² statistic (CHI)
There is a simpler formula for χ² in the 2×2 case:
χ² = N · (AD − CB)² / ((A + C)(B + D)(A + B)(C + D))
N = A + B + C + D
A = #(t, c)     C = #(¬t, c)
B = #(t, ¬c)    D = #(¬t, ¬c)
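For a 2×2 table the shortcut is usually written χ² = N · (AD − CB)² / ((A + C)(B + D)(A + B)(C + D)). The sketch below assumes that form and checks it against the cell-by-cell sum on the jaguar/auto counts; the function names are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Shortcut chi-square for a 2x2 table.

    a = #(t, c), b = #(t, not c), c = #(not t, c), d = #(not t, not c).
    """
    n = a + b + c + d
    return n * (a * d - c * b) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))


def chi_square_cellwise(a, b, c, d):
    """Reference version: sum of (f_o - f_e)^2 / f_e over the four cells."""
    n = a + b + c + d
    cells = [
        (a, (a + b) * (a + c) / n),  # term present, class present
        (b, (a + b) * (b + d) / n),  # term present, class absent
        (c, (c + d) * (a + c) / n),  # term absent, class present
        (d, (c + d) * (b + d) / n),  # term absent, class absent
    ]
    return sum((f_o - f_e) ** 2 / f_e for f_o, f_e in cells)


# jaguar/auto example: A = 2, B = 3, C = 500, D = 9500
shortcut = chi_square_2x2(2, 3, 500, 9500)
reference = chi_square_cellwise(2, 3, 500, 9500)
```

The two values agree up to floating-point error, so the shortcut avoids computing the expected counts explicitly.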
19 Finding translation equivalents
20 Binomial distribution. k: number of “successes”; n: number of trials; x: probability of success.
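With this notation, the probability of exactly k successes in n trials is b(k; n, x) = C(n, k) · x^k · (1 − x)^(n − k). A minimal sketch follows; `binomial_pmf` and `binomial_tail` are hypothetical helper names, and using the upper tail to test whether a word shows up in lead sentences more often than chance is an assumption about how a binomial test might be applied, not a description of the SNM’02 system.

```python
from math import comb


def binomial_pmf(k, n, x):
    """Probability of exactly k successes in n independent trials,
    each with success probability x."""
    return comb(n, k) * x ** k * (1 - x) ** (n - k)


def binomial_tail(k, n, x):
    """Probability of k or more successes in n trials (upper tail)."""
    return sum(binomial_pmf(i, n, x) for i in range(k, n + 1))


# Two successes in four fair coin flips: C(4, 2) * 0.5^4
p_two_of_four = binomial_pmf(2, 4, 0.5)  # 0.375
```

A small upper-tail probability means that seeing k or more successes would be unlikely if the true success rate were x.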
21 Log likelihood ratio test
22 Log likelihood ratio test
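A sketch of one standard formulation of the test, Dunning's likelihood ratio built from binomial likelihoods, which may differ in detail from the form used in the lecture. It compares the hypothesis that an event occurs at the same rate in two samples (one shared probability p) against the hypothesis that the rates differ (p1 and p2). The function names and the reuse of the jaguar/auto counts are assumptions for illustration.

```python
import math


def _xlogy(x, y):
    """x * log(y), with the convention that 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def log_binomial_likelihood(k, n, p):
    """log of p^k * (1 - p)^(n - k).

    The binomial coefficient is left out because it cancels in the ratio.
    """
    return _xlogy(k, p) + _xlogy(n - k, 1.0 - p)


def log_likelihood_ratio(k1, n1, k2, n2):
    """-2 log(lambda) for k1 successes in n1 trials vs. k2 successes in n2 trials."""
    p1 = k1 / n1                # rate in the first sample
    p2 = k2 / n2                # rate in the second sample
    p = (k1 + k2) / (n1 + n2)   # shared rate under the null hypothesis
    log_lambda = (
        log_binomial_likelihood(k1, n1, p)
        + log_binomial_likelihood(k2, n2, p)
        - log_binomial_likelihood(k1, n1, p1)
        - log_binomial_likelihood(k2, n2, p2)
    )
    return -2.0 * log_lambda


# "auto" documents among the 5 that contain "jaguar" vs. among the 10000 that do not
llr = log_likelihood_ratio(2, 5, 500, 10000)
```

Under the null hypothesis, −2 log λ is asymptotically χ² distributed with one degree of freedom, so it can be compared against the same critical values as the χ² statistic.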