Probabilistic Text Structuring: Experiments with Sentence Ordering
Mirella Lapata, Department of Computer Science, University of Sheffield, UK (ACL 2003)
2/23 Abstract Ordering information is a critical task for natural language generation applications. We describe a model that learns constraints on sentence order from a corpus of domain-specific texts and an algorithm that yields the most likely order among several alternatives. We evaluate the automatically generated orderings against authored texts and against orderings produced by human subjects. We also assess their appropriateness for multi-document summarization.
3/23 Introduction Structuring a set of facts into a coherent text is a non-trivial task which has received much attention in the area of concept-to-text generation. The problem of finding an acceptable ordering does not arise solely in concept-to-text generation but also in the emerging field of text-to-text generation. Example applications are single- and multi-document summarization as well as question answering.
4/23 Introduction Barzilay et al. (2002) address the problem of information ordering in multi-document summarization and propose two naïve algorithms:
Majority ordering: select the most frequent orders across input documents. Issue: conflicting orders such as (Th1, Th2, Th3) vs. (Th3, Th1).
Chronological ordering: order facts according to publication date. Issue: event switching.
Based on a human study, Barzilay et al. further proposed an algorithm that first identifies topically related groups of sentences (e.g., lexical chains) and then orders them according to chronological information.
5/23 Introduction In this paper, we introduce an unsupervised probabilistic model for text structuring that learns ordering constraints. Sentences are represented by a set of informative features that can be automatically extracted without recourse to manual annotation. We also propose an algorithm that constructs an acceptable ordering rather than the best one. We propose an automatic method of evaluating the generated orders by measuring their closeness to, or distance from, a gold standard.
6/23 Learning to Order The method The task of predicting the next sentence S_n depends on its n−1 previous sentences. We simplify by assuming that the probability of any given sentence is determined only by the sentence immediately preceding it:
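The bigram-style factorization implied here is shown as an equation image on the original slide and is not part of this transcript; the following is a reconstruction from the surrounding description:

```latex
% Probability of a text T consisting of sentences S_1 ... S_n,
% assuming each sentence depends only on its predecessor
P(T) = P(S_1)\,\prod_{i=2}^{n} P(S_i \mid S_{i-1})
```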
7/23 Learning to Order We therefore estimate P(S_i | S_{i-1}) from features that express the structure and content of the two sentences. We further assume that these features are independent and that P(S_i | S_{i-1}) can be estimated from the pairs of features in the Cartesian product of S_i and S_{i-1}.
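Under this independence assumption the conditional probability decomposes over the feature pairs; the equation is again an image on the original slide, so this is a reconstruction consistent with the notation used on the next slide:

```latex
% a_{<i,j>} denotes the j-th feature of sentence S_i
P(S_i \mid S_{i-1}) =
  \prod_{(a_{\langle i,j\rangle},\, a_{\langle i-1,k\rangle}) \,\in\, S_i \times S_{i-1}}
  P(a_{\langle i,j\rangle} \mid a_{\langle i-1,k\rangle})
```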
8/23 Learning to Order The probability P(a_⟨i,j⟩ | a_⟨i−1,k⟩) is estimated by relative frequency (see the reconstructed formula below). To illustrate with an example: the probability of S_3 given S_2 is estimated from the feature pairs P(h|e), P(h|f), P(h|g), P(i|e), P(i|f), P(i|g), where, for instance, P(h|e) = f(h,e)/f(e) = 1/6 = 0.16.
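The estimation formula itself appears as an image on the original slide; a relative-frequency form consistent with the worked example P(h|e) = f(h,e)/f(e) would be:

```latex
% f(x, y): co-occurrence count of feature x (from S_i) with feature y (from S_{i-1});
% f(y): count of feature y in the training corpus
P(a_{\langle i,j\rangle} \mid a_{\langle i-1,k\rangle}) =
  \frac{f(a_{\langle i,j\rangle},\, a_{\langle i-1,k\rangle})}{f(a_{\langle i-1,k\rangle})}
```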
9/23 Learning to Order Determining an order The set of possible orders can be represented as a complete graph, where the set of vertices V is equal to the set of sentences S and each edge u → v has as its weight the probability P(v|u). Finding the optimal path through this graph is an NP-complete problem. Fortunately, Cohen et al. (1999) propose an approximate solution which can be easily modified for our task.
10/23 Learning to Order The algorithm starts by assigning each vertex v ∈ V a probability (the product of the probabilities of its features). The greedy algorithm then picks the node with the highest probability and orders it ahead of the other nodes. The selected node and its incident edges are deleted from the graph, and each remaining node is assigned the conditional probability of seeing it given the node just selected. The node which yields the highest conditional probability is selected and ordered next. The process is repeated until the graph is empty.
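A minimal Python sketch of this greedy procedure, assuming the per-sentence probabilities and the pairwise probabilities P(v|u) have already been computed from the feature model (function and variable names are illustrative, not from the paper):

```python
def greedy_order(sentences, init_prob, pair_prob):
    """Approximate highest-probability ordering (greedy search after Cohen et al., 1999).

    sentences: list of sentence identifiers
    init_prob: dict mapping sentence -> its unconditional probability
               (product of its feature probabilities)
    pair_prob: dict mapping (prev, nxt) -> P(nxt | prev)
    """
    remaining = set(sentences)
    # Pick the sentence with the highest unconditional probability first.
    current = max(remaining, key=lambda s: init_prob[s])
    order = [current]
    remaining.remove(current)

    # Repeatedly pick the sentence most likely to follow the one just placed.
    while remaining:
        current = max(remaining, key=lambda s: pair_prob.get((order[-1], s), 0.0))
        order.append(current)
        remaining.remove(current)
    return order


# Toy usage with made-up probabilities for three sentences
if __name__ == "__main__":
    init = {"S1": 0.2, "S2": 0.05, "S3": 0.01}
    pairs = {("S1", "S2"): 0.3, ("S1", "S3"): 0.1,
             ("S2", "S3"): 0.4, ("S2", "S1"): 0.05,
             ("S3", "S1"): 0.02, ("S3", "S2"): 0.1}
    print(greedy_order(["S1", "S2", "S3"], init, pairs))  # -> ['S1', 'S2', 'S3']
```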
11/23 Learning to Order As an example (the worked example appears as a figure on the original slide and is not reproduced in this transcript).
12/23 Parameter Estimation The model was trained on the BLLIP corpus (30M words), a collection of texts from the Wall Street Journal (1987–89). The average story length is 19.2 sentences; 71.3% of the texts are less than 50 sentences long.
13/23 Parameter Estimation The corpus is distributed in a Treebank-style machine-parsed version which was produced with Charniak's (2000) parser. We also obtained a dependency-style version of the corpus using MINIPAR (Lin, 1998). From these two different versions of the BLLIP corpus the following features were extracted: verbs, nouns, and dependencies.
14/23 Parameter Estimation Verbs We capture the lexical inter-dependencies between sentences by focusing on verbs and their precedence relationships in the corpus. From the Treebank parses we extracted the verbs contained in each sentence, in two variants:
A lemmatized version: verbs are reduced to their base forms, e.g., in Figure 3(1): say, will, be, ask, and approve.
A non-lemmatized version: tense-related information is preserved, e.g., in Figure 3(1): said, will be asked, to approve.
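A minimal sketch of this two-variant verb extraction, assuming sentences are already POS-tagged; the tag set, the toy lemma table, and the function name are illustrative stand-ins (the slide's non-lemmatized variant also keeps verb complexes together, which this per-token simplification does not attempt):

```python
# Toy illustration of lemmatized vs. non-lemmatized verb features.
# Assumes Penn Treebank-style POS tags; LEMMAS is a stand-in for a real lemmatizer.
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ", "MD"}
LEMMAS = {"said": "say", "asked": "ask", "will": "will", "be": "be", "approve": "approve"}

def verb_features(tagged_sentence, lemmatize=True):
    """Return the verb features of one sentence.

    tagged_sentence: list of (token, pos_tag) pairs
    lemmatize: if True, reduce verbs to their base forms
    """
    verbs = [tok for tok, tag in tagged_sentence if tag in VERB_TAGS]
    if lemmatize:
        return [LEMMAS.get(v.lower(), v.lower()) for v in verbs]
    return verbs

# Fragment of sentence (1) from Figure 3, tagged by hand for illustration
sent = [("said", "VBD"), ("will", "MD"), ("be", "VB"), ("asked", "VBN"), ("approve", "VB")]
print(verb_features(sent, lemmatize=True))   # ['say', 'will', 'be', 'ask', 'approve']
print(verb_features(sent, lemmatize=False))  # ['said', 'will', 'be', 'asked', 'approve']
```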
15/23 Parameter Estimation Nouns We operationalize entity-based coherence for text-to-text generation by simply keeping track of the nouns attested in a sentence, without taking personal pronouns into account. We extracted nouns from a lemmatized version of the Treebank-style parsed corpus. In the case of noun compounds, only the compound head was taken into account. A small set of rules was used to identify organizations, person names, and locations spanning more than one word. A back-off model was used to tackle unknown words. Examples in sentence (1) of Figure 3: Laidlaw Transportation Ltd., shareholder, Dec 7, meeting, change, name, and Laidlaw Inc.; in sentence (2): company, name, business, 1984, sale, and operation.
16/23 Parameter Estimation Dependencies The noun and verb features do not capture the structure of the sentences to be ordered. The dependencies were obtained from the output of MINIPAR and are represented as triples consisting of a head, a relation, and a modifier (e.g., N:mod:A). Three dependency feature sets were considered: verbs (49 types), nouns (52 types), and verbs plus nouns (101 types), keeping only dependencies with a frequency larger than one per million.
17/23 Experiments Evaluation Metric Kendall's τ is based on the number of inversions in the rankings and is defined below, where N is the number of objects (i.e., sentences) being ranked and the number of inversions is the number of interchanges of consecutive elements necessary to arrange them in their natural order. Example: τ(M1, M2) = 1 − 8/45 = 0.822; τ(M1, M3) = 1 − 34/45 = 0.244.
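The definition itself is an equation image on the original slide; the standard formulation, consistent with the worked numbers above (with N = 10 rankings, N(N−1)/2 = 45, so 4 inversions give 1 − 2·4/45 = 0.822), is:

```latex
% Kendall's tau between two rankings of the same N items
\tau = 1 - \frac{2 \times (\text{number of inversions})}{N(N-1)/2}
```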
18/23 Experiments Experiment 1: Ordering Newswire Texts The model was trained on the BLLIP corpus and tested on 20 held-out, randomly selected unseen texts (average length 15.3 sentences). The ordered output was compared against the original authored text using τ; differences were assessed with an ANOVA test.
19/23 Experiments Experiment 2: Human Evaluation We compare our model's performance against human judges. Twelve texts were randomly selected from the 20 texts in our test data and were presented to subjects with the order of their sentences scrambled. Each participant (137 volunteers, 33 per text) saw three texts randomly chosen from the pool of 12 and was asked to reorder the sentences so as to produce a coherent text. Differences were again assessed with an ANOVA test.
20/23 Experiments Experiment 3: Summarization Barzilay et al. (2002) collected ten sets of articles, each consisting of two to three articles reporting the same event, and simulated MULTIGEN by manually selecting the sentences to be included in the final summary. Ten subjects provided orders for each summary, which had an average length of 8.8 sentences. We simulated the participants' task by using the model to produce an order for each candidate summary and then compared the difference between the orderings generated by the model and by the participants. Note that the model was trained on the BLLIP corpus, whereas the sentences to be ordered were taken from news articles describing the same event.
21/23 Experiments Experiment 3: Summarization Not only were the news articles unseen, but their syntactic structure was also unfamiliar to the model. Differences were again assessed with an ANOVA test.
22/23 Discussion In this paper, we proposed a data-intensive approach to text coherence where constraints on sentence ordering are learned from a corpus of domain-specific texts. We experimented with different feature encodings and showed that lexical and syntactic information is important for the ordering task. Our results indicate that the model can successfully generate orders for texts taken from the corpus on which it is trained. The model also compares favorably with human performance on single- and multiple-document ordering tasks.
23/23 Discussion Future work:
Our evaluation metric only measures order similarities or dissimilarities; how should coherence itself be evaluated?
Would a trigram model perform better than the proposed bigram model?
The greedy algorithm implements a search procedure with a beam of width one; would a beam of width two or three help?
Introduce features that express semantic similarities across documents by relying on WordNet or on automatic clustering methods.