Probabilistic Text Structuring: Experiments with Sentence Ordering Mirella Lapata Department of Computer Science University of Sheffield, UK (ACL 2003)


2/23 Abstract Ordering information is a critical task for natural language generation applications. We describe a model that learns constraints on sentence order from a corpus of domain-specific texts and an algorithm that yields the most likely order among several alternatives. We evaluate the automatically generated orderings against authored texts and against human subjects. We also assess their appropriateness for multi-document summarization.

3/23 Introduction Structuring a set of facts into a coherent text is a non-trivial task that has received much attention in the area of concept-to-text generation. The problem of finding an acceptable ordering does not arise solely in concept-to-text generation but also in the emerging field of text-to-text generation. Examples of applications are single- and multi-document summarization as well as question answering.

4/23 Introduction Barzilay et al. (2002) address the problem of information ordering in multi-document summarization and propose two naïve algorithms:
- Majority ordering: select the most frequent orders across input documents. Issue: inconsistent orders across documents, e.g., (Th1, Th2, Th3) vs. (Th3, Th1)
- Chronological ordering: order facts according to publication date. Issue: event switching
Based on a human study, Barzilay et al. further proposed an algorithm that first identifies topically related groups of sentences (e.g., via lexical chains) and then orders them according to chronological information.

5/23 Introduction In this paper, we introduce an unsupervised probabilistic model for text structuring that learns ordering constraints. Sentences are represented by a set of informative features that can be automatically extracted without recourse to manual annotation. We also propose an algorithm that constructs an acceptable ordering rather than the best one. Finally, we propose an automatic method of evaluating the generated orders by measuring their closeness to, or distance from, a gold standard.

6/23 Learning to Order The method: the task of predicting the next sentence S_i depends on its i-1 previous sentences, P(T) = ∏ P(S_i | S_1, …, S_{i-1}). We simplify by assuming the probability of any given sentence is determined only by its previous sentence: P(T) ≈ ∏ P(S_i | S_{i-1})

7/23 Learning to Order We will therefore estimate P(S_i | S_{i-1}) from features that express the sentences' structure and content. We further assume that these features are independent, so that P(S_i | S_{i-1}) can be estimated from the feature pairs (a_{i,j}, a_{i-1,k}) in the Cartesian product of the feature sets of S_i and S_{i-1}.

8/23 Learning to Order The probability P(a_{i,j} | a_{i-1,k}) is estimated from corpus frequencies as f(a_{i,j}, a_{i-1,k}) / f(a_{i-1,k}). To illustrate with an example: if S2 has features {e, f, g} and S3 has features {h, i}, then P(S3 | S2) is estimated from P(h|e), P(h|f), P(h|g), P(i|e), P(i|f), P(i|g), e.g., P(h|e) = f(h,e)/f(e) = 1/6 = 0.16
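The frequency-based estimate above can be sketched in Python. The training pairs below are hypothetical, chosen so that feature e occurs six times in a previous sentence and is followed by h exactly once, mirroring the slide's P(h|e) = f(h,e)/f(e) = 1/6:

```python
from collections import Counter

def estimate_feature_probs(adjacent_feature_pairs):
    """Estimate P(a_next | a_prev) = f(a_prev, a_next) / f(a_prev) from
    feature pairs observed in adjacent sentences of a training corpus."""
    pair_counts = Counter(adjacent_feature_pairs)
    prev_counts = Counter(prev for prev, _ in adjacent_feature_pairs)
    return {pair: count / prev_counts[pair[0]]
            for pair, count in pair_counts.items()}

# Hypothetical training data: 'e' appears six times as a previous-sentence
# feature, once followed by 'h', so P(h|e) = 1/6.
pairs = [("e", "h"), ("e", "a"), ("e", "b"),
         ("e", "c"), ("e", "d"), ("e", "f")]
probs = estimate_feature_probs(pairs)
print(round(probs[("e", "h")], 2))
```

In the full model these conditional probabilities are multiplied over all feature pairs in the Cartesian product of two sentences' feature sets.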

9/23 Learning to Order Determining an order: the set of possible orders can be represented as a complete graph, where the set of vertices V is equal to the set of sentences S and each edge u->v has a weight, the probability P(v|u). Finding the optimal ordering through this graph is an NP-complete problem. Fortunately, Cohen et al. (1999) propose an approximate solution which can be easily modified for our task.

10/23 Learning to Order The algorithm starts by assigning each vertex v ∈ V a probability (the product of its features' probabilities). The greedy algorithm then picks the node with the highest probability and orders it ahead of the other nodes. The selected node and its incident edges are deleted from the graph, and each remaining node is assigned the conditional probability of being seen next given the selected node. The node which yields the highest conditional probability is selected and ordered next. The process is repeated until the graph is empty.
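The greedy procedure described above can be sketched as follows. The sentence labels and probability table are illustrative, not taken from the paper:

```python
def greedy_order(sentences, prob):
    """Greedy ordering (after Cohen et al., 1999): repeatedly pick the
    sentence most likely to come next given the one just placed.
    prob(u, v) estimates P(v | u); prob(None, v) is the probability
    of v starting the text."""
    remaining = list(sentences)
    order, prev = [], None
    while remaining:
        best = max(remaining, key=lambda v: prob(prev, v))
        order.append(best)
        remaining.remove(best)  # delete node and its incident edges
        prev = best
    return order

# Toy probabilities: 'A' is the likeliest opener, then A->B, then B->C.
table = {(None, "A"): 0.6, (None, "B"): 0.3, (None, "C"): 0.1,
         ("A", "B"): 0.7, ("A", "C"): 0.3,
         ("B", "C"): 0.9, ("B", "A"): 0.1}
print(greedy_order(["B", "C", "A"], lambda u, v: table.get((u, v), 0.0)))
# ['A', 'B', 'C']
```

Because each step commits to the single best node, this is a beam search of width one, which the Discussion slide later suggests widening.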

11/23 Learning to Order As an example

12/23 Parameter Estimation The model was trained on the BLLIP corpus (30M words), a collection of texts from the Wall Street Journal (1987-89). The average story length is 19.2 sentences; 71.3% of the texts are less than 50 sentences long.

13/23 Parameter Estimation The corpus is distributed in a Treebank-style machine-parsed version, produced with Charniak's (2000) parser. We also obtained a dependency-style version of the corpus using MINIPAR (Lin, 1998). From these two versions of the BLLIP corpus the following features were extracted: verbs, nouns, and dependencies.

14/23 Parameter Estimation Verbs: We capture the lexical inter-dependencies between sentences by focusing on verbs and their precedence relationships in the corpus. From the Treebank parses we extracted the verbs contained in each sentence, in two variants:
- A lemmatized version, with verbs reduced to their base forms, e.g., in Figure 3(1): say, will, be, ask, approve
- A non-lemmatized version, preserving tense-related information, e.g., in Figure 3(1): said, will be asked, to approve
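A minimal sketch of the two verb-feature variants. The lemma table is a hypothetical stand-in for a real lemmatizer and covers only the Figure 3(1) example verbs:

```python
# Hypothetical lemma table standing in for a real lemmatizer.
LEMMAS = {"said": "say", "asked": "ask"}

def verb_features(verbs, lemmatize=True):
    """Verb features for one sentence: base forms (lemmatized version)
    or surface forms preserving tense (non-lemmatized version)."""
    return [LEMMAS.get(v, v) for v in verbs] if lemmatize else list(verbs)

verbs = ["said", "will", "be", "asked", "approve"]
print(verb_features(verbs))                   # lemmatized
print(verb_features(verbs, lemmatize=False))  # tense preserved
```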

15/23 Parameter Estimation Nouns: We operationalize entity-based coherence for text-to-text generation by simply keeping track of the nouns attested in a sentence, without taking personal pronouns into account. We extracted nouns from a lemmatized version of the Treebank-style parsed corpus. In the case of noun compounds, only the compound head was taken into account. A small set of rules was used to identify organizations, person names, and locations spanning more than one word, and a back-off model was used to handle unknown words. Examples from sentence (1) of Figure 3: Laidlaw Transportation Ltd., shareholder, Dec 7, meeting, change, name, and Laidlaw Inc.; from sentence (2): company, name, business, 1984, sale, and operation.

16/23 Parameter Estimation Dependencies: The noun and verb features do not capture the structure of the sentences to be ordered. Dependencies were obtained from the output of MINIPAR and are represented as triples consisting of a head, a relation, and a modifier, e.g., (N:mod:A). We retained dependency types with a frequency larger than one per million: verbs only (49 types), nouns only (52 types), and verbs and nouns combined (101 types).

17/23 Experiments Evaluation Metric: Kendall's τ is based on the number of inversions in the rankings: τ = 1 - 2·(inversions) / (N(N-1)/2), where N is the number of objects (i.e., sentences) being ranked and inversions is the number of interchanges of consecutive elements necessary to arrange them in their natural order. Example (N = 10, so N(N-1)/2 = 45): M1 and M2: 1 - 8/45 ≈ 0.822; M1 and M3: 1 - 34/45 ≈ 0.244
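Kendall's τ as defined above can be computed by counting inversions directly; a small self-contained sketch (the four-item orderings are illustrative):

```python
def kendalls_tau(predicted, gold):
    """tau = 1 - 2*inversions / (N*(N-1)/2), where inversions is the
    number of interchanges of consecutive elements needed to transform
    `predicted` into `gold` (i.e., the number of discordant pairs)."""
    n = len(gold)
    rank = {item: i for i, item in enumerate(gold)}
    seq = [rank[item] for item in predicted]
    inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                     if seq[i] > seq[j])
    return 1 - 2 * inversions / (n * (n - 1) / 2)

print(kendalls_tau([1, 2, 3, 4], [1, 2, 3, 4]))  # identical order: 1.0
print(kendalls_tau([4, 3, 2, 1], [1, 2, 3, 4]))  # fully reversed: -1.0
```

τ ranges from -1 (inverse ranks) to 1 (identical ranks), so higher scores mean the generated order is closer to the gold standard.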

18/23 Experiments Experiment 1: Ordering Newswire Texts. The model was trained on the BLLIP corpus and tested on 20 held-out, randomly selected unseen texts (average length 15.3). The ordered output was compared against the original authored text using τ, and significance was assessed with an ANOVA test.

19/23 Experiments Experiment 2: Human Evaluation. We compare our model's performance against human judges. Twelve texts were randomly selected from the 20 texts in our test data and presented to subjects with the order of their sentences scrambled. Each participant (137 volunteers, 33 per text) saw three texts randomly chosen from the pool of 12 and was asked to reorder the sentences so as to produce a coherent text. Significance was assessed with an ANOVA test.

20/23 Experiments Experiment 3: Summarization. Barzilay et al. (2002) collected ten sets of articles, each consisting of two to three articles reporting the same event, and simulated MULTIGEN by manually selecting the sentences to be included in the final summary. Ten subjects provided orders for each summary, which had an average length of 8.8 sentences. We simulated the participants' task by using our model to produce an order for each candidate summary and then compared the orderings generated by the model with those of the participants. Note that the model was trained on the BLLIP corpus, whereas the sentences to be ordered were taken from news articles describing the same event.

21/23 Experiments Experiment 3: Summarization. Not only were the news articles unseen, but their syntactic structure was also unfamiliar to the model. Significance was assessed with an ANOVA test.

22/23 Discussion In this paper, we proposed a data-intensive approach to text coherence in which constraints on sentence ordering are learned from a corpus of domain-specific texts. We experimented with different feature encodings and showed that lexical and syntactic information is important for the ordering task. Our results indicate that the model can successfully generate orders for texts taken from the corpus on which it is trained. The model also compares favorably with human performance on single- and multiple-document ordering tasks.

23/23 Discussion Future work:
- Our evaluation metric only measures order similarities or dissimilarities; what about coherence itself?
- Would a trigram model perform better than the proposed bigram model?
- The greedy algorithm implements a search procedure with a beam of width one; what about using a beam of two or three?
- Introduce features that express semantic similarities across documents by relying on WordNet or on automatic clustering methods.