An evolutionary approach for improving the quality of automatic summaries. Constantin Orasan, Research Group in Computational Linguistics, School of Humanities, Languages and Social Sciences, University of Wolverhampton. Proceedings of the ACL 2003 Workshop on Multilingual Summarization and Question Answering.

Introduction There are two main approaches to producing automatic summaries: extract and rearrange, or understand and generate. Because "understanding" a text is usually domain-specific, extraction methods are preferred when robustness is needed. Here we present a novel approach that improves the quality of automatic summaries by improving their local cohesion.

Continuity Principle We use the continuity principle defined in Centering Theory (Grosz et al., 1995) to improve summary quality. This principle requires that two consecutive utterances have at least one entity in common. Utterances are generally clauses or sentences; here we treat sentences as utterances. We try to produce summaries that do not violate the continuity principle, that is, sequences of sentences that refer to the same entities and are therefore more coherent.
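
As a minimal sketch, assuming each sentence has already been reduced to a set of entity strings (for example lowercased noun-phrase heads), the continuity check and a count of violations in a summary could look like this; the function names are illustrative, not taken from the paper:

```python
def satisfies_continuity(prev_entities: set[str], next_entities: set[str]) -> bool:
    """True if two consecutive utterances share at least one entity."""
    return bool(prev_entities & next_entities)


def continuity_violations(summary: list[set[str]]) -> int:
    """Count consecutive sentence pairs in a summary that share no entity."""
    return sum(
        1
        for prev, nxt in zip(summary, summary[1:])
        if not satisfies_continuity(prev, nxt)
    )


# Toy summary of three sentences, each reduced to its entity set.
summary = [{"algorithm", "summary"}, {"summary", "sentence"}, {"corpus"}]
print(continuity_violations(summary))  # 1: sentences 2 and 3 share no entity
```

Representing sentences as sets makes the check a single intersection; how entities are identified is left to the preprocessing step.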

Corpus Investigation We consider that two utterances have an entity in common if the same noun phrase head appears in both. The FDG tagger is used to determine the heads of noun phrases. We investigated 146 human-produced abstracts from the Journal of Artificial Intelligence Research, and almost 75% satisfy the principle.
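
The corpus count can be sketched in a few lines. This sketch assumes the FDG output has already been turned into one set of head nouns per sentence (the tagger itself is not called here). Because the slide does not say whether the 75% is counted per abstract or per pair of consecutive sentences, it reports both rates; all names are hypothetical:

```python
def pairs_satisfying(abstract: list[set[str]]) -> tuple[int, int]:
    """Return (consecutive pairs sharing a head noun, total consecutive pairs)."""
    pairs = list(zip(abstract, abstract[1:]))
    return sum(1 for a, b in pairs if a & b), len(pairs)


def corpus_rates(abstracts: list[list[set[str]]]) -> tuple[float, float]:
    """Share of abstracts with no violation, and share of all pairs satisfying the CP."""
    ok_pairs = total_pairs = ok_abstracts = 0
    for abstract in abstracts:
        ok, total = pairs_satisfying(abstract)
        ok_pairs += ok
        total_pairs += total
        # A one-sentence abstract has no pairs, so it counts as satisfying.
        ok_abstracts += int(ok == total)
    abstract_rate = ok_abstracts / len(abstracts) if abstracts else 0.0
    pair_rate = ok_pairs / total_pairs if total_pairs else 0.0
    return abstract_rate, pair_rate


# Three toy abstracts, each a list of per-sentence head-noun sets.
abstracts = [
    [{"a", "b"}, {"b"}],
    [{"a"}, {"c"}],
    [{"x"}, {"x", "y"}, {"y"}],
]
print(corpus_rates(abstracts))
```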

Using the CP in Summarization and Text Generation To produce a summary that violates the continuity principle as little as possible, we score each sentence using both content and context information. Karamanis and Manurung (2002) used the CP in text generation; summarization is harder, however, because the important information in the document must first be identified. Another difference is that we do not change the order of the extracted sentences, because preliminary experiments in that direction did not lead to promising results.

Content-based scoring The content score combines several existing heuristics. Keyword method: each word receives its TF-IDF score, and the score of a sentence is the sum of the scores of its words. Indicating phrase method: the presence of meta-discourse markers such as "in this paper", "we present" or "in conclusion" contributes to the score. Location method: sentences in the first and last 13 paragraphs have their scores boosted. Title and headers method: sentences containing words from the title and headers have their scores boosted. Special formatting rules: sentences that contain equations are excluded. The score of a sentence is a weighted function of these parameters, with the weights established through experiments. The indicating phrase method proved to be one of the most important heuristics.
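
A hedged sketch of the weighted combination follows. The weights, the short phrase list and the `Sentence` record are hypothetical placeholders (the slides give no values), the TF-IDF table is assumed to be precomputed, and excluded sentences receive a score of minus infinity so they are never selected:

```python
from dataclasses import dataclass

# Hypothetical weights: the slides say they were set experimentally but give no values.
WEIGHTS = {"keyword": 1.0, "indicating": 2.0, "location": 1.0, "title": 1.0}
INDICATING_PHRASES = ("in this paper", "we present", "in conclusion")
EDGE_PARAGRAPHS = 13  # the first and last 13 paragraphs get the location boost


@dataclass
class Sentence:
    text: str
    words: list[str]  # lowercased tokens
    paragraph: int  # 0-based index of the containing paragraph
    has_equation: bool = False


def content_score(sentence, tfidf, title_words, n_paragraphs, weights=WEIGHTS):
    """Weighted sum of the heuristics; sentences with equations are excluded."""
    if sentence.has_equation:
        return float("-inf")  # never selected
    keyword = sum(tfidf.get(w, 0.0) for w in sentence.words)
    lowered = sentence.text.lower()
    indicating = float(any(p in lowered for p in INDICATING_PHRASES))
    location = float(
        sentence.paragraph < EDGE_PARAGRAPHS
        or sentence.paragraph >= n_paragraphs - EDGE_PARAGRAPHS
    )
    title = float(bool(title_words & set(sentence.words)))
    return (weights["keyword"] * keyword + weights["indicating"] * indicating
            + weights["location"] * location + weights["title"] * title)


tfidf = {"summary": 2.0, "genetic": 1.5}
s = Sentence("In this paper we present a genetic summary method",
             "in this paper we present a genetic summary method".split(), paragraph=0)
print(content_score(s, tfidf, {"summary"}, n_paragraphs=40))  # 7.5
```

Keeping each heuristic as a 0/1 or summed feature makes the weights the only tuning knobs, which matches the slide's description of a weighted function set by experiment.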

Context-based scoring Depending on the context in which a sentence appears in a summary, its score can be boosted or penalized. If the sentence satisfies the continuity principle with either the sentence that precedes it or the sentence that follows it, its score is boosted; otherwise it is penalized. After experimentation, we decided to boost a sentence's score by the TF-IDF scores of the heads of the NPs it shares with its neighbours, and to penalize it by the highest TF-IDF score in the document.
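
The boost and penalty can be sketched as a single function over a candidate summary whose sentences are kept in document order. This assumes the TF-IDF table maps NP heads to scores and that heads shared with both neighbours are counted once each; the name `context_score` is illustrative:

```python
def context_score(idx: int, summary_heads: list[set[str]], tfidf: dict[str, float]) -> float:
    """Context score of the idx-th sentence of a summary (sentences in document order).

    Boost: TF-IDF of NP heads shared with the preceding or following sentence.
    Penalty: minus the highest TF-IDF score in the document.
    """
    current = summary_heads[idx]
    shared: set[str] = set()
    if idx > 0:
        shared |= current & summary_heads[idx - 1]
    if idx + 1 < len(summary_heads):
        shared |= current & summary_heads[idx + 1]
    if shared:
        return sum(tfidf.get(h, 0.0) for h in shared)
    return -max(tfidf.values(), default=0.0)


tfidf = {"summary": 2.0, "gene": 1.5, "corpus": 3.0}
heads = [{"summary"}, {"summary", "gene"}, {"corpus"}]
print([context_score(i, heads, tfidf) for i in range(len(heads))])  # [2.0, 2.0, -3.0]
```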

The Greedy Algorithm The greedy algorithm repeatedly extracts the highest-scoring sentence among those not yet extracted, with scores computed as described above. The original order of the sentences is maintained, and the algorithm in Figure 1 is performed.

The Greedy Algorithm When scores are computed, each sentence's score is calculated as if the sentence were already included in the extract. The sentence with the highest score is extracted, and this is repeated until the required length is reached. The first extracted sentence is always the one with the highest content-based score. A drawback is that the algorithm can extract S1 and S2 and then, in a later iteration, extract a sentence S3 between S1 and S2 that violates the continuity principle with S2.
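
The loop described above can be sketched as follows. It is an illustrative reconstruction, not the algorithm of Figure 1: content scores, head-noun sets and TF-IDF values are assumed inputs, the summary length is given in sentences, and the context scoring repeats the boost/penalty rule from the context-based scoring slide so the block runs on its own:

```python
def context_score(idx, summary_heads, tfidf):
    """Boost by TF-IDF of NP heads shared with a neighbour, else penalize."""
    shared = set()
    if idx > 0:
        shared |= summary_heads[idx] & summary_heads[idx - 1]
    if idx + 1 < len(summary_heads):
        shared |= summary_heads[idx] & summary_heads[idx + 1]
    if shared:
        return sum(tfidf.get(h, 0.0) for h in shared)
    return -max(tfidf.values(), default=0.0)


def greedy_extract(content, heads, tfidf, length):
    """Return the indices of `length` sentences chosen greedily, in document order."""
    n = len(content)
    if n == 0 or length <= 0:
        return []
    # The first extracted sentence is the one with the highest content-based score.
    extracted = [max(range(n), key=lambda i: content[i])]
    while len(extracted) < min(length, n):
        best, best_score = None, float("-inf")
        for cand in range(n):
            if cand in extracted:
                continue
            # Score the candidate as if it were already in the extract.
            tentative = sorted(extracted + [cand])
            pos = tentative.index(cand)
            score = content[cand] + context_score(pos, [heads[i] for i in tentative], tfidf)
            if best is None or score > best_score:
                best, best_score = cand, score
        extracted.append(best)
    return sorted(extracted)


# Toy data: content scores, head-noun sets and TF-IDF values are made up.
content = [1.0, 5.0, 0.5, 3.0]
heads = [{"a"}, {"b", "c"}, {"c"}, {"d"}]
tfidf = {"a": 1.0, "b": 1.0, "c": 2.0, "d": 4.0}
print(greedy_extract(content, heads, tfidf, 2))  # [1, 2]
```

On the toy data, content scores alone would pick sentences 1 and 3, whereas the context boost picks 1 and 2. Only the candidate's own context is rescored in each round, so inserting a sentence between two already extracted ones can break its neighbours' continuity, which is the drawback the slide describes.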

The Evolutionary Algorithm In the greedy method, whether a sentence is included depends on the sentences already in the summary. Genetic algorithms are a specific type of evolutionary algorithm that encode the problem as a series of genes, called a chromosome. In our case, genes take integer values representing the positions of sentences in the document.
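
A minimal sketch of this encoding, assuming one gene per summary slot and 0-based sentence positions. The slides do not say how repeated positions are handled, so here they are simply dropped when a chromosome is decoded into an extract:

```python
import random


def random_chromosome(n_sentences: int, summary_length: int, rng: random.Random) -> list[int]:
    """One gene per summary slot; each gene is a 0-based sentence position."""
    return [rng.randrange(n_sentences) for _ in range(summary_length)]


def decode(chromosome: list[int]) -> list[int]:
    """Turn genes into an extract: drop repeated positions and restore document order."""
    return sorted(set(chromosome))


rng = random.Random(42)
genes = random_chromosome(n_sentences=120, summary_length=6, rng=rng)
print(genes, "->", decode(genes))
```

Sorting on decode reflects the slides' choice to keep extracted sentences in their original order.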

The Evolutionary Algorithm Genetic algorithms use a fitness function to assess how good a chromosome is; in our case, the fitness function is the sum of the scores of the sentences. Genetic algorithms use genetic operators to evolve a population of chromosomes; in our case, weighted roulette wheel selection is used to select chromosomes. Once chromosomes have been selected, they are evolved using crossover and mutation.
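
The fitness function and the selection step might be sketched as below. The fitness is taken to be the sum of each selected sentence's content score plus its context score (boost by shared NP heads, penalty by the document's highest TF-IDF), which is our reading of "the sum of the scores of the sentences". Because penalties can make fitness negative while roulette wheel weights must be positive, the scores are shifted by the population minimum; that shift is an assumption. `random.Random.choices` performs the weighted draw:

```python
import random


def fitness(chromosome, content, heads, tfidf):
    """Sum of content + context scores over the decoded extract (document order)."""
    summary = sorted(set(chromosome))
    summary_heads = [heads[i] for i in summary]
    penalty = max(tfidf.values(), default=0.0)
    total = 0.0
    for pos, sent in enumerate(summary):
        shared = set()
        if pos > 0:
            shared |= summary_heads[pos] & summary_heads[pos - 1]
        if pos + 1 < len(summary):
            shared |= summary_heads[pos] & summary_heads[pos + 1]
        context = sum(tfidf.get(h, 0.0) for h in shared) if shared else -penalty
        total += content[sent] + context
    return total


def roulette_select(population, fitnesses, k, rng):
    """Weighted roulette wheel: selection probability proportional to shifted fitness."""
    lowest = min(fitnesses)
    weights = [f - lowest + 1e-9 for f in fitnesses]  # keep every weight positive
    return rng.choices(population, weights=weights, k=k)


content = [1.0, 5.0, 0.5, 3.0]
heads = [{"a"}, {"b", "c"}, {"c"}, {"d"}]
tfidf = {"a": 1.0, "b": 1.0, "c": 2.0, "d": 4.0}
population = [[1, 2], [3, 1]]
scores = [fitness(c, content, heads, tfidf) for c in population]
print(scores)  # [9.5, 0.0]
parents = roulette_select(population, scores, k=4, rng=random.Random(0))
```

`choices` returns references to the selected lists, so operators applied afterwards should work on copies.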

The Evolutionary Algorithm We use the single-point crossover operator and two mutation operators. The first replaces the value of a gene with a randomly generated integer (to try to include random sentences in the summary). The second replaces the value of a gene with the value of the preceding gene incremented by one (to introduce consecutive sentences into the summary). The algorithm starts with a population of randomly generated chromosomes, which is then evolved using these operators, each of which has a certain probability of being applied.
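
The three operators can be sketched as follows. Clamping the incremented gene to the last sentence position is an assumption (the slides do not say what happens at the end of the document), and the function names are hypothetical:

```python
import random


def single_point_crossover(a, b, rng):
    """Swap the tails of two parents after a random cut point."""
    if len(a) < 2:
        return a[:], b[:]
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]


def random_gene_mutation(chromosome, n_sentences, rng):
    """Mutation 1: replace one gene with a random sentence position."""
    child = chromosome[:]
    if not child:
        return child
    child[rng.randrange(len(child))] = rng.randrange(n_sentences)
    return child


def successor_mutation(chromosome, n_sentences, rng):
    """Mutation 2: set one gene to the preceding gene + 1 (a consecutive sentence)."""
    child = chromosome[:]
    if len(child) < 2:
        return child
    i = rng.randrange(1, len(child))  # gene 0 has no predecessor
    child[i] = min(child[i - 1] + 1, n_sentences - 1)  # clamp (assumption)
    return child


rng = random.Random(3)
print(single_point_crossover([0, 1, 2, 3], [10, 11, 12, 13], rng))
print(successor_mutation([4, 9, 20], 30, rng))
```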

The Evolutionary Algorithm The best chromosome (the one with the highest fitness score) found across all generations is the solution to the problem. In our case, a population of 500 chromosomes was evolved for 100 generations.
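
Putting the pieces together, a generational loop that keeps the best chromosome seen in any generation might look like the sketch below. The population size and number of generations follow the slides; the crossover and mutation probabilities are hypothetical, since the slides only say that each operator has some probability of being applied. The fitness function is passed in, so any scoring, such as the content-plus-context sum, can be plugged in:

```python
import random


def evolve(fitness, n_sentences, length, rng, pop_size=500, generations=100,
           p_crossover=0.7, p_mutation=0.1):
    """Generational GA; returns (best extract, its fitness) over all generations."""
    population = [[rng.randrange(n_sentences) for _ in range(length)]
                  for _ in range(pop_size)]
    best, best_fit = None, float("-inf")
    for generation in range(generations):
        scores = [fitness(c) for c in population]
        for chrom, score in zip(population, scores):
            if best is None or score > best_fit:
                best, best_fit = chrom[:], score
        if generation == generations - 1:
            break  # last generation evaluated; no need to breed another
        # Weighted roulette wheel; scores shifted so every weight is positive.
        lowest = min(scores)
        parents = rng.choices(population, weights=[s - lowest + 1e-9 for s in scores],
                              k=pop_size)
        next_gen = []
        for i in range(0, pop_size, 2):
            a, b = parents[i][:], parents[(i + 1) % pop_size][:]
            if length > 1 and rng.random() < p_crossover:  # single-point crossover
                point = rng.randrange(1, length)
                a, b = a[:point] + b[point:], b[:point] + a[point:]
            for child in (a, b):
                if rng.random() < p_mutation:  # mutation 1: random sentence
                    child[rng.randrange(length)] = rng.randrange(n_sentences)
                if length > 1 and rng.random() < p_mutation:  # mutation 2: successor
                    j = rng.randrange(1, length)
                    child[j] = min(child[j - 1] + 1, n_sentences - 1)
                next_gen.append(child)
        population = next_gen[:pop_size]
    return sorted(set(best)), best_fit


values = [0.2, 3.0, 0.1, 2.5, 0.4]  # toy per-sentence scores
print(evolve(lambda c: sum(values[i] for i in set(c)), len(values), 2,
             random.Random(7), pop_size=50, generations=20))
```

Tracking the best chromosome separately matters because roulette wheel selection does not guarantee that the fittest individual survives into the next generation.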

Evaluation and Discussion We evaluated the methods on 10 scientific papers from the Journal of Artificial Intelligence Research (… words in total). Because eight different summaries were produced from each text and all of them had to be assessed by humans, the evaluation was very time-consuming. The quality of a summary can be measured in terms of coherence, cohesion and informativeness. Cohesion is indicated by the number of dangling anaphoric expressions, and coherence by the number of ruptures in the discourse. For informativeness, we compute the similarity between the summary and the document.

Evaluation and Discussion In the evaluation, TFIDF extracts the sentences with the highest TF-IDF scores, Basic refers to content-based scoring alone, and Greedy and Evolutionary are the two algorithms that additionally use the continuity principle. Because only a slight improvement was noticed for the 3% summaries, we increased the length to 5% (values shown in brackets).

We consider that a discourse rupture occurs when a sentence seems completely isolated from the rest of the text. This usually happens because of isolated discourse markers such as "firstly", "however" and "on the other hand". For 3% summaries, context information has little influence, because the indicating phrases have a greater influence on coherence than the continuity principle. For the longer summaries, the evolutionary algorithm performs better than the basic method in all cases, whereas the greedy algorithm does not. We believe that the improvement is due to the discourse information used by the methods.

Even though anaphora is not directly addressed here, a subsidiary effect of improving local cohesion should be a decrease in the number of dangling references. As in the case of discourse ruptures (DR), the greedy algorithm does not perform significantly better than the basic method. The most frequent dangling references were references to tables, figures, definitions and theorems (e.g. "As we showed in Table 3 …").

We use a content-based evaluation metric (Donaway et al., 2000) that computes the similarity between the summary and the document. The evolutionary algorithm does not lead to a major loss of information, and for several texts it obtains the highest score. In contrast, the greedy method seems to exclude useful information, performing worse than both the basic method and the baseline for several texts.
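
The slides do not say which similarity the metric computes. As one common content-based choice, assumed here for illustration, cosine similarity between the term-frequency vectors of the summary and the full document can be sketched as:

```python
import math
from collections import Counter


def cosine_similarity(summary_words: list[str], document_words: list[str]) -> float:
    """Cosine of the angle between term-frequency vectors (0.0 if either is empty)."""
    a, b = Counter(summary_words), Counter(document_words)
    dot = sum(count * b[word] for word, count in a.items())  # missing words count as 0
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0


document = "the genetic algorithm selects the summary sentences".split()
summary = "genetic algorithm summary".split()
print(round(cosine_similarity(summary, document), 3))
```

A summary that drops the document's frequent content words scores lower, which is how such a metric can reveal the information loss reported for the greedy method.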

Conclusion and Future Work We presented two algorithms that combine content and context information. Experiments show that the evolutionary method performs better in terms of coherence and cohesion and does not degrade the information content. One could argue that 5% summaries are too long, but they can be shortened using aggregation rules that merge two sentences referring to the same entity into one. We intend to extend the experiments, test combinations of principles from Centering Theory, and evaluate the method on other types of texts.