Summarisation Work at Sheffield
Robert Gaizauskas
Natural Language Processing Group, Department of Computer Science, University of Sheffield

AKT Workshop, January 2001

Outline
- Terminology
- Approach 1: Generation from Templates
- Approach 2: Coreference Chains
- Approach 3: Statistical

Terminology
- Extract vs. abstract
  - Extract: a subset of the sentences in the original text
  - Abstract: a fusion of topics from the original, realised through text generation
- Generic vs. user-focused
  - Generic: captures the essence of the text, independent of the user's interests
  - User-focused: summarises content with respect to a particular user interest
- Indicative vs. informative
  - Indicative: indicates whether the document should be examined in more detail
  - Informative: serves as a surrogate for the original

Approach 1: Generation from Templates
To generate user-focused informative abstracts, we have used an information extraction (IE) system together with simple natural language (NL) generation techniques to produce short summaries.

Example: A Wall Street Journal Article
wsj94_ Who's Burns Fry Ltd.
04/13/94 WALL STREET JOURNAL (J), PAGE B10
MER SECURITIES (SCR)

BURNS FRY Ltd. (Toronto) -- Donald Wright, 46 years old, was named executive vice president and director of fixed income at this brokerage firm. Mr. Wright resigned as president of Merrill Lynch Canada Inc., a unit of Merrill Lynch & Co., to succeed Mark Kassirer, 48, who left Burns Fry last month. A Merrill Lynch spokeswoman said it hasn't named a successor to Mr. Wright, who is expected to begin his new position by the end of the month.

Example: BNF Definition of a Management Succession Event Template (MUC-6)
:= DOC_NR: "NUMBER" ^
   CONTENT: *
:= ORGANIZATION: ^
   POST: "POSITION TITLE" | "no title" ^
   IN_AND_OUT: +
   VACANCY_REASON: {DEPART_WORKFORCE, REASSIGNMENT, NEW_POST_CREATED, OTH_UNK} ^
:= PERSON: ^
   NEW_STATUS: {IN, IN_ACTING, OUT, OUT_ACTING} ^
   ON_THE_JOB: {YES, NO, UNCLEAR}
   OTHER_ORG: -
   REL_OTHER_ORG: {SAME_ORG, RELATED_ORG, OUTSIDE_ORG} -
:= ORG_NAME: "NAME" -
   ORG_ALIAS: "ALIAS" *
   ORG_DESCRIPTOR: "DESCRIPTOR" -
   ORG_TYPE: {GOVERNMENT, COMPANY, OTHER} ^
   ORG_LOCALE: LOCALE_STRING {{CITY, PROVINCE, COUNTRY, REGION, UNK} *
   ORG_COUNTRY: NORMALIZED-COUNTRY-or-REGION | COUNTRY-or-REGION-STRING *
:= PER_NAME: "NAME" -
   PER_ALIAS: "ALIAS" *
   PER_TITLE: "TITLE" *
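One way to make the BNF concrete is to mirror it in typed Python records. The sketch below is a hypothetical rendering, not part of LaSIE or any MUC-6 distribution. The class and field names are invented for illustration, and closed-class slots become `Enum`s. The notation is also read under assumptions: `^` as a required single value, `-` as optional, `*` as a list and `+` as a non-empty list.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


# Closed-class slot values, copied from the MUC-6 BNF.
class VacancyReason(Enum):
    DEPART_WORKFORCE = "DEPART_WORKFORCE"
    REASSIGNMENT = "REASSIGNMENT"
    NEW_POST_CREATED = "NEW_POST_CREATED"
    OTH_UNK = "OTH_UNK"


class NewStatus(Enum):
    IN = "IN"
    IN_ACTING = "IN_ACTING"
    OUT = "OUT"
    OUT_ACTING = "OUT_ACTING"


class OnTheJob(Enum):
    YES = "YES"
    NO = "NO"
    UNCLEAR = "UNCLEAR"


class RelOtherOrg(Enum):
    SAME_ORG = "SAME_ORG"
    RELATED_ORG = "RELATED_ORG"
    OUTSIDE_ORG = "OUTSIDE_ORG"


class OrgType(Enum):
    GOVERNMENT = "GOVERNMENT"
    COMPANY = "COMPANY"
    OTHER = "OTHER"


# Hypothetical record types: "*" slots become lists, "-" slots become Optional.
@dataclass
class Organization:
    org_type: OrgType
    name: Optional[str] = None
    aliases: List[str] = field(default_factory=list)
    descriptor: Optional[str] = None
    locale: List[str] = field(default_factory=list)  # e.g. "Toronto CITY"
    country: List[str] = field(default_factory=list)


@dataclass
class Person:
    name: Optional[str] = None
    aliases: List[str] = field(default_factory=list)
    titles: List[str] = field(default_factory=list)


@dataclass
class InAndOut:
    person: Person
    new_status: NewStatus
    on_the_job: OnTheJob
    other_org: Optional[Organization] = None
    rel_other_org: Optional[RelOtherOrg] = None


@dataclass
class SuccessionEvent:
    org: Organization
    post: str  # a position title, or the literal "no title"
    in_and_out: List[InAndOut]
    vacancy_reason: VacancyReason

    def __post_init__(self) -> None:
        # IN_AND_OUT is marked "+", read here as "at least one".
        if not self.in_and_out:
            raise ValueError("a succession event needs at least one IN_AND_OUT object")


@dataclass
class Template:
    doc_nr: str
    content: List[SuccessionEvent] = field(default_factory=list)
```

An enum lookup by value, such as `NewStatus("OUT")`, raises `ValueError` for any string outside the closed set. That gives a cheap way to catch ill-formed slot fills coming out of an extractor.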

Example: A (Partially) Filled Management Succession Event Template
:= DOC_NR: " "
   CONTENT:
:= SUCCESSION_ORG:
   POST: "executive vice president"
   IN_AND_OUT:
   VACANCY_REASON: OTH_UNK
:= IO_PERSON:
   NEW_STATUS: OUT
   ON_THE_JOB: NO
:= IO_PERSON:
   NEW_STATUS: IN
   ON_THE_JOB: NO
   OTHER_ORG:
   REL_OTHER_ORG: OUTSIDE_ORG
:= ORG_NAME: "Burns Fry Ltd."
   ORG_ALIAS: "Burns Fry"
   ORG_DESCRIPTOR: "this brokerage firm"
   ORG_TYPE: COMPANY
   ORG_LOCALE: Toronto CITY
   ORG_COUNTRY: Canada
:= ORG_NAME: "Merrill Lynch Canada Inc."
   ORG_ALIAS: "Merrill Lynch"
   ORG_DESCRIPTOR: "a unit of Merrill Lynch & Co."
   ORG_TYPE: COMPANY
:= PER_NAME: "Mark Kassirer"
:= PER_NAME: "Donald Wright"
   PER_ALIAS: "Wright"
   PER_TITLE: "Mr."

Example: One Use for a Template - Generating a Summary
From the completely filled version of the preceding template, the LaSIE system generates the following natural language summary:

   BURNS FRY Ltd. named Donald Wright as executive vice president. Donald Wright resigned as president of Merrill Lynch Canada Inc.. Mark Kassirer left as president of BURNS FRY Ltd.

Producing summaries in other languages is relatively easy compared with full machine translation. Only the filled template, rather than the whole source text, has to be rendered in the target language.
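The generation step can be approximated with a few sentence patterns keyed on NEW_STATUS. This is a minimal sketch under stated assumptions and is not LaSIE's generator. The nested-dict layout of the filled template is hypothetical, as are the functions `org_label`, `end_sentence`, `realise_event` and `summarise`. The verbs ("named", "left as") only imitate the example output above.

```python
from typing import Any, Dict, List

# Hypothetical nested-dict form of a filled template. Slot names follow MUC-6;
# the layout (objects inlined instead of referenced by pointer) is an assumption.
Slots = Dict[str, Any]


def org_label(org: Slots) -> str:
    """Prefer the full ORG_NAME; fall back to the first ORG_ALIAS."""
    if org.get("ORG_NAME"):
        return org["ORG_NAME"]
    aliases = org.get("ORG_ALIAS", [])
    return aliases[0] if aliases else "an unnamed organisation"


def end_sentence(text: str) -> str:
    """Add a full stop unless the text already ends with one (e.g. after "Ltd.")."""
    return text if text.endswith(".") else text + "."


def realise_event(event: Slots) -> List[str]:
    """Render one succession event as one sentence per IN_AND_OUT object."""
    org = org_label(event["SUCCESSION_ORG"])
    sentences = []
    for io in event["IN_AND_OUT"]:
        person = io["IO_PERSON"]["PER_NAME"]
        status = io["NEW_STATUS"]
        post = event["POST"]
        if status.endswith("_ACTING"):
            post = "acting " + post  # IN_ACTING / OUT_ACTING qualify the post
        if status.startswith("IN"):
            sentences.append(end_sentence(f"{org} named {person} as {post}"))
        elif status.startswith("OUT"):
            sentences.append(end_sentence(f"{person} left as {post} of {org}"))
    return sentences


def summarise(template: Slots) -> str:
    """Concatenate the sentences generated for every event in CONTENT."""
    return " ".join(s for event in template["CONTENT"] for s in realise_event(event))
```

Because the patterns read only template slots, porting the generator to another language would mean rewriting the handful of f-strings rather than translating the article. A fuller generator could also consult slots such as OTHER_ORG to choose verbs like "resigned".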

Approach 2: Coreference Chains
To generate generic informative extracts, we have used coreference chains.

Approach 2: Coreference Chains (cont.)
Background:
- Morris and Hirst (’94) investigated lexical chains: chains of lexically related words in a text that serve to make the text cohere.
- Barzilay and Elhadad (’97) suggested using lexical chains as a basis for selecting sentences to form a summary, ranking chains by their number of links and their extent over the text.
- Halliday and Hasan (’76) proposed coreference as another major factor contributing to the coherence of NL texts.
Idea: explore the use of coreference chains to produce summaries.

Approach 2: Coreference Chains (cont.)
Technique:
- Use LaSIE to carry out discourse analysis of the text, including coreference resolution.
- Extract all coreference chains.
- Rank the chains by a metric that takes into account chain length, extent and starting point.
  Intuition: the entities that occur most frequently and most widely in a text are those the text is most “about”.
- Depending on the desired summary length, select m sentences from the top n chains.
Details in Azzam, Humphreys and Gaizauskas (’99).
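The ranking and selection steps above can be sketched as follows. The slides name the factors in the chain-ranking metric (length, extent, starting point) but not how they are combined. The equal-weight sum of normalised terms in `chain_score` is therefore an assumption, as are the `Chain` type and the function names. Coreference resolution itself, which LaSIE performs, is assumed to have already produced each chain as a list of sentence indices.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chain:
    """Hypothetical chain record: the sentence index of each mention (at least one)."""
    entity: str
    mention_sentences: List[int]


def chain_score(chain: Chain, n_sentences: int) -> float:
    """Assumed metric: equal-weight sum of length, extent and an early-start bonus."""
    idx = sorted(chain.mention_sentences)
    length = len(idx) / n_sentences                # how often the entity is mentioned
    extent = (idx[-1] - idx[0] + 1) / n_sentences  # how much of the text the chain spans
    start = 1.0 - idx[0] / n_sentences             # entities introduced early score higher
    return length + extent + start


def coref_summary(sentences: List[str], chains: List[Chain], n: int, m: int) -> List[str]:
    """Select up to m sentences from the n best chains, returned in document order."""
    total = len(sentences)
    top = sorted(chains, key=lambda c: chain_score(c, total), reverse=True)[:n]
    chosen: List[int] = []
    for chain in top:
        # Take each chain's sentences in document order until the budget m is used up.
        for i in sorted(set(chain.mention_sentences)):
            if len(chosen) == m:
                break
            if i not in chosen:
                chosen.append(i)
    return [sentences[i] for i in sorted(chosen)]
```

Filling the budget from the best chain first and then re-sorting by position keeps the extract in the original reading order. It also lets lower-ranked chains contribute only if the top chains leave room.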

Approach 3: Statistical
To generate generic indicative extracts, we have used a statistical approach based on a set of factors.

Approach 3: Statistical (cont.)
Factors that have been examined in selecting sentences for inclusion in extractive summaries include:
- (T) the number of content words shared with the title or headings
- (C) the presence of “cue words”
- (L) the location of the sentence in the text
- (K) the number of content words that are discriminative of the current text, as opposed to the corpus of texts from which it is drawn, measured using, e.g., tf-idf
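The four factors can be computed with plain token counts. This sketch rests on several assumptions:
- a toy stopword list;
- single-word cue words supplied by the caller;
- a location score that simply favours earlier sentences;
- a keyword set made of the document's top-k words by tf-idf against a corpus given as word sets.

All names and these particular definitions are illustrative rather than the ones used at Sheffield.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Sequence, Set

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "and", "or", "is", "was", "at",
             "as", "by", "for", "on", "who", "his", "this", "it", "has"}


def content_words(text: str) -> List[str]:
    """Lower-cased alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def top_keywords(doc_words: List[str], corpus: Sequence[Set[str]], k: int) -> Set[str]:
    """The k words of this document with the highest tf-idf against the corpus."""
    n_docs = len(corpus)
    tf = Counter(doc_words)

    def tfidf(w: str) -> float:
        df = sum(1 for d in corpus if w in d)
        return tf[w] * math.log((n_docs + 1) / (df + 1))  # smoothed idf, never negative

    ranked = sorted(tf, key=lambda w: (-tfidf(w), w))  # ties broken alphabetically
    return set(ranked[:k])


def sentence_features(sentences: List[str], title: str, cue_words: Set[str],
                      corpus: Sequence[Set[str]], k: int = 5) -> List[Dict[str, float]]:
    """Compute the T, C, L and K factors for every sentence of one text."""
    title_words = set(content_words(title))
    keywords = top_keywords([w for s in sentences for w in content_words(s)], corpus, k)
    n = len(sentences)
    feats = []
    for i, s in enumerate(sentences):
        words = set(content_words(s))
        tokens = re.findall(r"[a-z]+", s.lower())
        feats.append({
            "T": float(len(title_words & words)),                 # shared with the title
            "C": float(sum(1 for t in tokens if t in cue_words)),  # cue words present
            "L": 1.0 - i / n,                                     # earlier sentences score higher
            "K": float(len(keywords & words)),                    # distinctive tf-idf keywords
        })
    return feats
```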

Approach 3: Statistical (cont.)
- Assign a weight to each sentence according to a weighted linear combination of these factors.
- Learn the weights that optimise sentence selection, as measured against a corpus of texts paired with their extracts.
- Select the top-ranked sentences up to the desired summary length.
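The scoring, learning and selection steps can be sketched together. The slides do not say how the weights were learnt. This sketch swaps in logistic regression trained by batch gradient descent, labelling a sentence 1 if it appears in the reference extract and 0 otherwise. It measures summary length in sentences. The function names, learning rate and epoch count are assumptions, and features are dicts keyed by the factor letters T, C, L and K.

```python
import math
from typing import Dict, List, Sequence, Tuple

FACTORS = ("T", "C", "L", "K")


def score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted linear combination of the factors, plus a bias term."""
    return weights.get("bias", 0.0) + sum(weights[f] * features[f] for f in FACTORS)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def learn_weights(examples: Sequence[Tuple[Dict[str, float], int]],
                  epochs: int = 500, lr: float = 0.1) -> Dict[str, float]:
    """Fit weights by logistic-regression gradient descent (an assumed learner).

    Each example pairs a sentence's factors with 1 if the sentence is in the
    reference extract and 0 otherwise.
    """
    weights = {f: 0.0 for f in FACTORS}
    weights["bias"] = 0.0
    for _ in range(epochs):
        grad = {key: 0.0 for key in weights}
        for feats, label in examples:
            err = sigmoid(score(feats, weights)) - label  # gradient of the log loss
            for f in FACTORS:
                grad[f] += err * feats[f]
            grad["bias"] += err
        for key in weights:
            weights[key] -= lr * grad[key] / len(examples)
    return weights


def extract(sentences: List[str], features: List[Dict[str, float]],
            weights: Dict[str, float], max_sentences: int) -> List[str]:
    """Keep the top-scoring sentences, returned in their original order."""
    ranked = sorted(range(len(sentences)), key=lambda i: score(features[i], weights),
                    reverse=True)
    return [sentences[i] for i in sorted(ranked[:max_sentences])]
```

Logistic regression is only one choice for learning a linear combination; any learner that outputs one weight per factor fits the same `score` and `extract` functions.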