
1 Summarisation Work at Sheffield
Robert Gaizauskas
Natural Language Processing Group
Department of Computer Science
University of Sheffield

2 Outline (AKT Workshop, January 2001)
  Terminology
  Approach 1: Generation from Templates
  Approach 2: Coreference Chains
  Approach 3: Statistical

3 Terminology
  Extract vs Abstract
    Extract - subset of the sentences in the original
    Abstract - fusion of topics in the original + text generation
  Generic vs User-focused
    Generic - captures the essence of the text, independent of the user's interests
    User-focused - summarises content with respect to a particular user interest
  Indicative vs Informative
    Indicative - indicates whether the document should be examined in more detail
    Informative - serves as a surrogate for the original

4 Approach 1: Generation from Templates
  To generate user-focused informative abstracts we have used an information extraction (IE) system plus simple natural language (NL) generation techniques to produce simple summaries

5 Example: A Wall Street Journal Article

wsj94_008.0212
940413-0062.
Who's News: @ Burns Fry Ltd.
04/13/94
WALL STREET JOURNAL (J), PAGE B10
MER SECURITIES (SCR)

BURNS FRY Ltd. (Toronto) -- Donald Wright, 46 years old, was named executive vice president and director of fixed income at this brokerage firm. Mr. Wright resigned as president of Merrill Lynch Canada Inc., a unit of Merrill Lynch & Co., to succeed Mark Kassirer, 48, who left Burns Fry last month. A Merrill Lynch spokeswoman said it hasn't named a successor to Mr. Wright, who is expected to begin his new position by the end of the month.

6 Example: BNF Definition of a Management Succession Event Template (MUC-6)

<TEMPLATE> :=
    DOC_NR: "NUMBER" ^
    CONTENT: <SUCCESSION_EVENT> *
<SUCCESSION_EVENT> :=
    ORGANIZATION: <ORGANIZATION> ^
    POST: "POSITION TITLE" | "no title" ^
    IN_AND_OUT: <IN_AND_OUT> +
    VACANCY_REASON: {DEPART_WORKFORCE, REASSIGNMENT, NEW_POST_CREATED, OTH_UNK} ^
<IN_AND_OUT> :=
    PERSON: <PERSON> ^
    NEW_STATUS: {IN, IN_ACTING, OUT, OUT_ACTING} ^
    ON_THE_JOB: {YES, NO, UNCLEAR}
    OTHER_ORG: <ORGANIZATION> -
    REL_OTHER_ORG: {SAME_ORG, RELATED_ORG, OUTSIDE_ORG} -
<ORGANIZATION> :=
    ORG_NAME: "NAME" -
    ORG_ALIAS: "ALIAS" *
    ORG_DESCRIPTOR: "DESCRIPTOR" -
    ORG_TYPE: {GOVERNMENT, COMPANY, OTHER} ^
    ORG_LOCALE: LOCALE_STRING {{CITY, PROVINCE, COUNTRY, REGION, UNK}} *
    ORG_COUNTRY: NORMALIZED-COUNTRY-or-REGION | COUNTRY-or-REGION-STRING *
<PERSON> :=
    PER_NAME: "NAME" -
    PER_ALIAS: "ALIAS" *
    PER_TITLE: "TITLE" *

7 Example: A (Partially) Filled Management Succession Event Template

<TEMPLATE> :=
    DOC_NR: "9404130062"
    CONTENT:
<SUCCESSION_EVENT> :=
    SUCCESSION_ORG:
    POST: "executive vice president"
    IN_AND_OUT:
    VACANCY_REASON: OTH_UNK
<IN_AND_OUT> :=
    IO_PERSON:
    NEW_STATUS: OUT
    ON_THE_JOB: NO
<IN_AND_OUT> :=
    IO_PERSON:
    NEW_STATUS: IN
    ON_THE_JOB: NO
    OTHER_ORG:
    REL_OTHER_ORG: OUTSIDE_ORG
<ORGANIZATION> :=
    ORG_NAME: "Burns Fry Ltd."
    ORG_ALIAS: "Burns Fry"
    ORG_DESCRIPTOR: "this brokerage firm"
    ORG_TYPE: COMPANY
    ORG_LOCALE: Toronto CITY
    ORG_COUNTRY: Canada
<ORGANIZATION> :=
    ORG_NAME: "Merrill Lynch Canada Inc."
    ORG_ALIAS: "Merrill Lynch"
    ORG_DESCRIPTOR: "a unit of Merrill Lynch & Co."
    ORG_TYPE: COMPANY
<PERSON> :=
    PER_NAME: "Mark Kassirer"
<PERSON> :=
    PER_NAME: "Donald Wright"
    PER_ALIAS: "Wright"
    PER_TITLE: "Mr."

8 Example: One Use for a Template - Generating a Summary

From the completely filled version of the preceding template the LaSIE system generates the following natural language summary:

  BURNS FRY Ltd. named Donald Wright as executive vice president.
  Donald Wright resigned as president of Merrill Lynch Canada Inc..
  Mark Kassirer left as president of BURNS FRY Ltd.

Producing summaries in other languages is relatively easy (compared to full machine translation).
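
To make the idea of generating text from a filled template concrete, here is a minimal sketch in Python. It is not the LaSIE generator: the dictionary layout, the `summarise` function and the two sentence patterns are assumptions made for illustration, and only the slot names (SUCCESSION_ORG, POST, IN_AND_OUT, IO_PERSON, NEW_STATUS) come from the slides.

```python
# Hypothetical, simplified stand-in for a filled succession-event template.
# Slot names follow the slides; the nesting and values are illustrative.
event = {
    "SUCCESSION_ORG": "BURNS FRY Ltd.",
    "POST": "executive vice president",
    "IN_AND_OUT": [
        {"IO_PERSON": "Donald Wright", "NEW_STATUS": "IN"},
        {"IO_PERSON": "Mark Kassirer", "NEW_STATUS": "OUT"},
    ],
}


def _finish(sentence):
    """Add a full stop unless the sentence already ends with one (e.g. 'Ltd.')."""
    return sentence if sentence.endswith(".") else sentence + "."


def summarise(event):
    """Render one sentence per IN_AND_OUT entry using fixed patterns (assumed)."""
    org, post = event["SUCCESSION_ORG"], event["POST"]
    sentences = []
    for io in event["IN_AND_OUT"]:
        person = io["IO_PERSON"]
        if io["NEW_STATUS"] in ("IN", "IN_ACTING"):
            sentences.append(_finish(f"{org} named {person} as {post}"))
        elif io["NEW_STATUS"] in ("OUT", "OUT_ACTING"):
            sentences.append(_finish(f"{person} left as {post} of {org}"))
    return sentences


for line in summarise(event):
    print(line)
```

Because the patterns are plain strings keyed on slot values, swapping in patterns for another language only changes the templates, which is one way to read the slide's remark about multilingual summaries. The sketch does not try to reproduce the exact LaSIE output shown on the slide.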

9 Approach 2: Coreference Chains
  To generate generic informative extracts we have used coreference chains

10 Approach 2: Coreference Chains (cont)
  Background:
    Morris and Hirst (’91) investigated lexical chains: chains of lexically related words in a text that serve to make the text cohere
    Barzilay and Elhadad (’97) suggested using lexical chains as a basis for selecting sentences to form a summary: rank chains based on number of links + extent over text
    Halliday and Hasan (’76) proposed coreference as another major factor contributing to the coherence of NL texts
  Idea: Explore the use of coreference chains to produce summaries

11 Approach 2: Coreference Chains (cont)
  Technique
    Use LaSIE to carry out discourse analysis of the text, including coreference resolution
    Extract all coreference chains
    Rank chains by a metric which counts chain length + extent + starting point
      Intuition: entities which occur most frequently and most widely in a text are those which the text is most “about”
    Depending on desired summary length, select m sentences from the top n chains
  Details in Azzam, Humphreys and Gaizauskas ’99
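
The ranking and selection steps can be sketched as follows. This is a rough illustration only: the slides name the three ingredients of the metric (chain length, extent, starting point) but not how they are combined, so the unweighted sum in `chain_score`, the normalisation by document length, and the `Chain` class and `select_sentences` function are all assumptions. Coreference resolution itself is taken as given.

```python
from dataclasses import dataclass


@dataclass
class Chain:
    entity: str
    sentence_ids: list  # sentence index of each mention (repeats allowed)


def chain_score(chain, n_sentences):
    """Assumed metric: length + extent + starting point, summed unweighted."""
    ids = chain.sentence_ids
    length = len(ids)                                 # number of mentions
    extent = (max(ids) - min(ids) + 1) / n_sentences  # fraction of text spanned
    start = 1 - min(ids) / n_sentences                # earlier start scores higher
    return length + extent + start


def select_sentences(chains, n_sentences, n_chains, m):
    """Pick up to m sentences from the top n_chains chains, in document order."""
    ranked = sorted(chains, key=lambda c: chain_score(c, n_sentences), reverse=True)
    chosen = []
    for chain in ranked[:n_chains]:
        for sid in sorted(set(chain.sentence_ids)):
            if len(chosen) >= m:
                return sorted(chosen)
            if sid not in chosen:
                chosen.append(sid)
    return sorted(chosen)


# Toy chains for a three-sentence text (indices 0..2); values are illustrative.
chains = [
    Chain("Donald Wright", [0, 1, 2, 2]),
    Chain("Burns Fry", [0, 1]),
    Chain("spokeswoman", [2]),
]
print(select_sentences(chains, n_sentences=3, n_chains=2, m=2))
```

Returning the selected indices in document order keeps the extract readable; the published metric and selection policy are described in Azzam, Humphreys and Gaizauskas ’99 and may differ from this sketch.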

12 Approach 3: Statistical
  To generate generic indicative extracts we have used a statistical approach based on a set of factors

13 Approach 3: Statistical (cont)
  Factors which have been examined in selecting sentences for inclusion in extractive summaries include:
    number of content words shared with title/headings (T)
    presence of “cue words” (C)
    location of sentence in text (L)
    number of content words discriminative of the current text as opposed to the corpus of texts from which it is drawn, using, e.g., a tf-idf measure (K)
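
As an illustration of the last factor (K), the sketch below counts how many of a sentence's words are among the most discriminative terms of the document under a tf-idf weighting. The particular smoothed idf formula, the `top_k` cut-off and the function names are assumptions; the slides only say that a tf-idf measure is one option. Tokens are assumed to be lower-cased content words already.

```python
import math
from collections import Counter


def tfidf_weights(doc_tokens, corpus):
    """tf-idf weight per term of doc_tokens; corpus is a list of token lists."""
    n_docs = len(corpus)
    weights = {}
    for term, count in Counter(doc_tokens).items():
        df = sum(1 for d in corpus if term in d)  # document frequency
        # Smoothed idf (an assumed variant): a term found in every document gets 0.
        weights[term] = count * math.log((1 + n_docs) / (1 + df))
    return weights


def keyword_score(sentence_tokens, weights, top_k=3):
    """Factor K: number of tokens among the top_k highest-weighted terms."""
    keywords = set(sorted(weights, key=weights.get, reverse=True)[:top_k])
    return sum(1 for t in sentence_tokens if t in keywords)


# Toy corpus; the first document is the one being summarised.
corpus = [
    ["wright", "named", "director"],
    ["market", "rose", "named"],
    ["market", "fell"],
]
weights = tfidf_weights(corpus[0], corpus)
print(keyword_score(["wright", "named"], weights, top_k=2))
```

Here "named" also occurs in another document, so it is weighted lower than "wright" and "director" and does not count towards K when only the top two terms are kept.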

14 Approach 3: Statistical (cont)
  Assign a weight to each sentence according to a weighted linear combination of these factors
  Learn weights to optimise sentence selection as measured against a corpus of extracts + texts
  Select top ranked sentences up to desired summary length
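
The scoring and selection steps can be sketched as below. The factor labels T, C, L and K come from the slides; the feature values, the hand-set weights and the function names are hypothetical, and the weight-learning step is not shown (in the approach described, the weights would be fitted against a corpus of texts paired with extracts).

```python
def sentence_weight(features, weights):
    """Weighted linear combination of the factor values for one sentence."""
    return sum(weights[name] * features[name] for name in weights)


def select_top(sentence_features, weights, max_sentences):
    """Return indices of the top-scoring sentences, in document order."""
    scored = [(sentence_weight(f, weights), i)
              for i, f in enumerate(sentence_features)]
    # Sort by descending score; ties go to the earlier sentence.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:max_sentences]
    return sorted(i for _, i in top)


# Hypothetical hand-set weights (not learned) and toy feature values.
weights = {"T": 1.0, "C": 0.5, "L": 2.0, "K": 1.0}
sentence_features = [
    {"T": 2, "C": 0, "L": 1.0, "K": 3},  # score 7.0
    {"T": 0, "C": 1, "L": 0.5, "K": 1},  # score 2.5
    {"T": 1, "C": 0, "L": 0.0, "K": 2},  # score 3.0
]
print(select_top(sentence_features, weights, max_sentences=2))
```

Summary length is expressed here as a sentence count; a word or character budget would work the same way with a different stopping rule.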

