Knowledge-rich approaches for text summarization Minna Vasankari 27.11.2001.

1 Knowledge-rich approaches for text summarization Minna Vasankari 27.11.2001

2 Structure
1. The idea
2. Conceptual summarization
3. Linguistic summarization
4. Example system: Plandoc
5. Summary

3 The idea
Full text is not the only possible source material for summarization
Other sources:
–databases
–simulation data
–user interaction sequences
–etc.

4 The idea
Data with structure
–easier to interpret than full text
–no source text => no shortcuts
–text generation phase is hard
–domain-dependency

5 Conceptual summarization
Sorting the source material
–facts, events
Choosing what is important
–must be included in the summary
and what is potentially important
–can be left out or included

6 Conceptual summarization
What is important?
–depends on the domain
–depends on the input material
–depends on the user

7 Conceptual summarization
Importance of a fact
–manual decision
Importance of an event
–manual decision
–frequency analysis

8 Conceptual summarization
–Potentially important facts/events are included only if they fit in
–Determined by
  space limit
  linguistic constraints
  possible ordering of facts
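
The selection rule on this slide can be sketched in a few lines. This is a minimal sketch that models only the space limit, measured in words; the function name `select_facts` and the word-count budget are assumptions for illustration, and linguistic constraints and fact ordering are not modeled.

```python
def select_facts(important, optional, max_words):
    """Keep every important fact, then add optional ones while they still fit.

    The word budget is an assumed stand-in for the slide's "space limit".
    """
    chosen = list(important)
    used = sum(len(fact.split()) for fact in chosen)
    for fact in optional:
        cost = len(fact.split())
        if used + cost <= max_words:
            chosen.append(fact)
            used += cost
    return chosen


important = ["Jay Humphries scored 24 points."]
optional = ["He came in as a reserve."]
print(select_facts(important, optional, max_words=11))
print(select_facts(important, optional, max_words=8))
```

With a budget of 11 words the optional fact is kept; with 8 words only the important fact survives.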

9 Linguistic summarization
Expressing the same information in fewer sentences
Method: linguistic constructs & revision
Danger: overly aggressive compression leads to unreadable sentences

10 Linguistic summarization
Linguistic constructs:
–semantically rich words
–modifiers of nouns or verbs
–conjunction and ellipsis
–abridged references
–abstraction
–aggregation
–presentational techniques

11 Linguistic summarization
Semantically rich words
–killing two birds with one stone
Karl Malone scored 39 points.
+ Karl Malone's 39-point performance is equal to his season high.
becomes
Karl Malone tied his season high with 39 points.

12 Linguistic summarization
Modifiers of nouns or verbs
–one fact specifies a verb or a noun in another fact
Jay Humphries scored 24 points. He came in as a reserve.
becomes
Reserve Jay Humphries scored 24 points.

13 Linguistic summarization
Conjunction
–joining facts with "and" or "or"
Mick Reynes scored 265 points last season and Jack Jones scored 265 points last season.
Ellipsis
–removing repetition
Mick Reynes and Jack Jones scored 265 points last season.
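
Conjunction followed by ellipsis can be sketched as grouping subjects that share the same predicate. This is a minimal sketch assuming each fact is already split into a (subject, predicate) pair; that representation and the function names are illustrative, not taken from any system described in the slides.

```python
def join_names(names):
    """Join names as 'A', 'A and B' or 'A, B and C'."""
    if len(names) == 1:
        return names[0]
    return ", ".join(names[:-1]) + " and " + names[-1]


def conjoin_with_ellipsis(facts):
    """facts: list of (subject, predicate) pairs.

    Subjects sharing a predicate are conjoined and the repeated
    predicate is stated only once (ellipsis).
    """
    by_predicate = {}
    for subject, predicate in facts:
        by_predicate.setdefault(predicate, []).append(subject)
    return [f"{join_names(subjects)} {predicate}."
            for predicate, subjects in by_predicate.items()]


facts = [
    ("Mick Reynes", "scored 265 points last season"),
    ("Jack Jones", "scored 265 points last season"),
]
print(conjoin_with_ellipsis(facts))
```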

14 Linguistic summarization
Abridged references
–using shorter names for already introduced things
San Antonio Spurs took a 127-111 victory over Denver Nuggets and handed Denver their seventh straight loss.

15 Linguistic summarization
Abstraction
–replacing a series of events with a single event
mission start, movements, firing, damages, mission abort => failed mission
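
One simple way to picture abstraction is a table that maps a known event sequence to a single summarizing event. This is only a sketch under that assumption: the rule table `ABSTRACTIONS` and the function `abstract` are hypothetical, and a real system would need more flexible matching than exact sequences.

```python
# Hypothetical rule table: an exact event sequence and the event that replaces it.
ABSTRACTIONS = {
    ("mission start", "movements", "firing", "damages", "mission abort"):
        "failed mission",
}


def abstract(events, rules=ABSTRACTIONS):
    """Replace every run of events that matches a rule with the rule's single event."""
    result = []
    i = 0
    while i < len(events):
        for pattern, summary in rules.items():
            if tuple(events[i:i + len(pattern)]) == pattern:
                result.append(summary)
                i += len(pattern)
                break
        else:
            # No rule matched at this position: keep the event unchanged.
            result.append(events[i])
            i += 1
    return result


print(abstract(["mission start", "movements", "firing", "damages", "mission abort"]))
```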

16 Linguistic summarization
Aggregation
–connecting events with spatial or temporal adverbials
Site-A and Site-B simultaneously fired a missile.
Presentational techniques
–using spatial or temporal adverbs
Site-A fired a missile at 1302. Three minutes later Site-B fired a missile.
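
Both examples on this slide can be produced from the same two timestamped events. The sketch below assumes that times such as 1302 are 24-hour HHMM clock readings and that the two events are given in time order; the function names and the small number-word table are illustrative only.

```python
NUMBER_WORDS = {1: "One", 2: "Two", 3: "Three", 4: "Four", 5: "Five"}


def minutes(hhmm):
    """Convert an 'HHMM' string (assumed 24-hour clock) to minutes since midnight."""
    return int(hhmm[:2]) * 60 + int(hhmm[2:])


def describe_firings(first, second):
    """first, second: (site, 'HHMM') pairs, already ordered by time."""
    (site_a, time_a), (site_b, time_b) = first, second
    gap = minutes(time_b) - minutes(time_a)
    if gap == 0:
        # Aggregation: one sentence with a temporal adverbial.
        return f"{site_a} and {site_b} simultaneously fired a missile."
    # Presentational technique: the second time is given relative to the first.
    amount = NUMBER_WORDS.get(gap, str(gap))
    unit = "minute" if gap == 1 else "minutes"
    return (f"{site_a} fired a missile at {time_a}. "
            f"{amount} {unit} later {site_b} fired a missile.")


print(describe_firings(("Site-A", "1302"), ("Site-B", "1305")))
print(describe_firings(("Site-A", "1302"), ("Site-B", "1302")))
```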

17 Linguistic summarization
Revision: approach 1
–First create a draft summary from important facts
–Then enrich the draft with potentially important facts
Revision: approach 2
–Generate the draft by collecting similar facts into each sentence
–Compress the sentences with ellipsis etc.

18 Example system: Plandoc
Application developed by K. McKeown, J. Robin and K. Kukich at Columbia University, New York and Bell Communications Research (1995)
Problem
–a telephone company engineer plans how a telephone route should be developed over the next 20 years
–the engineer uses the PLAN planning system software
–Goal: documentation of the planning process

19 Plandoc: input and output
Input: a trace of the user's actions with the PLAN system
1. RUNID fiberall FIBER 6/19/93 act yes
2. FA 1301 2 1995
3. FA 1201 2 1995
4. FA 1501 3 1995
5. ANF 1201 1301 2 1995 24
END. 856.0 670.2

20 Plandoc: input and output
Output: a 1-2 page report
–the initial plan PLAN proposed
–refinements the engineer made
–alternative refinements the engineer tried but rejected
–the final plan
Purpose: documentation

21 Plandoc: conceptual summarization
Important facts
–accepted parts of the initial plan + accepted refinements to it = the final plan
–rejected refinements? the engineer decides

22 Plandoc: overview of the method
Fact generator converts the input to an internal representation
–facts presented as feature structures (attribute/value pairs)
Ontologizer enriches the facts with e.g. price information
Discourse planner groups the facts
A lexicalizer/sentence generator converts the groups into English

23 Plandoc: processing the input
Example: FA 1301 2 1995
Enriched feature structure:
class: refinement
ref-type: fiber
action: activation
csa-site: 1301
date: year: 1995, quarter: 2
price: $56.00K
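
The step from trace record to enriched feature structure can be sketched with nested dictionaries. This is not Plandoc's actual code: the field order (site, quarter, year) is read off the slide's single example, and `parse_fa`, `enrich` and the `PRICES` table are hypothetical names introduced for illustration.

```python
# Hypothetical price table; the slide only shows the enriched result for CSA 1301.
PRICES = {"1301": "$56.00K"}


def parse_fa(record):
    """Turn 'FA <csa-site> <quarter> <year>' into a feature structure.

    Returns None for any other record type, since the slides do not
    explain the layout of the remaining records.
    """
    fields = record.split()
    if len(fields) != 4 or fields[0] != "FA":
        return None
    _, site, quarter, year = fields
    return {
        "class": "refinement",
        "ref-type": "fiber",
        "action": "activation",
        "csa-site": site,
        "date": {"year": int(year), "quarter": int(quarter)},
    }


def enrich(fact, prices=PRICES):
    """Assumed ontologizer step: attach a price when one is known for the site."""
    enriched = dict(fact)
    price = prices.get(fact["csa-site"])
    if price is not None:
        enriched["price"] = price
    return enriched


print(enrich(parse_fa("FA 1301 2 1995")))
```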

24 Plandoc: grouping facts into sentences
Let's construct a sentence from the FA facts:
FA 1301 2 1995
FA 1201 2 1995
FA 1501 3 1995
1. Group facts by common action
–action = activation for all
–one sentence is needed
FA 1301 2 1995
FA 1201 2 1995
FA 1501 3 1995

25 Plandoc: grouping facts into sentences
2. For each common-action group (sentence):
(a) Collapse groups which differ by one feature into a single group
–two groups:
FA 1301, 1201 2 1995
FA 1501 3 1995

26 Plandoc: grouping facts into sentences
(b) If more than one group remains (sentence is broken into clauses by conjunction):
i. Find the feature that is shared across most groups (but does not have the same value for all)
FA 1301, 1201 2 1995
FA 1501 3 1995
only the date feature is left and it has two values => two clauses are needed

27 Plandoc: grouping facts into sentences
ii. Sort the groups into subgroups by the most common shared feature (nested conjunction inside the clause)
–each group has only one member
FA 1301, 1201 2 1995
FA 1501 3 1995

28 Plandoc: grouping facts into sentences
iii. Repeat the selection of the most common shared feature and the sorting into subgroups until all have been sorted
–no subgroups left
iv. Sort the clauses by date
FA 1301, 1201 2 1995
FA 1501 3 1995
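
For the three FA facts in this walkthrough, the outcome of the grouping steps can be reproduced with a much simpler procedure. The sketch below is a deliberate simplification, not the full algorithm: it assumes facts differ only in site and date, merges facts that share an action and a date, and sorts the resulting clauses by date. The dictionary keys and the function name `group_facts` are illustrative.

```python
def group_facts(facts):
    """facts: list of dicts with keys 'action', 'site', 'year', 'quarter'.

    Returns {action: [((year, quarter), [sites, ...]), ...]}: one entry per
    sentence (action), with its clauses sorted by date.
    """
    by_action = {}
    for fact in facts:
        date = (fact["year"], fact["quarter"])
        clauses = by_action.setdefault(fact["action"], {})
        # Facts that differ only in site collapse into one clause.
        clauses.setdefault(date, []).append(fact["site"])
    return {action: sorted(clauses.items())
            for action, clauses in by_action.items()}


facts = [
    {"action": "activation", "site": "1301", "year": 1995, "quarter": 2},
    {"action": "activation", "site": "1201", "year": 1995, "quarter": 2},
    {"action": "activation", "site": "1501", "year": 1995, "quarter": 3},
]
print(group_facts(facts))
```

The result has one sentence (activation) with two clauses, matching the two groups shown on the slides.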

29 Plandoc: grouping facts into sentences
FA 1301, 1201 2 1995
FA 1501 3 1995
The produced sentence:
This refinement activated fiber for CSAs 1301 and 1201 in 1995 Q2 and this refinement activated fiber for CSA 1501 in 1995 Q3.
The final sentence after ellipsis:
This refinement activated fiber for CSAs 1301 and 1201 in 1995 Q2 and for CSA 1501 in 1995 Q3.
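
The two sentences on this slide differ only in whether the shared lead-in is repeated in every clause. A minimal sketch of that final step follows, assuming each clause is a ((year, quarter), [sites]) pair and that the wording is fixed to this one fiber-activation example; `realize` and its helpers are hypothetical names, not Plandoc's lexicalizer.

```python
def join_names(names):
    """Join names as 'A', 'A and B' or 'A, B and C'."""
    if len(names) == 1:
        return names[0]
    return ", ".join(names[:-1]) + " and " + names[-1]


def clause_text(date, sites):
    """Render one clause, e.g. 'for CSAs 1301 and 1201 in 1995 Q2'."""
    year, quarter = date
    noun = "CSA" if len(sites) == 1 else "CSAs"
    return f"for {noun} {join_names(sites)} in {year} Q{quarter}"


def realize(clauses, ellipsis=True):
    """clauses: non-empty list of ((year, quarter), [sites]) pairs."""
    lead = "This refinement activated fiber"
    parts = [clause_text(date, sites) for date, sites in clauses]
    if ellipsis:
        # State the shared lead-in once and conjoin the remainders.
        body = f"{lead} " + " and ".join(parts)
    else:
        repeated = lead[0].lower() + lead[1:]
        full = [f"{lead} {parts[0]}"] + [f"{repeated} {p}" for p in parts[1:]]
        body = " and ".join(full)
    return body + "."


clauses = [((1995, 2), ["1301", "1201"]), ((1995, 3), ["1501"])]
print(realize(clauses, ellipsis=False))
print(realize(clauses))
```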

30 Plandoc: grouping facts into sentences
Readability
This refinement extended fiber from fiber hub 8107 to CSAs 8128, 8126, 8121 and 8113 and from fiber hub 8120 to the CO in 1994 Q1 and from the CO to CSA 8120 in 1994 Q3, with the active fibers placed on the primary path.
–limit the number of facts conjoined
–limit the number of embedded conjunctions inside a clause
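
The first limit can be pictured as splitting an over-long run of clauses into several sentences before generation. This is a sketch only: the cap of two clauses per sentence is an arbitrary illustrative value, and `split_clauses` is a hypothetical helper.

```python
def split_clauses(clauses, max_per_sentence=2):
    """Split a list of clauses into consecutive chunks, one chunk per sentence."""
    return [clauses[i:i + max_per_sentence]
            for i in range(0, len(clauses), max_per_sentence)]


clauses = [
    "from fiber hub 8107 to CSAs 8128, 8126, 8121 and 8113",
    "from fiber hub 8120 to the CO in 1994 Q1",
    "from the CO to CSA 8120 in 1994 Q3",
]
for sentence_clauses in split_clauses(clauses):
    print(sentence_clauses)
```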

31 Summary
Sources other than text can also be summarized
Problems:
–choosing the important elements
–generating a compact and readable summary text
–domain-dependency

32 Summary
Applications:
–automatic weather reports (not predictions!)
–simulation reports
–patient monitoring system summaries
–etc.

