Text Summarization -- In Search of Effective Ideas and Techniques Shuhua Liu, Assistant Professor Department of Information Systems Åbo Akademi University,

1 Text Summarization -- In Search of Effective Ideas and Techniques. Shuhua Liu, Assistant Professor, Department of Information Systems, Åbo Akademi University, Finland & University of California, Berkeley. Modified by Shinta P., 2012

2 Headline news: informing

3 TV guides: decision making

4 Abstracts of papers: time saving

5 Graphical maps: orienting

6 What is text summarization? To reduce (long) textual information to its most essential points; to distill the most important information from a source or sources and produce an abridged version of it (Endres-Niggemeyer, 1998; Mani and Maybury, 1999; Spärck-Jones, 1999).

7 Text summarization: a context-dependent activity

8 ‘Genres’ of summary? Indicative vs. informative: used for quick categorization vs. content processing. Extract vs. abstract: lists fragments of text vs. rephrases content coherently. Generic vs. query-oriented: provides the author’s view vs. reflects the user’s interest. Background vs. just-the-news: assumes the reader’s prior knowledge is poor vs. up-to-date. Single-document vs. multi-document source: based on one text vs. fuses together many texts.

9 Shuhua Liu, IIS/IAMSR, ÅA. Text summarization. Key issues: how to identify the most important content out of the rest of the text; how to synthesize the substance and formulate a summary text based on the identified content. Major approaches: selection based, which produces "extracts"; text understanding based, which produces "abstracts".

11 Selection based summarization: how does it work? The most content-bearing sentences or passages are identified and selected to compose a summary. Techniques: compute a significance value for each sentence from word frequency counts, the keywords, title words, and cue words it contains, and the position of the sentence (Luhn, 1958; Edmundson, 1969); RST (Rhetorical Structure Theory) based discourse analysis (Marcu, 1997); passage and sentence similarity analysis (Goldstein et al., 2000; CMU).
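
The significance-value idea on this slide can be sketched as follows. This is a minimal illustration in the spirit of Luhn and Edmundson, not their exact formulas: the stop-word list, the cue-word list, the equal weights, and the function names are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical word lists, chosen only for this illustration.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it", "for", "on", "are", "that"}
CUE_WORDS = {"significant", "conclusion", "important", "summary"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score_sentences(title, sentences, weights=(1.0, 1.0, 1.0, 1.0)):
    """Return one significance value per sentence.

    Each value is a weighted sum of four features: word frequency,
    title-word overlap, cue-word share, and sentence position.
    """
    w_freq, w_title, w_cue, w_pos = weights
    content = [[t for t in tokenize(s) if t not in STOP_WORDS] for s in sentences]
    freq = Counter(t for tokens in content for t in tokens)
    max_freq = max(freq.values(), default=1)
    title_words = set(tokenize(title)) - STOP_WORDS

    scores = []
    for i, tokens in enumerate(content):
        if not tokens:
            scores.append(0.0)
            continue
        freq_score = sum(freq[t] for t in tokens) / (len(tokens) * max_freq)
        title_score = sum(t in title_words for t in tokens) / len(tokens)
        cue_score = sum(t in CUE_WORDS for t in tokens) / len(tokens)
        pos_score = 1.0 - i / len(sentences)  # earlier sentences score higher
        scores.append(
            w_freq * freq_score + w_title * title_score
            + w_cue * cue_score + w_pos * pos_score
        )
    return scores


def extract_summary(title, sentences, k=2):
    """Pick the k highest-scoring sentences and return them in document order."""
    scores = score_sentences(title, sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Returning the selected sentences in document order is a design choice; it does not by itself repair dangling references between the excerpts.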

12 MS Word AutoSummarize

13 Text understanding system. A text understanding task often aims to recover all of the information that there is in a text, including what is only implicit in what is actually written. “All the richness of natural language becomes fair game, including metaphor, metonymy, discourse structure, and the recognition of the author's underlying intentions, and the full interplay between language and world knowledge becomes central to the task.”

14 Text understanding based summarization. Depends on complete sentence analysis and discourse analysis with full knowledge support: a syntactic parser and a semantic interpreter; linguistic knowledge, world knowledge, and domain knowledge; reasoning mechanisms that work effectively over huge knowledge collections.

15 Selection based vs. understanding based. Selection based: generally applicable, but with incoherent content and poor readability due to unclear relationships between the selected text excerpts, dangling references, and so on. Understanding based: high precision, but very slow, with a large amount of wasted computation, and highly domain specific. Endres-Niggemeyer (2000) found that people sometimes prefer extractive summaries to gloss-over abstractive summaries!

16 The reality: the dominant approach in practice is still selection based. Understanding based systems exist only in theory, and will continue to do so for quite a while. However, certain text understanding tasks at small scale or in restricted domains can be done.

17 Topic guided text summarization. Text summarization as a process of topic analysis, passage extraction, text understanding, information integration/fusion, and text generation. Passage extraction guided by topic structure is expected to preserve the logical relationships between the extracted text parts, e.g. sentences are arranged logically according to topic structure. Topic representation will also be very helpful in the next phase of text analysis and information integration.

18 Phase 1: theme detection, topic labels, sentence/passage selection. Theme detection through pairwise passage similarity analysis. Vector space model of terms and documents. TF-IDF: baseline method.
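
A minimal sketch of the TF-IDF baseline for pairwise passage similarity, using only the standard library. The smoothed IDF formula is one common variant and is an assumption here, as are the function names; the slides do not say which weighting was used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(passages):
    """Represent each passage as a sparse {term: tf * idf} vector."""
    docs = [Counter(tokenize(p)) for p in passages]
    n = len(docs)
    # Document frequency: in how many passages each term occurs.
    df = Counter(term for doc in docs for term in doc)
    # Smoothed IDF (assumed variant); rarer terms receive larger weights.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(passages):
    """Pairwise cosine similarity between all passages."""
    vectors = tfidf_vectors(passages)
    return [[cosine(a, b) for b in vectors] for a in vectors]
```

Passages that share no terms get similarity 0 under this model, which is the limitation that latent-semantic methods are meant to soften.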

19 Passage similarity analysis with the LSA method. LSA (Latent Semantic Analysis): results similar to those obtained with TF-IDF. Fuzzy LSI approach (Nikravesh, 2002).

20 Passage adjacency matrix (partial)

21 Passage Relation Map
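
The figures for these two slides did not survive in the transcript. As a rough sketch of how such a matrix and map might be derived, assume that two passages count as adjacent when their similarity reaches a threshold, and that connected groups of passages form themes. The threshold value and the function names are assumptions for illustration, not the author's settings.

```python
def adjacency_matrix(sim, threshold=0.2):
    """Turn a pairwise similarity matrix into a 0/1 adjacency matrix.

    The threshold is an assumed value; self-links are left out.
    """
    n = len(sim)
    return [
        [1 if i != j and sim[i][j] >= threshold else 0 for j in range(n)]
        for i in range(n)
    ]


def passage_clusters(adj):
    """Group passages into connected components of the relation map."""
    n = len(adj)
    seen = set()
    clusters = []
    for start in range(n):
        if start in seen:
            continue
        # Depth-first search from an unvisited passage.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            i = stack.pop()
            component.append(i)
            for j in range(n):
                if adj[i][j] and j not in seen:
                    seen.add(j)
                    stack.append(j)
        clusters.append(sorted(component))
    return clusters
```

A passage with no links ends up in a cluster of its own.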

22 Passage extraction rules. Passage clusters help us to identify themes and topics; unconnected passages form distinct topics covered in a document. The sentence/passage closest to the centroid of its cluster is chosen for inclusion in the summary. With the MMR algorithm (CMU; Goldstein et al., 2000), sentences that are maximally similar to the document and maximally dissimilar to sentences already in the summary are selected to compose a summary.
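
A minimal sketch of MMR-style greedy selection as described on this slide. It assumes the similarities have already been computed (for example with a TF-IDF cosine measure). The trade-off parameter `lam`, its default value, and the function name are assumptions for the example.

```python
def mmr_select(doc_sim, pair_sim, k, lam=0.7):
    """Greedy MMR-style selection of k sentence indices.

    doc_sim[i]     similarity of sentence i to the whole document
    pair_sim[i][j] similarity between sentences i and j
    lam            assumed trade-off: 1.0 = relevance only, 0.0 = novelty only
    """
    selected = []
    candidates = list(range(len(doc_sim)))
    while candidates and len(selected) < k:

        def mmr_score(i):
            # Redundancy is the closest match among already selected sentences.
            redundancy = max((pair_sim[i][j] for j in selected), default=0.0)
            return lam * doc_sim[i] - (1.0 - lam) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With near-duplicate sentences, a lower `lam` makes the second pick skip the duplicate in favour of a less similar sentence.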

23 Creating theme labels. Keywords (TF based). Word families (semantically related words in a passage cluster). Key phrases: linguistic approach; statistical approach plus simple heuristics (Kelledy and Smeaton, 1997), which seems quite effective.

24 Next step

25 WordNet, since 1985. Lexical database developed at Princeton University, led by George Miller. Hand-coded, freely available. Word knowledge of nouns, verbs, adjectives, and adverbs. Semantic network representation with only a few semantic relations: synonym, hypernym; categorization relation: is-a. Widely used in query expansion and word similarity determination (based on synsets).

29 ConceptNet, MIT Media Lab. Common sense knowledge base with NLP capability. Extracted automatically from common sense knowledge expressed in semi-structured natural language sentences from OMCSNet (Open Mind Common Sense), applying about 50 extraction rules, e.g. "The Effect of [falling off a bike] is [you get hurt]." The OMCS sentence "A lime is a very sour fruit" is extracted into two assertions: IsA (lime, fruit) and PropertyOf (lime, very sour).
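
To make the idea of pattern-based extraction concrete, here is a small sketch with two hypothetical regular-expression rules modelled on the two examples on this slide. These are not ConceptNet's actual rules, and the relation name EffectOf for the first pattern is an assumption; only IsA and PropertyOf appear on the slide.

```python
import re

# Hypothetical rule for sentences of the form "The effect of X is Y."
EFFECT_RULE = re.compile(
    r"^The effect of (?P<cause>.+?) is (?P<effect>.+?)\.?$", re.IGNORECASE
)
# Hypothetical rule for "A <concept> is a [modifiers] <category>."
ISA_RULE = re.compile(
    r"^An? (?P<concept>\w+) is an? (?P<modifier>(?:\w+ )*?)(?P<category>\w+)\.?$",
    re.IGNORECASE,
)


def extract_assertions(sentence):
    """Return (relation, arg1, arg2) tuples for one semi-structured sentence."""
    sentence = sentence.strip()

    match = EFFECT_RULE.match(sentence)
    if match:
        # "EffectOf" is an assumed relation name for this pattern.
        return [("EffectOf", match.group("cause"), match.group("effect"))]

    match = ISA_RULE.match(sentence)
    if match:
        concept = match.group("concept").lower()
        assertions = [("IsA", concept, match.group("category").lower())]
        modifier = match.group("modifier").strip().lower()
        if modifier:
            # Words between the article and the head noun become a property.
            assertions.append(("PropertyOf", concept, modifier))
        return assertions

    return []  # no rule matched
```

For the slide's example, `extract_assertions("A lime is a very sour fruit")` yields the IsA and PropertyOf pair shown above. Sentences that match no rule simply produce nothing.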

31 ConceptNet (Liu and Singh, 2004a, 2004b): inference. Spreading activation, i.e. node activation radiating outward from an origin node: GetContext (node), GetAnalogousConcept (node). Graph traversal: FindPathBetweenNodes (node1, node2).
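
The two inference styles can be illustrated on a toy graph. The sketch below is not the ConceptNet API: the graph edges are invented, the function names only echo the operations listed on the slide, and the spreading activation is simplified so that each node keeps the activation it receives along its shortest path from the origin.

```python
from collections import deque

# Toy semantic network; the edges are invented for illustration only.
GRAPH = {
    "bike": ["ride", "wheel", "fall"],
    "fall": ["hurt", "bike"],
    "hurt": ["pain", "fall"],
    "ride": ["bike"],
    "wheel": ["bike", "car"],
    "car": ["wheel"],
    "pain": ["hurt"],
}


def get_context(graph, origin, decay=0.5, max_hops=2):
    """Simplified spreading activation from an origin node.

    Activation starts at 1.0 and is multiplied by `decay` (an assumed
    value) at every hop; nodes beyond `max_hops` are not reached.
    """
    activation = {origin: 1.0}
    frontier = [origin]
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for neighbour in graph.get(node, []):
                if neighbour not in activation:
                    activation[neighbour] = activation[node] * decay
                    next_frontier.append(neighbour)
        frontier = next_frontier
    del activation[origin]
    # Strongest activation first; ties broken alphabetically.
    return sorted(activation.items(), key=lambda kv: (-kv[1], kv[0]))


def find_path_between_nodes(graph, start, goal):
    """Breadth-first search for a shortest path, or None if unreachable."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in graph.get(path[-1], []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None
```

On this toy graph the context of "bike" ranks its direct neighbours above two-hop concepts such as "hurt", while path finding links "bike" to "pain" through "fall" and "hurt".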

32 ConceptNet (Liu and Singh, 2004a, 2004b): support. Topic sensing, query expansion, semantic similarity of words, lexical generalization, thematic generalization. Much needs to be examined: the vocabulary is uncontrolled and the content can be biased, but the knowledge seems quite reliable.

33 Topic-Sensing

34 Eurovoc: multilingual thesaurus. Controlled vocabulary, 20 languages. Broad fields: politics, international relations, European Communities, law, economics, trade, finance, social questions, education, science, international organizations, employment and working conditions, industry, business and competition, production, technology and research, transport, environment, energy, agriculture, forestry and fisheries, agri-foodstuffs, geography.

