SIGIR 2001 – WTS / DUC, 13 Sep 2001 (1/28)
Centrifuser Output
Min-Yen Kan, 2001
Centrifuser's output comes in three parts:
- Navigation
- Informative extract, based on similarities
- Indicative generated text, based on differences
Centrifuser can currently produce this output for documents of the same domain and genre.
Part 1: Informative Summaries
Informative Summaries
Informative = replaces the document with a shorter version
- Task: Provide the most important aspects of the document(s)
- Interaction type: Browsing
- Strategy: Since search results are similar, put together the similarities across documents
Algorithm (* = done offline)
1. *Convert each document to a Document Topic Tree
2. *Compute the Composite Topic Tree
3. Align the query and topics across trees
4. Extract sentences
5. Order them into a summary
1. Document Topic Tree
- Hierarchical view of the document
- Built from layout (Hu et al. 99) and lexical chains (Hearst 94, Choi 00)
- Done offline, per document
Example document topic tree:
High Blood Pressure (Level: 1, Style: Prose, Contents: 3 Headers, …)
  AHA Recommendation (Level: 2, Order: 1, Style: Prose, Contents: 1 Table, …)
  Related AHA publications (Level: 2, Order: 3, Style: Bulleted, Contents: …)
  See also in this guide (Level: 2, Order: 3, Style: Prose, Contents: 5 items, …)
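A document topic tree can be held in an ordinary tree structure. The sketch below is a hypothetical Python representation, not Centrifuser's actual data model: the `TopicNode` class and its field names are assumptions modeled on the attributes in the example (level, order, style, contents). It rebuilds the High Blood Pressure example and walks it depth-first, which visits topics in document order.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class TopicNode:
    """One topic (section) in a hypothetical document topic tree."""
    title: str
    level: int                    # depth in the hierarchy; the root is level 1
    style: str = "Prose"          # e.g. "Prose" or "Bulleted"
    order: Optional[int] = None   # position among siblings, if known
    contents: List[str] = field(default_factory=list)  # e.g. ["1 Table"]
    children: List["TopicNode"] = field(default_factory=list)

    def add(self, child: "TopicNode") -> "TopicNode":
        """Attach a subtopic and return it."""
        self.children.append(child)
        return child

    def walk(self) -> Iterator["TopicNode"]:
        """Yield this node and all descendants, depth-first (document order)."""
        yield self
        for child in self.children:
            yield from child.walk()


# The High Blood Pressure example from the slide.
root = TopicNode("High Blood Pressure", level=1, contents=["3 Headers"])
root.add(TopicNode("AHA Recommendation", level=2, order=1, contents=["1 Table"]))
root.add(TopicNode("Related AHA publications", level=2, order=3, style="Bulleted"))
root.add(TopicNode("See also in this guide", level=2, order=3, contents=["5 items"]))

if __name__ == "__main__":
    for node in root.walk():
        print("  " * (node.level - 1) + node.title)
```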
2. Composite Topic Tree
- The norm for a particular type of document
- Created by aligning topics in example trees by similarity
- Stores the order, frequency and variants of each topic
- Done offline, once for each domain and genre combination handled
[Figure: two document trees (doc tree 1, yellow; doc tree 2, blue) are joined level by level: first the level-1 nodes (e.g. disease), then the level-2 nodes (e.g. symptoms), then the level-3 nodes (e.g. nausea).]
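To make "stores order, frequency and variants" concrete, here is a minimal sketch under several assumptions: each example document is flattened to a list of headings (the slide's system aligns full trees level by level), and word-overlap (Jaccard) similarity with a 0.5 threshold stands in for the real similarity metric. `CompositeTopic`, `build_composite` and the example headings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity; a stand-in for the real similarity metric."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


@dataclass
class CompositeTopic:
    label: str                                          # first heading seen
    variants: Set[str] = field(default_factory=set)     # all aligned headings
    positions: List[int] = field(default_factory=list)  # position in each doc
    doc_ids: Set[int] = field(default_factory=set)      # docs containing it

    def frequency(self, n_docs: int) -> float:
        """Fraction of example documents that contain this topic."""
        return len(self.doc_ids) / n_docs

    def mean_order(self) -> float:
        """Average position of the topic in the documents that have it."""
        return sum(self.positions) / len(self.positions)


def best_match(heading: str,
               composite: List[CompositeTopic]) -> Tuple[Optional[CompositeTopic], float]:
    """Most similar existing topic (and its score), or (None, 0.0)."""
    best, best_score = None, 0.0
    for topic in composite:
        score = max(jaccard(heading, v) for v in topic.variants)
        if score > best_score:
            best, best_score = topic, score
    return best, best_score


def build_composite(docs: List[List[str]], threshold: float = 0.5) -> List[CompositeTopic]:
    """Align the (flattened) headings of several example documents."""
    composite: List[CompositeTopic] = []
    for doc_id, headings in enumerate(docs):
        for pos, heading in enumerate(headings):
            topic, score = best_match(heading, composite)
            if topic is None or score < threshold:
                topic = CompositeTopic(label=heading)  # new topic in the norm
                composite.append(topic)
            topic.variants.add(heading)
            topic.positions.append(pos)
            topic.doc_ids.add(doc_id)
    return composite


# Hypothetical example documents, reduced to their headings.
EXAMPLE_DOCS = [
    ["Symptoms", "Causes", "Treatment"],
    ["Common symptoms", "Treatment", "Diet"],
]
```

Here "Common symptoms" is aligned with "Symptoms" and recorded as a variant, so that topic gets frequency 1.0, while "Causes" and "Diet" each appear in half the examples.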
3. Topic Alignment
- Use a similarity metric to map the query to the composite and document trees
- The focus topic divides the tree into 3 regions: relevant, irrelevant and too detailed
- Done online, to find the scope of information needed in the summary
[Figure: the query "Hypertension" aligned to the composite tree and the document trees. In one document the root is the focus topic (e.g. About hypertension); in another a 2nd-level subtopic is the focus topic (e.g. in Guide to Cardiac Diseases). Legend: focus topic, relevant, irrelevant, too detailed.]
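A minimal sketch of online alignment, assuming a tree stored as a parent-to-children mapping and word overlap as the similarity metric. Where the slide does not say how "relevant" ends and "too detailed" begins, this sketch assumes the focus topic's direct children are relevant and deeper descendants are too detailed (`relevant_depth=1`); everything outside the focus subtree is irrelevant. The function names and example tree are hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity; a stand-in for the real similarity metric."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa and wb else 0.0


def all_nodes(tree: Dict[str, List[str]]) -> Set[str]:
    """Every node title in a parent -> children mapping."""
    return set(tree) | {child for kids in tree.values() for child in kids}


def find_focus(query: str, tree: Dict[str, List[str]]) -> str:
    """The node most similar to the query becomes the focus topic."""
    # sorted() makes tie-breaking deterministic (alphabetical).
    return max(sorted(all_nodes(tree)), key=lambda node: jaccard(query, node))


def classify_regions(focus: str, tree: Dict[str, List[str]],
                     relevant_depth: int = 1) -> Dict[str, str]:
    """Label nodes as focus, relevant, too detailed or irrelevant."""
    # Default: anything outside the focus subtree is irrelevant.
    labels = {node: "irrelevant" for node in all_nodes(tree)}
    labels[focus] = "focus"
    stack = [(child, 1) for child in tree.get(focus, [])]
    while stack:
        node, depth = stack.pop()
        labels[node] = "relevant" if depth <= relevant_depth else "too detailed"
        stack.extend((child, depth + 1) for child in tree.get(node, []))
    return labels


# Hypothetical tree in which hypertension is a 2nd-level subtopic.
EXAMPLE_TREE = {
    "Guide to Cardiac Diseases": ["Angina", "Hypertension"],
    "Hypertension": ["Symptoms", "Treatment"],
    "Treatment": ["Drugs", "Diet"],
}

if __name__ == "__main__":
    focus = find_focus("Hypertension", EXAMPLE_TREE)
    print(focus, classify_regions(focus, EXAMPLE_TREE))
```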
4. Sentence Extraction
- Aligned topics are chosen in descending order of typicality
- SimFinder is used to choose sentences
- Cover as many topics as possible to ensure breadth of summary
Composite topic tree (topic: frequency): Disease (focus topic): 1.0; Treatment: 0.9; Diagnosis: 0.8; Causes: 0.8; Symptoms: 0.8; Drugs: 0.7; For more information: 0.7; Diet: 0.6; Surgery: 0.3; Definition: 0.2; Nausea: 0.2. Each node is marked as aligned, focus topic, or unaligned (no instance in the documents).
Extracted sentences (ordered by typicality):
1.0 (hypertension) Since blood is carried … / "If a drug that blocks …
0.9 (treatment) How Can I Reduce High … / How Do I Manage My …
0.8 (causes) Blood pressure is …
0.7 (drugs) "Over-the-counter" …
0.7 (for more information) 2000 Heart and Stroke …
0.6 (diet) Everybody's looking for …
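The extraction policy (typical topics first, breadth before depth) can be sketched as a round-robin over aligned topics. SimFinder is a separate similarity tool that this sketch does not reproduce; a simple centrality score (mean word overlap with the other candidates for the same topic) is swapped in instead. The frequencies and truncated sentence fragments come from the slide; `extract`, `rank_sentences` and the budget are hypothetical.

```python
from typing import Dict, List, Tuple


def overlap(a: str, b: str) -> float:
    """Word-overlap similarity between two sentences."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa and wb else 0.0


def rank_sentences(sentences: List[str]) -> List[str]:
    """Most central candidate first; a crude stand-in for SimFinder."""
    def centrality(i: int) -> float:
        others = [s for j, s in enumerate(sentences) if j != i]
        if not others:
            return 0.0
        return sum(overlap(sentences[i], o) for o in others) / len(others)

    order = sorted(range(len(sentences)), key=centrality, reverse=True)
    return [sentences[i] for i in order]


def extract(freq: Dict[str, float], candidates: Dict[str, List[str]],
            budget: int) -> List[Tuple[str, str]]:
    """Pick up to `budget` (topic, sentence) pairs, breadth first.

    Aligned topics (those with candidate sentences) are visited in
    descending frequency; each gets one sentence before any gets two.
    """
    by_freq = sorted(freq, key=freq.__getitem__, reverse=True)  # stable on ties
    aligned = [t for t in by_freq if candidates.get(t)]
    ranked = {t: rank_sentences(candidates[t]) for t in aligned}
    chosen: List[Tuple[str, str]] = []
    rnd = 0
    while len(chosen) < budget and any(rnd < len(ranked[t]) for t in aligned):
        for t in aligned:
            if len(chosen) == budget:
                break
            if rnd < len(ranked[t]):
                chosen.append((t, ranked[t][rnd]))
        rnd += 1
    return chosen


FREQ = {"hypertension": 1.0, "treatment": 0.9, "diagnosis": 0.8, "causes": 0.8,
        "symptoms": 0.8, "drugs": 0.7, "for more information": 0.7, "diet": 0.6,
        "surgery": 0.3, "definition": 0.2, "nausea": 0.2}
# Unaligned topics (e.g. diagnosis, symptoms) simply have no candidates.
CANDIDATES = {
    "hypertension": ["Since blood is carried …", "\"If a drug that blocks …"],
    "treatment": ["How Can I Reduce High …", "How Do I Manage My …"],
    "causes": ["Blood pressure is …"],
    "drugs": ["\"Over-the-counter\" …"],
    "for more information": ["2000 Heart and Stroke …"],
    "diet": ["Everybody's looking for …"],
}
```

With a budget of 4, the sketch picks one sentence each from hypertension, treatment, causes and drugs rather than spending two slots on the top topic.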
5. Sentence Ordering
- Order the extracted sentences by their order in the composite tree (the norm)
- Follow the norm's order to get the best results
Input: the extracted sentences from step 4, ordered by typicality.
Reordered sentences (ordered by first appearance in the norm):
1. (hypertension) Since blood is carried … / "If a drug that blocks …
1.4 (causes) Blood pressure is …
1.5 (treatment) How Can I Reduce High … / How Do I Manage My …
1.5.1 (drugs) "Over-the-counter" …
1.5.2 (diet) Everybody's looking for …
1.6 (for more information) 2000 Heart and Stroke …
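Reordering by the norm amounts to sorting on each topic's outline position in the composite tree. This sketch assumes positions are stored as dotted outline numbers like those on the slide (1, 1.4, 1.5, 1.5.1, …); `norm_key` and `reorder` are hypothetical names.

```python
from typing import Dict, List, Tuple


def norm_key(position: str) -> Tuple[int, ...]:
    """Turn an outline position such as '1.5.2' into (1, 5, 2).

    Tuples compare element by element, so (1, 5) < (1, 5, 1) < (1, 6),
    which is outline (first-appearance) order.
    """
    return tuple(int(part) for part in position.split("."))


def reorder(extracted: List[Tuple[str, str]],
            position: Dict[str, str]) -> List[Tuple[str, str]]:
    """Sort (topic, sentence) pairs by the topic's position in the norm.

    sorted() is stable, so sentences of one topic keep their relative order.
    """
    return sorted(extracted, key=lambda pair: norm_key(position[pair[0]]))


# Positions in the composite tree, as numbered on the slide.
POSITION = {"hypertension": "1", "causes": "1.4", "treatment": "1.5",
            "drugs": "1.5.1", "diet": "1.5.2", "for more information": "1.6"}

# Extracted pairs in typicality order (truncated fragments from the slide).
EXTRACTED = [
    ("hypertension", "Since blood is carried …"),
    ("hypertension", "\"If a drug that blocks …"),
    ("treatment", "How Can I Reduce High …"),
    ("treatment", "How Do I Manage My …"),
    ("causes", "Blood pressure is …"),
    ("drugs", "\"Over-the-counter\" …"),
    ("for more information", "2000 Heart and Stroke …"),
    ("diet", "Everybody's looking for …"),
]

if __name__ == "__main__":
    for topic, sentence in reorder(EXTRACTED, POSITION):
        print(POSITION[topic], topic, sentence)
```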
Part 2: Indicative Summaries
Indicative Summaries
Indicative = helps decide whether a document is worth retrieving
- Task: Show salient differences from other candidates
- Interaction type: Searching
- Strategy: Identify the content and non-content aspects in which each source differs
What goes into an Indicative Summary?
- Examine existing indicative summaries: the library card catalog
- Examine multidocument scenarios
Corpus Parameters
- 82 summaries from CU's online catalog
- Healthcare domain
- Catalogued the types of information present: document-derived features and metadata features
Example catalog summary: "Practical Interventional Cardiology represents a practical reference for the interventional cardiologist and those in training, as well as the non-invasive cardiologist and physician. […] Rather than providing detailed and exhaustive reviews, the purpose of this book is to present practical information regarding cardiac interventional procedures. […]"
Corpus Analysis Results (frequency of each feature in the catalog summaries)
Document-derived features:
- Topicality: 100%
- Content types: 37%
- Readability: 18%
- Internal structure: 17%
- Special content: 7%
Metadata features:
- Title: 31%
- Revised/Edition: 28%
- Author/Editor: 21%
- Purpose: 18%
- Audience: 17%
- …
(Annotated example: the Practical Interventional Cardiology summary from the previous slide.)
Analysis - Multidocument
- Prescriptive guidelines from the Open Directory Project (a website hierarchy)
- Differences are important!
  1. Differences between documents
  2. Differences from the norm
  3. Those relevant to the query (Grice '75)
- Make clear what makes a site different from the rest
Corpus Analysis Discussion
- Topicality (i.e. content) is most important
- Other features also play a strong role
For Centrifuser:
- Design the summary around topics
- When space allows, add other features as needed
  - when a feature differs from the norm
  - future work: mimic the percentages found in the study
- Differences drive the text
  - the query and the norm should affect the summary content
Algorithm
1. *Make the Composite and Document Topic Trees
2. Align the query and topics across trees
3. Use region ratios to compute document categories
4. Decide which messages to realize
5. Order the messages
6. Generate the text
2. (recap) Align Query and Topics
- Map the query to a topic
- The query node divides the other nodes into relevant, irrelevant and intricate regions
[Figure: for the query "Angina" the root is the focus topic; for the query "Treatments of Angina" a 2nd-level subtopic is the focus topic. Legend: focus topic, relevant, irrelevant, intricate.]
This step accounts for the effect of the query on the generated text.
Classifying Topics – By Norm
- Relevant nodes are divided into typical and rare
[Figure: a composite topic tree and a document topic tree. Legend: focus topic, typical node (freq >= 0.5), rare node (freq < 0.5), unaligned topic.]
This step accounts for the effect of the norm on the generated text.
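The typical/rare split is a threshold on composite-tree frequency. A minimal sketch, assuming relevant topics are already known from the alignment step and that a topic missing from the composite tree counts as unaligned; `classify_by_norm` is a hypothetical name, and the frequencies are the ones from the sentence-extraction slide.

```python
from typing import Dict, List

TYPICAL_THRESHOLD = 0.5  # typical if freq >= 0.5, rare if freq < 0.5 (slide legend)


def classify_by_norm(relevant: List[str], freq: Dict[str, float],
                     threshold: float = TYPICAL_THRESHOLD) -> Dict[str, str]:
    """Split relevant topics into typical / rare by composite-tree frequency.

    Topics with no counterpart in the composite tree are labelled 'unaligned'.
    """
    labels: Dict[str, str] = {}
    for topic in relevant:
        if topic not in freq:
            labels[topic] = "unaligned"
        elif freq[topic] >= threshold:
            labels[topic] = "typical"
        else:
            labels[topic] = "rare"
    return labels


# Frequencies from the sentence-extraction example.
FREQ = {"treatment": 0.9, "causes": 0.8, "drugs": 0.7, "diet": 0.6,
        "surgery": 0.3, "definition": 0.2, "nausea": 0.2}
```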
3. Categorizing Documents
- The ratio of typical, rare, intricate and irrelevant topics determines a document's category
- 7 categories altogether
Examples:
- 3 typical, 2 rare, 2 intricate and 8 irrelevant: Irrelevant Document (50+% irrelevant)
- 5 typical, 2 rare, 2 intricate: Specialized Document (50+% typical, but fewer than 50% of all possible typical topics)
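Only two of the seven category rules appear on the slide, so the sketch below encodes just those two and collapses everything else into a placeholder "other" category. Reading "50+%" as "at least 50%" is an assumption, and `typical_in_norm` (how many typical topics the composite tree has in total) is a hypothetical parameter; the value 12 used in the check is invented for illustration.

```python
from collections import Counter
from typing import Iterable


def categorize(labels: Iterable[str], typical_in_norm: int) -> str:
    """Assign a document category from the ratio of its topic labels.

    Only the two rules shown on the slide are encoded; the other five
    categories are collapsed into 'other'.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return "other"
    # Irrelevant Document: at least half of the topics are irrelevant.
    if counts["irrelevant"] / total >= 0.5:
        return "irrelevant"
    # Specialized Document: mostly typical topics, but fewer than half
    # of all the typical topics the norm contains.
    if counts["typical"] / total >= 0.5 and counts["typical"] < 0.5 * typical_in_norm:
        return "specialized"
    return "other"


# The two examples from the slide.
DOC_A = ["typical"] * 3 + ["rare"] * 2 + ["intricate"] * 2 + ["irrelevant"] * 8
DOC_B = ["typical"] * 5 + ["rare"] * 2 + ["intricate"] * 2
```

DOC_A has 8 of 15 topics irrelevant (about 53%), so it is an irrelevant document; DOC_B has 5 of 9 topics typical (about 56%), and is specialized as long as the norm holds more than 10 typical topics.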
4. Forming Messages
Messages and the text that they eventually realize:
- Relation: category-description; Args: [docCat: atypical]. Realizes the document category description.
- Relation: category-elements; Args: [docCat: atypical, element: AMA Guide, element: CU Guide]. Realizes the documents belonging to the category.
- Relation: has-topics; Args: [docCat: atypical, topic: definition, topic: risks]. Realizes the topics in the category.
Generated text: "More information on additional topics which are not included in the summary are available in these files (The American Medical Association family medical guide and The Columbia University College of Physicians and Surgeon complete home medical guide).. The topics include “definition” and “what are …"
Other messages may include:
- Number of categories in summary
- Other optional information (e.g. content type)
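The messages on this slide are relations with role-value arguments, which map naturally onto small immutable records. This is a hypothetical encoding, not Centrifuser's internal format: `Message`, `form_messages` and the tuple-of-pairs argument layout are assumptions, while the relation names and argument values are taken from the slide.

```python
from dataclasses import dataclass
from typing import List, Tuple

Arg = Tuple[str, str]  # (role, value), e.g. ("docCat", "atypical")


@dataclass(frozen=True)
class Message:
    relation: str
    args: Tuple[Arg, ...]


def form_messages(doc_cat: str, documents: List[str],
                  topics: List[str]) -> List[Message]:
    """Build the three core messages for one document category."""
    cat: Tuple[Arg, ...] = (("docCat", doc_cat),)
    return [
        Message("category-description", cat),
        Message("category-elements", cat + tuple(("element", d) for d in documents)),
        Message("has-topics", cat + tuple(("topic", t) for t in topics)),
    ]


MESSAGES = form_messages("atypical", ["AMA Guide", "CU Guide"], ["definition", "risks"])
```

Keeping arguments as ordered pairs preserves repeated roles (two `element` arguments) in the order the realizer should mention them.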
5. Ordering Messages
- Inter-category: by importance of the dominant topic type
- Intra-category: the document category and its elements come before optional information
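Both ordering rules can be expressed as stable sorts. The slide does not give the importance ranking of topic types, so `TYPE_RANK` (typical before rare before intricate before irrelevant) is an assumption. Within a category, the sketch puts the category description and its elements first and every other message after them, such as `has-topics` or a hypothetical `content-type` message, keeping their original relative order.

```python
from typing import List, Tuple

# Assumed importance of a category's dominant topic type (lower = earlier).
TYPE_RANK = {"typical": 0, "rare": 1, "intricate": 2, "irrelevant": 3}
# Core messages come first, in this order; all others follow.
CORE_RANK = {"category-description": 0, "category-elements": 1}


def order_messages(categories: List[Tuple[str, List[str]]]) -> List[Tuple[str, str]]:
    """Order messages, given (dominant topic type, [relations]) per category."""
    ordered: List[Tuple[str, str]] = []
    # Inter-category: by importance of the dominant topic type.
    for dominant, relations in sorted(
            categories, key=lambda c: TYPE_RANK.get(c[0], len(TYPE_RANK))):
        # Intra-category: core messages first; stable sort keeps the rest in order.
        for relation in sorted(relations, key=lambda r: CORE_RANK.get(r, len(CORE_RANK))):
            ordered.append((dominant, relation))
    return ordered


EXAMPLE = [
    ("rare", ["content-type", "category-elements", "category-description"]),
    ("typical", ["has-topics", "category-description", "category-elements"]),
]
```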
6. Text Generation
- A small grammar realizes the messages
- Referring expression issues:
  - size of referring expressions
  - re-ordering documents in the set
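A "small grammar" can be as simple as a few templates, one per message type, plus helpers for referring expressions. This is a minimal template-based sketch, not the system's actual grammar: `DESCRIPTIONS`, `refer_to_documents` and `realize` are hypothetical, the description phrase is borrowed from the slide's generated text, and documents are referred to by the short names in the message arguments. The referring expression changes form with the number of documents ("this file" vs. "these files").

```python
from typing import List

# Hypothetical description phrase per document category (wording from the slide).
DESCRIPTIONS = {
    "atypical": "More information on additional topics which are not included in the summary",
}


def conjoin(items: List[str]) -> str:
    """Join items as 'A', 'A and B' or 'A, B and C'."""
    if len(items) <= 1:
        return "".join(items)
    return ", ".join(items[:-1]) + " and " + items[-1]


def refer_to_documents(documents: List[str]) -> str:
    """Referring expression whose form depends on how many documents there are."""
    head = "this file" if len(documents) == 1 else "these files"
    return f"{head} ({conjoin(documents)})"


def realize(doc_cat: str, documents: List[str], topics: List[str]) -> str:
    """Template: DESCRIPTION 'is available in' DOCUMENTS '.' TOPICS '.'"""
    quoted = [f'"{t}"' for t in topics]
    return (f"{DESCRIPTIONS[doc_cat]} is available in {refer_to_documents(documents)}. "
            f"The topics include {conjoin(quoted)}.")


if __name__ == "__main__":
    print(realize("atypical", ["AMA Guide", "CU Guide"], ["definition", "risks"]))
```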
Task-Based Evaluation
Scenario: "You've been diagnosed with cancer …"
- Compare against 3 real-world systems:
  - IR engine (Google)
  - Hub (Yahoo)
  - Human expert (About.com)
Goals:
- Evaluate on subjective criteria, using think-aloud techniques
- See which document features best fit user needs
Status: pilot study complete; full study now under way.
Conclusion
- An application of summarization for IR
- Performs informative and indicative summarization
- Uses extraction and text generation techniques
- Supports browsing and searching
http://centrifuser.cs.columbia.edu