Third International Conference on Language Resources and Evaluation (LREC 2002), Las Palmas, Canary Islands, Spain, 29-31 May 2002

Using the Annotated Bibliography as a Resource for Indicative Summarization

Min-Yen Kan*, Judith L. Klavans** and Kathleen R. McKeown*
{min, judith, kathy}@cs.columbia.edu
Columbia University: * Department of Computer Science, ** Center for Research on Information Access

Annotated Bibliography Entries

Annotated bibliography entries (A.B.E.'s) are indicative summaries. They are:
- longer than both card catalog summaries and snippets
- organized around a theme, making them an ideal standard for "query-based" summaries
- marked by explicit comparisons of one resource against another
- prefaced by overviews of the documents in the bibliography
- rich in meta-information features.
We study them as models for summaries by examining prescriptive guidelines and performing a corpus study.

Selected Summary Dimensions

We position A.B.E.'s relative to news summaries, scientific summaries, snippets, and card catalog entries (each described below) along four dimensions:
1. Extract versus Abstract
2. Informative versus Indicative
3. Generic versus Query-biased
4. Single document versus Multiple document

Corpus Collection & Encoding

Our language resource of annotated bibliography entries was designed to ease the collection of the corpus and to make many features available for subsequent analysis for summarization and related natural language applications. Presently:
- 1200 documents containing "annotated bibliography" were spidered;
- of those, 64 documents were hand-parsed, yielding 2000 entries;
- of those 2000 entries, 100 were further annotated with semantic tags.

Each entry is encoded as a set of fields, most of them optional:
- the text before the body of the entry
- the subject or theme of the source document (a coarser granularity than the title)
- the location of the source document
- the position of the entry on the page
- text that is distinctly marked off as coming after the entry
- the division that the page represents in the set of related pages
- the internal division in the page that the entry belongs to
- the entry text marked up with the 24 semantic tags
- the Collins '96 parse of the entry

Example entry (excerpt):
  Maxwell, S. E., Delaney, H. D., & O'Callaghan, M. F. (1993). Analysis of...
  This paper gives a brief history of ANCOVA, and then discusses ANCOVA in... contains no matrix algebra.

Collins '96 parse of the entry (excerpt):
  PROB 14659 -112.252 0 TOP -112.252 S -105.049 NP-A -8.12201 NPB -7.82967 DT 0 This NN 0 paper...

Corpus Availability

The corpus is available for academic and not-for-profit research by request to the authors. An annotation guide, explaining the annotation tagging guidelines in more detail, is also available. Command-line and web CGI utilities are provided to modify, insert and extract attributes from the corpus.
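To illustrate the kind of attribute extraction those utilities support, here is a minimal sketch in Python. It assumes a hypothetical XML-like encoding: the element names (entry, preface, subject, body) and the semantic tags inside the body are illustrative stand-ins, not the corpus's actual field and tag inventory, which is documented in the annotation guide.

```python
# Minimal sketch of attribute extraction from one hypothetically encoded entry.
# The element names and semantic tags below are illustrative stand-ins; the
# corpus's real field and tag names are described in its annotation guide.
import xml.etree.ElementTree as ET

SAMPLE_ENTRY = """
<entry position="3">
  <preface>Introductory statistics texts:</preface>
  <subject>Analysis of covariance</subject>
  <body>
    <citation>Maxwell, S. E., Delaney, H. D., &amp; O'Callaghan, M. F. (1993).</citation>
    <overview>This paper gives a brief history of ANCOVA.</overview>
    <detail>It contains no matrix algebra.</detail>
  </body>
</entry>
"""

def extract_attributes(entry_xml: str) -> dict:
    """Pull out a few optional fields plus the semantic tags used in the body."""
    root = ET.fromstring(entry_xml)
    return {
        "position": root.get("position"),                     # position of the entry on the page
        "preface": (root.findtext("preface") or "").strip(),  # text before the body of the entry
        "subject": (root.findtext("subject") or "").strip(),  # subject or theme of the source
        "semantic_tags": [child.tag for child in root.find("body")],  # tags marking up the entry text
    }

if __name__ == "__main__":
    print(extract_attributes(SAMPLE_ENTRY))
```

A field-oriented view of this kind is what makes the entries convenient for downstream summarization experiments: optional metadata attributes can be read or modified without re-parsing the entry text.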
Card Catalog Entries consist of structured fields, of which a summary is an optional field. Other types of information (such as notes, book jacket texts, or book reviews) are often substituted for summaries.

Snippets are short indicative descriptions given by the authors of web pages, and are often very short (e.g., on Yahoo! or ODP category pages). Amitay (2000) shows strategies for locating and extracting snippets and for ranking candidates by their fitness as a summary.

Scientific Abstracts have been used as target summaries in a number of studies (e.g., Kupiec et al. 1995). Abstracts tend to summarize the document's topics well but make little use of metadata.

News Summaries: DUC provides a large corpus of informative summaries. Jing and McKeown (1999) use relations between source documents and target summaries for "cut and paste" summarization.

Comparison along the selected dimensions:

Summary type                      Extract vs.      Indicative vs.   Generic vs.    Single vs.      Metadata?   Corpus vs.
                                  Abstract         Informative      Query-based    Multidocument               Algorithm
Scientific Abstracts              Abstract         Informative      Generic        Single          No          Corpus
Snippets                          Abstract         Indicative       Both           Single          Yes         Algorithm
Card Catalog Entries              Abstract         Indicative       Generic        Single          Yes         Corpus
Ziff-Davis                        Mostly Extract   Informative      Generic        Single          No          Corpus
DUC                               Both             Informative      Generic        Both            Yes         Corpus
Annotated Bibliography Entries    Abstract         Both             Both           Mostly Single   Yes         Corpus

Prescriptive Guidelines and Corpus Study

We catalogued the information recommended by 5 prescriptive guidelines for A.B.E.'s (Ree70, EBC98, Les01, AACC98, Wil02) and performed a corpus study of the 100 semantically annotated entries. For each feature, the table gives how many of the guidelines recommend it, the number of tags in the corpus, and the percentage of entries carrying the tag.

Feature                            Guidelines (of 5)   # tags in corpus   % entries with tag

Metadata and other features
Media Type                                                     55                 48%
Author/Editor                                                  43                 27%
Content Types/Special Feature              2                   41                 29%
Subjective Assess./Coverage                4                   36                 24%
Authority/Authoritativeness                3                   26                 20%
Background/Source                                              21                 16%
Navigation/Internal Structure              1                   16                 11%
Collection Size                                                13                 10%
Purpose                                    3                   13                 10%
Audience                                   4                   12                 12%
Contributor                                                    12                 12%
Cross-resource comparison                  1                   10                  9%
Size/Length                                                     9                  7%
Style                                                           8                  6%
Query Relevance                            2                    4                  3%
Readability                                                     4                  3%
Difficulty                                                      4                  4%
Edition/Publication Information                                 3                  3%
Language                                                        2                  2%
Copyright                                                       2                  1%
Award/Quality/Defects                      3                    2                  1%

Topicality features
Detail                                                        139                 47%
Overview                                                       72                 64%
Topic                                                          34                 28%
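As a concrete reading of the corpus-study columns, here is a minimal sketch of how "# tags in corpus" and "% entries with tag" could be tallied once each entry is reduced to the semantic tags it contains. The list-of-tags representation and the tag names are hypothetical stand-ins for the corpus's actual encoding.

```python
# Minimal sketch: tallying semantic-tag statistics over annotated entries.
# Each entry is reduced to the list of semantic tags it contains; the tag
# names are hypothetical stand-ins, not the corpus's actual 24-tag inventory.
from collections import Counter

annotated_entries = [
    ["overview", "detail", "detail", "audience"],  # entry 1
    ["topic", "detail"],                           # entry 2
    ["overview", "media_type"],                    # entry 3
]

def tag_statistics(entries):
    """Return (# occurrences of each tag, % of entries containing each tag)."""
    tag_counts = Counter(tag for entry in entries for tag in entry)
    entries_with_tag = Counter(tag for entry in entries for tag in set(entry))
    percent = {tag: 100.0 * n / len(entries) for tag, n in entries_with_tag.items()}
    return tag_counts, percent

if __name__ == "__main__":
    counts, pct = tag_statistics(annotated_entries)
    for tag in sorted(counts):
        print(f"{tag:12s} # tags: {counts[tag]:3d}   % entries: {pct[tag]:5.1f}%")
```

Run over the 100 semantically annotated entries, a tally of this kind produces figures like those in the table above, e.g. Detail occurring 139 times and appearing in 47% of the entries.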