The use of unlabeled data to improve supervised learning for text summarization MR Amini, P Gallinari (SIGIR 2002) Slides prepared by Jon Elsas for the Semi-supervised NL Learning Reading Group

Presentation Outline
Overview of Document Summarization
Major contribution: Semi-Supervised Logistic Classification Maximum Likelihood summaries
Evaluation
–Baseline Systems
–Results

Document Summarization
Motivation: [text volume] >> [user's time]
Single Document Summarization:
–Used for display of search results, automatic 'abstracting', browsing, etc.
Multi-Document Summarization:
–Describe clusters & document collections, QA, etc.
Problem: What is the summary used for? Does a generic summary exist?

Single Document Summarization example

Document Summarization
Generative Summaries:
–Synthetic text produced after analysis of high-level linguistic features: discourse, semantics, etc.
–Hard.
Extract Summaries:
–Text excerpts (usually sentences) composed together to create a summary
–Boils down to a passage classification/ranking problem

Major Contribution
Semi-supervised Logistic Classifying Expectation Maximization (CEM) for passage classification
Advantage over other methods:
–Works on small set of labeled data + large set of unlabeled data
–No modeling assumptions for density estimation
Cons:
–(probably) slow; no performance numbers given

Expectation Maximization (EM)
Finds maximum likelihood estimates of parameters when the underlying distribution depends on unobserved latent variables.
Maximizes model fit to the data distribution
Criterion function:
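The criterion function on the original slide is an image and did not survive the transcript; a plausible reconstruction, assuming a K-component mixture with mixing proportions pi_k and component densities f(.|theta_k) over sentence feature vectors x_i, is the usual incomplete-data log-likelihood:

L(\Theta) = \sum_{i=1}^{n} \log \sum_{k=1}^{K} \pi_k \, f(x_i \mid \theta_k)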

Classifying EM (CEM)
Like EM, with the addition of an indicator variable for component membership.
Maximizes 'quality' of clustering
Criterion function:
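The formula is likewise missing from the transcript; a hedged reconstruction, introducing indicator variables t_ik in {0, 1} that assign each x_i to exactly one component, is the classification maximum-likelihood (CML) criterion:

L_C(t, \Theta) = \sum_{i=1}^{n} \sum_{k=1}^{K} t_{ik} \, \log \bigl( \pi_k \, f(x_i \mid \theta_k) \bigr)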

Semi-supervised generative-CEM
Fix component membership for labeled data.
Criterion function (one term over the labeled data, one over the unlabeled data):
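A sketch of the missing criterion, keeping the notation above and writing D_l for the labeled sentences (indicators t_ik fixed by the labels) and D_u for the unlabeled ones (indicators re-estimated at each C-step):

L_C(\tilde{t}, \Theta) = \sum_{x_i \in D_l} \sum_{k=1}^{K} t_{ik} \log \bigl( \pi_k f(x_i \mid \theta_k) \bigr)
                       + \sum_{x_i \in D_u} \sum_{k=1}^{K} \tilde{t}_{ik} \log \bigl( \pi_k f(x_i \mid \theta_k) \bigr)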

Semi-supervised logistic-CEM
Use a discriminative classifier (logistic) instead of a generative one.
In the M-step, gradient descent must be re-run to estimate the β's.
The criterion again combines a labeled-data term and an unlabeled-data term.
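The paper gives update equations rather than code; the following is only a minimal Python sketch of what such a logistic-CEM loop could look like. LogisticRegression from scikit-learn, the iteration cap, and the hard-label convergence test are assumptions for illustration, not the authors' implementation (which fits the β's by gradient descent in each M-step).

import numpy as np
from sklearn.linear_model import LogisticRegression

def logistic_cem(X_lab, y_lab, X_unlab, n_iter=20):
    # X_lab, y_lab: small labeled set (y in {0, 1}: non-summary / summary sentence)
    # X_unlab:      large unlabeled set whose memberships are re-estimated each round
    clf = LogisticRegression().fit(X_lab, y_lab)     # initialize from labeled data only
    prev = None
    for _ in range(n_iter):
        # E/C-step: score the unlabeled sentences and harden to tentative class labels
        y_unlab = clf.predict(X_unlab)
        if prev is not None and np.array_equal(y_unlab, prev):
            break                                    # assignments stopped changing
        prev = y_unlab
        # M-step: refit the logistic classifier on labeled + pseudo-labeled data;
        # labeled memberships stay fixed, only the unlabeled ones move
        X_all = np.vstack([X_lab, X_unlab])
        y_all = np.concatenate([y_lab, y_unlab])
        clf = LogisticRegression().fit(X_all, y_all)
    return clf

Sentences of a new document would then be scored with clf.predict_proba and the top-scoring ones extracted as the summary.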

Evaluation
Algorithm evaluated against 3 other single-document summarization algorithms:
–Non-trainable System: passage ranking
–Trainable System: Naïve Bayes sentence classifier
–Generative-CEM (using full Gaussians)
Precision/Recall with regard to gold-standard extract summaries
The fine print:
–All systems used *similar* representation schemes, but not the same…

Baseline System: Sentence Ranking
Rank sentences using a TF-IDF similarity measure with query expansion (Sim2)
–Blind relevance feedback from the top sentences
–WordNet similarity thesaurus
Generic query created with the most frequent words in the training set.
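As a rough illustration of the non-trainable baseline's idea only (this is not the paper's Sim2 measure, which also folds in blind relevance feedback and a WordNet thesaurus), a plain TF-IDF sentence-ranking sketch against a generic query could look like:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_sentences(sentences, generic_query, top_k=5):
    # Rank sentences by TF-IDF cosine similarity to a generic query built from
    # the most frequent training-set words (illustrative approximation only).
    vectorizer = TfidfVectorizer()
    S = vectorizer.fit_transform(sentences)      # one TF-IDF row per sentence
    q = vectorizer.transform([generic_query])
    scores = cosine_similarity(S, q).ravel()
    order = scores.argsort()[::-1][:top_k]
    return [(sentences[i], float(scores[i])) for i in order]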

Naïve Bayes Model: Sentence Classification
Simple Naïve Bayes classifier trained on 5 features:
1. Sentence length < t_length {0, 1}
2. Sentence contains 'cue words' {0, 1}
3. Sentence query similarity (Sim2) > t_sim {0, 1}
4. Upper-case/Acronym features (count?)
5. Sentence/paragraph position in text {1, 2, 3}
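For concreteness, a toy sketch of how such a classifier could be trained with scikit-learn's BernoulliNB; the feature matrix, labels, and the binarization of the position feature are invented for illustration, not taken from the paper.

import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Toy binary rows: [short sentence?, has cue word?, query-similar?, has acronym?, early position?]
X_bin = np.array([[1, 1, 1, 0, 1],
                  [0, 0, 0, 0, 0],
                  [1, 0, 1, 1, 1],
                  [0, 1, 0, 0, 0]])
y = np.array([1, 0, 1, 0])                 # 1 = sentence belongs to the gold extract summary

nb = BernoulliNB().fit(X_bin, y)
print(nb.predict([[1, 1, 0, 0, 1]]))       # classify a new sentence's feature vector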

Logistic-CEM: Sentence Representation Features
Features used to train Logistic-CEM:
1. Normalized sentence length [0, 1]
2. Normalized 'cue word' frequency [0, 1]
3. Sentence query similarity (Sim2) [0, ∞)
4. Normalized acronym frequency [0, 1]
5. Sentence/paragraph position in text {1, 2, 3}
(All of the binary features converted to continuous.)
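A hypothetical helper for building this representation; the cue-word list, the acronym pattern, the length normalizer, and the clipping are all invented for illustration, not the paper's definitions.

import re

CUE_WORDS = ("in conclusion", "in summary", "therefore", "significantly")

def sentence_features(sentence, position, sim2_score, max_len=40.0):
    # Returns [length, cue-word freq, Sim2 score, acronym freq, position] as on the slide.
    tokens = sentence.split()
    n = max(len(tokens), 1)
    length = min(len(tokens) / max_len, 1.0)                                      # 1. normalized sentence length
    cue_freq = min(sum(sentence.lower().count(c) for c in CUE_WORDS) / n, 1.0)    # 2. normalized cue-word frequency
    acro_freq = min(len(re.findall(r"\b[A-Z]{2,}\b", sentence)) / n, 1.0)         # 4. normalized acronym frequency
    return [length, cue_freq, sim2_score, acro_freq, position]                    # 3. Sim2; 5. position in {1, 2, 3}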

Results on Reuters dataset