Generative and Discriminative Models in Text Classification

David D. Lewis
Independent Consultant, Chicago, IL, USA

Workshop on Challenges in Information Retrieval and Language Modeling, UMass CIIR, Amherst, MA, 11 Sept 2002
Text Classification

Given a document, decide which of several classes it belongs to:
– TREC filtering
– TDT tracking task
– Text categorization! Automated indexing, content filtering, alerting, ...
  More LM papers here than on any other IR problem
– Others: parts of IE, author identification, ...
Lang. Models are Generative

Model predicts the probability that document d will be generated by a source c
e.g. unigram language model:

    P(d|c) = ∏_w P(w|c)^tf(w,d)

Parameters, i.e. the P(w|c)'s, are fit to optimally predict the generation of d
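(A minimal sketch, mine rather than the slides': fitting and scoring a unigram source model in Python. The add-alpha smoothing constant and all function names are illustrative assumptions.)

    import math
    from collections import Counter

    def fit_unigram(docs, vocab, alpha=1.0):
        # Estimate P(w|c) from one class's training documents,
        # with add-alpha smoothing over a fixed vocabulary (an assumed choice).
        counts = Counter()
        for doc in docs:                      # each doc is a list of tokens
            counts.update(doc)
        total = sum(counts[w] for w in vocab) + alpha * len(vocab)
        return {w: (counts[w] + alpha) / total for w in vocab}

    def log_prob(doc, model):
        # log P(d|c) = sum over tokens of log P(w|c) under the unigram model;
        # summing per token is the same as weighting each word by tf(w,d).
        return sum(math.log(model[w]) for w in doc if w in model)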
Classify Text w/ Gen. Model

One source model for each class c
Choose the class c with the largest value of:

    P(c) P(d|c)

For 2 classes and unigram P(d|c), we have:

    log [P(c1|d)/P(c0|d)] = log [P(c1)/P(c0)] + Σ_w tf(w,d) log [P(w|c1)/P(w|c0)]

aka Naive Bayes (NB), Robertson/KSJ
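(A hedged sketch of this decision rule, reusing fit_unigram and log_prob from the previous sketch; the priors, class names, and toy data are made up for illustration.)

    def nb_classify(doc, class_models, priors):
        # Generative classification: argmax_c [ log P(c) + log P(d|c) ].
        return max(class_models,
                   key=lambda c: math.log(priors[c]) + log_prob(doc, class_models[c]))

    vocab = {"rain", "sun", "ball"}
    models = {"weather": fit_unigram([["rain", "sun"]], vocab),
              "sports":  fit_unigram([["ball", "ball"]], vocab)}
    print(nb_classify(["rain", "rain"], models, {"weather": 0.5, "sports": 0.5}))
    # -> "weather"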
The Discriminative Alternative

Directly model the probability of the class conditional on the words: P(c|w)
Logistic regression:

    P(c1|d) = 1 / (1 + exp(-(β0 + Σ_w βw tf(w,d))))

Tune parameters to optimize the conditional likelihood (class probability predictions)
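(A sketch, under my own assumptions about representation, of tuning LR by gradient ascent on the conditional log-likelihood; the learning rate and epoch count are arbitrary.)

    def train_lr(X, y, rate=0.1, epochs=200):
        # Gradient ascent on sum_i log P(y_i|d_i);
        # X is a list of {word: tf} dicts, y a list of 0/1 labels.
        beta0, beta = 0.0, {}
        for _ in range(epochs):
            for x, label in zip(X, y):
                z = beta0 + sum(beta.get(w, 0.0) * tf for w, tf in x.items())
                p = 1.0 / (1.0 + math.exp(-z))    # predicted P(c=1|d)
                err = label - p                   # gradient term for this example
                beta0 += rate * err
                for w, tf in x.items():
                    beta[w] = beta.get(w, 0.0) + rate * err * tf
        return beta0, beta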
LR & NB: Same Parameters!
Observations LR & NB have same parameterization for 2- or k-class, binary or raw TF weighting LR outperforms NB in text categorization and batch filtering studies –NB optimizes parameters to predict words, LR optimizes to predict class
False Hopes for LM?

Leveraging unlabeled data (e.g. EM)?
– Initial results show only small impact (same story as in syntactic class tagging)
Non-unigram models?
– More accurately predict the wrong thing?
Cross-lingual TC?
– Any more than MT followed by TC?
True LM Hopes 1: Small Data?

Number of training examples needed to reach maximum effectiveness (Ng & Jordan '01):
– NB: O(log # features)
– LR: O(# features)
LR and NB not yet compared (?) in the low-data case (TREC adaptive, TDT tracking)
– Priors/smoothing likely to prove critical
True LM Hopes 2: Facets?

MeSH category assignments:
– Anti-Inflammatory Agents, Non-Steroidal/*therapeutic use
– Tumor Necrosis Factor/antagonists & inhibitors/immunology
Most combinations have zero training data
Berger & Lafferty MT approach?
Non-LM TC Challenges?

Integration of prior knowledge
Choosing documents to label (TREC adaptive, active learning, sampling)
Combining text and non-text predictors
Knowing how well a classifier will/can do
Evolving category systems, switching vocabularies