Information Retrieval Lecture 4 Introduction to Information Retrieval (Manning et al. 2007) Chapter 13 For the MSc Computer Science Programme Dell Zhang Birkbeck, University of London

Is this spam?

Text Classification/Categorization Given: a document d ∈ D, and a set of classes C = {c_1, c_2, …, c_n}. Determine: the class of d, c(d) ∈ C, where c(d) is a classification function (a “classifier”).

Classification Methods (1) Manual Classification. For example, Yahoo! Directory, DMOZ, Medline, etc. Very accurate when the job is done by experts, but difficult to scale up.

Classification Methods (2) Hand-Coded Rules. For example, CIA, Reuters, SpamAssassin, etc. Accuracy is often quite high if the rules have been carefully refined over time by experts, but the rules are expensive to build and maintain.

Classification Methods (3) Machine Learning (ML). For example, automatic email classification (PopFile) and automatic webpage classification (MindSet). There is no free lunch: hand-classified training data are required. But the training data can be built up (and refined) easily by amateurs.

Text Classification via ML [Diagram: labelled training documents (L) are fed to a learning step that produces a classifier; the classifier then predicts the classes of unlabelled test documents (U).]

Text Classification via ML - Example
Classes: ML, Planning (AI); Semantics, Garb.Coll. (Programming); Multimedia, GUI (HCI); ...
Training data (sample terms per class):
ML: learning, intelligence, algorithm, reinforcement, network, ...
Planning: planning, temporal, reasoning, plan, language, ...
Semantics: programming, semantics, language, proof, ...
Garb.Coll.: garbage, collection, memory, optimization, region, ...
Test data: “planning language proof intelligence” — which class?

Evaluating Classification Classification accuracy: the proportion of correct predictions. Precision, recall, and F1 (for each class). Macro-averaging: compute the performance measure for each class, then take a simple average over classes. Micro-averaging: pool per-document predictions across classes, then compute the performance measure on the pooled contingency table.
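To make the macro/micro distinction concrete, here is a minimal Python sketch (not from the lecture; the per-class contingency counts are made up purely for illustration) that computes both averages:

# Minimal sketch of macro- vs micro-averaged F1.
# The per-class (tp, fp, fn) counts below are hypothetical, for illustration only.

def f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# per-class contingency counts: class -> (tp, fp, fn)
counts = {"spam": (80, 10, 20), "ham": (150, 20, 10)}

# macro-averaging: F1 per class, then a simple average over classes
macro_f1 = sum(f1(*c) for c in counts.values()) / len(counts)

# micro-averaging: pool the counts across classes, then one F1 on the pooled table
tp = sum(c[0] for c in counts.values())
fp = sum(c[1] for c in counts.values())
fn = sum(c[2] for c in counts.values())
micro_f1 = f1(tp, fp, fn)

print(f"macro F1 = {macro_f1:.3f}, micro F1 = {micro_f1:.3f}")

With skewed class sizes the two averages can differ noticeably, because micro-averaging is dominated by the large classes while macro-averaging weights every class equally.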

Sample Learning Curve [Figure: learning curve on the Yahoo Science data.]

Bayesian Methods for Classification Before seeing the content of document d: classify d to the class with the maximum prior probability Pr[c]. For each class c_j ∈ C, Pr[c_j] can be estimated from the training data as Pr[c_j] = N_j / N, where N_j is the number of training documents in class c_j and N is the total number of training documents.

Bayesian Methods for Classification After seeing the content of document d: classify d to the class with the maximum a posteriori probability Pr[c|d]. For each class c_j ∈ C, Pr[c_j|d] can be computed by Bayes’ Theorem.

Bayes’ Theorem: Pr[c_j|d] = Pr[d|c_j] Pr[c_j] / Pr[d], where Pr[c_j] is the prior probability, Pr[d|c_j] is the class-conditional probability, Pr[c_j|d] is the a posteriori probability, and Pr[d] is a constant (the same for every class).

Naïve Bayes: Classification c(d) = argmax over c_j ∈ C of Pr[c_j|d] = argmax over c_j ∈ C of Pr[d|c_j] Pr[c_j], as Pr[d] is a constant. How can we compute Pr[d|c_j]?

Naïve Bayes Assumptions To facilitate the computation of Pr[d|c_j], two simplifying assumptions are made. Conditional Independence Assumption: given the document’s topic (class), the word in one position tells us nothing about words in other positions. Positional Independence Assumption: each document is treated as a bag of words, i.e. the occurrence of a word does not depend on its position. Then Pr[d|c_j] is given by the class-specific unigram language model: Pr[d|c_j] ≈ ∏_k Pr[w_k|c_j], where w_1, w_2, … are the tokens of d. Essentially a multinomial distribution.

Unigram Language Model Model for c_j: Pr(the) = 0.2, Pr(a) = 0.1, Pr(man) = 0.01, Pr(woman) = 0.01, Pr(said) = 0.03, Pr(likes) = 0.02, … Example: for “the man likes the woman”, multiply the unigram probabilities: 0.2 × 0.01 × 0.02 × 0.2 × 0.01.
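A quick arithmetic check of the example above, as a minimal sketch (the probabilities are the toy values listed for the model of c_j):

# Toy unigram language model for class c_j (values from the slide).
model = {"the": 0.2, "a": 0.1, "man": 0.01, "woman": 0.01,
         "said": 0.03, "likes": 0.02}

# Pr[d | c_j] under the unigram model: multiply the per-token probabilities.
doc = "the man likes the woman".split()
prob = 1.0
for word in doc:
    prob *= model[word]

print(prob)  # 0.2 * 0.01 * 0.02 * 0.2 * 0.01 = 8e-08 (approximately, in floating point)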

Naïve Bayes: Learning Given the training data, for each class c_j ∈ C estimate Pr[c_j] (as before), and for each term w_i in the vocabulary V estimate Pr[w_i|c_j] = T_ji / Σ_i' T_ji', where T_ji is the number of occurrences of term w_i in documents of class c_j.

Smoothing Why not just use the MLE? If a term w (in a test doc d) did not occur in the training documents of class c_j, Pr[w|c_j] would be 0, and then Pr[d|c_j] would be 0 no matter how strongly the other terms in d are associated with class c_j. Add-One (Laplace) Smoothing: Pr[w_i|c_j] = (T_ji + 1) / (Σ_i' T_ji' + |V|).
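Putting the prior estimation, add-one smoothing, and classification steps together, here is a minimal multinomial Naïve Bayes sketch in Python (not the lecture’s code; the tiny spam/ham training corpus is hypothetical, purely for illustration):

import math
from collections import Counter, defaultdict

# Hypothetical toy training corpus: (document, class) pairs, for illustration only.
training = [
    ("free money win prize", "spam"),
    ("meeting schedule project report", "ham"),
    ("win free prize now", "spam"),
    ("project deadline meeting notes", "ham"),
]

# --- Learning ---
class_doc_counts = Counter(c for _, c in training)   # N_j
term_counts = defaultdict(Counter)                   # T_ji
for doc, c in training:
    term_counts[c].update(doc.split())

vocab = {w for counts in term_counts.values() for w in counts}
priors = {c: n / len(training) for c, n in class_doc_counts.items()}  # Pr[c_j] = N_j / N

def cond_prob(w, c):
    # Add-one (Laplace) smoothing: (T_ji + 1) / (sum_i' T_ji' + |V|)
    return (term_counts[c][w] + 1) / (sum(term_counts[c].values()) + len(vocab))

# --- Classification ---
def classify(doc):
    # argmax_j  log Pr[c_j] + sum_k log Pr[w_k | c_j]; unseen words are skipped
    scores = {}
    for c in priors:
        scores[c] = math.log(priors[c]) + sum(
            math.log(cond_prob(w, c)) for w in doc.split() if w in vocab
        )
    return max(scores, key=scores.get)

print(classify("free prize meeting"))  # expected: "spam" on this toy data

Working in log space avoids the floating-point underflow that multiplying many small probabilities would otherwise cause.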

Naïve Bayes is Not So Naïve Fairly effective: it is the Bayes optimal classifier if the independence assumptions do hold, and it often performs well even if the independence assumptions are badly violated. It usually yields highly accurate classification (though the estimated probabilities are not so accurate). It took 1st and 2nd place in the KDD-CUP 97 competition, among 16 (then) state-of-the-art algorithms. It is a good, dependable baseline for text classification (though not the best).

Naïve Bayes is Not So Naïve Very Efficient  Linear time complexity for learning/classification.  Low storage requirements.

Take Home Messages Text classification via machine learning; Bayes’ Theorem; Naïve Bayes (learning and classification).