Optimizing Text Classification Mark Trenorden Supervisor: Geoff Webb

Introduction: What is Text Classification?; Naïve Bayes; Event Models; Binomial Model; Binning; Conclusion

Text Classification: grouping documents by topic, for example Sport, Politics, etc. A slow process when done by humans.

Naïve Bayes: P(c_j | d_i) = P(c_j) P(d_i | c_j) / P(d_i). This is Bayes' theorem. Naïve Bayes assumes independence between attributes, in this case words. The assumption is not correct, yet the classifier still performs well.
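
A minimal sketch of this decision rule in Python, assuming hypothetical prior and per-word probabilities (the class names, words, and numbers below are made up for illustration):

    import math

    # Hypothetical estimates: P(c_j) and P(w | c_j) for two classes.
    priors = {"sport": 0.5, "politics": 0.5}
    word_probs = {
        "sport":    {"match": 0.08, "election": 0.01},
        "politics": {"match": 0.01, "election": 0.09},
    }

    def log_posterior(doc_words, cls):
        # log P(c_j) plus the sum of log P(w | c_j): the "naive" independence
        # assumption turns the document likelihood into a product over words.
        score = math.log(priors[cls])
        for w in doc_words:
            score += math.log(word_probs[cls].get(w, 1e-6))  # small floor for unseen words
        return score

    doc = ["election", "match", "election"]
    print(max(priors, key=lambda c: log_posterior(doc, c)))  # -> politics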

Event Models: different ways of viewing a document. In Bayes' rule this translates to different ways of calculating P(d_i | c_j). There are two frequently used models.

Multi-Variate Bernoulli Model. In text classification terms:
- A document (d_i) is an EVENT
- Words (w_t) within the document are considered ATTRIBUTES of d_i
- The number of occurrences of a word in a document is not recorded
- When calculating the probability of class membership, all words in the vocabulary are considered, even those that don't appear in the document
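
A small Python sketch of how the multi-variate Bernoulli likelihood could be scored under these assumptions; the vocabulary and probability table are invented for illustration:

    import math

    vocab = ["ball", "vote", "goal"]
    # Hypothetical P(w appears in a document | class) for each vocabulary word.
    p_present = {
        "sport":    {"ball": 0.7, "vote": 0.1, "goal": 0.6},
        "politics": {"ball": 0.1, "vote": 0.8, "goal": 0.05},
    }

    def bernoulli_log_likelihood(doc_words, cls):
        # Every vocabulary word contributes, whether or not it appears:
        # P(w | c) if present, 1 - P(w | c) if absent. Word counts are ignored.
        present = set(doc_words)
        score = 0.0
        for w in vocab:
            p = p_present[cls][w]
            score += math.log(p if w in present else 1.0 - p)
        return score

    print(bernoulli_log_likelihood(["ball", "ball", "goal"], "sport"))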

Multinomial Model:
- The number of occurrences of each word is captured
- Individual word occurrences are considered "events"
- The document is considered a collection of events
- Only words that appear in the document, and their counts, are considered when calculating class membership
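
For contrast, a sketch of multinomial scoring under the same kind of illustrative setup (again, the probability table is hypothetical):

    import math
    from collections import Counter

    # Hypothetical word-emission probabilities P(w | c).
    p_word = {
        "sport":    {"ball": 0.5, "goal": 0.4, "vote": 0.1},
        "politics": {"ball": 0.1, "goal": 0.2, "vote": 0.7},
    }

    def multinomial_log_likelihood(doc_words, cls):
        # Each word occurrence is an event; only words that actually appear
        # in the document (and their counts) contribute to the score.
        counts = Counter(doc_words)
        return sum(n * math.log(p_word[cls].get(w, 1e-6)) for w, n in counts.items())

    print(multinomial_log_likelihood(["ball", "ball", "goal"], "sport"))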

Previous Comparison: the Multi-Variate model is good for a small vocabulary, the Multinomial model is good for a large vocabulary, and the Multinomial is much faster than the Multi-Variate.

Binomial Model. We want to capture occurrences and non-occurrences as well as word frequencies. In log form the class score is a sum over the vocabulary:

log P(c_j | d_i) ∝ log P(c_j) + Σ_w [ n log P(w | c_j) + (L − n) log P(¬w | c_j) ]

where c_j = class, w = word, d_i = document, L = document length, and n = number of occurrences of word w in the document.
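
A sketch of how this binomial score could be computed; the function and parameter names are mine, and the probability table would come from training counts:

    import math

    def binomial_log_score(doc_counts, doc_length, p_word_given_class, log_prior):
        # log P(c_j) plus, for every vocabulary word w with p = P(w | c_j):
        #   n * log(p) + (L - n) * log(1 - p)
        # where n is the word's count in the document and L the document length.
        score = log_prior
        for w, p in p_word_given_class.items():
            n = doc_counts.get(w, 0)
            score += n * math.log(p) + (doc_length - n) * math.log(1.0 - p)
        return score

    # Example with made-up numbers: a 10-word document containing "bush" twice.
    print(binomial_log_score({"bush": 2}, 10, {"bush": 0.2, "cat": 0.05}, math.log(0.5)))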

Binomial Results: performed just as well as the multinomial model with a large vocabulary, but was much slower. It outperformed the Multi-Variate model once the vocabulary grew, yet did worse than the existing techniques at smaller vocabulary sizes.

Binomial Results [Figure: percentage correctly classified versus number of words in the vocabulary]

Document Length. None of the techniques takes document length into account. Currently, P(d | c) = f(w ∈ d, c). However, we should incorporate document length: P(d | c) = f(w ∈ d, l, c).

Binning. Discretization has been found to be effective for numeric variables in Naïve Bayes. Binning groups documents of similar lengths; the theory is that the word distributions will differ significantly for different lengths, which should help improve classification.

Binning. For my tests, bin size = 1000; if there are fewer than 2000 documents, only two bins are used. [Diagram: documents ordered by increasing size, divided into Bin 1, Bin 2, Bin 3, Bin 4]
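
One possible reading of this scheme in Python; the slide only sketches the rule, so the exact cut-offs below are an assumption:

    def make_length_bins(doc_lengths, bin_size=1000):
        # Sort documents by length and group them into bins of `bin_size`
        # documents; with fewer than 2 * bin_size documents, just split the
        # collection into two halves (the slide's "only use two bins" case).
        order = sorted(range(len(doc_lengths)), key=lambda i: doc_lengths[i])
        if len(order) < 2 * bin_size:
            half = len(order) // 2
            return [order[:half], order[half:]]
        return [order[i:i + bin_size] for i in range(0, len(order), bin_size)]

    bins = make_length_bins([120, 950, 30, 4000])  # tiny toy collection -> two bins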

Binning Example. Two bins are created. Tables of word counts for each class are built per bin, instead of one table over all documents as in traditional methods. [Table: per-bin counts of the words 'George', 'Bush', and 'Cat' for the GWB and Not GWB classes; for example, 'Bush' appears in the GWB class with count 7/20 in one bin and 3/20 in the other.]

Binning. Given an unseen document, binning helps refine probabilities. For example: with no bins, the probability that the word 'Bush' occurs in the GWB class is 10/40, or 25%. If we know which length bin the document falls into, the probability of the word 'Bush' appearing in GWB becomes 7/20, or 35%.
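
The same arithmetic written out as a tiny sketch; which count belongs to which bin is not stated on the slide, so the bin labels here are placeholders:

    # Counts of 'Bush' in the GWB class: (occurrences, total) per length bin.
    bush_in_gwb = {"bin_a": (7, 20), "bin_b": (3, 20)}

    # Pooled estimate, ignoring bins: 10/40 = 0.25.
    pooled = sum(c for c, _ in bush_in_gwb.values()) / sum(t for _, t in bush_in_gwb.values())

    # Per-bin estimate once the document's length bin is known: 7/20 = 0.35.
    c, t = bush_in_gwb["bin_a"]
    print(pooled, c / t)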

Binning Results. When applied to all datasets, binning improved classification accuracy for all techniques.

Binning Results [Figure: 7 Sectors dataset, Multi-Variate method]

Binning Results [Figure: WebKB dataset, Multinomial method]

Conclusion/Future Goals. Binning was the best solution and is applicable to all event models. In future work, apply the event models and binning techniques to classification techniques other than Naïve Bayes.