Classification
ISC471 - HCI571, Isabelle Bichindaritz, 11/9/2012

Learning Objectives
Define classification and prediction
Address the issues involved in classification and prediction
Classify with the naïve Bayes algorithm

Classification vs. Prediction
Classification:
–predicts categorical class labels
–classifies data (constructs a model) based on the training set and the values (class labels) of a classifying attribute, and uses the model to classify new data
Prediction:
–models continuous-valued functions, i.e., predicts unknown or missing values
Typical applications:
–credit approval
–target marketing
–medical diagnosis
–treatment effectiveness analysis

Classification: A Two-Step Process
Model construction: describing a set of predetermined classes
–Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
–The set of tuples used for model construction is the training set
–The model is represented as classification rules, decision trees, or mathematical formulae
Model usage: classifying future or unknown objects (see the sketch below)
–Estimate the accuracy of the model: the known label of each test sample is compared with the model's prediction
–The accuracy rate is the percentage of test set samples correctly classified by the model
–The test set must be independent of the training set; otherwise overfitting will occur
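As a minimal Python sketch of the two-step process, the fragment below constructs a trivial majority-class model from a labeled training set and then measures its accuracy on an independent test set; the records and the majority-class rule are illustrative assumptions, not the slides' method.

```python
from collections import Counter

# Hypothetical labeled records: ((rank, years), tenured). Illustrative only.
train = [(("professor", 7), "yes"), (("professor", 2), "yes"),
         (("assistant", 3), "no"), (("instructor", 7), "yes"),
         (("assistant", 6), "no")]
test = [(("professor", 5), "yes"), (("assistant", 2), "no")]

# Step 1 - model construction: learn the majority class from the training set.
model = Counter(label for _, label in train).most_common(1)[0][0]

# Step 2 - model usage: classify the independent test set and measure accuracy.
accuracy = sum(model == label for _, label in test) / len(test)
print("majority class:", model, "accuracy:", accuracy)
```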

Classification Process (1): Model Construction
(Figure: a training data table feeds classification algorithms, which produce the classifier (model); here the learned model is the rule IF rank = 'professor' OR years > 6 THEN tenured = 'yes')

Classification Process (2): Use the Model in Prediction
(Figure: the classifier is applied to testing data and then to unseen data, e.g., (Jeff, Professor, 4) with the question: Tenured?)

Supervised vs. Unsupervised Learning
Supervised learning (classification)
–Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
–New data are classified based on the training set
Unsupervised learning (clustering)
–The class labels of the training data are unknown
–Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data

What is classification? What is prediction?
Issues regarding classification and prediction
Classification with the naïve Bayes algorithm

Issues Regarding Classification and Prediction (1): Data Preparation
Data cleaning
–Preprocess data in order to reduce noise and handle missing values
Relevance analysis (feature selection)
–Remove irrelevant or redundant attributes
Data transformation
–Generalize and/or normalize data

Issues Regarding Classification and Prediction (2): Evaluating Classification Methods
Predictive accuracy
Speed
–time to construct the model
–time to use the model
Robustness
–handling noise and missing values
Scalability
–efficiency in disk-resident databases
Interpretability
–understanding and insight provided by the model
Goodness of rules
–decision tree size
–compactness of classification rules

Measuring Error
Accuracy = # correctly classified / # of instances = (TP + TN) / N
Error rate = # of errors / # of instances = (FP + FN) / N
Recall = # of found positives / # of positives = TP / (TP + FN) = sensitivity = hit rate
Precision = # of found positives / # of found = TP / (TP + FP)
Specificity = TN / (TN + FP)
False alarm rate = FP / (FP + TN) = 1 - specificity
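To make the definitions concrete, here is a small Python sketch that computes these measures from the four confusion-matrix counts; the counts in the example call are made up for illustration.

```python
def error_measures(tp, fp, tn, fn):
    """Compute the slide's error measures from confusion-matrix counts."""
    n = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / n,
        "error_rate": (fp + fn) / n,
        "recall": tp / (tp + fn),            # sensitivity, hit rate
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),
        "false_alarm_rate": fp / (fp + tn),  # = 1 - specificity
    }

# Illustrative counts only.
print(error_measures(tp=40, fp=10, tn=45, fn=5))
```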

ROC Curve
(Figure: ROC curve plotting the hit rate, TP / (TP + FN), against the false alarm rate, FP / (FP + TN))

Approaches to Evaluating Classification
Separate training (2/3) and testing (1/3) sets
Use cross-validation, e.g., 10-fold cross-validation
–Split the data into training (9/10) and testing (1/10) sets
–Repeat 10 times, rotating the 1/10 used for testing
–Average the results (a sketch follows)
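The following Python sketch illustrates the 10-fold rotation described above; the fit and evaluate functions are assumed to be supplied by the caller and are not part of the slides.

```python
def cross_validate(data, fit, evaluate, k=10):
    """k-fold cross-validation: rotate which 1/k of the data is held out."""
    folds = [data[i::k] for i in range(k)]  # simple interleaved split
    scores = []
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        model = fit(train)                  # caller-supplied training function
        scores.append(evaluate(model, test))
    return sum(scores) / k                  # average the k accuracy estimates
```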

What is classification? What is prediction?
Issues regarding classification and prediction
Classification with the naïve Bayes algorithm

Probabilistic Framework
Bayesian inference is a method based on probabilities for drawing conclusions in the presence of uncertainty. It is an inductive method (contrast the deduction/induction pair):
–A → B; IF A (is true) THEN B (is true) (deduction)
–A → B; IF B (is true) THEN A is plausible (induction)

Probabilistic Framework
Statistical framework:
–State hypotheses and models (sets of hypotheses)
–Assign prior probabilities to the hypotheses
–Draw inferences using probability calculus: evaluate posterior probabilities (or degrees of belief) for the hypotheses given the available data, and derive unique answers

Probabilistic Framework
Conditional probability (read: the probability of X knowing Y):
P(X|Y) = P(X,Y) / P(Y)
Events X and Y are said to be independent if:
P(X,Y) = P(X)·P(Y)
or
P(X|Y) = P(X)

Probabilistic Framework
Bayes rule:
P(Y|X) = P(X|Y)·P(Y) / P(X)

Bayesian Inference
Bayesian inference and induction consist of deriving a model from a set of data:
P(M|D) = P(D|M)·P(M) / P(D)
–M is a model (hypothesis), D are the data
–P(M|D) is the posterior: the updated belief that M is correct
–P(M) is the prior: our estimate that M is correct before seeing any data
–P(D|M) is the likelihood
–Logarithms are helpful for representing small numbers
This permits combining prior knowledge with the observed data.

Bayesian Inference
To be able to infer a model, we need to evaluate:
–the prior P(M)
–the likelihood P(D|M)
P(D) is calculated as the sum of the numerators P(D|M)·P(M) over all the hypotheses, so we do not need to estimate it separately:
P(D) = Σ_M P(D|M)·P(M)
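A minimal Python sketch of this normalization, assuming hypothetical priors and likelihoods chosen only for illustration:

```python
# Hypothetical priors P(M) and likelihoods P(D|M) for three candidate models.
priors      = {"M1": 0.5, "M2": 0.3, "M3": 0.2}
likelihoods = {"M1": 0.10, "M2": 0.40, "M3": 0.25}

# P(D) is the sum of P(D|M) * P(M) over all hypotheses.
p_d = sum(likelihoods[m] * priors[m] for m in priors)

# Posterior P(M|D) = P(D|M) * P(M) / P(D) for each model.
posteriors = {m: likelihoods[m] * priors[m] / p_d for m in priors}
print(posteriors)  # the values sum to 1
```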

Bayesian Networks: Classification
(Figure: a network with an arc from class C to observation x)
Diagnostic direction: infer P(C|x). Bayes' rule inverts the arc:
P(C|x) = P(C)·p(x|C) / p(x)

Naive Bayes' Classifier
Given C, the x_j are independent:
p(x|C) = p(x1|C)·p(x2|C)·...·p(xd|C)

Naïve Bayes Classifier (I)
A simplifying assumption: attributes are conditionally independent given the class:
P(C|x) ∝ P(C)·p(x|C)
p(x|C) = p(x1|C)·p(x2|C)·...·p(xd|C)
This greatly reduces the computation cost: only the class distributions need to be counted.

Naive Bayesian Classifier (II)
Given a training set, we can compute the probabilities (see the play-tennis example below).

Bayesian Classification
The classification problem may be formalized using a-posteriori probabilities:
P(C|X) = probability that the sample tuple X = <x1,...,xk> is of class C
E.g., P(class = N | outlook = sunny, windy = true, ...)
Idea: assign to sample X the class label C such that P(C|X) is maximal

Estimating A-Posteriori Probabilities
Bayes theorem: P(C|X) = P(X|C)·P(C) / P(X)
P(X) is constant for all classes
P(C) = relative frequency of class C samples
The C such that P(C|X) is maximum is the C such that P(X|C)·P(C) is maximum
Problem: computing P(X|C) directly is unfeasible!

Naïve Bayesian Classification
Naïve assumption: attribute independence
P(x1,...,xk|C) = P(x1|C)·...·P(xk|C)
If the i-th attribute is categorical: P(xi|C) is estimated as the relative frequency of samples having value xi as their i-th attribute in class C
If the i-th attribute is continuous: P(xi|C) is estimated through a Gaussian density function
Computationally easy in both cases (a training sketch follows)
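As a sketch of the categorical case, the Python fragment below estimates P(C) and P(xi|C) by relative frequency from a list of labeled tuples. It applies no smoothing for zero counts (note P(overcast|n) = 0 on the next slide), and the data layout is an assumption for illustration.

```python
from collections import Counter, defaultdict

def train_naive_bayes(samples):
    """Estimate P(C) and P(x_i|C) by relative frequency.

    samples: list of (attribute_tuple, class_label) pairs.
    """
    class_counts = Counter(label for _, label in samples)
    value_counts = defaultdict(Counter)  # (attr_index, label) -> value counts
    for attrs, label in samples:
        for i, value in enumerate(attrs):
            value_counts[(i, label)][value] += 1

    n = len(samples)
    priors = {c: count / n for c, count in class_counts.items()}
    conditionals = {
        (i, c): {v: count / class_counts[c] for v, count in counts.items()}
        for (i, c), counts in value_counts.items()
    }
    return priors, conditionals
```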

Play-Tennis Example: Estimating P(xi|C)
outlook:
  P(sunny|p) = 2/9      P(sunny|n) = 3/5
  P(overcast|p) = 4/9   P(overcast|n) = 0
  P(rain|p) = 3/9       P(rain|n) = 2/5
temperature:
  P(hot|p) = 2/9        P(hot|n) = 2/5
  P(mild|p) = 4/9       P(mild|n) = 2/5
  P(cool|p) = 3/9       P(cool|n) = 1/5
humidity:
  P(high|p) = 3/9       P(high|n) = 4/5
  P(normal|p) = 6/9     P(normal|n) = 2/5
windy:
  P(true|p) = 3/9       P(true|n) = 3/5
  P(false|p) = 6/9      P(false|n) = 2/5
Priors:
  P(p) = 9/14           P(n) = 5/14

Play-Tennis Example: Classifying X
An unseen sample X = <rain, hot, high, false>
P(X|p)·P(p) = P(rain|p)·P(hot|p)·P(high|p)·P(false|p)·P(p) = 3/9 · 2/9 · 3/9 · 6/9 · 9/14 ≈ 0.010582
P(X|n)·P(n) = P(rain|n)·P(hot|n)·P(high|n)·P(false|n)·P(n) = 2/5 · 2/5 · 4/5 · 2/5 · 5/14 ≈ 0.018286
Sample X is classified in class n (don't play)
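A quick Python check of the arithmetic above, using the probability tables from the previous slide:

```python
from fractions import Fraction as F

# Scores for X = (rain, hot, high, false), from the play-tennis tables.
score_p = F(3, 9) * F(2, 9) * F(3, 9) * F(6, 9) * F(9, 14)
score_n = F(2, 5) * F(2, 5) * F(4, 5) * F(2, 5) * F(5, 14)

print(float(score_p))  # ~0.010582
print(float(score_n))  # ~0.018286
print("predict:", "p" if score_p > score_n else "n")  # n: don't play
```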

The Independence Hypothesis...
...makes computation possible
...yields optimal classifiers when satisfied
...but is seldom satisfied in practice, as attributes (variables) are often correlated
Attempts to overcome this limitation:
–Bayesian networks, which combine Bayesian reasoning with causal relationships between attributes
–Decision trees, which reason on one attribute at a time, considering the most important attributes first