A Modified Naïve Possibilistic Classifier for Numerical Data

Presentation transcript:

A Modified Naïve Possibilistic Classifier for Numerical Data. Presented by: Karim Baati, Ph.D. student at REGIM-Lab, University of Sfax, Tunisia; teaching and research assistant at ESPRIT School of Engineering, Tunis, Tunisia.

Presentation outline
Context
Possibilistic classifier
Estimation of possibility beliefs
G-Min algorithm
Experimental results
Conclusion and perspectives

Context

Context (1) Machine learning covers the techniques that help reach a decision automatically. These techniques fall into two major categories: supervised and unsupervised. The difference is that a supervised technique assigns the final decision to one of a set of predefined classes, whereas an unsupervised technique allocates it to a hidden class that the technique discovers itself. The current work belongs to the first, supervised, category.

Context (2) Regardless of the type of supervised machine-learning technique, the main objective is to find the right class. To do so, each technique requires data, often depicted as vectors in which every value stands for a particular attribute. For example: C = {c1 = No disease, c2 = Heart disease, c3 = Lung disease}; A = {a1 = Gender {Man, Woman}, a2 = Temperature {Real}, a3 = Blood pressure {Real}, a4 = Chest pain {Yes, No}}; vt = {Man, 39, 17, No}. What is the final decision c*?
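To make the running example concrete, it can be written down directly. This is only a sketch: the class names, attribute names, and values of vt come from the slide, while the Python layout itself is our own.

```python
# The slide's running example: class set C, attribute set A, test vector vt.
classes = ["No disease", "Heart disease", "Lung disease"]

attributes = {
    "Gender": ["Man", "Woman"],   # categorical
    "Temperature": "Real",        # numerical
    "Blood pressure": "Real",     # numerical
    "Chest pain": ["Yes", "No"],  # categorical
}

# Test instance vt = {Man, 39, 17, No}; the classifier must pick c* in C.
vt = {"Gender": "Man", "Temperature": 39.0,
      "Blood pressure": 17.0, "Chest pain": "No"}
```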

Context (3) Ideally, a perfect classification process would be based on perfect data. Yet the data handled in daily life are never perfect. The main types of imperfection are uncertainty, imprecision, heterogeneity, insufficiency, etc. The current work deals with poor data: data that are not sufficient to acquire the knowledge needed to make a decision.

Context (4) Poor data may arise for different reasons: an insufficient number of instances, many missing values, imbalanced data, etc. This is encountered in many fields, especially in medical diagnosis when a new pathology emerges. Challenge: poor data often lead to ambiguity when making the final decision. Ambiguity refers to the fact that the final decision has a possibility estimate very close to those of the other alternatives of the classification problem.

Possibilistic classifier

Possibilistic classifier (1) The Naïve Possibilistic Classifier (NPC) hybridizes the naïve Bayesian structure, a pattern that has proven efficient despite its strong assumption of attribute independence, with the possibilistic framework, a strong tool for dealing with poor data.

Possibilistic classifier (2) In the estimation step, possibility distributions must be normalized: at least one event must have a possibility equal to 1, but the values need not sum to 1 as in probability theory. In the fusion step, possibilistic classification uses either the product or the minimum as the fusion rule.
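As a concrete illustration of the two fusion rules the slide names, here is a minimal Python sketch; the function names, variable names, and the example possibility values are our own.

```python
from functools import reduce

def normalize(pi):
    """Possibilistic normalization: rescale so the maximum equals 1."""
    m = max(pi)
    return [p / m for p in pi]

def fuse_product(pi_values):
    """Product rule: fuse per-attribute possibilities multiplicatively."""
    return reduce(lambda a, b: a * b, pi_values, 1.0)

def fuse_minimum(pi_values):
    """Minimum rule: the fused score is the least possible attribute match."""
    return min(pi_values)

# Example: possibility of each attribute value of vt given some class c.
pi_given_c = [0.9, 1.0, 0.4, 0.8]
print(fuse_product(pi_given_c))  # 0.288
print(fuse_minimum(pi_given_c))  # 0.4
```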

Estimation of possibility beliefs

Estimation of possibility beliefs (1) The proposed method is based on the probability-to-possibility transformation of Dubois et al. in the continuous case. First advantage: the method is based on maximal specificity, producing the upper bound of the probability of a given event. Second advantage: possibility theory is grounded in fuzzy-set theory, so the method moves from probability-based estimation and its "strict" Bayes rule to a fuzzy-set-based estimation that is better suited to handling ambiguity.

Estimation of possibility beliefs (2) To estimate possibilistic beliefs from numerical data, attribute values are first normalized. Afterward, we apply the probability-possibility transformation of Dubois et al. in the continuous case, where G is a Gaussian cumulative distribution function that may be assessed using the table of the standard normal distribution.
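The formulas on the original slide did not survive transcription. The sketch below assumes the classical maximal-specificity transformation of Dubois et al. for a symmetric density, pi(x) = 2 * min(G(x), 1 - G(x)), with G a Gaussian CDF fitted per attribute and class; the function name and example parameters are our own.

```python
from scipy.stats import norm

def possibility(x, mu, sigma):
    """Possibility of value x under a Gaussian N(mu, sigma^2).

    Maximally specific transformation: pi(mu) = 1 and pi decreases
    symmetrically, upper-bounding the probability of events around x.
    """
    g = norm.cdf(x, loc=mu, scale=sigma)  # G(x), the Gaussian CDF
    return 2.0 * min(g, 1.0 - g)

# E.g., a Temperature of 39 under a class whose training values
# give mu = 37.2 and sigma = 0.9 (illustrative numbers):
print(possibility(39.0, mu=37.2, sigma=0.9))  # ~0.0455
```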

G-Min algorithm

G-Min algorithm (1) The Generalized Minimum-based algorithm (G-Min) aims to avoid ambiguity in the final decision drawn from possibilistic beliefs. It proceeds in two steps: the first builds a set of possible decisions, and the second filters this set to find a final class with a high reliability score. The principle behind the proposed algorithm is to simulate a wise human behavior: delay the final decision in case of ambiguity until a reliable decision is available.

G-Min algorithm (2)
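The algorithm itself appeared as a figure on this slide and did not survive transcription. The sketch below is only one plausible reading of the two steps just described, not the authors' exact procedure: the tolerance eps, the candidate-set rule, and the product-based tie-break are all our own illustrative choices.

```python
def g_min_sketch(pi, eps=0.05):
    """pi: dict mapping each class to its list of per-attribute possibilities."""
    # Step 1: build the set of possible decisions (minimum rule, with a
    # tolerance eps so near-ties count as ambiguous).
    scores = {c: min(vals) for c, vals in pi.items()}
    best = max(scores.values())
    candidates = [c for c, s in scores.items() if best - s <= eps]
    if len(candidates) == 1:  # no ambiguity: decide immediately
        return candidates[0]
    # Step 2: filter the ambiguous set with a more discriminating score
    # (here, the product rule) before committing to c*.
    def product(vals):
        out = 1.0
        for v in vals:
            out *= v
        return out
    return max(candidates, key=lambda c: product(pi[c]))

pi = {"No disease": [0.9, 0.8, 0.85], "Heart disease": [0.82, 0.9, 1.0]}
print(g_min_sketch(pi))  # Heart disease: min scores tie, product breaks it
```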

Experimental results

Experimental results Experiments are conducted on 15 datasets from the UCI repository.

Experimental results The proposed classifier is the best in terms of average rank.

Conclusion and perspectives

Conclusion and perspectives The new version hybridizes the capability of the former NPC to estimate possibilistic beliefs from numerical data with the efficiency of G-Min, a novel algorithm for the fusion of possibilistic beliefs. Experimental results show that the proposed classifier largely outperforms the former NPC in terms of accuracy. The good behavior of the proposed G-Min-based NPC is promising: the classifier could be joined with one of the previously proposed possibilistic classifiers for categorical data in order to treat uncertainty stemming from mixed numerical and categorical data.

Thank You For Your Attention! Please send your questions to: karim.baati@ieee.org