Fuzzy Interpretation of Discretized Intervals
Dr. Xindong Wu (IEEE Transactions on Fuzzy Systems, Vol. 7, No. 6, December 1999)
Presented by Andrea Porter
April 11, 2002

Plan For Presentation
- Introduction to the Problem, HCV
- Discretization Techniques / Fuzzy Borders
- A Hybrid Solution for HCV
- Experiments and Results
- Conclusion

Introduction
- Real-world data contains both numerical and nominal attributes, so an induction system must be able to handle both types of data.
- Existing systems discretize numerical domains into intervals and treat the intervals as nominal values during induction.
- Problems occur when test examples are not covered by the training data (no-match, multiple match).
- The proposed solution is a hybrid approach that uses fuzzy intervals to resolve the no-match problem.

HCV
- Attribute-based rule induction algorithm using the extension matrix approach.
- Divides the positive examples into intersecting groups.
- Finds a heuristic conjunctive rule in each group that covers all positive examples (PE) and no negative examples (NE).
- HCV expresses its rules in variable-valued logic.
- The rules are more compact than the decision trees/rules of ID3 and C4.5.

Variable-Valued Logic and Selectors
- Represents decisions in which variables can take a range of values.
- Selector: [ X # R ]
  X = attribute
  # = relational operator (e.g. =, !=, <, >)
  R = reference, a list of one or more values
- e.g. [Windy = true][Temp > 90]
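A minimal Python sketch of how a selector and a conjunctive rule could be represented and evaluated. The class and operator names are illustrative only and are not HCV's actual C++ code.

```python
# Illustrative selector [X # R]: an attribute, a relational operator,
# and a reference list of one or more values.
from typing import Any, Callable, Dict, List

OPS: Dict[str, Callable[[Any, List[Any]], bool]] = {
    "=":  lambda v, ref: v in ref,            # value equals one of the reference values
    "!=": lambda v, ref: v not in ref,
    ">":  lambda v, ref: all(v > r for r in ref),
    "<":  lambda v, ref: all(v < r for r in ref),
}

class Selector:
    def __init__(self, attribute: str, op: str, reference: List[Any]):
        self.attribute, self.op, self.reference = attribute, op, reference

    def satisfied_by(self, example: Dict[str, Any]) -> bool:
        return OPS[self.op](example[self.attribute], self.reference)

# e.g. [Windy = true][Temp > 90] as a conjunction of two selectors
rule = [Selector("Windy", "=", [True]), Selector("Temp", ">", [90])]
example = {"Windy": True, "Temp": 95}
print(all(sel.satisfied_by(example) for sel in rule))  # True
```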

HCV Software
- C++ implementation.
- Works with noisy and real-valued domains as well as nominal and noise-free databases.
- Provides a set of deduction facilities for the user to test the accuracy of the produced rules on test examples.

Example DB

ORDER  X1  X2  X3  X4  CLASS
  1     1   a   a   1    F
  2     1   a   b   1    F
  3     1   a   c   1    F
  4     1   a   a   0    F
  5     1   b   c   1    T
  6     0   b   b   0    T
  7     0   a   c   1    T
  8     1   b   a   0    T
  9     1   b   a   1    T
 10     1   c   c   0    F
 11     1   c   b   1    F
 12     0   c   b   0    T
 13     0   a   a   0    T
 14     0   c   c   1    F
 15     0   c   a   0    T
 16     1   a   b   0    F
 17     0   a   a   1    T
 18     0   b   a   1    T

C4.5 Results vs. HCV
- C4.5, the T class:
  X2 = b
  X1 = 0 & X3 = a
  X1 = 0 & X3 = b
  X1 = 0 & X2 = a
- HCV, the T class:
  X2 = b
  X1 = 0 & X2 = a
  X1 = 0 & X4 = 0
- C4.5, the F class:
  X1 = 1 & X2 = a
  X1 = 1 & X2 = c
  X2 = c & X3 = c

Deduction of Induction Results
- Induction generates knowledge (rules) from existing data.
- Deduction applies the induction results to interpret new data.
- With real-world data, induction cannot be assumed to be perfect.
- Three cases at deduction time:
  1) no-match (measure of fit)
  2) single-match
  3) multiple-match (estimation of probability)

Discretization
- Occurs during rule induction.
- Numerical domains are discretized into intervals, which are then treated like nominal values.
- The challenge is to find the right borders for the intervals.
- Possible methods:
  1) Simplest class-separating method
  2) Information gain heuristic (implemented in HCV)

Simplest Class-Separating Method
- Interval borders are placed between each adjacent pair of examples that have different classes (see the sketch below).
- If the attribute is very informative, the method is efficient and useful.
- If the attribute is not informative, the method produces too many intervals.
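A rough Python sketch of this border placement, assuming borders are put at the midpoint between adjacent examples of different classes once the attribute values are sorted. Function and variable names are illustrative.

```python
def class_separating_borders(values, classes):
    """Place a border midway between each adjacent pair of examples
    (sorted on the attribute) whose classes differ."""
    pairs = sorted(zip(values, classes))
    borders = []
    for (v1, c1), (v2, c2) in zip(pairs, pairs[1:]):
        if c1 != c2 and v1 != v2:
            borders.append((v1 + v2) / 2)
    return borders

# An informative attribute yields few borders ...
print(class_separating_borders([1, 2, 8, 9], ["F", "F", "T", "T"]))  # [5.0]
# ... an uninformative one produces many intervals.
print(class_separating_borders([1, 2, 3, 4], ["F", "T", "F", "T"]))  # [1.5, 2.5, 3.5]
```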

Information Gain Heuristic
Use the information gain heuristic (IGH) to find more informative borders.
- x = (x_i + x_{i+1})/2 for i = 1, ..., n-1
- x is a possible cut point if x_i and x_{i+1} are of different classes.
- Use IGH to find the best x.
- Recursively split the left and right sub-intervals.
- To stop the recursive splitting:
  1) stop if IGH is the same on all possible cut points;
  2) stop if the number of examples to split is less than a predefined number;
  3) limit the number of intervals.
A sketch of the cut-point selection step is given below.
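A minimal Python sketch of one round of cut-point selection. It scores each candidate x = (x_i + x_{i+1})/2 by the weighted class entropy of the two sides, which is equivalent to maximizing information gain; the recursive splitting and stopping criteria listed above are omitted for brevity, and the function names are illustrative rather than HCV's.

```python
import math
from collections import Counter

def entropy(labels):
    """Class entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_cut_point(values, labels):
    """Try x = (x_i + x_{i+1})/2 for adjacent examples of different classes
    and return the cut that minimizes the weighted entropy of the two sides."""
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    ys = [c for _, c in pairs]
    best, best_score = None, float("inf")
    for i in range(len(pairs) - 1):
        if ys[i] == ys[i + 1] or xs[i] == xs[i + 1]:
            continue  # only borders between different classes are candidates
        cut = (xs[i] + xs[i + 1]) / 2
        left, right = ys[: i + 1], ys[i + 1:]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(ys)
        if score < best_score:
            best, best_score = cut, score
    return best

print(best_cut_point([70, 75, 85, 90, 95], ["F", "F", "T", "T", "T"]))  # 80.0
```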

Fuzzy Borders
- Discretization of continuous domains with sharp borders does not always give an accurate interpretation.
- Instead of sharp borders, use a membership function that measures a value's degree of membership in each interval.
- A value can then be classified into a few different intervals at the same time (turning a single match into a multiple match).

Fuzzy Borders (2)
- Fuzzy matching: deduction with fuzzy borders on the discretized intervals.
- Take the interval with the greatest membership degree as the value's discrete value.
- Three functions to fuzzify borders:
  1) linear
  2) polynomial
  3) arctan
- Definitions:
  s = spread parameter
  l = length of the original interval
  x_left, x_right = left/right sharp borders

Linear Membership Function
  k = 1/(2sl)
  a = -k*x_left + 1/2
  b = k*x_right + 1/2
  lin_left(x)  = kx + a
  lin_right(x) = -kx + b
  lin(x) = max(0, min(1, lin_left(x), lin_right(x)))
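A small Python sketch of the linear membership function exactly as defined above, assuming l is the length of the interval and the borders are fuzzified over a width of s*l on each side; the function name is illustrative.

```python
def linear_membership(x, x_left, x_right, s):
    """Linear fuzzy membership for the interval [x_left, x_right]
    with spread parameter s (borders fuzzified over s*l on each side)."""
    l = x_right - x_left          # length of the original interval
    k = 1.0 / (2 * s * l)
    a = -k * x_left + 0.5
    b = k * x_right + 0.5
    lin_left = k * x + a          # rises through 1/2 at x_left
    lin_right = -k * x + b        # falls through 1/2 at x_right
    return max(0.0, min(1.0, lin_left, lin_right))

# Membership is 1 well inside the interval, 1/2 exactly at a border,
# and decays linearly to 0 at distance s*l outside the interval.
print(linear_membership(5.0, 2.0, 8.0, 0.25))  # 1.0
print(linear_membership(2.0, 2.0, 8.0, 0.25))  # 0.5
print(linear_membership(0.5, 2.0, 8.0, 0.25))  # 0.0
```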

Arctan Membership Function

Polynomial Membership Function
  poly_side(x) = a_side*x^3 + b_side*x^2 + c_side*x + d_side,   side in {left, right}
  a_side = 1/(4(ls)^3)
  b_side = -3*a_side*x_side
  c_side = 3*a_side*(x_side^2 - (ls)^2)
  d_side = -a_side*(x_side^3 - 3*x_side*(ls)^2 + 2*(ls)^3)

  poly(x) = poly_left(x),   if x_left - ls <= x <= x_left + ls
            poly_right(x),  if x_right - ls <= x <= x_right + ls
            1,              if x_left + ls <= x <= x_right - ls
            0,              otherwise

Match Degree
- Selector method: take the maximum membership degree of the value over all the intervals involved. If two adjacent intervals have the same class, values close to their shared border get a low membership.
- Conjunction method: combines degrees with the fuzzy plus, a (+) b = a + b - ab.

No-Match Resolution: Largest Class
- Assign all no-match examples to the largest class, the default class.
- Works well if the number of classes in the training set is small and one class is clearly larger than the others.
- Deteriorates when there is a larger number of classes and the examples are evenly distributed among them.

No-Match Resolution: Measure of Fit
Calculate the measure of fit (MF) for each class:
1) calculate the MF for each selector (sel):
     MF(sel, e) = 1,      if sel is satisfied by e
                  n/|x|,  otherwise
2) calculate the MF for each conjunctive rule (conj):
     MF(conj, e) = [ product over the selectors sel in conj of MF(sel, e) ] * n(conj)/N

No-Match Resolution: Measure of Fit (2)
3) calculate the MF for each class c:
     MF(c, e) = MF(conj_1, e) + MF(conj_2, e) - MF(conj_1, e) * MF(conj_2, e)
   For more than two rules, apply the formula recursively.
   The maximum MF determines which class is closest to the example (see the sketch below).
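A compact Python sketch of the measure-of-fit computation, assuming the slide's n/|x| term is read as the size of the selector's reference list over the size of the attribute's domain, and the per-class combination is the probabilistic (fuzzy) sum applied recursively. All function names are illustrative.

```python
def fuzzy_sum(a, b):
    """Probabilistic (fuzzy) sum: a (+) b = a + b - a*b."""
    return a + b - a * b

def mf_selector(sel_satisfied, n_ref_values, domain_size):
    """MF of a single selector: 1 if satisfied, otherwise the fraction of
    the attribute's domain covered by the selector's reference list."""
    return 1.0 if sel_satisfied else n_ref_values / domain_size

def mf_conjunction(selector_mfs, n_covered, n_total):
    """MF of a conjunctive rule: product of its selectors' MFs,
    weighted by the fraction of training examples the rule covers, n(conj)/N."""
    prod = 1.0
    for m in selector_mfs:
        prod *= m
    return prod * (n_covered / n_total)

def mf_class(conjunction_mfs):
    """MF of a class: fuzzy sum over its conjunctive rules, applied recursively."""
    total = 0.0
    for m in conjunction_mfs:
        total = fuzzy_sum(total, m)
    return total

# The no-match example is assigned to the class with the largest MF.
mfs = {"T": mf_class([0.4, 0.2]), "F": mf_class([0.1])}
print(max(mfs, key=mfs.get))  # 'T'
```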

Multiple-Match
- Caused by over-generalization of the training examples at induction time.
- Example: a test example with (X1 = a, X2 = 1)
  - X1 = a covers all positive examples (PE)
  - X2 = 1 covers all negative examples (NE)
  - the example satisfies both rules, giving a multiple match

Multiple-Match Resolution: First Hit
- Use the first rule that classifies the example.
- Produces reasonable results if the rules from induction have been ordered according to a measure of reliability.
- Advantages: straightforward, efficient.
- Disadvantages: the rules have to be sorted at induction time.

Multiple-Match Resolution: Largest Rule
- Similar to the largest-class method from no-match resolution.
- Choose the conjunctive rule that covers the most examples in the training set.

Multiple-Match Resolution: Estimation of Probability
Assign an EP value to each class based on the size of the satisfied conjunctive rules.
1) Find the EP for each conjunctive rule (conj):
     EP(conj, e) = n(conj)/N,  if conj is satisfied by e
                   0,          otherwise
   n(conj) = number of examples covered by conj
   N = total number of examples

Multiple-Match Resolution: Estimation of Probability (2)
2) Find the EP value for each class:
     EP(c, e) = EP(conj_1, e) + EP(conj_2, e) - EP(conj_1, e) * EP(conj_2, e)
   For more rules, apply the formula recursively.
   Choose the class with the highest EP value.
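A short Python sketch of EP-based multiple-match resolution, assuming the per-class combination is the same probabilistic sum used for the measure of fit; the numbers in the usage example are made up, and the function names are illustrative.

```python
def fuzzy_sum(a, b):
    # probabilistic sum, as in the measure-of-fit combination: a + b - a*b
    return a + b - a * b

def ep_conjunction(satisfied, n_covered, n_total):
    """EP of a conjunctive rule: its training coverage n(conj)/N if it fires, else 0."""
    return n_covered / n_total if satisfied else 0.0

def ep_class(rule_eps):
    """EP of a class: fuzzy sum of the EP values of its rules, applied recursively."""
    total = 0.0
    for ep in rule_eps:
        total = fuzzy_sum(total, ep)
    return total

# Two classes each have a firing rule; pick the class with the higher EP.
eps = {
    "T": ep_class([ep_conjunction(True, 6, 18), ep_conjunction(False, 3, 18)]),
    "F": ep_class([ep_conjunction(True, 4, 18)]),
}
print(max(eps, key=eps.get))  # 'T'
```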

Hybrid Interpretation
- Fuzzy borders alone can add conflicts at deduction time because they do not reduce the number of rules that are applicable, so a hybrid scheme is used instead.
- HCV uses sharp borders during induction and fuzzy borders only during deduction.
- Algorithm (sketched below):
  - Single match: use the class indicated by the rules.
  - Multiple match: use estimation of probability (EP) with sharp borders.
  - No match: use fuzzy borders with the polynomial membership function to find the closest rule.
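A Python sketch of how this hybrid deduction step could be organized. The structure follows the three cases above; the rule representation and the two scoring callbacks (standing in for the EP and fuzzy measure-of-fit computations sketched earlier) are assumptions for illustration, not HCV's actual interface.

```python
def classify_hybrid(example, rules_by_class, ep_score, fuzzy_mf_score):
    """Hybrid deduction sketch: rules use sharp borders; fuzzy borders
    enter only in the no-match branch."""
    matching = {c for c, rules in rules_by_class.items()
                if any(all(sel(example) for sel in rule) for rule in rules)}
    if len(matching) == 1:   # single match: use the class indicated by the rules
        return matching.pop()
    if len(matching) > 1:    # multiple match: estimation of probability, sharp borders
        return max(matching, key=lambda c: ep_score(c, example))
    # no match: fuzzy borders (e.g. polynomial membership), closest class wins
    return max(rules_by_class, key=lambda c: fuzzy_mf_score(c, example))

# Toy usage: a rule is a list of selector predicates over an example dict.
rules_by_class = {
    "T": [[lambda e: e["X2"] == "b"], [lambda e: e["X1"] == 0, lambda e: e["X2"] == "a"]],
    "F": [[lambda e: e["X1"] == 1, lambda e: e["X2"] == "a"]],
}
print(classify_hybrid({"X1": 0, "X2": "b"}, rules_by_class,
                      ep_score=lambda c, e: 0.0,
                      fuzzy_mf_score=lambda c, e: 0.0))  # 'T' (single match)
```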

The Data
- 17 databases from the Machine Learning Database Repository, University of California, Irvine.
- The databases were selected because:
  1) all include numerical data, and
  2) all lead to situations where no rule clearly applies.

Results – Predictive Accuracy

Results (cont.)
- The results shown for C4.5 and NewID are for the pruned trees.
- These were usually better than the unpruned ones in this experiment.
- HCV's parameters were not fine-tuned for individual databases, because doing so would reduce the generality and applicability of the conclusions.

Accuracy Results
- HCV (hybrid): 9 databases
- C4.5 (Release 8): 7 databases
- C4.5 (Release 5): 6 databases
- HCV (largest class): 3 databases
- HCV (fuzzy): 2 databases

HCV Comparison
- HCV (fuzzy) generally performs better than the simple largest-class method.
- HCV's performance improves significantly when the fuzzy borders (for no-match) are combined with probability estimation (for multiple-match) in HCV (hybrid).

Conclusions
- Fuzzy borders are constructed and used at deduction time, and only when a no-match case occurs.
- This hybrid method performs more accurately than several other current deduction programs.
- Fuzziness is strongly domain dependent, so HCV allows users to specify their own intervals and fuzzy functions.