Data Classification Based on the Tolerant Rough Set
Reporter: Yanan Yean
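Abstract
– The similarity measure between two data objects is described by a distance function of all constituent attributes.
– The optimal similarity threshold values are determined with a genetic algorithm (GA).
– A two-stage classification method is used: the first stage uses the lower approximation, and the second stage uses rough membership functions obtained from the upper approximation.
– Comparison methods: BPNN, OFUNN, FCM.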
Outline
1. Introduction
2. Tolerant rough set
3. Determination of similarity thresholds
4. Data classification based on the tolerant rough set
5. Simulation results and discussion
6. Conclusion
1. Introduction
– Carpenter and Grossberg: fuzzy adaptive resonance theory (ART)
– Lin and Lee: a general neural-network model for fuzzy logic control and decision systems
– Simpson: a fuzzy min-max classification neural network
– Banzan et al.: multi-modal logics for automatic feature extraction; rough-set-based inductive reasoning for discovering the optimal feature set
– Nguyen et al.: the tolerance relation among objects for pattern classification
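2. Tolerant rough set
In classical rough set theory, objects that cannot be distinguished from each other by the given attributes are related by an indiscernibility relation I, which is an equivalence relation (reflexive, symmetric, and transitive). A tolerance relation relaxes this requirement: it satisfies only the reflexive and symmetric properties.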
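A small illustration of why a tolerance relation need not be transitive. It uses a hypothetical threshold relation on real numbers (`tolerant` and `eps` are illustrative names, not from the paper): two values are tolerant when they differ by at most `eps`.

```python
# A toy tolerance relation on real numbers: x ~ y when |x - y| <= eps.
# The threshold eps = 1.0 is an arbitrary illustrative value.
def tolerant(x: float, y: float, eps: float = 1.0) -> bool:
    """Reflexive and symmetric by construction, but not transitive."""
    return abs(x - y) <= eps


values = [0.0, 0.8, 1.6]

reflexive = all(tolerant(v, v) for v in values)
symmetric = all(tolerant(a, b) == tolerant(b, a) for a in values for b in values)
# 0.0 ~ 0.8 and 0.8 ~ 1.6, yet 0.0 and 1.6 are not tolerant of each other.
transitive = all(
    tolerant(a, c)
    for a in values for b in values for c in values
    if tolerant(a, b) and tolerant(b, c)
)

print(reflexive, symmetric, transitive)  # True True False
```

Dropping transitivity is what lets tolerance classes overlap, unlike the disjoint equivalence classes produced by I.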
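A tolerance set
– Define a similarity measure $S_a(x, y)$ that quantifies the closeness between the values of attribute $a$ for objects $x$ and $y$.
– $t(a)$ is the similarity threshold value for attribute $a$.
– The tolerance relation is related to the similarity measure as $x \, T_a \, y \iff S_a(x, y) \ge t(a)$, and the tolerance set of an object $x$ is $TS(x) = \{ y \in U : x \, T \, y \}$.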
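A minimal NumPy sketch of tolerance sets. It assumes the per-attribute similarity $S_a(x, y) = 1 - |a(x) - a(y)| / (a_{max} - a_{min})$ and assumes that the joint threshold $t(A)$ is compared against the mean of the per-attribute similarities; the slides give neither formula, and `similarity` and `tolerance_sets` are hypothetical names.

```python
import numpy as np


def similarity(X: np.ndarray) -> np.ndarray:
    """Per-attribute similarity S_a(x, y) in [0, 1] for every pair of objects.

    Assumed form: 1 - |a(x) - a(y)| / (a_max - a_min).
    Returns an array of shape (n_objects, n_objects, n_attributes).
    """
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0  # constant attribute: avoid division by zero
    diff = np.abs(X[:, None, :] - X[None, :, :])
    return 1.0 - diff / span


def tolerance_sets(X: np.ndarray, t_attr: np.ndarray, t_all: float) -> list[set[int]]:
    """TS(x) = {y : x T y}. Here x T y holds when every S_a(x, y) >= t(a)
    and (assumption) the mean similarity over all attributes is >= t(A)."""
    S = similarity(X)
    per_attr_ok = np.all(S >= t_attr, axis=2)  # t_attr broadcasts over attributes
    overall_ok = S.mean(axis=2) >= t_all
    T = per_attr_ok & overall_ok  # reflexive and symmetric boolean matrix
    return [set(np.flatnonzero(row).tolist()) for row in T]


X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]])
ts = tolerance_sets(X, t_attr=np.array([0.8, 0.8]), t_all=0.9)
print(ts)  # [{0, 1}, {0, 1}, {2}]
```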
One of the most important tasks in data classification using the similarity measure defined above is the optimal determination of the similarity thresholds. A GA is applied to solve this optimization problem.
3. Determination of similarity thresholds
3-1. Chromosome representation
– Inputs: the information table and the similarity measure
– Output: a set of optimal similarity threshold values
– An object is represented by n attributes.
– The chromosome for the GA consists of n+1 consecutive real numbers, the similarity thresholds.
– t(A): the similarity threshold that defines the tolerance relation when all attributes A are considered together.
3-2. Initial population generation
The initial gene values in each chromosome are obtained by generating n+1 real-valued random numbers in the interval [0.5, 1.0].
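A sketch of the chromosome layout and the initial population. It assumes, hypothetically, that $t(A)$ is stored as the last gene after $t(a_1), \dots, t(a_n)$; `init_population` and `decode` are illustrative names.

```python
import numpy as np


def init_population(pop_size: int, n_attributes: int, rng: np.random.Generator) -> np.ndarray:
    """Each row is one chromosome of n + 1 real-valued similarity thresholds,
    t(a_1), ..., t(a_n), t(A), drawn uniformly from [0.5, 1.0)."""
    return rng.uniform(0.5, 1.0, size=(pop_size, n_attributes + 1))


def decode(chromosome: np.ndarray) -> tuple[np.ndarray, float]:
    """Split a chromosome into per-attribute thresholds and the joint threshold t(A)."""
    return chromosome[:-1], float(chromosome[-1])


rng = np.random.default_rng(0)
population = init_population(pop_size=20, n_attributes=4, rng=rng)
t_attr, t_all = decode(population[0])
print(population.shape, t_attr.shape)  # (20, 5) (4,)
```

Starting at 0.5 rather than 0 keeps the initial tolerance sets from covering almost the whole universe.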
3-3. Fitness function
If $y \in TS(x)$, that is, if $x$ and $y$ are tolerant, we say that there is a connection between the two objects $x$ and $y$. When two objects are tolerant and belong to the same class, they have a good connection.
The goal is for objects that are tolerant of each other to belong to the same class as often as possible.
The quality of approximation of classification $\gamma$ expresses the ratio of all classified objects to all objects:
$\gamma = \frac{1}{|U|} \sum_i |\underline{T}(D_i)|$, where $D_i$ is the set of objects contained in class $d_i$ and $\underline{T}(D_i) = \{ x \in U : TS(x) \subseteq D_i \}$ is its lower approximation, the set of objects $x$ whose tolerance set $TS(x)$ is entirely contained in class $d_i$.
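The quality of approximation can be sketched from precomputed tolerance sets (lists of object indices) and class labels. The function names are hypothetical.

```python
def lower_approximation(tolerance_sets: list[set[int]], labels: list[str], cls: str) -> set[int]:
    """Objects x whose whole tolerance set TS(x) lies inside class `cls`."""
    return {
        x for x, ts in enumerate(tolerance_sets)
        if all(labels[y] == cls for y in ts)
    }


def quality_of_approximation(tolerance_sets: list[set[int]], labels: list[str]) -> float:
    """gamma = (sum over classes of |lower approximation of D_i|) / |U|.

    Because TS(x) always contains x (reflexivity), each object lands in at
    most one lower approximation, so no object is counted twice."""
    classified = sum(
        len(lower_approximation(tolerance_sets, labels, cls)) for cls in set(labels)
    )
    return classified / len(labels)


labels = ["A", "A", "B", "B"]
ts = [{0, 1}, {0, 1, 2}, {1, 2, 3}, {2, 3}]
print(quality_of_approximation(ts, labels))  # 0.5
```

Objects 1 and 2 are tolerant of an object from the other class, so only objects 0 and 3 are classified.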
Because a smaller tolerance set is more likely to fall entirely inside a single class, the quality of approximation increases as the tolerant sets become smaller, i.e., as the similarity thresholds increase.
The ratio of good connections
– Expresses the ratio of good connections to all possible connections, i.e., to all pairs of objects in the same class.
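The ratio of good connections behaves in the opposite way: it increases as the tolerant sets become larger, i.e., as the similarity thresholds decrease.
The fitness function F balances these two coefficients:
– The first term (quality of approximation) drives tolerant objects to be contained in the same class.
– The second term (ratio of good connections) drives objects in the same class to be tolerant.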
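A sketch of the two fitness terms and one possible way to balance them. The weighted sum with weight `w` is an assumption, since the slides do not state how F combines the terms, and "all possible connections" is taken here to mean all ordered pairs of distinct objects in the same class. All names are hypothetical.

```python
def quality_of_approximation(tolerance_sets: list[set[int]], labels: list[str]) -> float:
    """Fraction of objects whose tolerance set lies entirely in their own class
    (equal to the lower-approximation count, since x is always in TS(x))."""
    ok = sum(
        1 for x, ts in enumerate(tolerance_sets)
        if all(labels[y] == labels[x] for y in ts)
    )
    return ok / len(labels)


def good_connection_ratio(tolerance_sets: list[set[int]], labels: list[str]) -> float:
    """Good connections (tolerant, same class) over all same-class pairs."""
    n = len(labels)
    possible = good = 0
    for x in range(n):
        for y in range(n):
            if x != y and labels[x] == labels[y]:
                possible += 1
                if y in tolerance_sets[x]:
                    good += 1
    return good / possible if possible else 0.0


def fitness(tolerance_sets: list[set[int]], labels: list[str], w: float = 0.5) -> float:
    """Assumed combination: F = w * gamma + (1 - w) * connection ratio."""
    return (w * quality_of_approximation(tolerance_sets, labels)
            + (1 - w) * good_connection_ratio(tolerance_sets, labels))


labels = ["A", "A", "B", "B"]
wide = [{0, 1}, {0, 1, 2}, {1, 2, 3}, {2, 3}]  # low thresholds
singletons = [{0}, {1}, {2}, {3}]                # very high thresholds
print(fitness(wide, labels))        # 0.75
print(fitness(singletons, labels))  # 0.5
```

The singleton case maximizes the first term but zeroes the second, which is the trade-off F is meant to balance.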
3-4. Genetic operations
Reproduction
– First selection method: based on the fitness value F.
– Second selection method: a modified k-tournament method. Among k chromosomes selected randomly from the upper class of fitness values, the one with the highest F is chosen for reproduction.
– Chromosomes: $C_1, C_2 \Rightarrow C_{c+m}$
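A sketch of the modified k-tournament selection. The size of the "upper class" is controlled by `upper_fraction`, a hypothetical parameter, since the slides do not define it; `modified_k_tournament` is likewise an illustrative name.

```python
import random


def modified_k_tournament(
    fitnesses: list[float], k: int, upper_fraction: float, rng: random.Random
) -> int:
    """Return the index of the chromosome chosen for reproduction.

    1. Rank chromosomes by fitness F and keep the upper part of the ranking.
    2. Draw k of them at random, without replacement.
    3. The candidate with the highest F wins.
    Requires k <= len(fitnesses).
    """
    order = sorted(range(len(fitnesses)), key=lambda i: fitnesses[i], reverse=True)
    upper_size = max(k, int(len(order) * upper_fraction))
    upper = order[:upper_size]
    candidates = rng.sample(upper, k)
    return max(candidates, key=lambda i: fitnesses[i])


fitnesses = [0.1, 0.9, 0.5, 0.7, 0.3, 0.8]
rng = random.Random(0)
winner = modified_k_tournament(fitnesses, k=2, upper_fraction=0.5, rng=rng)
print(winner)  # one of 1, 3, 5 (the top three by fitness)
```

Restricting the draw to the upper class raises selection pressure, while the random draw still lets chromosomes other than the single best one reproduce.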
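Crossover
– Parents: $(C_1, t_1(a_i), F_1)$ and $(C_2, t_2(a_i), F_2)$
– The new chromosome $C_c$ created by the crossover operation is computed as an average of the parents' genes weighted by their fitness values: $t_c(a_i) = \frac{F_1 \, t_1(a_i) + F_2 \, t_2(a_i)}{F_1 + F_2}$
Mutation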
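The fitness-weighted averaging crossover can be written as one vectorized expression over all genes. The fallback for two zero-fitness parents is an added assumption, `weighted_average_crossover` is a hypothetical name, and mutation is not sketched because the slides do not describe it.

```python
import numpy as np
from numpy.typing import ArrayLike


def weighted_average_crossover(c1: ArrayLike, f1: float, c2: ArrayLike, f2: float) -> np.ndarray:
    """Child gene t_c(a_i) = (F1 * t1(a_i) + F2 * t2(a_i)) / (F1 + F2)."""
    c1 = np.asarray(c1, dtype=float)
    c2 = np.asarray(c2, dtype=float)
    total = f1 + f2
    if total == 0:  # assumption: both parents have zero fitness, use a plain mean
        return (c1 + c2) / 2.0
    return (f1 * c1 + f2 * c2) / total


child = weighted_average_crossover([0.6, 0.8, 0.9], 0.75, [1.0, 0.6, 0.5], 0.25)
print(child)  # approximately [0.7, 0.75, 0.8]
```

Because the child is a convex combination of its parents, each gene stays between the parents' values, so thresholds initialized in [0.5, 1.0] remain in that interval under crossover.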
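4. Data classification based on the tolerant rough set
We define a rough membership function $\mu_{d_i}(x)$ that expresses the degree of inclusion of the sample $x$ in the decision class $d_i$: $\mu_{d_i}(x) = \frac{|TS(x) \cap D_i|}{|TS(x)|}$.
1st stage: Classification using the lower approximation set
– Compute the tolerant set $TS(x)$ of a test sample $x$. If all of its elements belong to the same class $d_i$, $x$ is assigned to $d_i$.
2nd stage: Classification using the upper approximation set
– Otherwise, $x$ is classified using the rough membership values $\mu_{d_i}(x)$ of the classes whose upper approximation contains it.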
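A sketch of the two-stage rule for one test sample, taking its tolerant set as the indices of the training samples tolerant of it. The stage-2 rule (pick the largest rough membership) and the handling of an empty tolerant set are assumptions; `classify` and `rough_membership` are hypothetical names.

```python
from collections import Counter
from typing import Optional


def rough_membership(ts: set[int], train_labels: list[str]) -> dict[str, float]:
    """mu_{d_i}(x) = |TS(x) ∩ D_i| / |TS(x)| for every class present in TS(x)."""
    counts = Counter(train_labels[y] for y in ts)
    return {cls: n / len(ts) for cls, n in counts.items()}


def classify(ts: set[int], train_labels: list[str]) -> tuple[Optional[str], int]:
    """Return (predicted class, stage used); stage 0 means no decision."""
    if not ts:
        return None, 0  # assumption: an empty tolerant set is left undecided
    membership = rough_membership(ts, train_labels)
    if len(membership) == 1:
        # Stage 1: every tolerant training sample shares one class
        # (the lower-approximation case).
        return next(iter(membership)), 1
    # Stage 2: x touches the upper approximations of several classes;
    # assumed rule: choose the class with the largest rough membership.
    return max(membership, key=membership.get), 2


train_labels = ["A", "A", "B", "B", "B"]
print(classify({0, 1}, train_labels))     # ('A', 1)
print(classify({0, 2, 3}, train_labels))  # ('B', 2)
```

Ties in stage 2 are broken by whichever class `Counter` saw first, which a real implementation would want to handle explicitly.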
5. Simulation results and discussion