Xi’an Jiaotong University
Title: Attribute reduction in decision systems based on relation matrix
Authors: Cheng Zhong and Jin-hai Li
Contents
1. Introduction
2. Some basic notions related to RST
3. Two indices for measuring the significance of the attributes in a decision system
4. A heuristic attribute-reduction algorithm for decision systems
5. Numerical experiments
1. Introduction
Rough set theory (RST), proposed by Pawlak in 1982 [1], is an effective mathematical tool for processing fuzzy and uncertain knowledge. RST has been applied to a variety of fields such as artificial intelligence, data mining, pattern recognition and knowledge discovery [2-7]. As is well known, attribute reduction is one of the key issues in RST. It is performed in information systems by means of the notion of a reduct based on a specialization of the notion of independence due to Marczewski [8]. Much attention has been paid to this issue, and many different methods of attribute reduction have been proposed for decision systems. For example, the reduction approaches in [9-13] are respectively based on partition, discernibility matrix, conditional information entropy, positive region, and ant colony optimization.
Although many reduction methods already exist for decision systems, the issue of attribute reduction still deserves further investigation. On the one hand, the Boolean reasoning-based algorithms for finding a minimal reduct of a decision system are computationally expensive and may even be infeasible for a large dataset; on the other hand, it is hard for heuristic reduction methods to check whether the obtained reduct is minimal when the given decision system is large.
In [14], numerical experiments illustrate that a reduction algorithm designed from the viewpoint of relation matrix can efficiently find a minimal reduct of an information system in MATLAB, which is well suited to matrix computations. In this study, a new heuristic attribute-reduction algorithm for decision systems is proposed from the viewpoint of relation matrix, and some numerical experiments are conducted to assess the performance of the proposed algorithm.
2. Some basic notions related to RST
Definition 1. An information system is a quadruple $(U, A, V, f)$, where $U$ is a nonempty finite set of objects, $A$ is a nonempty finite set of attributes, $V := \bigcup_{a \in A} V_a$ with $V_a$ being the domain of attribute $a$, and $f$ is an information function such that $f(x, a) \in V_a$ for every $x \in U$ and every $a \in A$. A decision system is an information system $(U, C \cup D, V, f)$ with $C \cap D = \emptyset$, where $C$ and $D$ are called the conditional and decision attribute sets, respectively.
For a subset $P$ of $A$, let us define the corresponding equivalence relation as
$\mathrm{IND}(P) := \{(x, y) \in U \times U \mid f(x, a) = f(y, a) \text{ for any } a \in P\}$ (1)
and denote the equivalence class of $\mathrm{IND}(P)$ which contains the object $x \in U$ by $[x]_P$, i.e.
$[x]_P := \{y \in U \mid (x, y) \in \mathrm{IND}(P)\}$. (2)
The factor set of all equivalence classes of $\mathrm{IND}(P)$ is denoted by $U/P$, i.e. $U/P := \{[x]_P \mid x \in U\}$.
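As a minimal sketch of equations (1) and (2), the factor set $U/P$ can be computed by grouping objects on their attribute values. The function name `partition` and the toy table below are illustrative assumptions, not part of the slides.

```python
from collections import defaultdict

def partition(table, attrs):
    """Return U/P as a list of equivalence classes (lists of object indices)."""
    classes = defaultdict(list)
    for i, row in enumerate(table):
        # Objects with identical values on every attribute in attrs share a key.
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())

# Hypothetical toy data: each dict is one object, keys are attribute names.
table = [
    {"a": 0, "b": 0},
    {"a": 0, "b": 1},
    {"a": 1, "b": 0},
    {"a": 0, "b": 0},
]

print(partition(table, ["a", "b"]))  # [[0, 3], [1], [2]]
print(partition(table, ["a"]))       # [[0, 1, 3], [2]]
```

Objects 0 and 3 agree on both attributes, so they fall into the same class of $U/\{a, b\}$.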
Definition 2. Let $(U, A, V, f)$ be an information system and $P \subseteq A$. For a subset $X$ of $U$, $\underline{R}_P(X) := \{x \in U \mid [x]_P \subseteq X\}$ and $\overline{R}_P(X) := \{x \in U \mid [x]_P \cap X \neq \emptyset\}$ are called the P-lower and P-upper approximations of $X$, respectively.
Definition 3. Let $(U, A, V, f)$ be an information system and let $P$ and $Q$ be two subsets of $A$. Then $\mathrm{POS}_P(Q) := \bigcup_{X \in U/Q} \underline{R}_P(X)$ is called the P-positive region of $Q$, where $\underline{R}_P(X)$ is the P-lower approximation of $X$.
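The following sketch illustrates Definitions 2 and 3 on a small, deliberately inconsistent table. The helper names (`eq_class`, `lower`, `upper`, `positive_region`) and the data are assumptions made for illustration only.

```python
def eq_class(table, attrs, i):
    """Indices of all objects indiscernible from object i under attrs, i.e. [x_i]_P."""
    return {j for j, row in enumerate(table)
            if all(row[a] == table[i][a] for a in attrs)}

def lower(table, attrs, X):
    """P-lower approximation: objects whose class lies entirely inside X."""
    return {i for i in range(len(table)) if eq_class(table, attrs, i) <= X}

def upper(table, attrs, X):
    """P-upper approximation: objects whose class intersects X."""
    return {i for i in range(len(table)) if eq_class(table, attrs, i) & X}

def positive_region(table, P, Q):
    """POS_P(Q): union of the P-lower approximations of the classes of U/Q."""
    pos = set()
    for i in range(len(table)):
        pos |= lower(table, P, eq_class(table, Q, i))
    return pos

# Hypothetical toy data: objects 0 and 1 agree on a and b but differ on d.
table = [
    {"a": 0, "b": 0, "d": 0},
    {"a": 0, "b": 0, "d": 1},
    {"a": 1, "b": 0, "d": 1},
    {"a": 1, "b": 1, "d": 0},
]
X = {1, 2}  # the objects with d = 1

print(sorted(lower(table, ["a", "b"], X)))                # [2]
print(sorted(upper(table, ["a", "b"], X)))                # [0, 1, 2]
print(sorted(positive_region(table, ["a", "b"], ["d"])))  # [2, 3]
```

Objects 0 and 1 are excluded from the positive region because they are indiscernible under $\{a, b\}$ yet carry different decisions.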
Definition 4. Let $S := (U, C \cup D, V, f)$ be a decision system, $a \in C$, and $P \subseteq C$. If $\mathrm{POS}_C(D) = \mathrm{POS}_{C \setminus \{a\}}(D)$, then $a$ is said to be D-dispensable in $C$; otherwise, $a$ is said to be D-indispensable in $C$. The set of all D-indispensable attributes is called the core of $S$ and denoted by $\mathrm{Core}(S)$. Furthermore, if $\mathrm{POS}_P(D) = \mathrm{POS}_C(D)$ and each attribute of $P$ is D-indispensable in $P$, then $P$ is called a reduct of $S$.
3. Two indices for measuring significance of the attributes of a decision system
To propose a heuristic attribute-reduction algorithm from the viewpoint of relation matrix, we design two indices below that measure the significance of the attributes of a decision system based on relation matrices. Before doing so, we briefly introduce how to connect positive regions with relation matrices.
Definition 5. Let $(U, A, V, f)$ be an information system and let $P$ be a subset of $A$. The relation matrix of $U \times U$ under $P$, denoted by $P(U)$, is defined as $P(U) := (P_{ij})_{n \times n}$, where $n$ is the cardinality of $U$, and $P_{ij} = 1$ if $(x_i, x_j) \in \mathrm{IND}(P)$; otherwise, $P_{ij} = 0$. It follows from Definition 5 that
$P(U) = \prod_{a \in P} a(U)$, (3)
where the product is taken elementwise: $a(U)$ times $b(U)$ equals $c(U)$ with elements $c_{ij} = a_{ij} b_{ij}$.
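A small sketch of Definition 5 and equation (3), using plain nested lists rather than any matrix library. The names `relation_matrix` and `hadamard` and the toy table are assumptions for illustration.

```python
def relation_matrix(table, attrs):
    """Return P(U): entry (i, j) is 1 if objects i and j agree on every attribute in attrs."""
    n = len(table)
    return [[int(all(table[i][a] == table[j][a] for a in attrs))
             for j in range(n)] for i in range(n)]

def hadamard(A, B):
    """Elementwise product of two equally sized 0/1 matrices."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]

# Hypothetical toy data with two attributes.
table = [
    {"a": 0, "b": 0},
    {"a": 0, "b": 1},
    {"a": 1, "b": 0},
    {"a": 0, "b": 0},
]

a_U = relation_matrix(table, ["a"])
b_U = relation_matrix(table, ["b"])
ab_U = relation_matrix(table, ["a", "b"])

# Equation (3): the matrix under {a, b} is the elementwise product of a(U) and b(U).
print(hadamard(a_U, b_U) == ab_U)  # True
print(ab_U[0])                     # [1, 0, 0, 1]
```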
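For a given decision system $S := (U, C \cup D, V, f)$, let $U := \{x_1, x_2, \dots, x_n\}$, $\mathrm{POS}_C(D) := \{x_{C(1)}, x_{C(2)}, \dots, x_{C(m)}\}$, and $P \subseteq C$. We give an index
$\omega(P) := \sum_{i \in \{C(1), \dots, C(m)\}} \sum_{j \in \{1, 2, \dots, n\}} P_{ij} \, |P_{ij} - D_{ij}|$ (4)
to connect the positive regions $\mathrm{POS}_P(D)$ and $\mathrm{POS}_C(D)$ with the relation matrices $P(U)$ and $D(U)$. Each term $P_{ij} |P_{ij} - D_{ij}|$ equals 1 exactly when $x_i$ and $x_j$ are indiscernible under $P$ but belong to different decision classes, and 0 otherwise. Then the following two conditions hold:
(i) If $P \subseteq Q \subseteq C$, then $\omega(P) \geq \omega(Q)$.
(ii) $\mathrm{POS}_P(D) = \mathrm{POS}_C(D)$ if and only if $\omega(P) = 0$.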
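The index in equation (4) can be sketched directly from the relation-matrix entries. The helper names and the toy table (where the decision d equals a XOR b, and c duplicates b) are illustrative assumptions.

```python
def rel(table, attrs, i, j):
    """Entry (i, j) of the relation matrix under attrs."""
    return int(all(table[i][a] == table[j][a] for a in attrs))

def positive_region_indices(table, C, D):
    """Indices i with [x_i]_C contained in [x_i]_D, i.e. x_i in POS_C(D)."""
    n = len(table)
    return [i for i in range(n)
            if all(rel(table, D, i, j) for j in range(n) if rel(table, C, i, j))]

def omega(table, P, C, D):
    """Equation (4): count pairs (i, j), x_i in POS_C(D), with P_ij = 1 and D_ij = 0."""
    total = 0
    for i in positive_region_indices(table, C, D):
        for j in range(len(table)):
            p, d = rel(table, P, i, j), rel(table, D, i, j)
            total += p * abs(p - d)
    return total

# Hypothetical toy data: d = a XOR b, and c is a copy of b.
table = [
    {"a": 0, "b": 0, "c": 0, "d": 0},
    {"a": 0, "b": 1, "c": 1, "d": 1},
    {"a": 1, "b": 0, "c": 0, "d": 1},
    {"a": 1, "b": 1, "c": 1, "d": 0},
]
C, D = ["a", "b", "c"], ["d"]

print(omega(table, C, C, D))           # 0
print(omega(table, ["a", "b"], C, D))  # 0
print(omega(table, ["a"], C, D))       # 4
print(omega(table, [], C, D))          # 8
```

The values shrink as attributes are added, and the index reaches 0 exactly for subsets that preserve the positive region, such as {a, b}.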
Now, we are ready to define two indices to measure the significance of the attributes of a decision system.
Definition 6. Let $S := (U, C \cup D, V, f)$ be a decision system and let $P$ be a subset of $C$. The significance of each attribute $b$ of $P$ is defined by
$\mathrm{SIG}(P|b, D) := \omega(P \setminus \{b\}) - \omega(P)$. (5)
By Definition 6, the significance of each $b \in P$, measured by the difference between $\omega(P \setminus \{b\})$ and $\omega(P)$, indicates how much the index $\omega(P)$ changes when $b$ is removed from $P$.
Definition 7. Let $S := (U, C \cup D, V, f)$ be a decision system and let $P$ be a proper subset of $C$. The significance of each attribute $b$ of $C \setminus P$ with respect to $P$ is defined by
$\mathrm{SIG}(b|P, D) := \omega(P) - \omega(P \cup \{b\})$. (6)
It should be noted that $\mathrm{SIG}(b|P, D)$ is different from $\mathrm{SIG}(P|b, D)$ because the former is defined for $b \in C \setminus P$ while the latter is defined for $b \in P$. By Definition 7, the significance of each $b \in C \setminus P$ with respect to $P$ is measured by how much the index $\omega(P)$ changes when $b$ is added to $P$.
4. A heuristic attribute-reduction algorithm for decision systems
According to the above two indices, we propose a heuristic attribute-reduction algorithm below for decision systems. To this end, we first give some properties related to these two indices.
Proposition 1. Let $S := (U, C \cup D, V, f)$ be a decision system. The following two conditions are satisfied:
(i) $a \in C$ is D-indispensable in $C$ if and only if $\mathrm{SIG}(C|a, D) > 0$.
(ii) $\mathrm{Core}(S) = \{a \in C \mid \mathrm{SIG}(C|a, D) > 0\}$.
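As an illustration of Definitions 6 and 7 and Proposition 1, the sketch below computes both significance indices and the core on a toy table. All function names (`sig_inner`, `sig_outer`, `core`) and the data are assumptions; attribute e is constant and therefore carries no information.

```python
def rel(table, attrs, i, j):
    """Entry (i, j) of the relation matrix under attrs."""
    return int(all(table[i][a] == table[j][a] for a in attrs))

def omega(table, P, C, D):
    """Index omega(P) of equation (4), restricted to objects in POS_C(D)."""
    n = len(table)
    pos = [i for i in range(n)
           if all(rel(table, D, i, j) for j in range(n) if rel(table, C, i, j))]
    return sum(rel(table, P, i, j) * abs(rel(table, P, i, j) - rel(table, D, i, j))
               for i in pos for j in range(n))

def sig_inner(table, P, b, C, D):
    """SIG(P|b, D) = omega(P minus {b}) - omega(P), for b in P."""
    return omega(table, [a for a in P if a != b], C, D) - omega(table, P, C, D)

def sig_outer(table, b, P, C, D):
    """SIG(b|P, D) = omega(P) - omega(P plus {b}), for b in C but not in P."""
    return omega(table, P, C, D) - omega(table, P + [b], C, D)

def core(table, C, D):
    """Proposition 1 (ii): attributes of C with positive significance in C."""
    return [a for a in C if sig_inner(table, C, a, C, D) > 0]

# Hypothetical toy data: d = a XOR b, c copies b, e is constant.
table = [
    {"a": 0, "b": 0, "c": 0, "e": 0, "d": 0},
    {"a": 0, "b": 1, "c": 1, "e": 0, "d": 1},
    {"a": 1, "b": 0, "c": 0, "e": 0, "d": 1},
    {"a": 1, "b": 1, "c": 1, "e": 0, "d": 0},
]
C, D = ["a", "b", "c", "e"], ["d"]

print(core(table, C, D))                   # ['a']
print(sig_outer(table, "b", ["a"], C, D))  # 4
print(sig_outer(table, "e", ["a"], C, D))  # 0
```

Only a is in the core here, because b and c can stand in for each other, while e has zero significance with respect to {a}.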
Let $S := (U, C \cup D, V, f)$ be a decision system and let $P$ be a proper subset of $C$. An attribute $b \in C \setminus P$ is called unimportant with respect to $P$ if
$\mathrm{SIG}(b|P, D) = 0$. (7)
Proposition 2. Let $S := (U, C \cup D, V, f)$ be a decision system, $P \subseteq Q \subseteq C$, and $b \in C \setminus Q$. If $b$ is unimportant with respect to $P$, then $b$ is also unimportant with respect to $Q$.
Proposition 3. Let $S := (U, C \cup D, V, f)$ be a decision system and let $P$ be a subset of $C$. If $\omega(P) = 0$ and $\mathrm{SIG}(P|b, D) > 0$ for any $b \in P$, then $P$ is a reduct of $S$.
Now, we are ready to present a heuristic attribute-reduction algorithm for decision systems.
Input: A decision system $S := (U, C \cup D, V, f)$
Output: A reduct of $S$
Step 1: Set $\mathrm{Core}(S) = \emptyset$ and $E = \emptyset$.
Step 2: Compute $\mathrm{SIG}(C|a, D)$ for every $a \in C$; if $\mathrm{SIG}(C|a, D) > 0$, then update $\mathrm{Core}(S)$ to $\mathrm{Core}(S) \cup \{a\}$.
Step 3: If $\omega(\mathrm{Core}(S)) = 0$, set $E = \mathrm{Core}(S)$ and go to Step 7; otherwise, go to Step 4.
Step 4: Set $E = \mathrm{Core}(S)$.
Step 5: Choose an attribute $b$ from $C \setminus E$ with $\mathrm{SIG}(b|E, D) = \max_{a \in C \setminus E} \{\mathrm{SIG}(a|E, D)\}$, delete all the unimportant attributes with respect to $E$ from $C \setminus E$, and set $E = E \cup \{b\}$.
Step 6: If $\omega(E) = 0$, go to Step 7; otherwise, go back to Step 5.
Step 7: If there exists an attribute $e \in E$ such that $\mathrm{SIG}(E|e, D) = 0$, then go to Step 8; otherwise, go to Step 9.
Step 8: Update $E$ to $E \setminus \{e\}$ and go back to Step 7.
Step 9: Output $E$ and end the algorithm.
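The steps above can be sketched in Python as follows. This is an illustrative reading of the algorithm, not the authors' MATLAB implementation: the names `make_omega` and `acmr_sketch` and the toy table are hypothetical, ties in Step 5 are broken by attribute order, and the redundancy check of Steps 7 and 8 is repeated until no attribute can be dropped, which is an assumption about how those steps iterate.

```python
def rel(table, attrs, i, j):
    """Entry (i, j) of the relation matrix under attrs."""
    return int(all(table[i][a] == table[j][a] for a in attrs))

def make_omega(table, C, D):
    """Build omega(P) of equation (4), computing POS_C(D) only once."""
    n = len(table)
    pos = [i for i in range(n)
           if all(rel(table, D, i, j) for j in range(n) if rel(table, C, i, j))]
    def omega(P):
        return sum(rel(table, P, i, j) * abs(rel(table, P, i, j) - rel(table, D, i, j))
                   for i in pos for j in range(n))
    return omega

def acmr_sketch(table, C, D):
    omega = make_omega(table, C, D)
    # Steps 1-2: the core consists of attributes with SIG(C|a, D) > 0.
    E = [a for a in C if omega([x for x in C if x != a]) - omega(C) > 0]
    candidates = [a for a in C if a not in E]
    # Steps 3-6: grow E greedily until omega(E) = 0. Since omega(C) = 0,
    # some candidate always has positive significance while omega(E) > 0.
    while omega(E) != 0:
        gains = {a: omega(E) - omega(E + [a]) for a in candidates}
        best = max(candidates, key=gains.get)
        # Drop the chosen attribute and every unimportant one from the search space.
        candidates = [a for a in candidates if a != best and gains[a] > 0]
        E.append(best)
    # Steps 7-8: remove attributes whose significance in E is zero.
    changed = True
    while changed:
        changed = False
        for e in E:
            rest = [x for x in E if x != e]
            if omega(rest) - omega(E) == 0:
                E, changed = rest, True
                break
    return E

# Hypothetical toy data: d = a XOR b, c copies b, e is constant.
table = [
    {"a": 0, "b": 0, "c": 0, "e": 0, "d": 0},
    {"a": 0, "b": 1, "c": 1, "e": 0, "d": 1},
    {"a": 1, "b": 0, "c": 0, "e": 0, "d": 1},
    {"a": 1, "b": 1, "c": 1, "e": 0, "d": 0},
]

print(acmr_sketch(table, ["a", "b", "c", "e"], ["d"]))  # ['a', 'b']
```

On this table the core is {a}, the greedy step adds b, and the constant attribute e is discarded from the search space as unimportant.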
For convenience of description, the above algorithm is referred to as ACMR for short. Note that in Step 5, gradually deleting unimportant attributes from the search space does not affect the correctness of the algorithm (by Proposition 2, an attribute that is unimportant with respect to $E$ remains unimportant as $E$ grows) and can improve its efficiency. The time complexity of the algorithm ACMR is $O(|A|^2 |U|^2)$, where $|A| = |C| + |D|$.
Proposition 4. The algorithm ACMR is complete. That is, the attribute set output by ACMR is always a reduct of the input decision system $S$.
5. Numerical experiments
In order to assess the performance of the algorithm ACMR, we chose six databases from the UCI (University of California, Irvine) repository for the experiments: Iris Plants Database, BUPA Liver Disorders, Balance Scale Weight, Tic-Tac-Toe Endgame, Zoo, and Chess End-Game. The results of running the algorithm ACMR in MATLAB are reported in Table 1.
Table 1. Experimental results output by the algorithm ACMR
Decision system         |U|    |C|   |R|   Running time (seconds)
Iris Plants Database    150    4     3     0.24
BUPA Liver Disorders    345    6     3     0.56
Balance Scale Weight    625    4     4     0.82
Tic-Tac-Toe Endgame     958    9     8     17.18
Zoo                     101    16    5     0.34
Chess End-Game          3196   36    29    226.99
* |U| is the cardinality of the set of objects, |C| the cardinality of the set of conditional attributes, and |R| the cardinality of the set output by the algorithm ACMR.
It can be seen from Table 1 that the running time is under one second for four of the six databases and at most 226.99 seconds for the largest one, Chess End-Game.
Table 2 below is used to check whether the reduct output by the algorithm ACMR is minimal. For this purpose, the Boolean reasoning-based algorithm in [3] is used to compute minimal reducts of the above six decision systems.
Decision system         |R|   |M_R|   Is the output reduct minimal?
Iris Plants Database    3     3       Yes
BUPA Liver Disorders    3     3       Yes
Balance Scale Weight    4     4       Yes
Tic-Tac-Toe Endgame     8     8       Yes
Zoo                     5     5       Yes
Chess End-Game          29    ---     Yes
* |M_R| is the cardinality of a minimal reduct. The notation “---” means that the Boolean reasoning-based algorithm did not return a result within three days. However, we can still conclude that the reduct output by the algorithm ACMR is minimal because the cardinality of the core of the Chess End-Game dataset is 27.
Furthermore, the algorithm ACMR is compared with the algorithm in [8] (denoted by algorithm a) in terms of running time. The running time of the algorithm ACMR is less than that of algorithm a for the following reasons:
1) In the process of finding minimal reducts, the algorithm ACMR gradually deletes unimportant attributes from the search space;
2) The algorithm ACMR is designed from the viewpoint of relation matrix, and MATLAB is well suited to matrix computations.