Rough Sets in Data Mining CSE5610 Intelligent Software Systems Semester 1, 2006
2 Lecture Outline Rough Sets: Major Concepts, Running Example Rough Sets: Identifying Significant Attributes in Data, Performing Pre-processing Concluding Remarks: Beyond Pre-processing to Data Mining References/Resources
3 Rough Sets Zdzisław Pawlak, 1982 Extension of traditional set theory Classification and analysis of data tables Handling uncertainty in data Missing data Noisy data Ambiguity in semantics Produces an inexact or rough classification of data
4 Rough Sets Membership [Figure: a set's lower and upper approximations, its boundary region, and the negative region]
5 Information System Information System (S) = {U, A, V, f} U: non-empty, finite set of objects called the Universe, U = {x1, x2, ..., xn} A: finite, non-empty set of attributes, with A = C ∪ D and C ∩ D = ∅, where C are the condition attributes and D the decision attributes V: set of domains of all attributes A of S (i.e. Va is the domain of the attribute a) f: U × A → V is a function such that f(x, a) ∈ Va, for a ∈ A and x ∈ U
6 Example: Information Systems [Table: the running example, objects U = {1, ..., 8} described by attributes a, b, c, d, e]
7 Equivalence Classes x_i, x_j ∈ U are indiscernible with respect to a given set of attributes B (i.e. B ⊆ A) if they have the same values on B: a(x_i) = a(x_j) for all a ∈ B. Indiscernible objects are elements of an equivalence class [x]_B The set U/IND(B) is the set of all equivalence classes of the indiscernibility relation IND(B), which is mathematically defined as: IND(B) = {(x_i, x_j) ∈ U × U : for every a ∈ B, a(x_i) = a(x_j)}
8 Example: Information Systems [Table from slide 6] Let B = {a, b, c}. U/IND(B) = {{1,5}, {2,8}, {3}, {4}, {6}, {7}}
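The partition U/IND(B) is easy to compute by grouping objects on their B-value vectors. Below is a minimal Python sketch; since the example table itself did not survive in this handout, the attribute values are hypothetical, chosen so that the partitions match the ones quoted on these slides.

```python
from collections import defaultdict

# Hypothetical information system: eight objects, condition attributes
# a, b, c and decision attributes d, e. The values are invented, but
# chosen so the resulting partitions match those quoted on the slides.
TABLE = {
    1: dict(a=0, b=0, c=0, d=0, e=0),
    2: dict(a=0, b=0, c=1, d=0, e=1),
    3: dict(a=0, b=1, c=0, d=1, e=0),
    4: dict(a=0, b=1, c=1, d=1, e=1),
    5: dict(a=0, b=0, c=0, d=2, e=0),
    6: dict(a=1, b=0, c=0, d=1, e=0),
    7: dict(a=1, b=0, c=1, d=0, e=1),
    8: dict(a=0, b=0, c=1, d=2, e=0),
}

def partition(table, attrs):
    """U/IND(B): group objects by their value vector on the attributes B."""
    classes = defaultdict(set)
    for x, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(x)
    return list(classes.values())

print(partition(TABLE, ['a', 'b', 'c']))
# e.g. [{1, 5}, {2, 8}, {3}, {4}, {6}, {7}] (order of the classes may vary)
```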
9 Approximation Space Central concept for dealing with uncertainty & vagueness Specifies boundaries for classifying objects Lower approximation - objects that can be classified with certainty as elements of X (where X ⊆ U), according to the attribute set B (B ⊆ A) Upper approximation - objects that can be classified as possibly being elements of X - can neither be accepted nor rejected with certainty
10 Approximation Space S = {U, A, V, f}; let X ⊆ U be a set of objects and B ⊆ A be a set of attributes. Then the lower approximation of X with respect to B is: B̲X = {x ∈ U | [x]_B ⊆ X} The upper approximation of X with respect to B is: B̄X = {x ∈ U | [x]_B ∩ X ≠ ∅} The boundary region of X is BN_B(X) = B̄X - B̲X An object is a strong member if it is part of the lower approximation, and a weak member if it is part of the boundary region.
11 Example: Approximation Space Let X = {1, 2, 3, 4, 5} and B = {a, b, c} U/IND(B) = {{1,5}, {2,8}, {3}, {4}, {6}, {7}} Object 1 belongs to the equivalence class {1,5}. This class is a subset of X. Therefore object 1 is considered as belonging to the lower approximation. Object 2 belongs to the equivalence class {2,8}. This class is not a subset of X (since 8 does not belong to X). Hence, object 2 is not classified as belonging to the lower approximation. However, object 2 belongs to the upper approximation, since {2,8} ∩ X is not empty. The lower and upper approximations for the example are: Lower Approximation = {1, 3, 4, 5} Upper Approximation = {1, 2, 3, 4, 5, 8}
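Continuing the sketch from the equivalence-class slide (TABLE and partition as defined there), the two approximations fall directly out of the definitions: keep the classes wholly contained in X, or the classes that merely overlap X.

```python
def lower_approx(classes, X):
    """B-lower approximation: objects whose whole class lies inside X."""
    return {x for c in classes if c <= X for x in c}

def upper_approx(classes, X):
    """B-upper approximation: objects whose class overlaps X."""
    return {x for c in classes if c & X for x in c}

X = {1, 2, 3, 4, 5}
B_classes = partition(TABLE, ['a', 'b', 'c'])   # from the earlier sketch
print(lower_approx(B_classes, X))   # {1, 3, 4, 5}
print(upper_approx(B_classes, X))   # {1, 2, 3, 4, 5, 8}
```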
12 Dispensability For an Information System S = {U, A, V, f}, an attribute a is said to be dispensable or superfluous in a given subset of attributes B ⊆ A if IND(B) = IND(B - {a}) (Note: a ∈ B; IND is the indiscernibility relation).
13 Reduct A reduct of B is a set of attributes B' ⊆ B such that all attributes a ∈ B - B' are dispensable and IND(B') = IND(B). A reduct: - contains only non-superfluous attributes - maintains the indiscernibility relation of the original attribute subset There can be several reducts for a given subset of attributes B. It is relatively simple to compute a single reduct; the general problem of finding all reducts is NP-hard.
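A minimal sketch of both ideas, again reusing TABLE and partition from the first code fragment: an attribute is dispensable if dropping it leaves the partition unchanged, and repeatedly dropping dispensable attributes yields one reduct (only one; enumerating all reducts is the hard part).

```python
def same_partition(p, q):
    """Two partitions are equal as sets of equivalence classes."""
    return {frozenset(c) for c in p} == {frozenset(c) for c in q}

def dispensable(table, B, a):
    """a is dispensable in B if IND(B) = IND(B - {a})."""
    rest = [b for b in B if b != a]
    return same_partition(partition(table, B), partition(table, rest))

def one_reduct(table, B):
    """Greedily drop dispensable attributes: yields a single reduct."""
    reduct = list(B)
    for a in B:
        if dispensable(table, reduct, a):
            reduct.remove(a)
    return reduct

print(one_reduct(TABLE, ['a', 'b', 'c', 'd', 'e']))
# -> ['a', 'c', 'd'] for the hypothetical table
```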
14 Core The set of attributes that are common to all the reducts. The core can be computed in a straightforward manner using a tabular representation developed by Skowron [SkR92], known as the Discernibility Matrix. D-core & D-reducts: the core and reducts relative to the decision attributes
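A sketch of the discernibility-matrix idea (hypothetical helper names, reusing TABLE from the first fragment): each matrix entry records the attributes that tell a pair of objects apart, and an attribute belongs to the core exactly when it appears as a singleton entry, i.e. it is the only way to discern some pair.

```python
from itertools import combinations

def discernibility_matrix(table, attrs):
    """For each pair of objects, the set of attributes on which they differ."""
    return {(x, y): {a for a in attrs if table[x][a] != table[y][a]}
            for x, y in combinations(sorted(table), 2)}

def core(table, attrs):
    """Attributes occurring as singleton entries of the discernibility matrix."""
    matrix = discernibility_matrix(table, attrs)
    return {next(iter(e)) for e in matrix.values() if len(e) == 1}

print(core(TABLE, ['a', 'b', 'c', 'd', 'e']))
# -> {'a', 'c', 'd'} for the hypothetical table (set order may vary)
```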
16 Positive Region The positive region of one attribute set is defined with respect to another attribute set Let C and D be two sets of attributes over a universe U. The C-positive region of D, denoted by POS_C(D), is: The set of all the objects of the universe U that can be classified with certainty into the classes of U/IND(D) on the basis of the knowledge expressed by C This is expressed as follows: POS_C(D) = ∪ {C̲X : X ∈ U/IND(D)}
17 Example: Positive Region Let C = {a, b, c} and D = {d, e}. U/IND(D) = {{1}, {2, 7}, {3, 6}, {4}, {5, 8}} Let us name the equivalence classes in U/IND(D) X1, X2, X3, X4, X5 as follows: X1 = {1}, X2 = {2, 7}, X3 = {3, 6}, X4 = {4}, X5 = {5, 8} U/IND(C) = {{1,5}, {2,8}, {3}, {4}, {6}, {7}} Let us name the equivalence classes in U/IND(C) Y1, Y2, Y3, Y4, Y5, Y6 as follows: Y1 = {1, 5}, Y2 = {2, 8}, Y3 = {3}, Y4 = {4}, Y5 = {6}, Y6 = {7}
18 Example: Positive Region Let us now compute POS_C(D) as follows. Determine the objects that can be classified with certainty into each class of U/IND(D), i.e. the C-lower approximation of each Xi: C̲X1 = { } C̲X2 = {7} C̲X3 = {3, 6} C̲X4 = {4} C̲X5 = { } The positive region is computed as the union of the lower approximations: POS_C(D) = C̲X1 ∪ C̲X2 ∪ C̲X3 ∪ C̲X4 ∪ C̲X5 = {3, 4, 6, 7}
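The same computation in code, a small extension of the earlier sketch (TABLE, partition and lower_approx as defined there):

```python
def positive_region(table, C, D):
    """POS_C(D): union of the C-lower approximations of the D-classes."""
    C_classes = partition(table, C)
    return {x for X in partition(table, D)
            for x in lower_approx(C_classes, X)}

print(positive_region(TABLE, ['a', 'b', 'c'], ['d', 'e']))   # {3, 4, 6, 7}
```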
19 Degree of Dependency The Degree of Dependency (k) between two sets of attributes, C and D (where C, D ⊆ A), is measured using the concept of positive region as follows: k(C, D) = card(POS_C(D)) / card(U) The value of k(C, D) satisfies 0 ≤ k ≤ 1 The higher the value of k, the greater is the dependency between the two sets of attributes.
20 Example: Degree of Dependency We can compute the degree of dependency between the attributes C = {a, b, c} and D = {d, e} as follows: We know the positive region POS_C(D) = {3, 4, 6, 7} k(C, D) = |{3, 4, 6, 7}| / |{1, 2, 3, 4, 5, 6, 7, 8}| = 4/8 = 0.5
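In code, this is a one-liner on top of positive_region from the previous sketch:

```python
def dependency(table, C, D):
    """k(C, D) = |POS_C(D)| / |U|."""
    return len(positive_region(table, C, D)) / len(table)

print(dependency(TABLE, ['a', 'b', 'c'], ['d', 'e']))   # 0.5
```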
21 Significance of Attributes Significance of an attribute a: SGF(a) = k(C ∪ {a}, D) - k(C, D) Measures the extent to which adding the attribute alters the degree of dependency between C and D If an attribute is "important" in discerning/determining the decision attributes, then its SGF value will be closer to 1.
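As a sketch (building on dependency above), significance is just the difference between two dependency values; in the hypothetical table, adding a to {b, c} lifts k from 0.25 to 0.5:

```python
def significance(table, a, C, D):
    """SGF(a) = k(C + {a}, D) - k(C, D): how much adding a to C helps."""
    return dependency(table, C + [a], D) - dependency(table, C, D)

print(significance(TABLE, 'a', ['b', 'c'], ['d', 'e']))   # 0.25
```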
Back to CSE3212 - Preprocessing CSE5610 Intelligent Software Systems Semester 1, 2006
23 Pre-processing: A Refresher… Data Reduction: Why? How? Aggregation Dimensionality Reduction Numerosity Reduction Discretisation Dimensionality Reduction: Feature/Attribute Selection Different techniques, including Rough Sets
24 Dimensionality Reduction Feature selection (i.e., attribute subset selection): –Select a minimum set of attributes such that the probability distribution of different classes given the values for those attributes is as close as possible to the original distribution given the values of all features –Reduces size and makes the data easier to understand A number of heuristic methods (due to the exponential number of attribute subsets): –step-wise forward selection –step-wise backward elimination –combining forward selection and backward elimination –decision-tree induction
25 Let's Try & Work This Step-wise forward selection Step-wise backward elimination (see the sketch below)
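As a rough-set flavoured illustration of step-wise forward selection, here is a greedy sketch in the spirit of the QuickReduct algorithm, reusing the dependency function from the earlier fragments: keep adding the attribute that raises k(R, D) most until the dependency of the full attribute set is reached. Backward elimination would run the same loop in reverse, dropping attributes while k stays unchanged.

```python
def forward_select(table, conditions, decisions):
    """Greedy step-wise forward selection using rough-set dependency."""
    selected = []
    target = dependency(table, conditions, decisions)
    while dependency(table, selected, decisions) < target:
        # Add the attribute giving the largest rise in dependency.
        best = max((a for a in conditions if a not in selected),
                   key=lambda a: dependency(table, selected + [a], decisions))
        selected.append(best)
    return selected

print(forward_select(TABLE, ['a', 'b', 'c'], ['d', 'e']))
# -> ['a', 'c', 'b'] for the hypothetical table
```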
26 Rough Sets: Bigger Picture Used for Data Mining Several algorithms for learning, mostly classification Deals with real-world data: noisy and missing values And many more applications…
27 References Sever, H., Raghavan, V. V., and Johnsten, T. D. (1998), "The Status of Research on Rough Sets for Knowledge Discovery in Databases", Proceedings of the Second International Conference on Nonlinear Problems in Aviation and Aerospace (ICNPAA98), Daytona Beach, Florida, USA, Apr-May, Vol. 2. Komorowski, J., Pawlak, Z., Polkowski, L., and Skowron, A. (1998), "Rough Sets: A Tutorial", in Rough-Fuzzy Hybridization: A New Trend in Decision Making, (eds) S. K. Pal and A. Skowron, Springer Verlag. Pawlak, Z. (1992), "Rough Sets: Theoretical Aspects of Reasoning about Data", Kluwer Academic Publishers, London, UK.