Rough Sets in Data Mining CSE5610 Intelligent Software Systems Semester 1, 2006
2 Lecture Outline Rough Sets: Major Concepts, Running Example Rough Sets: Identifying Significant Attributes in Data, Performing Pre-processing Concluding Remarks: Beyond Pre-processing to Data Mining References/Resources
3 Rough Sets Zdzisław Pawlak, 1982 Extension of traditional set theory Classification and analysis of data tables Handling uncertainty in data Missing data Noisy data Ambiguity in semantics Produces an inexact or rough classification of data
4 Rough Sets Membership [Figure: a set's lower and upper approximations, its boundary region, and the negative region]
5 Information System Information System (S) = {U, A, V, f} U: non-empty, finite set of objects called the Universe, U = {x1, x2, ..., xn} A: finite, non-empty set of attributes, with A = C ∪ D and C ∩ D = ∅, where C are the condition attributes and D the decision attributes V: set of domains of all attributes A of S (i.e. Va is the domain of the attribute a) f: U × A → V is a function such that f(x, a) ∈ Va, for a ∈ A and x ∈ U
6 Example: Information Systems [Table: the running example, objects U = {1, ..., 8} described by attributes a, b, c, d, e]
7 Equivalence Classes x_i, x_j ∈ U are indiscernible with respect to a given set of attributes B (i.e. B ⊆ A) if they have the same values on B: a(x_i) = a(x_j) for all a ∈ B. Indiscernible objects are elements of an equivalence class [x]_B The set U/IND(B) is the set of all equivalence classes of the indiscernibility relation IND(B), which is mathematically defined as: IND(B) = {(x_i, x_j) ∈ U × U : for every a ∈ B, a(x_i) = a(x_j)}
8 Example: Information Systems [Table from slide 6] Let B = {a, b, c}. U/IND(B) = {{1,5}, {2,8}, {3}, {4}, {6}, {7}}
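The partition U/IND(B) is easy to compute by grouping objects on their B-value vectors. Below is a minimal Python sketch; since the example table itself did not survive in this handout, the attribute values are hypothetical, chosen so that the partitions match the ones quoted on these slides.

```python
from collections import defaultdict

# Hypothetical information system: eight objects, condition attributes
# a, b, c and decision attributes d, e. The values are invented, but
# chosen so the resulting partitions match those quoted on the slides.
TABLE = {
    1: dict(a=0, b=0, c=0, d=0, e=0),
    2: dict(a=0, b=0, c=1, d=0, e=1),
    3: dict(a=0, b=1, c=0, d=1, e=0),
    4: dict(a=0, b=1, c=1, d=1, e=1),
    5: dict(a=0, b=0, c=0, d=2, e=0),
    6: dict(a=1, b=0, c=0, d=1, e=0),
    7: dict(a=1, b=0, c=1, d=0, e=1),
    8: dict(a=0, b=0, c=1, d=2, e=0),
}

def partition(table, attrs):
    """U/IND(B): group objects by their value vector on the attributes B."""
    classes = defaultdict(set)
    for x, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(x)
    return list(classes.values())

print(partition(TABLE, ['a', 'b', 'c']))
# e.g. [{1, 5}, {2, 8}, {3}, {4}, {6}, {7}] (order of the classes may vary)
```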
9 Approximation Space Central concept for dealing with uncertainty & vagueness Specifies boundaries for classifying objects Lower approximation - objects that can be classified with certainty as elements of X (where X ⊆ U), according to the attribute set B (B ⊆ A) Upper approximation - objects that can be classified as possibly being elements of X - can neither be accepted nor rejected with certainty
10 Approximation Space S = {U, A, V, f}; let X ⊆ U be a set of objects and B ⊆ A be a set of attributes. Then the lower approximation of X with respect to B is: B̲X = {x ∈ U | [x]_B ⊆ X} The upper approximation of X with respect to B is: B̄X = {x ∈ U | [x]_B ∩ X ≠ ∅} The boundary region of X is BN_B(X) = B̄X - B̲X An object is a strong member if it is part of the lower approximation, and a weak member if it is part of the boundary region.
11 Example: Approximation Space Let X = {1, 2, 3, 4, 5} and B = {a, b, c} U/IND(B) = {{1,5}, {2,8}, {3}, {4}, {6}, {7}} Object 1 belongs to the equivalence class {1,5}. This class is a subset of X. Therefore object 1 is considered as belonging to the lower approximation. Object 2 belongs to the equivalence class {2,8}. This class is not a subset of X (since 8 does not belong to X). Hence, object 2 is not classified as belonging to the lower approximation. However, object 2 belongs to the upper approximation, since {2,8} ∩ X is not empty. The lower and upper approximations for the example are: Lower Approximation = {1, 3, 4, 5} Upper Approximation = {1, 2, 3, 4, 5, 8}
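Continuing the sketch from the equivalence-class slide (TABLE and partition as defined there), the two approximations fall directly out of the definitions: keep the classes wholly contained in X, or the classes that merely overlap X.

```python
def lower_approx(classes, X):
    """B-lower approximation: objects whose whole class lies inside X."""
    return {x for c in classes if c <= X for x in c}

def upper_approx(classes, X):
    """B-upper approximation: objects whose class overlaps X."""
    return {x for c in classes if c & X for x in c}

X = {1, 2, 3, 4, 5}
B_classes = partition(TABLE, ['a', 'b', 'c'])   # from the earlier sketch
print(lower_approx(B_classes, X))   # {1, 3, 4, 5}
print(upper_approx(B_classes, X))   # {1, 2, 3, 4, 5, 8}
```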
12 Dispensability For an Information System S = {U, A, V, f}, an attribute a is said to be dispensable or superfluous in a given subset of attributes B ⊆ A if IND(B) = IND(B - {a}) (Note: a ∈ B; IND is the indiscernibility relation).
13 Reduct A reduct of B is a set of attributes B' ⊆ B such that all attributes a ∈ B - B' are dispensable and IND(B') = IND(B). A reduct: - contains only non-superfluous attributes - maintains the indiscernibility relation of the original attribute subset There can be several reducts for a given subset of attributes B. It is relatively simple to compute a single reduct; the general problem of finding all reducts is NP-hard.
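A minimal sketch of both ideas, again reusing TABLE and partition from the first code fragment: an attribute is dispensable if dropping it leaves the partition unchanged, and repeatedly dropping dispensable attributes yields one reduct (only one; enumerating all reducts is the hard part).

```python
def same_partition(p, q):
    """Two partitions are equal as sets of equivalence classes."""
    return {frozenset(c) for c in p} == {frozenset(c) for c in q}

def dispensable(table, B, a):
    """a is dispensable in B if IND(B) = IND(B - {a})."""
    rest = [b for b in B if b != a]
    return same_partition(partition(table, B), partition(table, rest))

def one_reduct(table, B):
    """Greedily drop dispensable attributes: yields a single reduct."""
    reduct = list(B)
    for a in B:
        if dispensable(table, reduct, a):
            reduct.remove(a)
    return reduct

print(one_reduct(TABLE, ['a', 'b', 'c', 'd', 'e']))
# -> ['a', 'c', 'd'] for the hypothetical table
```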
14 Core The set of attributes that are common to all the reducts. The core can be computed in a straightforward manner using a tabular representation developed by Skowron [SkR92], known as the Discernibility Matrix. D-core & D-reducts: the core and reducts relative to the decision attributes
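A sketch of the discernibility-matrix idea (hypothetical helper names, reusing TABLE from the first fragment): each matrix entry records the attributes that tell a pair of objects apart, and an attribute belongs to the core exactly when it appears as a singleton entry, i.e. it is the only way to discern some pair.

```python
from itertools import combinations

def discernibility_matrix(table, attrs):
    """For each pair of objects, the set of attributes on which they differ."""
    return {(x, y): {a for a in attrs if table[x][a] != table[y][a]}
            for x, y in combinations(sorted(table), 2)}

def core(table, attrs):
    """Attributes occurring as singleton entries of the discernibility matrix."""
    matrix = discernibility_matrix(table, attrs)
    return {next(iter(e)) for e in matrix.values() if len(e) == 1}

print(core(TABLE, ['a', 'b', 'c', 'd', 'e']))
# -> {'a', 'c', 'd'} for the hypothetical table (set order may vary)
```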
16 Positive Region The positive region of one attribute set is defined with respect to another attribute set Let C and D be two sets of attributes over a universe U. The C-positive region of D, denoted by POS_C(D), is: The set of all the objects of the universe U that can be classified with certainty into the classes of U/IND(D) on the basis of the knowledge expressed by C This is expressed as follows: POS_C(D) = ∪ {C̲X : X ∈ U/IND(D)}
17 Example: Positive Region Let C = {a, b, c} and D = {d, e}. U/IND(D) = {{1}, {2, 7}, {3, 6}, {4}, {5, 8}} Let us name the equivalence classes in U/IND(D) X1, X2, X3, X4, X5 as follows: X1 = {1}, X2 = {2, 7}, X3 = {3, 6}, X4 = {4}, X5 = {5, 8} U/IND(C) = {{1,5}, {2,8}, {3}, {4}, {6}, {7}} Let us name the equivalence classes in U/IND(C) Y1, Y2, Y3, Y4, Y5, Y6 as follows: Y1 = {1, 5}, Y2 = {2, 8}, Y3 = {3}, Y4 = {4}, Y5 = {6}, Y6 = {7}
18 Example: Positive Region Let us now compute POS_C(D) as follows. Determine the objects that can be classified with certainty into each class of U/IND(D), i.e. the C-lower approximation of each Xi: C̲X1 = { } C̲X2 = {7} C̲X3 = {3, 6} C̲X4 = {4} C̲X5 = { } The positive region is computed as the union of the lower approximations: POS_C(D) = C̲X1 ∪ C̲X2 ∪ C̲X3 ∪ C̲X4 ∪ C̲X5 = {3, 4, 6, 7}
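The same computation in code, a small extension of the earlier sketch (TABLE, partition and lower_approx as defined there):

```python
def positive_region(table, C, D):
    """POS_C(D): union of the C-lower approximations of the D-classes."""
    C_classes = partition(table, C)
    return {x for X in partition(table, D)
            for x in lower_approx(C_classes, X)}

print(positive_region(TABLE, ['a', 'b', 'c'], ['d', 'e']))   # {3, 4, 6, 7}
```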
19 Degree of Dependency The Degree of Dependency (k) between two sets of attributes, C and D (where C, D ⊆ A), is measured using the concept of positive region as follows: k(C, D) = card(POS_C(D)) / card(U) The value of k(C, D) satisfies 0 ≤ k ≤ 1 The higher the value of k, the greater is the dependency between the two sets of attributes.
20 Example: Degree of Dependency We can compute the degree of dependency between the attributes C = {a, b, c} and D = {d, e} as follows: We know the positive region POS_C(D) = {3, 4, 6, 7} k(C, D) = |{3, 4, 6, 7}| / |{1, 2, 3, 4, 5, 6, 7, 8}| = 4/8 = 0.5
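In code, this is a one-liner on top of positive_region from the previous sketch:

```python
def dependency(table, C, D):
    """k(C, D) = |POS_C(D)| / |U|."""
    return len(positive_region(table, C, D)) / len(table)

print(dependency(TABLE, ['a', 'b', 'c'], ['d', 'e']))   # 0.5
```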
21 Significance of Attributes Significance of an attribute a: SGF(a) = k(C ∪ {a}, D) - k(C, D) Measures the extent to which adding the attribute alters the degree of dependency between C and D If an attribute is "important" in discerning/determining the decision attributes, then its SGF value will be closer to 1.
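As a sketch (building on dependency above), significance is just the difference between two dependency values; in the hypothetical table, adding a to {b, c} lifts k from 0.25 to 0.5:

```python
def significance(table, a, C, D):
    """SGF(a) = k(C + {a}, D) - k(C, D): how much adding a to C helps."""
    return dependency(table, C + [a], D) - dependency(table, C, D)

print(significance(TABLE, 'a', ['b', 'c'], ['d', 'e']))   # 0.25
```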
Back to CSE3212 - Preprocessing CSE5610 Intelligent Software Systems Semester 1, 2006
23 Pre-processing: A Refresher… Data Reduction: Why? How? Aggregation Dimensionality Reduction Numerosity Reduction Discretisation Dimensionality Reduction: Feature/Attribute Selection Different techniques, including Rough Sets
24 Dimensionality Reduction Feature selection (i.e., attribute subset selection): –Select a minimum set of attributes such that the probability distribution of different classes given the values for those attributes is as close as possible to the original distribution given the values of all features –Reduces size and makes the data easier to understand A number of heuristic methods (due to the exponential number of attribute subsets): –step-wise forward selection –step-wise backward elimination –combining forward selection and backward elimination –decision-tree induction
25 Let's Try & Work This Step-wise forward selection Step-wise backward elimination (see the sketch below)
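As a rough-set flavoured illustration of step-wise forward selection, here is a greedy sketch in the spirit of the QuickReduct algorithm, reusing the dependency function from the earlier fragments: keep adding the attribute that raises k(R, D) most until the dependency of the full attribute set is reached. Backward elimination would run the same loop in reverse, dropping attributes while k stays unchanged.

```python
def forward_select(table, conditions, decisions):
    """Greedy step-wise forward selection using rough-set dependency."""
    selected = []
    target = dependency(table, conditions, decisions)
    while dependency(table, selected, decisions) < target:
        # Add the attribute giving the largest rise in dependency.
        best = max((a for a in conditions if a not in selected),
                   key=lambda a: dependency(table, selected + [a], decisions))
        selected.append(best)
    return selected

print(forward_select(TABLE, ['a', 'b', 'c'], ['d', 'e']))
# -> ['a', 'c', 'b'] for the hypothetical table
```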
26 Rough Sets: Bigger Picture Used for Data Mining Several algorithms for learning, mostly classification Deals with real-world data: noisy and missing values And many more applications…
27 References Sever, H., Raghavan, V. V., and Johnsten, T. D. (1998), "The Status of Research on Rough Sets for Knowledge Discovery in Databases", Proceedings of the Second International Conference on Nonlinear Problems in Aviation and Aerospace (ICNPAA98), Daytona Beach, Florida, USA, Apr-May, Vol. 2. Komorowski, J., Pawlak, Z., Polkowski, L., and Skowron, A. (1998), "Rough Sets: A Tutorial", in Rough-Fuzzy Hybridization: A New Trend in Decision Making, (eds) S. K. Pal and A. Skowron, Springer Verlag. Pawlak, Z. (1992), "Rough Sets: Theoretical Aspects of Reasoning about Data", Kluwer Academic Publishers, London, UK.