
Rough Set Strategies to Data with Missing Attribute Values Jerzy W. Grzymala-Busse Department of Electrical Engineering and Computer Science University of Kansas, Lawrence, KS 66045, USA and Institute of Computer Science Polish Academy of Sciences, Warsaw, Poland

There are two main reasons why an attribute value may be missing: either the value was lost (e.g., it was erased) or the value was not important (such values are also called "do not care" conditions). The first rough set approach to missing attribute values, in which all missing values are lost, was described in 1997, where two rule induction algorithms, LEM1 and LEM2, modified to handle such missing attribute values, were presented. The second rough set approach, in which a missing attribute value is interpreted as a "do not care" condition, was used for the first time in a method for rule induction in which each missing attribute value was replaced by all possible values of that attribute.

In this paper a more general rough set approach to missing attribute values is presented: in the same decision table, some missing attribute values are assumed to be lost and some are "do not care" conditions. The characteristic relation introduced here reduces, for a completely specified decision table, to the ordinary indiscernibility relation. The set of all characteristic relations, defined by all possible decision tables in which every missing attribute value is of one of the two types, together with two operations defined on relations, forms a lattice. Furthermore, three different definitions of lower and upper approximations are introduced.

Table 1. An example of a completely specified decision table

Case | Location | Basement | Fireplace | Value
  1  | good     | yes      | yes       | high
  2  | bad      | no       | no        | small
  3  | good     | no       | no        | medium
  4  | bad      | yes      | no        | medium
  5  | good     | no       | yes       | medium

Location, Basement, and Fireplace are attributes; Value is the decision. Obviously, any decision table defines a function ρ that maps the set of ordered pairs (case, attribute) into the set of all values. For example, ρ(1, Location) = good. Rough set theory is based on the idea of an indiscernibility relation.

Let B be a nonempty subset of the set A of all attributes. The indiscernibility relation IND(B) is a relation on U defined for x, y ∈ U as follows: (x, y) ∈ IND(B) if and only if ρ(x, a) = ρ(y, a) for all a ∈ B. For completely specified decision tables the indiscernibility relation IND(B) is an equivalence relation. Equivalence classes of IND(B) are called elementary sets of B. For example, for Table 1, the elementary sets of IND({Location, Basement}) are {1}, {2}, {3, 5}, and {4}. The function ρ describing Table 1 is completely specified (total).
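As an illustration only (not part of the original paper), the following short Python sketch computes the elementary sets of IND(B) for Table 1 by grouping cases with identical value vectors on B; the names TABLE1 and elementary_sets are illustrative.

TABLE1 = {
    1: {"Location": "good", "Basement": "yes", "Fireplace": "yes"},
    2: {"Location": "bad",  "Basement": "no",  "Fireplace": "no"},
    3: {"Location": "good", "Basement": "no",  "Fireplace": "no"},
    4: {"Location": "bad",  "Basement": "yes", "Fireplace": "no"},
    5: {"Location": "good", "Basement": "no",  "Fireplace": "yes"},
}

def elementary_sets(table, B):
    # Group cases by their value vector restricted to the attributes in B.
    classes = {}
    for x, row in table.items():
        classes.setdefault(tuple(row[a] for a in B), set()).add(x)
    return sorted(classes.values(), key=min)

print(elementary_sets(TABLE1, ["Location", "Basement"]))
# expected elementary sets: {1}, {2}, {3, 5}, {4}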

We will assume that all decision values are specified, i.e., none is missing. Also, we will assume that all missing attribute values are denoted either by "?" or by "*": lost values will be denoted by "?" and "do not care" conditions by "*". Additionally, we will assume that for each case at least one attribute value is specified. Incompletely specified tables are described by characteristic relations instead of indiscernibility relations.

Table 2. An example of an incompletely specified decision table in which all missing attribute values are lost

Case | Location | Basement | Fireplace | Value
  1  | good     | yes      | yes       | high
  2  | bad      | ?        | no        | small
  3  | good     | no       | ?         | medium
  4  | bad      | yes      | no        | medium
  5  | ?        | ?        | yes       | medium

For decision tables in which all missing attribute values are lost, a special characteristic relation was defined by J. Stefanowski and A. Tsoukias. In this paper that characteristic relation will be denoted by LV(B), where B is a nonempty subset of the set A of all attributes.

For x, y ∈ U the characteristic relation LV(B) is defined as follows: (x, y) ∈ LV(B) if and only if ρ(x, a) = ρ(y, a) for all a ∈ B such that ρ(x, a) ≠ ?. For any case x, the characteristic relation LV(B) may be presented by the characteristic set I_B(x), where I_B(x) = {y | (x, y) ∈ LV(B)}. For any decision table in which all missing attribute values are lost, the characteristic relation LV(B) is reflexive, but, in general, it does not need to be symmetric or transitive.
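A minimal sketch of the characteristic sets I_B(x), assuming the same plain-Python table encoding as above; "?" marks a lost value, and TABLE2 and lv_characteristic_set are illustrative names.

LOST = "?"

TABLE2 = {
    1: {"Location": "good", "Basement": "yes", "Fireplace": "yes"},
    2: {"Location": "bad",  "Basement": LOST,  "Fireplace": "no"},
    3: {"Location": "good", "Basement": "no",  "Fireplace": LOST},
    4: {"Location": "bad",  "Basement": "yes", "Fireplace": "no"},
    5: {"Location": LOST,   "Basement": LOST,  "Fireplace": "yes"},
}

def lv_characteristic_set(table, B, x):
    # I_B(x): every case y that agrees with x on each attribute of B
    # for which the value of x is specified (not lost).
    return {y for y, row in table.items()
            if all(row[a] == table[x][a] for a in B if table[x][a] != LOST)}

A = ["Location", "Basement", "Fireplace"]
print({x: lv_characteristic_set(TABLE2, A, x) for x in TABLE2})

Printing the sets makes the lack of symmetry visible: for example, I_A(5) contains case 1 while I_A(1) does not contain case 5.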

Table 3. An example of an incompletely specified decision table in which all missing attribute values are "do not care" conditions

Case | Location | Basement | Fireplace | Value
  1  | good     | yes      | yes       | high
  2  | bad      | *        | no        | small
  3  | good     | no       | *         | medium
  4  | bad      | yes      | no        | medium
  5  | *        | *        | yes       | medium

For decision tables in which all missing attribute values are "do not care" conditions, a special characteristic relation, in this paper denoted by DCC(B), was defined by M. Kryszkiewicz. For x, y ∈ U the characteristic relation DCC(B) is defined as follows: (x, y) ∈ DCC(B) if and only if ρ(x, a) = ρ(y, a) or ρ(x, a) = * or ρ(y, a) = * for all a ∈ B. Similarly, for a case x, the characteristic relation DCC(B) may be presented by the characteristic set J_B(x), where J_B(x) = {y | (x, y) ∈ DCC(B)}. The relation DCC(B) is reflexive and symmetric but, in general, not transitive.
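A matching sketch for the characteristic sets J_B(x) of DCC(B), under the same assumptions; "*" marks a "do not care" condition, and the names are again illustrative.

DONT_CARE = "*"

TABLE3 = {
    1: {"Location": "good",    "Basement": "yes",     "Fireplace": "yes"},
    2: {"Location": "bad",     "Basement": DONT_CARE, "Fireplace": "no"},
    3: {"Location": "good",    "Basement": "no",      "Fireplace": DONT_CARE},
    4: {"Location": "bad",     "Basement": "yes",     "Fireplace": "no"},
    5: {"Location": DONT_CARE, "Basement": DONT_CARE, "Fireplace": "yes"},
}

def dcc_characteristic_set(table, B, x):
    # J_B(x): every case y whose value on each attribute of B equals that of x,
    # where "*" on either side matches any value.
    return {y for y, row in table.items()
            if all(row[a] == table[x][a] or DONT_CARE in (row[a], table[x][a])
                   for a in B)}

A = ["Location", "Basement", "Fireplace"]
print({x: dcc_characteristic_set(TABLE3, A, x) for x in TABLE3})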

Table 4. An example of an incompletely specified decision table in which some missing attribute values are lost and some are "do not care" conditions

Case | Location | Basement | Fireplace | Value
  1  | good     | yes      | yes       | high
  2  | bad      | ?        | no        | small
  3  | good     | no       | ?         | medium
  4  | bad      | yes      | no        | medium
  5  | *        | *        | yes       | medium

A characteristic relation R(B) on U for an incompletely specified decision table with both types of missing attribute values, lost values and "do not care" conditions, is defined as follows: (x, y) ∈ R(B) if and only if ρ(x, a) = ρ(y, a) or ρ(x, a) = * or ρ(y, a) = * for all a ∈ B such that ρ(x, a) ≠ ?, where x, y ∈ U and B is a nonempty subset of the set A of all attributes. For a case x, the characteristic relation R(B) may also be presented by its characteristic set K_B(x), where K_B(x) = {y | (x, y) ∈ R(B)}. The characteristic relations LV(B) and DCC(B) are special cases of the characteristic relation R(B). For a completely specified decision table, the characteristic relation R(B) reduces to IND(B). The characteristic relation R(B) is reflexive but, in general, does not need to be symmetric or transitive.
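A minimal sketch of K_B(x) for the general relation R(B) of Table 4, combining the two previous sketches: attributes on which x has a lost value are skipped, and "*" on either side matches anything. With only "?" it reduces to the LV computation, with only "*" to the DCC computation, and with no missing values to IND(B). Names are illustrative.

LOST, DONT_CARE = "?", "*"

TABLE4 = {
    1: {"Location": "good",    "Basement": "yes",     "Fireplace": "yes"},
    2: {"Location": "bad",     "Basement": LOST,      "Fireplace": "no"},
    3: {"Location": "good",    "Basement": "no",      "Fireplace": LOST},
    4: {"Location": "bad",     "Basement": "yes",     "Fireplace": "no"},
    5: {"Location": DONT_CARE, "Basement": DONT_CARE, "Fireplace": "yes"},
}

def characteristic_set(table, B, x):
    # K_B(x): skip attributes where x is lost ("?"); elsewhere require equal
    # values, with "*" on either side matching anything.
    return {y for y, row in table.items()
            if all(row[a] == table[x][a] or DONT_CARE in (row[a], table[x][a])
                   for a in B if table[x][a] != LOST)}

A = ["Location", "Basement", "Fireplace"]
print({x: characteristic_set(TABLE4, A, x) for x in TABLE4})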

Computing characteristic relations
The characteristic relation R(B) is known if we know the characteristic sets K_B(x) for all x ∈ U. For completely specified decision tables, if t = (a, v) is an attribute-value pair, then the block of t, denoted [t], is the set of all cases from U that have value v for attribute a. If for an attribute a there exists a case x such that ρ(x, a) = ?, then the case x is not included in the block [(a, v)] for any value v of attribute a. If for an attribute a there exists a case x such that ρ(x, a) = *, then the corresponding case x should be included in the blocks [(a, v)] for all values v of attribute a. The characteristic set K_B(x) is the intersection of the blocks of attribute-value pairs (a, v) for all attributes a from B for which ρ(x, a) is specified and ρ(x, a) = v.
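The block-based procedure described above can be sketched as follows (again an illustrative Python sketch over Table 4, not code from the paper); it should produce the same characteristic sets as the direct definition.

LOST, DONT_CARE = "?", "*"
A = ["Location", "Basement", "Fireplace"]

TABLE4 = {
    1: {"Location": "good",    "Basement": "yes",     "Fireplace": "yes"},
    2: {"Location": "bad",     "Basement": LOST,      "Fireplace": "no"},
    3: {"Location": "good",    "Basement": "no",      "Fireplace": LOST},
    4: {"Location": "bad",     "Basement": "yes",     "Fireplace": "no"},
    5: {"Location": DONT_CARE, "Basement": DONT_CARE, "Fireplace": "yes"},
}

def attribute_value_blocks(table, B):
    # [(a, v)]: cases whose value on a is v; a case with "*" on a joins every
    # block of a, a case with "?" on a joins none of them.
    values = {a: {row[a] for row in table.values()} - {LOST, DONT_CARE} for a in B}
    blocks = {(a, v): set() for a in B for v in values[a]}
    for x, row in table.items():
        for a in B:
            if row[a] == LOST:
                continue
            for v in (values[a] if row[a] == DONT_CARE else [row[a]]):
                blocks[(a, v)].add(x)
    return blocks

def characteristic_set_from_blocks(table, B, x, blocks):
    # Intersect the blocks [(a, v)] over the attributes a of B for which x has
    # a specified value v (neither "?" nor "*").
    K = set(table)
    for a in B:
        v = table[x][a]
        if v not in (LOST, DONT_CARE):
            K &= blocks[(a, v)]
    return K

blocks = attribute_value_blocks(TABLE4, A)
print({x: characteristic_set_from_blocks(TABLE4, A, x, blocks) for x in TABLE4})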

Lattice of characteristic relations
In this section all characteristic relations will be defined for the entire set A of attributes instead of its subset B, and we will write R instead of R(A). In characteristic sets K_A(x), the subscript A will be omitted. Two decision tables with the same set U of all cases, the same attribute set A, the same decision d, and the same specified attribute values will be called congruent. Two congruent decision tables may differ only in the missing attribute values * and ?. The decision tables presented in Tables 2, 3, and 4 are pairwise congruent. Two congruent decision tables that have the same characteristic relations will be called indistinguishable.

Table 5. Decision table indistinguishable from the decision table presented in Table 6

Case | Location | Basement | Fireplace | Value
  1  | good     | yes      | yes       | high
  2  | bad      | *        | no        | small
  3  | good     | no       | *         | medium
  4  | bad      | yes      | no        | medium
  5  | ?        | *        | yes       | medium

Table 6. Decision table indistinguishable from the decision table presented in Table 5

Case | Location | Basement | Fireplace | Value
  1  | good     | yes      | yes       | high
  2  | bad      | *        | no        | small
  3  | good     | no       | *         | medium
  4  | bad      | yes      | no        | medium
  5  | *        | ?        | yes       | medium

On the other hand, if the characteristic relations for two congruent decision tables are different, the decision tables will be called distinguishable. Obviously, there are 2^n congruent decision tables, where n is the total number of missing attribute values in a decision table. Let D_1 and D_2 be two congruent decision tables, let R_1 and R_2 be their characteristic relations, and let K_1(x) and K_2(x) be their characteristic sets for x ∈ U, respectively. We say that R_1 ⊆ R_2 if and only if K_1(x) ⊆ K_2(x) for all x ∈ U. For two congruent decision tables D_1 and D_2, R_1 ⊆ R_2 if for every missing attribute value "?" in D_2, say ρ_2(x, a) = ?, the corresponding missing attribute value in D_1 is also "?", i.e., ρ_1(x, a) = ?, where ρ_1 and ρ_2 are the functions defined by D_1 and D_2, respectively.

Two subsets of the set of all congruent decision tables are special: the set E of n decision tables such that every decision table from E has exactly one missing attribute value equal to "?" and all remaining missing attribute values equal to "*", and the set F of n decision tables such that every decision table from F has exactly one missing attribute value equal to "*" and all remaining missing attribute values equal to "?". In our example, the decision tables presented in Tables 5 and 6 belong to the set E. Let G be the set of all characteristic relations associated with the set E and let H be the set of all characteristic relations associated with the set F.

Let D and D' be two congruent decision tables with characteristic relations R and R', and with characteristic sets K(x) and K'(x), respectively, where x ∈ U. We define a characteristic relation R + R' as the relation defined by the characteristic sets K(x) ∪ K'(x), for x ∈ U, and a characteristic relation R · R' as the relation defined by the characteristic sets K(x) ∩ K'(x). The set of all characteristic relations for the set of all congruent tables, together with the operations + and ·, is a lattice L (i.e., the operations + and · satisfy the idempotent, commutative, associative, and absorption laws). Each characteristic relation from L can be represented, using the lattice operations + and ·, in terms of the characteristic relations from G (and, similarly, in terms of those from H). Thus G and H are sets of generators of L.
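A minimal sketch of the two lattice operations and of the enumeration of congruent tables, under the same assumed Python encoding: every missing position of the running example is filled with "?" or "*", the characteristic sets of R(A) are computed for each of the 2^n resulting tables, and the distinct characteristic relations are counted. All names are illustrative.

from itertools import product

LOST, DONT_CARE = "?", "*"
A = ["Location", "Basement", "Fireplace"]

SPECIFIED = {  # specified values shared by all congruent tables; None marks a missing position
    1: {"Location": "good", "Basement": "yes", "Fireplace": "yes"},
    2: {"Location": "bad",  "Basement": None,  "Fireplace": "no"},
    3: {"Location": "good", "Basement": "no",  "Fireplace": None},
    4: {"Location": "bad",  "Basement": "yes", "Fireplace": "no"},
    5: {"Location": None,   "Basement": None,  "Fireplace": "yes"},
}
MISSING = [(x, a) for x, row in SPECIFIED.items() for a in A if row[a] is None]

def characteristic_sets(table):
    # K(x) of the relation R(A), as in the earlier sketches.
    return {x: frozenset(y for y, row in table.items()
                         if all(row[a] == table[x][a] or DONT_CARE in (row[a], table[x][a])
                                for a in A if table[x][a] != LOST))
            for x in table}

def lattice_sum(K1, K2):        # R + R': union of the characteristic sets
    return {x: K1[x] | K2[x] for x in K1}

def lattice_product(K1, K2):    # R . R': intersection of the characteristic sets
    return {x: K1[x] & K2[x] for x in K1}

def fill(value):
    # Build the congruent table in which every missing position gets `value`.
    return {x: {a: (value if v is None else v) for a, v in row.items()}
            for x, row in SPECIFIED.items()}

K_lv, K_dcc = characteristic_sets(fill(LOST)), characteristic_sets(fill(DONT_CARE))
print(lattice_sum(K_lv, K_dcc))       # characteristic sets of LV(A) + DCC(A)
print(lattice_product(K_lv, K_dcc))   # characteristic sets of LV(A) . DCC(A)

relations = set()
for choice in product([LOST, DONT_CARE], repeat=len(MISSING)):
    table = {x: dict(row) for x, row in SPECIFIED.items()}
    for (x, a), v in zip(MISSING, choice):
        table[x][a] = v
    relations.add(frozenset(characteristic_sets(table).items()))
print(len(relations), "distinct characteristic relations among", 2 ** len(MISSING), "congruent tables")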

The diagram of the lattice of all characteristic relations

Lower and upper approximations
For completely specified decision tables lower and upper approximations are defined on the basis of the indiscernibility relation. The equivalence class of IND(B) containing x is denoted by [x]_B. Any finite union of elementary sets of B is called a B-definable set. Let U be the set of all cases, called the universe, and let X be any subset of U. The set X is called a concept and is usually defined as the set of all cases with a specific value of the decision. In general, X is not a B-definable set.

However, the set X may be approximated by two B-definable sets. The first one is called the B-lower approximation of X and is defined as {x ∈ U | [x]_B ⊆ X}. The second one is called the B-upper approximation of X and is defined as {x ∈ U | [x]_B ∩ X ≠ ∅}. The B-lower approximation of X is the greatest B-definable set contained in X. The B-upper approximation of X is the least B-definable set containing X. For incompletely specified decision tables lower and upper approximations may be defined in a few different ways.
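For the completely specified Table 1, these classical approximations can be sketched directly from the equivalence classes (illustrative Python under the same assumed encoding; the decision Value is included in the table here so that concepts can be read off).

TABLE1 = {
    1: {"Location": "good", "Basement": "yes", "Fireplace": "yes", "Value": "high"},
    2: {"Location": "bad",  "Basement": "no",  "Fireplace": "no",  "Value": "small"},
    3: {"Location": "good", "Basement": "no",  "Fireplace": "no",  "Value": "medium"},
    4: {"Location": "bad",  "Basement": "yes", "Fireplace": "no",  "Value": "medium"},
    5: {"Location": "good", "Basement": "no",  "Fireplace": "yes", "Value": "medium"},
}

def equivalence_class(table, B, x):
    # [x]_B: all cases indiscernible from x with respect to B.
    return {y for y, row in table.items() if all(row[a] == table[x][a] for a in B)}

def lower_upper(table, B, X):
    lower = {x for x in table if equivalence_class(table, B, x) <= X}
    upper = {x for x in table if equivalence_class(table, B, x) & X}
    return lower, upper

X = {x for x, row in TABLE1.items() if row["Value"] == "medium"}  # the concept "medium"
print(lower_upper(TABLE1, ["Location", "Basement"], X))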

Let X be a concept, let B be a subset of the set A of all attributes, and let R(B) be the characteristic relation of the incompletely specified decision table, with characteristic sets K_B(x), where x ∈ U. Our first definition uses an idea similar to that of previous articles on incompletely specified decision tables, i.e., lower and upper approximations are sets of singletons from the universe U satisfying some properties. We will call these definitions singleton. A singleton B-lower approximation of X is defined as follows: {x ∈ U | K_B(x) ⊆ X}. A singleton B-upper approximation of X is {x ∈ U | K_B(x) ∩ X ≠ ∅}.

The second definition uses another idea: lower and upper approximations are unions of characteristic sets, i.e., subsets of U. We will call these definitions subset. A subset B-lower approximation of X is defined as follows: ∪{K_B(x) | x ∈ U, K_B(x) ⊆ X}. A subset B-upper approximation of X is ∪{K_B(x) | x ∈ U, K_B(x) ∩ X ≠ ∅}.

The next possibility is to modify the subset definition of upper approximation by replacing the universe U from the previous definition by the concept X. A concept B-lower approximation of the concept X is defined as follows: ∪{K_B(x) | x ∈ X, K_B(x) ⊆ X}. Obviously, the subset B-lower approximation of X is the same set as the concept B-lower approximation of X. A concept B-upper approximation of the concept X is defined as follows: ∪{K_B(x) | x ∈ X, K_B(x) ∩ X ≠ ∅}.
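The three pairs of definitions can be compared on the running example with a small sketch (illustrative Python under the same assumptions; it recomputes the characteristic sets K_A(x) of Table 4 and then applies the singleton, subset, and concept definitions to the concept Value = medium, i.e., X = {3, 4, 5}).

LOST, DONT_CARE = "?", "*"
A = ["Location", "Basement", "Fireplace"]

TABLE4 = {
    1: {"Location": "good",    "Basement": "yes",     "Fireplace": "yes"},
    2: {"Location": "bad",     "Basement": LOST,      "Fireplace": "no"},
    3: {"Location": "good",    "Basement": "no",      "Fireplace": LOST},
    4: {"Location": "bad",     "Basement": "yes",     "Fireplace": "no"},
    5: {"Location": DONT_CARE, "Basement": DONT_CARE, "Fireplace": "yes"},
}

def characteristic_set(table, B, x):
    # K_B(x) of R(B), as in the earlier sketches.
    return {y for y, row in table.items()
            if all(row[a] == table[x][a] or DONT_CARE in (row[a], table[x][a])
                   for a in B if table[x][a] != LOST)}

K = {x: characteristic_set(TABLE4, A, x) for x in TABLE4}
X = {3, 4, 5}  # the concept Value = medium

def singleton(K, X):
    return {x for x in K if K[x] <= X}, {x for x in K if K[x] & X}

def subset(K, X):
    return (set().union(*(K[x] for x in K if K[x] <= X)),
            set().union(*(K[x] for x in K if K[x] & X)))

def concept(K, X):
    return (set().union(*(K[x] for x in X if K[x] <= X)),
            set().union(*(K[x] for x in X if K[x] & X)))

print("singleton lower/upper:", singleton(K, X))
print("subset lower/upper:   ", subset(K, X))
print("concept lower/upper:  ", concept(K, X))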

Some properties that hold for singleton lower and upper approximations do not hold, in general, for subset lower and upper approximations and for concept lower and upper approximations. For example, for congruent decision tables such as Tables 2 and 3, the singleton lower and upper approximations satisfy {x ∈ U | I_B(x) ⊆ X} ⊇ {x ∈ U | J_B(x) ⊆ X} and {x ∈ U | I_B(x) ∩ X ≠ ∅} ⊆ {x ∈ U | J_B(x) ∩ X ≠ ∅}, where I_B(x) is a characteristic set of LV(B) and J_B(x) is a characteristic set of DCC(B).

In our example, for the subset definition of the A-lower approximation, X = {3, 4, 5}, and the characteristic relation LV(A) (see Table 2), ∪{I_A(x) | I_A(x) ⊆ X} = {3, 4}, while for the subset definition of the A-lower approximation, X = {3, 4, 5}, and the characteristic relation DCC(A) (see Table 3), ∪{J_A(x) | J_A(x) ⊆ X} = {3, 5}, so neither the former set is a subset of the latter nor vice versa.
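This incomparability can be checked with a few lines (illustrative Python; the characteristic sets below are the ones produced by the earlier LV and DCC sketches for Tables 2 and 3).

I_A = {1: {1}, 2: {2, 4}, 3: {3}, 4: {4}, 5: {1, 5}}              # I_A(x) for Table 2
J_A = {1: {1, 5}, 2: {2, 4}, 3: {3, 5}, 4: {2, 4}, 5: {1, 3, 5}}  # J_A(x) for Table 3
X = {3, 4, 5}

lower_lv  = set().union(*(I_A[x] for x in I_A if I_A[x] <= X))   # subset lower approximation under LV(A)
lower_dcc = set().union(*(J_A[x] for x in J_A if J_A[x] <= X))   # subset lower approximation under DCC(A)
print(lower_lv, lower_dcc, lower_lv <= lower_dcc, lower_dcc <= lower_lv)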

Rule induction
For example, for Table 2, i.e., for the characteristic relation LV(A), the certain rules, induced from the concept lower A-approximations, are
(Location, good) & (Basement, yes) -> (Value, high),
(Basement, no) -> (Value, medium),
(Location, bad) & (Basement, yes) -> (Value, medium).
The possible rules, induced from the concept upper A-approximations, for the same characteristic relation LV(A) are
(Location, good) & (Basement, yes) -> (Value, high),
(Location, bad) -> (Value, small),
(Location, good) -> (Value, medium),
(Basement, yes) -> (Value, medium),
(Fireplace, yes) -> (Value, medium).
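A short sketch can verify the coverage of the certain rules listed above (illustrative Python; the attribute-value blocks are those of Table 2, with "?" cases excluded as described earlier, and the rule encoding is an assumption of this sketch).

BLOCKS = {  # attribute-value blocks of Table 2
    ("Location", "good"): {1, 3}, ("Location", "bad"): {2, 4},
    ("Basement", "yes"): {1, 4},  ("Basement", "no"): {3},
    ("Fireplace", "yes"): {1, 5}, ("Fireplace", "no"): {2, 4},
}

def coverage(conditions):
    # Cases covered by a rule: intersection of the blocks of its conditions.
    return set.intersection(*(BLOCKS[c] for c in conditions))

certain_rules = [
    ([("Location", "good"), ("Basement", "yes")], ("Value", "high")),
    ([("Basement", "no")], ("Value", "medium")),
    ([("Location", "bad"), ("Basement", "yes")], ("Value", "medium")),
]
for conditions, decision in certain_rules:
    print(conditions, "->", decision, "covers", coverage(conditions))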

For the attribute Basement from our example, we may introduce a special new value, say maybe, for case 2, and we may consider that the missing attribute value for case 5 should be no. Neither of these two cases falls into the category of lost values or "do not care" conditions. More specifically, for attribute Basement, the new blocks will be [(Basement, maybe)] = {2}, [(Basement, yes)] = {1, 4}, and [(Basement, no)] = {3, 5}.

Conclusions
The two existing approaches to missing attribute values, interpreting them either as lost values or as "do not care" conditions, are generalized by interpreting every missing attribute value separately as a lost value or as a "do not care" condition. Characteristic relations are introduced to describe incompletely specified decision tables. Lower and upper approximations for incompletely specified decision tables may be defined in a variety of different ways.