1 Johanna GOLD Rough Sets Theory Logical Analysis of Data. Monday, November 26, 2007

2 Introduction  Comparison of two theories for rule induction.  Different methodologies  Same results?

3  Set of objects described by attributes.  Each object belongs to a class.  We want decision rules. Generalities

4  There are two approaches:  Rough Sets Theory (RST)  Logical Analysis of Data (LAD)  Goal : compare them Approaches

5 Contents 1. Rough Sets Theory 2. Logical Analysis of Data 3. Comparison 4. Inconsistencies

6  Two examples having exactly the same values for all attributes, but belonging to two different classes.  Example: two sick people have the same symptoms but different diseases. Inconsistencies

7  RST doesn’t correct or aggregate inconsistencies.  For each class : determination of lower and upper approximations. Covered by RST

8  Lower: objects we are sure belong to the class.  Upper: objects that may belong to the class. Approximations

9  Lower approximation → certain rules  Upper approximation → possible rules Impact on rules

10  Rule induction directly on numerical data → poor rules → too many rules.  Pretreatment is needed. Pretreatment

11  Goal : convert numerical data into discrete data.  Principle : determination of cut points in order to divide domains into successive intervals. Discretization

12  First algorithm: LEM2  Improved algorithms:  Include the pretreatment  MLEM2, MODLEM, … Algorithms

13  Induction of certain rules from the lower approximation.  Induction of possible rules from the upper approximation.  Same procedure LEM2

14  For an attribute x and its value v, the block [(x,v)] of the attribute-value pair (x,v) is the set of all cases where attribute x has the value v.  Ex: [(Age,21)] = [Martha]; [(Age,22)] = [David; Audrey] Definitions (1)
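As an illustration, here is a minimal Python sketch of block computation over a toy table (the case names and ages come from the example above; the helper name is ours):

```python
# Minimal sketch: computing the blocks [(x, v)] of an attribute x.
from collections import defaultdict

cases = {
    "Martha": {"Age": 21},
    "David":  {"Age": 22},
    "Audrey": {"Age": 22},
}

def blocks_of(cases, attribute):
    """Group case names by the value of one attribute."""
    result = defaultdict(set)
    for name, attrs in cases.items():
        result[attrs[attribute]].add(name)
    return dict(result)

print(blocks_of(cases, "Age"))
# {21: {'Martha'}, 22: {'David', 'Audrey'}}
```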

15  Let B be a non-empty lower or upper approximation of a concept represented by a decision-value pair (d,w).  Ex : (level,middle)→B=[obj1 ; obj5 ; obj7] Definitions (2)

16  Let T be a set of attribute-value pairs (a,v).  Set B depends on set T if and only if ∅ ≠ [T] = ⋂(a,v)∈T [(a,v)] ⊆ B. Definitions (3)

17  A set T is a minimal complex of B if and only if B depends on T and no proper subset T’ of T is such that B depends on T’. Definitions (4)

18  Let 𝒯 be a non-empty collection of non-empty sets of attribute-value pairs.  𝒯 is a set of sets T.  Each T is a set of pairs (a,v). Definitions (5)

19  𝒯 is a local cover of B if and only if:  each member T of 𝒯 is a minimal complex of B,  ⋃T∈𝒯 [T] = B,  𝒯 is minimal (no collection with fewer members satisfies both conditions). Definitions (6)

20  LEM2’s output is a local cover for each approximation of the decision table concept.  It then converts them into decision rules. Algorithm principle

21 Algorithm

22 Among the possible blocks, we choose the one:  With the highest priority  With the highest intersection  With the smallest cardinality Heuristics details

23  As long as T is not a minimal complex, pairs are added.  As long as 𝒯 is not a local cover, minimal complexes are added. Heuristics details
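The algorithm slide itself was an image; the sketch below is one Python reading of the LEM2 loop just described, assuming blocks are precomputed as a dict mapping (attribute, value) pairs to sets of cases. The priority heuristic is omitted; candidates are ranked by intersection with the goal, then by smallest block cardinality.

```python
def lem2(blocks, B):
    """blocks: dict (attribute, value) -> set of cases.
    B: a lower or upper approximation of a concept (set of cases).
    Returns a local cover: a list of minimal complexes (lists of pairs)."""
    def covered(T):
        return set.intersection(*(blocks[t] for t in T))

    local_cover, G = [], set(B)
    while G:
        T = []
        while not T or not covered(T) <= B:
            candidates = [t for t in blocks if t not in T and blocks[t] & G]
            # heuristic: highest intersection with the goal G,
            # ties broken by the smallest block cardinality
            best = max(candidates,
                       key=lambda t: (len(blocks[t] & G), -len(blocks[t])))
            T.append(best)
            G = G & blocks[best]
        for t in list(T):                      # drop redundant pairs so T
            rest = [u for u in T if u != t]    # becomes a minimal complex
            if rest and covered(rest) <= B:
                T.remove(t)
        local_cover.append(T)
        G = set(B) - set().union(*(covered(C) for C in local_cover))
    return local_cover
```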

24  Illustration through an example.  We consider that the pretreatment has already been done. Illustration

25 Data set (attributes: Height, Hair; decision: Attraction)

Case  Height (cm)  Hair   Attraction
1     160          Blond  -
2     170          Blond  +
3     160          Red    +
4     180          Black  -
5     160          Black  -
6     170          Black  -

26  For the attribute Height, we have the values 160, 170 and 180.  The pretreatment gives us two cut points: 165 and 175. Cut points

27  [(Height, 160..165)]={1,3,5}  [(Height, 165..180)]={2,4}  [(Height, 160..175)]={1,2,3,5}  [(Height, 175..180)]={4}  [(Hair, Blond)]={1,2}  [(Hair, Red)]={3}  [(Hair, Black)]={4,5,6} Blocks [(a,v)]

28  G = B = [(Attraction,-)] = {1,4,5,6}  Here there are no inconsistencies. If there were, this is the point where we would have to choose between the lower and the upper approximation. First concept

29  Pairs (a,v) such that [(a,v)]∩[(Attraction,-)]≠Ø  (Height,160..165)  (Height,165..180)  (Height,160..175)  (Height,175..180)  (Hair,Blond)  (Hair,Black) Eligible pairs

30  We choose the most appropriate, i.e. the pair (a,v) for which | [(a,v)] ∩ [(Attraction,-)] | is the highest.  Here: (Hair, Black) Choice of a pair

31  The pair (Hair, Black) is a minimal complex because [(Hair,Black)] = {4,5,6} ⊆ B = {1,4,5,6}, and a single pair has no proper subset. Minimal complex

32  G = [(Attraction,-)] – [(Hair,Black)] = {1,4,5,6} – {4,5,6} = {1} New concept

33  The remaining case can be reached through the pairs (Height,160..165), (Height,160..175) and (Hair, Blond).  The intersections having the same cardinality, we choose the pair whose block has the smallest cardinality: (Hair, Blond) Choice of a pair (1)

34  Problem:  (Hair, Blond) is not a minimal complex on its own (it also covers case 2, which is outside the concept).  We add the next pair: (Height,160..165). Choice of a pair (2)

35  {(Hair, Blond),(Height,160..165)} is a second minimal complex. Minimal Complex

36  {{(Hair, Black)}, {(Hair, Blond), (Height, 160..165)}} is a local cover of [(Attraction,-)]. End of the concept
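For reference, running the LEM2 sketch given earlier on the blocks of this example reproduces exactly this local cover:

```python
blocks = {
    ("Height", "160..165"): {1, 3, 5},
    ("Height", "165..180"): {2, 4},
    ("Height", "160..175"): {1, 2, 3, 5},
    ("Height", "175..180"): {4},
    ("Hair", "Blond"):      {1, 2},
    ("Hair", "Red"):        {3},
    ("Hair", "Black"):      {4, 5, 6},
}
print(lem2(blocks, {1, 4, 5, 6}))
# [[('Hair', 'Black')], [('Hair', 'Blond'), ('Height', '160..165')]]
```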

37  (Hair, Red) → (Attraction,+)  (Hair, Blond) & (Height,165..180) → (Attraction,+)  (Hair, Black) → (Attraction,-)  (Hair, Blond) & (Height,160..165) → (Attraction,-) Rules

38 Contents 1. Rough Sets Theory 2. Logical Analysis of Data 3. Comparison 4. Inconsistencies

39  Works on binary data.  Extension of the Boolean approach to the non-binary case. Principle

40  Let S be the set of all observations.  Each observation is described by n attributes.  Each observation belongs to a class. Definitions (1)

41  The classification can be considered as a partition of the observations into two sets S+ and S-.  An archive (S+, S-) is represented by a Boolean function Φ:  Φ(p) = 1 if p ∈ S+  Φ(p) = 0 if p ∈ S- Definitions (2)

42  A literal is a Boolean variable or its negation: x or x̄.  A term is a conjunction of literals, e.g. T = x₁x̄₂x₃.  The degree of a term is its number of literals. Definitions (3)

43  A term T covers a point p if T(p)=1.  The characteristic term of a point p is the unique term of degree n covering p.  Ex: for p = (1,0,1), the characteristic term is x₁x̄₂x₃. Definitions (4)

44  A term T is an implicant of a Boolean function f if T(p) ≤ f(p) for every point p.  An implicant is called prime if it is minimal with respect to its degree: removing any literal destroys the implicant property. Definitions (5)

45  A positive pattern is a term covering at least one positive example and no negative example.  A negative pattern is a term covering at least one negative example and no positive example. Definitions (6)

46 Example  Six points (x₁x₂x₃):  Positive examples (lines 1-3): 110, 010, 101  Negative examples (lines 4-6): 100, 001, 000

47  x₁x₃ is a positive pattern:  there is no negative example such that x₁ = x₃ = 1,  there is one positive example: the 3rd line.  It's a positive prime pattern:  x₁ covers one negative example: 4th line,  x₃ covers one negative example: 5th line. Example

48  Symmetry between positive and negative patterns.  Two approaches:  Top-down  Bottom-up Pattern generation

49  We associate each positive example with its characteristic term → it is a pattern.  We take out literals one by one until we obtain a prime pattern. Top-down

50  We begin with terms of degree one:  if a term does not cover a negative example, it is a pattern,  if not, we add literals until we obtain a pattern. Bottom-up

51  We prefer short patterns → simplicity principle.  We also want to cover the maximum number of examples with a single model → globality principle.  Hence a hybrid bottom-up / top-down approach. Objectives

52 Hybrid approach  We fix a degree D.  We start with a bottom-up approach to generate the patterns of degree lower than or equal to D.  For all the points not covered by the first phase, we proceed with the top-down approach.
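Below is a hedged Python sketch of both generation directions (the enumeration in bottom_up is exponential and only illustrates the definitions; the class split of the earlier binary example is the assumed one):

```python
from itertools import combinations, product

def covers(term, point):
    """term: dict {variable index: required bit}."""
    return all(point[i] == b for i, b in term.items())

def is_pattern(term, pos, neg):
    """Covers at least one positive example and no negative one."""
    return (any(covers(term, p) for p in pos)
            and not any(covers(term, q) for q in neg))

def bottom_up(pos, neg, max_degree):
    """Phase 1 of the hybrid: all patterns of degree <= max_degree."""
    n = len(pos[0])
    return [dict(zip(idxs, bits))
            for d in range(1, max_degree + 1)
            for idxs in combinations(range(n), d)
            for bits in product((0, 1), repeat=d)
            if is_pattern(dict(zip(idxs, bits)), pos, neg)]

def top_down(point, pos, neg):
    """Phase 2: start from the characteristic term of an uncovered
    positive point and drop literals while it stays a pattern."""
    term = dict(enumerate(point))
    for i in list(term):
        trial = {j: b for j, b in term.items() if j != i}
        if trial and is_pattern(trial, pos, neg):
            term = trial
    return term

# With the earlier example (assuming lines 1-3 positive, 4-6 negative):
pos = [(1, 1, 0), (0, 1, 0), (1, 0, 1)]
neg = [(1, 0, 0), (0, 0, 1), (0, 0, 0)]
print(top_down((1, 0, 1), pos, neg))   # {0: 1, 2: 1}, i.e. x1 x3
```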

53  Extension from the binary case: binarization.  Two types of data:  quantitative: age, height, …  qualitative: color, shape, … Extension to the non-binary case

54  For each value v that a qualitative attribute x can take, we associate a Boolean variable b(x,v):  b(x,v) = 1 if x = v  b(x,v) = 0 otherwise Qualitative data

55  There are two types of associated variables:  Level variables  Interval variables Quantitative data

56  For each attribute x and each cut point t, we introduce a boolean variable b(x,t) :  b(x,t) = 1 if x ≥ t  b(x,t) = 0 if x < t Level variables

57  For each attribute x and each pair of cut points t’, t’’ (t’<t’’), we introduce a Boolean variable b(x,t’,t’’):  b(x,t’,t’’) = 1 if t’ ≤ x < t’’  b(x,t’,t’’) = 0 otherwise Interval variables
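The three kinds of variables can be sketched directly in Python (the helper names are ours, and the cut-point values used in the check are assumptions consistent with the tables that follow):

```python
from itertools import combinations

def binarize_qualitative(value, possible_values):
    """One indicator variable b(x,v) per possible value v."""
    return [1 if value == v else 0 for v in possible_values]

def binarize_numeric(value, cuts):
    """Level variables b(x,t) and interval variables b(x,t',t'')."""
    levels = [1 if value >= t else 0 for t in cuts]
    intervals = [1 if lo <= value < hi else 0
                 for lo, hi in combinations(cuts, 2)]
    return levels, intervals

print(binarize_qualitative("blue", ["green", "blue", "red"]))
# [0, 1, 0]
print(binarize_numeric(2, [1.5, 2.5, 3.5]))
# ([1, 0, 0], [1, 1, 0])  -> row c of the tables below
```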

58 Example

Obs  x₁  x₂     x₃   x₄
a    1   green  yes  31
b    4   blue   no   29
c    2   blue   yes  20
d    4   red    no   22
e    3   red    yes  20
f    2   green  no   14
g    4   green  no   17

59 Example: level variables for x₁

Obs  x₁  levels
a    1   000
b    4   111
c    2   100
d    4   111
e    3   110
f    2   100
g    4   111

60 Example: indicator variables for x₂ (green, blue, red)

Obs  x₂     indicators
a    green  100
b    blue   010
c    blue   010
d    red    001
e    red    001
f    green  100
g    green  100

61 Example: variable for x₃ (yes/no)

Obs  x₃   variable
a    yes  1
b    no   0
c    yes  1
d    no   0
e    yes  1
f    no   0
g    no   0

62 Example: level variables for x₄

Obs  x₄  levels
a    31  11
b    29  11
c    20  10
d    22  11
e    20  10
f    14  00
g    17  00

63 Example: interval variables for x₁

Obs  x₁  intervals
a    1   000
b    4   000
c    2   110
d    4   000
e    3   011
f    2   110
g    4   000

64 Example: interval variable for x₄

Obs  x₄  interval
a    31  0
b    29  0
c    20  1
d    22  0
e    20  1
f    14  0
g    17  0

65 Example: full binarized archive (13 binary attributes)

a  0001001110000
b  1110100110000
c  1000101101101
d  1110010110000
e  1100011100111
f  1001000001100
g  1111000000000

66  A set of binary attributes is called a supporting set if the archive obtained by eliminating all the other attributes remains "contradiction-free".  A supporting set is irredundant if no subset of it is a supporting set. Supporting set

67  We associate to each attribute i a binary variable yᵢ such that yᵢ = 1 if the attribute belongs to the supporting set.  Application: observations a and e differ on attributes 1, 2, 4, 6, 9, 11, 12 and 13, which gives the constraint y₁+y₂+y₄+y₆+y₉+y₁₁+y₁₂+y₁₃ ≥ 1. Variables

68  We do the same for all pairs of true and false observations.  Among the (possibly exponential number of) supporting sets, we choose the smallest one: minimize Σᵢ yᵢ subject to the constraints above, yᵢ ∈ {0,1}. Linear program
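As a sketch, the smallest supporting set can be found by brute force over the covering constraints (a real implementation would solve the 0-1 set-covering program instead; the split into positive and negative observations is an input the slides leave implicit):

```python
from itertools import combinations

def smallest_supporting_set(positives, negatives):
    """positives, negatives: lists of equal-length 0/1 tuples.
    Returns a smallest set of attribute indices (0-based) such that
    every positive observation still differs from every negative one."""
    n = len(positives[0])
    # one covering constraint per (positive, negative) pair: at least
    # one attribute on which the two observations differ must be kept
    constraints = [{i for i in range(n) if p[i] != q[i]}
                   for p in positives for q in negatives]
    if any(not c for c in constraints):
        return None            # identical points in both classes
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            if all(set(subset) & c for c in constraints):
                return set(subset)
```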

69  Positive patterns :   Negative patterns :  Solution of our example

70 Contents 1. Rough Sets Theory 2. Logical Analysis of Data 3. Comparison 4. Inconsistencies

71  LAD is more flexible than RST.  Linear program → modification of parameters. Basic idea

72  RST: pairs (attribute, value)  LAD: binary variables  Correspondence? Comparison blocks / variables

73  For an attribute a taking the values v₁, …, vₖ, the RST block [(a,vᵢ)] corresponds to the LAD binary variable b(a,vᵢ) = 1. Qualitative data

74  Discretization: convert numerical data into discrete data.  Principle: determination of cut points in order to divide domains into successive intervals. Quantitative data

75  RST: for each cut point t on an attribute x, we have two blocks: [(x, min..t)] and [(x, t..max)]. Quantitative data

76  LAD: for each cut point t, we have a level variable: b(x,t) = 1 if x ≥ t, 0 otherwise. Quantitative data

77  LAD: for each pair of cut points t’ < t’’, we have an interval variable: b(x,t’,t’’) = 1 if t’ ≤ x < t’’. Quantitative data

78  Correspondence for a level variable:  b(x,t) = 1 ↔ [(x, t..max)]  b(x,t) = 0 ↔ [(x, min..t)] Quantitative data

79  Correspondence for an interval variable:  b(x,t’,t’’) = 1 ↔ [(x, t’..t’’)]

80  Three parameters can change:  the right-hand side of the constraints,  the coefficients of the objective function,  the coefficients of the left-hand side of the constraints. Variation of LP parameters

81  We try to adapt the three heuristics :  The highest priority  The highest intersection with the concept  The smallest cardinality Heuristics adaptation

82  Priority on blocks → priority on attributes.  Introduced as weights in the objective function.  Minimization: pairs with the highest priorities are chosen first. The highest priority

83  Problem: in LAD there is no notion of concept; everything is done symmetrically, at the same time. The highest intersection

84  Modification of the heuristic: difference between the intersection with one concept and the intersection with the other.  The higher, the better. The highest intersection

85  Goal of RST: find minimal complexes:  find blocks covering the most examples of the concept: highest possible intersection with the concept,  find blocks covering the fewest examples of the other concept: difference of intersections. The highest intersection

86  For LAD: difference between the number of times a variable takes the value 1 in S+ and in S-.  Introduced as weights in the constraints: we choose first the variable with the highest difference. The highest intersection

87  Simple: the number of times a variable takes the value 1.  Introduced as a weight in the constraints. The smallest cardinality

88  Two calculations to be introduced:  the highest difference,  the smallest cardinality,  the difference of the two calculations. Weight of the constraints
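One plausible reading of these calculations as a single per-variable weight, sketched in Python; the exact combination is an assumption, not something the slides state:

```python
def variable_weights(positives, negatives):
    """For each binary attribute: (difference of 1-counts between the
    two classes) minus (total 1-count, i.e. the cardinality).
    This combination is an assumed interpretation of the slide."""
    n = len(positives[0])
    weights = []
    for i in range(n):
        ones_pos = sum(p[i] for p in positives)
        ones_neg = sum(q[i] for q in negatives)
        weights.append(abs(ones_pos - ones_neg) - (ones_pos + ones_neg))
    return weights
```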

89  Before: every coefficient is 1.  Problem: modifying the weights of the left-hand side alone has no meaning. Right hand side of the constraints

90  Average of the weights compared to the number of attributes.  Average of the weights in each constraint.  Drawback: no real meaning. Ideas of modification

91  Do not touch the weights in the constraints: introduce everything into the coefficients of the objective function. Ideas of modification

92 Contents 1. Rough Sets Theory 2. Logical Analysis of Data 3. Comparison 4. Inconsistencies

93  Use of two approximations: lower and upper.  Rule generation: certain and possible. For RST

94  Classification mistakes: a positive point classified as negative, or the other way around.  Two different cases. For LAD

95  All other points are well classified: our point will not be covered.  If the number of non-covered points is high: generation of longer patterns.  If this number is small: erroneous classification, and we forget these points in the following. Pos. point classified as neg.

96  Terms covering a lot of positive points also cover some negative points.  Those are probably wrongly classified: they are not taken into account for the evaluation of candidate terms. Neg. point classified as pos.

97  We introduce a ratio.  A term remains a candidate if the ratio between the negative and positive points it covers is smaller than a fixed threshold. Ratio
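A minimal sketch of this test, reusing the covers helper from the pattern-generation sketch; the threshold epsilon is an assumption:

```python
def still_candidate(term, pos, neg, epsilon=0.1):
    """Keep a term while its negative/positive coverage ratio is small."""
    covered_pos = sum(1 for p in pos if covers(term, p))
    covered_neg = sum(1 for q in neg if covers(term, q))
    return covered_pos > 0 and covered_neg / covered_pos < epsilon
```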

98  An inconsistency can be considered as a classification mistake.  Inconsistency: two "identical" objects classified differently.  One of them is wrongly classified (approximations). Inconsistencies and mistakes

99  Let us consider an inconsistency in LAD:  two identical points,  two classes.  There are two possibilities:  the point is not covered by small-degree patterns,  the point is covered by the patterns of the other class. Equivalence?

100  We have only one inconsistency.  The covered point is isolated; it is not taken into account.  The patterns of its class will be generated without the inconsistent point → lower approximation. 1st case

101  A point covered by the other concept's patterns is wrongly classified.  It is not taken into account for the candidate terms.  It is not taken into account for the pattern generation of its own class → lower approximation. 2nd case

102  Not taken into account for its own class, but not a problem for the other class.  For the other class: upper approximation. 2nd case

103  According to a ratio, LAD decides whether a point is well classified or not.  For an inconsistency, this is the same as considering:  the upper approximation of one class,  the lower approximation of the other.  With more than one inconsistency: we re-classify the points. Equivalence?

104 Conclusion  Complete data: we can try to match LAD and RST.  Inconsistencies: the classification mistakes of LAD can correspond to the approximations.  Missing data: handled differently by the two methods.

105  Jerzy W. Grzymala-Busse, MLEM2 - Discretization During Rule Induction, Proceedings of the IIPWM'2003, International Conference on Intelligent Information Processing and WEB Mining Systems, Zakopane, Poland, June 2-5, 2003, 499-508, Springer-Verlag.  Jerzy W. Grzymala-Busse, Jerzy Stefanowski, Three Discretization Methods for Rule Induction, International Journal of Intelligent Systems, 2001.  Endre Boros, Peter L. Hammer, Toshihide Ibaraki, Alexander Kogan, Eddy Mayoraz, Ilya Muchnik, An Implementation of Logical Analysis of Data, Rutcor Research Report 22-96, 1996. Sources (1)

106  Endre Boros, Peter L. Hammer, Toshihide Ibaraki, Alexander Kogan, Logical Analysis of Numerical Data, Rutcor Research Report 04-97, 1997.  Jerzy W. Grzymala-Busse, Rough Set Strategies to Data with Missing Attribute Values, Proceedings of the Workshop on Foundations and New Directions in Data Mining, Melbourne, FL, USA, 2003.  Jerzy W. Grzymala-Busse, Sachin Siddhaye, Rough Set Approaches to Rule Induction from Incomplete Data, Proceedings of the IPMU'2004, the 10th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Perugia, Italy, July 4, 2004, 2: 923-930. Sources (2)

107  Jerzy Stefanowski, Daniel Vanderpooten, Induction of Decision Rules in Classification and Discovery-Oriented Perspectives, International Journal of Intelligent Systems, 16 (1), 2001, 13-28.  Jerzy Stefanowski, The Rough Set based Rule Induction Technique for Classification Problems, Proceedings of the 6th European Conference on Intelligent Techniques and Soft Computing EUFIT 98, Aachen, 7-10 Sept. 1998, 109-113.  Roman Slowinski, Jerzy Stefanowski, Salvatore Greco, Benedetto Matarazzo, Rough Sets Processing of Inconsistent Information in Decision Analysis, Control and Cybernetics 29, 379-404, 2000. Sources (3)

