CLASS INHERITANCE TREE (CIT)
A New Efficient Association Rules Mining Method Using CLASS INHERITANCE TREE (CIT)
Speaker: Tzu-Chuen Lu
Good afternoon, my dear teacher and classmates. Today I am very pleased to present my research paper, which will be published in the International Conference of Information Management, 2001. The paper title is "A New Efficient Association Rules Mining Method Using Class Inheritance Tree."
Outline
Association rules mining
Rough set theory
Problems of rough set
A new method: CIT
Conclusions

Here is the outline of this paper. First I will describe what association rules mining means, and then introduce one association rules mining method, rough set theory. Rough set theory has some problems, so we propose a new method that improves on it and reduces the time and space needed to mine association rules from a database. Finally, I give our conclusions.
ASSOCIATION RULES MINING
Data mining tasks: clustering, classification, association rules, ...
Association rule: X → Y, where X is a cause statement and Y is a result statement. For example, in retail, the user can figure out which items are most frequently sold together.
Milk → Bread (confidence 80%)

As we all know, data mining contains several tasks, such as clustering, classification, association rules, prediction, and so on. This paper focuses on the association rules mining task. An association rule is written as X → Y, where X is a cause statement and Y is a result statement; the rule describes a condition that appears in the database. For example, a retailer can use a mining algorithm to analyze its database and find interesting rules, such as which items are most frequently sold together. The rule on this slide means that about 80% of customers who put milk in their basket will also take bread. Mining association rules has attracted a significant number of researchers. One such data mining algorithm is rough set theory, which approximates a concept using the lower and upper sets of the concept.
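To make the confidence figure concrete, here is a minimal sketch of how rule confidence can be computed. The five transactions are hypothetical, chosen only so that Milk → Bread comes out at 80%; nothing here comes from the paper itself.

```python
# Hypothetical market-basket data: four baskets with milk and bread, one with milk only.
transactions = [
    {"milk", "bread"}, {"milk", "bread"}, {"milk", "bread"},
    {"milk", "bread"}, {"milk"},
]

def confidence(cause, result, transactions):
    # confidence(X -> Y) = |baskets containing X and Y| / |baskets containing X|
    with_cause = [t for t in transactions if cause <= t]
    with_both = [t for t in with_cause if result <= t]
    return len(with_both) / len(with_cause)

print(confidence({"milk"}, {"bread"}, transactions))  # 0.8, i.e. 80%
```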
Rough Set Theory: Equivalence Classes
High blood pressure table (values reconstructed from the equivalence classes below):

Patient  Systolic pressure (SP)  Diastolic pressure (DP)  High blood pressure (HBP)
P(1)     N                       N                        N
P(2)     N                       H                        H
P(3)     N                       H                        N
P(4)     L                       L                        L
P(5)     H                       H                        H
P(6)     H                       H                        H
P(7)     L                       N                        L

Cause equivalence classes:
U/{SP} = {{P(1), P(2), P(3)}, {P(5), P(6)}, {P(4), P(7)}}
U/{DP} = {{P(1), P(7)}, {P(2), P(3), P(5), P(6)}, {P(4)}}

Result equivalence classes:
HBP = H = {P(2), P(5), P(6)}
HBP = N = {P(1), P(3)}
HBP = L = {P(4), P(7)}

Here is a high blood pressure table that records each patient's systolic pressure and diastolic pressure; the column HBP records whether the patient has high blood pressure. For example, patient 1 has normal SP and normal DP, so his HBP is normal; patient 2 has normal SP and high DP, so his HBP is high. We use rough set theory to find association rules in this table. The first concept of rough set theory is the equivalence class: a set of objects that share the same value of some attribute. For instance, patients 1, 2, and 3 all have normal SP, so the equivalence class "SP = N" contains patients 1 to 3; another equivalence class contains the patients whose SP is high, and so on. We also create equivalence classes from HBP. Since we want to know which attributes give a patient high blood pressure, we treat the classes of the attribute HBP as the result equivalence classes and all the others as cause equivalence classes, then match each result equivalence class against the cause equivalence classes to find the association rules.
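As a sketch of this equivalence-class step, the short Python below reproduces the classes above from the reconstructed table; the helper name equivalence_classes is my own, not from the paper.

```python
# Blood-pressure table from the slide, one row per patient.
table = {
    "P1": {"SP": "N", "DP": "N", "HBP": "N"},
    "P2": {"SP": "N", "DP": "H", "HBP": "H"},
    "P3": {"SP": "N", "DP": "H", "HBP": "N"},
    "P4": {"SP": "L", "DP": "L", "HBP": "L"},
    "P5": {"SP": "H", "DP": "H", "HBP": "H"},
    "P6": {"SP": "H", "DP": "H", "HBP": "H"},
    "P7": {"SP": "L", "DP": "N", "HBP": "L"},
}

def equivalence_classes(table, attribute):
    """Group objects that share the same value of the given attribute."""
    classes = {}
    for obj, row in table.items():
        classes.setdefault(row[attribute], set()).add(obj)
    return classes

print(equivalence_classes(table, "SP"))
# {'N': {'P1', 'P2', 'P3'}, 'L': {'P4', 'P7'}, 'H': {'P5', 'P6'}}
```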
Rough Set Theory (Cont.)
Rough set theory contains two kinds of rules.
First: lower approximation rule. The set of objects in a cause equivalence class is fully contained in the set of objects in a result equivalence class. Presented as: IF C = 1 THEN R = 1, confidence 100%.

[Figure: the cause equivalence class C = 1 = {obj1, obj2, obj4} lies entirely inside the result equivalence class R = 1 = {obj1, obj2, obj3, obj4}.]

The first kind is the lower approximation rule. It means the set of objects in a cause equivalence class is fully contained in the set of objects in a result equivalence class, and we present the rule as "IF C = 1 THEN R = 1" with 100% confidence.
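Full containment is just a subset test; a minimal sketch using the sets from the figure:

```python
# Lower approximation: the cause class is fully contained in the result class.
cause = {"obj1", "obj2", "obj4"}           # C = 1
result = {"obj1", "obj2", "obj3", "obj4"}  # R = 1
if cause <= result:                        # subset test: fully contained
    print("IF C = 1 THEN R = 1 (confidence 100%)")
```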
Rough Set Theory (Cont.)
Second: upper approximation rule. The set of objects in a cause equivalence class is only partially contained in the set of objects in a result equivalence class. Presented as: IF C = 2 THEN R = 1, confidence 50%.

[Figure: the cause equivalence class C = 2 = {obj3, obj4, obj5, obj6} overlaps the result equivalence class R = 1 = {obj1, obj2, obj3, obj4} in {obj3, obj4}.]

The second kind is the upper approximation rule. It means the set of objects in a cause equivalence class is partially contained in the set of objects in a result equivalence class, and we present the rule as "IF C = 2 THEN R = 1" with 50% confidence.
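The confidence of an upper approximation rule is the fraction of the cause class that falls inside the result class; a sketch with the figure's sets:

```python
# Upper approximation: only part of the cause class falls in the result class,
# so the rule's confidence is |cause ∩ result| / |cause|.
cause = {"obj3", "obj4", "obj5", "obj6"}   # C = 2
result = {"obj1", "obj2", "obj3", "obj4"}  # R = 1
overlap = cause & result                   # {obj3, obj4}
if overlap and not cause <= result:        # partial containment only
    print(f"IF C = 2 THEN R = 1 (confidence {len(overlap) / len(cause):.0%})")  # 50%
```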
Rough Set Theory (Cont.)
HBP = H: {P(2), P(5), P(6)}
U/{SP = N & DP = H} = {P(2), P(3)}
Lower approximation rule:
If SP = H Then HBP = H (1/1)
Upper approximation rules:
If SP = N Then HBP = H (1/3)
If DP = H Then HBP = H (3/4)
Combinatorial approximation rule:
If SP = N and DP = H Then HBP = H (1/2)

We use the lower and upper approximation concepts to create rules. The cause equivalence classes are shown on the slide; the first one means that two patients have high SP. Next we analyze the first result equivalence class, HBP = H, which contains three objects: P2, P5, and P6. The first step is to compare this result equivalence class with each cause equivalence class. The first cause equivalence class has two objects, P5 and P6, and all of them are contained in the result equivalence class, so we create the lower approximation rule "If SP = H Then HBP = H" with 100% confidence. We then compare the next class, SP = N. Only one of its objects is contained in the result equivalence class, so we create the upper approximation rule "If SP = N Then HBP = H" with confidence 1/3. The next cause equivalence class has no object contained in the result equivalence class, so we skip it and move on, continuing until all of the cause equivalence classes are finished. In this way we find all of the rules with a single attribute. But cause equivalence classes are also related to each other, so we combine two or more equivalence classes to create new combinatorial classes. For instance, combining SP = N with DP = H creates a new cause equivalence class containing the two objects P2 and P3 that appear in both classes; comparing it with the result equivalence classes yields the new combinatorial approximation rule "If SP = N and DP = H Then HBP = H" with confidence 1/2.
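Putting the pieces together, the sketch below (reusing the hypothetical table and equivalence_classes helper from the earlier snippet) matches every cause class against HBP = H and reproduces the slide's rules; confidences are printed as raw fractions, so the SP = H rule appears as 2/2 rather than 1/1.

```python
result = equivalence_classes(table, "HBP")["H"]           # {P2, P5, P6}

for attr in ("SP", "DP"):
    for value, cause in equivalence_classes(table, attr).items():
        overlap = cause & result
        if not overlap:
            continue                                      # no rule from this class
        kind = "lower" if cause <= result else "upper"
        print(f"If {attr} = {value} Then HBP = H ({len(overlap)}/{len(cause)}, {kind})")

# Combinatorial class: objects in both SP = N and DP = H, i.e. {P2, P3}.
combo = equivalence_classes(table, "SP")["N"] & equivalence_classes(table, "DP")["H"]
print(f"If SP = N and DP = H Then HBP = H ({len(combo & result)}/{len(combo)})")
```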
Rough Set Theory (Cont.)
Time: It analyzes all of the attributes in the table and maps each cause equivalence class onto each result equivalence class, which takes a lot of time.
Space: Relationships exist across different attributes. To find the combinatorial rules, it joins every two equivalence classes into new combinatorial equivalence classes. A great number of attributes and values generates more and more equivalence classes, but only a few combinatorial equivalence classes have rules inherent in them, so disk space is wasted.

There are two problems with using rough set theory to find association rules. The first is time: rough set analyzes all of the attributes in the table and maps each cause equivalence class onto each result equivalence class, which takes a lot of time to match classes. The second is space: since relationships exist across different attributes, finding the combinatorial rules requires joining every two equivalence classes into new combinatorial equivalence classes, and with many attributes and values this generates more and more classes while only a few of them contain rules, wasting disk space. We therefore develop a more efficient mining algorithm, the Class Inheritance Tree (CIT), for mining association rules from databases. It is based on the rough set concept and improves the mining algorithm: the new algorithm both raises mining speed and reduces the search space.
Class Inheritance Tree (CIT)
Cause attributes: A–F. Result attributes: X–Y.
Here is a table. Assume the attributes A through F are cause attributes, and the attributes X and Y are result attributes.
New method: Class Inheritance Tree (CIT), which reduces both time and space.
Class Inheritance Tree (CIT)
Cause equivalence classes. Result equivalence classes.
First we create the cause equivalence classes and the result equivalence classes from the table. Then we use the cause equivalence classes to build the CIT structure.
Class Inheritance Tree (CIT)
Step 1: insert attribute 'A'.

[Figure: CIT after step 1. N1 = {1, 10} (A1) with N2 = {1} and N3 = {10}; N4 = {4, 8, 9} (A2) with N5 = {4}, N6 = {8}, N7 = {9}; N8 = {2, 7} (A3) with N9 = {2}, N10 = {7}; N11 = {3, 5, 6} (A4) with N12 = {3}, N13 = {5}, N14 = {6}.]

Step one inserts the equivalence classes of attribute "A". We insert the first equivalence class, A1, which contains the two objects 1 and 10. Because no node exists yet in the CIT, we create node 1 holding objects 1 and 10 and carrying attribute A1, then create a parent node 2 with the first object 1, and a third node with object 10 that is relative-linked to node 1. The next equivalence class, A2, has the three objects 4, 8, and 9, so we likewise create a new node holding objects 4, 8, and 9 with attribute A2, and nodes 6 and 7 relative-linked to node 4. We continue inserting each equivalence class of attribute A in the same way.
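The slides do not spell out the insertion algorithm in full, so the following is only a much-simplified sketch under one reading of them: classes are grouped into subtrees keyed by their first object, a class whose object list already exists simply gains another attribute label, and any other class becomes a brother node in its subtree. The single-object nodes (N2, N3, ...) and the relative links from the figure are omitted, and names like CITNode and insert are mine, not the paper's.

```python
class CITNode:
    def __init__(self, objects):
        self.objects = objects            # ordered tuple of objects, e.g. (1, 10)
        self.labels = []                  # attribute-value labels stored at this node

cit = {}                                  # first object -> list of brother nodes

def insert(cit, label, objects):
    subtree = cit.setdefault(objects[0], [])
    for node in subtree:                  # object list already present:
        if node.objects == objects:       # just attach the extra label
            node.labels.append(label)
            return node
    node = CITNode(objects)               # otherwise create a new brother node
    node.labels.append(label)
    subtree.append(node)
    return node

# Step 1: the four equivalence classes of attribute A from the slide.
insert(cit, "A1", (1, 10))
insert(cit, "A2", (4, 8, 9))
insert(cit, "A3", (2, 7))
insert(cit, "A4", (3, 5, 6))
```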
Class Inheritance Tree (CIT)
Step 2: insert attribute 'B'.

[Figure: CIT after step 2. Node N8 = {2, 7} now carries both A3 and B1; N15 = {1, 3, 10} (B2) is a brother of N1 = {1, 10} (A1); N16 = {4, 6, 9} (B3) is a brother of N4 = {4, 8, 9} (A2); N17 = {5, 8} (B4) is new.]

Next we insert the equivalence classes of attribute "B". The first equivalence class, B1, holds objects 2 and 7. A node starting with object 2 already exists, and the objects in B1 are exactly the same as those in node 8, so we insert B1 into node 8, which now contains two equivalence classes. Equivalence class B2 has the three objects 1, 3, and 10. Node 2 starts with the same object as B2, and the second object of B2 (3) is smaller than the second object of node 1 (10), so we create a new node 15 with objects 1, 3, and 10 as a brother node of node 1. We then continue inserting the remaining equivalence classes until all of the attributes are finished.
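Continuing the sketch above, step 2 exercises both behaviours of the hypothetical insert helper:

```python
# Step 2: attribute B. B1 = {2, 7} matches the node already holding A3, so that
# node simply gains the B1 label; the other classes become new (brother) nodes.
insert(cit, "B1", (2, 7))
insert(cit, "B2", (1, 3, 10))   # brother of the A1 node in subtree 1
insert(cit, "B3", (4, 6, 9))    # brother of the A2 node in subtree 4
insert(cit, "B4", (5, 8))
```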
Y3 = {1, 4, 6, 8, 9}
Lower approximation rules:
If B = 3 Then Y = 3 (1/1)
If A = 2 Then Y = 3 (1/1)
If C = 1 Then Y = 3 (1/1)
If D = 1 Then Y = 3 (1/1)

After all of the cause equivalence classes have been inserted, the CIT structure is complete, and we use it to create association rules. For example, take the result equivalence class Y3. Traditional rough set compares Y3 with all of the cause equivalence classes; in the CIT we search only the subtrees related to the result equivalence class. First we find the lower approximation rules, starting with the first object, 1. Node 11 holds object 1, so we compare Y3 with the child nodes and brother nodes under node 11. The first child, node 20, is not fully contained in Y3, and its second object, 2, is smaller than Y3's second object, 4, so we go to the next brother node. Node 12 also cannot be fully contained in Y3, so we go on again. But the second object of node 19 is 7, which is larger than 4, so we need not visit any further brother nodes. We cannot find any lower approximation rule in the first subtree, so we go to subtree 4, where node 15 is fully contained in Y3 and we create the rule "If B = 3 Then Y = 3" with 100% confidence; the other rules are found in its brother and child nodes. Subtree 8 has no child nodes, so we go to subtree 9, where the rule "If D = 1 Then Y = 3" is created.
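The pruning argument can be sketched on the simplified structure built above: a cause class can only be fully contained in Y3 if its first object belongs to Y3, so every other subtree is skipped without a single comparison. Labels print as "B3" rather than "B = 3", and the C and D classes were never inserted in the sketch, so only the A and B rules appear.

```python
Y3 = {1, 4, 6, 8, 9}

# Lower-rule search: visit only the subtrees whose root object is in Y3.
for first_obj in sorted(Y3):
    for node in cit.get(first_obj, []):
        if set(node.objects) <= Y3:        # fully contained: lower rule
            for label in node.labels:
                print(f"If {label} Then Y = 3 (1/1)")
# Prints the rules for A2 and B3, matching the slide.
```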
Y3 = {1, 4, 6, 8, 9}
Upper approximation rules:
If D = 3 Then Y = 3 (2/4)
If B = 2 Then Y = 3 (1/3)
Relative rules:
If B = 4 Then Y = 3 (1/2)
If D = 4 Then Y = 3 (2/3)

Next we find the upper approximation rules, again starting with subtree 1. All of the nodes in subtree 1 are partially contained in Y3; for instance, node 20 holds objects 1 and 6, so we create the upper rule "If D = 3 Then Y = 3" with 50% confidence. We continue through all of the subtrees. There is one special point when finding upper approximation rules: node 4, which holds object 8, has no child nodes, so we cannot create any rule from it directly. But node 15 shares object 8 with node 4, which gives an upper approximation rule; because node 4 has a relative link to node 15, we can create the new relative rule "If B = 4 Then Y = 3". We scan only part of each subtree, which reduces both the comparison time and the search space.
Class Inheritance Tree (CIT)
Another improvement in this paper is combinatorial rules mining. Because nodes carry relative links, we combine only the nodes that are related to each other when creating new equivalence classes. For example, node 5 has relative links to node 15 and node 3, and the three nodes share the common object 9, so we combine only those nodes into new classes. The first new equivalence class combines D1 and B3 and holds object 9; the others combine D1 with A2, B3 with A2, and so on. We need not search all of the equivalence classes, which reduces both the combining time and the space needed to store the combinatorial equivalence classes.
Combine class 1: (D1 & B3) = {9}
Combine class 2: (D1 & A2) = {9}
Combine class 3: (B3 & A2) = {4, 9}
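Sketched as plain set intersections, only classes that share an object (i.e. are relative-linked) get joined. D1's full membership is not shown on the slide, so {9} below is an assumption consistent with the intersections given there.

```python
# The three relative-linked classes from the slide; D1 = {9} is assumed.
D1, B3, A2 = {9}, {4, 6, 9}, {4, 8, 9}

pairs = [("D1 & B3", D1, B3), ("D1 & A2", D1, A2), ("B3 & A2", B3, A2)]
for name, x, y in pairs:
    if x & y:                              # relative-linked: they share an object
        print(f"Combine ({name}) = {sorted(x & y)}")
# Combine (D1 & B3) = [9]; (D1 & A2) = [9]; (B3 & A2) = [4, 9]
```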
Conclusions
A CIT structure for association rules mining
Applying it to supply chain distributed database mining
Applying it to object-oriented database mining