CLASS INHERITANCE TREE (CIT)

Presentation transcript:

CLASS INHERITANCE TREE (CIT) A New Efficient Association Rules Mining Method Using a Class Inheritance Tree (CIT). Speaker: Tzu-Chuen Lu. Good afternoon, dear teacher and classmates. Today it is my pleasure to present my research paper. The paper will be published in the International Conference on Information Management, 2001. Its title is "A New Efficient Association Rules Mining Method Using a Class Inheritance Tree."

Outline: Association rules mining; Rough set theory; Problems of rough set theory; A new method: CIT; Conclusions. Here is the outline of this paper. First I will describe what association rules mining means, and then introduce one association rules mining method, rough set theory. Rough set theory has some problems, so we propose a new method that improves on it, reducing the time and space needed for mining association rules from a database. Finally, I give our conclusions.

ASSOCIATION RULES MINING. Data mining tasks: clustering, classification, association rules, and more. An association rule is written X → Y, where X is a cause statement and Y is a result statement. For example, in a retail store the user can figure out which items are most frequently sold together: Milk → Bread (confidence 80%). As we all know, data mining comprises several tasks, such as clustering, classification, association rules, prediction, and so on. This paper focuses on the association rules mining task. The rule X → Y describes a condition that appears in the database. For example, a retailer can run a mining algorithm over its database to find interesting rules, such as which items are most frequently sold together. The rule on this slide means that about 80% of the customers who put milk in their basket also take bread. Mining association rules has attracted a significant number of researchers. One mining approach is rough set theory, which approximates a concept using the lower and upper sets of that concept.
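To make the confidence figure concrete, here is a minimal Python sketch (with made-up transactions, not data from the paper) that computes the confidence of Milk → Bread as the fraction of milk-containing baskets that also contain bread:

    transactions = [
        {"milk", "bread"}, {"milk", "bread", "eggs"}, {"milk", "bread"},
        {"milk", "bread", "jam"}, {"milk"}, {"bread"},
    ]
    # confidence(Milk -> Bread) = |baskets with milk and bread| / |baskets with milk|
    with_milk = [t for t in transactions if "milk" in t]
    confidence = sum("bread" in t for t in with_milk) / len(with_milk)
    print(f"confidence(Milk -> Bread) = {confidence:.0%}")  # -> 80% on this toy data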

Rough Set Theory: equivalence classes. The example is a high blood pressure table that records each patient's systolic pressure (SP) and diastolic pressure (DP); the column HBP records whether or not the patient has high blood pressure.
Cause equivalence classes:
U/{SP} = {{P(1), P(2), P(3)}, {P(5), P(6)}, {P(4), P(7)}}
U/{DP} = {{P(1), P(7)}, {P(2), P(3), P(5), P(6)}, {P(4)}}
Result equivalence classes:
HBP = H = {P(2), P(5), P(6)}
HBP = N = {P(1), P(3)}
HBP = L = {P(4), P(7)}
For example, patient 1 has normal SP and normal DP, so his HBP is normal; patient 2 has normal SP and high DP, so his HBP is high. We use rough set theory to find association rules from this table. The first concept of rough set theory is the equivalence class: a set of objects that share the same value on some attribute. For instance, patients 1, 2, and 3 all have normal SP, so the class "SP = Normal" contains patients 1 through 3; "SP = High" is another equivalence class, and so on. We also create equivalence classes from HBP. Since we want to know which attributes make a patient have high blood pressure, we take HBP as the result attribute: its classes are the result equivalence classes, and the classes of the other attributes are the cause equivalence classes. We then match each result equivalence class against the cause equivalence classes to find association rules.
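A minimal sketch of how these equivalence classes can be computed in Python. The table is reconstructed from the slide; the "L" values for patients 4 and 7 are assumptions chosen only to stay consistent with the classes listed above:

    from collections import defaultdict

    # Patient -> (SP, DP, HBP); the P4 and P7 rows are partly assumed, not read off the slide.
    table = {
        "P1": ("N", "N", "N"), "P2": ("N", "H", "H"), "P3": ("N", "H", "N"),
        "P4": ("L", "L", "L"), "P5": ("H", "H", "H"), "P6": ("H", "H", "H"),
        "P7": ("L", "N", "L"),
    }

    def equivalence_classes(attr_index):
        """Group the patients that share the same value on one attribute."""
        classes = defaultdict(set)
        for patient, row in table.items():
            classes[row[attr_index]].add(patient)
        return dict(classes)

    print(equivalence_classes(0))  # U/{SP}: N -> {P1,P2,P3}, H -> {P5,P6}, L -> {P4,P7}
    print(equivalence_classes(1))  # U/{DP}: N -> {P1,P7}, H -> {P2,P3,P5,P6}, L -> {P4}
    print(equivalence_classes(2))  # HBP:    H -> {P2,P5,P6}, N -> {P1,P3}, L -> {P4,P7}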

Rough Set Theory (Cont.) Rough set theory yields two kinds of rules. First: the lower approximation rule, where the set of objects in a cause equivalence class is fully contained in the set of objects in a result equivalence class. It is written as: IF C = 1 THEN R = 1, confidence 100%. In the slide's diagram, the cause class C = 1 is {obj1, obj2, obj4} and the result class R = 1 is {obj1, obj2, obj3, obj4}, so the cause class lies entirely inside the result class, which is why the confidence is 100 percent.

Rough Set Theory (Cont.) Second: the upper approximation rule, where the set of objects in a cause equivalence class is only partially contained in the set of objects in a result equivalence class. It is written as: IF C = 2 THEN R = 1, confidence 50%. In the slide's diagram, the cause class C = 2 is {obj3, obj4, obj5, obj6} and the result class R = 1 is {obj1, obj2, obj3, obj4}; only two of the four objects in the cause class fall inside the result class, so the rule "if C = 2 then R = 1" has confidence 50 percent.
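The two checks are just set containment and intersection; here is a small sketch using the toy sets from these two slides, with confidence computed as |C ∩ R| / |C|:

    R1 = {"obj1", "obj2", "obj3", "obj4"}   # result equivalence class
    C1 = {"obj1", "obj2", "obj4"}           # cause class fully inside R1
    C2 = {"obj3", "obj4", "obj5", "obj6"}   # cause class partly inside R1

    def approximation_rule(cause, result):
        """Classify the rule cause -> result and return its confidence."""
        confidence = len(cause & result) / len(cause)
        if confidence == 1.0:
            return "lower", confidence      # cause fully contained in result
        if confidence > 0.0:
            return "upper", confidence      # cause partially contained in result
        return None, 0.0                    # no rule at all

    print(approximation_rule(C1, R1))       # ('lower', 1.0)
    print(approximation_rule(C2, R1))       # ('upper', 0.5)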

Rough Set Theory (Cont.)
HBP = H: {P(2), P(5), P(6)}
U/{SP = N & DP = H} = {P(2), P(3)}
Lower approximation rule: If SP = H Then HBP = H (1/1)
Upper approximation rules: If SP = N Then HBP = H (1/3); If DP = H Then HBP = H (3/4)
Combinatorial approximation rule: If SP = N and DP = H Then HBP = H (1/2)
We use the lower and upper approximation concepts to create rules from the cause equivalence classes. Take the first result equivalence class, HBP = High, which holds the three objects P2, P5, and P6, and compare it with each cause equivalence class. The class SP = High holds two objects, P5 and P6, and both are contained in the result class, so we create the lower approximation rule "If SP = High then HBP = High" with 100% confidence. The next class, SP = Normal, has only one object contained in the result class, so we create the upper approximation rule "If SP = Normal then HBP = High" with confidence 1/3; likewise, "If DP = High then HBP = High" has confidence 3/4. A cause equivalence class with no object in the result class is skipped, and we continue until every cause equivalence class has been processed. This finds all of the rules with a single attribute, but cause equivalence classes are also related to one another, so we combine two or more of them to create new combinatorial classes. For instance, combining SP = Normal with DP = High gives a new cause equivalence class holding the two objects P2 and P3 that exist in both classes, and comparing it with the result classes yields the combinatorial rule "If SP = Normal and DP = High then HBP = High" with confidence 1/2.
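Putting the pieces together, here is a sketch that reproduces the rules on this slide from the cause classes (using the same partly assumed table values as in the earlier sketch), including the combinatorial classes built by intersecting pairs of cause classes:

    from itertools import combinations

    causes = {
        "SP=N": {"P1", "P2", "P3"}, "SP=H": {"P5", "P6"}, "SP=L": {"P4", "P7"},
        "DP=N": {"P1", "P7"}, "DP=H": {"P2", "P3", "P5", "P6"}, "DP=L": {"P4"},
    }
    result = {"P2", "P5", "P6"}  # HBP = H

    # Single-attribute lower and upper approximation rules.
    for name, objects in causes.items():
        hits = len(objects & result)
        if hits:
            kind = "lower" if hits == len(objects) else "upper"
            print(f"{kind}: If {name} Then HBP=H ({hits}/{len(objects)})")

    # Combinatorial rules: brute-force intersection of every pair of cause classes.
    # This also emits redundant combinations (e.g. SP=H and DP=H), which is exactly
    # the waste of time and space the CIT is designed to avoid.
    for (n1, o1), (n2, o2) in combinations(causes.items(), 2):
        combined = o1 & o2
        hits = len(combined & result)
        if combined and hits:
            print(f"combinatorial: If {n1} and {n2} Then HBP=H ({hits}/{len(combined)})")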

Rough Set Theory (Cont.) Two problems arise when rough set theory is used to find association rules. Time: it analyzes all of the attributes in the table and maps each cause equivalence class onto each result equivalence class, so matching the classes takes a lot of time. Space: relationships exist across different attributes, and to find the combinatorial rules it joins every two equivalence classes into new combinatorial equivalence classes. A large number of attributes and values generates more and more equivalence classes, yet only a few of the combinatorial classes actually contain rules, so disk space is wasted. We therefore developed a more efficient mining algorithm, the Class Inheritance Tree (CIT), for mining association rules from databases. It is based on the concepts of rough set theory but improves the algorithm, both raising mining speed and reducing the search space.

Class Inheritance Tree (CIT). Cause attributes: A-F; result attributes: X-Y. Here is a table; assume attributes A through F are cause attributes and attributes X and Y are result attributes. Our new method, the Class Inheritance Tree (CIT), reduces both the time and the space required.

Class Inheritance Tree (CIT). Cause equivalence classes; result equivalence classes. First we create the cause equivalence classes and the result equivalence classes from the table, then use the cause equivalence classes to build the CIT structure.

Class Inheritance Tree (CIT), step 1: insert attribute 'A'. The resulting subtrees are:
A1 = {1, 10}: node N1 = 1,10 with child nodes N2 = 1 and N3 = 10
A2 = {4, 8, 9}: node N4 = 4,8,9 with child nodes N5 = 4, N6 = 8, and N7 = 9
A3 = {2, 7}: node N8 = 2,7 with child nodes N9 = 2 and N10 = 7
A4 = {3, 5, 6}: node N11 = 3,5,6 with child nodes N12 = 3, N13 = 5, and N14 = 6
Step one inserts the equivalence classes of attribute 'A'. We insert the first equivalence class, A1, which contains the two objects 1 and 10. Because no node yet exists in the CIT, we create a new node N1 holding objects 1 and 10 and labeled A1, then a node N2 for the first object 1 and a node N3 for object 10, both linked to N1. The next equivalence class, A2, has the three objects 4, 8, and 9, so we create a new node N4 labeled A2 holding those objects, with nodes N5, N6, and N7 linked to it. We continue inserting every equivalence class of attribute A in the same way.

Class Inheritance Tree (CIT), step 2: insert attribute 'B'.
B1 = {2, 7} is stored in the existing node N8 = 2,7, which now carries both A3 and B1.
B2 = {1, 3, 10} becomes a new node N15 = 1,3,10, a brother of N1 = 1,10.
B3 = {4, 6, 9} becomes a new node N16 = 4,6,9, a brother of N4 = 4,8,9.
B4 = {5, 8} becomes a new node N17 = 5,8.
Next we insert the equivalence classes of attribute 'B'. The first class, B1, holds objects 2 and 7. A node starting with object 2 already exists, and the objects in B1 are exactly the same as those in node N8, so we store B1 in N8; that node now carries two equivalence classes. Class B2 holds the three objects 1, 3, and 10. Node N2 starts with the same object as B2, and the second object of B2 (3) is smaller than the second object of N1 (10), so we create a new node N15 holding objects 1, 3, and 10 as a brother node of N1. We continue inserting the remaining equivalence classes until all attributes are finished.
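The following Python sketch mimics the insertion behavior just described. It is a loose reconstruction rather than the paper's algorithm: node numbering, sibling ordering, and the relative links between nodes are simplified away, and sibling nodes are simply grouped into subtrees by their smallest object:

    from collections import defaultdict

    class Node:
        """One CIT node: a sorted tuple of objects plus the class labels stored in it."""
        def __init__(self, objects, label):
            self.objects = tuple(sorted(objects))
            self.labels = {label}

    subtrees = defaultdict(list)  # smallest object id -> the sibling nodes of that subtree

    def insert(label, objects):
        key = min(objects)
        for node in subtrees[key]:
            if node.objects == tuple(sorted(objects)):
                node.labels.add(label)   # identical object set: share the node (the A3/B1 case)
                return node
        node = Node(objects, label)      # otherwise create a new brother node (the B2 case)
        subtrees[key].append(node)
        return node

    for label, objects in [("A1", {1, 10}), ("A2", {4, 8, 9}), ("A3", {2, 7}), ("A4", {3, 5, 6}),
                           ("B1", {2, 7}), ("B2", {1, 3, 10}), ("B3", {4, 6, 9}), ("B4", {5, 8})]:
        insert(label, objects)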

Y3 = {1, 4, 6, 8, 9}
Lower rules:
If B = 3 Then Y = 3 (1/1)
If A = 2 Then Y = 3 (1/1)
If C = 1 Then Y = 3 (1/1)
If D = 1 Then Y = 3 (1/1)
After all of the cause equivalence classes have been inserted, the CIT structure is complete, and we use it to create association rules. For example, take the result equivalence class Y3. Traditional rough set theory compares Y3 against every cause equivalence class; with the CIT we search only the subtrees related to the result class. First we find the lower approximation rules, starting with the first object, 1. Node 11 holds object 1, so we compare Y3 with the child and brother nodes beneath node 11. The first child, node 20, is not fully contained in Y3, and its second object 2 is smaller than the second object 4 of Y3, so we move to the next brother node. Node 12 is also not fully contained in Y3, so we move on again. The second object of node 19 is 7, which is larger than 4, so we need not visit any further brother nodes; no lower approximation rule comes from the first subtree. We then move to the next subtree, rooted at object 4, where node 15 is fully contained in Y3, so we create the rule "If B = 3 then Y = 3" with 100% confidence; the other rules are found among its brother and child nodes. The subtree at object 8 has no child nodes, so we go on to the subtree at object 9, which yields the rule "If D = 1 then Y = 3".

Y3 = {1, 4, 6, 8, 9}
Upper rules:
If D = 3 Then Y = 3 (2/4)
If B = 2 Then Y = 3 (1/3)
Relative rules:
If B = 4 Then Y = 3 (1/2)
If D = 4 Then Y = 3 (2/3)
Next we find the upper approximation rules, again starting with the first subtree. All of the nodes in subtree 1 are partially contained in Y3; for instance, node 20 holds objects 1 and 6, so we create the upper rule "If D = 3 then Y = 3" with 50% confidence. We continue through all of the subtrees. One point is special when finding upper approximation rules: node 4, holding object 8, has no child nodes, so by itself it cannot create any rule; but node 15 shares object 8 with node 4, and node 4 carries a relative link to node 15, so we can create the new relative rule "If B = 4 then Y = 3". Because we scan only part of each subtree, both the comparison time and the search space are reduced.
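Continuing the earlier sketch, here is a simplified version of the rule search: whole subtrees whose root object does not appear in the result class are pruned, and each visited node yields a lower or upper rule from its overlap with Y3. Relative links are not modeled, so the B4 relative rule (and the C and D classes, which were never inserted in the toy example) will not appear:

    Y3 = {1, 4, 6, 8, 9}

    for key in sorted(subtrees):
        if key not in Y3:                 # prune subtrees the result class cannot reach
            continue
        for node in subtrees[key]:
            common = set(node.objects) & Y3
            if not common:
                continue
            kind = "lower" if len(common) == len(node.objects) else "upper"
            for label in node.labels:
                print(f"{kind}: If {label} Then Y=3 ({len(common)}/{len(node.objects)})")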

Class Inheritance Tree (CIT): combinatorial rules.
Combined class 1: (D1 & B3) = {9}
Combined class 2: (D1 & A2) = {9}
Combined class 3: (B3 & A2) = {4, 9}
Another improvement in this paper is combinatorial rules mining. Because the nodes carry relative links, we combine only the classes of nodes that are related to each other when creating new equivalence classes. For example, node 5 has relative links to node 15 and node 3, and the three nodes share the common object 9, so we combine only those nodes to create new classes: the first new equivalence class combines D1 and B3 and holds object 9, and the others combine D1 with A2, B3 with A2, and so on. We need not search all of the equivalence classes, which reduces both the combining time and the space needed to store the combinatorial equivalence classes.
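A tiny sketch of the idea: only pairs of classes joined by a relative link are intersected, instead of every possible pair. B3 and A2 come from the step-2 slide; D1's full member set is not shown in the slides, so {9} is assumed here purely so the intersections match the combined classes above:

    classes = {"D1": {9}, "B3": {4, 6, 9}, "A2": {4, 8, 9}}  # D1 is a hypothetical value
    relative_links = [("D1", "B3"), ("D1", "A2"), ("B3", "A2")]

    for a, b in relative_links:           # combine linked classes only, not all pairs
        print(f"({a} & {b}) = {sorted(classes[a] & classes[b])}")
    # -> (D1 & B3) = [9], (D1 & A2) = [9], (B3 & A2) = [4, 9]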

Conclusions: a CIT structure for association rules mining; applying it in supply chain distributed database mining; applying it in object-oriented database mining.