Efficient Rule-Based Attribute-Oriented Induction for Data Mining Authors: Cheung et al. Graduate: Yu-Wei Su Advisor: Dr. Hsu.

1 Efficient Rule-Based Attribute-Oriented Induction for Data Mining Authors: Cheung et al. Graduate: Yu-Wei Su Advisor: Dr. Hsu

2 Outline • Motivation • Objective • Introduction • Basic attribute-oriented induction • Rule-based concept generalization • Rule-based attribute-oriented induction • Path relation • An efficient rule-based AOI • Performance study • Conclusion • Opinion

3 Motivation • The induction capability of AOI is limited by unconditional concept generalization • Basic AOI suffers from high time complexity

4 Objective • Extend concept generalization to rule-based concept hierarchies • Develop an efficient algorithm to facilitate induction

5 Introduction • Existing databases have grown far beyond what humans can analyze unaided • AOI has been implemented in DBMiner • The engine for concept generalization in basic AOI is concept ascension • To further enhance the capability of AOI, concept ascension needs to be replaced

6 Introduction (cont.) • In a rule-based concept graph, a concept can be generalized to more than one higher-level concept • An induction anomaly occurs when a technique designed for a concept tree is applied directly to a concept graph • Using a multi-dimensional data cube instead of a generalized relation can improve performance

7 Basic attribute-oriented induction • The purpose of AOI is to discover rules from relations • Its primary technique is concept generalization

8 Basic attribute-oriented induction (cont.) • Merging the records that have the same generalized values captures the characteristics of the data • Primitives in AOI: Task-relevant data • The data to be analyzed, called the initial relation; Background knowledge • Used to support concept hierarchies

9 Basic attribute-oriented induction (cont.) Representation of learning results • Concept generalization Notation • Desirable level: a level at which an attribute contains only a small number of distinct values • Desirable attribute threshold: the limit on the number of distinct values of an attribute • Minimum desirable level: the lowest desirable level; at any lower level the number of distinct values would exceed the desirable attribute threshold

10 Basic attribute-oriented induction (cont.) • Prime relation R': the relation in which every attribute is at its minimum desirable level. Steps: 1. Induce the prime relation R' from the initial relation 2. Progressive generalization
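
The first step on this slide, inducing the prime relation, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the concept tree `STATUS_TREE`, its concept names, and the helper functions are all assumptions made for the example. Each attribute is ascended one level at a time until its number of distinct values is within the threshold, and identical generalized tuples are then merged with a count.

```python
from collections import Counter

# Hypothetical concept tree for one attribute: child -> parent.
# The concept names are invented for illustration only.
STATUS_TREE = {
    "freshman": "undergraduate",
    "sophomore": "undergraduate",
    "M.S.": "graduate",
    "Ph.D.": "graduate",
    "undergraduate": "ANY",
    "graduate": "ANY",
}

def ascend(value, tree):
    """Replace a concept by its parent; the root maps to itself."""
    return tree.get(value, value)

def generalize_attribute(values, tree, threshold):
    """Ascend all values level by level until the number of distinct
    values is at most `threshold` (the desirable attribute threshold)."""
    values = list(values)
    while len(set(values)) > threshold:
        lifted = [ascend(v, tree) for v in values]
        if lifted == values:  # every value is already at the root
            break
        values = lifted
    return values

def induce_prime_relation(rows, trees, threshold):
    """Generalize each attribute column, then merge identical tuples,
    counting how many initial tuples each merged tuple covers."""
    columns = list(zip(*rows))
    generalized = [
        generalize_attribute(col, trees[i], threshold)
        for i, col in enumerate(columns)
    ]
    return Counter(zip(*generalized))

rows = [("freshman",), ("sophomore",), ("M.S.",), ("Ph.D.",)]
prime = induce_prime_relation(rows, [STATUS_TREE], threshold=2)
```

With a threshold of 2, the four initial tuples collapse into two generalized tuples, each carrying a count of 2.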

11 Basic attribute-oriented induction (cont.)

12 Rule-based concept generalization • Rule-based concept generalization is a more general scheme in which a concept can ascend via more than one path • A generalization rule determines whether a concept can be generalized along a given path

13 Rule-based concept generalization (cont.) • Deductive rule generalization: each generalization path has an associated deduction rule. A concept hierarchy whose paths are associated with deduction generalization rules is called a deduction-rule-based concept graph
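
A deduction-rule-based concept graph can be pictured as a set of edges, each guarded by a condition on the tuple. The sketch below is only an illustration under assumed data: the GPA bands, the labels "good", "average" and "excellent", and the conditions on `status` are hypothetical and do not come from the slides.

```python
# Each rule is (lower concept, higher concept, condition on the tuple).
# Bands, labels and conditions are hypothetical.
RULES = [
    ("3.0-3.5", "good", lambda t: t["status"] == "undergraduate"),
    ("3.0-3.5", "average", lambda t: t["status"] == "graduate"),
    ("3.5-4.0", "excellent", lambda t: t["status"] == "undergraduate"),
    ("3.5-4.0", "good", lambda t: t["status"] == "graduate"),
]

def generalize_gpa(row):
    """Return the higher-level GPA concept whose rule fires for `row`.
    Exactly one rule is expected to fire; anything else is an error."""
    matches = [
        high for low, high, cond in RULES
        if low == row["gpa"] and cond(row)
    ]
    if len(matches) != 1:
        raise ValueError(f"expected one applicable rule, got {len(matches)}")
    return matches[0]

undergrad = {"status": "undergraduate", "gpa": "3.0-3.5"}
grad = {"status": "graduate", "gpa": "3.0-3.5"}
```

Here the same GPA band ascends to "good" for the undergraduate tuple and to "average" for the graduate tuple, which an unconditional concept tree cannot express.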

14 Rule-based attribute-oriented induction • General model definition (DB, CH, DS, KR, t_a) DB: underlying database CH: a set of rule-based concept hierarchies DS: deduction system supporting CH KR: representation scheme of the learned result t_a: desirable attribute threshold • The generalization and rule creation process is fundamentally the same as in basic AOI

15 Rule-based attribute-oriented induction (cont.) • Concept ascension requires additional information that may not be available in the prime relation; this is called an induction anomaly. Causes: An attribute the rule depends on has been removed. An attribute the rule depends on has been generalized too high to match the condition. A condition the rule depends on can only be evaluated against the initial relation
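
The first two anomaly cases can be made concrete with a small sketch. The rule, the concept names and the function `next_gpa_concept` are assumptions for illustration only: the rule needs `status` at the "undergraduate"/"graduate" level, so it cannot be decided once `status` is missing or has been generalized further.

```python
def next_gpa_concept(row):
    """Hypothetical rule for the GPA band "3.0-3.5": it ascends to "good"
    for undergraduates and to "average" for graduates. Returns None when
    the condition cannot be decided from `row` (an induction anomaly)."""
    status = row.get("status")  # None if the attribute was removed
    if status == "undergraduate":
        return "good"
    if status == "graduate":
        return "average"
    return None  # attribute removed, or generalized too high (e.g. "ANY")

usable = {"status": "graduate", "gpa": "3.0-3.5"}
removed = {"gpa": "3.0-3.5"}                    # status was dropped
too_high = {"status": "ANY", "gpa": "3.0-3.5"}  # status generalized to root
```

Only the first row can still be generalized; the other two would force a return to the initial relation.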

16 Rule-based attribute-oriented induction (cont.)

17 Path relation • In the rule-based case, further generalization has to start again from the initial relation, which is costly and wasteful • A path relation captures the result of applying the generalization rules to the initial relation • Each tuple in the initial relation can be generalized via only one path to the root

18 Path relation (cont.) • Before induction starts, a preprocessing step identifies and labels the generalization paths of all attributes • Every tuple can then be transformed into a tuple of the ids of its generalization paths • The path relation completely captures the generalization result of the initial relation
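
The transformation described on this slide might look roughly like the sketch below. The path tables, the path ids and the rule inside `gpa_path_id` are hypothetical, and for brevity every tuple is assumed to fall in the single GPA band "3.0-3.5". The point is that each rule is evaluated once against the initial tuple; afterwards, generalizing to any level is a lookup along the stored path.

```python
# Hypothetical generalization paths, listed leaf to root, keyed by path id.
STATUS_PATHS = {
    0: ["freshman", "undergraduate", "ANY"],
    1: ["M.S.", "graduate", "ANY"],
    2: ["Ph.D.", "graduate", "ANY"],
}
GPA_PATHS = {
    0: ["3.0-3.5", "good", "ANY"],     # path taken by undergraduates
    1: ["3.0-3.5", "average", "ANY"],  # path taken by graduates
}

def status_path_id(row):
    """Find the path whose leaf equals the tuple's status value."""
    for pid, path in STATUS_PATHS.items():
        if path[0] == row["status"]:
            return pid
    raise KeyError(row["status"])

def gpa_path_id(row):
    """Evaluate the (assumed) rule once, on the initial tuple."""
    return 1 if row["status"] in ("M.S.", "Ph.D.") else 0

def to_path_relation(rows):
    """Replace every tuple by the ids of its generalization paths."""
    return [(status_path_id(r), gpa_path_id(r)) for r in rows]

def generalize(path_row, levels):
    """Read the concepts at the requested (status, gpa) levels."""
    s, g = path_row
    return (STATUS_PATHS[s][levels[0]], GPA_PATHS[g][levels[1]])

initial = [
    {"status": "freshman", "gpa": "3.0-3.5"},
    {"status": "Ph.D.", "gpa": "3.0-3.5"},
]
path_relation = to_path_relation(initial)
```

Because the path id already encodes the outcome of the rule, the GPA of the Ph.D. tuple can still be read as "average" after `status` has been generalized all the way to the root.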

19 Path relation (cont.)

20 • Another issue in rule-based generalization is cyclic dependency: the generalization of one attribute may depend on another attribute, and if the dependency is cyclic, a deadlock occurs
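
One common way to detect such a cycle before induction starts is a depth-first search over the attribute dependency graph. The slides do not say how the authors handle it, so the sketch below is a generic check under assumptions: `depends_on` is a hypothetical mapping from each attribute to the attributes its generalization rules refer to.

```python
def has_cycle(depends_on):
    """Return True if the attribute dependency graph contains a cycle.
    `depends_on` maps an attribute to the attributes its rules refer to."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    nodes = set(depends_on) | {d for deps in depends_on.values() for d in deps}
    colour = {a: WHITE for a in nodes}

    def visit(a):
        colour[a] = GREY
        for b in depends_on.get(a, ()):
            if colour[b] == GREY:  # back edge: b is on the current path
                return True
            if colour[b] == WHITE and visit(b):
                return True
        colour[a] = BLACK
        return False

    return any(colour[a] == WHITE and visit(a) for a in nodes)

acyclic = {"gpa": ["status"], "status": []}
cyclic = {"gpa": ["status"], "status": ["gpa"]}
```

In the acyclic case the attributes can be generalized in dependency order; in the cyclic case neither attribute can go first.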

21 Path relation (cont.)

22 • Data structure for generalization

23 An efficient rule-based AOI

24 An efficient rule-based AOI (cont.)

25 Performance study • The time complexity of the path relation algorithm is O(n); that of the backtracking algorithm is O(n log n) • Differences between the two algorithms: Backtracking uses a generalized relation as its data structure. In backtracking, further generalization after the prime relation has been generated is based on the prime relation

26 Performance study (cont.) • The experiment data is a synthesized student data set • Attributes: name, status, sex, age, GPA • Conditions: Graduate students are at least 22 years old. Ph.D. students are at least 25 years old. Graduate students' GPAs are at least 3 • Generalization order

27 Performance study (cont.)

28

29 Conclusion • Concept hierarchies may contain both unconditional and conditional rules, which enlarges the application domain • The path relation algorithm has an improved complexity of O(n) • Rule-based induction can extend the capability of DBMiner

30 Opinion • The numerical attribute problem still exists • The experiments are too few to show the improvement clearly

