Efficient Rule-Based Attribute-Oriented Induction for Data Mining Authors: Cheung et al. Graduate: Yu-Wei Su Advisor: Dr. Hsu.


Outline
• Motivation
• Objective
• Introduction
• Basic attribute-oriented induction
• Rule-based concept generalization
• Rule-based attribute-oriented induction
• Path relation
• An efficient rule-based AOI
• Performance study
• Conclusion
• Opinion

Motivation
• The induction capability of AOI is limited by unconditional concept generalization
• Basic AOI has a time-complexity problem

Objective
• Extend concept generalization to rule-based concept hierarchies
• Develop an efficient algorithm to facilitate induction

Introduction
• The growth in the size of existing databases has far exceeded the human ability to analyze them
• AOI has been implemented in DBMiner
• The engine for concept generalization in basic AOI is concept ascension
• To further enhance the capability of AOI, concept ascension needs to be replaced

Introduction (cont.)
• In a rule-based concept graph, a concept can be generalized to more than one higher-level concept
• An induction anomaly occurs when the concept-tree technique is applied directly to a concept graph
• Using a multi-dimensional data cube instead of a generalized relation can improve performance

Basic attribute-oriented induction
• The purpose of AOI is to discover rules from relations
• The primary technique is concept generalization

Basic attribute-oriented induction (cont.)
• By merging the records that have the same generalized values, the characteristics of the data can be captured
• Primitives in AOI
  Task-relevant data
    • The data to be analyzed, called the initial relation
  Background knowledge
    • Used to support concept hierarchies

Basic attribute-oriented induction (cont.)
  Representation of learning results
    • Concept generalization
• Notation
  Desirable level: a level at which an attribute contains only a small number of distinct values
  Desirable attribute threshold: controls the number of distinct values of an attribute
  Minimum desirable level: a desirable level such that generalizing only to any lower level would leave more distinct values than the desirable attribute threshold

Basic attribute-oriented induction (cont.)
• Prime relation R': the relation in which every attribute is at its minimum desirable level
  1. Induce the prime relation R' from the initial relation
  2. Progressive generalization
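The first step above (inducing a prime relation) might be sketched as follows. This is only an illustration under assumptions: the concept tree, its values, and the function names are hypothetical and do not come from the slides, and the sketch lifts all values of an attribute one level at a time, which presumes a balanced tree.

```python
from collections import Counter

# Hypothetical concept tree for one attribute (child -> parent).
# The values are illustrative only; they are not taken from the slides.
MAJOR_TREE = {
    "physics": "science", "chemistry": "science",
    "music": "art", "history": "art",
    "science": "ANY", "art": "ANY",
}

def ascend(value, tree):
    """Climb one level; a value with no parent in the tree stays unchanged."""
    return tree.get(value, value)

def generalize_attribute(values, tree, threshold):
    """Ascend all values until at most `threshold` distinct values remain."""
    values = list(values)
    while len(set(values)) > threshold:
        lifted = [ascend(v, tree) for v in values]
        if lifted == values:  # nothing can be lifted any further
            break
        values = lifted
    return values

def prime_relation(rows, trees, threshold):
    """Generalize each attribute, then merge identical tuples with a count."""
    columns = list(zip(*rows))
    generalized = [
        generalize_attribute(column, trees[i], threshold)
        for i, column in enumerate(columns)
    ]
    return Counter(zip(*generalized))
```

Calling `prime_relation(rows, [MAJOR_TREE, {}], threshold=2)` on a list of `(major, sex)` tuples returns a counter that maps each generalized tuple to the number of original records merged into it.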

Basic attribute-oriented induction (cont.)

Rule-based concept generalization
• Rule-based concept generalization is a more general scheme in which a concept can ascend via more than one path
• A generalization rule determines whether a concept can be generalized along a given path

Rule-based concept generalization (cont.)
• Deductive rule generalization
  The rule associated with a generalization path is a deduction rule
  A concept hierarchy associated with deduction generalization rules is called a deduction-rule-based concept graph
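A deduction rule attached to a generalization path can be pictured as a condition on the whole tuple, not only on the value being generalized. The sketch below is a hypothetical illustration: the GPA ranges, concept names, and rule list are assumptions made for the example and are not taken from the slides.

```python
# Hypothetical deduction rules for generalizing a GPA value.
# Each entry pairs a condition on the whole record with a target concept.
# The ranges and concept names are assumptions for illustration only.
GPA_RULES = [
    (lambda r: r["gpa"] >= 3.5, "excellent"),
    (lambda r: 3.0 <= r["gpa"] < 3.5 and r["status"] == "undergraduate", "good"),
    (lambda r: 3.0 <= r["gpa"] < 3.5 and r["status"] == "graduate", "average"),
    (lambda r: r["gpa"] < 3.0, "poor"),
]

def generalize_gpa(record):
    """Return the concept of the first rule whose condition holds."""
    for condition, concept in GPA_RULES:
        if condition(record):
            return concept
    return "ANY"  # no rule applies, so fall back to the root concept
```

Here the same GPA of 3.2 ascends to a different concept depending on the student's status, which is the kind of conditional path that a plain concept tree cannot express.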

Rule-based attribute-oriented induction
• General model definition: (DB, CH, DS, KR, t_a)
  DB: underlying database
  CH: a set of rule-based concept hierarchies
  DS: deduction system supporting CH
  KR: representation scheme of the learned result
  t_a: desirable attribute threshold
• The generalization and rule creation process is fundamentally the same as in basic AOI

Rule-based attribute-oriented induction (cont.)
• Concept ascension may require additional information that is not available in the prime relation; this is called the induction anomaly
  An attribute that a rule depends on has been removed
  An attribute that a rule depends on has been generalized too high to match the condition
  A condition that a rule depends on can only be evaluated against the initial relation

Rule-based attribute-oriented induction (cont.)

Path relation
• Further generalization in the rule-based case has to start again from the initial relation, which is costly and wasteful
• A path relation is used to capture the generalization result from one of the rules on the initial relation
• A tuple in the initial relation can only be generalized via a unique path to the root

Path relation (cont.)
• Before induction starts, a preprocessing step identifies and labels the generalization paths of all the attributes
• Every tuple can be transformed into a tuple of IDs of its associated generalization paths
• The path relation completely captures the generalization result of the initial relation
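The transformation described above can be sketched roughly as follows. Everything named here is an assumption for illustration: the paths, their labels, and the helper functions are hypothetical and are not the data structures of the original paper. Each path is listed from the leaf level up to the root.

```python
# Hypothetical labeled generalization paths, leaf level first, root last.
STATUS_PATHS = [
    ("M.S.", "graduate", "ANY"),
    ("Ph.D.", "graduate", "ANY"),
    ("junior", "undergraduate", "ANY"),
]
GPA_PATHS = [
    ("3.0-3.5", "good", "ANY"),      # path taken by undergraduates
    ("3.0-3.5", "average", "ANY"),   # path taken by graduates
    ("3.5-4.0", "excellent", "ANY"),
    ("below 3.0", "poor", "ANY"),
]

def status_path_id(row):
    """Find the id of the status path whose leaf matches the row."""
    return next(i for i, p in enumerate(STATUS_PATHS) if p[0] == row["status"])

def gpa_path_id(row):
    """Pick the GPA path; the middle range depends on the status attribute."""
    if row["gpa"] >= 3.5:
        return 2
    if row["gpa"] < 3.0:
        return 3
    is_graduate = STATUS_PATHS[status_path_id(row)][1] == "graduate"
    return 1 if is_graduate else 0

def to_path_tuple(row):
    """Transform an initial-relation tuple into a tuple of path ids."""
    return (status_path_id(row), gpa_path_id(row))

def generalize(path_tuple, levels):
    """Read off the concepts at the requested level of each path."""
    paths = (STATUS_PATHS, GPA_PATHS)
    return tuple(paths[i][pid][levels[i]] for i, pid in enumerate(path_tuple))
```

Because the rules are evaluated once while the path ids are assigned, later generalization to any level becomes a lookup on the path and never needs the initial relation again.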

Path relation (cont.)

• Another issue in rule-based generalization is cyclic dependency
  The generalization of one attribute may depend on another attribute; if the dependency is cyclic, deadlock will occur
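One way to guard against such a deadlock is to check the attribute dependencies for a cycle before induction starts. The sketch below uses an ordinary depth-first search; it is an assumption about how the check could be done, not the method used in the paper, and the dependency map format is hypothetical.

```python
def find_cycle(depends_on):
    """Return one dependency cycle as a list of attributes, or None.

    `depends_on` maps each attribute to the attributes whose values its
    generalization rules need (a hypothetical input format).
    """
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited, in progress, done
    color = {attr: WHITE for attr in depends_on}

    def visit(attr, trail):
        color[attr] = GRAY
        trail.append(attr)
        for dep in depends_on.get(attr, ()):
            state = color.get(dep, WHITE)
            if state == GRAY:             # back edge: dep is on the trail
                return trail[trail.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep, trail)
                if cycle:
                    return cycle
        trail.pop()
        color[attr] = BLACK
        return None

    for attr in list(depends_on):
        if color[attr] == WHITE:
            cycle = visit(attr, [])
            if cycle:
                return cycle
    return None
```

If the function returns `None`, the attributes can be generalized in dependency order; otherwise the returned list names the attributes that wait on each other.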

Path relation (cont.)

• Data structure for generalization

An efficient rule-based AOI

An efficient rule-based AOI (cont.)

Performance study
• The time complexity of the path relation algorithm is O(n); that of the backtracking algorithm is O(n log n)
• The differences between the two algorithms
  Backtracking uses a generalized relation as its data structure
  Further generalization after the prime relation has been generated is based on the prime relation

Performance study (cont.)
• The experiment data is synthesized student data
• Attributes: name, status, sex, age, GPA
• Conditions
  Graduate students are at least 22 years old
  Ph.D. students are at least 25 years old
  Graduate students' GPAs are at least 3
• Generalization order
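Synthetic data that respects the three conditions could be generated along these lines. The slides give only the attribute names and the conditions, so the status values, age ranges, GPA lower bound for undergraduates, and name format below are all assumptions made for the sketch.

```python
import random

# Assumed status values; the slides do not list them.
STATUSES = ["undergraduate", "M.S.", "Ph.D."]
# Minimum ages: 22 for graduate students and 25 for Ph.D. students come
# from the stated conditions; 17 for undergraduates is an assumption.
MIN_AGE = {"undergraduate": 17, "M.S.": 22, "Ph.D.": 25}

def make_student(rng):
    """Draw one student record that satisfies the stated conditions."""
    status = rng.choice(STATUSES)
    # Graduate students (M.S. and Ph.D.) have a GPA of at least 3.
    min_gpa = 2.0 if status == "undergraduate" else 3.0
    return {
        "name": f"student{rng.randrange(10**6):06d}",
        "status": status,
        "sex": rng.choice(["M", "F"]),
        "age": rng.randint(MIN_AGE[status], MIN_AGE[status] + 10),
        "gpa": round(rng.uniform(min_gpa, 4.0), 2),
    }

def make_students(n, seed=0):
    """Generate n records with a fixed seed so runs are repeatable."""
    rng = random.Random(seed)
    return [make_student(rng) for _ in range(n)]
```

A fixed seed keeps the generated relation identical across runs, which matters when two algorithms are timed on the same input.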

Performance study (cont.)

Conclusion
• Concept hierarchies may contain both unconditional and conditional rules, which enlarges the application domain
• The path relation algorithm has an improved complexity of O(n)
• Rule-based induction can extend the capability of DBMiner

Opinion
• The numerical attribute problem still exists
• There are too few experiments, so the improvement cannot be seen clearly