ICMLC2007, Aug. 19~22, 2007, Hong Kong
Incremental Maintenance of Ontology-Exploiting Association Rules
Ming-Cheng Tseng, Wen-Yang Lin and Rong Jeng


Incremental Maintenance of Ontology-Exploiting Association Rules
Ming-Cheng Tseng (1), Wen-Yang Lin (2) and Rong Jeng (3)
(1, 3) Institute of Information Engineering, I-Shou University, Taiwan
(2) Dept. of Comp. Sci. & Info. Eng., National University of Kaohsiung, Taiwan
August 20, 2007

Outline
- Introduction
- Problem description
- The proposed algorithm
- Performance evaluation
- Conclusions

Introduction
Motivation
- In general, there exist many semantic relationships (domain knowledge) among items.
- It is natural to incorporate a domain ontology into the data mining process to explore more innovative rules.
- The source databases change over time, e.g., through insertion, deletion, and modification.
- The discovered knowledge (rules) has to be updated to reflect the new situation.

Introduction (cont.)
Association rules
- Given: a database of customer transactions, where each transaction is a set of items
- Find: all rules X => Y that correlate the presence of one set of items X with another set of items Y
- Example: Sony VAIO => HP LaserJet 1300 (Sup. = 30%, Conf. = 60%)

Introduction (cont.)
Strong association rules
- Given: user-specified constraints
  - minimum support (min_sup)
  - minimum confidence (min_conf)
- Find: rules X => Y with support and confidence larger than the user-specified minimum values
- Example: min_sup = 25%, min_conf = 50%
  - Sony VAIO => HP LaserJet 1300 (Sup. = 30%, Conf. = 60%)

Introduction (cont.)
Frequent itemsets (patterns) mining
- The association mining problem can be reduced to the problem of mining frequent itemsets, i.e., itemsets with support larger than min_sup.
- Example: min_sup = 25%, min_conf = 50%
  - Sony VAIO => HP LaserJet 1300 (Sup. = 30%, Conf. = 60%)
  - sup({Sony VAIO, HP LaserJet 1300}) = 30%
  - sup({Sony VAIO}) = 50%
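The example figures follow from the usual definition conf(X => Y) = sup(X and Y together) / sup(X) = 30% / 50% = 60%. A minimal Python sketch of that arithmetic is given here; the ten transactions are hypothetical toy data chosen only to reproduce the slide's numbers, the helper names are illustrative, and treating "meets the threshold" as >= is an assumption.

```python
from typing import Iterable, List, Set


def count(itemset: Iterable[str], transactions: List[Set[str]]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= t)


def support(itemset: Iterable[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain the whole itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(x: Iterable[str], y: Iterable[str],
               transactions: List[Set[str]]) -> float:
    """conf(X => Y) = count(X union Y) / count(X); 0.0 if X never occurs."""
    count_x = count(x, transactions)
    if count_x == 0:
        return 0.0
    return count(set(x) | set(y), transactions) / count_x


# Hypothetical toy data: 10 transactions, 5 with Sony VAIO, 3 of those
# also with HP LaserJet 1300 (chosen to match the slide's 30% / 50%).
transactions = (
    [{"Sony VAIO", "HP LaserJet 1300"} for _ in range(3)]
    + [{"Sony VAIO"} for _ in range(2)]
    + [{"Other item"} for _ in range(5)]
)

MIN_SUP, MIN_CONF = 0.25, 0.50
rule_sup = support({"Sony VAIO", "HP LaserJet 1300"}, transactions)
rule_conf = confidence({"Sony VAIO"}, {"HP LaserJet 1300"}, transactions)
# Assumption: a rule is strong when both thresholds are met (>=).
is_strong = rule_sup >= MIN_SUP and rule_conf >= MIN_CONF
```

Because confidence is a ratio of two supports, knowing the frequent itemsets and their counts is enough to construct every strong rule without touching the data again.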

Introduction (cont.)
Ontology
- W3C Web Ontology Working Group: “An ontology formally defines a common set of terms that are used to describe and represent a domain knowledge.”
- E.g., taxonomy: a kind of ontology presenting classification relationships among objects

Introduction (cont.)
Ontology-exploiting association rules
- Example: IBM 60GB HD => HP DeskJet

Problem Description
Incremental maintenance of ontology-exploiting association rules
- Given:
  - a database of customer transactions, DB
  - an incremental database, db
  - an item ontology, T
  - the discovered frequent itemsets in DB, L
  - minimum support, ms, and minimum confidence, mc
- Find all frequent itemsets in UD = DB + db w.r.t. ms
- Construct all strong rules from the frequent itemsets w.r.t. mc

Problem Description (cont.) -- Example
Customer transactions DB:
  TID  Purchased Items
  1    IBM TP, Epson EPL, Toner Cartridge
  2    Sony VAIO, IBM TP, Epson EPL
  3    IBM TP, HP DeskJet, Ink Cartridge
  4    HP DeskJet
  5    IBM TP, HP DeskJet, Ink Cartridge
  6    Sony VAIO, Ink Cartridge
[Figure: item ontology G]
Discovered frequent itemsets L (minsup = 70%; algorithms AROC, AROS):
  L1:       {Printer}, {PC}, {IBM TP}, {RAM 256MB*}, {IBM 60GB*}
  L2 & L3:  {Printer, PC}, {Printer, IBM TP}, {Printer, RAM 256MB*}, {Printer, IBM 60GB*}, {RAM 256MB*, IBM 60GB*}, {Printer, RAM 256MB*, IBM 60GB*}

Problem Description (cont.) -- Example
Customer transactions DB:
  TID  Purchased Items
  1    IBM TP, Epson EPL, Toner Cartridge
  2    Sony VAIO, IBM TP, Epson EPL
  3    IBM TP, HP DeskJet, Ink Cartridge
  4    HP DeskJet
  5    IBM TP, HP DeskJet, Ink Cartridge
  6    Sony VAIO, Ink Cartridge
Incremental transactions db:
  TID  Items Purchased
  7    Toner Cartridge
  8    IBM TP, HP DeskJet, IBM 60GB, Toner Cartridge
  9    IBM 60GB, Toner Cartridge
[Figure: item ontology G]
minsup = 70%
Updated frequent itemsets L' = ?
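Since UD = DB + db, the count of any itemset in UD is the sum of its counts in DB and in db, which is what makes incremental maintenance possible. The sketch below checks this on the nine transactions listed above. It uses the purchased (primitive) items only, because the item ontology G is a figure that is not reproduced in the transcript; the function name is illustrative.

```python
from typing import Iterable, List, Set

# Transactions copied from the example (primitive items only).
DB: List[Set[str]] = [
    {"IBM TP", "Epson EPL", "Toner Cartridge"},
    {"Sony VAIO", "IBM TP", "Epson EPL"},
    {"IBM TP", "HP DeskJet", "Ink Cartridge"},
    {"HP DeskJet"},
    {"IBM TP", "HP DeskJet", "Ink Cartridge"},
    {"Sony VAIO", "Ink Cartridge"},
]
db: List[Set[str]] = [
    {"Toner Cartridge"},
    {"IBM TP", "HP DeskJet", "IBM 60GB", "Toner Cartridge"},
    {"IBM 60GB", "Toner Cartridge"},
]
UD = DB + db  # updated database


def count(itemset: Iterable[str], transactions: List[Set[str]]) -> int:
    """Number of transactions containing every item of `itemset`."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= t)


# Counts are additive over the two partitions of UD.
toner_old = count({"Toner Cartridge"}, DB)
toner_new = count({"Toner Cartridge"}, db)
toner_total = count({"Toner Cartridge"}, UD)
toner_support_ud = toner_total / len(UD)
```

An itemset whose count in DB is already stored therefore needs only a scan of the (much smaller) db to obtain its support in UD.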

The Proposed Algorithm – IMARO
Basic scheme
- An Apriori-based maintenance algorithm
- Employs a bottom-up, level-wise search strategy
- Starts from the frequent 1-itemsets, L1, then L2, ..., Lk, etc.
[Figure: itemset lattice over items A, B, C, D, with levels AB, AC, AD, BC, BD, CD; ABC, ABD, ACD, BCD; ABCD]
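The level-wise step can be sketched with the standard Apriori candidate generation: join frequent (k-1)-itemsets and keep a k-itemset only if all of its (k-1)-subsets are frequent. This is a generic illustration of the Apriori idea over the lattice items A to D, not the IMARO procedure itself; `apriori_gen` is an illustrative name.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set


def apriori_gen(frequent_prev: Iterable[Iterable[str]]) -> Set[FrozenSet[str]]:
    """Generate size-k candidates from the frequent (k-1)-itemsets.

    A candidate survives only if every (k-1)-subset is itself frequent
    (downward-closure pruning).
    """
    prev = {frozenset(s) for s in frequent_prev}
    if not prev:
        return set()
    k = len(next(iter(prev))) + 1
    candidates: Set[FrozenSet[str]] = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) != k:
                continue  # a and b do not differ by exactly one item
            if all(frozenset(sub) in prev
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
    return candidates


# Suppose every 2-itemset over {A, B, C, D} is frequent except CD.
L2 = [{"A", "B"}, {"A", "C"}, {"A", "D"}, {"B", "C"}, {"B", "D"}]
C3 = apriori_gen(L2)
```

Here ACD and BCD are never counted, because their subset CD is not frequent; only ABC and ABD remain as candidates for the next pass.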

The Proposed Algorithm – IMARO (cont.)
Terminology
  Notation  Definition
  DB        Original database
  db        Incremental database
  UD        Updated database, UD = DB + db
  T         Item ontology
  ED        Extension of DB with extended items in T
  ed        Extension of db with extended items in T
  UE        Updated extended database, UE = ED + ed

The Proposed Algorithm – IMARO (cont.)
Example

The Proposed Algorithm – IMARO (cont.)
Note on database extension
- A component item may also exist as a primitive item itself.
- To clarify the meaning of associations involving such an item, we have to differentiate the role this item plays.
- E.g., IBM TP => Ink Cartridge can mean either of:
  - buy an IBM TP notebook, also buy an Ink Cartridge
  - buy an IBM TP notebook, also buy a product composed of Ink Cartridge
Original transaction:
  TID  Purchased Items
  5    IBM TP, HP DeskJet, Ink Cartridge
Extended transaction:
  TID  Primitive Items                     Extended Items
  5    IBM TP, HP DeskJet, Ink Cartridge*  PC, RAM 256MB, IBM 60GB, Printer, Ink Cartridge
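The extension of a transaction can be sketched as follows. The two dictionaries are an assumed fragment of the item ontology, reconstructed only from the transaction 5 example above (the full ontology is a figure that is not reproduced here), and only one level of generalization is handled. Each extended item is tagged with its role so that a purchased Ink Cartridge stays distinct from Ink Cartridge as a component; the tuple tagging is a choice made for this sketch.

```python
from typing import Dict, List, Set, Tuple

# Assumed ontology fragment, inferred from the transaction 5 example.
GENERALIZATION: Dict[str, str] = {
    "IBM TP": "PC",
    "HP DeskJet": "Printer",
}
COMPONENTS: Dict[str, List[str]] = {
    "IBM TP": ["RAM 256MB", "IBM 60GB"],
    "HP DeskJet": ["Ink Cartridge"],
}


def extend(transaction: Set[str]) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Return (primitive items, extended items tagged with their role)."""
    extended: Set[Tuple[str, str]] = set()
    for item in transaction:
        if item in GENERALIZATION:
            extended.add((GENERALIZATION[item], "generalized"))
        for part in COMPONENTS.get(item, []):
            extended.add((part, "component"))
    return set(transaction), extended


primitive, extended = extend({"IBM TP", "HP DeskJet", "Ink Cartridge"})
extended_names = {name for name, _role in extended}
```

With this tagging, "Ink Cartridge" appears once among the primitive items and once, as a component, among the extended items, so the two readings of IBM TP => Ink Cartridge remain separable.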

The Proposed Algorithm – IMARO (cont.)
[Figure: process flow for updating frequent k-itemsets (e.g., AROC or AROS)]

The Proposed Algorithm – IMARO (cont.)
Frequent/infrequent itemsets inference
  Case  In L_ED  In L_ed  Result in UE   Action
  1     yes      yes      frequent       none
  2     yes      no       undetermined   compare sup_UD(A) with ms
  3     no       yes      undetermined   scan DB
  4     no       no       infrequent     none
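The four cases can be written as a small decision function. The sketch below is one reading of the inference table; the function and parameter names are illustrative, and using >= for the threshold comparison in case 2 is an assumption.

```python
from typing import Tuple


def infer(freq_old: bool, freq_inc: bool) -> Tuple[int, str, str]:
    """Map (frequent in ED?, frequent in ed?) to (case, result in UE, action)."""
    if freq_old and freq_inc:
        return 1, "frequent", "none"
    if freq_old and not freq_inc:
        return 2, "undetermined", "compare sup_UD(A) with ms"
    if not freq_old and freq_inc:
        return 3, "undetermined", "scan DB"
    return 4, "infrequent", "none"


def resolve_case2(count_old: int, count_inc: int,
                  size_updated: int, ms: float) -> bool:
    """Case 2 needs no rescan: the old count is stored with L, and the
    incremental count comes from scanning only the increment.

    Assumption: an itemset is frequent when its support is >= ms.
    """
    return (count_old + count_inc) / size_updated >= ms


# Hypothetical counts for an itemset A with 9 updated transactions.
case, result, action = infer(freq_old=True, freq_inc=False)
still_frequent = resolve_case2(count_old=5, count_inc=2, size_updated=9, ms=0.7)
```

Only case 3 forces a scan of the original database, because the itemset was not frequent there and its old count was therefore never stored.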

The Proposed Algorithm – IMARO (cont.)
Optimization 1: Candidate pruning
- Any candidate itemset that contains both an item and any one of its extensions (generalized item or component) is pruned.
- E.g., {Epson EPL, Printer} and {Epson EPL, Toner Cartridge*}
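A minimal sketch of this pruning rule follows. The `EXTENSIONS` map is an assumed ontology fragment covering only the two pruned examples above, with the starred name kept as written there; `prune_candidates` is an illustrative name.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

# Assumed fragment: extensions (generalized item or component) of an item.
EXTENSIONS: Dict[str, Set[str]] = {
    "Epson EPL": {"Printer", "Toner Cartridge*"},
}


def prune_candidates(candidates: Iterable[Iterable[str]],
                     extensions: Dict[str, Set[str]]) -> List[FrozenSet[str]]:
    """Drop every candidate holding an item together with one of its extensions."""
    kept: List[FrozenSet[str]] = []
    for candidate in candidates:
        itemset = frozenset(candidate)
        redundant = any(extensions.get(item, set()) & itemset
                        for item in itemset)
        if not redundant:
            kept.append(itemset)
    return kept


candidates = [
    {"Epson EPL", "Printer"},            # item with its generalized item
    {"Epson EPL", "Toner Cartridge*"},   # item with its component
    {"Epson EPL", "PC"},                 # unrelated items, kept
]
kept = prune_candidates(candidates, EXTENSIONS)
```

Such candidates are redundant because an item always co-occurs with its own extensions in the extended database, so the itemset has the same support as the item alone.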

The Proposed Algorithm – IMARO (cont.)
Optimization 2: Extension filtering
- The extension of an item can be added only if that item appears in at least one candidate itemset currently being counted.
[Figure: item ontology with items Printer, HP DeskJet, Epson EPL, Photo Conductor, Toner Cartridge, Ink Cartridge, PC, IBM TP, Sony VAIO, RAM 256MB, IBM 60GB, S 60GB]
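One possible reading of this rule, assumed here, is that an extended item is worth adding to a transaction only when it occurs in some candidate of the current pass, since otherwise it cannot affect any count. The sketch below implements that reading; the candidates are hypothetical and `filter_extensions` is an illustrative name.

```python
from typing import Iterable, Set


def filter_extensions(extended_items: Set[str],
                      candidates: Iterable[Iterable[str]]) -> Set[str]:
    """Keep only the extended items that occur in some current candidate."""
    items_in_candidates: Set[str] = set()
    for candidate in candidates:
        items_in_candidates.update(candidate)
    return extended_items & items_in_candidates


# Extended items of transaction 5 from the database-extension example.
extended_items = {"PC", "RAM 256MB", "IBM 60GB", "Printer", "Ink Cartridge"}
# Hypothetical candidates being counted in the current pass.
current_candidates = [{"PC", "Printer"}, {"Printer", "RAM 256MB"}]
needed = filter_extensions(extended_items, current_candidates)
```

Filtering this way shortens the extended transactions that must be matched against the candidates in each pass.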

Performance Evaluation
- Compared with applying our proposed algorithms, AROC and AROS, to the whole database DB + db with T
- Test data: a synthetic dataset generated by the IBM data generator with an artificially built ontology
  Parameter  Description                      Default value
  |DB|       Number of original transactions  200,000
  |t|        Average size of transactions     20
  N          Number of items                  362
  R          Number of groups                 30
  L          Number of levels                 4
  F          Fanout                           5

Performance Evaluation (cont.)
[Figure: varying minimum supports, |db| = 40,000]

Performance Evaluation (cont.)
[Figure: varying incremental transaction size, ms = 1.5%]

Conclusions
- We have investigated the problem of updating ontology-exploiting association rules when new transactions are inserted into the database.
- An Apriori-based algorithm is proposed.
- Other issues:
  - More complicated semantic relationships and knowledge
  - Non-uniform minimum support: a generalized item or composite item occurs more frequently
  - Towards a total solution for evolving environments: ontology evolution, database update
  - Interactive refinement of support constraints
  - …

Thanks for your attention!

Conclusions (cont.)
[Figure: taxonomy of semantic relationships]
*Source: Veda C. Storey, VLDB Journal, 1993

Related Work
Comparison with previous work (model of incremental maintenance of association rules)
  Contributors            Type of database update               Type of ontology
  Srikant & Agrawal, 1995 none                                  classification
  Han & Fu, 1995          none                                  classification
  Cheung et al., 1996     insertion                             classification
  Cheung et al., 1997     insertion, deletion and modification  none
  Jea et al., 2003        none                                  composition
  Chien et al., 2005      none                                  classification & composition