TAR: Temporal Association Rules on Evolving Numerical Attributes Wei Wang, Jiong Yang, and Richard Muntz Speaker: Sarah Chan CSIS DB Seminar May 7, 2003.


Presentation Outline
• Introduction
• Problem Definition
• Mining Algorithms
• Performance Evaluation
• Conclusions

Introduction
• Association rule mining
  - X ⇒ Y, where X and Y are itemsets
  - Existence of X implies existence of Y
• Earlier work focused on binary attributes and intra-transaction relationships
  - E.g. "ham ⇒ bread" means "A customer who buys ham is likely to buy bread as well"

Introduction
• Earlier work cannot describe relationships such as:
  - If the price of item A falls below $1, then monthly sales of item B rise by a margin between 10K and 20K.
  - People between 35 and 45 with a salary between 80K and 120K are likely to buy a house priced between 300K and 400K within 2 years of marriage.
• Goal: to mine ARs involving numerical attributes and temporal evolution

Problem Definition
• Each object has a set of numerical attributes
• Database: a sequence of snapshots S_1, S_2, ..., S_t of objects
• Evolution: temporal changes of the values of some attribute of some object
  - E.g. evolution of the "salary" attribute over 3 snapshots:
    (salary ∈ [40000, 45000]) → (salary ∈ [47500, 55000]) → (salary ∈ [60000, 70000])

Problem Definition
• TARs (on evolving numerical attributes): ARs that capture correlations among attribute evolutions
• Scope of paper: only correlations of simultaneous evolutions are considered (i.e. attribute evolutions over the same set of snapshots)

Mining Quantitative ARs
• Srikant and Agrawal (SIGMOD'96)
  - Divide the domain of each quantitative attribute into intervals
  - Combine adjacent intervals as long as their support is less than the max-sup threshold
  - Set of items: the original and combined intervals
  - Apply a traditional AR mining algorithm
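The interval-combining step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the helper name `combined_intervals`, the use of raw record counts as support, and the choice to stop extending a range once its count reaches the threshold are all assumptions made for illustration.

```python
def combined_intervals(base_counts, max_sup_count):
    """Enumerate ranges of adjacent base intervals (hypothetical helper).

    base_counts[i] is the number of records in base interval i.
    A range (i, j) covers base intervals i..j inclusive. Every base
    interval is kept; a combined range is kept only while its total
    count stays below max_sup_count.
    """
    items = []
    n = len(base_counts)
    for i in range(n):
        total = 0
        for j in range(i, n):
            total += base_counts[j]
            # Stop extending once the combined support reaches max-sup.
            if j > i and total >= max_sup_count:
                break
            items.append((i, j))
    return items


print(combined_intervals([10, 20, 30, 40], max_sup_count=50))
```

Each returned pair would become one item for the traditional AR mining step.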

Mining Quantitative ARs
• BitOp (Lent et al., ICDE'97)
  - Rule form: A ∧ B ⇒ C, where A and B are quantitative and C is categorical
  - Partition the attribute domain into a 2-D grid
  - For each value of attribute C:
    - Examine the data in each grid cell to see if an AR applies
    - Represent the result by a bit in a 2-D bitmap
  - Combine ARs with adjacent LHS attribute values to form a clustered AR
  - Smoothing: to cover "small holes" in a big cluster
[Figure: 2-D bitmap grid with columns a1 to a6 and rows b1 to b4; marked cells (x) are those where the rule applies]

Mining TARs
• SR algorithm (based on Srikant et al., 1996)
  - Map numerical attribute evolutions to binary attributes
  - Apply any traditional AR mining algorithm
  - Transform binary attribute values in the rules back to numerical ranges
  - Complexity
    - For a numerical attribute quantized into b intervals, O(b^2) items are needed to represent all possible sub-ranges
    - For t snapshots, O(b^2 t) items are needed to encode all possible evolutions
    - Huge number of items, very inefficient

Mining TARs
• LE algorithm (based on BitOp)
  - Quantize domains
  - Map each possible evolution of the RHS attribute into an item
  - For each rule form, generate clustered rules for each possible value of each possible RHS attribute
  - Complexity
    - For a RHS attribute quantized into b intervals, consider its evolution over t snapshots: there could be b^(2t) distinct evolutions
    - The total number of possible evolutions increases exponentially with the number of attributes and the number of snapshots
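To get a feel for how quickly the number of candidate evolutions grows, the sketch below counts them exactly under one assumption: at each snapshot the attribute range is a contiguous run of base intervals. There are b(b+1)/2 such runs, so the count over t snapshots is (b(b+1)/2)^t, which is consistent with the O(b^(2t)) figure on the slide. The function names are hypothetical.

```python
def num_subranges(b):
    """Number of contiguous sub-ranges over b base intervals: b(b+1)/2."""
    return b * (b + 1) // 2


def num_evolutions(b, t):
    """Number of evolutions when one sub-range is chosen per snapshot."""
    return num_subranges(b) ** t


for b, t in [(10, 2), (10, 5), (100, 5)]:
    print(f"b={b}, t={t}: {num_subranges(b)} sub-ranges, "
          f"{num_evolutions(b, t)} evolutions")
```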

Mining TARs
• TAR algorithm

The Model: Evolution and Its Space
• Given attribute A_i and m snapshots
  - Evolution E(A_i) = (A_i ∈ [l_1, u_1]) → (A_i ∈ [l_2, u_2]) → ... → (A_i ∈ [l_m, u_m])
    - Length of evolution = m
  - Evolution space of A_i: m-dimensional space (the jth dimension is associated with the value of A_i at the jth snapshot)

The Model: Evolution and Its Space
• E.g. E_1 = (salary ∈ [40000, 45000]) → (salary ∈ [47500, 55000]) → (salary ∈ [60000, 70000])

The Model: Evolution Conjunction and Its Space
• Given n attributes A_1, A_2, ..., A_n (evolutions of length m)
  - Evolution conjunction: E(A_1) ∧ E(A_2) ∧ ... ∧ E(A_n)
  - Evolution space: (n × m)-dimensional space (each dimension is associated with the value of one attribute at one snapshot)

The Model: TAR
• TAR: X ⇔ Y, where X and Y are evolution conjunctions
  - Symmetric relationship
  - Assumption: Y only contains the evolution of one attribute
    E(A_1) ∧ E(A_2) ∧ ... ∧ E(A_k-1) ∧ E(A_k+1) ∧ ... ∧ E(A_n) ⇔ E(A_k)

The Model: Window
• Window
  - A subsequence of m consecutive snapshots
  - For t available snapshots S_1, S_2, ..., S_t, there are t-m+1 windows of width m
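The window count t-m+1 can be checked with a short Python sketch. The function name `windows` and the 1-based snapshot indices are illustrative choices, not notation from the paper.

```python
def windows(t, m):
    """Return all windows of m consecutive snapshots as 1-based index tuples."""
    if m < 1 or m > t:
        return []
    return [tuple(range(start, start + m)) for start in range(1, t - m + 2)]


ws = windows(5, 3)
print(ws)        # windows over snapshots S_1..S_5 of width 3
print(len(ws))   # equals t - m + 1
```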

The Model: Object History
• Object history of an object o over a window W
  - The sequence of changes of o over W
  - Follows an evolution E(A_i) iff, for each snapshot in the window, the value of A_i in the object history falls into the corresponding interval specified in E(A_i)
  - Follows an evolution conjunction E(A_1) ∧ E(A_2) ∧ ... ∧ E(A_n) iff it follows every evolution in it
  - o satisfies the TAR X ⇔ Y iff it has an object history that follows both X and Y
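The "follows" and "satisfies" definitions above translate almost directly into code. The following is a minimal sketch under assumed representations: an evolution is a list of closed intervals (one per snapshot in the window), an evolution conjunction is a dict from attribute name to evolution, and an object history is a dict from attribute name to its values over the window. All names, and the "distance" values, are hypothetical.

```python
def follows(values, evolution):
    """True iff each value lies in the corresponding closed interval [l, u]."""
    return len(values) == len(evolution) and all(
        low <= v <= high for v, (low, high) in zip(values, evolution)
    )


def follows_conjunction(history, conjunction):
    """True iff the history follows every evolution in the conjunction."""
    return all(
        attr in history and follows(history[attr], evolution)
        for attr, evolution in conjunction.items()
    )


def satisfies(history, x, y):
    """True iff the object history follows both sides X and Y of a TAR."""
    return follows_conjunction(history, x) and follows_conjunction(history, y)


# The salary evolution is the example from the slides; the rest is made up.
x = {"salary": [(40000, 45000), (47500, 55000), (60000, 70000)]}
y = {"distance": [(0, 10), (5, 15), (10, 20)]}
history = {"salary": [42000, 50000, 65000], "distance": [5.0, 8.0, 12.0]}
print(satisfies(history, x, y))
```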

The Model: TAR as Hypercube
• Each object history can be mapped to a point in the evolution space of the involved attributes
• TAR: a hypercube in this space, which contains the set of object histories satisfying the rule
• Support, density and strength thresholds: constraints on the number and distribution of object histories in the hypercube

The Model: Rule Set
• Rule set: the set of all rules r such that r is a specialization of r_max and a generalization of r_min
• Each rule set can summarize a large number of valid rules

Mining TARs: TAR algorithm
• Find density-based (subspace) clusters
• Find all valid rule sets

Mining TARs: TAR algorithm
• Find density-based (subspace) clusters
  - Create base intervals for each attribute
  - Form base cubes from base intervals: n=1, m=1
  - Bottom-up clustering algorithm
  - Density of an evolution cube: the object history concentration of the sparsest base cube in it
  - The Apriori property holds on density
• Find all valid rule sets
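The density definition above (the concentration of the sparsest base cube) can be sketched as a minimum over base cubes. This is only an illustration under assumptions: base cubes are indexed by tuples of base-interval indices, concentration is a raw count of object histories, and an evolution cube is given as an inclusive index range per dimension. The sketch shows one consequence of the definition, namely that a cube contained in another is at least as dense; it does not reproduce the paper's clustering procedure.

```python
from itertools import product


def cube_density(base_counts, cube):
    """Density of an evolution cube = count in its sparsest base cube.

    base_counts maps a tuple of base-interval indices to the number of
    object histories in that base cube (missing cells count as 0).
    cube is a list of inclusive (lo, hi) index ranges, one per dimension.
    """
    cells = product(*(range(lo, hi + 1) for lo, hi in cube))
    return min(base_counts.get(cell, 0) for cell in cells)


# Hypothetical 2-D evolution space with four base cubes.
base_counts = {(0, 0): 5, (0, 1): 3, (1, 0): 4, (1, 1): 0}
small = [(0, 0), (0, 1)]   # covers base cubes (0,0) and (0,1)
large = [(0, 1), (0, 1)]   # covers all four base cubes
print(cube_density(base_counts, small), cube_density(base_counts, large))
```

Because the minimum can only drop when more base cubes are included, a cube that fails the density threshold cannot be part of a larger dense cube, which is what makes bottom-up pruning possible.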

Mining TARs: TAR algorithm
• Find density-based (subspace) clusters
• Find all valid rule sets
  - Make use of the strength and support metrics
    - For rule X ⇔ Y, strength = Sup(X ∧ Y) / (Sup(X) × Sup(Y))
  - Strength is used to prune the search space
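The strength formula can be worked through on toy data. In this sketch Sup is assumed to be a relative frequency (the fraction of object histories following an evolution conjunction), which matches thresholds such as "support 5%" used later in the talk; X and Y are passed as predicates, and the histories are made-up pairs of (salary change, distance change).

```python
def support(histories, predicate):
    """Fraction of object histories for which the predicate holds."""
    return sum(1 for h in histories if predicate(h)) / len(histories)


def strength(histories, x, y):
    """Sup(X and Y) / (Sup(X) * Sup(Y)); returns 0.0 if either side is empty."""
    sup_x = support(histories, x)
    sup_y = support(histories, y)
    if sup_x == 0 or sup_y == 0:
        return 0.0
    sup_xy = support(histories, lambda h: x(h) and y(h))
    return sup_xy / (sup_x * sup_y)


def raise_given(h):
    return h[0] > 0


def moved_away(h):
    return h[1] > 0


correlated = [(1, 1), (1, 1), (-1, -1), (-1, -1)]
independent = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
print(strength(correlated, raise_given, moved_away))
print(strength(independent, raise_given, moved_away))
```

A strength of 1 indicates that X and Y co-occur no more often than independence would predict, so a threshold above 1 (such as the 1.3 used in the experiments) keeps only positively correlated evolutions.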

Pruning with the Strength Threshold
• Property 1: For any rule r, there exists a base rule br_i that is a specialization of r and has strength ≥ that of r.
• Implication
  - Only have to examine rules that are generalizations of base rules in BR (the set of base rules) whose strength ≥ threshold.

Pruning with the Strength Threshold
• Property 2
  - For any two rules r and r' where r' is a specialization of r and strength of r' < strength of r, there exists another base rule br_i that is a specialization of r but not of r', with strength of br_i > strength of r.
• Implication
  - Can skip rules that are generalizations of r' but do not contain any other base rule in BR.

Finding Rule Sets from Each Cluster
• Find BR
• For each subset BR' of BR, explore the corresponding search region starting from rule r (the minimum bounding box of the rules in BR')
  - If strength of r < threshold, ignore the region
  - min-rule
    - If sup of r ≥ threshold, the min-rule is r
    - If sup of r < threshold, search for its valid generalizations within the region; stop when strength < threshold
  - max-rule
    - Search similarly until a rule is found such that each of its generalizations either violates the strength requirement or includes another base rule
• There can be multiple max-rules for a min-rule
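The starting rule r for a subset BR' is described as the minimum bounding box of the base rules in BR'. A minimal sketch of that single step is given below, assuming each rule is a hypercube stored as a list of inclusive (lo, hi) base-interval ranges, one per dimension of the evolution space. The helper names are hypothetical and the subsequent min-rule and max-rule searches are not shown.

```python
def bounding_box(rules):
    """Smallest hypercube enclosing every hypercube in a non-empty list."""
    dims = len(rules[0])
    return [
        (min(r[d][0] for r in rules), max(r[d][1] for r in rules))
        for d in range(dims)
    ]


def is_generalization(general, specific):
    """True iff `general` encloses `specific` in every dimension."""
    return all(
        g_lo <= s_lo and s_hi <= g_hi
        for (g_lo, g_hi), (s_lo, s_hi) in zip(general, specific)
    )


# Two hypothetical base rules in a 2-D evolution space.
br_subset = [[(0, 1), (2, 3)], [(3, 4), (1, 2)]]
r = bounding_box(br_subset)
print(r)
print(all(is_generalization(r, br) for br in br_subset))
```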

Performance Evaluation
• 300 MHz CPU with 128 MB memory
• Three synthetic datasets
  - 100,000 objects with 5 attributes
  - 100 snapshots
  - 500 embedded rules of length 5 or less
  - User-specified thresholds
    - Density: 2 (2 times the average density)
    - Support: 5%
    - Strength: 1.3

Performance Evaluation
• Precision: 100% for all algorithms
• Recall

Performance Evaluation
• Observations
  - TAR is faster than SR and LE
    - Strength is used to prune the search space in TAR
    - TAR searches a smaller set of candidate rules
  - Response time of TAR increases at a slower pace with respect to the number of base intervals

Performance Evaluation
• Real dataset
  - 20,000 objects (persons)
  - 5 attributes: age, title, salary, family status (single, married, head of household), distance between the person's house and a major city
  - 10 snapshots (one per year)
  - Number of base intervals: 100; support 3%, density 2, strength 1.3

Performance Evaluation
• Performance of the TAR algorithm on the real dataset
  - Time taken: 260 s to mine 347 rule sets
  - Examples of TARs
    - People receiving a salary raise tend to move further away from the city center.
    - If people with a salary in the range 70K to 100K get a raise, the raise will likely be in the range 7K to 15K.

Conclusions
• A TAR model is proposed to represent correlations among numerical attribute evolutions.
• A novel approach to mining TARs is introduced: first discover clusters, then efficiently construct rule sets.
• Empirical evaluation shows that the TAR algorithm outperforms the alternative algorithms by a large margin.

References
• W. Wang, J. Yang, and R. Muntz. TAR: Temporal association rules on evolving numerical attributes. ICDE'01.
• R. Srikant and R. Agrawal. Mining quantitative association rules in large relational tables. SIGMOD'96.
• B. Lent, A. Swami, and J. Widom. Clustering association rules. ICDE'97.
• R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. SIGMOD'98.