Published by Roberta French, modified over 9 years ago
1
TAR: Temporal Association Rules on Evolving Numerical Attributes
Wei Wang, Jiong Yang, and Richard Muntz
Speaker: Sarah Chan
CSIS DB Seminar, May 7, 2003
2
Presentation Outline
- Introduction
- Problem Definition
- Mining Algorithms
- Performance Evaluation
- Conclusions
3
Introduction
- Association rule mining: $X \Rightarrow Y$ (X and Y are itemsets)
  - Existence of X implies existence of Y
- Earlier work focused on binary attributes and intra-transaction relationships
  - E.g. "$\text{ham} \Rightarrow \text{bread}$" means "A customer who buys ham is likely to buy bread as well"
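To make the rule notation concrete, here is a minimal Python sketch of the classic support and confidence measures for a rule such as $\text{ham} \Rightarrow \text{bread}$. The basket data and the function names `support` and `confidence` are hypothetical; these two measures come from standard association rule mining, whereas TAR itself relies on support, density and strength.

```python
from typing import List, Set


def support(transactions: List[Set[str]], itemset: Set[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions: List[Set[str]], lhs: Set[str], rhs: Set[str]) -> float:
    """Sup(X together with Y) / Sup(X): how often Y appears when X does."""
    return support(transactions, lhs | rhs) / support(transactions, lhs)


# Hypothetical market baskets.
baskets = [
    {"ham", "bread", "milk"},
    {"ham", "bread"},
    {"ham", "eggs"},
    {"bread", "milk"},
]

print(support(baskets, {"ham", "bread"}))        # 0.5
print(confidence(baskets, {"ham"}, {"bread"}))   # ham => bread, about 0.667
```

Here 2 of 4 baskets contain both items, and 2 of the 3 baskets containing ham also contain bread.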
4
Introduction
- Traditional ARs cannot describe relationships such as:
  - If the price of item A falls below $1, then monthly sales of item B rise by a margin between 10K and 20K.
  - People between 35 and 45 with a salary between 80K and 120K are likely to buy a house whose price is between 300K and 400K within 2 years of marriage.
- Goal: to mine ARs involving numerical attributes and temporal evolution
5
Problem Definition
- Each object has a set of numerical attributes
- Database: a sequence of snapshots $S_1, S_2, \ldots, S_t$ of objects
- Evolution: temporal changes in the values of some attribute of some object
  - E.g. evolution of the "salary" attribute over 3 snapshots: $(\text{salary} \in [40000, 45000]) \wedge (\text{salary} \in [47500, 55000]) \wedge (\text{salary} \in [60000, 70000])$
6
Problem Definition
- TARs (on evolving numerical attributes): ARs that capture correlations among attribute evolutions
- Scope of the paper: only correlations of simultaneous evolutions are considered (i.e. attribute evolutions over the same set of snapshots)
7
Mining Quantitative ARs
- Srikant and Agrawal (SIGMOD’96)
  - Divide the domain of each quantitative attribute into intervals
  - Combine adjacent intervals as long as their support is less than the max-sup threshold
  - The set of items consists of the original and the combined intervals
  - Apply a traditional AR mining algorithm
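The interval-combining step can be sketched as follows. This is a simplified illustration, assuming the base intervals are already fixed and their record counts known; the name `candidate_items` is hypothetical, and the strict `< max_sup` test follows the slide's wording.

```python
from typing import List, Tuple


def candidate_items(counts: List[int], n_records: int, max_sup: float) -> List[Tuple[int, int]]:
    """Return (first, last) base-interval index pairs that become items.

    counts[i] is the number of records falling in base interval i.
    Every single base interval is kept; a run of adjacent intervals is kept
    only while its combined support stays below max_sup.
    """
    items = []
    for first in range(len(counts)):
        total = 0
        for last in range(first, len(counts)):
            total += counts[last]
            # Extending a run can only raise its support, so stop at the first violation.
            if last > first and total / n_records >= max_sup:
                break
            items.append((first, last))
    return items


# Four equally populated base intervals, max-sup = 60%.
print(candidate_items([10, 10, 10, 10], n_records=40, max_sup=0.6))
# [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2), (2, 3), (3, 3)]
```

Runs of three intervals (75% support) are rejected, so only singletons and adjacent pairs become items for the subsequent AR mining pass.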
8
Mining Quantitative ARs
- BitOp (Lent et al., ICDE’97)
  - Rule form: $A \wedge B \Rightarrow C$, where A and B are quantitative and C is categorical
  - Partition the attribute domains into a 2-D grid
  - For each value of attribute C:
    - Examine the data in each grid cell to see if the AR applies
    - Represent the result by a bit in a 2-D bitmap
  - Combine ARs with adjacent LHS attribute values to form a clustered AR
  - Smoothing: to cover "small holes" in a big cluster
[Figure: example bitmap over grid cells a1 to a6 (attribute A) and b1 to b4 (attribute B), with cells marked x]
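The bitmap-construction step for one value c of C might look like the sketch below. It assumes records are already binned into grid cells and that a cell's bit is set when the rule meets conventional support and confidence thresholds in that cell; `rule_bitmap` and its parameters are hypothetical, and BitOp's clustering and smoothing steps are not shown.

```python
from collections import Counter
from typing import Hashable, List, Tuple


def rule_bitmap(records: List[Tuple[int, int, Hashable]], n_a: int, n_b: int,
                c_value: Hashable, min_sup: float, min_conf: float) -> List[List[int]]:
    """Return an n_b x n_a grid of bits, one per (a_bin, b_bin) cell.

    records holds (a_bin, b_bin, c) triples. A bit is 1 when the rule
    (A in a_bin) and (B in b_bin) => (C = c_value) meets both thresholds.
    """
    in_cell = Counter((a, b) for a, b, _ in records)
    hits = Counter((a, b) for a, b, c in records if c == c_value)
    n = len(records)
    grid = []
    for b in range(n_b):
        row = []
        for a in range(n_a):
            sup = hits[(a, b)] / n
            conf = hits[(a, b)] / in_cell[(a, b)] if in_cell[(a, b)] else 0.0
            row.append(1 if sup >= min_sup and conf >= min_conf else 0)
        grid.append(row)
    return grid


records = [(0, 0, "x"), (0, 0, "x"), (1, 0, "y"), (1, 1, "x")]
print(rule_bitmap(records, n_a=2, n_b=2, c_value="x", min_sup=0.25, min_conf=0.5))
# [[1, 0], [0, 1]]
```

Clustering would then look for rectangles of 1 bits in this grid, so each rectangle becomes one clustered rule.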
9
Mining TARs
- SR algorithm (based on Srikant and Agrawal, 1996)
  - Map numerical attribute evolutions to binary attributes
  - Apply any traditional AR mining algorithm
  - Transform binary attribute values in rules back to numerical ranges
- Complexity
  - A numerical attribute quantized into b intervals needs $O(b^2)$ items to represent all possible sub-ranges
  - For t snapshots, $O(b^{2t})$ items are needed to encode all possible evolutions
  - Huge number of items: very inefficient
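The item counts behind this complexity argument can be checked with a short calculation. A quantized attribute with b base intervals has $b(b+1)/2$ contiguous sub-ranges, which is $O(b^2)$; an evolution picks one sub-range per snapshot, giving $(b(b+1)/2)^t = O(b^{2t})$ combinations. The helper names are illustrative only.

```python
def subrange_count(b: int) -> int:
    """Contiguous sub-ranges [i, j] with 1 <= i <= j <= b."""
    return b * (b + 1) // 2


def evolution_count(b: int, t: int) -> int:
    """Distinct single-attribute evolutions: one sub-range per snapshot."""
    return subrange_count(b) ** t


for t in (1, 2, 3):
    print(t, evolution_count(10, t))
# 1 55
# 2 3025
# 3 166375
```

Even with only 10 intervals and 3 snapshots, a single attribute already needs over 166,000 items.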
10
Mining TARs
- LE algorithm (based on BitOp)
  - Quantize domains
  - Map each possible evolution of the RHS attribute into an item
  - For each rule form, generate clustered rules for each possible value of each possible RHS attribute
- Complexity
  - For a RHS attribute quantized into b intervals, consider its evolution over t snapshots
  - There could be $b^{2t}$ distinct evolutions
  - The total number of possible evolutions increases exponentially with the number of attributes and the number of snapshots
11
Mining TARs
- TAR algorithm
12
The Model: Evolution and Its Space
- Given attribute $A_i$ and m snapshots
- Evolution $E(A_i) = (A_i \in [l_1, u_1]) \wedge (A_i \in [l_2, u_2]) \wedge \cdots \wedge (A_i \in [l_m, u_m])$
- Length of the evolution = m
- Evolution space of $A_i$: an m-dimensional space (the jth dimension is associated with the value of $A_i$ at the jth snapshot)
13
The Model: Evolution and Its Space
- E.g. $E_1 = (\text{salary} \in [40000, 45000]) \wedge (\text{salary} \in [47500, 55000]) \wedge (\text{salary} \in [60000, 70000])$
14
The Model: Evolution Conj. and Its Space
- Given n attributes $A_1, A_2, \ldots, A_n$ (evolutions of length m)
- Evolution conjunction: $E(A_1) \wedge E(A_2) \wedge \cdots \wedge E(A_n)$
- Evolution space: an $n \times m$-dimensional space (each dimension is associated with the value of one attribute at one snapshot)
15
The Model: TAR
- TAR: $X \Rightarrow Y$ (X and Y are evolution conjunctions)
- Symmetric relationship
- Assumption: Y contains the evolution of only one attribute:
  $E(A_1) \wedge \cdots \wedge E(A_{k-1}) \wedge E(A_{k+1}) \wedge \cdots \wedge E(A_n) \Rightarrow E(A_k)$
16
The Model: Window
- Window: a subsequence of m consecutive snapshots
- For t available snapshots $S_1, S_2, \ldots, S_t$, there are $t - m + 1$ windows of width m
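Enumerating the windows is a simple sliding slice; this sketch (function name hypothetical) confirms the $t - m + 1$ count.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def windows(snapshots: Sequence[T], m: int) -> List[Sequence[T]]:
    """All windows of m consecutive snapshots; there are t - m + 1 of them."""
    return [snapshots[i:i + m] for i in range(len(snapshots) - m + 1)]


print(windows(["S1", "S2", "S3", "S4", "S5"], 3))
# [['S1', 'S2', 'S3'], ['S2', 'S3', 'S4'], ['S3', 'S4', 'S5']]
```

With the 100 snapshots of the synthetic datasets and m = 5, this yields 96 windows per object.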
17
The Model: Object History
- Object history of an object o over a window W: the sequence of changes of o over W
- It follows an evolution $E(A_i)$ iff, for each snapshot in the window, the value of $A_i$ in the object history falls into the corresponding interval specified in $E(A_i)$
- It follows an evolution conjunction $E(A_1) \wedge E(A_2) \wedge \cdots \wedge E(A_n)$ iff it follows every evolution in it
- o satisfies the TAR $X \Rightarrow Y$ iff it has an object history that follows both X and Y
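These membership tests can be sketched as below, assuming closed intervals and representing one object history as a dict from attribute name to its values over a single window. All names are hypothetical. Since an object satisfies a TAR if some history follows X and Y, a full check would call `satisfies_tar` on each of the object's windows.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]        # closed interval [l_j, u_j]
Evolution = List[Interval]            # one interval per snapshot in the window
History = Dict[str, List[float]]      # attribute -> values over the window


def follows(values: List[float], evolution: Evolution) -> bool:
    """True iff every value falls into the interval for its snapshot."""
    return len(values) == len(evolution) and all(
        lo <= v <= hi for v, (lo, hi) in zip(values, evolution))


def follows_conjunction(history: History, conj: Dict[str, Evolution]) -> bool:
    """True iff the history follows every evolution in the conjunction."""
    return all(follows(history[attr], evo) for attr, evo in conj.items())


def satisfies_tar(history: History, lhs: Dict[str, Evolution],
                  rhs: Dict[str, Evolution]) -> bool:
    """True iff this one window's history follows both X (lhs) and Y (rhs)."""
    return follows_conjunction(history, lhs) and follows_conjunction(history, rhs)


e1 = [(40000, 45000), (47500, 55000), (60000, 70000)]   # E1 from the salary example
print(follows([42000, 50000, 65000], e1))   # True
print(follows([42000, 56000, 65000], e1))   # False: 56000 is outside [47500, 55000]
```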
18
The Model: TAR as Hypercube
- Each object history can be mapped to a point in the evolution space of the involved attributes
- A TAR is a hypercube in this space, containing the set of object histories that satisfy the rule
- Support, density and strength thresholds: constraints on the number and distribution of object histories in the hypercube
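A minimal sketch of the mapping and of counting the histories inside a rule's hypercube, assuming the dimension order "attribute, then snapshot" and closed bounds. The names are hypothetical, and the code counts raw support only, without the density or strength checks.

```python
from typing import Dict, List, Sequence, Tuple

Cube = List[Tuple[float, float]]  # (lo, hi) bounds, one per dimension


def to_point(history: Dict[str, List[float]], attrs: Sequence[str]) -> Tuple[float, ...]:
    """Concatenate each attribute's m values: a point in n*m-dimensional space."""
    return tuple(v for a in attrs for v in history[a])


def in_cube(point: Tuple[float, ...], cube: Cube) -> bool:
    """True iff the point lies inside the hypercube on every dimension."""
    return len(point) == len(cube) and all(lo <= x <= hi for x, (lo, hi) in zip(point, cube))


def support_count(histories: List[Dict[str, List[float]]], attrs: Sequence[str],
                  cube: Cube) -> int:
    """Number of object histories whose points fall inside the hypercube."""
    return sum(in_cube(to_point(h, attrs), cube) for h in histories)


# Two attributes, windows of m = 2 snapshots: a 4-dimensional evolution space.
histories = [
    {"salary": [50, 60], "dist": [5, 8]},
    {"salary": [50, 52], "dist": [5, 5]},
    {"salary": [90, 95], "dist": [1, 1]},
]
cube = [(40, 70), (55, 70), (0, 10), (0, 10)]  # salary@1, salary@2, dist@1, dist@2
print(support_count(histories, ["salary", "dist"], cube))  # 1
```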
19
The Model: Rule Set
- Rule set: the set of all rules r such that r is a specialization of $r_{max}$ and a generalization of $r_{min}$
- Each rule set can summarize a large number of valid rules
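Rule-set membership can be illustrated by treating a rule as its hypercube, one (lo, hi) interval per dimension. The sketch assumes, for illustration, that r is a specialization of r' exactly when each of r's intervals lies inside the matching interval of r'; the function names are hypothetical.

```python
from typing import List, Tuple

Rule = List[Tuple[float, float]]  # the rule's hypercube: one (lo, hi) per dimension


def is_specialization(r: Rule, r_general: Rule) -> bool:
    """Assumed definition: r's box is contained in r_general's box."""
    return len(r) == len(r_general) and all(
        g_lo <= lo and hi <= g_hi for (lo, hi), (g_lo, g_hi) in zip(r, r_general))


def in_rule_set(r: Rule, r_min: Rule, r_max: Rule) -> bool:
    """r specializes r_max and generalizes r_min."""
    return is_specialization(r, r_max) and is_specialization(r_min, r)


r_min = [(40, 45), (2, 3)]
r_max = [(30, 60), (0, 5)]
print(in_rule_set([(35, 50), (1, 4)], r_min, r_max))   # True
print(in_rule_set([(35, 65), (1, 4)], r_min, r_max))   # False: 65 exceeds r_max's 60
```

Under this reading, a single $(r_{min}, r_{max})$ pair stands for every box nested between the two, which is how one rule set can summarize many valid rules.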
20
Mining TARs: TAR algorithm
1. Find density-based (subspace) clusters
2. Find all valid rule sets
21
Mining TARs: TAR algorithm
1. Find density-based (subspace) clusters
   - Create base intervals for each attribute
   - Form base cubes from base intervals: $n = 1$, $m = 1$
   - Bottom-up clustering algorithm
   - Density of an evolution cube: the object-history concentration of the sparsest base cube in it
   - The Apriori property holds on density
2. Find all valid rule sets
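The density measure can be sketched as follows, assuming equal-width base intervals in every dimension and a density expressed relative to an average count per base cube (so a threshold of 2 would mean twice the average). `base_cube_counts`, `cube_density` and this averaging convention are assumptions for illustration; the bottom-up clustering itself is not shown.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]  # index of a base cube: one base-interval index per dimension


def base_cube_counts(points: List[Sequence[float]], lows: Sequence[float],
                     widths: Sequence[float]) -> Counter:
    """Count object-history points per base cube (equal-width base intervals)."""
    return Counter(
        tuple(int((x - lo) // w) for x, lo, w in zip(p, lows, widths)) for p in points)


def cube_density(counts: Dict[Cell, int], ranges: Sequence[Tuple[int, int]],
                 avg: float) -> float:
    """Concentration of the sparsest base cube in an evolution cube, relative to avg.

    ranges gives an inclusive (first, last) base-interval index range per dimension.
    """
    cells = product(*(range(first, last + 1) for first, last in ranges))
    return min(counts.get(c, 0) for c in cells) / avg


points = [(0.5, 0.5), (0.6, 0.2), (1.5, 0.5), (1.2, 0.1), (1.3, 0.9)]
counts = base_cube_counts(points, lows=(0, 0), widths=(1, 1))
print(cube_density(counts, [(0, 1), (0, 0)], avg=1.0))  # 2.0: sparsest cell holds 2 points
print(cube_density(counts, [(0, 1), (0, 1)], avg=1.0))  # 0.0: some cells are empty
```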
22
Mining TARs: TAR algorithm
1. Find density-based (subspace) clusters
2. Find all valid rule sets
   - Make use of the strength and support metrics
   - For rule $X \Rightarrow Y$, $\mathrm{strength} = \frac{Sup(X \wedge Y)}{Sup(X) \times Sup(Y)}$
   - Strength is used to prune the search space
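A small worked example of the strength formula, assuming supports are measured as fractions of all object histories; the counts and the function name are hypothetical.

```python
def strength(n_total: int, n_x: int, n_y: int, n_xy: int) -> float:
    """strength(X => Y) = Sup(X and Y) / (Sup(X) * Sup(Y)), with supports as fractions.

    A value of 1 corresponds to X and Y occurring independently; larger values
    mean they co-occur more often than independence would predict.
    """
    sup_x = n_x / n_total
    sup_y = n_y / n_total
    sup_xy = n_xy / n_total
    return sup_xy / (sup_x * sup_y)


# Hypothetical counts: 1000 histories, 200 follow X, 250 follow Y, 80 follow both.
s = strength(1000, 200, 250, 80)
print(round(s, 3))   # 1.6
print(s >= 1.3)      # True
```

Here independence would predict $0.2 \times 0.25 = 0.05$ joint support, but 0.08 is observed, so the rule passes the strength threshold of 1.3 used in the experiments.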
23
Pruning with the Strength Threshold
- Property 1: For any rule r, there exists a base rule $br_i$ which is a specialization of r and whose strength is $\geq$ that of r.
- Implication: we only have to examine rules that are generalizations of rules in BR (the set of base rules) whose strength is $\geq$ the threshold.
24
Pruning with the Strength Threshold
- Property 2: For any two rules r and r′, where r′ is a specialization of r and the strength of r′ < the strength of r, there exists another base rule $br_i$ which is a specialization of r but not of r′, and the strength of $br_i$ > the strength of r.
- Implication: we can skip rules which are generalizations of r′ but which do not contain any other base rule in BR.
25
Finding Rule Sets from Each Cluster
- Find BR
- For each subset BR′ of BR, explore the corresponding search region from rule r (the minimum bounding box of the rules in BR′)
  - If the strength of r < threshold, ignore the region
  - min-rule:
    - If the support of r $\geq$ threshold, min-rule = r
    - If the support of r < threshold, search for its valid generalizations within the region; stop when strength < threshold
  - max-rule:
    - Search similarly until a rule is found such that all of its generalizations either violate the strength requirement or include another base rule
    - There can be multiple max-rules for a min-rule
26
Performance Evaluation
- 300 MHz CPU with 128 MB memory
- Three synthetic datasets
  - 100,000 objects with 5 attributes
  - 100 snapshots
  - 500 embedded rules of length 5 or less
- User-specified thresholds
  - Density: 2 (2 times the average density)
  - Support: 5%
  - Strength: 1.3
27
Performance Evaluation
- Precision: 100% for all algorithms
- Recall: [Figure: recall of the SR, LE and TAR algorithms]
28
Performance Evaluation
- Observations
  - TAR is faster than SR and LE
    - TAR uses strength to prune the search space, so it searches a smaller set of candidate rules
  - The response time of TAR increases at a slower pace as the number of base intervals grows
29
Performance Evaluation
- Real dataset
  - 20,000 objects (persons)
  - 5 attributes: age, title, salary, family status (single, married, head of household), distance between the person’s house and a major city
  - 10 snapshots (one per year)
  - Number of base intervals: 100; support 3%, density 2, strength 1.3
30
Performance Evaluation
- Performance of the TAR algorithm on the real dataset
  - Time taken: 260 s to mine 347 rule sets
- Examples of TARs
  - People receiving a salary raise tend to move further away from the city center.
  - If people with a salary in the range 70K to 100K get a raise, the raise will likely be between 7K and 15K.
31
Conclusions
- A TAR model is proposed to represent correlations among numerical attribute evolutions.
- A novel approach to mining TARs, which first discovers clusters and then efficiently constructs rule sets, is introduced.
- Empirical evaluation shows that the TAR algorithm outperforms alternative algorithms by a large margin.
32
References
- W. Wang, J. Yang, and R. Muntz. TAR: Temporal association rules on evolving numerical attributes. ICDE’01.
- R. Srikant and R. Agrawal. Mining quantitative association rules in large relational tables. SIGMOD’96.
- B. Lent, A. Swami, and J. Widom. Clustering association rules. ICDE’97.
- R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. SIGMOD’98.