TAR: Temporal Association Rules on Evolving Numerical Attributes Wei Wang, Jiong Yang, and Richard Muntz Speaker: Sarah Chan CSIS DB Seminar May 7, 2003.

Presentation Outline
• Introduction
• Problem Definition
• Mining Algorithms
• Performance Evaluation
• Conclusions

Introduction
• Association rule mining
  - X ⇒ Y (X, Y are itemsets)
  - Existence of X implies existence of Y
• Earlier work focused on binary attributes and intra-transaction relationships
  - E.g. “ham ⇒ bread” means “A customer who buys ham is likely to buy bread as well”

Introduction
• Such rules cannot describe relationships such as:
  - If the price of item A falls below $1, then monthly sales of item B rise by a margin between 10K and 20K.
  - People between 35 and 45 with a salary between 80K and 120K are likely to buy a house priced between 300K and 400K within 2 years of marriage.
• Goal: to mine ARs involving numerical attributes and temporal evolution

Problem Definition
• Each object has a set of numerical attributes
• Database: a sequence of snapshots S_1, S_2, …, S_t of objects
• Evolution: temporal changes of the values of some attribute of some object
  - E.g. evolution of the “salary” attr. over 3 snapshots: (salary ∈ [40000, 45000]) → (salary ∈ [47500, 55000]) → (salary ∈ [60000, 70000])

Problem Definition
• TARs (on evolving numerical attributes): ARs that capture correlations among attr. evolutions
• Scope of paper: only correlations of simultaneous evolutions are considered (i.e. attr. evolutions over the same set of snapshots)

Mining Quantitative ARs
• Srikant and Agrawal (SIGMOD’96)
  - Divide the domain of each quantitative attr. into intervals
  - Combine adjacent intervals as long as their support is less than the max-sup threshold
  - A set of items: original and combined intervals
  - Apply a traditional AR mining algorithm

Mining Quantitative ARs
• BitOp (Lent et al., ICDE’97)
  - Rule form: A ∧ B ⇒ C (A, B quantitative; C categorical)
  - Partition the attribute domain into a 2-D grid
  - For each value of attr. C
    - Examine the data in each grid cell to see if the AR applies
    - Represent the result by a bit in a 2-D bitmap
  - Combine ARs with adjacent LHS attr. values to form a clustered AR
  - Smoothing: to cover “small holes” in a big cluster
[Figure: 2-D bitmap with intervals a1 to a6 on one axis and b1 to b4 on the other; an x marks each cell where the rule holds]

Mining TARs
• SR algorithm (based on Srikant et al., 1996)
  - Map numerical attribute evolutions to binary attrs.
  - Apply any traditional AR mining algorithm
  - Transform binary attr. values in the rules back to numerical ranges
  - Complexity
    - For a numerical attr. quantized into b intervals, O(b^2) items are needed to represent all possible sub-ranges
    - For t snapshots, O(b^(2t)) items are needed to encode all possible evolutions
    - Huge number of items, very inefficient
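The item counts on this slide can be checked with a few lines of arithmetic. The sketch below assumes that a sub-range is any contiguous run of base intervals and that an evolution picks one sub-range per snapshot; the function names are illustrative and do not come from the paper.

```python
def num_subranges(b: int) -> int:
    """Contiguous sub-ranges over b base intervals: b*(b+1)/2, i.e. O(b^2)."""
    return b * (b + 1) // 2


def num_evolutions(b: int, t: int) -> int:
    """Evolutions over t snapshots, one sub-range per snapshot: O(b^(2t))."""
    return num_subranges(b) ** t


if __name__ == "__main__":
    # Even modest settings produce a very large item space.
    for b, t in [(10, 1), (10, 3), (100, 3)]:
        print(f"b={b}, t={t}: {num_subranges(b)} sub-ranges, "
              f"{num_evolutions(b, t)} evolutions")
```

With b = 100 and t = 3 the count already exceeds 10^11, which is why mapping every evolution to its own item is impractical.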

Mining TARs
• LE algorithm (based on BitOp)
  - Quantize domains
  - Map each possible evolution of the RHS attr. into an item
  - For each rule form, generate clustered rules for each possible value of each possible RHS attr.
  - Complexity
    - For a RHS attr. quantized into b intervals, consider its evolution over t snapshots: there could be b^(2t) distinct evolutions
    - Total no. of possible evolutions increases exponentially with the no. of attrs. and the no. of snapshots

Mining TARs
• TAR algorithm

The Model: Evolution and Its Space
• Given attr. A_i and m snapshots
  - Evolution E(A_i) = (A_i ∈ [l_1, u_1]) → (A_i ∈ [l_2, u_2]) → … → (A_i ∈ [l_m, u_m])
  - Length of evolution = m
• Evolution space of A_i: m-dimensional space (the jth dimension is associated with the value of A_i at the jth snapshot)

The Model: Evolution and Its Space
• E.g. E_1 = (salary ∈ [40000, 45000]) → (salary ∈ [47500, 55000]) → (salary ∈ [60000, 70000])

The Model: Evolution Conj. and Its Space
• Given n attrs. A_1, A_2, …, A_n (evolutions of length m)
  - Evolution conjunction: E(A_1) ∧ E(A_2) ∧ … ∧ E(A_n)
  - Evolution space: (n × m)-dimensional space (each dimension is associated with the value of one attr. at one snapshot)

The Model: TAR
• TAR: X ⇔ Y (X, Y are evolution conjunctions)
  - Symmetric relationship
  - Assumption: Y only contains the evolution of one attr.
  - E(A_1) ∧ E(A_2) ∧ … ∧ E(A_{k-1}) ∧ E(A_{k+1}) ∧ … ∧ E(A_n) ⇔ E(A_k)

The Model: Window
• Window
  - Subsequence of m consecutive snapshots
  - For t available snapshots S_1, S_2, …, S_t, there are t-m+1 windows of width m
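The window count can be illustrated with a short sketch. It assumes the snapshots are held in an ordinary Python sequence; the helper name `windows` is made up for this example.

```python
from typing import List, Sequence, TypeVar

S = TypeVar("S")


def windows(snapshots: Sequence[S], m: int) -> List[Sequence[S]]:
    """Return every run of m consecutive snapshots (t - m + 1 of them)."""
    t = len(snapshots)
    if m < 1 or m > t:
        return []  # no window of that width exists
    return [snapshots[i:i + m] for i in range(t - m + 1)]


if __name__ == "__main__":
    snaps = ["S1", "S2", "S3", "S4", "S5"]
    for w in windows(snaps, 3):
        print(w)
```

For t = 5 and m = 3 this yields 5 - 3 + 1 = 3 windows, starting at S1, S2 and S3.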

The Model: Object History
• Object history of an object o over a window W
  - The sequence of changes of o over W
  - Follows an evolution E(A_i) iff, for each snapshot in the window, the value of A_i in the object history falls into the corr. interval specified in E(A_i)
  - Follows an evolution conjunction E(A_1) ∧ E(A_2) ∧ … ∧ E(A_n) iff it follows every evolution in it
  - o satisfies the TAR X ⇔ Y iff it has an object history that follows both X and Y
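The "follows" test is a per-snapshot interval check, which the sketch below spells out. It assumes closed intervals and a dictionary-based representation of histories and conjunctions; both choices, and all names, are illustrative rather than taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[float, float]   # closed interval [lo, hi] (an assumption)
Evolution = List[Interval]       # one interval per snapshot in the window
History = Dict[str, List[float]]  # attribute name -> values over the window


def follows_evolution(values: Sequence[float], evolution: Evolution) -> bool:
    """True iff each value falls into the interval for its snapshot."""
    if len(values) != len(evolution):
        return False
    return all(lo <= v <= hi for v, (lo, hi) in zip(values, evolution))


def follows_conjunction(history: History,
                        conjunction: Dict[str, Evolution]) -> bool:
    """True iff the history follows every evolution in the conjunction."""
    return all(
        attr in history and follows_evolution(history[attr], evo)
        for attr, evo in conjunction.items()
    )


if __name__ == "__main__":
    salary_evo = [(40000, 45000), (47500, 55000), (60000, 70000)]
    print(follows_evolution([42000, 50000, 65000], salary_evo))  # inside
    print(follows_evolution([42000, 46000, 65000], salary_evo))  # 46000 misses
```

Under this representation, an object satisfies a rule when one of its histories passes `follows_conjunction` for both sides of the rule.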

The Model: TAR as Hypercube
• Each object history can be mapped to a point in the evolution space of the involved attributes
• TAR: a hypercube in this space, which contains the set of object histories satisfying the rule
• Support, density & strength thresholds: constraints on the number & distribution of object histories in the hypercube

The Model: Rule Set
• Rule set: the set of all rules r s.t. r is a specialization of r_max and a generalization of r_min
• Each rule set can summarize a large no. of valid rules
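Membership in a rule set can be pictured as a containment test between hypercubes. The sketch below assumes that "r is a specialization of g" means every interval of r lies inside the matching interval of g; that reading, the dictionary layout and the function names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]
Box = Dict[str, List[Interval]]  # attribute -> one interval per snapshot


def is_specialization(r: Box, g: Box) -> bool:
    """True iff the hypercube r is enclosed by the hypercube g."""
    if r.keys() != g.keys():
        return False
    for attr in r:
        if len(r[attr]) != len(g[attr]):
            return False
        for (r_lo, r_hi), (g_lo, g_hi) in zip(r[attr], g[attr]):
            if r_lo < g_lo or r_hi > g_hi:
                return False
    return True


def in_rule_set(r: Box, r_min: Box, r_max: Box) -> bool:
    """r belongs to the set iff r_min is inside r and r is inside r_max."""
    return is_specialization(r, r_max) and is_specialization(r_min, r)


if __name__ == "__main__":
    r_min = {"salary": [(40, 45), (50, 55)]}
    r_max = {"salary": [(35, 50), (45, 60)]}
    r = {"salary": [(38, 46), (48, 57)]}
    print(in_rule_set(r, r_min, r_max))
```

Storing only the pair (r_min, r_max) is what lets one rule set stand for every hypercube nested between the two.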

Mining TARs: TAR algorithm
• Find density-based (subspace) clusters
• Find all valid rule sets

Mining TARs: TAR algorithm
• Find density-based (subspace) clusters
  - Create base intervals for each attribute
  - Form base cubes from base intervals: n=1, m=1
  - Bottom-up clustering algorithm
  - Density of an evolution cube: object history concentration of the sparsest base cube in it
  - The Apriori property holds on density
• Find all valid rule sets
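The density definition is a minimum over base cubes, which also shows why the Apriori property holds. The sketch below assumes base cubes are indexed by tuples of base-interval indices and that their object history counts are already known; the data layout and names are hypothetical.

```python
from itertools import product
from typing import Dict, Sequence, Tuple


def cube_density(counts: Dict[Tuple[int, ...], int],
                 cube: Sequence[Tuple[int, int]]) -> int:
    """Density of an evolution cube = count of its sparsest base cube.

    `cube` gives an inclusive (lo, hi) range of base-interval indices for
    each dimension; base cubes missing from `counts` hold zero histories.
    """
    cells = product(*(range(lo, hi + 1) for lo, hi in cube))
    return min(counts.get(cell, 0) for cell in cells)


if __name__ == "__main__":
    counts = {(0, 0): 5, (0, 1): 3, (1, 0): 4, (1, 1): 0}
    print(cube_density(counts, [(0, 0), (0, 1)]))  # sparsest of (0,0), (0,1)
    print(cube_density(counts, [(0, 1), (0, 1)]))  # includes the empty (1,1)
```

Because enlarging a cube can only add base cubes to the minimum, a cube is never denser than any cube it encloses, so a bottom-up search can discard every superset of a cube that fails the density threshold.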

Mining TARs: TAR algorithm
• Find density-based (subspace) clusters
• Find all valid rule sets
  - Make use of the strength and support metrics
    - For rule X ⇔ Y, strength = Sup(X ∧ Y) / (Sup(X) × Sup(Y))
  - Strength is used to prune the search space
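A small sketch of the strength metric follows. It assumes supports are expressed as fractions of all object histories, so that a strength of 1 corresponds to X and Y being independent; the boolean-flag input format and the function name are illustrative.

```python
from typing import Sequence


def strength(follows_x: Sequence[bool], follows_y: Sequence[bool]) -> float:
    """Sup(X and Y) / (Sup(X) * Sup(Y)), with supports taken as fractions.

    follows_x[i] / follows_y[i] say whether object history i follows X / Y.
    """
    n = len(follows_x)
    if n == 0 or n != len(follows_y):
        raise ValueError("need two non-empty flag lists of equal length")
    sup_x = sum(follows_x) / n
    sup_y = sum(follows_y) / n
    if sup_x == 0 or sup_y == 0:
        raise ValueError("strength is undefined when X or Y has zero support")
    sup_xy = sum(x and y for x, y in zip(follows_x, follows_y)) / n
    return sup_xy / (sup_x * sup_y)


if __name__ == "__main__":
    correlated = strength([True, True, False, False],
                          [True, True, False, False])
    independent = strength([True, True, False, False],
                           [True, False, True, False])
    print(correlated, independent)
```

The formula treats X and Y identically, which matches the symmetric form of the rule.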

Pruning with the Strength Threshold
• Property 1
  - For any rule r, ∃ a base rule br_i which is a specialization of r and whose strength ≥ that of r.
• Implication
  - Only have to examine rules which are generalizations of base rules in BR (the set of base rules) whose strength ≥ thres.

Pruning with the Strength Threshold
• Property 2
  - For any two rules r and r’ where r’ is a specialization of r and strength of r’ < strength of r, ∃ another base rule br_i which is a specialization of r but not of r’, with strength of br_i > strength of r.
• Implication
  - Can skip rules which are generalizations of r’ but which do not contain any other base rule in BR.

Finding Rule Sets from Each Cluster
• Find BR
• For each subset BR’ of BR, explore the corr. search region starting from rule r (the min. bounding box of the rules in BR’)
  - If strength of r < thres., ignore the region
• min-rule
  - If sup of r ≥ thres., min-rule = r
  - If sup of r < thres., search for its valid generalizations within the region; stop when strength < thres.
• max-rule
  - Search similarly until a rule is found s.t. all of its generalizations either violate the strength requirement or include another base rule
  - There can be multiple max-rules for a min-rule
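Only the starting point of each search region is easy to show in isolation: the minimum bounding box of a subset of base rules. The sketch below covers that single step, not the search itself, and assumes each rule is stored as one interval per attribute per snapshot; the representation and the name `bounding_box` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[float, float]
Box = Dict[str, List[Interval]]  # attribute -> one interval per snapshot


def bounding_box(rules: Sequence[Box]) -> Box:
    """Smallest hypercube enclosing every rule in `rules`.

    All rules are assumed to cover the same attributes and window length.
    """
    if not rules:
        raise ValueError("need at least one base rule")
    box: Box = {}
    for attr in rules[0]:
        # Group the intervals of all rules snapshot by snapshot.
        per_snapshot = zip(*(rule[attr] for rule in rules))
        box[attr] = [
            (min(lo for lo, _ in ivs), max(hi for _, hi in ivs))
            for ivs in per_snapshot
        ]
    return box


if __name__ == "__main__":
    br1 = {"salary": [(40, 45), (50, 55)]}
    br2 = {"salary": [(44, 48), (52, 60)]}
    print(bounding_box([br1, br2]))
```

The resulting box is the most specific rule that generalizes every base rule in the subset, which is why it is the natural first candidate to test against the strength and support thresholds.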

Performance Evaluation
• 300MHz CPU with 128MB memory
• Three synthetic datasets
  - 100,000 objects with 5 attributes
  - 100 snapshots
  - Embedded 500 rules of length 5 or less
• User-specified thresholds
  - Density: 2 (2 times the average density)
  - Support: 5%
  - Strength: 1.3

Performance Evaluation
• Precision: 100% for all algorithms
• Recall: [chart not reproduced in this transcript]

Performance Evaluation
• Observations
  - TAR is faster than SR and LE
    - Strength is used to prune the search space in TAR
    - TAR searches a smaller set of candidate rules
  - Response time of TAR increases at a slower pace w.r.t. the number of base intervals

Performance Evaluation
• Real dataset
  - 20,000 objects (persons)
  - 5 attributes: age, title, salary, family status (single, married, head of household), distance between the person’s house and a major city
  - 10 snapshots (one per year)
  - No. of base intervals: 100; support 3%, density 2, strength 1.3

Performance Evaluation
• Performance of TAR alg. on the real dataset
  - Time taken: 260s to mine 347 rule sets
  - Examples of TARs
    - People receiving a salary raise tend to move further away from the city center.
    - If people with a salary between 70K and 100K get a raise, the raise is likely to be between 7K and 15K.

Conclusions
• A TAR model is proposed to represent correlations among numerical attribute evolutions.
• A novel approach to mining TARs is introduced: first discover clusters, then efficiently construct rule sets.
• Empirical evaluation shows that the TAR algorithm outperforms the alternative algs. by a large margin.

References
• W. Wang, J. Yang, and R. Muntz. TAR: Temporal association rules on evolving numerical attributes, ICDE’01.
• R. Srikant and R. Agrawal. Mining quantitative association rules in large relational tables, SIGMOD’96.
• B. Lent, A. Swami, and J. Widom. Clustering association rules, ICDE’97.
• R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. Automatic subspace clustering of high dimensional data for data mining applications, SIGMOD’98.
