Slide 1: Mining surprising patterns using temporal description length
Soumen Chakrabarti (IIT Bombay)
Sunita Sarawagi (IIT Bombay)
Byron Dom (IBM Almaden)
Slide 2: Market basket mining algorithms
- Find prevalent rules that hold over large fractions of data
- Useful for promotions and store arrangement
- Intensively researched
(Cartoon, 1990: "Milk and cereal sell together!")
Slide 3: Prevalent ≠ Interesting
- Analysts already know about prevalent rules
- Interesting rules are those that deviate from prior expectation
- Mining's payoff is in finding surprising phenomena
(Cartoon: the same "Milk and cereal sell together!" finding in 1995 and again in 1998 now draws only a "Zzzz...")
Slide 4: What makes a rule surprising?
- It does not match prior expectation
  - e.g., the correlation between milk and cereal remains roughly constant over time
- It cannot be trivially derived from simpler rules
  - Milk 10%, cereal 10%; under independence, the pair is expected in 1% of baskets
  - Milk and cereal 10% ... surprising
  - Eggs 10%; the triple is then expected in about 1% of baskets
  - Milk, cereal and eggs 0.1% ... surprising!
Slide 5: Two views on data mining
(Diagram contrasting two pipelines)
- Conventional view: Data -> Mining Program -> Discovery
- Our view: Data plus a Model of the Analyst's Knowledge of the Data -> Mining Program -> Discovery, which is reported back to the Analyst
Slide 6: Our contributions
- A new notion of surprising patterns
  - Detect changes in correlation along time
  - Filter out steady, uninteresting correlations
- Algorithms to mine for surprising patterns
  - Encode data into bit streams using two models
  - Surprise = difference in the number of bits needed
- Experimental results
  - Demonstrate superiority over prevalent patterns
Slide 7: A simpler problem: one item
- Milk-buying habits modeled by a biased coin
- The customer tosses this coin to decide whether to buy milk
  - Head or "1" denotes "basket contains milk"
  - The coin bias is Pr[milk]
- The analyst wants to study Pr[milk] along time
  - A single coin with fixed bias is not interesting
  - Changes in bias are interesting (see the toy stream below)
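As a toy illustration of this setup (not from the deck), the following stream's bias shifts halfway through; detecting that shift is exactly the analyst's task. The bias values 0.30 and 0.10 are made-up assumptions.

```python
import random

random.seed(0)
# First 500 baskets: Pr[milk] = 0.30; next 500: Pr[milk] = 0.10.
# Each basket is one toss of the current coin; "1" means milk was bought.
stream = ([int(random.random() < 0.30) for _ in range(500)]
          + [int(random.random() < 0.10) for _ in range(500)])
print(sum(stream[:500]) / 500, sum(stream[500:]) / 500)  # near 0.30 and 0.10
```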
Slide 8: The coin segmentation problem
- Players A and B
- A has a set of coins with different biases
- A repeatedly:
  - picks an arbitrary coin
  - tosses it an arbitrary number of times
- B observes the heads/tails sequence
- B guesses the transition points and biases
(Diagram: A picks, tosses, and returns coins; B watches the outcomes)
Slide 9: How to explain the data
- Given n head/tail observations:
  - Can assume n different coins, each with bias 0 or 1
    - The data fits perfectly (with probability one)
    - But many coins are needed
  - Or assume one coin
    - May fit the data poorly
- The "best explanation" is a compromise between the two (see the sketch below)
(Diagram: a sample sequence split into three segments with biases 1/4, 5/7, 1/3)
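A minimal sketch of that compromise, under an assumed cost model: each segment pays a model cost (bits to name its endpoint plus a fixed budget for its bias) and a data cost (the Shannon cost of its tosses under its own empirical bias). The 8-bit bias budget and the run counts below are illustrative assumptions, not the paper's choices.

```python
import math

def data_cost_bits(heads: int, tails: int) -> float:
    """Shannon cost of a run under its empirical bias: each toss x
    costs -log2 Pr[x] bits."""
    n = heads + tails
    cost = 0.0
    for k in (heads, tails):
        if k > 0:
            cost += -k * math.log2(k / n)
    return cost

def description_length(segments, total_tosses: int, bias_bits: int = 8) -> float:
    """Total bits = per-segment model cost (boundary ID + bias) plus
    per-segment data cost."""
    boundary_bits = math.log2(total_tosses)  # bits to name a boundary
    return sum(boundary_bits + bias_bits + data_cost_bits(h, t)
               for h, t in segments)

# Three runs of (heads, tails), echoing the deck's biases 1/4, 5/7, 1/3
runs = [(1, 3), (5, 2), (2, 4)]
one_coin = [(sum(h for h, _ in runs), sum(t for _, t in runs))]
# Compare the two explanations; with only 17 tosses the single coin may
# win, while longer runs reward segmentation.
print(description_length(runs, 17), description_length(one_coin, 17))
```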
Slide 10: Coding examples
- A sequence of k zeroes
  - Naive encoding takes k bits
  - Run-length encoding takes about log k bits
- 1000 bits, 10 randomly placed 1's, the rest 0's
  - Posit a coin with bias 0.01
  - The data encoding cost (by Shannon's theorem) is
    -10 log2(0.01) - 990 log2(0.99), about 81 bits
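A quick check of that arithmetic (illustrative code, not the deck's):

```python
import math

def coding_cost_bits(ones: int, zeros: int, p_one: float) -> float:
    """Shannon cost of a 0/1 sequence under a posited Bernoulli model."""
    return -ones * math.log2(p_one) - zeros * math.log2(1.0 - p_one)

print(coding_cost_bits(10, 990, 0.01))  # about 80.8 bits, versus 1000 raw
```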
Slide 11: How to find optimal segments
- Example: a sequence of 17 tosses yields a derived graph with 18 nodes, one per segment boundary
- Edge cost = model cost + data cost
  - Model cost = one node ID + one Pr[head] value
  - Data cost for a segment with Pr[head] = 5/7: its 5 heads and 2 tails are each charged -log2 of their probability
- The optimal segmentation is the shortest path in this graph (sketched below)
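A sketch of that algorithm under the same assumed cost model as before: all edges point forward in time, so the graph is a DAG and the shortest path reduces to a simple O(T^2) dynamic program. The 8-bit bias budget is again an assumption, not the paper's exact choice.

```python
import math

def segment_cost(heads: int, tails: int, n_nodes: int, bias_bits: int = 8) -> float:
    """Edge cost: model cost (name one endpoint + one bias) plus the
    Shannon data cost of the segment under its empirical bias."""
    model = math.log2(n_nodes) + bias_bits
    n = heads + tails
    data = sum(-k * math.log2(k / n) for k in (heads, tails) if k > 0)
    return model + data

def best_segmentation(tosses):
    """Shortest path over nodes 0..T, where edge (i, j) covers tosses i..j-1."""
    T = len(tosses)
    prefix = [0] * (T + 1)
    for i, x in enumerate(tosses):
        prefix[i + 1] = prefix[i] + x
    best = [math.inf] * (T + 1)
    back = [0] * (T + 1)
    best[0] = 0.0
    for j in range(1, T + 1):          # DAG shortest path by dynamic programming
        for i in range(j):
            h = prefix[j] - prefix[i]
            c = best[i] + segment_cost(h, (j - i) - h, T + 1)
            if c < best[j]:
                best[j], back[j] = c, i
    cuts, j = [], T
    while j > 0:                       # walk the back-pointers to recover segments
        cuts.append((back[j], j))
        j = back[j]
    return best[T], cuts[::-1]

# 17 tosses echoing the deck's example (biases near 1/4, 5/7, 1/3)
cost, segments = best_segmentation([1,0,0,0, 1,1,0,1,1,0,1, 0,1,0,0,1,0])
print(cost, segments)
```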
Slide 12: Approximate shortest path
- Suppose there are T tosses
- Make T^(1-ε) chunks, each with T^ε nodes (ε is a tuning parameter)
- Find shortest paths within chunks
- Some nodes are chosen in each chunk
- Solve a shortest-path problem over all the chosen nodes
Slide 13: Two or more items
- "Unconstrained" segmentation
  - k items induce a 2^k-sided coin
  - "milk and cereal" = 11, "milk, not cereal" = 10, "neither" = 00, etc. (see the mapping below)
- The shortest path finds significant shifts in any of the coin-face probabilities
- Problem: some of these shifts may be completely explained by lower-order marginals
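For concreteness, a small illustration (not from the deck) of how a basket over k tracked items maps to one face of the 2^k-sided coin; the item list and bit order are illustrative choices.

```python
items = ["milk", "cereal"]

def face(basket: set) -> int:
    """Encode a basket as one face of the 2^k-sided coin, one bit per item."""
    f = 0
    for item in items:
        f = (f << 1) | (item in basket)
    return f

print(face({"milk", "cereal"}))  # 0b11 = 3
print(face({"milk"}))            # 0b10 = 2
print(face(set()))               # 0b00 = 0
```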
Slide 14: Example
- A drop in the joint sale of milk and cereal is completely explained by a drop in the sale of milk
- Pr[milk & cereal] / (Pr[milk] * Pr[cereal]) remains constant over time
- Call this ratio ρ
Slide 15: Constant-ρ segmentation
- Compute a global ρ over all time
- Require all coins to share this common value of ρ
- Segment by constrained optimization
- Compare with the unconstrained coding cost
- ρ = observed support / the support expected under independence (see the sketch below)
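A one-function sketch of that ratio (illustrative; ρ is our stand-in name for the deck's symbol):

```python
def rho(n_joint: int, n_a: int, n_b: int, n: int) -> float:
    """Observed joint support divided by the support expected if the
    two items sold independently."""
    observed = n_joint / n
    expected = (n_a / n) * (n_b / n)  # independence prediction
    return observed / expected

# With both marginals at 10% and the pair at 10% of 10,000 baskets,
# rho = 0.10 / 0.01 = 10: far above the independence value of 1.
print(rho(n_joint=1000, n_a=1000, n_b=1000, n=10000))
```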
Slide 16: Is all this really needed?
- A simpler alternative:
  - Aggregate the data into suitable time windows
  - Compute support, correlation, ρ, etc. in each window
  - Use a variance threshold to choose itemsets
- Pitfalls:
  - Arbitrary choices of windows and thresholds
  - May miss fine detail
  - Over-sensitive to outliers
Slide 17: ... but no simpler
"Smoothing leads to an estimated trend that is descriptive rather than analytic or explanatory. Because it is not based on an explicit probabilistic model, the method cannot be treated rigorously in terms of mathematical statistics."
    T. W. Anderson, The Statistical Analysis of Time Series
Slide 18: Experiments
- 2.8 million baskets over 7 years (1987-93)
- 15,800 items, an average of 2.62 items per basket
- Two algorithms:
  - the complete MDL approach
  - MDL segmentation + statistical tests (MStat)
- Anecdotes: MDL is effective at penalizing obvious itemsets
Slide 19: Quality of approximation
(Figure-only slide)
Slide 20: Little agreement in itemset ranks
- Simpler methods do not approximate MDL
Slide 21: MDL has high selectivity
- Under MDL, the scores of the best itemsets stand out clearly from the rest
Slide 22: Three anecdotes
(Each anecdote plots ρ against time)
- High MStat score
  - Small marginals
  - Polo shirts & shorts
- High correlation
  - Small percentage variation
  - Bedsheets & pillowcases
- High MDL score
  - Significant gradual drift
  - Men's & women's shorts
Slide 23: Conclusion
- A new notion of surprising patterns based on:
  - the joint support expected from marginals
  - the variation of joint support along time
- A robust MDL formulation
- Efficient algorithms:
  - near-optimal segmentation using shortest paths
  - pruning criteria
- Successful application to real data