The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Association Rule Mining COMP 790-90 Seminar GNET 713 BCB Module Spring 2007.

Slides:



Advertisements
Similar presentations
Mining Frequent Item Sets by Opportunistic Projection
Advertisements

ICDM'06 Panel 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant (description by C. Faloutsos)
May 20, 2015Data Mining: Concepts and Techniques1 Data Mining: Concepts and Techniques — Chapter 5 — Jiawei Han Department of Computer Science University.
Rakesh Agrawal Ramakrishnan Srikant
Chapter 5: Mining Frequent Patterns, Association and Correlations
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Our New Progress on Frequent/Sequential Pattern Mining We develop new frequent/sequential pattern mining methods Performance study on both synthetic and.
Multi-dimensional Sequential Pattern Mining
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Rules Yao Meng Hongli Li Database II Fall 2002.
Sequence Databases & Sequential Patterns
Mining Sequential Patterns Dimitrios Gunopulos, UCR.
Data Mining: Association. Mining Association Rules in Large Databases zAssociation rule mining zMining single-dimensional Boolean association rules from.
CS 590M Fall 2001: Security Issues in Data Mining Lecture 5: Association Rules, Sequential Associations.
The UNIVERSITY of Kansas EECS 800 Research Seminar Mining Biological Data Instructor: Luke Huan Fall, 2006.
Mining Long Sequential Patterns in a Noisy Environment Jiong Yang, Wei Wang, Philip S. Yu, Jiawei Han SIGMOD 2002.
Mining Long Sequential Patterns in a Noisy Environment Jiong Yang, Wei Wang, Philip S. Yu, and Jiawei Han SIGMOD 2002 Presented by: Eddie Date: 2002/12/23.
Pattern-growth Methods for Sequential Pattern Mining: Principles and Extensions Jiawei Han (UIUC) Jian Pei (Simon Fraser Univ.)
1 1 Data Mining: Concepts and Techniques (3 rd ed.) — Chapter 6 — Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign.
Performance and Scalability: Apriori Implementation.
August 20, 2015Data Mining: Concepts and Techniques1 Chapter 5: Mining Frequent Patterns, Association and Correlations Basic concepts and a road map Efficient.
A Short Introduction to Sequential Data Mining
What Is Sequential Pattern Mining?
September 11, 2015Data Mining: Concepts and Techniques1 Data Mining: Concepts and Techniques — Chapter 5 — Jianlin Cheng Department of Computer Science.
September 11, 2015Data Mining: Concepts and Techniques1 Data Mining: Concepts and Techniques — Chapter 5 —
1 Apriori Algorithm Review for Finals. SE 157B, Spring Semester 2007 Professor Lee By Gaurang Negandhi.
October 2, 2015 Data Mining: Concepts and Techniques 1 Data Mining: Concepts and Techniques — Chapter 8 — 8.3 Mining sequence patterns in transactional.
©Jiawei Han and Micheline Kamber
October 6, 2015Data Mining: Concepts and Techniques1 Data Mining: Concepts and Techniques — Slides for Textbook — — Chapter 6 — ©Jiawei Han and Micheline.
Data Mining: Concepts and Techniques (3rd ed.) — Chapter 6 —
1 Data Mining and Warehousing: Session 6 Association Analysis Jia-wei Han
1 Multi-dimensional Sequential Pattern Mining Helen Pinto, Jiawei Han, Jian Pei, Ke Wang, Qiming Chen, Umeshwar Dayal ~From: 10th ACM Intednational Conference.
1 Knowledge discovery & data mining Association rules and market basket analysis --introduction A EDBT2000 Fosca Giannotti and Dino Pedreschi.
Association Rule Mining ARM
Pattern-Growth Methods for Sequential Pattern Mining Iris Zhang
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Sequential Pattern Mining COMP Seminar BCB 713 Module Spring 2011.
Han: Association Rule Mining; modified & extended by Ch. Eick 1 Association Rule Mining — Slides for Textbook — — Chapter 6 — ©Jiawei Han and Micheline.
Mining Frequent Patterns without Candidate Generation : A Frequent-Pattern Tree Approach 指導教授:廖述賢博士 報 告 人:朱 佩 慧 班 級:管科所博一.
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Association Rule Mining III COMP Seminar GNET 713 BCB Module Spring 2007.
CS685 : Special Topics in Data Mining, UKY The UNIVERSITY of KENTUCKY Association Rule Mining CS 685: Special Topics in Data Mining Jinze Liu.
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Association Rule Mining COMP Seminar BCB 713 Module Spring 2011.
March 14, 2016Data Mining: Concepts and Techniques1 Data Mining: Concepts and Techniques — Chapter 5 — Richong Zhang Office: New Main Building, G521
1 Top Down FP-Growth for Association Rule Mining By Ke Wang.
1 1 Data Mining: Concepts and Techniques (3 rd ed.) — Chapter 6 — Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign.
Data Mining: Concepts and Techniques (3rd ed.) — Chapter 6 —
CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets
Jian Pei and Runying Mao (Simon Fraser University)
Data Mining: Concepts and Techniques — Chapter 5 —
Data Mining: Concepts and Techniques — Chapter 5 —
Data Mining: Concepts and Techniques — Chapter 5 —
Information Management course
Association rule mining
Data Mining: Concepts and Techniques — Chapter 5 —
CS6220: Data Mining Techniques
Association Rule Mining
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets
Data Mining: Concepts and Techniques (3rd ed.) — Chapter 6 —
به نام خداوند جان و خرد الگوکاوي در پايگاه‌هاي تراکنش بسيار بزرگ با استفاده از رويکرد تقسيم وحل Frequent Pattern Mining on Very Large Transaction Databases.
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques — Chapter 5 —
Chapter 5: Mining Frequent Patterns, Association and Correlations
Data Mining: Concepts and Techniques — Chapter 5 —
Data Mining: Concepts and Techniques — Chapter 5 —
Data Mining: Concepts and Techniques (3rd ed.) — Chapter 6 —
Data Mining: Concepts and Techniques (3rd ed.) — Chapter 6 —
Association Rule Mining
CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets
Presentation transcript:

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Association Rule Mining COMP Seminar GNET 713 BCB Module Spring 2007

COMP Data Mining: Concepts, Algorithms, and Applications 2 Approximate match When you observe d1 Spread count as d1: 90%, d2: 5%, d3: 5% Compatibility Matrix

COMP Data Mining: Concepts, Algorithms, and Applications 3 Match The degree to which pattern P is retained/reflected in S M(P,S) = P(P|S)=  C(p,s) when when l S =l P M(P,S) = max over all possible when l S >l P Example PSM d1d1d1d30.9*0 d1d2 0.9*0.8 d1d2d1d30.9*0.05 d1d2d2d30.1*0.05 d1d2d1d2d30.9*0.8

COMP Data Mining: Concepts, Algorithms, and Applications 4 Calculate Max over all Dynamic Programming M(p 1 p 2..p i, s 1 s 2 …s j )= Max of M(p 1 p 2..p i-1, s 1 s 2 …s j-1 ) * C(p i,s j ) M(p 1 p 2..p i, s 1 s 2 …s j-1 ) O(l P *l S ) When compatibility Matrix is sparse O(l S )

COMP Data Mining: Concepts, Algorithms, and Applications 5 Match in D Average over all sequences in D

COMP Data Mining: Concepts, Algorithms, and Applications 6 Spread of match If compatibility matrix is identity matrix Match = support

COMP Data Mining: Concepts, Algorithms, and Applications 7 Anti-Monotone The match of a pattern P in a symbol sequence S is less than or equal to the match of any subpattern of P in S The match of a pattern P in a sequence database D is less than or equal to the match of any subpattern of P in D Can use any support based algorithm More patterns match so require efficient solution Sample based algorithms Border collapsing of ambiguous patterns

COMP Data Mining: Concepts, Algorithms, and Applications 8 Chernoff Bound Given sample size=n, range R, with probability 1-  true value:    = sqrt([R 2 ln(1/  )]/2n) Distribution free More conservative Sample size : fit in memory Restricted spread : For pattern P= p 1 p 2..p L R=min (match[p i ]) for all 1  i  L Frequent Patterns min_match +  min_match -  Infrequent patterns

COMP Data Mining: Concepts, Algorithms, and Applications 9 Algorithm Scan DB: O(N*min (L s *m, L s +m 2 )) Find the match of each individual symbol Take a random sample of sequences Identify borders that embrace the set of ambiguous patterns O(m Lp * |S| * Lp * n) Min_match   existing methods for association rule mining Locate the border of frequent patterns in the entire DB via border collapsing

COMP Data Mining: Concepts, Algorithms, and Applications 10 Border Collapsing If memory can not hold the counters for all ambiguous counters Probe-and-collapse : binary search Probe patterns with highest collapsing power until memory is filled If memory can hold all patterns up to the 1/x layer the space of of ambiguous patterns can be narrowed to at least 1/x of the original one where x is a power of 2 If it takes a level-wise search y scans of the DB, only O(log x y) scans are necessary when the border collapsing technique is employed

COMP Data Mining: Concepts, Algorithms, and Applications 12 Periodic Pattern Full periodic pattern ABC ABC ABC Partial periodic pattern ABC ADC ACC ABC Pattern hierarchy ABC ABC ABC DE DE DE DE ABC ABC ABC DE DE DE DE ABC ABC ABC DE DE DE DE

COMP Data Mining: Concepts, Algorithms, and Applications 13 Periodic Pattern Recent Achievements Partial Periodic Pattern Asynchronous Periodic Pattern Meta Pattern InfoMiner/InfoMiner+/STAMP

COMP Data Mining: Concepts, Algorithms, and Applications 14 Clustering Sequential Data CLUSEQ ApproxMAP

COMP Data Mining: Concepts, Algorithms, and Applications 15 References: Frequent-pattern Mining Methods R. Agarwal, C. Aggarwal, and V. V. V. Prasad. A tree projection algorithm for generation of frequent itemsets. In Journal of Parallel and Distributed Computing (Special Issue on High Performance Data Mining), R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. SIGMOD'93, , Washington, D.C. R. Agrawal and R. Srikant. Fast algorithms for mining association rules. VLDB' , Santiago, Chile. J. Han, J. Pei, and Y. Yin: “Mining frequent patterns without candidate generation”. In Proc. ACM-SIGMOD’2000, pp. 1-12, Dallas, TX, May H. Mannila, H. Toivonen, and A. I. Verkamo. Efficient algorithms for discovering association rules. KDD'94, , Seattle, WA, July 1994.

COMP Data Mining: Concepts, Algorithms, and Applications 16 References: Frequent-pattern Mining Methods A. Savasere, E. Omiecinski, and S. Navathe. An efficient algorithm for mining association rules in large databases. VLDB'95, , Zurich, Switzerland. C. Silverstein, S. Brin, R. Motwani, and J. Ullman. Scalable techniques for mining causal structures. VLDB'98, , New York, NY. R. Srikant and R. Agrawal. Mining generalized association rules. VLDB'95, , Zurich, Switzerland, Sept R. Srikant and R. Agrawal. Mining quantitative association rules in large relational tables. SIGMOD'96, 1-12, Montreal, Canada. H. Toivonen. Sampling large databases for association rules. VLDB'96, , Bombay, India, Sept M.J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. New algorithms for fast discovery of association rules. KDD’97. August 1997.

COMP Data Mining: Concepts, Algorithms, and Applications 17 References: Performance Improvements S. Brin, R. Motwani, J. D. Ullman, and S. Tsur. Dynamic itemset counting and implication rules for market basket analysis. SIGMOD'97, Tucson, Arizona, May D.W. Cheung, J. Han, V. Ng, and C.Y. Wong. Maintenance of discovered association rules in large databases: An incremental updating technique. ICDE'96, New Orleans, LA. T. Fukuda, Y. Morimoto, S. Morishita, and T. Tokuyama. Data mining using two-dimensional optimized association rules: Scheme, algorithms, and visualization. SIGMOD'96, Montreal, Canada. E.-H. Han, G. Karypis, and V. Kumar. Scalable parallel data mining for association rules. SIGMOD'97, Tucson, Arizona. J.S. Park, M.S. Chen, and P.S. Yu. An effective hash-based algorithm for mining association rules. SIGMOD'95, San Jose, CA, May 1995.

COMP Data Mining: Concepts, Algorithms, and Applications 18 References: Performance Improvements G. Piatetsky-Shapiro. Discovery, analysis, and presentation of strong rules. In G. Piatetsky-Shapiro and W. J. Frawley, Knowledge Discovery in Databases,. AAAI/MIT Press, J.S. Park, M.S. Chen, and P.S. Yu. An effective hash-based algorithm for mining association rules. SIGMOD'95, San Jose, CA. S. Sarawagi, S. Thomas, and R. Agrawal. Integrating association rule mining with relational database systems: Alternatives and implications. SIGMOD'98, Seattle, WA. K. Yoda, T. Fukuda, Y. Morimoto, S. Morishita, and T. Tokuyama. Computing optimized rectilinear regions for association rules. KDD'97, Newport Beach, CA, Aug M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. Parallel algorithm for discovery of association rules. Data Mining and Knowledge Discovery, 1: , 1997.

COMP Data Mining: Concepts, Algorithms, and Applications 19 References. Extensions of Scopes S. Brin, R. Motwani, and C. Silverstein. Beyond market basket: Generalizing association rules to correlations. SIGMOD'97, , Tucson, Arizona. J. Han and Y. Fu. Discovery of multiple-level association rules from large databases. VLDB'95, , Zurich, Switzerland. M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, and A.I. Verkamo. Finding interesting rules from large sets of discovered association rules. CIKM'94, , Gaithersburg, Maryland. F. Korn, A. Labrinidis, Y. Kotidis, and C. Faloutsos. Ratio rules: A new paradigm for fast, quantifiable data mining. VLDB'98, , New York, NY. Wei Wang, Jiong Yang, Philip S. Yu: Efficient mining of weighted association rules (WAR). KDD 2000: Wei Wang, Jiong Yang, Richard R. Muntz: TAR: Temporal Association Rules on Evolving Numerical Attributes. ICDE 2001:

COMP Data Mining: Concepts, Algorithms, and Applications 20 References: Extensions of Scopes B. Lent, A. Swami, and J. Widom. Clustering association rules. ICDE'97, , Birmingham, England. R. Meo, G. Psaila, and S. Ceri. A new SQL-like operator for mining association rules. VLDB'96, , Bombay, India. R.J. Miller and Y. Yang. Association rules over interval data. SIGMOD'97, , Tucson, Arizona. A. Savasere, E. Omiecinski, and S. Navathe. Mining for strong negative associations in a large database of customer transactions. ICDE'98, , Orlando, FL, Feb D. Tsur, J. D. Ullman, S. Abitboul, C. Clifton, R. Motwani, and S. Nestorov. Query flocks: A generalization of association-rule mining. SIGMOD'98, 1-12, Seattle, Washington. J. Pei, A.K.H. Tung, J. Han. Fault-Tolerant Frequent Pattern Mining: Problems and Challenges. SIGMOD DMKD’01, Santa Barbara, CA.

COMP Data Mining: Concepts, Algorithms, and Applications 21 References: Mining Max- patterns and Closed Itemsets R. J. Bayardo. Efficiently mining long patterns from databases. SIGMOD'98, 85-93, Seattle, Washington. J. Pei, J. Han, and R. Mao, "CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets", Proc ACM-SIGMOD Int. Workshop on Data Mining and Knowledge Discovery (DMKD'00), Dallas, TX, May N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. ICDT'99, , Jerusalem, Israel, Jan M. Zaki. Generating Non-Redundant Association Rules. KDD'00. Boston, MA. Aug M. Zaki. CHARM: An Efficient Algorithm for Closed Association Rule Mining, TR99-10, Department of Computer Science, Rensselaer Polytechnic Institute. M. Zaki, Fast Vertical Mining Using Diffsets, TR01-1, Department of Computer Science, Rensselaer Polytechnic Institute.

COMP Data Mining: Concepts, Algorithms, and Applications 22 References: Constraint-base Frequent-pattern Mining G. Grahne, L. Lakshmanan, and X. Wang. Efficient mining of constrained correlated sets. ICDE'00, , San Diego, CA, Feb Y. Fu and J. Han. Meta-rule-guided mining of association rules in relational databases. KDOOD'95, 39-46, Singapore, Dec J. Han, L. V. S. Lakshmanan, and R. T. Ng, "Constraint-Based, Multidimensional Data Mining", COMPUTER (special issues on Data Mining), 32(8): 46-50, L. V. S. Lakshmanan, R. Ng, J. Han and A. Pang, "Optimization of Constrained Frequent Set Queries with 2-Variable Constraints", SIGMOD’99

COMP Data Mining: Concepts, Algorithms, and Applications 23 References: Constraint-base Frequent-pattern Mining R. Ng, L.V.S. Lakshmanan, J. Han & A. Pang. “Exploratory mining and pruning optimizations of constrained association rules.” SIGMOD’98 J. Pei, J. Han, and L. V. S. Lakshmanan, "Mining Frequent Itemsets with Convertible Constraints", Proc Int. Conf. on Data Engineering (ICDE'01), April J. Pei and J. Han "Can We Push More Constraints into Frequent Pattern Mining?", Proc Int. Conf. on Knowledge Discovery and Data Mining (KDD'00), Boston, MA, August R. Srikant, Q. Vu, and R. Agrawal. Mining association rules with item constraints. KDD'97, 67-73, Newport Beach, California.

COMP Data Mining: Concepts, Algorithms, and Applications 24 References: Sequential Pattern Mining Methods R. Agrawal and R. Srikant. Mining sequential patterns. ICDE'95, 3-14, Taipei, Taiwan. R. Srikant and R. Agrawal. Mining sequential patterns: Generalizations and performance improvements. EDBT’96. J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, M.-C. Hsu, "FreeSpan: Frequent Pattern-Projected Sequential Pattern Mining", Proc Int. Conf. on Knowledge Discovery and Data Mining (KDD'00), Boston, MA, August H. Mannila, H Toivonen, and A. I. Verkamo. Discovery of frequent episodes in event sequences. Data Mining and Knowledge Discovery, 1: , J. Pei, J. Han, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu, "PrefixSpan: Mining Sequential Patterns Efficiently by Prefix- Projected Pattern Growth", Proc Int. Conf. on Data Engineering (ICDE'01), Heidelberg, Germany, April 2001.

COMP Data Mining: Concepts, Algorithms, and Applications 25 References: Sequential Pattern Mining Methods B. Ozden, S. Ramaswamy, and A. Silberschatz. Cyclic association rules. ICDE'98, , Orlando, FL. S. Ramaswamy, S. Mahajan, and A. Silberschatz. On the discovery of interesting patterns in association rules. VLDB'98, , New York, NY. M.J. Zaki. Efficient enumeration of frequent sequences. CIKM’98. Novermber M.N. Garofalakis, R. Rastogi, K. Shim: SPIRIT: Sequential Pattern Mining with Regular Expression Constraints. VLDB 1999: , Edinburgh, Scotland. Wei Wang, Jiong Yang, Philip S. Yu: Mining Patterns in Long Sequential Data with Noise. SIGKDD Explorations 2(2): (2000) Jiong Yang, Wei Wang, Philip S. Yu, Jiawei Han: Mining Long Sequential Patterns in a Noisy Environment. SIGMOD Conference 2002

COMP Data Mining: Concepts, Algorithms, and Applications 26 References: Periodic Pattern Mining Methods Jiawei Han, Wan Gong, Yiwen Yin: Mining Segment-Wise Periodic Patterns in Time-Related Databases. KDD 1998: Jiawei Han, Guozhu Dong, Yiwen Yin: Efficient Mining of Partial Periodic Patterns in Time Series Database. ICDE 1999: Jiong Yang, Wei Wang, Philip S. Yu: Mining asynchronous periodic patterns in time series data. KDD 2000: Wei Wang, Jiong Yang, Philip S. Yu: Meta-patterns: Revealing Hidden Periodic Patterns. ICDM 2001: Jiong Yang, Wei Wang, Philip S. Yu: Infominer: mining surprising periodic patterns. KDD 2001:

COMP Data Mining: Concepts, Algorithms, and Applications 27 References: Mining Various Databases K. Koperski, J. Han, and G. B. Marchisio, "Mining Spatial and Image Data through Progressive Refinement Methods", Revue internationale de gomatique (European Journal of GIS and Spatial Analysis), 9(4): , A. K. H. Tung, H. Lu, J. Han, and L. Feng, "Breaking the Barrier of Transactions: Mining Inter-Transaction Association Rules", Proc Int. Conf. on Knowledge Discovery and Data Mining (KDD'99), San Diego, CA, Aug. 1999, pp

COMP Data Mining: Concepts, Algorithms, and Applications 28 References: Mining Various Databases H. Lu, L. Feng, and J. Han, "Beyond Intra-Transaction Association Analysis:Mining Multi-Dimensional Inter-Transaction Association Rules", ACM Transactions on Information Systems (TOIS’00), 18(4): , O. R. Zaiane, M. Xin, J. Han, "Discovering Web Access Patterns and Trends by Applying OLAP and Data Mining Technology on Web Logs," Proc. Advances in Digital Librar ies Conf. (ADL'98), Santa Barbara, CA, April 1998, pp O. R. Zaiane, J. Han, and H. Zhu, "Mining Recurrent Items in Multimedia with Progressive Resolution Refinement", Proc Int. Conf. on Data Engineering (ICDE'00), San Diego, CA, Feb. 2000, pp

COMP Data Mining: Concepts, Algorithms, and Applications 29 References: Frequent-pattern Mining for Classification and Data Cube K. Beyer and R. Ramakrishnan. Bottom-up computation of sparse and iceberg cubes. SIGMOD'99, , Philadelphia, PA, June M. Fang, N. Shivakumar, H. Garcia-Molina, R. Motwani, and J. D. Ullman. Computing iceberg queries efficiently. VLDB'98, , New York, NY, Aug J. Han, J. Pei, G. Dong, and K. Wang, “Computing Iceberg Data Cubes with Complex Measures”, Proc. ACM-SIGMOD’2001, Santa Barbara, CA, May M. Kamber, J. Han, and J. Y. Chiang. Metarule-guided mining of multi-dimensional association rules using data cubes. KDD'97, , Newport Beach, California. K. Beyer and R. Ramakrishnan. Bottom-up computation of sparse and iceberg cubes. SIGMOD’99 T. Imielinski, L. Khachiyan, and A. Abdulghani. Cubegrades: Generalizing association rules. Technical Report, Aug. 2000

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL THANK YOU!