ACM SIGKDD, Aug. 2003, Washington, DC. M. El-Hajj and O. R. Zaïane, 2003. Database Lab, University of Alberta, Canada. Inverted Matrix: Efficient Discovery of Frequent Items in Large Datasets in the Context of Interactive Mining.

Presentation transcript:

ACM SIGKDD, Aug. 2003, Washington, DC. M. El-Hajj and O. R. Zaïane, 2003. Database Lab, University of Alberta, Canada.
Inverted Matrix: Efficient Discovery of Frequent Items in Large Datasets in the Context of Interactive Mining. Mohammad El-Hajj and Osmar R. Zaïane, KDD 2003, Department of Computing Science, University of Alberta, Canada.

Outline: Introduction; Pre-Processing Phase; Mining Phase; Transactional Layouts; Building COFI-trees; Mining COFI-trees; Experimental Studies; Conclusion and Future Work.
Section bar shown on every slide: Introduction, Pre-processing, Mining Phase, Experiments, Conclusion.

Interactive Mining. [Figure: the knowledge discovery process. Databases, Cleaning and Integration, Data Warehouse, Selection and Transformation, Data Mining, Patterns, Evaluation and Presentation, Knowledge.]

Association Rule Mining. Association rule mining is crucial in many applications and plays an essential role in many important mining tasks. It has two steps: (1) Frequent Itemset Mining (FIM) and (2) Association Rules Generation. A rule has the form Antecedent → Consequent (Body → Head).

Challenges for FIM. Existing approaches need either an expensive candidacy generation step or huge memory-based data structures.
1. High memory dependency
2. Repetitive tasks and I/O readings (superfluous processing)
3. Non-interactive mining

Challenges for FIM, example with support > 4. Frequent 1-itemsets: {A, B, C, D, E, F}. Non-frequent items: {G, H, I, J, K, L, M, N, O, P, Q, R}.

Challenges for FIM, example with support > 9. Frequent 1-itemsets: {A, B, C}. Non-frequent items: {D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R}.
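
The two support examples above can be mimicked with a minimal Python sketch. The transactions and the helper name `frequent_items` are hypothetical and are not the dataset used on the slides; the point is only that a naive approach rescans all transactions every time the threshold changes.

```python
from collections import Counter

def frequent_items(transactions, min_support):
    """Return the items whose absolute support is strictly greater than min_support."""
    # Count each item at most once per transaction.
    counts = Counter(item for t in transactions for item in set(t))
    return {item for item, c in counts.items() if c > min_support}

# Hypothetical toy data (not from the slides).
transactions = [
    ["A", "B", "C"],
    ["A", "B"],
    ["A", "C", "D"],
    ["A", "B", "D"],
    ["B", "C"],
]

low = frequent_items(transactions, 2)   # full scan of the data
high = frequent_items(transactions, 3)  # second full scan for the new threshold
print(sorted(low))   # ['A', 'B', 'C']
print(sorted(high))  # ['A', 'B']
```

Raising the threshold only shrinks the frequent set, yet this naive version still repeats the whole scan for each threshold.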

Challenges for FIM. Changing the support level means expensive steps, because the whole process (selection and transformation, data mining, evaluation and presentation) is redone.

Motivation. A new association rule mining algorithm with the following features, without compromising scalability:
1. Low memory dependency
2. Removal of superfluous processing
3. Interactive mining ready

Transactional Layouts: Horizontal Layout. Candidacy generation can be removed (FP-Growth). Drawback: superfluous processing.

Transactional Layouts: Vertical Layout. Minimizes superfluous processing. Drawback: candidacy generation is required.
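
As an illustration of the two layouts, the following Python sketch converts a horizontal layout (transaction ID to items) into a vertical one (item to transaction IDs). The data and the function name `to_vertical` are hypothetical; this is a generic sketch of the layouts, not code from the presentation.

```python
from collections import defaultdict

def to_vertical(horizontal):
    """Turn {tid: [items]} into {item: [tids]}."""
    vertical = defaultdict(list)
    for tid, items in horizontal.items():
        for item in items:
            vertical[item].append(tid)
    return dict(vertical)

# Hypothetical horizontal layout: one row per transaction.
horizontal = {1: ["A", "B", "C"], 2: ["A", "C"], 3: ["B", "C"]}

vertical = to_vertical(horizontal)
print(vertical)  # {'A': [1, 2], 'B': [1, 3], 'C': [1, 2, 3]}

# In the vertical layout the support of a candidate itemset is the size
# of the intersection of its items' transaction-ID lists.
support_ac = len(set(vertical["A"]) & set(vertical["C"]))
print(support_ac)  # 2
```

The intersection step shows why the vertical layout still needs candidates: an itemset has to be proposed before its ID lists can be intersected.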

Suggested Layout: Inverted Matrix Layout. Combines the horizontal and vertical layouts. Built in 2 I/O passes.

Transactional Layouts: Inverted Matrix Layout. Pass 1 generates the sorted item list (based on frequency).

Transactional Layouts: Inverted Matrix Layout. Pass 2 generates the transactional array of the Inverted Matrix (IM).
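
The two passes can be sketched in Python as follows. This is a simplified reading, not the authors' implementation: it assumes the item list is sorted in ascending order of frequency, and that each entry in an item's row stores a (row, column) pointer to the next item of the same transaction, or `None` at the end of the transaction. The function name, the 0-based indices, the tie-breaking rule, and the toy data are all hypothetical.

```python
from collections import Counter

def build_inverted_matrix(transactions):
    # Pass 1: count item frequencies and sort ascending by frequency
    # (ties broken by item name so the result is deterministic).
    counts = Counter(item for t in transactions for item in set(t))
    index = sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))
    row_of = {item: r for r, (item, _) in enumerate(index)}

    # Pass 2: build the transactional array, one row per item.
    rows = [[] for _ in index]
    for t in transactions:
        ordered = sorted({row_of[i] for i in t})   # least frequent item first
        cols = [len(rows[r]) for r in ordered]     # column each entry will occupy
        for k, r in enumerate(ordered):
            last = k + 1 == len(ordered)
            rows[r].append(None if last else (ordered[k + 1], cols[k + 1]))
    return index, rows

# Hypothetical toy data (not from the slides).
transactions = [
    ["A", "B", "C"],
    ["A", "B"],
    ["A", "C", "D"],
    ["A", "B", "D"],
    ["B", "C"],
]

index, rows = build_inverted_matrix(transactions)
print(index)    # [('D', 2), ('C', 3), ('A', 4), ('B', 4)]
print(rows[0])  # [(1, 1), (2, 3)]
```

No support threshold appears anywhere in the construction, so under these assumptions the same structure can serve any later choice of support.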

Transactional Layouts: Inverted Matrix Layout. There is no minimum support involved in building the Inverted Matrix.

Transactional Layouts: Inverted Matrix Layout. Border support, example with support > 4.
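
One way to picture the border support, assuming it is the position in the frequency-sorted item list where items start to satisfy the threshold, is a binary search over the frequencies. The item list below and the name `border` are hypothetical; the sketch relies only on Python's standard `bisect` module.

```python
from bisect import bisect_right

def border(index, min_support):
    """Return the first row whose frequency is strictly greater than min_support."""
    freqs = [freq for _, freq in index]   # ascending by construction
    return bisect_right(freqs, min_support)

# Hypothetical frequency-sorted item list: (item, frequency).
index = [("D", 2), ("C", 3), ("A", 4), ("B", 4)]

b = border(index, 2)
print(b)          # 1
print(index[b:])  # [('C', 3), ('A', 4), ('B', 4)]
```

Changing the threshold only moves the border; the item list itself is not rebuilt.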

Sub-transactions generated from the IM. [Figure: the frequent sub-transactions with item F, item E, item D, item C, and item B.]
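
A minimal sketch of how sub-transactions might be read back from the matrix, under the same assumptions as before: rows are sorted in ascending order of frequency, and each entry holds a (row, column) pointer to the next item of its transaction, or `None`. The small matrix is hard-coded and hypothetical, and `sub_transactions` is an illustrative name, not the authors' routine.

```python
def sub_transactions(index, rows, start_row):
    """Follow every pointer chain that starts in start_row.

    Chains only move toward rows with higher frequency, so starting at or
    after the border row yields frequent items only.
    """
    result = []
    for ptr in rows[start_row]:
        items = [index[start_row][0]]
        while ptr is not None:
            r, c = ptr
            items.append(index[r][0])
            ptr = rows[r][c]
        result.append(items)
    return result

# Hypothetical inverted matrix for five toy transactions:
# ABC, AB, ACD, ABD, BC (0-based rows and columns).
index = [("D", 2), ("C", 3), ("A", 4), ("B", 4)]
rows = [
    [(1, 1), (2, 3)],                  # D
    [(2, 0), (2, 2), (3, 3)],          # C
    [(3, 0), (3, 1), None, (3, 2)],    # A
    [None, None, None, None],          # B
]

print(sub_transactions(index, rows, 1))
# [['C', 'A', 'B'], ['C', 'A'], ['C', 'B']]
```

Starting from row 1 skips item D entirely, which is what happens when the border for the chosen support falls at row 1.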

Co-Occurrence Frequent Item tree (COFI-tree): building the F-COFI-tree. Each node carries a frequency count and a participation count.

Mining COFI-trees: E-COFI-tree. Support = Frequency count - Participation count.
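
The formula can be illustrated with a small Python sketch. It assumes that the participation count records how much of a node's frequency has already been used by branches extracted earlier, so the difference is what the current branch still contributes. The class `CofiNode`, the function `take_branch`, and the counts are hypothetical; this is not the authors' mining procedure in full.

```python
class CofiNode:
    def __init__(self, item, frequency, parent=None):
        self.item = item
        self.frequency = frequency   # set while the tree is built
        self.participation = 0       # grows while the tree is mined
        self.parent = parent

def take_branch(node):
    """Return (items from node up to the root, support) and mark the branch as used."""
    support = node.frequency - node.participation
    items = []
    cur = node
    while cur is not None:
        items.append(cur.item)
        cur.participation += support
        cur = cur.parent
    return items, support

# Hypothetical single branch of an E-COFI-tree: E (root) -> B -> A.
root = CofiNode("E", 5)
b = CofiNode("B", 4, parent=root)
a = CofiNode("A", 3, parent=b)

first = take_branch(a)   # uses 3 occurrences along A, B, E
second = take_branch(b)  # only 4 - 3 = 1 occurrence of B is left
print(first)   # (['A', 'B', 'E'], 3)
print(second)  # (['B', 'E'], 1)
```

Tracking participation this way keeps an occurrence that was already counted through a longer branch from being counted again for its prefix.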

Mining COFI-trees. D-COFI-tree: DBA:5, DA:5, DB:8. C-COFI-tree: CA:6. B-COFI-tree: BA:6.

Experimental Studies. [Figure: time needed to mine 1M transactions with different support levels.] Test machine: Pentium 700 MHz with 256 MB of RAM.

Experimental Studies. [Figure: accumulated time needed to mine 1M transactions using 4 different support levels.] [Figure: time needed in seconds to mine different transaction sizes.] Test machine: Pentium 700 MHz with 256 MB of RAM.

Conclusion and Future Work.
Conclusion: a new AR algorithm that offers
1. Low memory dependency
2. No superfluous processing
3. Interactive mining ready
4. Scalable
Future work: an updateable Inverted Matrix for native storage of transactions; compressing the size of the Inverted Matrix; parallelizing the mining process as well as the construction of the Inverted Matrix.