1
ACM SIGKDD (KDD 2003), Aug. 2003, Washington, DC
Inverted Matrix: Efficient Discovery of Frequent Items in Large Datasets in the Context of Interactive Mining
Mohammad El-Hajj and Osmar R. Zaïane, 2003
Database Lab., Department of Computing Science, University of Alberta, Canada
2
Outline: Introduction; Pre-Processing Phase (Transactional Layouts); Mining Phase (Building COFI-trees, Mining COFI-trees); Experimental Studies; Conclusion and Future work
3
Interactive Mining. Figure: the knowledge discovery process, from Databases through Cleaning and Integration to a Data warehouse, then Selection and Transformation, Data Mining (producing Patterns), and Evaluation and Presentation (producing Knowledge).
4
Association Rule Mining. Association rule mining is crucial in many applications and plays an essential role in many important mining tasks. It has two steps: 1. Frequent Itemset Mining (FIM); 2. Association Rules Generation. A rule relates an antecedent (body) to a consequent (head).
5
Challenges for FIM: 1. High memory dependency; 2. Repetitive tasks and I/O readings (superfluous processing); 3. Non-interactive mining. Existing approaches rely on either an expensive candidate generation step or huge memory-based data structures.
6
Challenges for FIM, example with Support > 4. Frequent 1-itemsets: {A, B, C, D, E, F}. Non-frequent items: {G, H, I, J, K, L, M, N, O, P, Q, R}.
7
Challenges for FIM, example with Support > 9. Frequent 1-itemsets: {A, B, C}. Non-frequent items: {D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R}.
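The effect of the support threshold on the frequent 1-itemsets can be sketched in Python. This is a minimal illustration on a made-up five-transaction dataset, not the data behind the slides; the function name `frequent_items` and the strict `>` comparison (mirroring "Support > 4") are assumptions.

```python
from collections import Counter

def frequent_items(transactions, min_support):
    """Return the items whose frequency is strictly greater than min_support."""
    # Count each item once per transaction, even if it is listed twice.
    counts = Counter(item for t in transactions for item in set(t))
    return {item for item, count in counts.items() if count > min_support}

# Hypothetical toy dataset: A occurs 4 times, B 3 times, C twice, D once.
TOY_TRANSACTIONS = [
    ["A", "B", "C"],
    ["A", "B"],
    ["A", "C"],
    ["A"],
    ["B", "D"],
]

low_threshold = frequent_items(TOY_TRANSACTIONS, 1)   # more items qualify
high_threshold = frequent_items(TOY_TRANSACTIONS, 3)  # fewer items qualify
```

Raising the threshold shrinks the frequent set, but a naive implementation like this one still reads every transaction, including the items that end up non-frequent.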
8
Challenges for FIM: changing the support level means expensive steps (the whole process is redone). Figure: the knowledge discovery process (Databases, Data warehouse, Selection and Transformation, Data Mining, Patterns, Evaluation and Presentation, Knowledge).
9
Motivation: a new association rule mining algorithm that has the following features, without compromising scalability: 1. Low memory dependency; 2. Removal of superfluous processing; 3. Interactive-mining ready.
10
Transactional Layouts: Horizontal Layout. Candidate generation can be removed (FP-Growth); superfluous processing is incurred.
11
Transactional Layouts: Vertical Layout. Candidate generation is required; superfluous processing is minimized.
12
Suggested Layout: Inverted Matrix Layout. Combines the horizontal and vertical layouts; 2 I/O passes.
13
Transactional Layouts: Inverted Matrix Layout. Pass 1 generates the sorted item list (based on frequency).
14
Transactional Layouts: Inverted Matrix Layout. Pass 2 generates the transactional array of the Inverted Matrix (IM).
15
Transactional Layouts: Inverted Matrix Layout
16
Transactional Layouts: Inverted Matrix Layout. There is no minimum support involved in building the Inverted Matrix.
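The two passes can be sketched as follows. The slides give only the outline (a frequency-sorted item list, then a transactional array), so the details here are assumptions: items are sorted in ascending frequency, each item row stores one slot per transaction containing that item, and each slot holds a `(row, column)` pointer to the next item of the same transaction, or `None` at the end. The function name `build_inverted_matrix` and the toy data are hypothetical.

```python
from collections import Counter

def build_inverted_matrix(transactions):
    """Build (index, rows) in two passes; no minimum support is used."""
    # Pass 1: count item frequencies and sort items by ascending frequency
    # (ties broken alphabetically so the layout is deterministic).
    counts = Counter(item for t in transactions for item in set(t))
    order = sorted(counts, key=lambda item: (counts[item], item))
    row_of = {item: row for row, item in enumerate(order)}

    # Pass 2: write each transaction into the rows of its items, chaining
    # every item to the next (more frequent) item of the same transaction.
    rows = [[] for _ in order]
    for t in transactions:
        items = sorted(set(t), key=row_of.get)
        # Slot each item will occupy: (its row, next free column in that row).
        slots = [(row_of[item], len(rows[row_of[item]])) for item in items]
        for i, (row, _col) in enumerate(slots):
            next_slot = slots[i + 1] if i + 1 < len(slots) else None
            rows[row].append(next_slot)

    index = [(item, counts[item]) for item in order]
    return index, rows

# Hypothetical toy dataset (not from the slides).
TOY_TRANSACTIONS = [["A", "B", "C"], ["A", "B"], ["A", "C"], ["A"], ["B", "D"]]
TOY_INDEX, TOY_ROWS = build_inverted_matrix(TOY_TRANSACTIONS)
```

Because the structure is built without a support threshold, the same matrix could in principle be reused for any support level chosen later.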
17
Transactional Layouts: Inverted Matrix Layout. Example with Support > 4, showing the border support.
18
Transactional Layouts: Inverted Matrix Layout
19
Transactional Layouts: Inverted Matrix Layout
20
Sub-transactions generated from the IM: frequent sub-transactions with item F, with item D, with item E, with item B, and with item C.
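One way to picture how sub-transactions might be read out of the Inverted Matrix for a chosen support is sketched below. It assumes a layout in which each row holds `(row, column)` pointers to the next, more frequent item of the same transaction; that layout, the names `sub_transactions`, `TOY_INDEX`, and `TOY_ROWS`, and the hard-coded toy matrix are all illustrative assumptions rather than the structure shown on the slides.

```python
# Hypothetical Inverted Matrix for five toy transactions:
# {A,B,C}, {A,B}, {A,C}, {A}, {B,D}. Items are in ascending frequency order.
TOY_INDEX = [("D", 1), ("C", 2), ("B", 3), ("A", 4)]
TOY_ROWS = [
    [(2, 2)],                  # D -> B
    [(2, 0), (3, 2)],          # C -> B, C -> A
    [(3, 0), (3, 1), None],    # B -> A, B -> A, B ends its transaction
    [None, None, None, None],  # A is always last
]

def sub_transactions(index, rows, min_support):
    """Follow pointer chains from every row at or past the support border."""
    # Border: first row whose frequency is strictly greater than min_support.
    border = next(
        (r for r, (_item, freq) in enumerate(index) if freq > min_support),
        len(index),
    )
    result = {}
    for r in range(border, len(index)):
        item = index[r][0]
        chains = []
        for pointer in rows[r]:
            chain = [item]
            while pointer is not None:
                next_row, next_col = pointer
                chain.append(index[next_row][0])
                pointer = rows[next_row][next_col]
            chains.append(chain)
        result[item] = chains
    return result

toy_subs = sub_transactions(TOY_INDEX, TOY_ROWS, min_support=1)
```

Rows before the border are never visited, so items below the support threshold are skipped rather than read and discarded. This sketch also returns chains for the most frequent item, which contain only that item.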
21
Co-Occurrences Frequent Item tree (COFI-tree): Building F-COFI-tree. Nodes carry a frequency count and a participation count.
22
Co-Occurrences Frequent Item tree: Building F-COFI-tree
23
Co-Occurrences Frequent Item tree: Building F-COFI-tree
24
Co-Occurrences Frequent Item tree: Building F-COFI-tree
25
Co-Occurrences Frequent Item tree: Building F-COFI-tree
26
Co-Occurrences Frequent Item tree: Building F-COFI-tree
27
Co-Occurrences Frequent Item tree: Building F-COFI-tree
28
Co-Occurrences Frequent Item tree
29
Mining COFI-trees: E-COFI-tree
30
Mining COFI-trees: E-COFI-tree. Support = Frequency count - Participation count.
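The formula can be read as simple bookkeeping on tree nodes. The sketch below is an assumption about how the two counters might interact, not the authors' code: the support still available on a branch is the starting node's frequency count minus its participation count, and once that support is claimed, it is added to the participation count of every node on the path so it is not counted twice. The `Node` class and `claim_branch` function are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Node:
    item: str
    frequency: int          # how often this node was reached while building
    participation: int = 0  # how much of that frequency is already used

def claim_branch(path):
    """path: nodes from a starting node upward, excluding the tree root.

    Returns the items on the path and the support credited to them.
    """
    start = path[0]
    support = start.frequency - start.participation
    # Assumed update rule: mark the claimed support as used along the path.
    for node in path:
        node.participation += support
    return [node.item for node in path], support

# Hypothetical two-node branch: B (frequency 2) below D (frequency 5).
leaf = Node("B", frequency=2)
parent = Node("D", frequency=5)

first_pattern = claim_branch([leaf, parent])   # uses B's full frequency
second_pattern = claim_branch([parent])        # only D's unused remainder
```

After the first call the parent has 2 of its 5 occurrences marked as used, so the second call credits only the remaining 3.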
31
Mining COFI-trees: E-COFI-tree
32
Mining COFI-trees: E-COFI-tree
33
Mining COFI-trees: E-COFI-tree
34
Mining COFI-trees. D-COFI-tree: DBA:5, DA:5, DB:8. C-COFI-tree: CA:6. B-COFI-tree: BA:6.
35
Experimental Studies. Figure: time needed to mine 1M transactions with different support levels (Pentium 700 MHz with 256 MB of RAM).
36
Experimental Studies. Figures: accumulated time needed to mine 1M transactions using 4 different support levels; time needed in seconds to mine different transaction sizes (Pentium 700 MHz with 256 MB of RAM).
37
Conclusion and Future work. Conclusion: a new AR algorithm with 1. Low memory dependency; 2. No superfluous processing; 3. Interactive-mining ready; 4. Scalable. Future work: an updateable Inverted Matrix for native storage of transactions; compressing the size of the Inverted Matrix; parallelizing the mining process as well as the construction of the Inverted Matrix.