
ACM SIGKDD Aug. 2003 – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery.


1 ACM SIGKDD Aug. 2003 – Washington, DC. M. El-Hajj and O. R. Zaïane, 2003. Database Lab, University of Alberta, Canada. Inverted Matrix: Efficient Discovery of Frequent Items in Large Datasets in the Context of Interactive Mining. Mohammad El-Hajj, Osmar R. Zaïane. KDD 2003. Department of Computing Science, University of Alberta, Canada

2 Outline: Introduction; Pre-Processing Phase; Mining Phase; Transactional Layouts; Building COFI-trees; Mining COFI-trees; Experimental Studies; Conclusion and Future work

3 Interactive Mining (diagram of the knowledge discovery process): Databases, Cleaning and Integration, Data warehouse, Selection and Transformation, Data Mining, Patterns, Evaluation and Presentation, Knowledge

4 Association Rule Mining: Association rule mining is crucial in many applications and plays an essential role in many important mining tasks. Two steps: 1. Frequent Itemset Mining (FIM) 2. Association Rules Generation. Rule form: Antecedent → Consequent (Body → Head)

5 Challenges for FIM: 1. High memory dependency 2. Repetitive tasks, (I/O) readings (superfluous processing) 3. Non-interactive mining. Expensive candidacy generation step OR huge memory-based data structures

6 Challenges for FIM: 1. High memory dependency 2. Repetitive tasks, (I/O) readings (superfluous processing) 3. Non-interactive mining. Support > 4: frequent 1-itemsets {A, B, C, D, E, F}; non-frequent items {G, H, I, J, K, L, M, N, O, P, Q, R}

7 Challenges for FIM: 1. High memory dependency 2. Repetitive tasks, (I/O) readings (superfluous processing) 3. Non-interactive mining. Support > 9: frequent 1-itemsets {A, B, C}; non-frequent items {D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R}
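The effect of the two thresholds on slides 6 and 7 can be illustrated with a small sketch. The item counts below are hypothetical (the slides give only the resulting item sets, not the counts); they were picked so that "Support > 4" and "Support > 9" reproduce the sets shown. The function name `split_by_support` is likewise an assumption made for illustration.

```python
# Hypothetical item counts, chosen only to be consistent with slides 6 and 7.
ITEM_COUNTS = {
    "A": 11, "B": 10, "C": 10, "D": 9, "E": 8, "F": 7,
    "G": 4, "H": 4, "I": 3, "J": 3, "K": 3, "L": 2,
    "M": 2, "N": 2, "O": 1, "P": 1, "Q": 1, "R": 1,
}


def split_by_support(item_counts, threshold):
    """Return (frequent, non_frequent) item lists using the rule count > threshold."""
    frequent = sorted(i for i, c in item_counts.items() if c > threshold)
    non_frequent = sorted(i for i, c in item_counts.items() if c <= threshold)
    return frequent, non_frequent


if __name__ == "__main__":
    for threshold in (4, 9):
        frequent, non_frequent = split_by_support(ITEM_COUNTS, threshold)
        print(f"Support > {threshold}: frequent={frequent}, non-frequent={non_frequent}")
```

Raising the threshold only re-partitions the items here because the counts are already available; a miner that discards such information has to recompute it from the data.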

8 Challenges for FIM: 1. High memory dependency 2. Repetitive tasks, (I/O) readings (superfluous processing) 3. Non-interactive mining. Changing the support level means expensive steps (the whole process is redone). Diagram of the knowledge discovery process: Databases, Data warehouse, Selection and Transformation, Data Mining, Patterns, Evaluation and Presentation, Knowledge

9 Motivation: A new association rule mining algorithm with the following features, without compromising scalability: 1. Low memory dependency 2. Removal of superfluous processing 3. Interactive-mining ready

10 Transactional Layouts, Horizontal Layout: Candidacy generation can be removed (FP-Growth). Superfluous processing

11 Transactional Layouts, Vertical Layout: Candidacy generation is required. Minimizes superfluous processing

12 Suggested Layout, Inverted Matrix Layout: Combines the horizontal and vertical layouts. 2 I/O passes

13 Transactional Layouts, Inverted Matrix Layout: Pass 1 generates the sorted item list (based on frequency)

14 Transactional Layouts, Inverted Matrix Layout: Pass 2 generates the transactional array of the Inverted Matrix (IM)
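The two passes on slides 13 and 14 might be sketched as follows. The slides do not spell out the layout of the transactional array, so this sketch assumes one plausible form: the index lists items in ascending order of frequency, each item owns one row, and each row entry is a `(row, column)` pointer to the next item of the same transaction (or `None` at the end of the transaction). The function name `build_inverted_matrix` and the tie-breaking by item name are assumptions made for illustration.

```python
from collections import Counter


def build_inverted_matrix(transactions):
    """Build an index and a pointer-based transactional array in two passes."""
    # Pass 1: count every item and sort the items by ascending frequency
    # (ties broken by item name so the result is deterministic).
    counts = Counter(item for t in transactions for item in set(t))
    index = sorted(counts, key=lambda item: (counts[item], item))
    row_of = {item: row for row, item in enumerate(index)}

    # Pass 2: re-read the transactions and fill one row per item.
    rows = [[] for _ in index]
    for t in transactions:
        ordered = sorted(set(t), key=row_of.__getitem__)
        for pos, item in enumerate(ordered):
            row = row_of[item]
            if pos + 1 < len(ordered):
                next_row = row_of[ordered[pos + 1]]
                # The next item's entry is appended on the following iteration,
                # so its column is the current length of that row.
                rows[row].append((next_row, len(rows[next_row])))
            else:
                rows[row].append(None)  # last item of this transaction
    return index, counts, rows


if __name__ == "__main__":
    data = [["A", "B"], ["A", "C"], ["A", "B", "C"]]
    index, counts, rows = build_inverted_matrix(data)
    for item, row in zip(index, rows):
        print(item, counts[item], row)
```

No support threshold appears anywhere in this construction, and each row ends up with exactly as many entries as the item's frequency.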

15 Transactional Layouts: Inverted Matrix Layout

16 Transactional Layouts, Inverted Matrix Layout: There is no minimum support involved in building the Inverted Matrix.

17 Transactional Layouts, Inverted Matrix Layout: Support > 4. Border Support

18 Transactional Layouts: Inverted Matrix Layout

19 Transactional Layouts: Inverted Matrix Layout

20 Sub transactions generated from IM: Frequent sub-transaction with item F; frequent sub-transaction with item D; frequent sub-transaction with item E; frequent sub-transaction with item B; frequent sub-transaction with item C
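Generating sub-transactions from the Inverted Matrix for a given support could look roughly like the sketch below. It assumes a pointer-based layout that the slides do not spell out: items indexed in ascending frequency, one row per item, each entry a `(row, column)` pointer to the next item of the same transaction. Under that assumption, the border is the first index row whose frequency exceeds the support, and following the pointers from a row at or after the border yields only frequent items. All function names are hypothetical.

```python
from collections import Counter


def build_inverted_matrix(transactions):
    """Two-pass construction (assumed layout, see the text above)."""
    counts = Counter(item for t in transactions for item in set(t))
    index = sorted(counts, key=lambda item: (counts[item], item))
    row_of = {item: row for row, item in enumerate(index)}
    rows = [[] for _ in index]
    for t in transactions:
        ordered = sorted(set(t), key=row_of.__getitem__)
        for pos, item in enumerate(ordered):
            if pos + 1 < len(ordered):
                nxt = row_of[ordered[pos + 1]]
                rows[row_of[item]].append((nxt, len(rows[nxt])))
            else:
                rows[row_of[item]].append(None)
    return index, counts, rows


def border_row(index, counts, min_support):
    """First row whose item satisfies count > min_support (len(index) if none)."""
    for row, item in enumerate(index):
        if counts[item] > min_support:
            return row
    return len(index)


def sub_transactions(index, rows, row):
    """Follow each pointer chain that starts in the given row."""
    result = []
    for entry in rows[row]:
        items = [index[row]]
        while entry is not None:
            next_row, next_col = entry
            items.append(index[next_row])
            entry = rows[next_row][next_col]
        result.append(items)
    return result


if __name__ == "__main__":
    data = [["A", "B", "X"], ["A", "C"], ["A", "B", "C"]]
    index, counts, rows = build_inverted_matrix(data)
    start = border_row(index, counts, min_support=1)
    for row in range(start, len(index)):
        print(index[row], sub_transactions(index, rows, row))
```

Changing `min_support` only moves the border; the matrix itself is not rebuilt.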

21 Co-Occurrences Frequent Item tree: Building F-COFI-tree. Frequency Count, Participation Count

22 Co-Occurrences Frequent Item tree: Building F-COFI-tree

23 Co-Occurrences Frequent Item tree: Building F-COFI-tree

24 Co-Occurrences Frequent Item tree: Building F-COFI-tree

25 Co-Occurrences Frequent Item tree: Building F-COFI-tree

26 Co-Occurrences Frequent Item tree: Building F-COFI-tree

27 Co-Occurrences Frequent Item tree: Building F-COFI-tree

28 Co-Occurrences Frequent Item tree

29 Mining COFI-trees: E-COFI-tree

30 Mining COFI-trees, E-COFI-tree: Support = Frequency count - Participation count
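The formula on slide 30 can be made concrete with a minimal sketch. The slide gives only the formula; how the counts are updated afterwards is an assumption here: once a branch has been reported with a given support, that support is added to the participation count of every node on the branch, so later branches see only what is left. The `Node` class, the function names, and the counts in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    item: str
    frequency: int                  # frequency count of the node
    participation: int = 0          # how much of that frequency was already used
    parent: Optional["Node"] = None


def branch_support(node):
    """Support = frequency count - participation count (slide 30)."""
    return node.frequency - node.participation


def consume_branch(node):
    """Report the branch from node up to the root, then mark it as used."""
    support = branch_support(node)
    pattern = []
    current = node
    while current is not None:
        pattern.append(current.item)
        current.participation += support  # assumed bookkeeping step
        current = current.parent
    return pattern, support


if __name__ == "__main__":
    # Hypothetical E-COFI-tree branch: E (root) -> B -> A.
    e = Node("E", frequency=5)
    b = Node("B", frequency=4, parent=e)
    a = Node("A", frequency=3, parent=b)
    print(consume_branch(a))  # branch A-B-E
    print(consume_branch(b))  # what remains for branch B-E
```

In this example the first call uses 3 of B's 4 occurrences, so the second call reports the remaining 1 for the shorter branch.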

31 Mining COFI-trees: E-COFI-tree

32 Mining COFI-trees: E-COFI-tree

33 Mining COFI-trees: E-COFI-tree

34 Mining COFI-trees: D-COFI-tree: DBA:5, DA:5, DB:8. C-COFI-tree: CA:6. B-COFI-tree: BA:6

35 Experimental Studies: Time needed to mine 1M transactions with different support levels. Pentium 700 MHz with 256 MB of RAM

36 Experimental Studies: Accumulated time needed to mine 1M transactions using 4 different support levels. Time needed in seconds to mine different transaction sizes. Pentium 700 MHz with 256 MB of RAM

37 Conclusion and Future work: New AR algorithm: 1. Low memory dependency 2. No superfluous processing 3. Interactive-mining ready 4. Scalable. Future work: Updateable Inverted Matrix for native storage of transactions; compressing the size of the Inverted Matrix; parallelizing the mining process as well as the construction of the Inverted Matrix

