Published by Christal Doyle. Modified over 8 years ago.
1
Fuzzy Set Approach for Improving Web Log Mining
Sajitha Naduvil-Vadukootu
CSC 8810: Computational Intelligence
Instructor: Dr. Yanqing Zhang
Dec 4, 2006
2
Agenda
Introduction to Web Log Mining
Episode Identification: Existing Techniques
Improvement: Fuzzy Set Approach
Simulations & Results
Challenges & Future Work
Questions
3
Web Log Mining: Introduction
[Diagram: Site Structure, Access Log, Web Crawler, Association Mining, Association Rules]
Preprocessing steps: Extracting & Filtering, User Identification, Session Identification, Path Completion, Episode Identification
4
Episode Identification
Maximal Forward Reference
{A,B,C,D,C,B,A,E,F}
Episodes: {A,B,C,D} {A,E,F}
Rules generated: {B->A, C->A, D->A, …}
Maximal Reference Length
{(A,1),(B,1),(C,20),(D,80),(C,1),(B,1),(A,1),(E,30),(F,60)}
Episodes: {A,B,C} {D} {A,E} {F}
Rules: {B->A, C->A, …}
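As an illustration of the Maximal Forward Reference example on this slide, here is a minimal Python sketch. The function name `maximal_forward_references` and the list-based path are assumptions made for illustration, not code from the presentation. A revisit of a page already on the current path is treated as a backward reference that closes the current episode.

```python
from typing import Hashable, List, Sequence


def maximal_forward_references(pages: Sequence[Hashable]) -> List[list]:
    """Split one session's page sequence into maximal forward references."""
    path: list = []            # current forward path from the session start
    episodes: List[list] = []
    moving_forward = False     # True while the last request extended the path

    for page in pages:
        if page in path:
            # Backward reference: the path is maximal only if the request
            # just before this one was a forward move.
            if moving_forward:
                episodes.append(list(path))
            # Rewind the path to the revisited page.
            path = path[: path.index(page) + 1]
            moving_forward = False
        else:
            path.append(page)
            moving_forward = True

    # Flush the final path if the session ended on a forward move.
    if moving_forward and path:
        episodes.append(list(path))
    return episodes


session = ["A", "B", "C", "D", "C", "B", "A", "E", "F"]
episodes = maximal_forward_references(session)
print(episodes)  # [['A', 'B', 'C', 'D'], ['A', 'E', 'F']]
```

On the slide's sequence this yields the two episodes listed above, {A,B,C,D} and {A,E,F}.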
5
Page Request Classification
Navigational requests and content requests
Request time interval as a classification aid
Maximal Reference Length method for episode identification
What should the cut-off time interval be?
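To show why the cut-off matters, here is a small sketch of a plain reference-length split applied to the timed sequence from the Maximal Reference Length example. The function name, the cut-off values 10 and 25, and the handling of trailing navigational requests are assumptions for illustration. The earlier slide lists the third episode as {A,E}; this sketch keeps the backtracking requests and does not try to reproduce that step.

```python
from typing import List, Sequence, Tuple


def reference_length_episodes(
    requests: Sequence[Tuple[str, float]], cutoff: float
) -> List[List[str]]:
    """Close an episode at each request whose time interval exceeds cutoff."""
    episodes: List[List[str]] = []
    current: List[str] = []
    for page, interval in requests:
        current.append(page)
        if interval > cutoff:
            # Treated as a content request: it ends the current episode.
            episodes.append(current)
            current = []
    if current:
        # Assumption: trailing navigational requests form a final episode.
        episodes.append(current)
    return episodes


timed_session = [("A", 1), ("B", 1), ("C", 20), ("D", 80), ("C", 1),
                 ("B", 1), ("A", 1), ("E", 30), ("F", 60)]

low_cut = reference_length_episodes(timed_session, cutoff=10)
high_cut = reference_length_episodes(timed_session, cutoff=25)
print(low_cut)   # [['A', 'B', 'C'], ['D'], ['C', 'B', 'A', 'E'], ['F']]
print(high_cut)  # [['A', 'B', 'C', 'D'], ['C', 'B', 'A', 'E'], ['F']]
```

Raising the cut-off from 10 to 25 reclassifies C (interval 20) as navigational and merges the first two episodes, which is the sensitivity the slide's question points at.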
6
Fuzzy Set Approach
Consider the request time interval as a linguistic variable.
We define two linguistic values for the request time interval: High and Low.
High => request is Content
Low => request is Navigational
The "High" membership function is triangular. Slope = 3.33e-6
[Figure: "Navigational" and "Content" membership functions over request time interval 0 to 300000, membership up to 1.0]
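A minimal sketch of the "High" (content) membership function follows, assuming it rises linearly from 0 and saturates at 1.0; only the rising edge is modelled. The function names are hypothetical, and taking "Low" as the complement of "High" is an assumption. The slide states a slope of 3.33e-6, but the support values in the table on the next slide (for example 0.2013 for an interval of 61000) are consistent with 3.3e-6, so that value is used as the default here. The time unit is not stated in the slides.

```python
def high_membership(interval: float, slope: float = 3.3e-6) -> float:
    """Degree to which a request time interval is 'High' (content)."""
    # Linear ramp from 0, clamped to the range [0, 1].
    return max(0.0, min(1.0, slope * interval))


def low_membership(interval: float, slope: float = 3.3e-6) -> float:
    """Degree to which the interval is 'Low' (navigational).

    Assumption: modelled as the complement of the 'High' membership.
    """
    return 1.0 - high_membership(interval, slope)


for interval in (1, 61000, 300000):
    print(interval, round(high_membership(interval), 4))
```

With the default slope, an interval of 300000 gives a membership of about 0.99 and an interval of 1 gives a value that rounds to 0.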
7
Fuzzy Set Approach
Consider the "content" function value as the support weight for that request.
To calculate page 7447's support: Select avg(support) where targetid = 7447
support({7447,7448}) = max(support(7447) + support(7448))

ID    TIME INTERVAL   TARGETID   SUPPORT
473   300000          7440       0.99
474   61000           7441       0.2013
475   300000          7442       0.99
476   1               7441       0
477   300000          7443       0.99
478   1               7441       0
479   300000          7444       0.99
480   62000           7445       0.2046
481   300000          7446       0.99
482   60000           7447       0.198
483   300000          7448       0.99
484   1               7447       0
485   300000          7449       0.99
486   1               7447       0
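The per-page average described on this slide can be sketched as below, using the rows of the table as input. The function name `average_support` and the tuple layout are assumptions for illustration. The itemset formula for {7447,7448} is not implemented here, since the slide does not make its exact form clear.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (ID, time interval, target ID, support) rows copied from the slide's table.
rows = [
    (473, 300000, 7440, 0.99), (474, 61000, 7441, 0.2013),
    (475, 300000, 7442, 0.99), (476, 1, 7441, 0.0),
    (477, 300000, 7443, 0.99), (478, 1, 7441, 0.0),
    (479, 300000, 7444, 0.99), (480, 62000, 7445, 0.2046),
    (481, 300000, 7446, 0.99), (482, 60000, 7447, 0.198),
    (483, 300000, 7448, 0.99), (484, 1, 7447, 0.0),
    (485, 300000, 7449, 0.99), (486, 1, 7447, 0.0),
]


def average_support(
    records: Iterable[Tuple[int, int, int, float]]
) -> Dict[int, float]:
    """Average the support weight over all requests for each target page."""
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for _request_id, _interval, target_id, support in records:
        totals[target_id] += support
        counts[target_id] += 1
    return {target: totals[target] / counts[target] for target in totals}


avg = average_support(rows)
print(round(avg[7447], 4))  # 0.066
```

Page 7447 appears three times (supports 0.198, 0, 0), so its averaged support is 0.066, well below the 0.99 of a page such as 7448 that was requested once with a long interval.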
8
Simulation & Results
Configuration: Support Count = 5, Confidence = 0.001
Methods compared: Maximal Forward Reference, Max Reference Length (cut off = 1 sec), Fuzzy Hybrid
Columns: Dataset size; Number of Rules Discovered (per method); Running Time in seconds (per method); Relevant Rules, limit = 10 sec (per method)
50002800401212032600
100003820992404593800
2000056241534488505424
500007724607214123253229377039272
9
Challenges & Future Work
Improved metrics for measuring "Relevance" / "Interestingness"
Determining a more suitable membership function
Performance on very large datasets