Fuzzy Set Approach for Improving Web Log Mining. Sajitha Naduvil-Vadukootu. CSC 8810: Computational Intelligence. Instructor: Dr. Yanqing Zhang. Dec 4, 2006
Agenda: Introduction to Web Log Mining; Episode Identification: Existing Techniques; Improvement: Fuzzy Set Approach; Simulations & Results; Challenges & Future Work; Questions
Web Log Mining: Introduction. [Diagram labels: Site Structure, Access Log, Web Crawler, Association Mining, Association Rules.] Preprocessing steps: Extracting & Filtering, User Identification, Session Identification, Path Completion, Episode Identification
Episode Identification. Maximal Forward Reference: {A,B,C,D,C,B,A,E,F}; Episodes: {A,B,C,D} {A,E,F}; Rules generated: {B->A, C->A, D->A, …}. Maximal Reference Length: {(A,1),(B,1),(C,20),(D,80),(C,1),(B,1),(A,1),(E,30),(F,60)}; Episodes: {A,B,C} {D} {A,E} {F}; Rules: {B->A, C->A, …}
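The two episode-identification methods on this slide can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names are hypothetical, and the cut-off of 10 seconds is an assumption, since the slide does not state the cut-off used for its example. The plain reference-length split below keeps the backtracking pages, so its third episode is C, B, A, E, whereas the slide lists {A,E}; the slide appears to drop the revisited pages C and B.

```python
from typing import List, Sequence, Tuple


def maximal_forward_references(path: Sequence[str]) -> List[List[str]]:
    """Split a click path into maximal forward references.

    A revisit of a page already on the current path is treated as a
    backward move. The forward path built so far is emitted once, then
    the path is truncated back to the revisited page.
    """
    episodes: List[List[str]] = []
    current: List[str] = []
    moving_forward = True
    for page in path:
        if page in current:
            if moving_forward:
                episodes.append(list(current))
            current = current[: current.index(page) + 1]
            moving_forward = False
        else:
            current.append(page)
            moving_forward = True
    if moving_forward and current:
        episodes.append(list(current))
    return episodes


def reference_length_episodes(
    requests: Sequence[Tuple[str, float]], cutoff: float
) -> List[List[str]]:
    """Close an episode at each request viewed longer than `cutoff` seconds."""
    episodes: List[List[str]] = []
    current: List[str] = []
    for page, seconds in requests:
        current.append(page)
        if seconds > cutoff:  # treated as a content page, which ends the episode
            episodes.append(current)
            current = []
    if current:
        episodes.append(current)
    return episodes


# Example data taken from the slide.
path = list("ABCDCBAEF")
requests = [("A", 1), ("B", 1), ("C", 20), ("D", 80), ("C", 1),
            ("B", 1), ("A", 1), ("E", 30), ("F", 60)]

mfr_episodes = maximal_forward_references(path)
mrl_episodes = reference_length_episodes(requests, cutoff=10)  # cutoff is assumed
```

Here `mfr_episodes` matches the slide's {A,B,C,D} and {A,E,F}, while `mrl_episodes` differs from the slide only in the third episode, as noted above.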
Page Request Classification: Requests are either navigational requests or content requests. The request time interval serves as a classification aid. The Maximal Reference Length method uses it for episode identification. Open question: what should the cut-off time interval be?
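To make the cut-off question concrete, here is a small sketch of a crisp classifier applied to time intervals from the earlier example. The function name `classify` and both cut-off values (10 and 25 seconds) are assumptions chosen only for illustration.

```python
def classify(seconds: float, cutoff: float) -> str:
    """Crisp rule: a request viewed longer than the cut-off is content."""
    return "content" if seconds > cutoff else "navigational"


intervals = [1, 20, 30, 60, 80]  # seconds, from the episode example

labels_10 = [classify(s, cutoff=10) for s in intervals]
labels_25 = [classify(s, cutoff=25) for s in intervals]

# The 20-second request changes class when the cut-off moves from 10 to 25.
changed = [s for s, a, b in zip(intervals, labels_10, labels_25) if a != b]
```

A single hard threshold flips the label of any request near it, which is the sensitivity that a graded (fuzzy) membership is meant to soften.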
Fuzzy Set Approach: Consider the request time interval as a linguistic variable. We define two linguistic values, High and Low, for the request time interval. High => request is Content. Low => request is Navigational. The “High” membership function is triangular. Slope=3.33e [Figure: membership function plot with regions labeled Navigational and Content]
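A triangular membership function of the kind named on this slide can be sketched as below. The slope value on the slide is truncated, so the breakpoints used here (0, 10 and 20 seconds) are hypothetical placeholders, not the parameters from the study.

```python
def triangular(x: float, left: float, peak: float, right: float) -> float:
    """Triangular membership: 0 outside (left, right), 1 at peak.

    Assumes left < peak < right.
    """
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical breakpoints, in seconds, for the "High" (content) value.
LEFT, PEAK, RIGHT = 0.0, 10.0, 20.0

high_at_5 = triangular(5.0, LEFT, PEAK, RIGHT)    # partially "High"
high_at_10 = triangular(10.0, LEFT, PEAK, RIGHT)  # fully "High"
```

Each request then carries a degree of "content-ness" between 0 and 1 instead of a hard navigational/content label.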
Fuzzy Set Approach: Consider the “content” function value as the support weight for that request. To calculate page 7447’s support: Select avg(support) where targetid = 7447. support({7447,7448}) = max(support(7447)+ support(7448)). [Table columns: ID, TIME INTERVAL, TARGETID, SUPPORT]
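The per-page support query on this slide amounts to averaging the support weights of all requests that target the page. A minimal sketch follows; the rows are made-up illustrative values (the original table contents are not available), and the function name `page_support` is hypothetical. Itemset support is not shown, because the slide's formula for it is ambiguous.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (id, time_interval, target_id, support): made-up values for illustration.
Row = Tuple[int, float, int, float]
rows: List[Row] = [
    (1, 5.0, 7447, 0.5),
    (2, 10.0, 7447, 1.0),
    (3, 2.0, 7448, 0.25),
]


def page_support(rows: List[Row]) -> Dict[int, float]:
    """Average the support weight of each target page over its requests."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for _id, _interval, target_id, support in rows:
        sums[target_id] += support
        counts[target_id] += 1
    return {target: sums[target] / counts[target] for target in sums}


support = page_support(rows)
```

With these sample rows, page 7447 receives the mean of its two request weights and page 7448 keeps its single weight.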
Simulation & Results. Configuration: Support Count = 5, Confidence = [value missing]. [Table columns: Data Set Size; Number of Rules Discovered; Running Time (seconds); Relevant Rules (limit = 10 sec). Each measure is reported for three methods: Maximal Forward Reference, Max Reference Length (cut off = 1 sec), and Fuzzy Hybrid. Table values missing.]
Challenges & Future Work: Improved metrics for measuring “Relevance” / “Interestingness”; determining a more suitable membership function; performance on very large datasets