Intelligent Database Systems Lab
國立雲林科技大學 National Yunlin University of Science and Technology
Web usage mining: extracting unexpected periods from web logs
Presenter: Hsin-Yi Huang
Authors: F. Masseglia, P. Poncelet, M. Teisseire, A. Marascu
DMKD.27.
N.Y.U.S.T. I. M. Intelligent Database Systems Lab 2008/05/27
Outline
Motivation
Objective
Methodology
  Stable Period
  PERIO Heuristic
  Sequence alignment
Experiment
Conclusion
Comments
Motivation
Existing Web usage mining techniques are based either on an arbitrary division of the data or on guidance from presumed results. First, their results depend on this arbitrary organization of the data. Second, they cannot automatically extract “seasonal peaks” from the stored data.
[Figure: requests for a web page, 01/31~02/01; one log per month; 200,000 navigations]
Objective
The authors propose a specific data mining process that automatically reveals “dense periods” in the short range. Furthermore, their method can extract the frequent sequential patterns related to the extracted periods.
Stable Period
PERIO heuristic
[Figure: Log; periods … Pn; C1, C2, C3, …, Cn; DB = C_Pn; FI = Frequent Items(C_Pn); FI × FI; Candidates; Evaluated Candidates; Frequents; Operators]
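The figure labels suggest that, for one period, candidates are built from the frequent items of that period's sequences (FI × FI), then evaluated, and the frequent ones kept. The following is a minimal Python sketch of that single step only, not the authors' PERIO implementation. It assumes each navigation is a list of page identifiers and that "frequent" means contained in at least a fraction `min_support` of the sequences; the function names, the `min_support` parameter, and the sample data are all hypothetical.

```python
from collections import Counter
from itertools import product


def frequent_items(sequences, min_support):
    """Items contained in at least min_support (a fraction) of the sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(set(seq))  # count each item once per sequence
    threshold = min_support * len(sequences)
    return sorted(item for item, n in counts.items() if n >= threshold)


def candidate_pairs(items):
    """Length-2 candidate sequences built as the cross product FI x FI."""
    return [list(pair) for pair in product(items, repeat=2)]


def is_subsequence(candidate, seq):
    """True if the candidate's items occur in seq in the same order (gaps allowed)."""
    remaining = iter(seq)
    return all(item in remaining for item in candidate)


def evaluate(candidates, sequences, min_support):
    """Keep the candidates supported by at least min_support of the sequences."""
    threshold = min_support * len(sequences)
    return [
        c for c in candidates
        if sum(is_subsequence(c, s) for s in sequences) >= threshold
    ]


# Hypothetical navigations of the users connected during one period.
period_db = [["a", "b", "c"], ["a", "c"], ["b", "a", "c"], ["d"]]

fi = frequent_items(period_db, min_support=0.5)
candidates = candidate_pairs(fi)
frequents = evaluate(candidates, period_db, min_support=0.5)
print(fi)
print(frequents)
```

In this sketch the candidates are ordered pairs, so ["a", "c"] and ["c", "a"] are evaluated separately; how the operators shown in the figure modify the frequents afterwards is not covered here.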
Sequence alignment
Experiment
Dataset: INRIA Sophia Antipolis
From January 2004 to March 2005
3.5 million sequences (users)
Experiment (cont.)
Conclusion
The method can handle a Web log of any size without dividing it, and it identifies interesting periods in the log.
The method can extract a frequent pattern even if it is frequent only for a very short period, or frequent over a period that is not included in a standard division of time.
Comments
Advantage: a lot of illustrations
Drawback: …
Application: Web usage mining