Web Personalization Based on Static Information and Dynamic User Behavior Center for E-Business Technology Seoul National University Seoul, Korea Nam,

Presentation transcript:

Web Personalization Based on Static Information and Dynamic User Behavior
Center for E-Business Technology, Seoul National University, Seoul, Korea
Nam, Kwang Hyun
Intelligent Database Systems Lab, School of Computer Science & Engineering, Seoul National University, Seoul, Korea

Massimiliano Albanese, Antonio Picariello, Carlo and Lucio Sansone
Department of Information and Communication Systems, University of Naples (Napoli) "Federico II", Naples, Italy
WIDM '04

Copyright © 2008 by CEBT
IDS Lab Seminar (User Log Analysis)

Contents
- Introduction
- Various personalization schemes
- System architecture
- The web usage mining strategy
  - Pattern analysis, clustering, and classification
  - The re-classification algorithm
  - Web personalization
- Experimental results
- Conclusions
- Evaluation

Introduction
- Three types of data have to be managed in a web site
  - Content: whatever is in a web page
  - Structure: the organization of the content
  - Log (usage) data: the usage patterns of web sites
- The data mining technique to apply depends on the data type
  - Web content mining, web structure mining, web usage mining
- This paper considers web usage mining
  - The goal is web personalization, described as the process of customizing the content and the structure of web sites in order to provide users with information

Various personalization schemes
- Yan et al.
  - A methodology for the automatic classification of web users according to their access patterns, using cluster analysis on the web logs …
- Perkowitz and Etzioni
  - First described adaptive web sites as sites that semi-automatically improve their organization by learning from visitor access patterns
  - Algorithm based on a clustering methodology
- Srivastava et al.
  - A prototype system (WebSIFT) that performs intelligent cleansing and preprocessing for identifying users

Various personalization schemes
- Grandi
  - Time-related issues
  - Introduces an exhaustive annotated bibliography on temporal and evolution aspects in the World Wide Web
- The novelty of this paper's approach
  - Proposes a clustering process made up of two phases
  - 1st phase: pattern analysis and classification are performed by means of an unsupervised clustering algorithm, using the registration information
  - 2nd phase: re-classification is iteratively repeated until a suitable convergence is reached, to overcome the inaccuracy of the registration information

System architecture
[Figure: system architecture]

System architecture
- User profiling
  - The process of gathering information specific to each visitor
- Log analysis / web usage mining
  - The process of analyzing the information stored in web server logs
    - To extract statistical information and discover interesting usage patterns
    - To cluster the users into groups according to their behavior
    - To discover potential correlations between web pages and user groups
- Content management
  - The process of classifying the content of a web site into semantic categories to make information retrieval and presentation easier
- Web site publishing
  - A publishing mechanism that is used to present the content stored in the web server

The web usage mining strategy
- 1st phase
  - The users are attributed to a tentative class by means of an unsupervised clustering algorithm
  - Uses the static information provided by the users
  - Determines the number of classes the users can belong to
- 2nd phase
  - A novel re-classification algorithm is iteratively repeated until a suitable convergence in attributing each user to a class is reached
  - Re-classification is used to overcome the inaccuracy of the registration information
  - Accomplished by the log analysis and content management modules, on the basis of the dynamic navigational behavior of the users

Pattern analysis, clustering, and classification
- The task of the user profiling module is to associate each user with the class that best describes her behavior
  - To accomplish this, users have to be mapped into a feature space
- Issue: the definition of the classes the users can belong to
  - An unsupervised clustering procedure can be used for partitioning the feature space into a certain number of clusters
  - To choose the optimal number of clusters, a possible approach is the optimization of a clustering quality index

Pattern analysis, clustering, and classification
- C index
  - $C = \frac{S - S_{\min}}{S_{\max} - S_{\min}}$, with $C \in [0, 1]$
  - $l$: the number of pairs of patterns from the same cluster
  - $S$: the sum of distances over all pairs of patterns from the same cluster
  - $S_{\min}$: the sum of the $l$ smallest distances out of all pairs
  - $S_{\max}$: the sum of the $l$ largest distances out of all pairs
  - A small value of the C index indicates a good clustering, because pairs of patterns with a small distance are then in the same cluster
  - The denominator is used for normalization
- The C index is also used for comparing different clustering techniques
  - 1) AutoClass C: a fuzzy clustering algorithm based on Bayesian theory
  - 2) The Rival Penalized Competitive Learning (RPCL) algorithm: used for training a competitive neural network
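A minimal sketch of the C index as defined on this slide, i.e. (S - S_min) / (S_max - S_min), may help make the definitions concrete. It assumes Euclidean distance and patterns given as numeric tuples; the function name `c_index` and the toy data are illustrative and do not come from the paper.

```python
from itertools import combinations
import math


def c_index(points, labels):
    """C index of a clustering: (S - S_min) / (S_max - S_min). Lower is better."""
    all_distances = []      # distances over all pairs of patterns
    within_distances = []   # distances over pairs from the same cluster
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])  # assumption: Euclidean distance
        all_distances.append(d)
        if labels[i] == labels[j]:
            within_distances.append(d)

    l = len(within_distances)  # number of same-cluster pairs
    if l == 0 or l == len(all_distances):
        raise ValueError("need both within-cluster and between-cluster pairs")

    all_distances.sort()
    s = sum(within_distances)
    s_min = sum(all_distances[:l])    # the l smallest distances out of all pairs
    s_max = sum(all_distances[-l:])   # the l largest distances out of all pairs
    if s_max == s_min:
        raise ValueError("all distances are equal; the index is undefined")
    return (s - s_min) / (s_max - s_min)


# Toy data: two well separated groups of two points each.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = c_index(points, [0, 0, 1, 1])   # nearby points share a cluster
bad = c_index(points, [0, 1, 0, 1])    # distant points share a cluster
```

On this toy data the natural grouping yields the lowest possible index, while the grouping that pairs distant points scores close to 1, which matches the slide's statement that a small C index indicates a good clustering.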

The re-classification algorithm
- Basic idea
  - 1) Classify the web site resources based on the type of users that have accessed them
  - 2) Re-classify the users based on both the previously assigned class and the resources accessed since the last re-classification
- Content management module
  - Selects all the significant words appearing in the different resources of the web site
  - Associates them with a specific content category

The re-classification algorithm
- Log analysis module
  - Registers all the activities of the users
  - To use this information for re-classifying users, each content category must be attributed to a specific user class
  - This attribution starts from the first classification performed by the chosen clustering algorithm, and counts the number of times the users of a given class ask for something belonging to a specific content category with each resource type
  - Each content category can be attributed to a class with a probability that is proportional to the number of user requests
- The whole process converges if, after a suitable number of re-classifications, the number of re-classified users drops to zero
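The loop described on these two slides can be sketched as follows. The slides only state that categories are attributed to classes with a probability proportional to the request counts, and that users are re-classified from their previous class plus the resources they accessed. The scoring rule below (previous class counts as `prior_weight`, each request adds the category's class probabilities) is an assumption made for illustration, not the authors' published rule. For simplicity the sketch also reuses one fixed request log rather than only the requests since the last re-classification. All function names are hypothetical.

```python
from collections import defaultdict


def category_class_probs(user_class, requests):
    """P(class | category), proportional to the requests made by users of each class."""
    counts = defaultdict(lambda: defaultdict(int))
    for user, category in requests:
        counts[category][user_class[user]] += 1
    probs = {}
    for category, by_class in counts.items():
        total = sum(by_class.values())
        probs[category] = {cls: n / total for cls, n in by_class.items()}
    return probs


def reclassify_once(user_class, requests, prior_weight=1.0):
    """One re-classification pass (scoring rule is an assumption, see lead-in)."""
    probs = category_class_probs(user_class, requests)
    scores = {user: defaultdict(float) for user in user_class}
    for user, cls in user_class.items():
        scores[user][cls] += prior_weight  # the previously assigned class
    for user, category in requests:
        for cls, p in probs[category].items():
            scores[user][cls] += p         # evidence from accessed resources
    updated = {}
    for user, by_class in scores.items():
        # Highest score wins; ties are broken in favor of the current class.
        updated[user] = max(
            by_class, key=lambda cls: (by_class[cls], cls == user_class[user])
        )
    return updated


def reclassify_until_stable(user_class, requests, max_rounds=20):
    """Repeat until no user changes class; returns (classes, rounds used)."""
    current = dict(user_class)
    for round_no in range(1, max_rounds + 1):
        updated = reclassify_once(current, requests)
        changed = sum(1 for user in current if updated[user] != current[user])
        current = updated
        if changed == 0:
            return current, round_no
    return current, max_rounds


# Toy data: u5 registered as "young" but browses like the "adult" users.
initial = {"u1": "young", "u2": "young", "u3": "adult",
           "u4": "adult", "u5": "young", "u6": "adult"}
log = ([("u1", "disco")] * 3 + [("u2", "disco")] * 3 + [("u3", "theatre")] * 3
       + [("u4", "theatre")] * 3 + [("u6", "theatre")] * 3 + [("u5", "theatre")] * 4)
final, rounds = reclassify_until_stable(initial, log)
```

In this toy run the first pass moves `u5` to the `adult` class and the second pass changes nobody, so the loop stops with zero re-classified users, which is the convergence condition stated on the slide.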

Web personalization
- Given the class of a user, the web site publishing module shows the contents related to the categories attributed to that class on her personalized home page when she logs in to the web site
- Example
  - All the news and stories containing keywords related to those categories are presented, as well as the results of the latest queries involving the same keywords
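As a rough illustration of the publishing step, the sketch below selects the items for a personalized home page by matching item text against the keywords of the categories attributed to the user's class. The data layout, the function name `personalized_home_page`, and the sample categories and items are all hypothetical; the slides do not describe how the real publishing module stores or matches content.

```python
def personalized_home_page(user_class, class_categories, category_keywords, items):
    """Return the titles of items containing a keyword of the class's categories."""
    keywords = set()
    for category in class_categories.get(user_class, ()):
        keywords |= set(category_keywords.get(category, ()))
    # Naive whitespace tokenization; a real system would need proper text analysis.
    return [
        item["title"]
        for item in items
        if keywords & set(item["text"].lower().split())
    ]


# Hypothetical categories, keywords, and news items.
class_categories = {"class_1": ["music"], "class_2": ["cinema"]}
category_keywords = {"music": ["jazz", "concert"], "cinema": ["film", "premiere"]}
items = [
    {"title": "Jazz night", "text": "Live jazz concert tonight"},
    {"title": "New release", "text": "Film premiere on Friday"},
]
page = personalized_home_page("class_1", class_categories, category_keywords, items)
```

A user in `class_1` would see only the music item here, and a user whose class has no attributed categories would get an empty list.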

Experimental results
- Uses SOAP (a lightweight, XML-based protocol for the exchange of information in a decentralized, distributed environment)
- A commercial web site, 'pariare.com'
  - Provides information about entertainment in Napoli
  - Users who had already registered on the web site
  - The features used to initially classify the users are:
    - Age
    - Sex
    - Category of place the users prefer to go to
    - Number of times per week the users go out
    - Preferred day of the week to go out
    - The Pariapoli parameter (a measure of the degree of interest in the virtual community of Pariare)
    - Type of entertainment the users are looking for
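To show what mapping a registered user into a feature space could look like for the features listed above, here is a small sketch. The vocabularies (`PLACES`, `ENTERTAINMENT`), the scaling of each feature, and the assumption that the Pariapoli parameter lies in [0, 1] are all invented for illustration; the slides do not give the actual encodings or value ranges.

```python
# Hypothetical vocabularies; the real categories are not given in the slides.
PLACES = ["pub", "disco", "restaurant"]
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
ENTERTAINMENT = ["music", "cinema", "theatre"]


def one_hot(value, vocabulary):
    """Encode a categorical value as a 0/1 vector over the vocabulary."""
    return [1.0 if value == entry else 0.0 for entry in vocabulary]


def encode_user(record):
    """Map one registration record to a numeric feature vector."""
    numeric = [
        record["age"] / 100.0,                # assumed scaling
        1.0 if record["sex"] == "F" else 0.0,
        record["outings_per_week"] / 7.0,     # assumed scaling
        record["pariapoli"],                  # assumed to lie in [0, 1]
    ]
    return (
        numeric
        + one_hot(record["place"], PLACES)
        + one_hot(record["day"], DAYS)
        + one_hot(record["entertainment"], ENTERTAINMENT)
    )


# Made-up registration record.
vector = encode_user({
    "age": 25, "sex": "F", "place": "disco", "outings_per_week": 3,
    "day": "Sat", "pariapoli": 0.5, "entertainment": "music",
})
```

Vectors of this kind are what an unsupervised clustering algorithm would receive in the first phase.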

Experimental results
[Figure: the optimal number of clusters]

Experimental results
- K: the overall percentage of re-classified users
- (i,j): the percentage of i-class users who have been re-classified as j-class
- (i,i): the percentage of i-class users who have not been re-classified
- [Three re-classification tables] K = 3.14%, K = 0.62%, K = 0.16%
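The quantities defined on this slide can be computed from the class assignments before and after one re-classification pass. The sketch below assumes assignments are given as dictionaries from user id to class label; the function name `reclassification_matrix` and the toy data are illustrative only.

```python
from collections import Counter


def reclassification_matrix(before, after):
    """Return ({(i, j): % of i-class users now in class j}, K)."""
    classes = sorted(set(before.values()) | set(after.values()))
    class_sizes = Counter(before.values())
    moves = Counter((before[user], after[user]) for user in before)
    matrix = {
        (i, j): 100.0 * moves[(i, j)] / class_sizes[i]
        for i in classes
        for j in classes
        if class_sizes[i]  # skip classes that had no users before the pass
    }
    changed = sum(1 for user in before if after[user] != before[user])
    k = 100.0 * changed / len(before)  # overall percentage of re-classified users
    return matrix, k


# Toy data: one of four users changes class.
before = {"u1": "A", "u2": "A", "u3": "B", "u4": "B"}
after = {"u1": "A", "u2": "B", "u3": "B", "u4": "B"}
matrix, k = reclassification_matrix(before, after)
```

The diagonal entries (i,i) give the share of each class that stayed put, and K falling to 0% over successive passes corresponds to the convergence the slides describe.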

Experimental results
- The convergence is even faster, because the value of the C index is lower for the RPCL clustering than for the AutoClass C clustering
- [Three re-classification tables] K = 1.78%, K = 0.9%, K = 0%

Experimental results
[Figure with two panels: AutoClass C, RPCL]

Conclusions
- Introduces a novel web usage mining strategy for web personalization
- The novelty of this strategy lies in the fact that web site users are clustered through a two-phase process that takes into account both static information and dynamic behavior
- The experimental results have confirmed the effectiveness of the suggested approach
  - It leads to a stable classification whatever the initial classification is

Evaluation
- Pros
  - Interesting subject
  - Introduces various personalization schemes
  - Offers an architecture and modules
  - Shows detailed experiments and results
- Cons
  - Few examples: most of the content is just explanation
  - Many sentences are too long, which makes the content hard to read and understand