Gradual Adaption Model for Estimation of User Information Access Behavior
J. Chen, R.Y. Shtykh and Q. Jin
Graduate School of Human Sciences, Waseda University, Japan
October 1, 2015, Waseda University
Background
  Why do we need information?
    – For leisure: finding a route or map for a trip
    – For work: finding business information
    – For learning: finding academic papers
  Where do we get information?
    – From traditional media such as books and magazines
    – From the Internet
  How do we get information?
    – By searching in a bookstore, library, etc.
    – By searching the Internet
  What do we face when searching for information?
    – Too many search results, many of them irrelevant
Information Recommendation
  Information recommendation
    – Web mining approaches
        Usage mining
        Structure mining
        Content mining
        Semantic mining
    – Web mining data
        Content data: text and multimedia provided by web sites
        Structure data: the organization inside a web page, internal and external links, and the web site hierarchy
        Usage data: access logs of web sites
        User profile: information about users
        Semantic data: data that describe the structure and definitions of semantic web sites
Study Approach
  Proposing a gradual adaption model for estimating user information access behavior
  Analyzing a variety of users' information access data over short, medium, and long periods and by remarkable and exceptional categories, based on Full Bayesian Estimation
  Conducting an experimental simulation to show the operability and effectiveness of the proposed model
Related Works
  WUM (Web Usage Mining)
    – Based on users' implicit feedback
  A new document representation model (Poblete and Baeza-Yates, WWW2008)
    – Evaluated on a web site with a small vocabulary and specific to certain topics
  Identifying relevant web sites from user activities (Bilenko and White, WWW2008)
    – Needs more time to train the system
    – Personalizes information recommendation
  Dynamic Link Generation (Yan, et al., WWW1996)
    – Consists of off-line and on-line modules
  SUGGEST 3.0 (Baraglia and Silvestri, WT2004)
    – Targets large web sites and has only an on-line module, but the logs used to evaluate the system are small and limited
  LinkSelector (Fang and Sheng, ACM2004)
    – Uses the hyperlink structure and the corresponding access logs
Definitions of Keyword, Link and Concept
  Keyword
    – A keyword in a web page
  Link
    – A link of a web page
  Concept
    – Consists of a number of keywords and links
  [Figure: example keywords (nature, painting, Aristotle, Andersen, cartoon, culture, Leonardo) and links (Link a, Link b) grouped under the concepts Philosophy, Literature, and Art]
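The three definitions can be sketched as a small data structure. This is only an illustration: the class name `Concept`, the helper `concepts_for_page`, and the assignment of the example keywords and links to concepts are assumptions, since the figure's grouping did not survive extraction.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Concept:
    """A concept is a named group of keywords and links (hypothetical layout)."""
    name: str
    keywords: Set[str] = field(default_factory=set)
    links: Set[str] = field(default_factory=set)


def concepts_for_page(page_keywords: Iterable[str], page_link: str,
                      concepts: List[Concept]) -> List[str]:
    """Return the names of concepts that share a keyword or the link with a page."""
    found = set(page_keywords)
    return [c.name for c in concepts
            if page_link in c.links or c.keywords & found]


# ASSUMPTION: this keyword/link-to-concept assignment is a guess for illustration.
CONCEPTS = [
    Concept("Philosophy", {"Aristotle", "nature"}, {"Link a"}),
    Concept("Literature", {"Andersen", "culture"}, set()),
    Concept("Art", {"painting", "Leonardo", "cartoon"}, {"Link b"}),
]
```

A page containing the keyword "Leonardo" would then be attributed to the concept "Art", and a click on that page would count as a click on the concept.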
Full Bayesian Estimation
  D is the data collection of a concept
  d_t is the current number of times the concept was clicked
  d_f is the current number of times the concept was not clicked
  α_t is the historical number of times the concept was clicked
  α_f is the historical number of times the concept was not clicked
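The estimation formula itself is not recoverable from the slide text. One common form that fits these definitions is the posterior mean of a Beta-Bernoulli model, in which the historical counts act as prior pseudo-counts: P(click | D) = (d_t + α_t) / (d_t + d_f + α_t + α_f). The sketch below assumes that form; the function names are hypothetical and the formula may differ from the one the authors used.

```python
def click_probability(d_t: int, d_f: int, alpha_t: int, alpha_f: int) -> float:
    """Posterior mean click probability of a concept.

    ASSUMPTION: Beta-Bernoulli posterior mean, with the historical counts
    (alpha_t, alpha_f) used as the prior and the current counts (d_t, d_f)
    as the observed data.
    """
    total = d_t + d_f + alpha_t + alpha_f
    if total == 0:
        raise ValueError("no current or historical observations")
    return (d_t + alpha_t) / total


def fold_into_history(d_t: int, d_f: int, alpha_t: int, alpha_f: int):
    """Merge the current counts into the history for the next estimation round."""
    return alpha_t + d_t, alpha_f + d_f
```

With 3 current clicks, 7 current non-clicks, and a history of 2 clicks and 8 non-clicks, the estimate is (3 + 2) / 20 = 0.25. Folding the current counts into the history after each round is what lets the estimate update without a separate training phase.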
Gradual Adaption Model
  [Figure: system architecture, divided into off-line and on-line parts. Components: Web Documents, Access Logs, Concept Analyser, Concept KB, Probability Estimator, Estimation Base, Input (Search Query, Search Click), Gradual Adaption Recommender (Short, Medium, Long, Remarkable / Exceptional), Matching]
Gradual Adaption Model
  We divide users' interests into three periods (short, medium, and long) and into remarkable and exceptional categories.
  The model is adaptive.
    – It can adapt to transitions in users' information access behavior.
  The model needs no training, because Full Bayesian Estimation has a learning function.
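The split into three periods can be sketched as one estimate per time window. Several things here are assumptions rather than the authors' method: the window lengths (7, 30, and 90 days, taken from the examples on the simulation slides), the reading of "current" counts as today's counts and "history" counts as the earlier days inside the window, and the names `PERIODS` and `period_estimates`. The remarkable and exceptional categories are not covered.

```python
# ASSUMPTION: window lengths in days, taken from the examples on the simulation slides.
PERIODS = {"short": 7, "medium": 30, "long": 90}


def period_estimates(daily_counts, periods=PERIODS):
    """Estimate a concept's click probability for each period.

    daily_counts: list of (clicked, not_clicked) per day, oldest first;
    the last entry is today.
    ASSUMPTION: today's counts are the current data (d_t, d_f) and the
    earlier days inside each window are the history (alpha_t, alpha_f).
    """
    if not daily_counts:
        raise ValueError("daily_counts must not be empty")
    d_t, d_f = daily_counts[-1]
    estimates = {}
    for name, days in periods.items():
        history = daily_counts[-days:-1]  # window without today
        alpha_t = sum(clicked for clicked, _ in history)
        alpha_f = sum(missed for _, missed in history)
        total = d_t + d_f + alpha_t + alpha_f
        estimates[name] = (d_t + alpha_t) / total if total else 0.0
    return estimates


# A concept ignored for 83 days, then clicked 10 times a day for a week.
log = [(0, 10)] * 83 + [(10, 0)] * 7
rates = period_estimates(log)
```

In this example the short-period estimate reacts fully to the new interest while the long-period estimate stays low, which is the kind of transition the model is meant to expose.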
Simulation and Evaluation
  Environment
    – Java, Tomcat, MySQL, and NekoHTML
  Data
    – Wikipedia on DVD Version 0.5: more than 2,000 web pages belonging to more than 180 concepts
Simulation and Evaluation
  Short period (such as 7 days / 1 week)
    – Test case
        A user has two interests that are easily affected by outside factors.
        Expectation: the probability of the related concepts can change greatly in the short or medium period, but not in the long period.
        Two concepts, “Art” and “Artists”, are assumed to be used, with a dynamically varying number of clicks.
    – Test result
        The concepts' rates change frequently.
        On some days, the probability of a concept in the short period is higher than in the long period.
Simulation and Evaluation
  Medium period (such as 30 days / 1 month)
    – Test case
        A user has a temporary interest and accesses the concept of that interest only occasionally.
        Expectation: this concept should keep a low rate in all three periods.
        One concept, “Philosophers”, is assumed to be used every three days.
    – Test result
        The changes become smaller, but on some days the probability of the concept in the short period is still higher than in the medium period.
Simulation and Evaluation
  Long period (such as 90 days / 3 months)
    – Test case
        A user has a long-term interest.
        Expectation: the probability of the concept of interest should keep a high rate in the long period.
        One concept, “Philosophical thought movements”, is assumed to be used every day.
    – Test result
        The rate becomes quite stable; there is no big change in the long period.
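The occasional and daily test cases can be mimicked with a toy simulation. It assumes one click opportunity per day, which the slides do not state, and it uses a plain windowed click rate instead of the full estimator; `simulate_clicks` and `window_rate` are hypothetical names, and the numbers it produces are not the authors' results.

```python
def simulate_clicks(interval, total_days=90):
    """Return one boolean per day: True when the concept is clicked.

    ASSUMPTION: one click opportunity per day, with the concept clicked
    on day 0 and then every `interval` days.
    """
    return [day % interval == 0 for day in range(total_days)]


def window_rate(clicked_days, days):
    """Share of clicked days among the most recent `days` days."""
    window = clicked_days[-days:]
    if not window:
        raise ValueError("no days to evaluate")
    return sum(window) / len(window)


occasional = simulate_clicks(interval=3)  # like "Philosophers"
daily = simulate_clicks(interval=1)       # like "Philosophical thought movements"

occasional_rates = {days: window_rate(occasional, days) for days in (7, 30, 90)}
daily_rates = {days: window_rate(daily, days) for days in (7, 30, 90)}
```

Under these assumptions the daily concept has a rate of 1 in every window, while the occasional concept stays near one third; its 7-day rate (2/7) deviates slightly from its 30-day and 90-day rates (both 1/3) because the short window is more sensitive to where the clicks fall.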
Conclusion
  In this study, we have proposed a gradual adaption model (GAM) for estimating user information access behavior.
  In our simulation, the three periods of GAM correctly distinguished users' long-term and temporary interests, even without system training.
Future Works
  To try more patterns for the short, medium, and long periods in order to find more reasonable ones.
  To evaluate the proposed model with real users' involvement.
  To compare the proposed approach with other related recommendation models.
Thank you for your attention.