Shohreh Kazemi, Intelligent Systems Laboratory (http://ce.aut.ac.ir/islab). Progress report: Markov model project.



Slide 2: Modeling and Predicting a User's Browsing Behavior
Modeling and predicting a user's browsing behavior on a Web site can be used to:
• improve Web cache performance [1; 2; 3]
• recommend related pages [4; 5]
• improve search engines [6]
• understand and influence buying patterns [7]
• personalize the browsing experience [8]

Slide 3: Markov models
• Markov models [9] have been used for studying and understanding stochastic processes.
• They have been shown to be well suited for modeling and predicting a user's browsing behavior on a Web site.

Slide 4: Markov models
• In general, the input for these problems is the sequence of Web pages accessed by a user.
• The goal is to build Markov models that can be used to predict the Web page that the user will most likely access next.

Slide 5: Markov Models for Predicting Next-Accessed Page
• The act of a user browsing a Web site is commonly modeled by observing the set of pages that he or she visits [10].
• This set of pages is referred to as a Web session, W = (P1, P2, ..., Pl).

Slide 6: Markov Models for Predicting Next-Accessed Page
• The next-page prediction problem can be solved using a probabilistic framework as follows:
• Let W be a user's Web session of length l.
• Let P(pi | W) be the probability that the user visits page pi next.
• Then the page p_{l+1} that the user will visit next is the page with the highest such probability:
$p_{l+1} = \arg\max_{p_i} P(p_i \mid W) = \arg\max_{p_i} P(p_i \mid P_l, P_{l-1}, \ldots, P_1)$

Slide 7: Markov Models for Predicting Next-Accessed Page
• The probability of visiting a page pi is assumed not to depend on all the pages in the Web session, but only on a small set of k preceding pages, where k ≪ l.
• Then we have:
$p_{l+1} = \arg\max_{p_i} P(p_i \mid P_l, P_{l-1}, \ldots, P_{l-(k-1)})$

Slide 8: Markov Models for Predicting Next-Accessed Page
• The number of preceding pages k that the next page depends on is called the order of the Markov model, and the resulting model M is called the kth-order Markov model.
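As a minimal sketch of what a kth-order model could look like in practice, the Python below counts which page followed each run of k consecutive pages and predicts the most frequent successor. The function names `train_markov` and `predict_next` and the sessions are made up for illustration; they are not the sessions from the slides, whose contents are not preserved in this transcript.

```python
from collections import Counter, defaultdict


def train_markov(sessions, k):
    """Count, for every run of k consecutive pages (a state), which page followed it."""
    counts = defaultdict(Counter)
    for session in sessions:
        for i in range(len(session) - k):
            state = tuple(session[i:i + k])
            counts[state][session[i + k]] += 1
    return counts


def predict_next(counts, session, k):
    """Return the most frequent successor of the last k pages, or None if the state is unseen."""
    if len(session) < k:
        return None
    state = tuple(session[-k:])
    if state not in counts:
        return None
    return counts[state].most_common(1)[0][0]


# Hypothetical training sessions (illustration only).
sessions = [
    ["P1", "P2", "P4"],
    ["P1", "P2", "P4", "P5"],
    ["P3", "P2", "P5"],
    ["P1", "P2", "P3"],
]

first_order = train_markov(sessions, 1)
second_order = train_markov(sessions, 2)

print(predict_next(first_order, ["P3", "P2"], 1))   # uses only the last page
print(predict_next(second_order, ["P3", "P2"], 2))  # uses the last two pages
print(predict_next(second_order, ["P5", "P1"], 2))  # unseen state
```

With this toy data the two orders disagree on the session (P3, P2): the first-order model sees only P2, while the second-order model also conditions on P3.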

Slide 9: Markov Models for Predicting Next-Accessed Page
[Figure: the site map for a sample Web site as a directed graph over pages P1 to P5. Edges not preserved in this transcript.]

Slide 10: Markov Models for Predicting Next-Accessed Page
[Table: a set of Web sessions that were generated on this Web site. Training set: W1 to W6. Test set: Wt1. Session contents not preserved in this transcript.]

Slide 11: Markov Models for Predicting Next-Accessed Page
[Table: the frequencies of different states for first-order Markov models. Rows: states S(1,1) to S(1,5). Columns: state frequency (Fr.) and pages P1 to P5. Values not preserved in this transcript.]

Slide 12: Markov Models for Predicting Next-Accessed Page
[Table: the frequencies of different states for second-order Markov models. Values not preserved in this transcript.]

Slide 13: Markov Models for Predicting Next-Accessed Page
[Figure: how these models are used to predict the most probable page for Web session Wt1. Details not preserved in this transcript.]

Slide 14: Performance Measures for Markov Models
• The first is the accuracy of the model: the ratio of the number of Web sessions for which the model is able to correctly predict the hidden page to the total number of Web sessions in the test set.
• The second is the number of states of the model: the total number of states for which the Markov model has estimated probabilities.
• The third is the coverage of the model: the ratio of the number of Web sessions whose state required for making a prediction was found in the model to the total number of Web sessions in the test set.
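The three measures can be computed directly from a trained model and a test set. The sketch below hides the last page of each test session and treats it as the page to predict, which is an assumption about the evaluation protocol; `train_markov`, `evaluate`, and all sessions are hypothetical names and data chosen for illustration.

```python
from collections import Counter, defaultdict


def train_markov(sessions, k):
    """Count, for every run of k consecutive pages (a state), which page followed it."""
    counts = defaultdict(Counter)
    for session in sessions:
        for i in range(len(session) - k):
            counts[tuple(session[i:i + k])][session[i + k]] += 1
    return counts


def evaluate(counts, test_sessions, k):
    """Hide the last page of each test session and score the kth-order model on it."""
    correct = covered = 0
    for session in test_sessions:
        history, hidden = session[:-1], session[-1]
        state = tuple(history[-k:])
        if len(history) < k or state not in counts:
            continue  # state not found in the model: session is not covered
        covered += 1
        if counts[state].most_common(1)[0][0] == hidden:
            correct += 1
    total = len(test_sessions)
    return {
        "accuracy": correct / total,   # correct predictions over all test sessions
        "coverage": covered / total,   # sessions whose state was found in the model
        "states": len(counts),         # number of states with estimates
    }


# Hypothetical data (illustration only).
train = [
    ["P1", "P2", "P4"],
    ["P1", "P2", "P4", "P5"],
    ["P3", "P2", "P5"],
    ["P1", "P2", "P3"],
]
test = [
    ["P1", "P2", "P4"],  # covered and predicted correctly
    ["P3", "P2", "P5"],  # covered but predicted incorrectly by the first-order model
    ["P2", "P5", "P1"],  # state (P5,) never had a successor in training: not covered
]

result = evaluate(train_markov(train, 1), test, 1)
print(result)
```

Note that accuracy is divided by the size of the whole test set, as in the definition above, so a session that is not covered counts against accuracy as well as coverage.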

Slide 15: Lower-order Markov models
• Lower-order Markov models (first or second order) are not successful in accurately predicting the next page to be accessed by the user,
• because these models do not look far enough into the past.

Slide 16: Higher-order Markov models
• In order to obtain better predictions, higher-order models must be used.
• These higher-order models have a number of limitations:
(i) high state-space complexity
(ii) reduced coverage
(iii) sometimes even worse accuracy due to the lower coverage

Slide 17
[Figure: comparing accuracy, coverage and model size with the order of the Markov model. Values not preserved in this transcript.]

Slide 18: All-Kth-Order Markov model
• One method to overcome the coverage problem is to train Markov models of varying order and then combine them for prediction [8].
• For each test instance, the highest-order Markov model that covers the instance is used for prediction.
• This scheme is called the All-Kth-Order Markov model.
• However, it makes the model-size problem worse.
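The fallback rule described above can be sketched as follows: train one model per order from 1 to K, then answer each query with the highest order whose state was seen in training. The names `train_all_kth` and `predict_all_kth` and the sessions are assumptions made for this illustration.

```python
from collections import Counter, defaultdict


def train_markov(sessions, k):
    """Count, for every run of k consecutive pages (a state), which page followed it."""
    counts = defaultdict(Counter)
    for session in sessions:
        for i in range(len(session) - k):
            counts[tuple(session[i:i + k])][session[i + k]] += 1
    return counts


def train_all_kth(sessions, max_order):
    """Train one Markov model for each order 1..max_order."""
    return {k: train_markov(sessions, k) for k in range(1, max_order + 1)}


def predict_all_kth(models, session):
    """Predict with the highest-order model that covers the session.

    Returns (order_used, predicted_page), or (None, None) if no model covers it.
    """
    for k in sorted(models, reverse=True):
        if len(session) >= k:
            state = tuple(session[-k:])
            if state in models[k]:
                return k, models[k][state].most_common(1)[0][0]
    return None, None


# Hypothetical training sessions (illustration only).
sessions = [
    ["P1", "P2", "P4"],
    ["P1", "P2", "P4", "P5"],
    ["P3", "P2", "P5"],
    ["P1", "P2", "P3"],
]
models = train_all_kth(sessions, 2)

print(predict_all_kth(models, ["P3", "P2"]))  # second-order state is known
print(predict_all_kth(models, ["P5", "P2"]))  # falls back to first order
total_states = sum(len(m) for m in models.values())
print(total_states)
```

The total number of states is the sum over all orders, which is why this scheme improves coverage at the cost of model size.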

Slide 19
• Some techniques have been developed to intelligently combine Markov models of different orders.
• The resulting model:
  • has low state complexity,
  • retains the coverage of the All-Kth-Order Markov model,
  • achieves comparable accuracies.

Slide 20: Frequency based
• These techniques are based on the observation that states that occur with low frequency in the training set also tend to have low prediction accuracies.
• These low-frequency states can be eliminated without affecting the accuracy of the resulting model.

Slide 21: Frequency based
• The amount of pruning is controlled by the parameter Φ, referred to as the frequency threshold.
• Note that states of the first-order Markov model are never pruned, so pruning does not reduce the coverage of the original model.
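A minimal sketch of frequency-based pruning, under two stated assumptions: a state's frequency is taken to be the number of times it was observed with a successor in training, and a state is dropped when that frequency is strictly below the threshold. The slides do not say whether the comparison is strict, and `frequency_prune` and the sessions are hypothetical.

```python
from collections import Counter, defaultdict


def train_markov(sessions, k):
    """Count, for every run of k consecutive pages (a state), which page followed it."""
    counts = defaultdict(Counter)
    for session in sessions:
        for i in range(len(session) - k):
            counts[tuple(session[i:i + k])][session[i + k]] += 1
    return counts


def frequency_prune(models, threshold):
    """Drop higher-order states seen fewer than `threshold` times; keep all first-order states."""
    pruned = {}
    for k, counts in models.items():
        if k == 1:
            # First-order states are never pruned, so coverage is preserved.
            pruned[k] = dict(counts)
        else:
            pruned[k] = {
                state: successors
                for state, successors in counts.items()
                if sum(successors.values()) >= threshold
            }
    return pruned


# Hypothetical training sessions (illustration only).
sessions = [
    ["P1", "P2", "P4"],
    ["P1", "P2", "P4", "P5"],
    ["P3", "P2", "P5"],
    ["P1", "P2", "P3"],
]
models = {k: train_markov(sessions, k) for k in (1, 2)}
pruned = frequency_prune(models, threshold=2)

print(sorted(pruned[2]))  # second-order states that survive
print(len(pruned[1]))     # first-order states are untouched
```

Raising the threshold shrinks the higher-order part of the model, and predictions for the removed states fall back to a lower order.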

Slide 22: Frequency based
[Table: accuracy and number of states for different values of the frequency threshold. Values not preserved in this transcript.]

Slide 23: Error based
• The final predictions are computed by using only the states of the model that have the smallest estimated error rate.
• The error associated with each state is estimated by a validation step.
• A higher-order state is pruned by comparing its error rate with the error rate of its lower-order states.

Slide 24: Error based
• For example, to prune the state S(3,q) = (Pi, Pj, Pk), its error rate is compared with the error rates of the states S(2,r) = (Pj, Pk) and S(1,s) = (Pk); the state S(3,q) is pruned if its error rate is higher than any of them.
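The comparison above can be sketched in Python as follows. Each state's error rate is estimated as the fraction of validation predictions it gets wrong, and a higher-order state is dropped when that rate exceeds the rate of any of its lower-order suffix states. Keeping states that never appear in the validation set is an assumption of this sketch, and `error_rates`, `error_prune`, and all sessions are hypothetical.

```python
from collections import Counter, defaultdict


def train_markov(sessions, k):
    """Count, for every run of k consecutive pages (a state), which page followed it."""
    counts = defaultdict(Counter)
    for session in sessions:
        for i in range(len(session) - k):
            counts[tuple(session[i:i + k])][session[i + k]] += 1
    return counts


def error_rates(models, validation_sessions):
    """Estimate each state's error rate as wrong predictions / predictions made on validation data."""
    seen, wrong = Counter(), Counter()
    for session in validation_sessions:
        for i in range(1, len(session)):
            for k, counts in models.items():
                if i < k:
                    continue
                state = tuple(session[i - k:i])
                if state in counts:
                    seen[state] += 1
                    if counts[state].most_common(1)[0][0] != session[i]:
                        wrong[state] += 1
    return {state: wrong[state] / seen[state] for state in seen}


def error_prune(models, errors):
    """Drop a higher-order state if its error rate exceeds that of any lower-order suffix state."""
    pruned = {}
    for k, counts in models.items():
        kept = {}
        for state, successors in counts.items():
            suffixes = [state[j:] for j in range(1, k)]  # e.g. (Pj, Pk) and (Pk,)
            worse = any(
                state in errors and s in errors and errors[state] > errors[s]
                for s in suffixes
            )
            if not worse:
                kept[state] = successors
        pruned[k] = kept
    return pruned


# Hypothetical training and validation sessions (illustration only).
train = [
    ["P1", "P2", "P4"],
    ["P1", "P2", "P4", "P5"],
    ["P3", "P2", "P5"],
    ["P1", "P2", "P3"],
]
validation = [
    ["P1", "P2", "P3"],
    ["P3", "P2", "P4"],
    ["P5", "P2", "P4"],
]
models = {k: train_markov(train, k) for k in (1, 2)}
errors = error_rates(models, validation)
pruned = error_prune(models, errors)

print(sorted(pruned[2]))  # second-order states that survive
```

First-order states have no lower-order suffix, so they are never removed by this rule.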

Slide 25
[Table: training and validating Web sessions. Contents not preserved in this transcript.]

Slide 26
[Table: various order Markov states with their maximum frequency page. Contents not preserved in this transcript.]

Slide 27
[Table: error rates for Markov states. Contents not preserved in this transcript.]


References
[1] SCHECHTER, S., KRISHNAN, M., AND SMITH, M. D. Using path profiles to predict HTTP requests. In 7th International World Wide Web Conference.
[2] BESTAVROS, A. Using speculation to reduce server load and service time on the WWW. In Proceedings of the 4th ACM International Conference of Information and Knowledge Management. ACM Press.
[3] PADMANABHAN, V. AND MOGUL, J. Using predictive prefetching to improve World Wide Web latency. Comput. Commun. Rev.
[4] DEAN, J. AND HENZINGER, M. R. Finding related pages in the World Wide Web. In Proceedings of the 8th International World Wide Web Conference.
[5] PIROLLI, P., PITKOW, J., AND RAO, R. Silk from a sow's ear: Extracting usable structures from the web. In Proceedings of ACM Conference on Human Factors in Computing Systems (CHI-96).
[6] BRIN, S. AND PAGE, L. The anatomy of a large-scale hypertextual web search engine. In Proceedings of the 7th International World Wide Web Conference.
[7] CHI, E., PITKOW, J., MACKINLAY, J., PIROLLI, P., GOSSWEILER, R., AND CARD, S. Visualizing the evolution of web ecologies. In Proceedings of ACM Conference on Human Factors in Computing Systems (CHI 98).
[8] PITKOW, J. AND PIROLLI, P. Mining longest repeating subsequence to predict World Wide Web surfing. In 2nd USENIX Symposium on Internet Technologies and Systems. Boulder, CO.
[9] PAPOULIS, A. Probability, Random Variables, and Stochastic Processes. McGraw Hill.
[10] SRIVASTAVA, J., COOLEY, R., DESHPANDE, M., AND TAN, P.-N. Web usage mining: Discovery and applications of usage patterns from web data. SIGKDD Explor. 1, 2.