Mining Longest Repeating Subsequences to Predict World Wide Web Surfing
Jatin Patel, Electrical and Computer Engineering, Wayne State University, Detroit, MI

Introduction
Predicting user surfing paths involves tradeoffs between model complexity and predictive accuracy. The aim of this paper is to reduce model complexity while retaining predictive accuracy, using two techniques:
(1) Longest Repeating Subsequences (LRS): focus on repeating surfing patterns to reduce the complexity of the model.
(2) Weighted specificity: focus on the longer patterns of past surfing, since longer surfing paths are more predictive.
Both techniques dramatically reduce model complexity while retaining a high degree of predictive accuracy.

Surfing Paths
The figure shows the diffusion of surfers through a web site:
(1) Users begin surfing a web site starting from different entry pages.
(2) As they surf the web site, users arrive at specific web pages having traveled different surfing paths.
(3) Users choose to traverse possible paths leading from the pages they are currently visiting.
(4) After surfing through some number of pages, users stop or go to another web site.

Application of Predictive Models
Search: The ability to accurately predict user surfing patterns could lead to a number of improvements in user WWW interaction. The Google search engine assumes that a model of surfing can improve the precision of text-based search engine results: the distribution of visits over all WWW pages obtained from this model is used to re-weight and re-rank the results of a text-based search engine.

Latency Reduction
Predictive models have significant potential to reduce user-perceived WWW latencies; delay is the number one reported problem in using the WWW. One solution is to improve prefetching and caching methods. A number of methods, including Markov models, show that if the system can predict the content a surfer is going to visit, those pages can be prefetched into low-latency local storage. Kroeger, Long, and Mogul also studied the improvements in WWW interaction latencies that might be gained by predicting surfer paths. Latencies are divided into two parts:
(1) Internal latencies caused by the computers and networks utilized by the clients and proxies.
(2) External latencies caused by the computers and networks between the proxies and external WWW servers.

Predictive Surfing Models
Path profiles: Schechter, Krishnan, and Smith utilized path and point profiles. The profiles are constructed from user sessions collected over a certain period of time, and this data is used to predict the next web page: the surfer's current path is compared against the stored profiles, and the best match is used to make the prediction. It is also important to reduce the model complexity; to reduce model size, Schechter et al. used a maximal prefix trie with a minimal threshold requirement for repeating prefixes.
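The following is a minimal Python sketch of this profile idea, not Schechter et al.'s actual algorithm: it stores observed path prefixes with successor counts, prunes prefixes that do not repeat at least a threshold number of times, and predicts from the longest stored prefix of the surfer's current path. All names and parameters are illustrative.

```python
from collections import defaultdict

def build_profiles(sessions, max_len=4, threshold=2):
    """sessions: lists of page identifiers, e.g. [["A", "B", "C"], ...]."""
    profiles = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for i in range(1, len(session)):
            for k in range(1, min(max_len, i) + 1):
                prefix = tuple(session[i - k:i])   # last k pages before this click
                profiles[prefix][session[i]] += 1  # page that was requested next
    # keep only prefixes that repeat often enough (minimal-threshold pruning)
    return {p: dict(nxt) for p, nxt in profiles.items()
            if sum(nxt.values()) >= threshold}

def predict_longest_match(profiles, path, max_len=4):
    """Prefer the longest stored prefix of the surfer's current path."""
    for k in range(min(max_len, len(path)), 0, -1):
        prefix = tuple(path[-k:])
        if prefix in profiles:
            return max(profiles[prefix], key=profiles[prefix].get)
    return None  # no stored prefix matches: no prediction
```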

First-Order Markov Models
In this method, a dependency graph contains nodes for all files accessed at a particular WWW server. Dependency arcs between nodes indicate that one file was accessed within some number of accesses w of another file, and the arcs are weighted to reflect access rates. Latency reduction increases as w is increased from w = 2 to w = 4. Prefetching methods essentially record surfing path transitions and use these data to predict future transitions; transitions can be recorded anywhere (e.g., at a proxy or a server). The important point is that all the methods seemed to improve predictions when they stored longer path dependencies.

Kth-Order Markov Models
Here the author evaluates the predictive capabilities of Kth-order Markov models using ten days of log files collected at the xerox.com web site. The results of this analysis suggest that storing longer path dependencies leads to better prediction accuracy. Surfing paths can be represented as n-grams of the form (x1, x2, ..., xn), indicating sequences of page clicks by a population of users visiting a web site. As we saw earlier, the first-order Markov model is concerned with page-to-page transition probabilities, which can be estimated from n-grams:
p(x2 | x1) = Pr(X2 = x2 | X1 = x1)
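As an illustration of the estimate above, here is a small hedged Python sketch (not the paper's code) that builds first-order transition probabilities from click sequences; the session data and function names are invented for the example.

```python
from collections import defaultdict

def first_order_model(sessions):
    """Estimate p(x2 | x1) from page-to-page transitions in click sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for prev_page, next_page in zip(session, session[1:]):
            counts[prev_page][next_page] += 1          # tally each transition
    model = {}
    for prev_page, nexts in counts.items():
        total = sum(nexts.values())
        model[prev_page] = {page: n / total for page, n in nexts.items()}
    return model

# Toy example: three sessions of page clicks taken from a server log.
model = first_order_model([["A", "B", "C"], ["A", "B", "D"], ["A", "B", "C"]])
# model["B"] == {"C": 2/3, "D": 1/3}
```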

Kth-Order Markov Models (cont.)
If we want to capture longer surfing paths, we may wish to consider the conditional probability that a surfer transitions to an nth page given their previous k = n - 1 pages:
p(xn | xn-1, ..., xn-k) = Pr(Xn = xn | Xn-1 = xn-1, ..., Xn-k = xn-k)
Summary: The author uses the data collected from xerox.com and systematically tests the properties of Kth-order Markov models. The models were estimated from surfing transitions extracted from a training set of WWW server log file data and tested against test sets of data that occurred after the training set. The prediction scenario assumed a surfer was just observed making k page visits.

Kth-Order Markov Models (cont.)
In order to make a prediction of the next page visit, the model must have:
1) An estimate of p(xn | xn-1, ..., xn-k) from the training data, which requires that
2) A path of k visits had been observed in the training data.
When the path matches between training and test data, the model examines all the available conditional probabilities p(xn | xn-1, ..., xn-k) and predicts the page having the highest probability of occurring. The important thing to note is that the model does not make a prediction when no matching path exists in the model.
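A hedged sketch of this prediction step, under the assumption that the model is simply a table mapping each observed path of k visits to successor counts (illustrative code, not the authors' implementation):

```python
from collections import defaultdict

def kth_order_model(sessions, k):
    """Map each observed path of k pages to counts of the page that followed."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for i in range(k, len(session)):
            context = tuple(session[i - k:i])      # the k pages just visited
            counts[context][session[i]] += 1       # the page visited next
    return counts

def predict_next(model, last_k_pages):
    """Predict the page with the highest conditional probability, or None."""
    context = tuple(last_k_pages)
    if context not in model:
        return None                                # no matching path: no prediction
    successors = model[context]
    return max(successors, key=successors.get)
```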

Kth-Order Markov Models (cont.)
Table 1 presents the effect of the order of the Markov model on prefetching, using the following metrics:
Pr(Match): the probability that a penultimate path (xn-1, ..., xn-k) observed in the test data was matched by the same penultimate path in the training data.
Pr(Hit|Match): the conditional probability that page xn is visited, given that (xn-1, ..., xn-k) is the penultimate path and the highest probability conditional on that path is p(xn | xn-1, ..., xn-k).
Pr(Hit) = Pr(Hit|Match) Pr(Match): the probability that the page visited in the test set is the one estimated from the training data as the most likely to occur.
Pr(Miss|Match): the conditional probability that page xn is not visited, given that (xn-1, ..., xn-k) is the penultimate path and the highest probability conditional on that path is p(xn | xn-1, ..., xn-k).

Kth-Order Markov Models (cont.)
Pr(Miss) = Pr(Miss|Match) Pr(Match): the probability that the page visited in the test set is not the one estimated from the training data as the most likely to occur.
The last metric provides a coarse measure of the benefit-cost ratio:
Benefit:Cost = (B * Pr(Hit)) / (C * Pr(Miss))
where B and C vary between 0 and 1 and represent the relative weights associated with the benefits and costs.
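To make these metrics concrete, here is an illustrative Python sketch that evaluates a kth-order model (as sketched earlier) on held-out test sessions and computes Pr(Match), Pr(Hit|Match), Pr(Hit), Pr(Miss), and the benefit-cost ratio; the weights B and C and all names are assumptions for the example, not the paper's code.

```python
def evaluate(model, test_sessions, k, benefit=1.0, cost=1.0):
    trials = matches = hits = 0
    for session in test_sessions:
        for i in range(k, len(session)):
            trials += 1
            context = tuple(session[i - k:i])       # penultimate path of length k
            if context not in model:
                continue                            # no match: no prediction made
            matches += 1
            predicted = max(model[context], key=model[context].get)
            if predicted == session[i]:
                hits += 1
    p_match = matches / trials if trials else 0.0
    p_hit_given_match = hits / matches if matches else 0.0
    p_hit = p_hit_given_match * p_match             # Pr(Hit) = Pr(Hit|Match)Pr(Match)
    p_miss = p_match - p_hit                        # Pr(Miss) = Pr(Miss|Match)Pr(Match)
    ratio = (benefit * p_hit) / (cost * p_miss) if p_miss else float("inf")
    return {"Pr(Match)": p_match, "Pr(Hit|Match)": p_hit_given_match,
            "Pr(Hit)": p_hit, "Pr(Miss)": p_miss, "Benefit:Cost": ratio}
```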

Model and Prediction Methods
Producing an accurate predictive model using the least amount of space has many computational as well as practical benefits. The solution is to remove low-information elements from the model. The LRS technique treats the problem as a data mining task: using LRS, the storage requirement is reduced by saving only information-rich paths. As seen from the Kth-order Markov models, higher orders result in higher prediction rates. This principle of specificity encourages the use of higher-order path matches whenever possible to maximize hit rates. The drawback of this approach is that the likelihood of a higher-order path match is quite small, resulting in lower overall hit rates.

Longest Repeating Sequences
A longest repeating subsequence is a sequence of items where:
1) Subsequence means a set of consecutive items,
2) Repeated means the item occurs more than some threshold T, where T typically equals one, and
3) Longest means that although a subsequence may be part of another repeated subsequence, there is at least one occurrence of this subsequence where it is the longest repeating one.
Example: Suppose a web site contains the pages A, B, C, and D. As shown in the figure (Case 1), if users repeatedly click from A to B, but only one user clicks through to C and one user clicks through to D, then the longest repeating subsequence is AB.
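Below is a simplified, hedged Python sketch of LRS extraction, not the authors' algorithm: it counts every contiguous subsequence up to a maximum length, keeps those repeated more than T times, and discards a subsequence when all of its occurrences are explained by a repeated one-page extension. The coverage check is deliberately simplified for illustration.

```python
from collections import defaultdict

def extract_lrs(sessions, T=1, max_len=6):
    # Count every contiguous subsequence up to max_len pages.
    counts = defaultdict(int)
    for session in sessions:
        for i in range(len(session)):
            for j in range(i + 1, min(i + max_len, len(session)) + 1):
                counts[tuple(session[i:j])] += 1

    repeated = {seq: n for seq, n in counts.items() if n > T}
    lrs = []
    for seq, n in repeated.items():
        # Occurrences explained by a repeated one-page right/left extension.
        right = sum(m for ext, m in repeated.items()
                    if len(ext) == len(seq) + 1 and ext[:-1] == seq)
        left = sum(m for ext, m in repeated.items()
                   if len(ext) == len(seq) + 1 and ext[1:] == seq)
        if n > max(left, right):        # some occurrence is itself "longest"
            lrs.append(seq)
    return lrs

# Figure 2, Case 1: many users click A -> B, one continues to C, one to D.
sessions = [["A", "B", "C"], ["A", "B", "D"], ["A", "B"], ["A", "B"]]
print(extract_lrs(sessions, T=1))       # [('A', 'B')]
```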

Longest Repeating Sequences (cont.)
The complexity of the resulting n-grams is reduced, as the low-probability transitions are automatically excluded from further analysis. This reduction happens for transitions that occur only T times, which in some cases will result in a prediction not being made. In Figure 2, Case 1, with T = 1 the LRS is AB, which means a prediction will not be made after the pages A and B have been requested; this results in a slight loss of pattern matching.
Hybrid LRS-Markov models: The first hybrid LRS model decomposes each LRS pattern into a series of corresponding one-hop n-grams, i.e., the LRS ABCD would result in the one-hop n-grams AB, BC, CD. The second hybrid LRS model decomposes the extracted LRS subsequences into all possible n-grams.
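A small illustrative sketch of the two decompositions just described, assuming an extracted LRS such as ABCD (function names are invented):

```python
def one_hop_ngrams(lrs):
    """First hybrid model: split an LRS into its one-hop transitions."""
    return [lrs[i:i + 2] for i in range(len(lrs) - 1)]

def all_kth_order_ngrams(lrs):
    """Second hybrid model (All-Kth-Order LRS): all contiguous n-grams (n >= 2)."""
    return [lrs[i:j]
            for i in range(len(lrs))
            for j in range(i + 2, len(lrs) + 1)]

print(one_hop_ngrams(("A", "B", "C", "D")))
# [('A', 'B'), ('B', 'C'), ('C', 'D')]
print(all_kth_order_ngrams(("A", "B", "C", "D")))
# [('A', 'B'), ('A', 'B', 'C'), ('A', 'B', 'C', 'D'),
#  ('B', 'C'), ('B', 'C', 'D'), ('C', 'D')]
```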

Longest Repeating Sequences (cont.)
The resulting model is a reduced set of n-grams of various lengths, called the All-Kth-Order LRS model. The main advantage of this model is that it incorporates the specificity principle of pattern matching by utilizing the increased predictive power contained in longer paths.
Model complexity reduction: LRS stores only those paths that are likely to be needed, so it reduces complexity and space requirements. The amount of space required for any of the models, LRS or Markov, depends on the observed path combinations; this will vary from site to site and change over time.

One-hop Markov and LRS Comparison
To test whether the hybrid LRS models help achieve the goal of reducing complexity while maintaining predictive power, the test is done on the same test data as before. For this experiment, the one-hop Markov and the one-hop LRS models were built using three training days. Each model was tested to see if there was a matching prefix (Match) for each path and, if so, whether the correct transition was predicted (Hit). From this, the probability of a match Pr(Match), the probability of a hit given a match Pr(Hit|Match), the hit rate across all transitions Pr(Hit), and the benefit-cost ratio Pr(Hit)/Pr(Miss) were computed. Table 2 displays the results for the one-hop Markov model and the one-hop LRS model.

One-hop Markov and LRS Comparison (cont.)
The one-hop LRS model produces a reduction in the total size required to store the model, reducing the complexity as expected. One might expect that a sharp reduction in the model's complexity would result in a reduction of predictive ability, but the LRS model still has good predictive ability. The important things here are the reduction in the number of hops and in the model size in bytes: 13,189 one-hops for the one-hop Markov model versus 4,953 one-hops for the one-hop LRS model, and 372,218 bytes for the one-hop Markov model versus 136,177 bytes for the one-hop LRS model. The LRS model also has almost the same hit ratio as the one-hop Markov model.

All-Kth-order Markov Approximation and All-Kth-order LRS Comparison
Longer paths should be used to get better predictions, so here we compare the results of the All-Kth-order Markov approximation and the All-Kth-order LRS model. Since the All-Kth-order LRS model is a subset of the All-Kth-order Markov model, we do not expect better results; instead we examine the tradeoff between complexity reduction and the model's predictive power. The same test data was used for this experiment. As seen from the results in Table 3, the All-Kth-order Markov model consumes 8,800 Kbytes, while the All-Kth-order LRS model consumes 616 Kbytes, roughly 14 times less. In terms of hit ratio, the table shows that the All-Kth-order Markov model has a 30% hit ratio while the All-Kth-order LRS model has a 27% hit ratio.

All-Kth-order Markov Approximation and All-Kth-order LRS Comparison (cont.)
Figure 3 summarizes the results of the two experiments with respect to hit ratio. As the figure shows, the All-Kth-order Markov model has the highest hit ratio, but its model size is very large. Interestingly, the one-hop Markov model provides 83% of the predictive power while consuming only 4.2% of the space compared to the All-Kth-order Markov model, and the one-hop LRS model provides 80% of the predictive power while using only 1.5% of the space.

Parameterization of Prediction Set
Restricting the prediction to only one guess imposes a rather stringent constraint; one could also predict a larger set of pages that could be surfed to on the next click by a user. Here we examine each model's performance when returning prediction sets of between one and ten pages. The figure shows that the All-Kth-order models perform better, with the All-Kth-order Markov model performing best. The important thing to note is that increasing the prediction set size has a dramatic impact on predictive power: the predictive power of each method nearly doubles when the set size is increased to four elements.
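A hedged sketch of this parameterization, reusing the kth-order model structure sketched earlier: instead of a single guess, the n most probable successors of the matched path are returned (the function name and the default n are illustrative).

```python
def predict_top_n(model, last_k_pages, n=4):
    """Return up to n candidate next pages, ranked by conditional probability."""
    context = tuple(last_k_pages)
    if context not in model:
        return []                       # no matching path: empty prediction set
    successors = model[context]
    ranked = sorted(successors, key=successors.get, reverse=True)
    return ranked[:n]
```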

Future Work
Although this paper focused on various Markov models, the concept of LRS can be successfully applied to Markov models in other domains as well as to other suffix-based methods. In this framework, "repeating" can be defined by any occurrence threshold T; determining the ideal threshold will depend upon the specific data and the intended application. Another important consideration is the confidence level for each prediction: a modified pattern-matching algorithm could be restricted to make predictions only when a given probability of making a successful prediction is achieved. Another application of LRS models could be in an HTTP server, in which server threads issue hint lists to clients while maintaining the model in memory.

Conclusion
There are always tradeoffs between model complexity and predictive power. The one-hop LRS model was able to match the predictive accuracy of the one-hop Markov model while reducing the complexity to roughly a third. To improve hit rates, the All-Kth-order LRS model performs well, almost equaling the performance of the All-Kth-order Markov model while reducing the complexity significantly.

Questions?