1 From User Access Patterns to Dynamic Hypertext Linking
Patrick Farrell, Siddharth Gudka, Mike Oxley, Simon Phillips
A Research Directions In Computing Presentation
Paper by T. Yan, M. Jacobsen, H. Garcia-Molina, U. Dayal
2 Agenda
Introduction
Some theory
The paper
A short critique
After the paper
  – Academic research
  – The authors' work
The technology in use today
Conclusion
Questions
3 Introduction
Hypothesis
  – Hyperlinks to unvisited and indirectly linked pages can be offered based upon pages the user has already visited
Experiment
  a) Analyse log files to form clusters of commonly co-accessed pages
  b) Assign online users to the correct categories and offer appropriate links
4 Mass customisation
Concept of adapting things to each user, on a large scale
Economic benefit in adding value
Satisfied shoppers are also more likely to return
What's new?
  – In the physical world, customisation doesn't scale.
  – Using technology and intelligent algorithms, it can.
5 Adaptive Web Sites
Sites that automatically improve their organisation and presentation based on visitor access patterns
We can cluster pages on a site together based on their co-occurrence frequency
  – Likelihood that a user will visit page P having visited Q
For a user browsing the site, use session history to predict which pages the user may want to access, and so adapt the site
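The co-occurrence idea on this slide can be illustrated with a minimal Python sketch. It estimates the likelihood of visiting P having visited Q as the fraction of sessions containing Q that also contain P. The function name `co_access_likelihood` and the sample URLs are hypothetical, and this is only one simple way to measure co-occurrence, not necessarily the measure used in the paper.

```python
from collections import Counter
from itertools import permutations

def co_access_likelihood(sessions):
    """Estimate P(visit p | visited q) from a list of sessions (lists of URLs)."""
    visits = Counter()  # number of sessions that contain each page
    pairs = Counter()   # number of sessions that contain both q and p
    for session in sessions:
        pages = set(session)               # ignore repeat hits within a session
        visits.update(pages)
        pairs.update(permutations(pages, 2))
    return {(q, p): n / visits[q] for (q, p), n in pairs.items()}

# Hypothetical sessions on an academic site
sessions = [
    ["/home", "/courses", "/staff"],
    ["/home", "/courses"],
    ["/home", "/staff"],
    ["/courses", "/exams"],
]
likelihood = co_access_likelihood(sessions)
# Of the 3 sessions that visited /courses, 2 also visited /home
print(round(likelihood[("/courses", "/home")], 2))  # 0.67
```

Page pairs with a high likelihood are candidates for the same cluster.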
6 The Paper
Yan et al. implement an adaptive web site, based on user access logs.
Paper discusses different approaches to clustering and implementation
Experimental data is presented
  – validating the concept of clustering on an academic site
  – showing the value added by an adaptive website using their technique
The log analysis software used is published
7 The paper - Justification
Uses the metaphor of a shopper browsing an online shop
Adaptive site can provide links to items similar to those being browsed
  – e.g. a male yuppie browsing executive toys
  – Might also be interested in sportswear
As a site grows, maintaining static links to related content becomes more of a challenge; dynamic links cope much better
Many practical examples today, but not 10 years ago!
8 The Paper – System Design
[Diagram: system architecture]
Offline components: Access logs, Preprocess, Cluster, User Categories
Online components: End user, Web Server, Link Generator, HTML Documents
Labelled flows: URL, HTML with suggestions
9 The paper - Preprocessing
For each user session
  – form an n-dimensional vector of the pages visited (one element per page on the site)
  – can weight vector elements using a metric
      Number of hits to page
      Estimate of time spent on page (possibly normalised)
Session vectors that are close together in n-dimensional space form a cluster
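The preprocessing step can be sketched in Python as follows. The function `session_vector`, its `weight` parameter, and the sample pages are hypothetical names chosen for illustration; the sketch assumes each log entry has already been reduced to a (URL, seconds on page) pair, which the slides do not specify.

```python
def session_vector(session_hits, page_index, weight="hits"):
    """Map one session to an n-dimensional vector, one element per site page.

    session_hits: list of (url, seconds_on_page) pairs (assumed input format)
    page_index:   dict mapping each URL on the site to its vector position
    weight:       "hits" counts page hits; "time" uses normalised time on page
    """
    vec = [0.0] * len(page_index)
    for url, seconds in session_hits:
        vec[page_index[url]] += 1.0 if weight == "hits" else seconds
    if weight == "time":
        total = sum(vec)
        if total > 0:
            vec = [v / total for v in vec]  # normalise so elements sum to 1
    return vec

pages = ["/home", "/courses", "/staff", "/exams"]
page_index = {url: i for i, url in enumerate(pages)}
session = [("/home", 5), ("/courses", 30), ("/home", 5), ("/exams", 60)]

hits_vec = session_vector(session, page_index, weight="hits")
time_vec = session_vector(session, page_index, weight="time")
print(hits_vec)  # [2.0, 1.0, 0.0, 1.0]
```

Pages never visited in the session stay at zero, so most session vectors are sparse on a large site.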
10 The paper - Clustering
Different algorithms exist to cluster vectors by closeness
Paper uses the Leader algorithm, with additional constraints
  – Constraint: Minimum hits in a valid session
  – Constraint: Minimum cluster size
Algorithm is fast and memory efficient
  – But not order invariant
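A minimal Python sketch of a basic Leader algorithm with the two constraints from this slide. It is an illustration under stated assumptions, not the authors' implementation: it assumes hit-count vectors (so the element sum is the number of hits), Euclidean distance, and assignment to the first leader within a distance threshold. The names `leader_cluster`, `threshold`, `min_hits` and `min_cluster_size` are hypothetical.

```python
import math

def leader_cluster(vectors, threshold, min_hits=1, min_cluster_size=1):
    """Single-pass Leader clustering of session vectors."""
    clusters = []  # list of (leader_vector, member_vectors)
    for vec in vectors:
        if sum(vec) < min_hits:          # constraint: minimum hits in a valid session
            continue
        for leader, members in clusters:
            if math.dist(leader, vec) <= threshold:
                members.append(vec)      # join the first close-enough leader
                break
        else:
            clusters.append((vec, [vec]))  # no leader is close: vec starts a cluster
    # constraint: minimum cluster size
    return [members for _, members in clusters if len(members) >= min_cluster_size]

p1, p2, p3 = [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]
first = leader_cluster([p1, p2, p3], threshold=1.0)
second = leader_cluster([p2, p1, p3], threshold=1.0)
print(len(first), len(second))  # 2 1
```

The same three vectors give two clusters in one order and one cluster in the other, which is the order dependence noted on the slide. Each vector is compared only against existing leaders in a single pass, which is why the method is fast and light on memory.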
11 Dynamic Link Generation
Use session history to track the pages a user has visited
  – Authors buffered logs in memory using a database
  – Sessions are part of most web servers now
Match the partial vector of the session against pre-calculated categories to build a list of appropriate pages
  – Partial vector, so Euclidean distance is not necessarily appropriate
  – May be better to simply count matching categories
Filter the suggestion list to remove pages already visited, and possibly any already adjacent in the navigation tree
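The matching and filtering steps might look like the following Python sketch. It takes the simple counting approach rather than Euclidean distance, and assumes each category is represented as a set of pages; that representation, the function `suggest_links`, and the `links_from` and `min_overlap` parameters are all hypothetical.

```python
def suggest_links(visited, categories, links_from=None, min_overlap=1):
    """Suggest unvisited pages from the categories that match a partial session.

    visited:    URLs seen so far in the current session
    categories: list of sets of URLs, one set per pre-calculated category
    links_from: optional dict mapping a URL to the URLs it already links to
    """
    visited = set(visited)
    adjacent = set()
    for page in visited:
        adjacent.update((links_from or {}).get(page, ()))

    # Score each category by how many of its pages the user has visited
    scored = []
    for pages in categories:
        overlap = len(visited & pages)
        if overlap >= min_overlap:
            scored.append((overlap, pages))
    scored.sort(key=lambda item: item[0], reverse=True)

    suggestions = []
    for _, pages in scored:
        for page in sorted(pages):
            # Filter out visited pages and pages one click away already
            if page not in visited and page not in adjacent and page not in suggestions:
                suggestions.append(page)
    return suggestions

categories = [
    {"/courses", "/exams", "/syllabus", "/timetable"},
    {"/staff", "/research", "/papers"},
]
links_from = {"/home": ["/courses", "/staff"], "/courses": ["/exams", "/syllabus"]}
result = suggest_links(["/home", "/courses", "/exams"], categories, links_from)
print(result)  # ['/timetable']
```

Here `/syllabus` is dropped because `/courses` already links to it, and the second category is ignored because the session shares no pages with it.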
12 Paper – Experimental results
Time spent on particular pages follows a Zipfian distribution, so it is not useful as a page weight
The authors present a number of experimental results about clustering algorithm parameters, e.g. minimum cluster size
Found clusters on an academic website that were not evident from the hypertext layout, so clustering serves a purpose.
13 Critique
Paper presents a new concept of clustering web accesses, but essentially draws together existing work in other fields
Makes key simplifications
  – Ignores any web caching, proxies, etc.
  – Considering all pages in a session as being in a category is naïve, e.g. navigation pages, indexes, etc.
Weakness in experiments
  – Authors invented nominal sessions based on unique end-user addresses as the server didn't support sessions
  – Only present data for one site
      2,709 sessions, of which 50% were in the same cluster!
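To make the nominal-session weakness concrete, here is a minimal Python sketch of grouping log entries by client address. The function `nominal_sessions`, the (address, URL) input format and the sample addresses are hypothetical; the slides do not describe how the authors parsed their logs.

```python
from collections import defaultdict

def nominal_sessions(log_entries):
    """Group (client_address, url) log entries into one session per address."""
    sessions = defaultdict(list)
    for address, url in log_entries:
        sessions[address].append(url)
    return dict(sessions)

log_entries = [
    ("10.0.0.1", "/home"),
    ("10.0.0.2", "/staff"),
    ("10.0.0.1", "/courses"),
    ("10.0.0.1", "/research"),  # could be a different person behind the same proxy
]
grouped = nominal_sessions(log_entries)
print(len(grouped))  # 2
```

Every request from one address ends up in the same session, so several users behind a shared proxy are merged, and pages served from a cache never appear in the log at all.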
14 Further Work
Garcia-Molina
  – Beyond Document Similarity: Understanding Value-Based Search and Browsing Technologies (2000)
  – Discusses judging the value of web documents based on user behaviour
Dayal
  – Knowledge-Based Support Services: Monitoring and Adaptation (2000)
  – Discusses a Knowledge-Based Service deployed within HP to deliver customer support services
  – System adapts based on observed user patterns and evolving needs
15 Related Work
Web Prefetching (Jiang & Kleinrock, 1998)
  – Addresses slow access speeds of the World Wide Web
  – PREDICTION MODULE: Computes access probabilities.
  – THRESHOLD MODULE: Computes prefetch thresholds.
  – Uses clustering to divide users into categories by access probability
Restoring Meaningful Episodes in a Proxy Log (Lou et al. 2001)
  – Extracts users' activity information from proxy logs
  – Classifies individual requests into meaningful semantic elements
  – Semantics-based CUT-AND-PICK approach
16 Related Work
SUGGEST (Baraglia et al. 2002, 2004)
  – No off-line component
  – Quality metric to estimate effectiveness of suggestions
Media Agents (Wenyin et al.)
  – Automatic collection of semantic indices of multimedia data
  – Semantic descriptions from content of documents
  – User interaction refines semantic indices and suggests other multimedia data
17 Applications & The Paper
Custom application - Analog
Means: Uses clustering tech to analyse log files
End: To dynamically generate possibly interesting links
Successful (to an extent)
18 Technology Directions
Clustering Documents: Vivisimo, Google Labs
Collaborative Filtering: Amazon, Flickr, Tivo
19 Amazon.com
Uses a recommendation algorithm: person who bought x also bought y
Item-to-item collaborative filtering
  – provides recommendations based on grouped items, not customers
Essence
  For each item in product catalog, I1
    For each customer C who purchased I1
      For each item I2 purchased by customer C
        Record that a customer purchased I1 and I2
    For each item I2
      Compute the similarity between I1 and I2
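The first half of the pseudocode (recording co-purchases) can be sketched in Python as below. This is an illustrative reading of the slide, not Amazon's implementation; the function `co_purchase_counts` and the sample customers and items are hypothetical.

```python
from collections import defaultdict

def co_purchase_counts(purchases):
    """Count, for each ordered item pair (i1, i2), the customers who bought both.

    purchases: dict mapping a customer to the set of items they purchased
    """
    # Invert the purchase table: item -> customers who purchased it
    customers_of = defaultdict(set)
    for customer, items in purchases.items():
        for item in items:
            customers_of[item].add(customer)

    counts = defaultdict(int)
    for i1, customers in customers_of.items():   # for each item I1 in the catalog
        for customer in customers:               # for each customer C who purchased I1
            for i2 in purchases[customer]:       # for each item I2 purchased by C
                if i2 != i1:
                    counts[(i1, i2)] += 1        # record that a customer purchased I1 and I2
    return dict(counts)

purchases = {
    "alice": {"book", "lamp"},
    "bob": {"book", "lamp", "pen"},
    "carol": {"pen"},
}
counts = co_purchase_counts(purchases)
print(counts[("book", "lamp")])  # 2
```

The counts are keyed by item pairs rather than by customers, which is what lets recommendations be looked up per item at browse time.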
20 Amazon.com
Creates vectors where each vector is an item with M dimensions (customers)
Similarity between two items is computed by measuring the cosine of the angle between the two vectors.
Offline computation is theoretically expensive: O(N²M) for N items and M customers
In practice closer to O(NM), as most customers have few purchases.
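A minimal Python sketch of the cosine measure, assuming binary item vectors (1 if the customer bought the item, 0 otherwise) stored as sets of customers. With that assumption the dot product is the number of shared customers and each vector's length is the square root of its customer count. The function name and sample data are hypothetical.

```python
import math

def cosine_similarity(customers_a, customers_b):
    """Cosine of the angle between two binary item vectors stored as customer sets."""
    if not customers_a or not customers_b:
        return 0.0  # an item nobody bought has no direction to compare
    shared = len(customers_a & customers_b)  # dot product of the binary vectors
    return shared / math.sqrt(len(customers_a) * len(customers_b))

# Hypothetical item vectors: item -> customers who purchased it
book = {"alice", "bob"}
lamp = {"alice", "bob"}
pen = {"bob", "carol"}

print(cosine_similarity(book, lamp))  # 1.0
print(cosine_similarity(book, pen))   # 0.5
```

Because the sets hold only actual purchasers, the cost of each comparison depends on how many customers bought the items rather than on M, which fits the observation that the practical cost is well below the worst case.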
21 Conclusion
The paper was on the right track
Appreciated the applicability of clustering to e-commerce
Hypothesis proved by experiment
Failed to address or even predict scalability issues
22 References
Authors' Work
  – Yan, T., Jacobsen, M., Garcia-Molina, H., Dayal, U., From User Access Patterns to Dynamic Hypertext Linking, In: Fifth International World Wide Web Conference, 1996 (Paris, France)
  – Paepcke, A., Garcia-Molina, H., Rodriquez, G. and Cho, J., Beyond Document Similarity: Understanding Value-Based Search and Browsing Technologies, In: Stanford University Technical Report, 2000
  – Delic, K. A. and Dayal, U., Knowledge-Based Support Services: Monitoring and Adaptation, In: Proceedings of the 11th International Workshop on Database and Expert Systems Applications, IEEE Computer Society, 2000
23 References
Related Work
  – Baraglia, R., Silvestri, F., Palmerini, P., On-line Generation of Suggestions for Web Users, In: Proceedings of IEEE International Conference on Information Technology: Coding and Computing, April 2004
  – Baraglia, R., Palmerini, P., A web usage mining system, In: Proceedings of IEEE International Conference on Information Technology: Coding and Computing, April 2002
  – Wenyin, L., Chen, Z., Lin, F., Zhang, H., Ma, W., Ubiquitous Media Agents: A framework for managing personally accumulated multimedia files, 9th ACM International Conference on Multimedia, 2003 (Toronto, Canada)
  – Jiang, Z., Kleinrock, L., Web prefetching in a mobile environment, IEEE Personal Communications 5(5): 25–34, October 1998
24 References
  – Lou, W., Lu, H., Liu, G., Yiang, Q., Restoring Meaningful Episodes in a Proxy Log,
  – Ungar, L., Foster, D., Clustering Methods For Collaborative Filtering, In: AAAI Workshop On Recommendation Systems,
  – Linden, G., Smith, B., York, J., Amazon.com Recommendations: Item-to-Item Collaborative Filtering, In: IEEE Internet Computing, Vol. 7, No. 1, Jan 2003.