From User Access Patterns to Dynamic Hypertext Linking
Patrick Farrell, Siddharth Gudka, Mike Oxley, Simon Phillips
A Research Directions in Computing presentation


1 From User Access Patterns to Dynamic Hypertext Linking
Presented by Patrick Farrell, Siddharth Gudka, Mike Oxley, Simon Phillips
A Research Directions in Computing presentation
Paper by T. Yan, M. Jacobsen, H. Garcia-Molina, U. Dayal

2 Agenda
- Introduction
- Some theory
- The paper
- A short critique
- After the paper
  - Academic research
  - The authors' work
- The technology in use today
- Conclusion
- Questions

3 Introduction
- Hypothesis: hyperlinks to unvisited and indirectly linked pages can be offered based on the pages the user has already visited
- Experiment:
  a) analyse log files to form clusters of commonly co-accessed pages
  b) assign online users to the correct categories and offer appropriate links

4 Mass customisation
- Concept of adapting things to each user, on a large scale
- Economic benefit in adding value
- Satisfied shoppers are also more likely to return
- What's new?
  - In the physical world, customisation doesn't scale
  - Using technology and intelligent algorithms, it can

5 Adaptive Web Sites
- Sites that automatically improve their organisation and presentation based on visitor access patterns
- We can cluster pages on a site together based on their co-occurrence frequency
  - Likelihood that a user will visit page P having visited page Q
- For a user browsing the site, use the session history to predict which pages the user may want to access, and adapt the site accordingly
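The co-occurrence idea on this slide can be sketched in a few lines. This is only an illustration, not the method from the paper: the function name `co_visit_likelihood`, the example URLs, and the choice to estimate the likelihood as a simple ratio of session counts are all assumptions made here.

```python
from collections import Counter
from itertools import permutations


def co_visit_likelihood(sessions):
    """Estimate P(visit p | visited q) for every ordered page pair (q, p).

    The estimate is (#sessions containing both q and p) / (#sessions containing q).
    """
    page_count = Counter()  # number of sessions in which each page appears
    pair_count = Counter()  # number of sessions in which both pages appear
    for session in sessions:
        pages = set(session)  # ignore repeat visits within one session
        page_count.update(pages)
        pair_count.update(permutations(pages, 2))  # ordered pairs (q, p), q != p
    return {(q, p): n / page_count[q] for (q, p), n in pair_count.items()}


# Hypothetical sessions, each a list of visited URLs.
sessions = [
    ["/home", "/toys", "/sportswear"],
    ["/home", "/toys"],
    ["/toys", "/sportswear"],
]
likelihood = co_visit_likelihood(sessions)
```

With these three sessions, two of the three sessions that include `/toys` also include `/sportswear`, so `likelihood[("/toys", "/sportswear")]` is 2/3.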

6 The Paper
- Yan et al. implement an adaptive web site based on user access logs
- The paper discusses different approaches to clustering and implementation
- Experimental data is presented
  - validating the concept of clustering on an academic site
  - showing the value added by an adaptive web site using their technique
- The log analysis software used is published

7 The Paper - Justification
- Uses the metaphor of a shopper browsing an online shop
- An adaptive site can provide links to items similar to those being browsed
  - e.g. a male yuppie browsing executive toys might also be interested in sportswear
- As a site grows, maintaining static links to related content becomes more of a challenge; dynamic links cope much better
- Many practical examples today, but not 10 years ago!

8 The Paper - System Design
[Diagram: system design]
- Offline: access logs, preprocess, cluster, user categories
- Online: end user, URL, web server, link generator, HTML documents, HTML with suggestions

9 The Paper - Preprocessing
- For each user session
  - form an n-dimensional vector of the pages visited
  - vector elements can be weighted using a metric
    - number of hits to the page
    - estimate of time spent on the page (possibly normalised)
- Session vectors that are close in n-dimensional space form a cluster
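A minimal sketch of the preprocessing step, assuming the hit-count weighting mentioned on the slide. The names `session_vector` and `page_index`, and the example pages, are hypothetical; the optional normalisation simply divides by the total number of hits, which is one possible reading of "normalised".

```python
def session_vector(page_index, hits, normalise=False):
    """Turn one session (a list of requested pages) into an n-dimensional vector.

    page_index maps each known page to its vector position; each element
    holds the number of hits the session made to that page.
    """
    vec = [0.0] * len(page_index)
    for page in hits:
        if page in page_index:  # pages outside the index are ignored
            vec[page_index[page]] += 1.0
    if normalise:
        total = sum(vec)
        if total:
            vec = [v / total for v in vec]
    return vec


# Hypothetical site with four pages.
pages = ["/home", "/toys", "/sportswear", "/contact"]
page_index = {page: i for i, page in enumerate(pages)}

v = session_vector(page_index, ["/home", "/toys", "/toys"])
v_norm = session_vector(page_index, ["/home", "/toys", "/toys"], normalise=True)
```

Here `v` is `[1.0, 2.0, 0.0, 0.0]`: one hit on `/home`, two on `/toys`, none elsewhere.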

10 The Paper - Clustering
- Different algorithms exist to cluster vectors by closeness
- The paper uses the Leader algorithm, with additional constraints
  - Constraint: minimum hits in a valid session
  - Constraint: minimum cluster size
- The algorithm is fast and memory efficient
  - But not order invariant
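The following is a sketch of a leader-style clustering pass with the two constraints listed above. It is not the authors' implementation: the function name, the use of Euclidean distance, the "join the first leader within the threshold" rule, and all parameter names are assumptions chosen for illustration. The example also shows why the result depends on input order.

```python
import math


def leader_cluster(vectors, threshold, min_hits=1, min_cluster_size=1):
    """Single-pass leader clustering over session vectors.

    Each vector joins the first existing leader within `threshold`
    (Euclidean distance); otherwise it becomes a new leader.
    """
    clusters = []  # each cluster: {"leader": vector, "members": [vectors]}
    for vec in vectors:
        if sum(vec) < min_hits:  # constraint: minimum hits in a valid session
            continue
        for cluster in clusters:
            if math.dist(vec, cluster["leader"]) <= threshold:
                cluster["members"].append(vec)
                break
        else:
            clusters.append({"leader": vec, "members": [vec]})
    # constraint: discard clusters that are too small
    return [c for c in clusters if len(c["members"]) >= min_cluster_size]


# The same three sessions in two different orders.
order_a = [[1, 1], [2, 1], [3, 1]]
order_b = [[2, 1], [1, 1], [3, 1]]

clusters_a = leader_cluster(order_a, threshold=1.0)
clusters_b = leader_cluster(order_b, threshold=1.0)
```

With `order_a` the first vector becomes a leader that is too far from `[3, 1]`, giving two clusters; with `order_b` the middle vector leads and absorbs both neighbours, giving one. Only one pass over the data and one stored leader per cluster are needed, which is consistent with the slide's remark about speed and memory.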

11 Dynamic Link Generation
- Use session history to track the pages a user has visited
  - Authors buffered logs in memory using a database
  - Sessions are part of most web servers now
- Match the partial vector of the session with pre-calculated categories to build a list of appropriate pages
  - Partial vector, so Euclidean distance is not necessarily appropriate
  - May be better to simply count matching categories
- Filter the suggestion list to remove pages already visited, and possibly any already adjacent in the navigation tree
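One way to read the matching and filtering steps is sketched below. It takes the "simply count matches" option rather than a distance measure, and it assumes each pre-calculated category is stored as an ordered list of pages. The function name `suggest_links`, the category names, and the URLs are all hypothetical.

```python
def suggest_links(visited, categories, max_links=3):
    """Suggest unvisited pages from the category that best matches the session.

    categories maps a category name to an ordered list of its pages.
    The best match is the category sharing the most pages with the session.
    """
    visited = set(visited)
    best = max(
        categories,
        key=lambda name: len(visited & set(categories[name])),
        default=None,
    )
    if best is None or not visited & set(categories[best]):
        return []  # nothing matched, so offer no suggestions
    # Filter out pages the user has already seen.
    return [page for page in categories[best] if page not in visited][:max_links]


# Hypothetical pre-calculated categories.
categories = {
    "executive": ["/toys", "/sportswear", "/gadgets"],
    "students": ["/courses", "/library"],
}

suggestions = suggest_links(["/home", "/toys"], categories)
```

For this session the "executive" category matches on `/toys`, so the suggestions are `["/sportswear", "/gadgets"]`. Removing pages adjacent in the navigation tree would need the site's link structure and is left out of this sketch.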

12 The Paper - Experimental Results
- Time spent on particular pages follows a Zipfian distribution, so it is not useful as a page weight
- The authors present a number of experimental results about clustering algorithm parameters, e.g. minimum cluster size
- Found clusters on an academic web site that were not evident from the hypertext layout, so clustering serves a purpose

13 Critique
- Paper presents a new concept of clustering web accesses, but essentially draws together existing work in other fields
- Makes key simplifications
  - Ignores any web caching, proxies, etc.
  - Considering all pages in a session as being in a category is naïve, e.g. navigation pages, indexes, etc.
- Weaknesses in experiments
  - Authors invented nominal sessions based on unique end-user addresses, as the server didn't support sessions
  - Only present data for one site: 2,709 sessions, of which 50% were in the same cluster!
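To make the "nominal sessions" point concrete, here is a sketch of how sessions might be reconstructed from a log keyed only by client address. The idle timeout, the tuple layout of the log entries, and the function name are assumptions for illustration; the slides do not say how the authors delimited sessions.

```python
def nominal_sessions(entries, idle_timeout=1800):
    """Group log entries into sessions by client address.

    entries: iterable of (address, timestamp_in_seconds, url) tuples.
    A new session starts when an address has been idle longer than
    idle_timeout seconds (an assumed rule, not taken from the paper).
    """
    sessions = []      # list of (address, [urls]) in order of first request
    open_session = {}  # address -> (last timestamp, url list of current session)
    for address, ts, url in sorted(entries, key=lambda e: e[1]):
        last = open_session.get(address)
        if last is None or ts - last[0] > idle_timeout:
            pages = []
            sessions.append((address, pages))
        else:
            pages = last[1]
        pages.append(url)
        open_session[address] = (ts, pages)
    return sessions


# Hypothetical log entries.
entries = [
    ("10.0.0.1", 0, "/home"),
    ("10.0.0.2", 10, "/home"),
    ("10.0.0.1", 60, "/toys"),
    ("10.0.0.1", 4000, "/home"),
]
sessions = nominal_sessions(entries)
```

The sketch also shows the weakness raised in the critique: every user behind a shared proxy would arrive with the same address and be merged into one session.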

14 Further Work
- Garcia-Molina: Beyond Document Similarity: Understanding Value-Based Search and Browsing Technologies (2000)
  - Discusses judging the value of web documents based on user behaviour
- Dayal: Knowledge-Based Support Services: Monitoring and Adaptation (2000)
  - Discusses a knowledge-based service deployed within HP to deliver customer support services
  - System adapts based on observed user patterns and evolving needs

15 Related Work
- Web Prefetching (Jiang & Kleinrock, 1998)
  - Addresses slow access speeds of the World Wide Web
  - PREDICTION MODULE: computes access probabilities
  - THRESHOLD MODULE: computes prefetch thresholds
  - Uses clustering to divide users into categories by access probability
- Restoring Meaningful Episodes in a Proxy Log (Lou et al., 2001)
  - Extracts users' activity information from proxy logs
  - Classifies individual requests into meaningful semantic elements
  - Semantics-based CUT-AND-PICK approach

16 Related Work
- SUGGEST (Baraglia et al., 2002, 2004)
  - No off-line component
  - Quality metric to estimate the effectiveness of suggestions
- Media Agents (Wenyin et al.)
  - Automatic collection of semantic indices of multimedia data
  - Semantic descriptions from the content of documents
  - User interaction refines the semantic indices and suggests other multimedia data

17 Applications & The Paper
- Custom application: Analog
- Means: uses clustering techniques to analyse log files
- End: to dynamically generate possibly interesting links
- Successful (to an extent)

18 Technology Directions
- Clustering documents: Vivisimo, Google Labs
- Collaborative filtering: Amazon, Flickr, TiVo

19 Amazon.com
- Uses a recommendation algorithm: a person who bought x also bought y
- Item-to-item collaborative filtering
  - provides recommendations based on grouped items, not customers
- Essence of the algorithm:
    For each item in product catalog, I1
      For each customer C who purchased I1
        For each item I2 purchased by customer C
          Record that a customer purchased I1 and I2
      For each item I2
        Compute the similarity between I1 and I2

20 Amazon.com
- Creates one vector per item, with M dimensions (one per customer)
- Similarity between two items is computed as the cosine of the angle between their vectors
- Offline computation is theoretically expensive: O(N^2 M) for N items and M customers
- In practice only O(NM), as most customers have few purchases
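A small sketch of the item-to-item idea, assuming purchases are recorded as a simple yes/no per customer and item. With binary vectors, the cosine between two items reduces to the number of shared buyers divided by the square root of the product of each item's buyer count. The function name and the sample data are hypothetical, and this is not Amazon's production algorithm.

```python
import math
from collections import defaultdict


def item_similarities(purchases):
    """Cosine similarity between items, from a customer -> set-of-items mapping.

    Each item is treated as a binary vector over customers, so
    cos(i1, i2) = |buyers of both| / sqrt(|buyers of i1| * |buyers of i2|).
    """
    buyers = defaultdict(set)  # item -> customers who bought it
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)

    co_purchases = defaultdict(int)  # (i1, i2) -> customers who bought both
    for items in purchases.values():
        for i1 in items:
            for i2 in items:
                if i1 != i2:
                    co_purchases[(i1, i2)] += 1

    return {
        (i1, i2): n / math.sqrt(len(buyers[i1]) * len(buyers[i2]))
        for (i1, i2), n in co_purchases.items()
    }


# Hypothetical purchase history.
purchases = {
    "ann": {"book", "lamp"},
    "bob": {"book", "lamp", "pen"},
    "cat": {"pen"},
}
similarity = item_similarities(purchases)
```

The loop only touches pairs of items that some customer actually bought together, which reflects the slide's point that the work stays small when most customers have few purchases. In this example `book` and `lamp` share both of their buyers, so their similarity is 1.0, while `book` and `pen` share one of two buyers each, giving 0.5.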

21 Conclusion
- The paper was on the right track
- Appreciated the applicability of clustering to e-commerce
- Hypothesis proved by experiment
- Failed to address or even predict scalability issues

22 References
Authors' Work
- Yan, T., Jacobsen, M., Garcia-Molina, H., Dayal, U., From User Access Patterns to Dynamic Hypertext Linking, In: Fifth International World Wide Web Conference, 1996 (Paris, France)
- Paepcke, A., Garcia-Molina, H., Rodriquez, G. and Cho, J., Beyond Document Similarity: Understanding Value-Based Search and Browsing Technologies, In: Stanford University Technical Report, 2000
- Delic, K. A. and Dayal, U., Knowledge-Based Support Services: Monitoring and Adaptation, In: Proceedings of the 11th International Workshop on Database and Expert Systems Applications, IEEE Computer Society, 2000

23 References
Related Work
- Baraglia, R., Silvestri, F., Palmerini, P., On-line Generation of Suggestions for Web Users, In: Proceedings of IEEE International Conference on Information Technology: Coding and Computing, April 2004
- Baraglia, R., Palmerini, P., A web usage mining system, In: Proceedings of IEEE International Conference on Information Technology: Coding and Computing, April 2002
- Wenyin, L., Chen, Z., Lin, F., Zhang, H., Ma, W., Ubiquitous Media Agents: A framework for managing personally accumulated multimedia files, 9th ACM International Conference on Multimedia, 2003 (Toronto, Canada)
- Jiang, Z., Kleinrock, L., Web prefetching in a mobile environment, IEEE Personal Communications 5(5): 25-34, October 1998

24 References
- Lou, W., Lu, H., Liu, G., Yiang, Q., Restoring Meaningful Episodes in a Proxy Log,
- Ungar, L., Foster, D., Clustering Methods For Collaborative Filtering, In: AAAI Workshop On Recommendation Systems,
- Linden, G., Smith, B., York, J., Amazon.com Recommendations: Item-to-Item Collaborative Filtering, In: IEEE Internet Computing, Vol. 7, No. 1, Jan 2003.