Mark Levene, An Introduction to Search Engines and Web Navigation © Pearson Education Limited 2005 Slide 7.1 Chapter 7 : Navigating the Web Frustration.

Presentation transcript:

Slide 7.1 Chapter 7: Navigating the Web
Frustration in browsing and navigating.
Basic navigation tools.
Breadcrumb navigation.
Revisitation of web pages.
Hypertext orientation tools.
Starting points for navigation.
Web data mining.
Mining user navigation patterns.
The Best Trail algorithm.
Visualisation that aids navigation.

Slide 7.2 Frustration in Web Browsing and Navigation
Frustrating experiences due to navigation are:
– Lost connections.
– Long download time of web pages.
– Web pages that are not found (404 error).
– Popup adverts.
Browsing frustrations:
– Badly designed web pages.
– Unpredictable user interfaces.

Slide 7.3 Basic Navigation Tools
Link marker – changes colour when clicked.
Back button – stack-based, high use and recurrence rate.
Bookmarks – insertion rate much higher than deletion rate.
History lists – linear display, can search.
Search engine toolbar.

Slide 7.4 Breadcrumb Navigation
Figure 7.3: Navigation bar

Slide 7.5 What do web users do?
Recurrence rate, i.e. the percentage of page visits that are revisits to pages already seen – well above 50%.
There is about a 40% chance that the next page visited is among the last 6 pages visited.
Almost all users have 1-2 pages they revisit more often than others, e.g. their home page.
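
The formula on this slide did not survive in the transcript. A minimal sketch, assuming the definition commonly used in revisitation studies (revisits divided by total visits, as a percentage); the function name and the sample history are illustrative only:

```python
def recurrence_rate(visits):
    """Percentage of page visits that are revisits.

    Assumed definition: (total visits - distinct pages) / total visits * 100.
    """
    if not visits:
        return 0.0
    total = len(visits)
    distinct = len(set(visits))
    return (total - distinct) / total * 100.0


# Hypothetical browsing history: "home" is visited three times.
history = ["home", "news", "home", "mail", "home"]
rate = recurrence_rate(history)
```

Here five visits cover three distinct pages, so two of the five visits are revisits.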

Slide 7.6 Hypertext Orientation Tools
Figure 7.4: Nielsen's hypertext implemented in Apple's HyperCard environment

Slide 7.7
Wired News, 14/02/03: Marc Andreessen, one of the founders of Netscape, said: "If I had to do it over again, I'd probably show some sort of graphical representation of a tree, so you could see what path you're travelling on and could backtrack. I'd also include thumbnail renderings on the tree to show where you'd been."

Slide 7.8 What is a good starting point?
PageRank measures quality by recommendation; it does not measure whether a page is a good starting point for navigation.
A starting page should be:
– Relevant to the user's goals.
– Central, i.e. its distance to other pages is minimal.
– Well connected, i.e. able to reach a maximum number of other pages.

Slide 7.9 Potential Gain Computation
Iterate the following equations n times:
count = G * count
PG = PG + (f(d) * count)
where:
G – adjacency matrix of the web graph.
count – vector of no. of tips from start.
PG – potential gain vector.
f(d) – discount fn, decreases with d.
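
The iteration above can be sketched in a few lines of Python. The slide does not say how count is initialised, what d is, or which discount function is used; this sketch assumes count starts as a vector of ones, d is the iteration (depth) number, and f(d) = 1/d. The three-page graph is a made-up example.

```python
def potential_gain(adjacency, max_depth, discount=lambda d: 1.0 / d):
    """Iterate count = G * count and PG = PG + f(d) * count.

    adjacency[i][j] is 1 if page i links to page j (assumed orientation).
    """
    n = len(adjacency)
    count = [1.0] * n  # assumed initial value, not stated on the slide
    gain = [0.0] * n
    for depth in range(1, max_depth + 1):
        # Matrix-vector product: number of ways to go `depth` links on from each page.
        count = [
            sum(adjacency[i][j] * count[j] for j in range(n)) for i in range(n)
        ]
        gain = [g + discount(depth) * c for g, c in zip(gain, count)]
    return gain


# Hypothetical site: page 0 links to 1 and 2, page 1 links to 2, page 2 has no links.
G = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
pg = potential_gain(G, max_depth=2)
```

Page 0 scores highest because the most trails start from it, which matches the intuition that a good starting point should reach many other pages.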

Slide 7.10 Example Web Site

Slide 7.11
Web Page    Potential Gain
Mark
PhD
WebTech
Staff
Azy
Research
Kevin
SCSIS
Students
KLab
WebDyn      0.0001

Slide 7.12 Web Data Mining
Content mining – concerned with the information contained in web pages, e.g. text mining.
Structure mining – concerned with link analysis.
Usage mining – attempts to discover patterns in log data.

Slide 7.13 W3C Extended Log File Format
Field prefixes:
cs = client-to-server actions
s = server actions
c = client actions
sc = server-to-client actions
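
To show where these prefixes appear, here is a minimal sketch of reading a log in this format. It relies on the format's "#Fields:" directive naming the columns of the entries that follow, and it assumes for simplicity that no field value contains spaces. The function name and the sample log are illustrative.

```python
def parse_extended_log(lines):
    """Turn extended-log-format lines into one dict per log entry."""
    fields = []
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directive line, e.g. "#Fields: time cs-method cs-uri".
            name, _, value = line[1:].partition(":")
            if name.strip().lower() == "fields":
                fields = value.split()
            continue
        # Simplifying assumption: values are whitespace-separated, unquoted.
        records.append(dict(zip(fields, line.split())))
    return records


sample_log = [
    "#Version: 1.0",
    "#Fields: time cs-method cs-uri sc-status",
    "00:34:23 GET /foo/bar.html 200",
    "12:21:16 GET /foo/missing.html 404",
]
entries = parse_extended_log(sample_log)
```

In this sample, cs-method and cs-uri describe the client's request to the server, while sc-status is the server's reply to the client.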

Slide 7.14 Analog
Analog – Web Log File Analyser
Gives basic statistics such as:
– number of hits.
– average hits per time period.
– what are the popular pages in your site.
– who is visiting your site.
– what keywords are users searching for to get to you.
– what is being downloaded.
Log data does not disclose the visitor's identity.
What do Analog's reports mean?
Report for

Slide 7.15 Applications of Usage Mining
Pre-fetching and caching web pages
E-commerce and clickstream analysis
Web site reorganisation
Personalisation
Recommendation of links and products

Slide 7.16 Identification of User
By IP address
– Not so reliable as IP can be dynamic
– Different users may use same IP
Through cookies
– Reliable but user may remove cookies
– Security and privacy issues
Through login
– Users have to register

Slide 7.17 Sessionising
Time oriented (robust)
– By total duration of session: not more than 30 minutes
– By page stay times (good for short sessions): not more than 10 minutes per page
Navigation oriented (good for short sessions and when timestamps unreliable)
– Referrer is previous page in session, or
– Referrer is undefined but request within 10 secs, or
– Link from previous to current page in web site
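
The time-oriented heuristics can be sketched as follows. The slide lists the two thresholds as alternatives; this sketch applies both at once, which is an assumption, and it expects the requests of a single user already sorted by time. The names and the sample timestamps (in seconds) are illustrative.

```python
def sessionise(requests, max_session=30 * 60, max_stay=10 * 60):
    """Split one user's (timestamp_seconds, page) requests into sessions.

    A new session starts when the session would exceed max_session in total,
    or when the gap since the previous request exceeds max_stay.
    """
    sessions = []
    current = []
    session_start = previous = None
    for timestamp, page in requests:
        too_long = current and timestamp - session_start > max_session
        too_slow = current and timestamp - previous > max_stay
        if too_long or too_slow:
            sessions.append(current)
            current = []
        if not current:
            session_start = timestamp
        current.append(page)
        previous = timestamp
    if current:
        sessions.append(current)
    return sessions


# Hypothetical log: the gaps before C and E exceed 10 minutes.
by_stay = sessionise([(0, "A"), (100, "B"), (800, "C"), (900, "D"), (2000, "E")])
# Hypothetical log: steady clicking, but E falls outside the 30-minute window.
by_duration = sessionise([(0, "A"), (500, "B"), (1000, "C"), (1500, "D"), (2000, "E")])
```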

Slide 7.18 Mining Navigation Patterns
Each session induces a user trail through the site.
A trail is a sequence of web pages followed by a user during a session, ordered by time of access.
A pattern in this context is a frequent trail.
Co-occurrence of web pages is important, e.g. shopping-basket and checkout.
Use a Markov chain model.

Slide 7.19 Trails inferred from Log data
(Each session results in a trail)
ID    Trail
1     A1 > A2 > A3
2
3     A1 > A2 > A3 > A4
4     A5 > A2 > A4
5     A5 > A2 > A4 > A6
6     A5 > A2 > A3 > A6

Slide 7.20 The Markov Chain from the Data

Slide 7.21 Support and Confidence
Support s in [0,1) – accept only trails whose initial probability is above s.
– Setting support to be above the average click-through is reasonable.
Confidence c in [0,1) – accept only trails whose probability is above c.
– The probability of a trail is obtained by multiplying the transition probabilities of the links in the trail.

Slide 7.22 Mining Frequent Trails
Find all trails whose initial probability is higher than s, and whose trail probability is above c.
Use depth-first search on the Markov chain to compute the trails.
The average time needed to find the frequent trails is proportional to the number of web pages in the site.
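
A minimal sketch of this depth-first search, under several assumptions that the slides do not spell out: a page's initial probability is its share of all page visits, a transition probability is the number of times a link was followed divided by the number of visits to its source page, only trails that cannot be extended further are reported, and the session with ID 2 (missing from the transcript) is taken to be A1 > A2 > A3. The function names and the max_length guard are illustrative.

```python
from collections import Counter, defaultdict


def build_chain(trails):
    """Estimate initial and transition probabilities from session trails."""
    visits = Counter(page for trail in trails for page in trail)
    links = Counter(pair for trail in trails for pair in zip(trail, trail[1:]))
    total = sum(visits.values())
    initial = {page: n / total for page, n in visits.items()}
    transition = defaultdict(dict)
    for (src, dst), n in links.items():
        transition[src][dst] = n / visits[src]
    return initial, transition


def frequent_trails(initial, transition, support, confidence, max_length=10):
    """Depth-first search for maximal trails above support and confidence."""
    found = {}

    def extend(trail, prob):
        extended = False
        if len(trail) < max_length:  # guard against very long walks
            for nxt, p in transition.get(trail[-1], {}).items():
                if prob * p > confidence:
                    extended = True
                    extend(trail + [nxt], prob * p)
        if not extended and len(trail) > 1:
            found[" > ".join(trail)] = round(prob, 2)

    for page, p0 in initial.items():
        if p0 > support:
            extend([page], 1.0)
    return found


sessions = [
    ["A1", "A2", "A3"],
    ["A1", "A2", "A3"],  # assumed content of the missing session 2
    ["A1", "A2", "A3", "A4"],
    ["A5", "A2", "A4"],
    ["A5", "A2", "A4", "A6"],
    ["A5", "A2", "A3", "A6"],
]
initial, transition = build_chain(sessions)
loose = frequent_trails(initial, transition, support=0.1, confidence=0.3)
strict = frequent_trails(initial, transition, support=0.1, confidence=0.5)
```

Raising the confidence threshold prunes the search earlier, so fewer and more probable trails are returned.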

Slide 7.23 Frequent Trails
Support = 0.1 and Confidence = 0.3
Trail           Probability
A1 > A2 > A3    0.67
A5 > A2 > A3    0.67
A2 > A3         0.67
A1 > A2 > A4    0.33
A5 > A2 > A4    0.33
A2 > A4         0.33
A4 > A6         0.33

Slide 7.24 Frequent Trails
Support = 0.1 and Confidence = 0.5
Trail           Probability
A1 > A2 > A3    0.67
A5 > A2 > A3    0.67
A2 > A3         0.67

Slide 7.25 Pre-fetching and Caching Pages
Learn access patterns to predict future accesses.
Pre-fetch predicted pages to reduce latency.
Can use Markov model and base the prediction on history of access.
Also cache results of popular search engine queries.
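
One simple way to base a prediction on access history is a first-order Markov model that pre-fetches the page most often requested next. The sketch below is only an illustration of that idea, not the method used in the book; it uses the five legible trails from slide 7.19 as sample data, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def successor_counts(trails):
    """Count how often each page was followed by each other page."""
    counts = defaultdict(Counter)
    for trail in trails:
        for src, dst in zip(trail, trail[1:]):
            counts[src][dst] += 1
    return counts


def predict_next(counts, current):
    """Return the most frequent successor of `current`, or None if unseen."""
    followers = counts.get(current)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


observed = [
    ["A1", "A2", "A3"],
    ["A1", "A2", "A3", "A4"],
    ["A5", "A2", "A4"],
    ["A5", "A2", "A4", "A6"],
    ["A5", "A2", "A3", "A6"],
]
model = successor_counts(observed)
candidate = predict_next(model, "A2")  # page to pre-fetch while the user reads A2
```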

Slide 7.26 E-commerce Click stream Analysis
What is the user's intention: browse, search or buy?
Measure time spent on site (site stickiness).
Repeat visits – it has been shown that repeat visitors spend less time on the site; this can be explained by learning.
Measure visit-to-purchase conversion ratio, and predict purchase likelihood.

Slide 7.27 Supplementary Analyses to Improve eCommerce Web Sites
Detecting visits from crawlers as opposed to human visitors.
Form error analysis, e.g. login errors, mandatory fields not filled, incorrect format.
When and why do people exit the site, e.g. visitor puts item in cart but exits before reaching the checkout.
Analysis of local search engine logs – correlate with site behaviour.
Product recommendations based on association rules (people who bought x also bought y).
Geographic analysis – where are the customers?
Demographic analysis – who are the customers?

Slide 7.28 Adaptive web sites
Modify the web site according to user access.
– Automatic synthesis of index pages (hubs that contain links on a specific topic)
– Based on a clustering algorithm that uses the co-occurrence frequencies of pages from the log data.
– Finds a concept that best describes each cluster.

Slide 7.29 Trail Engine – Automating Navigation
A Relevant Trail for the Query "mark research"
Pages: Mark, Teaching, SCSIS, Staff, Research, WebTech

Slide 7.30 Trail Engine – Automating Navigation
Markov Chain Constructed from Search Engine Scores
Mark (1), Teaching (3), SCSIS (2), Staff (5), Research (3), WebTech (6)

Slide 7.31 Search Engine vs. Trail Engine
A query is a conjunction of keywords.
A search engine returns pages containing all the keywords.
A trail engine returns trails such that each keyword appears in at least one page on the trail.
So, a search engine is a special case of a trail engine.
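
The two matching rules can be written side by side to make the special case explicit. This is a sketch only: pages are modelled as sets of words, and the page contents below are invented for illustration.

```python
def search_engine_match(page_words, query):
    """A page matches if it contains every keyword."""
    return all(keyword in page_words for keyword in query)


def trail_engine_match(trail_pages, query):
    """A trail matches if every keyword occurs in at least one of its pages."""
    return all(
        any(keyword in words for words in trail_pages) for keyword in query
    )


# Hypothetical page contents.
mark_page = {"mark", "levene", "home"}
research_page = {"research", "web", "navigation"}
query = ["mark", "research"]

page_hit = search_engine_match(mark_page, query)
trail_hit = trail_engine_match([mark_page, research_page], query)
# A trail of length one behaves exactly like the search engine rule.
single_page_trail_hit = trail_engine_match([mark_page], query)
```

Neither page matches the conjunctive query on its own, but the trail through both pages does.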

Slide 7.32 Scoring Trails (I)
Mark (1) > Teaching (3)
Average score: (1+3)/2 = 2
Discounted sum (discount factor = 0.75): 1 + 3*0.75 = 3.25

Slide 7.33 Scoring Trails (II)
Mark (1) > SCSIS (2) > Staff (5) > Mark (1) > Teaching (3)
Sum distinct/no. pages: (1 + 2 + 5 + 3)/5 = 2.2
Discounted sum (discount factor = 0.75): 1 + 2*0.75 + 5*0.75^2 + 1*0.75^3 + 3*0.75^4 = 6.68
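
Both scoring functions from these two slides can be checked with a short sketch. The function names are illustrative; a trail is represented as a list of (page, score) pairs using the scores shown on the slides.

```python
def sum_distinct_score(trail):
    """Sum the scores of distinct pages, divided by the trail length."""
    seen = set()
    total = 0
    for page, score in trail:
        if page not in seen:  # a revisited page contributes nothing
            seen.add(page)
            total += score
    return total / len(trail)


def discounted_sum(trail, factor=0.75):
    """Sum of scores, the i-th page (from 0) weighted by factor ** i."""
    return sum(score * factor ** i for i, (_, score) in enumerate(trail))


short_trail = [("Mark", 1), ("Teaching", 3)]
long_trail = [("Mark", 1), ("SCSIS", 2), ("Staff", 5), ("Mark", 1), ("Teaching", 3)]

short_average = sum_distinct_score(short_trail)
short_discounted = discounted_sum(short_trail)
long_average = sum_distinct_score(long_trail)
long_discounted = round(discounted_sum(long_trail), 2)
```

The first measure penalises the repeated visit to Mark by counting the page once while still dividing by five, whereas the discounted sum counts the revisit but gives later pages less weight.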

Slide 7.34 Redundancy in Trails
Mark > SCSIS > Staff > Mark: the last page in the trail can be removed, as it has already been visited.
Mark > SCSIS is redundant with respect to Mark > SCSIS > Staff.

Slide 7.35 The Best Trail Algorithm
Given a query we generate K starting points.
We repeat the main computation M times for each starting point (as there is stochastic variation).
The algorithm is essentially a probabilistic best first algorithm.

Slide 7.36 Best Trail Algorithmic Detail
The algorithm maintains a navigation tree that keeps track of the trails explored.
At each step we expand a link with probability proportional to the score of the trail that is created by following the link.
First explore, then converge.
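
The probabilistic expansion step can be illustrated with weighted random selection. This sketch covers only that single step, not the full Best Trail algorithm (it has no navigation tree and no explore-then-converge schedule); the candidate trails and their scores are made up.

```python
import random


def choose_expansion(candidates, rng=random):
    """Pick one candidate trail with probability proportional to its score.

    candidates: list of (trail, score) pairs with at least one positive score.
    """
    trails = [trail for trail, _ in candidates]
    weights = [score for _, score in candidates]
    return rng.choices(trails, weights=weights, k=1)[0]


# Hypothetical candidate trails created by following different links.
candidates = [
    ("Mark > Teaching", 1.0),
    ("Mark > Nowhere", 0.0),
    ("Mark > SCSIS", 3.0),
]
rng = random.Random(0)  # fixed seed so the run is repeatable
picks = [choose_expansion(candidates, rng) for _ in range(1000)]
```

Higher-scoring trails are expanded more often, but lower-scoring ones still get a chance, which is what makes the search probabilistic rather than strictly best-first.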

Slide 7.37 A Navigation Tree (Figure 7.11)
Expanded according to the Markov chain's probabilities
Nodes: 0:Mark, 1:Teaching, 2:SCSIS, 3:Staff, 4:Research, 6:WebTech, 5:Mark, 7:SCSIS

Slide 7.38 Best Trail – User Interface
Figure 7.12: Trail Search for query "knowledge technologies"

Slide 7.39 Best Trail – User Interface
Figure 7.13: Nav-Search for query "knowledge technologies"

Slide 7.40 Best Trail – User Interface
Figure 7.14: Visual Search for query "knowledge technologies"

Slide 7.41 Visualisation that Aids Navigation
Visualisation of web site structure.
Visualisation of web usage data.
Visual search engines.

Slide 7.42 Web Site Maps
Hierarchical Site Map: Figure 7.19
Graphical Site Map: Figure 7.5

Slide 7.43 Directory Structures
Open directory categories: Figure 7.16
Map of the Open Directory: Figure 7.17

Slide 7.44 Categorised Site Map
Figure 7.20

Slide 7.45 Query Specific Map for Web Technologies
Figure 7.22

Slide 7.46 Fisheye Views
Figure 7.24: Example of a star tree

Slide 7.47 Rapid Serial Visual Presentation
Figure 7.24: RSVP browser on a small screen

Slide 7.48 Visualisation of User Trails in a Web Site
Figure 7.25: VISVIP

Slide 7.49 Web Site Usage Visualisation
Figure 7.26: Anemone

Slide 7.50 Visual Search Engines
Figure 7.28: Grokker's topic map for "beatles"

Slide 7.51 Visual Search Engines
Figure 7.29: Kartoo's topic map for "beatles"

Slide 7.52 Museum Experience Recorder
Figure 7.30: Trail of a visitor to a museum