Published by Erin Cameron. Modified over 10 years ago.
1
WWW Search and Navigation
Mark Levene
SCIS, Birkbeck College, University of London
www.dcs.bbk.ac.uk/~mark/
2
2 Talk Overview
Hypertext and the navigation problem
NavigationZone's solution
Problems being researched
A demonstration
3
3 Hypertext and Navigation
Long history
–Bush 1945, memex – trail blazing
–Nelson 1965, Xanadu – network of documents
Problem of getting lost in hyperspace
Navigation aids
–Bookmarks
–History
–Overview diagrams
–Recommendations
4
4 State-of-the-Art Navigation Aids
Novel user interfaces to visualise web sites
Clustering (e.g. Self-Organising Maps)
Web data mining – finding user patterns
Semi-automated navigation, BestTrail algorithm – motivation to follow …
5
5 Typical corporate search
6
6 A typical search scenario
1) Submit a query to a search engine
–Is it too broad / too specific?
–Does it capture my information needs?
2) Select a URL from the result set
–Have I made the right choice?
3) Start manual navigation
–Where am I? Where have I come from? Where am I going to?
4) Go to (1) to reformulate the query
7
7 Content centric approach
[Figure: page graph with nodes labelled a, b, c, d, e and *]
8
8 Problems with standard search
Page level relevance scoring
–Sensitive to query terms
No look ahead
–Click and discover
No context
–Results are totally isolated
No navigation support
–Users are left on their own to find their way
9
9 Possible solutions (information retrieval)
Improve basic IR
Link analysis, e.g. PageRank and HITS
Meta data tagging
–Keywords and taxonomies (semantic web)
Natural language
–Q&A, sentence analysis, synonyms
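To make the link analysis bullet concrete, here is a minimal sketch of PageRank by power iteration. The four-page graph `LINKS`, the damping factor of 0.85 and the iteration count are illustrative assumptions, not values from the talk.

```python
# Hypothetical link graph: page -> pages it links to (not data from the talk).
LINKS = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; assumes every link target is a key of graph."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, outlinks in graph.items():
            if outlinks:
                # Split this page's damped rank evenly among its outlinks.
                share = damping * rank[node] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its damped rank over all pages.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank


ranks = pagerank(LINKS)
```

In this toy graph page c is linked from three pages and ends up with the highest rank, while page d has no inlinks and keeps only the random-jump share.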
10
10 Possible solutions (information seeking)
Suggestion engines
–Link and content generation
Categories and directories
–Explicit manual construction
Automatic classification
–Machine learning techniques
11
11 Are these feasible?
Re-architecting corporate information infrastructure is extremely expensive
Sophisticated approaches are not always intuitive and are yet to be proven
Same problem every couple of years
Mergers and acquisitions
12
12 There is, actually, a better way!
Treat sequences of pages, or trails, as first-class citizens for search
Consider the topology of the area in which you are searching
Employ navigational aids
13
13 Context centric approach
[Figure: three page graphs, each with nodes labelled a, b, c, d, e and *]
14
14 The information value of a trail is higher than the sum of its parts!
15
15 Our approach
Provide information retrieval of the highest quality and, in addition:
Find out what is beyond the most relevant pages by exploring the area
Present users with precise and relevant trails
Provide navigation assistance within the UI
16
16 NavZone user interface
17
17 NavZone Usability Study
First Monday paper
Task – find answers to 5 types of questions
1) Fact Finding – What are the term dates?
2) Judgement – Is CSIS a good place to do research?
3) Fact Comparison – Which train station is closest to the college?
4) Judgement Comparison – Is the research in deptA better than that in deptB?
5) General Navigational – How do you get to the checkout?
18
18 NavZone vs. Google and Compass
% of subjects with 4+ questions correct
Google 59%
Compass 75%
NavZone 83%
19
19 Average # clicks to complete task
Google 44
Compass 40
NavZone 27
NavZone is bandwidth green!
20
20 Average time taken per task (min)
Compass 18
Google 17
NavZone 13
Wilcoxon test: statistically significant
21
21 The ingredients of the system
State-of-the-art web crawler
Highly efficient document indexer
Competitive IR
Patent protected trail engine and UI
22
22 The main ingredients
[Architecture diagram. Labels: robot, parser (HTML, XML, PDF, PostScript, Word, other generic format), crawler, indexer, postprocessor, inverted file, web graph, trail engine, BestTrail, user interface]
23
23 The crawler
Pick a URL from the queue
Download the page
Parse and extract main features
Replace URL in queue with outlinks
[Diagram: queue, downloaders, parsers]
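The four crawler steps above can be sketched as a breadth-first loop over a URL queue. This is a minimal illustration, not the system's crawler. The in-memory `FAKE_WEB` pages, the `LinkExtractor` helper and the `crawl` function are hypothetical names, and "main features" is reduced to outlinks only.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects href targets of <a> tags, i.e. the page's outlinks."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical in-memory "web" so the sketch runs without network access.
FAKE_WEB = {
    "/index": '<a href="/courses">Courses</a> <a href="/staff">Staff</a>',
    "/courses": '<a href="/index">Home</a> <a href="/dates">Term dates</a>',
    "/staff": '<a href="/index">Home</a>',
    "/dates": "Term dates page with no outlinks",
}


def crawl(seed, fetch):
    """Return a web graph {url: outlinks} reachable from seed."""
    queue = deque([seed])
    seen = {seed}
    graph = {}
    while queue:
        url = queue.popleft()        # 1. pick a URL from the queue
        html = fetch(url)            # 2. download the page
        if html is None:
            continue                 # page could not be fetched
        parser = LinkExtractor()     # 3. parse and extract features (links only)
        parser.feed(html)
        graph[url] = parser.links
        for link in parser.links:    # 4. replace the URL with its unseen outlinks
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return graph


web_graph = crawl("/index", FAKE_WEB.get)
```

Passing `fetch` as a parameter keeps the loop independent of how pages are downloaded, so a real HTTP fetcher could be swapped in for `FAKE_WEB.get`.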
24
24 The indexer
Compute page statistics for IR
Compute page navigability potential (PG)
Compute page authority ranking (GR)
Build page summary information
Build inverted index
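As an illustration of the last indexer step, the sketch below builds a small inverted index that maps each term to the pages containing it, with term counts as the only page statistic. The `PAGES` texts and the function names are hypothetical, and the PG and GR measures from the slide are not modelled here.

```python
import re
from collections import defaultdict

# Hypothetical page texts keyed by URL (not data from the talk).
PAGES = {
    "/dates": "Term dates for the autumn term and spring term",
    "/courses": "Courses offered each term",
    "/staff": "Research staff and research groups",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(pages):
    """Map each term to {url: term frequency on that page}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term][url] = index[term].get(url, 0) + 1
    return index


def search(index, query):
    """URLs containing every query term, ordered by summed term frequency."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [index.get(term, {}) for term in terms]
    common = set(postings[0]).intersection(*postings[1:])
    return sorted(common, key=lambda url: (-sum(p[url] for p in postings), url))


index = build_inverted_index(PAGES)
hits = search(index, "term dates")
```

Raw term frequency stands in for the IR statistics a production indexer would compute. It is used here only to give the result list an order.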
25
25 The trail engine
Compute page scores for query
Explore graph from good starting nodes
Rank candidate trails
Build result set
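The trail engine steps above can be sketched as a bounded exploration of the web graph. This is not the BestTrail algorithm, whose details the slides do not give. It is a plain depth-limited enumeration of loop-free trails, and the position-discounted scoring, the `decay` parameter and the sample `GRAPH` and `SCORES` are assumptions made for illustration.

```python
# Hypothetical link graph and per-page query scores (not data from the talk).
GRAPH = {
    "/index": ["/courses", "/staff"],
    "/courses": ["/index", "/dates"],
    "/staff": ["/index"],
    "/dates": [],
}
SCORES = {"/index": 0.1, "/courses": 0.6, "/staff": 0.0, "/dates": 0.9}


def trail_score(trail, scores, decay):
    """Assumed scoring: page scores discounted by position on the trail."""
    return sum(scores.get(page, 0.0) * decay ** i for i, page in enumerate(trail))


def find_trails(graph, scores, num_starts=2, max_len=3, decay=0.8):
    """Explore from the best-scoring pages and rank the candidate trails."""
    # Good starting nodes: the pages with the highest query scores.
    starts = sorted(scores, key=scores.get, reverse=True)[:num_starts]
    candidates = []

    def extend(trail):
        candidates.append((trail_score(trail, scores, decay), trail))
        if len(trail) == max_len:
            return
        for nxt in graph.get(trail[-1], []):
            if nxt not in trail:  # never revisit a page on the same trail
                extend(trail + [nxt])

    for start in starts:
        extend([start])
    # Rank candidate trails, best first, to form the result set.
    candidates.sort(key=lambda candidate: -candidate[0])
    return candidates


trails = find_trails(GRAPH, SCORES)
best_score, best_trail = trails[0]
```

With these sample scores the top result is the two-page trail from /courses to /dates, which ranks above the single best page /dates on its own.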
26
26 Under Development
Alternative user interfaces
Seamless integration with relational databases and file systems
Data mining and personalisation
Mobile/PDA support
27
27 Open Problem
How do we make use of statistical regularities that are present in the web to improve search and navigation?
See Levene et al., A stochastic model for the evolution of the web, Condensed Matter Archive, cond-mat/0110016, 2001: many distributions related to the web graph follow a power law