WWW Search and Navigation
Mark Levene
SCIS, Birkbeck College, University of London
www.dcs.bbk.ac.uk/~mark/



2 Talk Overview
- Hypertext and the navigation problem
- NavigationZone's solution
- Problems being researched
- A demonstration

3 Hypertext and Navigation
- Long history
  - Bush 1945, memex - trail blazing
  - Nelson 1965, Xanadu - network of documents
- Problem of getting lost in hyperspace
- Navigation aids
  - Bookmarks
  - History
  - Overview diagrams
  - Recommendations

4 State-of-the-Art Navigation Aids
- Novel user interfaces to visualise web sites
- Clustering (e.g. Self-Organising Maps)
- Web data mining - finding user patterns
- Semi-automated navigation, BestTrail algorithm - motivation to follow …

5 Typical corporate search

6 A typical search scenario
1) Submit a query to a search engine
   - Is it too broad / too specific?
   - Does it capture my information needs?
2) Select a URL from the result set
   - Have I made the right choice?
3) Start manual navigation
   - Where am I?
   - Where have I come from?
   - Where am I going to?
4) Go to (1) to reformulate the query

7 Content centric approach
[Figure: diagram of pages labelled a to e, one of them marked *; not reproduced in this transcript]

8 Problems with standard search
- Page-level relevance scoring
  - sensitive to query terms
- No look-ahead
  - click and discover
- No context
  - results are totally isolated
- No navigation support
  - users are left on their own to find their way

9 Possible solutions (information retrieval)
- Improve basic IR
- Link analysis, e.g. PageRank and HITS
- Metadata tagging
  - Keywords and taxonomies (semantic web)
- Natural language
  - Q&A, sentence analysis, synonyms
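As an illustration of the link-analysis option named on this slide, here is a minimal sketch of PageRank computed by power iteration. It is a textbook formulation, not code from the system described in this talk; the function name `pagerank`, the damping value 0.85, the iteration count, and the toy graph are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    # Collect every page that appears as a source or as a link target.
    pages = set(links) | {q for outs in links.values() for q in outs}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes an equal share of its rank to every page it links to.
        for p, outs in links.items():
            for q in outs:
                new_rank[q] += damping * rank[p] / len(outs)
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked to by three of the other pages.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_web)
```

In this toy graph the page with the most inlinks, "c", ends up with the highest score, and the scores sum to 1.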

10 Possible solutions (information seeking)
- Suggestion engines
  - Link and content generation
- Categories and directories
  - Explicit manual construction
- Automatic classification
  - Machine learning techniques

11 Are these feasible?
- Re-architecting corporate information infrastructure is extremely expensive
- Sophisticated approaches are not always intuitive and are yet to be proven
- Same problem every couple of years
- Mergers and acquisitions

12 There is, actually, a better way!
- Treat sequences of pages, or trails, as first-class citizens for search
- Consider the topology of the area in which you are searching
- Employ navigational aids

13 Context centric approach
[Figure: three diagrams of pages labelled a to e, each with one page marked *; not reproduced in this transcript]

14 The information value of a trail is higher than the sum of its parts!

15 Our approach
- Provide information retrieval of the highest quality and, in addition,
- Find out what is beyond the most relevant pages by exploring the area
- Present users with precise and relevant trails
- Provide navigation assistance within the UI

16 NavZone user interface

17 NavZone Usability Study (First Monday paper)
Task - find answers to 5 types of questions
1) Fact Finding - What are the term dates?
2) Judgement - Is CSIS a good place to do research?
3) Fact Comparison - Which train station is closest to the college?
4) Judgement Comparison - Is the research in deptA better than that in deptB?
5) General Navigational - How do you get to the checkout?

18 NavZone vs. Google and Compass
% of subjects with 4+ questions correct
- Google: 59%
- Compass: 75%
- NavZone: 83%

19 NavZone is bandwidth green!
Average # clicks to complete task
- Google: 44
- Compass: 40
- NavZone: 27

20 Average time taken per task (min)
- Compass: 18
- Google: 17
- NavZone: 13
Wilcoxon test - statistically significant

21 The ingredients of the system
- State-of-the-art web crawler
- Highly efficient document indexer
- Competitive IR
- Patent-protected trail engine and UI

22 The main ingredients
[Architecture diagram; component labels: robot, crawler, parser (HTML, XML, PDF, PostScript, Word, other generic format), indexer, inverted file, postprocessor, web graph, trail engine, BestTrail, user interface]

23 The crawler
- Pick a URL from the queue
- Download the page
- Parse and extract main features
- Replace URL in queue with outlinks
[Diagram labels: Q (queue), D (downloaders), P (parsers), R1, R2]
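The four crawler steps on this slide can be sketched as a single-threaded loop. This is only an illustration under stated assumptions: the slide's diagram suggests several downloaders and parsers, which this sketch does not model, and the names `crawl`, `fetch`, `extract_links` and `FAKE_WEB` are hypothetical. The download and parse steps are passed in as functions so the example runs without network access.

```python
from collections import deque


def crawl(seed_urls, fetch, extract_links, max_pages=100):
    """Breadth-first crawl; returns {url: [outlinks]} for each page fetched."""
    queue = deque(seed_urls)
    seen = set(seed_urls)          # URLs already queued, to avoid refetching
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()              # 1. pick a URL from the queue
        content = fetch(url)               # 2. download the page
        if content is None:                # skip pages that could not be fetched
            continue
        outlinks = extract_links(content)  # 3. parse and extract features (links only here)
        pages[url] = outlinks
        for link in outlinks:              # 4. replace the URL with its outlinks
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


# Hypothetical in-memory "web": each page body is a space-separated list of links.
FAKE_WEB = {"/": "/a /b", "/a": "/b /c", "/b": "/", "/c": ""}
crawled = crawl(["/"], FAKE_WEB.get, str.split)
```

Passing `fetch` and `extract_links` as parameters keeps the queue logic separate from downloading and parsing, which is also where a real crawler would add politeness delays and parallelism.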

24 The indexer
- Compute page statistics for IR
- Compute page navigability potential (PG)
- Compute page authority ranking (GR)
- Build page summary information
- Build inverted index
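The last indexer step, building an inverted index, can be sketched as follows. This is a generic minimal version that maps each term to the pages containing it and the term's frequency there. It does not cover the PG or GR measures, whose definitions are not given in these slides. The function name, the tokenisation rule and the sample pages are assumptions.

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to {doc_id: number of occurrences in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Assumed tokenisation: lowercase runs of letters and digits.
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


# Hypothetical page texts keyed by page id.
docs = {
    "p1": "Term dates for the autumn term",
    "p2": "Research in the department",
}
index = build_inverted_index(docs)
```

Looking up a query term then becomes a dictionary access, for example `index["term"]` lists the pages that contain "term" together with their counts.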

25 The trail engine
- Compute page scores for query
- Explore graph from good starting nodes
- Rank candidate trails
- Build result set
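To make the four trail-engine steps concrete, here is a simplified sketch. It is not the BestTrail algorithm, whose details are not given in these slides: it assumes page scores are already computed, explores the link graph by a bounded depth-first search, and ranks trails with a position-discounted sum of page scores. The scoring rule, the parameter names and the toy data are all hypothetical.

```python
def find_trails(graph, page_scores, num_starts=2, max_length=3, top_k=3, decay=0.5):
    """Return the top_k trails (lists of pages) under an assumed scoring rule."""
    # Step 2: use the highest-scoring pages as starting nodes.
    starts = sorted(page_scores, key=page_scores.get, reverse=True)[:num_starts]
    candidates = []

    def extend(trail):
        candidates.append(list(trail))
        if len(trail) == max_length:
            return
        for nxt in graph.get(trail[-1], []):
            if nxt not in trail:           # do not revisit a page within a trail
                trail.append(nxt)
                extend(trail)
                trail.pop()

    for start in starts:
        extend([start])

    def trail_score(trail):
        # Assumed rule: later pages on the trail count for less.
        return sum(page_scores.get(p, 0.0) * decay ** i for i, p in enumerate(trail))

    # Steps 3 and 4: rank the candidate trails and keep the best ones.
    return sorted(candidates, key=trail_score, reverse=True)[:top_k]


# Hypothetical link graph and per-query page scores (step 1 is assumed done).
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
page_scores = {"a": 0.9, "b": 0.1, "c": 0.6, "d": 0.8}
trails = find_trails(graph, page_scores)
```

With these toy values the best trail is a, c, d: it starts at the top-scoring page and passes through the stronger of its two outlinks.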

26 Under Development
- Alternative user interfaces
- Seamless integration with relational databases and file systems
- Data mining and personalisation
- Mobile/PDA support

27 Open Problem
How do we make use of statistical regularities that are present in the web to improve search and navigation?
See Levene et al., "A stochastic model for the evolution of the web", Condensed Matter Archive, cond-mat/ . Many distributions related to the web graph follow a power law.
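As a small illustration of what "follows a power law" means in practice, the sketch below estimates the exponent alpha of a distribution with density proportional to x^(-alpha) for x >= x_min, using the standard continuous maximum-likelihood estimator alpha = 1 + n / sum(ln(x_i / x_min)). It is not the stochastic model from the cited paper. The synthetic data, the function names and the choice x_min = 1 are assumptions, and real web measurements such as link counts are discrete, so the continuous formula is only an approximation there.

```python
import math
import random


def estimate_exponent(samples, x_min=1.0):
    """Continuous maximum-likelihood estimate of a power-law exponent."""
    tail = [x for x in samples if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)


def sample_power_law(n, alpha, x_min=1.0, seed=0):
    """Draw n synthetic values by inverse-transform sampling (test data only)."""
    rng = random.Random(seed)
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


# Synthetic sample with a known exponent, used to check the estimator.
synthetic = sample_power_law(20000, alpha=2.1)
alpha_hat = estimate_exponent(synthetic)
```

On the synthetic sample the estimate should land close to the exponent used to generate it.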