Navigation problems similar to those found in web sites also exist in automatically generated corpuses such as program documentation. Program documentation, like other online support systems, provides the opportunity for enhanced productivity. However, users often take longer to find the required information in such systems than they would in conventional paper-based documentation \cite{TOMA99}.

Such corpuses are often highly interlinked. Links are created in JavaDocs from package structure, class inheritance and type references. This is of great benefit if the information can be harvested correctly, as a link exists in most cases where the relationship between the classes suggests one should. Hence, a greater number of potential trails also exists. Filtering this increased density of link information is the challenge. We achieve this by combining our Best Trail algorithm with a set of heuristics customized for on-line documentation.

Interesting questions may be asked of such corpuses. Should links based on the usage of objects in code be treated in the same way as those based upon human judgements? Are these links created with the same distributions of preferential linking as found in web sites? How does this affect the performance of metrics which assume that links denote quality or authority, such as Kleinberg's Hubs & Authorities (HITS) or Brin and Page's PageRank?
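One way to start on the question about link distributions is to tabulate how many pages have each inlink and outlink count. The following is a minimal sketch only, assuming the links have already been extracted as (source, target) pairs; the function name and the toy page names are hypothetical and are not taken from AutoDoc.

```python
from collections import Counter


def degree_distributions(links):
    """Return (inlink_hist, outlink_hist) for a list of (source, target) links.

    Each histogram maps a degree k to the number of pages with that degree.
    Repeated (source, target) pairs are counted once per occurrence.
    """
    pages = {page for edge in links for page in edge}
    out_deg = Counter(src for src, _ in links)
    in_deg = Counter(dst for _, dst in links)
    # Pages that never appear as a target (or source) have degree 0.
    in_hist = Counter(in_deg.get(page, 0) for page in pages)
    out_hist = Counter(out_deg.get(page, 0) for page in pages)
    return dict(in_hist), dict(out_hist)


# Hypothetical toy link graph between documentation pages.
links = [
    ("Connection", "Statement"),
    ("Connection", "SQLException"),
    ("Statement", "ResultSet"),
    ("Statement", "SQLException"),
    ("ResultSet", "SQLException"),
]
in_hist, out_hist = degree_distributions(links)
print(in_hist, out_hist)
```

Running the same tabulation separately over code-derived links and human-authored links would give two pairs of histograms that can be compared directly.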
The AutoDoc Solution
Richard Wheeldon, Mark Levene and Nadav Zin
Department of Computer Science, Birkbeck College, University of London

Trail-finding flowchart: Find good starting nodes. Associate probabilities with tips. Iterate until “happy”: select a tip at random; expand the node; assign probabilities to outlinks based on trail relevance; mark the node as visited and add new tips; normalise probabilities. End.

Source: AutoDoc: A Search and Navigation Tool for Web-Based Program Documentation

We present a search and navigation tool for use with automatically generated program documentation, which builds trails in the information space. Three user interfaces are suggested, which show the web pages in context, and hence better explain the structure of the code.

To help meet the demand for program documentation, many organizations are using tools like JavaDoc, which are capable of producing hypertext documentation from special markup in the code. However, none of the tools for creating such documentation provide search facilities or automated navigational support for the user. We have developed a new search and navigation tool, which we call AutoDoc, to address these problems. An example of AutoDoc using the Javadocs for Sun's Java Development Kit v1.4 is available at

The Search and Navigation Challenge

We have developed a navigation system which builds information trails (or navigation paths) in response to a user's query. To do this, we first find good starting nodes (using term-based IR and some novel link analysis), associate probabilities with tips, and expand the search tree, assigning probabilities to outlinks based on trail relevance.

Trail-based approach: takes into account linkage to present a full set of relevant document trails; is less reliant upon accuracy of search terms; provides a planning horizon rather than just “hits”.

Traditional approach: treats pages at the individual/atomic level, ignoring links; is heavily reliant on accuracy of search terms; offers no look-ahead, forcing “click and discover”.

The AutoDoc tool is based on the same technology as our website navigation tool, SiteNavigator. The trail-finding and information retrieval subsystems remain the same, as do the major components of the user interfaces. AutoDoc, like SiteNavigator, provides full-text indexing of the corpus. Our trail-finding algorithm uses probabilistic expansion of a search tree to select candidate trails quickly, with the results returned in an XML format that can be adapted using XSLT stylesheets to any format required.

Three user interfaces are provided for AutoDoc, each with its own advantages and restrictions. The screenshots below show results in each of these interfaces for the query “sql”. The results show the classes of JDBC (Java’s software for connecting to SQL databases). By examining the links between pages on the trails, we can see the connections between the classes. As these interfaces are generated by applying stylesheets to a common XML input, new interfaces can easily be incorporated.

The flat TrailSearch user interface appears very similar to that of a traditional search engine. Such an interface is the best choice for introducing the idea of a returned trail to new users who are already familiar with search engines. However, it is still difficult to see the context of a node or the structure of a site in such a display, and there is no adequate support during the navigation session.

The NavSearch user interface has two main elements: a navigation tool bar comprising a sequence of URLs (the “best trail”) and a navigation tree window with the rest of the trails. Pop-ups display relevant metadata about packages and classes.

Our prototype GraphSearch interface uses the GraphViz program to display the results in the form of a graph, where each trail is indicated by a different colour.
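As an illustration of how a set of trails might be handed to GraphViz with one colour per trail, the sketch below writes the trails out in the DOT graph language. It is not AutoDoc's actual output format: the function name, the colour palette and the example page names are all assumptions made for this example, and page names containing double quotes would need escaping.

```python
# Hypothetical palette; any colour names that Graphviz accepts would do.
PALETTE = ["red", "blue", "darkgreen", "orange", "purple"]


def trails_to_dot(trails):
    """Render trails (lists of page names) as a Graphviz DOT digraph string,
    drawing the edges of each trail in that trail's own colour."""
    lines = ["digraph trails {"]
    for i, trail in enumerate(trails):
        colour = PALETTE[i % len(PALETTE)]  # reuse colours if trails outnumber them
        # Consecutive pages on a trail become one directed edge.
        for src, dst in zip(trail, trail[1:]):
            lines.append(f'  "{src}" -> "{dst}" [color="{colour}"];')
    lines.append("}")
    return "\n".join(lines)


# Illustrative trails for a query such as "sql".
trails = [
    ["java.sql", "Connection", "Statement"],
    ["java.sql", "DriverManager", "Connection"],
]
dot = trails_to_dot(trails)
print(dot)
```

The resulting text can be saved to a file and rendered with the Graphviz `dot` command, for example `dot -Tpng trails.dot -o trails.png`.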
The trail set output is piped to the clickable image map generator.

AutoDoc allows increased productivity while programming, being quicker and more up-to-date than using a book or manually navigating the on-line documentation. It is thus ideal for experienced users who want answers quickly, as well as providing a smooth introduction to the language for novice programmers. Research has shown that SiteNavigator performed better than Google or Compass as a search engine for UCL’s web site. We expect similar results for AutoDoc.

What Next?

Following the release of AutoDoc for Java documentation, we are currently extending the coverage to C++, C# and other languages. We intend to further develop the GraphSearch user interface to production quality, with pop-up windows containing useful information similar to those in the NavSearch interface, including document summaries and metadata. Finally, we intend to allow personalization so that programmers working in a particular field can have query results tailored to their particular needs. Try it out at

The trails presented in the NavSearch tree structure were generated from the link structure of the hypertext, but could sometimes be confused with a hierarchy or directory structure. This is a problem we have attempted to address with our new GraphSearch interface.

Should links based on the use of objects in code be treated in the same way as those based upon human judgements? These graphs show inlink and outlink distributions that suggest a strong similarity between the two link types.

Similar navigation problems to those found in websites also exist in automatically generated corpuses such as program documentation. Program documentation, like other online support systems, provides the opportunity for enhanced productivity. However, users often take longer to find the required information. Such corpuses are often highly interlinked.
Links are created in JavaDocs from package structure, class inheritance and type references. Filtering this increased density of link information is the challenge. We have adapted our website navigation system by combining our Best Trail algorithm with a set of heuristics customized for on-line documentation.

Poster panel headings: "Will it work?", "Motivation", "A Navigation System", "The Navigation Problem", "User Interfaces", "What is it Good for?"
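The trail-building steps named on the poster (find starting nodes, pick a tip at random according to its probability, expand it, add the new tips, repeat) can be sketched roughly as follows. This is a simplified illustration and not the authors' Best Trail implementation: the scoring function (sum of page relevance), the fixed iteration budget standing in for "iterate until happy", and all names and values in the example are assumptions.

```python
import random


def trail_score(trail, relevance):
    """Assumed scoring rule: total relevance of the pages on the trail."""
    return sum(relevance.get(page, 0.0) for page in trail)


def build_trails(graph, relevance, start_nodes, iterations=50, seed=0):
    """Probabilistically expand a search tree of trails.

    graph: dict mapping a page to the list of pages it links to.
    relevance: dict mapping a page to a non-negative relevance score.
    Returns every trail found, best-scoring first.
    """
    rng = random.Random(seed)
    # A "tip" is a trail (tuple of pages) that may still be extended.
    tips = [(node,) for node in start_nodes]
    expanded = set()
    for _ in range(iterations):
        candidates = [t for t in tips if t not in expanded]
        if not candidates:
            break  # nothing left to expand
        # random.choices normalises the weights into probabilities itself;
        # the small constant keeps zero-relevance tips selectable.
        weights = [trail_score(t, relevance) + 1e-9 for t in candidates]
        tip = rng.choices(candidates, weights=weights, k=1)[0]
        expanded.add(tip)  # mark as visited
        for nxt in graph.get(tip[-1], []):
            if nxt not in tip:  # do not revisit a page within one trail
                tips.append(tip + (nxt,))
    return sorted(tips, key=lambda t: trail_score(t, relevance), reverse=True)


# Hypothetical link graph and relevance scores for a query such as "sql".
graph = {
    "java.sql": ["Connection", "DriverManager"],
    "DriverManager": ["Connection"],
    "Connection": ["Statement"],
    "Statement": ["ResultSet"],
}
relevance = {
    "java.sql": 0.2,
    "DriverManager": 0.5,
    "Connection": 0.9,
    "Statement": 0.7,
    "ResultSet": 0.6,
}
trails = build_trails(graph, relevance, ["java.sql"])
print(trails[0])
```

On a graph this small the budget is large enough to expand every tip, so the result does not depend on the random choices; on a real corpus the budget would cut the search short, which is where the probabilistic selection matters.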