Published by Ashlee Bates. Modified over 9 years ago.
1
Automated Benchmarking Of UK Museum Web Sites With An Introduction to UKOLN and UK Web Focus Brian Kelly UK Web Focus UKOLN University of Bath Bath, BA2 7AY UKOLN is supported by: Email B.Kelly@ukoln.ac.uk URL http://www.ukoln.ac.uk/
2
2 Contents About UKOLN UKOLN’s WebWatch Work For UK HEIs Benchmarking UK Museum Web Sites Comparison With “6 Of The Best” Limitations Of Approach Where To From Here?
3
3 UKOLN UKOLN: National focus of expertise in digital information management Based at University of Bath Funded by JISC (HE and FE sector) and Resource: The Council for Museums, Archives and Libraries, together with project funding (e.g. EU and JISC) About 25 FTEs Carries out applied research (e.g. in metadata), software development and provides policy and advisory services
4
4 UKOLN’s Dissemination Work UKOLN carries out dissemination activities including work carried out by UKOLN’s Policy and Advice Team: Interoperability Focus Close links with Resource and Museums community (member of CIMI Executive Committee) Involved in e-GIF standards work See Collection Description Focus Funded by JISC, RSLP and British Library Coordination work on collection description methods, schemas & tools with goal of ensuring consistency across projects, disciplines, institutions and sectors See Bibliographic Management UK Web Focus - myself
5
5 UK Web Focus UK Web Focus: Funded by JISC to provide advice on Web developments Organises events (e.g. annual Institutional Web Management Workshop), writes articles (e.g. regular columns in Ariadne e-journal), gives talks, etc. A member of UKOLN’s Policy and Advice Team (which also includes Interoperability Focus, Collection Description Focus and Public Library Networking Focus) Managed the original WebWatch project and continues to publish results of WebWatch surveys
6
6 Community Building An important part of my work is community building within UK HE / FE Web management communities: An annual 3 day workshop which provides an opportunity for Web managers to: update their technical skills and approaches to managerial and strategic thinking discuss and share problems and solutions with peers Active participation in (e.g.) JISCMail mailing lists e.g.: web-support: “My home page doesn’t look right in Netscape 4. Can anyone help?” website-info-mgt: “A Web site has stolen text and images from my Web site. What should I do?” “How should I impose a consistent look-and-feel across all departmental Web sites?” Comparing approaches across community and sharing best practices
7
7 WebWatch Project WebWatch project: Initially funded for 1 year in 1997 by BLRIC to develop and use automated robot software to analyse Web developments across various UK communities Once funding finished the work continued, but made use of (mainly) freely available Web services to analyse various features of Web site communities Supports community-building work across UK HE/FE Web managers (sharing, not flaming) See
8
8 WebWatch Surveys Search Engines Used To Index UK HE Web Sites: ht://Dig most popular and growing in popularity followed by an MS solution Interest in licensed Ultraseek/Inktomi solution Interest in externally hosted indexers (e.g. Google) Surprising number of institutions with no search facility See Nos. of Links Cambridge has most (231,000 links to all servers) Sheffield has the most to a single server (46,000) See Nos. Of Web Servers Cambridge has most (200+) See
9
9 Update On Search Engines Sept 1999: ht://Dig 25; Excite 19; Microsoft 12; Harvest 8; Ultraseek 7; SWISH 5; Other 23; None 59. Today: ht://Dig 48; Microsoft 17; Ultraseek/Inktomi 12; Google 11; Excite 5; Webinator 5; Others 22; None 29. NOTE: The growth in popularity of ht://Dig, the unexpected appearance of the Google externally-hosted service and the move from SWISH and Harvest would not have been noticed without the snapshots. The discussion of surveys informed decision-making.
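The value of keeping snapshots can be illustrated with a small sketch that compares the two sets of counts above. The counts are copied from the slide; the function names and the alias table (which assumes that "Ultraseek" and "Ultraseek/Inktomi", and "Other" and "Others", are the same category) are illustrative assumptions, not part of the original survey.

```python
# Counts transcribed from the slide; everything else here is illustrative.
sept_1999 = {"ht://Dig": 25, "Excite": 19, "Microsoft": 12, "Harvest": 8,
             "Ultraseek": 7, "SWISH": 5, "Other": 23, "None": 59}
today = {"ht://Dig": 48, "Microsoft": 17, "Ultraseek/Inktomi": 12, "Google": 11,
         "Excite": 5, "Webinator": 5, "Others": 22, "None": 29}

# Assumption: these labels refer to the same category in both snapshots.
ALIASES = {"Ultraseek": "Ultraseek/Inktomi", "Other": "Others"}


def normalise(counts):
    """Merge aliased labels so both snapshots use the same category names."""
    merged = {}
    for name, count in counts.items():
        key = ALIASES.get(name, name)
        merged[key] = merged.get(key, 0) + count
    return merged


def snapshot_changes(before, after):
    """Return {category: after - before}, treating a missing category as 0."""
    before, after = normalise(before), normalise(after)
    names = set(before) | set(after)
    return {name: after.get(name, 0) - before.get(name, 0) for name in names}


changes = snapshot_changes(sept_1999, today)
# Print the largest falls first and the largest rises last.
for name, delta in sorted(changes.items(), key=lambda item: item[1]):
    print(f"{name:20s} {delta:+d}")
```

Run against the two snapshots, this surfaces the same trends the note describes: a rise for ht://Dig and Google, and a fall for SWISH, Harvest and sites with no search facility.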
10
10 WebWatch Activities As well as these metrics a number of observations of features have been carried out 404 Error Page The appearance of and functionality provided by the institution’s 404 error page Appearance of Main Entry Point The appearance of the institution’s entry point, and identifying main types (menu-style vs news) and use of technologies (Java, DHTML, etc.) A “rolling demo” has been provided of these features allowing interested parties to quickly get a feel of the approaches taken within the community These have proved very popular – see
11
11 Benchmarking WebWatch approach of monitoring UK HE Web sites can be extended into a benchmarking exercise: Making comparisons with peers Checking compliance with standards Checking compliance with community or funders' guidelines (e.g. e-GIF guidelines) This has advantages for organisations: Observing best practices and learning from them Ditto for bad practices Community building and some potential disadvantages: Establishment of league tables Inappropriate comparisons Penalty clauses for failure to comply with standards
12
12 Benchmarking Museum Web Sites WebWatch approach to benchmarking has been applied to a small number of UK Museum Web sites: Small selection chosen in order to: Keep resource requirements to a minimum Validate methodology Gauge interest in this approach Selected resources were: Sample of museum Web sites Guardian’s six best museum Web sites If methodology is felt to be valid and there is sufficient interest the approach could be taken more widely across the museum community Details of survey available from
13
13 Benchmarking Activity Choosing the sample: mda list of UK Museum Web sites used as master source Web sites beginning with letter “A” were chosen Andrew Carnegie Birthplace Museum removed from sample as Web site was unavailable The 13 Selected Museum Web Sites: Abbot Hall Art Gallery; Aberdeen Art Gallery & Museums; AccessArt; Aerospace Museum; Allhallows Museum; Althorp House; Amberley Museum; American Museum in Britain; Armagh Planetarium; Arnolfini Gallery; Ashmolean Museum of Art & Archaeology; Astley Hall Museum and Art Gallery; Avoncroft Museum of Historic Buildings
14
14 Approaches Approaches taken: Use of freely-available Web sites which provide analysis capabilities Page of “live links” provided enabling all users to reproduce findings Complement this with manual inspection Benefits of this approach: Openness, reproducibility and objectivity of survey http://www.netmechanic.com/ toolbox/html-code.htm
15
15 Domain Names Findings 11 museums (85%) have an entry point which is the domain name and 2 (15%) have an entry point which is one level beneath the domain name 6 (46%) have a .co.uk domain; 3 (23%) have .org.uk; 2 (15%) have .com; 1 (8%) has .org; 1 (8%) has .ac.uk Discussion Most of the museums have a short, memorable URL The variety of top level domains may be confusing for end users How will the new .museum domain be deployed? Is there an opportunity for a major advertising campaign? Reminder – findings are for a small, non-random sample
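A tally of this kind can be reproduced from a list of entry-point URLs. The following is a minimal sketch, assuming the suffixes of interest are the five named on the slide; the function names and the example URLs are hypothetical and do not correspond to the surveyed museums.

```python
from collections import Counter
from urllib.parse import urlsplit

# Suffixes named on the slide; anything else is reported as "other".
KNOWN_SUFFIXES = (".co.uk", ".org.uk", ".ac.uk", ".com", ".org")


def domain_suffix(url):
    """Return the first known suffix that the URL's host name ends with."""
    host = (urlsplit(url).hostname or "").lower()
    for suffix in KNOWN_SUFFIXES:
        if host.endswith(suffix):
            return suffix
    return "other"


def is_at_domain_root(url):
    """True if the entry point is the domain name itself (no sub-directory)."""
    return urlsplit(url).path in ("", "/")


def tally(urls):
    """Return {suffix: (count, percentage rounded to a whole number)}."""
    counts = Counter(domain_suffix(url) for url in urls)
    total = len(urls)
    return {suffix: (n, round(100 * n / total)) for suffix, n in counts.items()}


# Hypothetical entry points, for illustration only.
example_urls = [
    "http://www.example-museum.co.uk/",
    "http://www.example-gallery.co.uk/",
    "http://www.example.org.uk/museum/",
    "http://www.example.com/",
]
print(tally(example_urls))
print([is_at_domain_root(url) for url in example_urls])
```

The same rounding applied to the sample of 13 gives the domain percentages quoted on the slide, for example 6 of 13 rounds to 46%.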
16
16 Server Software Netcraft used to analyse Web server software Findings 7 hosted on a Unix platform (4 on Linux, 2 on Solaris and 1 on BSD) 6 hosted on a Microsoft platform (4 on NT 4 or Windows 98, 2 on Windows 2000) Issues Security, scalability, ease-of-use, …. http://www.netcraft.com/
17
17 Standards Compliance Entry point examined for compliance with HTML and CSS standards using the NetMechanic and W3C Validator Web-based tools: Findings 0 pages were HTML compliant (according to W3C) Of the 5 sites which contained a CSS style sheet, 0 had errors (according to W3C) 3 pages were HTML compliant (according to NetMechanic) Issues HTML-compliance is important for ensuring wide accessibility and for repurposing content
18
18 Accessibility Entry point examined for compliance with W3C WAI guidelines for accessibility using the Bobby Web-based tool: Findings Only 2 pages had no WAI Priority 1 errors Issues Compliance with accessibility standards is important for ensuring access to resources for people with disabilities Compliance with accessibility standards may be an organisational requirement Compliance with accessibility standards may be a legal requirement
19
19 Size Of Entry Point Using Bobby Findings (Bobby) Largest entry point initially appeared to be 159 Kb On further analysis of framed sites the largest entry point was found to be 236.91 Kb The smallest appeared to be 1 Kb – but this was a FRAMES page (and not the individual linked pages) On further analysis of framed sites the smallest entry point was found to be 15.45 Kb Issues Bobby flagged pages which used frames but further manual analysis and calculations were needed
20
20 Size Of Entry Point Using NetMechanic Findings (NetMechanic) Largest entry point initially appeared to be 237,107 b (231 Kb) The smallest appeared to be 16,045 b (15.7 Kb) Issues NetMechanic flagged pages which used frames but further manual analysis and calculations were needed Bobby and NetMechanic identified the same largest and smallest sites – but this is not always the case
21
21 Comments On Size Measurements Use of tools to analyse size of Web pages has indicated several issues: Need for manual inspection of results (normally outliers) in order to spot invalid comparisons Different ways of treating: Redirects Frames User-agent negotiation etc. and inconsistencies in handling: robot exclusion protocol external files (e.g. CSS and JavaScript), etc. may result in inconsistent findings Changes in content of page (e.g. inclusion of news items, personalised interfaces, etc.) Output generated for viewing on Web, not further processing Current need to manually sum sub-parts
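The manual summing of sub-parts for a framed entry point, and the effect of choosing what to count, can be sketched as follows. The part sizes are invented for illustration; only the two byte figures passed to `to_kb` at the end come from the NetMechanic slide. The default of counting only HTML and images is an assumption chosen to mirror one possible definition of page size.

```python
def total_size_bytes(parts, include=("html", "image")):
    """Sum the sizes of the parts whose kind is listed in `include`."""
    return sum(size for kind, size in parts if kind in include)


def to_kb(size_bytes):
    """Convert bytes to Kb (1 Kb = 1024 bytes), rounded to two decimals."""
    return round(size_bytes / 1024, 2)


# Hypothetical framed entry point: (kind, size in bytes) for each sub-part.
framed_entry_point = [
    ("html", 1024),     # the FRAMES page itself
    ("html", 6200),     # navigation frame
    ("html", 9100),     # content frame
    ("image", 48000),   # inline images
    ("css", 2300),      # external style sheet
    ("script", 5100),   # external JavaScript
]

html_and_images = total_size_bytes(framed_entry_point)
everything = total_size_bytes(
    framed_entry_point, include=("html", "image", "css", "script"))

print(to_kb(html_and_images), to_kb(everything))
# The NetMechanic byte counts from the earlier slide, converted the same way.
print(to_kb(237107), to_kb(16045))
```

The two totals differ only because of the definition used, which is why results from different tools need checking before they are compared.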
22
22 Link Popularity The number of links to each Web site was found using LinkPopularity (which has an interface to AltaVista): Findings The most linked-to Web site had 2,731 links The least linked-to Web site had 45 links Issues Links can drive traffic to your Web site Links can be used by citation-based search engines (such as Google) to boost the ranking of your site (many links to your page means Google will give it a higher ranking than a similar page with fewer links) Snapshots of link popularity can help gauge effectiveness of publicity campaigns
23
23 Search Engine Coverage / Size Of Web Site AltaVista and Netscape’s What’s Related tool were used to measure the size of the museum Web sites (i.e. the numbers of pages they had indexed): Findings Most no. of pages indexed by AV was 2,037 pages Most no. of pages indexed by NS was 1,919 pages Least no. of pages indexed by AV was 0 pages Least no. of pages indexed by NS was 0 pages Issues The nos. of pages indexed should be ≥ 0 and ≤ nos. of pages on Web site If significantly fewer pages are indexed than exist, this may show a Web site which is not search-friendly (e.g. use of frames, splash screens, etc.)
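The sanity check and the "significantly fewer pages" test described above can be sketched as a small function. The 50% threshold and the function names are assumptions made for illustration; the slide does not define what counts as significant.

```python
def index_coverage(indexed_pages, site_pages):
    """Return the fraction of a site's pages that a search engine has indexed.

    Raises ValueError if the figures break the rule that
    0 <= indexed pages <= pages on the Web site.
    """
    if site_pages <= 0:
        raise ValueError("site must have at least one page")
    if not 0 <= indexed_pages <= site_pages:
        raise ValueError("indexed pages must be between 0 and the site size")
    return indexed_pages / site_pages


def looks_search_unfriendly(indexed_pages, site_pages, threshold=0.5):
    """Flag a site when coverage falls below `threshold` (an assumed cut-off)."""
    return index_coverage(indexed_pages, site_pages) < threshold


# Hypothetical figures: a 400-page site with 30 pages indexed.
print(index_coverage(30, 400), looks_search_unfriendly(30, 400))
```

A flagged site is only a prompt for manual inspection, for example to see whether frames or a splash screen are blocking the indexing robot.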
24
24 Search Facility Information on museum’s search engine was found: Findings 10 sites have no search facility 3 have a search facility: 1 uses the FreeFind externally-hosted search engine 1 uses a Microsoft search engine 1 uses a Perl script (to search an online catalogue) 1 search facility not working (over 1 month period) Issues Users expect to be provided with search facilities It can take < 30 minutes (and little technical expertise) to make an externally hosted search engine available, suitable for simple static Web sites (but not many people know this)
25
25 404 Error Page Information on the 404 error page was found: Findings 10 sites use the default 404 error message 3 have a lightly branded error message, but with little additional functionality Issues The 404 error page is (sadly) likely to be widely accessed It is desirable that it: Reflects the Web site's look-and-feel Provides functionality to assist a user who is ‘lost’: Provides access to a search facility / site map Provides contact details The 404 page can also be context-sensitive (e.g. different pages for users following a local link / remote link / no link)
26
26
27
27 Robots.txt Information on the Web site’s robots.txt file was found: Findings 12 sites have no robots.txt file 1 site has a simple robots.txt file Issues robots.txt file can be used to control indexing of your Web site e.g. stop robots from indexing: Pre-release versions of pages Test areas …
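As an illustration of the kind of simple robots.txt file described above, the sketch below parses a small example with Python's standard `urllib.robotparser` module and checks which paths a robot may fetch. The directory names, the robot name `ExampleBot` and the helper `may_index` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that keeps robots out of test and pre-release areas.
ROBOTS_TXT = """\
User-agent: *
Disallow: /test/
Disallow: /pre-release/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_index(path, agent="ExampleBot"):
    """True if a robot identifying itself as `agent` may fetch `path`."""
    return parser.can_fetch(agent, path)


for path in ("/collections/index.html", "/test/draft.html", "/pre-release/new.html"):
    print(path, may_index(path))
```

The file only has an effect on robots that choose to obey it, so it is a request to well-behaved indexers rather than an access control.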
28
28 Other Surveys Additional surveys were carried out: Cacheability Of Entry Point Cacheability Engine used; 11 entry points were cacheable and 2 were not What’s Related To Web Site Netscape’s What's Related? facility used to record: Popularity, nos. of pages and nos. of links Relationships with other sites
29
29 Six of the Best: Museums Guardian’s Online supplement (18 Oct 2001) published their list of the six best Museum Web sites: The Hermitage in St Petersburg at Metropolitan Museum at SCRAN at Tate Modern at The Louvre at Design Museum at
30
30 Comparisons Automated Surveys 3 had a search facility Nos. of links to sites ranged from 723 to 18,366 All surveyed entry points had P1 accessibility errors All surveyed entry points had HTML errors Observations 3 were providing a search facility Most were providing a simple robots.txt file Some of the 404 error messages were slightly better
31
31 Accessible to Browsers How do the Web sites look in different browsers? The Lynx text browser and an emulation of the Mosaic browser were used in order to investigate how the Web sites would look to: Users of old browsers Users of browsers with no JavaScript support Users of text browsers (or an indexing robot)
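A rough impression of what a text browser or an indexing robot receives can be obtained by stripping a page down to its text. The sketch below uses Python's standard `html.parser` module; it is an approximation for illustration, not an emulation of Lynx or Mosaic, and the `[IMAGE]` placeholder and sample page are assumptions.

```python
from html.parser import HTMLParser


class TextOnlyView(HTMLParser):
    """Collect the text a client without scripts, styles or images would see."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "img":
            alt = dict(attrs).get("alt")
            if alt is None:
                self.chunks.append("[IMAGE]")  # no alternative text supplied
            elif alt:
                self.chunks.append(alt)

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore script and style content, and whitespace-only text.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def text_view(html):
    """Return the page's visible text as a single space-separated string."""
    parser = TextOnlyView()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


sample = ('<html><body><script>document.write("menu")</script>'
          '<img src="logo.gif"><img src="map.gif" alt="Site map">'
          '<p>Welcome</p></body></html>')
print(text_view(sample))
```

In this sample the navigation written by JavaScript disappears entirely, and the image without alternative text conveys nothing, which is the kind of problem the browser comparison is meant to expose.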
32
32 Mosaic
33
33 Lynx
34
34 Limitations Of Survey Limitations of this type of benchmarking approach include: Lack of standards Limitations of the tools Resources needed to carry out surveys Scoping of Museum sites and invalid comparisons Automated approach fails to address content issues which require a manual approach
35
35 Limitations - Standards There is a lack of standards to support benchmarking work (or conflicting standards). For example: Size of a page How do you measure the size of the museum’s entry point? You need this in order to make comparisons and if, say, you have guidelines on the maximum file size. Problems What do you measure (HTML file, inline images, external CSS and JavaScript files, …)? Changes in file content (e.g. user-agent negotiation, news content, frames and refresh elements, etc.) How do you handle the robot exclusion protocol (REP)? NOTE: Bobby and NetMechanic work differently: the former only measures HTML and images, the latter obeys the REP
36
36 Limitations - Tools Issues: Auditing tools tend to make implicit definitions (e.g. measuring size of a page). Different results may be obtained when using different tools for same purpose (or if vendor changes its definition) Use of Web-based auditing services: Talk has described use of (mainly free) Web-based services The providers may change their policy Use of the URL interface to pass parameters (rather than direct use of the form on the Web page) may not be allowed Use of desktop auditing tools Use of desktop tools avoids the problems of change control of Web based services. However it means that it may be difficult for others to reproduce findings
37
37 Limitations - Resources It can be time-consuming to: Maintain URL of entry point to museum Web sites (need to have close links with provider of central portal) Manage the input to the variety of Web-based services Process the output from the Web-based services (current need to initiate inquiry, wait for results and manually copy and paste results)
38
38 Limitations – Scope of Web Site Scope What is a museum Web site? What is not part of a museum Web site? It can be difficult to answer these questions. There are no standard ways to define a “Web site” other than by use of domain names and directory structures Even directory structures can be inadequate if they are not used correctly Comparisons It may not be sensible to make comparisons between museums of different types and sizes
39
39 Limitations – Automated Only Use of an automated approach: Would not (easily) address content issues Has been supplemented with manual observations (e.g. home page, 404 page & search engine page) However: An automated approach can be more objective and reproducible An automated approach should be less resource-intensive (once software has been set up to maintain links to resources, survey sites and process results) An automated approach could be used in conjunction with a manual survey (of a representative sample set of resources)
40
40 Beyond A Pilot Despite the limitations which have been described, would a comprehensive and systematic benchmark of UK Museum Web sites be of benefit? Can we address the resource issues? Is the lack of standards being addressed? Can we find someone to do the work? Should the focus be developmental? Can the work be extended to provide notification of problems (e.g. search engine not working)? What may happen if we don’t do this? Might we find that funders set up inappropriate or flawed performance indicators?
41
41 A Model For Implementation The benchmarking process can be made less time-consuming if a more flexible model for managing the data were used At present we seem to have an HTML page with links to museum Web sites Unfortunately HTML pages are difficult to repurpose Page for viewing Page for input to Web services A better model is to store links in a neutral database, and to generate pages for viewing by end users and for input into benchmarking Web services The database could also be reused for other purposes e.g. checking links and email notifications of problems
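The proposed model can be sketched with a small database from which both the viewing page and the input to a benchmarking service are generated. This is a minimal sketch under stated assumptions: the table layout, the function names, the museum entries and the service address `http://audit.example.org/check` (with its `url` parameter) are all hypothetical and stand in for whatever real auditing service is used.

```python
import sqlite3
from html import escape
from urllib.parse import urlencode

# Hypothetical auditing service that accepts the target page as a `url` parameter.
SERVICE = "http://audit.example.org/check"


def build_db(sites):
    """Create an in-memory database holding (name, url) rows for each museum."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE museum (name TEXT NOT NULL, url TEXT NOT NULL)")
    db.executemany("INSERT INTO museum VALUES (?, ?)", sites)
    return db


def viewing_page(db):
    """Generate an HTML list of links for end users."""
    rows = db.execute("SELECT name, url FROM museum ORDER BY name")
    items = [f'<li><a href="{escape(url)}">{escape(name)}</a></li>'
             for name, url in rows]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


def service_urls(db, service=SERVICE):
    """Generate one query URL per museum for input to the auditing service."""
    rows = db.execute("SELECT url FROM museum ORDER BY name")
    return [f"{service}?{urlencode({'url': url})}" for (url,) in rows]


# Hypothetical entries, for illustration only.
db = build_db([
    ("Example Museum A", "http://www.example.org/"),
    ("Example Museum B", "http://www.example.com/museum/"),
])
print(viewing_page(db))
print("\n".join(service_urls(db)))
```

Because both outputs come from the same rows, a change to a museum's entry point needs to be made only once, and the same table could drive link checking or problem notifications.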
42
42 Towards “Web Services” Background Web initially implemented for provision of information CGI allowed users to input data and provided integration with backend applications Techniques described use URL as input to auditing service. However this provides limited functionality and is susceptible to vagaries of marketplace Future “Web Services” will support machine integration by providing a standard messaging infrastructure which uses HTTP protocol XML output (e.g. EARL) will provide a neutral format for benchmarking output, and can describe benchmarking environment (EARL is RDF)
43
43 Need For Standard Definitions There is a need for standard definitions of terminology such as Web page, visit, unique visit, session, etc. in order to ensure that meaningful and objective comparisons can be made The market place is addressing current deficiencies within Web Advertising and Web Auditing communities (and there are financial incentives for this to be solved) With the growth in e-governments internationally and governments setting targets (X% of government work to be carried out electronically by 2005)
44
44 Doing The Work If there is further interest, who should do the work? [Diagram "Benchmarking Work" with headings Who, What and Why. Labels: Funding body; Auditing body; Other central body; Volunteer; Researcher; Other(s); Part of current remit; New remit; Research interest; Student project; Dissemination; Provides benefits to community; Maintain central database; Software development; Producing reports]
45
45 What Next? To summarise: Approach to the automated benchmarking of a small set of museum Web sites has been shown Implications of the findings have been discussed There are limitations of the methodology It is suggested that: Despite the limitations benchmarking of museum Web sites can be beneficial: Community building Learning from successes and mistakes There may be advantages in carrying out this work within the community
46
46 Questions Any questions? Questions For You Would further work be useful? Who would do the work? Is there a need for a portal for use by the community of museum Web managers as well as for end users? Anyone interested in joint work in this area (possibilities of a paper for a conference - e.g. Museums and the Web 2002 conf. - proposals needed by 30 Nov)?