Understanding library users you don't see: Techniques for tracking and analyzing library Web resources
Saturday, June 24
Marshall Breeding, Director for Innovative Technologies and Research, Vanderbilt University
Theme
For many libraries, the number of visitors to their Web site and electronic resources exceeds the number who visit their physical premises. It's vital for libraries to understand how these remote visitors approach the Web site, not only to measure use but also to improve the resources themselves. Marshall Breeding will present a number of practical techniques that libraries can use to better understand the use of their Web-based resources. Topics will include the basics of analyzing the server logs of the library's Web site, transaction logs from the OPAC, the complexities of measuring use of subscription-based electronic resources, and techniques for enhancing applications to better record how they are used.
Understanding remote users
- Vital to providing relevant library services
- More patrons may use library resources remotely through the Web than in physical library facilities
- Libraries must work harder to ensure that Web-based services meet patron needs
- Move beyond hit counters and raw statistics to more sophisticated analysis and assessment
Analysis goals
- Improve usability
- Web site diagnostics
- Understand user needs
- Content selection decisions
- Improve quality of service
- Marketing
- Budget justification
- Strategy to increase interest and activity
Data sources for tracking remote use
- Web server logs
- Application logs
- Remote tracking data (Google Analytics)
- Vendor-provided use statistics (e-resources)
Enterprise approach to analytics
- Multiplicity of resources to track: Web servers, OPACs, e-resources, databases, repositories
- Important to track the flow of use among all the library’s Web-based resources
- Beyond the library: study the flow to and from higher-level Web sites and portals (University -> Courseware -> Library)
Web server logs
Web servers are routinely configured to record detailed information about each request. Common elements include:
- File requested
- Date / time stamp
- Status code
- Request method (GET, POST, HEAD)
- Referrer (where the user came from)
- User agent (browser and platform data)
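A minimal Python sketch of pulling those fields out of a single log line follows. It assumes the server writes the NCSA Combined Log Format (a common Apache configuration); other servers, such as IIS with W3C extended logging, lay the fields out differently, so the regular expression would have to change. The names `COMBINED` and `parse_line` are illustrative, not part of any particular tool.

```python
import re
from datetime import datetime
from typing import Optional

# NCSA Combined Log Format (assumed):
# host ident authuser [timestamp] "METHOD path protocol" status bytes "referrer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line: str) -> Optional[dict]:
    """Return the fields of one Combined Log Format line, or None if it does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # Timestamps look like 24/Jun/2006:10:01:02 -0500
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    # "-" means no response body was sent
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    return rec
```

Each parsed record can then be filtered, grouped into sessions, or tallied by referrer or user agent.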
Example Web log: raw data for the analysis process
:01: GET /index.pl c hsd1.md.comcast.net Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR ) e=off&q=september+11+television+archive
Exploiting referral data
- The query-string component of the referrer can be parsed to reveal search terms and other useful information
- Referrer fragment: off&q=september+11+television+archive
- The user typed “september 11 television archive” into Google to find our site
- It is important to study how users get to your site (example: TV News public Web queries vs. OpenWeb)
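The parsing step can be sketched in a few lines of Python with the standard `urllib.parse` module. Google carries the search phrase in the `q` parameter, as in the referrer above; the other parameter names in `SEARCH_PARAMS` are assumptions for illustration and should be checked against each engine you care about.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Query-string parameters that may carry search terms. "q" is Google's;
# the others are illustrative assumptions.
SEARCH_PARAMS = ("q", "p", "query")


def search_terms(referrer: str) -> Optional[str]:
    """Return the search phrase embedded in a search-engine referrer, if any."""
    params = parse_qs(urlsplit(referrer).query)  # decodes '+' and %XX escapes
    for name in SEARCH_PARAMS:
        if name in params:
            return params[name][0]
    return None
```

Applied to a Google referrer ending in `safe=off&q=september+11+television+archive`, the function returns the phrase with spaces restored, ready to be tallied alongside other queries.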
Analysis methodology
- Go beyond simply counting pages
- Identify sessions
- Categorize users
- Determine use patterns
- Measure interest: time spent on Web site, bounce rate, page overlay analysis
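One common way to identify sessions in anonymous log data, sketched here in Python, is to group requests by client and start a new session after a period of inactivity. The client key (for example IP address plus user agent) and the 30-minute timeout are assumptions; the timeout is a widely used convention rather than a rule from this presentation. Bounce rate then falls out as the share of single-page sessions, and time on site as the span between a session's first and last request, which cannot capture time spent on the final page.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

# One request: (client key, timestamp, path). The client key is an assumption,
# e.g. "IP address|user agent" when users are not logged in.
Request = Tuple[str, datetime, str]
Session = List[Request]


def sessionize(requests: Iterable[Request],
               timeout: timedelta = timedelta(minutes=30)) -> List[Session]:
    """Split each client's requests into sessions wherever the idle gap exceeds timeout."""
    by_client: Dict[str, List[Request]] = defaultdict(list)
    for req in requests:
        by_client[req[0]].append(req)
    sessions: List[Session] = []
    for reqs in by_client.values():
        reqs.sort(key=lambda r: r[1])
        current = [reqs[0]]
        for prev, cur in zip(reqs, reqs[1:]):
            if cur[1] - prev[1] > timeout:
                sessions.append(current)
                current = []
            current.append(cur)
        sessions.append(current)
    return sessions


def bounce_rate(sessions: List[Session]) -> float:
    """Share of sessions that requested only one page."""
    if not sessions:
        return 0.0
    return sum(1 for s in sessions if len(s) == 1) / len(sessions)


def mean_time_on_site(sessions: List[Session]) -> timedelta:
    """Average span from first to last request; single-page sessions count as zero."""
    if not sessions:
        return timedelta(0)
    total = sum((s[-1][1] - s[0][1] for s in sessions), timedelta(0))
    return total / len(sessions)
```

For example, a client that makes two requests five minutes apart and returns two hours later yields two sessions, the second of which counts as a bounce.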
Move from measurement to impact
- Establish site goals
- Benchmark current use
- Implement goal-oriented improvements
- Measure impact
- Repeat as needed
(Example: enhancement of TV News OpenWeb)
Appropriate data filtering
- Requests from indexing bots (crawlers) can skew statistics; count user requests and bot requests separately
- Other automated traffic to account for: performance monitors, link checkers
- Monitoring crawler activity is an important component of SEO and Web site discoverability strategies
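A simple way to keep user and bot requests in separate counts is to match tell-tale substrings in the user-agent field, as in this Python sketch. The marker list is an illustrative assumption and far from exhaustive; user agents can also be spoofed, so production filtering usually combines a maintained robot list with other signals such as requests for `/robots.txt`.

```python
from typing import Iterable, Tuple

# Illustrative substrings seen in crawler, monitor, and link-checker user agents.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp", "linkchecker", "monitor")


def is_bot(user_agent: str) -> bool:
    """Heuristically flag automated clients by their user-agent string."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def split_traffic(user_agents: Iterable[str]) -> Tuple[int, int]:
    """Return (user_requests, bot_requests) so the two can be reported separately."""
    users = bots = 0
    for ua in user_agents:
        if is_bot(ua):
            bots += 1
        else:
            users += 1
    return users, bots
```

Keeping the bot count rather than discarding it also supports the crawler monitoring that SEO work depends on.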
Resource discovery
- How do users get to your site?
- Track performance of the Web site relative to major search engines
- SEO: search engine optimization
- Few users begin with library Web sites
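To see how visitors arrive, referrers can be sorted into a few buckets, sketched below in Python. `OWN_HOST` and `SEARCH_DOMAINS` are hypothetical values to replace with your own site and the engines you track (country domains such as google.co.uk would need adding). Applying the classifier to the first request of each session, rather than to every request, keeps internal navigation from dominating the counts.

```python
from collections import Counter
from typing import Iterable
from urllib.parse import urlsplit

# Hypothetical values: substitute your own host name and the engines you track.
OWN_HOST = "library.example.edu"
SEARCH_DOMAINS = ("google.com", "yahoo.com", "msn.com", "ask.com")


def referral_source(referrer: str) -> str:
    """Classify a referrer as direct, internal, search engine, or other site."""
    if referrer in ("", "-"):
        return "direct"  # typed URL, bookmark, or referrer not sent
    host = (urlsplit(referrer).hostname or "").lower()
    if host == OWN_HOST:
        return "internal"
    # Match whole domain labels so that e.g. "task.com" is not mistaken for "ask.com"
    if any(host == d or host.endswith("." + d) for d in SEARCH_DOMAINS):
        return "search engine"
    return "other site"


def source_summary(entry_referrers: Iterable[str]) -> Counter:
    """Tally referral sources, ideally for the first request of each session."""
    return Counter(referral_source(r) for r in entry_referrers)
```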
Troubling statistic
Where do you typically begin your search for information on a particular topic? (College student responses)
- Search engines: 89% (Google: 62%)
- Library Web site: 2% (total respondents: 1%)
- Online database: 2%
- …: 1%
- Online news: 1%
- Online bookstores: 1%
- Instant messaging / online chat: 0%
Source: OCLC. Perceptions of Libraries and Information Resources (2005), p.
Library Discovery Model [diagram: Web; Library Web Site / Catalog; Library as search destination]
TV News OpenWeb project
- Dramatic increase in Web site activity and loan requests through systematic and controlled exposure of metadata to Google and other search engines
- SEO (search engine optimization) strategy
- Helped the Archive become financially self-sufficient
Examples of Web reporting and analysis tools
Selected utilities
- Analog: free, open source
- NetTracker: enterprise-level Web analysis application
- Google SiteMaps: process for submitting Web pages for optimized indexing by Google, with some assessment capabilities
- Google Analytics: sophisticated approach for measuring Web site performance
Analog
- Free, open source application
- Basic Web statistics application
- Includes a fairly full set of static metrics
- Command-line utility that generates a Web report
- Runs on Windows, Unix, Linux, etc.
NetTracker
- Unica Corporation
- Enterprise-level Web analytics
NetTracker screens: Executive Dashboard; Bandwidth Trends; Content; Keyword Summary; Referrers; Pages Viewed
Google SiteMaps
- XML specification for systematically submitting the URLs that make up a Web site
- Makes indexing more efficient but does not affect PageRank
- The SiteMaps interface provides utilities for monitoring how the site has been indexed, with some analytical information on the terms used to find your Web site
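Generating the sitemap file itself is straightforward, as this Python sketch using the standard `xml.etree.ElementTree` module shows. It uses the namespace of the shared sitemaps.org protocol (version 0.9); the original Google program used a Google-specific schema namespace, so check which one your target engine expects. The URLs and dates passed in would come from your own site; those used below are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

# Namespace of the sitemaps.org protocol, version 0.9.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: Iterable[Tuple[str, str]]) -> str:
    """Build a sitemap document from (URL, last-modified 'YYYY-MM-DD') pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes & as &amp;
        ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

For a catalog or archive driven by a database, the `(URL, date)` pairs would typically be generated from record identifiers, which is one way to expose metadata pages systematically to crawlers.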
Google SiteMaps screens: Top Searches; Page Analysis
Google Analytics
- Available at no cost from Google
- Must receive an invitation code
- Slanted toward e-commerce
- "Conversion University": training on how to optimize a Web site for high conversion rates
- Allows Webmasters to establish site goals and measure performance
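Goal tracking boils down to two measures that can also be computed from your own session data: the share of sessions that reach a goal page (the conversion rate) and how many sessions survive each step of a defined path (a funnel). The Python sketch below assumes sessions have already been reduced to ordered lists of page paths; the paths in the usage note are hypothetical.

```python
from typing import Iterable, List, Sequence


def conversion_rate(sessions: Sequence[Sequence[str]], goal_page: str) -> float:
    """Fraction of sessions (each a list of page paths) that reached goal_page."""
    if not sessions:
        return 0.0
    return sum(1 for pages in sessions if goal_page in pages) / len(sessions)


def funnel(sessions: Iterable[Sequence[str]], steps: Sequence[str]) -> List[int]:
    """For each step i, count sessions that visited steps[0..i] in that order."""
    counts = [0] * len(steps)
    for pages in sessions:
        reached = 0
        for page in pages:
            if reached < len(steps) and page == steps[reached]:
                reached += 1
        for i in range(reached):
            counts[i] += 1
    return counts
```

With a hypothetical loan-request path such as `/search`, `/request-loan`, `/loan-confirmed`, a sharp drop between two adjacent funnel counts points to the page where visitors abandon the process, which is where a goal-oriented improvement would be tried and then measured again.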
Google Analytics screens: Main; Overview; Browser Versions; Top Content; Entrance-Bounce Rates; Navigational Analysis; Goal Tracking
Application-level reporting and analysis
- Content management systems and other dynamically driven Web environments can provide usage information beyond what raw Web logs record
- More capabilities for identifying use by user category
- Reporting can be built into the business logic of the application
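Building reporting into the application can be as simple as writing one structured record per meaningful event, as in this Python sketch. It assumes the application already knows the user's category and institution (for example from authentication); the logger name and every field name are hypothetical. Recording a category rather than a personal identity is one way to keep such logs useful while limiting privacy exposure.

```python
import json
import logging
from datetime import datetime, timezone

# Hypothetical logger name; route it to a file with standard logging handlers.
usage_log = logging.getLogger("usage")


def log_event(action: str, user_category: str, institution: str, **details: str) -> str:
    """Write one usage event as a JSON line and return that line."""
    record = {
        "time": datetime.now(timezone.utc).isoformat(),
        "action": action,                # e.g. "search", "view", "loan_request"
        "user_category": user_category,  # a category, not an identity
        "institution": institution,
        **details,                       # e.g. search_type, query
    }
    line = json.dumps(record, sort_keys=True)
    usage_log.info(line)
    return line
```

A search handler might call `log_event("search", "faculty", "Example University", search_type="keyword", query="election coverage")` just before returning results.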
Examples from the TV News Web site
- Reports of use by user category and institution
- Statistics on resource use
- Data on search types, query terms, etc.
- Ability to track all aspects of business activity
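Reports like these can be produced by aggregating application event records. The Python sketch below assumes, hypothetically, that the application writes one JSON object per line with fields named `action`, `user_category`, `institution`, `search_type`, and `query`; adjust the names to whatever your application actually records.

```python
import json
from collections import Counter
from typing import Iterable


def usage_report(lines: Iterable[str], top_n: int = 10) -> dict:
    """Summarize JSON-line usage events by category, institution, search type, and query."""
    by_category: Counter = Counter()
    by_institution: Counter = Counter()
    search_types: Counter = Counter()
    queries: Counter = Counter()
    for line in lines:
        event = json.loads(line)
        by_category[event.get("user_category", "unknown")] += 1
        by_institution[event.get("institution", "unknown")] += 1
        if event.get("action") == "search":
            search_types[event.get("search_type", "unknown")] += 1
            # Normalize case so "Watergate" and "watergate" count together
            queries[event.get("query", "").strip().lower()] += 1
    return {
        "by_category": dict(by_category),
        "by_institution": dict(by_institution),
        "search_types": dict(search_types),
        "top_queries": queries.most_common(top_n),
    }
```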
Other sources of use data
- ILS OPAC logs
- Proxy server logs and reports
- Link resolver logs and reports
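Proxy server logs are one of the few local sources that show which subscription resources off-campus users reach. The Python sketch below assumes the proxy writes one request per line in a common-log-style format whose quoted request field holds the absolute URL of the target resource; rewriting proxies often log this way, but check your own configuration. Raw request counts include images and other page assets, so they indicate relative activity rather than figures comparable to vendor-supplied statistics.

```python
import re
from collections import Counter
from typing import Iterable
from urllib.parse import urlsplit

# The quoted request line, e.g. "GET http://host:80/path HTTP/1.1"; group 1 is the URL.
REQUEST_LINE = re.compile(r'"[A-Z]+ (\S+)[^"]*"')


def requests_by_resource(lines: Iterable[str]) -> Counter:
    """Count proxied requests per target host as a rough indicator of e-resource use."""
    counts: Counter = Counter()
    for line in lines:
        m = REQUEST_LINE.search(line)
        if m is None:
            continue
        host = urlsplit(m.group(1)).hostname
        if host:  # relative paths (the proxy's own pages) have no host; skip them
            counts[host] += 1
    return counts
```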
Limitations
- Can’t know the intent of the user
- User success can only be estimated
- Difficult to obtain trends by user type
- More aggressive reporting might intrude on privacy
- Few libraries require the level of user authentication needed to determine use by type of patron
Additional information
- Breeding, Marshall. Strategies for Measuring and Implementing E-use. ALA TechSource. May-June pages.
- Breeding, Marshall. “Analyzing Web server logs to improve a site’s usage.” Computers in Libraries. Information Today, Medford, NJ. October 2005.
Handout: the presentation will be available after the conference at: ons/ala2006.ppt