Data, Data Everywhere: Making Sense of the Sea of User Data
MaxData Project
Carol Tenopir and Donald W. King
Gayle Baker, UT Libraries
Eleanor Read, UT Libraries
Maribeth Manoff, UT Libraries
David Nicholas, Ciber, University College London
http://web.utk.edu/~tenopir/maxdata/index.htm
MaxData
“Maximizing Library Investments in Digital Collections Through Better Data Gathering and Analysis”
Funded by the Institute of Museum and Library Services (IMLS), 2005-2007
Study Objectives
- To compare different methods of data collection
- To develop a model that compares costs and benefits to the library of collecting and analyzing data from various methods
- To help libraries make the best use of data
Study Teams
- Surveys (UT and Ohio libraries)
- Library data reports, vendor-provided and library-collected (UT Libraries)
- Deep log analysis of raw journal usage data (Ciber and OhioLINK)
A bit more about the surveys…
Surveys
- Research Extensive: Case Western, UT
- Research Intensive: Akron, Findlay
- Master’s 1: Ashland, Malone
Three Types of Questions
- Demographic
- Recollection
- Critical (last) incident of reading
Critical Incident Added to General Survey Questions
- Specific (last incident of reading)
- Includes all reading: electronic and print, library and personal
- Detailed questions about the last article read, e.g., purpose, value, time spent, format, how located, source
- Last reading = a random sample of readings
- Allows detailed analysis
What Surveys Answer That Logs Do Not
- Non-library readings
- Print as well as electronic readings
- Purpose and value of readings
- Outcomes of readings
Surveys provide much useful data, but…
- Surveys rely on memory and truthfulness
- Response rates are falling
- Surveys cost your users’ time
- Surveys can only be done occasionally
- Log reports and raw logs show usage
Local Sources of Use Data
- Local log data for databases
- Vendor-supplied usage reports
- Other sources of data
Local Log Data: Database Use
- Environment: mixture of web-based and locally loaded resources
- Problem: use data from vendors not available or not uniform
- Solution: log requests for databases from the library’s database menu (1999- )
Local Log Data: Process
- MySQL and Perl CGI scripts
- Log files compiled monthly
- Process data with Excel and SAS
- Extract, reformat, summarize, graph (a rough sketch of this step follows below)
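The slides describe a pipeline of Perl CGI scripts and MySQL, with Excel and SAS doing the monthly summarizing. As a minimal illustration of that summarizing step only, here is a Python sketch; it assumes the month's menu requests have been exported to a CSV with hypothetical timestamp, ip, and database columns, which is not the project's actual schema.

```python
import csv
from collections import Counter
from datetime import datetime

def summarize_month(log_path):
    """Summarize one month of database-menu request logs.

    Assumes a simple CSV export with columns: timestamp, ip, database
    (illustrative field names, not the library's real log layout).
    """
    by_database = Counter()
    by_day = Counter()
    with open(log_path, newline="") as f:
        for row in csv.DictReader(f):
            when = datetime.fromisoformat(row["timestamp"])
            by_database[row["database"]] += 1   # requests per database
            by_day[when.date()] += 1            # requests per day
    return by_database, by_day

if __name__ == "__main__":
    dbs, days = summarize_month("menu_requests_2004_10.csv")  # hypothetical file name
    for name, count in dbs.most_common(10):
        print(f"{name}\t{count}")
```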
Uses of Local Log Data
- Subscription management
  - Number of simultaneous users
  - Pattern of use of a database over time
  - Continuation decisions
  - Cost per request (illustrated below)
- Services management
  - Use patterns by day, week or semester
  - Location of users (campus, off-campus, wireless)
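Cost per request is the simplest of these figures: the annual subscription cost divided by the number of logged requests for that database. A one-line illustration with made-up numbers:

```python
# Illustrative figures only: a database costing $12,000/year with 3,000 logged requests
annual_cost, annual_requests = 12_000, 3_000
print(f"Cost per request: ${annual_cost / annual_requests:.2f}")  # Cost per request: $4.00
```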
Local Log Data: Issues
- Logs requests for access, not sessions
- No detail on activity once in the database
- Undercounts: aggregators and full-text collections, bookmarked access, metasearch
- Other sources of usage data supplement log data
Vendor-Supplied Usage Reports
- Little post-processing of vendor data until 2002
- Made available upon request
- Special attention to “big ticket” items
  - Full text
- Integrate subscription info with vendor data
Vendor-Supplied Usage Reports: Additional Processing
- ARL Supplemental Statistics
- Use data for electronic resources requested:
  - Number of logins (sessions)
  - Number of queries (searches)
  - Number of items requested
- Fiscal year: July ’04 – June ’05 (a roll-up sketch follows below)
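Vendor reports arrive in many shapes, so most of the effort goes into normalizing them; once the monthly figures are in one table, rolling them up to the July ’04 – June ’05 fiscal year is straightforward. A minimal sketch, assuming a hypothetical normalized CSV with vendor, month, sessions, searches, and items columns:

```python
import csv
from collections import defaultdict
from datetime import date

FY_START = date(2004, 7, 1)   # fiscal year July '04 ...
FY_END = date(2005, 6, 30)    # ... through June '05

def fiscal_year_totals(report_path):
    """Roll monthly vendor-report rows up to fiscal-year totals per vendor.

    Assumes the monthly reports have already been normalized into one CSV
    with columns: vendor, month (YYYY-MM), sessions, searches, items.
    Real vendor reports vary widely, so that normalization is the hard part.
    """
    totals = defaultdict(lambda: {"sessions": 0, "searches": 0, "items": 0})
    with open(report_path, newline="") as f:
        for row in csv.DictReader(f):
            year, month = map(int, row["month"].split("-"))
            first_of_month = date(year, month, 1)
            if FY_START <= first_of_month <= FY_END:
                for key in ("sessions", "searches", "items"):
                    totals[row["vendor"]][key] += int(row[key] or 0)
    return totals
```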
Vendor Reports to Review (University of Tennessee)
- Reports from 28 of 45 vendors listed as compliant with Release 1 of the COUNTER Code of Practice
- Reports from 26 other vendors
The Challenge of Vendor-Supplied Use Reports
- Request mode
- Delivery
- Format
- Time period
- Subscribed / titles used / all titles
Other Sources – Link Resolvers (e.g., SFX)
- Goes past the database level to access of individual journals (a counting sketch follows below)
- Use is measured the same way across packages
- Useful where vendor reports are unavailable or incomplete (Open Access, backfiles)
- The more places SFX links are used (catalog, e-journal list), the more complete the data
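One way to get journal-level counts from a link resolver is to tally the OpenURL requests it receives. A sketch, assuming KEV-style OpenURLs that carry rft.jtitle or rft.issn; in practice an SFX statistics export would more likely be used, and its layout differs by installation.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

def journal_clickthroughs(urls):
    """Count link-resolver requests per journal from a list of OpenURL request URLs.

    Assumes OpenURL 1.0 KEV keys (rft.jtitle / rft.issn); a local resolver log
    may use different fields.
    """
    counts = Counter()
    for url in urls:
        params = parse_qs(urlparse(url).query)
        journal = (params.get("rft.jtitle") or params.get("rft.issn") or ["unknown"])[0]
        counts[journal] += 1
    return counts
```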
Other Sources – Metasearch Engines (e.g., MetaLib)
- “Number of searches” data that may not be counted in vendor reports (Z39.50)
- Most useful and interesting for seeing how patrons are using federated searching
Other Sources – Proxy Servers (e.g., EZProxy)
- The standard web log format captures data for every request to the server; this generates large logs that have to be analyzed (a parsing sketch follows below)
- Some libraries send all users (not only remote users) through the proxy server for more complete log data
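A proxy log in the standard (NCSA-style) web log format can be reduced to per-host request counts with a short script. The sketch below assumes that format and that the proxied URL still carries the publisher's hostname; an actual EZProxy installation may be configured to log differently.

```python
import re
from collections import Counter

# NCSA common/combined log format, the default of many proxy and web servers;
# local configurations may differ.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+)'
)

def top_hosts(log_path, n=20):
    """Count successful proxied requests per target host from a web-style access log."""
    hosts = Counter()
    with open(log_path, errors="replace") as f:
        for line in f:
            m = LOG_LINE.match(line)
            if m and m.group("status").startswith("2"):
                url = m.group("url")
                # proxied URLs usually carry the publisher's hostname
                host = url.split("/")[2] if "://" in url else "local"
                hosts[host] += 1
    return hosts.most_common(n)
```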
OhioLINK Deep Log Analysis (DLA) Showcase
- Choice of OhioLINK: the oldest “big deal”, a common publisher platform, and a source of interesting data
- Two purposes: 1) to show what kinds of data DLA can generate; 2) to raise the questions that need to be asked
- Raw server logs of off-campus use from June to December ’04 (to pick up returnees) and on-campus use for October
- The logs uniquely contained search and navigational behaviour, too
Metrics
- Four ‘use’ metrics employed: number of items/pages viewed, number of sessions conducted, number of items viewed in a session (site penetration), and amount of time spent online (a sessionizing sketch follows below)
- An ‘item’ might be a list of journals (subject or alphabetic), a list of journal issues, a contents page, an abstract, or a full-text article
- Search or navigational approach used (search engine, subject list of journals, etc.)
- Users: returnees; by subject of journal and sub-net; name and type of institution
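Sessions and site penetration have to be derived from the raw requests, typically by grouping hits from the same IP address and starting a new session after a period of inactivity. A minimal sketch, assuming a 30-minute timeout (the study's actual cutoff may differ):

```python
from datetime import timedelta

SESSION_GAP = timedelta(minutes=30)  # assumed inactivity timeout

def sessionize(requests):
    """Group (ip, timestamp) request records into sessions per IP address.

    `requests` is an iterable of (ip, datetime) pairs sorted by time.
    Returns one dict per session with the ip, item count (site penetration)
    and duration in seconds. Sessions are keyed on IP, so multi-user machines
    and floating IPs blur the picture, as the slides note.
    """
    open_sessions = {}
    finished = []

    def close(ip):
        s = open_sessions.pop(ip)
        finished.append({"ip": ip, "items": s["items"],
                         "seconds": (s["last"] - s["first"]).total_seconds()})

    for ip, ts in requests:
        s = open_sessions.get(ip)
        if s and ts - s["last"] <= SESSION_GAP:
            s["items"] += 1     # another item viewed in the same session
            s["last"] = ts
        else:
            if s:
                close(ip)       # gap too long: close the old session
            open_sessions[ip] = {"first": ts, "last": ts, "items": 1}
    for ip in list(open_sessions):
        close(ip)
    return finished
```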
Is the resource being used?
- Items viewed: 1,215,000 items viewed on campus (1 month) and 1,894,000 items viewed off campus (7 months)
- Titles used: journals available October 2004 = 5,872
- 5,868 journals used if contents lists, abstracts and articles are included; 5,193 if only articles are included
- 5% of journals accounted for 38% of usage; 10% for 53%; and 50% for 93% (the calculation is sketched below)
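Figures like “5% of journals accounted for 38% of usage” come from ranking journals by items viewed and summing the top slice. A small sketch of that calculation, assuming a mapping from journal title to items viewed:

```python
def usage_concentration(counts, fractions=(0.05, 0.10, 0.50)):
    """Share of total usage accounted for by the top X% of journals.

    `counts` maps journal title -> items viewed; purely illustrative of the
    concentration figures quoted on this slide.
    """
    ranked = sorted(counts.values(), reverse=True)
    total = sum(ranked)
    shares = {}
    for frac in fractions:
        top_n = max(1, int(len(ranked) * frac))   # size of the top slice
        shares[frac] = sum(ranked[:top_n]) / total
    return shares
```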
Is the resource being used?
- Number of journals viewed in a session: very pertinent, since OhioLINK is all about massive choice
- A third of sessions saw no views of any items associated with a particular journal
- Of the two-thirds of sessions recording a journal item view, half viewed item(s) from 1 journal, 30% from 2 to 3 journals, 14% from 4 to 9 journals, and 7% from 10 or more
- 49% of sessions saw a full-text article viewed, and the average number of articles viewed in a session was just over 2
Is the resource being used?
- Site penetration: 23% viewed 1 item in a session, 40% viewed 2 to 4 items, 21% viewed 5 to 10 items, 9% viewed 11 to 20, and 7% viewed 21 or more
- These figures are quite impressive compared to other digital libraries: in the case of EmeraldInsight, 42% of users viewed just one item
- Due to the greater level of download freedom offered by OhioLINK?
Is the resource being used?
- Returnees (off campus): 73% accessed OhioLINK journals once during the seven months (they might have also used OhioLINK on campus)
- 22% came back 2 to 5 times, 3% came back 6 to 15 times, and 2% more than 15 times
- Data compromised by floating IP addresses and multi-user machines
What can we learn about the methods used to find articles?
- Search engine popularity: 41% of sessions saw only the search engine being used, and a further 23% saw the engine used together with either the alphabetic or subject lists
- Users of the engine were more likely to look at a wider range of journals: 66% of those using the search engine viewed 2 or more journals, compared to 43% of those using either the alphabetic or subject lists
- People using all three methods were the most likely to view 10 or more different journals; nearly 1 in 5 did so
What can we learn about the methods used to find articles?
- Users of the engine were also more likely to look at a wider range of subjects: 54% had viewed two or more subjects, compared to 41% of those whose sessions saw use of an alphabetic or subject list
- Older material: search engine users viewed older material, while those accessing the service via the alphabetic or subject lists were more likely to view very current material
Issues
- This is only pilot data
- Caching means not all transactions are recorded in the logs
- We are studying the usage patterns of a given IP address, not a given user, with the consequent problems that arise from multi-user machines, proxy servers and floating IP addresses
- There are problems with calculating session time
- However: 1) we use a number of metrics; 2) the logs will be corroborated by survey techniques; 3) we have three years to perfect our techniques!
References
- Nicholas, D., Huntington, P., Russell, B., Watkinson, A., Jamali, H. R., and Tenopir, C. The big deal: ten years on. Learned Publishing, 18(4), October 2005, pp. ??
- Nicholas, D., Huntington, P., Jamali, H. R., and Tenopir, C. Journal of Documentation, 62(2), 2006, pp. ??
- Nicholas, D., Huntington, P., Jamali, H. R., and Tenopir, C. Finding information in (very large) digital libraries: a deep log approach to determining differences in use according to method of access. Journal of Academic Librarianship, March 2006, pp. ??