Analyzing EZproxy logs with ezPAARSE
Kat Greer, Georgia Gwinnett College, kgreer1@ggc.edu
Hi, my name is Kat Greer, and I am the Systems Librarian at Georgia Gwinnett College. I'm here today to talk to you about ezPAARSE, software that can analyze your EZproxy logs to provide statistics about electronic resource usage.
ezPAARSE
Free and open source software
Log analyzer and associated knowledgebase
Developed by French consortium COUPERIN (http://www.couperin.org)
Piloted at University of Lorraine
2014 LIBER award
Documentation: http://analogist.couperin.org/ezpaarse/doc/usage_en
GitHub: https://github.com/ezpaarse-project/ezpaarse
So what is ezPAARSE? It's free and open source software consisting of a log analyzer and its associated knowledgebase. It was developed by COUPERIN, a French nationwide consortium created to consolidate the evaluation, negotiation, and purchase of electronic resources. Its membership is varied, including universities, research institutions, hospitals, and other institutions. ezPAARSE was first piloted at the University of Lorraine in France. It hasn't received much recognition on this side of the Atlantic, but it has had some success in Europe, including a summary of the software that won a LIBER award in 2014. While the documentation is primarily in French, some English documentation is available, and there is always Google Translate. I have also provided the link to the ezPAARSE GitHub project site for those of you who are more technically inclined.
ezPAARSE: Software, Log Analyzer, Parser, Multi-platform
Analogist: Wiki Portal, Collaborative Space, Hosts an ezPAARSE installation
Workbook: Macro-enabled workbook, Visualizations, Charts and graphs
There are three main entities that make this process work: ezPAARSE, Analogist, and a macro-enabled workbook (.xlsm). ezPAARSE is the software component, which you can either install locally (it's multi-platform, so it runs on Windows, Linux, or Mac) or use as the hosted (SaaS) version provided by Analogist. ezPAARSE takes your EZproxy logs and parses them, which means it transforms the data in the logs into data it can interpret. It takes each line of the log, separates it into discrete fields such as date and time, platform accessed, format accessed, and login used, and delimits them (writes them out as separate columns). It also enriches the log data with information about known publishers: beyond separating and organizing the log data, it draws on its knowledgebase of platform analyses to give that data meaning. For example, it dissects a URL request into its parts and can recognize the platform, format, type of resource, ISSN, and so on. The second component is Analogist, which serves as the wiki portal and collaborative space for sharing analyses of publisher platforms. So ezPAARSE is the software that analyzes your EZproxy log files, and it can do so because of the knowledge collected in Analogist. The final component is the macro-enabled workbook, which is available as a Microsoft Excel or LibreOffice file. It takes the parsed output that ezPAARSE gives you and, through a series of macros (predefined steps automated with Visual Basic for Applications, or VBA), transforms it into charts and graphs that are easier to interpret. It provides the visual representation of the data from your log files.
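To make the idea of "dissecting a URL request" more concrete, here is a minimal sketch of that kind of enrichment. It is not how ezPAARSE's own platform parsers are written; the KNOWN_PLATFORMS table, the extension-based format rule, and the output field names are all hypothetical stand-ins for what the Analogist knowledgebase actually provides.

```python
import re
from urllib.parse import urlsplit

# HYPOTHETICAL knowledgebase: host suffix -> platform code.
# Real ezPAARSE platform parsers are maintained through Analogist;
# this entry is invented for illustration only.
KNOWN_PLATFORMS = {"somedb.com": "somedb"}

# ISSN pattern: four digits, hyphen, three digits, check digit (0-9 or X).
ISSN_RE = re.compile(r"\b(\d{4}-\d{3}[\dXx])\b")


def analyze_url(url):
    """Split a requested URL into a few usage-event fields."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname drops the port and lowercases

    # Match the host (or any subdomain of it) against the known platforms.
    platform = None
    for suffix, code in KNOWN_PLATFORMS.items():
        if host == suffix or host.endswith("." + suffix):
            platform = code
            break

    # Guess the format from the file extension (an invented rule).
    path = parts.path.lower()
    if path.endswith(".pdf"):
        mime = "PDF"
    elif path.endswith((".html", ".htm")):
        mime = "HTML"
    else:
        mime = None

    # Look for an ISSN anywhere in the path or query string.
    found = ISSN_RE.search(parts.path + "?" + parts.query)
    issn = found.group(1).upper() if found else None

    return {"domain": host, "platform": platform, "mime": mime, "issn": issn}


if __name__ == "__main__":
    print(analyze_url("http://www.somedb.com:80/journals/1234-5678/article42.pdf"))
```

A URL whose host is not in the table comes back with platform set to None, which mirrors the real limitation that ezPAARSE can only make sense of platforms someone has already analyzed.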
Here I have a visual representation of the entire process. It starts with the user, who requests access to a resource. If they are off campus, they are required to authenticate via LDAP or some other method. Once they are authenticated, the proxy server directs their traffic to the requested resource. EZproxy records the information it processes in this transaction in its log files (what it records can be customized, which we will talk about in a minute). This is where ezPAARSE comes in: it parses the log file(s) with the aid of the available publisher knowledgebases and converts the data into "usage events". For more information on KBART (Knowledge Bases And Related Tools), see http://www.uksg.org/kbart/endorsement. These usage events are then transformed into a visual representation. Voilà! Image used with permission of ezPAARSE
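The publisher knowledgebase step can be pictured as a lookup: once a usage event carries a publisher's title identifier, a KBART-style file supplies the journal title and ISSNs. The sketch below assumes that arrangement; the sample rows, the "jex" identifier, and the enrich() helper are invented, and only the column names publication_title, print_identifier, online_identifier, and title_id are taken from KBART.

```python
import csv
import io

# INVENTED, trimmed sample; real KBART files carry many more columns,
# but DictReader only needs the header names used below.
KBART_SAMPLE = (
    "publication_title\tprint_identifier\tonline_identifier\ttitle_id\n"
    "Journal of Examples\t1234-5678\t8765-4321\tjex\n"
)


def load_kbart(text):
    """Index tab-separated KBART rows by their title_id column."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    return {row["title_id"]: row for row in reader if row.get("title_id")}


def enrich(event, kbart_index):
    """Return a copy of the event with title and ISSNs filled in, if known."""
    row = kbart_index.get(event.get("title_id"), {})
    return dict(
        event,
        publication_title=row.get("publication_title"),
        print_identifier=row.get("print_identifier"),
        online_identifier=row.get("online_identifier"),
    )


if __name__ == "__main__":
    index = load_kbart(KBART_SAMPLE)
    print(enrich({"platform": "somedb", "title_id": "jex"}, index))
```

QUOTE_NONE is used because KBART files are plain tab-separated text, so a quotation mark inside a title should be kept as an ordinary character.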
About Your Logs
Records any data that the server processes
Do you have access?
Do you know your LogFormat?
OCLC documentation: http://www.oclc.org/support/services/ezproxy/documentation/cfg/logformat.en.html
Example
LogFile ezp.log
LogFormat %h %l %u %t "%r" %s %b
132.174.1.1 - - [14/Mar/2014:09:39:18 -0700] "GET http://www.somedb.com:80/index.html HTTP/1.0" 200 1234
Before we move on to a demonstration
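Knowing your LogFormat matters because it determines how each log line has to be split. As an illustration, here is a sketch of a parser for the directive shown above, %h %l %u %t "%r" %s %b, checked against the example line. It only handles this one format; if your config.txt uses a different LogFormat, the pattern would need to change to match.

```python
import re
from datetime import datetime

# One named group per directive in: %h %l %u %t "%r" %s %b
LOG_RE = re.compile(
    r'^(?P<host>\S+) '                                      # %h client IP
    r'(?P<ident>\S+) '                                      # %l typically "-"
    r'(?P<user>\S+) '                                       # %u username or "-"
    r'\[(?P<time>[^\]]+)\] '                                # %t timestamp
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]*)" '  # "%r" request line
    r'(?P<status>\d{3}) '                                   # %s HTTP status
    r'(?P<bytes>\d+|-)$'                                    # %b bytes sent
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    # %t looks like 14/Mar/2014:09:39:18 -0700 (English month abbreviation).
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    rec["bytes"] = 0 if rec["bytes"] == "-" else int(rec["bytes"])
    return rec


if __name__ == "__main__":
    sample = ('132.174.1.1 - - [14/Mar/2014:09:39:18 -0700] '
              '"GET http://www.somedb.com:80/index.html HTTP/1.0" 200 1234')
    print(parse_line(sample))
```

Note that %b in strptime depends on the locale, so this sketch assumes the default English (C) locale when reading month names such as "Mar".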
An Example Process
Thoughts and Considerations
Benefits
Uses locally created logs
On-demand source for usage statistics
Easy to implement
Virtually effortless
Visual representation
COUNTER-like reports (JR1: full-text article requests by journal and month)
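A JR1-style report is essentially a count of full-text article requests grouped by journal and month. The sketch below shows that aggregation over usage events. The event keys (title, datetime, rtype, mime) and the values "ARTICLE", "TOC", "PDF", and "HTML" are assumptions chosen for illustration, and the sketch deliberately leaves out COUNTER rules such as double-click filtering, which is part of why the results are only "COUNTER-like".

```python
from collections import Counter, defaultdict


def jr1_like(events):
    """Count full-text article requests per journal per month."""
    counts = Counter()
    for ev in events:
        # Assumed rule: only article-level PDF or HTML views count.
        if ev["rtype"] != "ARTICLE" or ev["mime"] not in ("PDF", "HTML"):
            continue
        month = ev["datetime"][:7]  # "YYYY-MM" from an ISO 8601 timestamp
        counts[(ev["title"], month)] += 1

    # Pivot into one row per journal: {title: {month: count}}
    table = defaultdict(dict)
    for (title, month), n in sorted(counts.items()):
        table[title][month] = n
    return dict(table)


if __name__ == "__main__":
    # Invented sample events.
    events = [
        {"title": "Journal of Examples", "datetime": "2014-03-14T09:39:18",
         "rtype": "ARTICLE", "mime": "PDF"},
        {"title": "Journal of Examples", "datetime": "2014-03-20T11:02:00",
         "rtype": "ARTICLE", "mime": "HTML"},
        {"title": "Journal of Examples", "datetime": "2014-04-02T08:15:00",
         "rtype": "TOC", "mime": "HTML"},
        {"title": "Another Journal", "datetime": "2014-04-03T10:00:00",
         "rtype": "ARTICLE", "mime": "PDF"},
    ]
    print(jr1_like(events))
```

The table-of-contents view in the sample is excluded, so Journal of Examples shows two requests for 2014-03 and Another Journal shows one for 2014-04.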
Drawbacks
Still in development
Not a perfect solution
Only tracks off-campus usage
Not all-encompassing
Only parses information known to the Analogist knowledgebase
Doesn't recognize certain platforms
Overall Impressions
Value of locally gathered usage statistics
Software potential (still in development)
Value of using ezPAARSE for GGC? Still a work in progress
Other options: AWStats, Python scripts, homegrown solution
Use data your way
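As an example of the "Python scripts / homegrown solution" option, a few lines are enough to get rough monthly request counts per domain straight from EZproxy logs. This is a hypothetical script, not part of ezPAARSE. It assumes the default %h %l %u %t "%r" %s %b format shown earlier, and because it counts every request (images, stylesheets, and so on), its numbers will be much higher than a usage-event count.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Captures month and year from %t, then the URL from the "%r" request line.
LINE_RE = re.compile(r'\[\d{2}/(\w{3})/(\d{4}):[^\]]*\] "\S+ (\S+)')


def requests_per_domain(lines):
    """Count raw requests per (year-month, domain) pair."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # skip lines that do not fit the assumed format
        mon, year, url = m.groups()
        host = urlsplit(url).hostname or "unknown"
        counts[(f"{year}-{mon}", host)] += 1
    return counts


if __name__ == "__main__":
    with open("ezp.log", encoding="utf-8", errors="replace") as log:
        for (month, host), n in requests_per_domain(log).most_common(20):
            print(month, host, n)
```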
Questions?