DSpace Statistics Graham Triggs Head of Repository Systems, Symplectic.


A Brief History

Statistics in DSpace 1.0

Statistics in DSpace 1.1
- This slide is left intentionally blank

Statistics in DSpace 1.2
- If I’m honest, this is just padding

Statistics in DSpace 1.3

Classic Statistics
- Shows items archived, views, searches
- Parses dspace.log
- Renders flat HTML files
- Uses two scripts, which must be scheduled
- Reports can be public or admin only

Classic Statistics – Config
- All configuration in [dspace]/config/dstat.cfg (overview and search exclusions)
- Displays:
  - Overview
  - Archive breakdown (item types)
  - Items viewed
  - Actions (Deletion, Update, Create, etc.)
  - Logins
  - Searches (keywords)
- Action names in [dspace]/config/dstat.map

Classic Statistics – Issues
- dspace.log is primarily for debugging
- May not log all information required
- May log lots of unnecessary information
- Size of log files
- One log line does not equal a single access
- No filtering of spiders, robots, etc.
- Log parsing may take some time
- Slow to update stats

Fast Forward: DSpace 1.6

Solr Statistics
- Available for JSP and XML UIs
- Event logger writes to Apache Solr
- Filters spiders by IP address
- Reports are searches of usage data
- Reports can be public or admin only

Solr Stats - What is Indexed
- Time
- Type (item, bitstream, etc.), Id
- Owning Community, Owning Collection, Owning Item
- IP, Continent, Country, City, Longitude / Latitude
- Eperson Id, User Agent
- Flag to indicate Robot / Spider
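
The idea of filtering spiders by IP address can be pictured with a short sketch. This is not DSpace's own implementation: the address ranges, the event dictionaries, and the helper names `is_spider` and `filter_events` are all hypothetical and chosen only for illustration.

```python
import ipaddress

# Hypothetical spider ranges (documentation-reserved networks, not real
# crawler addresses). A real deployment would load a downloaded list.
SPIDER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_spider(ip: str) -> bool:
    """Return True if the address falls inside any listed spider range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in SPIDER_NETWORKS)


def filter_events(events):
    """Keep only usage events whose 'ip' is not a listed spider address."""
    return [event for event in events if not is_spider(event["ip"])]


if __name__ == "__main__":
    sample = [
        {"ip": "192.0.2.15", "type": "item"},        # inside a spider range
        {"ip": "203.0.113.7", "type": "bitstream"},  # ordinary visitor
    ]
    print(filter_events(sample))
```

Matching against network ranges rather than single addresses keeps the list short, but it only catches crawlers that are already on the list.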

Solr Stats – Home
- Top 10 items

Solr Stats – Community
- Total visits
- Visits last 7 months
- Top 10 Countries
- Top 10 Cities

Solr Stats – Collection
- Total visits
- Visits last 7 months
- Top 10 Countries
- Top 10 Cities

Solr Stats – Item
- Total visits
- Total file views
- Visits last 7 months
- Top 10 Countries
- Top 10 Cities

Solr Stats – Config (v1.6)
- [dspace]/config/dspace.cfg
  - solr.log.server
    - Location of Solr server / application
  - solr.dbfile
    - Location of Geo database
  - solr.spiderips.url
    - URLs to download IP addresses of search spiders
  - useProxies
    - Client identification when hosted behind proxy
  - solr.query.filter.spiderIp
    - Filter out spider IP addresses in query
  - solr.query.filter.isBot
    - Filter out ‘isBot’ field in query
  - statistics.item.authorization.admin
    - Set to ‘true’ to restrict to admins, ‘false’ for public access

Solr Stats – Config (v1.7)
- [dspace]/config/dspace.cfg
  - solr.log.server
  - solr.dbfile
  - solr.spiderips.url
  - useProxies
  - solr.query.filter.spiderIp
  - solr.query.filter.isBot
  - statistics.item.authorization.admin
  - solr.resolver.timeout
    - Timeout for the DNS resolver (lower for fewer connections)
  - solr.statistics.logBots
    - Disable logging of events by spider IP addresses

Solr Stats – Config (v1.8)
- [dspace]/config/modules/solr-statistics.cfg
  - server
  - spiderips.urls
  - dbfile
  - resolver.timeout
  - useProxies
  - logBots
  - query.filter.spiderIp
  - query.filter.isBot
  - authorization.admin
  - query.filter.bundles
    - Bundles for which to display file stats (requires 1.8 index)
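
The move from dspace.cfg keys (v1.7) to the shorter solr-statistics.cfg keys (v1.8) can be sketched as a simple rename table. The pairing below is inferred from the similarity of the key names in the two lists and is an assumption, as are the spelling `solr.statistics.logBots`, the helper name `migrate`, and the sample values.

```python
# Assumed pairing of v1.7 dspace.cfg keys with v1.8 solr-statistics.cfg keys,
# inferred from the key names in the two configuration lists.
RENAMES = {
    "solr.log.server": "server",
    "solr.spiderips.url": "spiderips.urls",
    "solr.dbfile": "dbfile",
    "solr.resolver.timeout": "resolver.timeout",
    "useProxies": "useProxies",
    "solr.statistics.logBots": "logBots",
    "solr.query.filter.spiderIp": "query.filter.spiderIp",
    "solr.query.filter.isBot": "query.filter.isBot",
    "statistics.item.authorization.admin": "authorization.admin",
}


def migrate(old_cfg: dict) -> dict:
    """Return only the statistics settings, renamed to their v1.8 keys."""
    return {RENAMES[key]: value for key, value in old_cfg.items() if key in RENAMES}


if __name__ == "__main__":
    # Hypothetical v1.7 settings; the values are illustrative only.
    old = {
        "solr.log.server": "http://localhost:8080/solr/statistics",
        "useProxies": "true",
        "unrelated.setting": "ignored",
    }
    print(migrate(old))
```

Settings that are not in the table are dropped rather than copied, since only the statistics keys belong in the module file. `query.filter.bundles` has no v1.7 counterpart and would need to be added by hand.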

Solr Stats – Improvements
- DSpace v1.8
  - Displayed file bundle
    - Configurable, defaults to ORIGINAL bundle
    - [dspace]/bin/dspace stats-util -b -r
- DSpace v1.7
  - Solr Optimization
    - [dspace]/bin/stats-util -o
  - Autocommit
    - Defaults to 15 minute intervals
    - Configurable in [dspace]/solr/statistics/solrconfig.xml
    - maxTime property

Solr Stats – Upgrade from Classic
- Scripts parse dspace.log files to Solr entries
  - [dspace]/bin/dspace stats-log-converter
  - [dspace]/bin/dspace stats-log-importer
    - -i
      - Input file
    - -m
      - Adds a wildcard to the input (i.e. dspace.log*)
    - -s
      - Skip reverse DNS lookup (can be slow)
    - -v
      - Verbose output

Solr Stats – Custom Queries
- You can expand the reports by querying the Solr index directly
- Example: Top downloads for a user – query on epersonid facet:
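
A minimal sketch of what such a direct query might look like, built as a URL for Solr's standard select handler. The field names `epersonid`, `type`, `id` and `isBot`, the use of type `0` for bitstreams, the base URL, and the helper name `top_downloads_url` are assumptions made for illustration. This is one reading of the example: restrict to a single user and facet on the object id.

```python
from urllib.parse import urlencode


def top_downloads_url(solr_base: str, eperson_id: int, limit: int = 10) -> str:
    """Build a select URL that facets one user's file downloads by object id."""
    params = [
        ("q", f"epersonid:{eperson_id}"),  # assumed field name for the user
        ("fq", "type:0"),                  # assumed: type 0 means bitstream
        ("fq", "isBot:false"),             # assumed robot flag field
        ("rows", "0"),                     # only facet counts are needed
        ("facet", "true"),
        ("facet.field", "id"),             # one bucket per downloaded object
        ("facet.limit", str(limit)),
        ("wt", "json"),
    ]
    return f"{solr_base.rstrip('/')}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical Solr location and user id.
    print(top_downloads_url("http://localhost:8080/solr/statistics", 42))
```

Setting `rows` to 0 keeps the response small, because the answer is carried entirely by the facet counts.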

Solr Stats - Maintenance
- [dspace]/bin/dspace stats-util -h

  usage: StatisticsClient
   -b,--reindex-bitstreams          Reindex the bitstreams to ensure we have the bundle name
   -r,--remove-deleted-bitstreams   While indexing the bundle names remove the statistics about deleted bitstreams
   -u,--update-spider-files         Update Spider IP Files from internet into /dspace/config/spiders
   -f,--delete-spiders-by-flag      Delete Spiders in Solr By isBot Flag
   -i,--delete-spiders-by-ip        Delete Spiders in Solr By IP Address
   -m,--mark-spiders                Update isBot Flag in Solr
   -h,--help                        help
   -o,--optimize                    Run maintenance on the SOLR index

Solr Stats - Issues
- Privacy laws – IP addresses not anonymized
- Performance issues / resource usage
- Maintenance of Solr
- Usage when Solr is unavailable
- Usage tracking during periods of high usage

Summary
- Classic Statistics
  - Possibly slow to analyse, fast to display
  - Delay in updating
  - Very imperfect
- Solr Statistics
  - Updates ‘real time’
  - Can be slow to render as dataset grows
  - Improved in each release
  - Less imperfect
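
To make the privacy point concrete, here is a minimal sketch of one common way to anonymize an IP address before storing it: truncate it to its network prefix. This is not something the slides say DSpace does. The prefix lengths (/24 for IPv4, /48 for IPv6) and the helper name `anonymize_ip` are assumptions.

```python
import ipaddress


def anonymize_ip(ip: str) -> str:
    """Zero the host part of an address, keeping only its network prefix."""
    addr = ipaddress.ip_address(ip)
    # Assumed prefix lengths: /24 for IPv4, /48 for IPv6.
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    print(anonymize_ip("203.0.113.77"))
    print(anonymize_ip("2001:db8:1234:5678::1"))
```

Truncation keeps coarse geographic lookups roughly usable while no longer identifying a single host. Whether that is enough depends on the applicable law.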