Published by Timothy McCoy. Modified over 8 years ago.
1
DSpace Statistics
Graham Triggs, Head of Repository Systems, Symplectic
2
A Brief History
3
Statistics in DSpace 1.0
4
Statistics in DSpace 1.1
This slide is left intentionally blank
5
Statistics in DSpace 1.2
If I'm honest, this is just padding
6
Statistics in DSpace 1.3
7
Classic Statistics
- Shows items archived, views and searches
- Parses dspace.log
- Renders flat HTML files
- Uses two scripts, which must be scheduled
- Reports can be public or admin-only
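To make the log-parsing approach concrete, here is a minimal Python sketch of what a classic report does in principle: scan dspace.log and count item views per handle. The line layout matched by the regular expression (timestamp, level, class, then `user:session_id=...:ip_addr=...:action:params`) is an assumption based on typical DSpace 1.x logs, and `count_item_views` is a hypothetical helper, not part of the classic statistics scripts.

```python
import re
from collections import Counter

# Assumed layout of a dspace.log usage line (hypothetical; check your own logs):
# <date> <time>,<ms> <LEVEL> <class> @ <user>:session_id=<id>:ip_addr=<ip>:<action>:<params>
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}),\d+\s+\w+\s+\S+\s+@\s+"
    r"(?P<user>[^:]*):session_id=(?P<session>[^:]*):ip_addr=(?P<ip>[^:]*):"
    r"(?P<action>[^:]*):(?P<params>.*)$"
)


def count_item_views(lines):
    """Count view_item actions per handle, as a log-parsing report might."""
    views = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None or m.group("action") != "view_item":
            continue  # skip debugging output and all other actions
        params = m.group("params")
        if params.startswith("handle="):
            views[params[len("handle="):]] += 1
    return views
```

Every matching line counts as one view, which is exactly why the issues listed on the next slides (debug noise, repeated lines, spiders) feed straight into the numbers.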
8
Classic Statistics – Config
- All configuration in [dspace]/config/dstat.cfg (overview and search exclusions)
- Displays:
  - Overview
  - Archive breakdown (item types)
  - Items viewed
  - Actions (Deletion, Update, Create, etc.)
  - Logins
  - Searches (keywords)
- Action names in [dspace]/config/dstat.map
9
Classic Statistics – Issues
- dspace.log is primarily for debugging
- May not log all the information required
- May log lots of unnecessary information
- Size of log files
- One log line does not equal a single access
- No filtering of spiders, robots, etc.
- Log parsing may take some time
- Slow to update stats
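The point that one log line does not equal a single access can be illustrated with a small sketch: repeated requests for the same handle from the same IP address within a short window are collapsed into one counted access. The 30-second window and the `collapse_repeats` helper are hypothetical and purely illustrative; the slide does not say how, or whether, DSpace handles this.

```python
from datetime import datetime, timedelta


def collapse_repeats(events, window=timedelta(seconds=30)):
    """Keep a request only if the same (ip, handle) pair was not seen within `window` before it.

    events: iterable of (timestamp, ip, handle) tuples, sorted by timestamp.
    The default window is an arbitrary illustrative value.
    """
    last_seen = {}
    counted = []
    for ts, ip, handle in events:
        key = (ip, handle)
        prev = last_seen.get(key)
        if prev is None or ts - prev > window:
            counted.append((ts, ip, handle))
        last_seen[key] = ts  # sliding window: every repeat extends it
    return counted
```

Because `last_seen` is updated on every request, a client reloading a page every few seconds keeps being collapsed into one access; a fixed window per first hit would be an equally valid design choice.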
10
Fast Forward: DSpace 1.6
11
Solr Statistics
- Available for the JSP and XML UIs
- Event logger writes to Apache Solr
- Filters spiders by IP address
- Reports are searches of usage data
- Reports can be public or admin-only
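Filtering spiders by IP address comes down to checking each client address against a list of known spider addresses. The sketch below uses Python's `ipaddress` module and assumes a simple list format (one address or CIDR range per line, `#` comments allowed); the real spider files may be laid out differently, and both helpers are hypothetical.

```python
import ipaddress


def load_spider_networks(lines):
    """Parse spider entries (one IP or CIDR range per line) into network objects.

    Blank lines and '#' comments are ignored. The file format is an assumption.
    """
    nets = []
    for raw in lines:
        entry = raw.split("#", 1)[0].strip()
        if entry:
            # strict=False accepts entries like 198.51.100.7/24 with host bits set
            nets.append(ipaddress.ip_network(entry, strict=False))
    return nets


def is_spider(ip, networks):
    """True if the client address falls inside any spider network."""
    addr = ipaddress.ip_address(ip)
    # IPv4 vs IPv6 mismatches simply compare as "not contained"
    return any(addr in net for net in networks)
```

Usage: `is_spider("198.51.100.77", load_spider_networks(["198.51.100.0/24"]))` returns `True`.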
12
Solr Stats - What is Indexed
- Time
- Type (item, bitstream, etc.), ID
- Owning community, owning collection, owning item
- IP, continent, country, city, longitude / latitude
- EPerson ID, user agent
- Flag to indicate robot / spider
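The indexed fields above can be pictured as one flat record per usage event. This is a minimal sketch assuming field names such as `owningComm`, `countryCode` and `isBot` (modelled on the list above; check the schema of your statistics core for the exact names). The type codes 0 (bitstream) and 2 (item) follow DSpace's object type constants; the `UsageEvent` class itself is hypothetical.

```python
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class UsageEvent:
    # What was viewed, and when
    time: datetime     # must be timezone-aware
    type: int          # object type code, e.g. 2 for an item, 0 for a bitstream
    id: int            # database id of the viewed object
    # Where it sits in the hierarchy (an item can have several owners)
    owningComm: List[int] = field(default_factory=list)
    owningColl: List[int] = field(default_factory=list)
    owningItem: Optional[int] = None
    # Who viewed it
    ip: Optional[str] = None
    continent: Optional[str] = None
    countryCode: Optional[str] = None
    city: Optional[str] = None
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    epersonid: Optional[int] = None
    userAgent: Optional[str] = None
    isBot: bool = False

    def to_solr_doc(self) -> dict:
        """Flatten to a dict, formatting time as a UTC Solr date string."""
        doc = asdict(self)
        doc["time"] = self.time.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        # Drop unset fields rather than indexing nulls or empty lists
        return {k: v for k, v in doc.items() if v is not None and v != []}
```

Storing the owning community and collection on every event is what lets the community and collection reports on the following slides be answered by a single query.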
13
Solr Stats – Home
- Top 10 items
14
Solr Stats – Community
- Total visits
- Visits over the last 7 months
- Top 10 countries
- Top 10 cities
15
Solr Stats – Collection
- Total visits
- Visits over the last 7 months
- Top 10 countries
- Top 10 cities
16
Solr Stats – Item
- Total visits
- Total file views
- Visits over the last 7 months
- Top 10 countries
- Top 10 cities
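Since reports are searches of usage data, a "Top 10 countries" panel is essentially a faceted Solr query. The sketch below builds the request parameters for one item using standard Solr parameters (`q`, `fq`, `rows`, `facet.field`, `facet.limit`) and Solr date math for the time window. The field names (`type`, `id`, `countryCode`, `time`, `isBot`) and the core URL are assumptions, and this is not the code DSpace itself uses to render the page.

```python
from urllib.parse import urlencode

ITEM = 2  # DSpace object type code for an item


def top_countries_params(item_id, limit=10, months=None, exclude_bots=True):
    """Solr request parameters for a 'Top 10 countries' style report on one item."""
    params = [
        ("q", f"type:{ITEM} AND id:{item_id}"),
        ("rows", "0"),                  # only the facet counts are needed
        ("facet", "true"),
        ("facet.field", "countryCode"),
        ("facet.limit", str(limit)),    # facets are sorted by count when a limit is set
        ("facet.mincount", "1"),
    ]
    if months is not None:
        # Solr date math: restrict to the last N months
        params.append(("fq", f"time:[NOW-{months}MONTHS TO NOW]"))
    if exclude_bots:
        params.append(("fq", "-isBot:true"))
    return params


# Hypothetical statistics core URL; adjust to your own Solr server setting
SOLR_STATS = "http://localhost:8080/solr/statistics/select"
url = SOLR_STATS + "?" + urlencode(top_countries_params(42, months=7))
```

The parameters are kept as a list of pairs rather than a dict because Solr accepts repeated `fq` parameters, which a dict would collapse.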
17
Solr Stats – Config (v1.6)
[dspace]/config/dspace.cfg
- solr.log.server: location of the Solr server / application
- solr.dbfile: location of the geolocation database
- solr.spiderips.url: URLs from which to download the IP addresses of search spiders
- useProxies: client identification when hosted behind a proxy
- solr.query.filter.spiderIp: filter out spider IP addresses in queries
- solr.query.filter.isBot: filter out events flagged with the 'isBot' field in queries
- statistics.item.authorization.admin: set to 'true' to restrict to admins, 'false' for public access
18
Solr Stats – Config (v1.7)
[dspace]/config/dspace.cfg
- solr.log.server
- solr.dbfile
- solr.spiderips.url
- useProxies
- solr.query.filter.spiderIp
- solr.query.filter.isBot
- statistics.item.authorization.admin
- solr.resolver.timeout: timeout for the DNS resolver (set lower for fewer connections)
- solr.statistics.logBots: set to 'false' to disable logging of events from spider IP addresses
19
Solr Stats – Config (v1.8)
[dspace]/config/modules/solr-statistics.cfg
- server
- spiderips.urls
- dbfile
- resolver.timeout
- useProxies
- logBots
- query.filter.spiderIp
- query.filter.isBot
- authorization.admin
- query.filter.bundles: bundles for which to display file stats (requires a 1.8 index)
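Moving from 1.7 to 1.8 means carrying the statistics settings out of dspace.cfg into the module file under shorter names. The mapping below is read directly off the v1.7 and v1.8 key lists on these slides, not from an official migration table, and the `solr.spiderips.url` spelling is copied from the slide, so verify each key against your own dspace.cfg. The tiny properties parser is a simplification that ignores line continuations and escapes.

```python
# Old dspace.cfg key -> new modules/solr-statistics.cfg key (as listed on the slides)
KEY_MAP = {
    "solr.log.server": "server",
    "solr.spiderips.url": "spiderips.urls",
    "solr.dbfile": "dbfile",
    "solr.resolver.timeout": "resolver.timeout",
    "useProxies": "useProxies",
    "solr.statistics.logBots": "logBots",
    "solr.query.filter.spiderIp": "query.filter.spiderIp",
    "solr.query.filter.isBot": "query.filter.isBot",
    "statistics.item.authorization.admin": "authorization.admin",
}


def parse_properties(text):
    """Minimal 'key = value' parser; skips blanks and '#' comments (no continuations)."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def migrate(old_props):
    """Split old properties into (solr-statistics.cfg properties, everything else)."""
    new_props, rest = {}, {}
    for key, value in old_props.items():
        if key in KEY_MAP:
            new_props[KEY_MAP[key]] = value
        else:
            rest[key] = value
    return new_props, rest
```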
20
Solr Stats – Improvements
- DSpace v1.8
  - Displayed file bundle is configurable (defaults to the ORIGINAL bundle)
  - [dspace]/bin/dspace stats-util -b -r
- DSpace v1.7
  - Solr optimization: [dspace]/bin/stats-util -o
  - Autocommit: defaults to 15-minute intervals; configurable via the maxTime property in [dspace]/solr/statistics/solrconfig.xml
21
Solr Stats – Upgrade from Classic
- Scripts parse dspace.log files into Solr entries
- [dspace]/bin/dspace stats-log-converter
- [dspace]/bin/dspace stats-log-importer
- -I Input file
- -m Adds a wildcard to the input (e.g. dspace.log*)
- -s Skip reverse DNS lookup (can be slow)
- -v Verbose output
22
Solr Stats – Custom Queries
- You can extend the reports by querying the Solr index directly
- Example: top downloads for a user: query on epersonid and facet the results
[Chart: facet counts returned by the example query]
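A custom "top downloads for a user" query might look like the sketch below: restrict to one EPerson, keep only bitstream events, and facet on the object id so Solr returns download counts per file. Faceting on `id` and filtering on `type:0` are assumptions about what the slide's example does. The `facet_pairs` helper relies on Solr's default JSON layout for facet fields, a flat `[value, count, value, count, ...]` list.

```python
from urllib.parse import urlencode

BITSTREAM = 0  # DSpace object type code for a bitstream (file)


def top_downloads_for_user_params(eperson_id, limit=10):
    """Solr parameters for the most-downloaded files of one user (hypothetical helper)."""
    return [
        ("q", f"epersonid:{eperson_id}"),
        ("fq", f"type:{BITSTREAM}"),   # downloads are bitstream events
        ("rows", "0"),                 # no documents, just facet counts
        ("facet", "true"),
        ("facet.field", "id"),         # one bucket per bitstream id
        ("facet.limit", str(limit)),
        ("facet.mincount", "1"),
        ("wt", "json"),
    ]


def facet_pairs(response, field_name):
    """Turn Solr's flat facet list [value, count, value, count, ...] into pairs."""
    flat = response["facet_counts"]["facet_fields"][field_name]
    return list(zip(flat[0::2], flat[1::2]))


query_string = urlencode(top_downloads_for_user_params(17))
```

Feeding a parsed JSON response into `facet_pairs(response, "id")` yields `(bitstream id, download count)` pairs, already sorted by count.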
23
Solr Stats - Maintenance
[dspace]/bin/dspace stats-util -h
usage: StatisticsClient
  -b,--reindex-bitstreams          Reindex the bitstreams to ensure we have the bundle name
  -r,--remove-deleted-bitstreams   While indexing the bundle names remove the statistics about deleted bitstreams
  -u,--update-spider-files         Update Spider IP Files from internet into /dspace/config/spiders
  -f,--delete-spiders-by-flag      Delete Spiders in Solr By isBot Flag
  -i,--delete-spiders-by-ip        Delete Spiders in Solr By IP Address
  -m,--mark-spiders                Update isBot Flag in Solr
  -h,--help                        help
  -o,--optimize                    Run maintenance on the SOLR index
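Conceptually, options such as --delete-spiders-by-flag and --delete-spiders-by-ip amount to Solr delete-by-query operations. This sketch only builds the XML update messages that Solr's standard update handler accepts; it is an illustration under that assumption, not the stats-util implementation, and both helpers are hypothetical.

```python
import xml.etree.ElementTree as ET


def delete_by_query_body(query):
    """Build a Solr XML update message deleting every document that matches `query`."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "query").text = query  # text content is XML-escaped
    return ET.tostring(delete, encoding="unicode")


def spider_ip_query(ips):
    """Query matching any of the given IP addresses (each quoted for Solr)."""
    return " OR ".join(f'ip:"{ip}"' for ip in ips)


by_flag = delete_by_query_body("isBot:true")
by_ip = delete_by_query_body(spider_ip_query(["192.0.2.1", "192.0.2.2"]))
```

Either body would be POSTed to the statistics core's update handler and followed by a commit; deleted events are then gone for good, which is why marking (-m) and filtering at query time are the gentler alternatives.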
24
Solr Stats - Issues
- Privacy laws: IP addresses are not anonymized
- Performance issues / resource usage
- Maintenance of Solr
- Usage tracking when Solr is unavailable
- Usage tracking during periods of high usage
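One common mitigation for the privacy issue is to truncate addresses before they are stored. The sketch below zeroes the host part of IPv4 addresses (keeping a /24) and IPv6 addresses (keeping a /48); the prefix lengths are illustrative choices rather than a legal standard, and `anonymize_ip` is a hypothetical helper, not a DSpace setting.

```python
import ipaddress


def anonymize_ip(ip, v4_prefix=24, v6_prefix=48):
    """Return the address with its host bits zeroed (network prefixes are illustrative)."""
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    return str(ipaddress.ip_network(f"{addr}/{prefix}", strict=False).network_address)
```

Usage: `anonymize_ip("192.0.2.123")` gives `"192.0.2.0"`. A design point worth noting: if the geolocation lookup and spider check run before truncation, country and city reports keep most of their accuracy while the stored record no longer identifies a single host.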
25
Summary
- Classic Statistics
  - Possibly slow to analyse, fast to display
  - Delay in updating
  - Very imperfect
- Solr Statistics
  - Updates in 'real time'
  - Can be slow to render as the dataset grows
  - Improved in each release
  - Less imperfect