The search for alternative metrics for taxonomy – Daphne Duin & Peter van den Besselaar – VU University Amsterdam, Org Science & Network Institute
Altmetrics for research evaluation Quality (impact, relevance, originality) is assessed by relevant audiences – Scholarly – Non-scholarly, such as economic, professional, policy, general public Metrics based on communication – Scholarly: publications and citations (not very useful: this does not work in all fields), topics, networks; science on the web: altmetrics –> new communication media – Non-scholarly: indicators for societal impact; new metrics are needed, based on communication with societal audiences, e.g. website visits/activity
The case Scratchpads are biodiversity research communities on the Web – a research infrastructure Platform for collaboration, data sharing, publishing – Tagging – Data analysis – Blogs – Collaborative writing
Scratchpads Started 2007 – Today > 200 communities and > 3000 registered users, numbers go up every week EU funded Evaluation of research infrastructures – Scholarly use – Societal benefits -> Horlings, Van den Besselaar, review, forthcoming
Questions Can we use web data to identify the relevant audiences of the Scratchpads infrastructure? Can we use web data to study if and how often the sites are used to “produce” content?
The Data We used: Google Analytics reports for all Scratchpads, compared with the report for one specific site; period Oct 1, 2010 – March 31, 2011 Web server log data for 1 day (24 h) to see what people are doing – the CMS uses standardised URL paths (/add, /edit, /delete, /comment, /content)
Results – the audiences I 1 Oct 2010 – March 2011: unique Service Providers came in to the Scratchpads domain (no bots); “Average time on site” > 4 seconds Of which 6896 telecom/internet companies (ISPs) Of which 2316 identifiable user organizations = non-ISPs (25%) Clustered in 8 categories
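As a quick check of the 25% figure, here is a minimal sketch. It assumes the ISP and non-ISP counts on this slide are exhaustive and non-overlapping; the variable names are illustrative and not taken from the study.

```python
# Counts taken from the slide. Treating them as the complete set of
# unique Service Providers is an assumption.
isp_count = 6896        # telecom/internet companies (ISPs)
non_isp_count = 2316    # identifiable user organizations (non-ISPs)

total = isp_count + non_isp_count
non_isp_share = non_isp_count / total

print(f"total providers: {total}")
print(f"non-ISP share: {non_isp_share:.1%}")
```

Under that assumption the non-ISP share comes out at roughly a quarter, in line with the 25% reported here.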
Audiences - all sites 2316 unique Service Providers >200 community sites 1 Oct 10 –April 11
Categories
Research/Education = universities, laboratories, science museums, botanical gardens, libraries, schools, colleges
Government = national/state/local departments in agriculture, pest control, food security, forest management, environment, energy, transport
Companies = food, pharmaceuticals, mining, energy companies, pest control products, accountancy, consultancy, biotechnology
Non-profit = conservation, environmental agencies, societies
Health = hospitals, health services, health/medical research
Art/culture/media = art museums, art academies, broadcasting and media services, publishing companies
Travel = hotels, airlines, stations (wifi?)
Other = church
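The slides do not say how Service Providers were assigned to these categories. One possible way to automate a first pass is keyword matching on the provider name, sketched below. The keyword lists, the `categorize` helper, and the example names are all hypothetical, and only four of the eight categories are shown.

```python
# Hypothetical keyword rules. The slides do not describe the actual
# clustering procedure, so this is only one possible approach.
CATEGORY_KEYWORDS = {
    "Research/Education": ["university", "college", "school", "laboratory", "botanical garden"],
    "Government": ["ministry", "department of", "forest service"],
    "Health": ["hospital", "health"],
    "Travel": ["hotel", "airline"],
}


def categorize(provider_name: str) -> str:
    """Return the first category with a keyword in the name, else 'Unclassified'."""
    name = provider_name.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in name for keyword in keywords):
            return category
    return "Unclassified"


# Made-up provider names, for illustration only.
examples = ["Example University", "Example City Hospital", "Example Hotel Group", "Example Networks"]
labels = {name: categorize(name) for name in examples}
print(labels)
```

Names that match no rule are left as "Unclassified" rather than forced into a category, so they can be reviewed by hand.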
Scholarly use Quick test: – Scholarly journals in the field – Authors – Corporate addresses – Are these organizations also Scratchpad users? – Yes, but by no means exclusively
Results – the audiences II Same for 1 specific site 276 unique Service Providers 201 are internet/telecom companies (ISPs) 75 are identifiable user organizations (non ISPs) 2 categories
Results – 1 site MicroOrg.info 75 unique Service Providers, commercial ones excluded
Using the Web in science, to do what? Web data to study whether and how often the sites are used to “produce” content Web server log data for February 1, 2011 (24h) POST requests (e.g. POST /node/add/image HTTP/1.1) COMBINED WITH the paths /add; /edit; /delete; /comment; /content 1148 “producing” actions Feb 1, 2011: 1270 visits registered in Google Analytics
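A count of this kind could be sketched as follows. The sketch assumes the server writes Common Log Format lines and that a "producing" action is a POST request whose path contains one of the segments listed on the slide. The function name and the sample log lines are made up for illustration and are not taken from the Scratchpads logs.

```python
import re
from collections import Counter

# Assumed Common Log Format request field: "METHOD /path HTTP/x.y"
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+"')

# Path segments named on the slide as content-related actions.
ACTIONS = ("add", "edit", "delete", "comment", "content")


def count_producing_actions(log_lines):
    """Count POST requests whose path contains one of the action segments."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match is None or match.group("method") != "POST":
            continue
        # Drop any query string, then split the path into its segments.
        segments = match.group("path").split("?")[0].strip("/").split("/")
        for action in ACTIONS:
            if action in segments:
                counts[action] += 1
                break  # count each request only once
    return counts


# Made-up sample lines, not taken from the Scratchpads logs.
sample = [
    '192.0.2.1 - - [01/Feb/2011:10:00:00 +0000] "POST /node/add/image HTTP/1.1" 302 512',
    '192.0.2.2 - - [01/Feb/2011:10:01:00 +0000] "GET /node/12 HTTP/1.1" 200 2048',
    '192.0.2.3 - - [01/Feb/2011:10:02:00 +0000] "POST /node/12/edit HTTP/1.1" 302 512',
]
counts = count_producing_actions(sample)
print(sum(counts.values()), dict(counts))
```

Matching on whole path segments rather than substrings avoids counting unrelated URLs that merely contain a string such as "add".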
So what does this tell us? I Analysis of Service Providers... reveals interesting insights into part of the audiences coming to the Scratchpads, such as... audiences from different educational levels, from diverse scientific disciplines, and other professionals and organizations However: 75% come in through ISPs
So what does this tell us? II Working on the web, to do what? Scratchpad web server log data can be used to identify whether and how often content-“producing” activities occur If combined with numbers on consuming actions, this gives a much more comprehensive view of the use of an e-science infrastructure
Discussion The data sets used are rich, and the analyses discussed here are only a start Further research should tell us more about: – the interdisciplinary and different educational interest of the sites – which sites or content attract what type of audience (ISP) – the division between producing and consuming actions Issues: – Division of ISPs versus user organizations – What types of organizations/companies have their own ISPs and which do not, why, and what are the trends? – Trends in tele-working and use of wifi in the work environment – Percentage of bots in Internet traffic
Acknowledgments We thank the following people for helping us gather and analyze the web data for Altmetrics11: Simon Rycroft and the rest of the Scratchpad team at the Natural History Museum, London; Laura Hollink – TU Delft