Assessing the impact of software on science through bootstrapped learning in full texts Erjia Yan Metadata Mondays February 1, 2016.

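My research | 2
Research areas and research instruments: network-based and text-based.
Example sentence: "Data were analyzed using SPSS 15.0 software package."
Patterns are built over lemmatized words ("datum" is the lemma of "data", "be" of "were"), and <> marks the slot filled by the software name:
Left patterns: analyze use <>, be analyze use <>, datum be analyze use <>; right pattern: <> software package; middle pattern: use <> software.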
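The left, right, and middle patterns in this example can be reproduced with a short sketch. It assumes the sentence is already tokenized and lemmatized, treats "SPSS 15.0" as the entity span, and uses left windows of two to four lemmas, a two-lemma right window, and one lemma on each side for the middle pattern. These window sizes are chosen only to match the slide's example; they are not SPIED's settings, and the function name `context_patterns` is hypothetical.

```python
def context_patterns(lemmas, start, end, left_sizes=(2, 3, 4), right_sizes=(2,), middle=(1, 1)):
    """Build left, right, and middle surface patterns around the entity lemmas[start:end].

    The entity span is replaced by the placeholder "<>". Window sizes are
    illustrative assumptions, not SPIED's settings.
    """
    patterns = {"left": [], "right": [], "middle": []}
    # Left patterns: n lemmas immediately before the entity, then the slot.
    for n in left_sizes:
        if start - n >= 0:
            patterns["left"].append(" ".join(lemmas[start - n:start] + ["<>"]))
    # Right patterns: the slot, then n lemmas immediately after the entity.
    for n in right_sizes:
        if end + n <= len(lemmas):
            patterns["right"].append(" ".join(["<>"] + lemmas[end:end + n]))
    # Middle pattern: a fixed number of lemmas on both sides of the slot.
    n_left, n_right = middle
    if start - n_left >= 0 and end + n_right <= len(lemmas):
        patterns["middle"].append(
            " ".join(lemmas[start - n_left:start] + ["<>"] + lemmas[end:end + n_right])
        )
    return patterns


# Lemmas of "Data were analyzed using SPSS 15.0 software package" (final period dropped).
lemmas = ["datum", "be", "analyze", "use", "SPSS", "15.0", "software", "package"]
patterns = context_patterns(lemmas, start=4, end=6)  # entity span: "SPSS 15.0"
print(patterns)
```

Treating the version number as part of the entity span is one design choice; an extractor could equally strip version numbers first and keep them as a separate feature.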

Scholarly data | 3
Bibliographic data; full texts

Full texts | 4
Why full texts?
– richer content
– more fine-grained analyses
– a variety of means of analysis
– use in conjunction with bibliographic data
– increased accessibility: free (open access), or possibly free for academic use

Agenda | 5
Today’s agenda: Motivation; Bootstrapping; Application; Software impact; Future work

Motivation | 6

Motivations | 7
Publications have long been seen as the end products of research. This notion has become less tenable in recent years, as digital outputs such as software and data can be the end products of many contemporary scientific inquiries.
Publications; software; data?

Needs assessment | 8
– initiate impact evaluation for digital outputs, i.e., software and data, to expand the scientific reward system;
– develop tools that can identify and characterize software reference contexts in large and heterogeneous full-text datasets; and
– design hybrid metrics to systematically capture the impact of software in a variety of scholarly communication channels.

Objectives | 9
We intend to examine:
– methods to extract software entities from full-text corpora;
– the popularity of pieces of software in science;
– software use and citation impact; and
– disciplinary characteristics of software use and citation practices.

Bootstrapping | 10

Methods | 11
Bootstrapping is recursive: seed terms are used to learn the contexts in which they occur, those contexts are used to learn more terms, and the new terms become seed terms for the next round.
– requires much less hand-labeled data (i.e., training data) than supervised machine learning methods
– the contexts of the terms to be extracted must be distinguishable, i.e., terms of interest and other terms should occur in different contexts
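The seed → context → new term loop can be sketched in a few lines. This is a toy illustration under heavy simplifying assumptions, not SPIED: entities are single tokens, a "pattern" is just the one word immediately to the left or right of an entity, and the corpus is four invented lemmatized sentences. The function name `bootstrap` and its parameters are hypothetical.

```python
from collections import Counter


def bootstrap(sentences, seeds, iterations=3, min_support=1):
    """Toy bootstrapped learner over lemmatized, tokenized sentences.

    Simplifications (assumptions, not SPIED): entities are single tokens and a
    pattern is just the one word immediately left or right of an entity.
    """
    entities = set(seeds)
    patterns = set()
    for _ in range(iterations):
        # 1. Use the current entities to learn their contexts (patterns).
        support = Counter()
        for tokens in sentences:
            for i, tok in enumerate(tokens):
                if tok in entities:
                    if i > 0:
                        support[("left", tokens[i - 1])] += 1
                    if i + 1 < len(tokens):
                        support[("right", tokens[i + 1])] += 1
        patterns |= {p for p, n in support.items() if n >= min_support}

        # 2. Use the patterns to find more terms.
        found = set()
        for tokens in sentences:
            for i, tok in enumerate(tokens):
                left_hit = i > 0 and ("left", tokens[i - 1]) in patterns
                right_hit = i + 1 < len(tokens) and ("right", tokens[i + 1]) in patterns
                if left_hit or right_hit:
                    found.add(tok)

        # 3. New terms become seeds for the next round; stop when nothing is new.
        if found <= entities:
            break
        entities |= found
    return entities, patterns


corpus = [
    ["analyze", "use", "SPSS", "software"],
    ["quantify", "use", "ImageJ", "toolkit"],
    ["fit", "in", "Stata", "toolkit"],
    ["image", "with", "microscope"],
]
entities, patterns = bootstrap(corpus, seeds={"SPSS"})
print(sorted(entities))
```

Starting from the single seed SPSS, the first round learns the context "use <>" and finds ImageJ; only in the second round does ImageJ's context "<> toolkit" lead to Stata. "microscope" is never reached because its context ("with") is never learned, which is exactly the requirement that terms of interest and other terms occur in different contexts. On real data, a higher `min_support` would be needed to keep noisy patterns out.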

Flow chart | 12

Scoring | 13
FeatureScore ~ f(UppercaseLetter, VersionNumber, LeftTrigger, RightTrigger)
– the features are looked for within the five words before or after a candidate term
– a positive trigger word list (six in total: package, program, software, tool, toolbox, and toolkit)
– a negative trigger word list (51 in total; e.g., microscope, scanner, and spectrometer)
PatternScore ~ f(#PositiveEntities, #NegativeEntities, FeatureScore)
– PositiveEntities and NegativeEntities are determined from EntityScore
EntityScore ~ f(FeatureScore, PatternScore)
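The slide names the inputs of each score but not the functions f that combine them, so the sketch below fills them in with hypothetical forms: FeatureScore as the fraction of the four features present (with any negative trigger in the window zeroing it), PatternScore as a pattern's precision times the log of its positive count, weighted by the mean feature score, and EntityScore as the feature score times the mean score of the patterns that extracted the entity. The version-number format, the helper names, and all three combination rules are assumptions; only the trigger words and the five-word window come from the slide. In the full method the scores feed each other, since PositiveEntities and NegativeEntities depend on EntityScore; that iteration is not shown.

```python
import math
import re

# Trigger lists from the slide; only three of the 51 negative triggers are named there.
POSITIVE_TRIGGERS = {"package", "program", "software", "tool", "toolbox", "toolkit"}
NEGATIVE_TRIGGERS = {"microscope", "scanner", "spectrometer"}
VERSION_RE = re.compile(r"^v?\d+(\.\d+)+$")  # e.g. "15.0", "v2.1" (assumed format)
WINDOW = 5  # features are looked for within five words on either side


def feature_score(tokens, i):
    """Hypothetical FeatureScore for candidate tokens[i]: fraction of the four features present."""
    left = [t.lower() for t in tokens[max(0, i - WINDOW):i]]
    right = [t.lower() for t in tokens[i + 1:i + 1 + WINDOW]]
    # Assumption: any negative trigger in the window rules the candidate out.
    if NEGATIVE_TRIGGERS & set(left + right):
        return 0.0
    features = [
        any(c.isupper() for c in tokens[i]),              # UppercaseLetter
        any(VERSION_RE.match(t) for t in left + right),  # VersionNumber
        bool(POSITIVE_TRIGGERS & set(left)),              # LeftTrigger
        bool(POSITIVE_TRIGGERS & set(right)),             # RightTrigger
    ]
    return sum(features) / len(features)


def pattern_score(n_positive, n_negative, mean_feature):
    """Hypothetical PatternScore: pattern precision times log frequency, weighted by features."""
    if n_positive == 0:
        return 0.0
    precision = n_positive / (n_positive + n_negative)
    return precision * math.log2(n_positive + 1) * mean_feature


def entity_score(feature, pattern_scores):
    """Hypothetical EntityScore: FeatureScore times the mean score of the patterns that found it."""
    if not pattern_scores:
        return 0.0
    return feature * sum(pattern_scores) / len(pattern_scores)


tokens = "Data were analyzed using SPSS 15.0 software package".split()
print(feature_score(tokens, 4))  # SPSS: uppercase, version number, right trigger
```

For SPSS in the example sentence, three of the four features fire (there is no positive trigger to its left), giving 0.75 under this assumed rule.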

Application | 14

Tools | 15
Our program:
Stanford Pattern-based Information Extraction and Diagnostics (SPIED):

Performance | 16
Data: PLoS ONE papers published in 2014 (9,571 full-text papers, 523,974 sentences, and 11,633,395 words)
– customization is vital to ensure performance
– our system outperformed others primarily because of the four adopted features: UppercaseLetter, VersionNumber, LeftTrigger, and RightTrigger
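For a sense of scale, the corpus figures on this slide can be turned into simple averages. This is plain arithmetic on the three reported totals; the variable names are illustrative.

```python
papers, sentences, words = 9_571, 523_974, 11_633_395

sentences_per_paper = sentences / papers
words_per_sentence = words / sentences
words_per_paper = words / papers

print(f"{sentences_per_paper:.1f} sentences per paper")
print(f"{words_per_sentence:.1f} words per sentence")
print(f"{words_per_paper:.1f} words per paper")
```

The totals work out to about 54.7 sentences per paper, 22.2 words per sentence, and 1,215.5 words per paper.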

Patterns | 17
Top scored patterns
Rank | Pattern | Extracted software
1 | use <> software | 88
2 | perform use <> | 51
3 | be perform use <> | 51
4 | analysis be perform use <> | 35
5 | analyze use <> | 22
6 | analysis be perform with <> | 14
7 | <> statistical software | 11
8 | <> software be use | 8
9 | quantify use <> | 8
10 | be calculate use <> | 8
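Applying a learned pattern means finding the token that fills its <> slot. The sketch below compiles a pattern into a regular expression, assuming that sentences are single-space-separated lemmas and that the slot matches exactly one token. Under that assumption, multi-token names such as "Image ProPlus" would be missed, and a longer slot would need extra care so that right-only patterns do not swallow preceding words. The helper names are hypothetical.

```python
import re


def pattern_to_regex(pattern):
    """Compile a lemma pattern such as "use <> software" into a regex.

    Assumes single-space-separated lemmas and a slot that matches exactly one token.
    """
    left, right = (part.strip() for part in pattern.split("<>", 1))
    regex = r"(?<!\S)"                        # match starts at a token boundary
    if left:
        regex += re.escape(left) + r"\s+"
    regex += r"(\S+)"                         # the <> slot: one token
    if right:
        regex += r"\s+" + re.escape(right)
    regex += r"(?!\S)"                        # match ends at a token boundary
    return re.compile(regex)


def extract(pattern, lemma_sentence):
    """Return every token that fills the <> slot of `pattern` in a lemmatized sentence."""
    return pattern_to_regex(pattern).findall(lemma_sentence)


sentence = "datum be analyze use SPSS software package"
for p in ["use <> software", "analyze use <>", "<> software package", "quantify use <>"]:
    print(p, "->", extract(p, sentence))
```

The top-ranked pattern, the left-only pattern, and the right-only pattern all recover SPSS from this sentence, while "quantify use <>" finds nothing.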

Software impact | 18

Top software | 19
Most used software
Rank | Software | Mentions | Free?
1 | SPSS | 1868 | No
2 | ImageJ | 1065 | Yes
3 | SAS | 611 | No
4 | Stata | 578 | No
5 | MATLAB | 452 | No
6 | BLAST | 403 | Yes
7 | EXCEL | 391 | No
8 | MEGA | 366 | Yes
9 | FlowJo | 268 | No
10 | PRISM | 262 | Yes
11 | SPM | 253 | Yes
12 | Photoshop | 241 | No
13 | ClustalW | 164 | Yes
14 | JMP | 157 | No
15 | MUSCLE | 155 | Yes
16 | SigmaPlot | 150 | No
17 | MASCOT | 144 | No
18 | Image ProPlus | 143 | No
19 | Ingenuity IPA | 139 | No
20 | STRUCTURE | 133 | Yes

Top software | 20
Most cited software
Rank | Software | Citations | Free?
1 | MEGA | 240 | Yes
2 | ImageJ | 121 | Yes
3 | BLAST | 108 | Yes
4 | MUSCLE | 97 | Yes
5 | ClustalW | 94 | Yes
6 | ARLEQUIN | 81 | Yes
7 | MrBayes | 75 | Yes
8 | BioEdit | 69 | Yes
9 | STRUCTURE | 66 | Yes
10 | MATLAB | 66 | No
11 | Bowtie | 63 | Yes
12 | Stata | 57 | No
13 | SPM | 57 | Yes
14 | Blast2GO | 57 | Yes
15 | BEAST | 54 | Yes
16 | PHYML | 54 | Yes
17 | SAS | 53 | No
18 | Clustal X | 53 | Yes
19 | PLINK | 51 | Yes
20 | BWA | 49 | Yes

Mentions vs. citations | 21
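The mention-versus-citation comparison can be illustrated with the two top-20 tables on the preceding slides. Ten software entities appear in both tables, so their citations per mention can be computed directly. This ratio is our own illustrative measure; because both lists are truncated at 20, the result is only a rough check of the slide's observation that free software is cited more than commercial software, not a reproduction of the study's analysis.

```python
from statistics import mean

# (mentions, citations, free?) for the ten entities that appear in both top-20 tables.
BOTH_TABLES = {
    "ImageJ": (1065, 121, True),
    "SAS": (611, 53, False),
    "Stata": (578, 57, False),
    "MATLAB": (452, 66, False),
    "BLAST": (403, 108, True),
    "MEGA": (366, 240, True),
    "SPM": (253, 57, True),
    "ClustalW": (164, 94, True),
    "MUSCLE": (155, 97, True),
    "STRUCTURE": (133, 66, True),
}

# Citations per mention for each entity, ranked from highest to lowest.
ratios = {name: cites / mentions for name, (mentions, cites, _) in BOTH_TABLES.items()}
ranked = sorted(ratios, key=ratios.get, reverse=True)

free_mean = mean(ratios[n] for n, (_, _, free) in BOTH_TABLES.items() if free)
commercial_mean = mean(ratios[n] for n, (_, _, free) in BOTH_TABLES.items() if not free)

for name in ranked:
    print(f"{name:10s} {ratios[name]:.2f}")
print(f"free: {free_mean:.2f}, commercial: {commercial_mean:.2f}")
```

MEGA has the highest ratio (about 0.66 citations per mention) and SAS the lowest (about 0.09). Averaged over this small subset, free software gets about 0.42 citations per mention against about 0.11 for commercial software.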

Popularity of software | 22
– 7,637 articles (79.79%) mentioned software
– 2,342 unique software entities, with 25,997 mentions and 7,405 citations
– the top 20% most frequently mentioned software entities attracted more than 80% of mentions
– 40% of software entities did not receive any citation, and almost 50% received fewer than three citations (obliteration by incorporation?)
– free software received more citations than commercial software
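Two of these statistics are easy to recompute: the coverage percentage from the reported counts, and the "top 20% of entities attract more than 80% of mentions" concentration. The study's per-entity mention counts are not given here, so the concentration function is shown on made-up counts; `top_share` is a hypothetical helper, and rounding the cut-off to the nearest whole entity is our assumption.

```python
def top_share(counts, fraction=0.2):
    """Share of all mentions received by the top `fraction` of entities, ranked by mentions."""
    ranked = sorted(counts, reverse=True)
    k = max(1, round(len(ranked) * fraction))  # number of entities in the top fraction
    return sum(ranked[:k]) / sum(ranked)


# Coverage figure from the slide: 7,637 of 9,571 papers mentioned software.
print(f"{7_637 / 9_571:.2%}")

# Made-up mention counts for ten entities, for illustration only.
toy_counts = [50, 20, 10, 5, 5, 4, 3, 1, 1, 1]
print(top_share(toy_counts))
```

In the toy data, the top two of ten entities hold 70% of mentions; the slide reports that in the real data the top 20% hold more than 80%.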

Proportion | 23
Proportions of papers that used software

Mentions and citations | 24
– high mention and high citation ratio: Agriculture, Biology, Ecology and environmental sciences, and Computer and information sciences;
– high mention and low citation ratio: Chemistry, and Research and analysis methods;
– low mention and high citation ratio: Physics and Earth sciences; and
– low mention and low citation ratio: Medicine and health sciences, Engineering, Mathematics, and Social sciences.
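The four groups amount to a two-by-two split on a mention ratio and a citation ratio. The slide does not define the ratios or the cut-off between "high" and "low", so this sketch assumes the mention ratio is the share of a discipline's papers that mention software, the citation ratio is the share of mentions that come with a formal citation, and "high" means at or above the median across disciplines. The four disciplines and their values are placeholders, not the study's figures.

```python
from statistics import median


def quadrants(stats):
    """Label each discipline high/low on mentions and on citations.

    stats maps a discipline to (mention_ratio, citation_ratio). 'High' means at
    or above the median across disciplines, which is an assumed cut-off.
    """
    m_cut = median(m for m, _ in stats.values())
    c_cut = median(c for _, c in stats.values())
    labels = {}
    for name, (m, c) in stats.items():
        mention = "high" if m >= m_cut else "low"
        citation = "high" if c >= c_cut else "low"
        labels[name] = f"{mention} mention, {citation} citation"
    return labels


# Placeholder ratios for four unnamed disciplines; these are not the study's values.
toy = {"A": (0.9, 0.4), "B": (0.9, 0.1), "C": (0.5, 0.4), "D": (0.5, 0.1)}
for name, label in quadrants(toy).items():
    print(name, label)
```

A median split always puts roughly half of the disciplines on each side of each axis; a fixed threshold would be an equally reasonable choice if the study used one.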

Reach | 25
Extensive reach: ArcGIS, ClustalW, Clustal X, ESTIMATES, ImageJ, JMP, MATLAB, Microsoft Access, Microsoft Excel, SAM, SAS, SPSS

Top 5 most mentioned software | 26
Discipline | 1 | 2 | 3 | 4 | 5
Agriculture | SPSS | MEGA | BLAST | JMP | SAS
Biology | SPSS | ImageJ | SAS | MATLAB | BLAST
Chemistry | SPSS | SigmaPlot | ImageJ | SAS | AMBER
Computer science | MATLAB | SPM | Pfam | SPSS | PSI-BLAST
Earth sciences | SPSS | ArcGIS | SAS | Mothur | MEGA
Ecology | SPSS | ArcGIS | VEGAN | QIIME | MEGA
Engineering | MATLAB | SPSS | ImageJ | SPM | SAS
Mathematics | SAS | SPM | SPSS | MATLAB | Stata
Medicine sciences | SPSS | ImageJ | Stata | SAS | MATLAB
Physics | SPSS | MATLAB | Stata | ImageJ | SAS
Research methods | SPSS | ImageJ | Stata | MATLAB | SAS
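One simple way to quantify how widely a piece of software is used across fields is to count the disciplines whose top-5 list includes it. This definition of reach is our assumption and may differ from the one behind the "Extensive reach" slide; the data are the rows of the table above.

```python
from collections import Counter

# Top 5 most mentioned software per discipline, from the table above.
TOP5_MENTIONED = {
    "Agriculture": ["SPSS", "MEGA", "BLAST", "JMP", "SAS"],
    "Biology": ["SPSS", "ImageJ", "SAS", "MATLAB", "BLAST"],
    "Chemistry": ["SPSS", "SigmaPlot", "ImageJ", "SAS", "AMBER"],
    "Computer science": ["MATLAB", "SPM", "Pfam", "SPSS", "PSI-BLAST"],
    "Earth sciences": ["SPSS", "ArcGIS", "SAS", "Mothur", "MEGA"],
    "Ecology": ["SPSS", "ArcGIS", "VEGAN", "QIIME", "MEGA"],
    "Engineering": ["MATLAB", "SPSS", "ImageJ", "SPM", "SAS"],
    "Mathematics": ["SAS", "SPM", "SPSS", "MATLAB", "Stata"],
    "Medicine sciences": ["SPSS", "ImageJ", "Stata", "SAS", "MATLAB"],
    "Physics": ["SPSS", "MATLAB", "Stata", "ImageJ", "SAS"],
    "Research methods": ["SPSS", "ImageJ", "Stata", "MATLAB", "SAS"],
}

# Number of disciplines in whose top 5 each software appears.
reach = Counter(sw for row in TOP5_MENTIONED.values() for sw in row)
for software, n in reach.most_common(4):
    print(f"{software}: top 5 in {n} of {len(TOP5_MENTIONED)} disciplines")
```

By this count, SPSS is in the top 5 of all 11 disciplines, SAS of 9, MATLAB of 7, and ImageJ of 6.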

Top 5 most cited software | 27
Discipline | 1 | 2 | 3 | 4 | 5
Agriculture | MEGA | BLAST | Mothur | STRUCTURE | PHYLIP
Biology | MEGA | ImageJ | BLAST | MUSCLE | ClustalW
Chemistry | Modeller | AMBER | Refmac | MOE | PHENIX
Computer science | PSI-BLAST | MATLAB | SPM | Weka | GROMACS
Earth sciences | MEGA | MaxEnt | ArcGIS | Mothur | VEGAN
Ecology | VEGAN | MEGA | ARLEQUIN | Mothur | MaxEnt
Engineering | SPM | ImageJ | MATLAB | FSL | SVS
Mathematics | SPM | STAR | PSI-BLAST | TAC | EMBOSS
Medicine sciences | ImageJ | Stata | MEGA | SPM | PLINK
Physics | VMD | MATLAB | AMBER | Refmac | ImageJ
Research methods | ImageJ | Stata | MATLAB | BWA | TopHat
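Comparing the two top-5 tables discipline by discipline shows how often the most mentioned software is also the most cited. The sketch below intersects the two lists for five disciplines copied from the tables; the choice of disciplines and the variable names are illustrative.

```python
# discipline: (top 5 by mentions, top 5 by citations), copied from the two tables above.
TOP5 = {
    "Agriculture": (["SPSS", "MEGA", "BLAST", "JMP", "SAS"],
                    ["MEGA", "BLAST", "Mothur", "STRUCTURE", "PHYLIP"]),
    "Biology": (["SPSS", "ImageJ", "SAS", "MATLAB", "BLAST"],
                ["MEGA", "ImageJ", "BLAST", "MUSCLE", "ClustalW"]),
    "Chemistry": (["SPSS", "SigmaPlot", "ImageJ", "SAS", "AMBER"],
                  ["Modeller", "AMBER", "Refmac", "MOE", "PHENIX"]),
    "Physics": (["SPSS", "MATLAB", "Stata", "ImageJ", "SAS"],
                ["VMD", "MATLAB", "AMBER", "Refmac", "ImageJ"]),
    "Research methods": (["SPSS", "ImageJ", "Stata", "MATLAB", "SAS"],
                         ["ImageJ", "Stata", "MATLAB", "BWA", "TopHat"]),
}

# Software that is both among the 5 most mentioned and the 5 most cited.
overlap = {d: sorted(set(mentioned) & set(cited)) for d, (mentioned, cited) in TOP5.items()}
for discipline, shared in overlap.items():
    print(f"{discipline}: {shared}")
```

In these five disciplines the overlap ranges from one entity (AMBER in Chemistry) to three (ImageJ, MATLAB, and Stata in Research methods). SPSS is never among them, even though it is the most mentioned software in four of the five.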

Future work | 28

Future work: Software impact | 29
– software altmetrics: number of mentions in social media, number of downloads, etc.
– full-spectrum software impact metrics covering both formal and informal scholarly communication channels
– attribution and scientific rewards: the different roles that software developers fulfill

Future work: Data impact | 30
– design case studies to examine data provenance (e.g., with DOI, with URI, institutional archives, journal-specific archives, and private archives)
– cross-reference with citation data from the Data Citation Index by Thomson Reuters
– triangulate data impact using usage, mention, and citation statistics
– add time as another dimension to conduct trend analyses and predictions

Deliverables | 31
– Pan, X., Yan, E., Wang, Q., & Hua, W. (2015). Assessing the impact of software on science: A bootstrapped learning of software entities in full-text papers. Journal of Informetrics, 9(4),
– Yan, E., & Pan, X. (2015). A bootstrapping method to assess software impact in full-text papers (Poster). In Proceedings of the 15th International Conference on Scientometrics and Informetrics (ISSI 2015), June 29-July 4, 2015, Istanbul, Turkey.
– Our program:
– Research supported by
