XSEDE TAS Scientific Impact and FutureGrid Lessons
Gregor von Laszewski (IU), Fugang Wang (IU), Geoffrey C. Fox (IU), Steve Gallo (UB) & Tom Furlani (UB)
Presentation: Improving the Link Between Publications & User Facilities, ORNL teleconference, Thursday, Jan , more than 12 participants. Organizer: Terry Jones, ORNL.

Agenda
– Objective
– Approach
– How we obtained the data
– The metrics derived
– Software system design and implementation
– Results
– Future plans and discussion

Objective
– Provide information to the funding agency and XSEDE management about the scientific impact of research conducted with XSEDE resources.
– Assist in collecting that information semi-automatically.
The objective seems similar for DOE:
– Provide information to the funding agency and DOE management about the scientific impact of research conducted with DOE resources.
– Differences: we can federate based on publication requirements between DOE Labs and preprint databases; the scope extends not only to publications but also to possible datasets (NeXus, …); resources are not just supercomputers, but can also be a beamline, an experimental setup, or a data collection.

TAS Objective - Measurement
Measure the scientific impact of XSEDE as a single entity:
– How many publications were produced by XSEDE users/projects;
– How many citations those publications received;
– Other metrics.
Measure how the impact metrics of individual users, projects, fields of science, resources, etc. compare to each other:
– When evaluating a proposal request, what are the criteria for judging whether the proposal will potentially lead to good research and broader impact, and how do we get metrics to back this up?
– When correlating the impact metrics with the resources allocated (or consumed), how does one project or field of science (FOS) compare to its peers?

FutureGrid Objective - Collection
Assist in collecting results as part of user management. Simplify the input of publication data. Allow a wide variety of input formats.
Problem:
– Users have lots of other things to do and avoid reporting.
– Users' affiliations may change, and reports are incomplete.

Approach
Get the relevant publication and citation data:
– All publications authored by XSEDE users: Google, Microsoft Academic Search, ISI, NSF award search data.
– Publications identified as related to XSEDE (i.e., resulting from the use of XSEDE resources): publications uploaded by users via the XSEDE portal.
Use the publication and citation data to derive metrics for scientific output and impact.

Data Acquisition
Publication data, automatic approach:
o Mining the NSF award search data provided by NSF;
o Utilizing services from Google Scholar, Microsoft Academic Search, etc.;
o Mashing up data from the different sources.
Publication data, requiring user input:
o The FG portal pioneered a means for users to upload their publication data.
o The XD portal now also provides a means for users to upload their publication data; however, the data gathered so far is very limited.
o We offer a service interface to the XD portal exposing the publication data we obtained, so users have an easier way to populate and confirm their publication data (the XSEDE portal team is developing the UI to integrate this service).
o Users provide their public profile id in a 3rd-party online bibliography management system such as Google Scholar, and we then do the retrieval automatically.
Citation data: from Google Scholar and from ISI Web of Science.
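To illustrate the mashup step, the sketch below merges publication records from several sources by a normalized title key. This is a minimal sketch, not the actual TAS schema: the record fields and source names are hypothetical.

```python
import re
from collections import defaultdict

def normalize_title(title):
    """Lowercase and strip punctuation so near-duplicate titles
    from different sources map to the same key."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def mashup(*sources):
    """Merge publication records from several (name, records) feeds
    into one record per distinct title, keeping each source's
    citation count side by side."""
    merged = defaultdict(dict)
    for source_name, records in sources:
        for rec in records:
            entry = merged[normalize_title(rec["title"])]
            entry.setdefault("title", rec["title"])
            entry.setdefault("authors", rec.get("authors", []))
            entry.setdefault("citations", {})[source_name] = rec.get("citations", 0)
    return list(merged.values())

# Hypothetical feeds: the same paper seen via NSF award data and Google Scholar.
nsf = [{"title": "A Study of X", "authors": ["A. Smith"], "citations": 0}]
scholar = [{"title": "A study of X.", "authors": ["A. Smith"], "citations": 12}]
print(mashup(("nsf", nsf), ("google_scholar", scholar)))
```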

Metrics
Intuitive metrics: number of publications, number of citations.
H-index
– Derived from both productivity (quantity of papers published) and impact (based on citations).
– h is the largest number such that h papers each have at least h citations.
– Proposed by J. E. Hirsch in 2005.
– H-index(m) compares veteran researchers with junior researchers.
G-index
– Similar to the h-index, but it uses average citations, so you get rewarded for a paper with very high citations.
– Proposed by Leo Egghe in 2006.
Other metrics
– i10-index (number of publications with at least 10 citations).
– Metrics from only recent publications (last 5 years): does a researcher's recent work keep up with the good research he/she usually does?
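As a concrete illustration of these definitions, here is a minimal Python sketch (not part of the TAS code base) computing the h-index, g-index, and i10-index from a plain list of per-paper citation counts:

```python
def h_index(citations):
    """h = largest h such that h papers have at least h citations each."""
    cites = sorted(citations, reverse=True)
    return sum(1 for i, c in enumerate(cites, start=1) if c >= i)

def g_index(citations):
    """g = largest g such that the top g papers together have at least
    g^2 citations (equivalently, their average is at least g), which
    rewards a few very highly cited papers."""
    cites = sorted(citations, reverse=True)
    total, g = 0, 0
    for i, c in enumerate(cites, start=1):
        total += c
        if total >= i * i:
            g = i
    return g

def i10_index(citations):
    """Number of publications with at least 10 citations."""
    return sum(1 for c in citations if c >= 10)

papers = [33, 30, 20, 15, 7, 6, 5, 4]
print(h_index(papers), g_index(papers), i10_index(papers))  # 6 8 4
```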

Software Design and Implementation
– Pluggable data sources via mining databases and/or accessing 3rd-party service APIs.
– Mashup database providing a common interface to collaborating systems such as XDMoD.
– Service layer and web presentation.
– The core system code base is in Python. This would allow integration with LDAP, DOE certs, OpenID, …
– Uses a REST framework for the service interface and the web GUI.
– MySQL is the currently adopted database solution, but we will be using NoSQL alternatives where appropriate.
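The slides specify only a Python code base with a REST service layer backed by MySQL. As a hedged sketch of what such an endpoint could look like (Flask is an assumption, and the route and in-memory data layout are hypothetical stand-ins for the mashup database):

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical in-memory stand-in for the MySQL-backed mashup database.
METRICS = {
    "user42": {"npubs": 31, "ncited": 410, "hindex": 11, "gindex": 19},
}

@app.route("/metrics/user/<user_id>")
def user_metrics(user_id):
    """Return impact metrics for one user, as they might be consumed
    by the XDMoD integration or the XSEDE portal UI."""
    metrics = METRICS.get(user_id)
    if metrics is None:
        return jsonify(error="unknown user"), 404
    return jsonify(user=user_id, **metrics)

if __name__ == "__main__":
    app.run(port=8080)
```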

Results – Impact in general
– Obtained 122k publication entries for all XSEDE users from the Nov 2012 NSF award search data.
– Citation data from Google Scholar, and metrics based on it, are available for all XD PIs active (based on XD resource usage) in 2012 (1469 in total). This accounts for 27.8% of all publications collected, or ~34k out of ~122k.
– As an alternative, citation count retrieval from ISI Web of Science has been completed for all publications.
Data source disclaimer:
– The NSF award search data runs through October 2012.
– The citation data were obtained from Google Scholar.
– The user information was obtained from XDcDB.
– The usage data were obtained from XDMoD.

Results – Impact XD related only
– XD users: 830
– Organizations: 212
– XSEDE projects: 290
– Number of publications: 757
– Total citations received by these publications:
(User-reported publications via the XD portal, as of Dec 16, 2013)

Results – Impact metrics vs XD allocations
– Limited correlation observed between allocations and metrics (npubs, ncited, hindex) at the individual project level.
– Correlation at the Field of Science (FOS) level: R² = 0.55.
– Dot/circle size is proportional to the number of projects in that FOS (size).
– This suggests that FOS size contributes to the linear relationship.
– The allocation distribution is approximately lognormal when using the average per project within each FOS.
– [Figure: FOS metrics vs. allocation scatter plot]
– Dataset too small?
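A minimal sketch of the FOS-level correlation check described above, using SciPy's linear regression. The per-FOS numbers here are illustrative placeholders, not the TAS data; the log transform reflects the slide's observation that allocations look roughly lognormal.

```python
import numpy as np
from scipy.stats import linregress

# Hypothetical per-FOS aggregates: allocation (service units) vs. publications.
alloc = np.array([1.2e6, 3.4e6, 5.0e5, 8.9e6, 2.1e6])
npubs = np.array([140, 410, 60, 980, 250])

# Regress publication counts on log-allocation and report R^2.
fit = linregress(np.log10(alloc), npubs)
print(f"R^2 = {fit.rvalue**2:.2f}")
```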

Achievements
Constructed a UNIQUE mashup database containing the consolidated data:
– Mined NSF award search data and retrieved publications for all XD users (122k).
– Fetching citation data for some publications via Google Scholar (~30% done).
– Fetched citation data for all publications via ISI Web of Science.
– Fetched publication data from XDcDB (757 entries as of Dec 2013).
Defined and calculated metrics (# of pubs, # of citations, h-index, g-index, etc.) for a portion of users as a proof of concept:
– Impact in general: completed for all PIs who had active usage in 2012.
– XD related: based on all currently available user-uploaded publications (757 of them as of Dec 2013).
Data is presented via the REST service framework:
– Planned to be integrated within the XDMoD framework.
Conducted correlation analyses of the metrics vs. the allocation for users, projects, and Field of Science.

Ongoing work
– Visualization of the complex connections: users/authors, projects, FOS, etc.
– Insight from correlating our collected data with other data sources (e.g., some data from our collaborator at Clemson).
– Name ambiguity as a challenge when trying to utilize individual-level general impact data: social networks, …
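To illustrate why name ambiguity is hard, consider a naive matching heuristic (purely illustrative, not the TAS approach, which the slides leave open) that keys authors on surname plus first initial. It merges name variants of the same person, but also conflates distinct authors:

```python
def name_key(author):
    """Naive disambiguation key: (surname, first initial)."""
    if "," in author:                      # "Wang, Fugang"
        last, first = [p.strip() for p in author.split(",", 1)]
    else:                                  # "Fugang Wang" or "F. Wang"
        parts = author.split()
        first, last = parts[0], parts[-1]
    return (last.lower(), first[0].lower())

for a in ["Fugang Wang", "F. Wang", "Wang, Fugang", "Fei Wang"]:
    print(a, "->", name_key(a))
# All four map to ('wang', 'f'): the first three are the same person,
# the fourth is not, which is why naive keys are insufficient and
# additional signals (affiliation, co-author networks) are needed.
```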

Can we adapt it for DOE? Yes.
– REST service: independent UI; a simple UI is provided as a prototype by IU.
– User management: DOE certs, OpenID, registration process for users at beamlines.
– We could support more than publications: data sets, experiments, NeXus, …; full-text search required.
– Integration with the DOE publication departments at the Labs.

Screenshots

Cloud Metrics
– Runtime data: what do users/projects do on the current system?
– Will be coupled with the impact metrics to give system staff hints about users.