Published by Allison Armstrong
1
The OpenAIRE Infrastructure
On Measuring Research Impact: The EGI Use Case
Paolo Manghi, paolo.manghi@isti.cnr.it
Natalia Manola, natalia@di.uoa.gr
2
Outline
- The what and how of OpenAIRE
- Supporting research communities
  - Contexts, categories and concepts
  - User input
  - Results and analytics
- Looking ahead
Developing the Open Science Commons - Sept 25, Amsterdam
3
OpenAIRE in a nutshell
European data infrastructure for scholarly communication
- Facilitates discovery of research outcomes across disciplines
- Promotes & implements Open Access
- Interlinks and contextualizes research outcomes
- Integrates publication, data and software repositories and CRIS systems
Monitoring research outputs and measuring research impact
- Open Access policy evaluation
- Funding schemes: return on investment through impact
- Research initiatives: research impact
Providing both the human and the technical infrastructure to make this possible!
4
[Figure: OpenAIRE infrastructure overview. Data sources: institutional & thematic publication repositories (460 validated repositories), Open Access journals, data repositories, data journals, institutional CRIS systems and the CERN/OpenAIRE "catch-all" repository, supplying metadata, PDFs and usage data (8,700,000 OA publications). Researchers deposit in institutional or thematic repositories, publish in OA journals and publish data, following the OpenAIRE guidelines (including guidelines for data interoperability). Processing: check full compliance, mine for projects and other information, de-duplicate, link, enrich. Resulting entities: organizations, projects, authors, datasets, publications, data providers. Services: search & browse, deposit publications & data, curate & collaborate, visualize and manage enhanced publications, linked content, research impact (citations, usage statistics), APIs, support via NOADs, and services for project coordinators and funders (EC and national funding), backed by infrastructure coordination.]
5
Added value: an integrated scientific information system
- 8.7 million publications
- 7 million authors
- 460+ data providers
- 90K publications linked to projects
- 2 funders
- 700 datasets linked to publications
- 33K organizations
- 2731 publications linked to EGI
Entities: organizations, projects, authors, datasets, publications, data providers, research communities
6
BEHIND THE SCENES
7
Internal data flow
[Figure: Harvesting (data source import) and end-user claims feed the Native Information Space; de-duplication produces the Public Information Space; off-line inferring (data inference) and human data curation produce the Enriched Information Space, which the OpenAIRE Portal uses for discovery and impact measures.]
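The slide does not say how the de-duplication stage works. As a minimal sketch, assuming harvested records are dicts with hypothetical `title` and `source` fields, records can be grouped on a normalized title key and merged so that the public record keeps every source it was harvested from. Real de-duplication usually compares more fields (authors, identifiers, dates) than titles alone.

```python
import re
import unicodedata
from collections import defaultdict

def normalize_title(title):
    """Crude match key: strip accents, lowercase, keep only letters and digits."""
    decomposed = unicodedata.normalize("NFKD", title)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", " ", ascii_only.lower()).strip()

def deduplicate(records):
    """Merge records whose normalized titles coincide (hypothetical record layout)."""
    groups = defaultdict(list)
    for rec in records:
        groups[normalize_title(rec["title"])].append(rec)
    merged = []
    for group in groups.values():
        merged.append({
            "title": group[0]["title"],                       # keep the first title seen
            "sources": sorted({r["source"] for r in group}),  # remember every provenance
        })
    return merged
```

For example, "WeNMR: Structural Biology on the Grid" harvested from one repository and "WeNMR – structural biology on the grid" from another collapse into a single record with both sources.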
8
RESEARCH ANALYTICS
9
Research output measures: monitoring OA policy
FP7: 66K publications, 7.5K projects
[Charts: FP7 timeline (total); FP7 breakdowns]
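One simple measure behind monitoring an OA policy is the share of a programme's publications that are openly accessible, tracked over time. A minimal sketch, assuming each publication record has hypothetical `year` and boolean `open_access` fields; the slide does not define the exact measures OpenAIRE computes.

```python
from collections import Counter

def oa_share_by_year(publications):
    """Return {year: percentage of that year's publications flagged Open Access}."""
    totals = Counter()
    open_access = Counter()
    for pub in publications:
        totals[pub["year"]] += 1
        if pub["open_access"]:
            open_access[pub["year"]] += 1
    return {year: 100.0 * open_access[year] / totals[year] for year in sorted(totals)}

pubs = [
    {"year": 2012, "open_access": True},
    {"year": 2012, "open_access": False},
    {"year": 2013, "open_access": True},
]
print(oa_share_by_year(pubs))  # {2012: 50.0, 2013: 100.0}
```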
10
Classification
Text mining: supervised techniques
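The slide does not name the supervised technique used. As a stand-in illustration only, a multinomial naive Bayes classifier shows the general idea: learn word statistics per class from labelled publications, then assign new texts to the most probable class. The labels and training texts below are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def train(docs):
    """Fit a multinomial naive Bayes model on (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        tokens = tokenize(text)
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def classify(model, text):
    """Return the label with the highest log-posterior, using add-one smoothing."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(count / n_docs)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training carry no evidence
                score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Working in log space avoids numeric underflow when multiplying many small probabilities over long full texts.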
11
Beyond the obvious
Text mining: unsupervised techniques (topic modeling)
- Research trends
- Structural effects
- Interactive graphs providing an overview
Example 1: FP7 programmes connected through scientific publications
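The slide does not state how the links between programmes are computed. One plausible reading, assumed here, is that two FP7 programmes are connected when publications are linked to projects from both, with the edge weight counting the shared publications. A topic-based variant would instead compare the programmes' topic distributions. The programme codes below are hypothetical input.

```python
from collections import Counter
from itertools import combinations

def programme_graph(pub_programmes):
    """pub_programmes maps a publication id to the set of programmes it is linked to.

    Returns a Counter mapping each (programme_a, programme_b) pair, sorted
    alphabetically, to the number of publications the two programmes share.
    """
    edges = Counter()
    for programmes in pub_programmes.values():
        for a, b in combinations(sorted(programmes), 2):
            edges[(a, b)] += 1
    return edges

graph = programme_graph({
    "pub1": {"ICT", "HEALTH"},
    "pub2": {"ICT", "HEALTH", "ENERGY"},
    "pub3": {"ENERGY"},
})
print(graph[("HEALTH", "ICT")])  # 2
```

The resulting weighted edge list can be fed directly to an interactive graph viewer.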
12
Example 2: How FP7 programme areas are related
13
EGI & OPENAIRE
1-year pilot ended in May 2014
Official service release: Oct 2014 at www.openaire.eu
14
Supporting communities
- Enriched OpenAIRE data model
  - Context (e.g. "EGI")
  - Category ("Virtual Organizations")
  - Concept ("alice")
- Text mining algorithms tailored to community needs, integrated into the OpenAIRE text mining framework
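The context/category/concept hierarchy can be pictured as a small tree. A minimal sketch with hypothetical class and field names (not the actual OpenAIRE schema): a context such as "EGI" holds categories such as "Virtual Organizations", each holding concepts such as "alice".

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    id: str
    label: str

@dataclass
class Category:
    id: str
    label: str
    concepts: list = field(default_factory=list)  # list of Concept

@dataclass
class Context:
    id: str
    label: str
    categories: list = field(default_factory=list)  # list of Category

    def find_concept(self, concept_id):
        """Return (category, concept) for a concept id, or None if it is absent."""
        for category in self.categories:
            for concept in category.concepts:
                if concept.id == concept_id:
                    return category, concept
        return None

egi = Context("egi", "EGI", [
    Category("vo", "Virtual Organizations", [Concept("alice", "ALICE")]),
])
```

Keeping concepts under a category lets a publication tagged "alice" be reported both per VO and in aggregate for the whole context.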
15
What OpenAIRE does (behind the scenes)
- Extract the full text from publications; if the text is structured, use the "funding" and "acknowledgements" fields
- Scan the text for matches against any of the EGI organization names provided
- For each match, search the surrounding context for general terms and suggested acknowledgements (using word pairs), to attach a confidence value to the match and eliminate false matches
- For EC projects, search not only for the project acronym (e.g. EGI-InSPIRE) but also for the grant ID (261323)
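The matching steps above can be sketched as follows, under stated assumptions: names are matched as single tokens (such as "EGI-InSPIRE"), the list of general terms is hypothetical, and confidence is simply the fraction of general terms found within a fixed window of words around the match. Matches below a threshold are dropped as likely false positives. OpenAIRE's actual scoring is not given in the slides.

```python
import re

# Hypothetical general terms; the real lists are not published in the slides.
GENERAL_TERMS = {"grid", "infrastructure", "acknowledge", "acknowledges", "resources", "supported"}

def find_mentions(text, names, grant_ids, window=10, min_confidence=0.2):
    """Return (matched token, confidence) pairs for organisation names or grant IDs.

    Confidence is the fraction of GENERAL_TERMS appearing within `window` tokens
    on either side of the match; matches below `min_confidence` are discarded.
    """
    tokens = re.findall(r"[\w-]+", text.lower())
    targets = {name.lower() for name in names} | {str(g) for g in grant_ids}
    mentions = []
    for i, token in enumerate(tokens):
        if token in targets:
            context = set(tokens[max(0, i - window): i + window + 1])
            confidence = len(context & GENERAL_TERMS) / len(GENERAL_TERMS)
            if confidence >= min_confidence:
                mentions.append((token, confidence))
    return mentions
```

Matching the grant ID as well as the acronym catches acknowledgements that cite only the contract number.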
16
How to identify EGI
Identify publications associated with EGI in terms of:
- Association with EGI projects
  - Publication "enabledBy EGI:XYZ"
  - Publication "supportedBy EGI:XYZ"
- Association with a certain Virtual Organisation (VO) or National Grid Initiative (NGI)
  - Publication "used EGI"
  - Publication "used NGI:XYZ"
  - Publication "producedBy VO:XYZ"
- Association with a certain EGI scientific discipline
  - Publication "related to EGI Scientific Discipline:XYZ"
Sources: text mining on PDFs from repositories, publisher metadata
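The associations above read naturally as (publication, relation, target) triples, with targets written as "KIND:value". A minimal sketch that indexes such triples by target; the relation name `relatedTo` (for "related to") and the triple layout are assumptions made for illustration.

```python
from collections import defaultdict

# Relation names from the slide; "relatedTo" stands in for "related to" (assumption).
RELATIONS = {"enabledBy", "supportedBy", "used", "producedBy", "relatedTo"}

def parse_target(target):
    """Split "VO:alice" into ("VO", "alice"); a bare "EGI" becomes ("EGI", None)."""
    kind, _, value = target.partition(":")
    return kind, (value or None)

def index_links(links):
    """Group publication ids by (target kind, target value)."""
    index = defaultdict(set)
    for pub, relation, target in links:
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        index[parse_target(target)].add(pub)
    return index

idx = index_links([
    ("pub1", "producedBy", "VO:alice"),
    ("pub2", "used", "NGI:PT"),
    ("pub3", "producedBy", "VO:alice"),
    ("pub4", "used", "EGI"),
])
```

With this index, counting the publications produced by one VO or supported through one NGI is a single lookup.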
17
What EGI community should do
STEP 1: Use proper acknowledgement in the publication

Organisation name: WeNMR | Type: EC Project | Grant ID: 261572
Suggested acknowledgement: "The WeNMR project (European FP7 e-Infrastructure grant, contract no. 261572, www.wenmr.eu), supported by the European Grid Initiative (EGI) through the national GRID Initiatives of Belgium, France, Italy, Germany, the Netherlands (via the Dutch BiG Grid project), Portugal, Spain, UK, South Africa, Taiwan and the Latin America GRID infrastructure via the Gisela project is acknowledged for the use of web portals, computing and storage facilities." In addition, the following article describing the WeNMR portals should be cited: Wassenaar et al. (2012). WeNMR: Structural Biology on the Grid. J. Grid. Comp., 10:743-767.

Organisation name: EGI-InSPIRE | Type: EC Project | Grant ID: 261323
Suggested acknowledgement: "The authors acknowledge the use of resources provided by the European Grid Infrastructure. For more information, please reference the EGI-InSPIRE paper (http://go.egi.eu/pdnon)."

Organisation name: ALICE | Type: VO | Grant ID: n/a
Suggested acknowledgement: "The ALICE collaboration gratefully acknowledges the resources and support provided by all Grid centres and the Worldwide LHC Computing Grid (WLCG) collaboration."

Organisation name: LHCb | Type: VO | Grant ID: n/a
Suggested acknowledgement: "The Tier1 computing centres are supported by IN2P3 (France), KIT and BMBF (Germany), INFN (Italy), NWO and SURF (The Netherlands), PIC (Spain), GridPP (United Kingdom). We are thankful for the computing resources put at our disposal by Yandex LLC (Russia), as well as to the communities behind the multiple open source software packages that we depend on."

Organisation name: NGI:PT | Type: NGI | Grant ID: n/a
Suggested acknowledgement: "This work makes use of results produced with the support of the Portuguese National Grid Initiative. More information in https://wiki.ncg.ingrid.pt"
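Because authors rarely copy these suggested acknowledgements verbatim, an exact string match would miss most of them. A minimal sketch of the word-pair idea, assuming the score is the fraction of the suggested text's adjacent word pairs (bigrams) that also occur in the candidate text; the exact formula OpenAIRE uses is not given.

```python
import re

def word_pairs(text):
    """Set of adjacent lowercase word pairs (bigrams) in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return set(zip(words, words[1:]))

def acknowledgement_score(candidate, suggested):
    """Fraction of the suggested acknowledgement's word pairs found in the candidate."""
    expected = word_pairs(suggested)
    if not expected:
        return 0.0
    return len(expected & word_pairs(candidate)) / len(expected)

suggested = ("The authors acknowledge the use of resources provided by "
             "the European Grid Infrastructure.")
candidate = ("We acknowledge the use of resources provided by the European "
             "Grid Infrastructure (EGI).")
score = acknowledgement_score(candidate, suggested)  # 10 of 12 pairs match
```

Word pairs tolerate reworded openings and added text while still rewarding the distinctive phrases of the suggested wording.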
18
What EGI community should do
STEP 2
Option 1: follow the OpenAIRE guidelines
- Publish in an OA journal or deposit in an OA repository, preferably one compatible with the OpenAIRE 2.0+ guidelines (i.e., able to link publications to funding)
Option 2: use the OpenAIRE portal "claiming" service
- Associate any publication (within OpenAIRE or not) with EGI
- Relate it to additional EGI information: VO, classification, relationship
19
User Input
20
21
What does it look like?
22
Aggregated statistics
23
Lessons learned & best practices
- Mandates on how to write acknowledgements are crucial but often missing.
- Collect beforehand as much information as possible that may help with the mining. Even information you don't expect to help may prove useful in the end.
- Clean and normalize your input data (character encoding, stop-word removal, character case, special characters, etc.).
- Design your data mining methods to be very tolerant. In our case, the suggested acknowledgements never appeared in the input texts exactly as given.
- Curate the results manually to tune your data mining methods. Yes, it is very labor-intensive, but without it you will be blind to your mistakes.
- Design and implement your data processing to work in a streamed fashion and to perform well. A streamed design solves the "data bigger than memory" problem; a performance-oriented design solves the "waiting a week for results" problem.
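The normalization and streaming advice above can be combined in a few lines. A minimal sketch: the stop-word set is an illustrative subset, not the list OpenAIRE used, and the generator processes one line at a time so memory use stays flat regardless of input size.

```python
import re
import unicodedata

# Illustrative subset only; a real stop-word list would be much longer.
STOP_WORDS = {"the", "of", "and", "a", "an", "by", "for", "to", "in"}

def normalize(text):
    """Strip accents, fold case, drop special characters and stop words."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    words = re.findall(r"[a-z0-9]+", text.casefold())
    return [w for w in words if w not in STOP_WORDS]

def stream_normalized(lines):
    """Lazily yield normalized tokens for each input line (a generator)."""
    for line in lines:
        yield normalize(line)

tokens = normalize("Grid Infrastructure of Portugal (Ciência)")
```

Since iterating an open text file yields its lines lazily, a file object can be passed straight to `stream_normalized` without loading the whole corpus into memory.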
24
Roadmap
Released:
- Results of inference visible from the portal
- Claim user interfaces available from the portal
Planned:
- Production release ready by 1st October 2014
- Add more communities (e.g., FET)
25
Thank you! Looking forward to your questions and feedback
www.openaire.eu
@openaire_eu
facebook.com/groups/openaire
linkedin.com/groups/OpenAIRE-3893548
paolo.manghi@isti.cnr.it