SPIRES and INSPIRE Travis Brooks SLAC National Accelerator Laboratory INSPIRE Collaboration PPA Computing 1 July 2010.

Presentation transcript:

SPIRES and INSPIRE Travis Brooks SLAC National Accelerator Laboratory INSPIRE Collaboration PPA Computing 1 July 2010

Infrastructure The basic facilities, services and installations needed for the functioning of a community or society wiktionary.org

Community ~30,000 researchers worldwide Questions like: What is the universe made of? What happened 3µs after the Big Bang? Distinction between Theory and Experiment

~15,000 HEP scientists smash stuff at the speed of light to produce new stuff

…and it works! LHC re-discovering known particles for starters. First needles in the haystack: one in a million.

Another 15,000 HEP researchers scratch their heads to make sense of all that stuff and then some more

Community. Experiment: large, global collaborations (> 2000 authors!); big centers of research distributed globally (SLAC, Fermilab, CERN, DESY, KEK). Theory: small, but global collaborations (avg 3 authors); self-contained papers.

1960’s ’s HEP Lab Libraries store paper preprints Distributed via postal mail to major centers “Institute-pays” Open Access “SPIRES” catalogs (and distributes) preprints received at SLAC Centralized, community-driven model Users query SPIRES via terminal login accounts

1990’s CERN invents the WWW. Users query SPIRES at SLAC via the 1st Web Site in the U.S.

1991: arXiv.org

Preprint Culture Connections/trust/expertise Infrastructure from Labs SPIRES, WWW, arXiv Researcher desire for rapid communication

2000’s 2007 survey of 2,000 physicists by CERN, DESY, Fermilab and SLAC. “What is your primary HEP Information Resource?” Gentil-Beccot et al, Information Resources in High-Energy Physics: Surveying the Present Landscape and Charting the Future Course. J.Am.Soc.Inf.Sci.60: ,2009 arXiv:

97% of published literature freely available on arXiv No Mandates – No Debates

Researchers want speed

SPIRES counts: citations to/from preprints/articles Citation peaks at publications Scientific discourse proceeds on discipline repository

Citation Advantage When an arXiv paper is published, it has already surpassed the citation count a non-arXiv paper will have after 2 years

Read Journals? ∼30,000 clicks (choice between arXiv and journal): arXiv 82%, publisher server 18%. As many scientists as analyzed here go straight to arXiv, so 80% arXiv users becomes 90% arXiv users. Gentil-Beccot et al. arxiv:
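The step from 80% to 90% can be checked with one line of arithmetic. This is a sketch under two assumptions that are not stated on the slide: that "as many scientists as analyzed here" means a second group of the same size N who never see the choice and all count as arXiv users, and that the measured 82% is rounded to 80%.

```latex
\[
\frac{0.8\,N + 1.0\,N}{N + N} \;=\; \frac{1.8}{2} \;=\; 0.9
\]
```

With the unrounded 82% the same calculation gives 0.91, which still rounds to the 90% quoted.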

Benefits to Researchers: Centralized discipline-based repository with curated metadata/search. Includes peer-reviewed literature. Links to every known copy: DOIs, URLs, arXiv.

Numbers 834,049 (as of Oct 15) 50,077 (During 2008) 82,719 (Oct 15 - typical) 178 (Last week - typical)

What is SPIRES? Deep, carefully curated metadata Authors, Affiliations, Citations, Keywords Carefully, intentionally limited to HEP Associated community information Conferences, Institutions, People, Jobs

Future of HEP Information: Conversations on arXiv. Noting, but not waiting for, peer review. Blog/wiki-like. Rapid turnaround. Freely accessible content. Community driven. Use technology to tighten this relationship further… with an existing community.

2010 Past 40 Years: Information Infrastructure in response to user needs Community Needs in 2010: Preserve Quality Promote Access Archive Research Artifacts

2010 Past 40 Years: Information Infrastructure in response to user needs Community Needs: Preserve Quality - SCOAP3 Promote Access - INSPIRE Archive Research Artifacts - INSPIRE/HEPData

Quality via Peer Review Peer Review and other journal services currently funded by HEP libraries paying for access...

..to material that is freely available

HEP Open Access LHC scientists (8000 scientists from 54 countries): "We strongly […] support the principles of Open Access Publishing, which includes granting free access of our publications to all. Furthermore, we encourage all our members to publish papers in easily accessible journals, following the principles of the Open Access Paradigm."

SCOAP3 Model An international consortium to convert existing (and new) top-quality HEP journals to OA Libraries re-direct subscriptions to SCOAP3 SCOAP3 pays centrally for peer-review service Price-per-article established by call for tender Articles are (free and libre) Open Access

SCOAP3 Partnerships

SCOAP3 Outlook Reach critical mass Partnership in Asia and Latin America Engage publishers in a call for tender Go/No-Go decision

2010 Past 40 Years: Information Infrastructure in response to user needs Community Needs: Preserve Quality - SCOAP3 Promote Access - INSPIRE Archive Research Artifacts - INSPIRE/HEPData

Future of HEP Information Conversations on arXiv Noting, but not waiting for peer review. Rapid turnaround of freely accessible content Community driven Literature growing more complex Objects that aren’t papers, but are “information” “Datasets”, figures, tables, Computer code Use technology to tighten this relationship further…with an existing community

Guts...

SPIRES System PL360 Emulated in C! SPIRES (non-SQL DBMS + internal scripting language) And the clearest, least obfuscated, best documented part of the code base is......Perl!

INSPIRE Joint Project of CERN, DESY, Fermilab and SLAC Unify SPIRES content with Invenio platform Invenio = Open source digital library

INSPIRE Philosophy: Leverage users. Clean, maintainable, sharable codebase. Open Source/Open Standards. Continue manual curation... but utilize automation and feeds where possible. Utilize person-power to drive user participation and exercise judgement (author ID, classification).

Invenio: Modern System. Stable, modern, extensible software stack (LAMP). Fast, even with large repository. Focused on search. Open Source (GPL) community. Substantial HEP use (CERN, ILC, …). Over 20 production instances worldwide. Modular architecture. Based on open standards: MARCXML, OAI-PMH, etc.

Opportunities Enhanced Search and Discovery Automated classification using taxonomy User tagging Organize your personal papers etc. Run a Journal Club Author identification Claim your papers

User tagging Hidden 20 FTE - Can be utilized via interactive techniques 2007 survey of 2,000 physicists by CERN, DESY, Fermilab and SLAC Gentil-Beccot et al, Information Resources in High-Energy Physics: Surveying the Present Landscape and Charting the Future Course. J.Am.Soc.Inf.Sci.60: ,2009 arXiv:

Who do we know? HEPNames: 80K entries Affiliation history for 20K researchers s for 25K 800K papers with authors and (standardized) affiliations 5M ‘signatures’ on papers 350K unique name strings

Who Automatic Disambiguation Henning Weiler - PhD On 963 documents, 21 real authors could be identified for the query "Chen, G". 22 orphans remain 98% identified
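The slide does not say how the automatic disambiguation works, so the following is only a toy illustration of the general idea, not the method used in the PhD work cited. It assumes a deliberately simple rule: two papers signed with the same name string belong to the same person if they share at least one co-author. The function names and the sample co-author names are hypothetical. The last helper reproduces the slide's 98% figure, assuming it means the share of the 963 documents that are not orphans.

```python
def cluster_signatures(signatures):
    """Group papers signed with one name string by shared co-authors.

    signatures: iterable of (paper_id, coauthor_names) pairs.
    Returns a list of clusters, each a dict with "papers" and "coauthors".
    Toy rule (an assumption): any shared co-author means same person.
    """
    clusters = []
    for paper_id, coauthors in signatures:
        coauthors = set(coauthors)
        # Existing clusters that share at least one co-author with this paper.
        overlapping = [c for c in clusters if c["coauthors"] & coauthors]
        merged = {"papers": [paper_id], "coauthors": coauthors}
        for c in overlapping:
            merged["papers"] = c["papers"] + merged["papers"]
            merged["coauthors"] |= c["coauthors"]
        # Drop the clusters that were absorbed, then add the merged one.
        clusters = [c for c in clusters if not any(c is o for o in overlapping)]
        clusters.append(merged)
    return clusters


def identified_fraction(total_documents, orphans):
    """Share of documents that were assigned to an author."""
    return (total_documents - orphans) / total_documents


# Hypothetical signatures, all for the same ambiguous name string.
SIGNATURES = [
    ("p1", {"Li, H", "Wang, Y"}),
    ("p2", {"Wang, Y", "Zhao, Q"}),
    ("p3", {"Smith, J"}),
    ("p4", {"Smith, J", "Jones, K"}),
    ("p5", {"Nobody, X"}),
]

if __name__ == "__main__":
    for cluster in cluster_signatures(SIGNATURES):
        print(cluster["papers"])
    print(round(100 * identified_fraction(963, 22)))
```

A real system would weigh more evidence than co-authorship (affiliations, subject, dates) and would leave ambiguous cases for the manual vetting mentioned elsewhere in the talk.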

User Accounts Tied to academic affiliation...and ORCID.... Ability to correct information and claim papers Corrections still vetted by staff

Sources Source of 2008 additions Many papers have information from multiple sources Many arXiv papers will be published later

arXiv OAI-PMH Feed Rough Metadata (author/title/id) LaTeX and/or PDF parsing Citations, Authors, Affiliations, Keywords Parsed by Perl/Python Checked (or redone) by Humans
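The "rough metadata" step of an OAI-PMH feed can be sketched as follows. This is an illustration using only the Python standard library, not the SPIRES or INSPIRE harvesting code. The `ListRecords` verb, the `metadataPrefix`, `from` and `set` arguments, and the XML namespaces come from the OAI-PMH and Dublin Core standards; the base URL, the function names and the sample response are hypothetical.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical response, trimmed to the elements used below.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:0001</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example preprint</dc:title>
          <dc:creator>Author, A.</dc:creator>
          <dc:creator>Coauthor, B.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def rough_metadata(xml_text):
    """Extract id, title and authors from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "id": rec.findtext("oai:header/oai:identifier", default="", namespaces=NS),
            "title": rec.findtext(".//dc:title", default="", namespaces=NS),
            "authors": [el.text for el in rec.iterfind(".//dc:creator", NS)],
        })
    return records


if __name__ == "__main__":
    # The base URL is a placeholder, not a real endpoint.
    print(list_records_url("https://example.org/oai2", from_date="2010-07-01"))
    print(rough_metadata(SAMPLE_RESPONSE))
```

Citations, affiliations and keywords are not part of this rough record; as the slide says, they come from parsing the LaTeX or PDF and are then checked by humans.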

Journals

Publishers: APS (Phys.Rev.D, Phys.Rev.Lett.), Elsevier (Phys.Lett.B, Nucl.Phys.B), Springer (Eur.Phys.J.C, JHEP (>2010)), IOP (J.Phys.G, JHEP (<2010))

Feeds APS in OAI-PMH Full Metadata + References Elsevier, Springer, JHEP In-house XML via FTP Rich Metadata, Most with References Fall back to screen-scraping HTML

Users In 2008: 173 Added papers directly from users 3,800 Papers with user updates/corrections to reference lists 4,000 User updated profiles (institutional history, etc)

Export: DOIs and publication information to arXiv; bidirectional exchange of XML with ADS. Currently: rough “API” with in-house XML formats for physicists building apps. INSPIRE: OAI-PMH interface, rich API, NLM DTD, MARCXML.
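For a physicist building an app against a MARCXML export, reading a record might look like the sketch below. It is not the INSPIRE API: the sample record and the helper names are hypothetical, and it assumes only the standard MARC 21 conventions (namespace `http://www.loc.gov/MARC21/slim`, title in field 245 subfield a, first author in 100 subfield a, further authors in 700 subfield a).

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical record, trimmed to the fields used below.
SAMPLE_RECORD = """
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="a">Author, A.</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">An example title</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Coauthor, B.</subfield>
  </datafield>
</record>
""".strip()


def subfields(record, tag, code):
    """Return the text of every subfield `code` in every datafield `tag`."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [el.text for el in record.iterfind(path, MARC_NS)]


def summarize(xml_text):
    """Pull title and author list out of a single MARCXML record."""
    record = ET.fromstring(xml_text)
    titles = subfields(record, "245", "a")
    return {
        "title": titles[0] if titles else None,
        "authors": subfields(record, "100", "a") + subfields(record, "700", "a"),
    }


if __name__ == "__main__":
    print(summarize(SAMPLE_RECORD))
```

Working from a standard format like this, rather than an in-house XML dialect, is the practical difference between the "rough API" and the interface described for INSPIRE.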

2010 Past 40 Years: Information Infrastructure in response to user needs Community Needs: Preserve Quality - SCOAP3 Promote Access - INSPIRE Archive Research Artifacts - INSPIRE/HEPData

~15,000 HEP scientists smash stuff at the speed of light to produce new stuff

…and it works! LHC re-discovering known particles for starters. First needles in the haystack: one in a million.

Data: HEPData (Durham U.) stores data “behind” figures/tables, submitted from experiments. INSPIRE partners with HEPData: provides access, linking and deposition in a central community location. Serves the “long tail” of theorists and others with “misc.” materials. Enables access, citation, etc.

Existing Infrastructure

Data Trusted Community Infrastructure Future? DPHEP Study Group Continuing conversation with researchers to develop data preservation strategy

Conclusion Access, Quality, and Artifacts Emerging from community of researchers Aligned with community needs Target what scientists need Quality - Speed - Completeness Building on existing, trusted infrastructures

Infrastructure The basic facilities, services and installations needed for the functioning of a community or society wiktionary.org

Questions? For more information on INSPIRE see