SPIRES and INSPIRE Travis Brooks SLAC National Accelerator Laboratory INSPIRE Collaboration PPA Computing 1 July 2010.

1 SPIRES and INSPIRE Travis Brooks SLAC National Accelerator Laboratory INSPIRE Collaboration PPA Computing 1 July 2010

2 Infrastructure The basic facilities, services and installations needed for the functioning of a community or society wiktionary.org

3 Community ~30,000 researchers worldwide Questions like: What is the universe made of? What happened 3µs after the Big Bang? Distinction between Theory and Experiment

4 ~15,000 HEP scientists smash stuff at the speed of light to produce new stuff

5 …and it works! LHC re-discovering known particles for starters. First needles in the haystack: one in a million.

6 Another 15,000 HEP researchers scratch their heads to make sense of all that stuff and then some more

7 Community Experiment Large, global collaborations (> 2000 authors!) Big centers of research distributed globally SLAC, Fermilab, CERN, DESY, KEK Theory Small, but global collaborations (avg 3 authors) Self-contained papers

8

9 1960’s - 1970’s HEP Lab Libraries store paper preprints Distributed via postal mail to major centers “Institute-pays” Open Access “SPIRES” catalogs (and distributes) preprints received at SLAC Centralized, community-driven model Users query SPIRES via terminal login accounts

10 1990’s CERN Invents WWW Users query SPIRES at SLAC via 1st Web Site in the U.S.

11 1991: arXiv.org

12 Preprint Culture Connections/trust/expertise Infrastructure from Labs SPIRES, WWW, arXiv Researcher desire for rapid communication

13 2000’s 2007 survey of 2,000 physicists by CERN, DESY, Fermilab and SLAC. “What is your primary HEP Information Resource?” Gentil-Beccot et al, Information Resources in High-Energy Physics: Surveying the Present Landscape and Charting the Future Course. J.Am.Soc.Inf.Sci.60:150-160,2009 arXiv:0804.2701

14 97% of published literature freely available on arXiv No Mandates – No Debates

15 Researchers want speed

16 SPIRES counts: citations to/from preprints/articles Citation peaks at publications Scientific discourse proceeds on discipline repository

17 Citation Advantage When an arXiv paper is published, it has already surpassed the citation count a non-arXiv paper will have after 2 years

18 Read Journals? Gentil-Beccot et al. arXiv:0906.5418 arXiv 82%, publisher server 18% (~30,000 clicks; choice between arXiv and journal) As many scientists as analyzed here go straight to arXiv, so 80% arXiv users becomes 90% arXiv users

19 Benefits to Researchers Centralized discipline-based repository with curated metadata/search Includes Peer reviewed literature Links to every known copy dois, urls, arXiv

20 Numbers 834,049 (as of Oct 15) 50,077 (During 2008) 82,719 (Oct 15 - typical) 178 (Last week - typical)

21 What is SPIRES? Deep, carefully curated metadata Authors, Affiliations, Citations, Keywords Carefully, intentionally limited to HEP Associated community information Conferences, Institutions, People, Jobs

22

23

24

25

26

27 Future of HEP Information Conversations on arXiv Noting, but not waiting for peer review. blog/wiki - like Rapid turnaround Freely accessible content Community driven Use technology to tighten this relationship further…with an existing community

28 2010 Past 40 Years: Information Infrastructure in response to user needs Community Needs in 2010: Preserve Quality Promote Access Archive Research Artifacts

29 2010 Past 40 Years: Information Infrastructure in response to user needs Community Needs: Preserve Quality - SCOAP3 Promote Access - INSPIRE Archive Research Artifacts - INSPIRE/HEPData

30 Quality via Peer Review Peer Review and other journal services currently funded by HEP libraries paying for access...

31 ...to material that is freely available

32 HEP Open Access LHC scientists (8000 scientists from 54 countries): "We strongly […] support the principles of Open Access Publishing, which includes granting free access of our publications to all. Furthermore, we encourage all our members to publish papers in easily accessible journals, following the principles of the Open Access Paradigm."

33 SCOAP3 Model An international consortium to convert existing (and new) top-quality HEP journals to OA Libraries re-direct subscriptions to SCOAP3 SCOAP3 pays centrally for peer-review service Price-per-article established by call for tender Articles are (free and libre) Open Access

34 SCOAP3 Partnerships

35 SCOAP3 Outlook Reach critical mass Partnership in Asia and Latin America Engage publishers in a call for tender Go/No-Go decision

36 2010 Past 40 Years: Information Infrastructure in response to user needs Community Needs: Preserve Quality - SCOAP3 Promote Access - INSPIRE Archive Research Artifacts - INSPIRE/HEPData

37 Future of HEP Information Conversations on arXiv Noting, but not waiting for peer review. Rapid turnaround of freely accessible content Community driven Literature growing more complex Objects that aren’t papers, but are “information” “Datasets”, figures, tables, Computer code Use technology to tighten this relationship further…with an existing community

38 Guts...

39 SPIRES System PL360 Emulated in C! SPIRES (non-SQL DBMS + internal scripting language) And the clearest, least obfuscated, best documented part of the code base is......Perl!

40 INSPIRE Joint Project of CERN, DESY, Fermilab and SLAC Unify SPIRES content with Invenio platform Invenio = Open source digital library http://invenio-software.org http://inspirebeta.net

41 INSPIRE Philosophy Leverage Users Clean, maintainable, sharable codebase Open Source/Open Standards Continue manual curation... but utilize automation feeds where possible Utilize person-power to drive user participation and exercise judgement (author ID, classification)

42 Invenio: Modern System Stable, modern, extensible software stack (LAMP) Fast, even with large repository Focused on search Open Source (GPL) community Substantial HEP use (CERN, ILC, …) Over 20 production instances worldwide Modular architecture Based on open standards MARCXML, OAI-PMH, etc
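
To make the MARCXML mention on this slide concrete, here is a minimal sketch of reading a MARCXML record with Python's standard library. The record is invented for illustration, and the helper names `subfields` and `summarize` are hypothetical; nothing here is taken from the Invenio code base. It assumes only the standard MARC 21 slim namespace and the conventional tags 001 (control number), 100 and 700 (authors) and 245 (title).

```python
import xml.etree.ElementTree as ET

# Namespace of the MARC 21 "slim" XML schema.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical record, invented for illustration only.
SAMPLE_MARCXML = """\
<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="001">12345</controlfield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="a">Doe, J.</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">An example title</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Roe, R.</subfield>
  </datafield>
</record>
"""


def subfields(record, tag, code):
    """Return the text of every subfield `code` inside datafields with `tag`."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [el.text for el in record.findall(path, MARC_NS)]


def summarize(marcxml):
    """Extract control number, title and author names from one record."""
    record = ET.fromstring(marcxml)
    return {
        "id": record.findtext("marc:controlfield[@tag='001']", namespaces=MARC_NS),
        "title": subfields(record, "245", "a"),
        # First author lives in 100, additional authors in 700.
        "authors": subfields(record, "100", "a") + subfields(record, "700", "a"),
    }


if __name__ == "__main__":
    print(summarize(SAMPLE_MARCXML))
```

Because every field is addressed by tag and subfield code, the same few lines can read records from any repository that exports MARCXML, which is one practical benefit of building on an open standard.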

43 Opportunities Enhanced Search and Discovery Automated classification using taxonomy User tagging Organize your personal papers etc. Run a Journal Club Author identification Claim your papers
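
As a rough illustration of "automated classification using taxonomy", the sketch below tags a text with every taxonomy label whose terms appear in it. The three-entry taxonomy, the function name `classify` and the hit-counting rule are assumptions made for this example; the slides do not describe the classifier INSPIRE actually uses.

```python
import re

# Hypothetical mini-taxonomy: label -> terms that signal the label.
TAXONOMY = {
    "Higgs physics": ["higgs"],
    "Neutrino physics": ["neutrino", "neutrinos"],
    "Dark matter": ["dark matter", "wimp"],
}


def classify(text, taxonomy, min_hits=1):
    """Return labels whose terms occur at least `min_hits` times, best first."""
    lowered = text.lower()
    scores = {}
    for label, terms in taxonomy.items():
        # Count whole-word (or whole-phrase) occurrences of each term.
        hits = sum(
            len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
            for term in terms
        )
        if hits >= min_hits:
            scores[label] = hits
    # Sort by descending hit count, then alphabetically for stable output.
    return sorted(scores, key=lambda label: (-scores[label], label))


if __name__ == "__main__":
    abstract = (
        "A search for dark matter produced in Higgs boson decays, "
        "with limits on dark matter couplings."
    )
    print(classify(abstract, TAXONOMY))
```

A production classifier would need a curated taxonomy and human review of its suggestions, consistent with the curation-plus-automation philosophy on the earlier slide.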

44

45

46

47

48

49

50 User tagging Hidden 20 FTE - Can be utilized via interactive techniques 2007 survey of 2,000 physicists by CERN, DESY, Fermilab and SLAC Gentil-Beccot et al, Information Resources in High-Energy Physics: Surveying the Present Landscape and Charting the Future Course. J.Am.Soc.Inf.Sci.60:150-160,2009 arXiv:0804.2701

51 Who do we know? HEPNames: 80K entries Affiliation history for 20K researchers Emails for 25K 800K papers with authors and (standardized) affiliations 5M ‘signatures’ on papers 350K unique name strings

52 Who Automatic Disambiguation Henning Weiler - PhD student @ CERN On 963 documents, 21 real authors could be identified for the query "Chen, G". 22 orphans remain. 98% identified
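
The 98% figure follows from the numbers on the slide: (963 - 22) / 963 ≈ 0.977. The sketch below checks that arithmetic and adds a toy clustering step in which papers carrying the same name string are merged into one author whenever they share a coauthor. The clustering rule, the sample data and the function names are assumptions for illustration only; this is not the algorithm used in the work the slide describes.

```python
from collections import defaultdict


def identified_fraction(total_documents, orphans):
    """Fraction of documents that were assigned to some author."""
    return (total_documents - orphans) / total_documents


def cluster_by_coauthors(signatures):
    """Group papers that share at least one coauthor (toy rule, union-find).

    `signatures` maps a paper id to the set of coauthor names on that paper.
    """
    parent = {paper: paper for paper in signatures}

    def find(paper):
        # Walk to the root, shortening the path along the way.
        while parent[paper] != paper:
            parent[paper] = parent[parent[paper]]
            paper = parent[paper]
        return paper

    first_seen = {}  # coauthor name -> first paper on which it appeared
    for paper, coauthors in signatures.items():
        for name in coauthors:
            if name in first_seen:
                parent[find(paper)] = find(first_seen[name])
            else:
                first_seen[name] = paper

    clusters = defaultdict(set)
    for paper in signatures:
        clusters[find(paper)].add(paper)
    return sorted(clusters.values(), key=lambda c: (-len(c), sorted(c)))


if __name__ == "__main__":
    print(f"{identified_fraction(963, 22):.1%}")
    # Hypothetical papers that all carry the same ambiguous name string.
    papers = {
        "P1": {"Doe, J.", "Roe, R."},
        "P2": {"Roe, R."},
        "P3": {"Poe, E."},
        "P4": {"Poe, E.", "Low, K."},
        "P5": set(),  # no coauthors, so it stays an orphan in this toy rule
    }
    print(cluster_by_coauthors(papers))
```

Real disambiguation would weigh more evidence than coauthorship, such as the affiliation histories and email addresses that HEPNames holds.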

53

54

55 User Accounts Tied to academic affiliation...and ORCID.... Ability to correct information and claim papers Corrections still vetted by staff

56 Sources Source of 2008 additions Many papers have information from multiple sources Many arXiv papers will be published later

57 arXiv OAI-PMH Feed Rough Metadata (author/title/id) LaTeX and/or PDF parsing Citations, Authors, Affiliations, Keywords Parsed by Perl/Python Checked (or redone) by Humans
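
The "rough metadata" step can be pictured as pulling the identifier, title and authors out of an OAI-PMH response. The sketch below parses a hand-written response fragment in the Dublin Core (`oai_dc`) format that every OAI-PMH repository must support. The sample record and the function name `rough_metadata` are invented for illustration; this is not the Perl/Python code the slide refers to.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for OAI-PMH 2.0 and its Dublin Core container.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical response fragment, invented for illustration only.
SAMPLE_RESPONSE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:0000.0001</identifier>
        <datestamp>2010-07-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example preprint</dc:title>
          <dc:creator>Doe, J.</dc:creator>
          <dc:creator>Roe, R.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def rough_metadata(response_xml):
    """Return id, title and author list for each record in the response."""
    root = ET.fromstring(response_xml)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "id": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": rec.findtext(".//dc:title", namespaces=NS),
            "authors": [c.text for c in rec.iterfind(".//dc:creator", NS)],
        })
    return records


if __name__ == "__main__":
    for record in rough_metadata(SAMPLE_RESPONSE):
        print(record)
```

Citations, affiliations and keywords are not in this feed, which is why the slide lists LaTeX/PDF parsing and human checking as further steps.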

58 Journals

59 Publishers APS (Phys.Rev.D, Phys.Rev.Lett.) Elsevier (Phys.Lett.B, Nucl.Phys.B) Springer (Eur.Phys.J.C, JHEP (>2010)) IOP (J.Phys.G, JHEP (<2010))

60 Feeds APS in OAI-PMH Full Metadata + References Elsevier, Springer, JHEP In-house XML via FTP Rich Metadata, Most with References Fall back to screen-scraping HTML
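
The ordering on this slide (structured feed first, publisher XML next, screen-scraping last) amounts to a fallback chain. The sketch below shows one way to express that; the three source functions are stubs with hypothetical names and canned behaviour, standing in for real harvesters that the slides do not describe.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, object]
Source = Callable[[str], Optional[Record]]


def first_available(paper_id: str, sources: List[Tuple[str, Source]]) -> Tuple[str, Record]:
    """Try each source in order and return (source name, record) from the first hit."""
    for name, fetch in sources:
        try:
            record = fetch(paper_id)
        except (OSError, ValueError):
            # Treat an unreachable or malformed feed as "no record here".
            record = None
        if record is not None:
            return name, record
    raise LookupError(f"no source could supply {paper_id!r}")


# Stub sources (hypothetical): each simulates one outcome.
def from_oai_pmh(paper_id: str) -> Optional[Record]:
    raise OSError("feed unavailable")  # simulated outage


def from_ftp_xml(paper_id: str) -> Optional[Record]:
    return None  # simulated: publisher has not delivered this paper yet


def from_html_scrape(paper_id: str) -> Optional[Record]:
    return {"id": paper_id, "title": "An example title", "references": None}


SOURCES: List[Tuple[str, Source]] = [
    ("oai-pmh", from_oai_pmh),
    ("ftp-xml", from_ftp_xml),
    ("html-scrape", from_html_scrape),
]

if __name__ == "__main__":
    print(first_available("X1", SOURCES))
```

Returning the source name alongside the record lets curators see which papers arrived through the least reliable path and may need a closer look.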

61 Users In 2008: 173 Added papers directly from users 3,800 Papers with user updates/corrections to reference lists 4,000 User updated profiles (institutional history, etc)

62 Export DOIs, publication information to arXiv, ADS bidirectional exchange of XML Currently: Rough “API” with in-house XML formats for Physicists building apps INSPIRE: OAI-PMH interface, rich API, NLM DTD, MARCXML
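
For readers who have not used OAI-PMH, a harvester talks to such an interface with plain HTTP GET requests. The sketch below builds `ListRecords` request URLs using the argument names defined by the protocol (`verb`, `metadataPrefix`, `from`, `set`, `resumptionToken`). The base URL is a placeholder, not a real INSPIRE endpoint, and the function name is hypothetical.

```python
from urllib.parse import urlencode

# Placeholder endpoint: an assumption for illustration, not a real service.
BASE_URL = "https://repository.example.org/oai2d"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None,
                     set_spec=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token is not None:
        # Per the protocol, resumptionToken is an exclusive argument:
        # it is sent with the verb and nothing else.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date  # e.g. "2010-07-01"
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(list_records_url(BASE_URL, from_date="2010-07-01"))
    print(list_records_url(BASE_URL, resumption_token="abc"))
```

`oai_dc` is the one metadata format every OAI-PMH repository must offer; richer formats such as MARCXML are requested with a different `metadataPrefix`, whose exact value depends on the repository.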

63 2010 Past 40 Years: Information Infrastructure in response to user needs Community Needs: Preserve Quality - SCOAP3 Promote Access - INSPIRE Archive Research Artifacts - INSPIRE/HEPData

64 ~15,000 HEP scientists smash stuff at the speed of light to produce new stuff

65 …and it works! LHC re-discovering known particles for starters. First needles in the haystack: one in a million.

66 Data HEPData - Durham U. Stores Data “behind” figures/tables Submitted from Experiments INSPIRE partners with HEPData Provides access, linking and deposition in central community location Serve “long-tail” of theorists and others with “misc.” materials Enables access, citation, etc.

67 Existing Infrastructure

68

69

70

71

72

73 Data Trusted Community Infrastructure Future? DPHEP Study Group Continuing conversation with researchers to develop data preservation strategy

74 Conclusion Access, Quality, and Artifacts Emerging from community of researchers Aligned with community needs Target what scientists need Quality - Speed - Completeness Building on existing, trusted infrastructures

75 Infrastructure The basic facilities, services and installations needed for the functioning of a community or society wiktionary.org

76 Questions? For more information on INSPIRE see http://www.projecthepinspire.net http://inspirebeta.net

