Building an open library without walls: Archiving of particle physics data and results for long-term access and use
Joanne Yeomans, CERN Scientific Information Service
ICHEP’06 Parallel session 12: GRID distributed analysis in high energy physics
When we talk about archiving results, what does this mean?
In the past it meant published papers and preprints.
This has been successful
[Figures: CERN preprints available as a percentage of total CERN papers published; HEP arXiv submissions]
What has been possible with preprints up to now?
- arXiv and institutional repositories: around 70% of articles are now freely available in full text
- These repositories of open access (OA) results allow added value which helps users find what they need:
  - Interlinking of records
  - Citation extraction and citation linking (backwards and forwards)
  - Similarity analysis/automatic keyword extraction
  - Commenting features
- An OA solution will enable new computational analysis tools which might better automate information retrieval and analysis of results
- Different user interfaces are possible for different experiments/different people
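As an illustration of citation linking "backwards and forwards", the sketch below inverts a table of reference lists (backward links) to obtain "cited by" lists (forward links). It is a minimal sketch only: the paper identifiers and the function name `build_citation_links` are hypothetical, and it does not describe how arXiv or any institutional repository implements this.

```python
from collections import defaultdict


def build_citation_links(references):
    """Invert backward links (paper -> papers it cites) into forward links.

    `references` is a hypothetical mapping from a paper id to the list of
    paper ids found in its reference list.
    Returns a dict mapping each cited paper id to the sorted ids citing it.
    """
    cited_by = defaultdict(list)
    for paper, cited in references.items():
        for target in cited:
            cited_by[target].append(paper)
    return {paper: sorted(citers) for paper, citers in cited_by.items()}


# Illustrative data only: three made-up records.
references = {
    "paper-A": ["paper-B", "paper-C"],
    "paper-B": ["paper-C"],
    "paper-C": [],
}
cited_by = build_citation_links(references)
```

With both tables available, a record page can show the papers it cites (from `references`) alongside the papers that cite it (from `cited_by`).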
[Annotated screenshot of a record; labels: display formats; published version; link to full conference details; authors; other papers from the same conference; citation count from ISI; free OA preprint; link to experiment details; keywords]
But more is possible…
Still more integration could be achieved:
- e.g. the pre-publication process: writing/internal editorial system
- e.g. the publication process: publisher and peer review process
- and data: long-term accessibility (for re-analysis) and integration with publications and educational materials
What does this mean?
HEP data
- Accessibility
  - HEP data is complex (more complex than e.g. astronomy)
  - Raw data? Calibrated data? High-level objects?
  - Volume of data, volume/type of access
  - Experiment/computing environment lifetimes
  - Use of standards for cross-discipline linking
- Use
  - Accountability issues
  - Software, human knowledge not encoded
  - Human aspects: access restrictions, extra work
  - Need common and persistent vocabulary/ids for citation
- Publications
  - Easier to enhance papers: a first step?
Molecular biology and bioinformatics: ArrayExpress
LEP data example
- Development by IT of a "museum computing system", based and frozen on existing lxplus technology/software, with access possibilities to (at present CASTOR) mass storage where all data are stored
- The safeguarding of 'standard' analysis framework software and of mini-data on a number of PCs
- The development of a modern C++ analysis framework (in some cases)
- The establishment of rules for access to data by non-members of the Collaboration
http://pfeiffer.home.cern.ch/pfeiffer/LEP-Data-Archive/Scenarios.html
The lessons learned with LEP data
- Regardless of the open preprint culture, there is no real OA to HEP data
- Data become either barely usable or barely useful (depending on the level stored)
- Reliance on human memories
- Recovery of knowledge should not be prompted by surprises: what if new LHC discoveries require a re-analysis of LEP data?
- Issues are not just technical
- The physics community needs to work with IT and Library specialists
First steps?
Possible future
- Publish high-level objects behind each scientific paper and integrate with preprints
- Publish all high-level objects after the end of the collaboration for later reuse
- Use LEP as a case study
- More collaboration between information/library groups within different labs
- Consider different users and their requirements: researchers outside the collaboration, training of students, the “educated public”
Thank you
Joanne Yeomans, CERN Scientific Information Service
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.