HEP and its data: What is the problem? A possible way forward. Jos Engelen, CERN. Permanent Access to the Records of Science, Brussels, November 15th 2007.

Presentation transcript:

High-Energy Physics (or Particle Physics)
HEP aims to understand how our Universe works:
- by discovering the most elementary constituents of matter and energy
- by probing their interactions
- by exploring the basic nature of space and time
In other words, try to answer two basic questions:
- “What is the world made of?”
- “What holds it together?”
Build the largest scientific instruments ever to reach the highest energies; develop theories to predict and describe the observed phenomena
Jos Engelen - Preservation, re-use and (open) access of HEP data - Brussels 15/11/2007

CERN: European Organization for Nuclear Research (since 1954)
- The leading HEP laboratory, Geneva (CH)
- 2500 staff (mostly engineers)
- 8000 users (mostly physicists)
- 3 Nobel prizes (Accelerators, Detectors, Discoveries)
- Invented the web
- Commissioning the 27-km LHC accelerator
- Runs a 1-million-object Digital Library
CERN Convention (1953): ante-litteram Open Access mandate
“… the results of its experimental and theoretical work shall be published or otherwise made generally available”

CERN

The Large Hadron Collider
- Largest scientific instrument ever built, 27 km in circumference
- The “coolest” place in the Universe: -271˚C
- people involved in its design and construction
- Collides protons to reproduce ‘extreme’ conditions... 40 million times a second

Accelerator complex (1959)
Largest ring: 27 km circumference

[Figure: colliding beams; dimension label 46 m]

The LHC experiments: about 100 million “sensors” each
[think of your 6MP digital camera... taking 40 million pictures a second]
ATLAS, five-storey building, CMS

The LHC data
- 40 million events (pictures) per second
- Select (on the fly) the ~200 interesting events per second to write on tape
- “Reconstruct” data and convert for analysis: “physics data” [inventing the grid...]

(x4 experiments x15 years)
                      Per event   Per year
Raw data              1.6 MB      3200 TB
Reconstructed data    1.0 MB      2000 TB
Physics data          0.1 MB       200 TB
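The figures in the table can be cross-checked with a little arithmetic. The sketch below is only an illustration: it assumes decimal units (1 TB = 1,000,000 MB), that the per-year volumes refer to a single experiment, and that the "x4 experiments x15 years" factor multiplies those volumes. The function and variable names are made up for this example and do not come from the talk.

```python
# Per-event sizes (MB) and per-year volumes (TB), as quoted on the slide.
PER_EVENT_MB = {"raw": 1.6, "reconstructed": 1.0, "physics": 0.1}
PER_YEAR_TB = {"raw": 3200, "reconstructed": 2000, "physics": 200}

EVENTS_PER_SECOND = 200   # events written to tape per second (slide value)
MB_PER_TB = 1_000_000     # assumption: decimal units


def events_per_year(kind: str) -> float:
    """Number of stored events implied by the yearly volume and event size."""
    return PER_YEAR_TB[kind] * MB_PER_TB / PER_EVENT_MB[kind]


def implied_live_seconds(kind: str = "raw") -> float:
    """Seconds of data taking per year implied by the 200 events/s rate."""
    return events_per_year(kind) / EVENTS_PER_SECOND


def raw_write_rate_mb_per_s() -> float:
    """Raw-data rate to tape while the selection is running."""
    return EVENTS_PER_SECOND * PER_EVENT_MB["raw"]


def total_volume_tb(experiments: int = 4, years: int = 15) -> int:
    """All three data tiers, summed over experiments and years."""
    return sum(PER_YEAR_TB.values()) * experiments * years


if __name__ == "__main__":
    print(events_per_year("raw"))
    print(implied_live_seconds())
    print(raw_write_rate_mb_per_s())
    print(total_volume_tb())
```

Under these assumptions every tier corresponds to about 2 billion events per year, which at 200 events per second means roughly 10 million seconds of data taking, a raw write rate of about 320 MB/s, and 324,000 TB over four experiments and fifteen years.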

Preservation, re-use and (Open) Access to HEP data
- Problem
- Opportunity
- Challenge

Some other HEP facilities (recently stopped or about to stop)
- Energy frontier
- Precision frontier
No real long-term archival strategy...

Why should we care?
- We have a reason to produce these data in the first place
- Unique, not easily reproducible
- Might need to go back to the past (it happened)
- A peculiar community (the web, arXiv, the grid...)
“If it works here, it will work in many other places”

Preservation, re-use and (open) access continua (who and when)
- The same researchers who took the data, after the closure of the facility (~1 year, ~10 years)
- Researchers working at similar experiments at the same time (~1 day, week, month, year)
- Researchers of future experiments (~20 years)
- Theoretical physicists who may want to re-interpret the data (~1 month, ~1 year, ~10 years)
- Theoretical physicists who may want to test future ideas (~1 year, ~10 years, ~20 years)

Data preservation, circa pages of tables
- Very cumbersome tables describe event features
- Technical needs of multi-dimensional data which cannot fit on paper!
- What a discovery might look like... “missing energy”... a few events of background noise which all theorists want to check (L3)

What is the trouble with preserving HEP data?
- Where to put them?
- Hardware migration?
- Software migration/emulation?

HEP, Open Access & Repositories
HEP is decades ahead in thinking Open Access:
- Mountains of paper preprints shipped around the world for 40 years (at author/institute expense!)
- Launched arXiv (1991), archetypal Open Archive
- >90% of HEP production self-archived in repositories
- 100% of HEP production indexed in SPIRES (community-run database, first WWW server on US soil)
OA is second nature: posting on arXiv before submitting to a journal is common practice
- No mandate, no debate. Author-driven.
HEP scholars have the tradition of arXiving their output (alas, only articles) somewhere

Towards an e-Infrastructure for HEP scholarly communication
Common vision of all stakeholders:
1. Build a complete HEP information platform
2. Enable text- and data-mining applications
3. Demonstrate and deploy Web 2.0 applications
4. Preservation and re-use of research data
There will be a place to archive the data

Storage and migration of data at the CERN computing centre
1993  ~150’000  9track → GB
1997  ~250’  → Redwood 20GB
2001  ~25’000  Redwood → GB
2004  ~5’  A → 9940B 200GB
2007  ~22’  B → T1000A 500GB

Life-cycle of previous-generation CERN experiment L3 at LEP
1984  Begin of construction
1989  Start of data taking
2000  End of data taking
2002  End of in-silico experiments
2005  End of (most) data analysis

Computing environment of the L3 experiment at LEP
- VAX for data taking
- IBM for data analysis
- Apollo (HP) workstations
- SGI mainframe
- Linux boxes
Life-cycle of previous-generation CERN experiment L3 at LEP: 1984 begin of construction, 1989 start of data taking, 2000 end of data taking, 2002 end of in-silico experiments, 2005 end of (most) data analysis

What is the trouble with preserving HEP data? The HEP data!
- Where to put them?
- Hardware migration?
- Software migration/emulation?

Preserving HEP data?
Concorde (15 km), Balloon (30 km), CD stack with 1 year LHC data! (~20 km), Mt. Blanc (4.8 km)
- The HEP data model is highly complex. Data are traditionally not re-used as in Astronomy or Climate science.
- Raw data → calibrated data → skimmed data → high-level objects → physics analyses → results.
- All of the above duplicated for in-silico experiments, necessary to interpret the highly complex data.
- Final results depend on the grey literature on calibration constants, human knowledge and algorithms needed for each pass... oral tradition!
- Years of training for a successful analysis
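The CD-stack comparison can be turned into a rough data volume. The sketch below is a back-of-the-envelope illustration only: the disc thickness (1.2 mm) and capacity (700 MB) are assumptions about an ordinary CD and are not given in the talk, and the function names are hypothetical.

```python
CD_THICKNESS_MM = 1.2   # assumption: thickness of a standard disc
CD_CAPACITY_MB = 700    # assumption: capacity of a typical CD-R

MM_PER_KM = 1_000_000
MB_PER_PB = 1_000_000_000  # decimal units


def discs_in_stack(height_km: float) -> float:
    """Number of bare discs needed to reach the given stack height."""
    return height_km * MM_PER_KM / CD_THICKNESS_MM


def stack_volume_pb(height_km: float) -> float:
    """Data volume (PB) held by a stack of the given height."""
    return discs_in_stack(height_km) * CD_CAPACITY_MB / MB_PER_PB


if __name__ == "__main__":
    print(discs_in_stack(20))
    print(stack_volume_pb(20))
```

With these assumed disc parameters, a ~20 km stack corresponds to roughly 17 million discs and on the order of 12 PB; a different assumed thickness or capacity changes the result proportionally.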

A possible way forward, introducing: the parallel way

HEP data: The “parallel way” to publish/preserve/re-use/Open Access
- In addition to experiment data models, elaborate a parallel format for (re-)usable high-level objects
  - In times of need (to combine data of “competing” experiments) this approach has worked
  - Embed the “oral” and “additional” knowledge
- A format eventually understandable and thus re-usable by practitioners in other experiments and theorists
- Start from tables and work back towards primary data
- How much additional work? 1%, 5%, 10%?
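To make the idea of a parallel format more concrete, here is a minimal sketch of what a self-describing record of high-level objects might look like. It is purely hypothetical: the talk does not define any schema, and every class name, field name and example value below is an assumption chosen for illustration. The only point it shows is that the physics objects and the "oral" knowledge travel together in a plain, experiment-independent encoding (JSON here).

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class HighLevelObject:
    """One reconstructed object, e.g. an electron or a jet (hypothetical schema)."""
    kind: str
    energy_gev: float
    momentum_gev: list[float]  # px, py, pz


@dataclass
class ParallelEvent:
    """An event in the illustrative parallel format."""
    experiment: str
    event_id: int
    objects: list[HighLevelObject] = field(default_factory=list)
    # Free-text slot for the "oral" and "additional" knowledge
    # (calibration pass, known caveats, selection applied, ...).
    notes: str = ""


def to_json(event: ParallelEvent) -> str:
    """Serialise to a plain-text form readable without experiment software."""
    return json.dumps(asdict(event), sort_keys=True)


def from_json(text: str) -> ParallelEvent:
    """Rebuild the event from its JSON form."""
    raw = json.loads(text)
    objects = [HighLevelObject(**obj) for obj in raw.pop("objects")]
    return ParallelEvent(objects=objects, **raw)


if __name__ == "__main__":
    # Made-up values, for illustration only.
    event = ParallelEvent(
        experiment="L3",
        event_id=1,
        objects=[HighLevelObject("electron", 45.6, [1.0, 2.0, 45.5])],
        notes="calibration pass 3",
    )
    print(to_json(event))
```

A text encoding is used here because it sidesteps the software migration problem raised earlier in the talk; a real format would also need units, uncertainties and versioning, which this sketch leaves out.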

“Major” issues with the “parallel” way
- A small fraction of a big number gives a large number
- Need insider knowledge to produce parallel data
- Activity in competition with research time (waiting for the end of the experiment is not an option)
- Thousands of person-years behind the data model of the large collaborations:
  - enormous (impossible?) academic incentives to encourage the “parallel way”
  - additional (external) funds
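The remark that a small fraction of a big number gives a large number is easy to illustrate. The sketch below applies the 1%, 5% and 10% overheads floated in the talk to a total effort; the figure of 5000 person-years is an assumption standing in for "thousands of person-years", and the function name is hypothetical.

```python
def parallel_way_cost(person_years: int, percents=(1, 5, 10)) -> dict[int, float]:
    """Extra person-years needed for each assumed overhead percentage."""
    return {p: person_years * p / 100 for p in percents}


if __name__ == "__main__":
    # 5000 person-years is an illustrative assumption, not a figure from the talk.
    for percent, extra in parallel_way_cost(5000).items():
        print(f"{percent}% overhead: {extra} person-years")
```

Even at the low end, the overhead under this assumption amounts to dozens of person-years, which is why the slide lists incentives and external funds as major issues.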

“Minor” issues with the “parallel” way
- Publish high-level objects behind each scientific article (voluntarily? compulsory? after a time lapse?)
- Publish all high-level objects after disbanding a collaboration (ownership? impact metrics?)
- Address issues of (open) access, credit, accountability, reproducibility of results, “careless discoveries”, “careless measurements”, depth of peer-reviewing
- A monolithic way of doing business needs rethinking
- A culture shift, which can only come from consensus

Preservation, re-use and (open) access to HEP data... first steps!
- Outgrowing an institutionalized state of denial
- A difficult and costly way ahead
- An issue which starts surfacing on the agenda

Conclusions
- HEP spearheaded (Open) Access to Scientific Information: 50 years of preprints, 16 of repositories... but data preservation is not yet on the radar
- Heterogeneous ‘users’ to preserve data for
- No insurmountable technical problems
- The issue is the data model itself
  - (Primary) data intelligible only to the producers
  - Need to produce a “parallel” format for preservation, re-use and (open) access
  - Massive person-power costs
- Preservation, re-use and (open) access of HEP data is appearing on the agenda... will need cultural consensus and financial support
- Exciting times are ahead!

Thank you!
Jos Engelen, CERN. Permanent Access to the Records of Science, Brussels, November 15th 2007