Replicating Data from the Large Electron Positron (LEP) Collider at CERN (ALEPH Experiment), Under the DPHEP Umbrella
International Collaboration for Data Preservation and Long Term Analysis in High Energy Physics
Marcello Maggi / INFN-Bari, Tommaso Boccali / INFN-Pisa

The HEP Scientist
[Figure: the Standard Model of elementary particles - quarks (u, d, c, s, t, b), leptons (e, μ, τ, νe, νμ, ντ), force carriers (γ, Z, W, g), and the just-discovered piece, the Higgs boson h; fermions and bosons.]
Open questions: Dark Matter, matter/anti-matter asymmetry, supersymmetric particles. From microcosm to macrocosm.

The use case
The ALEPH experiment took data at the CERN e+e− collider LEP from 1989 to 2000. It was a collaboration of ~500 scientists from all over the world; more than 300 papers were published by the ALEPH Collaboration.

ALEPH data: still valuable
While the Collaboration practically stopped a couple of years later, some late papers were published, and we still get requests from theoretical physicists for additional checks and studies on ALEPH data.
Current policy: any paper can use ALEPH data if there is at least one former ALEPH member among the authors (moving to CC0?).

Data Sharing & Data Management
A fundamental issue, going back to the birth of CERN.

Some facts
Looking at the reconstructed samples (the format closest to physics use), the ALEPH data consist of ~150k files with an average size of 200 MB, about 30 TB in total, split between real collected events and Monte Carlo simulated events.
Analysis processing times on recent machines are such that one week is enough to process the whole sample; splitting the work between machines helps (see the sketch below).
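As a rough illustration of the numbers above and of the "splitting between machines" point, a minimal sketch; only the file count and average size come from the slide, the file names and worker count are hypothetical.

```python
# Minimal sketch: check the quoted totals and partition the sample across workers.
# File names and worker count are hypothetical; 150k files x 200 MB comes from the slide.

N_FILES = 150_000       # reconstructed files
AVG_SIZE_MB = 200       # average file size
print(f"Total sample: ~{N_FILES * AVG_SIZE_MB / 1_000_000:.0f} TB")  # ~30 TB

def split_round_robin(files, n_workers):
    """Assign files to workers round-robin so each batch ends up a similar size."""
    batches = [[] for _ in range(n_workers)]
    for i, f in enumerate(files):
        batches[i % n_workers].append(f)
    return batches

files = [f"aleph_reco_{i:06d}.dat" for i in range(N_FILES)]   # hypothetical names
batches = split_round_robin(files, n_workers=50)
print(f"{len(batches[0])} files (~{len(batches[0]) * AVG_SIZE_MB / 1000:.0f} GB) per worker")
```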

The computing environment
The last blessed environment ("blessed" = validated for physics) is Linux SL4, with GCC 3.4 and glibc (libc6).
All the software ALEPH uses will have a CC licence, and we can recompile everything on our own.

Data workflows
[Workflow diagram] Real data: RAW data are reconstructed by "Julia" into reconstructed data. Simulated events: a kinematic simulation (various tools) feeds a Geant3-based particle/matter simulation, which is reconstructed into simulated events. Both reconstructed real data and reconstructed simulated events are then analysed with Fortran + HBOOK user code.

Current strategy
Computing environment via a VM approach, currently using uCERN-VM, which provides batch-ready, interactive-ready and development-ready VMs.
Data are to be served via POSIX to the executables. The current (pre-EUDAT) approach was:
- via WebDAV (Apache, StoRM, ...)
- seen by the VM as a FUSE/davfs2-mounted POSIX filesystem (see the sketch below)
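A minimal sketch of what this looks like from the job's point of view, assuming davfs2 has already mounted the WebDAV endpoint; the mount point /data/aleph and the directory layout are hypothetical.

```python
# Minimal sketch: with the WebDAV endpoint mounted via davfs2 (here at the
# hypothetical mount point /data/aleph), the analysis code sees plain POSIX
# files and needs no WebDAV-specific logic.
import os

MOUNT_POINT = "/data/aleph"   # hypothetical davfs2 mount point

def list_sample(subdir, suffix=".dat"):
    """List reconstructed files visible under the POSIX mount."""
    path = os.path.join(MOUNT_POINT, subdir)
    return sorted(os.path.join(path, f) for f in os.listdir(path) if f.endswith(suffix))

# Hypothetical usage: pass these paths to the (recompiled) Fortran executable.
for path in list_sample("reco/1999")[:5]:
    print(path)
```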

What is missing? Metadata!
- What does a file contain?
- Where is a file available?
- "Give me all the data taken at 203 GeV in Spring 1999"
- "Give me all the 203 GeV data for hadronic W decays"
We had an SQL-based tool:
- we still have the SQL dump
- but the tool is only reachable via low-level commands
(A query sketch against such a dump is shown below.)
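To make the kind of query above concrete, a minimal sketch assuming the SQL dump has been imported into SQLite; the run_files table and its column names are invented for illustration, not the real ALEPH schema.

```python
# Minimal sketch: querying a metadata catalogue rebuilt from the old SQL dump.
# The table and column names are hypothetical, invented for illustration only.
import sqlite3

conn = sqlite3.connect("aleph_metadata.db")   # dump imported here beforehand

# "Give me all the data taken at 203 GeV in Spring 1999"
rows = conn.execute(
    """
    SELECT file_name, location
    FROM run_files
    WHERE beam_energy_gev = 203
      AND data_taking_period = '1999-spring'
    """
)
for file_name, location in rows:
    print(file_name, location)
```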

The picture
[Architecture diagram] Experiment VMs running on local batch, HPC, Grid or Cloud resources fetch data and environment over the network infrastructure. Underneath sits a data infrastructure (ingestion, PIDs, trust, integrity, replication, sharing, ...) covering storage, disaster recovery, data curation, bit preservation and knowledge preservation, with data discovery, (open?) access and a digital library (OA repositories) on top. The EUDAT services B2SAFE, B2SHARE, B2FIND and B2STAGE map onto these layers.

Metadata ingestion
Complexity is the norm... (shown: the 1999 real-data records)

EUDAT? The plan is "simple":
- Move all the files from WebDAV to an EUDAT node (already started, to the CINECA EUDAT node).
- Use B2FIND to re-implement a working metadata catalogue (SQL to B2FIND??).
- Use other B2* tools (we need to study here): prestage files on the VM before execution? (B2STAGE) Access via streaming? (B2STREAM?)
- Use B2* to save outputs in a safe place, with correct metadata (B2SHARE).
(A sketch of the SQL-to-B2FIND mapping step follows.)
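B2FIND is fed by harvesting metadata (the next slide mentions handing it an OAI-PMH link), so a first step could be turning rows of the old SQL dump into Dublin Core-style records to expose for harvesting. A minimal sketch; the input schema, the field mapping and the output file are assumptions for illustration.

```python
# Minimal sketch: map rows from the old ALEPH SQL dump into Dublin Core-style
# records that an OAI-PMH endpoint could expose for B2FIND harvesting.
# The input schema and the field mapping are assumptions for illustration.
import json
import sqlite3

def row_to_record(row):
    file_name, energy, period, size, pid = row
    return {
        "title": f"ALEPH reconstructed data: {file_name}",
        "creator": "ALEPH Collaboration",
        "subject": ["LEP", "e+e- collisions", f"{energy} GeV"],
        "description": f"Data-taking period {period}, size {size} bytes",
        "identifier": pid or file_name,   # PID once assigned by the EUDAT node
        "rights": "ALEPH data policy (moving to CC0?)",
    }

conn = sqlite3.connect("aleph_metadata.db")
rows = conn.execute(
    "SELECT file_name, beam_energy_gev, data_taking_period, size, pid FROM run_files"
)
with open("records.json", "w") as out:
    json.dump([row_to_record(r) for r in rows], out, indent=2)
```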

A few starting feedbacks
How it is now:
1) provide identity
2) transfer the data with GridFTP to the EUDAT node
3) copy back the PIDs
4) build the metadata (a template is needed): location, checksum, size, PID
5) feed the Open Access repository
6) give the OAI-PMH link to B2FIND (to be integrated)
With 150,000 files, a bulk "B2Ingestion" is missing: B2SHARE, B2SAFE and B2STAGE each do "just" part of it. (A sketch of such a bulk loop is given below.)
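A minimal sketch of the missing bulk step, covering points 3) and 4) above: attach the PIDs copied back from the EUDAT node and build one metadata record (location, checksum, size, PID) per file. Paths, PID lookup and record layout are placeholders; the actual GridFTP transfer and PID minting happen outside this sketch.

```python
# Minimal sketch of a bulk "B2Ingestion" loop: for every local file, compute
# checksum and size, attach the PID copied back from the EUDAT node, and write
# one metadata record. Paths, PID lookup and record layout are placeholders.
import hashlib
import json
import os

def checksum(path, algo="md5", chunk=1 << 20):
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def build_manifest(local_dir, pid_lookup):
    """pid_lookup maps file name -> PID copied back from the EUDAT node."""
    records = []
    for name in sorted(os.listdir(local_dir)):
        path = os.path.join(local_dir, name)
        records.append({
            "file_name": name,
            "location": path,
            "cksum": checksum(path),
            "size": os.path.getsize(path),
            "pid": pid_lookup.get(name),   # None until the node assigns one
        })
    return records

# Hypothetical usage, one directory at a time over the 150,000 files:
# manifest = build_manifest("/data/aleph/reco/1999", pid_lookup={})
# json.dump(manifest, open("manifest.json", "w"), indent=2)
```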

Data & Knowledge Infrastructure (Roberto Barbera / INFN and University of Catania)
[Architecture diagram] OAI-PMH harvesters (running on grid/cloud) collect records from the Open Access repositories and the data repositories; a linked-data search engine with semantic-web enrichment sits on top and exposes the end-points. (A harvester sketch follows.)
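A minimal sketch of such an OAI-PMH harvester: it issues the standard ListRecords verb for Dublin Core records and follows resumption tokens. The endpoint URL is a placeholder.

```python
# Minimal sketch of an OAI-PMH harvester: fetch Dublin Core records from a
# repository endpoint (URL is a placeholder) and follow resumption tokens.
import requests
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def harvest(endpoint, metadata_prefix="oai_dc"):
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(requests.get(endpoint, params=params).content)
        for record in root.iter(OAI + "record"):
            title = record.find(f".//{DC}title")
            identifier = record.find(f".//{DC}identifier")
            yield (
                title.text if title is not None else None,
                identifier.text if identifier is not None else None,
            )
        token = root.find(f".//{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            break
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

# Hypothetical usage:
# for title, ident in harvest("https://repository.example.org/oai"):
#     print(ident, title)
```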

Knowledge base
Not only data: the ALEPH virtual environment, links to the digital library (Open Access repositories), file catalogues, documentation, instructions.

HEP status
ALEPH is now a small experiment: KLOE has 1 PB, CDF has 10 PB, the LHC has > 100 PB.
Data nodes (e.g. EUDAT nodes 1, 2, 3) can federate over the network, at low and high level; this is especially useful for sites where computing is available.

The (Big) Data today
About 10^7 "sensors" produce 5 PByte/sec. Complexity is reduced by a data model, and real-time analytics (the trigger) filters this down to 0.1-1 GByte/sec. Data and replicas move according to a data management policy; analytics produce "publication data" that are shared, and finally the publications. Re-use of the data relies on Data Archive Management Plans.
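A quick back-of-the-envelope check of the reduction factor implied by the quoted rates:

```python
# Back-of-the-envelope reduction factor implied by the quoted rates.
raw_rate = 5e15            # 5 PByte/sec produced by the sensors
recorded = (0.1e9, 1e9)    # 0.1 to 1 GByte/sec surviving the trigger
for r in recorded:
    print(f"reduction factor ~ {raw_rate / r:.0e}")   # ~5e+07 and ~5e+06
```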