Astroparticle data transfer between China and EU

Astroparticle data transfer between China and EU
Paola Celio, Dipartimento di Fisica Roma TRE - INFN Roma TRE
EGEE 06 Conference, Geneva, 28.09.2006

Outline
- The Argo experiment: physics motivation, layout
- Argo and EuChinaGRID activities: description of the modules, Argo-specific features
- Data transfer
- Data processing: MonteCarlo, Reconstruction
- Issues
- Conclusions

The Argo experiment: physics motivation, layout

The ARGO experiment
- Italy – China collaboration
- Cosmic-ray telescope installed at 4300 m a.s.l. at YangBaJing (YBJ, Tibet)
- Physics goal: study cosmic air showers, especially at low energies (< 1 TeV)
  - gamma-ray astronomy
  - gamma-ray burst physics
  - sun physics
  - primary proton spectrum, …

Argo and EuChinaGRID activities: description of the modules, Argo-specific features

ARGO and EuChinaGRID
The activity of the ARGO group in EuChinaGRID is focused on data management:
- Data transfer: efficient transport of data to both main sites (IHEP and CNAF)
- Data processing: MonteCarlo simulation, data reconstruction
- users' analysis left for the future
Argo-specific features:
- only two sites (IHEP and CNAF), ...
- ...but both are "primary": both want the full data set (raw and reconstructed data)

Data transfer

Data transfer
Experimental data collected at YBJ with the full detector:
- 7.5 MB/s steady data flow, high uptime: ~20 TB/month, ~200 TB/year
- Data taking organized in runs, i.e. periods of data taking during which conditions are kept constant
  - each run made of several data files (< 2 GB each) and one or two small auxiliary text files
- Information about runs and files is stored in a relational database (PostgreSQL); a schema sketch follows below
- Constrained by network connectivity at YBJ, which used to be limited to 8 Mb/s: tapes used as the main method for transfer
- Some network transfer to IHEP, barely sufficient even at the present rate
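To make the bookkeeping concrete, the sketch below shows how such run/file tables could be laid out. The table and column names are assumptions for illustration only, not the actual ARGO schema; the snippet simply creates them through psycopg2.

```python
# Minimal sketch of a run/file bookkeeping schema (all names are illustrative).
import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS runs (
    run_number   INTEGER PRIMARY KEY,
    start_time   TIMESTAMP,
    end_time     TIMESTAMP,
    conditions   TEXT
);
CREATE TABLE IF NOT EXISTS data_files (
    file_id      SERIAL PRIMARY KEY,
    run_number   INTEGER REFERENCES runs(run_number),
    file_name    TEXT NOT NULL,          -- local path on the DAQ disk
    size_bytes   BIGINT,                 -- each data file is < 2 GB
    lfn          TEXT,                   -- logical file name, set when promoted to the Grid
    guid         TEXT,                   -- GUID returned by the catalogue
    status       TEXT DEFAULT 'on_daq'   -- e.g. on_daq, in_transfer, to_tape, deleted
);
"""

def create_schema(dsn="dbname=argo_daq host=localhost"):
    """Create the bookkeeping tables on the local (YBJ) database."""
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(DDL)
```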

Data transfer
(Diagram: data transfer before GRID)

Data transfer
- The network link from YBJ to the rest of the world will now be upgraded to 155 Mb/s, and the network connectivity between China and Europe upgraded to two 2.5 Gb/s links
- This framework paves the way to the use of GRID technologies
Main objectives:
- optimize resource usage, make data available sooner
- stimulate the formation of more integrated working groups
- make sure everybody uses the same software (calibrations, etc.)
Requisites:
- minimize complexity at YBJ

Data transfer
- Rather simple configuration: three sites involved, but with different roles
  - production site: YangBaJing
  - Tier1 sites: CNAF and IHEP
- The next slides present a Grid-enabled implementation to cope with the Argo "problem", based on the GILDA testbed (gLite 3)
- Developing the application we have to remember that we need a working system soon:
  - took what was most easily pluggable (available, working, and what we managed to configure)
  - expand/extend later if need be

Data transfer
(Architecture diagram: YBJ runs a User Interface, an SRM (DPM) and a local relational DB; each Tier1 site runs a User Interface, an SRM (DPM), FTS, LFC and a local DB; lines in the diagram are FTS channels, colour indicates the channel owner)
- Local (relational) DB: see next slide
- Simple configuration at YBJ: no FTS server
- Redundancy: two FTS servers, two independent LFC catalogues
- Transfer model: YBJ pushes, Tier1s pull

Data transfer: DAQ_DB
- YBJ is the master site; CNAF and IHEP are "replicas", updated asynchronously with respect to data transfer (no CNAF–IHEP updates)
- Relational database for DAQ:
  - stores information about files and runs
  - updated to reflect the current transfer status of files
  - synchronizations done using database tools or custom scripts (see the sketch below)
- Not yet "gridified": studying the possibility to substitute it (in total or in part) with AMGA or R-GMA
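As an illustration of what such a custom synchronization script could look like (table, column and connection names are assumptions, and the real setup may rely on native PostgreSQL replication tools instead), a one-way pull of new rows from the YBJ master into a replica might be:

```python
# Illustrative one-way sync of the file bookkeeping table from the YBJ master
# to a replica site (CNAF or IHEP). All names are assumptions, not the real schema.
# Note: this sketch only copies new rows; status updates of already-copied rows
# would need an additional pass.
import psycopg2

def sync_files(master_dsn, replica_dsn):
    with psycopg2.connect(master_dsn) as master, psycopg2.connect(replica_dsn) as replica:
        with master.cursor() as mcur, replica.cursor() as rcur:
            # Find how far the replica already got.
            rcur.execute("SELECT COALESCE(MAX(file_id), 0) FROM data_files")
            last_id = rcur.fetchone()[0]
            # Pull newer rows from the master and insert them into the replica.
            mcur.execute(
                "SELECT file_id, run_number, file_name, size_bytes, lfn, guid, status "
                "FROM data_files WHERE file_id > %s ORDER BY file_id", (last_id,))
            for row in mcur:
                rcur.execute(
                    "INSERT INTO data_files "
                    "(file_id, run_number, file_name, size_bytes, lfn, guid, status) "
                    "VALUES (%s, %s, %s, %s, %s, %s, %s)", row)
```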

Data transfer in detail
YBJ procedures (a minimal sketch follows after this list):
- promote DAQ files to GRID files:
  - update local DB: associate a logical name to each file
- cleanup of the DAQ disk, to make sure the DAQ can write data:
  - delete files once they are found in both LFC catalogues
  - if disk space is needed for any reason (LFC failure, unable to transfer, ...): activate tape backup
  - update local DB: flag the file as "to-tape" or "deleted", or reset the "in transfer" flag if the transfer is taking too long
- enqueue for transfer:
  - select entries on the local SRM yet to be transferred
  - select an available FTS server and channel (try IHEP first)
  - queue to FTS
  - update local DB: save the GUID and the service used
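The slides do not show the actual scripts; the following is a minimal sketch of the "promote and enqueue" steps, assuming the standard gLite data-management command-line tools (lcg-cr, glite-transfer-submit). The VO name, SE and FTS endpoints, LFN layout and database columns are illustrative placeholders, not the actual ARGO configuration.

```python
# Sketch of the YBJ-side "promote and enqueue" steps (not the actual ARGO scripts).
import subprocess
import psycopg2

VO = "argo"                                   # hypothetical VO name
LOCAL_SE = "srm.ybj.example.cn"               # hypothetical local DPM/SRM host
FTS_ENDPOINTS = [                             # try IHEP first, then CNAF
    "https://fts.ihep.example.cn:8443/glite-data-transfer-fts/services/FileTransfer",
    "https://fts.cnaf.example.it:8443/glite-data-transfer-fts/services/FileTransfer",
]

def promote_file(path, run_number, cur):
    """Copy a DAQ file to the local SRM, register it in the catalogue and
    record the logical name and GUID in the local DB."""
    lfn = "lfn:/grid/%s/raw/run%06d/%s" % (VO, run_number, path.split("/")[-1])
    out = subprocess.run(
        ["lcg-cr", "--vo", VO, "-d", LOCAL_SE, "-l", lfn, "file:%s" % path],
        capture_output=True, text=True, check=True)
    guid = out.stdout.strip()                 # lcg-cr prints the GUID of the new entry
    cur.execute("UPDATE data_files SET lfn=%s, guid=%s, status='on_srm' "
                "WHERE file_name=%s", (lfn, guid, path))
    return lfn, guid

def enqueue_transfer(src_surl, dst_surl, cur, file_name):
    """Queue one file on the first FTS server that accepts the job,
    then record the service used and the transfer status in the local DB."""
    for fts in FTS_ENDPOINTS:
        try:
            out = subprocess.run(
                ["glite-transfer-submit", "-s", fts, src_surl, dst_surl],
                capture_output=True, text=True, check=True)
            job_id = out.stdout.strip()       # FTS returns a job identifier
            cur.execute("UPDATE data_files SET status='in_transfer' "
                        "WHERE file_name=%s", (file_name,))
            return fts, job_id
        except subprocess.CalledProcessError:
            continue                          # try the next FTS server
    return None, None
```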

Data transfer in detail
IHEP (or CNAF) procedure (could be split). Purposes: register files transferred from YBJ or from the other side; make sure each LFC contains both the local and the remote copies. (A sketch of the check-and-register part follows below.)
- select all non-local files (from the local DB? FTS? ...) and check the local DPM
  - if the file is found: check the local LFC, register it if not present, next
  - if the file is not found: check the other side
    - if present at the other side: queue to FTS, next
    - if not present at the other side: next (and hope YBJ re-queues it)
- if the file is in the local LFC, check the other side
  - if found at the other site: register it as a replica, next
  - if not found: next (try again later)
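A hedged sketch of the check-and-register part, assuming the standard LCG tools (lcg-lr to list replicas, lcg-rf to register an existing physical copy under a known GUID); the SE name and the GUID handling are assumptions, not the actual ARGO procedure.

```python
# Sketch of the Tier1-side check-and-register step (not the actual ARGO script).
import subprocess

VO = "argo"
LOCAL_SE = "storm.cnaf.example.it"            # hypothetical local SRM host

def list_replicas(lfn):
    """Return the SURLs currently registered in the catalogue for this LFN."""
    out = subprocess.run(["lcg-lr", "--vo", VO, lfn],
                         capture_output=True, text=True)
    return out.stdout.split() if out.returncode == 0 else []

def ensure_local_replica_registered(lfn, guid, local_surl):
    """If the file is physically on the local DPM/SRM but not yet registered,
    register the local copy under the existing GUID (assuming lcg-rf with a
    known GUID adds the SURL as a further replica)."""
    replicas = list_replicas(lfn)
    if any(LOCAL_SE in surl for surl in replicas):
        return True                           # already registered locally
    res = subprocess.run(["lcg-rf", "--vo", VO, "-g", guid, local_surl],
                         capture_output=True, text=True)
    return res.returncode == 0
```

Under the same assumptions, the same lcg-rf call pattern would also serve to register the copy held at the other site in the local catalogue once it is known to exist there.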

Data transfer in detail
The data transfer strategy above is implemented and working:
- now it needs extensive tests
- had some last-minute problems related to the FTS channel setup
- aim at having it in production soon!

Data processing: MonteCarlo, Reconstruction

Data processing: MonteCarlo
MonteCarlo simulation (easier things first...)
- Corsika (and Fluka): physics simulation
  - capitalize on the experience of MAGIC, which already uses Corsika on the GRID
  - RPMs created, under test: will start by installing them at Roma3 and CNAF
  - next step is the porting of the procedures for JDL creation (a hedged sketch follows below), maybe also the porting of MAGIC's web interface
- ArgoG (based on GEANT3): detector simulation
  - widely used by "anybody else", so... not expecting great difficulties
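As a concrete illustration of the "JDL creation" step, the sketch below writes a JDL file for a single Corsika run; the wrapper script name, input card and sandbox contents are assumptions, not the actual ARGO or MAGIC production setup.

```python
# Illustrative generation of a JDL file for one Corsika run (names are placeholders).
JDL_TEMPLATE = """\
Executable    = "run_corsika.sh";
Arguments     = "{card}";
StdOutput     = "corsika_{run}.out";
StdError      = "corsika_{run}.err";
InputSandbox  = {{"run_corsika.sh", "{card}"}};
OutputSandbox = {{"corsika_{run}.out", "corsika_{run}.err"}};
Requirements  = other.GlueCEPolicyMaxCPUTime > 1440;
"""

def write_jdl(run_number, card_file):
    """Write corsika_<run>.jdl for a single simulated run."""
    jdl = JDL_TEMPLATE.format(run=run_number, card=card_file)
    name = "corsika_%06d.jdl" % run_number
    with open(name, "w") as f:
        f.write(jdl)
    return name

# The resulting file would then be submitted with the gLite WMS tools,
# e.g. "glite-wms-job-submit -a corsika_000001.jdl".
```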

Data processing: Reconstruction
Data reconstruction objectives:
- process all raw data as soon as possible
- make all processed data available to both sites
Benefits:
- optimize CPU usage
- transparent access to resources
- minimize human time spent on production activities (1 person for 2 sites)

Data processing: Reconstruction
- Key role for the relational DB; might be substituted with AMGA or R-GMA
- DB synchronization and update are more critical here:
  - want to avoid both sites submitting the same jobs, in a stronger way than just by policy agreement (see the sketch below)
  - the synchronization tool depends on the database structure
- Planning for just one database that the other sites will replicate
  - replica: critical parts read-only, non-critical parts can be updated
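One way to make "only one site submits a given run" stronger than a policy agreement is an atomic claim in the master database. The following sketch assumes hypothetical reco_status and reco_site columns on the runs table and is not the actual ARGO database structure.

```python
# Sketch of an atomic "claim a run for reconstruction" step, so that CNAF and
# IHEP cannot both submit jobs for the same run. Names are assumptions only.
import psycopg2

def claim_next_run(dsn, site):
    """Atomically pick one unprocessed run and mark it as taken by `site`.
    Returns the run number, or None if nothing is left to process."""
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            # The row lock prevents the other site from claiming the same run.
            cur.execute(
                "SELECT run_number FROM runs "
                "WHERE reco_status = 'todo' "
                "ORDER BY run_number LIMIT 1 FOR UPDATE")
            row = cur.fetchone()
            if row is None:
                return None
            cur.execute(
                "UPDATE runs SET reco_status = 'claimed', reco_site = %s "
                "WHERE run_number = %s", (site, row[0]))
            return row[0]
```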

Issues, Conclusions

Argo and GRID: issues
- The VO exists, but: mutual acceptance of CAs, ...
- implementation details: how many BDIIs? RB? ...
- technical details: how to run cksum on a file? which tools exist for catalogue synchronization? ...
- strategy for the definition of a "production group"
  - ownership of files in the Grid
  - ownership of FTS channels (also for querying)
  - ...
- Need to agree upon:
  - logical name conventions
  - database structure
  - software versions and configurations

Conclusions
- The first development is finished; we now need to test the application intensively, to discover and resolve some "features" and to optimize it.
- We are building a Java interface to display the data transfer and to monitor the processes in a user-friendly way.
- The next step will be the development of the same procedures using the APIs, to interact more deeply with the Grid environment and to build a more general-purpose monitor.
- Furthermore, we are investigating the performance of the database used and an optimization of the bandwidth usage.

References
[1] Fulvio Galeazzi, "Application of GRID technologies to the ARGO experiment: data transfer and data reconstruction", EuChinaGRID Meeting, June 2006
[2] Cristian Stanescu, EuChinaGRID Meeting, Rome, September 2006

Thank you for your kind attention!