Data Management for Physics Analysis in PHENIX (BNL, RHIC)
Evaluation of Grid architecture components in PHENIX context
Barbara Jacak, Roy Lacey, Saskia Mioduszewski, Dave Morrison, Zhiping Qui, Andrey Shevel, Irina Sourikova
Reporter: Andrey Shevel, 25 March

Overview
- Collaboration
- Data volumes
- Aims, conditions, expectations, schemes
- Main needs: file/replica catalog, job submission, data moving
- Known systems under development
- Working prototype setup
- Conclusion and future plans

Brief info on PHENIX
- Large, widely spread collaboration (same scale as CDF and D0)
- ~400 collaborators
- 12 nations
- 57 institutions
- 11 U.S. universities
- Currently in its third year of data taking

Distributed data
- 100 TB/yr of raw data
- 100 TB/yr of reconstructed output
- ~25 TB/yr of microDST (analyzed output)
- Primary event reconstruction occurs at the BNL RCF (RHIC Computing Facility)
- A complete copy of the reconstructed output is kept at CC-J (Computing Center in Japan), and most of it at CC-F (France)
- A multi-cluster environment with distributed data is our reality

Conditions and aims
- PHENIX is a running experiment, which means it is not easy to try out new software tools in production. We therefore test the new environment at a remote university, SUNY, which we regard as a typical remote site; our experience should hopefully be useful for other mid-sized remote teams.
- The aim is to install, evaluate and tune existing advanced Grid tools to build a robust and flexible distributed computing platform for physics analysis, and to keep it running with a minimum of maintenance effort.

Our expectations of distributed computing
- By distributing the data we hope to decrease the latency (response time) seen by remote users in simulation and data analysis.
- Remote end users (physicists) will be able to draw on more computing power in the multi-cluster environment.

General scheme: jobs go where the data are, and to the less loaded clusters
[Diagram: the RCF at BNL holds the main data repository; the SUNY RAM cluster holds a partial data replica; jobs are routed between them.]
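As an illustration of this routing policy, here is a minimal sketch in shell. Everything in it is an assumption for the example: the gateway host names are placeholders, PBS at SUNY and LSF at RCF are guesses, and the queue-length heuristic is only one possible load measure. In the actual prototype this logic lives inside the gsub/gsub-data scripts shipped in gsuny.tar.gz.

    # Sketch only: route a job to the less loaded of the two clusters.
    # Host names, job managers and queue commands are placeholders.
    SUNY_HOST="suny-gateway.example.org"
    RCF_HOST="rcf-gateway.example.org"
    SCRIPT="$1"                          # local analysis script to run

    # Rough load estimate: length of the batch queue listing, obtained
    # through the default (fork) job manager on each gateway.
    suny_load=$(globus-job-run "$SUNY_HOST" /usr/bin/qstat 2>/dev/null | wc -l)
    rcf_load=$(globus-job-run "$RCF_HOST" /usr/bin/bjobs 2>/dev/null | wc -l)

    if [ "$suny_load" -le "$rcf_load" ]; then
        globus-job-submit "$SUNY_HOST/jobmanager-pbs" -s "$SCRIPT"   # -s stages the local script
    else
        globus-job-submit "$RCF_HOST/jobmanager-lsf" -s "$SCRIPT"
    fi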

Replica/file catalog
- We need to maintain information about the files of interest at their different locations (at BNL and at remote sites). The expected total number of files is about 10^5 at a remote site and about 10^6 at the central PHENIX location; consistency of the catalog is a firm requirement.
- PHENIX has built the ARGO catalog on top of the Postgres DBMS (presented at CHEP03). At the remote site (SUNY) we have tested replication of ARGO as well as an adapted version of MAGDA (also presented at CHEP03).
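Purely for illustration, the sketch below shows the kind of bookkeeping such a catalog holds, as two psql calls. The database host, table and column names are invented for this example; they are not the ARGO (Postgres) or MAGDA (MySQL) schema.

    # Illustration only: the sort of table a file/replica catalog maintains.
    psql -h catalog.example.org -U phenix -d filecatalog -c "
        CREATE TABLE file_replica (
            lfn        TEXT NOT NULL,      -- logical file name
            site       TEXT NOT NULL,      -- e.g. BNL or SUNY
            pfn        TEXT NOT NULL,      -- physical location (path or gsiftp URL)
            size_bytes BIGINT,
            registered TIMESTAMP DEFAULT now(),
            PRIMARY KEY (lfn, site)
        );"
    # Which sites hold a given logical file?
    psql -h catalog.example.org -U phenix -d filecatalog -c "
        SELECT site, pfn FROM file_replica WHERE lfn = 'some_microDST_file.root';"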

Computing resource status and job submission
- We need a simple and reliable tool (with both a GUI and a CLI) to see the current status of the available computing resources in the multi-cluster environment.
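A rough idea of what the CLI side of such a tool could look like, as a sketch only: the gateway host names are placeholders and the remote queue-listing command is an assumption.

    # Sketch of a CLI status check over the Globus gatekeepers.
    for gw in suny-gateway.example.org rcf-gateway.example.org
    do
        echo "=== $gw ==="
        # Standard GT2 authentication-only test against the gatekeeper.
        globusrun -a -r "$gw" || { echo "  gatekeeper unreachable"; continue; }
        # Crude load indicator: size of the local batch queue listing
        # (assumes a PBS-style qstat is installed on the gateway).
        globus-job-run "$gw" /bin/sh -c 'qstat 2>/dev/null | wc -l'
    done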

Known systems under development
- Chimera: "A Virtual Data System for Representing, Querying, and Automating Data Derivation"
- Clarens: "The Clarens Remote Dataserver is a wide-area network system for remote analysis of data generated by the CMS"
- AliEn: a Grid prototype created by the ALICE Offline Group for the ALICE environment

Need for a prototype
- All of the systems mentioned above are non-trivial; they include many components.
- The underlying infrastructure (Globus) is not trivial either.
- To be confident in the base infrastructure we have to test it first.
- Prototyping with a minimum of manpower: a balance between the potential profit and what is achievable.

Job submission: initial configuration
In our case we used the two computing clusters available to us:
- at SUNY (the RAM cluster);
- at BNL PHENIX (the RCF).
Globus gateways are used in both cases.
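The wrappers on the next slide presumably sit on top of the standard Globus 2.x job-management commands. A hedged example of the raw workflow against one gateway, where the host name, job-manager type and script name are all placeholders:

    grid-proxy-init                                        # create a short-lived user proxy
    contact=$(globus-job-submit suny-gateway.example.org/jobmanager-pbs \
              -s ./myanalysis.csh)                         # prints a GRAM job contact string
    globus-job-status "$contact"                           # PENDING / ACTIVE / DONE / FAILED
    globus-job-get-output "$contact"                       # retrieve stdout when finished
    globus-job-clean "$contact"                            # remove the cached job output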

Job submission and retrieval commands
- Submission to SUNY
- Submission to RCF
- Job queue status at SUNY
- Job queue status at RCF
- Submission to the less loaded cluster: gsub script
- Get job standard output + standard error output: gget job-id
- Submission to where the file is located: gsub-data script file
- Job status: gstat job-id
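An example session with the wrappers listed above; the script name, data file name and job id are illustrative only.

    gsub myanalysis.csh                                 # run on the less loaded cluster
    gsub-data myanalysis.csh run02_microDST_0001.root   # run where this file is located
    gstat 12345                                         # status of job 12345
    gget 12345                                          # its standard output + standard error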

Data moving
- Our conditions: from time to time we need to transfer a group of files (from about 10^2 to 10^4 files) between different locations (between SUNY and BNL). The newly copied files must be registered in the replica/file catalog, and a trace of all data transfers is required as well.
- This is currently realized with our MAGDA distribution (over GridFTP). The SUNY data catalog based on our MAGDA distribution is available for browsing.
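The MAGDA machinery itself is not reproduced here; the underlying step is a GridFTP copy per file, sketched below with placeholder endpoints, file list and log name.

    # Sketch: copy a list of files from BNL to SUNY over GridFTP and keep a
    # simple trace.  In the prototype the bookkeeping (catalog update,
    # transfer trace) is handled by the MAGDA distribution itself.
    SRC="gsiftp://rcf-gateway.example.org/phenix/microDST"
    DST="gsiftp://suny-gateway.example.org/data/microDST"
    LIST="files_to_copy.txt"            # one file name per line
    LOG="transfer.log"

    while read -r f
    do
        if globus-url-copy "$SRC/$f" "$DST/$f"
        then
            echo "$(date -u) OK   $f" >> "$LOG"   # would also be registered in the catalog
        else
            echo "$(date -u) FAIL $f" >> "$LOG"
        fi
    done < "$LIST"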

Requirements to deploy the prototype
To deploy the prototype computing infrastructure for physics analysis that we tested, one needs:
- a PC with Linux 7.2/7.3 (the configuration we tested);
- Globus Toolkit 2.2.3;
- the two tarballs with the scripts (including the SUNY distribution of MAGDA): magda-client.tar.gz and gsuny.tar.gz (a dedicated site is available);
- a MySQL server (if required) + MySQL client + Postgres + the C++ and Perl interfaces;
- Globus certificates.
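Expressed as a rough sequence of shell steps; the install directory is a placeholder, the certificate commands are only the standard Globus ones, and the tarball names are those listed above.

    # Rough deployment sequence (sketch, not the documented procedure).
    grid-cert-request                    # one-time: request a Globus user certificate
    grid-proxy-init                      # per-session: create a proxy from it

    mkdir -p "$HOME/grid"
    tar xzf gsuny.tar.gz        -C "$HOME/grid"    # prototype job-submission scripts
    tar xzf magda-client.tar.gz -C "$HOME/grid"    # SUNY MAGDA distribution

    # Database clients are assumed to come from the Linux distribution.
    mysql --version && psql --version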

Conclusion
- A transition to a Grid architecture can only follow once all the people involved understand the Grid computing model; it is not a job for one person.
- Grid tools have to be available and supported on the centralized computing resources.

Future plans
- To add another computing cluster to our prototype.
- To evaluate more sophisticated and advanced middleware components.