VO Box discussion, ATLAS, NIKHEF, 24-25 January 2006, Miguel Branco

Slide 2: Outline
  – Usage of and motivation for the VO Box
      Tier0 (Service Challenge) use case
  – An improvement
  – Conclusion

Slide 3: Usage of the VO Box
The ATLAS data management system requires a set of VO services to be deployed per storage
  – Currently hosted on each site's VO Box (ATLAS Data Management = DQ)
VO services implemented by DQ:
  – Interact with the Grid middleware: FTS / srmcp, LRC, SRM
  – Allow insertion, monitoring and cancellation of data transfer requests to each storage
  – Perform per-file and per-dataset validation
  – Interact with the central ATLAS-specific dataset catalogues
  – Implement the ATLAS data flow as defined by the computing model

Slide 4: Usage of the VO Box
A typical VO Box installation contains a set of services associated with one (or more) storages
DQ triggers transfers from the destination side:
  – either VO services at the site poll the central catalogues for new or updated dataset transfer requests,
  – or transfers (per file or dataset) are requested directly to the VO service at the site
VO services are not replacements for existing Grid middleware; they complement its functionality

Slide 5: Usage of the VO Box
DQ components:
  – "Request Handling": Apache-hosted services where end users and production managers can directly insert file/dataset transfer requests
  – "Fetcher": a task polling the central ATLAS dataset subscription catalogue to check whether any new or updated dataset transfer request has been scheduled for the storage
  – "Agents": implement a state machine, with a backend database holding the queue of file transfer requests (a minimal sketch follows this slide)
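The "Fetcher plus agents plus local database" pattern is easy to picture with a small sketch. This is not the actual DQ code: the table layout, function names and the stubbed catalogue/FTS interfaces are illustrative assumptions, and sqlite3 stands in for the site-local MySQL backend so the example is self-contained.

```python
# Illustrative sketch of the Fetcher / Transfer Agent split described above.
# Names and schema are hypothetical; sqlite3 stands in for the MySQL database.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE transfer_queue (
                  lfn     TEXT PRIMARY KEY,  -- logical file name
                  dataset TEXT,              -- dataset the file belongs to
                  state   TEXT               -- QUEUED -> SUBMITTED -> DONE / FAILED
              )""")

def fetcher(poll_subscriptions):
    """Poll the central dataset subscription catalogue and enqueue any new files."""
    for dataset, lfns in poll_subscriptions():
        for lfn in lfns:
            # INSERT OR IGNORE avoids queuing the same file twice
            db.execute("INSERT OR IGNORE INTO transfer_queue VALUES (?, ?, 'QUEUED')",
                       (lfn, dataset))
    db.commit()

def transfer_agent(submit_to_fts):
    """Advance QUEUED requests: hand them to FTS in bulk and mark them SUBMITTED."""
    queued = db.execute("SELECT lfn FROM transfer_queue WHERE state = 'QUEUED'").fetchall()
    if queued:
        submit_to_fts([lfn for (lfn,) in queued])  # one bulk request, not one per file
        db.executemany("UPDATE transfer_queue SET state = 'SUBMITTED' WHERE lfn = ?",
                       queued)
        db.commit()

# Example run with stubbed-out catalogue and FTS interfaces.
fetcher(lambda: [("csc11.some.dataset", ["file1.root", "file2.root"])])
transfer_agent(lambda lfns: print("would submit one FTS job for", lfns))
```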

Slide 6: Usage of the VO Box
ATLAS deploys onto the VO Box:
  – a service container (Apache + mod_python)
  – the security infrastructure (mod_gridsite + MyProxy client)
  – a persistent database (MySQL)
  – a set of 'agents' acting on the requests
      Currently run as cron jobs
      Using the vobox-* utilities for security handling (proxy cache)

Slide 7: Usage of the VO Box
[Architecture diagram: the VO Box hosts the Apache front-end services (Request Handling), the Fetcher, the Transfer Agents and the MySQL database; DQ client tools and the ATLAS central dataset catalogues (or other VO services) sit on one side, the local site services (FTS, LRC, SRM) on the other.]

Slide 8: Motivation for the VO Box
FTS: a huge leap forward, but difficult to use in isolation
Use case: ATLAS wants to move data from site A to site B. "Insert an FTS request"?
  – What about intermediary sites (hops)?
  – What prevents multiple similar (or identical) transfer requests from being inserted several times?
  – What prevents a similar (or identical) set of files from being requested to 'stage to disk' many times over?
Big lesson from DC2/Rome: putting grid 'business logic' into job scripts at the worker node is inefficient:
  – very difficult to control (e.g. stop), difficult to monitor, difficult to issue bulk requests to the Grid middleware (FTS, LFC), …

Slide 9: Motivation for the VO Box
ATLAS has an LRC per site/storage
  – A remote connection to the LFC takes ~1.4 s
  – A local connection to the LFC takes ~0.4 s
  – The LFC did not cope well with many parallel sessions
All of the above is an issue; it was a bottleneck during SC3
Non-LCG sites: FTS vs. srmcp
  – Missing FTS/FPS functionality
The VO services at each site are the ones contacting the site middleware services
  – This allows bulk requests: a single stage request for many files, as opposed to e.g. having each job issue its own SRM request (see the rough estimate below)
Currently:
  – Worker nodes do not contact the VO Box directly, but this may be reconsidered; jobs would use the normal Apache front end
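To see why the per-file catalogue latencies above matter, a rough back-of-the-envelope estimate follows; the 10,000-file batch size is an illustrative assumption, not a figure from the talk.

```python
# Back-of-the-envelope look at the LFC latency figures quoted above, assuming
# (hypothetically) one catalogue connection per file and fully serial lookups.
remote_lfc_s = 1.4    # ~1.4 s per remote LFC connection
local_lfc_s = 0.4     # ~0.4 s per local LFC connection
files = 10_000        # illustrative batch size, not a figure from the talk

for label, latency in (("remote", remote_lfc_s), ("local", local_lfc_s)):
    hours = files * latency / 3600
    print(f"{label} LFC: ~{hours:.1f} h for {files:,} serial lookups")
# remote: ~3.9 h, local: ~1.1 h -- per-file connections do not scale, which is
# why the VO services bundle catalogue and staging work into bulk requests.
```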

Slide 10: Motivation for the VO Box
VO services have helped improve:
  – quota management
  – space cleanup
  – LRC/SE integrity
  – (real-time) monitoring
  – error handling
  – dynamic data management
by building upon baseline services such as SRM and existing FTS functionality.

Slide 11: Data handling scenario for the Tier0 exercise
(scenario covering transfers up to the Tier1s)
Files are processed at CERN and attached to ATLAS datasets
  – 10 Tier1s: 85k files/day transferred to the Tier1s
Datasets are subscribed at sites
  – Subscriptions may be cancelled and changed to other sites (dynamically, e.g. due to site errors)
VO services poll for new dataset subscriptions and insert FTS requests to transfer the data
  – Each VO service then handles its requests until completion, allowing real-time monitoring and cancellation points
  – Not running centrally! The Tier0 alone generates 85k requests/day
  – Compared to the DC2/Rome model (single central DB) this is a more scalable and manageable approach (see the rough rate estimate after this slide)
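A quick sanity check of the rate these numbers imply, using only the figures on this slide (a sketch, assuming the load is spread evenly over the day and over the Tier1s):

```python
# Rough transfer-request rate implied by the Tier0 exercise figures above.
files_per_day = 85_000
tier1s = 10
seconds_per_day = 86_400

print(f"overall: {files_per_day / seconds_per_day:.2f} files/s")  # roughly one file per second
print(f"per Tier1: {files_per_day // tier1s:,} files/day, i.e. one file every "
      f"{seconds_per_day * tier1s / files_per_day:.0f} s on average")
```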

Slide 12: Security overview
Current scenario:
  – Apache + mod_gridsite: two ports open, 8000 (insecure, HTTP GET) and 8443 (GSI security, HTTP POST)
  – vobox-* proxy cache; users must go to MyProxy first
  – Transfer Agents: set up the security environment (from the vobox-* proxy cache) and trigger requests to grid services (sketched below)
  – MySQL database: must be reachable from the 'Transfer Agents' and 'Request Handling' only
  – Logging:
      Apache (hosting the services visible to the outside world) logs requests (requester IP, GSI user DN)
      The Fetcher logs all contacts to the central ATLAS dataset catalogues
      The Request Handling DB logs all requests (including the user DN)
      Transfer Agents: single log file; the agents depend only on the local MySQL DB and act as clients to the grid middleware
  – But they depend on the proxy cache (similarly to the RB, for example)
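As an illustration of the "set the security environment from the proxy cache" step, a minimal sketch follows. The proxy-cache path and the commented-out command are placeholders, not the actual DQ agent code; only the X509_USER_PROXY environment variable is a standard grid convention.

```python
# Minimal sketch of a transfer agent picking up its delegated credential from the
# vobox-* proxy cache before calling out to grid middleware. The proxy path below
# is a hypothetical placeholder, not the real cache location.
import os
import subprocess

PROXY_CACHE = "/opt/vobox/atlas/proxy_repository/current.proxy"  # placeholder path

def run_grid_client(cmd):
    """Run a grid client command with the cached proxy as its X.509 credential."""
    env = dict(os.environ, X509_USER_PROXY=PROXY_CACHE)  # standard grid environment variable
    result = subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
    return result.stdout

# Example (commented out because it needs a real proxy and grid clients installed):
# print(run_grid_client(["grid-proxy-info", "-file", PROXY_CACHE]))
```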

Slide 13: Suggested small improvement
Maintain the VO Box principle: the ability for the VO to run its own services
  – If services are hosted away from the site, performance suffers
  – ATLAS experimented with "central" VO Boxes at CERN during SC3, hosted on an LSF cluster
Consider adding more generic middleware to the VO Box:
  – Apache, mod_python, mod_gridsite
  – vobox-* utilities
  – MySQL
  – LCG UI
We have suggested in many forums the need for a standard model to develop, deploy and manage VO services.

Slide 14: Conclusion 1/3
It is naïve to assume the 'grid' middleware will handle all scenarios and all usage patterns under all conditions
  – Experiments need the flexibility to use the middleware in a way that is efficient (according to the experiment's needs) and secure!
  – It is difficult (impossible?) to find a generic usage pattern across all experiments, simply because the experiments are different
      E.g. ATLAS has its own dataset and metadata catalogues and its own monitoring
  – The turnaround time for developing or improving Grid middleware is clearly a concern now, given ATLAS commissioning

Slide 15: Conclusion 2/3
Are the services running on the VO Box generic?
  – Some of them, yes (e.g. the 'FTS babysitters'); those should move into the middleware
  – But not all of them are: some depend on e.g. the ATLAS dataset definition, the datablock definition, the ATLAS metadata model
An important point: even if they were generic, there is no uniform (and secure) way to deploy and use them
  – The RB has different security handling from FTS (regarding MyProxy usage)… and these are both LCG services!

Slide 16: Conclusion 3/3
The VO Box is a work-around as it stands
  – It is just a box close to the site services on which we run our VO services
  – Some of its services may eventually be provided by the grid middleware
Work on dynamic, secure service containers would be welcome as a long-term solution
Developers should focus on baseline services and let each experiment handle more complex usage scenarios with their own VO services.