Magda Distributed Data Manager Torre Wenaus BNL October 2001.

Slides:



Advertisements
Similar presentations
Data Management Expert Panel. RLS Globus-EDG Replica Location Service u Joint Design in the form of the Giggle architecture u Reference Implementation.
Advertisements

Grid and CDB Janusz Martyniak, Imperial College London MICE CM37 Analysis, Software and Reconstruction.
1 Software & Grid Middleware for Tier 2 Centers Rob Gardner Indiana University DOE/NSF Review of U.S. ATLAS and CMS Computing Projects Brookhaven National.
GRID DATA MANAGEMENT PILOT (GDMP) Asad Samar (Caltech) ACAT 2000, Fermilab October , 2000.
Grid Collector: Enabling File-Transparent Object Access For Analysis Wei-Ming Zhang Kent State University John Wu, Alex Sim, Junmin Gu and Arie Shoshani.
Experience with ATLAS Data Challenge Production on the U.S. Grid Testbed Kaushik De University of Texas at Arlington CHEP03 March 27, 2003.
Magda – Manager for grid-based data Wensheng Deng Physics Applications Software group Brookhaven National Laboratory.
QCDgrid Technology James Perry, George Beckett, Lorna Smith EPCC, The University Of Edinburgh.
Grappa: Grid access portal for physics applications Shava Smallen Extreme! Computing Laboratory Department of Physics Indiana University.
ATLAS DQ2 Deletion Service D.A. Oleynik, A.S. Petrosyan, V. Garonne, S. Campana (on behalf of the ATLAS Collaboration)
5 November 2001F Harris GridPP Edinburgh 1 WP8 status for validating Testbed1 and middleware F Harris(LHCb/Oxford)
Don Quijote Data Management for the ATLAS Automatic Production System Miguel Branco – CERN ATC
A Metadata Catalog Service for Data Intensive Applications Presented by Chin-Yi Tsai.
1 School of Computer, National University of Defense Technology A Profile on the Grid Data Engine (GridDaEn) Xiao Nong
Grid Status - PPDG / Magda / pacman Torre Wenaus BNL U.S. ATLAS Physics and Computing Advisory Panel Review Argonne National Laboratory Oct 30, 2001.
File and Object Replication in Data Grids Chin-Yi Tsai.
PPDG and ATLAS Particle Physics Data Grid Ed May - ANL ATLAS Software Week LBNL May 12, 2000.
ATLAS and GridPP GridPP Collaboration Meeting, Edinburgh, 5 th November 2001 RWL Jones, Lancaster University.
- Distributed Analysis (07may02 - USA Grid SW BNL) Distributed Processing Craig E. Tull HCG/NERSC/LBNL (US) ATLAS Grid Software.
ILDG Middleware Status Chip Watson ILDG-6 Workshop May 12, 2005.
CYBERINFRASTRUCTURE FOR THE GEOSCIENCES Data Replication Service Sandeep Chandra GEON Systems Group San Diego Supercomputer Center.
Introduction to dCache Zhenping (Jane) Liu ATLAS Computing Facility, Physics Department Brookhaven National Lab 09/12 – 09/13, 2005 USATLAS Tier-1 & Tier-2.
INFNGrid Constanza Project: Status Report A.Domenici, F.Donno, L.Iannone, G.Pucciani, H.Stockinger CNAF, 6 December 2004 WP3-WP5 FIRB meeting.
MAGDA Roger Jones UCL 16 th December RWL Jones, Lancaster University MAGDA  Main authors: Wensheng Deng, Torre Wenaus Wensheng DengTorre WenausWensheng.
NOVA Networked Object-based EnVironment for Analysis P. Nevski, A. Vaniachine, T. Wenaus NOVA is a project to develop distributed object oriented physics.
Author - Title- Date - n° 1 Partner Logo EU DataGrid, Work Package 5 The Storage Element.
Author - Title- Date - n° 1 Partner Logo WP5 Summary Paris John Gordon WP5 6th March 2002.
29 May 2002Joint EDG/WP8-EDT/WP4 MeetingClaudio Grandi INFN Bologna LHC Experiments Grid Integration Plans C.Grandi INFN - Bologna.
4/5/2007Data handling and transfer in the LHCb experiment1 Data handling and transfer in the LHCb experiment RT NPSS Real Time 2007 FNAL - 4 th May 2007.
Magda Distributed Data Manager Status Torre Wenaus BNL ATLAS Data Challenge Workshop Feb 1, 2002 CERN.
US ATLAS Grid Projects Rob Gardner Indiana University Mid Year Review of US ATLAS Computing NSF Headquarters, Arlington VA June 20, 2002
David Adams ATLAS ADA, ARDA and PPDG David Adams BNL June 28, 2004 PPDG Collaboration Meeting Williams Bay, Wisconsin.
Production Tools in ATLAS RWL Jones GridPP EB 24 th June 2003.
Magda status and related work in PPDG year 2 Torre Wenaus, BNL/CERN US ATLAS Core/Grid Software Workshop, BNL May 6-7, 2002 CERN.
NA-MIC National Alliance for Medical Image Computing UCSD: Engineering Core 2 Portal and Grid Infrastructure.
Atlas Grid Status - part 1 Jennifer Schopf ANL U.S. ATLAS Physics and Computing Advisory Panel Review Argonne National Laboratory Oct 30, 2001.
Metadata Mòrag Burgon-Lyon University of Glasgow.
David Adams ATLAS DIAL/ADA JDL and catalogs David Adams BNL December 4, 2003 ATLAS software workshop Production session CERN.
GCRC Meeting 2004 BIRN Coordinating Center Software Development Vicky Rowley.
Cole David Ronnie Julio. Introduction Globus is A community of users and developers who collaborate on the use and development of open source software,
DGC Paris WP2 Summary of Discussions and Plans Peter Z. Kunszt And the WP2 team.
6/23/2005 R. GARDNER OSG Baseline Services 1 OSG Baseline Services In my talk I’d like to discuss two questions:  What capabilities are we aiming for.
ATLAS Magda Distributed Data Manager Torre Wenaus BNL PPDG Robust File Replication Meeting Jefferson Lab January 10, 2002.
NOVA A Networked Object-Based EnVironment for Analysis “Framework Components for Distributed Computing” Pavel Nevski, Sasha Vanyashin, Torre Wenaus US.
AliEn AliEn at OSC The ALICE distributed computing environment by Bjørn S. Nilsen The Ohio State University.
Replica Management Kelly Clynes. Agenda Grid Computing Globus Toolkit What is Replica Management Replica Management in Globus Replica Management Catalog.
STAR C OMPUTING Plans for Production Use of Grand Challenge Software in STAR Torre Wenaus BNL Grand Challenge Meeting LBNL 10/23/98.
INFSO-RI Enabling Grids for E-sciencE ARDA Experiment Dashboard Ricardo Rocha (ARDA – CERN) on behalf of the Dashboard Team.
David Adams ATLAS ATLAS distributed data management David Adams BNL February 22, 2005 Database working group ATLAS software workshop.
Super Computing 2000 DOE SCIENCE ON THE GRID Storage Resource Management For the Earth Science Grid Scientific Data Management Research Group NERSC, LBNL.
LCG Distributed Databases Deployment – Kickoff Workshop Dec Database Lookup Service Kuba Zajączkowski Chi-Wei Wang.
Magda Distributed Data Manager Prototype Torre Wenaus BNL September 2001.
David Adams ATLAS ATLAS-ARDA strategy and priorities David Adams BNL October 21, 2004 ARDA Workshop.
Data Management The European DataGrid Project Team
1 A Scalable Distributed Data Management System for ATLAS David Cameron CERN CHEP 2006 Mumbai, India.
Grid Status - PPDG / Magda / pacman Torre Wenaus BNL DOE/NSF Review of US LHC Software and Computing Fermilab Nov 29, 2001.
Status of tests in the LCG 3D database testbed Eva Dafonte Pérez LCG Database Deployment and Persistency Workshop.
David Adams ATLAS ATLAS Distributed Analysis (ADA) David Adams BNL December 5, 2003 ATLAS software workshop CERN.
1 Application status F.Carminati 11 December 2001.
Grid Activities in CMS Asad Samar (Caltech) PPDG meeting, Argonne July 13-14, 2000.
David Adams ATLAS ATLAS Distributed Analysis and proposal for ATLAS-LHCb system David Adams BNL March 22, 2004 ATLAS-LHCb-GANGA Meeting.
10 March Andrey Grid Tools Working Prototype of Distributed Computing Infrastructure for Physics Analysis SUNY.
David Adams ATLAS ADA: ATLAS Distributed Analysis David Adams BNL December 15, 2003 PPDG Collaboration Meeting LBL.
Building Preservation Environments with Data Grid Technology Reagan W. Moore Presenter: Praveen Namburi.
WP5 – Infrastructure Operations Test and Production Infrastructures StratusLab kick-off meeting June 2010, Orsay, France GRNET.
Joe Foster 1 Two questions about datasets: –How do you find datasets with the processes, cuts, conditions you need for your analysis? –How do.
Oxana Smirnova, Jakob Nielsen (Lund University/CERN)
U.S. ATLAS Grid Production Experience
Stephen Burke, PPARC/RAL Jeff Templon, NIKHEF
Module 01 ETICS Overview ETICS Online Tutorials
Presentation transcript:

Magda Distributed Data Manager Torre Wenaus BNL October 2001

Oct 2001 Torre Wenaus, BNL 2 ATLAS PPDG Program  Principal ATLAS Particle Physics Data Grid deliverables:  Year 1: Production distributed data service deployed to users. Will exist between CERN, BNL, and at least four US grid testbed sites (ANL, LBNL, Boston U, Indiana, Michigan, Oklahoma, Arlington)  Year 2: Production distributed job management service  Year 3: Create ‘transparent’ distributed processing capability integrating distributed services into ATLAS software  Magda is focused on the principal PPDG year 1 deliverable.  Draws on grid middleware development while delivering immediately useful capability to ATLAS  This area – looking beyond data storage to the larger issue of data management – has not received much attention in ATLAS up to now  Now changing with the DCs approaching, and Magda is intended to help

Oct 2001 Torre Wenaus, BNL 3  Initiated (as ‘DBYA’) in 3/01 for rapid prototyping of distributed data management. Approach:  A flexible infrastructure allowing quick development of components to support users quickly,  with components later substituted easily by external (eg grid toolkit) tools for evaluation and long term use  Stable operation cataloging ATLAS files since 5/01  Replication incorporated 7/01  Deployed at CERN, BNL, ANL, LBNL  Developers are currently T. Wenaus, W. Deng (BNL) Magda Background Info: The system:

Oct 2001 Torre Wenaus, BNL 4 Architecture & Schema  MySQL database at the core of the system  DB interaction via perl, C++, java, cgi (perl) scripts  C++ and Java APIs autogenerated off the MySQL DB schema  User interaction via web interface and command line  Principal components:  File catalog covering arbitrary range of file types  Data repositories organized into sites and locations  Computers with repository access: a host can access a set of sites  Logical files can optionally be organized into collections  Replication, file access operations organized into tasks  To serve environments from production (DCs) to personal (laptops)

Oct 2001 Torre Wenaus, BNL 5 Architecture Diagram Location Site Location Site Location Site Host 2 Location Cache Disk Site Location Mass Store Site Source to cache stagein Source to dest transfer MySQL Synch via DB Host 1 Replication task Collection of logical files to replicate Spider scp, gsiftp Register replicas Catalog updates

Oct 2001 Torre Wenaus, BNL 6 Files and Collections  Files & replicas  Logical name is arbitrary string, but in practice logical name is filename  In some cases with partial path (eg. for code, path in CVS repository)  Logical name plus virtual organization (=atlas) defines unique logical file  File instances include a replica number  Zero for the master instance; N=locationID for other instances  Notion of master instance is essential for cases where replication must be done off of a specific (trusted or assured current) instance  Not currently supported by Globus replica catalog, incidentally  Collections  Logical collections: arbitrary user-defined set of logical files  Location collections: all files at a given location  Key collections: files associated with a key or SQL query  Catalog is in MySQL; Globus replica catalog loader written but untested

Oct 2001 Torre Wenaus, BNL 7 Distributed Catalog  Catalog of ATLAS data at CERN, BNL (plus ANL, LBNL recently added)  Supported data stores: CERN Castor, CERN stage, BNL HPSS (rftp service), AFS and NFS disk, code repositories, web sites  Current content: TDR data, test beam data, ntuples, code, ATLAS and US ATLAS web content, …  About 150k files currently cataloged representing >2TB data  Has run without problems with ~1.5M files cataloged  ‘Spiders’ crawl data stores to populate and validate catalogs  Catalog entries can also be added or modified directly  ‘MySQL accelerator’ provides good catalog loading performance between CERN and BNL; 2k files cataloged in <1sec (W. Deng)

Oct 2001 Torre Wenaus, BNL 8 Data (distinct from file) Metadata  Keys  User-defined attributes (strings) associated with a logical file  Used to tag physics channels, for example  Logical file versions  Version string associated with logical file distinguishes updated versions  Eg. for source code, version is the CVS version number of the file  ‘Application metadata’ (run/event # etc.) not included  Separate (Grenoble) catalog to be interfaced via logical file  To come: Integration as metadata layer into ‘hybrid’ (ROOT+RDBMS) implementation of ATLAS DB architecture  To come: Data signature (‘object histories’), object cataloging  Longer term R&D

Oct 2001 Torre Wenaus, BNL 9 File Replication  Supports multiple replication tools as needed and available  Automated CERN-BNL replication incorporated 7/01  CERN stage  cache  scp  cache  BNL HPSS  stagein, transfer, archive scripts coordinated via database  Transfers user-defined collections keyed by (e.g.) physics channel  Recently extended to US ATLAS testbed using Globus gsiftp  Currently supported testbed sites are ANL, LBNL, Boston U  BNL HPSS  cache  gsiftp  testbed disk  BNL or testbed disk  gsiftp  testbed disk  gsiftp not usable to CERN; no grid link until CA issues resolved  Plan to try other data movers as well  GDMP (flat file version), bbcp (BaBar), …

Oct 2001 Torre Wenaus, BNL 10 Data Access and Production Support  Command line tools usable in production jobs  getfile – under test  Retrieve file via catalog lookup and (as necessary) staging or (still to come) remote replication  Local soft link to cataloged file instance in a cache or location  Usage count maintained in catalog to manage deletion  releasefile – under development  Removes local soft link, decrements usage count in catalog, deletes instance (optionally) if usage count goes to zero  putfile – under development  Archive output files (eg. in Castor or HPSS) and register them in catalog  Adaptation to support simu production environment  Working with Pavel on simu production scenario  Callable APIs for catalog usage and update to come  Collaboration with David Malon on Athena integration

Oct 2001 Torre Wenaus, BNL 11 Near Term Schedule  Deploy data access/registration command line tools  Create prototype (simu) production scenario using Magda  Incorporate into Saul Youssef’s pacman package manager  Interface to other DC tools (Grenoble etc.)  Apply to cataloging/replication/user access of DC0 data  Adaptation/implementation for DC1  Integration into hybrid DB architecture implementation  Evaluation/use of other grid tools: Globus RC, GDMP, …  Athena integration