Magda Distributed Data Manager Prototype
Torre Wenaus, BNL
September 2001


ATLAS PPDG Program

- Principal ATLAS Particle Physics Data Grid (PPDG) deliverables:
  - Year 1: Production distributed data service deployed to users. Will exist between CERN, BNL, and at least four US grid testbed sites (ANL, LBNL, Boston U, Indiana, Michigan, Oklahoma, Arlington)
  - Year 2: Production distributed job management service
  - Year 3: Integration of all distributed services into ‘transparent’ distributed processing capability integrated into ATLAS software
- This work is focused on the principal PPDG year 1 deliverable.
  - Enables us to participate in and benefit from grid middleware development while delivering immediately useful capability to ATLAS
  - Looking beyond data storage to the larger issue of data management has received little attention in ATLAS to date.

DBYA → Magda

- DBYA rapid prototyping tool for distributed data cataloging initiated 3/01 to jump start data manager development
- Stable operation cataloging ATLAS files since 5/01
- Globus integration: gsiftp in use, replica catalog in progress
- Deployed to ANL, LBNL in addition to BNL, CERN 8/01
- Good basis for developing the distributed data manager to fulfill the main ATLAS PPDG year 1 deliverable
- DBYA now branching into Magda (MAnager for Grid-based DAta) to be developed as this production data manager
- DBYA itself reverts to a rapid prototyping tool (sandbox)
- Developers are currently T. Wenaus, W. Deng (BNL)
- See

Architecture & Schema

- Partly based on NOVA distributed analysis project used in STAR
- MySQL database at the core of the system
  - DB interaction via Perl, C++, Java, CGI (Perl) scripts
  - C++ and Java APIs autogenerated off the MySQL DB schema
- User interaction via web interface and Unix commands
- Principal components
  - File catalog covering arbitrary range of file types
  - Data repositories organized into sites and locations
  - Computers with repository access: a host can access a set of sites
  - Logical files can optionally be organized into collections
  - Replication, file access operations organized into tasks
- To serve environments from production (DCs) to personal (laptops)
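The components listed above can be pictured as a small relational schema. The following is only an illustrative sketch: the slides do not show the real Magda schema, so every table and column name here is hypothetical, and Python's built-in `sqlite3` stands in for the MySQL server so that the example runs on its own.

```python
import sqlite3

# Hypothetical tables mirroring the principal components: sites, locations,
# hosts with access to sites, logical files, and file instances.
SCHEMA = """
CREATE TABLE site (
    site_id INTEGER PRIMARY KEY,
    name    TEXT UNIQUE NOT NULL
);
CREATE TABLE location (
    location_id INTEGER PRIMARY KEY,
    site_id     INTEGER NOT NULL REFERENCES site(site_id),
    path        TEXT NOT NULL
);
CREATE TABLE host_site (
    host    TEXT NOT NULL,
    site_id INTEGER NOT NULL REFERENCES site(site_id),
    PRIMARY KEY (host, site_id)
);
CREATE TABLE logical_file (
    lfn TEXT NOT NULL,
    vo  TEXT NOT NULL,
    PRIMARY KEY (lfn, vo)
);
CREATE TABLE file_instance (
    lfn         TEXT NOT NULL,
    vo          TEXT NOT NULL,
    location_id INTEGER NOT NULL REFERENCES location(location_id),
    replica     INTEGER NOT NULL,
    PRIMARY KEY (lfn, vo, replica)
);
"""


def open_catalog():
    """Create an empty in-memory catalog (stand-in for the MySQL database)."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def sites_for_host(db, host):
    """Return the names of the sites that a given host can access."""
    rows = db.execute(
        "SELECT s.name FROM site s "
        "JOIN host_site h ON h.site_id = s.site_id "
        "WHERE h.host = ? ORDER BY s.name",
        (host,),
    )
    return [row[0] for row in rows]
```

The `host_site` table is one way to express "a host can access a set of sites"; collections and tasks would be further tables in the same style.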

Architecture Diagram

[Diagram. Labels: sites with their locations, including a cache disk site and a mass store site; Host 1 and Host 2, synchronized via the MySQL DB; replication task (collection of logical files to replicate); source-to-cache stagein; source-to-destination transfer (scp, gsiftp); register replicas; spider; catalog updates.]

Files and Collections

- Files
  - Logical name is filename, without path except for CVS-based files (code) and web files for which logical name includes path within repository
  - Logical name plus virtual organization defines unique logical file in system
  - File instances include a replica number
    - Zero for the master instance; N=locationID for other instances
  - Notion of master instance is essential for cases where replication must be done off of a specific (trusted or assured current) instance
- Collections
  - Several types supported:
    - Logical collections: arbitrary user-defined set of logical files
    - Location collections: all files at a given location
    - Key collections: files associated with a key or SQL query
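The naming and replica-numbering rules above are compact enough to write down directly. This is a minimal sketch under stated assumptions: the function names and the `"code"`/`"web"` file-type labels are hypothetical, and only the rules themselves come from the slide.

```python
import posixpath

# Hypothetical labels for the two file types whose logical name keeps the
# path within the repository (CVS-based code files and web files).
PATH_BEARING_TYPES = {"code", "web"}


def logical_name(path_in_repository, file_type):
    """Logical name: bare filename, or path within repository for code/web."""
    if file_type in PATH_BEARING_TYPES:
        return path_in_repository.lstrip("/")
    return posixpath.basename(path_in_repository)


def logical_file_key(path_in_repository, file_type, vo):
    """Logical name plus virtual organization identifies a logical file."""
    return (logical_name(path_in_repository, file_type), vo)


def replica_number(location_id, is_master):
    """Zero for the master instance; the location ID for any other instance."""
    if location_id < 1:
        # Assumption: location IDs start at 1, so 0 is free to mark the master.
        raise ValueError("location IDs are assumed to be positive")
    return 0 if is_master else location_id
```

For example, `logical_name("/data/tdr/higgs.root", "data")` yields `"higgs.root"`, while a code file keeps its repository path.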

Distributed Catalog

- Catalog of ATLAS data at CERN, BNL
  - Supported data stores: CERN Castor, CERN stage, BNL HPSS (rftp service), AFS and NFS disk, code repositories, web sites
  - Current content: TDR data, test beam data, ntuples, code, ATLAS and US ATLAS web content, …
  - Some content (code, web) added more to jack up the file count to test scalability and file type diversity than because they represent useful content at present
- About 300k files cataloged representing >2TB data
  - Has also run without problems with ~1.5M files cataloged
- Running stably since May ’01
- Coverage recently extended to ANL, LBNL
  - Other US ATLAS testbed sites should follow soon
- ‘Spiders’ at all sites crawl the data stores to populate and validate catalogs
- ‘MySQL accelerator’ implemented to improve catalog loading performance between CERN and BNL by >2 orders of magnitude; 2k files cataloged in <1sec
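The spider idea, crawling a data store to populate and validate the catalog, can be sketched as a single pass over a directory tree. This is an assumption-laden illustration and not the Magda spider: the catalog is a plain dictionary rather than a database, the function name is hypothetical, and only local disk is handled (no Castor, HPSS, or stager access).

```python
import os


def spider(location_id, root, catalog):
    """Crawl one location and bring the catalog in line with what is on disk.

    catalog maps (logical_name, location_id) -> file size in bytes.
    Returns (number of files found, number of stale entries removed).
    """
    seen = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Populate: the logical name is the bare filename, so files with
            # the same name in different subdirectories share one entry.
            key = (name, location_id)
            catalog[key] = os.path.getsize(os.path.join(dirpath, name))
            seen.add(key)

    # Validate: drop entries for this location whose file no longer exists.
    stale = [k for k in catalog if k[1] == location_id and k not in seen]
    for key in stale:
        del catalog[key]
    return len(seen), len(stale)
```

Entries belonging to other locations are left untouched, so each site's spider can run independently against a shared catalog.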

Data (distinct from file) Metadata

- Keys
  - Arbitrary user-defined attributes (strings) associated with a logical file
  - Used to tag physics channels, histogram files, etc.
- Logical file versions
  - Version string can be associated with logical file to distinguish updated versions of a file
  - Currently in use only for source code (version is the CVS version number of the file)
- Data signature, object cataloging
  - Coming R&D; nothing yet
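Keys and version strings amount to a small amount of extra bookkeeping per logical file. The sketch below shows one way this could look in memory; the class and method names are hypothetical, and the real system stores this metadata in its database.

```python
from collections import defaultdict


class KeyCatalog:
    """Hypothetical in-memory store for keys and version strings."""

    def __init__(self):
        self._keys = defaultdict(set)  # logical file name -> set of key strings
        self._versions = {}            # logical file name -> version string

    def add_key(self, lfn, key):
        """Attach an arbitrary user-defined string to a logical file."""
        self._keys[lfn].add(key)

    def set_version(self, lfn, version):
        """Record a version string, e.g. a CVS revision for source code."""
        self._versions[lfn] = version

    def version(self, lfn):
        """Return the version string, or None if none was recorded."""
        return self._versions.get(lfn)

    def key_collection(self, key):
        """All logical files tagged with the given key, in sorted order."""
        return sorted(lfn for lfn, keys in self._keys.items() if key in keys)
```

Tagging two files with a physics-channel key and then calling `key_collection` with that key returns exactly those two files, which is the sense in which a key defines a collection.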

File Replication

- Supports multiple replication tools as needed and available
- Automated CERN-BNL replication incorporated 7/01
  - CERN stage → cache → scp → cache → BNL HPSS
  - stagein, transfer, archive scripts coordinated via database
  - Transfers user-defined collections keyed by (e.g.) physics channel
- Just extended to US ATLAS testbed using Globus gsiftp
  - Currently supported testbed sites are ANL, LBNL, Boston U
  - BNL HPSS → cache → gsiftp → testbed disk
  - BNL or testbed disk → gsiftp → testbed disk
  - gsiftp not usable to CERN; no available grid node there (til ’02?!)
- Will try GridFTP, probably also GDMP (flat files) when available
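Coordinating the stagein, transfer, and archive steps through a database means each step records its progress where the others can see it. The following is a rough sketch of that pattern only: the status table is a dictionary standing in for a database table, the step callables are placeholders for the real scripts, and all names are hypothetical.

```python
STEPS = ("stagein", "transfer", "archive")


def run_replication_task(task, actions, status_table):
    """Run each step for each file, recording progress in a shared table.

    task:         {"id": str, "files": [logical file names]}
    actions:      maps a step name to a callable taking a logical file name
    status_table: maps (task id, file, step) -> "done" or "failed"
    """
    for lfn in task["files"]:
        for step in STEPS:
            record = (task["id"], lfn, step)
            if status_table.get(record) == "done":
                continue  # completed on an earlier run; do not repeat it
            try:
                actions[step](lfn)
            except Exception:
                status_table[record] = "failed"
                break  # later steps for this file depend on this one
            status_table[record] = "done"
```

Because completed steps are skipped, rerunning the same task after an interruption resumes where it stopped instead of staging and transferring everything again.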

Data Access Services

- Command line tools usable in production jobs: under test
  - getfile
    - Retrieve file via catalog lookup and (as necessary) staging or (still to come) remote replication
    - Local soft link to cataloged file instance in a cache or location
    - Usage count maintained in catalog to manage deletion
  - releasefile
    - Removes local soft link, decrements usage count in catalog, deletes instance (optionally) if usage count goes to zero
- Callable APIs for catalog usage and update: to come
  - Collaboration with David Malon on Athena integration
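The getfile/releasefile pair is essentially reference counting on cached file instances. Here is a minimal sketch of that bookkeeping, assuming a POSIX filesystem with symbolic links. It is not the actual command-line tools: the `Catalog` class and the function signatures are hypothetical, and staging and remote replication are left out.

```python
import os


class Catalog:
    """Hypothetical stand-in for the catalog database."""

    def __init__(self):
        self.instances = {}  # logical file name -> path of a cached instance
        self.usage = {}      # logical file name -> number of active users


def getfile(catalog, lfn, workdir):
    """Look the file up, soft-link it into workdir, and count the usage."""
    source = catalog.instances[lfn]  # KeyError if the file is not cataloged
    link = os.path.join(workdir, lfn)
    os.symlink(source, link)
    catalog.usage[lfn] = catalog.usage.get(lfn, 0) + 1
    return link


def releasefile(catalog, lfn, workdir, delete_if_unused=False):
    """Remove the soft link, decrement the count, optionally delete at zero."""
    os.remove(os.path.join(workdir, lfn))  # removes the link, not the target
    catalog.usage[lfn] -= 1
    if delete_if_unused and catalog.usage[lfn] == 0:
        os.remove(catalog.instances.pop(lfn))
        del catalog.usage[lfn]
```

With two jobs holding the same file, the first `releasefile` leaves the cached instance in place and only the second, which brings the count to zero, may delete it.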

Near Term Schedule

- Complete and deploy data access services (2 wks)
- Globus integration and feedback
  - Remote command execution (2 wks)
  - Test Globus replica catalog integration (1 mo)
- Incorporate into Saul Youssef’s pacman package manager (1 mo)
- ATLAS framework (Athena) integration (with D. Malon) (2 mo)
- Look at GDMP for replication (requires flat file version) (2 mo)
- Application and testing in coming DCs, at least in US
  - Offered to intl ATLAS (Gilbert, Norman) as a DC data management tool
- Design and development of metadata and data signature (6 mo)
- Study scalability of single-DB version, and investigate scalability via multiple databases (9 mo)