SAM and D0 Grid Computing
Igor Terekhov, FNAL/CD

Plan of Attack
- Overview of the projects and their relevance to D0 remote analysis
- Distributed aspects of SAM
- D0 Grid directions and promises
- D0 Grid involvement of D0RACE players

SAM and D0 Grid projects
- The projects overlap (SAM is de facto a data grid by design, and some people work on both) but are distinct.
- Project type and members:
  - SAM is large and mature, staffed strictly by CD with some D0 (and CDF?) participation.
  - D0 Grid is emerging, has a newly funded PPDG component, and has active offsite D0 collaborators (you).
- Scopes:
  - SAM primarily (historically) deals with central issues, from Enstore to FNAL security.
  - D0 Grid is strictly about distributed issues.
- Both aim at enabling distributed analysis, with foci on data handling and job handling respectively, and overlap in metadata.

SAM and distributed (remote) analysis
- Central database for the metadata catalog:
  - Logical (scientific) description of the data
  - Physical handling of the data (replica catalog, resource allocation and usage)
  - Reliable bookkeeping of data contents and processing history
  - The central database is, however, a potential bottleneck and a single point of failure.
- Stations are fully distributed.
- The resource manager (optimizer) is to be distributed.
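The central catalog idea can be illustrated with a small sketch. This is not the actual SAM schema or API; all names below are hypothetical, and it only shows how a logical dataset definition, a replica catalog, and bookkeeping might hang together in one central service.

```python
# Hypothetical sketch only -- not the real SAM interfaces.
class MetadataCatalog:
    """One central service: logical dataset definitions, replica locations,
    and bookkeeping of what was defined (processing history)."""

    def __init__(self):
        self.datasets = {}   # dataset name -> list of logical file names
        self.replicas = {}   # logical file name -> list of (station, path)
        self.history = []    # append-only bookkeeping log

    def define_dataset(self, name, files):
        self.datasets[name] = list(files)
        self.history.append(("define_dataset", name))

    def add_replica(self, lfn, station, path):
        self.replicas.setdefault(lfn, []).append((station, path))

    def replicas_of(self, lfn):
        # Every station asks this same central service, which is why it is
        # also a potential bottleneck and single point of failure.
        return self.replicas.get(lfn, [])


catalog = MetadataCatalog()
catalog.define_dataset("bphys_skim_2002a", ["raw_0001.evt", "raw_0002.evt"])
catalog.add_replica("raw_0001.evt", "enstore-fnal", "/pnfs/d0/raw_0001.evt")
catalog.add_replica("raw_0001.evt", "lancaster", "/cache/raw_0001.evt")
print(catalog.replicas_of("raw_0001.evt"))
```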

SAM as a grid of stations
- Stations use the replica catalog to retrieve files from anywhere:
  - Although Enstore at FNAL is the first and foremost MSS, other (multiple) MSSs are supported.
  - Stations may (and do) retrieve data from other stations; replica selection is possible.
  - Many different file transfer mechanisms are supported (rcp, bbftp, Globus GridFTP, encp, etc.), and others may be added.
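As a rough illustration of replica selection and the choice of transfer mechanism, here is a minimal sketch; the replica records and the preference rule are invented for the example, not taken from the SAM code.

```python
# Illustrative sketch; the replica records and the selection rule are made up.
SUPPORTED = {"rcp", "bbftp", "gridftp", "encp"}  # mechanisms this station can use

def select_replica(replicas, local_site):
    """Prefer a replica already at the local site, otherwise any replica
    reachable with a transfer mechanism this station supports."""
    local = [r for r in replicas if r["site"] == local_site]
    reachable = [r for r in replicas if r["protocol"] in SUPPORTED]
    candidates = local or reachable
    if not candidates:
        raise RuntimeError("no retrievable replica for this file")
    return candidates[0]


replicas = [
    {"site": "fnal",      "protocol": "encp",  "url": "enstore:/pnfs/d0/file1"},
    {"site": "lancaster", "protocol": "bbftp", "url": "bbftp://lancs/cache/file1"},
]
print(select_replica(replicas, local_site="lancaster"))
```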

Global Data Forwarding and Replication
- Storage from remote centers is fully operational today: MC import from the UK, NL, and FR.
- Data retrieval via an interim station is being enabled symmetrically:
  - An initial (pilot) implementation is being tested with Lancaster and IC London; others are encouraged to join.
  - There are limitations in some second-order effects.
- A full-fledged implementation is in the works.

Routing + Caching = Replication
[Diagram slides: data flow between data sites across the WAN, illustrating how routing plus caching yields replication.]

Summary of remote analysis with SAM today
- One can use the existing station software to process locally (on-site) cached data in exactly the same way it is used on d0mino.
- Replica selection: discourage or disable impossible retrievals (e.g., direct from Enstore); use the local MSS when applicable.
- Initial data caching at the site: use the routing mechanisms to obtain the data, e.g., from Enstore via d0mino.

SAM enhancements for distributed (remote) processing
- Full-fledged global routing and replication
- A hierarchy of resource managers: MSS-specific -> site-wide -> global
- Decentralization of the information services (replica and other metadata catalogs, CORBA naming service)
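The proposed manager hierarchy (MSS-specific -> site-wide -> global) can be pictured as a chain of delegation. The sketch below is an assumption about how such a chain might behave, not a description of the planned implementation.

```python
# Assumed behaviour only: each level serves what it can and escalates the rest.
class ResourceManager:
    def __init__(self, name, capacity, parent=None):
        self.name, self.capacity, self.parent = name, capacity, parent

    def request(self, units):
        if units <= self.capacity:
            self.capacity -= units
            return f"granted by {self.name}"
        if self.parent is not None:
            return self.parent.request(units)   # escalate up the hierarchy
        return "denied: no capacity at any level"


global_mgr  = ResourceManager("global",                 capacity=100)
site_mgr    = ResourceManager("site-wide",              capacity=20, parent=global_mgr)
enstore_mgr = ResourceManager("Enstore (MSS-specific)", capacity=5,  parent=site_mgr)

print(enstore_mgr.request(3))    # satisfied by the MSS-specific manager
print(enstore_mgr.request(50))   # escalates past the site to the global manager
```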

D0 Grid Goals
- Adopt standard, elemental Grid technologies to "grid-enable" SAM and D0
- Interoperability with other Grids
- Job management:
  - Logical: specification, description, structuring (MC, chained processing, high-level checkpointing)
  - Physical: scheduling and resource management
- Monitoring (and information services):
  - Individual job status
  - The Grid (sub)system as a whole (what's going on?)

D0 Grid Deliverables for Analysis: Job Description Language (6 months)
- Describe the application in a portable, universal way
- Multiple stages "chained" (DAGged)
- Parallelism
- Handling of output (store, define a dataset, etc.)
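The Job Description Language was still a deliverable at this point, so the following is purely a hypothetical sketch of the kind of information such a description might carry (chained/DAGged stages, parallelism, output handling), plus a trivial walk of the stage dependencies. All field names are invented.

```python
# Hypothetical job description; field names are invented for illustration.
job = {
    "name": "bphys_analysis_pass1",
    "stages": [  # a small DAG: each stage lists the stages it depends on
        {"id": "skim",    "executable": "d0skim",    "depends_on": [],
         "input_dataset": "raw_run2_sample", "parallel_segments": 20},
        {"id": "analyze", "executable": "d0analyze", "depends_on": ["skim"],
         "parallel_segments": 20},
    ],
    "output": {"store": True, "define_dataset": "bphys_pass1_output"},
}

def stage_order(stages):
    """Return stage ids in an order that respects the declared dependencies."""
    done, order = set(), []
    while len(order) < len(stages):
        for s in stages:
            if s["id"] not in done and all(d in done for d in s["depends_on"]):
                order.append(s["id"])
                done.add(s["id"])
    return order

print(stage_order(job["stages"]))   # ['skim', 'analyze']
```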

D0 Grid Deliverables for Analysis: Job Submission and Execution
- Timeline (months): SAM submit (0) -> D0 Grid submit (6) -> Grid submit (24)
- The system will co-locate jobs and data
- Execution at an "island" site (network-disconnected from FNAL/the rest of the D0 Grid) (9 months)
- Whole-job brokering (the system selects a site based on data availability, resource usage, etc.):
  - Advisory service only (6 months)
  - Automated global dispatch (station selection, 9-12 months)
- Decomposition and global distribution of job parts for ultimate throughput (24 months)
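The "co-locate jobs and data" idea can be made concrete with a toy broker that ranks candidate stations by how much of the job's input is already cached there, discounted by load. The scoring function and the station data are invented; the real broker was itself a deliverable.

```python
# Toy broker; the weighting is arbitrary and the station data are made up.
def rank_stations(stations, input_files):
    ranked = []
    for st in stations:
        cached = len(set(input_files) & st["cached_files"])
        data_fraction = cached / len(input_files)   # how much input is already local
        score = data_fraction - 0.5 * st["load"]    # crude load penalty
        ranked.append((round(score, 2), st["name"]))
    return sorted(ranked, reverse=True)


stations = [
    {"name": "fnal-central", "cached_files": {"f1", "f2", "f3"}, "load": 0.9},
    {"name": "lancaster",    "cached_files": {"f1", "f2"},       "load": 0.2},
]
print(rank_stations(stations, ["f1", "f2", "f3"]))
```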

D0 Grid Deliverables for Analysis: Information Services and Monitoring
- Information services (SAM internal metadata) decentralized (9 months)
- Monitor individual jobs, throughput at a site/regional center, or the whole system:
  - Submitted -> pending -> done -> etc. progress (6 months?)
  - Where the job was dispatched; remote monitoring of its status (6 months?)
  - Historical mining (what did I do 2 weeks ago?), including at an island site (9-12 months)
- Monitor system performance and resource usage (24 months)
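For the individual-job monitoring bullets, here is a minimal sketch of tracking the submitted -> pending -> done progression with timestamps, so that historical questions ("what did I do two weeks ago?") can be answered later. The states and the record layout are assumptions, not the planned schema.

```python
# Assumed states and record layout, for illustration only.
import time

TRANSITIONS = {"submitted": "pending", "pending": "done"}

class JobRecord:
    """Keeps a timestamped history of a job's state changes and where it ran."""

    def __init__(self, job_id, site):
        self.job_id, self.site = job_id, site
        self.history = [("submitted", time.time())]

    def advance(self):
        current_state = self.history[-1][0]
        next_state = TRANSITIONS.get(current_state)
        if next_state:
            self.history.append((next_state, time.time()))


job = JobRecord("job-42", site="lancaster")
job.advance()   # submitted -> pending
job.advance()   # pending -> done
for state, stamp in job.history:
    print(job.site, state, time.ctime(stamp))
```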

How YOU can contribute
- Use cases, use cases, use cases, before any design is attempted:
  - How you want to define your (analysis) job
  - What execution conditions must be created at the target site (D0 release tree built, binaries/solibs shipped, RCP files copied or accessed via the DB)
  - How you want it executed (local/global, parallel, etc.); can you handle distributed job output? Can you combine output? Can you checkpoint your analysis program?
  - How you want to see what is going on with the job or with the (sub)system (site)

Summary
- SAM is a functional data grid providing distributed data access to D0.
- D0 Grid aspires to enable globally distributed computing and analysis, on top of SAM as the data delivery system.
- Analysis on the Grid is a top priority.
- We will need your input at early stages, as well as effort synergies.