David Colling GridPP Edinburgh 6th November 2001 SAM... an overview (Many thanks to Vicky White, Lee Lueking and Rod Walker)

SAM stands for Sequential Access to data via Metadata, where "sequential" refers to the events stored within files. The current SAM development team includes: Lauri Loebel-Carpenter, Lee Lueking*, Carmenita Moore, Igor Terekhov, Julie Trumbo, Sinisa Veseli, Matthew Vranicar, Stephen P. White, Victoria White* (*project leaders). Recently there has also been some work in the UK by Rod Walker.

History of SAM
The project started in 1997, built for the DØ virtual organisation (~500 physicists, 72 institutions, 18 countries). SAM's objectives are:
- to provide a worldwide system of shareable computing and storage resources, thereby providing a solution to the common problem of extracting physics results from about a petabyte of data (c. 2003);
- to provide a large degree of transparency to the user, who makes requests for datasets, submits jobs and stores files (together with extensive metadata about the processing steps etc.).

Currently SAM's storage and delivery of data is far more advanced than its job submission. SAM is an operational prototype of many of the concepts being developed for Grid computing.

Overview of SAM
[Architecture diagram: globally shared Database Server(s) holding the central database, a Name Server, Global Resource Manager(s) and a Log Server; Station 1..n Servers, each local; Mass Storage System(s) shared locally. Arrows indicate control and data flow.]

The Name Server allows all components to find each other by name. The Database Server has numerous methods which process transactions and retrieve information from the central database. The Resource Manager controls efficient use of resources such as tape stores. The Log Server gathers information from the entire system for monitoring and debugging. All communication is via CORBA.
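
Since all communication is via CORBA, a client finds a SAM server by resolving its name through the Name Server. Below is a minimal sketch of such a lookup using omniORBpy; the name/kind pair "central-analysis"/"station" is an assumed example (a later slide notes only that the namespace is organised by station name), not the documented SAM naming scheme.

# Minimal sketch of a CORBA naming-service lookup (omniORBpy), as a
# SAM client might perform it.  The name/kind pair below is an assumed
# example; SAM's actual naming scheme is not shown in these slides.
import sys
import CosNaming
from omniORB import CORBA

orb = CORBA.ORB_init(sys.argv, CORBA.ORB_ID)

# Obtain the root naming context from the Name Server.
obj = orb.resolve_initial_references("NameService")
root = obj._narrow(CosNaming.NamingContext)

# Resolve a station server by name.
name = [CosNaming.NameComponent("central-analysis", "station")]
station = root.resolve(name)
print("resolved station server:", station)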

The SAM station
A SAM station is deployed on local processing platforms. A station's set of CPU and disk resources is not shared outside the station. Stations can communicate directly with each other, and data cached at one station can be replicated at other stations on demand. Groups of stations at a physical site can share a locally available mass storage system (e.g. at Fermilab).

The SAM station
The station's responsibilities include:
- storing and retrieving data files from mass storage and other stations;
- managing data stored on cache disk;
- launching Project Managers, which oversee the processing of data requests by consumers in well-defined projects.
All these functions are provided by the servers within a station (see next slide).

David Colling GridPP Edinburgh 6th November 2001 File Stager(s) Station & Cache Manager File Storage Server Project Managers /Consumers eworkers File Storage Clients MSS or Other Station MSS or Other Station Data flow Control Producers/ Cache Disk Temp Disk The SAM Station

The SAM Station
The Station Manager oversees the removal of files cached on disk, and instructs the File Stager to add new files. All processing projects are started through the Station Server, which starts Project Managers. Files are added to the system through the File Storage Server (FSS), which uses the Stagers to initiate transfers to the available MSS or to another station.
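
As a rough illustration of the cache-management role (SAM's actual replacement policy is not specified in these slides), a station's disk cache can be modelled as a least-recently-used store that evicts old files to make room for newly staged ones:

# Toy sketch of station cache management: evict least-recently-used
# files when adding a new one would exceed the cache disk capacity.
# Illustrative only; SAM's real replacement policy is not described here.
from collections import OrderedDict

class StationCache:
    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.files = OrderedDict()   # filename -> size, kept in LRU order

    def access(self, filename):
        # Touching a cached file makes it most-recently-used.
        if filename in self.files:
            self.files.move_to_end(filename)
            return True
        return False

    def add(self, filename, size):
        # Evict LRU files until the new file fits, then cache it.
        while self.files and sum(self.files.values()) + size > self.capacity:
            evicted, _ = self.files.popitem(last=False)
            print("evicting", evicted)
        self.files[filename] = size

cache = StationCache(capacity_bytes=100 * 2**30)   # e.g. a 100 GB cache disk
cache.add("run129194_reco_0001.raw", 2**30)        # hypothetical file name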

The SAM Station
A Station Job Manager provides services to execute a user application, script, or series of jobs, potentially as parallel processes, either interactively or through a local batch system. LSF and FBS are currently supported; Condor and PBS adapters are under construction and being tested. The station Cache Manager and Job Manager are implemented as a single Station Master server. Job submission and synchronization between job execution and data delivery are currently part of SAM: jobs are put on hold in batch system queues until their data files are available. At present, jobs submitted at one station may only be run using the batch system(s) available at that station.
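
The hold-until-data-is-ready synchronization described above can be sketched as follows; the three helper calls (submit_held, files_cached, release) are hypothetical stand-ins for the batch-system and station interfaces, which these slides do not show.

# Sketch of synchronizing job execution with data delivery: the job
# waits on hold in the batch queue until the station has cached all
# of its input files.  The batch/station methods are hypothetical.
import time

def run_when_data_ready(job_id, input_files, batch, station):
    batch.submit_held(job_id)        # job enters the batch queue on hold
    while not all(station.files_cached(f) for f in input_files):
        time.sleep(60)               # wait for the station to stage the files
    batch.release(job_id)            # data is local: release the job to run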

The User Interface
UIs are provided to add data, access data, set configuration parameters and monitor the system. They take the form of a Unix command line, Web GUIs and a Python API. There is also a C++ interface for accessing data through a standard DØ framework package.
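
The slides do not show the Python API itself; purely as a hypothetical sketch of the kind of call it might expose (the module and function names below are invented for illustration; the dataset name and group come from the sam submit example later in this talk):

# Hypothetical sketch only: module and function names are invented to
# illustrate what driving SAM from Python might look like; they are
# not the real SAM API.
import sam_api                         # hypothetical Python binding

project = sam_api.start_project(       # hypothetical call: start a project
    defname="run129194_reco",          # dataset definition name (from a later slide)
    group="dzero",
)
for f in project.files():              # consume files as they are delivered
    print("processing", f)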

Defining a dataset [screenshot]

Examining a predefined dataset [screenshot]

Querying cached files [screenshot]

The SAM station
[Screenshot: a station holding real data files from FNAL and MC files from NIKHEF.]

The SAM station
The sam submit command starts a project and submits the job to the Condor batch system:

sam submit --defname=run129194_reco --cpu-per-event=2m --group=dzero \
    --batch-system-flags="--universe=vanilla --output=condor.out --log=condor.log --error=condor.error --initialdir=/home/walker/TestSam/blife/BLifetime_x-run13264x_reco_p arguments='-rcp framework.rcp -input_file SAMInput: -output_file outputfile -out BLifetime_x.out -log BLifetime_x.log -time -mem'" \
    --framework-exe=./BLifetime_x

The DØ SAM World
[World map of SAM stations: Fermilab, MSU, Columbia, UTA (64), Lyon/IN2P3 (100), Prague (32), Imperial College, Lancaster (200) and NIKHEF (50), linked by SuperJanet, SURFnet, ESnet and Abilene; the numbered sites are MC production centers. There is also a UCL CDF test station.]

SAM Works now!
Transfers initiated between 9:30 and 12:30 (Thursday 25 Oct 2001):

from station         | to station               | #files | tot_size (KB)
ccin2p3-analysis     | central-analysis         |     51 |
central-analysis     | clued0                   |     43 |
central-analysis     | enstore                  |    138 |
central-analysis     | imperial-test            |     19 |
datalogger-d0olb     | enstore                  |     54 |
datalogger-d0olc     | enstore                  |     34 |
enstore              | central-analysis         |     20 |
enstore              | clued0                   |     20 |
enstore              | linux-analysis-cluster-1 |     27 |
hoeve                | central-analysis         |     67 |
lancs                | central-analysis         |     21 |
prague-test-station  | central-analysis         |      2 |
uta-hep              | central-analysis         |      5 |

The Fabric
Compute and storage systems: in the US at Fermilab, UTA, Columbia and MSU; France/Lyon-IN2P3; UK/Lancaster and Imperial College; Netherlands/NIKHEF; Czech Republic/Prague. Many other sites are expected to provide additional compute and storage resources when the experiment moves from commissioning to physics data taking.
Storage systems consist of disk storage elements at all locations and robotically controlled tape libraries at Fermilab, Lyon, NIKHEF and (almost) Lancaster. All storage elements support the basic functions of storing or retrieving a file; some support parallel transfer protocols, currently via bbftp.
The underlying storage management systems for tape storage elements differ at Fermilab, Lyon and NIKHEF. Fermilab's tape storage management system, Enstore, provides the ability to assign priorities and file placement instructions to file requests, and provides reports about placement of data on tape, queue wait time, transfer time and other information that can be used for resource management.

Interim Conclusions
SAM is a sophisticated tool for data transfer, and a less sophisticated tool for job submission. SAM works now, and has real users! SAM is an operational prototype of many of the concepts being developed for Grid computing.

Interim Conclusions
However, significant parts of SAM will have to be enhanced (or replaced) before it can truly claim to be a data grid. This work will happen as part of the Particle Physics Data Grid (PPDG) project. In the following slides, current status is shown in black and planned enhancements in bold red. These slides are extracts from Vicky White's talk "SAM and PPDG", CHEP 2001.

[Architecture diagram from Vicky White's talk, mapping SAM onto Grid layers:
- Fabric: Tape Storage Elements, Disk Storage Elements, Compute Elements, LANs and WANs, Resource and Services Catalog, Replica Catalog, Meta-data Catalog, Code Repository.
- Connectivity and Resource: CORBA, UDP; file transfer protocols (ftp, bbftp, rcp, GridFTP); mass storage system protocols (e.g. encp, hpss); batch systems (LSF, FBS, PBS, Condor).
- Authentication and Security: GSI; SAM-specific user, group, node and station registration; bbftp cookie.
- Collective Services: catalog protocols, Significant Event Logger, Naming Service, Database Manager, Catalog Manager, SAM Resource Management, Data Mover, Stager, Optimiser.
- Job and Storage Services: Storage Manager, Job Manager, Cache Manager, Request Manager (the SAM-given component names are "Dataset Editor", "File Storage Server", "Project Master" and "Station Master").
- Client Applications: Request Formulator and Planner; Web, Python and Java codes, command line, D0 Framework C++ codes.
Legend: marked components will be replaced, added or enhanced using PPDG and Grid tools; names in quotes are SAM-given software component names.]

Enhancing SAM
The Job Manager is limited and can only submit to local resources. The specification of user jobs, including their characteristics and input datasets, is a major component of the PPDG work. The intention is to provide Grid job services components that replace the SAM job services components. This will support job submission (including composite and parallel jobs) to suitable SAM station(s) and eventually to any available Grid computing resource.

Enhancing SAM
Unix user names, physics groups, nodes, domains and stations are registered; valid combinations of these must be provided to obtain services. Station servers at one station provide service on behalf of their local users and are trusted by other station servers or database servers. Adopting the Globus Grid Security Infrastructure (GSI) services is a planned PPDG enhancement of the system.
Service registration and discovery is implemented using a CORBA naming service, with the namespace organised by station name. APIs to services in SAM are all defined using CORBA Interface Definition Language and have multiple language bindings (C++, Python, Java) and, in many cases, a shell interface.
Use of GridFTP and other standard protocols to access storage elements is a planned PPDG modification to the system. Integration with grid monitoring tools and approaches is a PPDG area of research. Registration of resources and services using a standardized Grid registration or enquiry protocol is a planned PPDG enhancement to the system.

Enhancing SAM
Database Managers provide access to the Replica Catalog, Metadata Catalog, SAM Resource and Configuration Catalog and Transformation Catalog. All catalogs are currently tables in a central Oracle database, a fact that is hidden from their clients. Replication of some catalogs in two or more locations worldwide is a planned enhancement to the system. Database Managers will need to be enhanced to adapt SAM-specific APIs and catalog protocols onto Grid catalog APIs using PPDG-supported Grid protocols, so that information may be published and retrieved in the wider Physics Data Grid that spans several virtual organizations.
A central Logging Server receives significant events. This will be refined to receive only summary-level information, with more detailed monitoring information held at each site. Work in the context of PPDG will examine how to use a Grid Monitoring Architecture and tools.
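
To make the catalog roles concrete: a replica catalog is essentially a mapping from a logical file name to the locations holding copies. A minimal sketch follows; the schema and entries are invented for illustration (in SAM these are tables in the central Oracle database), though the station names are taken from the transfer table earlier in this talk.

# Minimal sketch of what a replica catalog tracks: the locations that
# currently hold a copy of each logical file.  Schema and entries are
# invented for illustration; in SAM these live in the central Oracle DB.
replica_catalog = {
    "run129194_reco_0001.raw": ["enstore", "central-analysis"],
    "run129194_reco_0002.raw": ["enstore", "lancs"],
}

def locate(filename):
    # Return the stations/storage elements holding a replica, if any.
    return replica_catalog.get(filename, [])

print(locate("run129194_reco_0001.raw"))   # ['enstore', 'central-analysis']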

Enhancing SAM
Resource manager services are provided by an Optimization service. File transfer actions are prioritized and authorized prior to being executed. The current primitive functionality of re-ordering and grouping file requests, primarily to optimize access to tapes, will need to be greatly extended, redesigned and re-implemented to better deal with co-location of data with computing elements and with fair-share and policy-driven use of all computing, storage and network resources. This is a major component of the SAM/PPDG work, to be carried out in collaboration with the Condor team.
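
The "primitive functionality" of re-ordering and grouping file requests amounts to batching requests by tape volume so that each tape is mounted only once. A minimal sketch under that assumption (the volume lookup and the ordering within a volume are simplified placeholders):

# Sketch of re-ordering/grouping file requests for tape access: group
# pending requests by tape volume so each tape is mounted once, then
# order files within a volume (sorting by name here stands in for
# ordering by position on tape).
from itertools import groupby

def order_requests(requests, volume_of):
    # requests: iterable of file names; volume_of: file name -> tape volume.
    by_volume = sorted(requests, key=volume_of)
    for volume, files in groupby(by_volume, key=volume_of):
        yield volume, sorted(files)

requests = ["f3", "f1", "f2"]
volumes = {"f1": "VOL01", "f2": "VOL02", "f3": "VOL01"}
for vol, files in order_requests(requests, volumes.get):
    print(vol, files)    # VOL01 ['f1', 'f3'], then VOL02 ['f2']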

Enhancing SAM
Other enhancements are also needed for scalability: for example, SAM relies on a single Oracle database, which is a single point of failure and needs replication/caching, and so on.

Conclusions
SAM already does a lot, and planned enhancements will give it far greater functionality.