Presentation is loading. Please wait.

Presentation is loading. Please wait.

Sept 13 2004Wyatt Merritt Run II Computing Review1 Status of SAMGrid / Future Plans for SAMGrid  Brief introduction to SAMGrid  Status and deployments.

Similar presentations


Presentation on theme: "Sept 13 2004Wyatt Merritt Run II Computing Review1 Status of SAMGrid / Future Plans for SAMGrid  Brief introduction to SAMGrid  Status and deployments."— Presentation transcript:

1 Sept 13 2004Wyatt Merritt Run II Computing Review1 Status of SAMGrid / Future Plans for SAMGrid  Brief introduction to SAMGrid  Status and deployments of SAM  Status and deployments of JIM  Problems encountered/solved/unresolved  The SAMGrid team  SAMGrid services & development plans

2 Sept 13 2004Wyatt Merritt Run II Computing Review2 Introduction to SAMGrid  Data handling system with following capabilities: File storage from online and processing systems Routed file delivery to (possibly parallel) processes Dataset creation based on file metadata Dataset creation based on processing metadata Local and remote job submission User authentication Local and remote monitoring capabilities  Looking at SAM in operation - SAM TV @ DØ SAM TV @ CDFSAM TV @ DØSAM TV @ CDF Currently created from log files Version in development is created from MIS database, filled by new MIS server

3 Sept 13 2004Wyatt Merritt Run II Computing Review3 Status and deployments of SAM  Active stations: 40 DØ (9 @ FNAL) 36(9) 26 CDF (2 @ FNAL) 9(0)  Accomplishments since September 03: DØ Reprocessing Nov03-Jan04 Deployment of V5.1 SAM schema (1 st common version) in production for CDF & DØ Deployment of new access methods to V5.1 schema, new station to match, new python and C++ api’s Station changes in response to operational issues Deployment of new test harness Deployment of new installation procedures Deployment beginning for MINOS New Monitoring & Information Server in testing SAM schema being tested by USCMS Sep 03

4 Sept 13 2004Wyatt Merritt Run II Computing Review4 Station Fixes/Enhancements Sep 03 – Sep 04  Numerous changes for field issues: recent example – reconfiguration of cache replacement algorithm  Dramatic improvement in station ability to handle large request queues - supports much larger projects than 1 yr ago; processes requests an order of magnitude faster overall to catch up with CDF CAF data throughput  Other developments for SAM on CAF  Interface changes that allow for a greater flexibility in station clients. Such as : TCP callbacks, TCP polls (yes, client can be behind one-way firewall and still get files).  Changes required to work with new dbserver  Station integration with SRM – code ready, simple demo performed

5 Sept 13 2004Wyatt Merritt Run II Computing Review5 Usage Statistics for D0 SAM 250K 0 70K D0 CAB GB/month 120K 00 D0 CAB1GB/month 30K 0 D0 Farm GB/monthD0 ALL GB/month Sum = 2.1 PB; 50B evts

6 Sept 13 2004Wyatt Merritt Run II Computing Review6 Sam Statistics - DØ Files delivered by month 19992000200120022003 Run II Begins

7 Sept 13 2004Wyatt Merritt Run II Computing Review7 Usage Statistics for CDF SAM CDF-SAM GB/month 250K 0 30K 0 CDF-SAM GB/day CDF-ALL GB/month CDF-ALL GB/day 0 35K 450K 0 Sum = 1.5 PB; 12B evts

8 Sept 13 2004Wyatt Merritt Run II Computing Review8 Total CDF Files To User 1000 TB 200220032004

9 Sept 13 2004Wyatt Merritt Run II Computing Review9 SAM Statistics - Operations Data  Time between Request Next File and Open File  For CAB and CABSRV1 50% of enstore transfers occur within 10 minutes. 75% within 20 minutes 95% within 1 hour  For CENTRAL-ANALYSIS and CLUED0 95% of enstore transfers within 10 minutes Station CABCABSRV1CLUED0CA % no wait 30%40%38%18%

10 Sept 13 2004Wyatt Merritt Run II Computing Review10 Status and deployments of JIM  Active execution sites: 10 DØ (1 @ FNAL) 0 1 CDF in testing 0  Accomplishments since September 03: Solutions provided for sandboxing and merging Deployment for Monte Carlo production for DØ at multiple sites Testbed established at Fermilab on General Purpose farm In testing for CDF MC production In testing for DØ reprocessing from raw data Demonstration of use of sam_client on LCG site Sep 03

11 Sept 13 2004Wyatt Merritt Run II Computing Review11 The SAM-Grid Team Project Co-Leaders: Wyatt Merritt (CD/Run II); Rick St. Denis (CDF/ U Glasgow) Technical Co-Leaders: Rob Kennedy (CD/CCF); Sinisa Veseli (CD/Run II) Core Developers (SAM components): Lauri Loebel Carpenter, Andrew Baranovski, Steve White, Carmenita Moore,* Adam Lyon, Petr Vokac,*** Mariano Zimmler***, Matt Leslie Core Developers (JIM components): Igor Terekhov,** Gabriele Garzoglio, Sankalp Jain,** Aditya Nishandar** Support for CDF Migration: Fedor Ratnikov, Randy Herber, Art Kreymer, Morag Burgon- Lyon,** Valeria Bartsch, Stefan Stonjek, Krzysztof Genser Database support: Anil Kumar Associated external projects: PPDG, GridPP, SBIR II Core Development Currently @ 7 FTEs * Deceased | ** Left project | *** Summer Students

12 Sept 13 2004Wyatt Merritt Run II Computing Review12 Problems Encountered/Solved/Unresolved  Grid operational stability in DØ MC deployment: initial efficiency 80% took ~ 2 months  Installation difficulty for CDF: doc week, initial config man subproject  followups in both areas  Major operational issues Sep 03 – Sep 04 DØ – hardware problems with production database machine (central point of failure) Dec03-Jan04; 15% drop in file deliv  Contentious design issues Sep 03 – Sep 04 CDF – file name as GUID no change to model CDF – interface into experiment framework work in SAM CDF – communication with dcache work in SAM, future work CDF – use of dimensions and parameters proposed work in SAM CDF – process bookkeeping future work in SAM MINOS – file delivery ordering & grouping no change to model

13 Sept 13 2004Wyatt Merritt Run II Computing Review13 SAMGrid & Grid Services  Distributable sam_client provides access to: VO storage service (sam store command, interfaced to sam_cp) VO metadata service (sam translate constraints) VO replica location service (sam get next file) Process bookkeeping services  JIM components provide: Job submission service via Globus Job Manager, augmented by some VO requirements Job monitoring service from remote infrastructure Authentication services

14 Sept 13 2004Wyatt Merritt Run II Computing Review14 Grid Discussions & Activities  SAMGrid design discussions - ongoing  CDF Grid Workshops – January 04, Florida April 04, FNAL September 04, FNAL  DØ Grid Workshops - April 04, London August 04, Wuppertal  LHC Metadata Working Group – April 04, Glasgow  CMS RTAG input & prototyping activity – August 04  12 CHEP papers on SAMGrid work (6 talks, 6 posters)  FermiGrid discussions – beginning

15 Sept 13 2004Wyatt Merritt Run II Computing Review15 Extra Slides

16 Sept 13 2004Wyatt Merritt Run II Computing Review16 What is SAM?  Data handling system for Run II DØ and CDF  SAM manages file storage (replica catalogs) Data files are stored in tape systems at FNAL and elsewhere (most use ENSTORE at FNAL) Files are cached around the world for fast access  SAM manages file delivery Users at FNAL and remote sites retrieve files out of file storage. SAM handles caching for efficiency You don't care about file locations  SAM manages file meta-data cataloging SAM DB holds meta-data for each file. You don't need to know the file names to get data  SAM manages analysis bookkeeping SAM remembers what files you ran over, what files you processed successfully, what applications you ran, when you ran them and where  Designed for PETABYTE (10 15 ) sized experiment datasets (that's us)!

17 Sept 13 2004Wyatt Merritt Run II Computing Review17 SAM Terms and Concepts  A project runs on a station and requests delivery of a dataset to one or more consumers on that station.  Station: Processing power + disk cache + (connection to tape storage) + network access to SAM catalog and other station caches Example: a linux analysis cluster at D0  Dataset: metadata description which is resolved through a catalog query to a list of files. Datasets are named. Examples: (syntax not exact) data_type physics and run_number 78904 and data_tier raw request_id 5879 and data_tier thumbnail  Consumer: User application (one or many exe instances) Examples: script to copy files; reconstruction job

18 Sept 13 2004Wyatt Merritt Run II Computing Review18 SAM Terms and Concepts  A project runs on a station and requests delivery of a dataset to one or more consumers on that station.  Station: Processing power + disk cache + (connection to tape storage) + network access to SAM catalog and other station caches Example: a linux analysis cluster at D0  Dataset: metadata description which is resolved through a catalog query to a list of files. Datasets are named. Examples: (syntax not exact) data_type physics and run_number 78904 and data_tier raw request_id 5879 and data_tier thumbnail  Consumer: User application (one or many exe instances) Examples: script to copy files; reconstruction job

19 Sept 13 2004Wyatt Merritt Run II Computing Review19 SAM Statistics - Operations Data

20 Sept 13 2004Wyatt Merritt Run II Computing Review20 The following two slides are the two-year work plan presented at last year’s review, updated as follows. Bullets in green are finished. Bullets in blue are in progress. Bullets left black aren’t started yet.

21 Sept 13 2004Wyatt Merritt Run II Computing Review21 SAM-Grid: The work plan for the next 2 years  Implement Schema Update I. (file_type, runs changes)  Automate MC production; understand issues in automating job distribution for re-processing and analysis.  Revise caching strategies. (local vs fileserving; merging operations; connections w/ other layers)  Implement Schema Update II. (processing requirements for jobs, group info)  Equip optimizers and job brokers to deal w/ info in Schema Update II.  Sort out parallelization issues.  Implement Virtual Organization tools.  Implement Monitoring and Information server on the SAM side  Provide for distributed database: two parts, file location info and processing info.; equip servers for more autonomous operation

22 Sept 13 2004Wyatt Merritt Run II Computing Review22 SAM-Grid: The work plan for the next 2 years  Evaluate technology changes/upgrades Improvements for installation/config management? Move to VDT suite (production version of Condor, Globus, etc.) Possible CORBA replacements – WebServices? XML-based logging – will this be the way to go? Which solution for distributed DB’s?  Plan for interoperability Merge SAM catalog w/ other replica schemas? Follow example of DØ/CDF merge? Interoperation with other replica catalogs? GLUE schema for resource description; job description language Sam_batch_adapter technology Working with SRM’s – await outcome of caching strategy discussions Interactions of tools w/ data handling system: cf. mc_runjob & d0tools w/ JIM and CAF(CDF) VO organization issues Security issues (VO, file transfer, job submission)


Download ppt "Sept 13 2004Wyatt Merritt Run II Computing Review1 Status of SAMGrid / Future Plans for SAMGrid  Brief introduction to SAMGrid  Status and deployments."

Similar presentations


Ads by Google