Published by Rodney Riley. Modified over 9 years ago.
1
18 Feb 2004, Computing Division Project Status Report
Project Status Report: SAMGrid
  SAMGrid Management, Status, Operations – Merritt
  SAMGrid Development I – Veseli
  SAMGrid Development II – Kennedy
  SAMGrid Future Plans – St. Denis
2
SAMGrid Project Description
Purpose: Provide data handling services to Run II experiments and other interested experiments with similar problems. These services should scale in performance and convenience for cataloging and delivery of petabyte-sized datasets, and should evolve to be available in relevant Grid environments.
Current stakeholders: CDF, DØ, MINOS, CD
Duration: Development effort is expected to extend through ’07 as the components move to become Grid services. High-level maintenance (i.e., effort that includes the capability to respond to feature requests) is expected to continue through at least the data collection lifetime of the stakeholders.
3
The SAM-Grid Team
Revised Management Plan went into effect Dec 03
Project Co-Leaders:
  Wyatt Merritt, CD/DØCA
  Rick St. Denis, CDF/U Glasgow
Project Technical Co-Leaders:
  Rob Kennedy, CD/CDF
  Sinisa Veseli, CD/DØCA
CCF: Andrew Baranovski, Gabriele Garzoglio, Igor Terekhov
CEPA: Carmenita Moore, Steve White (0.5 FTE)
CDF: Randy Herber, Art Kreymer, Stefan Stonjek (GS)
DØCA: Lauri Loebel Carpenter, Robert Illingworth, Adam Lyon
4
The SAM-Grid Team - Extended
Database support (CSS-DSG): Diana Bonham, Anil Kumar
Associated internal projects:
  RUNJOB (with CMS)
  Authorization Project (with CMS, still being defined)
Associated external projects:
  PPDG: Sankalp Jain, Aditya Nishandar
  GridPP: Morag Burgon-Lyon, Valeria Bartsch, Iain Bertram, Dave Evans, Peter Love
  SBIR II: Matt Vranicar, Jeremy Simmons, Josh Gramlich, Ngan MacDonald, John Grace
5
SAMGrid Project Management & Organization
Project co-leaders
  Represent largest stakeholders: requirements & priorities
  Run weekly design meetings
Project technical leaders
  Run weekly operations meeting
  Conduct subproject assessments
Active Subprojects: C++ API, DBServer, JIM, H Stream Reco for CDF, Caching, Chains&Links, CDF DFC, Test Harness, Linux deploy of DBServers, Config Man
Planned Subprojects: Request system, Autodest, Further monitoring (MIS)
Related Subprojects: d0tools, SBIR II, Condor mods, workflow packages for CDF & D0, Authorization & Accounting
Recently completed Subprojects: Python API, V5.1 Schema Design, Batch Adapter, D0 Online dcache TDP, 1st Gen Monitoring Tools, Data Dimensions Grammar
6
SAMGrid Components
Event/File Catalog for metadata (contents & processing) and locations
Dbservers for accessing catalog
Station servers for file delivery to projects
Optimizer
File storage server
Interface to station cache and MSS (samcp)
JIM components for Grid job submission & monitoring
User API
C++ client API
7
Status and deployments of SAMGrid For DØ
Operational @ FNAL: online, reco farm, d0mino, cab, new cab, clued0
Operational @ Monte Carlo production sites
Operational @ remote analysis sites: ~20 active, ~40 deployed
Operational 11/03 – 2/04 for remote reconstruction: IN2P3, UKGrid (Manchester/ICL/RAL), WestGRID, GridKA, NIKHEF; 97M events reprocessed remotely
Stats:
  ~78K proj FNAL, >14K proj remote (since 1/1/03)
  60 billion evts, 3 PB, 8 M files consumed (all D0 stations)
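The consumption totals quoted for all D0 stations (60 billion events, 3 PB, 8 M files) imply some rough per-file and per-event averages. The sketch below works them out; it assumes decimal units (1 PB = 10^15 bytes), which the slide does not specify, and treats the rounded totals as exact, so the results are order-of-magnitude estimates only. The function and constant names are illustrative.

```python
# Rough averages implied by the quoted D0 consumption totals.
# Assumption: decimal units (1 PB = 10**15 bytes); the slide does not say.
EVENTS_CONSUMED = 60_000_000_000   # "60 billion evts"
BYTES_CONSUMED = 3 * 10**15        # "3 PB"
FILES_CONSUMED = 8_000_000         # "8 M files"


def consumption_averages(events, total_bytes, files):
    """Return average file size, events per file, and event size."""
    return {
        "bytes_per_file": total_bytes / files,
        "events_per_file": events / files,
        "bytes_per_event": total_bytes / events,
    }


avg = consumption_averages(EVENTS_CONSUMED, BYTES_CONSUMED, FILES_CONSUMED)
print(f"average file size : {avg['bytes_per_file'] / 1e6:.0f} MB")
print(f"events per file   : {avg['events_per_file']:.0f}")
print(f"average event size: {avg['bytes_per_event'] / 1e3:.0f} kB")
```

Under these assumptions the totals correspond to roughly 375 MB per file, 7,500 events per file, and 50 kB per event.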
8
D0
9
D0 Files
4000-8000 Files/Day
10
D0 Files Per Month By Year
[Plot: x-axis years 1999, 2000, 2001, 2002, 2003; annotations: "100,000 files", "Run II Start"]
11
D0 Total Files
2.5 Million Files Served
12
D0 Total Data Moved
700 TB moved
13
Status and deployments of SAMGrid For CDF
Operational 24/7 to store online metadata
Operational at remote stations: ~15 active, ~30 deployed
  Large recent increase: Florida workshop!
In testing for Monte Carlo production
File delivery tests up to 20 TB on testcaf
Statistics: ~3000 proj total (since 1/1/03)
Note: CDF usage pattern is different from DØ. CDF moves more GB (but not more events) because it does not use a small summary format like the DØ thumbnail.
14
CDF Florida DH Workshop
11 installations in about 2 hours.
Integrated with dCAF in 2 cases in 2 days.
3 in Asia, 4 in Europe
6 sites committed to summer 2004 usage of their facilities for all of CDF (mostly MC)
SAM installation now: initsam cdf
Follow-up on April 1.
Each site has a local user support person to reduce load on core development team.
Generally: Security ate 80% of the effort! Now 20%!
16
2 TB/Day: Karlsruhe
17
CDF Dcache on CAF
ALL CDF on CAF reads 20 TB/Day
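To put the daily volumes quoted on these plots (2 TB/day for Karlsruhe, 20 TB/day for all CDF reads on CAF) in more familiar terms, they can be converted to a sustained average rate. This is a minimal sketch assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a perfectly even load over 24 hours; real traffic is bursty, so peak rates would be higher. The helper name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def tb_per_day_to_mb_per_s(tb_per_day):
    """Convert a daily volume in TB to an average rate in MB/s (decimal units)."""
    bytes_per_day = tb_per_day * 1e12
    return bytes_per_day / SECONDS_PER_DAY / 1e6


for label, tb_per_day in [("Karlsruhe", 2), ("All CDF on CAF", 20)]:
    rate = tb_per_day_to_mb_per_s(tb_per_day)
    print(f"{label}: {tb_per_day} TB/day is about {rate:.0f} MB/s on average")
```

Under these assumptions, 2 TB/day averages about 23 MB/s and 20 TB/day about 231 MB/s.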
20
Status and deployments of JIM
Job broker, execution and submission site software, job monitor, client software for grid job submission
Deployment plan for DØ Monte Carlo
  Test at 3 sites (Manchester, CCIN2P3, Wisconsin) with basic functionality and measure efficiency of job completion
  Verify use by experimenter for job submission (this week)
  Add merging
  Move to production at these 3 sites (DØ milestone: Mar 1)
  Add remainder of DØ MC sites (Lancaster, SAR, NIKHEF, Prague)
  Improve brokering algorithm
Efficiency at test sites (Manchester, CCIN2P3, Wisconsin)
  JIM eff: >99%
  Site eff x Code eff: ~85%, ~60%
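The efficiency figures separate the JIM layer (>99%) from the combined site and code efficiency (~85% and ~60%). A minimal sketch of how these might combine into an end-to-end job completion estimate follows. It assumes the stages fail independently, so their efficiencies multiply; the slide does not state this, and it does not say which site each combined figure belongs to, so the values are left unlabeled. The function name is illustrative.

```python
def overall_efficiency(jim_eff, site_times_code_eff):
    """End-to-end efficiency, assuming independent stages whose efficiencies multiply."""
    return jim_eff * site_times_code_eff


JIM_EFF = 0.99                  # quoted as ">99%", so treat this as a lower bound
SITE_TIMES_CODE = [0.85, 0.60]  # the two approximate "site eff x code eff" values

for combined in SITE_TIMES_CODE:
    total = overall_efficiency(JIM_EFF, combined)
    print(f"site x code ~{combined:.0%}: overall at least ~{total:.0%}")
```

Under this assumption JIM itself costs less than one percentage point of completed jobs; the dominant losses come from site and experiment-code issues.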
21
JIM Issues
Site operational requirements (e.g. clock synch, disk & node reliability, OS issues)
Experiment operational requirements (e.g. code footprint may exceed site capability and is variable w/ release)
File transfer capabilities & policies: cf. mtg this week w/ GridKA rep
Allocation of services to head node vs worker nodes
Sandboxing mechanisms (last week design mtg)
Merging mechanism, brokering (this week design mtg)
22
Operational Model
Experiments provide shifters for 1st line problem fielding and solving
Project provides on-call list from developers
At DØ, on average ~60–80% of problems are answered by shifters
Classes of problems
  Routine jobs like adding info to database
  Less routine: cleanup after failed stores
  Answering user questions regarding usage
  Updating documentation
  Investigating user reports of problems, and problems visible in project monitoring tools
  Providing solutions for problems
23
Operations Outlook
Improve documentation with aim of improving shifter & user ability to diagnose/solve problems
Expect doubling of central station capacity at DØ
Expect transition to more SAM usage at CDF
Expect Grid operations in production for simulation, first at DØ then at CDF