Grid Testbed Activities in US-CMS
Rick Cavanaugh, University of Florida
NSF/DOE Review, LBNL, Berkeley, 14 January 2003

1. Infrastructure
2. Highlights of Current Activities
3. Future Directions

US-CMS Development Grid Testbed
- Fermilab: 1+5 PIII dual 0.700 GHz processor machines
- Caltech: 1+3 AMD dual 1.6 GHz processor machines
- San Diego: 1+3 PIV single 1.7 GHz processor machines
- Florida: 1+5 PIII dual 1 GHz processor machines
- Wisconsin: 5 PIII single 1 GHz processor machines
- Total: ~41 1 GHz dedicated processors
- Operating System: Red Hat 6 (required for Objectivity)

US-CMS Integration Grid Testbed
- Fermilab: 40 PIII dual 0.750 GHz processor machines
- Caltech: 20 dual 0.800 GHz machines and 20 dual 2.4 GHz machines
- San Diego: 20 dual 0.800 GHz machines and 20 dual 2.4 GHz machines
- Florida: 40 PIII dual 1 GHz processor machines
- CERN (LCG site): 72 dual 2.4 GHz machines
- Total: 240 0.85 GHz processors (Red Hat 6) and 152 2.4 GHz processors (Red Hat 7)

DGT Participation by other CMS Institutes Encouraged!
- Current testbed sites: UCSD, Florida, Caltech, Fermilab, Wisconsin
- Expressions of interest: MIT, Rice, Minnesota, Belgium, Brazil, South Korea

Grid Middleware
- Testbed based on the Virtual Data Toolkit 1.1.3
  - VDT Client: Globus Toolkit 2.0, Condor-G 6.4.3
  - VDT Server: Globus Toolkit 2.0, mkgridmap, Condor 6.4.3, ftsh, GDMP 3.0.7
- Virtual Organisation Management
  - LDAP server deployed at Fermilab, containing the DNs of all US-CMS Grid users
  - GroupMAN (from PPDG, adapted from EDG) used to manage the VO (a grid-mapfile sketch follows below)
  - Investigating/evaluating the use of VOMS from the EDG
  - Uses DOE Science Grid certificates; accepts EDG and Globus certificates

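For illustration, the grid-mapfile that tools such as mkgridmap generate from the VO LDAP server simply maps certificate DNs to local accounts. A minimal sketch (the DNs and account names below are invented placeholders, not actual US-CMS users):

    "/O=doesciencegrid.org/OU=People/CN=Jane Physicist 12345"  uscms01
    "/O=Grid/O=CERN/OU=cern.ch/CN=Example User"                 uscms02
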
Non-VDT Software Distribution
- DAR (can be installed "on the fly"): CMKIN, CMSIM, ORCA/COBRA
  - Represents a crucial step forward in CMS distributed computing!
- Working to deploy US-CMS Pacman caches (usage sketch below) for:
  - CMS software (DAR, etc.)
  - All other non-VDT software required for the Testbed: GAE/CAIGEE (Clarens, etc.), GroupMAN, etc.

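As a sketch of what installing CMS software from such a cache could look like with Pacman (the cache and package names below are placeholders, not the actual US-CMS cache):

    # fetch a hypothetical DAR-packaged CMS release from a hypothetical US-CMS cache
    pacman -get USCMS:ORCA-DAR
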
Monitoring and Information Services
- MonaLisa (Caltech)
  - Currently deployed on the Testbed
  - Dynamic information/resource discovery mechanism using agents
  - Implemented in Java / Jini, with interfaces to SNMP, MDS, Ganglia, and Hawkeye, and in WSDL / SOAP with UDDI
  - Aim to incorporate it into a "Grid Control Room" service for the Testbed

Other Monitoring and Information Services
- Information Service and Configuration Monitoring: MDS (Globus)
  - Currently deployed on the Testbed in a hierarchical fashion (query sketch below)
  - Aim to deploy the GLUE schema when released by iVDGL/DataTAG
  - Developing APIs to and from MonaLisa
- Health Monitoring: Hawkeye (Condor)
  - Leverages the ClassAd system for collecting dynamic information on large pools
  - Will soon incorporate heart-beat monitoring of Grid services
  - Currently deployed at Wisconsin and Florida

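For reference, the hierarchically deployed MDS servers can be queried with plain LDAP tools. A minimal sketch, assuming an MDS 2.x information server listening on the default port 2135 at a hypothetical host:

    # anonymous LDAP query of a Globus MDS 2.x GRIS/GIIS (hostname is a placeholder)
    ldapsearch -x -h giis.uscms.example.org -p 2135 \
        -b "mds-vo-name=local, o=grid" "(objectclass=*)"
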
Existing US-CMS Grid Testbed Client-Server Scheme
(Shown in the original slides as a diagram built up step by step; its components are:)
- User: submits work through the VDT Client
- VDT Client: MOP (mop_submitter), the Virtual Data System (Virtual Data Catalogue, Abstract Planner, Concrete Planner), and the Executor (DAGMan, Condor-G / Globus) (a submit-file sketch follows below)
- VDT Server: Compute Resource (Globus GRAM / Condor pool), Storage Resource (Local Grid Storage), Reliable Transfer (ftsh-wrapped GridFTP), Replica Management (GDMP, Replica Catalogue)
- Monitoring: Performance (MonaLisa), Health (Hawkeye), Information & Configuration (MDS)

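To make the executor layer concrete, here is a minimal sketch of the kind of DAGMan/Condor-G input involved. MOP's mop_submitter generates workflows of this general shape from the production scripts, but the gatekeeper host, jobmanager, and file names below are invented for illustration:

    # simulate.sub -- Condor-G submit description for one production step (globus universe)
    universe        = globus
    globusscheduler = gatekeeper.uscms.example.org/jobmanager-condor
    executable      = cmsim_wrapper.sh
    arguments       = run001
    output          = run001.out
    error           = run001.err
    log             = run001.log
    queue

    # production.dag -- DAGMan workflow: run the simulation, then stage out its output
    Job    Sim   simulate.sub
    Job    Xfer  stageout.sub
    PARENT Sim   CHILD Xfer
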
Existing US-CMS Grid Testbed Client-Server Scheme (data analysis view)
(Diagram components:)
- User's client: Data Analysis (ROOT / Clarens), Data Movement (Clarens)
- Server: Storage Resource (ROOT files), Relational Database
- Monitoring: Performance (MonaLisa)

Commissioning the Development Grid Testbed with "Real Production"
- MOP (from PPDG) interfaces the following into a complete prototype:
  - IMPALA/MCRunJob CMS production scripts
  - Condor-G/DAGMan
  - GridFTP
  - (mop_submitter is generic)
- Using MOP to "commission" the Testbed
  - Require large-scale, production-quality results!
    - Run until the Testbed "breaks"
    - Fix the Testbed with middleware patches
    - Repeat the procedure until the entire production run finishes!
  - Discovered/fixed many fundamental grid software problems in Globus and Condor-G (in close cooperation with Condor/Wisconsin)
    - A huge success from this point of view alone
(Diagram: MCRunJob and mop_submitter on the VDT Client (Master, LinkerScriptGen, Config, Req., Self Desc.) dispatch jobs via DAGMan/Condor-G and Globus to VDT Servers 1..N, each running Condor and GridFTP.)

Integration Grid Testbed Success Story
- Production run status for the IGT MOP production
  - Assigned 1.5 million events for "eGamma Bigjets"
    - ~500 sec per event on a 750 MHz processor; all production stages from simulation to ntuple
  - 2 months of continuous running across 5 testbed sites
- Demonstrated at Supercomputing 2002
- 1.5 million events produced! (nearly 30 CPU-years)

Interoperability work with EDG/DataTAG
MOP worker-site configuration file for Padova (WorldGrid):
  (1-1) Stage-in/out jobmanager: grid015.pd.infn.it/jobmanager-fork (SE) or grid011.pd.infn.it/jobmanager-lsf-datatag (CE)
  (1-2) GLOBUS_LOCATION = /opt/globus
  (1-3) Shared directory for MOP files: /shared/cms/MOP (on the SE and NFS-exported to the CE)
  (2-1) Run jobmanager: grid011.pd.infn.it/jobmanager-lsf-datatag
  (2-2) Location of CMS DAR installation: /shared/cms/MOP/DAR
  (3-1) GDMP install directory: /opt/edg
  (3-2) GDMP flat-file directory: /shared/cms
  (3-3) GDMP Objectivity file directory: (not needed for CMSIM production)
  (4-1) GDMP jobmanager: grid015.pd.infn.it/jobmanager-fork
Results:
- MOP jobs successfully sent from a US VDT WorldGrid site to the Padova EDG site
- EU CMS production jobs successfully sent from an EDG site to a US VDT WorldGrid site
- ATLAS Grappa jobs successfully sent from the US to an EU Resource Broker and run on a US-CMS VDT WorldGrid site

Chimera: The GriPhyN Virtual Data System
(Diagram: VDL (logical) is stored in the VDC; the Abstract Planner produces a DAX, which the Concrete Planner, consulting the RC, turns into a physical DAG for DAGMan.)
- Chimera currently provides the following prototypes:
  - Virtual Data Language (VDL): describes virtual data products (see the sketch below)
  - Virtual Data Catalogue (VDC): used to store VDL
  - Abstract Job Flow Planner: creates a logical DAG (in XML) called a DAX
  - Concrete Job Flow Planner: interfaces with a Replica Catalogue and provides a physical DAG submission file to Condor-G/DAGMan
- Generic and flexible: multiple ways to use Chimera
  - As a toolkit and/or a framework
  - In a Grid environment or just locally

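As a schematic illustration of the VDL idea (the transformation name, arguments, and file names below are invented, and the syntax only loosely follows the Chimera prototype of the time):

    TR simulate( input cards, output fzfile ) {
        argument = "-i " ${input:cards};
        argument = "-o " ${output:fzfile};
    }

    DV run001->simulate( cards=@{input:run001.cards}, fzfile=@{output:run001.fz} );

The TR records the recipe for producing a data product, while the DV records a specific invocation; this is what lets derived data be regenerated on demand and its provenance traced.
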
Direction of US-CMS Chimera Work
- Monte Carlo production integration
  - RefDB/MCRunJob
  - Already able to perform all production steps
  - "Chimera Regional Centre": for quality assurance and scalability testing, to be used with low-priority actual production assignments
- User analysis integration
  - GAE/CAIGEE work (Web Services, Clarens)
  - Other generic data analysis packages
- Two equal motivations:
  - Test a generic product which CMS (and ATLAS, etc.) will find useful!
  - Experiment with Virtual Data and Data Provenance: CMS is an excellent use-case!
- Encouraging and inviting more CMS input
  - Ensure that the Chimera effort fits within CMS efforts and solves real (current and future) CMS needs!
(Diagram: the CMS production and analysis chain: Generator -> Simulator -> Formatter -> Reconstructor -> ESD/AOD -> Analysis, with parameters, executables, and data flowing between the production and analysis systems.)

Building a Grid-enabled Physics Analysis Desktop
Many promising alternatives: currently in the process of prototyping and choosing.
(Diagram, taken from Koen Holtman and Conrad Steenberg; see Julian Bunn's talk: a physics query flows from a web browser or local analysis tool (PAW/ROOT/...) through Clarens-based query and data-extraction web services to ORCA analysis farms (or a distributed "farm" using grid queues), PIAF/Proof-type analysis farms, RDBMS-based data warehouses, and the production system and data repositories, with TAG/AOD data extracted, converted, and transported back to local disk.)
- Data Processing Tools: interactive visualisation and data analysis (ROOT, etc.)
- Data Catalog Browser: allows a physicist to find collections of data at the object level (a client sketch follows below)
- Data Mover: an embedded window allowing a physicist to customise data movement
- Network Performance Monitor: allows a physicist to optimise data movement by dynamically monitoring network conditions
- Computation resource browser, selector, and monitor: allows a physicist to view available resources (primarily for development stages of the Grid)
- Storage resource browser: enables a physicist to ensure that enough disk space is available
- Log browser: enables a physicist to get direct feedback from jobs indicating success/failure, etc.

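Because these desktop components sit on top of Clarens-style web services, a thin client can be very small. A purely illustrative Python sketch (the server URL and method names are hypothetical, not the actual Clarens API):

    # purely illustrative: a thin client calling a Clarens-style XML-RPC catalog service;
    # the URL and method names are hypothetical, not the real Clarens interface
    import xmlrpc.client

    server = xmlrpc.client.ServerProxy("https://clarens.uscms.example.org:8443/")
    collections = server.catalog.find("dataset=eGamma_bigjets")  # hypothetical method
    for c in collections:
        print(c)
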
How CAIGEE plans to use the Testbed
(Diagram: web clients talk to a Grid-services web server, which fronts an Execution Priority Manager, a Grid-wide Execution Service, GDMP, the Abstract and Concrete Planners, the Virtual Data and Materialised Data Catalogues, and Grid process monitoring.)
- Based on a client-server scheme
  - One or more inter-communicating servers
  - A small set of clients logically associated with each server
- Scalable tiered architecture: servers can delegate execution to another server (same or higher level) on the Grid
- Servers offer "web-based services", with the ability to dynamically add or improve them

High Speed Data Transport
- R&D work from Caltech, SLAC, and DataTAG on data transport is approaching ~1 Gbit/sec per GbE port over long-distance networks
- Expect to deploy on the US-CMS Testbed (including disk to disk) in 4-6 months; a transfer sketch follows below
- Anticipate progressing from 10 to 100 MByte/sec and eventually 1 GByte/sec over long-distance networks (RTT = 60 msec across the US)

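Once deployed, disk-to-disk transfers on the Testbed would go through GridFTP; a minimal sketch of a tuned third-party transfer (the hosts, paths, and tuning values are placeholders):

    # 8 parallel streams and 2 MB TCP buffers between two hypothetical GridFTP servers
    globus-url-copy -p 8 -tcp-bs 2097152 \
        gsiftp://se1.uscms.example.org/data/run001.fz \
        gsiftp://se2.uscms.example.org/data/run001.fz
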
Future R&D Directions
- Workflow generator/planning (DISPRO)
- Grid-wide scheduling
- Strengthen the monitoring infrastructure
- VO policy definition and enforcement
- Data analysis framework (CAIGEE)
- Data derivation and data provenance (Chimera)
- Peer-to-peer collaborative environments
- High speed data transport
- Operations (what does it mean to operate a Grid?)
- Interoperability tests between E.U. and U.S. solutions

Conclusions
- US-CMS Grid activities are reaching a healthy "critical mass" in several areas:
  - Testbed infrastructure (VDT, VO, monitoring, etc.)
  - MOP has been (and continues to be) enormously successful
  - US/EU interoperability is beginning to be tested
  - Virtual Data is beginning to be seriously implemented/explored
  - Data analysis efforts are rapidly progressing and being prototyped
- Interaction with computer scientists has been excellent!
- Much of the work is being done in preparation for the LCG milestone of a 24x7 production Grid
- We have a lot of work to do, but we feel we are making excellent progress and we are learning a lot!

Question: Data Flow and Provenance
(Diagram: real and simulated data flow from Raw through ESD, AOD, and TAG to plots, tables, and fits, which are then compared.)
- Provenance of a data analysis
- "Check-point" a data analysis
- Audit a data analysis