Grid Testbed Activities in US-CMS
Rick Cavanaugh, University of Florida

1. Infrastructure
2. Highlights of Current Activities
3. Future Directions

NSF/DOE Review, LBNL, Berkeley, 14 January 2003

US-CMS Development Grid Testbed
- Fermilab: 1+5 PIII dual GHz processor machines
- Caltech: 1+3 AMD dual 1.6 GHz processor machines
- San Diego: 1+3 PIV single 1.7 GHz processor machines
- Florida: 1+5 PIII dual 1 GHz processor machines
- Wisconsin: 5 PIII single 1 GHz processor machines
- Total: ~41 1 GHz dedicated processors
- Operating system: Red Hat 6 (required for Objectivity)

US-CMS Integration Grid Testbed
- Fermilab: 40 PIII dual GHz processor machines
- Caltech: 20 dual GHz processor machines; 20 dual 2.4 GHz processor machines
- San Diego: 20 dual GHz processor machines; 20 dual 2.4 GHz processor machines
- Florida: 40 PIII dual 1 GHz processor machines
- CERN (LCG site): 72 dual 2.4 GHz processor machines
- Total: … GHz processors running Red Hat 6; … 2.4 GHz processors running Red Hat 7

DGT Participation by other CMS Institutes Encouraged!
- Current sites: UCSD, Florida, Caltech, Fermilab, Wisconsin
- Expressions of interest: MIT, Rice, Minnesota, Belgium, Brazil, South Korea

Grid Middleware
- Testbed based on the Virtual Data Toolkit (VDT)
  - VDT Client: Globus Toolkit 2.0, Condor-G
  - VDT Server: Globus Toolkit 2.0, mkgridmap, Condor, ftsh, GDMP
- Virtual Organisation Management (see the grid-mapfile sketch after this list)
  - LDAP server deployed at Fermilab; contains the DNs for all US-CMS Grid users
  - GroupMAN (from PPDG, adapted from EDG) used to manage the VO
  - Investigating/evaluating the use of VOMS from the EDG
  - Use DOE Science Grid certificates; accept EDG and Globus certificates
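To make the VO-management piece concrete: tools like mkgridmap/GroupMAN ultimately maintain a Globus grid-mapfile that maps certificate DNs (pulled from the VO's LDAP server) to local accounts. The sketch below is a hedged illustration of that mapping only; the DNs, the shared account name, and the output path are invented, and this is not the actual GroupMAN or mkgridmap code.

```python
# Hedged sketch: build a Globus-style grid-mapfile ('"<DN>" <account>' per line)
# from a list of VO member DNs. The DNs, account name, and output path are
# invented; the real testbed used mkgridmap/GroupMAN against the Fermilab LDAP.

vo_members = [
    "/DC=org/DC=doegrids/OU=People/CN=Example Physicist 1",   # hypothetical DN
    "/DC=org/DC=doegrids/OU=People/CN=Example Physicist 2",   # hypothetical DN
]

LOCAL_ACCOUNT = "uscms01"        # shared mapped account (assumption)
GRIDMAP_PATH = "grid-mapfile"    # real deployments write /etc/grid-security/grid-mapfile

def write_gridmap(members, account, path):
    """Write one '"<DN>" <account>' line per VO member."""
    with open(path, "w") as f:
        for dn in members:
            f.write('"%s" %s\n' % (dn, account))

if __name__ == "__main__":
    write_gridmap(vo_members, LOCAL_ACCOUNT, GRIDMAP_PATH)
```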

Non-VDT Software Distribution
- DAR (can be installed "on the fly"): CMKIN, CMSIM, ORCA/COBRA
  - Represents a crucial step forward in CMS distributed computing!
- Working to deploy US-CMS Pacman caches for:
  - CMS software (DAR, etc.)
  - all other non-VDT software required for the Testbed: GAE/CAIGEE (Clarens, etc.), GroupMAN, etc.

Monitoring and Information Services
- MonaLisa (Caltech)
  - Currently deployed on the Testbed
  - Dynamic information/resource discovery mechanism using agents
  - Implemented in Java/Jini with interfaces to SNMP, MDS, Ganglia, and Hawkeye, plus WSDL/SOAP with UDDI
  - Aim to incorporate it into a "Grid Control Room" service for the Testbed

Other Monitoring and Information Services
- Information Service and Configuration Monitoring: MDS (Globus)
  - Currently deployed on the Testbed in a hierarchical fashion
  - Aim to deploy the GLUE Schema when released by iVDGL/DataTAG
  - Developing APIs to and from MonaLisa
- Health Monitoring: Hawkeye (Condor)
  - Leverages the ClassAd system of collecting dynamic information on large pools (see the sketch after this list)
  - Will soon incorporate heart-beat monitoring of Grid services
  - Currently deployed at Wisconsin and Florida
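To make the ClassAd idea concrete, here is a minimal Python sketch of the pattern Hawkeye builds on: each machine advertises its state as attribute/value pairs and a collector matches job requirements against them. The attribute names and the matching rule are simplified assumptions, not Condor's actual ClassAd expression language.

```python
# Minimal sketch of ClassAd-style matchmaking (simplified; not Condor's real
# ClassAd language). Machine names and attributes below are illustrative.

machine_ads = [
    {"Name": "node1.example.edu", "Arch": "INTEL", "Memory": 512, "LoadAvg": 0.3},
    {"Name": "node2.example.edu", "Arch": "INTEL", "Memory": 256, "LoadAvg": 1.7},
]

job_requirements = {"Arch": "INTEL", "MinMemory": 512, "MaxLoadAvg": 1.0}

def matches(ad, req):
    """True if a machine advertisement satisfies the (simplified) job requirements."""
    return (ad["Arch"] == req["Arch"]
            and ad["Memory"] >= req["MinMemory"]
            and ad["LoadAvg"] <= req["MaxLoadAvg"])

candidates = [ad["Name"] for ad in machine_ads if matches(ad, job_requirements)]
print("Matching machines:", candidates)
```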

Existing US-CMS Grid Testbed Client-Server Scheme
A user works through a VDT Client, which submits work to VDT Servers at the sites, with monitoring covering both sides. The boxes in the diagram map onto concrete tools as follows:
- Executor: MOP (mop_submitter) driving DAGMan and Condor-G / Globus
- Virtual Data System: Virtual Data Catalogue, Abstract Planner, Concrete Planner
- Replica Management: Replica Catalogue
- Reliable Transfer: ftsh-wrapped GridFTP and GDMP
- Compute Resource: Globus GRAM / Condor pool
- Storage Resource: Local Grid Storage
- Monitoring: Performance (MonaLisa), Information & Configuration (MDS), Health (Hawkeye)
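The "Reliable Transfer" box above is essentially fault-tolerant retry logic wrapped around GridFTP, which is what ftsh provides declaratively. As a hedged illustration only (this is not ftsh), the Python sketch below retries a globus-url-copy invocation with a fixed backoff; the source/destination URLs and the retry policy are invented.

```python
import subprocess
import time

# Hedged sketch of what "ftsh-wrapped GridFTP" buys you: retry a transfer
# command until it succeeds or a retry budget is exhausted. This is NOT ftsh;
# the URLs and retry policy below are illustrative only.

SRC = "gsiftp://source.example.edu/data/file.root"   # hypothetical
DST = "gsiftp://dest.example.edu/data/file.root"     # hypothetical

def reliable_copy(src, dst, attempts=5, backoff=30):
    for i in range(1, attempts + 1):
        rc = subprocess.call(["globus-url-copy", src, dst])
        if rc == 0:
            return True
        print("transfer attempt %d failed (rc=%d); retrying in %ds" % (i, rc, backoff))
        time.sleep(backoff)
    return False

if __name__ == "__main__":
    if not reliable_copy(SRC, DST):
        raise SystemExit("transfer failed after all retries")
```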

Existing US-CMS Grid Testbed Client-Server Scheme (analysis view)
- User works through a client against an analysis server, with monitoring alongside
- Data Analysis: ROOT / Clarens
- Data Movement: Clarens (ROOT files)
- Storage Resource: relational database
- Performance monitoring: MonaLisa
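Clarens exposes analysis and data-movement operations as web services over HTTP (XML-RPC/SOAP). The sketch below shows the client side of that idea using Python's standard XML-RPC client; the server URL and method names are hypothetical, not the real Clarens API.

```python
import xmlrpc.client

# Hedged sketch of Clarens-style client/server access: the client calls remote
# methods over HTTP. The URL, method names, and file paths are hypothetical.

server = xmlrpc.client.ServerProxy("http://analysis.example.edu:8080/clarens")

# List available ROOT files and fetch one (illustrative method names).
files = server.file.list("/store/egamma_bigjets")
print("remote files:", files)

blob = server.file.read("/store/egamma_bigjets/ntuple_001.root")  # assumed to return xmlrpc Binary
with open("ntuple_001.root", "wb") as f:
    f.write(blob.data)
```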

Commissioning the Development Grid Testbed with "Real Production"
- MOP (from PPDG) interfaces the following into a complete prototype (see the DAG sketch after this list):
  - IMPALA/MCRunJob CMS production scripts
  - Condor-G/DAGMan
  - GridFTP
  - (mop_submitter is generic)
- Using MOP to "commission" the Testbed
  - Require large-scale, production-quality results!
    - Run until the Testbed "breaks"
    - Fix the Testbed with middleware patches
    - Repeat the procedure until the entire production run finishes!
  - Discovered/fixed many fundamental grid software problems in Globus and Condor-G (in close cooperation with Condor/Wisconsin); a huge success from this point of view alone
[Diagram: MCRunJob (Linker, ScriptGen, Config, Req., Self Desc., Master) feeds mop_submitter on the VDT Client; DAGMan/Condor-G then submit via Globus to Condor and GridFTP services on VDT Servers 1..N]
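For flavour, the sketch below writes a tiny Condor DAGMan workflow of the general kind mop_submitter generates: one simulation job followed by one stage-out job per production job. The JOB/PARENT-CHILD lines are standard DAGMan syntax, but the job names, submit-file names, and two-stage structure are illustrative assumptions, not MOP's actual output.

```python
# Hedged sketch: emit a Condor DAGMan ".dag" file that runs a simulation job
# and then a file-transfer job for each of N production jobs. Job/submit-file
# names and the two-stage structure are illustrative, not MOP's real workflow.

N_JOBS = 3

def make_dag(n_jobs):
    lines = []
    for i in range(n_jobs):
        sim = "sim_%03d" % i
        xfer = "xfer_%03d" % i
        lines.append("JOB %s %s.sub" % (sim, sim))    # run the simulation stage
        lines.append("JOB %s %s.sub" % (xfer, xfer))  # stage output back (e.g. via GridFTP)
        lines.append("PARENT %s CHILD %s" % (sim, xfer))
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    with open("production.dag", "w") as f:
        f.write(make_dag(N_JOBS))
    # submit with: condor_submit_dag production.dag
```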

Integration Grid Testbed Success Story
- Production run status for the IGT MOP production
  - Assigned 1.5 million events for "eGamma Bigjets"
    - ~500 sec per event on a 750 MHz processor; all production stages from simulation to ntuple
  - 2 months of continuous running across 5 testbed sites
- Demonstrated at Supercomputing 2002
- 1 Million Events Produced! (nearly 30 CPU years)
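A back-of-envelope cross-check of the quoted scale, using only the per-event figure from the slide; the conversion below is derived here, not taken from the presentation, and assumes the full 1.5-million-event assignment at ~500 s/event on the 750 MHz reference processor.

```python
# Back-of-envelope check of the quoted production scale. Real per-event times
# varied by stage and by site, so this is only an order-of-magnitude estimate.

events = 1.5e6
sec_per_event = 500.0
seconds_per_year = 365.25 * 24 * 3600

cpu_years = events * sec_per_event / seconds_per_year
print("~%.0f CPU-years on the reference processor" % cpu_years)  # ~24, same order as "nearly 30"
```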

Interoperability work with EDG/DataTAG
MOP Worker Site Configuration File for Padova (WorldGrid):
  (1-1) Stage-in/out jobmanager: grid015.pd.infn.it/jobmanager-fork (SE) or grid011.pd.infn.it/jobmanager-lsf-datatag (CE)
  (1-2) GLOBUS_LOCATION=/opt/globus
  (1-3) Shared directory for MOP files: /shared/cms/MOP (on the SE and NFS-exported to the CE)
  (2-1) Run jobmanager: grid011.pd.infn.it/jobmanager-lsf-datatag
  (2-2) Location of CMS DAR installation: /shared/cms/MOP/DAR
  (3-1) GDMP install directory = /opt/edg
  (3-2) GDMP flat file directory = /shared/cms
  (3-3) GDMP Objectivity file directory (not needed for CMSIM production)
  (4-1) GDMP job manager: grid015.pd.infn.it/jobmanager-fork
Results:
- MOP jobs successfully sent from a U.S. VDT WorldGrid site to the Padova EDG site
- EU CMS production jobs successfully sent from an EDG site to a U.S. VDT WorldGrid site
- ATLAS Grappa jobs successfully sent from the US to an EU Resource Broker and run on a US-CMS VDT WorldGrid site

Chimera: The GriPhyN Virtual Data System
[Diagram: VDL -> Virtual Data Catalogue (VDC) -> Abstract Planner -> DAX (logical, XML) -> Concrete Planner (consults the Replica Catalogue) -> physical DAG -> DAGMan]
- Chimera currently provides the following prototypes:
  - Virtual Data Language (VDL): describes virtual data products
  - Virtual Data Catalogue (VDC): used to store VDL
  - Abstract Job Flow Planner: creates a logical DAG (in XML) called a DAX
  - Concrete Job Flow Planner: interfaces with a Replica Catalogue and provides a physical DAG submission file to Condor-G/DAGMan (a toy version of this step is sketched below)
- Generic and flexible: multiple ways to use Chimera
  - as a toolkit and/or a framework
  - in a Grid environment or just locally
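A toy rendition of the abstract-to-concrete planning step, written in plain Python rather than VDL/DAX; the transformation names, logical file names, replica-catalogue contents, and site URLs are all invented for illustration.

```python
# Toy sketch of abstract vs. concrete planning, in the spirit of Chimera.
# Not VDL/DAX syntax: names, replica locations, and sites are invented.

# Abstract plan: transformations expressed over *logical* file names only.
abstract_dag = [
    {"transform": "cmkin", "inputs": [],           "outputs": ["gen.ntpl"]},
    {"transform": "cmsim", "inputs": ["gen.ntpl"], "outputs": ["hits.fz"]},
    {"transform": "orca",  "inputs": ["hits.fz"],  "outputs": ["digis.db"]},
]

# Replica catalogue: logical file name -> physical locations (invented).
replica_catalogue = {
    "gen.ntpl": ["gsiftp://ufl.example.edu/store/gen.ntpl"],
}

def concretise(dag, rc, site="gsiftp://fnal.example.edu/store"):
    """Resolve logical files to physical URLs; flag inputs an upstream step must produce."""
    plan = []
    for node in dag:
        ins = [rc[f][0] if f in rc else None for f in node["inputs"]]
        outs = ["%s/%s" % (site, f) for f in node["outputs"]]
        plan.append({"transform": node["transform"],
                     "inputs": ins, "outputs": outs,
                     "needs_upstream": any(i is None for i in ins)})
    return plan

for step in concretise(abstract_dag, replica_catalogue):
    print(step)
```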

Direction of US-CMS Chimera Work
- Monte Carlo production integration
  - RefDB/MCRunJob
  - Already able to perform all production steps
  - "Chimera Regional Centre"
    - for quality assurance and scalability testing
    - to be used with low-priority actual production assignments
- User analysis integration
  - GAE/CAIGEE work (Web Services, Clarens)
  - other generic data analysis packages
- Two equal motivations:
  - test a generic product which CMS (and ATLAS, etc.) will find useful!
  - experiment with Virtual Data and Data Provenance: CMS is an excellent use-case!
- Encouraging and inviting more CMS input
  - Ensure that the Chimera effort fits within CMS efforts and solves real (current and future) CMS needs!
[Diagram: production/analysis chain Generator -> Simulator -> Formator -> Reconstructor -> ESD -> AOD -> Analysis, with parameters, executables, and data tracked at each step]

Building a Grid-enabled Physics Analysis Desktop
Many promising alternatives exist; we are currently in the process of prototyping and choosing (see Julian Bunn's talk).
[Diagram (from Koen Holtman and Conrad Steenberg): physics query and data flow from the user through Clarens-based query, data-extraction, and TAG/AOD extraction/conversion/transport web services, connecting local analysis tools (PAW/ROOT/...), ORCA analysis farms (or a distributed "farm" using grid queues), PIAF/Proof-type analysis farms, RDBMS-based data warehouses, the production system and data repositories, a web browser, and local disk]
- Data Processing Tools: interactive visualisation and data analysis (ROOT, etc.)
- Data Catalog Browser: allows a physicist to find collections of data at the object level
- Data Mover: embedded window allowing a physicist to customise data movement
- Network Performance Monitor: allows a physicist to optimise data movement by dynamically monitoring network conditions
- Computation Resource Browser, Selector and Monitor: allows a physicist to view available resources (primarily for the development stages of the Grid)
- Storage Resource Browser: enables a physicist to ensure that enough disk space is available
- Log Browser: enables a physicist to get direct feedback from jobs indicating success/failure, etc.

How CAIGEE Plans to Use the Testbed
[Diagram: web clients talk to a Grid Services Web Server, which fronts an Execution Priority Manager, a Grid-Wide Execution Service, GDMP, the Abstract and Concrete Planners, the Virtual Data and Materialised Data Catalogues, and monitoring of Grid processes]
- Based on a client-server scheme
  - one or more inter-communicating servers
  - a small set of clients logically associated with each server
- Scalable tiered architecture
  - servers can delegate execution to another server (same or higher level) on the Grid, as sketched below
- Servers offer "web-based services"
  - ability to dynamically add or improve them
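A minimal sketch of the tiered-delegation idea in plain Python; the server names and the capability rule are invented, and no real CAIGEE or Clarens interfaces are used.

```python
# Minimal sketch of tiered delegation: a server handles a request itself if it
# can, otherwise forwards it to its parent server. Names and the capability
# rule are invented for illustration.

class AnalysisServer:
    def __init__(self, name, services, parent=None):
        self.name = name
        self.services = set(services)   # e.g. {"catalog", "plot"}
        self.parent = parent

    def handle(self, request):
        if request in self.services:
            return "%s handled '%s' locally" % (self.name, request)
        if self.parent is not None:
            return self.parent.handle(request)   # delegate up the tier
        return "no server could handle '%s'" % request

tier1 = AnalysisServer("tier1.example.edu", {"catalog", "plot", "skim"})
tier2 = AnalysisServer("tier2.example.edu", {"plot"}, parent=tier1)

print(tier2.handle("plot"))   # served locally
print(tier2.handle("skim"))   # delegated to tier1
```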

High Speed Data Transport
- R&D work from Caltech, SLAC and DataTAG on data transport is approaching ~1 Gbit/sec per GbE port over long-distance networks
- Expect to deploy (including disk-to-disk) on the US-CMS Testbed in 4-6 months
- Anticipate progressing from 10 to 100 MByte/sec and eventually 1 GByte/sec over long-distance networks (RTT = 60 msec across the US); see the calculation below
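One reason long-haul, high-rate transport needs R&D is the bandwidth-delay product: the data in flight must cover bandwidth times RTT. A quick calculation with the figures above; the required window sizes are derived here, not quoted from the slide.

```python
# Bandwidth-delay product for the quoted targets over a 60 ms RTT path.
# The required-window figures are derived, not taken from the presentation.

rtt = 0.060  # seconds, across the US (from the slide)

targets_bytes_per_sec = {
    "1 Gbit/s":    1e9 / 8,
    "100 MByte/s": 100e6,
    "1 GByte/s":   1e9,
}

for label, rate in targets_bytes_per_sec.items():
    window_mb = rate * rtt / 1e6
    print("%-12s needs ~%.1f MB in flight" % (label, window_mb))
# e.g. 1 Gbit/s over 60 ms -> ~7.5 MB, far beyond the default TCP windows of the time
```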

Future R&D Directions
- Workflow generator/planning (DISPRO)
- Grid-wide scheduling
- Strengthen monitoring infrastructure
- VO policy definition and enforcement
- Data analysis framework (CAIGEE)
- Data derivation and data provenance (Chimera)
- Peer-to-peer collaborative environments
- High speed data transport
- Operations (what does it mean to operate a Grid?)
- Interoperability tests between E.U. and U.S. solutions

Conclusions
- US-CMS Grid activities are reaching a healthy "critical mass" in several areas:
  - Testbed infrastructure (VDT, VO, monitoring, etc.)
  - MOP has been (and continues to be) enormously successful
  - US/EU interoperability is beginning to be tested
  - Virtual Data is beginning to be seriously implemented/explored
  - Data analysis efforts are rapidly progressing and being prototyped
- Interaction with computer scientists has been excellent!
- Much of the work is being done in preparation for the LCG milestone of a 24x7 production Grid
- We have a lot of work to do, but we feel we are making excellent progress and we are learning a lot!

Question: Data Flow and Provenance
[Diagram: real data and simulated data each flowing Raw -> ESD -> AOD -> TAG -> plots, tables, fits, with comparisons between the two sets of results]
- Provenance of a data analysis
- "Check-point" a data analysis
- Audit a data analysis
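To make the provenance question concrete, here is a toy sketch of a recorded derivation chain; only the stage names follow the diagram above, while the record format, dataset names, transformation names, and parameters are invented.

```python
# Toy provenance record for one analysis chain: each derived dataset remembers
# what produced it. Dataset names, transformations, and parameters are invented;
# only the stage sequence (Raw -> ESD -> AOD -> TAG -> plots) follows the slide.

provenance = {}

def derive(output, inputs, transformation, params):
    """Register how 'output' was produced; enough to re-run or audit the step."""
    provenance[output] = {"inputs": inputs,
                          "transformation": transformation,
                          "params": params}
    return output

raw  = "run2003A.raw"                                   # hypothetical dataset names
esd  = derive("run2003A.esd", [raw], "reconstruct", {"version": "ORCA_6"})
aod  = derive("run2003A.aod", [esd], "summarise",   {"selection": "egamma"})
tag  = derive("run2003A.tag", [aod], "tag",         {})
plot = derive("mass_plot.root", [tag], "analyse",   {"cut": "pt>20"})

def audit(dataset):
    """Walk the chain backwards -- the 'audit a data analysis' question."""
    while dataset in provenance:
        rec = provenance[dataset]
        print(dataset, "<-", rec["transformation"], rec["inputs"], rec["params"])
        dataset = rec["inputs"][0]

audit(plot)
```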