
All Hands Meeting: D. Colling, Imperial College London, for the GridPP Project

Slide 1: Developing an operational Grid
Hans Hoffmann has described the scale of the problems that we are facing; I will try to describe how we are trying to solve them.

Slide 2: Developing an operational Grid
GridPP is involved in two projects, the EU DataGrid Project and the SAMGrid Project. These projects are built on top of common products such as the Globus Toolkit and Condor-G. I will describe some aspects of the middleware developed in these projects and how we are deploying it.

Slide 3: Developing an operational Grid
In 15 minutes this can only be a very brief overview. For more information, see the many posters we have here, talk to the people on the booth, or look at the GridPP website.

Slide 4: The DataGrid Project
[Diagram: the DataGrid Project components, grouped under Applications, Middleware and Physical Fabric: HEP Apps, EO Apps, Bio Apps, Workload Management, Data Management, Information and Monitoring Services, Globus Middleware, Computing Fabric, Storage Element, Mass Storage Management, Network Services, Fabric Management and Networking Fabric. A legend marks the boxes with major UK involvement.]
I will only talk about a few of these boxes.

Slide 5: The DataGrid Project
Resource Management: The user describes their jobs using a set of Condor ClassAds, and each job is then submitted to a resource broker from any User Interface (UI) machine. The Resource Broker (RB) is at the centre of resource management: it matches the requirements of the job to the available resources, using the Condor ClassAd libraries. Information about available resources is cached by the Information Index (II), which the RB queries. The II in turn acquires its information by interrogating the individual GRISes (Grid Resource Information Services) and the national GIISes (Grid Index Information Services). Information about the location of data is stored in a replica catalogue.
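A minimal sketch of the information flow just described, assuming each GRIS can be modelled as a function returning a dictionary of resource attributes and that the Information Index simply caches the merged results for a fixed time-to-live. The class names, attribute values, host names and TTL are hypothetical; this is not the EDG or Globus MDS API.

```python
import time
from typing import Callable, Dict, List

# Hypothetical model: a GRIS is anything returning {resource_id: {attribute: value}}.
Gris = Callable[[], Dict[str, dict]]


class Giis:
    """Aggregates the entries published by a set of GRISes (e.g. a national index)."""

    def __init__(self, sources: List[Gris]):
        self.sources = sources

    def __call__(self) -> Dict[str, dict]:
        merged: Dict[str, dict] = {}
        for source in self.sources:
            merged.update(source())
        return merged


class InformationIndex:
    """Caches the aggregated view so the broker does not query every GRIS for every job."""

    def __init__(self, giises: List[Giis], ttl_seconds: float = 60.0):
        self.giises = giises
        self.ttl = ttl_seconds
        self._cache: Dict[str, dict] = {}
        self._fetched_at = float("-inf")  # forces a refresh on the first query
        self.refreshes = 0  # how often the underlying services were interrogated

    def query(self) -> Dict[str, dict]:
        now = time.monotonic()
        if now - self._fetched_at > self.ttl:
            self._cache = {}
            for giis in self.giises:
                self._cache.update(giis())
            self._fetched_at = now
            self.refreshes += 1
        return self._cache


# Two hypothetical sites publishing through one national GIIS.
uk_giis = Giis([
    lambda: {"ce.site-a.example": {"TotalCPUs": 40, "LRMSType": "PBS"}},
    lambda: {"ce.site-b.example": {"TotalCPUs": 16, "LRMSType": "LSF"}},
])
ii = InformationIndex([uk_giis], ttl_seconds=300)
resources = ii.query()
resources_again = ii.query()  # within the TTL, so served from the cache
print(sorted(resources), ii.refreshes)
```

The TTL is the design knob here: a longer TTL spares the GRISes but lets the broker work from staler information.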

Slide 6: The DataGrid Project
An example:

    Executable = "WP1testF";
    StdOutput = "sim.out";
    StdError = "sim.err";
    InputSandbox = {"/home/datamat/sim.exe", "/home/datamat/DATA/*"};
    OutputSandbox = {"sim.out", "sim.err", "testD.out"};
    Rank = other.TotalCPUs * other.AverageSI00;
    Requirements = other.LRMSType == "PBS" \
        && (other.OpSys == "Linux RH 6.1" || other.OpSys == "Linux RH 6.2");
    RetryCount = 2;
    Arguments = "file1";
    InputData = "LF:test ";
    ReplicaCatalog = "ldap://sunlab2g.cnaf.infn.it:2010/rc=WP2 INFN Test Replica Catalog,dc=sunlab2g, dc=cnaf, dc=infn, dc=it";
    DataAccessProtocol = "gridftp";
    OutputSE = "grid001.cnaf.infn.it";
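The Requirements and Rank expressions in this example drive the broker's matchmaking. As a rough Python analogue (not the Condor ClassAd library the RB actually uses), each expression can be written as a function of the resource ad, "other": Requirements filters the candidate resources and Rank orders the survivors. The resource ads below are invented for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical resource ads, as the Information Index might return them.
resources: List[Dict] = [
    {"Name": "ce-a", "LRMSType": "PBS", "OpSys": "Linux RH 6.2", "TotalCPUs": 40, "AverageSI00": 300},
    {"Name": "ce-b", "LRMSType": "PBS", "OpSys": "Linux RH 6.1", "TotalCPUs": 64, "AverageSI00": 250},
    {"Name": "ce-c", "LRMSType": "LSF", "OpSys": "Linux RH 6.2", "TotalCPUs": 200, "AverageSI00": 400},
    {"Name": "ce-d", "LRMSType": "PBS", "OpSys": "Solaris 8", "TotalCPUs": 100, "AverageSI00": 500},
]


def requirements(other: Dict) -> bool:
    # other.LRMSType == "PBS" && (other.OpSys == "Linux RH 6.1" || other.OpSys == "Linux RH 6.2")
    return other["LRMSType"] == "PBS" and other["OpSys"] in ("Linux RH 6.1", "Linux RH 6.2")


def rank(other: Dict) -> float:
    # Rank = other.TotalCPUs * other.AverageSI00 (higher is better)
    return other["TotalCPUs"] * other["AverageSI00"]


def match(ads: List[Dict]) -> Optional[Dict]:
    """Best-ranked resource that satisfies the requirements, or None if nothing matches."""
    candidates = [ad for ad in ads if requirements(ad)]
    return max(candidates, key=rank, default=None)


best = match(resources)
print(best["Name"] if best else "no match")  # prints: ce-b
```

Here ce-c and ce-d fail the Requirements, and ce-b (64 × 250 = 16000) outranks ce-a (40 × 300 = 12000). Returning None models the case where the broker cannot match the job to any resource.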

Slide 7: The DataGrid Project
Resource Management: If the RB is able to match the job to a resource, it passes the job to the Job Submission Service (JSS), which then submits the job to the selected resource. The JSS is based on Condor-G. Logging information is kept at each stage. All user interaction is via the UI, from which users can list the resources that match their requirements, submit jobs, examine the status of submitted jobs, access all logging information about their jobs and cancel jobs.
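The flow described on this slide (match, hand over to the JSS, run, with logging at every stage and cancellation from the UI) can be sketched as a small state machine. The state names and transitions are a simplified, hypothetical model, not the actual set of EDG job states or its logging interface.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical, simplified lifecycle: the transitions allowed from each state.
TRANSITIONS = {
    "SUBMITTED": {"MATCHED", "CANCELLED"},   # job has reached the RB from the UI
    "MATCHED": {"SCHEDULED", "CANCELLED"},   # RB found a resource and hands the job to the JSS
    "SCHEDULED": {"RUNNING", "CANCELLED"},   # JSS has submitted the job to that resource
    "RUNNING": {"DONE", "CANCELLED"},
    "DONE": set(),
    "CANCELLED": set(),
}


@dataclass
class Job:
    job_id: str
    state: str = "SUBMITTED"
    log: List[Tuple[str, str]] = field(default_factory=list)  # (component, event) records

    def __post_init__(self) -> None:
        self.log.append(("UI", "SUBMITTED"))

    def move(self, new_state: str, component: str) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.job_id}: cannot go from {self.state} to {new_state}")
        self.state = new_state
        self.log.append((component, new_state))  # logging information kept at each stage

    def cancel(self) -> None:
        self.move("CANCELLED", "UI")


job = Job("job-001")
job.move("MATCHED", "RB")
job.move("SCHEDULED", "JSS")
job.move("RUNNING", "resource")
job.move("DONE", "resource")
for component, event in job.log:
    print(component, event)
```

Because every transition appends to the log, the full history is available to the user afterwards, which is what the UI's "access all logging information" operation relies on.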

Slide 8: The DataGrid Project
Information Services (R-GMA): The design and implementation are based on the Grid Monitoring Architecture (GMA) of the GGF (Global Grid Forum), with the term "directory" replaced by "registry" to avoid implying any particular structure.
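In the GMA model, producers advertise what they publish in the registry, and consumers look producers up there before contacting them directly. Below is a minimal in-memory sketch of that pattern; the Registry class, its method names and the producer URLs are hypothetical and are not the R-GMA API.

```python
from collections import defaultdict
from typing import Dict, List


class Registry:
    """Maps a table name to the producers publishing it; a flat lookup, no implied hierarchy."""

    def __init__(self) -> None:
        self._producers: Dict[str, List[str]] = defaultdict(list)

    def register_producer(self, table: str, producer_url: str) -> None:
        if producer_url not in self._producers[table]:
            self._producers[table].append(producer_url)

    def lookup(self, table: str) -> List[str]:
        # Consumers call this to find producers, then query those producers directly.
        return list(self._producers.get(table, []))


registry = Registry()
registry.register_producer("CPULoad", "http://producer1.example/R-GMA")  # hypothetical URLs
registry.register_producer("CPULoad", "http://producer2.example/R-GMA")
registry.register_producer("NetworkRTT", "http://netmon.example/R-GMA")
print(registry.lookup("CPULoad"))
```

The registry only brokers introductions; the monitoring data itself flows from producer to consumer, which keeps the registry lightweight.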

Slide 9: The DataGrid Project
Information Services (R-GMA): The current implementation uses servlet technology, with APIs in Java, C++, C, Perl and Python.

Slide 10: The DataGrid Project
Information Services (R-GMA): R-GMA gives the impression of one RDBMS per VO (Virtual Organisation). There are currently two types of producer:
- A circular buffer producer: no RDBMS is used and SQL queries are handled by the producer's own code. A consumer may miss records if it is too slow.
- A database producer: uses a database to hold the data, so data is never lost; however, it is slower and requires a clean-up strategy to avoid indefinite growth.
More producer types are being implemented at the request of users.
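The trade-off between the two producer types can be illustrated with a bounded buffer versus an append-only store. This is a toy model with hypothetical class names, and a Python list stands in for the database. It shows why a slow consumer can miss records from the first producer but not the second, and why the second needs a clean-up policy.

```python
from collections import deque
from typing import Deque, List, Tuple

Record = Tuple[int, str]  # (sequence number, payload)


class CircularBufferProducer:
    """Holds only the newest `capacity` records; older ones are silently overwritten."""

    def __init__(self, capacity: int) -> None:
        self.records: Deque[Record] = deque(maxlen=capacity)
        self.seq = 0

    def publish(self, payload: str) -> None:
        self.seq += 1
        self.records.append((self.seq, payload))

    def read_since(self, last_seen: int) -> List[Record]:
        return [r for r in self.records if r[0] > last_seen]


class DatabaseProducer:
    """Holds every record (a list stands in for the RDBMS) until explicitly cleaned up."""

    def __init__(self) -> None:
        self.records: List[Record] = []
        self.seq = 0

    def publish(self, payload: str) -> None:
        self.seq += 1
        self.records.append((self.seq, payload))

    def read_since(self, last_seen: int) -> List[Record]:
        return [r for r in self.records if r[0] > last_seen]

    def clean_up(self, keep_after: int) -> None:
        # Some retention policy like this is needed to stop indefinite growth.
        self.records = [r for r in self.records if r[0] > keep_after]


buffered, stored = CircularBufferProducer(capacity=3), DatabaseProducer()
for i in range(1, 6):  # five records are published before a slow consumer polls
    buffered.publish(f"load={i}")
    stored.publish(f"load={i}")

print(len(buffered.read_since(0)), len(stored.read_since(0)))  # prints: 3 5
```

Records 1 and 2 have already been overwritten in the circular buffer by the time the consumer reads, while the database producer still returns all five.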

Slide 11: The DataGrid Project
Information Services (R-GMA): An example: CPU load at various sites. Note that all information is timestamped. The query

    SELECT * FROM CPULoad WHERE Country = 'UK' AND Site = 'RAL'

would give the output of producer 1.
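To make the query concrete, here it is run against a throwaway in-memory SQLite table. Apart from Country and Site, the column names are hypothetical and all values are invented. In R-GMA the consumer's SQL would be answered by the relevant producers rather than by a local database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: every row carries a timestamp (seconds from an arbitrary origin).
conn.execute(
    "CREATE TABLE CPULoad (Country TEXT, Site TEXT, Host TEXT, LoadAvg REAL, MeasuredAt INTEGER)"
)
rows = [  # invented sample data
    ("UK", "RAL", "node01", 0.75, 100),
    ("UK", "RAL", "node02", 1.20, 100),
    ("UK", "IC", "node01", 0.30, 100),
    ("IT", "CNAF", "node01", 0.90, 100),
]
conn.executemany("INSERT INTO CPULoad VALUES (?, ?, ?, ?, ?)", rows)

# The slide's query; the string literals must be quoted in SQL.
result = conn.execute(
    "SELECT * FROM CPULoad WHERE Country = 'UK' AND Site = 'RAL'"
).fetchall()
for row in result:
    print(row)
```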

Slide 12: The DataGrid Project
Network Monitoring: This is essential if network information is to be used in brokering. SLAC's IEPM uses PingER to measure round-trip time; iperf, bbftp and bbcp to measure TCP throughput; and UDPmon to measure UDP throughput.
[Figure: sample results from IEPM monitoring between SLAC and Daresbury.]
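The quantities these tools report come down to simple arithmetic: round-trip time statistics over a series of pings, and throughput as bits transferred divided by elapsed time. The sketch below computes both from invented numbers. It also includes a hypothetical estimate of transfer time of the kind a broker could derive from a measured throughput. It does not call PingER, iperf or any of the other tools.

```python
from statistics import mean
from typing import Dict, List


def rtt_summary(rtts_ms: List[float]) -> Dict[str, float]:
    """Min/average/max round-trip time, as a ping-style monitor would summarise it."""
    return {"min": min(rtts_ms), "avg": mean(rtts_ms), "max": max(rtts_ms)}


def throughput_mbit_s(bytes_transferred: int, seconds: float) -> float:
    """Achieved throughput in Mbit/s (1 Mbit = 10**6 bits)."""
    return bytes_transferred * 8 / seconds / 1e6


def transfer_seconds(size_bytes: float, mbit_s: float) -> float:
    """Hypothetical brokering input: time to move a file at a measured throughput."""
    return size_bytes * 8 / (mbit_s * 1e6)


print(rtt_summary([150.0, 152.0, 160.0]))    # invented RTT samples in ms
print(throughput_mbit_s(100_000_000, 20.0))  # 100 MB in 20 s -> 40.0 Mbit/s
print(transfer_seconds(1e9, 40.0))           # 1 GB at 40 Mbit/s -> 200.0 s
```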

Slide 13: The DataGrid Project
Network Monitoring: The network monitoring information can then be published via an LDAP service.
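Publishing through LDAP means turning each measurement into a directory entry. As an illustration only, the function below formats a measurement as an LDIF (RFC 2849) record. It assumes plain ASCII values, because LDIF requires base64 encoding for some others. The DN layout, object class and attribute names are hypothetical and are not the schema DataGrid actually used.

```python
from typing import Dict


def to_ldif(dn: str, object_class: str, attributes: Dict[str, object]) -> str:
    """Render one directory entry as LDIF text (plain ASCII values only)."""
    lines = [f"dn: {dn}", f"objectClass: {object_class}"]
    lines += [f"{name}: {value}" for name, value in attributes.items()]
    return "\n".join(lines) + "\n"


entry = to_ldif(
    "path=SLAC-DL,ou=netmon,o=example",  # hypothetical DN
    "networkPathMeasurement",            # hypothetical object class
    {"sourceHost": "slac.example", "destinationHost": "dl.example",
     "rttMs": 154.0, "tcpMbitPerSec": 40.0},  # hypothetical attributes, invented values
)
print(entry)
```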

Slide 14: The DataGrid Project
Installation and configuration (LCFG): Configuring large numbers of different machines can be very troublesome, so DataGrid uses LCFG. Each machine has its own profile, which can include general site profiles and individual configuration options. The profile is then published in XML.

Slide 15: The DataGrid Project

    /* gw31 ============================================== BARRY'S WN */
    /* Host specific definitions */
    #define HOSTNAME gw31
    /* Some useful macros */
    #include "macros-cfg.h"
    /* Site specific definitions */
    #include "site-cfg-farm.h.ic"
    /* Linux default resources */
    #include "linuxdef-cfg.h"
    /* LCFG client specific resources */
    #include "client_testbed-cfg.h"
    /* Well, obviously, if you read the title !!!!!!! */
    #include "WorkerNode-cfg.h"
    …
    /* Specific NIC */
    +update.modlist label
    +update.mod_label alias eth0 eepro100

[Figure: the resulting XML published profile.]
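The #include structure of this profile layers shared definitions and host-specific values into one profile, which is then published as XML. The sketch below mimics that idea with dictionaries and xml.etree.ElementTree. In the sketch a later, more specific layer overrides an earlier one. The resource names, values and XML layout are hypothetical; they do not reproduce LCFG's actual merge rules, schema or tools.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

Layer = Dict[str, str]  # resource name -> value (hypothetical flat model)


def compile_profile(layers: List[Layer]) -> Layer:
    """Merge layers in order; in this sketch a later layer overrides an earlier one."""
    profile: Layer = {}
    for layer in layers:
        profile.update(layer)
    return profile


def to_xml(hostname: str, profile: Layer) -> str:
    """Publish the merged profile as a simple (hypothetical) XML document."""
    root = ET.Element("profile", {"host": hostname})
    for name, value in sorted(profile.items()):
        ET.SubElement(root, "resource", {"name": name}).text = value
    return ET.tostring(root, encoding="unicode")


linux_defaults = {"auth.shell": "/bin/bash", "update.modlist": ""}         # hypothetical
site_settings = {"dns.domain": "farm.example", "auth.shell": "/bin/tcsh"}  # hypothetical
host_specific = {"update.modlist": "label", "update.mod_label": "alias eth0 eepro100"}

profile = compile_profile([linux_defaults, site_settings, host_specific])
xml_text = to_xml("gw31", profile)
print(xml_text)
```

Keeping the per-host layer small, as in the gw31 profile, means most machines differ only in a handful of resources while sharing everything else.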

Slide 16: How do we know what is working?
We monitor each site.

Slide 17: SAMGrid
SAM is used by the DØ experiment … so SAM is operational, but is it a Grid? Currently SAM is mainly a data management tool. The locations of replicas are stored in a central database, and files are moved to a running job as and when they are needed (currently 1 TB/day). SAM can only submit jobs to local resources. It has been modified to use GridFTP and is being updated for remote submission using Condor-G, which will be ready very soon (it is in testing now).
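The data-handling model described here, with a central catalogue of replica locations and files delivered to a running job only when it asks for them, can be sketched as follows. The catalogue contents, file names, station names and the prefer-local policy are assumptions for illustration, not SAM's actual interfaces. The last function converts the quoted 1 TB/day into an average rate, assuming decimal units.

```python
from typing import Dict, Iterator, List, Set

# Hypothetical central replica catalogue: logical file name -> sites holding a copy.
REPLICAS: Dict[str, List[str]] = {
    "raw_run1001.evt": ["fnal"],
    "mc_sample42.evt": ["nikhef"],
}


def deliver(files: List[str], station: str, station_cache: Set[str]) -> Iterator[str]:
    """Yield files to the job one at a time, as and when it needs them."""
    for name in files:
        if name in station_cache:
            yield f"{name}: already cached at {station}"
            continue
        sources = REPLICAS[name]
        station_cache.add(name)  # after delivery the station holds a copy
        if station in sources:
            yield f"{name}: local replica at {station}"
        else:
            # No local copy: fetch from the first catalogued replica (simplistic policy).
            yield f"{name}: transferred from {sources[0]}"


cache: Set[str] = set()
for line in deliver(["raw_run1001.evt", "mc_sample42.evt", "raw_run1001.evt"], "imperial", cache):
    print(line)


def average_rate_mb_s(bytes_per_day: float) -> float:
    """Average rate in MB/s (10**6 bytes) for a given daily volume."""
    return bytes_per_day / 86_400 / 1e6


print(round(average_rate_mb_s(1e12), 1))  # 1 TB/day is roughly 11.6 MB/s sustained
```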

Slide 18: SAMGrid
[Figure: real data files from FNAL; Monte Carlo (MC) files from NIKHEF.]

Slide 19: Developing an operational Grid: Conclusions
DataGrid is being deployed across Europe and is just starting to be used … ATLAS Experiment last week. SAM is already being used by many users and is being modified for remote submission and to use transfer protocols such as GridFTP.

Slide 20
[Diagram: SAM system architecture. Components: Database Server(s) (central database), Name Server, Global Resource Manager(s), Log Server, Station 1 to Station n servers, and Mass Storage System(s). A key distinguishes components that are shared globally, local, or shared locally; arrows indicate control and data flow.]