Olof Bärring – WP4 summary – 4/9/2002 – n° 1 – WP4 report: Plans for testbed 2 [Including slides prepared by Lex Holt.]

Presentation transcript:


n° 2: Summary
- Reminder on how it all fits together
- What's in R1.2 (deployed, and not deployed but integrated)
- Piled up software from R1.3, R1.4
- Timeline for R2 developments and beyond
- Conclusions

n° 3: How it all fits together (job management)
[Diagram. Labels: Grid User; Local User; Farm A (LSF); Farm B (PBS); (Mass storage, Disk pools); Monitoring; Fabric Gridification; Resource Management; Grid Info Services (WP3); Resource Broker (WP1); Data Mgmt (WP2); Grid Data Storage (WP5). Legend: WP4 subsystems; Other WPs.]
Steps shown on the diagram:
- Submit job
- Optimized selection of site
- Authorize
- Map grid → local credentials
- Select an optimal batch queue and submit
- Return job status and output
- Publish resource and accounting information
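
The job-management steps on this slide can be read as a short pipeline: authorize the grid user, map the grid credential to a local account, then pick a batch queue and submit. The following is only a minimal Python sketch under assumed data structures. The names `GRIDMAP`, `Queue`, `select_queue` and `submit` are hypothetical and are not the LCAS or WP4 resource-management interfaces. "Optimal" is reduced here to "fewest pending jobs among the queues whose time limit fits the job", which is an assumption rather than the scheduling policy described in the talk.

```python
from dataclasses import dataclass

# Hypothetical grid-subject -> local-account map (illustrative entries only).
GRIDMAP = {
    "/O=Grid/CN=Alice Example": "atlas001",
    "/O=Grid/CN=Bob Example": "cms001",
}


@dataclass
class Queue:
    name: str
    farm: str           # batch system behind the queue, e.g. "LSF" or "PBS"
    max_minutes: int    # longest job the queue accepts
    pending_jobs: int   # jobs currently waiting


def authorize(subject: str) -> bool:
    """Authorization step: here simply 'the subject is known'."""
    return subject in GRIDMAP


def map_credentials(subject: str) -> str:
    """Map the grid credential to a local account."""
    return GRIDMAP[subject]


def select_queue(queues, minutes):
    """Assumed policy: least-loaded queue whose limit fits the job."""
    candidates = [q for q in queues if q.max_minutes >= minutes]
    if not candidates:
        raise ValueError("no queue accepts a job of this length")
    return min(candidates, key=lambda q: q.pending_jobs)


def submit(subject, minutes, queues):
    """Authorize, map credentials, select a queue, and 'submit'."""
    if not authorize(subject):
        raise PermissionError(f"not authorized: {subject}")
    local_user = map_credentials(subject)
    queue = select_queue(queues, minutes)
    queue.pending_jobs += 1  # stands in for the real batch submission
    return {"local_user": local_user, "queue": queue.name,
            "farm": queue.farm, "status": "QUEUED"}
```

A job that is rejected at the authorization step never reaches the queue selection, which mirrors the order of the steps on the slide.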

n° 4: How it all fits together (system mgmt)
[Diagram. Labels: Farm A (LSF); Farm B (PBS); Installation & Node Mgmt; Configuration Management; Monitoring & Fault Tolerance; Resource Management; Information; Invocation; Automation. Legend: WP4 subsystems; Other WPs.]
Steps shown on the diagram:
- Update configuration templates
- Node malfunction detected
- Remove node from queue
- Wait for running jobs(?)
- Trigger repair
- Repair (e.g. restart, reboot, reconfigure, …)
- Node OK detected
- Put back node in queue
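
The repair cycle above can be sketched as a small state machine. This is an illustration only: the class and method names are hypothetical, not WP4 interfaces, and the sketch assumes that the node does wait for running jobs before repair, a point the slide itself leaves open with a question mark.

```python
from enum import Enum


class State(Enum):
    IN_QUEUE = "in queue"
    DRAINING = "draining"      # removed from queue, waiting for running jobs
    REPAIRING = "repairing"


class Node:
    def __init__(self, name, running_jobs=0):
        self.name = name
        self.running_jobs = running_jobs
        self.state = State.IN_QUEUE
        self.log = []

    def on_malfunction(self):
        """Monitoring reports a malfunction: take the node out of the queue."""
        if self.state is State.IN_QUEUE:
            self.state = State.DRAINING
            self.log.append("removed from queue")
            self._maybe_repair()

    def on_job_finished(self):
        """A running job ended; repair can start once none are left."""
        self.running_jobs = max(0, self.running_jobs - 1)
        self._maybe_repair()

    def _maybe_repair(self):
        if self.state is State.DRAINING and self.running_jobs == 0:
            self.state = State.REPAIRING
            self.log.append("repair triggered")

    def on_node_ok(self):
        """Monitoring reports the node healthy again: put it back in the queue."""
        if self.state is State.REPAIRING:
            self.state = State.IN_QUEUE
            self.log.append("put back in queue")
```

Keeping the transitions guarded by the current state means a repeated "malfunction" or "node OK" report is ignored rather than re-triggering a step.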

n° 5: How it all fits together (node autonomy)
[Diagram. Labels: Cfg cache; Monitoring Buffer; Correlation engines; Node mgmt components; Monitoring Measurement Repository; Configuration Data Base; Central (distributed); Buffer copy; Node profile; Local recover if possible (e.g. restarting daemons); Automation.]

n° 6: What's in R1.2 (and deployed)
- Gridification:
  - Library implementation of LCAS

n° 7: What's in R1.2 but not used/deployed
- Resource management
  - Information provider for Condor (not fully tested because you need a complete testbed including a Condor cluster)
- Monitoring
  - Agent + first prototype repository server + basic linuxproc sensors
  - No LCFG object → not deployed
- Installation mgmt
  - LCFG light exists in R1.2. Please provide us feedback on any problems you have with it.

n° 8: Piled up software from R1.3, R1.4
- Everything mentioned here is ready, unit tested and documented (and RPMs are built by autobuild)
  - Gridification
    - LCAS with dynamic plug-ins (already in R1.2.1???)
  - Resource mgmt
    - Complete prototype enterprise level batch system management with proxy for PBS. Includes LCFG object.
  - Monitoring
    - New agent. Production quality. Already used on CERN production clusters, sampling some 110 metrics/node. Has also been tested on Solaris.
    - LCFG object
  - Installation mgmt
    - Next generation LCFG: LCFGng for RH6.2 (RH7.2 almost ready)

n° 9: New LCFG [Lex Holt]
- EDG release 1.3: more recent LCFG version (LCFGng)
- Many improvements:
  - Supports Red Hat 7.2 as well as 6.2
  - Install/boot: full DHCP support, PXE support, can mix init.d scripts & LCFG components
  - Single LCFG server can configure machines in multiple domains
  - Spanning maps: profile generator (mkxprof) can gather individual machine data (e.g., MAC addresses) and publish to component (e.g., DHCP server)
  - Component method semantics clarified; native Perl components possible; EDG-style monitoring support
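
The spanning-map idea (collect one value from every machine profile and hand the collection to a single component) can be illustrated with a short Python sketch. It is not the mkxprof implementation and does not use LCFG resource syntax: the profile dictionaries, the `mac` and `ip` keys, and both function names are assumptions made for the example. The output lines follow the host-declaration form of an ISC dhcpd configuration.

```python
def spanning_map(profiles, resource):
    """Collect one resource from every machine profile that defines it."""
    return {host: profiles[host][resource]
            for host in sorted(profiles)
            if resource in profiles[host]}


def dhcp_host_entries(profiles):
    """Render the gathered MAC/IP pairs as dhcpd-style host declarations."""
    macs = spanning_map(profiles, "mac")
    ips = spanning_map(profiles, "ip")
    lines = []
    for host, mac in macs.items():
        if host in ips:  # skip machines without a fixed address
            lines.append(
                f"host {host} {{ hardware ethernet {mac}; "
                f"fixed-address {ips[host]}; }}"
            )
    return lines


# Hypothetical per-machine data; real profiles would come from the
# configuration server.
profiles = {
    "node02": {"mac": "00:30:48:aa:bb:02", "ip": "10.0.0.2"},
    "node01": {"mac": "00:30:48:aa:bb:01", "ip": "10.0.0.1"},
    "server": {"ip": "10.0.0.254"},  # no MAC recorded, so no DHCP entry
}

for line in dhcp_host_entries(profiles):
    print(line)
```

The point of the design is that each machine's data lives in its own profile, and the DHCP server's configuration is derived from all of them instead of being maintained by hand.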

n° 10: LCFG Migration [Lex Holt]
- Clients require reinstallation
- Will be guidelines for migrating servers without reinstallation; some manual tweaking necessary, e.g.:
  - Locations (pathnames) changed
  - Resources changed or moved as a consequence of component changes
- Component writers/maintainers need to absorb a few technical changes

n° 11: Timeline for R2 developments
- Configuration management: complete central part of framework
  - High Level Definition Language: 30/9/2002
  - PAN compiler: 30/9/2002
  - Configuration Database (CDB): 31/10/2002
- Installation mgmt
  - LCFGng for RH7.2: 30/9/2002
- Monitoring: complete final framework
  - TCP transport: 30/9/2002
  - Repository server: 30/9/2002
  - Repository API WSDL: 30/9/2002
  - Oracle DB support: 31/10/2002
  - Alarm display: 30/11/2002
  - Open Source DB (MySQL or PostgreSQL): mid-December 2002

n° 12: Timeline for R2 developments
- Resource mgmt
  - GLUE info providers: 15/9/2002
  - Maintenance support API (e.g. enable/disable a node in the queue): 30/9/2002
  - Provide accounting information to WP1 accounting group: 30/9/2002
  - Support Maui as scheduler
- Fault tolerance framework
  - Various components already delivered
  - Complete framework by end of November

n° 13: Beyond release 2
- Conclusion from WP4 workshop, June 2002: LCFG is not the future for EDG (see WP4 quarterly report for 2Q02) because:
  - Inherent LCFG constraints on the configuration schema (per-component config)
  - LCFG is a project of its own and our objectives do not always coincide
  - We have learned a lot from the LCFG architecture and we continue to collaborate with the LCFG team
- EDG future: first release by end-March 2003
  - Proposal for a common schema for all fabric configuration information to be stored in the configuration database, implemented using the HLDL
  - New configuration client and node management replacing the LCFG client (the server side will already be delivered in October)
  - New software package management (replacing updaterpms) split into two modules: an OS-independent part and an OS-dependent part (packager)
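
The proposed split of package management into an OS-independent part and an OS-dependent packager can be sketched as follows. This is a hedged illustration, not the design of the WP4 tool: `plan`, `RpmPackager`, the operation names and the `name-version.rpm` file naming are all hypothetical. Only the `rpm -e` (erase) and `rpm -U` (install or upgrade) flags are real; a downgrade would additionally need `--oldpackage`, which this sketch does not handle.

```python
def plan(desired, installed):
    """OS-independent part: compare desired and installed package sets.

    Both arguments map package name -> version string. The result is a
    list of (action, name, version) tuples with no packaging-tool details.
    """
    ops = []
    for name in sorted(installed.keys() - desired.keys()):
        ops.append(("remove", name, installed[name]))
    for name in sorted(desired):
        want, have = desired[name], installed.get(name)
        if have is None:
            ops.append(("install", name, want))
        elif have != want:
            ops.append(("replace", name, want))
    return ops


class RpmPackager:
    """OS-dependent part: turn abstract operations into rpm command lines."""

    def commands(self, ops):
        cmds = []
        for action, name, version in ops:
            if action == "remove":
                cmds.append(["rpm", "-e", name])
            else:
                # Hypothetical file naming; real RPM files also carry an arch.
                cmds.append(["rpm", "-U", f"{name}-{version}.rpm"])
        return cmds
```

Supporting another packaging format would then mean writing another packager class with the same `commands` method, leaving `plan` untouched.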

n° 14: WP4 plans (sketch/snapshot) [Lex Holt]
- Caveat: installation & configuration tasks only
- Release 2 to allow (but not require) use of the new high-level description language (HLDL)
- Release 3: LCFG architecture roughly retained, but
  - HLDL replaces LCFG source file syntax
  - HLDL files accessed via new configuration database (akin to API wrapper round CVS repository)
  - XML profile much as before
  - Redesigned (probably Perl) components interpret profile through more substantial API/libraries (registration, dependency analysis, …)
  - Single Configure() call to component does everything
  - Generalized updaterpms may handle non-RPM formats
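
The Release 3 component model (registration, dependency analysis, and a single configure call per component) can be illustrated with a short sketch. The slide expects the real components to be written in Perl, so the Python below only shows the idea: the `Component` class, its `depends_on` field, and `configure_all` are hypothetical names, and the profile is assumed to be a dictionary keyed by component name. It uses `graphlib` from the Python standard library (3.9 or later) for the ordering.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class Component:
    def __init__(self, name, depends_on=()):
        self.name = name
        self.depends_on = tuple(depends_on)
        self.applied = None

    def configure(self, settings):
        """Single entry point: apply this component's part of the profile."""
        self.applied = dict(settings)


def configure_all(components, profile):
    """Register components, order them by dependency, configure each once."""
    registry = {c.name: c for c in components}
    graph = {}
    for c in components:
        missing = [d for d in c.depends_on if d not in registry]
        if missing:
            raise KeyError(f"{c.name} depends on unregistered {missing}")
        graph[c.name] = set(c.depends_on)
    # Dependencies come before the components that need them.
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        registry[name].configure(profile.get(name, {}))
    return order
```

A circular dependency makes `static_order()` raise `graphlib.CycleError`, so a misdeclared component set fails before anything is configured.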

n° 15: Summary
- Substantial amount of s/w piled up from R1.3, R1.4 to be deployed now
- R2 also includes two large components:
  - LCFGng: migration is non-trivial, but we already perform much of the non-trivial part ourselves, so TB integration should be smooth
  - Complete monitoring framework
- Beyond R2: LCFG is not the future for EDG WP4. First version of new configuration and node management system in March 2003