WP4 report: Plans for testbed 2
Olof Bärring, WP4 summary, 4/9/2002
[Including slides prepared by Lex Holt.]

1. WP4 report: Plans for testbed 2 (Olof.Barring@cern.ch)

2. Summary
- Reminder of how it all fits together
- What's in R1.2 (deployed, and integrated but not deployed)
- Software piled up from R1.3 and R1.4
- Timeline for R2 developments and beyond
- Conclusions

3. How it all fits together (job management)
Users: Grid User, Local User. WP4 subsystems: Fabric Gridification, Resource Management, Monitoring. Other WPs: Resource Broker (WP1), Data Mgmt (WP2), Grid Info Services (WP3), Grid Data Storage (WP5). Resources: Farm A (LSF), Farm B (PBS), mass storage and disk pools.
- Submit job
- Optimized selection of site
- Authorize
- Map grid → local credentials
- Select an optimal batch queue and submit
- Return job status and output
- Publish resource and accounting information
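The gridification steps above (authorize, map grid → local credentials, select a batch queue) can be sketched as follows. This is a minimal illustration under stated assumptions: the allow list, the DN-to-account map, the `BatchQueue` record and the "most free slots, then shortest backlog" queue metric are all hypothetical and do not reflect the actual WP4 interfaces.

```python
from dataclasses import dataclass

# Hypothetical sketch: names, data structures and the queue-selection
# policy are illustrative assumptions, not the WP4 gridification API.


@dataclass
class BatchQueue:
    name: str
    farm: str          # e.g. "Farm A (LSF)" or "Farm B (PBS)"
    queued_jobs: int
    free_slots: int


def authorize(user_dn: str, allowed_dns: set[str]) -> bool:
    """Authorize: accept only grid identities on a local allow list (assumed policy)."""
    return user_dn in allowed_dns


def map_credentials(user_dn: str, grid_to_local: dict[str, str]) -> str:
    """Map the grid identity (certificate DN) to a local account."""
    try:
        return grid_to_local[user_dn]
    except KeyError:
        raise PermissionError(f"no local mapping for {user_dn}") from None


def select_queue(queues: list[BatchQueue]) -> BatchQueue:
    """Assumed 'optimal' metric: most free slots, then shortest backlog."""
    return max(queues, key=lambda q: (q.free_slots, -q.queued_jobs))


def submit(user_dn, allowed_dns, grid_to_local, queues):
    """Authorize, map credentials, then pick a queue; returns (account, queue, farm)."""
    if not authorize(user_dn, allowed_dns):
        raise PermissionError(f"{user_dn} is not authorized")
    local_user = map_credentials(user_dn, grid_to_local)
    queue = select_queue(queues)
    return local_user, queue.name, queue.farm
```

A real site would replace `select_queue` with whatever the batch system (LSF, PBS) reports; the point of the sketch is only the ordering of the three steps.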

4. How it all fits together (system mgmt)
WP4 subsystems: Installation & Node Mgmt, Configuration Management, Monitoring & Fault Tolerance, Resource Management, linked by information and invocation flows. Farms: Farm A (LSF), Farm B (PBS).
Automated sequence:
- Update configuration templates
- Node malfunction detected
- Remove node from queue
- Wait for running jobs (?)
- Trigger repair
- Repair (e.g. restart, reboot, reconfigure, …)
- Node OK detected
- Put node back in queue
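The automated repair cycle above behaves like a small state machine. The sketch below assumes a per-node object with hypothetical event handlers (`on_malfunction`, `on_job_finished`, `on_node_ok`) and resolves the slide's open question "wait for running jobs(?)" by draining the node first; both choices are illustrative assumptions.

```python
from enum import Enum, auto

# Hypothetical sketch of the automated repair cycle; class, state and
# handler names are illustrative assumptions, not WP4 interfaces.


class NodeState(Enum):
    IN_SERVICE = auto()
    DRAINING = auto()    # removed from queue, waiting for running jobs
    REPAIRING = auto()


class Node:
    def __init__(self, name: str):
        self.name = name
        self.state = NodeState.IN_SERVICE
        self.in_queue = True
        self.running_jobs = 0
        self.log: list[str] = []

    def _record(self, msg: str) -> None:
        self.log.append(f"{self.name}: {msg}")

    def on_malfunction(self) -> None:
        """Monitoring & Fault Tolerance reports a problem."""
        self.in_queue = False              # Resource Mgmt removes node from queue
        self.state = NodeState.DRAINING
        self._record("removed from queue")
        self.try_repair()

    def on_job_finished(self) -> None:
        self.running_jobs -= 1
        if self.state is NodeState.DRAINING:
            self.try_repair()

    def try_repair(self) -> None:
        if self.running_jobs > 0:          # assumed policy: drain before repair
            self._record(f"waiting for {self.running_jobs} job(s)")
            return
        self.state = NodeState.REPAIRING
        self._record("repair triggered (restart/reboot/reconfigure)")

    def on_node_ok(self) -> None:
        """Monitoring reports the node healthy again."""
        if self.state is NodeState.REPAIRING:
            self.in_queue = True
            self.state = NodeState.IN_SERVICE
            self._record("put back in queue")
```

For example, a node with one running job that malfunctions is first drained, repaired once the job finishes, and returned to the queue when monitoring reports it healthy.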

5. How it all fits together (node autonomy)
On the node: configuration cache, monitoring buffer, correlation engines, node mgmt components.
Central (distributed): Monitoring Measurement Repository, Configuration Data Base.
- A copy of the node's monitoring buffer goes to the central Measurement Repository
- The node profile goes from the Configuration Data Base to the node's configuration cache
- Local recovery if possible (e.g. restarting daemons)
- Automation
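Node autonomy means a local correlation engine tries to fix problems on the node before involving the central fabric. In the minimal sketch below, the metric naming (`daemon_up:<name>`), the bounded buffer, and the `restart_daemon`/`escalate` callbacks are assumptions made for illustration only.

```python
from collections import deque
from typing import Callable

# Hypothetical sketch: metric names, buffer size and callbacks are
# illustrative assumptions, not the WP4 monitoring agent's interfaces.


class LocalMonitor:
    def __init__(self, maxlen: int = 100):
        self.buffer: deque = deque(maxlen=maxlen)   # bounded local buffer

    def record(self, metric: str, value: float) -> None:
        self.buffer.append((metric, value))

    def drain_to(self, central_repo: list) -> None:
        """Copy buffered samples to the central measurement repository."""
        central_repo.extend(self.buffer)
        self.buffer.clear()


def correlate(monitor: LocalMonitor,
              restart_daemon: Callable[[str], bool],
              escalate: Callable[[str], None]) -> list[str]:
    """For each daemon reported down, try a local restart; escalate only on failure."""
    actions = []
    for metric, value in list(monitor.buffer):
        if metric.startswith("daemon_up:") and value == 0:
            daemon = metric.split(":", 1)[1]
            if restart_daemon(daemon):
                actions.append(f"restarted {daemon} locally")
            else:
                escalate(daemon)
                actions.append(f"escalated {daemon}")
    return actions
```

The design choice illustrated is that escalation to the central system happens only when local recovery fails, which keeps routine faults off the central repository's critical path.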

6. What's in R1.2 (and deployed)
- Gridification:
  - Library implementation of LCAS

7. What's in R1.2 but not used/deployed
- Resource management
  - Information provider for Condor (not fully tested, because testing requires a complete testbed including a Condor cluster)
- Monitoring
  - Agent + first prototype repository server + basic linuxproc sensors
  - No LCFG object, so not deployed
- Installation mgmt
  - LCFG light exists in R1.2. Please give us feedback on any problems you have with it.
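To illustrate what a basic "linuxproc"-style sensor does, the sketch below reads the standard Linux `/proc/loadavg` file (fields: 1-, 5- and 15-minute load averages, running/total processes, last PID) and turns it into timestamped samples that an agent could buffer. The metric names and the `(time, metric, value)` tuple layout are assumptions, not the format used by the WP4 agent.

```python
import time
from pathlib import Path

# Hypothetical "linuxproc"-style sensor; metric names and sample layout
# are illustrative assumptions, not the WP4 agent's actual format.


def parse_loadavg(text: str) -> dict[str, float]:
    """Parse /proc/loadavg contents, e.g. '0.20 0.18 0.12 1/80 11206'."""
    fields = text.split()
    running, total = fields[3].split("/")
    return {
        "load1": float(fields[0]),
        "load5": float(fields[1]),
        "load15": float(fields[2]),
        "procs_running": float(running),
        "procs_total": float(total),
    }


def sample(path: str = "/proc/loadavg") -> list[tuple[float, str, float]]:
    """Return timestamped (time, metric, value) samples (Linux only)."""
    now = time.time()
    values = parse_loadavg(Path(path).read_text())
    return [(now, name, value) for name, value in values.items()]
```

Keeping the parser separate from the file read makes the sensor testable on machines without `/proc`.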

8. Software piled up from R1.3 and R1.4
- Everything mentioned here is ready, unit tested and documented (and the RPMs are built by autobuild)
  - Gridification
    - LCAS with dynamic plug-ins (already in R1.2.1?)
  - Resource mgmt
    - Complete prototype of enterprise-level batch system management, with a proxy for PBS. Includes an LCFG object.
  - Monitoring
    - New agent, production quality. Already used on CERN production clusters, sampling about 110 metrics per node. Has also been tested on Solaris.
    - LCFG object
  - Installation mgmt
    - Next-generation LCFG: LCFGng for Red Hat 6.2 (Red Hat 7.2 almost ready)
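The idea behind "LCAS with dynamic plug-ins" is that the set of authorization checks is chosen at run time from configuration, and a request is granted only if every enabled plug-in agrees. The Python sketch below mimics that pattern with a name-based registry; it is not the LCAS interface, and the plug-in names (`ban_list`, `time_window`) and configuration keys are hypothetical.

```python
from datetime import time as clock
from typing import Callable

# Hypothetical sketch of a configurable authorization chain. This is NOT
# the LCAS API; plug-in names and config keys are illustrative assumptions.

Request = dict
Config = dict
Plugin = Callable[[Request, Config], bool]
REGISTRY: dict[str, Plugin] = {}


def plugin(name: str) -> Callable[[Plugin], Plugin]:
    """Decorator that makes a plug-in available under a configuration name."""
    def register(fn: Plugin) -> Plugin:
        REGISTRY[name] = fn
        return fn
    return register


@plugin("ban_list")
def not_banned(req: Request, cfg: Config) -> bool:
    return req["dn"] not in cfg.get("banned", set())


@plugin("time_window")
def within_window(req: Request, cfg: Config) -> bool:
    start, end = cfg.get("window", (clock(0, 0), clock(23, 59)))
    return start <= req["time"] <= end


def authorize(req: Request, cfg: Config) -> bool:
    """Build the chain from the configured plug-in names; all must grant access."""
    chain = [REGISTRY[name] for name in cfg["plugins"]]
    return all(check(req, cfg) for check in chain)
```

Changing `cfg["plugins"]` changes the policy without touching the calling code, which is the property the dynamic plug-in design is after.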

9. New LCFG [Lex Holt]
- EDG release 1.3: more recent LCFG version (LCFGng)
- Many improvements:
  - Supports Red Hat 7.2 as well as 6.2
  - Install/boot: full DHCP support, PXE support, can mix init.d scripts & LCFG components
  - A single LCFG server can configure machines in multiple domains
  - Spanning maps: the profile generator (mkxprof) can gather individual machine data (e.g., MAC addresses) and publish it to a component (e.g., DHCP server)
  - Component method semantics clarified; native Perl components possible; EDG-style monitoring support
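A spanning map gathers one resource from many node profiles and publishes the result to a single component elsewhere. The sketch below shows that data flow using plain dictionaries as a stand-in for LCFG profiles and emitting ISC dhcpd `host` declarations; the profile layout and function names are assumptions, and this is not how mkxprof is implemented.

```python
# Hypothetical sketch of the spanning-map data flow; the profile layout
# and function names are assumptions, not mkxprof's implementation.


def gather(profiles: dict[str, dict], key: str) -> dict[str, dict]:
    """Collect one resource (e.g. network data) from every node profile that has it."""
    return {node: prof[key] for node, prof in sorted(profiles.items()) if key in prof}


def dhcpd_hosts(entries: dict[str, dict]) -> str:
    """Render gathered entries as ISC dhcpd.conf 'host' declarations."""
    blocks = []
    for node, net in entries.items():
        blocks.append(
            f"host {node} {{\n"
            f"  hardware ethernet {net['mac']};\n"
            f"  fixed-address {net['ip']};\n"
            f"}}"
        )
    return "\n".join(blocks) + "\n"
```

For instance, profiles for `node01` and `node02` that each carry a `dhcp` resource yield two `host` blocks for the DHCP server, while a profile without that resource is simply skipped.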

10. LCFG Migration [Lex Holt]
- Clients require reinstallation
- There will be guidelines for migrating servers without reinstallation; some manual tweaking is necessary, e.g.:
  - Locations (pathnames) changed
  - Resources changed or moved as a consequence of component changes
- Component writers/maintainers need to absorb a few technical changes

11. Timeline for R2 developments
- Configuration management: complete the central part of the framework
  - High Level Definition Language: 30/9/2002
  - PAN compiler: 30/9/2002
  - Configuration Database (CDB): 31/10/2002
- Installation mgmt
  - LCFGng for Red Hat 7.2: 30/9/2002
- Monitoring: complete the final framework
  - TCP transport: 30/9/2002
  - Repository server: 30/9/2002
  - Repository API WSDL: 30/9/2002
  - Oracle DB support: 31/10/2002
  - Alarm display: 30/11/2002
  - Open Source DB (MySQL or PostgreSQL): mid-December 2002

12. Timeline for R2 developments (continued)
- Resource mgmt
  - GLUE info providers: 15/9/2002
  - Maintenance support API (e.g. enable/disable a node in the queue): 30/9/2002
  - Provide accounting information to the WP1 accounting group: 30/9/2002
  - Support Maui as scheduler
- Fault tolerance framework
  - Various components already delivered
  - Complete framework by end of November

13. Beyond release 2
- Conclusion from the WP4 workshop, June 2002: LCFG is not the future for EDG (see the WP4 quarterly report for 2Q02) because:
  - Inherent LCFG constraints on the configuration schema (per-component config)
  - LCFG is a project of its own and our objectives do not always coincide
  - We have learned a lot from the LCFG architecture and we continue to collaborate with the LCFG team
- EDG future: first release by end of March 2003
  - Proposal for a common schema for all fabric configuration information to be stored in the configuration database, implemented using the HLDL
  - New configuration client and node management replacing the LCFG client (the server side will already have been delivered in October)
  - New software package management (replacing updaterpms), split into two modules: an OS-independent part and an OS-dependent part (packager)
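The planned split of package management can be pictured as an OS-independent part that compares desired and installed packages, plus an OS-dependent "packager" backend that applies the result. In the sketch below, the `Packager` interface, its method names, and the in-memory `FakePackager` (standing in for a real RPM backend) are all hypothetical.

```python
from abc import ABC, abstractmethod

# Hypothetical sketch of the OS-independent / OS-dependent split; class and
# method names are illustrative assumptions, not the planned WP4 design.


class Packager(ABC):
    """OS-dependent backend (one per packaging format, e.g. RPM)."""

    @abstractmethod
    def installed(self) -> dict[str, str]: ...

    @abstractmethod
    def apply(self, install: dict[str, str], remove: list[str]) -> None: ...


def plan(desired: dict[str, str], installed: dict[str, str]):
    """OS-independent: diff desired vs installed name -> version maps."""
    install = {n: v for n, v in desired.items() if installed.get(n) != v}
    remove = sorted(n for n in installed if n not in desired)
    return install, remove


def sync(desired: dict[str, str], packager: Packager):
    install, remove = plan(desired, packager.installed())
    if install or remove:
        packager.apply(install, remove)
    return install, remove


class FakePackager(Packager):
    """In-memory stand-in used instead of a real RPM backend."""

    def __init__(self, pkgs: dict[str, str]):
        self.pkgs = dict(pkgs)

    def installed(self) -> dict[str, str]:
        return dict(self.pkgs)

    def apply(self, install: dict[str, str], remove: list[str]) -> None:
        for name in remove:
            del self.pkgs[name]
        self.pkgs.update(install)
```

Only `Packager` subclasses would need porting to a new OS; `plan` and `sync` stay unchanged.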

14. WP4 plans (sketch/snapshot) [Lex Holt]
- Caveat: installation & configuration tasks only
- Release 2 to allow (but not require) use of the new high-level description language (HLDL)
- Release 3: LCFG architecture roughly retained, but
  - HLDL replaces LCFG source file syntax
  - HLDL files accessed via the new configuration database (akin to an API wrapper around a CVS repository)
  - XML profile much as before
  - Redesigned (probably Perl) components interpret the profile through a more substantial API/libraries (registration, dependency analysis, …)
    - A single Configure() call to a component does everything
  - Generalized updaterpms may handle non-RPM formats
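The Release 3 idea of components reading the XML profile through a shared library and exposing a single configure entry point can be sketched as below. The XML layout, the `resources_for` helper, the `Component` base class and the `Ntp` example component are hypothetical; the slide only says the components will probably be written in Perl, so this Python version illustrates the structure, not the planned code.

```python
import xml.etree.ElementTree as ET

# Hypothetical sketch: XML layout, helper and class names are assumptions
# illustrating "one Configure() call does everything".

PROFILE = """<profile node="lxb0042">
  <component name="ntp"><resource name="servers">ntp1 ntp2</resource></component>
  <component name="syslog"><resource name="loghost">log1</resource></component>
</profile>"""


def resources_for(profile_xml: str, component: str) -> dict[str, str]:
    """Library helper: extract one component's resources from the node profile."""
    root = ET.fromstring(profile_xml)
    node = root.find(f"component[@name='{component}']")
    if node is None:
        return {}
    return {r.get("name"): (r.text or "").strip() for r in node.findall("resource")}


class Component:
    name = ""

    def configure(self, profile_xml: str) -> list[str]:
        """Single entry point: read own resources from the profile, then act on them."""
        return self.apply(resources_for(profile_xml, self.name))

    def apply(self, res: dict[str, str]) -> list[str]:
        raise NotImplementedError


class Ntp(Component):
    name = "ntp"

    def apply(self, res: dict[str, str]) -> list[str]:
        # A real component would write the config file and restart the daemon;
        # here we only return the lines it would write.
        return [f"server {s}" for s in res.get("servers", "").split()]
```

Moving profile parsing into a shared helper is what lets each component's own code shrink to the `apply` step.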

15. Summary
- A substantial amount of software piled up from R1.3 and R1.4 is to be deployed now
- R2 also includes two large components:
  - LCFGng: migration is non-trivial, but we are already doing as much of the non-trivial part as we can ourselves, so testbed integration should be smooth
  - Complete monitoring framework
- Beyond R2: LCFG is not the future for EDG WP4. The first version of the new configuration and node management system is due in March 2003

