INFSO-RI Enabling Grids for E-sciencE
GridICE: status and plans for gLite integration and user-level job monitoring
Sergio Andreozzi
Enabling Grids for E-sciencE INFSO-RI EGEE/LCG Operation Workshop - Bologna, 25 May S. Andreozzi
OUTLINE
GridICE:
–current release status
–short-term development activities (for LCG 2.5)
–current deployment
GridICE architecture
–moving from LCG 2.x to gLite
GridICE release - sensors status
–release pl2
–integrated in LCG
–YAIM-based installation and configuration
–Quattor-based installation, validation in progress
–SL3/IA64 porting
Short-term development activities
–LRMSinfo integration: this sensor provides aggregated information about a farm using LRMS status commands (e.g. pbsnodes, bhosts, bqueues, ...)
   per-farm info: total slots, free slots, average CPU load, list of WNs in down state
–integration of the latest LEMON release
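The LRMSinfo aggregation described above can be sketched as follows. This is a minimal Python illustration, not the GridICE sensor (GridICE sensors are typically Perl scripts): it assumes a simplified `pbsnodes -a`-style layout with the node name on an unindented line followed by indented `key = value` lines, and the node names, job IDs and load values in `SAMPLE` are made up. Nodes whose state contains `down` or `offline` are listed as down and, as a design choice of this sketch, excluded from the slot counts.

```python
import re

# Hypothetical sample in the style of `pbsnodes -a` output (not real site data).
SAMPLE = """wn01
     state = job-exclusive
     np = 2
     jobs = 0/101.ce, 1/102.ce
     status = loadave=1.50,ncpus=2
wn02
     state = free
     np = 2
     jobs = 0/103.ce
     status = loadave=0.50,ncpus=2
wn03
     state = down
     np = 2
"""


def parse_pbsnodes(text):
    """Split pbsnodes-style text into one dict of `key = value` pairs per node."""
    nodes, current = [], None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():  # unindented line: a new node name
            current = {"name": line.strip()}
            nodes.append(current)
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")  # split on the first '=' only
            current[key.strip()] = value.strip()
    return nodes


def farm_summary(nodes):
    """Aggregate per-farm info: total/free slots, average load, down nodes."""
    total = free = 0
    loads, down = [], []
    for node in nodes:
        state = node.get("state", "")
        if "down" in state or "offline" in state:
            down.append(node["name"])
            continue  # nodes that are down contribute no usable slots
        slots = int(node.get("np", "0"))
        used = len([j for j in node.get("jobs", "").split(",") if j.strip()])
        total += slots
        free += max(slots - used, 0)
        match = re.search(r"loadave=([\d.]+)", node.get("status", ""))
        if match:
            loads.append(float(match.group(1)))
    avg_load = sum(loads) / len(loads) if loads else None
    return {"total_slots": total, "free_slots": free,
            "avg_load": avg_load, "down_nodes": down}


print(farm_summary(parse_pbsnodes(SAMPLE)))
# {'total_slots': 4, 'free_slots': 1, 'avg_load': 1.0, 'down_nodes': ['wn03']}
```

Parsing and aggregation are kept separate so that the same summary function could be reused with a different front end (e.g. one parsing LSF `bhosts` output).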
GridICE release - server status
–release 1.8.0b
   new presentation layer
   the final version will be released after the validation phase
   validation installation on
GridICE release - server web interface improvements:
–new navigation
–HTML/XML duality for monitoring data (support for both people and machines as consumers of monitoring data)
–site name as an administrative concept
–refactoring of all views
–new charts section: generation of custom graphs (job statistics, jobs w/t vs. time, jobs R/Q vs. time)
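One way to read the HTML/XML duality is: build a single XML representation of the monitoring data and derive the human-readable page from it. The sketch below assumes a trivial host/status data model and hypothetical function names (`hosts_to_xml`, `xml_to_html`, `render`). The GridICE server component list includes an XSLT->HTML stage for this purpose; this standard-library sketch replaces it with a plain Python function.

```python
import xml.etree.ElementTree as ET
from html import escape


def hosts_to_xml(hosts):
    """Machine-oriented view: one XML document for all hosts."""
    root = ET.Element("hosts")
    for h in hosts:
        ET.SubElement(root, "host", name=h["name"], status=h["status"])
    return root


def xml_to_html(root):
    """Human-oriented view rendered from the same XML tree."""
    rows = "".join(
        "<tr><td>{}</td><td>{}</td></tr>".format(escape(e.get("name")), escape(e.get("status")))
        for e in root.iter("host")
    )
    return "<table><tr><th>Host</th><th>Status</th></tr>" + rows + "</table>"


def render(hosts, fmt):
    """Serve one data model in the format requested by the consumer."""
    root = hosts_to_xml(hosts)
    if fmt == "xml":
        return ET.tostring(root, encoding="unicode")
    if fmt == "html":
        return xml_to_html(root)
    raise ValueError("unsupported format: " + fmt)


HOSTS = [{"name": "ce01.example.org", "status": "up"},
         {"name": "se01.example.org", "status": "down"}]
print(render(HOSTS, "xml"))
print(render(HOSTS, "html"))
```

Because the HTML is derived from the XML, machine consumers and the web view cannot disagree about the underlying data.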
GridICE release - server notification service [2]
alarms about host/service faults
scalable up to thousands of users/subscriptions thanks to an efficient XML filtering engine (YFilter)
Development activities
–web interface for configuring notifications
–new geographical view
–promoting users' feedback and collecting new requirements to improve the usability of the interface and ease access to the data of interest
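Content-based notification matches each incoming event against every subscriber's filter expression. The sketch below does this naively with the limited XPath subset supported by Python's `xml.etree.ElementTree`; the event layout, site names and subscriptions are hypothetical. It is not YFilter: YFilter combines all subscription path expressions into a single shared automaton so that an event is processed once regardless of the number of subscriptions, which is what makes thousands of subscriptions tractable. The per-subscription loop here is the baseline that approach improves on.

```python
import xml.etree.ElementTree as ET

# Hypothetical subscriptions: subscriber -> path expression over the event XML.
SUBSCRIPTIONS = {
    "alice": "service[@status='down']",
    "bob": "service[@type='gatekeeper'][@status='down']",
    "carol": "service[@site='SITE-B']",
}


def match(event_xml, subscriptions):
    """Return, sorted, the subscribers whose expression selects something in the event."""
    event = ET.fromstring(event_xml)
    return sorted(user for user, path in subscriptions.items()
                  if event.find(path) is not None)


EVENT = ('<event><service site="SITE-A" host="ce01.example.org" '
         'type="gatekeeper" status="down"/></event>')
print(match(EVENT, SUBSCRIPTIONS))  # ['alice', 'bob']
```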
GridICE release - deployment
GridICE deployment status
–Grid.it project
–GILDA
–LCG
–LCG South-West federation
–LCG South-East federation
–SEE-GRID
–E-grid project
–CMS experiment
–ATLAS experiment
egee007.cnaf.infn.it (development GridICE server) currently monitors EGEE+INFNGRID sites:
–GRISs, clusters, batch queues - O(100)
–collective services (CE, SE, RB, BDII, GC, ...)
GridICE from LCG 2.x to gLite
The four main steps of monitoring [3]
1. Generation of events
2. Processing
3. Distributing
4. Presenting
GridICE Architecture
[Figure: logical components (Resource, Site Publisher, GridICE Server, Consumer) and the roles they play (sensor, event collector, event provider, consumer, publisher), connected over the LAN within a site and over the WAN between publishers and consumers]
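The roles in the architecture above can be mapped onto a minimal object model: sensors produce events, the site publisher collects them and provides them, and the GridICE server consumes from many publishers and republishes to end consumers. All class names, metric names and values are hypothetical, and the transports between components are reduced to direct method calls.

```python
from dataclasses import dataclass


@dataclass
class Event:
    host: str
    metric: str
    value: float


class Sensor:
    """Resource side: measures one metric on one host via a probe function."""
    def __init__(self, host, metric, probe):
        self.host, self.metric, self.probe = host, metric, probe

    def measure(self):
        return Event(self.host, self.metric, self.probe())


class SitePublisher:
    """Event collector on the LAN and event provider towards the WAN."""
    def __init__(self):
        self.latest = {}

    def collect(self, event):
        self.latest[(event.host, event.metric)] = event  # keep newest value only

    def provide(self):
        return list(self.latest.values())


class GridICEServer:
    """Consumer of many site publishers and publisher for end consumers."""
    def __init__(self, publishers):
        self.publishers = publishers
        self.store = []

    def poll(self):
        self.store = [e for p in self.publishers for e in p.provide()]

    def query(self, metric):
        return [e for e in self.store if e.metric == metric]


site = SitePublisher()
for sensor in (Sensor("wn01", "load1", lambda: 0.7), Sensor("wn02", "load1", lambda: 1.2)):
    site.collect(sensor.measure())
server = GridICEServer([site])
server.poll()
print(server.query("load1"))
```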
Monitoring: generating events
Generation of events, that is, sensors querying entities and encoding the measurements according to a given schema
–need for a schema defining the structure and semantics of the events of interest
–attributes can be categorized according to their dynamics over time, which implies different timing for sensors
–passive/active measurement
–intrusive/non-intrusive measurement
This activity is typically referred to as METERING
Monitoring: generating events
Generation of events in GridICE:
–sensors: typically Perl scripts
–schema: GLUE Schema v1.1 plus the GridICE extension for jobs/daemons/network
–all sensors are executed periodically
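To illustrate how attribute dynamics translate into different sensor timings, here is a sketch of a periodic scheduler. The sensor names and periods are assumptions chosen only to contrast static, semi-static and dynamic attributes; they are not GridICE's actual sensors or periods.

```python
# Hypothetical sensors and their periods (seconds), chosen by attribute dynamics.
SENSOR_PERIODS = {
    "os_release": 86400,    # static: practically never changes
    "installed_sw": 3600,   # semi-static
    "job_states": 120,      # dynamic
    "cpu_load": 60,         # highly dynamic
}


def due_sensors(t, periods):
    """Names of the sensors to run at time t (seconds since start)."""
    return sorted(name for name, period in periods.items() if t % period == 0)


def schedule(horizon, step, periods):
    """Simulate a scheduler waking up every `step` seconds until `horizon`."""
    return {t: due_sensors(t, periods) for t in range(0, horizon + 1, step)}


for t, names in schedule(240, 60, SENSOR_PERIODS).items():
    print(t, names)
```

Giving slowly changing attributes long periods keeps the measurement overhead (and thus intrusiveness) low without losing freshness where it matters.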
Monitoring: distributing
Distribution refers to the transmission of events from the source to any interested parties
–consumer patterns may vary from sparse interactions to long-lived subscriptions for receiving a constant stream of events
–data delivery model [4]: push vs. pull; periodic vs. aperiodic; unicast vs. 1-to-N
–security: certain scenarios may require a monitoring service to support security services (e.g., access control, single or mutual authentication of parties, and secure transport)
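The push and pull delivery models can be contrasted on a single producer: in pull, the consumer asks and gets the latest value; in push, the consumer subscribes once and the producer calls it back on each new event. The `Producer` class, its method names and the `wn01/load1` key are hypothetical.

```python
class Producer:
    """A source of monitoring events supporting both delivery models."""

    def __init__(self):
        self.latest = {}
        self.subscribers = []

    def pull(self, key):
        """Pull: the consumer asks, the producer answers once."""
        return self.latest.get(key)

    def subscribe(self, callback):
        """Push: register a long-lived subscription."""
        self.subscribers.append(callback)

    def publish(self, key, value):
        """New event: store it and push it to every subscriber."""
        self.latest[key] = value
        for callback in self.subscribers:
            callback(key, value)


producer = Producer()
received = []
producer.subscribe(lambda k, v: received.append((k, v)))  # push consumer
producer.publish("wn01/load1", 0.7)                         # aperiodic event
print(producer.pull("wn01/load1"), received)
```

Here `publish` is called whenever an event occurs (aperiodic); driving it from a timer would give the periodic variant, and registering several callbacks turns unicast into 1-to-N delivery.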
Monitoring: processing
Processing of generated events is application-specific and may take place during any stage of the monitoring process
–e.g., filtering, aggregation, grouping
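Filtering, grouping and aggregation can be illustrated on a handful of job events. The event fields (`vo`, `state`, `cpu`) and their values are made up; the point is only the three kinds of processing named above.

```python
from collections import Counter, defaultdict

# Hypothetical job events: VO, state and consumed CPU seconds.
JOBS = [
    {"vo": "cms", "state": "running", "cpu": 120},
    {"vo": "cms", "state": "queued", "cpu": 0},
    {"vo": "atlas", "state": "running", "cpu": 300},
    {"vo": "atlas", "state": "running", "cpu": 60},
]


def running_jobs(jobs):
    """Filtering: keep only the events of interest."""
    return [j for j in jobs if j["state"] == "running"]


def jobs_per_vo(jobs):
    """Grouping: number of events per VO."""
    return Counter(j["vo"] for j in jobs)


def cpu_per_vo(jobs):
    """Aggregation: total CPU time per VO."""
    totals = defaultdict(int)
    for j in jobs:
        totals[j["vo"]] += j["cpu"]
    return dict(totals)


running = running_jobs(JOBS)
print(jobs_per_vo(running), cpu_per_vo(running))
```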
Monitoring: presenting
Presentation
–an overwhelming number of events is available to the final consumer
–a key feature of a monitoring system is to provide a series of abstractions that enable end users to draw conclusions about the operation of the monitored system
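As an example of such an abstraction, the sketch below reduces per-service status events to a single label per site. The three labels and the rule (all up: OK, some up: DEGRADED, none up: DOWN) are assumptions for illustration, not GridICE's presentation logic.

```python
from collections import defaultdict


def site_overview(service_events):
    """Reduce (site, service, status) events to one label per site."""
    by_site = defaultdict(list)
    for site, _service, status in service_events:
        by_site[site].append(status == "up")
    overview = {}
    for site, ups in by_site.items():
        if all(ups):
            overview[site] = "OK"
        elif any(ups):
            overview[site] = "DEGRADED"
        else:
            overview[site] = "DOWN"
    return overview


# Hypothetical events.
EVENTS = [
    ("SITE-A", "CE", "up"), ("SITE-A", "SE", "up"),
    ("SITE-B", "CE", "up"), ("SITE-B", "SE", "down"),
    ("SITE-C", "CE", "down"),
]
print(site_overview(EVENTS))
```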
GridICE Architecture: GridICE on LCG 2
[Figure: the logical components (Resource, Site Publisher, GridICE Server, Consumer) instantiated on LCG 2. Components shown: Lemon agent, Lemon server, sensor scripts, MDS GRIS, LDAP client, HTTP (HTML/XML) and notification service (NS) towards browsers and application consumers. Data delivery models: pull, periodic, unicast; push, periodic, unicast; XML: pull, aperiodic, unicast; NS: push, aperiodic, unicast]
Towards gLite
Addressing security!
The security of monitoring data is becoming a high-priority requirement
Required changes:
–enable site managers to filter outgoing monitoring data based on the Grid credentials of the requestor
–in particular, support for VOMS extensions so that, for instance, “a super-user of CMS can collect all information about jobs for its VO”
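The filtering policy quoted above might look like this on the producer side. This sketch assumes that the requestor's DN and VOMS FQANs (of the form `/vo/Role=role/Capability=...`) have already been extracted from a verified proxy, which is out of scope here; the privileged role name `superuser`, the job records and the DNs are hypothetical.

```python
# Hypothetical name of the VOMS role granting VO-wide visibility.
PRIVILEGED_ROLES = {"superuser"}


def parse_fqan(fqan):
    """Split an FQAN such as '/cms/Role=superuser/Capability=NULL' into (vo, role)."""
    parts = [p for p in fqan.split("/") if p]
    vo, role = parts[0], None
    for part in parts[1:]:
        if part.startswith("Role="):
            role = part[len("Role="):]
    return vo, (None if role == "NULL" else role)


def visible_jobs(jobs, requestor_dn, fqans):
    """Jobs the requestor may see: its own, plus all jobs of VOs where it is privileged."""
    privileged_vos = set()
    for fqan in fqans:
        vo, role = parse_fqan(fqan)
        if role in PRIVILEGED_ROLES:
            privileged_vos.add(vo)
    return [j for j in jobs if j["owner"] == requestor_dn or j["vo"] in privileged_vos]


JOBS = [
    {"id": "j1", "vo": "cms", "owner": "/DC=org/DC=example/CN=Alice"},
    {"id": "j2", "vo": "cms", "owner": "/DC=org/DC=example/CN=Bob"},
    {"id": "j3", "vo": "atlas", "owner": "/DC=org/DC=example/CN=Carol"},
]
print([j["id"] for j in visible_jobs(JOBS, "/DC=org/DC=example/CN=Bob",
                                     ["/cms/Role=superuser/Capability=NULL"])])
print([j["id"] for j in visible_jobs(JOBS, "/DC=org/DC=example/CN=Carol",
                                     ["/atlas/Role=NULL/Capability=NULL"])])
```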
Possible scenario
[Figure: the final consumer connects to the GridICE server via HTTPS with a user proxy certificate]
–secure connection between the final consumer and the GridICE server
–filtering of data based on user identity/role/VO
Possible scenario
[Figure: the GridICE server generates a valid proxy through the ATLAS and CMS VOMS servers and uses it to query resources]
–secure connection between the GridICE server and the producers of monitoring data
Towards gLite
Losing a common Grid Information Service
Globus MDS 2 is no longer “the Grid Information Service”
Need to deal with different producers of information
–different data models, different interfaces
Required changes:
–GridICE server: plugin system to support different producers of data
–Site Publisher: replace the MDS GRIS with a new service, possibly a gLite component
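A plugin system for heterogeneous producers can be sketched as a registry of adapters, each translating one producer's data model into a common record. `GlueCEUniqueID` and `GlueCEStateFreeCPUs` are GLUE 1.x attribute names as published over LDAP; the XML layout, the registry and the function names are hypothetical and do not describe the format of CEMon or any other gLite service.

```python
import xml.etree.ElementTree as ET

ADAPTERS = {}  # producer kind -> function yielding common records


def adapter(kind):
    """Register a plugin translating one producer's data model."""
    def register(fn):
        ADAPTERS[kind] = fn
        return fn
    return register


@adapter("ldap")
def from_ldap(entries):
    # Entries as (dn, {attribute: [values]}) pairs, with string values.
    for _dn, attrs in entries:
        yield {"ce": attrs["GlueCEUniqueID"][0],
               "free_cpus": int(attrs["GlueCEStateFreeCPUs"][0])}


@adapter("xml")
def from_xml(text):
    # Hypothetical XML layout; not the actual format of any gLite service.
    for ce in ET.fromstring(text).iter("CE"):
        yield {"ce": ce.get("id"), "free_cpus": int(ce.findtext("FreeCPUs"))}


def ingest(kind, payload):
    """Dispatch a payload to the plugin registered for its producer kind."""
    if kind not in ADAPTERS:
        raise ValueError("no plugin for producer kind: " + kind)
    return list(ADAPTERS[kind](payload))


CE_ID = "ce01.example.org:2119/jobmanager-pbs-short"
LDAP_DATA = [("GlueCEUniqueID=" + CE_ID + ",mds-vo-name=local,o=grid",
              {"GlueCEUniqueID": [CE_ID], "GlueCEStateFreeCPUs": ["4"]})]
XML_DATA = '<CEs><CE id="' + CE_ID + '"><FreeCPUs>4</FreeCPUs></CE></CEs>'
print(ingest("ldap", LDAP_DATA))
print(ingest("xml", XML_DATA))
```

Adding a new producer then means registering one more adapter, while the rest of the server keeps working on the common record.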
Towards gLite
Avoiding the proliferation of sensors
More and more applications have their own metering service
Performance issue:
–intrusive measurements can affect the efficiency of applications (e.g., the batch system)
Privacy issue:
–not all sites allow access to their log files
Collaborate on common metering systems (see the GridICE-DGAS use case)
Make agreements with sites on a common strategy to access the data of interest
GridICE and DGAS
Common metering for Grid jobs
DGAS is an accounting system and is therefore interested in the usage-related parameters of a job after its execution
GridICE is a monitoring system and is therefore interested in job-related information from the moment the job is created in the queue
–this information should be updated frequently and provided to users while respecting security concerns
[Figure: job states (queued, running, aborted, deleted, executed) and the parts of the job lifecycle of interest to GridICE and to DGAS]
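The common-metering idea can be sketched as a single metering point feeding both consumers: monitoring receives every job update from the moment the job is queued, while accounting receives only the final usage record. The class and field names are hypothetical, and treating only `executed` as accounted is an assumption of this sketch; the slide does not state how aborted or deleted jobs are accounted.

```python
# Assumption: accounting needs only jobs that finished executing.
ACCOUNTED_STATES = {"executed"}


class CommonMeter:
    """One metering point feeding both a monitoring and an accounting consumer."""

    def __init__(self, monitoring_sink, accounting_sink):
        self.monitoring_sink = monitoring_sink
        self.accounting_sink = accounting_sink

    def report(self, job_id, state, usage):
        record = {"job": job_id, "state": state, **usage}
        self.monitoring_sink(record)        # monitoring: every update, from queued on
        if state in ACCOUNTED_STATES:
            self.accounting_sink(record)    # accounting: final usage only


monitoring, accounting = [], []
meter = CommonMeter(monitoring.append, accounting.append)
meter.report("j1", "queued", {"cpu_s": 0})
meter.report("j1", "running", {"cpu_s": 42})
meter.report("j1", "executed", {"cpu_s": 97})
print(len(monitoring), accounting)
```

Measuring once and dispatching to both systems is what avoids running two intrusive sensors on the same batch system.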
GridICE-DGAS metering integration
[Figure: components involved: WMS, CE node, LRMS, Worker Node, job wrapper, gianduia (WN API call), job execution, CE push, DGAS, CEMon, JobMon]
GridICE on gLite
[Figure: the logical components (Resource, Site Publisher, GridICE Server, Consumer) instantiated on gLite. Components shown: Lemon agent, Lemon server, CEMon, sensor scripts, R-GMA, MDS2 consumers, HTTP (HTML/XML) and notification service (NS) towards browsers and application consumers. Data delivery models: pull, periodic, unicast; push, periodic, unicast; XML: pull, aperiodic, unicast; NS: push, aperiodic, unicast]
GridICE Server
[Figure: server components (Persistent storage, Discovery, Consumers, Scheduler, XSLT->HTML, XML abstraction, XML Notification, S. Charts), highlighting the components that need to be revised when migrating to gLite]
Summary
Feature | GridICE in LCG 2.x | GridICE in gLite
Schema | GLUE 1.1++ | GLUE 1.1++/GLUE
Local area distribution | Lemon UDP/TCP | Lemon UDP/TCP
Site publisher | MDS GRIS (LDAP), no security | CEMon/R-GMA, http+gsi+proxy+voms ext.
Discovery | BDII (LDAP) | Service Discovery API
Wide area data distribution | LDAP/pull | SOAP/pull (push)
Notification | fixed number of events | content-based subscription
Timeline for gLite integration
Phase 0 (end of July ’05)
–DGAS/GridICE metering integration for Grid jobs
–replacement of the GridICE GRIS with CEMon (secure transport)
–GridICE server supporting both LCG and gLite resources
Phase 1 (Sep/Oct ’05)
–discovery based on the service discovery API
–CEMon with authorization based on VOMS extensions to the proxy
–GridICE server restricting access to monitoring data based on the user’s proxy certificate, and using the server certificate to create a VOMS-based proxy in order to query the publishers of data
CONCLUSION
GridICE is a stable monitoring tool for the LCG production system
The evolution of LCG into gLite:
–requires some changes in the producer interface
–makes re-engineering necessary, which will be used as an opportunity to satisfy new requirements and improve the current system, e.g.:
   re-examining the current monitoring schema
   optimizing the metering service
References
[1] S. Andreozzi, N. De Bortoli, S. Fantinel, A. Ghiselli, G. L. Rubini, G. Tortone, M. C. Vistoli, GridICE: a monitoring service for Grid systems, Future Generation Computer Systems
[2] S. Andreozzi, N. De Bortoli, S. Fantinel, G. L. Rubini, G. Tortone, Design and Implementation of a Notification Model for Grid Monitoring Events, CHEP04, Interlaken (CH), Sep. 2004
[3] S. Zanikolas, R. Sakellariou, A taxonomy of grid monitoring systems, Future Generation Computer Systems 21 (2005) 163–188
[4] M. Franklin, S. Zdonik, “Data In Your Face”: Push Technology in Perspective, ACM SIGMOD ’98, Seattle, WA, USA