EGEE-II INFSO-RI-031688 Enabling Grids for E-sciencE www.eu-egee.org EGEE and gLite are registered trademarks The Grid Observatory: goals and challenges.

Presentation transcript:

EGEE-II INFSO-RI Enabling Grids for E-sciencE
EGEE and gLite are registered trademarks
The Grid Observatory: goals and challenges
C. Germain-Renaud (CNRS/LRI & LAL)
EGEE’07 Conference, Budapest, Hungary, 1-5 October 2007

Enabling Grids for E-sciencE EGEE-II INFSO-RI Application Track - Grid Observatory
2 Overview
NA4 cluster in EGEE-III proposal
Integrate the collection of data on the behaviour of the EGEE grid and users with the development of models and of an ontology for the domain knowledge

3 Some immediate questions
Resource allocation
–Performance of the gLite scheduling hierarchy
–Published waiting time
–Reactive grids – Everybody's grid
Dimensioning
–Patterns and trends in requests and usage
–Anticipate peaks
On-line fault management
–Detection
–Diagnosis
–Prevention

4 The big picture
Considering current technologies, we expect that the total number of device administrators will exceed 220 million by 2010 – Gartner June 2001
No more Moore’s Law free lunch: much more complex software & applications
The Virtual Organization concept creates common goods

6 Autonomic Computing
Computing systems that manage themselves in accordance with high-level objectives from humans. Kephart & Chess, The Vision of Autonomic Computing, IEEE Computer 2003
–Self-*: configuration, optimization, healing, protection
–Of open, non-steady-state dynamic systems
–Academia and industry involved

7 Autonomic Grids
Statistical analysis, data mining, machine learning
[Diagram: control loop of monitor, analyze, plan, execute around shared knowledge]
DATA REQUIRED
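
The monitor / analyze / plan / execute loop around shared knowledge can be sketched in a few lines. This is only an illustration under stated assumptions, not part of the talk: the observation is taken to be a queue waiting time, the "overload" rule (latest value above `factor` times the past mean) and the action name `divert_jobs` are hypothetical, and so are all function names.

```python
from statistics import mean


def monitor(knowledge, observation):
    """Monitor: record the new observation in the shared knowledge."""
    knowledge["history"].append(observation)


def analyze(knowledge):
    """Analyze: compare the latest observation with the past mean (assumed rule)."""
    history = knowledge["history"]
    if len(history) < 2:
        return None  # not enough data to say anything
    baseline = mean(history[:-1])
    if history[-1] > knowledge["factor"] * baseline:
        return "overload"
    return None


def plan(symptom):
    """Plan: map a symptom to a list of actions (hypothetical action name)."""
    return ["divert_jobs"] if symptom == "overload" else []


def execute(actions, knowledge):
    """Execute: here we only log the actions; a real system would apply them."""
    knowledge["actions_taken"].extend(actions)


def mape_k_step(knowledge, observation):
    """One pass through the loop; returns the actions decided in this pass."""
    monitor(knowledge, observation)
    actions = plan(analyze(knowledge))
    execute(actions, knowledge)
    return actions


if __name__ == "__main__":
    knowledge = {"history": [], "factor": 2.0, "actions_taken": []}
    for waiting_time in [10, 12, 11, 40]:
        print(waiting_time, mape_k_step(knowledge, waiting_time))
```

Every stage reads from or writes to the same `knowledge` dictionary, which is why the slide stresses that data are required before any analysis or planning can happen.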

8 Data Collection and Publication
Acquisition, consolidation, long-term conservation of traces of EGEE activities
–Permanent storage of reliable, exhaustive, filtered information
–Exhaustive: added value in snapshots of the inputs and grid state e.g. workload and available services during a relevant time range
–Filtered: from operational to structured
No join! L&B schema

9 Data Collection and Publication
Acquisition, consolidation, long-term conservation of traces of EGEE activities
–Permanent storage of reliable, exhaustive, filtered information: from operational to structured
–No monitoring development: rich ecosystem of sources, with very different scopes, deployment and institutional status
–Centralized CIC tools (GOCDB, SAM, SFT,…), core gLite (L&B, BDII,…), sites (Maui/PBS logs), gLite integrators (R-GMA, Job Provenance), experiment integrators (DashBoard), external software (MonALISA)

10 Data Collection and Publication
Acquisition, consolidation, long-term conservation of traces of EGEE activities
–Permanent storage of reliable, exhaustive, filtered information: from operational to structured
–No monitoring development: rich ecosystem of sources, with very different scopes, deployment and institutional status
The major challenge is exhaustiveness
–Some data are outside the scope: external traffic on shared resources
–Inside the scope, we need snapshots of the grid state and inputs
–Privacy-related legal constraints
–Scientific usage will help
–Interaction with EGI
–Long-term: privacy-preserving data mining

11 Data Collection and Publication
Publication service: navigation and querying
–Integration of independent sources
–Indexing according to the needs of the user communities
  –Scheduling: ongoing work with CoreGrid
  –Jobs: ongoing work with KD-Ubiq
Ontology
–The Glue Information Model: an ontology of the resources
–Concepts for the grid dynamics e.g. job lifecycle or user relations
–Expert concepts as prior knowledge of non-trivial correlations: workflows, failure modes,…
[Diagram labels: Resource, Job]

13 Models
Intrinsic characterizations of «grid traffic»: (distribution of) e.g. job arrival rate, running time, application data locality
–Likely to be similar to IP traffic: many short jobs and a significant number of long ones, at all scales
–Long range dependencies
Characterizations of middleware-dependent metrics e.g. queuing delays, overhead, SE load

14 Models
Intrinsic characterizations of «grid traffic»: (distribution of) e.g. job arrival rate, running time, application data locality
–Likely to be similar to IP traffic: many short jobs and a significant number of long ones, at all scales
–Long range dependencies
Characterizations of middleware-dependent metrics e.g. queuing delays, SE load
Inference of models for middleware components and applications, users and usage profiles, user interactions

15 Autonomic dependability
On-line failure detection and anticipation
Passive vs Active probing: a lot of information is available from user work
Black-box
–On-line statistics from «similar» actions (executions, data access, middleware modules)

16 Evaluation
Assessing performance at the grid scale is a challenge
–Need a snapshot of the inputs and grid state e.g. workload and available services during a relevant time range
–Classical optimization does not scale
–Advanced optimization: anytime algorithms

17 Abrupt changepoint detection
Page-Hinkley statistics
Time-sequential version of Wald’s statistics – also known as CUSUM
«Intelligent threshold» test which minimizes the expected time before a change detection for a fixed false positive rate
Routine in quality control, clinical trials
[Figure labels: VO software bug, Blackhole]
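
As an illustration of the test named on this slide, here is a minimal sketch of the textbook Page-Hinkley detector for an upward shift of the mean. It is not the code used for the results in the talk. The tolerance `delta`, the alarm threshold `threshold`, and the helper `first_alarm` are assumptions chosen for the example; in practice the threshold is tuned to the acceptable false positive rate.

```python
from dataclasses import dataclass


@dataclass
class PageHinkley:
    """Textbook Page-Hinkley test for an increase of the mean of a stream."""
    delta: float = 0.05      # tolerated drift magnitude (assumed value)
    threshold: float = 5.0   # alarm threshold, often written lambda (assumed value)
    n: int = 0               # number of observations seen so far
    mean: float = 0.0        # running mean of the observations
    cum: float = 0.0         # cumulative deviation m_t
    cum_min: float = 0.0     # running minimum M_t of the cumulative deviation

    def update(self, x: float) -> bool:
        """Feed one observation; return True when a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n   # incremental mean
        self.cum += x - self.mean - self.delta  # m_t = m_{t-1} + (x_t - mean_t - delta)
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold


def first_alarm(values, delta=0.05, threshold=5.0):
    """Return the index of the first alarm, or None if no change is detected."""
    detector = PageHinkley(delta=delta, threshold=threshold)
    for i, x in enumerate(values):
        if detector.update(x):
            return i
    return None


if __name__ == "__main__":
    # Hypothetical failure indicator: 0 while jobs succeed, 1 once they start failing.
    stream = [0.0] * 50 + [1.0] * 50
    print(first_alarm(stream))
```

The detector keeps only four numbers per monitored stream, which is what makes it usable on-line, for instance one instance per site or per VO.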

18 Autonomic dependability
On-line failure detection and anticipation
Passive vs Active probing: a lot of information is available from user work
Black-box
–On-line statistics from «similar» actions (executions, data access, middleware modules)
Supervised and unsupervised learning

19 Mining the L&B logs
Constructive induction
Double clustering

20 Autonomic dependability
On-line failure detection and anticipation
Passive vs Active probing: a lot of information is available from user work
Black-box
–On-line statistics from «similar» actions (executions, data access, middleware modules)
Supervised and unsupervised learning
Active probing
–Adaptive on-line test selection for best coverage of possibly faulty components
–Experiment planning
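
One simple way to read "test selection for best coverage" is as a greedy set-cover problem: each probe exercises a known set of components, and probes are picked one at a time to cover as many still-suspect components as possible. The sketch below assumes exactly that; it is not the method of the talk. The probe names and the component sets are hypothetical. An adaptive variant would re-run the selection after each probe outcome has updated the suspect set.

```python
def select_probes(probes, suspects, budget):
    """Greedy choice of at most `budget` probes covering the suspect components.

    probes   -- dict mapping a probe name to the set of components it exercises
    suspects -- set of components currently considered possibly faulty
    Returns (chosen probe names in order, suspects left uncovered).
    """
    remaining = set(suspects)
    chosen = []
    while remaining and len(chosen) < budget:
        # Pick the unused probe that covers the most still-uncovered suspects.
        best = max(
            (name for name in probes if name not in chosen),
            key=lambda name: len(probes[name] & remaining),
            default=None,
        )
        if best is None or not probes[best] & remaining:
            break  # no probe left that adds coverage
        chosen.append(best)
        remaining -= probes[best]
    return chosen, remaining


if __name__ == "__main__":
    # Hypothetical probes and the grid services each one touches.
    probes = {
        "submit_job": {"WMS", "CE", "LB"},
        "copy_file": {"SE", "BDII"},
        "query_info": {"BDII"},
    }
    print(select_probes(probes, {"WMS", "CE", "SE", "BDII"}, budget=2))
```

Greedy selection does not guarantee the smallest possible probe set, but it is cheap enough to recompute every time the suspect list changes.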

21 Goals & Challenges
Contributions to a quantitative approach to grid middleware and architecture, in the RISC sense
Operational impacts on EGEE: evaluation, autonomic dependability
Basic research in autonomic computing
Collaboration between EGEE and national research initiatives and other EU projects: DEMAIN, PASCAL, KD-Ubiq, CoreGrid, and hopefully more
Adequate tradeoff between productivity and sustainability