1 EGEE-II INFSO-RI-031688 Enabling Grids for E-sciencE www.eu-egee.org EGEE and gLite are registered trademarks The Grid Observatory: goals and challenges C. Germain-Renaud (CNRS/LRI & LAL) EGEE’07 Conference Budapest, Hungary 1-5 October 2007

2 Enabling Grids for E-sciencE EGEE-II INFSO-RI-031688 Application Track - Grid Observatory
Overview
NA4 cluster in the EGEE-III proposal
Integrate the collection of data on the behaviour of the EGEE grid and users with the development of models and of an ontology for the domain knowledge

3 Some immediate questions
Resource allocation
– Performance of the gLite scheduling hierarchy
– Published waiting time
– Reactive grids – Everybody's grid
Dimensioning
– Patterns and trends in requests and usage
– Anticipate peaks
On-line fault management
– Detection
– Diagnosis
– Prevention

4 The big picture
Considering current technologies, we expect that the total number of device administrators will exceed 220 million by 2010 – Gartner, June 2001
No more Moore’s Law free lunch: much more complex software and applications
The Virtual Organization concept creates common goods

5 Autonomic Computing
Computing systems that manage themselves in accordance with high-level objectives from humans. Kephart & Chess, The Vision of Autonomic Computing, IEEE Computer, 2003
– Self-*: configuration, optimization, healing, protection
– Of open, non-steady-state dynamic systems

6 Autonomic Computing
Computing systems that manage themselves in accordance with high-level objectives from humans. Kephart & Chess, The Vision of Autonomic Computing, IEEE Computer, 2003
– Self-*: configuration, optimization, healing, protection
– Of open, non-steady-state dynamic systems
– Academia and industry involved

7 Autonomic Grids
Statistical analysis, data mining, machine learning
Control loop: monitor, analyze, plan, execute, around shared knowledge
DATA REQUIRED
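
The monitor, analyze, plan, execute loop around shared knowledge named on this slide can be sketched in a few lines. This is only an illustrative sketch: the `Knowledge` fields, the job-status strings, the 0.5 threshold and the `drain_site` action are hypothetical names chosen for the example; they are not part of gLite or of the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Knowledge:
    """Shared state read and updated by every phase (hypothetical fields)."""
    failure_threshold: float = 0.5
    history: list = field(default_factory=list)


def monitor(events):
    """Monitor: reduce raw job statuses to one metric, the failure rate."""
    if not events:
        return 0.0
    return sum(1 for status in events if status == "failed") / len(events)


def analyze(rate, knowledge):
    """Analyze: record the metric and flag a symptom above the threshold."""
    knowledge.history.append(rate)
    return rate > knowledge.failure_threshold


def plan(symptom):
    """Plan: map a symptom to corrective actions (hypothetical action name)."""
    return ["drain_site"] if symptom else []


def execute(actions, action_log):
    """Execute: here the actions are only logged, not applied."""
    action_log.extend(actions)


def mape_k_step(events, knowledge, action_log):
    """Run one pass of the loop over a batch of observed job statuses."""
    rate = monitor(events)
    symptom = analyze(rate, knowledge)
    actions = plan(symptom)
    execute(actions, action_log)
    return actions
```

Each phase only communicates through its return value and the shared `Knowledge` object, which is where the statistical or learned models mentioned on the slide would live.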

8 Data Collection and Publication
Acquisition, consolidation, long-term conservation of traces of EGEE activities
– Permanent storage of reliable, exhaustive, filtered information
– Exhaustive: added value in snapshots of the inputs and grid state, e.g. workload and available services during a relevant time range
– Filtered: from operational to structured
No join! L&B schema

9 Data Collection and Publication
Acquisition, consolidation, long-term conservation of traces of EGEE activities
– Permanent storage of reliable, exhaustive, filtered information: from operational to structured
– No monitoring development: rich ecosystem of sources, with very different scopes, deployment and institutional status
– Centralized CIC tools (GOCDB, SAM, SFT, …), core gLite (L&B, BDII, …), sites (Maui/PBS logs), gLite integrators (R-GMA, Job Provenance), experiment integrators (Dashboard), external software (MonALISA)

10 Data Collection and Publication
Acquisition, consolidation, long-term conservation of traces of EGEE activities
– Permanent storage of reliable, exhaustive, filtered information: from operational to structured
– No monitoring development: rich ecosystem of sources, with very different scopes, deployment and institutional status
The major challenge is exhaustiveness
– Some data are outside the scope: external traffic on shared resources
– Inside the scope, we need snapshots of the grid state and inputs
– Privacy-related legal constraints
– Scientific usage will help
– Interaction with EGI
– Long-term: privacy-preserving data mining

11 Data Collection and Publication
Publication service: navigation and querying
– Integration of independent sources
– Indexing according to the needs of the user communities
  • Scheduling: ongoing work with CoreGrid
  • Jobs: ongoing work with KD-Ubiq
Ontology
– The Glue Information Model: an ontology of the resources
– Concepts for the grid dynamics, e.g. job lifecycle or user relations
– Expert concepts as prior knowledge of non-trivial correlations: workflows, failure modes, …
Diagram labels: Resource, Job

12 Models
Intrinsic characterizations of «grid traffic»: distributions of, e.g., job arrival rate, running time, application data locality
– Likely to be similar to IP traffic: many short, and a significant number of long, at all scales
– Long-range dependencies

13 Models
Intrinsic characterizations of «grid traffic»: distributions of, e.g., job arrival rate, running time, application data locality
– Likely to be similar to IP traffic: many short, and a significant number of long, at all scales
– Long-range dependencies
Characterizations of middleware-dependent metrics, e.g. queuing delays, overhead, SE load

14 Models
Intrinsic characterizations of «grid traffic»: distributions of, e.g., job arrival rate, running time, application data locality
– Likely to be similar to IP traffic: many short, and a significant number of long, at all scales
– Long-range dependencies
Characterizations of middleware-dependent metrics, e.g. queuing delays, SE load
Inference of models for middleware components and applications, users and usage profiles, user interactions
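
"Many short, and a significant number of long, at all scales" is the usual signature of a heavy-tailed distribution. One common way to quantify this, not named in the talk and used here only as an illustration, is the Hill estimator of the tail index. The sketch below applies it to synthetic Pareto-distributed running times; the function names, the sample size and the choice of `k` are assumptions made for the example, and no real EGEE data is involved.

```python
import math
import random


def hill_tail_index(samples, k):
    """Hill estimate of the tail index, using the k largest samples."""
    if not 0 < k < len(samples):
        raise ValueError("k must satisfy 0 < k < len(samples)")
    ordered = sorted(samples, reverse=True)
    threshold = ordered[k]  # the (k+1)-th largest value
    if threshold <= 0:
        raise ValueError("the k+1 largest samples must be positive")
    log_excess = sum(math.log(x / threshold) for x in ordered[:k])
    if log_excess == 0:
        raise ValueError("the k largest samples are all equal")
    return k / log_excess


def synthetic_runtimes(n, alpha, seed=0):
    """Synthetic stand-in for job running times: Pareto with tail index alpha."""
    rng = random.Random(seed)
    return [rng.paretovariate(alpha) for _ in range(n)]


if __name__ == "__main__":
    runtimes = synthetic_runtimes(20000, alpha=1.5)
    print(hill_tail_index(runtimes, k=2000))
```

A tail index of 2 or less would mean that the variance of the running time is infinite, which is the regime where averages over short observation windows become unreliable. The estimate depends on the choice of `k`, so in practice it is read over a range of values rather than at a single one.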

15 Autonomic dependability
On-line failure detection and anticipation
Passive vs. active probing: a lot of information is available from user work
Black-box
– On-line statistics from «similar» actions (executions, data access, middleware modules)

16 Evaluation
Assessing performance at the grid scale is a challenge
– Need a snapshot of the inputs and grid state, e.g. workload and available services during a relevant time range
– Classical optimization does not scale
– Advanced optimization: anytime algorithms

17 Abrupt changepoint detection
Page-Hinkley statistics
Time-sequential version of Wald’s statistics, also known as CUSUM
«Intelligent threshold» test which minimizes the expected time before a change detection for a fixed false positive rate
Routine in quality control, clinical trials
Plot labels: VO software bug, Blackhole
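
A minimal sketch of the Page-Hinkley test for an upward shift in the mean of a stream, for instance a stream of 0/1 job-failure indicators. The class and function names, the tolerance `delta` and the alarm `threshold` are illustrative assumptions, not values taken from the talk; in practice the threshold would be tuned to the acceptable false positive rate.

```python
class PageHinkley:
    """Page-Hinkley test for an increase in the mean of a data stream."""

    def __init__(self, delta=0.05, threshold=5.0):
        self.delta = delta          # tolerated deviation from the running mean
        self.threshold = threshold  # alarm level for the test statistic
        self.count = 0
        self.mean = 0.0
        self.cumulative = 0.0       # cumulated deviation m_t
        self.minimum = 0.0          # running minimum M_t of m_t

    def update(self, value):
        """Feed one observation; return True if a change is signalled."""
        self.count += 1
        self.mean += (value - self.mean) / self.count
        self.cumulative += value - self.mean - self.delta
        self.minimum = min(self.minimum, self.cumulative)
        return self.cumulative - self.minimum > self.threshold


def detect_change(stream, delta=0.05, threshold=5.0):
    """Return the index of the first alarm in the stream, or None."""
    detector = PageHinkley(delta, threshold)
    for index, value in enumerate(stream):
        if detector.update(value):
            return index
    return None


if __name__ == "__main__":
    # 50 successful jobs followed by 50 failures, e.g. a site turning into a blackhole.
    failures = [0] * 50 + [1] * 50
    print(detect_change(failures))
```

The statistic is the gap between the cumulated deviation and its running minimum, so it stays at zero while the stream is stable and grows only after the mean shifts. A larger `threshold` lowers the false positive rate at the cost of a longer detection delay.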

18 Autonomic dependability
On-line failure detection and anticipation
Passive vs. active probing: a lot of information is available from user work
Black-box
– On-line statistics from «similar» actions (executions, data access, middleware modules)
Supervised and unsupervised learning

19 Mining the L&B logs
Constructive induction
Double clustering

20 Autonomic dependability
On-line failure detection and anticipation
Passive vs. active probing: a lot of information is available from user work
Black-box
– On-line statistics from «similar» actions (executions, data access, middleware modules)
Supervised and unsupervised learning
Active probing
– Adaptive on-line test selection for best coverage of possibly faulty components
– Experiment planning
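
Selecting tests "for best coverage of possibly faulty components" can be illustrated with a greedy set-cover heuristic: repeatedly pick the probe that touches the most still-unchecked suspect components. This is a sketch under assumptions and not the method of the talk; the probe names, the component names and the `budget` parameter are hypothetical.

```python
def select_probes(probes, suspects, budget):
    """Greedy choice of probes covering as many suspect components as possible.

    probes   -- dict mapping a probe name to the set of components it exercises
    suspects -- iterable of components currently suspected to be faulty
    budget   -- maximum number of probes that may be launched
    Returns (chosen probe names in order, suspects left uncovered).
    """
    remaining = set(suspects)
    chosen = []
    while remaining and len(chosen) < budget:
        candidates = [name for name in probes if name not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda name: len(probes[name] & remaining))
        if not probes[best] & remaining:
            break  # no remaining probe covers any uncovered suspect
        chosen.append(best)
        remaining -= probes[best]
    return chosen, remaining


if __name__ == "__main__":
    probes = {
        "ping_ce": {"CE1"},
        "submit_job": {"WMS", "CE1", "WN1"},
        "copy_file": {"SE1", "WN1"},
    }
    chosen, uncovered = select_probes(probes, {"WMS", "CE1", "WN1", "SE1"}, budget=2)
    print(chosen, uncovered)
```

An adaptive version would recompute `suspects` from the outcome of each probe before choosing the next one, which is where the on-line statistics from passive observation would feed in.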

21 Goals & Challenges
Contributions to a quantitative approach to grid middleware and architecture, in the RISC sense
Operational impacts on EGEE: evaluation, autonomic dependability
Basic research in autonomic computing
Collaboration between EGEE, national research initiatives and other EU projects: DEMAIN, PASCAL, KD-Ubiq, CoreGrid, and hopefully more
Adequate tradeoff between productivity and sustainability

