EGEE-III INFSO-RI-222667, Enabling Grids for E-sciencE, www.eu-egee.org. EGEE and gLite are registered trademarks.


Mining Job Monitoring Data
Automatic Error Source Detection of Grid Job Failures using Data Mining Techniques
Gerhild Maier
September 24th, 2008

Problem Description
• We have ...
  – a lot of information about jobs in the Dashboard database
  – exit codes
  – many tools to monitor jobs
• We don't have ...
  – a clear classification of all exit codes; application exit codes are sometimes misleading
• We want ...
  – to look at the underlying problem
  – automatic detection of the error source, i.e. the problematic Grid component
  – a generic tool for all big LHC experiments
  – a simple tool that needs little specification from the user

Approach
• Step 1: data preprocessing
  – How much job information?
  – How many data sets?
• Step 2: data mining
  – Supervised or unsupervised method?
  – Clustering? Classification? Decision tree? Association rules?
• Step 3: output representation
  – Where to present the output?
  – Textual or graphical representation?

Step 1: data preprocessing
• consider six job characteristics
  – username
  – site
  – computing element
  – storage element
  – filename
  – exit code
• good/bad classification with Support Vector Machines
• select job information over a two-day period
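The selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the key names (`ce`, `se`, `finished` and so on) and the sample values are hypothetical, since the talk does not give the Dashboard database schema. The Support Vector Machine classification is not shown.

```python
from datetime import datetime, timedelta

# The six job characteristics named on the slide. The key names are
# hypothetical; the real Dashboard column names are not given in the talk.
ATTRIBUTES = ("username", "site", "ce", "se", "filename", "exitcode")

def select_jobs(jobs, end, days=2):
    """Keep jobs that finished in the `days` before `end`, reduced to ATTRIBUTES."""
    start = end - timedelta(days=days)
    return [
        {name: job.get(name) for name in ATTRIBUTES}
        for job in jobs
        if start <= job["finished"] < end
    ]

# Made-up sample records; "cpu_time" stands for any column that is not used.
jobs = [
    {"finished": datetime(2008, 9, 23, 12, 0), "username": "user1",
     "site": "CERN-PROD", "ce": "ce01", "se": "se01",
     "filename": "file1", "exitcode": 0, "cpu_time": 120},
    {"finished": datetime(2008, 9, 20, 8, 30), "username": "user2",
     "site": "GRIF", "ce": "ce02", "se": "se02",
     "filename": "file2", "exitcode": 10034, "cpu_time": 45},
]

# Only the first job falls inside the two days before 24 September 2008.
window = select_jobs(jobs, end=datetime(2008, 9, 24))
```

Reducing every record to the same six keys keeps the later mining step independent of whatever other columns the monitoring database holds.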

Step 2: data mining (1/2)
• Association Rule Mining
  – find frequent item sets in the database
  – item: attribute-value pair (e.g. site=CERN-PROD)
  – rule: {A, B} → {C}, where A, B, C are items
  – support: how much data includes A, B and C?
  – confidence: if A, B are included, how much data also includes C?
  – e.g. {username=xxx, ce=cmsgrid02.hep.wisc.edu} → {exit code = 70500}
• Example: CMS job monitoring data
  – 2-day period
  – 42667 analysis jobs
  – 49 rules with exit code in the consequent of the rule
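To make the two measures concrete, here is a minimal Python sketch that computes support and confidence over a toy set of jobs. Each job is a set of "attribute=value" items as defined above; the job data and the names `support` and `confidence` are illustrative assumptions, not taken from the talk.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = frozenset(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    ante = support(transactions, antecedent)
    both = support(transactions, set(antecedent) | set(consequent))
    return both / ante if ante else 0.0

# Toy data (made up): four jobs, each a set of attribute=value items.
transactions = [
    frozenset({"username=xxx", "ce=ceA", "exitcode=70500"}),
    frozenset({"username=xxx", "ce=ceA", "exitcode=70500"}),
    frozenset({"username=xxx", "ce=ceA", "exitcode=0"}),
    frozenset({"username=yyy", "ce=ceB", "exitcode=0"}),
]

antecedent = {"username=xxx", "ce=ceA"}
consequent = {"exitcode=70500"}

rule_support = support(transactions, antecedent | consequent)       # 2 of 4 jobs
rule_confidence = confidence(transactions, antecedent, consequent)  # 2 of 3 jobs
```

In this toy data the rule {username=xxx, ce=ceA} → {exitcode=70500} holds for two of the four jobs (support 0.5) and for two of the three jobs that match the antecedent (confidence about 0.67).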

Step 2: data mining (2/2)
[Diagram] Input: job monitoring information of the Dashboard database.
1. Find frequent item sets (item set 1, item set 2, ..., item set n).
2. Create association rules (rule 1, rule 2, ..., rule n).
3. Prune the rules to eliminate redundancies (rule 1, rule 2, ..., rule k).
Stages labelled in the diagram: Apriori algorithm, pruning algorithm. Output: set of association rules.
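The pipeline above can be sketched end to end in plain Python. The frequent item set search follows the textbook Apriori join-and-prune scheme. The redundancy criterion in `prune_redundant` is an assumption made for illustration (a rule is dropped when a rule with a smaller antecedent and the same consequent is at least as confident); the talk does not specify its pruning algorithm. All function names, thresholds and data are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    n = len(transactions)

    def supp(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    singles = {frozenset([item]) for t in transactions for item in t}
    current = {s: supp(s) for s in singles if supp(s) >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: merge frequent (k-1)-item sets into k-item candidates.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k - 1))}
        current = {c: supp(c) for c in candidates if supp(c) >= min_support}
        frequent.update(current)
        k += 1
    return frequent

def exit_code_rules(frequent, min_confidence):
    """Build rules whose consequent is a single exit code item."""
    rules = []
    for itemset, s in frequent.items():
        for item in itemset:
            if len(itemset) < 2 or not item.startswith("exitcode="):
                continue
            antecedent = itemset - {item}
            conf = s / frequent[antecedent]  # subsets of frequent sets are frequent
            if conf >= min_confidence:
                rules.append((antecedent, item, s, conf))
    return rules

def prune_redundant(rules):
    """Assumed criterion: drop a rule if a rule with a strictly smaller
    antecedent and the same consequent has at least the same confidence."""
    return [
        (ante, cons, s, conf) for ante, cons, s, conf in rules
        if not any(o_cons == cons and o_ante < ante and o_conf >= conf
                   for o_ante, o_cons, _, o_conf in rules)
    ]

# Toy data (made up): five jobs as sets of attribute=value items.
transactions = [
    frozenset({"username=user1", "site=siteA", "exitcode=70500"}),
    frozenset({"username=user1", "site=siteA", "exitcode=70500"}),
    frozenset({"username=user1", "site=siteB", "exitcode=70500"}),
    frozenset({"username=user2", "site=siteA", "exitcode=0"}),
    frozenset({"username=user2", "site=siteB", "exitcode=0"}),
]

frequent = apriori(transactions, min_support=0.4)
rules = exit_code_rules(frequent, min_confidence=0.9)
pruned = prune_redundant(rules)
```

On this toy data the rule {username=user1, site=siteA} → exitcode=70500 is dropped because {username=user1} → exitcode=70500 already has confidence 1.0, so adding the site gives no extra information about the error source.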

Step 3: output representation (1/2)
• QAOES: Quick Analysis Of Error Source
• textual representation of the association rules

Step 3: output representation (2/2)
• graphical representation of the rules
• each line corresponds to one rule
• each point corresponds to an item
• e.g. {username=user224, site=GRIF} → {exitcode=10034}

Outlook
• adapt the statistical measure that defines a rule as interesting in the pruning step
• provide the prototype to shifters of the ATLAS distributed production system to help them track errors