Log analysis and user traceability
Eygene Ryabinkin, Russian Research Centre «Kurchatov Institute»
March 12th, 2009, OSCT-7 meeting, Madrid

lcg-CE logs: general ideas
CE logs link Grid jobs to local jobs, so they are the most logical place to start. Jobmap logs are available and contain almost all the information we need: user DN, VOMS FQAN, Grid (EDG) and LRMS job IDs, local user mapping and gatekeeper contact. With the LRMS ID we can trace the job down to the execution nodes. For Torque, one can use either the accounting logs or the plain job logs. I don't yet know the situation for SGE in its YAIM flavour.
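
A jobmap record could be split into its fields along the lines of the minimal Python sketch below. It assumes that each jobmap line is a sequence of double-quoted key=value pairs (keys such as userDN, userFQAN, lrmsID); that layout, the key names and the helper name parse_jobmap_line are assumptions for illustration, not taken from the lcg-CE sources.

```python
import re
from typing import Dict, List

# Assumed (hypothetical) jobmap layout: double-quoted key=value pairs, e.g.
#   "localUser=dteam001" "userDN=/C=RU/O=Example/CN=Jane Doe" "lrmsID=123.ce"
# Values may contain '=' (DNs do) but are assumed to contain no '"'.
FIELD_RE = re.compile(r'"([A-Za-z]+)=([^"]*)"')


def parse_jobmap_line(line: str) -> Dict[str, List[str]]:
    """Map every key to the list of its values; keys such as userFQAN may repeat."""
    record: Dict[str, List[str]] = {}
    for key, value in FIELD_RE.findall(line):
        record.setdefault(key, []).append(value)
    return record


if __name__ == "__main__":
    sample = ('"localUser=dteam001" "userDN=/C=RU/O=Example/CN=Jane Doe" '
              '"userFQAN=/dteam/Role=NULL/Capability=NULL" "userFQAN=/dteam" '
              '"jobID=https://lb.example.org:9000/AbCdEf" "lrmsID=123.ce.example.org"')
    rec = parse_jobmap_line(sample)
    print(rec["userDN"][0], rec["lrmsID"][0], len(rec["userFQAN"]))
```

Values are kept as lists because a single job can carry several VOMS FQANs.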

lcg-CE logs: additional details
Jobmap logs are laid out by date: file names are grid-jobmap_YYYYMMDD, which is very handy. However, jobmap logs lack the client's IP address, so one must also parse the gatekeeper (GK) logs. GK logs are huge and hard to read, and the only unique identifier linking a jobmap entry to GK entries is the Grid (EDG) job ID. Looking up the IP address therefore means finding the job manager (JM) ID in the GK log and then searching the preceding entries for the IP.
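
Both steps could be sketched in Python as follows: picking jobmap files for a date range from their names, and the two-stage IP lookup in the gatekeeper log. The helper names are hypothetical, and so are the gatekeeper line shapes (a "PID: <n> --" prefix identifying the process and a "Got connection <IP>" record); they stand in for whatever the real log uses and must be checked against an actual globus-gatekeeper log.

```python
import re
from datetime import date, datetime
from typing import Iterable, List, Optional

JOBMAP_NAME_RE = re.compile(r"^grid-jobmap_(\d{8})$")


def select_jobmap_names(names: Iterable[str], start: date, end: date) -> List[str]:
    """Keep grid-jobmap_YYYYMMDD names whose date lies in [start, end], oldest first."""
    picked = []
    for name in names:
        m = JOBMAP_NAME_RE.match(name)
        if not m:
            continue
        try:
            day = datetime.strptime(m.group(1), "%Y%m%d").date()
        except ValueError:  # eight digits that are not a real date
            continue
        if start <= day <= end:
            picked.append(name)
    # Fixed-width YYYYMMDD suffixes make lexical order chronological.
    return sorted(picked)


# Hypothetical gatekeeper line shapes, for illustration only:
#   "PID: 4242 -- Notice: 6: Got connection 192.0.2.10 at <date>"
#   "PID: 4242 -- ... <a line mentioning the Grid job ID> ..."
PID_RE = re.compile(r"^PID: (\d+) --")
CONN_RE = re.compile(r"Got connection (\d{1,3}(?:\.\d{1,3}){3})")


def find_client_ip(lines: List[str], grid_job_id: str) -> Optional[str]:
    """Find the process that handled the job, then walk back to its connection record."""
    for i, line in enumerate(lines):
        if grid_job_id not in line:
            continue
        hit = PID_RE.match(line)
        if not hit:
            continue
        for earlier in reversed(lines[:i]):
            same = PID_RE.match(earlier)
            if same and same.group(1) == hit.group(1):
                conn = CONN_RE.search(earlier)
                if conn:
                    return conn.group(1)
    return None
```

In practice the names would come from listing /opt/edg/var/gatekeeper, and the gatekeeper lines would be streamed rather than held in memory.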

lcg-CE logs: file locations
/var/log/globus-gatekeeper.*: the most verbose logs about the jobs the gatekeeper processes.
/opt/edg/var/gatekeeper/grid-jobmap_*: summaries of jobs run by lcgpbs and friends.
/var/spool/pbs/server_priv/accounting/*: Torque logs that carry most activity traces; we are mainly interested in job start/end events.
/var/spool/pbs/server_logs/*: more verbose Torque logs, but they exist only on the Torque server, which is not necessarily the CE.
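
Torque accounting records are semicolon-separated (timestamp, record type, job ID, then space-separated key=value attributes), with S marking a job start and E a job end. A minimal Python sketch that keeps only those two event types might look like this; the function name and the sample lines in the test are illustrative, not output from a real server.

```python
from datetime import datetime
from typing import Dict, Iterable, Iterator, NamedTuple


class TorqueEvent(NamedTuple):
    time: datetime
    kind: str              # 'S' = job started, 'E' = job ended
    job_id: str            # LRMS job ID, e.g. "123.ce.example.org"
    attrs: Dict[str, str]  # user, queue, exec_host, Exit_status, ...


def start_end_events(lines: Iterable[str]) -> Iterator[TorqueEvent]:
    """Yield only start/end records from Torque accounting log lines."""
    for line in lines:
        parts = line.rstrip("\n").split(";", 3)
        if len(parts) != 4 or parts[1] not in ("S", "E"):
            continue  # queue/delete/abort records and malformed lines are skipped
        stamp, kind, job_id, message = parts
        attrs = dict(tok.split("=", 1) for tok in message.split() if "=" in tok)
        yield TorqueEvent(datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S"),
                          kind, job_id, attrs)
```

The exec_host attribute of the S record is what ties a job to its worker node.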

lcg-CE logs: parsing modes – 1
Typical problem 1: a user with a known DN ran some jobs on the local farm in a given time interval. Find these jobs and, possibly, dig out their details. Solution: use the 'job-search' parsing mode, providing the user's DN (really a regex) and the time interval. This gives the list of jobs for this user. The '--dig-lrms' modifier instructs the tool to look up job statistics in the LRMS records (currently Torque only, using the accounting logs).
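
The core of a DN search over already-parsed jobmap entries could be as small as the Python sketch below. The JobRecord type and search_by_dn function are hypothetical and do not mirror the author's Perl toolset; the DN is matched with re.search, so a partial pattern such as a CN is enough.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass
class JobRecord:
    """Hypothetical, already-parsed jobmap entry."""
    timestamp: datetime
    user_dn: str
    grid_job_id: str
    lrms_id: str


def search_by_dn(records: Iterable[JobRecord], dn_regex: str,
                 start: datetime, end: datetime) -> List[JobRecord]:
    """Jobs whose DN matches dn_regex and whose timestamp lies in [start, end]."""
    pattern = re.compile(dn_regex)
    return [r for r in records
            if start <= r.timestamp <= end and pattern.search(r.user_dn)]
```

The lrms_id of each hit is the key a '--dig-lrms'-style step would look up in the Torque accounting records.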

lcg-CE logs: parsing modes – 2
Typical problem 2: we want to trace a job by its Grid ID. Solution: use the 'job-search' parsing mode, providing the Grid job ID. Jobmap logs are parsed in reverse chronological order and the search stops at the first hit (Grid job IDs are unique), so recent jobs are found quickly. '--dig-lrms' can be used to get the LRMS job particulars.
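
The newest-first, stop-at-first-hit search can be sketched in Python like this. The function name is hypothetical, the input is simplified to a mapping from file name to lines, and the quoted "jobID=..." field layout is an assumption about the jobmap format.

```python
from typing import Dict, List, Optional, Tuple


def find_by_grid_id(jobmaps: Dict[str, List[str]],
                    grid_job_id: str) -> Optional[Tuple[str, str]]:
    """Return (file name, line) of the newest jobmap entry for grid_job_id, or None.

    jobmaps maps grid-jobmap_YYYYMMDD names to their lines; a real tool would
    read the files lazily, one at a time, newest first.
    """
    needle = f'"jobID={grid_job_id}"'  # assumed quoted key=value layout
    for name in sorted(jobmaps, reverse=True):  # newest file first
        for line in reversed(jobmaps[name]):    # newest line first
            if needle in line:
                return name, line               # IDs are unique: stop here
    return None
```

Including the closing quote in the needle keeps one job ID from matching a longer ID that merely starts with it.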

lcg-CE logs: parsing modes – 3
Typical problem 3: find jobs submitted with pure Globus (not LCG/gLite) methods in a given time frame. The point is to see who is submitting jobs directly to our CE. Solution: use the 'job-search' parsing mode, providing the time range and the '--only-direct' switch. This mode catches only LRMS jobs: uses of the fork jobmanager won't be caught.
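
The talk does not say how '--only-direct' decides that a job is a pure Globus submission. As a purely hypothetical heuristic, the Python sketch below treats a jobmap record as direct when its jobID field is missing or does not look like an LB job ID (https://host:port/unique-string); both the criterion and the "NONE" placeholder in the test are assumptions to be verified against real data.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical heuristic: jobs routed through an RB/WMS carry an LB job ID of
# the form https://<host>:<port>/<unique string>; records without one are
# treated as direct Globus submissions. Verify against real jobmap data.
LB_JOB_ID_RE = re.compile(r"^https://[^/\s]+/\S+$")


def direct_jobs(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep jobmap records whose jobID field does not look like an LB job ID."""
    return [r for r in records if not LB_JOB_ID_RE.match(r.get("jobID", ""))]
```

The time-window restriction is omitted here for brevity; it is the same timestamp comparison as in any other time-bounded search.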

lcg-CE logs: parsing modes – 4
Typical problem 4: find all jobs that used the 'fork' jobmanager (direct execution on the CE host). This parsing mode is not finished yet, but 'job-search' with the '--only-fork' modifier and a time range will do the job. One difficulty is that we need to parse the full gatekeeper logs and extract the records that don't correspond to regular non-fork jobs. Since normal jobs also use the fork jobmanager to spawn grid-monitor/Condor-C, the problem is not entirely trivial.
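
One way to frame the filtering is sketched below in Python: keep fork-jobmanager requests, then drop those whose executable looks like a middleware helper. The GatekeeperRequest type, the helper-marker substrings and the function name are all hypothetical; a real tool would need site-verified patterns, and this is only one possible approach, not the author's.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence


@dataclass
class GatekeeperRequest:
    """Hypothetical, already-parsed gatekeeper request."""
    user_dn: str
    jobmanager: str   # e.g. "jobmanager-fork" or "jobmanager-lcgpbs"
    executable: str


# Hypothetical substrings identifying helpers that ordinary Grid jobs start via
# the fork jobmanager (grid-monitor, Condor-C); real patterns must be verified.
HELPER_MARKERS = ("grid_monitor", "grid-monitor", "condor")


def user_fork_requests(requests: Iterable[GatekeeperRequest],
                       helper_markers: Sequence[str] = HELPER_MARKERS
                       ) -> List[GatekeeperRequest]:
    """Fork-jobmanager requests that do not look like middleware helpers."""
    return [r for r in requests
            if r.jobmanager == "jobmanager-fork"
            and not any(m in r.executable.lower() for m in helper_markers)]
```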

lcg-CE logs: GridFTP
We also have GridFTP logs on the CE. Do we really need to parse them too? The most useful request we can easily handle is: find all GridFTP activity for a given user in a given time frame. We could try to relate various GridFTP sessions and even tie them to jobs, but this would involve heuristics and the checks won't be easy. So, the question is: do we need this?

lcg-CE logs: current status
We have a toolset to trace jobs by their Grid (EDG) ID and by user DN, and to find pure Globus jobs. The toolset is currently being refactored to provide a framework for log lookup on other node types and to separate the file parsers from the analysis core. The current language is Perl, but I am thinking about a Python variant: it could be faster and cleaner. I will make the tools public after some refactoring and polishing.
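
The planned separation of file parsers from the analysis core could look roughly like the Python sketch below (Python being the variant under consideration). All class and function names are hypothetical and do not describe the existing Perl toolset: each log format gets its own parser producing format-neutral Event objects, and the core only filters events.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List


@dataclass
class Event:
    """Format-neutral record handed from parsers to the analysis core."""
    source: str
    fields: Dict[str, str] = field(default_factory=dict)


class LogParser(ABC):
    """One subclass per log format or node type (CE jobmap, Torque, GridFTP, ...)."""

    @abstractmethod
    def parse(self, lines: Iterable[str]) -> Iterator[Event]:
        ...


class KeyValueParser(LogParser):
    """Toy parser: every whitespace-separated key=value token becomes a field."""

    def __init__(self, source: str) -> None:
        self.source = source

    def parse(self, lines: Iterable[str]) -> Iterator[Event]:
        for line in lines:
            pairs = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
            if pairs:
                yield Event(self.source, pairs)


def analyse(parsers: Dict[str, LogParser],
            inputs: Dict[str, Iterable[str]],
            keep: Callable[[Event], bool]) -> List[Event]:
    """Analysis core: it sees only Event objects, never raw log formats."""
    hits: List[Event] = []
    for name, lines in inputs.items():
        hits.extend(e for e in parsers[name].parse(lines) if keep(e))
    return hits
```

Supporting a new node type then means writing one more LogParser subclass; the search logic stays untouched.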

lcg-CE logs: roadmap
Finish 'fork' job detection. SGE support: Sun Grid Engine is currently supported by gLite too, although its user base isn't very large yet. Add more bells and whistles to the current tools: limit the number of job records, provide a command to find the most active users, etc. Probably implement parsing of GridFTP logs. Anything else I have missed.
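
Two of the planned extras, capping the number of job records and ranking the most active users, are small enough to sketch directly in Python; the function names are hypothetical and the input is assumed to be one DN per job record.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def most_active_users(user_dns: Iterable[str], top: int = 10) -> List[Tuple[str, int]]:
    """Rank DNs by the number of job records they appear in."""
    return Counter(user_dns).most_common(top)


def limit_records(records: Iterable[T], max_records: int) -> List[T]:
    """Stop after max_records items without reading the rest of the input."""
    return list(islice(records, max_records))
```

Because islice stops early, a record limit also bounds how much of a large log is actually read when records are produced lazily.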

RB/LB logs: ideas and questions
No real code has been written yet, only research and planning. RBs are now slightly out of fashion (people prefer the WMS), but we still have some working RBs. LB has a database where bookkeeping information is stored, and we can use good old SQL to query it. But Daniel said that we shouldn't use plain SQL, because the schema may change and the database doesn't hold all the useful information.

LB logs: ideas and questions
Daniel also said that there should be a better way to query the LB database, but up to now I have always used plain SQL. The gathered data will be the same as that provided by 'edg-job-logging-info'. One difference is that 'edg-job-logging-info' is subject to ACLs, while direct use of the SQL database isn't. With a combined LB/RB (or LB/WMS), some information can also be extracted from the SandBox directory.

RB logs: GridFTP
GridFTP logs on RBs are minimal: no session traces, just accounting data in /var/log/edg-wl-in.ftpd.log. There are no user DNs, only pool account user names. Some path names carry job IDs, so we can identify user sessions and relate them to jobs, which could be handy. It is sometimes interesting to know who retrieved a user's output sandbox, so we should probably try to parse these logs.
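
A Python sketch of relating transfers to jobs follows, under several assumptions: that edg-wl-in.ftpd.log uses the wu-ftpd xferlog layout (date, transfer time, remote host, size, file name, type, action flag, direction, access mode, user name, ...), and that sandbox paths embed the job ID with ':' and '/' escaped as _3a and _2f. The sample path and helper names are hypothetical.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional


class Transfer(NamedTuple):
    when: datetime
    remote_host: str
    path: str
    direction: str   # 'o' = outgoing (fetched from the RB), 'i' = incoming
    local_user: str  # pool account name; the log carries no DN


def parse_xferlog(line: str) -> Optional[Transfer]:
    """Parse one xferlog-style line (assumed format; file names without spaces)."""
    tokens = line.split()
    if len(tokens) < 17:
        return None
    when = datetime.strptime(" ".join(tokens[:5]), "%a %b %d %H:%M:%S %Y")
    return Transfer(when, tokens[6], tokens[8], tokens[11], tokens[13])


# Assumed escaping of the job ID inside sandbox paths: "_xx" = hex character code.
ESCAPED_ID_RE = re.compile(r"https_3a_2f_2f[^/]+")
HEX_ESCAPE_RE = re.compile(r"_([0-9a-fA-F]{2})")


def job_id_from_path(path: str) -> Optional[str]:
    """Recover a Grid job ID embedded in a sandbox path, if there is one."""
    m = ESCAPED_ID_RE.search(path)
    if not m:
        return None
    return HEX_ESCAPE_RE.sub(lambda h: chr(int(h.group(1), 16)), m.group(0))
```

Filtering for direction 'o' on output-sandbox paths then yields the host and pool account that fetched a given job's output. The DN has to come from elsewhere, for example the pool account mapping.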

WMS/CREAM CE
No real work has been done so far, only planning. I have a WMS instance, so I plan to research what data can be collected from this node type. I expect job traces similar to the RB ones, as well as download/upload records (both GridFTP and HTTP), to be available. A CREAM CE instance is going to be deployed in a couple of months; once it is up, I'll analyze it too.

Data management: SE logs
Only plans so far; no real work has been done. I can only speak about the DPM SE for now, as I have no dCache instance. As I recall from SSC2, DPNS and DPM logs share some identifiers that can be used to relate records across the various log files. This needs more analysis: I haven't concentrated on the DM logs yet.
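
Relating records through a shared identifier amounts to a group-by across files. The Python sketch below shows this with the identifier pattern left as a parameter, since the actual form of the shared IDs in DPM/DPNS logs is not described here; the "req=" pattern and sample lines in the test are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def correlate(logs: Dict[str, Iterable[str]],
              id_pattern: str) -> Dict[str, List[Tuple[str, str]]]:
    """Group lines from several log files by a shared identifier.

    id_pattern must contain one capturing group that extracts the identifier.
    Each group lists (log name, line) pairs in the order they were read.
    """
    rx = re.compile(id_pattern)
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for log_name, lines in logs.items():
        for line in lines:
            m = rx.search(line)
            if m:
                groups[m.group(1)].append((log_name, line))
    return dict(groups)
```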

Thanks!
Thanks to Daniel Kouril for presenting this material and for discussing and advising on the presentation. Thanks to everyone who listened to this session.

Questions? Suggestions? Feel free to ask ;))