Proxy management mechanism and gLExec integration with the PanDA pilot: status and perspectives


Agenda
  o Architectural review
  o Integration status
  o Site deployment
  o Perspectives

Introduction
WLCG Grid Job and Worker Node Security Assessment:
  o User traceability
  o Sandboxed execution of payloads
  o Local policy enforcement
gLExec:
  o Provides proxy management on the worker node (WN)
  o Toolset to manage the sandbox
  o Can change UID transparently

Architectural review
PanDA:
  o Late-binding, pilot-based framework (pull)
The pilot job and gLExec:
  o Handle the payload and user proxy
  o Sandbox and detach
  o Monitor and kill
  o Multi-process, multi-user environment

Pilot and gLExec
[Diagram. Components: PanDA Server, MyProxy, Computing Element, PanDA pilot, gLExec, payload. Arrow labels: job, user proxy.]

Wrapper
[Sequence diagram along a time axis t. Lifelines as labelled: wrapper (pilot), main process (pilot), sandboxed pilot process (user), payload process (user). Calls shown: setUpEnvironment(), createSandbox(), runGlexec(), startMonitoringLoop(), startPayload(), payload(), sendHeartbeat(), kill(), cleanup().]
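The call sequence named on this slide can be illustrated with a minimal Python sketch. This is not the pilot's actual code: `create_sandbox` and `run_in_sandbox` are hypothetical names, the sandbox is just a temporary directory, and the payload is started directly with `subprocess` where a real pilot would launch it through gLExec under the user's identity.

```python
import shutil
import subprocess
import sys
import tempfile
import time


def create_sandbox():
    """Create a private working directory for the payload (hypothetical helper)."""
    return tempfile.mkdtemp(prefix="pilot_sandbox_")


def run_in_sandbox(command, sandbox, heartbeat=lambda: None,
                   poll_interval=0.1, timeout=5.0):
    """Start the payload, monitor it, kill it on overrun, then clean up.

    Assumption: the command is started directly; a real pilot would
    hand it to gLExec so that it runs with the user's credentials.
    """
    process = subprocess.Popen(command, cwd=sandbox)    # startPayload()
    deadline = time.monotonic() + timeout
    try:
        while process.poll() is None:                   # monitoring loop
            heartbeat()                                 # sendHeartbeat()
            if time.monotonic() > deadline:
                process.kill()                          # kill()
                break
            time.sleep(poll_interval)
        return process.wait()
    finally:
        if process.poll() is None:
            process.kill()
            process.wait()
        shutil.rmtree(sandbox, ignore_errors=True)      # cleanup()


if __name__ == "__main__":
    box = create_sandbox()
    code = run_in_sandbox([sys.executable, "-c", "print('payload ran')"], box)
    print("exit code:", code)
```

The timeout stands in for whatever condition makes the monitoring loop abort the payload; the heartbeat callback is where a status report to the server would go.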

Pilot
[Flow diagram. Boxes as labelled:
  Pilot: collect local info, job recovery, additional cleanup
  Multi-job loop: check proxy, check local space, collect local info, get job, fork sub process, clean up
  Monitoring loop: check log size, (check workDir), looping check, check local space
  Job: setup job, transfer input, execute payload, transfer output
  Signal handler: abort and clean up
  gLExec
  Signal handler: abort and clean up]
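As an illustration of the monitoring-loop checks listed on this slide, here is a hedged Python sketch. The function names, thresholds, and the reading of "looping check" as "no file in the work directory has been modified recently" are assumptions made for this example, not taken from the pilot source.

```python
import os
import shutil
import time


def check_local_space(work_dir, min_free_bytes):
    """True if the file system holding work_dir has enough free space."""
    return shutil.disk_usage(work_dir).free >= min_free_bytes


def check_log_size(log_path, max_bytes):
    """True if the payload log is absent or still within the size limit."""
    return not os.path.exists(log_path) or os.path.getsize(log_path) <= max_bytes


def check_not_looping(work_dir, max_idle_seconds, now=None):
    """True if some file under work_dir was modified recently enough.

    Assumption: a payload that touches no file for a long time is
    treated as looping.
    """
    now = time.time() if now is None else now
    newest = max(
        (os.path.getmtime(os.path.join(root, name))
         for root, _dirs, files in os.walk(work_dir) for name in files),
        default=os.path.getmtime(work_dir),
    )
    return now - newest <= max_idle_seconds
```

A monitoring loop would call these on every iteration and abort the payload as soon as one of them returns False.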

Pilot refactoring
Two processes running:
  o Regular pilot process (downloads payloads)
  o Sandboxed process (monitoring loop)
Process status encapsulation
Multi-user:
  o Pilot process ↔ isolated sandbox
  o Transfer environments securely

Integration status: done
New monitoring loop:
  o Configuration handler
  o Serialize and encapsulate process status
Sandbox creation:
  o Status transfer, user proxy retrieval
  o Copy over the binaries
Subprocess detaching
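One way to picture "serialize and encapsulate process status" is a small status object that is written to a file inside the sandbox and read back by the other process. The sketch below is an assumption for illustration: the `ProcessStatus` class, its fields, and the JSON file format are hypothetical and do not describe the pilot's real data model.

```python
import json
import os
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ProcessStatus:
    """Hypothetical status record shared between pilot and sandboxed process."""
    job_id: str
    state: str = "starting"
    exit_code: Optional[int] = None

    def save(self, path):
        """Write the status as JSON, replacing the old file atomically."""
        tmp_path = path + ".tmp"
        with open(tmp_path, "w") as handle:
            json.dump(asdict(self), handle)
        os.replace(tmp_path, path)

    @classmethod
    def load(cls, path):
        """Rebuild a status object from a file written by save()."""
        with open(path) as handle:
            return cls(**json.load(handle))
```

Writing to a temporary file and renaming it means the reader never sees a half-written status, which matters when two processes under different UIDs poll the same file.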

Integration status: pending
Fine-tuning the sandboxed process:
  o Lots of different data to pass into the sandbox
  o And to retrieve afterwards
Batch system signal handling
Payload completion
Time floor testing with different users
Time scale: still work to do.

Issues
Code structure:
  o Monolithic: much refactoring was needed, but it is done.
  o No consensus on environments:
      - Many options for initdir, workdir, sandbox, mkgltmpdir, stagein/out... Not clearly defined in all batch systems.
      - O(n^4) possible sandbox configurations.
gLExec testing sandbox:
  o Kindly provided by Maarten.

Scalability
Every worker node has to reach MyProxy:
  o Artificially high load
  o There is no need if the PanDA Server:
      - Gets proxies and caches them
  o Security concerns for payload transfer
  o The worker node is not a trusted entity
      - Allowing all of them is not recommended

[Diagram. Components: PanDA client, PanDA Server (with proxy cache *), MyProxy, Computing Element, PanDA pilot, gLExec. Numbered arrow labels:
  1. User proxy and job
  2. User proxy
  2. Job
  3. User proxy
  4. User proxy and job
  5. Notification: proxy about to expire]
* The user credential is forwarded to the WN by PanDA as part of the payload metadata. In terms of implementation, simple caching schemes would be desirable to reduce the load imposed on MyProxy.
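A simple caching scheme of the kind mentioned in the footnote could look like the following Python sketch. Everything here is an assumption for illustration: `ProxyCache`, the `fetch` callable standing in for a MyProxy retrieval, and the renewal margin are hypothetical, and no real MyProxy or PanDA API is used.

```python
import time


class ProxyCache:
    """Server-side cache of user proxies, keyed by user identity.

    fetch is a caller-supplied callable (hypothetical) that takes a user
    identifier and returns (proxy, expiry_timestamp).
    """

    def __init__(self, fetch, renew_margin=3600.0, clock=time.time):
        self._fetch = fetch
        self._renew_margin = renew_margin
        self._clock = clock
        self._entries = {}

    def get(self, user):
        """Return a cached proxy, refetching only when it is close to expiry."""
        entry = self._entries.get(user)
        if entry is None or entry[1] - self._clock() < self._renew_margin:
            entry = self._fetch(user)
            self._entries[user] = entry
        return entry[0]

    def expiring(self):
        """List users whose cached proxy is about to expire."""
        now = self._clock()
        return [user for user, (_proxy, expiry) in self._entries.items()
                if expiry - now < self._renew_margin]
```

With such a cache, only the server contacts the credential store, and `expiring()` gives the set of users who would receive the "proxy about to expire" notification.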

Perspectives
New responsible people need to be defined:
  o Developers
  o Testers