SSC2 and Update on Multi-user Pilot Jobs Framework Mingchao Ma, STFC – RAL HEPSysMan Meeting 20/06/2008.

Slide 2 Security Service Challenge
What is it?
How does it work?
SSC 2 - UKI ROC experience

Slide 3 SSC - What is it?
“The goal of the LCG/EGEE Security Service Challenge is to investigate whether sufficient information is available to be able to conduct an audit trace as part of an incident response, and to ensure that appropriate communications channels are available.”
Like a fire drill!

Slide 4 SSC – Why and How?
To check whether the communication channels among the involved parties (sites, VOs, security contacts etc.) are functioning
An exercise for system admins in tracing users’ activities and getting to know the various logfiles
Not intrusive: only ‘legal’ operations; no penetration and no execution of exploits
Conducted and monitored by the OSCT and ROC security officers
  – CERN challenges ALL Tier1 sites
  – The ROC security officer challenges the Tier2 sites within that ROC

Slide 5 Security Service Challenge
SSC 1: challenges the Workload Management System (WMS) on the Grid: Resource Broker (RB) and Compute Element (CE) (2005)
SSC 2: challenges the Storage Elements on the Grid (2007/2008)
SSC 3: challenges the Operational Diligence of the LCG/EGEE Grid Sites (ongoing)

Slide 6 SSCs - UKI ROC
Security Service Challenge 2
  – 22 Tier2 sites (SEs) in the UKI ROC were challenged by the ROC security officer
Security Service Challenge 3
  – RAL Tier1 was challenged by CERN on 06 March
deployment/ssc/SSC_2/SSC_2_google.html

Slide 7 Security Service Challenge 2
Timeline
  – From 21 January 2008 to 10 March 2008
  – In total 22 sites (SEs) challenged
  – Job submission: from 21 Jan. to 28 Jan.
  – 4 weeks (Feb. 2008) cool-down period
  – GGUS ticket opened: 03 March 2008
  – Challenge completed: 5pm 10 March 2008

Slide 8 Security Service Challenge 2
Basic Statistics
  – 22 SEs/sites challenged, of which:
    One site failed to run the challenge job
    One site opted out of the challenge due to a site rebuild
    One site is no longer part of the EGEE Grid
  – Initial response received from 21 sites:
    18 sites acknowledged the initial alert ticket within 24 hours
    2 sites acknowledged the ticket within 48 hours
    1 site acknowledged the ticket within 72 hours
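
The acknowledgment figures above can be tallied with a short script. This is only a sketch: the function names are hypothetical, and the individual delay values are placeholders chosen solely to reproduce the slide's counts (18, 2 and 1), since the per-site delays are not given in the talk.

```python
from collections import Counter


def bucket_ack_delay(hours):
    """Map an acknowledgment delay (in hours) to the reporting bucket used on the slide."""
    for limit in (24, 48, 72):
        if hours <= limit:
            return f"within {limit}h"
    return "over 72h"


def summarise(delays_hours, sites_challenged):
    """Count acknowledgments per bucket and compute the overall response rate."""
    counts = Counter(bucket_ack_delay(h) for h in delays_hours)
    response_rate = len(delays_hours) / sites_challenged
    return counts, response_rate


# Placeholder delays (NOT measured data): they only reproduce the slide's counts.
delays = [5.0] * 18 + [30.0] * 2 + [60.0]
counts, rate = summarise(delays, sites_challenged=22)

for bucket in ("within 24h", "within 48h", "within 72h"):
    print(bucket, counts[bucket])
print(f"response rate: {rate:.1%}")
```

Real input would be the time between ticket creation and first acknowledgment for each site, taken from the ticketing system.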

Slide 9 Security Service Challenge 2 - Result

Slide 10 Security Service Challenge 2
Preliminary Analysis
  – All responding sites (18) found some traces of the job activities and identified at least one SE operation
  – The communication channel seems to work well:
    Most sites acknowledged the ticket within 24 hours
    1 site took up to 72 hours: a new staff member there had no support role in GGUS and was therefore unable to answer the ticket

Slide 11 Security Service Challenge 2
Issues observed
  – None of the 19 sites was able to identify the Lookup operation
  – Some sites provided only RAW logs (though the correct part of the log) with little or no analysis
  – A few sites had missing logs (a log file accidentally deleted due to misconfiguration; log retention of only a month, again due to misconfiguration; or log files lost in a system rebuild)
  – SE logs (syntax and format) are still too complex; it seems very difficult to fully reconstruct some operations (site configuration? or insufficient log information?)
  – Too many logfiles!
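
The retention problem can be checked against the SSC2 timeline with simple date arithmetic. The sketch below uses the dates from the timeline slide (job submission 21 to 28 January 2008, ticket opened 3 March 2008); the helper name and the 30-day and 90-day retention values are assumptions for illustration, not site policy.

```python
from datetime import date


def logs_still_available(event_day, request_day, retention_days):
    """True if a log entry written on event_day is still kept on request_day."""
    return (request_day - event_day).days <= retention_days


first_job = date(2008, 1, 21)      # start of job submission (from the timeline)
last_job = date(2008, 1, 28)       # end of job submission (from the timeline)
ticket_opened = date(2008, 3, 3)   # GGUS ticket opened (from the timeline)

# Age of the oldest and newest challenge-job log entries when the ticket arrived.
oldest_age = (ticket_opened - first_job).days
newest_age = (ticket_opened - last_job).days
print(oldest_age, newest_age)

# Assumed retention periods, for illustration only.
print(logs_still_available(last_job, ticket_opened, retention_days=30))
print(logs_still_available(first_job, ticket_opened, retention_days=90))
```

Under these assumptions even the most recent challenge job is older than a one-month retention window by the time the ticket is opened, which matches the missing-log reports.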

Slide 12 Multi-user Pilot Jobs Framework

Slide 13 What is a multi-user pilot job?
A multi-user pilot job, hereafter referred to simply as a pilot job, is a Grid job for which the following holds*:
  – a Grid job is submitted with a set of credentials belonging to either a member of the VO or to a service owned and operated by the VO
  – when this Grid job begins to execute at a Site, it pulls down and executes workload, hereafter called a user job, owned and submitted by a different member of the VO, or multiple user jobs owned and submitted by multiple different members of the VO
*Policy on Grid Multi-User Pilot Jobs
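
The pull model in this definition can be illustrated with a toy, in-memory sketch. Every name here (`UserJob`, `TaskQueue`, `run_pilot`, the identities and payloads) is hypothetical; it does not reflect the API of glideinWMS, DIRAC, PanDA or glexec, and nothing is actually executed or switched to another identity. It only shows that the pilot's submitter and the payload's owner are different parties that both need to be recorded.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class UserJob:
    owner: str    # VO member who owns and submitted the workload
    payload: str  # placeholder for the actual work


class TaskQueue:
    """Stand-in for a VO-specific central task queue."""

    def __init__(self, vo_members):
        self.vo_members = set(vo_members)
        self._jobs = deque()

    def submit(self, job):
        # Only members of the VO may place user jobs in the queue.
        if job.owner not in self.vo_members:
            raise PermissionError(f"{job.owner} is not a member of the VO")
        self._jobs.append(job)

    def pull(self):
        # Return the next user job, or None when the queue is empty.
        return self._jobs.popleft() if self._jobs else None


def run_pilot(pilot_identity, queue, max_jobs):
    """Pull up to max_jobs user jobs and record (pilot, owner, payload) for each."""
    audit = []
    for _ in range(max_jobs):
        job = queue.pull()
        if job is None:
            break
        # A real framework would run the payload under the owner's identity here.
        audit.append((pilot_identity, job.owner, job.payload))
    return audit


queue = TaskQueue(vo_members=["alice", "bob"])
queue.submit(UserJob(owner="alice", payload="analysis-1"))
queue.submit(UserJob(owner="bob", payload="analysis-2"))

audit = run_pilot("vo-pilot-service", queue, max_jobs=5)
for record in audit:
    print(record)
```

The audit list is the point of the sketch: a site sees only the pilot's credentials at submission time, so tracing a payload back to its owner depends on the framework recording that mapping.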

Slide 14 Pilot Jobs Framework
A VO/Experiment-specific Workload Management System (WMS):
  – CMS glideinWMS
  – LHCb DIRAC WMS
  – ATLAS PanDA
  – ALICE ???

Slide 15 A Simplified Diagram
[Diagram; only the labels are recoverable]
Components: End User; Central Job Repository/VO-Specific WMS; VOMS Server; MyProxy Server; Others
Site 1: Worker Node(s) with Pilot Job, glexec, User Job
Site 2: Worker Node(s) with Pilot Job, glexec, User Job
Flow labels: Jobs + Proxy; Submit Pilot Job + Pilot Proxy; Get User Jobs & User Proxy

Slide 16 Pilot Job Frameworks Review Workgroup
GDB working group mandated by WLCG MB on Jan. 22, 2008
Mission
  – Review security issues in the pilot job framework of each experiment (pilot jobs are taken as multi-user in this context)
  – Define a minimum set of security requirements
  – Advise on improvements, per framework or common to all
  – Report to GDB and MB; time frame is a few months
Members
  – ALICE: Predrag Buncic
  – ATLAS: Torre Wenaus
  – CMS: Igor Sfiligoi
  – LHCb: Andrei Tsaregorodtsev
  – WLCG: Maarten Litmaath (chair)
  – EGEE: David Groep
  – FNAL: Eileen Berman
  – GridPP: Mingchao Ma
  – OSG: Mine Altunay
* Content from Maarten Litmaath, GDB, 2008/06/11

Slide 17 Questionnaire
Describe in a schematic way all components of the system.
  – If a component needs to use IPC to talk to another component for any reason, describe what kind of authentication, authorization, integrity and/or privacy mechanisms are in place. If configurable, specify the typical, minimum and maximum protection you can get.
Describe how user proxies are handled from the moment a user submits a task to the central task queue to the moment that the user task runs on a WN, through any intermediate storage.
What happens around the identity change on the WN, e.g. how is each task sandboxed and to what extent?
How can running processes be accounted to the correct user?
How is a task spawned on the WN and how is it destroyed?
How can a site be blocked?

Slide 18 Questionnaire (cont.)
What site security processes are applied to the machine(s) running the WMS?
  – Who is allowed access to the machine(s) on which the service(s) run, and how do they obtain access?
  – How are authorized individuals authenticated on the machine(s)?
  – What is the process for keeping the service(s) and OS patched and up-to-date, especially with respect to security patches?
  – Do you have an identified security contact?
  – Describe the incident response plan to deal with security incidents and reports of unauthorized use.
  – What services (in general) run on the machine(s) that offer the WMS service?
  – What processes exist to maintain audit logs (e.g. for use during an incident)?
  – What monitoring exists on the machine(s) to aid detection of security incidents or unauthorized use?
Can you limit the users that can submit jobs to the VO WMS? How?