Interfacing a Managed Local Fabric to the GRID
LCG Review
Tim Smith, IT/FIO

2003/11/18 Fabric Grid Interface: Contents
- Fabrics, GRIDs and the interface
- Interfacing
  - User Management
  - Security
  - Worker Nodes
  - SW distribution
  - Gateway Nodes
- Milestones

2003/11/18 Fabric Grid Interface: Converging Development Fronts
- Tier-0 Fabric Automation
  - Production: Software and Configuration Managers (LXBATCH)
  - Development/Prototypes: State and HW management systems, Monitoring, Fault tolerance
- LCG-0, LCG-1, … LCG-2
  - Deploying the entire SW stack (including fabric!)
  - GRID Services and Operations
- Fabric – GRID Integration
  - Establishing the boundaries and interfaces

2003/11/18 Fabric Grid Interface: GRID Services Architecture
- Extra SW and Services
- Matching of Procedures
- Matching of Environment
[Slide diagram: layered stack of GRID Application Layer, Collective Services, Underlying GRID Services and Fabric Services, contrasting "The Dream" with "Reality"]

2003/11/18 Fabric Grid Interface: Service Lifecycle Focuses
- Prototype
  - Proliferation, Elaboration
  - Focus on functionality
  - Performance and scalability
- Production
  - Simplification, Automation
  - Focus on uniformity, minimisation
  - Process and procedure
  - Availability and reliability
  - Stability and robustness
- Risks
  - Destabilisation
  - Workload

2003/11/18 Fabric Grid Interface: User Management
- Passwd file handling > Certificate handling
  - Authentication: gathering from VOs
  - Authorisation: mapping to local identities (real or pool)
  - Building gridmap files from widely gathered info sources (sketched below)
- Current integration issues
  - Named accounts > temporarily assigned accounts
  - 12,000 personal accounts on LXPLUS
  - LCG-1: 50 static pool accounts
  - LCG-2: 80 dynamic pool accounts
- Accounting and Auditing
  - Feedback on usage
  - Blocking / cleaning out dead-wood
- Open Issues
  - VOMS / AuthZ integration
  - Registration: interfacing centralised to local procedures
  - User interaction: notification of service changes or incidents
  - User and Service Manager familiarity with the new services
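The "building gridmap files from widely gathered info sources" step can be made concrete with a small sketch. This is a minimal illustration, assuming per-VO DN lists have already been fetched from the VO servers; the file locations, VO names and the script itself are assumptions, not the actual LCG registration machinery. It emits standard grid-mapfile syntax, where a leading dot maps a DN onto the next free account of that VO's pool.

```python
#!/usr/bin/env python3
"""Minimal sketch of grid-mapfile generation from VO membership lists.
Assumes the per-VO DN lists have already been fetched (e.g. from the
VO LDAP servers); the file paths and VO names are illustrative."""

# Hypothetical local files holding one certificate DN per line, per VO
VO_DN_LISTS = {
    "atlas": "/etc/grid-security/vo-lists/atlas.dns",
    "cms":   "/etc/grid-security/vo-lists/cms.dns",
}

def build_gridmap(vo_dn_lists, out_path="/etc/grid-security/grid-mapfile"):
    entries = []
    for vo, dn_file in vo_dn_lists.items():
        with open(dn_file) as f:
            for dn in (line.strip() for line in f):
                if not dn or dn.startswith("#"):
                    continue
                # A ".atlas"-style entry maps the DN onto the next free
                # account from the VO's pool (atlas001, atlas002, ...)
                entries.append('"%s" .%s' % (dn, vo))
    with open(out_path, "w") as out:
        out.write("\n".join(sorted(set(entries))) + "\n")

if __name__ == "__main__":
    build_gridmap(VO_DN_LISTS)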

2003/11/18 Fabric Grid Interface: User Management – Prototype vs Production
- Prototype: focus on adding users
  - Avoid being a blockage to new user uptake
- Production: avoid accumulated dead-wood and dormant accounts
  - Minimise security exposures and recuperate resources (see the clean-up sketch below)
- Risks
  - Multiple-authorisation confusion
  - c.f. the multiple groups of the late 1990s
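A minimal sketch of the dormant-account clean-up mentioned above, assuming the local accounting system can export how long each account has been idle; the input format and the 180-day threshold are illustrative, not an actual CERN policy or tool.

```python
"""Sketch of the 'dead-wood' clean-up idea: flag accounts unused for a long
time as candidates for blocking. The input format (account, days since last
use) is a stand-in for whatever the local accounting system provides."""

import csv

THRESHOLD_DAYS = 180   # illustrative policy value

def dormant_accounts(usage_csv):
    candidates = []
    with open(usage_csv) as f:
        for account, idle_days in csv.reader(f):
            if int(idle_days) >= THRESHOLD_DAYS:
                candidates.append(account)
    return candidates

if __name__ == "__main__":
    for acct in dormant_accounts("account-usage.csv"):   # hypothetical export
        print("candidate for blocking:", acct)
```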

2003/11/18 Fabric Grid Interface: Security
- Extending: local practices to encompass grid demands
  - Instead of processing and raising alarms …
  - Collection and storage of a sufficient history of raw files
  - No history of attack patterns
- Adapting: tracking of incidents and blocking of compromised accounts – but now anonymous accounts and certificates have to be blocked (sketched below)
- Audit requirements
- Incident response procedures
- Revocation procedures
- Risks
  - Exposure: incident propagation requires a coordinated approach
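To make "blocking of compromised accounts and certificates" concrete, a minimal sketch follows. It assumes the standard grid-mapfile location; the separate ban-list file and the script are illustrative, not an existing CERN or LCG tool.

```python
"""Minimal sketch of blocking a compromised grid identity locally.
Assumes the standard grid-mapfile location; the ban-list path and the
notion of a separate ban file are illustrative assumptions."""

GRIDMAP = "/etc/grid-security/grid-mapfile"
BANLIST = "/etc/grid-security/banned-dns"   # hypothetical local ban list

def block_dn(dn):
    # Drop the DN's mapping so new jobs can no longer be authorised locally
    with open(GRIDMAP) as f:
        kept = [line for line in f if not line.startswith('"%s"' % dn)]
    with open(GRIDMAP, "w") as f:
        f.writelines(kept)
    # Record the DN so the mapping is not silently re-added by the next
    # automatic grid-mapfile rebuild
    with open(BANLIST, "a") as f:
        f.write(dn + "\n")

if __name__ == "__main__":
    block_dn("/C=CH/O=Example/CN=Compromised User")  # illustrative DN
```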

2003/11/18 Fabric Grid Interface: SW distribution
- OpSys and Common Applications
  - SPMA and NCM
  - HTTP as the SW distribution protocol
  - Load-balanced server cluster
  - Pre-caching of SW packages on the node possible (sketched below)
- Examples
  - LSF upgrade: 1000 nodes, 10 minutes, no interruption
  - Kernel upgrade: multiple version support, reboot later
  - Security upgrades: weekly; big KDE patch
- Risks
  - EDG toolkit – long-term support
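The pre-caching idea can be sketched as follows. This is not SPMA/NCM code; the repository URL, cache directory and package-list format are assumptions. The point is only that packages are pulled from the load-balanced HTTP servers ahead of time, so the actual upgrade step touches only local disk.

```python
"""Illustrative sketch of pre-caching: pull the RPMs listed in a node's
target configuration from a (load-balanced) HTTP software server into a
local cache before the package manager applies them."""

import os
import urllib.request

SWREP_URL = "http://swrep.example.cern.ch/rpms"   # hypothetical repository alias
CACHE_DIR = "/var/spool/swcache"                  # hypothetical local cache

def precache(package_list_file):
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(package_list_file) as f:
        for pkg in (line.strip() for line in f):
            if not pkg or pkg.startswith("#"):
                continue
            dest = os.path.join(CACHE_DIR, pkg)
            if os.path.exists(dest):
                continue            # already cached, nothing to download
            urllib.request.urlretrieve("%s/%s" % (SWREP_URL, pkg), dest)

if __name__ == "__main__":
    # e.g. a list of "lsf-6.0-1.i386.rpm"-style file names for this node
    precache("/var/spool/swcache/target-packages.list")
```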

2003/11/18 Fabric Grid Interface: SW distribution
- GRID + Application middleware
  - Packaging approaches: RPM / tar / PACMAN
  - Automating a workstation-oriented approach
  - Complying with enterprise management requirements
  - Configuration complexity
  - Bulky, supplier-oriented: 1050 RPMs for LXBATCH; 220 extra for a simple WN
- Work In Progress
  - Tuning dependencies (see the profile-diff sketch below)
  - Trimming unnecessary/conflicting SW and services (gcc, python, tomcat)
- Risks: pushing aside production knowledge
  - Harder user support, SW maintenance, incident handling
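One way to support the "tuning dependencies" work is to diff the package sets of the two profiles and inspect what the middleware really adds and where it may collide with production packages, as in the sketch below. The exported list files and the conflict heuristic are assumptions.

```python
"""Sketch of a dependency-tuning aid: compare the RPM set of the plain
LXBATCH profile with the grid Worker Node profile. The profile file
format (one package name per line) is an assumption."""

def load_set(path):
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

base = load_set("lxbatch-packages.list")     # hypothetical exported profile
wn   = load_set("lcg-wn-packages.list")      # hypothetical exported profile

extra = sorted(wn - base)                    # what the WN profile adds (~220)
base_names = {pkg.split("-")[0] for pkg in base}
conflicts = sorted(p for p in extra if p.split("-")[0] in base_names)

print("%d extra packages on a WN" % len(extra))
print("potential version conflicts:", conflicts[:10])
```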

2003/11/18 Fabric Grid Interface: SW distribution
- Applications – Experiment SW
  - Rapid release cycles
  - Balancing user and administrator preferences
    - Experiments' desire for control of installation, validation and publication
    - Efficient local access; leverage SW distribution tools
  - Local disks vs shared file systems
- Pragmatic approach in LCG-1: no intelligent cache yet (see the sketch below), so either
  - Copy at the start of every job, or
  - Shared file systems
- Risks
  - Duplication; wasted resources in creation and housing
  - Hidden demands on the reliability and stability of shared file systems
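The missing "intelligent cache" can be illustrated with a job-wrapper sketch: reuse an experiment SW release already on the node's local disk and copy it in only once per node rather than once per job. Paths, the tarball layout and the release name are assumptions, not an existing LCG tool.

```python
"""Sketch of a per-node experiment-software cache: a job wrapper reuses a
release already present on local disk and only copies it in when missing."""

import os
import tarfile

SHARED_AREA = "/afs/cern.ch/project/expsw"      # hypothetical shared source
LOCAL_CACHE = "/pool/expsw"                      # hypothetical local cache

def ensure_release(experiment, release):
    target = os.path.join(LOCAL_CACHE, experiment, release)
    if os.path.isdir(target):
        return target                            # cache hit: no copy needed
    os.makedirs(target, exist_ok=True)
    tarball = os.path.join(SHARED_AREA, experiment, release + ".tar.gz")
    with tarfile.open(tarball) as tar:
        tar.extractall(target)                   # one copy per node, not per job
    return target

if __name__ == "__main__":
    sw_dir = ensure_release("atlas", "7.0.2")    # illustrative release name
    print("experiment software available under", sw_dir)
```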

2003/11/18 Fabric Grid Interface: Batch Worker Nodes
- Solved some scalability issues, e.g.
  - NFS mounts of the CE on WNs
  - Job Managers to address GASS cache issues
- Interfacing to a mature batch scheduler
  - Built on a lowest-common-denominator approach
  - Rudimentary use of the batch scheduler's power
  - Shifting scheduler decisions higher up the chain
  - Expose non-homogeneous HW as multiple queues (sketched below)
- Open Issues
  - Wide area network access to/from batch nodes
  - Only 30% of the 1000 LXBATCH nodes left on the routed network
- Risks
  - Drop in efficiency while learning to share with the new queues
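A sketch of how non-homogeneous hardware could be exposed as multiple queues: group nodes by hardware class and derive one queue (host group) per class for the local scheduler to publish to the grid. The inventory format and queue naming are assumptions, not the actual LSF configuration.

```python
"""Sketch of 'expose non-homogeneous HW as multiple queues': group worker
nodes by hardware class and emit one queue per class, so the grid-level
scheduler can choose a class instead of relying on local scheduler magic."""

from collections import defaultdict

# Hypothetical node inventory: hostname -> (CPU type, memory in MB)
INVENTORY = {
    "lxb0001": ("PIII-1000", 512),
    "lxb0002": ("PIII-1000", 512),
    "lxb0101": ("P4-2400", 1024),
}

def queues_by_hw_class(inventory):
    groups = defaultdict(list)
    for host, (cpu, mem) in inventory.items():
        queue = "grid_%s_%dmb" % (cpu.lower().replace("-", ""), mem)
        groups[queue].append(host)
    return groups

if __name__ == "__main__":
    for queue, hosts in sorted(queues_by_hw_class(INVENTORY).items()):
        # In practice this would be turned into scheduler configuration
        # (e.g. LSF host groups and queues) and published to the grid.
        print(queue, "->", ", ".join(sorted(hosts)))
```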

2003/11/18 Fabric Grid Interface: GRID Service Nodes
- Functionality deployed: RB, CE, SE, UI, …
- Open Issues
  - Coping with realistic loads – jobs: long running, complex, chaotic
  - Scalability
  - Redundancy
- Operations
  - Learn to capture state and restart on other nodes (sketched below)
  - Learn how to upgrade without service interruptions
  - Do the services “age” or “pollute” the host nodes?
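The "capture state and restart on other nodes" item can be illustrated as follows; the paths and the rsync-to-shared-storage approach are purely assumptions to make the operational idea concrete, not an LCG mechanism.

```python
"""Illustration of capturing service state so a standby host can take over:
snapshot the state directory to shared storage, restore it on the standby."""

import shutil
import subprocess

STATE_DIR   = "/var/lib/gridservice"                # hypothetical service state
SHARED_COPY = "/shared/backups/gridservice-state"   # hypothetical shared area

def snapshot():
    # Run on the active node, e.g. from cron, while the service is quiescent
    subprocess.run(["rsync", "-a", "--delete",
                    STATE_DIR + "/", SHARED_COPY + "/"], check=True)

def restore():
    # Run on the standby node before starting the service there
    shutil.rmtree(STATE_DIR, ignore_errors=True)
    shutil.copytree(SHARED_COPY, STATE_DIR)

if __name__ == "__main__":
    snapshot()
```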

2003/11/18 Fabric Grid Interface: Milestones
- Fabric
  - Production: SPMA and NCM – July and November
  - Development: SMS, Monitoring, FT, HMS – October, November, September, October
  - On track for Q
- GRID Integration
  - August: 10 LXBATCH nodes integrated in LCG-1 (isolated from the main LXBATCH)
    - 5 in LCG-0, manually configured
    - Iterating on automation and productionising
  - October: 100 LXBATCH nodes…
  - December: full integration of LXBATCH – delayed until early 2004

2003/11/18 Fabric Grid Interface: Conclusions
- A production fabric has
  - Inertia … as a virtue!
  - Charted QoS
  - Scalability
  - Procedures and Manageability
- Cautious introduction of GRID services
  - Retain the qualities and add functionality!