LSF Universus
Robert Stober, Systems Engineer, Platform Computing, Inc.

Contents
- Overview
- How it Works
- Security
- Summary
- Q & A

Overview
- LSF Universus is an extension of LSF that provides a secure, transparent, one-way interface from an LSF cluster to any foreign cluster.
- A foreign cluster is a local or remote cluster managed by a non-LSF workload management system.
- Universus allows organizations to tie multiple clusters running various workload management systems together into a single logical cluster.
- Provides users with a single, secure interface to all of the computing resources.

Overview (architecture diagram; labels only): Web Portal, Job Scheduler, LSF Scheduler, MultiCluster, and LSF, PBS, SGE, and CCE clusters/desktops.

Benefits
- Users can access all computing resources without logging into them individually, using only LSF commands to submit, monitor, and control jobs
- Centralized scheduling, while sites retain local control
- Local users can continue using local resources during and after implementation
- Low-cost, fully supported solution
- As secure as you need it to be

Current Implementations
- Sandia National Laboratory
  - Used to link OpenPBS, PBS Pro, and LSF clusters in New Mexico and California
- Singapore National Grid (NG)
  - Used "to provide seamless access to NGPP resources"
  - Resources include LSF, PBS Pro, and N1GE clusters
  - Completed in Q
- Distributed European Infrastructure for Supercomputing Applications (DEISA)
  - Universus has been deployed on top of the DEISA native batch systems to enable users to easily access resource allocations on different remote platforms

Requirements & Assumptions
- Must support any foreign workload management system that provides a command-line interface and/or supports embedded directives in job scripts
- Must support Kerberos authentication
- No shared file system between the LSF master and the execution host
- All file transfers and job content must be encrypted
- LSF daemons are installed on the head node of the remote cluster

How it Works (diagram; includes an optional runtime output path)

Uses LSF Extension Facilities
- esub
  - Records job submission options
  - Ensures that the job submission record is copied to the execution host
- Job Starter (lifecycle sketched below)
  - Reads the job submission record and builds the command line to submit the job to the foreign workload management system
  - Monitors job status and reports the information back to LSF
  - Propagates kill, suspend, and resume signals to the remote job
  - Reads job stdout and stderr into the LSF output stream in real time
- lsrcp
  - Uses scp to securely transfer files to and from execution hosts
- eauth
  - Uses Kerberos to authenticate users, hosts, and services
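To make the Job Starter role concrete, here is a minimal, hypothetical sketch of its lifecycle against a PBS backend. It is not the actual Universus script (which is not reproduced in this transcript): qsub, qstat, and qdel are standard PBS commands, but the control flow and helper names below are assumptions for illustration.

```python
"""Hypothetical Job Starter lifecycle sketch for a PBS backend (illustration only)."""
import subprocess
import time

def submit(qsub_command):
    """Run the qsub command built from the recorded submission and return the job ID it prints."""
    out = subprocess.run(qsub_command, capture_output=True, text=True, check=True)
    return out.stdout.strip()

def is_finished(job_id):
    """Treat a non-zero qstat exit status as 'job no longer queued or running'."""
    return subprocess.run(["qstat", job_id], capture_output=True).returncode != 0

def kill(job_id):
    """Propagate an LSF kill request to the remote job."""
    subprocess.run(["qdel", job_id], check=False)

def monitor(job_id, poll_seconds=30):
    """Poll the foreign scheduler until the job leaves its queue."""
    while not is_finished(job_id):
        time.sleep(poll_seconds)
    # The real starter would now collect stdout/stderr and report the
    # final status back to LSF, as described in the bullets above.
```

In the same spirit, suspend and resume would map onto the corresponding PBS commands (qhold/qrls), and stdout/stderr would be streamed back into the LSF output files in real time, as the slide notes.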

LSF to PBS Option Map (table; an illustrative subset is sketched below)
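The original slide shows the option map as a table that is not reproduced in this transcript. As a hedged illustration only, the sketch below lists a few commonly corresponding bsub and qsub options; the actual Universus map, and any site-specific resource translations, may differ.

```python
# Illustrative bsub -> qsub correspondences (assumed; not the official Universus map).
LSF_TO_PBS_OPTIONS = {
    "-q": "-q",            # queue name
    "-J": "-N",            # job name
    "-o": "-o",            # standard output file
    "-e": "-e",            # standard error file
    "-W": "-l walltime=",  # run (wall-clock) limit
    "-n": "-l nodes=",     # processors requested (exact form is site-specific)
    "-b": "-a",            # earliest start time
}
```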

Security
- LSF Universus provides secure file transfers by replacing the standard LSF lsrcp program with a wrapper script that calls scp (a sketch follows below)
- Kerberos is supported: just use the Kerberized versions of LSF and ssh
- Setting LSF_RSH=ssh causes LSF to use ssh instead of rsh
- While the LSF Kerberos integration provides LSF daemon authentication, daemon communication is not encrypted
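As a rough illustration of the wrapper idea, the following hypothetical sketch mimics the lsrcp calling convention by delegating the copy to scp. The shipped Universus wrapper is a script that is not reproduced here; the argument handling below is simplified and assumed.

```python
#!/usr/bin/env python3
"""Hypothetical lsrcp replacement: delegate the file copy to scp (sketch only)."""
import subprocess
import sys

def main():
    if len(sys.argv) != 3:
        sys.exit("usage: lsrcp <source> <destination>")
    source, destination = sys.argv[1], sys.argv[2]
    # scp accepts local paths or host:path arguments just as rcp does.
    # -B: batch mode (no password prompts); -p: preserve times and modes.
    result = subprocess.run(["scp", "-B", "-p", source, destination])
    sys.exit(result.returncode)

if __name__ == "__main__":
    main()
```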

Secure File Transfers (diagram)

Summary
LSF Universus is:
- A proven, fully supported solution built using the standard LSF extension mechanisms
- A meta-scheduler that sits on top of, and schedules to, a varied and extensible set of workload management systems
- Cost-effective, because it only requires LSF to be installed on the head node of each resource
- Easy to implement, since sites retain control of their resources and the resources can be used during roll-out
- Open and extensible: the executables that comprise the solution are scripts that can be understood and modified to suit your needs
- As secure as you need it to be: standard, ssh, and Kerberos authentication are available, or you can extend the solution to support PKI
- Practical: Universus collects job resource information that can be used for metering and accounting
