The Grid - Multi-Domain Distributed Computing
Kai Rasmussen, Paul Ruggieri

Topic Overview  The Grid  Types  Virtual Organizations  Security  Real Examples  Grid Tools  Condor  Cactus  Cactus-G  Globus  OGSA

The Grid  What is a Grid system?  Highly heterogeneous set of resources that may or may not be maintained by multiple administrative domains  Early idea  Computational resources would be universally available as electric power

“A hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities” - Ian Foster  What constitutes a Grid?  Resources are distributed across sites and organizations, coordinated without being subject to centralized control  Uses standard, open protocols and interfaces  Delivers non-trivial qualities of service

Grid Types  Computations Grids  Resource pure CPU  Strength: Computational Intensive applications  Data Grids  Shared storage and data  Terabytes of storage space.  Sharing of data among collaborators  Fault Tolerance  Equipment Grids  Set of resources that surround shared equipments, such as a telescope

Virtual Organizations  Grids are Multi-domain  Resources are administrated by separate departments or institutions  All wish to maintain individual control  There is a cross site grouping of collaborators sharing resources  “Virtual Organization”

Virtual Organizations  Users of VO’s share a common goal and trust  Collection of resources, users and rules governing sharing  Highly controlled - What is Shared? Who is Sharing? How can resources be used?  One global domains acting over individual collaborating domains

Grid Security  Highly distributed nature  VOs spread over many security domains  Authentication  Proving identity  Authorization  Obtaining privileges  Confidentiality & Integrity  Identity and privileges can be trusted

Authentication  Certificate Authority (CA)  Entity that signs certificate that proves users identity  Certificate then used as credentials to use system  Typically several CAs to prevent single point of failure/attack  Globus Grid Security Infrastructure (GSI)  Globus’s Authentication component  Global security credential later mapped to local  Kerberos tickets or local username and password  Typically generate short-term proxy certificate with long-term certificate

Authentication  Certification Authority Coordination Group  Maintains a global infrastructure of trusted CA agents  CA must meet standards  Physically secure  Must validate identity with Registration Authorities using official documents or photographic identification  Private Keys must be minimum of 1020 Bits and have max 1 year life  28 approved CAs is European union

Security Issues  Delegation  User entrusts separate entity to perform task  Entity must be given certification and trusted to behave  Limit proxies strength  Endow proxy with specific purpose

Grid Projects  EGEE - Enabling Grids for eScience  70 sites in over 27 countries  Mostly European  40 Virtual Organizations  GENIUS Grid-Portal is used for submission  Individual collaborators use own middle-ware tools to group resources

LCG  Large Hadron Collider Computation Grid  Developed distributed systems needed to support computation and data needs of LHC physics experiments  EGEE Collaborator  100 Sites  Worlds largest Grid

Grid 2003  US effort  27 National sites  Processors, Simultaneous Jobs  Infrastructure for  Particle Physics Grid  Virtual Data Grid Laboratory  Develop Application Grid Laboratory - Grid3  Platform for experimental CS Research  Built on Virtual Data Toolkit  Collection of Globus, Condor and other middleware tools

TeraGrid  40 Teraflops of Computational Power  8 National Sites with strong backbone  Used for NSF sponsored High Performance Computing  Mapping the human arterial tree model  TeraShake - Earthquake simulation

Applications  Climate Monitoring + Simulation  Network Weather Service  Climate Data-Analysis Tool  Both run on the Earth System Grid running on Globus  MEANDER nowcast meteorology  Run on Hungarian Supergrid  ATLAS Challenge  Simulate high energy proton-proton collisions  Computational Science Simulations  Biology, Fluid Dynamics

Grid Tools  Many middleware implementations  Globus  Condor  Condor-G  Cactus-G  OGSA  Solves common Grid problems  Resource discovery/management/allocation  Security/Authentication

Condor  Initially developed in 1983 at University of Wisconsin  Pre-Grid tool  A Local Resource Management System  Allows creation of communities with distributed resources  Communities should grown naturally  Sharing as much or as little as they care too  Sounds like Virtual Organizations

Condor  Responsibilities  Job Management, Scheduling  Resource monitoring and management  Checkpointing and Migration  Utilize idle CPU  Cycle ‘Scavenge

Condor Pool  Full set of users and resources in community  Composed of three Entities  Agent  Finds resources and executes jobs  Resource  Advertise itself and how it can be used in pool  Matchmaker  Knows of all agents and resources  Puts together compatible pairs  Pool is defined by single matchmaker

Matchmaking  Problem of centralized Scheduling  Resources have multiple owners  Unique use requirements  Matchmaking finds balance between user and resource needs  ClassAds  Agents advertise requirements  Resources advertise how it can be used

Matchmaking  Matchmaker scans all known ClassAds  Creates matching pairs of agents and resources  Informs both parties  Individually responsible to negotiate job and initiating execution of job  Separation of matching and claiming  Matchmaker unaware of complicated allocation  Stale information may exist. Resource can deny match

Condor Flocking  Linking condor pools necessary for collaboration  Sharing of resources beyond the organizational level  Individuals belonging to multiple communities  Gateway Flocking  Entire communities are linked  Direct Flocking  Individual collaborators belong to many pools

Gateway Flocking  Gateway entity serves as a singular point of access for cross pool communication  Matchmakers talk to Gateways  Gateways talk to Gateways  Transparent to user  Organizational level sharing  Powerful, but difficult to setup and maintain

Gateway Flocking

Direct Flocking  Agents report to multiple matchmakers  Individual collaboration  Natural idea for users  Less powerful but simpler to build and deploy  Eventually used in favor Gateway Flocking

Direct Flocking

Cactus  General-purpose, open-source parallel computation framework  Developed for numerical solution to Einstein’s equation  Two main components flesh and thorns  Flesh – central core  Thorns – application modules  Provides simple abstract API  Hides MPI parallel driver, I/O (thorns)

Cactus-G  “Grid-enabled” Cactus  Combines Cactus and MPICH-G2 (more later)  Layered approach  Application thorns  Grid-aware infrastructure thorns  Grid-enabled communication library (MPICH-G2 in this case)

Globus  Condor  Pre-Grid tool applied to Grid Systems  Multi-domain possible but limited  No security. Focus primarily on resource management  Globus  Set of Grid specific tools  Extendable and Hierarchical

The Toolkit  Globus Toolkit  Components for basic security, resource management, etc  Well defined interfaces - “Hour-glass” architecture  Local services sit behind API  Global services built on top of these local services  Interfaces useful to manage heterogeneity  Information Service integral component  Information-rich environment needed

Globus Services

Resource Management  Globus Resource Allocation Manager (GRAM)  Responsible for set of local resources  Single domain  Implemented with set a local RM tools  Condor, NQE, Fork, Easy-LL, etc…  Resource requests expressed in Resource Specification Language (RSL

Resource Broker  Manages RSL requests  Uses Information services to discover GRAMS  Transforms abstract RSLs into more specific requirements  Sends allocation requests to appropriate GRAM

Information Service  Grid always in flux  Information rich system produces information users find useful  Enhances flexibility and performance  Necessity for administration  Globus Metacomputing Directory Service (MDS)  Stores and makes accessible Grid information  Lightweight Directory Access Protocol (LDAP)  Extensible representation for information  Stores component information in directory information tree

Security  Local Heterogeneity  Resources operated in multiple security domains  All use different authentication techniques  N-Way authentication  Job may be any number of processes on any number of resources  One logical entity. User should only authenticate once.

Security  Globus Security Infrastructure (GSI)  Modular design constructed on top of local services  Solves local heterogeneity  Globus Identity  Mapped into local user identities by local GSI  Allows for n-way authorization

OGSA  Open Grid Services Architecture  Defines a Grid Service  Provides standard interface for naming, creating, discovering a Grid Service  Location Transparent  Globus Toolkit  GRAM – resource allocation/management  MDS-2 – information discovery  GSI – authentication (single sign-on)  Web services  Widely used  Language/system independent

OGSA – Grid Service Interface

OGSA – VO Structure

Condor-G  Hybrid Condor-Globus System  Local Condor agent (Condor-G)  Communicates with Globus GRAM, MDS, GSI, etc  Optimized Globus’s GRAM to work with Condor better

Specific Testbed  Grid2003  Organized into 6 VOs (one for each application)  At each VO site, middleware installed with grid certificate databases  GSI, GRAM, and GridFTP used from Globus  MDS  MonALISA  Agent-based monitoring used in conjunction with MDS

MPICH-G2: A Grid-Enabled Implementation of the Message Passing Interface Nicholas Karonis, Brian Toonen, Ian Foster

Abstract  Grid Enabled MPI implementation  Extends MPICH  Utilizes Globus Toolkit  Authentication, Authorization, Resource Allocation, Executable Staging, I/O, Process management creation and control  Hide/Expose critical aspects of heterogeneous environment

The Problem  Grids difficult to program for…  heterogeneous, highly distributed  Build on existing MPI API  MPICH specifically  Can we implement MPI constructs in a highly heterogeneous environment efficiently and transparently?  Yes, use Globus!  Can we also allow users to manage heterogeneity?  Yes, existing MPI Communicator Construct!

MPICH-G2  Global Security Infrastructure (GSI)  Single sign-on authentication  Monitoring and Discovery Service (MDS)  Select nodes to execute on  Resource Specification Language  Generated by mpirun  Specifies job resource requirements  Dynamically-Updated Request Online Coallocator (DUROC)

MPICH-G2 Flow Diagram

MPICH-G2 Improvements  Replaces MPICH-G  Replace use of Nexus (Globus) for all communication with optimized code  Increased Bandwidth  Cutout extra layer (Nexus)  Reduce intra-machine vendor MPI messaging latency  Eliminate unnecessary polling based on source rank info (for Recv)  Specified, Specified-pending, multimethod (more later)  Only poll TCP (expensive) when necessary (ie using TCP not vendor MPI)

MPICH-G2 Improvements 2  More efficient use of sockets  Uses one socket for both directions  Multilevel topology-aware collective operations  Collective operations originally implemented assuming equidistance  Not likely in Grid scenario

App Heterogeneity Management  Topology Discovery  Need a method of discovering topology to minimize expensive transfers  inter-site communication vs. intra-machine communication  Use the existing MPI communicator construct  Associate attributes with communicators  Topology depths and colors  Allow MPI developers to create communicators which group processes topologically
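The depths-and-colors idea above can be illustrated without MPI: every rank carries one color per topology level (say, site then machine), and ranks sharing a color at a level form a group, much as repeated `MPI_Comm_split` calls would. The topology values below are invented; this is a plain-Python sketch of the grouping, not MPICH-G2's actual implementation.

```python
# Sketch of topology-aware grouping: ranks sharing a colour at a given
# level belong to the same sub-communicator at that level.
from collections import defaultdict

def split_by_level(process_colors, level):
    """process_colors[rank] is a tuple of colours, one per topology level.
    Returns {colour: [ranks]} for the requested level."""
    groups = defaultdict(list)
    for rank, colors in enumerate(process_colors):
        groups[colors[level]].append(rank)
    return dict(groups)

# Hypothetical 4-process job: colours are (site, machine).
topology = [(0, 0), (0, 0), (0, 1), (1, 2)]
print(split_by_level(topology, 0))  # {0: [0, 1, 2], 1: [3]} -- site level
print(split_by_level(topology, 1))  # {0: [0, 1], 1: [2], 2: [3]} -- machine level
```

An application can use such groupings to keep heavy collective traffic inside a machine (level 1) and minimize the expensive inter-site exchanges (level 0).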

Example MPICH-G2 App

Performance Groupings  Specified  MPI_Recv explicitly specifies process on same machine  No outstanding asynchronous operations  Explicitly call vendor MPI  Specified-pending  MPI_Recv explicitly specifies process on same machine  Outstanding recv requests on same machine  Forced to continuously poll vendor MPI  Multimethod  MPI_Recv source rank is MPI_ANY_SOURCE  OR outstanding recv requests which may require TCP  Forced to continuously poll vendor MPI and TCP

Vendor MPI Results  Increased performance compared to MPICH-G  Relatively close performance to straight vendor MPI

Vendor MPI Results

TCP/IP Results  Similar results as Vendor MPI (less interesting)  Authors explicitly say they did not attempt to modify the TCP code

TCP/IP Results

Conclusions  Good performance  Improved performance opposed to previous version  “good enough” performance to justify use  Eases transition of MPI applications to the context of a Grid  Just works  Provides developer with a relatively simply means of writing “smart” apps which are aware of their topology

P-GRADE Portal

MTA SZTAKI  Computer and Automation Research Institute of the Hungarian Academy of Sciences  Laboratory of Parallel and Distributed Computing  Peter Kacsuk  Joszef Patvarczki  HunGrid  Member of both SEE-Grid and EGEE

Two Grid Problems  Middleware tools build together into a Grid  Too many complex parts  Confusing for users with little experience  Mostly research scientists  PVM and MPI allow for Parallel execution  Executed within a Globus or Condor site shows good performance  Performance decreases when executed in multiple sites

P-GRADE Portal  A Web based Portal for accessing Grid  High level tools hide complexity of middleware  Can be accessed anywhere  Workflow solution  Complex problems are broken into several parts treated as single framework  Executed as an acyclic graph  Parallelism at two levels  Independent branches run on several grid sites  Individual nodes can be parallel programs (MPI or PVM)

Portal  Fully functional; built upon middleware tools  Grid Certificate management  Setting up Grid environment  Creation and modification of workflow apps  Management and parallel execution of workflow apps on grid resources  Visualization of workflow progress

Grid Certificate  Security done through Globus GSI  Connect to Proxy server; download Certificate  Monitor status

Resource Management  Use Globus tools to attach jobs to resources  Two Strategies  Static Allocation  Connect Directly to GRAM Servers  Dynamic Allocation  Connect to MDS service  Allocate through Grid resource broker

Workflow Creation & Monitoring  P-GRADE  Java app for creating parallel workflows  Directed input and output files

Parameter Study  Singular job run under varying input parameters  Outputs later compared against each other  Logical Grid Application  Each job independent and can be run in parallel

P-GRADE Portal w/ PStudy  Adapted the Portal to create and manage parametric studies  New workflow editor  Creation of parameterized input files  Manage parameter values  Workflow management  Submit workflows by parameter ranges  Compare outputs  Monitor individual job status

Pstudy Manager

Visualization

PGRADE Demo