DRM/Computational Grids Bill DeSalvo August 18, 2004.


Computational Grids

© Platform Computing Inc
Definitions
- Cluster: an arbitrary collection of distributed IT resources organized as a management domain; a single system environment
- Grid: transparent, secure, coordinated resource sharing across one or more sites; a cluster of clusters

Grid Drivers
Optimize Capabilities (Source: IDC)
- Virtual Organizations
  - New infrastructure enables new org structures
  - Collaborative computing
- New Class of Capabilities
  - Potential to solve very large problems
- New Business Models
  - Outsourcing of computing tasks
  - Utility computing
  - Peak load support
Optimize Infrastructure (Source: IDC)
- Resource Optimization
  - Maximize return on capital equipment
- Resource Access
  - Provide mechanisms to share resources across organizational boundaries
- Cost Sharing
  - Allow multiple groups to contribute resources to a project while maintaining control of those resources
- Improved Management Model
  - Incorporate multiple systems into an organization under a single unified systems model

Ian Foster's Three-Point Grid Checklist
A grid:
- Coordinates resources that are not subject to centralized control
  - One or more (virtual) organizations
  - Geographic distribution of users/resources is common
- Uses standard, open, general-purpose protocols and interfaces
- Delivers nontrivial qualities of service
  - SLAs vs. policies vs. QoS
  - Translates business objectives into IT objectives
  - Enables effective utilization, resource aggregation, and remote access to specialized resources
Clusters are NOT grids! A cluster is a local-area, logical arrangement of independent entities that collectively provide a service.

Virtual Organizations

Evolution of the Grid

Everyone's Aware of "The Grid"

Platform Grid Competencies
- Resource Leasing
- Job Forwarding
- Account Mapping
- Grid Fairshare Scheduling
- Advance Reservations
- User Authentication
- Reliable Data Transfer
An outgrowth of Platform's experience in Grid and Distributed Computing

Platform MultiCluster

Three-Point Grid Checklist & Platform MultiCluster
- Coordinates resources that are not subject to centralized control
  - "Single" organization ("Enterprise Grid")
  - Geographic distribution of users/resources is common
- Proprietary protocols and interfaces
- Delivers nontrivial qualities of service
  - SLAs vs. policies: common queues, advance reservation, resource leasing, fairshare, SLAs
  - Translates business objectives into IT objectives
  - Enables effective utilization, resource aggregation, and remote access to specialized resources

Why MultiCluster
Global Sharing, Local Ownership ("politics of the grid")
- Providing: Increased Capacity, Increased Capability, Increased Scalability (Growing Computational Needs)
- While maintaining: Local Autonomy (Dept A, Dept B, Dept C, Dept D)

Job Forwarding Model
- "HPC Center" configuration
- Enhanced transparency
  - FCFS guarantee, pending reason support, chunk jobs, host type/queue status aware scheduling, checkpoint/migration
[Diagram: Cluster A, Cluster B, Cluster C, HPC Center]

Job Forwarding Model
[Diagram: Compute Servers at Site A with a send queue; Compute Servers at Site B with a receive queue]
You submit. We do:
- Job transfer
- Data staging
- Account mapping
- Accounting
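
The send queue / receive queue hand-off with account mapping can be sketched in a few lines of Python. This is only an illustration of the idea on the slide, not Platform LSF MultiCluster code: the class names (`Job`, `SendQueue`, `ReceiveQueue`), the `account_map` layout, and the rule that an unmapped user is refused are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: int
    user: str
    command: str


@dataclass
class ReceiveQueue:
    """Hypothetical receive queue at the executing site."""
    site: str
    account_map: dict  # (sending site, sending user) -> local account
    jobs: list = field(default_factory=list)

    def accept(self, job, from_site):
        local_user = self.account_map.get((from_site, job.user))
        if local_user is None:
            return False  # assumption: no mapping means the job is refused
        # The job runs under the mapped local account, not the submitter's name.
        self.jobs.append(Job(job.job_id, local_user, job.command))
        return True


@dataclass
class SendQueue:
    """Hypothetical send queue at the submitting site."""
    site: str
    pending: list = field(default_factory=list)

    def forward(self, remote):
        """Try to forward every pending job; keep the ones the remote site refuses."""
        kept = []
        for job in self.pending:
            if not remote.accept(job, self.site):
                kept.append(job)
        self.pending = kept
```

With a mapping for `("siteA", "alice")` only, a job from `alice` lands in Site B's queue under her mapped account, while a job from an unmapped user stays pending at Site A.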

Resource Leasing Model
- Accelerating Enterprise Grid adoption
- Single system image, ease of administration, scalability
- Enables fairshare, preemption, pending reason support, chunk jobs, advance reservation, interactive jobs, parallel jobs, … across clusters

Configuration
[Diagram: Compute Servers at Site A; Compute Servers at Site B]
    Begin Queue
    QUEUE=lease
    HOSTS=
    End Queue
    Begin HostExport
    PER_HOST = hopper curie
    DISTRIBUTION = [siteA, 10]
    SLOTS = 10
    End HostExport

Common Resource Leases
[Three plots of utilization against time t, one per lease trigger]
- By Admin: lease 528 CPUs to Site A. Scenario: Site B project completes.
- By Load: IF (load < threshold(X)) lease 528 CPUs to Site A, ELSE reclaim. Scenario: Site B hits an extended low-utilization period, then utilization goes up.
- By User Req: lease based on an advance reservation request. Scenario: Site B is always loaded.
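
The "By Load" rule is the only one of the three stated as a condition, so it is easy to write down. The following is a minimal sketch under assumptions: `LeaseState`, `apply_load_policy`, and the idea of representing load as a fraction between 0 and 1 are invented for the example, and only the 528-CPU figure and the lease/reclaim branches come from the slide.

```python
from dataclasses import dataclass


@dataclass
class LeaseState:
    leased_cpus: int = 0  # CPUs currently leased from Site B to Site A


def apply_load_policy(state, load, threshold, lease_size=528):
    """Lease when the provider's load is below the threshold, otherwise reclaim."""
    if load < threshold:
        state.leased_cpus = lease_size
        return "lease"
    state.leased_cpus = 0
    return "reclaim"
```

Calling `apply_load_policy` once per monitoring interval reproduces the plotted scenario: the lease is granted during Site B's low-utilization period and reclaimed when its load rises again.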

Advance Reservation
[Diagram: nodes dedicated to User A for a time duration]
- Reserve nodes for exclusive access for a user or user group
- Ensures critical work is done without interference
- Useful for benchmarking or system maintenance
- One-time and recurring reservations
- Administrator defines reservations for users
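
Exclusive access means that two reservations may not share a host during overlapping time windows. The sketch below shows that check in Python; the `Reservation` type, the integer time units, and the function names are assumptions for illustration and do not reflect how Platform LSF stores reservations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reservation:
    user: str
    hosts: frozenset
    start: int  # inclusive, in arbitrary time units (assumption)
    end: int    # exclusive


def conflicts(a, b):
    """Two reservations conflict if they share a host and their windows overlap."""
    return bool(a.hosts & b.hosts) and a.start < b.end and b.start < a.end


def try_reserve(existing, new):
    """Add `new` to `existing` only if exclusive access can be guaranteed."""
    if new.start >= new.end:
        raise ValueError("reservation must end after it starts")
    if any(conflicts(r, new) for r in existing):
        return False
    existing.append(new)
    return True
```

A recurring reservation could be modelled as a series of such one-time windows, each checked the same way.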

Use Cases

DoD HPCMP Grid
- DoD HPCMP challenge
  - Initiative to share HPCMP's resources easily and transparently across centers: SMDC, TACOM, NRL, NAVO and WSMR, …
  - Build a meta-queuing system to integrate the centers
- Primary benefit
  - The capability to submit a job to a single, common queue, from which it is sent to the best available computer in the Grid

DoD HPCMO Solution
DoD HPCMP Grid requirements:
- Fire and forget
- Full Kerberos 5 support
- Reliable, secure file transfer
Solution:
- Platform LSF MultiCluster
  - Resource reservation protocol
  - Transparent job control
  - Accounting
- Client-server interactions Kerberized
  - Ticket forwarding/renewal
  - Multi-realm support
  - Account mapping
- Platform FTA
  - Kerberized
  - Fault tolerant

DoD HPCMP Grid
[Diagram: sites connected over DREN]
- NAVO: Sun E10K, 64 PEs
- AEDC: Origin PEs
- NRL: Origin PEs
- TACOM/TARDEC: Onyx2, 32 PEs
- RTTC: Origin PEs
- SMDC: Origin PEs
- SSCSD: HP Superdome, 44 PEs
- AFFTC: Origin PEs
- WSMR: Origin PEs
Grid challenges:
- Logistics / coordination
- People
- User accounts
- Geographic locations
- Site configurations
- Time zones / schedules
- Network security / firewalls
- Introduction of batch queuing systems to environments
- Training & skills transfer

SHARCNET External Grids/Portal

SHARCNET
- The network is no longer "passive plumbing"
  - A true resource that can be managed in real time, with guaranteed QoS
- Potential projects
  - -based resource leasing, advance reservation
  - IP-based topology awareness
- Enables new classes of Grid applications
- Operational results
  - Real-time, remote visualization
  - Virtual storage
  - Persistent/pervasive
  - On demand

The Globus Toolkit V2

Sharing pains… physical login
[Diagram: Compute Servers at Site A; Compute Servers at Site B]
You have to:
- Get and maintain multiple accounts
- Use different batch systems
- Live with no consolidated accounting
- Move files manually

The Globus Toolkit™ Version 2 (GT2)
- A software toolkit that addresses key technical problems in the development of Grid-enabled tools, services, and applications
  - Offers a modular "bag of technologies"
  - Enables incremental development of grid-enabled tools and applications
  - Implements standard Grid protocols and APIs
  - Made available under a liberal Open Source license
- Provided by The Globus Alliance

Globus Toolkit: Evaluation (+)
- Good technical solutions for key problems, e.g.
  - Authentication and authorization
  - Resource discovery and monitoring
  - Reliable remote service invocation
  - High-performance remote data access
- This and good engineering are enabling progress
  - Good quality reference implementation, multi-language support, interfaces to many systems, large user base, industrial support
  - Growing community code base built on tools

Globus Toolkit: Evaluation (-)
- Protocol deficiencies, e.g.
  - Heterogeneous basis: HTTP, LDAP, FTP
  - No standard means of invocation, notification, error propagation, authorization, termination, …
- Significant missing functionality, e.g.
  - Databases, sensors, instruments, workflow, …
  - Virtualization of end systems (hosting envs.)
- Little work on total system properties, e.g.
  - Dependability, end-to-end QoS, …
  - Reasoning about system properties
- Scalability

LSF MC & Globus
- MC
  - Transparent, dynamic, intelligent, scalable inter-cluster sharing
  - User does not need to know about clusters: total transparency
  - MC dynamically chooses the "best cluster" to run the job
- Globus
  - User chooses which cluster to submit the job to via the Globus interface
  - Static, non-intelligent sharing
  - Lacks transparency
[Diagram: Cluster A, Cluster B, Cluster C; Globus; inter-cluster protocols]
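
The contrast comes down to who picks the target cluster. The slide does not say how MultiCluster ranks clusters, so the sketch below simply assumes that "best" means the eligible cluster with the most free slots; `ClusterInfo`, `choose_best_cluster`, and the host-type filter are hypothetical names made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterInfo:
    name: str
    free_slots: int
    host_types: frozenset


def choose_best_cluster(clusters, slots_needed, host_type):
    """Pick a cluster for the user (assumed criterion: most free slots among eligible ones)."""
    eligible = [
        c for c in clusters
        if host_type in c.host_types and c.free_slots >= slots_needed
    ]
    if not eligible:
        return None  # nothing can run the job right now
    return max(eligible, key=lambda c: c.free_slots).name
```

In the user-chosen model described for Globus, the equivalent step is the user naming the cluster explicitly, so no such selection function runs on the user's behalf.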

Globus Toolkit 3 (OGSA)

Prelude to OGSA: An Analogy
Every product an island unto itself

Prelude to OGSA: An Analogy
Differentiated products, integrated stack

Open Grid Services Architecture (OGSA)
- Next-generation architecture
  - Consequence of technology refresh (i.e., refactoring the Globus Toolkit) and research into Autonomic Computing
- Convergence of Grid Computing and Web Services
  - Globus Toolkit
    - Access services, e.g., CLIs, GUIs, portals and CoGs
    - Resource and allocation management
    - Monitoring and discovery services, e.g., sensing and indexing
    - Data management services, e.g., file transfer, replica management, etc.
    - Security, e.g., the Grid Security Infrastructure
  - Initially SOAP, WSDL and WS-Inspection
- The Global Grid Forum (GGF) serves as the standards authority
- Two layers
  - Core Grid platform: OGSA platform interfaces and models
  - Core Grid infrastructure: Open Grid Services Infrastructure (OGSI)

Importance of OGSA to Customers
- Grid-enabled Web Services are transforming IT
  - Analyst feedback (e.g., Gartner)
  - Customer experience
- Customers demand standards-compliant products, solutions and services. Why?
  - Vendors guilty of over-promising and under-delivering
  - Avoid single-vendor lock-in
    - Proprietary implementations based on open standards
  - Seek multi-vendor deliverables
    - Framework for partner collaboration
  - Demanding professionalism in software engineering
    - Seek to be engaged in the process

Platform Embraces Open Standards
- Platform has been developing software for over 11 years
  - Standards efforts are recent activities
  - Existing implementations are proprietary
- Platform is an NPi founder
  - NPi merged with GGF (4/02)
  - NPi being leveraged in OGSA
- Platform is committed to open standards
  - Proprietary implementations based on open standards
- Platform is experienced in the Open Source arena
  - Offering Linux solutions for over 6 years
  - Offering Globus Toolkit solutions for about 2 years
  - Source code available for components of Platform LSF

Platform and Globus

Platform Globus Toolkit
Platform enhancements:
- CSF Plus: advanced CSF-based metascheduler
  - Job persistence; enhanced scalability (6x GT 3)
  - Cluster load balancing and host type matching (LSF only)
- Platform Globus Toolkit
  - One-step installation
Open source:
- Globus Toolkit 3
- Community Scheduler Framework (CSF)
  - Round-robin job scheduling
  - Advance reservation booking, query, and control
  - Reservation-based scheduling
  - Job throttling for increased reliability
  - Connectors for third-party workload management systems (e.g., SGE, PBS)
  - Native command-line interface support
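
Round-robin scheduling combined with job throttling can be pictured as follows. This is a sketch under assumptions, not CSF code: the function name `dispatch_round_robin`, the `max_in_flight` parameter, and the choice to throttle by capping the number of jobs dispatched in one pass are all invented for illustration.

```python
from collections import deque
from itertools import cycle


def dispatch_round_robin(jobs, clusters, max_in_flight):
    """Assign jobs to clusters in rotation; hold back jobs beyond the throttle limit."""
    if not clusters:
        raise ValueError("at least one cluster is required")
    pending = deque(jobs)
    rotation = cycle(clusters)  # A, B, A, B, ...
    assignments = []
    while pending and len(assignments) < max_in_flight:
        assignments.append((pending.popleft(), next(rotation)))
    # Jobs still in `pending` wait for a later scheduling pass.
    return assignments, list(pending)
```

Round robin ignores cluster load entirely, which is why the slide lists cluster load balancing and host type matching as separate CSF Plus features.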

CSF

What is CSF?
- CSF (Community Scheduler Framework)
- Not a Platform product
- Contributed as the industry's first open-source meta-scheduler enhancement to Globus Toolkit V3.x
- Developed with the latest version of OGSI, the grid guideline being developed with the Global Grid Forum
- Open-source "meta-scheduler" framework
  - Provides basic protocols and interfaces to help resources work together in heterogeneous environments
  - Enables global access and maintains local control of resources

Key Benefits of OGSA Compliance
- Future-proof and protect grid investment using standards-based solutions
- Standardized approach to access Platform LSF
- Interoperate with third-party systems

Metaschedulers
- A scheduler that coordinates communication between heterogeneous schedulers that operate at a local level
- Enables global access and coordination while maintaining local control and ownership of resources
- Future: possible to schedule not only workload execution but also storage, network bandwidth, etc.

CSF Grid Services
- Job Service: creates, monitors and controls compute jobs
- Reservation Service: guarantees resources are available for running a job
- Queueing Service: lets administrators customize and define scheduling policies at the VO level and/or at the level of the individual resource managers
- RM Adaptor Service: provides a Grid service interface that bridges the Grid service protocol and resource managers (LSF, PBS, SGE, Condor and other RMs)
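
The bridging role of the RM Adaptor Service follows the familiar adapter pattern: one interface facing the grid side, one implementation per resource manager. The Python sketch below illustrates that pattern only. CSF itself is built on OGSI grid services, and every name here (`RMAdapter`, `InMemoryAdapter`, `JobService`, the `"PENDING"` state) is hypothetical; the in-memory adapter stands in for a real LSF, PBS, SGE or Condor back end.

```python
from abc import ABC, abstractmethod


class RMAdapter(ABC):
    """Uniform interface that the job service uses for any resource manager."""

    @abstractmethod
    def submit(self, command: str) -> str:
        """Submit a command and return a job identifier."""

    @abstractmethod
    def status(self, job_id: str) -> str:
        """Return the current state of a job."""


class InMemoryAdapter(RMAdapter):
    """Stand-in for a real resource manager; it only records jobs."""

    def __init__(self, name):
        self.name = name
        self._jobs = {}

    def submit(self, command):
        job_id = f"{self.name}-{len(self._jobs) + 1}"
        self._jobs[job_id] = "PENDING"
        return job_id

    def status(self, job_id):
        return self._jobs[job_id]


class JobService:
    """Creates and monitors jobs without knowing which RM sits behind the adapter."""

    def __init__(self, adapters):
        self._adapters = dict(adapters)

    def create_job(self, rm_name, command):
        return self._adapters[rm_name].submit(command)

    def job_status(self, rm_name, job_id):
        return self._adapters[rm_name].status(job_id)
```

Supporting another resource manager then means writing one more `RMAdapter` subclass, leaving the job service untouched.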

CSF Architecture
[Architecture diagram. Components shown: Platform LSF user; Globus Toolkit user; LSF meta-scheduler plugin; Grid Service Hosting Environment; Meta-Scheduler (Job Service, Reservation Service, Queuing Service); Global Information Service; RIPS and GRAM for SGE; RIPS and GRAM for PBS; RIPS and RM Adapter for Platform LSF; third-party workload management system]
RIPS = Resource Information Provider Services
GRAM = Grid Resource Allocation & Management

[Chart: Profile (low to high) against Awareness/Knowledge, Liking/Preference/Conviction, and Commitment; plotted: Grid Canada, OMII]

What are the Multi-Domain Tools and What Do They Do?
- Platform MultiCluster
  - Enables global access and coordination while maintaining local control and ownership of resources
  - Joins geographically dispersed clusters
  - Production-quality solution to build enterprise grids
  - Platform proprietary solution that is standards-based and OGSA compliant
- Globus Toolkit
  - Tools to join geographically dispersed clusters
  - A bunch of "bricks" to build grids (that's why it's called a toolkit)
  - Users have to specify which cluster they would like their job to be sent to: not transparent
  - Open source solution
  - Platform adds commercial support: documentation, training, tech support, professional services

Data Grids

Data Grid Spectrum
[Diagram: update frequency (No Updates, Periodic Updates, Frequent Updates) against sharing scope (User private, Intra-project sharing, Inter-project sharing)]
- Grids placed on the spectrum: GOV/EDU Grid, Life Sciences Grid, Auto Grid, HEP Grid, Aero Grid, EDA Grid
- Capabilities shown:
  - Partial replication
  - Efficient & reliable file transfer
  - Intelligent transfer
  - Workload-directed caching
  - Cache-aware scheduling
  - Data pipeline
  - Efficient data sync
  - Intelligent data scheduling

Data Grid Spectrum
[Diagram: update frequency (No Updates, Periodic Updates, Frequent Updates) against sharing scope (User private, Intra-project sharing, Inter-project sharing)]
- Technologies placed on the spectrum: GridFTP, Replica Catalog, FTA, DataGrid

Summary

Summary
- OGSA applies to e-Science and e-Business
  - Rich architectural framework
  - Existing, emerging and planned specifications
    - Ultimately resulting in Open Standards
  - Existing, emerging and planned implementations
- The Community Scheduler Framework
  - Standards-based
  - Choice of implementations
  - Ushers existing grids towards OGSA compliance
  - Spectrum of potential use cases

Thank you.