1 Resource Management of Large-Scale Applications on a Grid Laukik Chitnis and Sanjay Ranka (with Paul Avery, Jang-uk In and Rick Cavanaugh) Department of CISE, University of Florida, Gainesville

2 Overview
High End Grid Applications and Infrastructure at the University of Florida
Resource Management for Grids
Sphinx Middleware for Resource Provisioning
Grid Monitoring for better meta-scheduling
Provisioning Algorithm Research for multi-core and grid environments

3 The Evolution of High-End Applications (and their system characteristics)
Mainframe Applications: central mainframes
Compute Intensive Applications: large clusters, supercomputers
Data Intensive Applications: geographically distributed datasets, high speed storage, gigabit networks

4 Some Representative Applications HEP, Medicine, Astronomy, Distributed Data Mining

5 Representative Application: High Energy Physics (figure: petabytes of data distributed across many countries)

6 Representative Application: Tele-Radiation Therapy (RCET Center for Radiation Oncology)

7 Representative Application: Distributed Intrusion Detection
NSF ITR Project: Middleware for Distributed Data Mining (PI: Ranka, joint with Kumar and Grossman)
(figure: the application sits on top of data mining and scheduling services, data management services and data transport services)

8 Grid Infrastructure Florida Lambda Rail and UF

9 Campus Grid (University of Florida)
NSF Major Research Instrumentation Project (PI: Ranka, Avery et al.)
20 Gigabit/sec Network
20+ Terabytes
2-3 Teraflops
10 Scientific and Engineering Applications
Infiniband based Cluster and Gigabit Ethernet based Cluster

10 Grid Services The software part of the infrastructure!

11 Services offered in a Grid
Resource Management Services
Data Management Services
Monitoring and Information Services
Security Services
Note that all the other services use the security services

12 Resource Management Services
Provide a uniform, standard interface to remote resources, including CPU, storage and bandwidth
The main component is the remote job manager
Example: GRAM (Globus Resource Allocation Manager)
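The uniform-interface idea can be sketched in a few lines. This is a hypothetical illustration of the pattern, not the Globus GRAM API: a gatekeeper object exposes a single submit call and delegates to whichever local scheduler the site actually runs.

```python
# Minimal sketch (not the Globus API): a GRAM-style gatekeeper hides the
# site's local scheduler behind one submit/status interface.  Adapter names
# and return values are illustrative assumptions.
from abc import ABC, abstractmethod

class LocalScheduler(ABC):
    @abstractmethod
    def submit(self, executable: str, args: list) -> str: ...
    @abstractmethod
    def status(self, job_id: str) -> str: ...

class CondorAdapter(LocalScheduler):
    def submit(self, executable, args):
        return f"condor:{executable}"        # would invoke condor_submit here
    def status(self, job_id):
        return "RUNNING"                     # would invoke condor_q here

class PBSAdapter(LocalScheduler):
    def submit(self, executable, args):
        return f"pbs:{executable}"           # would invoke qsub here
    def status(self, job_id):
        return "QUEUED"                      # would invoke qstat here

class Gatekeeper:
    """One uniform entry point per site, whatever scheduler runs underneath."""
    def __init__(self, scheduler: LocalScheduler):
        self.scheduler = scheduler
    def run(self, executable, args=()):
        return self.scheduler.submit(executable, list(args))

site = Gatekeeper(PBSAdapter())              # another site might use Condor
print(site.run("/bin/hostname"))
```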

13 Resource Management on a Grid (figure: a user submits jobs to the grid; each of Sites 1..n fronts its own local scheduler, e.g. Condor, PBS, LSF or fork, behind a GRAM gatekeeper; note the different local schedulers)

14 Scheduling your Application

15 Scheduling your Application
An application can be run on a grid site as a job
The modules in the grid architecture (such as GRAM) give your job uniform access to the grid sites
But most applications can be "parallelized", and these separate parts can be scheduled to run simultaneously on different sites, thus utilizing the power of the grid

16 Modeling an Application Workflow
Many workflows can be modeled as a Directed Acyclic Graph (DAG)
The amount of resource required (in units of time) is known to a degree of certainty
There is a small probability of failure in execution (in a grid environment this can happen because resources are no longer available)
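As a rough illustration of this model (task names, runtimes and failure probabilities are invented), a DAG can be represented with plain dictionaries and used to estimate the critical-path length and the chance that every task succeeds:

```python
# Minimal sketch of the workflow model above: a DAG whose tasks carry an
# estimated runtime (seconds) and a small failure probability.
from collections import defaultdict

runtime = {"stage_in": 60, "simulate": 3600, "reconstruct": 1800, "merge": 300}
p_fail  = {"stage_in": 0.01, "simulate": 0.05, "reconstruct": 0.03, "merge": 0.01}
edges   = [("stage_in", "simulate"), ("stage_in", "reconstruct"),
           ("simulate", "merge"), ("reconstruct", "merge")]

children = defaultdict(list)
for u, v in edges:
    children[u].append(v)

def longest_finish(task, memo={}):
    """Length of the longest runtime path starting at `task`."""
    if task not in memo:
        memo[task] = runtime[task] + max(
            (longest_finish(c) for c in children[task]), default=0)
    return memo[task]

# critical-path lower bound on makespan, and P(no task fails)
critical_path = max(longest_finish(t) for t in runtime)
p_all_succeed = 1.0
for t in runtime:
    p_all_succeed *= (1 - p_fail[t])

print(f"critical path ~ {critical_path} s, P(no failure) ~ {p_all_succeed:.2f}")
```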

17 Workflow Resource Provisioning
Executing multiple workflows over distributed and adaptive (faulty) resources while managing policies
Applications: large, data intensive, time constraints
Resources: distributed, multi-core, heterogeneous, faulty
Policies: priority, access control, precedence, quota, multiple ownership

18 A Real Life Example from High Energy Physics
Merge two grids into a single multi-VO "Inter-Grid"
How to ensure that neither VO is harmed? that both VOs actually benefit? that there are answers to questions like: "With what probability will my job be scheduled and complete before my conference deadline?"
Clear need for a scheduling middleware!
(figure: sites from both VOs, e.g. FNAL, Rice, UI, MIT, UCSD, UF, UW, Caltech, UM, UTA, ANL, IU, UC, LBL, SMU, OU, BU, BNL)

19-20 Typical scenario (figure: a VDT Client faces several VDT Servers at different grid sites and must decide where to send its jobs)

21 Some Requirements for Effective Grid Scheduling
Information requirements:
Past and future dependencies of the application
Persistent storage of workflows
Resource usage estimation
Policies (expected to vary slowly over time)
Global views of job descriptions
Request tracking and usage statistics (state information important)
Resource properties and status (expected to vary slowly with time)
Grid weather (latency of measurement important)
Replica management
System requirements:
Distributed, fault-tolerant scheduling
Customisability
Interoperability with other scheduling systems
Quality of Service

22 Incorporate Requirements into a Framework
Assume the GriPhyN Virtual Data Toolkit:
Client (request/job submission): Globus clients, Condor-G/DAGMan, Chimera Virtual Data System
Server (resource gatekeeper): MonALISA Monitoring Service, Globus services, RLS (Replica Location Service)

23 Incorporate Requirements into a Framework
Assume the Virtual Data Toolkit:
Client (request/job submission): Clarens Web Service, Globus clients, Condor-G/DAGMan, Chimera Virtual Data System
Server (resource gatekeeper): MonALISA Monitoring Service, Globus services, RLS (Replica Location Service)
Framework design principles:
Information driven
Flexible client-server model
General, but pragmatic and simple
Avoid adding middleware requirements on grid resources
(figure: a Recommendation Engine mediates between the VDT Client and the VDT Servers)

24 Related Provisioning Software
(columns: Adaptive scheduling / Co-allocation / Fault-tolerant / Policy-based / QoS support / Flexible interface)
Nimrod-G (economy-driven, deadline support): X O X X O X
Maui/Silver (priority-based, reservation): O O X O O X
PBS (batch job scheduling, queue-based): X O X X O X
EZ-Grid (policy-based): X O X O X O
Prophet (parallel SPMD): X X X X O X
LSF (interactive, batch modes): X O O O O X

25 Innovative Workflow Scheduling Middleware
Modular system: automated scheduling procedure based on modular services
Robust and recoverable system: database infrastructure, fault-tolerant and recoverable from internal failure
Platform independent, interoperable system: XML-based communication protocols (SOAP, XML-RPC), supports heterogeneous service environments
60 Java classes, 24,000 lines of Java code
50 test scripts, 1,500 lines of script code
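A minimal sketch of the XML-RPC style of interaction described above, assuming a hypothetical endpoint URL and method name rather than the actual Sphinx interface:

```python
# Hedged sketch of a client talking to the scheduling server over XML-RPC.
# The URL and the submit_workflow method are assumptions for illustration.
import xmlrpc.client

server = xmlrpc.client.ServerProxy("http://sphinx.example.edu:8080/RPC2")
try:
    # submit a workflow description (e.g. an XML document) for scheduling
    request_id = server.submit_workflow("<workflow>...</workflow>")
    print("request accepted:", request_id)
except (xmlrpc.client.Fault, OSError) as err:
    print("request failed:", err)
```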

26 The Sphinx Workflow Execution Framework
(figure: Sphinx Client and Sphinx Server connected over a Clarens WS backbone; server modules: Request Processing, Data Management, Information Gathering, Data Warehouse; surrounding services: Chimera Virtual Data System, Condor-G/DAGMan, MonALISA Monitoring Service, Replica Location Service, Globus resources at VDT Server sites, VDT Client)

27 Sphinx Workflow Scheduling Server
Functions as the nerve centre
Data Warehouse: policies, account information, grid weather, resource properties and status, request tracking, workflows, etc.
Control Process: finite state machine; different modules modify jobs, graphs, workflows, etc. and change their state; flexible, extensible
(figure: server modules around the Control Process and Data Warehouse: Message Interface, Job Admission Control, Job Predictor, Job Execution Planner, Graph Admission Control, Graph Predictor, Graph Reducer, Graph Data Planner, Graph Tracker, Data Management, Information Gatherer)
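The control-process idea can be sketched as a table-driven finite state machine. The state names and module ordering below are assumptions for illustration, not the actual Sphinx state set:

```python
# Illustrative sketch of the finite-state-machine control process: each
# module takes a request in one state and leaves it in the next.
TRANSITIONS = {
    "unvalidated": ("graph_admission_control", "admitted"),
    "admitted":    ("graph_reducer",           "reduced"),
    "reduced":     ("graph_predictor",         "predicted"),
    "predicted":   ("job_execution_planner",   "planned"),
    "planned":     ("message_interface",       "submitted"),
}

def run_control_loop(request):
    state = request["state"]
    while state in TRANSITIONS:
        module, next_state = TRANSITIONS[state]
        print(f"{module}: {state} -> {next_state}")
        # ... the module would update the data warehouse here ...
        state = request["state"] = next_state
    return request

run_control_loop({"id": 42, "state": "unvalidated"})
```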

28 SPHINX Scheduling in Parallel for Heterogeneous Independent NetworXs

29 Policy Based Scheduling
Sphinx provides "soft" QoS through time dependent, global views of:
Submissions (workflows, jobs, allocation, etc.)
Policies
Resources
Uses Linear Programming methods to:
Satisfy constraints (policies, user requirements, etc.)
Optimize an "objective" function
Estimate probabilities of meeting deadlines within policy constraints
(figure: the policy space spans the Submissions, Resources and Time dimensions)
J. In, P. Avery, R. Cavanaugh, and S. Ranka, "Policy Based Scheduling for Simple Quality of Service in Grid Computing", in Proceedings of the 18th IEEE IPDPS, Santa Fe, New Mexico, April 2004.
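A toy version of the linear-programming formulation, with invented costs, demands and quotas (the real Sphinx formulation is richer, handling deadlines and hierarchical policies): split two workflows across three sites so that the estimated cost is minimised and no site's policy quota is exceeded.

```python
# Toy policy-constrained allocation LP, solved with SciPy's linprog.
import numpy as np
from scipy.optimize import linprog

cost   = np.array([[3.0, 5.0, 4.0],      # est. cost of workflow 0 at sites A, B, C
                   [6.0, 2.0, 7.0]])     # est. cost of workflow 1 at sites A, B, C
demand = np.array([40.0, 80.0])          # CPU-hours each workflow needs
quota  = np.array([50.0, 60.0, 70.0])    # CPU-hours each site grants this VO

n_wf, n_site = cost.shape
c = cost.ravel()                          # variables x = [x00, x01, x02, x10, x11, x12]

# each workflow must be fully assigned: sum_j x[i, j] == 1
A_eq = np.zeros((n_wf, n_wf * n_site))
for i in range(n_wf):
    A_eq[i, i * n_site:(i + 1) * n_site] = 1.0
b_eq = np.ones(n_wf)

# policy quota at each site: sum_i demand[i] * x[i, j] <= quota[j]
A_ub = np.zeros((n_site, n_wf * n_site))
for j in range(n_site):
    for i in range(n_wf):
        A_ub[j, i * n_site + j] = demand[i]
b_ub = quota

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=(0, 1), method="highs")
print(res.x.reshape(n_wf, n_site))        # fraction of each workflow placed per site
```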

30 Ability to tolerate task failures
Significant impact of using feedback information
Jang-uk In, Sanjay Ranka et al., "SPHINX: A fault-tolerant system for scheduling in dynamic grid environments", in Proceedings of the 19th IEEE IPDPS, Denver, Colorado, April 2005.

31 Grid Enabled Analysis SC|03

32 Distributed Services for Grid Enabled Data Analysis
(figure: a ROOT data analysis client uses the Sphinx scheduling service, the Chimera virtual data service, the Sphinx/VDT execution service, the MonALISA monitoring service and the RLS replica location service, reaching VDT resource and file services at Fermilab, Caltech, Iowa and Florida via Clarens, Globus and GridFTP)

33 Evaluation of Information Gathered from Grid Monitoring Systems
(figure: correlation index with turnaround time for three monitored parameters: queue length, cluster load and Average Job Delay)

34 Limitation of Existing Monitoring Systems for the Grid
Information aggregated across multiple users is not very useful for effective resource allocation.
An end-to-end parameter such as Average Job Delay, the average queuing delay experienced by the jobs of a given user at an execution site, is a better estimate for comparing resource availability and response time for that user.
It is also not very susceptible to monitoring latencies.
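A small sketch of how this per-user metric could be computed from monitoring records; the record layout and numbers are assumptions for illustration:

```python
# Average Job Delay: mean queuing delay of one user's jobs at one site.
from statistics import mean

job_records = [  # (site, user, submit_time, start_time) in seconds
    ("site_A", "alice", 0,   95),
    ("site_A", "alice", 10, 250),
    ("site_B", "alice", 0,   20),
]

def average_job_delay(records, site, user):
    delays = [start - submit for s, u, submit, start in records
              if s == site and u == user]
    return mean(delays) if delays else None

print(average_job_delay(job_records, "site_A", "alice"))   # 167.5 seconds
```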

35 Effective DAG Scheduling
The completion-time based algorithm uses the Average Job Delay parameter for scheduling.
As seen in the accompanying figure, it outperforms the algorithms driven by the other monitored parameters.
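For illustration only (invented numbers and a simple additive model, not the algorithm's actual cost function), ranking candidate sites by Average Job Delay plus estimated run time might look like:

```python
# Pick the site with the smallest estimated completion time for the next job.
avg_job_delay = {"site_A": 167.5, "site_B": 20.0, "site_C": 600.0}   # seconds
est_runtime   = {"site_A": 300.0, "site_B": 400.0, "site_C": 250.0}  # seconds

def best_site(sites):
    return min(sites, key=lambda s: avg_job_delay[s] + est_runtime[s])

print(best_site(avg_job_delay))   # site_B (20 + 400 beats the alternatives)
```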

36 Work in Progress: Modeling Workflow Cost and Developing Efficient Provisioning Algorithms
1. Developing an objective measure of completion time that integrates the performance and reliability of workflow execution: P(time to complete >= T) <= epsilon
2. Relating this measure to the properties of the longest path of the DAG, based on the mean and uncertainty of the time required by the underlying tasks, which arise from (1) variable time requirements due to different parameter values and (2) failures due to changes in the underlying resources.
3. Developing novel scheduling and replication techniques to optimize allocation based on these metrics.
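One way to make the measure concrete, assuming (purely for illustration) independent task times and a normal approximation along the longest path:

```python
# Tail probability of the longest path's completion time: sum the per-task
# means and variances along the path and approximate the total as normal.
from scipy.stats import norm

path_mean = [600.0, 3600.0, 900.0]            # mean runtimes on the longest path (s)
path_var  = [90.0**2, 600.0**2, 200.0**2]     # runtime variances on that path

mu, var = sum(path_mean), sum(path_var)
T = 6000.0
p_late = norm.sf(T, loc=mu, scale=var ** 0.5)         # P(completion time >= T)
print(f"P(completion >= {T:.0f}s) ~ {p_late:.3f}")

# deadline met with probability 1 - epsilon
epsilon = 0.05
print("T with P(late) <= 0.05:", norm.ppf(1 - epsilon, loc=mu, scale=var ** 0.5))
```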

37 Work in Progress: Provisioning Algorithms for Multiple Workflows (Yield Management)
Quality of Service guarantees for each workflow
Controlled (a cluster of multi-core processors) versus uncontrolled (a grid of multiple clusters owned by multiple units) environments
(figure: multiple workflows, Dag 1 through Dag 5, each spanning scheduling levels 1 through 4)

38 CHEPREO - Grid Education and Networking
E/O Center in the Miami area
Tutorial for Large Scale Application Development

39 Grid Education
Developing a Grid tutorial as part of CHEPREO: Grid basics, components of a Grid, Grid services, OGSA, ...
OSG summer workshop, South Padre Island, Texas, July 11-15: lectures and hands-on sessions on building and maintaining a Grid

40 Acknowledgements
CHEPREO project, NSF
GriPhyN/iVDGL, NSF
Data Mining Middleware, NSF
Intel Corporation

41 Thank You May the Force be with you!

42 Additional slides

43 Effect of Latency on Average Job Delay
Latency is simulated in the system by deliberately retrieving old values of the parameter while making scheduling decisions.
The correlation indices with added latencies are comparable to, though (as expected) lower than, the correlation indices of the 'un-delayed' Average Job Delay parameter. The amount of correlation is still quite high.
(table: Average Job Delay correlation index with turnaround time, for added latencies of 5 and 10 minutes, reported as site rank and raw value over learning periods of 29 and 48 jobs)
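A tiny sketch of the stale-value idea used in this experiment; the sample layout and numbers are assumptions:

```python
# Instead of the newest monitoring sample, the scheduler is handed the value
# that was current `latency` seconds ago.
samples = [(0, 120.0), (300, 150.0), (600, 95.0)]   # (timestamp, Average Job Delay)

def delayed_value(samples, now, latency):
    usable = [v for t, v in samples if t <= now - latency]
    return usable[-1] if usable else None

print(delayed_value(samples, now=600, latency=300))   # sees the 300s-old sample: 150.0
```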

44 SPHINX Scheduling Latency
Average scheduling latency for various numbers of DAGs (20, 40, 80 and 100) with different arrival rates per minute.

45 Demonstration at the Supercomputing Conference: Distributed Data Analysis in a Grid Environment
Graphical user interface for data analysis: ROOT
Virtual data service: Chimera
Grid scheduling service: Sphinx
Grid enabled execution service: VDT client
Grid resource management service: VDT server
Grid enabled Web service: Clarens
Grid resource monitoring system: MonALISA
Replica location service: RLS
The architecture was implemented and demonstrated at SC03 (Phoenix, Arizona, 2003) and SC04.

46 Scheduling DAGs: Dynamic Critical Path Algorithm
The DCP algorithm executes the following steps iteratively (a simplified sketch follows below):
1. Compute the earliest possible start time (AEST) and the latest possible start time (ALST) for all tasks on each processor.
2. Select a task which has the smallest difference between its ALST and AEST and has no unscheduled parent task. If several tasks have the same difference, select the one with the smaller AEST.
3. Select the processor which gives the earliest start time for the selected task.
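The following is a simplified sketch of that loop (homogeneous processors, no communication costs, invented task graph), not the full DCP algorithm:

```python
from collections import defaultdict

def dcp_schedule(cost, edges, num_procs):
    """Repeatedly pick the ready task with the smallest ALST-AEST slack and
    place it on the processor that lets it start earliest (steps 1-3 above,
    with communication costs ignored for brevity)."""
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
    tasks = list(cost)

    proc_free = [0.0] * num_procs        # time at which each processor is free
    start, proc_of = {}, {}              # chosen start time / processor per task

    def aest(t):
        # earliest start: all parents finished (scheduled parents use their
        # actual start times)
        return max(((start[p] if p in start else aest(p)) + cost[p]
                    for p in pred[t]), default=0.0)

    def alst(t, cp_len):
        # latest start that does not stretch the current critical path
        if not succ[t]:
            return cp_len - cost[t]
        return min(alst(s, cp_len) for s in succ[t]) - cost[t]

    while len(start) < len(tasks):
        cp_len = max((start[t] if t in start else aest(t)) + cost[t] for t in tasks)
        ready = [t for t in tasks
                 if t not in start and all(p in start for p in pred[t])]
        # step 2: smallest ALST-AEST difference, ties broken by smaller AEST
        t = min(ready, key=lambda x: (alst(x, cp_len) - aest(x), aest(x)))
        # step 3: processor giving the earliest start time for that task
        p = min(range(num_procs), key=lambda q: max(proc_free[q], aest(t)))
        start[t] = max(proc_free[p], aest(t))
        proc_of[t] = p
        proc_free[p] = start[t] + cost[t]
    return start, proc_of

# illustrative 4-task diamond DAG on 2 processors
cost  = {"a": 2.0, "b": 3.0, "c": 2.0, "d": 4.0}
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
print(dcp_schedule(cost, edges, num_procs=2))
```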

47 Scheduling DAGs: ILP, a Novel Algorithm to Support Heterogeneity (work supported by Intel Corporation)
There are two novel features:
Assign multiple independent tasks simultaneously: the cost of an assigned task depends on the processor available, and many tasks commence with a small difference in start time.
Iteratively refine the schedule: the scheduling is refined using the cost of the critical path based on the assignment from the previous iteration.

48 Comparison of different algorithms (figures: number of processors = 30, with the number of tasks varied)

49 Time for Scheduling