1 Resource Management of Large-Scale Applications on a Grid. Laukik Chitnis and Sanjay Ranka (with Paul Avery, Jang-uk In and Rick Cavanaugh), Department of CISE, University of Florida, Gainesville. ranka@cise.ufl.edu, 352 392 6838 (http://www.cise.ufl.edu/~ranka/)
2 Overview: High-End Grid Applications and Infrastructure at the University of Florida; Resource Management for Grids; Sphinx Middleware for Resource Provisioning; Grid Monitoring for better meta-scheduling; Provisioning Algorithm Research for multi-core and grid environments.
3 The Evolution of High-End Applications (and their system characteristics). Timeline: 1980, mainframe applications on central mainframes; 1990, compute-intensive applications on large clusters and supercomputers; 2000, data-intensive applications with geographically distributed datasets, high-speed storage, and gigabit networks.
4 Some Representative Applications: HEP, Medicine, Astronomy, Distributed Data Mining.
5 Representative Application: High Energy Physics (1-10 petabytes, 1000+, 20+ countries).
6 Representative Application: Tele-Radiation Therapy. RCET Center for Radiation Oncology.
7 Representative Application: Distributed Intrusion Detection. NSF ITR Project: Middleware for Distributed Data Mining (PI: Ranka, joint with Kumar and Grossman). Components in the architecture: the application, Data Management Services, Data Mining and Scheduling Services, and Data Transport Services.
8 Grid Infrastructure: Florida Lambda Rail and UF.
9 Campus Grid (University of Florida). NSF Major Research Instrumentation Project (PI: Ranka, Avery, et al.): 20 Gigabit/sec network, 20+ terabytes of storage, 2-3 teraflops, 10 scientific and engineering applications. An InfiniBand-based cluster and a Gigabit Ethernet-based cluster.
10 Grid Services: the software part of the infrastructure!
11 Services offered in a Grid: Resource Management Services, Data Management Services, Monitoring and Information Services, and Security Services. Note that all the other services use the security services.
12 Resource Management Services provide a uniform, standard interface to remote resources, including CPU, storage, and bandwidth. The main component is the remote job manager, e.g., GRAM (Globus Resource Allocation Manager).
13 Resource Management on a Grid (diagram): a user submits jobs to grid sites 1 through n via GRAM. Narration: note the different local schedulers; each site runs its own local scheduler (Condor, PBS, LSF, or plain fork) behind the same GRAM interface.
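To make the uniform-interface idea concrete, here is a minimal, hypothetical Python sketch (not GRAM or its protocol) of a front end that maps one job description onto whatever local scheduler a site happens to run; the site names, job fields, and submit-file handling are illustrative assumptions.

```python
# Hypothetical sketch of a GRAM-like uniform front end; the job fields, site
# names, and submit-file handling are illustrative, not the real GRAM protocol.
from dataclasses import dataclass

@dataclass
class Job:
    executable: str
    arguments: list
    site: str

# Each site advertises which local resource manager sits behind its gatekeeper.
SITE_SCHEDULER = {"site1": "condor", "site2": "pbs", "site3": "lsf", "site4": "fork"}

def build_local_command(job: Job) -> list:
    """Translate one uniform job description into the site's local submit command."""
    scheduler = SITE_SCHEDULER[job.site]
    if scheduler == "condor":
        return ["condor_submit", "job.submit"]           # submit file written elsewhere
    if scheduler == "pbs":
        return ["qsub", job.executable]                  # PBS batch submission
    if scheduler == "lsf":
        return ["bsub", job.executable] + job.arguments  # LSF batch submission
    return [job.executable] + job.arguments              # plain fork/exec

print(build_local_command(Job("analyze.sh", ["run42"], "site2")))
```

The point of the sketch is only that the user sees one job description while each site keeps its own scheduler.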
14 Scheduling your Application
15 Scheduling your Application. An application can be run on a grid site as a job, and the modules in the grid architecture (such as GRAM) allow uniform access to the grid sites for your job. But most applications can be "parallelized", and these separate parts can be scheduled to run simultaneously on different sites, thus utilizing the power of the grid.
16 Modeling an Application Workflow. Many workflows can be modeled as a directed acyclic graph (DAG). The amount of resource required by each task (in units of time) is known to a degree of certainty, and there is a small probability of failure in execution (in a grid environment this can happen because resources are no longer available).
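As a concrete (hypothetical) illustration of this model, the short Python sketch below represents a workflow as a DAG whose tasks carry an expected runtime and a failure probability, and lists the tasks in a valid execution order; the task names and numbers are made up.

```python
# Toy workflow model: each task has an expected runtime (in time units) and a
# small independent failure probability. Edges encode precedence constraints.
from graphlib import TopologicalSorter

tasks = {
    # name: (expected_runtime, failure_probability)
    "stage_in":    (5.0, 0.01),
    "simulate":    (60.0, 0.05),
    "reconstruct": (40.0, 0.05),
    "analyze":     (20.0, 0.02),
}

# parent -> children (parent must finish before each child starts)
edges = {"stage_in": ["simulate"], "simulate": ["reconstruct"], "reconstruct": ["analyze"]}

# graphlib expects node -> set of predecessors.
preds = {t: set() for t in tasks}
for parent, children in edges.items():
    for child in children:
        preds[child].add(parent)

order = list(TopologicalSorter(preds).static_order())
expected_makespan = sum(tasks[t][0] for t in order)      # this example is a serial chain
success_prob = 1.0
for t in order:
    success_prob *= 1.0 - tasks[t][1]

print(order, expected_makespan, round(success_prob, 3))
```

A scheduler would work from exactly this kind of structure: precedence edges, per-task time estimates, and per-task failure risk.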
17 Workflow Resource Provisioning: executing multiple workflows over distributed and adaptive (faulty) resources while managing policies. Applications: large, data intensive, with precedence and time constraints. Resources: distributed, multi-core, heterogeneous, faulty. Policies: priority, access control, quota, multiple ownership.
18 A Real Life Example from High Energy Physics. Merge two grids into a single multi-VO "Inter-Grid". How to ensure that neither VO is harmed, that both VOs actually benefit, and that there are answers to questions like: "With what probability will my job be scheduled and complete before my conference deadline?" Clear need for a scheduling middleware! (Sites: FNAL, Rice, UI, MIT, UCSD, UF, UW, Caltech, UM, UTA, ANL, IU, UC, LBL, SMU, OU, BU, BNL.)
19 Typical scenario (diagram): a VDT client facing several VDT servers, with question marks over where the job should go.
20 Typical scenario, continued (diagram): the same VDT client and servers, with the user reduced to "@#^%#%$@#".
21 Some Requirements for Effective Grid Scheduling.
Information requirements: past and future dependencies of the application; persistent storage of workflows; resource usage estimation; policies (expected to vary slowly over time); global views of job descriptions; request tracking and usage statistics (state information is important); resource properties and status (expected to vary slowly with time); grid weather (latency of measurement is important); replica management.
System requirements: distributed, fault-tolerant scheduling; customisability; interoperability with other scheduling systems; quality of service.
22 Incorporate Requirements into a Framework. Assume the GriPhyN Virtual Data Toolkit (VDT). Client (request/job submission): Globus clients, Condor-G/DAGMan, Chimera Virtual Data System. Server (resource gatekeeper): MonALISA Monitoring Service, Globus services, RLS (Replica Location Service).
23 Incorporate Requirements into a Framework (continued). Assume the Virtual Data Toolkit. Client (request/job submission): Clarens Web Service, Globus clients, Condor-G/DAGMan, Chimera Virtual Data System. Server (resource gatekeeper): MonALISA Monitoring Service, Globus services, RLS (Replica Location Service). Framework design principles: information driven; flexible client-server model; general, but pragmatic and simple; avoid adding middleware requirements on grid resources. (In the diagram, the open "?" between client and server is filled by a Recommendation Engine.)
24 Related Provisioning Software. Features per system, in the order adaptive scheduling / co-allocation / fault-tolerant / policy-based / QoS support / flexible interface (X and O as in the original table):
Nimrod-G (economy-driven, deadline support): X / O / X / X / O / X
Maui/Silver (priority-based, reservation): O / O / X / O / O / X
PBS (batch job scheduling, queue-based): X / O / X / X / O / X
EZ-Grid (policy-based): X / O / X / O / X / O
Prophet (parallel SPMD): X / X / X / X / O / X
LSF (interactive and batch modes): X / O / O / O / O / X
25 Innovative Workflow Scheduling Middleware. Modular system: automated scheduling procedure based on modulated service. Robust and recoverable system: database infrastructure, fault-tolerant and recoverable from internal failure. Platform-independent, interoperable system: XML-based communication protocols (SOAP, XML-RPC), supporting a heterogeneous service environment. Implementation: 60 Java classes, 24,000 lines of Java code; 50 test scripts, 1,500 lines of script code.
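Since the slide mentions XML-RPC as one of the communication protocols, here is a minimal, hypothetical Python sketch of that call pattern using the standard xmlrpc module; the "schedule_job" service and its payload are illustrative assumptions, not Sphinx's actual (Java) interface.

```python
# Minimal XML-RPC round trip with Python's standard library; the service name
# and arguments are hypothetical, shown only to illustrate the protocol style.
from xmlrpc.server import SimpleXMLRPCServer
import threading
import xmlrpc.client

def schedule_job(job_name, site_preferences):
    """Pretend scheduling decision: just echo a placement."""
    return {"job": job_name, "assigned_site": site_preferences[0]}

server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
server.register_function(schedule_job, "schedule_job")
threading.Thread(target=server.serve_forever, daemon=True).start()

proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
print(proxy.schedule_job("dag42_node7", ["uf-hpc", "caltech-tier2"]))
```

The appeal of XML-RPC/SOAP here is exactly what the slide claims: the client and server can be written in different languages and still interoperate.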
26 The Sphinx Workflow Execution Framework (architecture diagram). The Sphinx client and the Sphinx server (request processing, data warehouse, data management, information gathering) sit between the VDT client side (Chimera Virtual Data System, Condor-G/DAGMan) and the VDT server sites (Globus resources, MonALISA Monitoring Service, Replica Location Service), communicating over a Clarens web-service backbone.
27 Sphinx Workflow Scheduling Server: functions as the nerve centre.
Data warehouse: policies, account information, grid weather, resource properties and status, request tracking, workflows, etc.
Control process: a finite state machine; different modules modify jobs, graphs, and workflows and change their state; flexible and extensible.
Modules: message interface, graph admission control, job admission control, graph predictor, job predictor, graph reducer, graph data planner, job execution planner, graph tracker, data management, and information gatherer.
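To illustrate the finite-state-machine idea (not Sphinx's actual Java implementation), here is a small hypothetical Python sketch in which independent modules advance a job through named states; the state names, transitions, and module names are simplified assumptions.

```python
# Hypothetical job state machine in the spirit of a control process that lets
# independent modules advance work items; states and transitions are made up.
ALLOWED = {
    "submitted": {"admitted", "rejected"},
    "admitted":  {"planned"},
    "planned":   {"scheduled"},
    "scheduled": {"running"},
    "running":   {"finished", "failed"},
    "failed":    {"planned"},          # replanning after a resource failure
}

class Job:
    def __init__(self, name):
        self.name = name
        self.state = "submitted"

    def advance(self, new_state):
        if new_state not in ALLOWED.get(self.state, set()):
            raise ValueError(f"{self.name}: illegal transition {self.state} -> {new_state}")
        self.state = new_state

# Each "module" is just a function that inspects a job and moves it one step.
def admission_control(job): job.advance("admitted")
def planner(job):           job.advance("planned")
def scheduler(job):         job.advance("scheduled")

job = Job("dag42_node7")
for module in (admission_control, planner, scheduler):
    module(job)
print(job.name, job.state)   # dag42_node7 scheduled
```

Keeping the state table explicit is what makes the design recoverable: after a crash, the persisted state says exactly which module should touch the job next.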
28 SPHINX: Scheduling in Parallel for Heterogeneous Independent NetworXs.
29 Policy Based Scheduling. Sphinx provides "soft" QoS through time-dependent, global views of submissions (workflows, jobs, allocation, etc.), policies, and resources. It uses linear programming methods: satisfy constraints (policies, user requirements, etc.), optimize an "objective" function, and estimate the probabilities of meeting deadlines within policy constraints. (Policy space: submissions × resources × time.) J. In, P. Avery, R. Cavanaugh, and S. Ranka, "Policy Based Scheduling for Simple Quality of Service in Grid Computing", Proceedings of the 18th IEEE IPDPS, Santa Fe, New Mexico, April 2004.
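As a hedged illustration of the linear-programming formulation (a toy sketch, not Sphinx's actual model), the Python fragment below uses scipy.optimize.linprog to split jobs from two users across two sites while respecting a made-up policy quota; all delays, demands, capacities, and variable names are assumptions.

```python
# Toy policy-based allocation as a linear program (illustrative only).
# Decision variables: x = [xA1, xA2, xB1, xB2], the number of jobs user A/B
# sends to site 1/2. Delays, quotas, and capacities are made-up numbers.
from scipy.optimize import linprog

c = [2.0, 5.0, 4.0, 3.0]            # expected per-job delay for (A,s1), (A,s2), (B,s1), (B,s2)

A_eq = [[1, 1, 0, 0],               # user A must place all 30 jobs
        [0, 0, 1, 1]]               # user B must place all 50 jobs
b_eq = [30, 50]

A_ub = [[1, 0, 1, 0],               # site 1 capacity
        [0, 1, 0, 1],               # site 2 capacity
        [1, 0, 0, 0]]               # policy: user A may use at most 12 slots at site 1
b_ub = [40, 60, 12]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * 4, method="highs")
print(res.x, res.fun)               # expect roughly [12, 18, 8, 42], total delay 272
```

The policy quota shows up purely as an extra inequality row, which is what makes the LP view attractive: adding or tightening a policy does not change the solver, only the constraint matrix.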
30 Ability to tolerate task failures; significant impact of using feedback information. Jang-uk In, Sanjay Ranka, et al., "SPHINX: A fault-tolerant system for scheduling in dynamic grid environments", Proceedings of the 19th IEEE IPDPS, Denver, Colorado, April 2005.
31 Grid Enabled Analysis SC|03
32 Distributed Services for Grid Enabled Data Analysis (demo diagram). A ROOT data analysis client works through the Clarens web-service layer with the Chimera Virtual Data Service, the Sphinx Scheduling Service, the Sphinx/VDT Execution Service, the RLS Replica Location Service, and the MonALISA Monitoring Service, which coordinate file and VDT resource services at Fermilab, Caltech, Iowa, and Florida over Globus and GridFTP.
33 Evaluation of information gathered from grid monitoring systems. Correlation index with turnaround time:
Queue length: -0.05818
Cluster load: -0.20775
Average Job Delay: 0.892542
34 Limitation of Existing Monitoring Systems for the Grid. Information aggregated across multiple users is not very useful for effective resource allocation. An end-to-end parameter such as Average Job Delay, the average queuing delay experienced by a job of a given user at an execution site, is a better estimate for comparing resource availability and response time for a given user. It is also not very susceptible to monitoring latencies.
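A minimal sketch of how such a per-user, per-site Average Job Delay could be computed from job records follows; the bookkeeping format and the numbers are assumptions made for illustration, not the paper's exact definition.

```python
# Average Job Delay per (user, site): mean queuing delay of that user's jobs
# at that site. The job records below are fabricated for illustration.
from collections import defaultdict

# (user, site, submit_time, start_time) in arbitrary time units
job_records = [
    ("alice", "uf-hpc",  0.0, 12.0),
    ("alice", "uf-hpc",  5.0, 20.0),
    ("alice", "caltech", 0.0,  3.0),
    ("bob",   "uf-hpc",  1.0,  4.0),
]

sums = defaultdict(float)
counts = defaultdict(int)
for user, site, submitted, started in job_records:
    sums[(user, site)] += started - submitted     # queuing delay of this job
    counts[(user, site)] += 1

avg_job_delay = {key: sums[key] / counts[key] for key in sums}
print(avg_job_delay)   # e.g. alice sees uf-hpc as much slower than caltech
```

Because the measurement is end-to-end and per user, it folds in queue priority, site load, and policy effects that aggregate metrics such as queue length miss.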
35 Effective DAG Scheduling. The completion-time-based algorithm here uses the Average Job Delay parameter for scheduling. As seen in the accompanying figure, it outperforms the algorithms tested with the other monitored parameters.
36 Work in Progress: Modeling Workflow Cost and Developing Efficient Provisioning Algorithms.
1. Developing an objective measure of completion time, integrating the performance and reliability of workflow execution: P(time to complete >= T) <= epsilon.
2. Relating this measure to the properties of the longest path of the DAG, based on the mean and uncertainty of the time required by the underlying tasks, which arise from (a) variable time requirements due to different parameter values and (b) failures due to changes in the underlying resources (a worked sketch follows this slide).
3. Developing novel scheduling and replication techniques to optimize allocation based on these metrics.
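As a hedged worked example (my assumption, not the authors' derivation): if the tasks on the longest path have independent completion times with means mu_i and variances sigma_i^2 (with failure-and-retry effects folded into those moments), a normal approximation gives a simple deadline test.

```latex
% Sketch under an independence + normal-approximation assumption.
% T_P = completion time of the longest (critical) path P.
\[
  T_P = \sum_{i \in P} t_i, \qquad
  \mathbb{E}[T_P] = \sum_{i \in P} \mu_i, \qquad
  \operatorname{Var}(T_P) = \sum_{i \in P} \sigma_i^2 .
\]
% Requiring P(T_P >= T) <= epsilon then amounts to choosing T at least the
% (1 - epsilon)-quantile above the mean:
\[
  \Pr\!\left(T_P \ge T\right) \le \epsilon
  \quad\Longleftarrow\quad
  T \;\ge\; \sum_{i \in P} \mu_i \;+\; z_{1-\epsilon}
            \sqrt{\textstyle\sum_{i \in P} \sigma_i^2},
\]
% where z_{1-epsilon} is the standard normal quantile; this is an
% approximation, not a rigorous bound, and other paths near-critical in
% length would tighten the requirement further.
```

This is exactly the kind of objective the slide describes: a single deadline T and tolerance epsilon that trade off mean task times against their uncertainty along the critical path.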
37 Work in Progress: Provisioning Algorithms for Multiple Workflows (Yield Management). Quality-of-service guarantees for each workflow; controlled (a cluster of multi-core processors) versus uncontrolled (a grid of multiple clusters owned by multiple units) environments. (Figure: five DAGs, each with levels 1-4, provisioned together as multiple concurrent workflows.)
38 CHEPREO: Grid Education and Networking. E/O Center in the Miami area; tutorial for large-scale application development.
39 Grid Education. Developing a Grid tutorial as part of CHEPREO: Grid basics, components of a Grid, Grid services, OGSA, … OSG Summer Workshop, South Padre Island, Texas, July 11-15, 2005 (http://osg.ivdgl.org/twiki/bin/view/SummerGridWorkshop/): lectures and hands-on sessions on building and maintaining a Grid.
40 Acknowledgements: CHEPREO project (NSF); GriPhyN/iVDGL (NSF); Data Mining Middleware (NSF); Intel Corporation.
41 Thank You. May the Force be with you!
42 Additional slides
43 Effect of latency on Average Job Delay. Latency is simulated in the system by purposely retrieving old values of the parameter while making scheduling decisions. The correlation indices with added latencies are comparable, though lower as expected, to the correlation indices of the 'un-delayed' Average Job Delay parameter; the amount of correlation is still quite high.
Average Job Delay correlation index with turnaround time (added latency = 5 minutes / 10 minutes):
Site rank: 0.688959 / 0.754222
Raw value: 0.582685 / 0.777754
Learning period: 29 jobs / 48 jobs
44 SPHINX Scheduling Latency. Average scheduling latency for various numbers of DAGs (20, 40, 80 and 100) with different arrival rates per minute.
45 Demonstration at the Supercomputing Conference: Distributed Data Analysis in a Grid Environment. Graphical user interface for data analysis: ROOT. Virtual data service: Chimera. Grid scheduling service: Sphinx. Grid-enabled execution service: VDT client. Grid resource management service: VDT server. Grid-enabled web service: Clarens. Grid resource monitoring system: MonALISA. Replica location service: RLS. The architecture has been implemented and demonstrated at SC03 (Arizona, USA, 2003) and SC04.
46 Scheduling DAGs: the Dynamic Critical Path (DCP) Algorithm. The DCP algorithm executes the following steps iteratively:
1. Compute the earliest possible start time (AEST) and the latest possible start time (ALST) for all tasks on each processor.
2. Select a task that has the smallest difference between its ALST and AEST and has no unscheduled parent task; if several tasks have the same difference, select the one with the smaller AEST.
3. Select the processor that gives the earliest start time for the selected task.
(A simplified code sketch of these steps follows.)
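The Python sketch below is a deliberately simplified version of the three steps above, assuming two homogeneous processors and no communication costs (the full DCP algorithm recomputes AEST/ALST per processor and accounts for data transfer); the task graph and costs are made up.

```python
# Simplified DCP-style selection: compute AEST/ALST, pick the most critical
# ready task (smallest ALST - AEST), place it on the earliest-free processor.
from functools import lru_cache

cost = {"a": 4, "b": 3, "c": 6, "d": 2}                  # task -> execution time
succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
pred = {t: [p for p in cost if t in succ[p]] for t in cost}

@lru_cache(maxsize=None)
def aest(t):                                             # earliest possible start
    return max((aest(p) + cost[p] for p in pred[t]), default=0)

cp_length = max(aest(t) + cost[t] for t in cost)         # critical-path length

@lru_cache(maxsize=None)
def alst(t):                                             # latest start without stretching the critical path
    return min((alst(s) for s in succ[t]), default=cp_length) - cost[t]

proc_free = [0.0, 0.0]                                   # two identical processors
assigned, finish = {}, {}
while len(assigned) < len(cost):
    ready = [t for t in cost
             if t not in assigned and all(p in assigned for p in pred[t])]
    # Step 2: smallest ALST - AEST, ties broken by smaller AEST.
    task = min(ready, key=lambda t: (alst(t) - aest(t), aest(t)))
    # Step 3: processor that lets the task start earliest (here: first free).
    p = min(range(len(proc_free)), key=lambda i: proc_free[i])
    start = max(proc_free[p], max((finish[q] for q in pred[task]), default=0.0))
    finish[task] = start + cost[task]
    proc_free[p] = finish[task]
    assigned[task] = p

print(assigned, finish)
```

Tasks with zero ALST-AEST slack lie on the dynamic critical path, which is why the algorithm schedules them first.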
47 Scheduling DAGs: ILP, a novel algorithm to support heterogeneity (work supported by Intel Corporation). There are two novel features: (1) assign multiple independent tasks simultaneously; the cost of an assigned task depends on the processor it is given, and many tasks commence with only a small difference in start time; (2) iteratively refine the schedule, using the cost of the critical path based on the assignment from the previous iteration.
48 Comparison of different algorithms (number of processors = 30, number of tasks = 2000).
49 Time for Scheduling