Published by Meryl Nash. Modified over 9 years ago.
1
Performance-responsive Middleware for Grid Computing
Dr Stephen Jarvis
High Performance Systems Group
University of Warwick, UK
2
Context
Funded by / collaborating with
–UK e-Science Core Programme
–IBM (Watson, Hursley)
–NASA (Ames)
–NEC Europe
–Los Alamos National Laboratory
Integrate established performance tools into emerging grid middleware
3
Grid Resource Management
How do we enable and regulate the resource sharing between users? While…
–providing vision of access to full resources
–hiding detail & unnecessary complexity
–providing acceptable levels of service
4
Managing through Middleware
Workload Generation, Visualisation…
Discovery, Mapping, Scheduling, Security, Accounting…
Computing, Storage, Instrumentation…
Key interface between applications & resources
5
Key Middleware Activities
Determine what resources are required (advertise)
Determine what resources are available (discovery)
Map requirements to available resources (scheduling)
Maintain contract of performance (service level agreement)
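The advertise, discovery and mapping activities can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Requirement` and `Resource` fields, the `matches` rule and the first-fit policy in `map_task` are hypothetical and are not taken from any particular grid middleware.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Requirement:
    """What a task advertises (all fields are illustrative)."""
    min_cpus: int
    min_memory_gb: float
    arch: str


@dataclass
class Resource:
    """What discovery reports for one resource (illustrative)."""
    name: str
    free_cpus: int
    free_memory_gb: float
    arch: str


def matches(req: Requirement, res: Resource) -> bool:
    """True if the resource satisfies every advertised requirement."""
    return (
        res.arch == req.arch
        and res.free_cpus >= req.min_cpus
        and res.free_memory_gb >= req.min_memory_gb
    )


def map_task(req: Requirement, discovered: list) -> Optional[Resource]:
    """Return the first discovered resource that fits, or None."""
    for res in discovered:
        if matches(req, res):
            return res
    return None


# Hypothetical discovery result and one advertised requirement.
discovered = [
    Resource("node-a", free_cpus=2, free_memory_gb=4.0, arch="x86"),
    Resource("node-b", free_cpus=8, free_memory_gb=16.0, arch="x86"),
]
chosen = map_task(Requirement(min_cpus=4, min_memory_gb=8.0, arch="x86"), discovered)
```

First-fit is the simplest possible mapping policy; a real scheduler would also rank the candidates, and the fourth activity (maintaining the service level agreement) is not modelled here.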
6
Performance Services
Intra-domain
–Lab- / department-based
–Shared resources under local administration
Multi-domain
–Campus- / country-based
–Wide-area resource and task management
–Cross domain
9
Performance Prediction
Performance prediction tools aim to predict
–Execution time
–Communication usage
–Data and resource requirements
Provides best guess as to how an application will execute on a given resource
10
PACE
[Diagram, built up over slides 10 to 13. Labels: User; Application; Resource; Application Model; Resource Model; Evaluation Engine; Model parameters; Resource config.]
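The PACE slides name an application model, a resource model and an evaluation engine that takes model parameters and a resource configuration. The sketch below shows that general shape only. It is not PACE's API: the class names, fields and the cost formula (evenly divided compute plus a fixed per-message latency) are hypothetical simplifications chosen to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class ApplicationModel:
    """Hypothetical application description: work and communication volume."""
    total_flops: float      # floating-point operations for the whole run
    messages_per_proc: int  # messages each processor sends


@dataclass
class ResourceModel:
    """Hypothetical resource configuration."""
    flops_per_sec: float    # sustained rate of one processor
    message_latency: float  # seconds per message


def evaluate(app: ApplicationModel, res: ResourceModel, procs: int) -> float:
    """Predicted run time in seconds under a deliberately simple cost model:
    evenly divided compute plus a per-message latency cost."""
    if procs < 1:
        raise ValueError("procs must be at least 1")
    compute = app.total_flops / (procs * res.flops_per_sec)
    # A single processor has no one to exchange messages with.
    communicate = app.messages_per_proc * res.message_latency if procs > 1 else 0.0
    return compute + communicate


# Made-up numbers: 8 GFLOP of work on 1 GFLOP/s processors, 1 ms per message.
app = ApplicationModel(total_flops=8e9, messages_per_proc=1000)
res = ResourceModel(flops_per_sec=1e9, message_latency=1e-3)
predicted = evaluate(app, res, procs=4)  # 2 s compute + 1 s communication
```

Here `procs` plays the role of a model parameter and the `ResourceModel` values play the role of the resource configuration.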
14
Why is prediction useful?
Scaling properties
Compare runtime options with
–deadline
–available resources
–priority / other jobs
–etc.
Allows runtime scenarios to be explored before deployment
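Exploring runtime scenarios before deployment can be sketched as a search over candidate configurations. The prediction function below is a stand-in with made-up constants, not output from a real prediction tool, and `smallest_config_meeting` is a hypothetical helper.

```python
def predicted_runtime(procs: int) -> float:
    """Stand-in for a prediction tool: 120 s of divisible work plus
    2 s of overhead per processor (both numbers are made up)."""
    return 120.0 / procs + 2.0 * procs


def smallest_config_meeting(deadline: float, available: int):
    """Return the fewest processors whose predicted run time meets the
    deadline, or None if no configuration within `available` does."""
    for procs in range(1, available + 1):
        if predicted_runtime(procs) <= deadline:
            return procs
    return None


# Scaling properties: predicted run time for each candidate configuration.
scaling = {p: predicted_runtime(p) for p in (1, 2, 4, 8, 16)}

# Compare the options against a 40 s deadline with 16 processors available.
choice = smallest_config_meeting(deadline=40.0, available=16)
```

With these invented constants the predicted time stops improving beyond 8 processors, which is the kind of scaling behaviour a prediction makes visible before any job is run.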
15
1. Intra-Domain Co-Scheduling
Augment emerging middleware with additional performance information
Handle predictive and non-predictive tasks
Use predictive data for system improvement
–Time to complete tasks / utilisation of resources
–QoS – ability to meet deadlines
Scheduler driver, or co-scheduler (called Titan)
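To illustrate how predicted run times can shorten time to complete, the sketch below compares placing tasks in arrival order with placing them longest-first. This is a simple greedy heuristic swapped in for illustration; it is not Titan's scheduling method, and the task times and host count are made up.

```python
import heapq


def makespan(predicted: list, hosts: int, use_predictions: bool) -> float:
    """Assign each task to the host that becomes free first and return the
    time at which the last host finishes. With predictions, tasks are
    placed longest-first; without, they are placed in arrival order."""
    order = sorted(predicted, reverse=True) if use_predictions else list(predicted)
    free_at = [0.0] * hosts          # min-heap of host finish times
    heapq.heapify(free_at)
    for runtime in order:
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + runtime)
    return max(free_at)


# Made-up predicted run times (minutes) for six tasks on two hosts.
tasks = [2.0, 3.0, 2.0, 3.0, 2.0, 6.0]
without = makespan(tasks, hosts=2, use_predictions=False)
with_predictions = makespan(tasks, hosts=2, use_predictions=True)
```

In this toy case the long task arrives last, so arrival-order placement leaves one host idle while the other finishes it; knowing the run times in advance avoids that.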
16
Intra-Domain Co-Scheduling
[Diagram, built up over slides 16 to 22. Labels: REQUESTS FROM USERS OR OTHER DOMAIN SCHEDULERS; PORTAL; PRE-EXECUTION ENGINE; MATCHMAKER; SCHEDULE QUEUE; PACE; GA; CLUSTER CONNECTOR; CONDOR; CLASSADS; RESOURCES; Titan. Task paths: Non-predictive tasks; Tasks with prediction data.]
23
Intra-Domain Deployment
Without co-scheduler: Time to complete = 70.08m
With co-scheduler: Time to complete = 35.19m
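Reading the two figures on this slide as 70.08m without the co-scheduler and 35.19m with it, the relative reduction and speed-up (neither is stated on the slide) work out as follows:

```latex
\[
\text{reduction} = \frac{70.08 - 35.19}{70.08} = \frac{34.89}{70.08} \approx 0.498
\quad (\text{about } 50\%),
\qquad
\text{speed-up} = \frac{70.08}{35.19} \approx 1.99
\]
```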
24
2. Multi-Domain Management
Publish intra-domain performance data through global information services (MDS)
Augment service with agent system
–One agent per domain / VO
When a task is submitted
–Agents query the information service (IS), and negotiate to discover best domain to run task
Scheme is tested on a 256-node experimental Grid
–16 resource domains; 6 architecture types
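A minimal sketch of the agent idea, one agent per domain with each offering a predicted completion time, might look like this. The `DomainAgent` class, its fields and the "earliest predicted completion wins" rule are assumptions for illustration; the slides do not describe the negotiation protocol, and no real MDS query is made.

```python
from dataclasses import dataclass


@dataclass
class DomainAgent:
    """Hypothetical agent for one resource domain."""
    domain: str
    queue_wait: float   # seconds until the domain can start a new task
    speed: float        # relative speed of the domain's hardware

    def offer(self, task_work: float) -> float:
        """Predicted completion time (seconds) if the task ran here."""
        return self.queue_wait + task_work / self.speed


def negotiate(agents, task_work):
    """Collect one offer per agent and return (domain, predicted_time)
    for the earliest predicted completion."""
    offers = [(agent.offer(task_work), agent.domain) for agent in agents]
    best_time, best_domain = min(offers)
    return best_domain, best_time


# Made-up domains: fast but busy, average, and idle but slow.
agents = [
    DomainAgent("domain-1", queue_wait=300.0, speed=2.0),
    DomainAgent("domain-2", queue_wait=20.0, speed=1.0),
    DomainAgent("domain-3", queue_wait=0.0, speed=0.5),
]
best = negotiate(agents, task_work=200.0)
```

Neither the fastest nor the emptiest domain wins here, which is why the choice needs both published performance data and current load.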
25
Multi-Domain Management
[Figure, built up over slides 25 to 27. Axis label: time.]
28
Multi-Domain Management
Time to complete = 2752s
29
Multi-Domain Management
Time to complete = 467s; an improvement of 83%
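The 83% figure is consistent with the two completion times reported, assuming the 2752s result is the baseline the improvement is measured against:

```latex
\[
\frac{2752 - 467}{2752} = \frac{2285}{2752} \approx 0.830 \approx 83\%,
\qquad
\frac{2752}{467} \approx 5.9
\]
```

The second ratio expresses the same result as a speed-up of roughly 5.9 times.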
31
QoS: Ability to Meet Deadline
[Figure. Legend: active; inactive.]
32
Resource usage
[Figure. Legend: active; inactive.]
33
Many Issues Remain
Identification of meaningful QoS metrics
–User-orientated
–Contract-based
Honouring of SLA
–End-to-end service management
–Resolving conflicts
Managing Workflow (CCGrid 2003)
–See poster & demo
But…version 1.0, Condor/GT2-based, available for download
–See www.dcs.warwick.ac.uk/~hpsg