Published by Andra Moody. Modified over 8 years ago.
1
Advisor: Resource Selection
11/15/2007
Nick Trebon
University of Chicago
2
SPRUCE: Resource Selection
Introduction: SPRUCE
- Urgent Computing: SPRUCE
  - Provide token-based, priority access for “urgent” computing jobs
    - Elevated batch queue priority
    - Resource reservations
    - Elevated network bandwidth priority
- Urgent computations
  - Defined by a strict deadline -- late results are useless
  - Must be in “warm standby” -- ready to run when and where needed
  - Urgency responses are resource-specific
3
Resource Selection
- Given a workflow, deadline, and workflow input parameters, how does one select the “best” configuration?
  - Workflow configuration includes input sources, output sources, requested CPUs, urgency, resource…
- “Best”:
  - Most likely to meet deadline?
  - Least intrusive to other users?
  - Resource with highest application reliability?
- Analyze application-specific, historical and live data to determine the likelihood of meeting a deadline
4
Urgent Resource Selection
5
Total Turnaround Time
- Generate a bound for the total turnaround time
- Generate bounds for:
  - File staging ($F_T$)
  - Pre-allocation time (e.g., queue delay) ($P_T$)
  - Execution time ($E_T$)
- If we assume each stage is independent, then overall turnaround time = $F_T + P_T + E_T$
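A minimal sketch of how the three per-stage bounds might be combined. The slide gives only the sum; the confidence arithmetic below is an added assumption: if each bound holds with its own probability and the stages are independent, all three hold together with the product of those probabilities, so the summed bound holds with at least that probability. The class name, function name, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class StageBound:
    """Upper bound on one stage's delay that holds with probability `confidence`."""
    name: str
    seconds: float
    confidence: float


def total_turnaround_bound(stages):
    """Sum the per-stage bounds; under independence the confidences multiply."""
    total_seconds = sum(stage.seconds for stage in stages)
    joint_confidence = prod(stage.confidence for stage in stages)
    return total_seconds, joint_confidence


# Hypothetical bounds, for illustration only.
stages = [
    StageBound("file staging (F_T)", 120.0, 0.95),
    StageBound("pre-allocation (P_T)", 600.0, 0.95),
    StageBound("execution (E_T)", 1800.0, 0.95),
]
bound, confidence = total_turnaround_bound(stages)
print(f"total <= {bound:.0f} s with probability >= {confidence:.3f}")
```

The joint confidence is lower than any single stage's confidence, so tighter per-stage quantiles are needed if the overall bound must hold at a given level.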
6
File Staging Delay
- Calculate for input and output delays
- Utilize the Network Weather Service
  - Monitor bandwidth via short, periodic probes
  - Generate predictions for expected bandwidth
- Problems?
  - Ensure probes are large enough to capture the behavior seen for large file transfers
  - Are GridFTP transfers routed differently?
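One way to turn bandwidth probes into a staging bound can be sketched as follows. This is not the Network Weather Service's forecasting method or API; it simply takes a low empirical quantile of hypothetical probe measurements as a conservative bandwidth and divides the file size by it. Function names, units (megabytes, Mbit/s), and sample values are assumptions, and the sketch does not address the probe-size and routing caveats raised on the slide.

```python
import math


def conservative_bandwidth(samples_mbps, quantile=0.05):
    """Return a low empirical quantile (nearest rank) of measured bandwidth."""
    if not samples_mbps:
        raise ValueError("need at least one bandwidth sample")
    ordered = sorted(samples_mbps)
    # Nearest-rank position, clamped so the smallest sample is the floor.
    rank = max(1, math.ceil(quantile * len(ordered)))
    return ordered[rank - 1]


def staging_bound_seconds(file_megabytes, samples_mbps, quantile=0.05):
    """Bound the transfer time by assuming the conservative bandwidth."""
    bandwidth = conservative_bandwidth(samples_mbps, quantile)
    return file_megabytes * 8.0 / bandwidth  # megabytes -> megabits


# Hypothetical probe results in Mbit/s and file sizes in megabytes.
probes_mbps = [80, 95, 60, 100, 90, 85, 70, 75, 65, 88]
input_bound = staging_bound_seconds(1500, probes_mbps)
output_bound = staging_bound_seconds(300, probes_mbps)
file_staging_bound = input_bound + output_bound
print(f"F_T <= {file_staging_bound:.0f} s")
```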
7
Pre-Allocation Delay
- Policy is resource-dependent
  - E.g., elevated priority, next-to-run, pre-emption, etc.
- Normal priority: use Batch Queue Predictor*
- Next-to-run: post-process queue logs to determine an empirical bound
- Pre-emption: generate empirical bound

* http://spinner.cs.ucsb.edu/batchq/
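The slide does not say how the empirical bound is computed from queue logs. One generic possibility, sketched here under that assumption, is an order-statistic bound: the number of samples falling below the true q-quantile is Binomial(n, q), so the k-th smallest observed wait is at least the q-quantile with probability P(Bin(n, q) <= k - 1). The function names and the sample data are hypothetical, and this is not the Batch Queue Predictor's code.

```python
from math import comb


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def quantile_upper_bound(waits, quantile=0.95, confidence=0.95):
    """Smallest observed wait that bounds the given quantile from above
    with at least the given confidence, or None if there are too few samples."""
    ordered = sorted(waits)
    n = len(ordered)
    for k in range(1, n + 1):
        # The k-th order statistic exceeds the quantile when at most
        # k - 1 samples fall below it.
        if binomial_cdf(k - 1, n, quantile) >= confidence:
            return ordered[k - 1]
    return None


# Hypothetical queue waits in seconds extracted from a log.
waits = list(range(1, 60))  # 59 samples
print(quantile_upper_bound(waits))
```

With these defaults at least 59 samples are needed before even the largest observation qualifies as a bound, which gives a sense of how much log history a next-to-run or pre-emption estimate would require.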
8
Live Queue Data
- Current methods involve processing historical batch queue logs
- Parse MDS logs of recent queue state
- What intuition can we glean?
  - Are there other SPRUCE jobs submitted?
  - What is the current load on the resource?
9
Execution Delay
- Start simple:
  - Use historical application performance to generate a cubic spline performance model
- Better idea:
  - Given a resource, workflow input and number of CPUs, determine an empirical bound on the delay
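The "start simple" option can be illustrated with a natural cubic spline fitted through historical measurements. The slide does not say which variable the model is fitted against; the sketch below assumes runtime as a function of CPU count, and the data points are invented. It uses the standard tridiagonal construction in pure Python rather than any particular library.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function interpolating (xs, ys) with a natural cubic spline."""
    n = len(xs) - 1
    if n < 1 or len(ys) != len(xs):
        raise ValueError("need at least two points and matching lengths")
    h = [xs[i + 1] - xs[i] for i in range(n)]
    if any(step <= 0 for step in h):
        raise ValueError("xs must be strictly increasing")

    # Right-hand side of the tridiagonal system for the quadratic coefficients.
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3.0 * (ys[i + 1] - ys[i]) / h[i] - 3.0 * (ys[i] - ys[i - 1]) / h[i - 1]

    # Forward elimination; end rows encode zero second derivative at both ends.
    diag = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        diag[i] = 2.0 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / diag[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / diag[i]

    # Back substitution for the per-interval coefficients b, c, d.
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])

    def evaluate(x):
        # Pick the interval containing x; clamp so end intervals extrapolate.
        j = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        dx = x - xs[j]
        return ys[j] + dx * (b[j] + dx * (c[j] + dx * d[j]))

    return evaluate


# Hypothetical history: CPU counts and observed runtimes in seconds.
cpus = [16, 32, 64, 128]
runtimes = [3600.0, 1900.0, 1050.0, 640.0]
model = natural_cubic_spline(cpus, runtimes)
print(f"predicted runtime on 48 CPUs: {model(48):.0f} s")
```

A spline gives a point prediction rather than a bound, which is presumably why the slide treats an empirical bound as the better idea.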
10
Meeting the deadline
- Given a workflow, which configuration provides the best likelihood of meeting the deadline?
  - Application reliability?
  - Changing the number of requested CPUs?
  - Urgency level?
    - Is next-to-run almost as good as pre-emption?
    - Will our job run without using a token at all?
  - Etc.
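The comparison across configurations might be sketched as below, assuming each candidate configuration has a set of historical total turnaround times. The likelihood of meeting the deadline is estimated as the fraction of past runs that finished in time. Configuration labels, function names, and all timings are hypothetical; the slides do not describe how the advisor actually ranks configurations.

```python
def meet_probability(turnarounds, deadline):
    """Empirical fraction of past runs that finished within the deadline."""
    if not turnarounds:
        raise ValueError("need at least one historical run")
    return sum(t <= deadline for t in turnarounds) / len(turnarounds)


def rank_configurations(history, deadline):
    """Return (configuration, probability) pairs, most promising first."""
    scored = [(name, meet_probability(runs, deadline)) for name, runs in history.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical total turnaround times in seconds per configuration.
history = {
    "normal priority, 64 CPUs": [2400, 5200, 3900, 7100, 3300],
    "next-to-run, 64 CPUs": [2500, 2900, 3700, 3100, 2800],
    "pre-emption, 64 CPUs": [2300, 2600, 2700, 2500, 3500],
}
ranking = rank_configurations(history, deadline=3600)
for name, probability in ranking:
    print(f"{name}: {probability:.0%}")
```

A ranking like this also makes the slide's questions concrete: if a less intrusive urgency level scores nearly as well as pre-emption, it may be the preferable choice.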
11
Advisor Interface
12
Select a workflow (example)
13
Advisor Results