Timeshared Parallel Machines


1 Timeshared Parallel Machines
- Need resource management
- Shrink and expand individual jobs to available sets of processors
- Example: Machine with 100 processors
  - Job1 arrives, can use 20-150 processors
  - Assign 100 processors to it
  - Job2 arrives, can use 30-70 processors, and will pay more if we meet its deadline
- Make resource allocation decisions
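The example on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Job` class, the `priority` field, and the "minimum first, then spare processors by priority" policy are all hypothetical and are not taken from the presentation, which does not say how the allocation decision is made.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    min_pe: int        # fewest processors the job can run on
    max_pe: int        # most processors the job can use
    priority: int = 0  # hypothetical: higher means a better-paying job


def allocate(total_pe, jobs):
    """Give every job its minimum, then hand spare processors
    to higher-priority jobs first (an assumed policy)."""
    alloc = {j.name: j.min_pe for j in jobs}
    spare = total_pe - sum(alloc.values())
    if spare < 0:
        raise ValueError("not enough processors for every job's minimum")
    for j in sorted(jobs, key=lambda j: -j.priority):
        extra = min(spare, j.max_pe - j.min_pe)
        alloc[j.name] += extra
        spare -= extra
    return alloc


job1 = Job("Job1", 20, 150)
job2 = Job("Job2", 30, 70, priority=1)

before = allocate(100, [job1])        # Job1 alone gets the whole machine
after = allocate(100, [job1, job2])   # Job1 shrinks so Job2 can run
```

Under this assumed policy, Job1 holds all 100 processors until Job2 arrives, then shrinks to 30 so that Job2 can run on 70.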

2 Multiple Parallel Machines
- Faucet submits a request:
  - CPU seconds, min-max CPUs, deadline, interactive?
- Parallel machines submit bids:
  - A job for 100 CPU hours may get a lower price bid if:
    - It has a less tight deadline
    - It has a more flexible PE range
  - A job that requires 15 CPU minutes and a deadline of 1 minute
    - Will generate a variety of bids
    - A machine with idle time on its hands: low bid
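To make the bidding idea concrete, here is a minimal sketch of how a machine might price a request. The formula, the `bid_price` function, its parameters, and the perfect-speedup assumption are all invented for illustration; the presentation only states the qualitative trends (a looser deadline, a more flexible PE range, and an idle machine each lead to a lower bid).

```python
def bid_price(cpu_seconds, min_pe, max_pe, deadline_s, utilization, base_rate=1.0):
    """Return a hypothetical price for a job, or None to decline.

    utilization is the machine's current load in [0, 1].
    Assumes perfect speedup, which real jobs rarely achieve.
    """
    best_runtime = cpu_seconds / max_pe
    if best_runtime > deadline_s:
        return None  # deadline cannot be met even at max_pe
    tightness = best_runtime / deadline_s      # near 1.0 means almost no slack
    flexibility = (max_pe - min_pe) / max_pe   # 0.0 means a fixed PE count
    load = 1.0 + utilization                   # idle machine: 1.0, full: 2.0
    return (cpu_seconds * base_rate * load
            * (1.0 + tightness) * (1.0 - 0.5 * flexibility))


# The slide's example: 15 CPU minutes (900 CPU seconds), 1 minute deadline.
idle_bid = bid_price(900, 15, 60, 60, utilization=0.0)
busy_bid = bid_price(900, 15, 60, 60, utilization=0.9)
relaxed_bid = bid_price(900, 15, 60, 600, utilization=0.0)
too_small = bid_price(900, 5, 10, 60, utilization=0.5)
```

With these made-up weights, the idle machine bids lower than the busy one, relaxing the deadline lowers the bid further, and a machine that can offer at most 10 PEs declines because 900 CPU seconds cannot finish in 60 seconds on 10 processors.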

3 How to make all of this work?
- The key: fine-grained resource management model
  - Work units are objects and threads rather than processes
  - Data units are object data, thread stacks, ... rather than pages
  - Work/data units can be migrated automatically during a run
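The migration idea can be illustrated with a small sketch: when a job shrinks, objects living on the removed processors are moved to the surviving ones. The `shrink` function and its least-loaded placement rule are assumptions made for this example; they are not the load balancer used by the actual runtime.

```python
def shrink(placement, new_pe_count):
    """Remap objects when a job shrinks to new_pe_count processors.

    placement maps object id -> processor index. Objects on surviving
    processors stay put; objects on removed processors move to the
    currently least-loaded survivor (an assumed, simplistic rule).
    """
    load = [0] * new_pe_count
    for pe in placement.values():
        if pe < new_pe_count:
            load[pe] += 1

    new_placement = dict(placement)
    for obj, pe in sorted(placement.items()):
        if pe >= new_pe_count:
            target = load.index(min(load))
            new_placement[obj] = target
            load[target] += 1
    return new_placement


# Eight objects spread round-robin over four processors, then shrink to two.
placement = {obj: obj % 4 for obj in range(8)}
shrunk = shrink(placement, 2)
```

Because the unit of migration is an object rather than a whole process, only the four objects on the removed processors have to move; the other four are untouched.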

4 Anonymous Compute Power
What is needed to make this metaphor work?
- Timeshared parallel machines in the background
  - Effective resource management
- Quality of computational service
  - Contracts/guarantees
- Front ends that will allow agents to submit jobs on user’s behalf: Computational Faucets

5 What does a Computational Faucet do?
- Submit requests to “the grid”
- Evaluate bids and decide whom to assign work
- Monitor applications (for performance and correctness)
- Provide interface to users:
  - Interacting with jobs, and monitoring behavior
- What does it look like?
  - A browser!

6 Faucets QoS
- User specifies desired job parameters such as: program executable name, executable platform, min PE, max PE, estimated CPU-seconds (for various PE), priority, etc.
- User does not specify machine.
- Faucet software contacts a central server and obtains a list of available workstation clusters, then negotiates with clusters and chooses one to submit the job.
- User can view status of clusters.
- Planned: file transfer, user authentication, merge with Appspector for job monitoring.
[Diagram labels: Central Server, Faucet Client, Web Browser, Workstation Cluster]
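The job parameters and the cluster choice described on this slide might be modeled roughly as follows. The `JobRequest` fields mirror the parameters listed above, but the class itself, the `choose_cluster` function, the lowest-price rule, and all sample names and numbers are hypothetical; the real Faucets negotiation protocol is not described in the presentation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class JobRequest:
    executable: str
    platform: str
    min_pe: int
    max_pe: int
    # Estimated CPU-seconds for various PE counts (PE count -> CPU-seconds).
    est_cpu_seconds: Dict[int, float] = field(default_factory=dict)
    priority: int = 0


def choose_cluster(request: JobRequest,
                   clusters: Dict[str, str],
                   bids: Dict[str, Optional[float]]) -> Optional[str]:
    """Pick the cheapest cluster that matches the platform and placed a bid.

    clusters maps cluster name -> platform; bids maps cluster name -> price,
    with None meaning the cluster declined. Lowest price wins (an assumption).
    """
    eligible = {
        name: price
        for name, price in bids.items()
        if price is not None and clusters.get(name) == request.platform
    }
    if not eligible:
        return None
    return min(eligible, key=eligible.get)


# Made-up request and bids, purely for illustration.
request = JobRequest("sim.exe", "linux-x86", min_pe=8, max_pe=64,
                     est_cpu_seconds={8: 4000.0, 64: 4800.0})
clusters = {"A": "linux-x86", "B": "linux-x86", "C": "solaris"}
bids = {"A": 120.0, "B": 95.0, "C": 10.0}
chosen = choose_cluster(request, clusters, bids)
```

Note that the user never names a machine: cluster C offers the lowest price but is filtered out because its platform does not match the executable, so the request goes to B.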

7 Time-shared Parallel Machines
- To bid effectively (profitably) in such an environment, a parallel machine must be able to run well-paying (important) jobs, even when it is already running others.
- Allows a suitably written Charm++/Converse program running on a workstation cluster to dynamically change the number of CPUs it is running on, in response to a network (CCS) request.
- Works in coordination with a Cluster Manager to give a job as many CPUs as are available when there are no other jobs, while providing the flexibility to accept new jobs and scale down.

8 Appspector
- Appspector provides a web interface for submitting and monitoring parallel jobs.
- Submission: user specifies machine, login, password, program name (which must already be available on the target machine).
- Jobs can be monitored from any computer with a web browser.
- Advanced program information can be shown on the monitoring screen using CCS.

9 BioCoRE
- Project Based
- Workbench for Modeling
- Conferences/Chat Rooms
- Lab Notebook
- Joint Document Preparation
Goal: Simulate the process of doing research. Provide a web-based way to virtually bring scientists together.
http://www.ks.uiuc.edu/Research/biocore/

