Published by Eleanore Reynolds. Modified over 9 years ago.
1
Using Clusters: User Perspective
2
Pre-cluster scenario
- So many different computers: prithvi, apah, tejas, vayu, akash, agni, aatish, falaq, narad, qasid …
- Different software on each of them
- Different hardware capabilities
- The desired one may be down
- Only a few are in the top bracket, so response may be slow
3
Cluster
- One machine in place of so many computers
- Same software everywhere
- Same hardware
- A few systems being down is no problem
- One can use the machine as an interactive server, batch server, sequential machine, or parallel machine
4
User Interface to Cluster
- Just as the OS sits between the machine and the user, this interface sits between the users and a group of machines
- Layers, top to bottom: Users / Interface / Machines
5
Components
- Queuing: collection of user jobs/requests in the form of batch jobs
- Scheduling: selecting the user jobs to run and the machines to run them on
- Monitoring: usage policy implementation, tracking of job and machine status
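The three components can be illustrated with a toy model. This is only a sketch under stated assumptions: the class `ToyCluster`, the node names `node1` and `node2`, and the first-fit placement rule are all hypothetical and are not how PBS itself is implemented.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    ncpus: int
    state: str = "Q"  # "Q" = queued, "R" = running (labels chosen for this sketch)


class ToyCluster:
    """Hypothetical model: a job queue, a first-fit scheduler, and a status view."""

    def __init__(self, free_cpus_per_node):
        self.free = dict(free_cpus_per_node)  # node name -> free CPUs
        self.queue = deque()                  # queuing: jobs waiting to run
        self.placement = {}                   # job_id -> node the job runs on

    def submit(self, job):
        self.queue.append(job)

    def schedule(self):
        """Scheduling: start each queued job on the first node with enough free CPUs."""
        waiting = deque()
        while self.queue:
            job = self.queue.popleft()
            node = next((n for n, c in self.free.items() if c >= job.ncpus), None)
            if node is None:
                waiting.append(job)  # not enough resources, job stays queued
                continue
            self.free[node] -= job.ncpus
            job.state = "R"
            self.placement[job.job_id] = node
        self.queue = waiting

    def status(self):
        """Monitoring: report which jobs run where and which are still waiting."""
        return {"running": dict(self.placement),
                "queued": [j.job_id for j in self.queue]}


cluster = ToyCluster({"node1": 4, "node2": 2})
for jid, ncpus in [(1, 4), (2, 4), (3, 2)]:
    cluster.submit(Job(jid, ncpus))
cluster.schedule()
print(cluster.status())
```

Job 2 stays queued because no node has four free CPUs left, while the smaller job 3 still fits on `node2`. A real scheduler also applies the site's usage policy when choosing jobs, which this sketch leaves out.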
6
Portable Batch System (PBS)
- Two components: user commands and system daemons
- A GUI equivalent of the user commands is also available
- User commands are for tasks such as submit, monitor, modify, delete
- Daemons:
  - Server: manages the resources of the whole cluster
  - Scheduler: selects the executer and its resources
7
- Executer: some node and some processor selected by the scheduler
Running a job:
1- Create a file containing OS and PBS commands (PBS directives go before the first OS command):
   #PBS -l ncpus=4
   ./a.out
2- Submit the job with the command: qsub [options]
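The two steps can be sketched as a small helper that writes the job file text and assembles the `qsub` command line without running it. The helper names `make_job_script` and `qsub_argv`, the file name `job.sh`, and the `#!/bin/sh` first line are assumptions made for illustration; the `-q` option and the queue name `workq` are taken from other slides of this presentation.

```python
import shlex


def make_job_script(command, ncpus):
    """Return job-script text: PBS directives first, then the OS command."""
    lines = [
        "#!/bin/sh",               # assumed shell line, not shown on the slide
        f"#PBS -l ncpus={ncpus}",  # resource request as written on the slide
        command,
    ]
    return "\n".join(lines) + "\n"


def qsub_argv(script_path, queue=None):
    """Build (but do not run) a qsub command line as an argument list."""
    argv = ["qsub"]
    if queue is not None:
        argv += ["-q", queue]  # -q selects the queue
    argv.append(script_path)
    return argv


script = make_job_script("./a.out", ncpus=4)
print(script)
print(shlex.join(qsub_argv("job.sh", queue="workq")))
```

On a machine with PBS installed, the argument list could be handed to `subprocess.run` after saving the script text to `job.sh`; that step is omitted here so the sketch runs anywhere.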
8
- The -I option creates an interactive session
- The -q option selects the queue
Checking the status of a job:
   tracejob job_number
9
9/05/2006 20:19:36 S Job Queued at request of santh@hncn17, owner = santh@hncn17, job name = SCR_LB70-m5stat, queue = workq
9/05/2006 20:19:36 S Job Modified at request of Scheduler@hncn17
9/05/2006 20:19:36 S enqueuing into workq, state 1 hop 1
9/05/2006 20:19:36 A queue=workq
9/05/2006 22:39:36 L Considering job to run
9/05/2006 22:39:36 L Not enough of the right type of nodes available
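Output of this shape is easy to split into fields. The sketch below assumes each line is "date time letter message", as in the sample above, and that the single letter names the log the line came from (S server, L scheduler, A accounting, M MOM). The function `parse_line` and the dictionary keys are hypothetical names chosen for this example.

```python
import re

# date, time, one-letter source code, then the free-text message
LINE_RE = re.compile(r"^(\d{1,2}/\d{2}/\d{4}) (\d{2}:\d{2}:\d{2}) ([A-Z]) (.*)$")

# Assumed meaning of the one-letter codes
SOURCES = {"S": "server", "L": "scheduler", "A": "accounting", "M": "MOM"}


def parse_line(line):
    """Split one log line into a dict, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    date, time, code, message = match.groups()
    return {
        "date": date,
        "time": time,
        "source": SOURCES.get(code, code),  # unknown codes are kept as-is
        "message": message,
    }


sample = [
    "9/05/2006 20:19:36 S enqueuing into workq, state 1 hop 1",
    "9/05/2006 22:39:36 L Not enough of the right type of nodes available",
]
records = [parse_line(line) for line in sample]
for record in records:
    print(record["time"], record["source"], "|", record["message"])
```

Read this way, the sample shows the server queuing the job at 20:19:36 and the scheduler, at 22:39:36, declining to start it because suitable nodes were not available.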
10
- Modifying a job: qalter -l walltime=20:00
- Deleting a job: qdel 17
- Sending signals: qsig -s signal job_identifier
- Job movement between queues is possible
- Parallel jobs are run through the command mpirun
- Checkpointing is possible
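The value walltime=20:00 is easy to misread. Assuming the usual PBS time format [[hours:]minutes:]seconds, it means 20 minutes, not 20 hours. A minimal sketch of that conversion follows; `walltime_to_seconds` is a hypothetical helper, not a PBS command.

```python
def walltime_to_seconds(text):
    """Convert a [[hours:]minutes:]seconds string to a number of seconds."""
    parts = [int(p) for p in text.split(":")]
    if not 1 <= len(parts) <= 3 or any(p < 0 for p in parts):
        raise ValueError(f"not a valid walltime: {text!r}")
    seconds = 0
    for part in parts:
        # each further field is 60 times finer than the previous one
        seconds = seconds * 60 + part
    return seconds


print(walltime_to_seconds("20:00"))    # minutes:seconds
print(walltime_to_seconds("1:30:00"))  # hours:minutes:seconds
```

Under the same assumption, a 20-hour limit would be written as walltime=20:00:00.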
11
- pbs_server, pbs_mom and pbs_sched (the scheduler) are the three daemons
- A compute node runs only pbs_mom