1
Scheduling From the Perspective of the Application
By Francine Berman & Richard Wolski
Presenter: Kun-chan Lan
2
Outline of the talk
–Overview
–Case study
–Application-centric scheduling
–AppleS Project
–Result
–Conclusion
3
Overview..
Why scheduling is important in metacomputing systems
–Better utilization of resources
–Performance efficiency
Application-centric scheduling
–Everything is evaluated in terms of its impact on the application
4
..Overview..
Metacomputing
–Aggregation of distributed and high-performance resources on coordinated networks, for the performance required to address modern scientific problems
–Heterogeneity (administrative domains, software/hardware architectures, protocols, etc.)
–Contention
5
Parallel computing vs. Metacomputing
Parallel computing:
–Performance oriented
–Aggregation of resources from a single site (a multi-processor machine)
–Communicate via dedicated devices (switch, shared memory, etc.)
–Homogeneous (hardware/software infrastructure, administrative domain, etc.)
Metacomputing:
–Performance-oriented aggregation of resources from multiple sites
–Communicate via a distributed network
–Heterogeneous resources
–A software infrastructure is required to coordinate distributed networks into a communication substrate
6
Scheduling for parallel computing
–Multiprocessor nodes generally have uniform capabilities
–Usually there is a centralized system scheduler
–Processors are dedicated to the tasks of a single application -- no contention
7
Scheduling for Metacomputing
–Resources are often managed by separate schedulers which are not coordinated – no single system scheduler
–Data conversion between sites
–Overlapping of communication and computation to amortize network communication costs
–Separate optimized algorithms for tasks on different machines
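The data-conversion point can be made concrete: two sites may store numbers with different byte orders, so values are conventionally converted to a fixed (big-endian "network") representation on the wire. A minimal Python sketch; the function names are illustrative, not from the talk:

```python
import struct

def to_network_order(values):
    """Pack floats in big-endian (network) byte order so hosts with
    different native byte orders can exchange them safely."""
    return struct.pack(">%dd" % len(values), *values)

def from_network_order(payload):
    """Unpack big-endian 8-byte doubles back into Python floats."""
    count = len(payload) // 8
    return list(struct.unpack(">%dd" % count, payload))

data = [1.5, -2.25, 3.0]
wire = to_network_order(data)
assert from_network_order(wire) == data  # round-trip is lossless
```

Real systems (e.g. the C90-to-Paragon transfers discussed later) face the same issue for native floating-point formats, not just byte order.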
8
Outline of the talk
–Overview
–Case study
–Application-centric scheduling
–AppleS
–Result
–Conclusion
9
Case 1: CLEO/NILE
10
CLEO
–A high-energy physics project
–Each collision detected by CLEO is called an event
–Each event is recorded and passed to a program called “pass2” to compute offline the physical properties of the particles
–Records computed by “pass2” are read and compressed by another program for certain frequently-accessed fields
–One terabyte of data is generated per year
11
Nile..
–A by-product of CLEO
–Each of CLEO’s collaborating institutions is a site
–Goal: provide a scalable, fault-tolerant, heterogeneous system of hundreds of commodity workstations, with access to a distributed database in excess of 100 TB
–Resources (CLEO data) are spread across the United States and Canada at 24 collaborating institutions
–Resources can be accessed and used transparently from anywhere by any member of the CLEO collaboration
12
..Nile..
–Not specific to CLEO; can be used by any application that is easily parallelizable
–Currently implemented in CORBA/Java
–Three components: Nile Control System (NCS), Data Repository, User Interface
–Interconnecting networks include ATM, FDDI and Ethernet
13
Nile..
14
..Nile..
NCS – Site Manager:
–Interface between the NCS and clients
–Receives job requests
–For each job request, creates a job manager, stores the job context in the Job Database and places the job in a queue
–Stateless
15
..Nile..
NCS:
–Job DB: stores the state of jobs
–Resource DB: maintains the state of available hardware resources at the local site
–Data Location Manager: translates the logical data specification in the job profile into a set of corresponding physical data objects, which can be used to determine suitable hosts to run the sub-jobs
16
..Nile..
NCS:
–Job Manager: divides a single job into a set of sub-jobs which can be executed in parallel; monitors the state of the sub-jobs; collects and assembles the results and passes them back to the site manager
–Planner: produces an execution plan consisting of a list of sub-jobs, each having a host machine and a set of data objects
17
Characteristics of CLEO/NILE
–The quantity of data for the problem is so large that no single site can provide all the resources needed
–Efficient resource allocation is crucial
–Execution sites and network interconnections are heterogeneous
–Some resources are shared with other applications, so performance might vary greatly based on contention for resources
18
CASE 2: 3D-REACT
–Tries to predict the energy levels of reactions using quantum mechanics
–Simulates a hydrogen-deuterium reaction
–Essentially calculates the solution to a six-dimensional Schrödinger equation, and can be decomposed into three tasks:
–LHSF (local hyper-spherical surface function)
–Log-D (logarithmic derivative propagation): uses the result of LHSF as input
–ASY: an asymptotic analysis on the matrices generated during the Log-D calculation
19
Scheduling 3D-REACT
–Distribute 3D-REACT across two computation units: a Cray C90 at SDSC and a 64-node Intel Paragon at Caltech
–The problem is divided into smaller sub-domains of 5-20 surface functions per sub-domain, so LHSF and Log-D can be executed concurrently
–First the C90 calculates the LHSF for a given sub-domain, then the result is passed to the Paragon, which calculates the Log-D portion of that sub-domain
–While the Paragon is calculating the first sub-domain, the C90 can start calculating the second sub-domain
–After all the sub-domains are considered, the ASY determines whether the calculation should stop
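The LHSF/Log-D overlap is a two-stage pipeline: while the second stage works on sub-domain i, the first stage already starts sub-domain i+1, so the total time is well below running the stages back-to-back. A toy timing model (the per-stage times are invented for illustration):

```python
def pipelined_makespan(n_subdomains, t_lhsf, t_logd):
    """Two-stage pipeline: stage 1 (LHSF) feeds stage 2 (Log-D).
    Stage 1 is never idle; stage 2 starts a sub-domain as soon as
    both its input is ready and its previous sub-domain is done."""
    finish_lhsf = 0.0
    finish_logd = 0.0
    for _ in range(n_subdomains):
        finish_lhsf += t_lhsf
        finish_logd = max(finish_logd, finish_lhsf) + t_logd
    return finish_logd

# With 4 sub-domains, LHSF taking 3 time units and Log-D taking 2:
# pipelined makespan is 14, versus 4 * (3 + 2) = 20 without overlap.
assert pipelined_makespan(4, 3.0, 2.0) == 14.0
```

The longer stage dominates: for large n the makespan approaches n times the slower stage's time, which is exactly the amortization of communication and computation the slide describes.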
21
Characteristics of 3D-REACT
–The algorithm implemented by a task is optimized for the machine to which it has been assigned
Eg. the Log-D implementation used on the C90 is different from that used on the Paragon
–Computation and communication can be pipelined to amortize communication delays
–Data might need to be converted into a different format when transferred between sites
Eg. floating-point data need to be converted when the C90 sends data to the Paragon
–Scheduling is critical for performance: each of the sub-tasks (LHSF/Log-D/ASY) can be executed on either machine
22
Outline of the talk
–Overview
–Case study
–Application-centric scheduling
–AppleS
–Result
–Conclusion
23
Generalization of Application-Centric Scheduling
–Each application develops a schedule to optimize its own performance, without regard to the performance goals of other applications which share the system
–The application-centric schedules of different applications are unrelated
–However, there are still some commonalities which underlie application-centric program development
24
Components of Application-Centric Scheduling..
–Performance criteria/metrics
–Dynamic system state
–Application-specific resource locality
–Application performance characteristics
–User preferences
–Prediction
25
Performance criteria/metrics
–Performance criteria/metrics vary with the application
Eg. to minimize execution time: 3D-REACT by maximizing speedup over a single-machine implementation; NILE by distributing the analysis of independent events
–Some common metrics: execution time, speedup, cost of execution cycles
–Users may attempt to optimize the usage of the same resources for different performance criteria at the same time
26
Dynamic system state
–Mixture of dedicated and non-dedicated resources: should the application wait until the dedicated resources become available, or execute with lesser performance on the non-dedicated resources currently available?
–Requires dynamic assessment of the current system state and resource loads, plus short-term but accurate prediction
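One way to picture the "short-term but accurate prediction" requirement is a simple smoothing forecaster over recent load samples. This is only an illustrative stand-in, not the actual predictor used by any particular system (the Network Weather Service discussed later applies a whole family of such predictors):

```python
def ewma_forecast(history, alpha=0.5):
    """One-step-ahead load forecast via exponential smoothing:
    each new observation pulls the forecast toward it by a
    fraction alpha, so recent samples dominate."""
    forecast = history[0]
    for observed in history[1:]:
        forecast = alpha * observed + (1 - alpha) * forecast
    return forecast

# CPU-load samples over the last few measurement intervals
loads = [0.2, 0.4, 0.4, 0.6]
next_load = ewma_forecast(loads)   # weighted toward the recent rise
assert 0.4 < next_load < 0.6
```

The scheduler would compare such forecasts for dedicated versus non-dedicated resources before committing to either choice.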
27
Application-specific Resource Locality
–Applications seek to use “close” resources
–“Closeness” is a function of what the application requires from a resource as well as the resource’s capability
–The “distance” of a resource is the resource performance deliverable to the application: are X and Y close?
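Since "distance" is defined as deliverable performance rather than physical proximity, it depends on what the task needs. A small sketch of this idea (the function and numbers are invented for illustration): for a bulk transfer, a high-latency but high-bandwidth link is "closer" than a nearby low-bandwidth one.

```python
def delivered_transfer_time(bytes_needed, bandwidth_bps, latency_s):
    """Application-specific 'distance' between a task and a resource:
    the time the link actually delivers for the data this task moves."""
    return latency_s + bytes_needed / bandwidth_bps

bulk = 10**8  # task needs to move 100 MB
fast_far  = delivered_transfer_time(bulk, 10**8, 0.5)    # 100 Mb/s link, 500 ms away
slow_near = delivered_transfer_time(bulk, 10**6, 0.001)  # 1 Mb/s link, 1 ms away
assert fast_far < slow_near  # the "far" link is closer for this task
```

For a latency-bound task exchanging tiny messages, the ordering would flip, which is exactly why locality must be application-specific.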
28
Application Characteristics
–Implementation-dependent and implementation-independent
–Some common categories of attributes:
–Task-specific implementation characteristics: computation paradigm, number/size of data structures, data communication pattern, memory requirements, etc.
–Inter-task communication characteristics: data formats for each task, pipeline size, communication regularity and frequency, etc.
–Application structure information: input/output requirements, iteration pattern, etc.
29
User Preferences
–Not necessarily directly related to application performance
–Act as a filter over the possible resources and implementations available to the user
30
Role of Prediction
Prediction tells you:
–Potential communication and computation behavior of the application
–Potential availability and load of resources
–Potential performance of the application with respect to candidate schedules
Sources of prediction:
–Application-specific or application-independent benchmarks
–Statistical analysis
–Sensed or sampled data
–Analytical models
31
Process of scheduling an application
1. Use user preferences to filter out infeasible schedules
2. Use application-specific and dynamic information to develop a schedule
3. Use the individual notion of performance and resource locality to evaluate the schedule
4. Predict the performance of candidate schedules
5. Compare and determine the “best” schedule that can be implemented on the available resources
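The five steps can be sketched as a filter-evaluate-select loop. In this deliberately simplified version, `feasible` and `estimate_perf` are hypothetical callbacks standing in for the user-preference filter and the performance predictor, and the develop/evaluate/predict steps collapse into a single estimate per candidate:

```python
def best_schedule(candidates, feasible, estimate_perf):
    """Filter candidate schedules by user preference, predict the
    performance of each survivor, and return the best one."""
    viable = [s for s in candidates if feasible(s)]      # step 1: filter
    if not viable:
        return None
    # steps 2-4: each viable schedule gets a predicted completion time
    # step 5: the lowest predicted time wins
    return min(viable, key=estimate_perf)

schedules = ["c90-only", "paragon-only", "pipelined"]
predicted = {"c90-only": 20.0, "paragon-only": 18.0, "pipelined": 14.0}
choice = best_schedule(schedules, lambda s: s != "c90-only", predicted.get)
assert choice == "pipelined"
```

A real scheduler would iterate this as system state changes; the point here is only the shape of the decision.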
32
Outline of the talk
–Overview
–Case study
–Application-centric scheduling
–AppleS
–Result
–Conclusion
33
AppleS (Application-Level Scheduler)
–Each application has its own AppleS agent (a customized scheduler for each application)
–What does AppleS do? Selects resources, determines a performance-efficient schedule, and implements that schedule with respect to the appropriate resource management system
–AppleS is NOT a resource management system: it relies on systems such as Globus and Legion
34
Organization of an AppleS agent
35
Components of AppleS
–Resource Selector: chooses and filters different resource combinations
–Planner: generates a description of a resource-dependent schedule from a given resource combination
–Performance Estimator: generates an estimate for candidate schedules according to the user’s performance metric
–Coordinator: chooses the “best” schedule
–Actuator: implements the “best” schedule on the target resource management system
36
Input of AppleS: Information Pool
–Network Weather Service: dynamic information on system state and forecasts of resource load
–Heterogeneous Application Template (HAT): information on the structure, characteristics and implementation of the application and its tasks
–Models: used for performance estimation, planning and resource selection
–User Specification (US): information on the user’s criteria for performance, execution constraints, preferences for implementation, etc.
37
Using AppleS
1. The user provides information to AppleS via the HAT and US
2. The Coordinator uses this information to filter out infeasible or likely-poor schedules
3. The Resource Selector identifies promising sets of resources, and prioritizes them based on the logical “distance” between resources
4. The Planner computes a potential schedule for each viable resource configuration
5. The Performance Estimator evaluates each schedule in terms of the user’s performance objective
6. The Coordinator chooses the best schedule and implements it via the Actuator
39
Using AppleS – Example: 3D-REACT
1. Assume implementations of LHSF and Log-D are available for several architectures
2. The HAT specifies, for each implementation, the computation-to-communication ratios for LHSF and Log-D, the degree of overlap possible between the two, etc.
3. The Resource Selector determines viable pairs of resources
4. The Planner identifies a set of candidate schedules
5. The Performance Estimator calculates the transfer unit size between LHSF and Log-D for each candidate schedule
6. The Coordinator sends the best schedule to the Actuator
40
Outline of the talk
–Overview
–Case study
–Application-centric scheduling
–AppleS
–Result
–Conclusion
41
Jacobi2D code..
–A distributed, data-parallel, two-dimensional Jacobi iterative solver, commonly used to solve the finite-difference approximation to Poisson's equation
–Variable coefficients are represented as elements of a two-dimensional grid
–At each iteration, the new value of each grid element is defined to be the average of its four nearest neighbors from the previous iteration
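The update rule just described, in which every interior element becomes the average of its four neighbors from the previous iteration, can be written directly. A minimal sketch (not the benchmarked implementation from the talk):

```python
def jacobi_step(grid):
    """One Jacobi iteration on a 2-D grid: each interior element
    becomes the average of its four nearest neighbors from the
    previous iteration; boundary values stay fixed."""
    n, m = len(grid), len(grid[0])
    new = [row[:] for row in grid]          # copy so all reads see the old grid
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            new[i][j] = (grid[i-1][j] + grid[i+1][j] +
                         grid[i][j-1] + grid[i][j+1]) / 4.0
    return new

g = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
assert jacobi_step(g)[1][1] == 1.0  # center averages its four neighbors
```

Because every element reads only the previous iteration's values, distinct grid regions can be updated independently, which is what makes the distributed partitioning on the next slide possible.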
43
..Jacobi2D code
–Typically, the Jacobi computation is parallelized by partitioning the grid into rectangular regions, and then assigning each region to a different processor
–Parallelism vs. communication overhead
–P0 is twice as fast as processor P1 or P2
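With heterogeneous processors, the natural partitioning gives each processor work proportional to its speed. A sketch under the slide's assumption that P0 is twice as fast as P1 or P2 (the helper is mine, not from the AppleS code):

```python
def strip_widths(total_cols, speeds):
    """Split a grid's columns into strips proportional to each
    processor's relative speed: a processor twice as fast gets
    roughly twice as many columns."""
    total_speed = sum(speeds)
    widths = [total_cols * s // total_speed for s in speeds]
    widths[0] += total_cols - sum(widths)   # hand any rounding remainder to P0
    return widths

# P0 twice as fast as P1 or P2: it gets half of a 100-column grid.
assert strip_widths(100, [2, 1, 1]) == [50, 25, 25]
```

Balancing work this way keeps all processors finishing an iteration at about the same time, instead of everyone waiting on the slowest.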
44
[Figure: testbed of RS600 and Alpha workstations connected by FDDI]
45
Three partition methods
–HPF Uniform/Blocked: each processor is assigned (at compile time) a relatively equal-sized square region of the grid to compute
–Non-Uniform Strip: uses good static estimates of resource performance, and uses resource selection to choose a resource set from the total resources
–AppleS
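The gap between the uniform and non-uniform methods comes from the fact that a data-parallel iteration finishes only when the slowest strip does. A toy comparison (speeds and strip sizes invented to match the P0-twice-as-fast setup above):

```python
def iteration_time(widths, speeds):
    """Per-iteration time of a data-parallel step: each processor's
    time is its columns divided by its speed, and the iteration
    ends when the slowest processor finishes."""
    return max(w / s for w, s in zip(widths, speeds))

speeds = [2, 1, 1]                 # P0 twice as fast as P1, P2
uniform = [34, 33, 33]             # HPF-style near-equal blocks
non_uniform = [50, 25, 25]         # strips sized from static speed estimates

# Equal blocks leave P0 idle while P1/P2 lag; speed-proportional strips don't.
assert iteration_time(non_uniform, speeds) < iteration_time(uniform, speeds)
```

AppleS goes one step further than static strips by re-deriving such estimates from dynamic load information before each run.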
47
Memory availability
–Adding two IBM SP-2 nodes with 128 MB of memory to the resource pool
–Dedicated access to the two SP-2 nodes and the link between them
–The best partitioning is to split the grid evenly between the two SP-2 nodes, as long as neither partition exceeds the available real memory on each node
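The caveat about real memory can be made concrete with a quick feasibility check: an even split is best only while each half still fits in a node's physical memory, after which paging erases the benefit. The sizes below are illustrative, not measurements from the talk:

```python
def fits_in_memory(grid_rows, grid_cols, bytes_per_elem, nodes, mem_per_node):
    """Does an even split of the grid across `nodes` identical nodes
    keep each partition within one node's real memory?"""
    per_node_bytes = grid_rows * grid_cols * bytes_per_elem / nodes
    return per_node_bytes <= mem_per_node

MB = 2**20
# 4096x4096 grid of 8-byte doubles over two 128 MB nodes: 64 MB each, fits.
assert fits_in_memory(4096, 4096, 8, 2, 128 * MB)
# 8192x8192 grid: 256 MB each, exceeds real memory, so the nodes would page.
assert not fits_in_memory(8192, 8192, 8, 2, 128 * MB)
```

This is why the next slide's result shows heavy page swapping once the partitions outgrow physical memory: a schedule that ignores the memory constraint looks balanced on paper but thrashes in practice.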
48
A lot of page swapping
49
Conclusion
–A performance-efficient schedule must exploit the concurrency of independent application tasks, as well as factor in the impact of resource contention, diversity and autonomy
–AppleS: http://apples.ucsd.edu/, still a work in progress
Related work:
–MARS: http://www.uni-paderborn.de/pc2/projects/mol/mars.htm
–CLEO: http://www.lns.cornell.edu/public/CLEO/
–3D-REACT: http://www.cacr.caltech.edu/Publications/techpubs/CASA/cacr123/web4.htm