
1 Nomadic Grid Applications: The Cactus WORM
G. Lanfermann, Max Planck Institute for Gravitational Physics, Albert-Einstein-Institute, Golm
Dave Angulo, University of Chicago, Chicago, IL

2 Grid Application Development Software Project
Outline
• The Worm - Migration on the Grid:
  – Motivation
  – Design
• The Worm: Adaptive Migration
  – Data Transfer using GridFTP
  – Resource Selection using MDS-2 and ClassAds
  – Contract Monitoring using MDS-2
  – Intelligent Migration using GRAM

3 This talk: http://people.cs.uchicago.edu/~dangulo/grads/CactusGrADS-Aug7-GlobusRetreat.ppt
Other documents on GrADS in Cactus architecture:
http://people.cs.uchicago.edu/~dangulo/grads/arch/
http://www.cactuscode.org
Paper available in the back of the 2001 Globus Retreat book

4 Migration on the Grid
[Diagram labels: Resource Broker; Requested & Available Resource; Payload; Migration; Payload; Your Grid]

5 Large Scale HPC Simulation: Daily Routine
The “daily” routine of doing large-scale numerical simulations:
• Take an educated guess at memory requirements, number of processors, and disk space needed.
• Start with the first parameter in a range of values to explore the behavior of your simulation.
  – Select a machine and submit to the queuing system. Wait.
  – Archive and analyze the data; make changes to the parameter file, resubmit to the queuing system. Wait.
• For the large production run, increase the resolution of your experiment, take an educated guess at memory, ….
• Select a big machine, submit to the queue. Wait 3-7 days.
• Archive data & checkpoint file, resubmit to the queue. Wait 3-7 days.
• Archive data & checkpoint file, resubmit. Wait 3-7 days.
• ….

6 Automating the Routine
• Let the computer find out about the code’s resource requirements.
• Automatically contact appropriate machines, stage executables, and submit to the queuing system.
• Let the computer monitor the quality of the requested resources as the simulation progresses.
• Perform multiple simulations over a range of parameters automatically and in parallel.
• Archive the data and give the user uniform access to it.
There is plenty of room to automate the way simulations are carried out today.

7 Cactus + Grid
• Cactus-based Application Thorns: the physics (initial data, evolution, analysis, etc.)
• Grid-Aware Application Thorns: drivers for contract management, dynamic resource detection, simulation relocation
• Grid-Enabled Communication Library: MPICH-G2 implementation of MPI, which can run MPI programs across heterogeneous computing resources
[Other diagram labels: Standard MPI; Single Proc]

8 The Grid Layer Concept
• Application Thorns provide: Initial Data, Analysis, Evolution
• Grid Thorns provide: Migration & Resource Management
[Other diagram labels: Grid Enabled Simulation; Grid Enabled Computational Framework; Cactus Computational Framework]

9 Migrating Applications on the Grid
[Diagram labels: Payload Application Information Server AIS Migration Unit Resource Management Resource Selector Client Worm Layer Hibernation Storage Off Site Data Server Resource Broker]

10 The WORM at HPDC10
[Diagram labels: Information Server; Migration Server]

11 Current Architecture Under Development
[Diagram labels: Resource Selection Client Thorn External Resource Selection Service “Worm” Migration Module Cactus Worm Server Thorns Cactus Application Unit Cactus Flesh Performance Degradation Detection User Supplied Application Payload External Processes Migration Logic Manager GridFTP Client Thorn External GridFTP Server (Source) External GridFTP Server (Destination) Data transfer GRAM]

12 Migration of Checkpoint Files
Uses an alpha version of GridFTP
• Allows third-party transfer
  – Without this, need to
    > do a GET to transfer files from source to Migrator
    > do a PUT to transfer files from Migrator to destination
• Uses GSI security
  – Allows grid-proxy with only a single sign-on while retaining tight security
• Allows fast, efficient, reliable transfer

13 Resource Selector Architecture (ClassAds)
[Diagram labels: Resource Selection Client Thorn; ClassAds library; Resource Selection Engine; Request in ClassAds format; Response in XML; GIIS; NWS; Resources; UTk Project; GRISs; MDS-2]

14 MDS-2 Future Plans
• Resource selector goes to the GRIS directly after resources are discovered
• To investigate: strategies for managing update traffic
• Would like persistent queries to support notification of changes in resource status

15 Resource Selection: Example
Input: ClassAds format
[
  Type = "request";
  Owner = "dangulo";
  RequiredDomains = {"cs.uiuc.edu", "ucsd.edu"};
  requirements = "other.opSys == 'LINUX' && other.minMemSize > (100G / other.CPUCount) && Include(other.domains, RequiredDomains)";
  Rank = other.minCPUSpeed * other.CPUCount / (other.maxCPULoad + 1);
]
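The matchmaking that the request above asks for can be sketched in Python. This is only an illustration, not the ClassAds library or the GrADS Resource Selection Engine: resource ads are modeled as plain dictionaries, "100G" is read as 100 GB, and Include() is assumed to mean that every domain of the resource appears in RequiredDomains. The function names are hypothetical.

```python
# Hypothetical values mirroring the request on the slide.
REQUIRED_DOMAINS = {"cs.uiuc.edu", "ucsd.edu"}
TOTAL_MEMORY_GB = 100.0  # assumed reading of "100G"


def meets_requirements(resource):
    """Evaluate the `requirements` expression against one resource ad."""
    return (
        resource["opSys"] == "LINUX"
        and resource["minMemSize"] > TOTAL_MEMORY_GB / resource["CPUCount"]
        # Assumed meaning of Include(): all resource domains are allowed ones.
        and set(resource["domains"]) <= REQUIRED_DOMAINS
    )


def rank(resource):
    """Evaluate the `Rank` expression: higher is better."""
    return (
        resource["minCPUSpeed"] * resource["CPUCount"]
        / (resource["maxCPULoad"] + 1)
    )


def select_best(resources):
    """Return the highest-ranked resource that meets the requirements, or None."""
    candidates = [r for r in resources if meets_requirements(r)]
    return max(candidates, key=rank, default=None)
```

Calling select_best on a list of resource dictionaries filters first and ranks second, so a fast machine that fails the requirements is never returned.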

16 Resource Selection: Example output

17 Performance Model
Working on putting the performance model into ClassAds.
Every processor is assigned XYZ/N grid points to compute.
Requested memory (MB) > 16 + 512 * (10E-6) * (XYZ / N), where 16 and 512 * (10E-6) are constants
Time needed to perform an iteration = (computation time + communication time) * slowdown
Computation time = 800 * (XYZ / N) / cpuspeed, where 800 is the (constant) number of floating-point operations per grid point per iteration and cpuspeed is in FLOPS
Communication time = 1/G * 2 * (T1 + 2 * T2 * GXYR)
T1 is the communication latency between two processors (latency from NWS)
T2 is the transmit time for a word: T2 = 1 / (available bandwidth) (available bandwidth from NWS)
Slowdown = 1 + cpuload
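A small Python sketch can make the arithmetic of this model concrete. It follows the formulas as written on the slide, under some assumptions: XYZ is taken to be the total number of grid points and N the number of processors; the constant "10E-6" is used literally (it may have been meant as 10^-6); and G and GXYR, which the slide does not define, are simply passed through as parameters. All function and parameter names are hypothetical.

```python
def required_memory_mb(total_points, n_procs):
    """Lower bound on requested memory per processor, in MB."""
    return 16.0 + 512.0 * 10e-6 * (total_points / n_procs)


def computation_time(total_points, n_procs, cpu_flops):
    """Seconds of computation per iteration: 800 flops per grid point."""
    return 800.0 * (total_points / n_procs) / cpu_flops


def communication_time(latency, bandwidth, g, gxyr):
    """Seconds of communication per iteration.

    latency is T1 and 1/bandwidth is T2 (both would come from NWS);
    g and gxyr are the slide's G and GXYR, left undefined there.
    """
    t2 = 1.0 / bandwidth
    return (1.0 / g) * 2.0 * (latency + 2.0 * t2 * gxyr)


def time_per_iteration(total_points, n_procs, cpu_flops,
                       latency, bandwidth, g, gxyr, cpu_load):
    """(computation + communication) scaled by slowdown = 1 + cpuload."""
    slowdown = 1.0 + cpu_load
    return (
        computation_time(total_points, n_procs, cpu_flops)
        + communication_time(latency, bandwidth, g, gxyr)
    ) * slowdown
```

Keeping the three terms as separate functions makes it easy to see which one dominates for a given machine, which is the comparison a resource selector would need.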

18 Contract Monitor
• Driven by three user-controllable parameters
  – Time quantum for “time per iteration”
  – % degradation in time per iteration (relative to prior average) before noting violation
  – Number of violations before migration
• Potential causes of violation
  – Competing load on CPU
  – Computation requires more processing power: e.g., mesh refinement, new subcomputation
  – Hardware problems

19 Contract Monitor Details
• The end user specifies several variables.
• These variables can be changed at runtime by contacting the application through an HTTP interface.
• These variables include:
  – time quantum
  – % degradation
  – number of violations before migration
• The system then calculates the average wall clock time per iteration for each time quantum.
• If the average time per iteration in any time quantum is worse (by the specified percentage) than the average over all previous quanta, a violation is noted.
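The violation rule described above can be sketched as a small Python class. This is a minimal illustration, not the Cactus thorn itself: the class and method names are hypothetical, the caller is assumed to group iteration timings by time quantum, and migration is assumed to be signaled once the violation count exceeds the user-specified number.

```python
class ContractMonitor:
    """Tracks average time per iteration per quantum and counts violations."""

    def __init__(self, degradation_pct, max_violations):
        self.degradation_pct = degradation_pct  # allowed slowdown, in percent
        self.max_violations = max_violations    # violations tolerated
        self.quantum_averages = []              # one average per past quantum
        self.violations = 0

    def record_quantum(self, iteration_times):
        """Record one quantum's iteration wall clock times (seconds).

        Returns True when more than max_violations violations have been
        noted, i.e. when a migration should be requested.
        """
        average = sum(iteration_times) / len(iteration_times)
        if self.quantum_averages:
            baseline = sum(self.quantum_averages) / len(self.quantum_averages)
            # A violation: this quantum is slower than the average of all
            # previous quanta by more than the allowed percentage.
            if average > baseline * (1.0 + self.degradation_pct / 100.0):
                self.violations += 1
        self.quantum_averages.append(average)
        return self.violations > self.max_violations
```

Because the three parameters are ordinary attributes, changing them at runtime (as the HTTP interface allows) would only require assigning new values between quanta.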

20 Actions Taken on Contract Violation
• Migration occurs when more than the specified number of violations have been noted
• A new set of resources is requested from the ResourceSelector
• The application is checkpointed
• Checkpoint data is moved to the new resources along with other data needed for restart
• The application is restarted on the new resources
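The order of these actions can be written down as a short Python sketch. The four callables are hypothetical stand-ins for the ResourceSelector, the checkpointing code, the GridFTP transfer, and the job restart; none of them is a real Cactus or Globus API, and skipping the migration when no resources are returned is an assumption.

```python
def migrate(select_resources, checkpoint, transfer, restart):
    """Run the migration steps in order; return True if migration happened.

    select_resources() -> new resource description, or None if none found
    checkpoint()       -> list of files needed for restart
    transfer(files, resources) moves those files to the new resources
    restart(resources) restarts the application there
    """
    new_resources = select_resources()
    if new_resources is None:
        # Assumption: with nowhere to go, keep running where we are.
        return False
    files = checkpoint()
    transfer(files, new_resources)
    restart(new_resources)
    return True
```

Passing the steps in as functions keeps the sequence testable without a grid, for example by supplying stubs that only record that they were called.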

21 Migration Manager
• Allows resource selection (RS) to occur asynchronously
• Makes an intelligent choice on whether migration will actually help
  – Will not migrate to seemingly lower-quality resources
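Both points can be illustrated with a brief Python sketch, assuming resource quality is compared with the Rank expression from the resource selection example (minCPUSpeed * CPUCount / (maxCPULoad + 1)). The use of a thread pool, and all function names, are illustrative choices rather than the actual Migration Manager design.

```python
from concurrent.futures import ThreadPoolExecutor


def rank(resource):
    """Quality score, assumed to be the Rank expression of the request."""
    return (
        resource["minCPUSpeed"] * resource["CPUCount"]
        / (resource["maxCPULoad"] + 1)
    )


def should_migrate(current, candidate):
    """Migrate only if the candidate ranks strictly higher than the current."""
    return candidate is not None and rank(candidate) > rank(current)


def run_with_async_selection(current, select_resources, iterate,
                             max_iterations):
    """Keep iterating while resource selection runs in the background.

    Returns (decision, candidate) once the selection has finished.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(select_resources)
        for _ in range(max_iterations):
            if pending.done():
                break
            iterate()  # the simulation is not blocked by the selection
        candidate = pending.result()  # waits if selection is still running
    return should_migrate(current, candidate), candidate
```

The comparison is strict, so an equally ranked candidate does not trigger a migration; whether ties should migrate is a policy choice the slide does not settle.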

22 Summary
• The Worm lets researchers in physics and computational science adapt easily to changing grid environments
• Data Transfer using GridFTP
• Resource Selection using MDS-2 and ClassAds
• Contract Monitoring using MDS-2
• Intelligent Migration using GRAM

