The EMAN Application: An Update


1 The EMAN Application: An Update

2 EMAN Oversimplified
[Diagram labels: Electron Micrographs, Particles, Preliminary 3D model, Refine, Final 3D model]

3 EMAN Refinement Process
[Workflow diagram labels: Start, volume, proc3d, project3d, proc2d, classesbymra, classalign2, make3d, make3diter, volume. Legend: Seq. component, Parallel component]

4 Recent non-EMAN Experiments
- Montage workflows show the value of scheduling (joint work with Ewa Deelman et al.)

5 Recent EMAN Experiments
- Experiments with the GroEL data set (“small”: 200 MB input file replicated on all clusters)
  - All different grid configurations
- Conclusion: significant advantage to using performance estimates, if they are accurate
  - Significant disadvantage to being “too smart”

6 Where do we go from here?
- Scalability
  - Putting together a big testbed for SC04
    - ~80 nodes at UH (IA-32 + IA-64), ~60 nodes at Rice (IA-64), ~4 nodes at Baylor College of Medicine (IA-32)
  - Run a big calculation (with real data?)
  - Problem: IP addresses can’t be permanent (at Rice)
- Better heuristics
  - Scheduler improvements under design (Anirban)
  - Performance prediction under load needed (Gabi?)
- Incorporating queueing systems
  - Schedule a cluster as a cluster
  - Needs a model of queue delay

7 Backup slides after this

8 Heuristic Workflow Scheduling: Results
- With heuristic workflow scheduling, workflow completion times improve by an order of magnitude (>20 times) over random scheduling on a heterogeneous platform
- Workflow completion time is within 10% of that obtained with a very expensive AI scheduler that doesn’t scale to 2047 jobs

9 Heuristic Workflow Scheduling: Results
- Simulation results for workflow completion times for different “Montage” workflows
- Improvement of >20% on a homogeneous platform
- Preliminary results from joint work with Ewa Deelman et al. at USC ISI

10 Results with GroEL data
- EMAN data set: 200 MB input file replicated on all clusters
- Ran the refinement cycle for the GroEL data using the new version of EMAN
- Used new performance models
- Testbed: 6 nodes [i2-53 to 58] on the UoH mckinley cluster and 7 nodes [torc1 - torc7] on the UTK torc cluster
- Analyzed the makespan of the most computationally intensive step in the workflow: classesbymra [clsbymra]
- Compared heuristic scheduling using performance models with random scheduling

11 Results: Unloaded Resources
- Used 2 nodes [i2-55 to 56] on the mckinley cluster and 7 nodes [torc1 - torc7] on the torc cluster
- rank[clsbymra_i][UH_mc_j]=7.60; rank[clsbymra_i][Utk_mc_j]=16.37
- The number in brackets after each execution time is the number of clsbymra instances mapped to that site

                 Heuristic_run1   Heuristic_run2   Random_run1    Random_run2
Exectime(uh)     12m 45s [38]     12m 40s [38]     5m 22s [15]    6m 44s [19]
Exectime(utk)    11m 50s [60]     11m 45s [60]     15m 39s [83]   15m 58s [79]
Makespan         12m 45s          12m 40s          15m 39s        15m 58s

- Conclusion: very accurate relative performance models on different heterogeneous platforms, combined with heuristic scheduling, result in near-optimal load balance of the clsbymra instances when the grid resources are relatively unloaded [dedicated resources]
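As a rough illustration of how per-site rank values could drive the mapping, the sketch below treats each rank as the estimated run time of one clsbymra instance on one node of that site (lower is faster) and assigns every instance to whichever node would finish it earliest. Both the interpretation of rank and the greedy rule are assumptions; the slides do not describe the scheduler's actual heuristic, and all function and variable names are hypothetical.

```python
import heapq


def greedy_min_completion(n_instances, clusters):
    """Assign identical instances to nodes by earliest estimated finish time.

    clusters maps a site name to (n_nodes, est_time_per_instance).
    Assumption: rank is an estimated per-instance run time, lower is faster.
    Returns (instances per site, estimated makespan in rank units).
    """
    heap = []
    for name, (n_nodes, est) in clusters.items():
        for node in range(n_nodes):
            # Entry: (finish time if this node takes one more instance, ...)
            heapq.heappush(heap, (est, name, node, est))

    counts = {name: 0 for name in clusters}
    makespan = 0.0
    for _ in range(n_instances):
        finish, name, node, est = heapq.heappop(heap)
        counts[name] += 1
        makespan = max(makespan, finish)
        heapq.heappush(heap, (finish + est, name, node, est))
    return counts, makespan


# 98 = 38 + 60, the instance counts reported on this slide.
counts, makespan = greedy_min_completion(
    98, {"uh": (2, 7.60), "utk": (7, 16.37)}
)
print(counts, round(makespan, 2))
```

Under these assumptions, the 2-node and 7-node configuration yields 38 instances at UH and 60 at UTK, which agrees with the counts reported for the heuristic runs.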

12 Results: Loaded Resources
- Used 5 nodes [i2-54 to 57] on the mckinley cluster and 7 nodes [torc1 - torc7] on the torc cluster; i2-54 to 57 were highly loaded; the torc nodes were not loaded
- rank[clsbymra_i][UH_mc_j]=7.60; rank[clsbymra_i][Utk_mc_j]=16.37
- Uneven load balance due to the load on the UH machines

                 Heuristic_run   Random_run
Exectime(uh)     16m 41s [60]    9m 38s [44]
Exectime(utk)    7m 51s [38]     10m 28s [54]
Makespan         16m 41s         10m 28s

- Conclusion: performance-model-based scheduling works only when the underlying set of resources is reliable [or reserved in advance]. NWS predictions may not be enough.

13 Results: Inaccurate Performance Models
- Used 6 nodes [i2-53 to 58] on the mckinley cluster and 7 nodes [torc1 - torc7] on the torc cluster; the torc nodes were not loaded; the UoH machines were moderately loaded
- rank[clsbymra_i][UH_mc_j]=4.57; rank[clsbymra_i][Utk_mc_j]=16.37
- The performance model for the UoH machines was way off

                 Heuristic_run1   Heuristic_run2   Random_run1    Random_run2
Exectime(uh)     21m 10s [77]     21m 29s [77]     6m 5s [49]     4m 44s [41]
Exectime(utk)    3m 54s [21]      3m 55s [21]      9m 13s [49]    11m 51s [57]
Makespan         21m 10s          21m 29s          9m 13s         11m 51s

- Conclusion: inaccurate relative performance models on different heterogeneous platforms result in poor load balance of the clsbymra instances. [Note that the numbers here reflect loss of performance due to both inaccurate performance models and moderate load on UoH]
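One way to see how far off the model was is to compare the ratio of the two rank values with the ratio of per-instance times implied by the measurements. The sketch below uses the Heuristic_run1 figures from this slide and assumes that rank is a relative per-instance cost (lower is faster) and that each node runs one instance at a time, in waves; neither assumption is stated in the slides, and the helper names are hypothetical.

```python
import math


def seconds(minutes, secs):
    """Convert a 'Xm Ys' duration to seconds."""
    return 60 * minutes + secs


def time_per_instance(exec_seconds, n_instances, n_nodes):
    """Estimate seconds per instance, assuming one instance per node per wave."""
    waves = math.ceil(n_instances / n_nodes)
    return exec_seconds / waves


# Heuristic_run1: 77 instances on 6 UH nodes, 21 instances on 7 UTK nodes.
uh = time_per_instance(seconds(21, 10), 77, 6)
utk = time_per_instance(seconds(3, 54), 21, 7)

observed_ratio = uh / utk          # measured UH cost relative to UTK
predicted_ratio = 4.57 / 16.37     # ratio of the rank values on this slide

print(f"observed UH/UTK:  {observed_ratio:.2f}")
print(f"predicted UH/UTK: {predicted_ratio:.2f}")
```

With these assumptions the model predicts a UH node to be several times faster than a UTK node, while the measured times imply it was somewhat slower, which is consistent with the heavy over-assignment to UH shown in the table. As the slide notes, part of that gap comes from the moderate load on the UoH machines rather than from the model alone.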

14 Results with rdv data
- Successfully ran the refinement cycle for the rdv data with EMAN version 1.6, using the GrADSoft code base
- Medium/large data set: 2 GB input file replicated on all clusters
- New performance models for the components
- Testbed:
  - Six nodes [i2-53 to i2-58] in the mckinley cluster at the University of Houston (IA-64)
  - Seven single-processor nodes [torc1 to torc7] in the torc cluster at the University of Tennessee, Knoxville (IA-32)

15 Results: rdv data with unloaded resources

Component      Resource(s) chosen   # instances   Output directory   Exec. time
proc3d         i2-58                1             GrADS_27111        <1 min
project3d      i2-58                1             GrADS_319141h. 48 min
proc2d         i2-58                1             GrADS_5765         <1 min
classesbymra   i2-53 to 58          68 [i2-*]     GrADS_9850         84 h. 30 min
               torc1-7              42 [torc*]    GrADS_9849         81 h. 41 min
classalign2    i2-53 to 58          379           GrADS_27496        45 min
make3d         i2-58                1             GrADS_16325        47 min
proc3d         i2-58                1             GrADS_27520        <1 min
proc3d         i2-58                1             GrADS_13198        <1 min

- Accurate relative performance models on different heterogeneous platforms, combined with heuristic scheduling, result in optimal load balance of the classesbymra instances when the resources are unloaded
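The load-balance claim can be checked with a small calculation on the two classesbymra execution times reported on this slide (84 h 30 min and 81 h 41 min). The sketch below measures imbalance as the idle fraction of the faster site relative to the makespan; this metric is an illustrative choice and is not taken from the slides.

```python
def to_minutes(hours, minutes):
    """Convert an 'H h. M min' duration to minutes."""
    return 60 * hours + minutes


# The two classesbymra execution times reported for the two clusters.
site_times = [to_minutes(84, 30), to_minutes(81, 41)]

makespan = max(site_times)
# Fraction of the makespan during which the earlier-finishing site sat idle.
imbalance = (makespan - min(site_times)) / makespan

print(f"makespan: {makespan} min, imbalance: {imbalance:.1%}")
```

By this measure the two clusters finished within about 3% of each other over a run of more than 84 hours.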

