1
The EMAN Application: An Update
2
EMAN Oversimplified (pipeline diagram): Electron Micrographs -> Particles -> Preliminary 3D Model -> Refine -> Final 3D Model
3
EMAN Refinement Process (workflow diagram): sequential and parallel components from a starting volume to a refined volume -- proc3d, project3d, proc2d, classesbymra, classalign2, make3d, make3diter. Sequential and parallel components are marked in the diagram.
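The diagram mixes one-off sequential steps with a large parallel step (classesbymra fans out into many independent instances). Below is a minimal sketch of that structure as a dependency graph in Python; the component names come from the slide, but the exact edges are my reading of the diagram rather than the GrADSoft workflow description.

```python
# Minimal sketch of the EMAN refinement workflow as a DAG of components.
# Component names come from the slide; the edges are an assumption based
# on the diagram, not the GrADSoft workflow representation.

from collections import defaultdict

# edges: component -> components that consume its output
workflow = {
    "proc3d":       ["project3d"],      # prepare the starting volume
    "project3d":    ["classesbymra"],   # generate reference projections
    "proc2d":       ["classesbymra"],   # preprocess the particle images
    "classesbymra": ["classalign2"],    # parallel step: many independent instances
    "classalign2":  ["make3d"],         # align particles within each class
    "make3d":       ["make3diter"],     # build the new 3D volume
    "make3diter":   [],                 # refine the volume; feeds the next cycle
}

def topological_order(dag):
    """Return the components in an order that respects the dependencies."""
    indegree = defaultdict(int)
    for src, dsts in dag.items():
        indegree[src] += 0
        for d in dsts:
            indegree[d] += 1
    ready = [c for c, deg in indegree.items() if deg == 0]
    order = []
    while ready:
        c = ready.pop()
        order.append(c)
        for d in dag.get(c, []):
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)
    return order

print(topological_order(workflow))
```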
4
Recent non-EMAN Experiments
Montage workflows show the value of scheduling (joint work with Ewa Deelman et al.)
5
Recent EMAN Experiments
Experiments with the GroEL data set (“small”: 200MB input file replicated on all clusters)
— All different grid configurations
Conclusion: Significant advantage to using performance estimates, if they are accurate
— Significant disadvantage to being “too smart”
6
Where do we go from here?
Scalability
— Putting together a big testbed for SC04
– ~80 nodes at UH (IA-32 + IA-64), ~60 nodes at Rice (IA-64), ~4 nodes at Baylor College of Medicine (IA-32)
— Run a big calculation (with real data?)
— Problem: IP addresses can’t be permanent (at Rice)
Better heuristics
— Scheduler improvements under design (Anirban)
— Performance prediction under load needed (Gabi?)
Incorporating queueing systems
— Schedule a cluster as a cluster
— Needs a model of queue delay
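For the last item, one way to fold queue delay into the scheduler is to estimate, per cluster, the queue wait plus the time to drain the assigned instances. A minimal, hypothetical sketch; the wait times and per-instance costs are placeholders, not output of NWS or any batch system.

```python
# Hedged sketch: estimated completion time for a batch of instances submitted
# to a queued cluster. The numbers and the wait estimate are illustrative
# placeholders, not a real queue-delay model.

import math

def estimated_completion(n_instances, nodes, per_instance_time, queue_wait):
    """Queue wait plus the time to drain n_instances across the cluster's nodes."""
    waves = math.ceil(n_instances / nodes)          # instances run in successive "waves"
    return queue_wait + waves * per_instance_time   # same time unit as the inputs

# Hypothetical example: pick the cluster with the earlier estimated completion.
clusters = {
    "cluster_a": dict(nodes=6, per_instance_time=8.0, queue_wait=30.0),   # minutes
    "cluster_b": dict(nodes=7, per_instance_time=16.0, queue_wait=0.0),
}
for name, c in clusters.items():
    print(name, estimated_completion(98, **c))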
7
Backup slides after this
8
Heuristic Workflow Scheduling: Results
By using heuristic workflow scheduling, workflow completion times improve by an order of magnitude [>20 times] over random scheduling for a heterogeneous platform
Workflow completion time is within 10% of that obtained with a very expensive AI scheduler that doesn’t scale to 2047 jobs
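As an illustration of the kind of heuristic these numbers refer to, the sketch below contrasts greedy earliest-completion-time placement with random placement for a bag of independent jobs on heterogeneous hosts. It is a generic example, not the scheduler used in these experiments, and the host speeds and job counts are made up.

```python
# Hedged sketch: greedy earliest-completion-time placement vs. random placement
# for independent jobs on heterogeneous hosts. Illustrative only; this is not
# the GrADS scheduler, just the style of heuristic it uses.

import random

def heuristic_schedule(job_costs, host_speeds):
    """Greedy: assign each job to the host that would finish it earliest."""
    finish = {h: 0.0 for h in host_speeds}           # current finish time per host
    for cost in job_costs:
        best = min(finish, key=lambda h: finish[h] + cost / host_speeds[h])
        finish[best] += cost / host_speeds[best]
    return max(finish.values())                      # makespan

def random_schedule(job_costs, host_speeds, seed=0):
    """Baseline: place each job on a uniformly random host."""
    rng = random.Random(seed)
    finish = {h: 0.0 for h in host_speeds}
    for cost in job_costs:
        h = rng.choice(list(host_speeds))
        finish[h] += cost / host_speeds[h]
    return max(finish.values())

# Made-up heterogeneous platform: 8 fast hosts, 8 slow hosts, 200 unit jobs.
hosts = {f"fast{i}": 4.0 for i in range(8)}
hosts.update({f"slow{i}": 1.0 for i in range(8)})
jobs = [1.0] * 200
print("heuristic makespan:", heuristic_schedule(jobs, hosts))
print("random makespan:   ", random_schedule(jobs, hosts))
```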
9
Heuristic Workflow Scheduling: Results
Simulation results for workflow completion times for different “Montage” workflows
Improvement of >20% for a homogeneous platform
Preliminary results from joint work with Ewa Deelman et al. at USC ISI
10
Results with GroEL data
EMAN data set: 200MB input file replicated on all clusters
Ran the refinement cycle for the GroEL data using the new version of EMAN
Used new performance models
Testbed: 6 nodes [i2-53 to 58] on the UoH mckinley cluster and 7 nodes [torc1 to torc7] on the UTK torc cluster
Analyzed the makespan of the most computationally intensive step in the workflow: classesbymra [clsbymra]
Compared heuristic scheduling using performance models with random scheduling
11
Results: Unloaded Resources
Used 2 nodes [i2-55 to 56] on the mckinley cluster and 7 nodes [torc1 to torc7] on the torc cluster
The number in brackets after each execution time is the number of clsbymra instances mapped to that site
rank[clsbymra_i][UH_mc_j] = 7.60; rank[clsbymra_i][Utk_mc_j] = 16.37

                  Heuristic_run1   Heuristic_run2   Random_run1    Random_run2
Exec time (UH)    12m 45s [38]     12m 40s [38]     5m 22s  [15]   6m 44s  [19]
Exec time (UTK)   11m 50s [60]     11m 45s [60]     15m 39s [83]   15m 58s [79]
Makespan          12m 45s          12m 40s          15m 39s        15m 58s

Conclusion: Very accurate relative performance models on different heterogeneous platforms, combined with heuristic scheduling, result in near-optimal load balance of the clsbymra instances when the grid resources are relatively unloaded [dedicated resources]
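The 38/60 split can be roughly reproduced from the rank values. Assuming rank[...] is the estimated per-instance execution time on one node of a site (an assumption; the scheduler's internal model may differ), a throughput-proportional share gives approximately the observed assignment.

```python
# Back-of-the-envelope check of the 38/60 split, assuming rank[...] is the
# estimated per-instance time on one node of the site (an assumption; the
# scheduler's internal computation may differ).

uh_nodes, utk_nodes = 2, 7            # i2-55..56 and torc1..torc7
uh_rank, utk_rank = 7.60, 16.37       # estimated time per clsbymra instance
total_instances = 98                  # 38 + 60 in the heuristic runs

uh_throughput = uh_nodes / uh_rank    # instances per unit time at UH
utk_throughput = utk_nodes / utk_rank
uh_share = total_instances * uh_throughput / (uh_throughput + utk_throughput)

# -> 37 and 61, close to the 38/60 chosen by the heuristic
print(round(uh_share), total_instances - round(uh_share))
```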
12
Results: Loaded Resources
Used 5 nodes [i2-54 to 57] on the mckinley cluster and 7 nodes [torc1 to torc7] on the torc cluster; i2-54 to 57 were highly loaded; the torcs were not loaded
rank[clsbymra_i][UH_mc_j] = 7.60; rank[clsbymra_i][Utk_mc_j] = 16.37
Uneven load balance due to the loading of the UH machines

                  Heuristic_run   Random_run
Exec time (UH)    16m 41s [60]    9m 38s  [44]
Exec time (UTK)   7m 51s  [38]    10m 28s [54]
Makespan          16m 41s         10m 28s

Conclusion: Performance-model-based scheduling works only when the underlying set of resources is reliable [or reserved in advance]. NWS predictions may not be enough.
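Under the same throughput-proportional reading of the ranks as above, 5 UH nodes at rank 7.60 should take about 60 of the 98 instances, which is what the heuristic chose; since the static ranks ignore the load on those nodes, the slowdown lands entirely on that 60-instance share. A rough check under that assumed model:

```python
# Rough check: with static ranks and 5 UH nodes, the throughput-proportional
# share is ~60 instances for UH, matching the heuristic run, so load on the
# UH nodes stretches exactly that share. Assumed model only.

uh_nodes, utk_nodes = 5, 7            # 5 UH nodes as stated on the slide
uh_rank, utk_rank = 7.60, 16.37
total_instances = 98

uh_tp, utk_tp = uh_nodes / uh_rank, utk_nodes / utk_rank
# -> 59, close to the 60 instances the heuristic sent to UH
print(round(total_instances * uh_tp / (uh_tp + utk_tp)))
```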
13
Results: Inaccurate Performance Models
Used 6 nodes [i2-53 to 58] on the mckinley cluster and 7 nodes [torc1 to torc7] on the torc cluster; torcs not loaded; UoH machines moderately loaded
rank[clsbymra_i][UH_mc_j] = 4.57; rank[clsbymra_i][Utk_mc_j] = 16.37
The performance model for the UoH machines was way off

                  Heuristic_run1   Heuristic_run2   Random_run1    Random_run2
Exec time (UH)    21m 10s [77]     21m 29s [77]     6m 5s   [49]   4m 44s  [41]
Exec time (UTK)   3m 54s  [21]     3m 55s  [21]     9m 13s  [49]   11m 51s [57]
Makespan          21m 10s          21m 29s          9m 13s         11m 51s

Conclusion: Inaccurate relative performance models on different heterogeneous platforms result in poor load balance of the clsbymra instances. [Note that the numbers here reflect loss of performance due to both inaccurate performance models and moderate load on the UoH machines]
14
Results with rdv data
Successfully ran the refinement cycle for the rdv data using EMAN version 1.6 with the GrADSoft code base
Medium/large data set: 2GB input file replicated on all clusters
New performance models for the components
Testbed:
— Six nodes [i2-53 to i2-58] in the mckinley cluster at the University of Houston (IA-64)
— Seven single-processor nodes [torc1 to torc7] in the torc cluster at the University of Tennessee, Knoxville (IA-32)
15
Results: rdv data with unloaded resources

Component      Resource(s) chosen     # instances             Output directory         Exec. time
proc3d         i2-58                  1                       GrADS_27111              <1 min
project3d      i2-58                  1                       GrADS_31914              1 h 48 min
proc2d         i2-58                  1                       GrADS_5765               <1 min
classesbymra   i2-53 to 58, torc1-7   68 [i2-*], 42 [torc*]   GrADS_9850, GrADS_9849   84 h 30 min, 81 h 41 min
classalign2    i2-53 to 58            379                     GrADS_27496              45 min
make3d         i2-58                  1                       GrADS_16325              47 min
proc3d         i2-58                  1                       GrADS_27520              <1 min
proc3d         i2-58                  1                       GrADS_13198              <1 min

Accurate relative performance models on different heterogeneous platforms combined with heuristic scheduling result in optimal load balance of the classesbymra instances when the resources are unloaded