1 Scalable Molecular Dynamics for Large Biomolecular Systems
Robert Brunner, James C. Phillips, Laxmikant Kale
Department of Computer Science and Theoretical Biophysics Group, University of Illinois at Urbana-Champaign
2 Parallel Computing with Data-driven Objects
Laxmikant (Sanjay) Kale
Parallel Programming Laboratory, Department of Computer Science
http://charm.cs.uiuc.edu
3 Overview
Context: approach and methodology
Molecular dynamics for biomolecules
Our program, NAMD
– Basic parallelization strategy
NAMD performance optimizations
– Techniques
– Results
Conclusions: summary, lessons, and future work
4 Parallel Programming Laboratory
Objective: enhance performance and productivity in parallel programming
– For complex, dynamic applications
– Scalable to thousands of processors
Theme:
– Adaptive techniques for handling dynamic behavior
Strategy: look for the optimal division of labor between the human programmer and the “system”
– Let the programmer specify what to do in parallel
– Let the system decide when and where to run them
Data-driven objects as the substrate: Charm++
5 System-Mapped Objects
(Diagram: user-level objects mapped by the system onto processors)
6 Data Driven Execution
(Diagram: per-processor scheduler pulling work from a message queue)
7 Charm++
Parallel C++ with data-driven objects
Object arrays and collections
Asynchronous method invocation
Object groups:
– global object with a “representative” on each PE
Prioritized scheduling
Mature, robust, portable
http://charm.cs.uiuc.edu
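As a rough illustration of data-driven objects and asynchronous method invocation, here is a minimal Charm++-style sketch. The interface-file syntax and the generated CBase_/CProxy_ names are recalled from memory and may not match a particular Charm++ release exactly, and the Main/Worker classes are invented for this example.

```cpp
// sketch.ci (Charm++ interface file), shown here as a comment:
//   mainmodule sketch {
//     readonly CProxy_Main mainProxy;
//     mainchare Main {
//       entry Main(CkArgMsg* m);
//       entry void done();
//     };
//     array [1D] Worker {
//       entry Worker();
//       entry void compute(int step);
//     };
//   };

// sketch.C
#include "sketch.decl.h"

CProxy_Main mainProxy;                                // readonly, set once by the main chare

class Main : public CBase_Main {
  int pending;
public:
  Main(CkArgMsg* m) : pending(8) {
    delete m;
    mainProxy = thisProxy;
    CProxy_Worker workers = CProxy_Worker::ckNew(8);  // create an 8-element chare array
    workers.compute(0);                               // asynchronous broadcast: returns immediately
  }
  void done() { if (--pending == 0) CkExit(); }       // all workers have reported back
};

class Worker : public CBase_Worker {
public:
  Worker() {}
  void compute(int step) {
    // ... work on this object's chunk of the data for the given step ...
    mainProxy.done();                                 // another asynchronous invocation
  }
};

#include "sketch.def.h"
```

The invocation `workers.compute(0)` only enqueues messages; the scheduler on each processor later delivers them to the objects whose data is ready, which is the data-driven execution model the deck describes.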
8 Multi-partition Decomposition
Writing applications with Charm++:
– Decompose the problem into a large number of chunks
– Implement chunks as objects
  (or, now, as MPI threads: AMPI on top of Charm++)
Let Charm++ map and remap objects
– Allow for migration of objects
– If desired, specify potential migration points
9 Load Balancing Mechanisms
Re-map and migrate objects
– Registration mechanisms facilitate migration
– Efficient message delivery strategies
– Efficient global operations, such as reductions and broadcasts
Several classes of load balancing strategies provided
– Incremental
– Centralized as well as distributed
– Measurement based
10 Principle of Persistence
An observation about CSE applications
– An extension of the principle of locality
– The behavior of objects, including computational load and communication patterns, tends to persist over time
Application-induced imbalance:
– Abrupt but infrequent, or
– Slow and cumulative
– Rarely: frequent, large changes (our framework still deals with this case as well)
Measurement-based strategies
11 Measurement-Based Load Balancing Strategies
Collect timing data for several cycles
Run a heuristic load balancer
– Several alternative ones
Robert Brunner's recent Ph.D. thesis:
– Instrumentation framework
– Strategies
– Performance comparisons
12 Molecular Dynamics ApoA-I: 92k Atoms
13 Molecular Dynamics and NAMD
MD is used to understand the structure and function of biomolecules
– Proteins, DNA, membranes
NAMD is a production-quality MD program
– Active use by biophysicists (published science)
– 50,000+ lines of C++ code
– 1000+ registered users
– Features include:
  CHARMM and XPLOR compatibility
  PME electrostatics and multiple timestepping
  Steered and interactive simulation via VMD
14 NAMD Contributors
PIs:
– Laxmikant Kale, Klaus Schulten, Robert Skeel
NAMD version 1:
– Robert Brunner, Andrew Dalke, Attila Gursoy, Bill Humphrey, Mark Nelson
NAMD2:
– M. Bhandarkar, R. Brunner, Justin Gullingsrud, A. Gursoy, N. Krawetz, J. Phillips, A. Shinozaki, K. Varadarajan, Gengbin Zheng, ...
Theoretical Biophysics Group, supported by NIH
15 Molecular Dynamics
Collection of [charged] atoms, with bonds
Newtonian mechanics
At each time-step:
– Calculate forces on each atom
  Bonded
  Non-bonded: electrostatic and van der Waals
– Calculate velocities and advance positions
1 femtosecond time-step, millions needed!
Thousands of atoms (1,000 - 100,000)
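To make the per-step structure concrete, here is a minimal, schematic time step in plain C++. This is not NAMD code; the Atom fields, the stub force routines, and the simple explicit update are all invented for illustration.

```cpp
#include <vector>

struct Vec3 { double x, y, z; };
struct Atom { Vec3 pos, vel, force; double mass; };

// Stubs for the two force terms; in a real MD code these dominate the step.
void computeBondedForces(std::vector<Atom>& atoms)    { /* bonds, angles, dihedrals */ (void)atoms; }
void computeNonbondedForces(std::vector<Atom>& atoms) { /* electrostatics + van der Waals */ (void)atoms; }

// One integration step; dt is about 1 femtosecond in the simulation's units.
void timeStep(std::vector<Atom>& atoms, double dt) {
  for (Atom& a : atoms) a.force = {0.0, 0.0, 0.0};
  computeBondedForces(atoms);
  computeNonbondedForces(atoms);
  for (Atom& a : atoms) {
    // Advance velocities from forces, then positions from velocities.
    a.vel.x += dt * a.force.x / a.mass;
    a.vel.y += dt * a.force.y / a.mass;
    a.vel.z += dt * a.force.z / a.mass;
    a.pos.x += dt * a.vel.x;
    a.pos.y += dt * a.vel.y;
    a.pos.z += dt * a.vel.z;
  }
}
```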
16 Cut-off Radius
Use of a cut-off radius to reduce work
– 8 - 14 Å
– Far-away atoms ignored! (screening effects)
80-95% of the work is non-bonded force computation
Some simulations need far-away contributions
– Particle-Mesh Ewald (PME)
Even so, cut-off based computations are important:
– Near-atom calculations constitute the bulk of the above
– Multiple time-stepping is used: k cut-off steps, 1 PME step
  So (k-1) steps do just cut-off based simulation
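A schematic of how the cut-off reduces non-bonded work, reusing the Atom type from the previous sketch; a real code would use a pair list or cell list rather than this O(N²) double loop.

```cpp
#include <cmath>
#include <vector>

// Accumulate pairwise non-bonded forces only for atoms closer than `cutoff`
// (typically 8-14 Angstroms). Assumes the Atom/Vec3 types defined above.
void cutoffNonbonded(std::vector<Atom>& atoms, double cutoff) {
  const double cutoff2 = cutoff * cutoff;
  for (size_t i = 0; i < atoms.size(); ++i) {
    for (size_t j = i + 1; j < atoms.size(); ++j) {
      double dx = atoms[j].pos.x - atoms[i].pos.x;
      double dy = atoms[j].pos.y - atoms[i].pos.y;
      double dz = atoms[j].pos.z - atoms[i].pos.z;
      double r2 = dx * dx + dy * dy + dz * dz;
      if (r2 > cutoff2) continue;            // far-away pair: skipped entirely
      // ... evaluate electrostatic and van der Waals terms for this pair,
      //     add the force to atoms[i] and subtract it from atoms[j] ...
    }
  }
}
```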
17 Scalability
The program should scale up to use a large number of processors
– But what does that mean?
An individual simulation isn't truly scalable
A better definition of scalability:
– If I double the number of processors, I should be able to retain parallel efficiency by increasing the problem size
18 Isoefficiency
Quantifies scalability
– (Work of Vipin Kumar, U. Minnesota)
How much increase in problem size is needed to retain the same efficiency on a larger machine?
Efficiency: sequential-time / (P · parallel-time)
– Parallel-time = computation + communication + idle
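Spelled out with the standard definitions (the symbols below are chosen for this note, not taken from the slides):

```latex
% Efficiency on P processors, with T_1 the sequential time and
% T_P the parallel time (computation + communication + idle):
E \;=\; \frac{T_1}{P \, T_P}

% Isoefficiency: how fast the problem size W must grow with P so that E stays
% constant; the slower W(P) has to grow, the more scalable the algorithm.
```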
19 Early Methods
Atom replication:
– Each processor has data for all atoms
– Force calculations parallelized
– Collection of forces: O(N log P) communication
– Computation: O(N/P)
– Communication/computation ratio: O(P log P): not scalable
Atom decomposition:
– Partition the atoms array across processors
  Nearby atoms may not be on the same processor
– Communication: O(N) per processor
– Ratio: O(N) / (N/P) = O(P): not scalable
20 Force Decomposition
Distribute the force matrix to processors
– Matrix is sparse, non-uniform
– Each processor has one block
– Communication: O(N/√P) per processor
– Ratio: O(√P)
Better scalability in practice
– Can use 100+ processors
– Plimpton:
– Hwang, Saltz, et al.:
  6% on 32 processors
  36% on 128 processors
– Yet not scalable in the sense defined here!
21 Spatial Decomposition
Allocate close-by atoms to the same processor
Three variations possible:
– Partitioning into P boxes, one per processor
  Good scalability, but hard to implement
– Partitioning into fixed-size boxes, each a little larger than the cut-off distance
– Partitioning into smaller boxes
Communication: O(N/P)
– Communication/computation ratio: O(1)
– So, scalable in principle
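A sketch of the second variation (boxes a little larger than the cut-off), showing how an atom's coordinates map to a patch index; the PatchGrid type and the flat index scheme are invented for illustration.

```cpp
#include <algorithm>
#include <cmath>

// Patch grid for a box of size (lx, ly, lz), with patch edge >= cutoff, so all
// atoms within the cutoff of a given atom lie in its own patch or one of the
// 26 neighboring patches.
struct PatchGrid {
  double lx, ly, lz, cutoff;
  int nx, ny, nz;

  PatchGrid(double lx, double ly, double lz, double cutoff)
      : lx(lx), ly(ly), lz(lz), cutoff(cutoff) {
    nx = std::max(1, (int)std::floor(lx / cutoff));   // patch edge = lx/nx >= cutoff
    ny = std::max(1, (int)std::floor(ly / cutoff));
    nz = std::max(1, (int)std::floor(lz / cutoff));
  }

  // Flat patch index for an atom at (x, y, z), assumed inside the box.
  int patchOf(double x, double y, double z) const {
    int ix = std::min(nx - 1, (int)(x / (lx / nx)));
    int iy = std::min(ny - 1, (int)(y / (ly / ny)));
    int iz = std::min(nz - 1, (int)(z / (lz / nz)));
    return (ix * ny + iy) * nz + iz;
  }
};
```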
22 Ongoing Work
Plimpton, Hendrickson:
– new spatial decomposition
NWChem (PNL)
Peter Kollman, Yong Duan et al.:
– microsecond simulation
– AMBER version (SANDER)
23 Spatial Decomposition in NAMD
But the load balancing problems are still severe
24 Hybrid Decomposition
25 FD + SD
Now we have many more objects to load balance:
– Each diamond can be assigned to any processor
– Number of diamonds (3D): 14 · number of patches
26 Bond Forces
Multiple types of forces:
– Bonds (2), angles (3), dihedrals (4), ...
– Luckily, each involves atoms in neighboring patches only
Straightforward implementation:
– Send a message to all neighbors, receive forces from them
– 26 × 2 messages per patch!
27 Bond Forces
Assume one patch per processor:
– An angle force involving atoms in patches (x1,y1,z1), (x2,y2,z2), (x3,y3,z3) is calculated in the patch (max{xi}, max{yi}, max{zi}) (a small sketch follows)
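A small sketch of that ownership rule, assuming integer patch coordinates; the types and function name are invented.

```cpp
#include <algorithm>

struct PatchCoord { int x, y, z; };

// The patch responsible for an angle force among atoms living in patches a, b
// and c: the component-wise maximum of the three patch coordinates. Every
// processor applies the same rule, so the force is computed exactly once.
PatchCoord angleOwner(PatchCoord a, PatchCoord b, PatchCoord c) {
  return { std::max({a.x, b.x, c.x}),
           std::max({a.y, b.y, c.y}),
           std::max({a.z, b.z, c.z}) };
}
```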
28 NAMD Implementation
Multiple objects per processor
– Different types: patches, pairwise forces, bonded forces
– Each may have its data ready at different times
– Need the ability to map and remap them
– Need prioritized scheduling
Charm++ supports all of these
29 Load Balancing
A major challenge for this application
– Especially for a large number of processors
Unpredictable workloads
– Each diamond (force “compute” object) and patch encapsulates a variable amount of work
– Static estimates are inaccurate
– Very slow variations across timesteps
Measurement-based load balancing framework
30 Bipartite Graph Balancing
Background load:
– Patches (integration, etc.) and bond-related forces
Migratable load:
– Non-bonded forces
Bipartite communication graph
– Between migratable and non-migratable objects
Challenge:
– Balance load while minimizing communication
31 Load Balancing Strategy
Greedy variant (simplified):
  Sort compute objects (diamonds) by load
  Repeat (until all are assigned):
    S = set of all processors that
        -- are not overloaded
        -- generate the least new communication
    P = least loaded processor in S
    Assign the heaviest remaining compute to P
Refinement:
  Repeat
    Pick a compute from the most overloaded PE
    Assign it to a suitable underloaded PE
  Until no movement
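A compact sketch of the greedy variant under simplifying assumptions: every name here is invented, each compute touches two patches, and new communication is approximated by counting how many of those patches the candidate processor does not yet hold.

```cpp
#include <algorithm>
#include <vector>

struct Compute { double load; int patchA, patchB; };  // one migratable force object
struct Proc    { double load; std::vector<char> hasPatch; };

// Assign computes to processors: heaviest first, preferring processors that are
// not overloaded and already hold the compute's patches (least new communication).
std::vector<int> greedyAssign(const std::vector<Compute>& computes, int numProcs,
                              int numPatches, double overloadThreshold) {
  std::vector<Proc> procs(numProcs, Proc{0.0, std::vector<char>(numPatches, 0)});
  std::vector<int> owner(computes.size(), -1);

  std::vector<int> order(computes.size());
  for (size_t i = 0; i < order.size(); ++i) order[i] = (int)i;
  std::sort(order.begin(), order.end(),
            [&](int a, int b) { return computes[a].load > computes[b].load; });

  for (int c : order) {
    int best = -1, bestComm = 3;      // at most 2 patches can be "new"
    double bestLoad = 0.0;
    for (int p = 0; p < numProcs; ++p) {
      if (procs[p].load > overloadThreshold) continue;       // skip overloaded PEs
      int newComm = !procs[p].hasPatch[computes[c].patchA] +  // patches that would
                    !procs[p].hasPatch[computes[c].patchB];   // have to be sent anew
      if (best == -1 || newComm < bestComm ||
          (newComm == bestComm && procs[p].load < bestLoad)) {
        best = p; bestComm = newComm; bestLoad = procs[p].load;
      }
    }
    if (best == -1) best = 0;         // fallback if every processor is over the threshold
    owner[c] = best;
    procs[best].load += computes[c].load;
    procs[best].hasPatch[computes[c].patchA] = 1;
    procs[best].hasPatch[computes[c].patchB] = 1;
  }
  return owner;
}
```

The refinement phase described on the slide would then repeatedly move single computes off the most overloaded processor onto an underloaded one until no move helps.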
33 Speedups in 1998 ApoA-I: 92k atoms
34 Optimizations
A series of optimizations; examples discussed here:
– Grainsize distributions (bimodal)
– Integration: message sending overheads
Several other optimizations:
– Separation of bond/angle/dihedral objects (inter-patch and intra-patch)
– Prioritization
– Local synchronization to avoid interference across steps
35 Grainsize and Amdahl's Law
A variant of Amdahl's law, for objects, would be:
– The fastest time can be no shorter than the time for the biggest single object!
How did it apply to us?
– Sequential step time was 57 seconds
– To run on 2k processors, no object should take more than 28 msec (and should be even shorter)
– Grainsize analysis via Projections showed that this was not so
36 Grainsize Analysis
Problem: some compute objects had far more work than others
Solution: split compute objects that may have too much work, using a heuristic based on the number of interacting atoms (see the sketch below)
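A sketch of such a splitting heuristic, assuming a compute's work scales roughly with the product of the atom counts in its two patches; the threshold, types, and halving scheme are invented.

```cpp
#include <vector>

struct PairCompute { int patchA, patchB; int atomsA, atomsB; };

// Split any compute whose estimated work (pairs of interacting atoms) exceeds a
// threshold, producing two halves that each cover half of patch A's atoms.
// The load balancer can then place the halves on different processors.
std::vector<PairCompute> splitLargeComputes(const std::vector<PairCompute>& in,
                                            long maxPairs) {
  std::vector<PairCompute> out;
  for (const PairCompute& c : in) {
    long pairs = (long)c.atomsA * c.atomsB;       // crude work estimate
    if (pairs <= maxPairs) {
      out.push_back(c);
    } else {
      PairCompute lo = c, hi = c;
      lo.atomsA = c.atomsA / 2;                   // first half of patch A's atoms
      hi.atomsA = c.atomsA - lo.atomsA;           // remaining half
      out.push_back(lo);
      out.push_back(hi);
    }
  }
  return out;
}
```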
37 Grainsize Reduced
38 Performance Audit
Through the optimization process, an audit was kept to decide where to look to improve performance.

              Ideal    Actual
Total         57.04    86
nonBonded     52.44    49.77
Bonds          3.16     3.9
Integration    1.44     3.05
Overhead       0        7.97
Imbalance      0       10.45
Idle           0        9.25
Receives       0        1.61

Integration time doubled
39 Integration Overhead Analysis
Problem: integration time had doubled relative to the sequential run
40 Integration Overhead Example
The Projections pictures showed that the overhead was associated with sending messages
– Many cells were sending 30-40 messages
– The overhead was still too much compared with the cost of the messages
– Code analysis: memory allocations!
– An identical message was being sent to 30+ processors
Simple multicast support was added to Charm++ (sketched below)
– It mainly eliminates memory allocations (and some copying)
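The gist of the fix, shown schematically in plain C++ rather than with the actual Charm++ multicast API: serialize the message once and hand the same buffer to the transport for every destination, instead of allocating a fresh copy per send. The Message type and sendBuffer stub are stand-ins invented for this sketch.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

struct Message { std::vector<double> forces; };

// Stand-in for the low-level transport; assumed not to take ownership of buf.
void sendBuffer(int destPe, const char* buf, size_t len) {
  (void)destPe; (void)buf; (void)len;   // ... hand off to the network layer ...
}

// Before: one allocation + copy per destination (30-40 of them per cell).
void sendToEachSeparately(const Message& m, const std::vector<int>& destPes) {
  for (int pe : destPes) {
    std::vector<char> buf(m.forces.size() * sizeof(double));   // per-destination allocation
    std::memcpy(buf.data(), m.forces.data(), buf.size());
    sendBuffer(pe, buf.data(), buf.size());
  }
}

// After: serialize once, reuse the same buffer for every destination (multicast).
void multicast(const Message& m, const std::vector<int>& destPes) {
  std::vector<char> buf(m.forces.size() * sizeof(double));     // single allocation
  std::memcpy(buf.data(), m.forces.data(), buf.size());
  for (int pe : destPes)
    sendBuffer(pe, buf.data(), buf.size());
}
```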
41 Integration Overhead: After Multicast
42 ApoA-I on ASCI Red 57 ms/step
43 ApoA-I on Origin 2000
44 ApoA-I on Linux Cluster
45 ApoA-I on O2K and T3E
46 ApoA-I on T3E
47 BC1 complex: 200k atoms
48 BC1 on ASCI Red 58.4 GFlops
49 Lessons Learned
Need to downsize objects!
– Choose the smallest possible grainsize that amortizes overhead
One of the biggest challenges
– was getting time for performance-tuning runs on parallel machines
50 ApoA-I with PME on T3E
51 Future and Planned Work
Increased speedups on 2k-10k processors
– Smaller grainsizes
– Parallelizing integration further
– New algorithms for reducing communication impact
– New load balancing strategies
Further performance improvements for PME
– With multiple timestepping
– Needs multi-phase load balancing
Speedup on small molecules!
– Interactive molecular dynamics
52 More Information
Charm++ and associated framework:
– http://charm.cs.uiuc.edu
NAMD and associated biophysics tools:
– http://www.ks.uiuc.edu
Both include downloadable software
53 Parallel Programming Laboratory
Funding:
– Dept. of Energy (via the Rocket Center)
– National Science Foundation
– National Institutes of Health
Group members: Milind Bhandarkar, Terry Wilmarth, Orion Lawlor, Neelam Saboo, Arun Singla, Karthikeyan Mahesh, Joshua Unger, Gengbin Zheng, Jay Desouza, Sameer Kumar, Chee Wai Lee
Affiliated (NIH/Biophysics): Jim Phillips, Kirby Vandivoort
54 The Parallel Programming Problem
Is there one?
– We can all write MPI programs, right?
– Several large machines are in use
But:
– New complex apps with dynamic and irregular structure
– Should all application scientists also be experts in parallel computing?
55 What Makes It Difficult?
Multiple objectives
– Correctness, sequential efficiency, speedups
Nondeterminacy: affects correctness
Several obstacles to speedup:
– Communication costs
– Load imbalances
– Long critical paths
56 Parallel Programming
Decomposition
– Decide what to do in parallel: tasks (loop iterations, functions, ...) that can be done in parallel
Mapping
– Which processor does each task
Scheduling (sequencing)
– On each processor
Machine-dependent expression
– Express the above decisions for the particular parallel machine
57 Spectrum of Parallel Languages
(Diagram: what is automated at each level: decomposition, mapping, scheduling (sequencing), and machine-dependent expression; MPI, Charm++, and parallelizing Fortran compilers are placed along a specialization axis)
58 Charm++
Data-driven objects
Asynchronous method invocation
Prioritized scheduling
Object arrays
Object groups:
– global object with a “representative” on each PE
Information sharing abstractions
– read-only data
– accumulators
– distributed tables
59 Data Driven Execution
(Diagram: objects on each processor, with a scheduler pulling work from a message queue)
60 Group Mission and Approach
To enhance performance and productivity in programming complex parallel applications
Approach: application-oriented yet CS-centered research
– Develop enabling technology for many apps
– Develop, use, and test it in the context of real applications
Theme
– Adaptive techniques for irregular and dynamic applications
– Optimal division of labor between “system” and programmer: decomposition done by the programmer, everything else automated
Develop a standard library of reusable components for parallel programming
61 Active Projects
Charm++/Converse parallel infrastructure
Scientific/engineering apps
– Molecular dynamics
– Rocket simulation
– Finite element framework
Web-based interaction and monitoring
Faucets: anonymous compute power
Parallel
– Operations research, discrete event simulation, combinatorial search
62 Charm++: Parallel C++ with Data Driven Objects
Chares: dynamically balanced objects
Object groups:
– global object with a “representative” on each PE
Object arrays / object collections
– User-defined indexing (1D, 2D, ..., quad- and oct-tree, ...)
– System supports remapping and forwarding
Asynchronous method invocation
Prioritized scheduling
Mature, robust, portable
http://charm.cs.uiuc.edu
63 Multi-partition Decomposition
Idea: divide the computation into a large number of pieces
– Independent of the number of processors
– Typically larger than the number of processors
– Let the system map entities to processors
64 Converse
Portable parallel run-time system that allows interoperability among parallel languages
Rich features to allow quick and efficient implementation of new parallel languages
Based on message-driven execution, which allows co-existence of different control regimes
Support for debugging and performance analysis of parallel programs
Support for building parallel servers
65 Converse
Languages and paradigms implemented:
– Charm++, a parallel object-oriented language
– Thread-safe MPI and PVM
– Parallel Java, message-driven Perl, pC++
Platforms supported:
– SGI Origin2000, IBM SP, ASCI Red, CRAY T3E, Convex Ex.
– Workstation clusters (Solaris, HP-UX, AIX, Linux, etc.)
– Windows NT clusters
66 Adaptive MPI
A bridge between legacy MPI codes and the dynamic load balancing capabilities of Charm++
AMPI = MPI + dynamic load balancing
Based on Charm++ object arrays and Converse's migratable threads
Minimal modification needed to convert existing MPI programs (to be automated in the future)
Bindings for C, C++, and Fortran90
Currently supports most of the MPI 1.1 standard
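To give a flavor of what "minimal modification" means, here is an ordinary MPI program that could be compiled against AMPI instead of a native MPI; the only AMPI-specific line is the optional migration hint, shown commented out as AMPI_Migrate(), whose exact name and signature may differ between AMPI versions.

```cpp
/* Ordinary MPI code; under AMPI each "rank" becomes a migratable user-level thread. */
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  double local = 1.0 * rank, total = 0.0;
  for (int step = 0; step < 100; ++step) {
    /* ... per-step computation on this rank's chunk of the problem ... */
    MPI_Allreduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    /* Optional AMPI hint: a safe point where the runtime may migrate this thread
       to rebalance load (assumed call, not part of standard MPI):
       AMPI_Migrate();  */
  }

  if (rank == 0) std::printf("ranks = %d, final total = %f\n", size, total);
  MPI_Finalize();
  return 0;
}
```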
67 Converse Use in NAMD
68 Molecular Dynamics
Collection of [charged] atoms, with bonds
Newtonian mechanics
At each time-step:
– Calculate forces on each atom
  bonded
  non-bonded: electrostatic and van der Waals
– Calculate velocities and advance positions
1 femtosecond time-step, millions needed!
Thousands of atoms (1,000 - 200,000)
Collaboration with Klaus Schulten, Robert Skeel
69 Spatial Decomposition in NAMD
Space divided into cubes
– Forces between atoms in neighboring cubes computed by individual compute objects
– Compute objects are remapped by the load balancer
70 NAMD: a Production-quality MD Program
NAMD is used by biophysicists routinely, with several published results
NIH-funded collaborative effort with Profs. K. Schulten and R. Skeel
Supports full-range electrostatics
– Parallel Particle-Mesh Ewald for periodic systems and fast multipole for aperiodic systems
Implemented in C++/Charm++
Supports visualization (via VMD), interactive MD, and a haptic interface
– see http://www.ks.uiuc.edu
– Part of the Biophysics Collaboratory
(Image: apolipoprotein A-I)
71 NAMD Scalable Performance
Sequential performance of NAMD (a C++ program) is comparable to or better than contemporary MD programs written in Fortran
Speedup of 1250 on 2048 processors of ASCI Red, simulating BC1 with about 200k atoms
– (compare with the best speedups on production-quality MD by others: 170 on 256 processors)
Around 10,000 varying-size objects mapped by the load balancer
72 Rocket Simulation
Rocket behavior (and therefore its simulation) is irregular and dynamic
We need to deal with dynamic variations adaptively
Dynamic behavior arises from:
– Combustion: moving boundaries
– Crack propagation
– Evolution of the system
73 Rocket Simulation
Our approach:
– Multi-partition decomposition
– Data-driven objects (Charm++)
– Automatic load balancing framework
AMPI: a migration path for existing MPI + Fortran90 codes
– ROCFLO, ROCSOLID, and ROCFACE
74 FEM Framework
Objective: make it easy to parallelize existing Finite Element Method (FEM) applications, and to quickly build new parallel FEM applications, including those with irregular and dynamic behavior
Hides the details of parallelism; the developer provides only sequential callback routines (see the sketch below)
Embedded mesh partitioning algorithms split the mesh into chunks that are mapped to different processors (many-to-one)
The developer's callbacks are executed in migratable threads, monitored by the run-time system
Migration of chunks corrects load imbalance
Examples:
– Pressure-driven crack propagation
– 3-D dendritic growth
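Schematically, the division of labor might look like the following. This is not the framework's real API: the callback names, the Chunk type, and the commented-out framework calls are invented stand-ins, meant only to illustrate sequential callbacks driven by a parallel runtime.

```cpp
// Hypothetical shape of an application written against a callback-style FEM framework.
// The framework partitions the mesh, runs one driver per chunk in a migratable thread,
// and handles communication of shared-node data; the application sees only its chunk.

struct Chunk {              // one mesh partition, as handed to the application
  int numElements;
  int numNodes;
  // ... connectivity, nodal coordinates, attributes ...
};

// Called once per chunk at startup: register the nodal attributes the framework
// must keep consistent across chunk boundaries (names here are illustrative only).
void initChunk(Chunk& c) {
  // framework.registerNodalField(c, "displacement", 3 /* doubles per node */);
  (void)c;
}

// Called every time step for each chunk; purely sequential code over local elements.
void driveChunk(Chunk& c, double dt) {
  for (int e = 0; e < c.numElements; ++e) {
    // ... assemble this element's contributions into nodal forces ...
  }
  // framework.updateSharedNodes(c);  // exchange boundary-node sums with neighbor chunks
  // framework.migrateIfNeeded();     // safe point for the load balancer to move this chunk
  (void)dt;
}
```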
75 FEM Framework: Responsibilities
FEM application: initialize, registration of nodal attributes, loops over elements, finalize
FEM framework: update of nodal properties, reductions over nodes or partitions; partitioner (METIS), combiner, I/O
Charm++: dynamic load balancing, communication
76 Crack Propagation
Explicit FEM code
Zero-volume cohesive elements inserted near the crack
As the crack propagates, more cohesive elements are added near the crack, which leads to severe load imbalance
Framework handles:
– Partitioning elements into chunks
– Communication between chunks
– Load balancing
(Figure: decomposition into 16 chunks (left) and 128 chunks, 8 per PE (right); the middle area contains cohesive elements. Pictures: S. Breitenfeld and P. Geubelle)
77 Dendritic Growth
Studies the evolution of solidification microstructures using a phase-field model computed on an adaptive finite element grid
Adaptive refinement and coarsening of the grid involves re-partitioning
Work by Prof. Jon Dantzig and Jun-ho Jeong
79 Anonymous Compute Power
What is needed to make this metaphor work?
– Timeshared parallel machines in the background
– Effective resource management
– Quality of computational service: contracts/guarantees
– Front ends that allow agents to submit jobs on a user's behalf: computational faucets
80 Computational Faucets
What does a computational faucet do?
– Submit requests to “the grid”
– Evaluate bids and decide whom to assign work
– Monitor applications (for performance and correctness)
– Provide an interface to users: interacting with jobs and monitoring behavior
What does it look like? A browser!
81 Faucets QoS
User specifies desired job parameters such as: program executable name, executable platform, min PE, max PE, estimated CPU-seconds (for various PE counts), priority, etc.
User does not specify a machine
The faucet software contacts a central server and obtains a list of available workstation clusters, then negotiates with clusters and chooses one to submit the job to
User can view the status of clusters
Planned: file transfer, user authentication, merge with Appspector for job monitoring
(Diagram: web browser and faucet client talking to a central server and workstation clusters)
82 Timeshared Parallel Machines
Need resource management
– Shrink and expand individual jobs to the available sets of processors
Example: a machine with 100 processors
– Job1 arrives and can use 20-150 processors: assign 100 processors to it
– Job2 arrives, can use 30-70 processors, and will pay more if we meet its deadline
– Make resource allocation decisions
83 Time-shared Parallel Machines
To bid effectively (profitably) in such an environment, a parallel machine must be able to run well-paying (important) jobs even when it is already running others
– Allows a suitably written Charm++/Converse program running on a workstation cluster to dynamically change the number of CPUs it is running on, in response to a network (CCS) request
– Works in coordination with a cluster manager to give a job as many CPUs as are available when there are no other jobs, while providing the flexibility to accept new jobs and scale down
84 Appspector
Appspector provides a web interface for submitting and monitoring parallel jobs
Submission: the user specifies machine, login, password, and program name (which must already be available on the target machine)
Jobs can be monitored from any computer with a web browser
Advanced program information can be shown on the monitoring screen using CCS