Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Scalable Cosmological Simulations on Parallel Machines Filippo Gioachin¹ Amit Sharma¹ Sayantan Chakravorty¹ Celso Mendes¹ Laxmikant V. Kale¹ Thomas R.

Similar presentations


Presentation on theme: "1 Scalable Cosmological Simulations on Parallel Machines Filippo Gioachin¹ Amit Sharma¹ Sayantan Chakravorty¹ Celso Mendes¹ Laxmikant V. Kale¹ Thomas R."— Presentation transcript:

1 1 Scalable Cosmological Simulations on Parallel Machines Filippo Gioachin¹ Amit Sharma¹ Sayantan Chakravorty¹ Celso Mendes¹ Laxmikant V. Kale¹ Thomas R. Quinn² ¹ University of Illinois at Urbana-Champaign ² University of Washington

2 2 Parallel Programming Laboratory @ UIUC 9/29/2016 Outline: ParallelGravity ● Motivations ● Charm++ ● Basic algorithm ● CacheManager ● Prefetching ● Interaction lists ● Load balancer ● Scalability

3 3 Parallel Programming Laboratory @ UIUC 9/29/2016 Motivations ● Need for simulations of the evolution of the universe ● Current parallel codes: – PKDGRAV – Gadget ● Scalability problems: – load imbalance – expensive domain decomposition – limit to 128 processors

4 4 Parallel Programming Laboratory @ UIUC 9/29/2016 Charm++ Overview ● work decomposed into objects called chares ● message driven User view System view P 1 P 3 P 2 ● mapping of objects to processors transparent to user ● automatic load balancing ● communication optimization

5 5 Parallel Programming Laboratory @ UIUC 9/29/2016 ParallelGravity ● Simulator of cosmological interaction (gravity) ● Particle based – high resolution where needed – based on tree structures ● different types of trees ● different domain decompositions ● Implemented in Charm++ – work divided among chares called TreePieces

6 6 Parallel Programming Laboratory @ UIUC 9/29/2016 Datasets and Systems lambs 3 million particles (47 MB)dwarf 5 million particles (80 MB)

7 7 Parallel Programming Laboratory @ UIUC 9/29/2016 Space decomposition TreePiece 1TreePiece 2TreePiece 3...

8 8 Parallel Programming Laboratory @ UIUC 9/29/2016 Basic algorithm... ● Newtonian gravity interaction – Each particle is influenced by all others: O(n²) algorithm ● Barnes-Hut approximation: O(nlogn) – Influence from distant particles combined into center of mass

9 9 Parallel Programming Laboratory @ UIUC 9/29/2016... in parallel ● Remote data – need to fetch from other processors ● Data reusage – same data needed by more than one particle

10 10 Parallel Programming Laboratory @ UIUC 9/29/2016 Overall algorithm Processor 1 local work (low priority) remote work mis s TreePiece C local work (low priority) remote work TreePiece B TreePiece A local work (low priority) global work Start computation End computation remote present? request node CacheManager YES: return Processor n reply with requested data NO: fetch callback TreePiece on Processor 2 buffer

11 11 Parallel Programming Laboratory @ UIUC 9/29/2016 Serial performance

12 12 Parallel Programming Laboratory @ UIUC 9/29/2016 CacheManager importance 1 million lambs dataset on HPCx

13 13 Parallel Programming Laboratory @ UIUC 9/29/2016 Prefetching 1) implicit in the cache ● computation performed with tree walks ● after visiting a node, its children will likely be visited ● while fetching remote nodes, the cache prefetches some of its children 2) explicit ● before force computation, data is requested for preload

14 14 Parallel Programming Laboratory @ UIUC 9/29/2016 Cache implicit prefetching 0 1 2 3

15 15 Parallel Programming Laboratory @ UIUC 9/29/2016 Interaction lists Node X opening criteria cut-off node X is undecided node X is accepted node X is opened

16 16

17 17 Parallel Programming Laboratory @ UIUC 9/29/2016 Interaction list: results ● 10% average performance improvement

18 18 Parallel Programming Laboratory @ UIUC 9/29/2016 Load balancer dwarf 5M dataset on BlueGene/L improvement between 15% and 35% flat lines good raising lines bad

19 19 Parallel Programming Laboratory @ UIUC 9/29/2016 Load balancer lambs 300K subset on 64 processors of Tungsten ● lightweight domain decomposition ● charm++ load balancing Time Processors while: high utilization dark: processor idle

20 20 Parallel Programming Laboratory @ UIUC 9/29/2016 Scalability comparison dwarf 5M comparison on Tungsten flat: perfect scaling diagonal: no scaling

21 21 Parallel Programming Laboratory @ UIUC 9/29/2016 ParallelGravity scalability flat: perfect scaling diagonal: no scaling

22 22 Parallel Programming Laboratory @ UIUC 9/29/2016 Future work ● Production level physics – Periodic boundaries – Smoothed Particle Hydrodynamics – Multiple timestepping ● New load balancers ● Beyond 1,024 processors

23 23 Parallel Programming Laboratory @ UIUC 9/29/2016 Questions? Thank you

24 24 Parallel Programming Laboratory @ UIUC 9/29/2016 Tree decomposition TreePiece 1TreePiece 3TreePiece 2 ● Exclusive ● Shared ● Remote

25 25 Parallel Programming Laboratory @ UIUC 9/29/2016 Interaction list X TreePiece A

26 26 Parallel Programming Laboratory @ UIUC 9/29/2016 Tree-in-cache lambs 300K subset on 64 processors of Tungsten


Download ppt "1 Scalable Cosmological Simulations on Parallel Machines Filippo Gioachin¹ Amit Sharma¹ Sayantan Chakravorty¹ Celso Mendes¹ Laxmikant V. Kale¹ Thomas R."

Similar presentations


Ads by Google