Galois Performance Mario Mendez-Lojo Donald Nguyen.

Overview
Galois system is a test bed to explore optimizations
– Safe but not fast out of the box
Important optimizations
– Select least transactional overhead
– Select right scheduling
– Select appropriate data structure
Quantify optimizations on applications

Algorithms
[Figure: classification of irregular algorithms by topology (general graph, grid, tree), operator (morph, local computation, reader), and ordering (unordered, ordered)]
1. Barnes-Hut
2. Delaunay Mesh Refinement
3. Preflow-push

Methodology
[Figure: time vs. threads, split into Idle, Serial, GC, and Compute]
Abort ratio: aborted iterations / total iterations
GC options
– UseParallelGC
– UseParallelOldGC
– NewRatio=1

Terms
Base
– Default scheduling, default graph
Serial
– Galois classes replaced by classes with no concurrency control
Speedup
– Relative to the best mean performance of a serial variant
Throughput
– Number of serial iterations / time
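The ratios on the Methodology and Terms slides are plain arithmetic, and a small helper makes the definitions concrete. This is a minimal sketch: the class and method names are hypothetical and not part of Galois, and times are assumed to be in milliseconds.

```java
// Hypothetical helper, not part of Galois: the ratios defined on the
// Methodology and Terms slides, written out as plain arithmetic.
final class RunMetrics {
    private RunMetrics() {}

    /** Abort ratio: aborted iterations / total iterations. */
    static double abortRatio(long abortedIterations, long totalIterations) {
        if (totalIterations <= 0) {
            throw new IllegalArgumentException("totalIterations must be positive");
        }
        return (double) abortedIterations / totalIterations;
    }

    /** Speedup: time of the best serial variant / time of the parallel run. */
    static double speedup(double bestSerialMillis, double parallelMillis) {
        if (parallelMillis <= 0) {
            throw new IllegalArgumentException("parallelMillis must be positive");
        }
        return bestSerialMillis / parallelMillis;
    }

    /** Throughput: number of serial iterations per unit of time. */
    static double throughput(long serialIterations, double millis) {
        if (millis <= 0) {
            throw new IllegalArgumentException("millis must be positive");
        }
        return serialIterations / millis;
    }
}
```

For example, `RunMetrics.speedup(10271, 1553)` evaluates to roughly 6.6, which matches the Barnes-Hut speedup reported in this deck.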

Numbers
Runtime
– Last of 5 runs in same VM
– Ignore time to read and construct initial graph
Other statistics
– Last of 5 runs
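The measurement rule above (last of 5 runs in the same VM, with input construction untimed) can be sketched as a small harness. The class and method names are hypothetical. Reporting the last run presumably lets the JIT compiler warm up first, although the slides do not state the reason.

```java
// Hypothetical timing harness, not part of Galois.
final class LastOfRunsTimer {
    private LastOfRunsTimer() {}

    /**
     * Runs setup + work `runs` times in the current VM and returns the
     * elapsed nanoseconds of the last timed run only.
     */
    static long lastOfRunsNanos(int runs, Runnable setup, Runnable work) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long last = 0;
        for (int i = 0; i < runs; i++) {
            setup.run();                      // untimed: e.g. read and build the graph
            long start = System.nanoTime();
            work.run();                       // timed: the algorithm itself
            last = System.nanoTime() - start; // earlier runs are overwritten
        }
        return last;
    }
}
```

A call such as `LastOfRunsTimer.lastOfRunsNanos(5, buildGraph, runAlgorithm)` would follow the slide's protocol, where `buildGraph` and `runAlgorithm` are placeholders for the application's own code.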

Test Environment
– 2 x Xeon X5570 (4 core, 2.93 GHz)
– Java 1.6.0_0-b11
– Linux 2.6.24-27 x86_64
– 20GB heap size

BARNES-HUT
[Image: Most Distant Galaxy Candidates in the Hubble Ultra Deep Field]

Barnes-Hut
N-body algorithm
– Oct-tree acceleration structure
– Serial: tree build, center of mass, particle update
– Parallel: force computation
Structure
– Reader on tree
Variants
– Splash2, Reader Galois

Reader Optimization
Before: child = octree.getNeighbor(nn, 1);
After: child = octree.getNeighbor(nn, 1, MethodFlag.NONE);
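The idea behind passing a flag on a read-only access can be illustrated with a toy model. Everything below is hypothetical: the `Flag` enum only stands in for `MethodFlag`, the counter stands in for conflict-detection work, and none of this is the Galois implementation. The sketch assumes that when an operator only reads the tree, conflict checks on each access can be skipped.

```java
// Toy model only, NOT the Galois implementation.
final class FlaggedTreeSketch {
    /** Hypothetical stand-in for the flags passed to graph methods. */
    enum Flag { CHECK_CONFLICT, NONE }

    static final class Node {
        final Node[] children = new Node[8]; // oct-tree: up to 8 children
    }

    private int conflictChecks = 0; // counts simulated conflict-detection work

    /** Default access: performs the (simulated) conflict check. */
    Node getNeighbor(Node n, int index) {
        return getNeighbor(n, index, Flag.CHECK_CONFLICT);
    }

    /** Flagged access: the caller may opt out of the conflict check. */
    Node getNeighbor(Node n, int index, Flag flag) {
        if (flag == Flag.CHECK_CONFLICT) {
            conflictChecks++; // placeholder for the real bookkeeping cost
        }
        return n.children[index];
    }

    int conflictChecks() {
        return conflictChecks;
    }
}
```

Both calls return the same child; only the flagged call avoids the bookkeeping, which is the overhead the optimization on the slide removes.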

[Figure: ParaMeter Profile]

Barnes-Hut Results
100,000 points, 1 time step
Best serial: base
Serial time: 10271 ms
Best // time: 1553 ms
Best speedup: 6.6X

[Figure: Barnes-Hut Scalability]

DELAUNAY MESH REFINEMENT

Delaunay Mesh Refinement
Refine “bad” triangles
– Maintained in worklist
Structure
– Cautious operator on graph
Variants
– Flag optimized, locallifo
base: Priority.defaultOrder()
local lifo: Priority.first(ChunkedFIFO.class).thenLocally(LIFO.class)

Cautious Optimization
Before:
mesh.contains(item); ...
mesh.remove(preNodes.get(i)); ...
mesh.add(node);
After:
mesh.contains(item, MethodFlag.CHECK_CONFLICT); ...
mesh.remove(preNodes.get(i), MethodFlag.NONE); ...
mesh.add(node, MethodFlag.NONE);
– No need to save undo info
– Only check conflicts up to first write
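A cautious operator reads its whole neighborhood before its first write, which is why conflicts only need checking up to that point and no undo information is required. The toy below sketches that two-phase shape. It is single-threaded and not thread-safe, and the `Cell` type, the ownership field, and the `apply` method are all hypothetical; it does not reproduce Galois internals.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only, NOT the Galois implementation; not thread-safe.
final class CautiousOperatorSketch {
    static final class Cell {
        int value;
        Object owner; // hypothetical ownership mark; null means free
    }

    private CautiousOperatorSketch() {}

    /** Returns true if the iteration committed, false if it aborted before any write. */
    static boolean apply(Object iteration, List<Cell> neighborhood) {
        List<Cell> acquired = new ArrayList<>();
        // Phase 1 (reads only): claim every cell, checking for conflicts.
        for (Cell c : neighborhood) {
            if (c.owner != null && c.owner != iteration) {
                // Conflict found before any write: release marks, nothing to undo.
                for (Cell a : acquired) {
                    a.owner = null;
                }
                return false;
            }
            if (c.owner == null) {
                c.owner = iteration;
                acquired.add(c);
            }
        }
        // Phase 2 (writes): no conflict checks and no undo log are needed.
        for (Cell c : neighborhood) {
            c.value++;
        }
        for (Cell a : acquired) {
            a.owner = null; // release on commit
        }
        return true;
    }
}
```

Because an abort can only happen in phase 1, the cell values are never modified by an iteration that later fails, so there is no state to roll back.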

LIFO Optimization
Before: GaloisRuntime.foreach(..., Priority.defaultOrder());
After: GaloisRuntime.foreach(..., Priority.first(ChunkedFIFO.class).thenLocally(LIFO.class));
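One plausible reading of "chunked FIFO first, then LIFO locally" is a global FIFO queue of chunks combined with a per-worker stack. The sketch below models a single worker and is not thread-safe. The class and its methods are hypothetical and do not reproduce the Galois worklist classes named on the slide.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy single-worker model, NOT the Galois worklist implementation.
final class ChunkedFifoLocalLifoSketch<T> {
    private final int chunkSize;
    private final Deque<Deque<T>> globalChunks = new ArrayDeque<>(); // FIFO of chunks
    private Deque<T> fillChunk = new ArrayDeque<>();                 // chunk being filled
    private final Deque<T> local = new ArrayDeque<>();               // worker-local LIFO stack

    ChunkedFifoLocalLifoSketch(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
    }

    /** Adds initial work; full chunks are published to the global FIFO. */
    void addInitial(T item) {
        fillChunk.addLast(item);
        if (fillChunk.size() == chunkSize) {
            globalChunks.addLast(fillChunk);
            fillChunk = new ArrayDeque<>();
        }
    }

    /** Work created while processing goes on top of the local stack. */
    void pushLocal(T item) {
        local.push(item);
    }

    /** Next item: local stack first (LIFO), else the oldest global chunk (FIFO). */
    T poll() {
        if (local.isEmpty()) {
            Deque<T> chunk = globalChunks.pollFirst();
            if (chunk == null && !fillChunk.isEmpty()) {
                chunk = fillChunk; // drain the partially filled chunk last
                fillChunk = new ArrayDeque<>();
            }
            if (chunk == null) {
                return null; // no work left
            }
            for (T t : chunk) {
                local.push(t); // last item of the chunk ends up on top
            }
        }
        return local.pop();
    }
}
```

In a refinement workload, newly created items are pushed with `pushLocal` and processed next, which presumably keeps a worker on recently touched data.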

DMR Results
0.5M triangles, 0.25M bad triangles
Best serial: locallifo.flagopt
Serial time: 17002 ms
Best // time: 3745 ms
Best speedup: 4.5X

PREFLOW-PUSH

Preflow-push
Max-flow algorithm
– Nodes push flow downhill
Structure
– Cautious, local computation
Variants
– Flag optimized, local computation graph
base (discharge): Priority.first(Bucketed.class, numHeight+1, false, indexer).then(FIFO.class)
base (relabel): Priority.first(ChunkedFIFO.class, 8)
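The discharge schedule above combines buckets selected by an indexer with FIFO order inside each bucket. A minimal sketch of that shape follows. The class is hypothetical and single-threaded, it serves the lowest-numbered non-empty bucket first as an assumption, and it does not model the `false` argument on the slide, whose meaning the slide does not explain.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.ToIntFunction;

// Toy model, NOT the Galois Bucketed worklist; not thread-safe.
final class BucketedFifoSketch<T> {
    private final List<Queue<T>> buckets;
    private final ToIntFunction<T> indexer; // maps an item to its bucket, e.g. a node height

    BucketedFifoSketch(int numBuckets, ToIntFunction<T> indexer) {
        this.buckets = new ArrayList<>(numBuckets);
        for (int i = 0; i < numBuckets; i++) {
            buckets.add(new ArrayDeque<>());
        }
        this.indexer = indexer;
    }

    /** Places the item in the bucket chosen by the indexer. */
    void add(T item) {
        buckets.get(indexer.applyAsInt(item)).add(item);
    }

    /** Assumption: lowest bucket first, FIFO within a bucket. */
    T poll() {
        for (Queue<T> q : buckets) {
            T item = q.poll();
            if (item != null) {
                return item;
            }
        }
        return null; // all buckets are empty
    }
}
```

With node heights as bucket indices, `numHeight+1` buckets cover every height from 0 to `numHeight`, which is consistent with the bucket count passed on the slide.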

Local Computation Optimization
graph = ...
b = new LocalComputationGraph.ObjectGraphBuilder();
graph = b.from(graph).create()

Preflow-push Results
From challenge problem (genmf-wide)
– 14 linearly connected grids (194x194), 526,904 nodes, 2,586,020 edges
– http://avglab.com/andrew/CATS/maxflow_synthetic.htm
C: 11450 ms
Java: 30234 ms
Best serial: lc.flagopt
Serial time: 57121 ms
Best // time: 18242 ms
Best speedup: 3.1X

[Figure: Preflow-push Scalability]

What performance did we expect?
[Figure: time vs. threads; labels: Idle, Serial, GC, // Compute, Mis-speculation; Measured Indirectly: Synchronization, …; Error]

What performance did we expect?
Naïve: $r(x) = t_1 / x$
Amdahl: $r(x) = t_p / x + t_s$, with $t_1 = t_p + t_s$ and $t_s = t_{idle} + t_{gc} + t_{serial}$
Simple: $r(x) = \left(t_p \cdot (i_x / i_1)\right) / x + t_s$
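The three models can be evaluated directly once the measured components are known. The sketch below is a hypothetical helper, not part of Galois. It assumes that x is the thread count and that i_x and i_1 are the iteration counts observed with x threads and with 1 thread; the slide does not define these symbols, so that reading is an assumption.

```java
// Hypothetical helper, not part of Galois: the three expected-runtime
// models from the slide, with all times in the same unit.
final class ExpectedRuntimeModels {
    private ExpectedRuntimeModels() {}

    /** Naive: r(x) = t1 / x. */
    static double naive(double t1, int x) {
        return t1 / x;
    }

    /** Amdahl: r(x) = tp / x + ts, where t1 = tp + ts. */
    static double amdahl(double tp, double ts, int x) {
        return tp / x + ts;
    }

    /**
     * Simple: r(x) = (tp * (ix / i1)) / x + ts.
     * Assumption: ix and i1 are iteration counts with x threads and 1 thread,
     * so ix / i1 scales the parallel work by the extra iterations executed.
     */
    static double simple(double tp, double ts, double ix, double i1, int x) {
        return (tp * (ix / i1)) / x + ts;
    }
}
```

With made-up inputs tp = 80, ts = 20, and x = 4, the naive model gives 25 for t1 = 100, Amdahl gives 40, and the simple model gives about 44 when the iteration count grows from 100 to 120.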

[Figure: Barnes-Hut]

[Figure: Delaunay Mesh Refinement]

[Figure: Preflow-push]

Summary
Many profitable optimizations
– Selecting among method flags, worklists, graph variants
Open topics
– Automation
– Static, dynamic and performance analysis
– Efficient ordered algorithms
