Assessing the Scalability of Garbage Collectors on Many Cores (Funded by ANR projects: Prose and ConcoRDanT) Lokesh GidraGaël Thomas Julien SopenaMarc Shapiro Regal-LIP6/INRIA
Introduction Why? – MREs are ubiquitous! – GC, a vital component of it performance is critical? – Hardware is more and more multi-resourced. – Are GCs scaling with such hardware? – Current solutions not evaluated on true many-cores! What? – Assesses GC scalability : Empirical Results. – Possible factors affecting the GC scalability. Lokesh Gidra2
Multi-Node Architecture C0 C1 C5 L2 L3 MC DRAM C0 C1 C5 L2 L3 MC DRAM Our machine has 8 nodes with 6 cores each Remote access >> Local access To other nodes Lokesh Gidra
Parallel Copying Garbage Collection Pause Time Application Time Mutator Threads GC Threads From SpaceTo Space Live Object Dead Object Total Time Lokesh Gidra4
GCs effect on Application Scalability (Lusearch) Up-to 6 cores: 3X performance improvement. More than 6 cores: No improvement in total time. Proportion of pause time increases up-to 50%. Lokesh Gidra5 Mutator Threads = GC Threads = Varying Number of Cores
GC Scalability (Lusearch) Pause time increases with GC threads Negative Scalability! Lokesh Gidra6 Mutator Threads = Cores = 48 and, Varying Number of GC Threads
1. Remote Scanning From SpaceTo Space Live Object Dead Object Node 0 Node 1 Node 2 Node 3 GC Threads GC0GC1 GC2GC3 Lokesh Gidra7 87.7% scans were remote! Random (Default) object allocation
2. Remote Copying Node 0 Node 1 Node 2 Node 3 GC Threads From SpaceTo Space Live Object Dead Object GC0GC1 GC2GC3 Lokesh Gidra8 82.7% copies were remote!
3. Load Balancing Task Queue Owner: Push and Pop Other GC Threads: Steal (Pop) Based on work stealing technique. 1 task queue per GC thread. Highly unbalanced load: Requires a lot of stealing. Keep doing until all are done. Performance Impact: ≥ 2-4 cache misses/stealing! 33.3% improvement in pause time by disabling it! Shared Variable: size (task queue size) Lokesh Gidra9
Conclusion GC does affect application’s scalability it matters! GC doesn’t scale with the hardware! Bottlenecks: – Remote Scanning – Remote Copying – Load Balancing Future Work: – Fix the bottlenecks does it help GC to scale? Lokesh Gidra10
DaCapo Benchmarks’ Scalability Lokesh Gidra11
Revisiting App. (Lusearch) Scalability… Lokesh Gidra12