1
OCR on Knights Landing (Xeon Phi)
31st Mar 2016
Acknowledgment: This material is based upon work supported by the Department of Energy Office of Science under cooperative agreement DE-SC and DE-SC , and Lawrence Livermore National Labs subcontract B
2
Knights Landing Overview
Three modes
  Self-boot processor
  Self-boot w/ integrated fabric
  Co-processor (PCIe add-on card)
MCDRAM: three memory modes
  Flat – entirely addressable
  Cache – caches DDR, direct-mapped
  Hybrid – part cache, part memory
Cluster modes (cache-coherent mesh interconnect)
  All-to-all: addresses uniformly hashed
  Quadrant: software-transparent; address hashed to a directory in the same quadrant as the memory
  Sub-NUMA: exposed as 4 NUMA nodes
Source: KNL presentation at Hot Chips '15
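To make "direct-mapped" concrete: in a direct-mapped cache each memory line has exactly one slot it can occupy, so two lines that are a multiple of the cache size apart evict each other. The sketch below uses a plain modulo mapping and assumes a 64-byte line and a 16 GB cache; both numbers and both function names are illustrative assumptions, and the real hardware mapping may differ.

```python
# Illustrative parameters (assumptions, not taken from the slides).
LINE_BYTES = 64             # assumed cache-line size
CACHE_BYTES = 16 * 2**30    # assumed MCDRAM capacity used as cache

NUM_LINES = CACHE_BYTES // LINE_BYTES


def cache_slot(phys_addr):
    """Return the single slot an address can occupy in a direct-mapped cache."""
    return (phys_addr // LINE_BYTES) % NUM_LINES


def conflicts(addr_a, addr_b):
    """True if the two addresses lie in different lines that share one slot."""
    line_a = addr_a // LINE_BYTES
    line_b = addr_b // LINE_BYTES
    return line_a != line_b and line_a % NUM_LINES == line_b % NUM_LINES


if __name__ == "__main__":
    # Two addresses exactly one cache size apart compete for slot 0.
    print(cache_slot(0), cache_slot(CACHE_BYTES), conflicts(0, CACHE_BYTES))
```

Under this simplified model, a working set larger than the cache, or two hot arrays aligned one cache size apart, would see conflict misses that a set-associative cache could absorb.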
3
OCR on KNL
  1 policy domain with up to 288 workers
  MCDRAM in flat mode, with two allocators
    $ numactl -H
    available: 2 nodes (0-1)
    node 0 cpus: 0 255
    node 0 size: MB
    node 0 free: MB
    node 1 cpus:
    node 1 size: MB
    node 1 free: MB
    node distances:
    node 0: 1:
  Memory hints to choose allocator on MCDRAM (OCR_HINT_DB_HIGHBW)
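In the `numactl -H` listing, node 1 has an empty CPU list, which is how flat-mode MCDRAM typically shows up: a memory-only NUMA node. A minimal sketch of detecting such nodes from the command's text output follows. The function name `cpuless_nodes` is hypothetical, and the CPU ids and sizes in `SAMPLE` are made-up placeholders, not the values from the slide.

```python
import re


def cpuless_nodes(numactl_output):
    """Return ids of NUMA nodes whose 'cpus:' line lists no CPUs."""
    nodes = []
    for line in numactl_output.splitlines():
        match = re.match(r"node (\d+) cpus:(.*)$", line.strip())
        if match and not match.group(2).strip():
            nodes.append(int(match.group(1)))
    return nodes


# Made-up sample in the shape of `numactl -H` output (values are placeholders).
SAMPLE = """\
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 1000 MB
node 1 cpus:
node 1 size: 100 MB
"""

if __name__ == "__main__":
    print(cpuless_nodes(SAMPLE))
```

A runtime could use a check like this at startup to decide which NUMA node should back a high-bandwidth allocator; how OCR itself discovers the node is not stated in the slides.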
4
Results – Stencil 2D weak scaling
  [Figure: weak-scaling plots, panels "Xeon" and "KNL"]
  Preliminary results! Software under optimization
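For reading weak-scaling plots: the problem size per worker stays fixed, so ideal scaling means constant runtime, and efficiency is usually reported as the baseline time divided by the time at each worker count. A small sketch of that calculation follows; the function name is hypothetical and the timings in the usage example are invented for illustration, not measurements from this work.

```python
def weak_scaling_efficiency(times_by_workers):
    """Map worker count -> T(baseline) / T(workers).

    The baseline is the smallest worker count present; 1.0 means ideal
    weak scaling (runtime unchanged as workers and problem size grow).
    """
    base_workers = min(times_by_workers)
    base_time = times_by_workers[base_workers]
    return {w: base_time / t for w, t in sorted(times_by_workers.items())}


if __name__ == "__main__":
    # Invented timings in seconds, keyed by worker count.
    print(weak_scaling_efficiency({1: 2.0, 4: 2.5, 16: 4.0}))
```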
5
Results – MCDRAM vs DDR
  Stencil 2D with 256 threads
  Preliminary results! Software under optimization
6
Results – Stream
  Runtime bottlenecks?
  Profiling underway
  Limited vectorization opportunities?
  Preliminary results! Software under optimization
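For context on what a Stream number measures, here is a sketch of the triad kernel and the usual bandwidth accounting: each element update reads two values and writes one, so 3 x N x 8 bytes move for double-precision arrays. The function names are hypothetical, and this is not the OCR implementation used for the results above. A pure-Python loop is interpreter-bound, so only the accounting is meaningful here, not the absolute figure.

```python
import time
from array import array


def triad(a, b, c, q):
    """STREAM-style triad: a[i] = b[i] + q * c[i] for every element."""
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]


def triad_bandwidth_gbs(n, seconds, bytes_per_elem=8):
    """Bandwidth in GB/s, counting two reads and one write per element."""
    return 3 * n * bytes_per_elem / seconds / 1e9


if __name__ == "__main__":
    n = 200_000
    a = array("d", [0.0] * n)
    b = array("d", [1.0] * n)
    c = array("d", [2.0] * n)
    start = time.perf_counter()
    triad(a, b, c, 3.0)
    elapsed = time.perf_counter() - start
    print(f"{triad_bandwidth_gbs(n, elapsed):.3f} GB/s")
```

The kernel does one multiply and one add per 24 bytes moved, which is why it is limited by memory bandwidth when the loop is vectorized and by instruction throughput or runtime overhead when it is not.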
7
Next Steps
  Root-cause & fix MCDRAM performance
  Study all-to-all vs. sub-NUMA modes
  Single vs. multiple policy domains
  Performance counters & introspection