Published by Branden Fitzgerald. Modified over 9 years ago.
1
Operating System Attributes for High Performance Computing
Ken Rozendal, Distinguished Engineer, IBM Linux Technology Center
IBM Systems and Technology Group
© 2003 IBM Corporation
2
Operating System Attributes for HPC
- Reducing NUMA effects
- Exploiting larger page sizes
- Reducing operating system “jitter”
- Avoiding planned and unplanned downtime
- Other attributes
3
Reducing NUMA Effects
- Most systems have NUMA attributes due to their memory bus and cache designs.
- The degree of NUMA behavior differs substantially between systems.
- The default OS behavior in placing new memory pages makes a critical difference.
- Applications need either to code to the default NUMA behavior or to place memory explicitly.
- The OS needs to provide APIs for discovering the NUMA topology and for specifying placement policies.
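As one illustration of the topology-discovery point on this slide, the sketch below reads the node-to-CPU mapping that Linux exposes under `/sys/devices/system/node`. It is a minimal sketch, not the API the presenter had in mind: the function names `parse_cpulist` and `discover_numa_topology` are hypothetical, and the code assumes a Linux system with this sysfs layout (it returns an empty mapping elsewhere).

```python
import glob
import os
import re


def parse_cpulist(text):
    """Expand a Linux cpulist string such as '0-3,8-11' into a list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # an empty cpulist (e.g. a memory-only node) yields no CPUs
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def discover_numa_topology(sysfs_root="/sys/devices/system/node"):
    """Return {node_id: [cpu ids]} read from sysfs; empty dict if unavailable."""
    topology = {}
    for node_dir in glob.glob(os.path.join(sysfs_root, "node[0-9]*")):
        match = re.fullmatch(r"node(\d+)", os.path.basename(node_dir))
        if not match:
            continue
        try:
            with open(os.path.join(node_dir, "cpulist")) as f:
                topology[int(match.group(1))] = parse_cpulist(f.read())
        except OSError:
            continue  # node directory without a readable cpulist; skip it
    return topology


if __name__ == "__main__":
    for node, cpus in sorted(discover_numa_topology().items()):
        print(f"node {node}: cpus {cpus}")
```

On Linux, a process could then restrict itself to one node's CPUs with `os.sched_setaffinity(0, cpus)`. Explicit memory placement policies need a dedicated interface such as libnuma, which this sketch does not cover.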
4
Exploiting Larger Page Sizes
- Larger page sizes reduce TLB reloads.
- Most of the benefit occurs with the first few doublings of the page size.
- Using both small and large page sizes requires very flexible allocation policies.
- The OS needs to adjust quickly to changing requirements for large pages.
- It must be possible to place large pages without changing application source code.
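The first two points on this slide can be made concrete with a toy calculation of TLB reach (number of TLB entries times page size). The sketch below is a deliberately simplified model, not a measurement: the 128-entry TLB and the 8 MiB working set are hypothetical values chosen for illustration, and real TLBs are multi-level and workload-dependent.

```python
def tlb_reach(entries, page_size):
    """Bytes of memory the TLB can map at once."""
    return entries * page_size


def covered_fraction(entries, page_size, working_set):
    """Fraction of the working set that the TLB can map without a reload."""
    return min(1.0, tlb_reach(entries, page_size) / working_set)


if __name__ == "__main__":
    ENTRIES = 128              # hypothetical TLB size
    WORKING_SET = 8 * 2**20    # hypothetical 8 MiB working set
    page = 4 * 2**10           # start from 4 KiB pages
    while page <= 256 * 2**10:
        frac = covered_fraction(ENTRIES, page, WORKING_SET)
        print(f"{page // 1024:4d} KiB pages: {frac:6.1%} of working set covered")
        page *= 2
```

In this model each doubling of the page size doubles the covered fraction until the whole working set fits, after which further doublings add nothing. That is one way to read the claim that the early doublings deliver most of the benefit.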
5
Reducing Operating System “Jitter”
- OS “jitter”: interruptions to execution on one node are amplified across a cluster.
- Types of interruption: hardware and software interrupts, daemons.
- Approaches:
  - Eliminate types of interrupts (e.g. timer ticks)
  - Simplify: eliminate unused subsystems
  - Daemon squashing
  - Synchronizing interruptions across CPUs on a node and across nodes in a cluster
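The amplification effect described on this slide can be sketched with a simple probability model. Assume, purely for illustration, that a bulk-synchronous application waits for every node at the end of each compute step, and that each node is independently interrupted during a step with probability `p_node`. The step is then delayed whenever at least one node is hit. The function name and the independence assumption are not from the presentation.

```python
def prob_step_delayed(p_node, nodes):
    """Probability that at least one of `nodes` independent nodes is
    interrupted during a step, which delays the whole synchronized step."""
    return 1.0 - (1.0 - p_node) ** nodes


if __name__ == "__main__":
    P_NODE = 0.01  # hypothetical per-node interruption probability per step
    for nodes in (1, 10, 100, 1000):
        print(f"{nodes:5d} nodes: {prob_step_delayed(P_NODE, nodes):.4f}")
```

Under this model a rare per-node event becomes a near-certain delay at scale. It also suggests why synchronizing interruptions helps: if all nodes take their interruptions at the same moment, the chance that a given step is delayed stays close to the per-node value instead of growing with the node count.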
6
Avoiding Planned and Unplanned Downtime
- Avoid hardware failures causing downtime.
  - CPUs, memory, I/O
- Avoid downtime due to software updates.
  - Concurrent update to operating system components
- Avoid downtime due to hardware updates.
  - OS migration between systems
  - Application migration
- Recover from unplanned downtimes:
  - Checkpoint/restart
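The checkpoint/restart idea can be sketched at the application level as follows. The slide may well refer to checkpointing provided by the operating system, so this is only an analogy: the helper names (`save_checkpoint`, `load_checkpoint`, `run`), the use of `pickle`, and the `stop_after` parameter that simulates a failure are all assumptions made for the example.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write state atomically: write a temp file, then rename it over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the data is on disk before renaming
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path, default):
    """Return the saved state, or `default` if no checkpoint exists yet."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default


def run(path, total_steps, stop_after=None):
    """Sum 0..total_steps-1, checkpointing after every step.

    `stop_after` simulates a failure after that many steps in this invocation.
    """
    state = load_checkpoint(path, {"step": 0, "total": 0})
    executed = 0
    while state["step"] < total_steps:
        if stop_after is not None and executed >= stop_after:
            return None  # simulated unplanned downtime
        state["total"] += state["step"]
        state["step"] += 1
        save_checkpoint(path, state)
        executed += 1
    return state["total"]
```

Calling `run(path, 10, stop_after=4)` and then `run(path, 10)` resumes from the saved step rather than starting over. The rename-based write means a failure during checkpointing leaves the previous checkpoint intact.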
7
Other Operating System Attributes for HPC
- Support for standard programming models
- Support for high performance interconnects
- Parallel file systems
- Performance analysis and tuning tools
- Parallel application debugging tools
- Cluster system management tools