Published by Abigail Warner. Modified over 8 years ago.
1
Achieving Isolation in Consolidated Environments
Jack Lange, Assistant Professor, University of Pittsburgh
2
Consolidated HPC Environments
The future is consolidation of commodity and HPC workloads
– HPC users are moving onto cloud platforms
– Dedicated HPC systems are moving towards in-situ organization, consolidated with visualization and analytics workloads
Can commodity OS/Rs effectively support HPC consolidation?
– Commodity design goals: maximized resource utilization, fairness, graceful degradation under load
3
Hardware Partitioning
Current approaches emphasize hardware space sharing
[Figure: a two-socket node (Socket 1 with cores 1-4, Socket 2 with cores 5-8, each with local memory) divided into an HPC partition and a commodity partition.]
Current systems do support this, but…
Interference still exists inside the system software
– Inherent feature of commodity systems
4
HPC vs. Commodity Systems
Commodity systems have a fundamentally different focus than HPC systems
– Amdahl’s vs. Gustafson’s laws
– Commodity: optimized for the common case
HPC: the common case is not good enough
– At large (tightly coupled) scales, percentiles lose meaning
– Collective operations must wait for the slowest node
– 1% of nodes can make 99% suffer
– HPC systems must optimize outliers (worst case)
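The "percentiles lose meaning" point can be made concrete with a small calculation. The sketch below is illustrative only: it assumes each node is independently slow with probability p_slow in a given step (an assumption not stated on the slide), and it includes the textbook forms of Amdahl's and Gustafson's laws for reference. All function names are hypothetical.

```python
def prob_any_slow(p_slow: float, nodes: int) -> float:
    """Probability that at least one of `nodes` nodes is slow in a step.

    Assumes nodes are slow independently of each other. A collective
    operation waits for the slowest node, so this is also the chance
    that the whole step is delayed.
    """
    return 1.0 - (1.0 - p_slow) ** nodes


def amdahl_speedup(serial_fraction: float, n: int) -> float:
    """Amdahl's law: fixed problem size, speedup on n processors."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)


def gustafson_speedup(serial_fraction: float, n: int) -> float:
    """Gustafson's law: problem size scales with n processors."""
    return n - serial_fraction * (n - 1)


if __name__ == "__main__":
    # A 1% per-node outlier rate is negligible on one node but
    # dominates once many nodes must synchronize.
    for nodes in (1, 8, 100, 1000):
        print(nodes, prob_any_slow(0.01, nodes))
```

Under the independence assumption, a 99th-percentile guarantee per node says very little about a tightly coupled job: with enough nodes, nearly every collective step includes at least one outlier.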
5
Multi-stack Approach
Dynamic Resource Partitions
– Runtime segmentation of underlying hardware resources
– Assigned to specific workloads
Dynamic Software Isolation
– Prevent interference from other workloads
– Execute on separate system software stacks
– Remove cross-stack dependencies
Implementation
– Independent system software running on isolated resources
6
Least Isolatable Units (LIUs)
Independently managed sets of isolated HW resources
Our approach: decompose the system into sets of isolatable components
– Independent resources that do not interfere with other components
Workloads execute on dedicated collections of LIUs
– Units of allocation: CPU, memory, devices
– Each collection is managed by an independent system software stack
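One way to picture LIUs as units of allocation is a partitioner that hands each unit to at most one software stack at a time. This is a minimal sketch of the idea, not the talk's implementation; the names LIU and Partitioner, and the identifiers such as "core5", are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LIU:
    """A single isolatable unit (hypothetical model)."""
    kind: str   # "cpu", "memory", or "device"
    ident: str  # illustrative identifier, e.g. "core5"


class Partitioner:
    """Tracks which stack owns which LIUs; ownership is exclusive."""

    def __init__(self, units):
        self.free = set(units)
        self.owned = {}  # stack name -> set of LIUs

    def assign(self, stack, units):
        units = set(units)
        # Refuse the whole request if any unit is already owned,
        # so two stacks can never share a unit.
        if not units <= self.free:
            raise ValueError("some units are not free")
        self.free -= units
        self.owned.setdefault(stack, set()).update(units)

    def release(self, stack, units):
        units = set(units)
        if not units <= self.owned.get(stack, set()):
            raise ValueError("stack does not own these units")
        self.owned[stack] -= units
        self.free |= units


if __name__ == "__main__":
    cores = [LIU("cpu", f"core{i}") for i in range(1, 9)]
    p = Partitioner(cores)
    p.assign("kitten", cores[4:])   # cores 5-8 to the HPC stack
    p.assign("linux", cores[:4])    # cores 1-4 to the commodity stack
    print(sorted(u.ident for u in p.owned["kitten"]))
```

The all-or-nothing check in assign reflects the requirement that units do not interfere with each other: a unit is either free or owned by exactly one stack.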
7
Linux Memory Management
Demand Paging
– Primary goal is to optimize memory utilization, not performance
– Reduce overhead of common application behavior (fork/exec)
– Support many concurrent processes
Large Pages
– Integrated with overall demand paging architecture
Implications for HPC
– Insufficient resource isolation
– System noise
– Linux large page solutions contribute to these problems
Brian Kocoloski and Jack Lange, HPMMAP: Lightweight Memory Management for Commodity Operating Systems, IPDPS 2014
8
Transparent Huge Pages
Transparent Huge Pages (THP)
– Fully automatic large page mechanism: no system administration or application cooperation
– (1) Page fault handler uses large pages when possible
– (2) khugepaged
khugepaged
– Background kernel thread
– Periodically allocates a large page
– “Merges” large page into address space of any process requesting THP support
– Requires global page table lock
– Driven by OS heuristics: no knowledge of application workload
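To check whether THP is active on a given Linux node, the current mode can be read from sysfs, where the active setting is shown in square brackets (for example "always [madvise] never"). The helper below is a small sketch; the function names are hypothetical, and it returns None on systems where the file is absent.

```python
import re
from pathlib import Path

# Standard Linux sysfs location for the THP mode.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text: str) -> str:
    """Return the bracketed (active) mode from the sysfs file contents."""
    match = re.search(r"\[(\w+)\]", text)
    if match is None:
        raise ValueError("no active mode found")
    return match.group(1)


def current_thp_mode(path: Path = THP_ENABLED):
    """Read the active THP mode, or None if it cannot be determined."""
    try:
        return parse_thp_mode(path.read_text())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print("THP mode:", current_thp_mode())
```

A mode of "always" means both the page fault path and khugepaged may install large pages for any process, which is the configuration whose background merging the slide describes.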
9
Transparent Huge Pages
[Figure: page fault measurements. Large page faults are shown in green; small faults delayed by merges are shown in blue.]
Generally periodic, but not synchronized
Variability increases dramatically under additional load
10
HugeTLBfs
– RAM-based filesystem supporting large page allocation
– Requires pre-allocated memory pools reserved by system administrator
– Access generally managed through libhugetlbfs
Limitations
– Cannot back process stacks and other special regions
– VMA permission/alignment constraints
– Highly susceptible to overhead from system load
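The pre-allocated pools mentioned above are visible in /proc/meminfo through the HugePages_* and Hugepagesize fields. The following sketch parses those fields from text; the function names are hypothetical, and the sample string in the usage section is made up for illustration rather than taken from a real system.

```python
def parse_hugepage_pool(meminfo_text: str) -> dict:
    """Extract HugePages_* counters and Hugepagesize (in kB) from meminfo text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and (key.startswith("HugePages_") or key == "Hugepagesize"):
            # First token after the colon is the numeric value.
            fields[key] = int(rest.split()[0])
    return fields


def free_pool_bytes(fields: dict) -> int:
    """Bytes still available in the reserved huge page pool."""
    return fields["HugePages_Free"] * fields["Hugepagesize"] * 1024


if __name__ == "__main__":
    # Illustrative input only; on Linux, read /proc/meminfo instead.
    sample = (
        "MemTotal:       16000000 kB\n"
        "HugePages_Total:      64\n"
        "HugePages_Free:       60\n"
        "Hugepagesize:       2048 kB\n"
    )
    pool = parse_hugepage_pool(sample)
    print(pool, free_pool_bytes(pool))
```

Memory counted in HugePages_Total is reserved for the pool whether or not it is in use, which is why a large reservation shrinks what remains for ordinary small page allocations.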
11
HugeTLBfs
Overhead of small page faults increases substantially
Due to memory exhaustion
HugeTLBfs memory is removed from the pools available to the small page fault handler
12
HPMMAP Overview
High Performance Memory Mapping and Allocation Platform
– Lightweight memory management for unmodified Linux applications
HPMMAP borrows from the Kitten LWK to impose isolated virtual and physical memory management layers
Provide lightweight versions of memory management system calls
Utilize Linux memory offlining to completely manage large contiguous regions
Memory available in regions of no less than 128 MB
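The 128 MB granularity implies that requests are rounded up to whole regions. The sketch below illustrates only that bookkeeping; RegionPool and regions_needed are hypothetical names and do not represent HPMMAP's actual interface or allocation policy.

```python
REGION_BYTES = 128 * 1024 * 1024  # region size stated on the slide


def regions_needed(request_bytes: int) -> int:
    """Round a request up to a whole number of 128 MB regions."""
    if request_bytes <= 0:
        raise ValueError("request must be positive")
    return -(-request_bytes // REGION_BYTES)  # ceiling division


class RegionPool:
    """Hypothetical pool of offlined regions handed out whole."""

    def __init__(self, region_ids):
        self.free = sorted(region_ids)
        self.owned = {}  # pid -> list of region ids

    def allocate(self, pid: int, request_bytes: int):
        n = regions_needed(request_bytes)
        if n > len(self.free):
            raise MemoryError("not enough free regions")
        taken, self.free = self.free[:n], self.free[n:]
        self.owned.setdefault(pid, []).extend(taken)
        return taken

    def release(self, pid: int):
        # Return every region owned by the process to the pool.
        self.free = sorted(self.free + self.owned.pop(pid, []))


if __name__ == "__main__":
    pool = RegionPool(range(8))            # 8 regions = 1 GB
    print(pool.allocate(1234, 300 << 20))  # 300 MB request takes 3 regions
```

Handing out whole regions trades some internal fragmentation for simple, contention-free bookkeeping, which fits the slide's emphasis on large contiguous regions.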
13
HPMMAP Application Integration
14
Results
15
Evaluation – Multi-Node Scaling
Sandia cluster (8 nodes, 1Gb Ethernet)
– One co-located 4-core parallel kernel build per node
No over-committed cores
32-rank improvement: 12% for HPCCG, 9% for miniFE, 2% for LAMMPS
miniFE
Network overhead past 4 cores
Single node variability translates into worse scaling (3% improvement in single node experiment)
16
HPC in the cloud
Clouds are starting to look like supercomputers… But we’re not there yet
– Noise issues
– Poor isolation
– Resource contention
– Lack of control over topology
Very bad for tightly coupled parallel apps
– Require specialized environments that solve these problems
Approaching convergence
– Vision: Dynamically partition cloud resources into HPC and commodity zones
17
Multi-stack Clouds
With Jiannan Ouyang and Brian Kocoloski
Virtualization overhead is not due to hardware costs
– Results from underlying host OS/VMM architectures and policies
– Susceptible to performance overhead and interference
Goal: provide isolated HPC VMs on commodity systems
– Each zone optimized for the target applications
[Figure: software stack diagram. Labels: Hardware; Kitten (Lightweight Kernel); Palacios VMM; Isolated VM; Linux; KVM; Commodity VM(s).]
18
Multi-OS Architecture
Goals:
– Fully isolated and independent operation
– OS bypass communication
– No cross-kernel dependencies
Needed modifications:
– Boot process that initializes subset of offline resources
– Dynamic resource (re)assignment to the Kitten LWK
– Cross-stack shared memory communication
– Block driver interface
19
Isolatable Hardware
We view system resources as a collection of Isolatable Units
– In terms of both performance and management
Some hardware makes this easy
– PCI (w/ MSI, MSI-X)
– APIC
Some hardware makes this difficult
– SATA
– IO-APIC
– IOMMU
Some hardware makes this impossible
– Legacy IDE
– PCI (w/ legacy PCI-INTx IRQs)
Some hardware cannot be completely isolated
– SR-IOV PCI devices
– HyperThreaded CPU cores
20
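[Figure: two-socket node diagram. Socket 1 has cores 1-4 and local memory; Socket 2 has cores 5-8 and local memory; devices shown are NIC, Infiniband, SATA, and PCI. Resources are marked as Linux, Offline, or Kitten.]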
28
Multi-stack Architecture
Allow multiple dynamically created enclaves
– Based on runtime isolation requirements
– Provides flexibility of fully independent OS/Rs
Isolated performance and resource management
[Figure: multi-stack architecture diagrams. Labels: Hardware; Linux; KVM; Commodity VM(s); Commodity Application(s); Kitten (1); Kitten (2); Kitten LWK (Lightweight Kernel); Palacios VMM; HPC VM; HPC App; HPC Application.]
29
Performance Evaluation
8-node Infiniband cluster
– Space shared between commodity and HPC workloads
Commodity: Hadoop
HPC: HPCCG
– Infiniband passthrough for HPC VM
– 1Gb Ethernet passthrough for commodity VM
Compared multi-stack (Kitten + Palacios) vs. full Linux environment (KVM)
– 10 experiment runs for each configuration
CAVEAT: VM disks were all accessed from the commodity partition
– Suffers significant interference (current work)

                    Linux + KVM    Multi-stack (Kitten + Palacios)
Mean Runtime (s)    55.754         52.36
Std Dev             2.231433022    0.386551707
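The reported means and standard deviations can be compared through the coefficient of variation (standard deviation divided by mean) and the relative runtime improvement. The sketch below assumes the fused figures in the results table split as 55.754 and 52.36 for mean runtime and 2.231433022 and 0.386551707 for standard deviation; that split is an interpretation of the extracted text, and the function names are hypothetical.

```python
def coefficient_of_variation(mean: float, std_dev: float) -> float:
    """Standard deviation relative to the mean (dimensionless)."""
    return std_dev / mean


def relative_improvement(baseline: float, candidate: float) -> float:
    """Fractional runtime reduction of candidate versus baseline."""
    return (baseline - candidate) / baseline


# (mean runtime in seconds, standard deviation), assuming the split above.
LINUX_KVM = (55.754, 2.231433022)
MULTI_STACK = (52.36, 0.386551707)

if __name__ == "__main__":
    print("CoV Linux + KVM :", coefficient_of_variation(*LINUX_KVM))
    print("CoV Multi-stack :", coefficient_of_variation(*MULTI_STACK))
    print("Mean improvement:", relative_improvement(LINUX_KVM[0], MULTI_STACK[0]))
```

Under that reading, run-to-run variation is roughly 4% of the mean for Linux + KVM and under 1% for the multi-stack configuration, so the larger difference between the two is in variability rather than in mean runtime.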
30
Conclusion
Commodity systems are not designed to support HPC workloads
– Different requirements and behaviors than commodity applications
A multi-stack approach can provide HPC environments in commodity systems
– HPC requirements can be met without separate physical systems
– HPC and commodity workloads can dynamically share resources
– Isolated system software environments are necessary
31
Thank you
Jack Lange, Assistant Professor, University of Pittsburgh
jacklange@cs.pitt.edu
http://www.cs.pitt.edu/~jacklange
32
Multi-stack Operating Systems
Future exascale systems are moving towards in situ organization
Applications traditionally have utilized their own platforms
– Visualization, storage, analysis, etc.
Everything must now collapse onto a single platform
33
Performance Comparison
[Figure: two panels, Linux Memory Management and Lightweight Memory Management. Annotations: occasional outliers (large page coalescing); low-level noise.]