Sharing Memory: A Kernel Approach
High Performance Computing for High Energy Physics
AA meeting, March '09
Vincenzo Innocente, MultiCore R&D

Bringing IA Programmability and Parallelism to High Performance & Throughput Computing
- Highly parallel, IA-programmable architecture in development
- Ease of scaling for the software ecosystem
- Array of enhanced IA cores
- New cache architecture
- New vector processing unit
- Scalable to TFLOPS performance
(Diagram: array of "IA++" cores with cache and special-function & I/O blocks. Future options subject to change without notice.)

Event parallelism
- Opportunity: the reconstruction memory footprint is dominated by large condition data
- How to share common data between different processes?
  - Multi-process vs multi-threaded
  - Read-only data: copy-on-write, shared libraries
  - Read-write data: shared memory or files (a sketch of the read-write option follows this list)
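
As an illustration of the read-write option, the sketch below shows how one process could place common data in POSIX shared memory so that other processes map the same physical pages. This is a minimal, hypothetical example (the segment name and size are invented), not the mechanism used by the experiments.

    // build hint: on older glibc, link with -lrt for shm_open/shm_unlink
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <cstdio>
    #include <cstring>

    int main() {
      const char* name = "/conditions";          // hypothetical segment name
      const size_t size = 64 * 1024 * 1024;      // hypothetical 64 MB of condition data

      // Create and size the shared-memory segment (one process does this once).
      int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
      if (fd < 0 || ftruncate(fd, size) != 0) { perror("shm_open/ftruncate"); return 1; }

      // Map it; every process that maps the same name sees the same physical pages.
      void* mem = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
      if (mem == MAP_FAILED) { perror("mmap"); return 1; }

      std::memset(mem, 0, size);                 // in reality: fill with condition data

      munmap(mem, size);
      close(fd);
      shm_unlink(name);                          // remove the segment when no longer needed
      return 0;
    }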

Experience and requirements
- Complex and dispersed software: difficult to manage/share/tune resources (memory, I/O) without support from the OS and compiler
- Coding and maintaining thread-safe software at user level is difficult
  - Need automatic tools to identify code to be made thread-aware
  - Geant4: 10K lines modified! Not enough: many hidden (optimization) details remain, such as state caches
- Multi-process seems more promising
  - ATLAS: fork() (exploits copy-on-write), shmem (needs library support)
  - LHCb: Python, PROOF-Lite
- Other limitations are at the door (I/O, communication, memory)
  - PROOF: client-server communication overhead within a single box
  - PROOF-Lite: I/O bound with more than 2 processes per disk
  - Online (ATLAS, CMS): limit on in/out-bound connections to one box

Exploit Copy-on-Write
- Modern OSes share read-only pages among processes dynamically: a memory page is copied and made private to a process only when it is modified (a minimal fork-after-initialization sketch follows this list)
- Prototypes in ATLAS and LHCb (the latter using WP8 personnel)
- Encouraging results as far as memory sharing is concerned (about 50% shared)
- Concerns about I/O (need to merge the output of multiple processes)
Memory (ATLAS): one process uses 700 MB VMem and 420 MB RSS. With COW:
  (before) evt 0:  private: 004 MB | shared: 310 MB
  (before) evt 1:  private: 235 MB | shared: 265 MB
  ...
  (before) evt 50: private: 250 MB | shared: 263 MB
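
To make the copy-on-write mechanism concrete, here is a minimal sketch (not the ATLAS or LHCb prototype code) of the fork-after-initialization pattern: the parent loads the condition data once, then forks workers, which keep sharing the read-only pages until they write to them.

    #include <sys/wait.h>
    #include <unistd.h>
    #include <vector>
    #include <cstdio>

    // Hypothetical placeholder for the large, mostly read-only condition data.
    std::vector<double> conditions;

    static void load_conditions() { conditions.assign(50'000'000, 1.0); }  // ~400 MB

    static void process_events(int worker) {
      // Reads of `conditions` touch the parent's pages: no copy is made.
      // Only pages the worker actually writes to become private (copy-on-write).
      std::printf("worker %d sees %zu condition values\n", worker, conditions.size());
    }

    int main() {
      load_conditions();                    // initialize once, before forking
      const int nWorkers = 4;
      for (int i = 0; i < nWorkers; ++i) {
        if (fork() == 0) {                  // child: shares pages with the parent
          process_events(i);
          _exit(0);
        }
      }
      while (wait(nullptr) > 0) {}          // parent waits for all workers
      return 0;
    }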

Kernel Shared Memory (KSM)
- KSM is a Linux driver that dynamically shares identical memory pages between one or more processes
- It was developed as a backend of KVM to help memory sharing between virtual machines running on the same host
- User code registers with KSM the memory pages that are potentially sharable (a registration sketch follows this list)
- KSM keeps them in a hashed data structure, with hashing based on page content
- KSM scans the registered pages and merges those that are identical
- Merged pages are marked read-only, so standard COW splits them again if they are modified
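
These slides describe the early /dev/ksm prototype; in the KSM version later merged into the mainline kernel, registration is done with madvise(MADV_MERGEABLE) on page-aligned anonymous memory. The sketch below uses that interface, so treat it as an illustration of the registration idea rather than the exact API discussed here.

    #include <sys/mman.h>
    #include <unistd.h>
    #include <cstdio>
    #include <cstring>

    int main() {
      const size_t size = 256 * 1024 * 1024;   // a large, page-aligned anonymous region

      void* region = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (region == MAP_FAILED) { perror("mmap"); return 1; }

      // Tell KSM that this region is a candidate for merging. The KSM daemon will
      // later scan it, and identical pages (here: all of them, since the region is
      // filled with the same byte) can be collapsed into one read-only page shared
      // via copy-on-write.
      if (madvise(region, size, MADV_MERGEABLE) != 0) {
        perror("madvise(MADV_MERGEABLE)");    // e.g. kernel built without CONFIG_KSM
        return 1;
      }

      std::memset(region, 0x5a, size);         // identical content across all pages
      pause();                                 // keep the mapping alive while ksmd scans
      return 0;
    }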

Use of KSM
- All interactions with the KSM daemon go through the pseudo-file /dev/ksm; registering a page requires writing to /dev/ksm
- KSM registration and, even more, the KSM check-and-merge algorithm are not cheap: KSM can easily eat 100% of a CPU
  - Registered pages should be large; best is to register only pages worth sharing
  - De-registration is not yet possible: do not use KSM in case of memory churn!
  - Let KSM check for identical pages only when many potential merges are possible
- Two strategies are possible (a watchdog sketch follows this list):
  - Control KSM at the application level
  - Use a "system-wide" watchdog
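
A minimal watchdog sketch, assuming the mainline KSM sysfs interface under /sys/kernel/mm/ksm (the run, pages_to_scan, sleep_millisecs and pages_sharing knobs) rather than the /dev/ksm pseudo-file of the prototype: it turns the scanner on, polls progress, and stops scanning once the number of shared pages no longer grows, so KSM does not keep burning CPU when nothing is left to merge.

    #include <fstream>
    #include <string>
    #include <thread>
    #include <chrono>
    #include <iostream>

    static void write_knob(const std::string& name, const std::string& value) {
      std::ofstream f("/sys/kernel/mm/ksm/" + name);   // writing requires root privileges
      f << value;
    }

    static long read_knob(const std::string& name) {
      std::ifstream f("/sys/kernel/mm/ksm/" + name);
      long v = 0; f >> v; return v;
    }

    int main() {
      write_knob("pages_to_scan", "1000");   // pages scanned per wake-up of ksmd
      write_knob("sleep_millisecs", "50");   // pause between scan batches
      write_knob("run", "1");                // start the scanner

      long previous = -1;
      for (;;) {
        std::this_thread::sleep_for(std::chrono::seconds(30));
        long sharing = read_knob("pages_sharing");     // pages currently deduplicated
        std::cout << "pages_sharing = " << sharing << '\n';
        if (sharing == previous) break;                // no further progress: stop scanning
        previous = sharing;
      }
      write_knob("run", "0");                // stop ksmd but keep already-merged pages
      return 0;
    }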

Testing "KSM"
- Ad-hoc memory-block registration is intrusive: it requires a specific allocator, an effort similar to using explicit shared memory
- Generic registration can be performed using a "malloc hook", and it happens that we all have malloc hooks at hand (usually LD_PRELOADed)
- Test performed by "retrofitting" TCMalloc with KSM: just one single line of code added! (a sketch of the idea follows this list)
- TCMalloc NEVER gives memory back to the OS: perfect for KSM!
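
The exact line added to TCMalloc is not shown on the slide; the sketch below is only a plausible illustration of the idea, assuming the mainline madvise interface: when the allocator grabs a fresh region from the kernel, one extra call marks it as a KSM candidate, and everything subsequently carved out of that region becomes sharable.

    #include <sys/mman.h>
    #include <cstdio>
    #include <cstddef>

    // Hypothetical stand-in for the allocator's "system alloc" path, i.e. the place
    // where a TCMalloc-like allocator asks the kernel for a new chunk of address space.
    static void* system_alloc(std::size_t bytes) {
      void* region = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (region == MAP_FAILED) return nullptr;

      // The "one line" retrofit: register the whole region with KSM so that any
      // identical pages later placed in it can be merged across processes.
      madvise(region, bytes, MADV_MERGEABLE);

      return region;
    }

    int main() {
      void* p = system_alloc(16 * 1024 * 1024);   // 16 MB test allocation
      std::printf("allocated KSM-registered region at %p\n", p);
      return 0;
    }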

Testing "KSM"
CMS:
- Reconstruction of real data (cosmics with the full detector)
- No code change (used the patched TCMalloc)
- 400 MB private data; 250 MB shared data; 130 MB shared code
- Prototype of a simple KSM watchdog script
ATLAS:
- In a reconstruction job with 1.6 GB of virtual memory, up to 1 GB can be shared with KSM
- KSM controlled from within ATHENA

Future Work
- KSM, like any other shared-memory approach, requires a job submission scheme that can run multiple processes/threads on the same box
  - A general critical item for effective multi-core exploitation
- In addition, KSM needs to be controlled either by the application or by a watchdog
  - Red Hat is developing "something"
  - Experiments may integrate KSM control in their frameworks (at application or workflow level)
- It may in any case be useful to separate the memory used for long-term constant data from the memory used for tactical allocation (see the sketch below)
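
One way to read the last point is to give long-lived constant data its own arena, registered with KSM, while per-event (tactical) allocations stay on the normal heap and never reach the merge scanner. The sketch below is a hypothetical illustration of that split, assuming the madvise interface; names such as ConditionArena are invented.

    #include <sys/mman.h>
    #include <cstddef>
    #include <new>

    // Hypothetical bump allocator for long-lived, read-mostly condition data.
    // The whole arena is registered with KSM once; per-event objects are NOT
    // allocated here, so memory churn never reaches the merge scanner.
    class ConditionArena {
    public:
      explicit ConditionArena(std::size_t bytes) : size_(bytes) {
        void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) throw std::bad_alloc();
        madvise(p, bytes, MADV_MERGEABLE);       // register the whole arena with KSM
        base_ = static_cast<char*>(p);
      }
      void* allocate(std::size_t bytes) {
        std::size_t aligned = (bytes + 15) & ~std::size_t(15);   // 16-byte alignment
        if (used_ + aligned > size_) throw std::bad_alloc();
        void* p = base_ + used_;
        used_ += aligned;
        return p;
      }
    private:
      char* base_ = nullptr;
      std::size_t size_ = 0;
      std::size_t used_ = 0;
    };

    int main() {
      ConditionArena arena(128 * 1024 * 1024);   // long-term constant data lives here
      void* calib = arena.allocate(4 * 1024 * 1024);
      (void)calib;
      int* event_buffer = new int[1024];         // tactical, per-event allocation: plain heap
      delete[] event_buffer;
      return 0;
    }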

Summary
- KSM has demonstrated to be the most powerful and least intrusive mechanism to share read-only data among "similar" processes
- A new, faster, more secure version is already available (arrived tonight!)
  - It should be easy to deploy with SL5; it requires malloc hooks and user access to /dev/ksm
- The most promising deployment is as a fully kernel-internal mechanism
  - Possibly available in a future RHEL6; difficult to predict whether it is back-portable to SL5
- KSM will only reduce/delay the real problem: experiments must work hard to reduce their memory footprint and to better control memory allocation and locality