User-Level Process towards Exascale Systems Akio Shimada [1], Atsushi Hori [1], Yutaka Ishikawa [1], Pavan Balaji [2] [1] RIKEN AICS, [2] Argonne National Laboratory

Background
MPI processes running on an HPC cluster communicate with each other to exchange data for parallel computation
  – An MPI process must wait for each communication to complete
Latency hiding is an important issue on the way to Exascale systems
  – The network of an HPC cluster will become larger

Methods for Latency Hiding
Non-blocking communication
  – Overlapping communication and computation
Oversubscription
  – Binding multiple processes to one CPU core
  – Switching to another process when the running process blocks waiting for a communication to complete

Problem
Process context switch is slow
  – The overhead of the process context switch spoils the benefit of process oversubscription in some cases [Iancu et al., IPDPS 2010]
      The overhead of jumping into the kernel context
      The overhead of address space switching

Conventional Approach
Oversubscription using user-level threads (e.g. FG-MPI)
  – Invoking multiple user-level threads within a process
  – Assigning the role of an MPI process to each user-level thread
Pros and cons
  – Pros
      Fast context switch
        – The context switch between user-level threads can be performed in user space
        – The context switch between user-level threads does not require address space switching
  – Cons
      Modification to the application is required
        – Program code (text) and data (data, bss and heap) are shared among the user-level threads that play the role of MPI processes

Our Solution
User-level process (ULP)
  – A ULP is a "process" that can be scheduled in user space
      The ULP has the beneficial features of the user-level thread
      The ULP has its own program code and data (therefore, we equate the ULP with a "process")
  – Capability of the ULP
      The ULP enables low-overhead process oversubscription
      Modification to the application is not required

                                   Kernel-level Process   User-level Thread   User-level Process
Context switch                     Slow                   Fast                Fast
Modification to the application    Not required           Required            Not required

Overview of User-level Process
[Figure: four panels, (a) Kernel-level Process, (b) User-level Process, (c) Kernel-level Thread, (d) User-level Thread. Each panel shows the text, data, bss, heap and stack segments, the address space boundaries, the execution contexts, the CPU cores (C), and whether the task scheduler runs in kernel space or in user space.]
The ULP can be scheduled in user space
  – Low-overhead oversubscription can be achieved by avoiding the overhead of the process context switch
The ULP has its own program code and data
  – Modification to the application is not required

Address Space Design
[Figure: address space layouts from low to high address for three cases. Process: TEXT, DATA&BSS, HEAP, STACK, KERNEL. User-level Thread: TEXT, DATA&BSS, HEAP, STACK 0, STACK 1, STACK 2, ..., STACK N-1, KERNEL. User-level Process: partitions ULP 0, ULP 1, ULP 2, ..., ULP N-1, then KERNEL.]

Context Switch
[Figure: context switch from ULP 0 to ULP 1. The address space (low to high) holds a partition for ULP 0 and a partition for ULP 1, each with text, data & bss, heap and stack. ① Save the context (registers) of user-level process 0; ② load the context of user-level process 1 onto the CPU core.]
Segment registers must be considered on x86_64 architectures
  – Segment registers are not accessible from user space
  – The fs register is used for implementing Thread Local Storage (TLS)
  – Thread-safe functions must be built without using TLS
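
The save/load steps above can be illustrated with the POSIX ucontext calls. This is only a sketch of a user-level context switch in general, not the ULP implementation: both contexts here share one text and data segment (as user-level threads do), there are no address space partitions, and the fs register is not touched. The names run_switch_demo, task and ROUNDS are made up for this example. Note also that glibc's swapcontext saves and restores the signal mask with a system call, so it is not a pure user-space switch.

```c
#define _XOPEN_SOURCE 600
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)
#define ROUNDS 3                      /* switches performed by each context */

static ucontext_t main_ctx, ctx[2];   /* saved register sets */
static char stacks[2][STACK_SIZE];    /* one private stack per context */
static char trace[2 * ROUNDS + 1];    /* records which context ran, in order */
static size_t trace_len;

/* Body of each context: record our id, then switch to the other context. */
static void task(int id)
{
    for (int i = 0; i < ROUNDS; i++) {
        trace[trace_len++] = (char)('0' + id);
        /* (1) save the registers of context `id`,
           (2) load the registers of the other context */
        swapcontext(&ctx[id], &ctx[1 - id]);
    }
    /* Returning from here resumes uc_link, i.e. main_ctx. */
}

/* Runs the two contexts and returns the order in which they executed. */
const char *run_switch_demo(void)
{
    trace_len = 0;
    for (int i = 0; i < 2; i++) {
        getcontext(&ctx[i]);
        ctx[i].uc_stack.ss_sp = stacks[i];
        ctx[i].uc_stack.ss_size = STACK_SIZE;
        ctx[i].uc_link = &main_ctx;
        makecontext(&ctx[i], (void (*)(void))task, 1, i);
    }
    swapcontext(&main_ctx, &ctx[0]);  /* enter context 0 */
    trace[trace_len] = '\0';
    return trace;
}
```

Calling run_switch_demo() from a main function returns the string "010101": the two contexts alternate, and the scheduler in the kernel is never asked to pick the next one.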

ULP API
int pvas_ulp_create(int *pvd)
  – pvas_ulp_create creates an address space for ULPs
int pvas_ulp_destroy(int pvd)
  – pvas_ulp_destroy destroys a created address space
int pvas_ulp_spawn(int pvd, int pvid, char *filename, char **argv, char **environ)
  – pvas_ulp_spawn spawns a kernel-level process with a ULP
int pvas_ulp_exec(int pvid, char *filename, char **argv, char **environ)
  – pvas_ulp_exec creates and executes a new ULP
int pvas_ulp_switch(int pvid)
  – pvas_ulp_switch performs a context switch from the current ULP to the indicated ULP

Preliminary Evaluation (context switch performance)
Benchmark
  – Invoking multiple parallel processes on a single CPU core
  – A parallel process may be a kernel-level process, a kernel-level thread, a user-level thread or a user-level process
  – Measuring the time elapsed until every parallel process has performed 1000 context switches
The performance of the ULP is competitive with that of the user-level thread
Environment
  CPU: Intel Xeon X GHz
  OS : Linux el6 for x86_64
[Figure: context switch time for each kind of parallel process; lower is better]
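
The measurement described above can be approximated for the user-level thread case with the sketch below. It is not the authors' benchmark: it uses POSIX ucontext contexts scheduled round-robin in one kernel-level process, and the names bench_switches, MAX_TASKS and STACK_SIZE are assumptions made for this example. Each of ntasks contexts performs `switches` context switches, and the wall-clock time for the whole run is taken with clock_gettime.

```c
#define _XOPEN_SOURCE 600
#include <assert.h>
#include <time.h>
#include <ucontext.h>

#define MAX_TASKS 16
#define STACK_SIZE (64 * 1024)

static ucontext_t main_ctx, task_ctx[MAX_TASKS];
static char stacks[MAX_TASKS][STACK_SIZE];
static int ntasks_g, switches_g;
static long switch_count;             /* total switches performed so far */

/* Each task hands the CPU core to its round-robin successor. */
static void task(int id)
{
    int next = (id + 1) % ntasks_g;
    for (int i = 0; i < switches_g; i++) {
        switch_count++;
        swapcontext(&task_ctx[id], &task_ctx[next]);
    }
    /* Task 0 finishes first and returns to main_ctx through uc_link. */
}

/* Runs the benchmark. Returns the number of switches performed, or -1 on
   invalid arguments or failure. The elapsed time in seconds is stored in
   *elapsed_sec when the pointer is not NULL. */
long bench_switches(int ntasks, int switches, double *elapsed_sec)
{
    if (ntasks < 2 || ntasks > MAX_TASKS || switches < 1)
        return -1;
    ntasks_g = ntasks;
    switches_g = switches;
    switch_count = 0;
    for (int i = 0; i < ntasks; i++) {
        if (getcontext(&task_ctx[i]) != 0)
            return -1;
        task_ctx[i].uc_stack.ss_sp = stacks[i];
        task_ctx[i].uc_stack.ss_size = STACK_SIZE;
        task_ctx[i].uc_link = &main_ctx;
        makecontext(&task_ctx[i], (void (*)(void))task, 1, i);
    }
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (swapcontext(&main_ctx, &task_ctx[0]) != 0)
        return -1;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    if (elapsed_sec)
        *elapsed_sec = (double)(t1.tv_sec - t0.tv_sec)
                     + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return switch_count;
}
```

For example, bench_switches(4, 1000, &t) runs four contexts with 1000 switches each and returns 4000. The time stored in t depends on the machine and is not comparable with the figures in the slides, which were measured on the environment listed above.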

Summary and Future Work
Summary
  – The ULP enables low-overhead oversubscription by avoiding the overhead of the process context switch
  – Oversubscription using the ULP does not require any modification to the application
Future work
  – Embed the capability of the ULP in the MPI runtimes and evaluate it