MP3: VIRTUAL MEMORY PAGE FAULT MEASUREMENT Keun Soo Yim University of Illinois at Urbana-Champaign Department of Computer Science CS423 – Fall 2011.


GOAL
A Linux kernel module to profile VM system events
• Understand the Linux virtual-to-physical page mapping and page fault rate.
• Design a lightweight tool that can profile the page fault rate.
• Implement the profiler tool as a Linux kernel module.
• Learn how to use the kernel-level APIs for work queues, character device drivers, vmalloc(), and mmap().
• Test the kernel-level profiler by using a given user-level benchmark program.
• Analyze, plot, and document the profiled data as a function of the workload characteristics.

INTRODUCTION
• Due to the growing performance gap between memory and disk, the management efficiency of the OS virtual memory (VM) system becomes more important.
  - Inefficient replacement of pages can seriously harm the response time of user-level programs.
• To optimize the VM system, it is necessary to understand the characteristics of the current VM system under various workloads.

METRICS
• Major and minor page fault counts
  - A major page fault is a fault handled by using a disk I/O operation (e.g., a memory-mapped file, or page replacement causing a page swap).
  - A minor page fault is a fault handled without using a disk I/O operation (e.g., first access to memory allocated by malloc()).
  - Plotted as a function of the allocated memory size, these counts show the thrashing effect.
• CPU utilization
  - Plotted as a function of the degree of multiprogramming, it shows the correlation between workload size and system utilization.
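The two fault counters can also be observed from user space, which helps build intuition before writing the kernel module. The following is a minimal sketch, not the MP3 profiler itself: it assumes a Linux system and uses the standard getrusage() call, and the helper names read_fault_counts and touch_fresh_pages are hypothetical. It maps fresh anonymous pages and writes to each one, which is normally served by minor faults only.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Snapshot of the calling process's page fault counters. */
struct fault_counts {
    long minor; /* faults handled without disk I/O */
    long major; /* faults that required disk I/O */
};

/* Read the current counters; returns 0 on success, -1 on failure. */
static int read_fault_counts(struct fault_counts *out)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    out->minor = ru.ru_minflt;
    out->major = ru.ru_majflt;
    return 0;
}

/*
 * Map `pages` fresh anonymous pages, write one byte to each, and return
 * the number of minor faults observed while doing so (-1 on error).
 * The exact count depends on the kernel (e.g., huge pages may lower it).
 */
static long touch_fresh_pages(size_t pages)
{
    long page_size = sysconf(_SC_PAGESIZE);
    size_t len = pages * (size_t)page_size;
    struct fault_counts before, after;
    unsigned char *mem;
    size_t i;

    mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    if (read_fault_counts(&before) != 0) {
        munmap(mem, len);
        return -1;
    }
    for (i = 0; i < pages; i++)
        mem[i * (size_t)page_size] = 1; /* first touch of each page */
    if (read_fault_counts(&after) != 0) {
        munmap(mem, len);
        return -1;
    }
    munmap(mem, len);
    return after.minor - before.minor;
}
```

Calling touch_fresh_pages(64) from a small test program and printing the result gives a rough feel for how first-touch accesses translate into minor faults; major faults usually only appear once pages have to come from disk.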

MEASUREMENT CHALLENGE
• To accurately measure such metrics, many profiling operations are needed in a short time interval.
• Because such data are available only in the OS kernel address space, this would cause a non-negligible performance overhead:
  - Switching contexts between user and kernel mode, and copying data between these two address spaces.

A SOLUTION
• This measurement overhead problem can be addressed by using mmap():
  - mmap() creates a shared buffer between the OS kernel and the user-level process.
  - A set of physical pages allocated in the kernel space is mapped into the virtual address space of the user-level process.
  - The user-level process can then access the data stored in the buffer without any overhead other than the memory accesses themselves.

OVERVIEW
• A kernel module to profile page fault counts and CPU utilization of registered processes.
[Figure: the MP3 profiler kernel module inside the Linux kernel, with Work Process 1 (100MB), Work Process 2 (10MB), Work Process 3 (1GB), a monitor process, and a disk used for post-mortem analysis.]

INTERFACE OF KERNEL MODULE
• Three types of interfaces between the OS kernel module and user processes:
  - a proc file
  - a character device driver
  - a shared memory area

PROC FILE SYSTEM
• Proc filesystem entry (/proc/mp3/status)
  - Register: the application notifies the module of its intent to have its page fault rate and utilization monitored.
    ○ ‘R ’
  - Deregister: the application notifies the module that it has finished using the profiler.
    ○ ‘U ’
  - Read Registered Task List: queries which applications are registered.
    ○ Returns a list with the PID of each application.
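The slide gives only the command letters, so the exact message layout is not specified here. As a sketch under the assumption that a message is the command letter followed by a space and the decimal PID (for example "R 1234"), the helpers below build and validate such a message. Both format_command and parse_command are hypothetical names; the parsing logic mirrors what a proc write handler would do after copying the string from user space.

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Build a message such as "R 1234" (ASSUMED format: letter, space, PID).
 * Returns the message length, or -1 if the buffer is too small.
 */
static int format_command(char *buf, size_t size, char cmd, pid_t pid)
{
    int n = snprintf(buf, size, "%c %ld", cmd, (long)pid);

    return (n > 0 && (size_t)n < size) ? n : -1;
}

/*
 * Parse a message in the same assumed format.
 * Accepts only 'R' (register) and 'U' (deregister) with a positive PID.
 * Returns 0 on success, -1 on malformed input.
 */
static int parse_command(const char *msg, char *cmd, long *pid)
{
    char c;
    long p;

    if (sscanf(msg, " %c %ld", &c, &p) != 2)
        return -1;
    if ((c != 'R' && c != 'U') || p <= 0)
        return -1;
    *cmd = c;
    *pid = p;
    return 0;
}
```

A work process would write the formatted string to /proc/mp3/status with an ordinary write() or fprintf() call; rejecting unknown letters and non-positive PIDs early keeps the registered-task list consistent.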

CHAR DEVICE & SHARED MEM
• A character device driver is used as a control interface of the shared memory.
  - Map Shared Memory (i.e., mmap()): maps the profiler buffer memory allocated in the kernel address space to the virtual address space of a requesting user-level process.
• Shared memory
  - Normal memory access: used to deliver profiled data from the kernel to user processes.

SYNTHETIC WORKLOAD
• Work program (given for case studies)
  - A single-threaded user-level application with three parameters: memory size, locality pattern, and memory access count per iteration.
    ○ Allocates the requested amount of virtual memory (e.g., up to 1GB).
    ○ Accesses it with a certain locality pattern (i.e., random or temporal locality) for the requested number of times.
    ○ The access step is repeated 20 times.
  - Multiple instances of this program can be created (i.e., forked) simultaneously.
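The given work program is not reproduced in the slides. To make its three parameters concrete, here is a minimal sketch of how such a workload could be structured. Everything in it is an assumption for illustration: the name run_work, the xorshift generator, and in particular the 64KB window used to imitate temporal locality.

```c
#include <stdint.h>
#include <stdlib.h>

enum access_pattern { ACCESS_RANDOM, ACCESS_LOCAL };

/* Small xorshift generator so the sketch has no hidden global state. */
static uint32_t next_random(uint32_t *state)
{
    uint32_t x = *state;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/*
 * Allocate `size_bytes`, then perform `accesses_per_iter` one-byte
 * updates in each of `iterations` rounds. ACCESS_RANDOM spreads the
 * updates over the whole block; ACCESS_LOCAL confines each round to a
 * window (HYPOTHETICAL size: 64KB) at a random base address.
 * Returns the total number of accesses, or -1 on bad input or
 * allocation failure. Intended for sizes below 4GB.
 */
static long run_work(size_t size_bytes, enum access_pattern pattern,
                     long accesses_per_iter, int iterations)
{
    const size_t max_window = 64 * 1024;
    size_t window, base = 0;
    volatile unsigned char *mem;
    uint32_t rng = 2463534242u;
    long total = 0;
    long a;
    int it;

    if (size_bytes == 0 || accesses_per_iter < 0 || iterations < 0)
        return -1;
    mem = malloc(size_bytes);
    if (mem == NULL)
        return -1;
    window = size_bytes < max_window ? size_bytes : max_window;

    for (it = 0; it < iterations; it++) {
        if (pattern == ACCESS_LOCAL)
            base = next_random(&rng) % (size_bytes - window + 1);
        for (a = 0; a < accesses_per_iter; a++) {
            size_t idx = (pattern == ACCESS_RANDOM)
                             ? next_random(&rng) % size_bytes
                             : base + next_random(&rng) % window;
            mem[idx]++; /* touching the byte may fault the page in */
            total++;
        }
    }
    free((void *)mem);
    return total;
}
```

For example, run_work(512u * 1024 * 1024, ACCESS_RANDOM, 50000, 20) would roughly correspond to a 512MB random-access run with 50,000 accesses per iteration. Random access touches many distinct pages, so it should fault far more often than the windowed pattern for the same access count.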

MONITORING PROGRAM
• The monitor application is also given.
  - It requests the kernel module to map the kernel-level profiler buffer to its user-level virtual address space (i.e., using mmap()).
    ○ This request is sent by using the character device driver created by the kernel module.
  - The application reads the profiling values (i.e., major and minor page fault counts and utilization of all registered processes).
  - By using a pipe, the profiled data is stored in a regular file.
    ○ These data can then be plotted and analyzed later.

DESIGN
[Figure: design of the profiler. In kernel space: the Proc FS write operation, a linked list of registered tasks, a work queue that reads each task's process control block, the profiler buffer (allocated or freed at module init/exit), and the character device driver interface. In user space: the work process and the monitor process.]
• Work process steps
  - A1. Register
  - A2. Allocate memory block
  - A3. Memory accesses
  - A4. Free memory blocks
  - A5. Unregister
• Monitor process steps
  - B1. Open
  - B2. mmap()
  - B3. Read profiled data
  - B4. Close

WORK QUEUE
• Work queue
  - The simplest to use among all bottom halves (e.g., thread/sleep, tasklet).
  - The only bottom-half mechanism that runs in process context.
• Work queues run in process context.
  - Work queues can sleep, invoke the scheduler, and so on.
  - The kernel schedules bottom halves running in work queues.
  - The other bottom halves run in interrupt context.
    ○ Interrupt context cannot perform blocking operations, e.g., acquiring a semaphore, copying to/from user memory, or non-atomically allocating memory.

WORK QUEUE
• A default set of kernel threads handles work queues.
  - One of these default kernel threads runs per processor (named events/n, where n is the processor ID).
  - The work queue threads execute the user's bottom half as a specific function, called a work queue handler.
• It is possible to run work queues in the user's own kernel thread.
  - Whenever your bottom half is activated, your unique kernel thread wakes up and handles it.
  - Having a unique work queue thread is useful only in certain performance-critical situations.

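WORK QUEUE INTERFACE
• Header
  - #include <linux/workqueue.h>
• Creates a work queue structure
  - void my_wq_handler(struct work_struct *work);
    ○ The handler receives a pointer to its work_struct; use container_of() to reach any enclosing data (kernel 2.6.20 and later).
  - Static: DECLARE_WORK(name, my_wq_handler)
    ○ This macro creates and initializes a struct work_struct.
  - Dynamic: INIT_WORK(p, function)
    ○ p is a pointer to a work_struct structure.
  - INIT_DELAYED_WORK(p, function)
    ○ p is a pointer to a delayed_work structure.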

WORK QUEUE INTERFACE
• Schedule to run immediately
  - int schedule_work(struct work_struct *work)
    ○ Returns zero if the work was already queued, non-zero otherwise.
• Schedule to run after a delay
  - int schedule_delayed_work(struct delayed_work *work, unsigned long delay)
    ○ The delay is given in jiffies. For example, to run after at least 5 seconds: schedule_delayed_work(&my_work, 5*HZ)

WORK QUEUE INTERFACE
• Wait for all pending work on the default work queue
  - void flush_scheduled_work(void)
• Cancel a delayed work
  - int cancel_delayed_work(struct delayed_work *work)

WORK QUEUE INTERFACE
• When the user's own thread is used:
  - struct workqueue_struct *create_workqueue(const char *name)
  - int queue_work(struct workqueue_struct *wq, struct work_struct *work)
  - int queue_delayed_work(struct workqueue_struct *wq, struct delayed_work *work, unsigned long delay)
  - void flush_workqueue(struct workqueue_struct *wq)

CHARACTER DEVICE DRIVER
• Initialize data structure
  - void cdev_init(struct cdev *cdev, struct file_operations *fops);
  - or:
      struct cdev *my_cdev = cdev_alloc();
      my_cdev->ops = &my_fops;
• Add to the kernel
  - int cdev_add(struct cdev *dev, dev_t num, unsigned int count);
• Delete from the kernel
  - void cdev_del(struct cdev *dev);

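CHARACTER DEVICE DRIVER

    /* Handlers referenced by the file_operations table below. */
    static int my_open(struct inode *inode, struct file *filp);
    static int my_release(struct inode *inode, struct file *filp);
    static int my_mmap(struct file *filp, struct vm_area_struct *vma);

    static struct file_operations my_fops = {
        .open    = my_open,
        .release = my_release,
        .mmap    = my_mmap,
        .owner   = THIS_MODULE,
    };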

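MEMORY MAP
[Figure: virtual and physical address spaces, each drawn from 0GB to 4GB with a boundary at 3GB. The profiler buffer is allocated with vmalloc() (kmalloc() is shown as the alternative), its pages are marked "PG_reserved", and the same physical pages appear as the profiler buffer in both the kernel and the user virtual address ranges.]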

MEMORY MAP
• Gets the page frame number
  - pfn = vmalloc_to_pfn(virt_addr);
• Maps a virtual page to a physical frame
  - remap_pfn_range(vma, start, pfn, PAGE_SIZE, PAGE_SHARED);

INTERFACE FOR USER PROCESS
• Character device file

    $ insmod mp3.ko
    $ cat /proc/devices
    $ mknod node c <major> 0

  - <major> is the major number listed for the module's device in /proc/devices.

INTERFACE FOR USER PROCESS
• Open and mmap requests (in monitor.c)

    /* Open the character device node. */
    if ((buf_fd = open(fname, O_RDWR | O_SYNC)) < 0) {
        printf("file open error. %s\n", fname);
        return NULL;
    }

    /* Map the kernel profiler buffer into this process. */
    kadr = mmap(0, buf_len, PROT_READ | PROT_WRITE, MAP_SHARED, buf_fd, 0);

    /* Unmap the buffer once it is no longer needed. */
    munmap(kadr, buf_len);
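The fragment above omits error handling for mmap() and does not show how the mapped memory is read. The sketch below wraps the same open/mmap sequence in a helper and reads the buffer as an array of unsigned long values. The helper names map_buffer and sum_samples are hypothetical, and the buffer layout (a plain array of unsigned long samples) is an assumption, since the slides do not define the record format.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Open `path` (e.g., the device node created with mknod) and map `len`
 * bytes of it as shared memory. Returns the mapped address, or NULL if
 * either open() or mmap() fails.
 */
static void *map_buffer(const char *path, size_t len)
{
    void *addr;
    int fd = open(path, O_RDWR | O_SYNC);

    if (fd < 0)
        return NULL;
    addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return addr == MAP_FAILED ? NULL : addr;
}

/*
 * Sum `count` samples from the mapped buffer.
 * ASSUMPTION: the buffer is a flat array of unsigned long values.
 */
static unsigned long sum_samples(const unsigned long *buf, size_t count)
{
    unsigned long sum = 0;
    size_t i;

    for (i = 0; i < count; i++)
        sum += buf[i];
    return sum;
}
```

Because mmap() reports failure with MAP_FAILED rather than NULL, converting it to NULL in one place lets callers use a single check. The same helper works on any mappable file, which makes it easy to try out against a regular file before the kernel module exists.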

INTERFACE FOR USER PROCESS

    $ cat /proc/<pid>/maps
    r-xp : /root/mp3dev/monitor
    rw-p : /root/mp3dev/monitor
    3183e e1f000 r-xp : /lib64/ld-2.14.so
    e f000 r--p 0001e000 08: /lib64/ld-2.14.so
    f rw-p 0001f000 08: /lib64/ld-2.14.so
    rw-p :
    f000 r-xp : /lib64/libc-2.14.so
    f f p 0018f000 08: /lib64/libc-2.14.so
    f r--p 0018f000 08: /lib64/libc-2.14.so
    rw-p : /lib64/libc-2.14.so
    a000 rw-p :00 0
    7f9b65eda000-7f9b65f5a000 rw-s : /root/mp3dev/node
    7f9b65f5a000-7f9b65f5d000 rw-p :00 0
    7f9b65f f9b65f76000 rw-p :00 0
    7fffc fffc16a4000 rw-p :00 0 [stack]
    7fffc17ff000-7fffc r-xp :00 0 [vdso]
    ffffffffff ffffffffff r-xp :00 0 [vsyscall]

• Format: start-end perm offset major:minor inode image
  - Start-end: the beginning and ending virtual addresses of this memory area.
  - Perm: a bit mask with the memory area's read, write, and execute permissions.
  - Offset: where the memory area begins in the file.
  - Major/Minor: major and minor numbers of the device holding the file (or partition).
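The field layout described above can be checked mechanically. The following sketch parses one line of maps output with sscanf(); the struct and function names are hypothetical, and it assumes a 64-bit system where unsigned long can hold the addresses. In the usage shown after it, only the address range, permissions, and path of the device-node entry come from the listing; the offset, device numbers, and inode are made-up placeholder values.

```c
#include <stdio.h>

/* One parsed line of /proc/<pid>/maps. */
struct map_entry {
    unsigned long start;    /* first virtual address of the area */
    unsigned long end;      /* address just past the end of the area */
    char perms[5];          /* e.g. "rw-s" */
    unsigned long offset;   /* offset of the area within the file */
    unsigned int dev_major; /* device holding the file */
    unsigned int dev_minor;
    unsigned long inode;
    char path[256];         /* empty for anonymous mappings */
};

/*
 * Parse a line of the form
 *   start-end perm offset major:minor inode image
 * Returns 0 on success (the image name is optional), -1 otherwise.
 */
static int parse_maps_line(const char *line, struct map_entry *e)
{
    int n;

    e->path[0] = '\0';
    n = sscanf(line, "%lx-%lx %4s %lx %x:%x %lu %255[^\n]",
               &e->start, &e->end, e->perms, &e->offset,
               &e->dev_major, &e->dev_minor, &e->inode, e->path);
    return n >= 7 ? 0 : -1;
}
```

Parsing "7f9b65eda000-7f9b65f5a000 rw-s 00000000 00:05 4242 /root/mp3dev/node" yields end - start = 0x80000 bytes (512KB), and the trailing "s" in the permissions marks the area as a shared mapping, which is what mapping the device node with MAP_SHARED should produce.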

CASE STUDY 1
• Thrashing and locality.
  - Work process 1: 512MB memory, random access, and 50,000 accesses per iteration
  - Work process 2: 512MB memory, random access, and 10,000 accesses per iteration

    $ nice ./work 512 R & nice ./work 512 R & …
    $ ./monitor > profile1.data

• Plot a graph where the x-axis is time and the y-axis is the accumulated page fault count of the two work processes (work processes 1 and 2).
• Analyze the quantitative differences between the graphs and discuss where such differences come from.