Lightweight virtual system mechanism Gao feng

Slides:



Advertisements
Similar presentations
Lecture 101 Lecture 10: Kernel Modules and Device Drivers ECE 412: Microcomputer Laboratory.
Advertisements

Sogang University Advanced Operating Systems (Linux Device Drivers) Advanced Operating Systems (Linux Device Drivers) Sang Gue Oh, Ph.D.
PlanetLab: An Overlay Testbed for Broad-Coverage Services Bavier, Bowman, Chun, Culler, Peterson, Roscoe, Wawrzoniak Presented by Jason Waddle.
Virtual Machine Technology Dr. Gregor von Laszewski Dr. Lizhe Wang.
AppSec USA 2014 Denver, Colorado Implications & Opportunities at the Bleeding Edge of DevOps Chris Swan, CTO
Chorus Vs Unix Operating Systems Overview Introduction Design Principles Programmer Interface User Interface Process Management Memory Management File.
PlanetLab Operating System support* *a work in progress.
Performance Evaluation of Open Virtual Routers M.Siraj Rathore
KVM and Container Performance and Isolation Deep Dive.
1 Case Study 1: UNIX and LINUX Chapter History of unix 10.2 Overview of unix 10.3 Processes in unix 10.4 Memory management in unix 10.5 Input/output.
Zap Steven Osman Dinesh Subhraveti Gong Su Jason Nieh A System for Migrating Computing Environments.
Linux Operating System
Operating Systems Sameer Mahajan. Overview Process management Interrupts Memory management File system Device drivers Networking (TCP/IP, UDP) Security.
Chapter 3 Operating Systems Concepts 1. A Computer Model An operating system has to deal with the fact that a computer is made up of a CPU, random access.
Chapter 8 Windows Outline Programming Windows 2000 System structure Processes and threads in Windows 2000 Memory management The Windows 2000 file.
26/4/2001VMware - HEPix - LAL 2001 Windows/Linux Coexistence : VMware Approach HEPix – LAL Apr Michel Jouvin
CS 6560 Operating System Design Lecture 13 Finish File Systems Block I/O Layer.
WING: A Consistent Computing Platform Yiwei Ci 22/02/2012
CSC 322 Operating Systems Concepts Lecture - 4: by Ahmed Mumtaz Mustehsan Special Thanks To: Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall,
Group 1 Members: SMU CSE 8343 Wael Faheem Professor:Dr.M.KHALIL. Hazem Morsy Date: Poramate Ongsakorn Payal H Patel Samatha Devi Malka.
The Linux /proc Filesystem CSE8343 – Fall 2001 Group A1 – Alex MacFarlane, Garrick Williamson, Brad Crabtree.
1 Project 3: An Introduction to File Systems CS3430 Operating Systems University of Northern Iowa.
Recent advances in the Linux kernel resource management Kir Kolyshkin, OpenVZ
4P13 Week 1 Talking Points. Kernel Organization Basic kernel facilities: timer and system-clock handling, descriptor management, and process Management.
1 Introduction to Microsoft Windows 2000 Windows 2000 Overview Windows 2000 Architecture Overview Windows 2000 Directory Services Overview Logging On to.
Scott Ferguson Section 1
Processes CS 6560: Operating Systems Design. 2 Von Neuman Model Both text (program) and data reside in memory Execution cycle Fetch instruction Decode.
User-Space-to-Kernel Interface
CSC414 “Introduction to UNIX/ Linux” Lecture 2. Schedule 1. Introduction to Unix/ Linux 2. Kernel Structure and Device Drivers. 3. System and Storage.
CSE 466 – Fall Introduction - 1 User / Kernel Space Physical Memory mem mapped I/O kernel code user pages user code GPLR virtual kernel C
© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Docker Overview Automating.
VM vs Container Xen, KVM, VMware, etc. Hardware emulation / paravirtualization Can run different OSs on the same box Dozens of instances OS sprawl problem.
Silberschatz, Galvin and Gagne ©2011 Operating System Concepts Essentials – 8 th Edition Chapter 2: The Linux System Part 2.
OS Labs 2/25/08 Frans Kaashoek MIT
Silberschatz, Galvin and Gagne ©2011 Operating System Concepts Essentials – 8 th Edition Chapter 2: The Linux System Part 5.
Kernel Modules – Introduction CSC/ECE 573, Sections 001 Fall, 2012.
Intro To Virtualization Mohammed Morsi
Virtualization which isn't LXC (Linux Containers) Dobrica Pavlinušić DORS/CLUC, Zagreb,
VM vs Container Xen, KVM, VMware, etc. ● Hardware emulation / paravirtualization ● Can run different OSs on the same box ● Dozens of instances ● OS sprawl.
Introduction to Operating Systems Concepts
OpenShift & SELinux Dan Walsh Twitter: #rhatdan
Linux Containers and Docker
Virtualization Mini Summit, Austin 2008 mmm, tasty penguins...
Bash on Ubuntu on Windows
Operating Systems Review ENCE 360.
Application Sandboxes
Containers: The new network endpoint
Linux Containers Overview & Roadmap
Container-based Operating System Virtualization: A scalable, High-performance Alternative to Hypervisors Stephen Soltesz, Herbert Potzl, Marc E. Fiuczynski,
Case Study 1: UNIX and LINUX
Proc File System Sadi Evren SEKER.
CASE STUDY 1: Linux and Android
Chapter 5: Threads Overview Multithreading Models Threading Issues
Chapter 5: Threads Overview Multithreading Models Threading Issues
Containers and Virtualisation
Containers and Performance Co-Pilot Nathan Scott
Project 3: An Introduction to File Systems
Agenda Intro Why use containers at all? Linux Kernel: a pop of history
Chapter 5: Threads Overview Multithreading Models Threading Issues
OPERATING SYSTEMS DESIGN AND IMPLEMENTATION Third Edition ANDREW S
Microsoft Ignite NZ October 2016 SKYCITY, Auckland.
Chapter 2: The Linux System Part 2
Intro about Contanier and Docker Technology
Linux Architecture Overview.
Chapter 5: Threads Overview Multithreading Models Threading Issues
Process Description and Control in Unix
Process Description and Control in Unix
Chapter 1: Introduction CSS503 Systems Programming
Building, Debugging & Deploying Containerized
Presentation transcript:

Lightweight virtual system mechanism Gao feng gaofeng@cn.fujitsu.com LXC(Linux Container) Lightweight virtual system mechanism Gao feng gaofeng@cn.fujitsu.com

Outline Introduction Comparison Problems Future work Namespace System API Libvirt LXC Comparison Problems Future work

Introduction Container: Operation System Level virtualization method for Linux Guest1 Guest2 Container Management Tools P1 P2 P1 P2 Relies on the kernel namespace to isolate the resources of the system. The Guest shares the same Kernel with Host The Kernel provides the System API/ABI for the userspace Userspace container Management Tools use the API to create namespace and These tools is used to manager the containers too(libvirt-lxc, lxc-tools) API/ABI Kernel Namespace Set 1 Namespace Set 2

Introduction Why Container Easy to set up Multi-Tenancy environment Better Performance Easy to set up Multi-Tenancy environment Kvm Container App Guest-OS Emulator-Lay App Host-OS Host-OS

Namespace Namespace isolates the resources of system, currently there are 6 kinds of namespaces in linux kernel. Mount namespace UTS namespace IPC namespace Net namespace Pid namespace User namespace

Mount Namespace Each mount namespace has its own filesystem layout. /proc/<p1>/mounts / /dev/sda1 /home /dev/sda2 /proc/<p2>/mounts / /dev/sda1 /home /dev/sda2 /proc/<p3>/mounts / /dev/sda3 /boot /dev/sda4 Isolates the file system layout, every mount namespace has its own file system layout.The tasks running in each mount namespace doing mount/unmount will not affect the file system layout of the other mount namespace. Mount namespace can provide an private mount points for processes. it’s just like chroot but better than chroot. Chroot can only change the root directory of process. /proc/<pid>/mounts is also virtualized to only show the mount points of mount namespace that the <pid> process belongs to. P1 P2 P3 Mount Namespace1 Mount Namespace2

UTS Namespace Every uts namespace has its own uts related information. ostype: Linux osrelease: 3.8.6 version: … hostname: uts1 domainname: uts1 ostype: Linux osrelease: 3.8.6 version: … hostname: uts2 domainname: uts2 Unalterable Every uts namespace has its own OS-type,OS-release,version,host name,domain name… the uname() systemcall will return different information in different uts namespaces. Every uts namespace has its own proc files. The processes can only see and change the proc files of uts namespace which these processes belong to. “Alterbale” File(s) can be changed. “Unalterable” can not be changed alterable

IPC Namespace IPC namespce isolates the interprocess communication resource(shared memory, semaphore, message queue) P1 P2 P3 P4 IPC namespace isolates the inter-process communication resources. The processes can communicate with each other through IPC only when they are in the same IPC namespace. IPC namespace1 IPC namespace2

Net Namespace Net namespace isolates the networking related resources Net devices: eth0 IP address: 1.1.1.1/24 Route Firewall rule Sockets Proc sysfs … Net Namespace2 Net devices: eth1 IP address: 2.2.2.2/24 Route Firewall rule Sockets Proc sysfs … Every net namespace has its own network devices, ip address, firewall rules, routing tables, sockets and so on The proc and sysfs interface of net namespace has been virtualized too.

PID namespace1 (Parent) PID namespace isolates the Process ID, implemented as a hierarchy. PID namespace1 (Parent) (Level 0) ls /proc 1 2 3 4 P1 P4 pid:1 pid:4 P2 P3 pid:2 pid:3 The Pid namespace isolates the Process ID Pid namespace is a hierarchy comprised of “Parent” and “Child”. The parent pidns can have many children pidns. Each Child pidns only has one parent pidns. In the creation process tasks are created. Each task is allocated one pid in one pid namespace. Furthermore, a task is also allocated to each pid in parent and ancestor pid namespaces. The Parent pid namespace can access this Task through the pid allocated to the parent pid namespace. Every pid namespace has a level Value - n, used for marking the level of each pid namespace. The level of init_pid_ns is 0,it’s children pid namespace’s level is 1,the level of each grandchild is 2. Because the processes in the child pid namespace will allocate a pid to the ancestor pid namespace,the kernel limits the max level to MAX_PID_NS_LEVEL(32) /proc/<pid> has also been virtualized. Please note that in the graphic we also have P2! What this ,means is that each Parent and Child can have its own PID – PID:1 for the Child1 and PID:2 for the Parent P3 is the same as P2, it also has two PID’s. PID:1 for the Child2 and PID:3 for the Parent. The Child Namespace2 has one task that is made up of the proc and the Parent PID Namespace has 4 tasks – Tasks 1, 2, 3, and 4 The Parent PID ns can access the Task P2 through PID2 PID Namespace2 (Child) (Level 1) pid:1 PID Namespace3 (Child) (Level 1) pid:1 ls /proc 1 ls /proc 1

User Namespace kuid/kgid: Original uid/gid, Global uid/gid: user id in user namespace, will be translated to kuid/kgid finally Only parent User NS has rights to set map Before I begin to explain the User Namespace, I would like to explain the KUID and KGID KUID: means the Kernel User ID. Before we have the User Namespace we us the UID. KGID: Is the same as KUID but it is the Kernel Group ID. UID: will only be used in the User Namespace User namespace has two private proc files /proc/pid/uid_map and gid_map. These two files can be used for setting the mapping table of the user namespace. uid_map and gid_map contain multiple lines, The format of each line is “user_uid kernel_uid count”. kuid: 2000-2004 kuid: 1000-1009 uid_map 10 2000 5 uid_map 0 1000 10 uid: 10-14 uid: 0-9 User namespace1 User namespace2

User Namespace Create and stat file in User namesapce root root #touch /file root #stat /file uid_map: 0 1000 10 User namespace File : “/file” Access: uid (0/root) We Have the User ID Map to which P1 belongs. UID map maps the Root User of the User Namespace to a Normal User. P1 is the Root User of userns1 which wants to create the H-File. When we write the H-file to the disk, the owner of this file will be mapped to the normal user(uid 1000). Finally in the disk, the owner of this file is the Normal user. P2 is in the same User NS. It wants to get information of the H-file. When the userid information is returned to the P2, the owner of this file will be mapped to the root user of user namespace1. Disk /file (kuid:1000)

System API/ABI Proc System Call /proc/<pid>/ns/ clone unshare setns In this Slide I will discuss the System API / ABI interface. The Namespace provides proc files to the Userspace In the System Call,we have three Namespaces related functions available: Clone, Unshare, and SETNS

Proc /proc/<pid>/ns/ipc: ipc namespace /proc/<pid>/ns/mnt: mount namespace /proc/<pid>/ns/net: net namespace /proc/<pid>/ns/pid: pid namespace /proc/<pid>/ns/uts: uts namespace /proc/<pid>/ns/user: user namespace If the proc file of two processes is the same, these two processes must be in the same namespace. In the Proc File we have 6 sub-files; The first sub-file is the IPC which stands for the IPC NS

System Call clone CLONE_NEWIPC,CLONE_NEWNET, 6 new flags: int clone(int (*fn)(void *), void *child_stack, int flags, void *arg, …); 6 new flags: CLONE_NEWIPC,CLONE_NEWNET, CLONE_NEWNS,CLONE_NEWPID, CLONE_NEWUTS,CLONE_NEWUSER Within the System Call parameter we have the “Flags”. The Namespace increases the parameter of the Flags of the System Call Clone which then adds 6 new Flags.

System Call clone create process2 and IPC namespace2 Mount1 Mount1 P1 clone(,, CLONE_NEWIPC,) IPC1 IPC2 (new created) Each Flag represents one type of Namespace. For example: P1 creates P2 through the System Call Clone with increased parameter Flag – Clone(Underscore _ )NewIPC. Furthermore, it will not only create a Task P2 but it will also create a new IPC NS(IPC2). And the P2 will be “run” in the IPC2. Others1 Others1

System Call unshare Namespace extends the system call unshare int unshare(int flags); Namespace extends the system call unshare too. User space can use unshare to create new namespace and the caller will run in this new created namespace.

System Call unshare create net namespace2 Mount1 Mount1 P1 P1 Net1 unshare(CLONE_NEWNET) Net1 Net2 (new created) Others1 Others1

System Call setns int setns(int fd, int nstype); setns is a new added system call for namespace. Process can use setns to set which namespace the process will belong to. @fd: file descriptor of namespace(/proc/<pid>/ns/*) @nstype: type of namespace.

System Call setns Change the PID namespace of P2 P1 P1 PID1 PID1 P2 P2 setns(open(/proc/p1/ns/pid,) , 0) PID2 PID2

Libvirt LXC Libvirt LXC: userspace container management tool, Implemented as one type of libvirt driver. Manage containers Create namespace Create private filesystem layout for container Create devices for container Resources controller by cgroup LIBVIRT LXC has 5 main functions which work together towards the preparation work for the Container.

Comparison The feature that host share the same kernel with guest makes container different from other virtualization method Container KVM performance Great Normal OS support Linux Only No Limit Security Completeness Low

/proc/meminfo, cpuinfo… Problems /proc/meminfo, cpuinfo… Kernel space (relate to cgroup) User space (poor efficiency) New namespace Audit (assign to user namespace?) Syslog (do we really need it?) We currently have no new flags available for API clone system. It’s a problem we will need to resolve when we try to add a new namespace. The meminfo,cpuinfo etc files are not completely virtualized.

Problems Bandwidth control Disk quota TC Qdisc Netfilter On host (How to handle setting nic to container?) On container (user can change it) Netfilter How to control Ingress bandwidth Disk quota Uid/Gid Quota (Many users ) Project Quota (xfs only)

Future Work Improve Libvirt LXC Unchanged systemd in Libvirt LXC Use interface of systemd to set cgroup Libvirt LXC based Docker Audit namespace

Thank you! Q&A