System Config IPC Iris Zhu iris.zhu@sun.com
Agenda IPC Overview System V IPC Posix IPC Tools for performance analysis System configuration Hands On Lab
Process overview Concept Execution of program Fundamental abstraction of unix system Basic unit Process Virtual memory environment Resource for Process Memory Open file list Thread
Conceptual view of a Process
Process virtual address space
IPC history Message passing Pipe, named pipe (FIFO) System V message queue Posix message queue (1003.1b-1993) RPC Data synchronization System V semaphore, System V shared memory Posix semaphore Notes: the pipe is the earliest IPC mechanism; System V message queues and semaphores date from the 1980s; other synchronization types for threads: lock, mutex, condition variable
Inter Process Communication Function Sharing of data shared memory Exchange Information and data Pipe,FIFO,message queues Synchronization of access to shared resources semaphore,mutex,lock, Remote procedure call Sun RPC,Solaris Door
Unix processes sharing data Posix message queues, semaphores, and shared memory are at least kernel-persistent; the exact persistence depends on the implementation.
IPC object persistence Process-persistent Pipe FIFO Kernel persistent System V Shared memory System V Message queue System V Semaphore Filesystem persistent
Pipe A special type of file that does not hold data on disk but can be opened by two different processes so that data can be passed between them. API int pipe(int fd[2]); fd[0] – read end fd[1] – write end Features Half duplex. Writes of up to PIPE_BUF bytes are atomic. Usable only between processes with a common ancestor (typically parent and child). A pipe is rarely used within a single process. Two or more pipes are used between parent and child processes to exchange data in both directions. Mostly used in the shell. The reader blocks if there is no data in the pipe.
Named pipe - FIFO Provides a communication path, identified by a pathname, between processes on the same system. API int mkfifo(const char *pathname, mode_t mode); open/close (or fopen/fclose) Features Half duplex Blocks automatically (open waits for the other end; read waits for data) SIGPIPE is generated if a process writes to a FIFO whose reader has terminated Atomic operation Can be used between unrelated processes.
System limitation -<limits.h> OPEN_MAX – by sysconf() PIPE_BUF -- pathconf() or fpathconf() in Posix standard
Other commands to get system variables limit (csh) / ulimit (sh/bash/ksh) sysdef lists all hardware devices, as well as pseudo devices, system devices, loadable modules, and the values of selected kernel tunable parameters. For example, sysdef(1M) //For exercise: getconf gets configuration values For example, getconf(1) getconf OPEN_MAX //For exercise:
Doors Provide a facility for processes to issue procedure calls to functions in other processes running on the same system. //Sun Solaris OS Overview libdoor.so lightweight RPC 16bytes sent only for a remote call
Agenda IPC Overview System V IPC* Posix IPC Tools for performance analysis System configuration Hands On Lab
System V IPC Overview Types System V shared memory System V message queues System V semaphore API
System V IPC overview key_t and ftok #include <sys/ipc.h> key_t ftok(const char *pathname, int id); Used to create and open IPC objects key = id (8 bits) + st_dev (12 bits) + st_ino (12 bits) Same id + same path = same key Passing IPC_PRIVATE as the key always creates a new, unique IPC object and returns its identifier
Data structure in kernel for IPC ipc_perm data structure in /usr/include/sys/ipc.h
System V shared memory Concept Sharing of the same physical (RAM) memory pages by multiple processes Feature Extremely efficient(*) Dynamically loaded when required, e.g. by modload Unloaded at system reboot or by the modunload command Kernel resource consumption shmid Actual shared RAM pages Data structures describing the shared segment
System V shared memory - API Header file and system calls #include <sys/types.h> #include <sys/ipc.h> #include <sys/shm.h> int shmget(key_t key, size_t size, int shmflg); void *shmat(int shmid, const void *shmaddr, int shmflg); int shmctl(int shmid, int cmd, struct shmid_ds *buf); int shmdt(char *shmaddr); Cmd – IPC_STAT IPC_RMID IPC_SET SHM_LOCK/UNLOCK
Data structure
struct shmid_ds {
    struct ipc_perm shm_perm;    /* operation permission structure */
    size_t          shm_segsz;   /* size of segment in bytes */
    pid_t           shm_lpid;    /* process ID of last shared memory operation */
    pid_t           shm_cpid;    /* process ID of creator */
    shmatt_t        shm_nattch;  /* number of current attaches */
    time_t          shm_atime;   /* time of last shmat() */
    time_t          shm_dtime;   /* time of last shmdt() */
    time_t          shm_ctime;   /* time of last change by shmctl() */
};
Example,process with shared memory
System overheads with shared memory User level Number of shared memory segments created Size Kernel level System memory Number of identifiers Maximum size of a shared memory segment Translation tables for shared memory Swap space Shared memory needs more swap. ISM pages are locked in memory and cannot be paged out, so other pages must be swapped out instead.
System tunable parameters System profile(before Solaris 10) /etc/system
Caveat Static mechanism Values specified are read and applied when the system boots. Changes are not applied until the system is rebooted. Values specified in /etc/system are global and affect all processes on the system. Obsolete tunable settings are ignored from Solaris 10 onward.
Resource controls available Dynamic resource control prctl(1) get or set the resource controls of running processes, tasks, and projects rctladm(1M) display or modify the global state of system resource controls See man resource_controls(5) setrctl(2) API to set or get resource control values See man rctlblk_get_local_action(3C)
Dynamic resource control Settings in Solaris 10
Obsolete tunable parameter
Example
Optimization of shared memory - ISM Concept ISM - Intimate shared memory Sharing of the translation tables involved in virtual-to-physical address translation, as opposed to sharing only the actual physical memory pages Contrast Non-ISM: per-process mappings for shared memory pages
Background introduction – memory basics MMU - Memory management unit Management and translation of the virtual view of memory (address space) to physical memory, using the virtual-to-physical translation tables. HAT - Hardware address translation layer Manages the mapping of virtual to physical memory and obtains the physical address in RAM. TLB - Translation lookaside buffer Hardware cache of recent address translation information.
Hardware address translation
ISM Sharing the memory translation table
Non-ISM
When attaching shared memory with shmat(), pass one of these flags: SHM_SHARE_MMU virtual memory resources such as the translation tables are shared (ISM). SHM_PAGEABLE virtual memory resources are shared and the dynamic shared memory (DISM) framework is created. Dynamic shared memory can be resized dynamically.
ISM Feature Avoids generating redundant mappings to physical pages Intimate shared memory is an important optimization that makes more efficient use of the kernel and hardware resources involved in the implementation of virtual memory and provides a means of keeping heavily used shared pages locked in memory. Large page sizes are enabled automatically
Example - ISM used for a database Without ISM 400 database processes 2 GB shared segment = 262144 pages of 8 KB 8 bytes per page mapping --> 2 MB per process 2 MB * 400 = 800 MB for mappings! With ISM 2 MB only.
System V message queues Concept Lets processes send and receive messages of various sizes in an asynchronous fashion. Feature Dynamically loadable module Depends on /kernel/sys/msgsys /kernel/misc/ipc Kernel resource consumption Kernel memory Resource map
System V IPC - message queue API Header file and system calls #include <sys/msg.h> int msgget(key_t key, int msgflg); int msgctl(int msqid, int cmd, struct msqid_ds *buf); int msgsnd(int msqid, const void *msgp, size_t msgsz, int msgflg); ssize_t msgrcv(int msqid, void *msgp, size_t msgsz, long int msgtyp, int msgflg); More available from the man pages, e.g. man msgget
Data Structure
struct msqid_ds {
    struct ipc_perm msg_perm;    /* read-write perms */
    struct msg     *msg_first;   /* ptr to first message on queue */
    struct msg     *msg_last;    /* ptr to last message on queue */
    msglen_t        msg_cbytes;  /* current # of bytes on queue */
    msgqnum_t       msg_qnum;    /* current # of messages on queue */
    msglen_t        msg_qbytes;  /* max # of bytes allowed on queue */
    pid_t           msg_lspid;   /* pid of last msgsnd() */
    pid_t           msg_lrpid;   /* pid of last msgrcv() */
    time_t          msg_stime;   /* time of last msgsnd() */
    time_t          msg_rtime;   /* time of last msgrcv() */
    time_t          msg_ctime;   /* time of last msgctl() */
};
System V message queues structures Data on message queue header with each message ID address map from kernel memory to user memory kernel msglock for each message queue
System overhead with message queues Kernel memory Checked first; must be no greater than 25% of available kernel memory Resource map Identifier msqid_ds (see sys/msg.h) struct msg
System tunable parameter /etc/system
Available resource control
process.max-msg-messages (obsolete: msginfo_msgtql) maximum number of messages on a message queue
process.max-msg-qbytes (obsolete: msginfo_msgmnb) maximum number of bytes of messages on a message queue
project.max-msg-ids (obsolete: msginfo_msgmni) maximum number of message queue IDs allowed for a project
System V Semaphore Concept Named after the mechanical or visual signaling device. A method of synchronizing access to a sharable resource by multiple processes. Feature P – try or attempt (decrement); V – increase (increment) Semaphore sets available Kernel resource consumption Kernel memory Resource map
System V IPC semaphore - API Header file and system calls #include <sys/types.h> #include <sys/ipc.h> #include <sys/sem.h> int semget(key_t key, int nsems, int semflg); int semctl(int semid, int semnum, int cmd, ...); int semop(int semid, struct sembuf *sops, size_t nsops); More available from the man pages, e.g. semget(2) In semop(), nsops is the number of sembuf structures in the array pointed to by sops.
System V semaphore Binary semaphore 0 or 1, like a mutex Counting semaphore 0 up to the system limit, P/V Set of counting semaphores One or more semaphores; each set has a limit System limitation – for Posix IPC
Data structures for semaphore struct semid_ds struct sembuf
System overhead with semaphore Kernel memory Must be less than 25% of kernel memory Resource map allocation sem structures, sized by semmns Identifier semid_ds Undo structure pointers Undo structures themselves Kernel mutex lock semmns -> total number of semaphores semusz -> total bytes of the undo structures
System tunable parameter /etc/system(part)
System tunable parameter /etc/system
Available resource control
process.max-sem-ops (obsolete: seminfo_semopm) Maximum operations per semop(2) call
process.max-sem-nsems (obsolete: seminfo_semmsl) Maximum semaphores per identifier
project.max-sem-ids (obsolete: seminfo_semmni) Number of semaphore identifiers
Agenda IPC Overview System V IPC Posix IPC Tools for performance analysis System configuration Hands On Lab
Posix IPC Overview Types Posix message queues Posix semaphore Posix shared memory API Based on file system open files and memory spaces, which are the basic limitations. OPEN_MAX - max # of files a process can have open
Posix IPC and System V IPC Common points Types/Function Difference-Implementation Library – libposix4.so vs libc.so Implementation Built on the Solaris file memory mapping interface -- mmap(2) Acquires the desired resource by using a file name convention
Difference with System V IPC (continued) No IPC-specific kernel APIs are called No tunable parameters are required Limiting factors - per-process limits on open files and memory space Common routines _pos4obj_open/_pos4obj_name _open_nc/_close_nc _pos4obj_lock/_pos4obj_unlock All POSIX IPC facilities call mmap directly or indirectly. Message queues and semaphores call mmap directly. Shared memory calls mmap as a second step: open/shm_open first, then mmap()
mmap Function Maps a file or some other named object into a process's address space. Uses Ordinary files for memory-mapped I/O Special files for anonymous memory mapping shm_open for shared memory between unrelated processes API #include <sys/mman.h> void *mmap(void *addr, size_t len, int prot, int flags, int fd, off_t offset); A file or shared object can be mapped into the address spaces of different processes. Special files: do not exist in the file system. Processes with different parents: call shm_open first. Not every fd can be used with mmap(), e.g. a socket fd cannot; only descriptors for objects that support regular open/write access will work. FIFOs, pipes, and message queues involve data copies between the kernel and user land; mmap does not. The kernel sets up the mapping in the background and performs no explicit I/O operation.
Process address space with mmap(2) Memory mapped – can be private or shared, as selected by the mmap flags (MAP_PRIVATE or MAP_SHARED).
Posix Semaphores Named semaphore sem_open()/sem_close()/sem_unlink() Unnamed semaphore (memory based) sem_init()/sem_destroy() Named: for processes with no parent/child relationship; easy to locate by name. Unnamed: no name is needed; typically used by different threads in one process. Three ways to implement named semaphores: FIFO, memory-mapped I/O, and System V semaphores. Named semaphore: based on file system objects, which either exist or do not exist. Unnamed semaphore: based on memory. If the semaphore is used within one process, it is destroyed when the process exits; otherwise it is kernel-persistent. Used between threads or processes. A file descriptor is used when the semaphore is created but is freed afterwards.
System-imposed limits <unistd.h> SEM_NSEMS_MAX Maximum number of semaphores per process SEM_VALUE_MAX Maximum value of a semaphore e.g. # getconf -a | grep SEM
Posix message queue A process with the right privileges can write and read messages, each with a priority. Differences from System V message queues: 1. A receive returns the message with the highest priority. 2. A signal can be sent or a thread started when a message is put onto an empty queue. System V: no priority, no notification
Posix message queue System limits in /usr/include/limits.h MQ_OPEN_MAX MQ_PRIO_MAX e.g. getconf -a | grep MQ Tunable parameters mq_open(3R) mq_setattr(3R)
Agenda Solaris multi-thread process Inter-processes IPC System V IPC Posix IPC Tools for performance analysis System configuration Hands On Lab
Performance analysis process Understand the problem. Collect data with tools for performance analysis. iostat,kstat,mdb,pmap,vmstat,ps,etc. Proc tools Dtrace Sun studio11 Separate all the data you get from different layers. CPU Memory System I/O File system ...
Performance tuning process Find the area and the period of the bottleneck. Set up a performance goal. Performance tuning, e.g. resource control with IPC. Review and repeat until the tuning goals are met. Report, with a comparison where available. Tools for performance analysis related to IPC
kmstat(1M)
vmstat(1M) Reports virtual memory statistics regarding kernel threads, virtual memory, disk, trap, and CPU activity. memory - report on usage of virtual and real memory mf - minor faults pi - kilobytes paged in po - kilobytes paged out
pmap -x(1M)
Pmap -x example
ipcs(1M) Reports inter-process communication facilities status
-m active shared memory segments
-q active message queues
-s active semaphores
-i number of ISM attaches to shared memory segments
-p process number
-A all print options: -b, -c, -i, -J, -o, -p, and -t
-z information about facilities associated with the specified zone
-Z information about all zones
ipcrm(1) Removes a message queue, semaphore set, or shared memory ID
-m shmid
-q msqid
-s semid
-M shmkey
-Q msgkey
-S semkey
-z zone
dtrace(1M) Comprehensive dynamic tracing framework for the Solaris Operating System. Powerful infrastructure that lets administrators and developers understand the behavior of the operating system and user processes. Observing the system calls related to IPC fbt provider provides probes into every function in the kernel shmsys provider provides probes into the system API for IPC ipc module, probes related to IPC in the kernel
Sun Studio- Performance analyzer Combination of compiler,libraries of optimized functions and tools for performance analysis. Command and sub-command er_print-- generate a plain-text report of performance data Collect – simplest interface for collecting data Dbx collector – performance analyzing for active process Analyzer – GUI tool #collect -L unlimited -A copy -F on -d /tmp/a.out ->/tmp/test.1.er #analyzer test.1.er
Agenda Solaris multi-thread process Inter-processes IPC System V IPC Posix IPC Tools for performance analysis System configuration Hands On Lab
System configuration How to modify kernel parameters? Modify /etc/system Use the kernel debugger (kmdb) Use the modular debugger (mdb) Use the ndd command to set TCP/IP parameters Modify the /etc/default files Viewing Solaris system configuration information sysdef(1M) command kstat(1M) or kstat(3KSTAT)
Example Setting a parameter in /etc/system set nfs:nfs_nra=4 (number of read-ahead blocks that are read for file systems mounted using NFS version 2 software) Using mdb to change a value
Sysdef command
Reference Solaris Internals Unix Network Programming Vol II Solaris Dynamic Tracing Guide Solaris Modular Debugger Guide Solaris Tunable Parameters Reference Manual