User-Level Interprocess Communication for Shared Memory Multiprocessors
Bershad, B. N., Anderson, T. E., Lazowska, E. D., and Levy, H. M.
Presented by Chris Eigner
Review of LRPC
The RPC concept can be used within a single machine as IPC
Caller and callee are on the same machine, which leaves room for optimizations
  – Run client thread in context of server, avoid scheduler
  – Argument stacks allocated in shared memory, avoid message copying
  – Domain caching to reduce context-switch overhead
Problems with RPC/LRPC
Kernel mediates every cross-address-space call (70% of total overhead)
Poorly performing cross-address-space communication
  – Kernel-level communication + user-level thread management
Opportunity for more SMP optimizations
SMP Optimizations
No need to switch processor to another address space
  – Remove kernel from equation!
  – Address spaces share memory directly
  – Processor reallocation can be avoided
Preserves valuable cache/TLB contexts
Cost can be amortized over independent calls
  – Inexpensive thread management; orders of magnitude less than kernel-level
URPC Responsibilities
URPC design isolates three components of IPC
  – Thread management
  – Data transfer
  – Processor reallocation
Thread Management
Context switch
  – Switching processor to another thread in same address space
Processor reallocation
  – Reallocating processor to a thread in a different address space
  – via Processor.Donate
An Example
Data Transfer
Bi-directional shared memory queue
  – Test-and-set locks (non-spinning) on each end
Client/server model
  – send, receive, start, stop
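A minimal sketch of one direction of such a queue, to make "non-spinning" concrete: if the lock is busy, the caller returns at once and does other work instead of waiting. The `Channel` class, its method names, and the `capacity` parameter are hypothetical, and Python's `threading.Lock` with a non-blocking acquire stands in for a hardware test-and-set on shared memory.

```python
import threading
from collections import deque


class Channel:
    """One direction of a message channel (illustrative only, not the paper's code)."""

    def __init__(self, capacity=8):
        # Assumption: a non-blocking Lock.acquire models a test-and-set lock.
        self._lock = threading.Lock()
        self._queue = deque()
        self._capacity = capacity

    def try_send(self, message):
        # Non-spinning: if the lock is held, give up immediately.
        if not self._lock.acquire(blocking=False):
            return False
        try:
            if len(self._queue) >= self._capacity:
                return False  # queue full, caller retries later
            self._queue.append(message)
            return True
        finally:
            self._lock.release()

    def try_receive(self):
        # Returns (True, message) on success, (False, None) otherwise.
        if not self._lock.acquire(blocking=False):
            return (False, None)
        try:
            if not self._queue:
                return (False, None)
            return (True, self._queue.popleft())
        finally:
            self._lock.release()
```

A bi-directional channel would pair two of these, one for calls and one for replies. A caller that gets `False` back is expected to run another thread and retry later, never to loop on the lock.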
Processor Reallocation
URPC makes certain assumptions to reduce processor reallocation
  – Client has other threads to run or incoming messages
  – Server has or will have a processor to service message
Allows inexpensive context switch during blocking phase of cross-address call
Enables parallel execution of URPC while avoiding processor reallocation
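The scheduling decision behind these assumptions can be sketched as a small function: a processor prefers work in its own address space and is donated only when nothing is left to do locally. `AddressSpace`, `choose_action`, the order of the checks, and the definition of "underpowered" used here (pending messages but no processor) are all assumptions made for illustration, not the paper's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class AddressSpace:
    """Hypothetical bookkeeping for one address space."""
    name: str
    ready_threads: list = field(default_factory=list)
    incoming_messages: list = field(default_factory=list)
    processors: int = 0

    def is_underpowered(self):
        # Assumption: work is pending but no processor is allocated.
        return bool(self.incoming_messages) and self.processors == 0


def choose_action(current, others):
    """Decide what a processor does when its running thread blocks on a call."""
    if current.incoming_messages:
        return ("receive", None)  # handle a pending message first
    if current.ready_threads:
        # Cheap user-level context switch within the same address space.
        return ("context_switch", current.ready_threads[0])
    for space in others:
        if space.is_underpowered():
            # Nothing to do locally: reallocate (Processor.Donate in the slides).
            return ("donate", space.name)
    return ("idle", None)


client = AddressSpace("client", ready_threads=["T2"], processors=1)
server = AddressSpace("server", incoming_messages=["call"], processors=0)
action = choose_action(client, [server])  # context switch to T2, no reallocation
```

With another ready thread in the client, the call costs only a user-level context switch. Reallocation happens only once the client has run out of local work and the server has no processor.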
Performance
Firefly workstation
  – Four C-VAX processors
  – 32 MB RAM!
Taos OS
  – Provided kernel-level threads
FastThreads
  – User-level thread library
URPC
  – Channel management
  – Message primitives
Performance
Worse than LRPC
Deficiencies
Optimistic assumptions won't always hold
  – Single-threaded applications
  – High-latency I/O
Processor reallocation occurs after two optimization checks (approx. 100 μs)
  – Is there an idle processor?
  – Is there an underpowered address space to which it can be reallocated?
Voluntary return of processors can't be guaranteed
Two processors for single computation, only one active at a time
Summary
SMP allows new freedoms in RPC design
No need to switch processor to another address space
  – Preserves valuable cache/TLB contexts
  – 1-2 orders of magnitude improvement
But, not ideal for all application types
  – Single-threaded applications
  – High-latency I/O