LAIO: Lazy Asynchronous I/O For Event Driven Servers
Khaled Elmeleegy, Alan L. Cox
Outline
- Available I/O APIs and their shortcomings.
- Event driven programming and its challenges.
- Lazy Asynchronous I/O (LAIO).
- Experiments and results.
- Conclusions.
Key Idea
- Existing I/O APIs fall short of what event driven servers need.
- LAIO fixes that.
Non-Blocking I/O
- A system call may return without fully completing the operation.
  - Ex: write to a socket.
- A system call may also return with the operation completed.
- Disadvantages:
  - Not available for disk operations.
  - The program using it needs to maintain state.
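The partial-completion and state-keeping points above can be sketched in C. This is a minimal illustration, not code from the talk: the `pending_write` struct and the `try_flush` and `demo_partial_write` names are hypothetical, and the 4 MB payload is only assumed to be larger than the default socket buffer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

/* State the program has to keep between attempts (hypothetical struct). */
struct pending_write {
    const char *buf;  /* data to send */
    size_t len;       /* total bytes to send */
    size_t off;       /* bytes already accepted by the kernel */
};

/* Push as much as the socket accepts right now.
 * Returns 1 when everything is written, 0 if the socket would block,
 * -1 on any other error. */
static int try_flush(int fd, struct pending_write *pw)
{
    assert(pw->off <= pw->len);
    while (pw->off < pw->len) {
        ssize_t n = write(fd, pw->buf + pw->off, pw->len - pw->off);
        if (n > 0)
            pw->off += (size_t)n;   /* partial progress: remember it */
        else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return 0;               /* come back when the socket is writable */
        else if (n < 0 && errno == EINTR)
            continue;               /* interrupted: just retry */
        else
            return -1;
    }
    return 1;
}

/* Returns 0 if the first attempt was partial and later attempts finished it. */
int demo_partial_write(void)
{
    enum { LEN = 4 * 1024 * 1024 };  /* assumed larger than the socket buffer */
    int sv[2], rc = -1, was_partial = 0;
    char sink[65536];
    char *buf = calloc(1, LEN);

    if (buf == NULL || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        free(buf);
        return -1;
    }
    int flags = fcntl(sv[0], F_GETFL, 0);
    if (flags >= 0 && fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) == 0) {
        struct pending_write pw = { buf, LEN, 0 };
        rc = try_flush(sv[0], &pw);  /* the peer has not read anything yet */
        was_partial = (rc == 0 && pw.off < pw.len);
        while (rc == 0) {            /* drain the peer, then resume the write */
            if (read(sv[1], sink, sizeof sink) <= 0) { rc = -1; break; }
            rc = try_flush(sv[0], &pw);
        }
    }
    close(sv[0]);
    close(sv[1]);
    free(buf);
    return (was_partial && rc == 1) ? 0 : -1;
}
```

In a real server the resume step would be driven by a writability event from the event loop rather than by draining the peer in the same function; the offset in `pending_write` is the per-connection state the slide refers to.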
Asynchronous I/O (AIO)
- System call returns immediately.
- Operation always runs to completion and sends a notification on completion.
  - Via signal, event, or polling.
- Disadvantages:
  - Missing disk operations like open and stat.
  - Completion is always received via a notification, even if the operation didn’t block.
    - Lower performance.
Event Driven Programming with I/O (What we have)

event_loop(..) {
    …
    while (true) {
        event_list = get available events;
        for each event ev in event_list do
            call handler of ev;
    }
}

handler(…) {
    … /* do stuff 1 */
    open(..);  /* may block; if it blocks, the server stalls */
    … /* do stuff 2 */
    return;    /* to event_loop */
}
Event Driven Programming with I/O (What we want)

event_loop(..) {
    …
    while (true) {
        event_list = get available events;
        for each event ev in event_list do
            call event_handler of ev;
    }
}

handler1(…) {
    … /* do stuff 1 */
    open(..);  /* may block */
    if open blocks {
        set handler2 as callback for open;
        return;  /* to event_loop */
    }
    … /* do stuff 2 */
    return;  /* to event_loop */
}

handler2(…) {
    … /* do stuff 2 */
    return;  /* to event_loop */
}
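The handler split can be tried out with a small single-threaded simulation. This is only a sketch under stated assumptions: the blocking `open` is replaced by a stand-in (`fake_open_would_block`), the callback is posted to the queue at once rather than on a real completion notification, and all names except `event_loop`, `handler1` and `handler2` are made up for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef void (*handler_fn)(void *arg);
struct event { handler_fn fn; void *arg; };

#define MAX_EVENTS 16
static struct event queue[MAX_EVENTS];
static size_t q_head, q_tail;
static char trace[32];  /* records the order in which steps ran */

static void log_step(const char *s)
{
    strncat(trace, s, sizeof trace - strlen(trace) - 1);
}

static void post_event(handler_fn fn, void *arg)
{
    assert(q_tail < MAX_EVENTS);
    queue[q_tail].fn = fn;
    queue[q_tail].arg = arg;
    q_tail++;
}

/* Stand-in for open(): pretend it blocks unless the file is cached. */
static bool fake_open_would_block(bool cached) { return !cached; }

static void do_stuff_2(void) { log_step("2"); }

/* Continuation: runs once the (simulated) open has completed. */
static void handler2(void *arg) { (void)arg; do_stuff_2(); }

static void handler1(void *arg)
{
    bool cached = *(bool *)arg;
    log_step("1");                   /* do stuff 1 */
    if (fake_open_would_block(cached)) {
        post_event(handler2, NULL);  /* set handler2 as callback for open */
        return;                      /* to event_loop */
    }
    do_stuff_2();                    /* open did not block: continue inline */
}

/* Unrelated event that should not be delayed by a blocked open. */
static void other_handler(void *arg) { (void)arg; log_step("o"); }

static void event_loop(void)
{
    while (q_head < q_tail) {
        struct event ev = queue[q_head++];
        ev.fn(ev.arg);
    }
}

/* Runs one request plus one unrelated event and returns the step order. */
const char *run_demo(bool cached)
{
    q_head = q_tail = 0;
    trace[0] = '\0';
    post_event(handler1, &cached);
    post_event(other_handler, NULL);
    event_loop();
    return trace;
}
```

With `cached == false` the unrelated event runs between the two halves of the request, so the recorded order is "1o2"; with `cached == true` the request finishes in one handler call and the order is "12o".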
Lazy Asynchronous I/O (LAIO)
- Like AIO when an operation blocks: asynchronous completion notification.
- Also like AIO, operations are done in one shot, with no partial completions.
- Similar to non-blocking I/O if the operation completes without blocking.
- Scheduler activation based.
  - A scheduler activation is an upcall delivered by the kernel when a thread blocks or unblocks.
LAIO API

Function Name                       Description
int laio_syscall(int number, …)     Performs the specified syscall asynchronously.
void *laio_gethandle(void)          Returns a handle to the last laio operation.
laio_list laio_poll(void)           Returns a list of handles to completed laio operations.
laio_syscall(int number, …)
1. Enable upcalls.
2. Save context.
3. Invoke system call.
4. System call blocks?
   - No: disable upcalls and return retval.
   - Yes: upcall_handler(..) is invoked via a kernel upcall and steals the old stack using the stored context; laio_syscall then returns -1 with errno = EINPROGRESS.
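The return convention (result if the call did not block, otherwise -1 with errno = EINPROGRESS and a later completion) can be imitated for one special case with ordinary POSIX calls. This sketch is an emulation of the caller-visible behavior only: it uses a non-blocking pipe and `poll`, not scheduler activations, it tracks a single outstanding read, and `lazy_read`, `lazy_poll` and `demo_lazy_read` are hypothetical names, not part of the LAIO API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Bookkeeping for one outstanding read (hypothetical, not LAIO's). */
struct pending_read { int fd; void *buf; size_t len; int active; };
static struct pending_read pending;

/* Try the read at once. If it would block, remember it and report
 * "in progress": return -1 with errno = EINPROGRESS. */
static ssize_t lazy_read(int fd, void *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        assert(!pending.active);  /* this sketch tracks one operation only */
        pending.fd = fd;
        pending.buf = buf;
        pending.len = len;
        pending.active = 1;
        errno = EINPROGRESS;
    }
    return n;
}

/* Complete the remembered read if data has arrived.
 * Returns the byte count, or -1 if nothing has completed. */
static ssize_t lazy_poll(void)
{
    struct pollfd p = { .fd = pending.fd, .events = POLLIN };
    if (!pending.active || poll(&p, 1, 0) <= 0)
        return -1;
    pending.active = 0;
    return read(pending.fd, pending.buf, pending.len);
}

/* Returns 0 if both the immediate and the deferred case behave as described. */
int demo_lazy_read(void)
{
    int fds[2], rc = -1;
    char c = 0;

    if (pipe(fds) < 0)
        return -1;
    do {
        if (fcntl(fds[0], F_SETFL, O_NONBLOCK) != 0) break;
        /* Case 1: the byte is ready, so the call completes immediately. */
        if (write(fds[1], "a", 1) != 1) break;
        if (lazy_read(fds[0], &c, 1) != 1 || c != 'a') break;
        /* Case 2: the pipe is empty, so the call is deferred. */
        if (lazy_read(fds[0], &c, 1) != -1 || errno != EINPROGRESS) break;
        if (lazy_poll() != -1) break;  /* nothing has completed yet */
        if (write(fds[1], "b", 1) != 1) break;
        if (lazy_poll() != 1 || c != 'b') break;
        rc = 0;
    } while (0);
    close(fds[0]);
    close(fds[1]);
    return rc;
}
```

The point of the comparison is the caller's view: one call site handles both outcomes, and a notification is needed only when the operation really could not finish. Unlike this emulation, LAIO as described in the slides applies to any system call, including disk operations such as open.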
Experiments and Experimental Setup
- Performance evaluated using both micro-benchmarks and event driven web servers (thttpd and Flash).
- Pentium Xeon 2.4 GHz machines with 2 GB RAM.
- FreeBSD-5 with KSE, FreeBSD’s scheduler activation implementation.
- Two web traces, Rice and Berkeley, with working set sizes of 1.1 GB and 6.4 GB respectively.
Micro-benchmarks
- Read a byte from a pipe 100,000 times in two cases, blocking and non-blocking:
  - For non-blocking (byte ready on pipe):
    - LAIO is 320% faster than AIO.
    - LAIO is 40% slower than non-blocking I/O.
  - For blocking (byte not ready on pipe):
    - AIO is 8% faster than LAIO.
- Call getpid(2) 1,000,000 times in two cases, KSE enabled and disabled.
  - With KSE disabled, the program was 5% faster (KSE overhead).
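A timing loop of the kind described for the "byte ready on pipe" case might look like the following for the plain non-blocking I/O baseline. This is a sketch, not the authors' benchmark: the function name `time_pipe_reads` is hypothetical, and the measured interval also includes the `write` that makes each byte ready.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Writes one byte to a pipe and reads it back through a non-blocking
 * descriptor, `iterations` times. Returns elapsed nanoseconds, or -1 on error. */
long long time_pipe_reads(int iterations)
{
    int fds[2], i;
    char c = 'x';
    long long elapsed = -1;
    struct timespec t0, t1;

    assert(iterations >= 0);
    if (pipe(fds) < 0)
        return -1;
    int flags = fcntl(fds[0], F_GETFL, 0);
    if (flags >= 0 && fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) == 0) {
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < iterations; i++) {
            if (write(fds[1], &c, 1) != 1) break;  /* make the byte ready */
            if (read(fds[0], &c, 1) != 1) break;   /* read never blocks here */
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);
        if (i == iterations)
            elapsed = (t1.tv_sec - t0.tv_sec) * 1000000000LL
                    + (t1.tv_nsec - t0.tv_nsec);
    }
    close(fds[0]);
    close(fds[1]);
    return elapsed;
}
```

Calling `time_pipe_reads(100000)` matches the iteration count on the slide; comparing against AIO or LAIO would need the same loop with the read replaced by the respective API.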
thttpd Experiments
- thttpd is an event driven server, modified to use libevent, an event notification library.
- Two versions of thttpd: libevent-thttpd and LAIO-thttpd.
- For LAIO-thttpd, thttpd was modified by breaking up event handlers around blocking operations like open.
thttpd Results (Berkeley Throughput)
thttpd Results (Berkeley Response Time)
thttpd Results (Rice Throughput)
thttpd Results (Rice Response Time)
thttpd Results (Rice Throughput 512 MB RAM)
thttpd Results (Rice Response Time 512 MB RAM)
Flash
- An event driven web server.
- 3 flavors:
  - Pure event driven.
  - AMPED: Asymmetric Multiprocess Event Driven.
    - Event driven core.
    - Potentially blocking I/O handed off to a helper process.
    - Helper does an explicit read to bring data into memory.
  - LAIO: uses LAIO to do all I/O asynchronously.
- For each of the three flavors, files are sent either with sendfile(2) or using mmap(2).
Flash Experiments
- All experiments are done with 500 clients.
- All sockets are blocking.
- For mmap: the file is mapped into memory, then written to the socket.
  - Page faults may happen.
  - mincore(2) is used to check if pages are in memory.
- For sendfile: the file is sent via the sendfile(2) syscall, which may block.
  - Optimized sendfile: the kernel is modified so that sendfile returns if blocking on disk occurs.
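The mincore(2) residency check mentioned for the mmap case can be sketched as follows. This is an illustration, not Flash's code: `file_is_resident` is a hypothetical helper, and the `unsigned char` vector matches the Linux prototype (FreeBSD declares the vector as `char *`).

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Maps the file and asks the kernel which of its pages are in memory.
 * Returns 1 if every page is resident, 0 if at least one is not,
 * -1 on error (including an empty or missing file). */
int file_is_resident(const char *path)
{
    int result = -1;
    struct stat st;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) == 0 && st.st_size > 0) {
        long page = sysconf(_SC_PAGESIZE);
        assert(page > 0);
        size_t len = (size_t)st.st_size;
        size_t npages = (len + (size_t)page - 1) / (size_t)page;
        void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
        unsigned char *vec = malloc(npages);  /* one status byte per page */

        if (map != MAP_FAILED && vec != NULL && mincore(map, len, vec) == 0) {
            result = 1;
            for (size_t i = 0; i < npages; i++) {
                if (!(vec[i] & 1)) {  /* low bit set means "resident" */
                    result = 0;
                    break;
                }
            }
        }
        free(vec);
        if (map != MAP_FAILED)
            munmap(map, len);
    }
    close(fd);
    return result;
}
```

A server could call this before writing a mapped file to a socket and take a slower path when the result is 0. The answer is only a snapshot: pages can be evicted between the check and the write, so a page fault remains possible.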
Flash Throughput (mmap)

Configuration    Flash-event (mmap)   FLASH-AMPED (mmap)   FLASH-LAIO (mmap)
Berkeley-Cold    81 Mbps              134 Mbps             132 Mbps
Berkeley-Warm    78 Mbps              127 Mbps             131 Mbps
Rice-Cold        203 Mbps             386 Mbps             299 Mbps
Rice-Warm        830 Mbps             800 Mbps             797 Mbps

- For Rice-Cold: callouts to the helper process for AMPED; page faults for LAIO.
  - Performance difference is due to prefetching.
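Flash Throughput (sendfile)

Configuration    Flash-event (sendfile)   FLASH-AMPED (sendfile)   FLASH-LAIO (sendfile)
Berkeley-Cold    122 Mbps                 171 Mbps                 [missing]
Berkeley-Warm    125 Mbps                 180 Mbps                 179 Mbps
Rice-Cold        277 Mbps                 398 Mbps                 382 Mbps
Rice-Warm        845 Mbps                 843 Mbps                 815 Mbps

The Berkeley-Cold row lists only two values; they are shown in the order given, and their column assignment is uncertain.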
Conclusions
- LAIO overcomes the shortcomings of other I/O APIs.
  - LAIO is more than 3 times faster than AIO when data is in memory.
- LAIO serves event driven servers well.
  - LAIO increased thttpd throughput by 38%.
  - LAIO matched Flash performance with no kernel modifications.
Questions?