GridOS: Operating System Services for Grid Architectures
Pradeep Padala, Joseph N Wilson
CISE, University of Florida
Presented by Aseem R. Deshpande
- Grid: enables sharing, selection and aggregation of a wide variety of geographically distributed resources, including supercomputers, storage systems, data sources and specialized devices, owned by different organizations and administered with different policies.
- Software infrastructures: Globus, Legion, Unicore, etc.
- Operating system support for writing grid computing applications is minimal or non-existent, resulting in inefficient, sub-optimal code.
Overview
- Previous work
- GridOS design principles
- Module design description
  - Core modules
  - Additional modules
- Implementation
- Performance
- Conclusions and future work
Previous work
Motivation
- Middleware infrastructure solutions: Globus, Legion, Unicore, etc.
- They run on top of the OS and provide support for writing grid applications.
- They use the tools and libraries of the native OS, which were not designed with high-performance computing applications in mind.
- This results in low performance and, often, erroneous behaviour.
Previous work
- Previous work includes:
  - WebOS: operating system services for wide area applications.
  - PODOS: high performance through optimized services in the kernel.
- They make extensive changes to the core kernel, making it difficult to port the work to newer kernels.
- Legacy system support.
- Solutions are monolithic.
- Amoeba, Sprite: their underlying principles have been used for the design.
Design Issues
Principles of GridOS:
- Modularity: the Linux kernel architecture is exploited.
- Policy neutrality: mechanisms are policy-free, and a policy-free API is provided for developing high-level applications.
- Universality of infrastructure: a basic set of services common to all software infrastructures, such as high-performance I/O and resource management, is provided.
- Minimal changes: no extensive modifications are made to the kernel, so upgrading to a new kernel is easy.
Module design
Module design
Core modules
- gridos_io: high-performance I/O module. Provides high-performance network I/O capabilities for GridOS.
  a) No user-space copying.
  b) TCP WAN throughput.
- gridos_comm: communication module. Various communication media exist; communication operations can be specified independently of the implementation method.
- gridos_rm: resource management module. Middleware that manages resources has to allocate and co-locate resources according to application requirements, and much of it uses a common set of services. This module provides those facilities so that higher-level issues such as co-location and online control can be addressed.
Module design
- gridos_pm: process management module. Allows creation and management of processes in GridOS and provides communication primitives. Also provides services for process accounting.
Additional modules
- gridos_ftp_common: provides facilities common to the FTP server and client, such as command parsing and response handling.
- gridos_ftp_server: implements an FTP server in kernel space, giving low overhead and high performance. The module is designed with security in mind: all operations are performed with an unprivileged user-id and group-id.
- gridos_ftp_client: implements an FTP client in the kernel.
Implementation
- A subset of GridOS was developed on Linux kernel 2.4.20.
- It adds a basic ioctl and syscall interface.
- The current implementation includes gridos_io, gridos_ftp_server, gridos_ftp_client, the library libgridos and the middleware gridos-middleware.
- I/O module: designed to minimize copy operations.
  - Network API: reading from and writing to a GridOS-managed socket: gridos_io_sync_read, gridos_io_async_read, gridos_io_write, gridos_io_buffer_setopt, gridos_io_buffer_getopt.
  - File system API: uses similar functions for writing data to files; calls Linux's VFS.
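The slides do not say what gridos_io_buffer_setopt tunes. A common way to improve TCP throughput over a WAN is to size the socket buffers to the bandwidth-delay product of the path, so the following is a minimal user-space sketch of that idea using the standard setsockopt call. The function names bdp_bytes and set_socket_buffers are hypothetical and are not part of GridOS.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Bandwidth-delay product in bytes: the amount of data that must be in
 * flight to keep a link of the given speed and round-trip time full.
 * Hypothetical helper, not a GridOS function. */
uint64_t bdp_bytes(uint64_t bandwidth_bits_per_sec, uint64_t rtt_ms)
{
    return bandwidth_bits_per_sec / 8 * rtt_ms / 1000;
}

/* Request send and receive buffers of `bytes` on a socket.
 * Returns 0 on success, -1 if either setsockopt call fails.
 * The kernel may clamp the request to its configured limits. */
int set_socket_buffers(int sockfd, int bytes)
{
    assert(bytes > 0);
    if (setsockopt(sockfd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof bytes) < 0)
        return -1;
    if (setsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof bytes) < 0)
        return -1;
    return 0;
}
```

For a 100 Mbit/s path with a 100 ms round-trip time, bdp_bytes(100000000, 100) gives 1250000 bytes, which would then be passed to set_socket_buffers before the transfer starts.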
Implementation
- FTP client and server modules: the FTP server module lets the user create an FTP server on a specified port. The sysctl mechanism is used for dynamic configuration.
- Threading model: high-performance kernel threads are used. There are two thread pools: I/O threads and listener threads.
- Library wrapper: there are three ways to control GridOS behaviour from user space: the sys_gridos system call, ioctl on the gridos device, and the /proc interface. libgridos provides an easy-to-use interface to these services.
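On Linux, sysctl settings are exposed as files under /proc/sys, so a user-space program can read one with ordinary file I/O. The slides do not name the GridOS sysctl entries, so the sketch below takes the entry name as a parameter; read_sysctl is a hypothetical helper, not a libgridos function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read the first line of a sysctl entry, given its path relative to
 * /proc/sys (for example "kernel/ostype").
 * Returns 0 on success, -1 on failure.
 * Hypothetical helper, not a libgridos function. */
int read_sysctl(const char *name, char *out, size_t out_len)
{
    char path[256];
    FILE *f;
    int n;

    assert(name != NULL && out != NULL);
    if (out_len == 0)
        return -1;
    n = snprintf(path, sizeof path, "/proc/sys/%s", name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;              /* name too long for the path buffer */

    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(out, (int)out_len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    out[strcspn(out, "\n")] = '\0';   /* strip the trailing newline */
    return 0;
}
```

Calling read_sysctl("kernel/ostype", buf, sizeof buf) reads a standard kernel entry; a GridOS entry would be read the same way once its name is known.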
Implementation
- C function wrappers are provided to:
  - read from and write to the network
  - read from and write to files
  - set configuration options
  - start and stop the FTP server
  - use the FTP client
- GridOS middleware: the authors claim that GridOS enables easier development of toolkits like Globus.
- Transporting a file: the example shows transferring a file from one location to another. User-space file copying is avoided.
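The GridOS transfer example itself is not reproduced in these notes. To illustrate what avoiding user-space copying means, here is a minimal sketch that uses the standard Linux sendfile(2) call rather than any GridOS API: the data moves between two file descriptors inside the kernel. The function name copy_file_in_kernel is hypothetical, and writing to a regular file with sendfile assumes Linux 2.6.33 or later.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy src_path to dst_path with sendfile(2), so the data never passes
 * through a user-space buffer.
 * Returns the number of bytes copied, or -1 on error.
 * Hypothetical helper, not a GridOS function. */
long copy_file_in_kernel(const char *src_path, const char *dst_path)
{
    struct stat st;
    off_t offset = 0;
    long result = -1;
    int in_fd, out_fd;

    assert(src_path != NULL && dst_path != NULL);
    in_fd = open(src_path, O_RDONLY);
    if (in_fd < 0)
        return -1;
    if (fstat(in_fd, &st) < 0) {
        close(in_fd);
        return -1;
    }
    out_fd = open(dst_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out_fd < 0) {
        close(in_fd);
        return -1;
    }

    /* sendfile advances `offset` by the number of bytes it transferred. */
    while (offset < st.st_size) {
        ssize_t n = sendfile(out_fd, in_fd, &offset,
                             (size_t)(st.st_size - offset));
        if (n <= 0)
            break;              /* error or unexpected end of file */
    }
    if (offset == st.st_size)
        result = (long)offset;

    close(in_fd);
    close(out_fd);
    return result;
}
```

The same call works with a connected socket as the output descriptor, which is the usual way a user-space server sends a file without copying it through its own buffers.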
Implementation
Performance
Performance
Performance
Conclusions and future work
- A set of operating system services for grid architectures is described.
- The services require minimal changes and make a commodity OS like Linux highly suitable for grid computing.
- Experiments show that high performance can be achieved.
Future work
- Testing on a larger test-bed such as GriPhyN.
- Implementation of all the modules.
- Providing a complete infrastructure.
My evaluation
- The research area is highly specialized, and not much research has been done or is being done in it.
- Eventually, when grid computing becomes pervasive, there will be a need for operating systems that provide functional support for grid computing.
- Since the Linux kernel is used, developers all over the globe can contribute.
- The modular architecture is a clear strength.
- Legacy application support is available.
My evaluation
Limitations
- Only the file system module has been implemented. The performance of the other modules needs to be evaluated before any bold claim can be made.
- The primary need is to have the resource management and process management modules implemented in the OS. The project's success depends on the successful implementation and adaptation of these modules.
- The graphs shown are ambiguous in terms of final results.
- sendfile() could be used.
- Nothing is mentioned about the advantages of adopting the threading model of FLASH and TUX.
My evaluation
- Was it necessary to implement all the modules in kernel space?
- Experiments have to be conducted on larger test-beds and under real-world conditions.
- The authors have not said anything, either quantitatively or in terms of effect, about the changes they made to the kernel modules.
- There has to be a unified effort, as in the case of Globus. Only a community effort will produce serious and effective work in this area.