QoS-driven Storage Management for High-end Computing Systems
Yonggang Liu and Renato Figueiredo, University of Florida
Ming Zhao, Florida International University

The Center for Autonomic Computing is supported by the National Science Foundation under Grant No. 0758596. NSF CAC Semiannual Meeting, October 5 & 6, 2010.

Abstract
We propose a QoS-driven storage management scheme for high-end computing (HEC) storage systems. In this scheme, a virtualized parallel file system (PFS) partitions I/O bandwidth among applications. The virtualization layer is realized by proxies that intercept traffic between the clients and the data servers, and the I/O request scheduling algorithms are deployed on this layer, i.e., on the proxies. Because testing scheduling algorithms on a real PFS is very expensive, we have developed, as a necessary part of the project, a PFS simulator for evaluating scheduling algorithms. Current test results show that the simulator is scalable, easy to use, and able to expose system details and algorithm performance within acceptable time.

Motivation and Goals
Motivations:
- The growing popularity of parallel storage systems in HEC
- The diversity of application I/O requirements in HEC
- The lack of QoS differentiation in typical HEC parallel storage systems
Goals:
- Application-QoS-driven storage resource management in high-end computing systems

Technical Approach – Parallel File System Virtualization
- Per-application virtual PFSs, dynamically created and destroyed based on application lifecycles
- Application-specific I/O bandwidth allocation per virtual PFS
- Proxy-based PFS virtualization: indirection of parallel I/Os between PFS clients and servers; the proxies create the per-application virtual PFSs and enforce I/O resource allocation

Milestones and Deliverables
We have developed a parallel file system (PFS) simulator that models a PVFS2 system in sufficient detail for scheduling-algorithm tests.
The simulator has been tested simulating a 256-client / 32-data-server system; on a 2.0 GHz single-core desktop with 2 GB of memory, the simulated-time to run-time ratio is about 1:20 to 1:10. The simulator is easy to use: scheduling algorithms can be deployed in C++ via the data server API, and the network topology is described in script files with simple semantics.

Future Work
- Improve simulator efficiency by optimizing the system architecture
- Improve simulator scale to support more clients and servers
- Generalize the simulator by adding models of other parallel file systems, such as PanFS and Lustre
- Study the data-flow characteristics of HEC PFSs
- Develop and evaluate scheduling algorithms for HEC PFSs

Acknowledgements
Dulcardo Clavijo, Yiqi Xu, and Lixi Wang, Florida International University
Greg Ganger, Carnegie Mellon University

Experimental Results – PFS Simulator
To show that the PFS simulator can reflect the performance of different algorithms, we conducted two groups of tests.
System setup:
- 32 clients, equally divided into 2 groups
- 4 data servers and 1 metadata server
- Trace files generated by IOR; each data request size is 256 MB
- Each client has a 100 MB checkpoint write to perform
Effect of weight in SFQ(D):
- Group 1 (blue) and Group 2 (red) share the same resources (4 data servers)
- Under SFQ(D), throughput roughly reflects the weight assignment
Global fairness of DSFQ(D):
- Group 1 accesses 4 data servers, while Group 2 accesses 3 data servers
- The throughput shows that local SFQ(D) scheduling cannot achieve global fairness; in contrast, distributed SFQ(D) achieves fairness by using global scheduling information
Technical Approach – PFS Simulator
Simulated PFS network:
- Based on a discrete-event simulation library (OMNeT++ 4.0)
- Network topology, file striping strategy, and scheduling algorithms are deployed here
Simulated PFS disks:
- Based on a disk system simulator (DiskSim 4.0)
- Each DiskSim instance simulates one data server; the instances communicate with the data server modules in OMNeT++ over separate TCP connections

[Figure: virtualization architecture — compute nodes running APP1, APP2, ..., APPn reach the storage nodes through the PFS proxy, which exposes Virtual PFS1 and Virtual PFS2.]
[Figure: simulator architecture — client traces feed the OMNeT++ simulated network (scheduling algorithm, striping strategy, metadata server), which connects over TCP to the DiskSim instances, each with a local FS and disk queue.]
[Figure: measured throughput for SFQ(D) with 2:1, 10:1, and 1:1 weights, and for DSFQ(D) with 1:1 weights.]