First Look at the New NFSv4.1-Based dCache
Art Kreymer, Stephan Lammel, Margaret Votava, and Michael Wang, for the CD-REX Department
CD Scientific Computing Facilities Leaders Meeting, 13 December 2011

First look at dCache with NFSv Michael Wang

Introduction
– Investigate alternatives to the BlueArc-based IF central disk servers
    BlueArc (BA) has performed fairly well for the currently modest requirements of the IF experiments (~0.5 PB)
    Will it continue to satisfy the growing requirements of the IF experiments in the coming years in a reasonable, cost-effective way?
– Started surveying the available storage options
    NFSv4.1: all the rage among the major storage vendors (Panasas, IBM, EMC, NetApp, even BlueArc)
    Despite all the hype, no stable server implementation was readily available for evaluation
– Stumbled upon a web presentation by the DESY dCache team
    Described a stable NFSv4.1 implementation in a new Chimera-based version of dCache
    All the nice features of the old dCache, PLUS all files in the exported filesystem tree are now directly accessible (POSIX compliant) without special protocols (like DCAP)!
    i.e. the dCache filesystem can now appear and behave like a regular NFS-accessible area on a worker node

Introduction
– Approached our local dCache experts
    REX and DMS held a meeting where DMS gave an overview of the new dCache
    The DMS department set up a test dCache system (version ) for us to evaluate (many thanks to Dmitry Litvintsev, Yujun Wu, Terry Jones, Stan Naymola and Gene Oleynik from DMS for their support)
– Brief overview of this talk
    Description of the test setup
    Some initial test results
    Focus is on technical I/O performance:
        no discussion of other nice features of NFSv4.1 (e.g. ACLs)
        no cost comparisons or studies (relative to BA)

Test setup
– Client side:
    SLF6 virtual machines on Fermicloud (many thanks to Steve Timm and Farooq Lowe of the Fermigrid Dept.)
    Linux kernel (a renamed 3.0 kernel)
– Server side:
    dCache with one head node and two pool nodes
    Each pool node has 2 RAID6 partitions with 4x250GB SATA drives each
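
As a rough aid for reading the server-side numbers above, the usable pool capacity can be estimated from the drive counts. This is only a sketch: it assumes each RAID6 partition is a single array that gives up two drives' worth of space to parity, that sizes are decimal GB, and that filesystem overhead can be ignored. The function and constant names are illustrative and do not come from the talk.

```python
def raid6_usable_gb(drives: int, drive_gb: float) -> float:
    """Usable capacity of one RAID6 array: two drives' worth goes to parity."""
    if drives < 4:
        raise ValueError("RAID6 needs at least 4 drives")
    return (drives - 2) * drive_gb


# Values taken from the test setup slide.
POOL_NODES = 2
PARTITIONS_PER_NODE = 2
DRIVES_PER_PARTITION = 4
DRIVE_GB = 250

per_partition_gb = raid6_usable_gb(DRIVES_PER_PARTITION, DRIVE_GB)
total_gb = per_partition_gb * PARTITIONS_PER_NODE * POOL_NODES

print(f"{per_partition_gb:.0f} GB per partition, {total_gb:.0f} GB total")
```

Under these assumptions each partition offers about 500 GB, or roughly 2 TB across the four partitions.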

Throughput test results
IOzone in cluster mode with sequential write and read tests.
Plot annotations:
    One client per VM, one 4GB file transferred per client.
    For runs with more than 10 clients: multiple clients per VM, with the aggregate data transferred held fixed at 40GB.
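
The client scaling described on this slide can be written out as a small planning helper. This is a hedged sketch, not the script used for the tests: the count of 10 VMs is inferred from 10 clients at 4GB each adding up to 40GB, and the even split of the fixed 40GB across clients beyond 10 is an assumption. `Plan` and `plan_run` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

AGGREGATE_GB = 40.0   # from the slide: aggregate data fixed at 40GB
BASE_FILE_GB = 4.0    # from the slide: one 4GB file per client
NUM_VMS = 10          # assumption: inferred from 10 clients x 4GB = 40GB


@dataclass
class Plan:
    clients: int
    clients_per_vm: List[int]  # how many clients run on each VM
    file_gb: float             # file size written/read by each client


def plan_run(clients: int) -> Plan:
    """Lay out one test run for the given number of clients."""
    if clients < 1:
        raise ValueError("need at least one client")
    if clients <= NUM_VMS:
        # One client per VM, each transferring one 4GB file.
        return Plan(clients, [1] * clients, BASE_FILE_GB)
    # Beyond 10 clients: spread clients over the VMs as evenly as possible
    # and shrink the per-client file so the aggregate stays at 40GB.
    base, extra = divmod(clients, NUM_VMS)
    per_vm = [base + (1 if i < extra else 0) for i in range(NUM_VMS)]
    return Plan(clients, per_vm, AGGREGATE_GB / clients)


for n in (10, 20, 40):
    p = plan_run(n)
    print(n, max(p.clients_per_vm), p.file_gb)
```

Holding the aggregate fixed keeps the total amount of data comparable across runs, so differences in throughput reflect client concurrency rather than data volume.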

Monitoring pool node disk activity during IOzone test
[Figure: strip chart recordings of disk write rate and disk read rate; (x-axis, y-axis) = (time, MB/sec). Panels: Pool node 1 Partition A, Pool node 1 Partition B, Pool node 2 Partition A, Pool node 2 Partition B.]
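
The slides do not say which tool produced the strip charts. As an illustration only, the sketch below shows one way MB/sec read and write rates can be derived on a Linux pool node from two samples of the cumulative sector counters in `/proc/diskstats`, which are reported in 512-byte sectors. The sample lines and the device name `sda` are made up for the example.

```python
SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def parse_diskstats_line(line: str):
    """Return (device, sectors_read, sectors_written) from one diskstats line."""
    fields = line.split()
    # Fields: major, minor, name, reads, reads merged, sectors read,
    # ms reading, writes, writes merged, sectors written, ...
    return fields[2], int(fields[5]), int(fields[9])


def rate_mb_per_s(sectors_before: int, sectors_after: int, interval_s: float) -> float:
    """Average transfer rate in MB/sec between two counter samples."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return (sectors_after - sectors_before) * SECTOR_BYTES / 1e6 / interval_s


# Two made-up samples of the same device, taken 10 seconds apart.
before = "   8       0 sda 1000 0 2000000 500 3000 0 4000000 900 0 1000 1400"
after = "   8       0 sda 1500 0 2400000 600 5000 0 6000000 1500 0 2000 2100"

_, r0, w0 = parse_diskstats_line(before)
_, r1, w1 = parse_diskstats_line(after)

read_rate = rate_mb_per_s(r0, r1, 10.0)
write_rate = rate_mb_per_s(w0, w1, 10.0)
print(f"read {read_rate:.2f} MB/sec, write {write_rate:.2f} MB/sec")
```

Sampling each partition's device at a fixed interval and plotting these two rates against time gives a chart with the same axes as the ones on this slide.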

Metadata test results
Mdtest with multiple MPI tasks, each creating, "stat"-ing, and removing 100 directories and zero-length files.
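
To make the metadata workload concrete, here is a minimal single-process sketch of the create/stat/remove pattern described above. It is not mdtest and uses no MPI; it only mimics what one task does, using the Python standard library, and reports operations per second. The helper names and the use of a temporary directory are assumptions made for illustration; to exercise a real mount, `root` would point at a directory on that filesystem.

```python
import os
import tempfile
import time


def ops_per_second(op, paths):
    """Apply op to every path and return the achieved operations per second."""
    start = time.perf_counter()
    for path in paths:
        op(path)
    elapsed = time.perf_counter() - start
    return len(paths) / elapsed if elapsed > 0 else float("inf")


def touch(path):
    """Create a zero-length file."""
    with open(path, "w"):
        pass


def metadata_rates(root, n=100):
    """Create, stat, and remove n directories and n zero-length files under root."""
    dirs = [os.path.join(root, f"dir.{i}") for i in range(n)]
    files = [os.path.join(root, f"file.{i}") for i in range(n)]
    # Dict values are evaluated in order, so each create runs before its stat/remove.
    return {
        "dir create": ops_per_second(os.mkdir, dirs),
        "dir stat": ops_per_second(os.stat, dirs),
        "dir remove": ops_per_second(os.rmdir, dirs),
        "file create": ops_per_second(touch, files),
        "file stat": ops_per_second(os.stat, files),
        "file remove": ops_per_second(os.remove, files),
    }


with tempfile.TemporaryDirectory() as demo_root:
    for name, rate in metadata_rates(demo_root).items():
        print(f"{name:12s} {rate:12.1f} ops/sec")
```

Because every operation touches only metadata, the rates measure how quickly the namespace service responds rather than how fast data moves to the pool nodes.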

Conclusion
– Presented some preliminary test results on the new NFSv4.1 Chimera-based version of dCache
– Results look promising; throughput scales well with the number of pool nodes
– Metadata performance may be adequate for now but could become a concern in the future (need to consult and discuss with the developers)
– Will do more "real-world" tests, e.g. with Art Kreymer's BlueArc performance monitoring scripts
– More details can be found in a write-up in CD DocDB (CS-doc-4583)
– Details on setting up VM clients with pNFS-enabled Linux kernels are available on the Fermi Redmine IF-storage project Wiki
– Many thanks to the DMS and Fermigrid Depts. for their unwavering support!

End

Monitoring pool node disk activity
[Figure panels: Pool node 1 Partition A, Pool node 1 Partition B, Pool node 2 Partition A, Pool node 2 Partition B.]