BENCHMARK ON DELL 2950+MD1000
ATLAS Tier2/Tier3 Workshop
Wenjing Wu, AGLT2 / University of Michigan
2008/05/27
DELL MD1000
CURRENT SETUP
2950 HARDWARE EQUIPMENT
- Chassis model: PowerEdge 2950
- CPUs: quad-core Intel Xeon, CPU Model 15, Stepping 11
- Memory: 16GB DDR-II SDRAM, memory speed 667 MHz
- NICs: Broadcom NetXtreme II BCM5708 Gigabit Ethernet; Myricom 10G-PCIE-8A-C
- RAID controllers:
  - PERC 5/E Adapter (Slot 1, PCI-e x8)
  - PERC 5/E Adapter (Slot 2, PCI-e x4)
  - PERC 6/E Adapter (Slot 1, PCI-e x8) (extra $700)
  - PERC 6/E Adapter (Slot 2, PCI-e x4) (extra $700)
- Storage enclosures: 4 MD1000 (each has 15 SATA-II 750GB disks)
2950 SOFTWARE EQUIPMENT
- OS: Scientific Linux CERN SLC release 4.5 (Beryllium)
- Kernel version: UL3smp (current: UL5smp)
- Version report:
  - BIOS version: (current 2.2.6)
  - BMC version: 1.33 (current 2.0.5)
  - DRAC 5 version: 1.14 (current 1.33)
BENCHMARK TOOL
- Benchmark tool: iozone (iozone el4.rf.x86_64)
- RAID configuration tool: omconfig (srvadmin-omacore i386)
- Soft RAID: mdadm (mdadm x86_64)
METRICS OF BENCHMARK
- Controller level (both PERC 5 / PERC 6)
  - RAID setup (R0, R5, R50, R6, R60)
  - Read and write policy (ra, ara, nra, wb, wt, fwb)
  - Threshold of both controllers
  - Stripe size (8KB, 16KB, 32KB, 64KB, 128KB, 256KB, 512KB, 1024KB); PERC 5 supports a maximum 128KB stripe size, PERC 6 a maximum 1024KB stripe size
- Kernel tuning (UL3smp)
  - readahead size
  - request queue length
  - IO scheduler
- File system tuning (xfs)
  - inode size
  - su/sw size
  - internal/external log device
GENERAL PRINCIPLE FOR BENCHMARK
- Many factors affect the benchmark result; to measure one factor, we fix all the other factors at the best values we have measured or anticipate so far.
- We need to benchmark different IO patterns: sequential read/write, random read/write, and mixed workloads (see the iozone sketch below).
- Overall, we need to benchmark all the options to find the best combination for our Dell 2950.
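As a sketch of how such runs might be driven, the iozone invocations below exercise sequential and random patterns; the mount point /mnt/raid, the file names, and the thread count of 4 are assumptions, while the 32GB file size and 512KB record size follow the settings used throughout these tests.

    # Sequential write (-i 0), sequential read (-i 1), random read/write (-i 2).
    # -s 32g makes the test file twice the 16GB RAM so the page cache cannot
    # hide disk performance; -r 512k matches the record size used here.
    iozone -i 0 -i 1 -i 2 -r 512k -s 32g -f /mnt/raid/testfile

    # Multi-threaded throughput run (in -t mode the -s size is per thread);
    # /mnt/raid and the 4-thread count are illustrative assumptions.
    iozone -i 0 -i 1 -r 512k -s 8g -t 4 \
           -F /mnt/raid/f1 /mnt/raid/f2 /mnt/raid/f3 /mnt/raid/f4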
CONTROLLER LEVEL
- RAID setup (R5, R50, R6, R60)
- Read and write policy (ra, ara, nra, wb, wt, fwb)
- Threshold of controller (PERC 5 / PERC 6)
- Stripe size (8KB, 16KB, 32KB, 64KB, 128KB, 256KB, 512KB, 1024KB); PERC 5 supports a maximum 128KB stripe size, PERC 6 a maximum 1024KB stripe size
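For reference, a virtual disk with a given RAID level, stripe size, and cache policy can be created with omconfig along these lines; the controller ID and pdisk IDs below are placeholders for this setup.

    # Create a RAID-5 virtual disk on controller 0 with a 128KB stripe,
    # adaptive read-ahead and write-back cache. The pdisk IDs are
    # placeholders; list the real enclosure disks first with
    # "omreport storage pdisk controller=0".
    omconfig storage controller controller=0 action=createvdisk \
        raid=r5 size=max stripesize=128kb \
        readpolicy=ara writepolicy=wb \
        pdisk=0:0:0,0:0:1,0:0:2,0:0:3,0:0:4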
PERC 5 VS PERC 6
System setup:
- Controller: PERC 6 / PERC 5; PCI slots: both PCI-e x4 and x8
- RAID: r60/r6/r50, stripe size 128KB, read=ra, write=wb
- OS kernel: UL3smp; readahead size = 10240 blocks = 5MB; nr_queue=128; queue_depth=128; IO scheduler=deadline
- File system options: su=0, sw=0, isize=256, bsize=4096, log=internal (bsize=4096)
- iozone options: filesize=32GB (RAM size 16GB), record size=512KB, multiple threads
Measure: PERC 5 vs PERC 6
READ
WRITE
RAID SETUP
System setup:
- Controller: PERC 5 / PERC 6; PCI slots: both PCI-e x4 and x8
- Stripe size: 128KB
- OS kernel: UL3smp; readahead size = 10240 blocks = 5MB; nr_queue=128; queue_depth=128; IO scheduler=deadline
- File system options: su=0, sw=0, isize=256, bsize=4096, log=internal (bsize=4096)
- iozone options: filesize=32GB (RAM size 16GB), record size=512KB, multiple threads
Measure: different RAID levels (r5, r50, r6, r60)
WRITE
SOFT RAID ON PERC 5
- Soft RAID 0 over 2 r5: the soft RAID stripe size should be the same as the hardware RAID-5 stripe size (128KB)
- Soft RAID 0 over 2 r50: the soft RAID stripe size should be the same as the hardware RAID-5 stripe size (128KB)
(see the mdadm sketch below)
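A minimal sketch of striping two hardware r5 virtual disks together with mdadm, assuming the two PERC vdisks appear as /dev/sdb and /dev/sdc; the --chunk value matches the 128KB hardware stripe as noted above.

    # RAID 0 across two hardware RAID-5 virtual disks; --chunk is in KB
    # and matches the 128KB hardware stripe size. /dev/sdb and /dev/sdc
    # are assumed device names for the two PERC vdisks.
    mdadm --create /dev/md0 --level=0 --raid-devices=2 --chunk=128 \
          /dev/sdb /dev/sdc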
WRITE
READ
READ AND WRITE POLICY
System setup:
- Controller: PERC 5; PCI slots: both PCI-e x4 and x8
- RAID: r50, stripe size 128KB
- OS kernel: UL3smp; readahead size = 10240 blocks = 5MB; nr_queue=128; queue_depth=128; IO scheduler=deadline
- File system options: su=0, sw=0, isize=256, bsize=4096, log=internal (bsize=4096)
- iozone options: filesize=32GB (RAM size 16GB), different record sizes
Measure: different policies (ra, nra, ara, wb, wt, fwb)
WRITE
READ
PERC 5 THRESHOLD
System setup:
- Controller: PERC 5; PCI slot: PCI-e x8
- RAID: r0, stripe size 128KB, read=ra, write=wb
- OS kernel: UL3smp; readahead size = 10240 blocks = 5MB; nr_queue=128; queue_depth=128; IO scheduler=deadline
- File system options: su=0, sw=0, isize=256, bsize=4096, log=internal (bsize=4096)
- iozone options: filesize=32GB (RAM size 16GB), record size=512KB
Measure: single controller with different numbers of disks (4-30 disks)
PERC 5 THRESHOLD
PERC 6 THRESHOLD
System setup:
- Controller: PERC 6; PCI slot: PCI-e x8
- RAID: r60, stripe size 512KB, read=ra, write=wb
- OS kernel: UL3smp; readahead size = 10240 blocks = 5MB; nr_queue=512; queue_depth=128; IO scheduler=deadline
- File system options: su=0, sw=0, isize=256, bsize=4096, log=internal (bsize=4096)
- iozone options: filesize=32GB (RAM size 16GB), record size=512KB
Measure: single controller with different numbers of disks (8, 12, 24, 30, 45)
PERC 6 THRESHOLD
STRIPE SIZE
System setup:
- Controller: PERC 6; PCI slots: both PCI-e x4 and x8
- RAID: r60, read=ra, write=wb
- OS kernel: UL3smp; readahead size = 10240 blocks = 5MB; nr_queue=512; queue_depth=128; IO scheduler=deadline
- File system options: su=0, sw=0, isize=256, bsize=4096, log=internal (bsize=4096)
- iozone options: filesize=32GB (RAM size 16GB), record size=512KB, multiple threads
Measure: different stripe sizes (64, 128, 256, 512, 1024 KB)
R60 – STRIPE SIZE
R60 - STRIPE SIZE
KERNEL TUNING
- readahead size
- request queue length
- IO scheduler
READAHEAD SIZE
System setup:
- Controller: PERC 5; PCI slots: both PCI-e x4 and x8
- RAID: r50, stripe size 128KB, read=ra, write=wb
- OS kernel: UL3smp; nr_queue=128; queue_depth=128; IO scheduler=deadline
- File system options: su=0, sw=0, isize=256, bsize=4096, log=internal (bsize=4096)
- iozone options: filesize=32GB (RAM size 16GB), record size=512KB
Measure: different readahead sizes
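The readahead size can be swept with blockdev, which counts in 512-byte sectors, so the 10240-block value used elsewhere in these tests corresponds to 5MB; /dev/sdb is an assumed device name for the RAID virtual disk.

    # Read-ahead is set in 512-byte sectors: 10240 sectors = 5MB.
    # /dev/sdb is an assumed name for the RAID virtual disk.
    blockdev --setra 10240 /dev/sdb
    blockdev --getra /dev/sdb    # verify the current setting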
READ
REQUEST QUEUE LENGTH
System setup:
- Controller: PERC 6; PCI slots: both PCI-e x4 and x8
- RAID: r60, stripe size 128KB, read=ra, write=wb
- OS kernel: UL3smp; readahead size = 10240 blocks = 5MB; queue_depth=128; IO scheduler=deadline
- File system options: su=0, sw=0, isize=256, bsize=4096, log=internal (bsize=4096)
- iozone options: filesize=32GB (RAM size 16GB), record size=512KB, multiple threads
Measure: different request queue lengths
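The block-layer request queue length is exposed through sysfs on 2.6 kernels; a minimal sketch, again assuming the vdisk is /dev/sdb.

    # Request queue length (nr_requests) for the block device; 512 is
    # the value settled on for PERC 6 in these tests.
    cat /sys/block/sdb/queue/nr_requests       # current value
    echo 512 > /sys/block/sdb/queue/nr_requests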
READ
WRITE
IO SCHEDULER
System setup:
- Controller: PERC 6; PCI slots: both PCI-e x4 and x8
- RAID: r50, stripe size 128KB, read=ra, write=wb
- OS kernel: UL3smp; readahead size = 10240 blocks = 5MB; nr_queue=512; queue_depth=128
- File system options: su=0, sw=0, isize=256, bsize=4096, log=internal (bsize=4096)
- iozone options: filesize=32GB (RAM size 16GB), record size=512KB, multiple threads
Measure: different schedulers
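Schedulers can be switched per device at runtime through sysfs on 2.6 kernels; /dev/sdb is assumed as before.

    # Show the available schedulers (the active one is bracketed),
    # then select deadline, the scheduler used throughout these tests.
    cat /sys/block/sdb/queue/scheduler    # e.g. noop anticipatory [deadline] cfq
    echo deadline > /sys/block/sdb/queue/scheduler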
READ
WRITE
RANDOM READ
FILESYSTEM TUNING
- inode size
- su/sw size
- internal/external log device
FILE SYSTEM
System setup:
- Controller: PERC 5; RAID: r50; PCI slots: both PCI-e x4 and x8; stripe size 128KB
- OS kernel: UL3smp; readahead size = 10240 blocks = 5MB; nr_queue=128; queue_depth=128; IO scheduler=deadline
- File system options: su=0, sw=0, isize=256, bsize=4096
- dd options: filesize=10GB (RAM size 320MB), record size=1MB
Measure: internal or external log device for xfs
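A sketch of how the two log placements might be created with mkfs.xfs, assuming the data vdisk is /dev/sdb and a separate small device /dev/sdc1 holds the external log; the 64MB log size is also an assumption.

    # Internal log (the mkfs.xfs default): the metadata log lives on
    # the data device itself.
    mkfs.xfs -i size=256 -b size=4096 /dev/sdb

    # External log on a separate device, so log writes do not compete
    # with data IO; mount later with -o logdev=/dev/sdc1.
    mkfs.xfs -i size=256 -b size=4096 -l logdev=/dev/sdc1,size=64m /dev/sdb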
WRITE
READ
XFS INODE SIZE
System setup:
- Controller: PERC 5; RAID: r50; PCI slots: both PCI-e x4 and x8; stripe size 128KB
- OS kernel: UL3smp; readahead size = 10240 blocks = 5MB; nr_queue=128; queue_depth=128; IO scheduler=deadline
- File system options: su=0, sw=0, bsize=4096, internal log, isize=256
- dd options: filesize=10GB (RAM size 320MB), record size=1MB
Measure: xfs inode size
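Varying the inode size only requires changing -i size at mkfs time; valid values are powers of two from 256 bytes up to half the filesystem block size. A sketch of the sweep, with /dev/sdb again an assumed device name:

    # Sweep the inode size at mkfs time; 256 is the default on this
    # kernel, and larger inodes keep more extent/attribute data inline.
    for isize in 256 512 1024 2048; do
        mkfs.xfs -f -i size=$isize -b size=4096 /dev/sdb
        # ... mount, run the dd test, unmount ...
    done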
XFS INODE SIZE
XFS SU/SW SIZE
System setup:
- Controller: PERC 5; RAID: r50; PCI slots: both PCI-e x4 and x8; stripe size 128KB
- OS kernel: UL3smp; readahead size = 10240 blocks = 5MB; nr_queue=128; queue_depth=128; IO scheduler=deadline
- File system options: isize=256, bsize=4096, internal log
- iozone options: filesize=10GB (RAM size 320MB), record size=1MB
Measure: xfs su/sw size
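Here su is the hardware stripe unit and sw the number of data-bearing disks in the stripe. A sketch for a 7-disk r5 set with the 128KB stripe used above (6 data disks plus one parity); the disk count and /dev/sdb are assumptions for illustration.

    # Align XFS allocation to the hardware RAID geometry: su = stripe
    # unit (128KB here), sw = number of data disks. A 7-disk RAID-5
    # has 6 data disks, so sw=6; the disk count is an assumed example.
    mkfs.xfs -f -i size=256 -b size=4096 -d su=128k,sw=6 /dev/sdb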
SU/SW SIZE
OUR SETUP NOW
System setup:
- Controller: PERC 6; RAID: r60; PCI slots: both PCI-e x4 and x8; stripe size 512KB
- OS kernel: UL5smp
- Kernel options: readahead size = 10240 blocks = 5MB; nr_queue=512; queue_depth=128; IO scheduler=deadline
- File system options: isize=256, bsize=4096, internal log
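Pulled together, the kernel side of this production setup amounts to a few lines that could run at boot; /dev/sdb stands in for the r60 virtual disk.

    # Apply the production kernel tuning at boot; /dev/sdb is an
    # assumed name for the PERC 6 r60 virtual disk.
    blockdev --setra 10240 /dev/sdb                      # 5MB read-ahead
    echo 512      > /sys/block/sdb/queue/nr_requests     # request queue length
    echo deadline > /sys/block/sdb/queue/scheduler       # IO scheduler
    echo 128      > /sys/block/sdb/device/queue_depth    # if the device supports it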
OUR PERFORMANCE NOW
- Single read = 670MB/s; aggregate read = 1500MB/s (threads >= 2). Even with 40 concurrent readers, it still achieves 1200MB/s.
- Single write = 320MB/s; aggregate write = 680MB/s (threads >= 2).
- This is not the best single-stream IO: r60 with a 128KB stripe size achieves 760MB/s single read, and single write performs almost the same. For a production system, however, we focus more on aggregate performance.
ONGOING PROJECT
CITI people at UM are working on:
- Disk-to-disk transfer over 10 GbE
  Deliverables:
  - Monthly report on performance tests, server configurations, kernel tuning, and kernel bottlenecks
  - Final report on performance tests, server configurations, kernel tuning, and kernel bottlenecks
- UltraLight kernel
  Deliverables:
  - Tuned and tested UltraLight kernel with full feature set:
    - Current 10GbE NIC drivers
    - Current storage drivers
    - Tuned for WAN data movement
    - Web100 patches
    - Other patches for performance, security, and stability
  - Release document and web page updates for the UltraLight kernel
  - Recommend sustainable options for the UltraLight kernel in the near and intermediate term
ONGOING PROJECT (CONT)
- QoS experiments
  Deliverable:
  - Document throughput performance with and without QoS in the face of competing traffic
MORE INFORMATION
- AGLT2 IO benchmark page: stOnRaidSystems
- References: ml