Summer 2002 at SLAC Ajay Tirumala
Main Projects
Measuring disk throughputs on remote hosts, considering parameters such as:
  File system
  Read/write block size
  Sequential/random reads and writes
  Commit sequence for writes
  File sizes
Iperf QUICK mode
  A new algorithm that reduces the time needed to measure end-to-end bandwidth, and thus also the network traffic generated
Summer 2002 at SLAC – Ajay Tirumala, Aug 23rd, 2002
Disk Throughputs: File Systems
NFS uses the client's main memory as cache.
  Data can be lost during reads/writes, so small-sized reads and frequent commits are needed.
AFS uses session semantics.
  The local disk is the cache.
UFS: default file system for Solaris.
  fwrites write to the disk buffer, which is committed to disk on fsync, when the buffer is full, or when disk caching is disabled.
EXT: most popular file system for Linux.
  Layer below the VFS.
  Has the concept of pre-allocation (allotting up to 8 adjacent file blocks when a block is requested).
  A mount option is available for greater write speeds (with less consistency).
Disk Reads
The first read will necessitate a disk read in most cases.
  A first read served from memory indicates either minimal memory activity or a very large memory, since the tests are performed at intervals of days.
The second read (performed immediately after the first) will generally be served from memory, unless disk caching is disabled.
Since there is a good probability that even the first read comes from memory, we consider disk writes the primary metric for disk speeds.
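The first-read versus second-read comparison can be illustrated with a small timing sketch. This is not the tool used for the tests; the function name `read_throughput` and the 64 KB default block size are assumptions made for illustration.

```python
import os
import tempfile
import time


def read_throughput(path, block_size=64 * 1024):
    """Read the whole file sequentially; return (bytes_read, MB/s)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 avoids Python-level buffering; the OS page cache
    # may still serve the data from memory.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total, (total / 1e6) / elapsed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sample.bin")
        with open(path, "wb") as f:
            f.write(b"\0" * (4 * 1024 * 1024))
        # Both reads here are likely to come from memory, because the
        # file was just written; this is the caveat the slide describes.
        print("first read :", read_throughput(path))
        print("second read:", read_throughput(path))
```

A large gap between the two numbers suggests the first read really hit the disk; similar numbers suggest both were served from cache.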
Disk Writes
Commit modes (fsync was used to commit files to disk):
  Plain (no commit)
  Commit each write
  Commit at end: most indicative of the achievable disk bandwidth
Block sizes:
  For local disks, use large block sizes (1-2 MB)
  For remote writes, 64 KB or 128 KB will suffice
File sizes:
  Using a large file size (2 GB) increased the throughput in some cases. The default was 64 MB.
Caution: NFS may not return an error during fwrites; it may return an error only on an fsync.
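The three commit modes can be sketched as follows. This is a minimal illustration, not the measurement program from the talk; the function name `write_throughput` and the mode names "none", "each" and "end" are assumptions. It uses `os.write` and `os.fsync` from the Python standard library.

```python
import os
import tempfile
import time


def write_throughput(path, file_size, block_size, commit="end"):
    """Write file_size bytes in block_size chunks; return MB/s.

    commit: "none" (plain), "each" (fsync after every write),
            or "end" (one fsync after the last write).
    """
    if commit not in ("none", "each", "end"):
        raise ValueError("unknown commit mode: " + commit)
    block = b"\0" * block_size
    remaining = file_size
    start = time.perf_counter()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        while remaining > 0:
            # os.write may write fewer bytes than requested.
            written = os.write(fd, block[:min(block_size, remaining)])
            remaining -= written
            if commit == "each":
                os.fsync(fd)
        if commit == "end":
            # On NFS, a write error may only surface at this call.
            os.fsync(fd)
    finally:
        os.close(fd)
    elapsed = max(time.perf_counter() - start, 1e-9)
    return (file_size / 1e6) / elapsed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "test.bin")
        for mode in ("none", "each", "end"):
            rate = write_throughput(target, 8 * 2**20, 64 * 2**10, mode)
            print(mode, round(rate, 1), "MB/s")
```

The timer includes the final fsync, so the "end" figure reflects data that has actually been committed rather than data still sitting in a buffer.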
Possible areas to investigate
Could consider different disk subsystems, such as RAID.
Analysis of parallel disk transfers using BBCP.
  Initial tests have indicated that in cases where the disk is the limiting factor, using a single thread is the best option.
Algorithm to estimate disk speeds without using large writes*.
  Manufacturers' specs lose meaning with network file systems, and even for local file systems with multiple disks.
Iperf QUICK Mode
Problem
  Current TCP applications cannot detect when they are out of slow-start.
  Bandwidth measurement applications have to run for a considerable time to counter the effects of slow-start.
Solution
  Use Web100 to detect the end of slow-start.
  Measure bandwidth for a short period after slow-start (say 1 s).
  This should save about 90% of the estimation time and of the traffic generated.
Detecting end of Slow-start
Outline
  Determine a sampling period for the congestion window.
  Detect the absence of exponential increase every RTT.
  Handle pathological cases:
    The connection may not get out of slow-start.
    Multiple slow-starts.
    The connection may have a very small bandwidth-delay product, e.g. localhost transfers, with latency in nanoseconds.
At present, it handles Reno and Vegas.
  It should handle Net100/Floyd stacks with minor modifications.
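The "absence of exponential increase" test can be sketched on a list of congestion-window samples taken once per RTT. This is an illustrative heuristic, not the detection code in Iperf; the function name `slow_start_end` and the parameters `growth` and `settle` are assumptions, and no Web100 calls are shown. The small bandwidth-delay product case is not handled here.

```python
def slow_start_end(cwnd_per_rtt, growth=1.5, settle=3):
    """Return the index of the first sample of a run of `settle`
    consecutive RTTs without exponential growth, or None.

    cwnd_per_rtt: congestion-window samples, one per RTT.
    growth: RTT-to-RTT ratio below which growth is treated as
            non-exponential (slow-start roughly doubles cwnd per RTT).
    """
    calm = 0
    for i in range(1, len(cwnd_per_rtt)):
        prev, cur = cwnd_per_rtt[i - 1], cwnd_per_rtt[i]
        if prev > 0 and cur / prev >= growth:
            calm = 0  # still exponential, or another slow-start began
        else:
            calm += 1
            if calm == settle:
                return i - settle + 1
    return None  # the connection never left slow-start in this trace


if __name__ == "__main__":
    print(slow_start_end([2, 4, 8, 16, 32, 33, 34, 35, 36]))
    print(slow_start_end([1, 2, 4, 8, 16]))
```

Resetting the counter whenever exponential growth reappears is one simple way to cope with multiple slow-starts: the function only reports the run that follows the last one seen.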
The Quick mode Algorithm
Initialize the Iperf sockets and the Web100 connection for the Iperf socket.
Start the Web100 data collection thread.
  This will indicate when the connection is definitely out of slow-start.
Detect the end of slow-start in the data transfer thread.
  If the congestion window does not stabilize, do NOT report QUICK mode results.
Measure bandwidth for 1 s (or a user-specified time) after slow-start.
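The final measurement step can be illustrated offline on recorded samples. This is a sketch under stated assumptions, not Iperf's implementation: the function name `quick_mode_bandwidth`, the `(time, total_bytes_sent)` sample format and the `ss_end_time` argument are all hypothetical, and the sockets, threads and Web100 calls are left out.

```python
def quick_mode_bandwidth(samples, ss_end_time, measure_for=1.0):
    """Estimate bandwidth in bits/s over a window after slow-start.

    samples: list of (time_s, total_bytes_sent), in ascending time order.
    ss_end_time: time at which slow-start was judged to have ended,
                 or None if the congestion window never stabilized.
    measure_for: length of the measurement window in seconds.
    Returns None when no QUICK mode result should be reported.
    """
    if ss_end_time is None:
        return None  # do NOT report QUICK mode results
    window = [(t, b) for t, b in samples
              if ss_end_time <= t <= ss_end_time + measure_for]
    if len(window) < 2:
        return None  # not enough data after slow-start
    (t0, b0), (t1, b1) = window[0], window[-1]
    if t1 <= t0:
        return None
    return 8 * (b1 - b0) / (t1 - t0)


if __name__ == "__main__":
    trace = [(0.0, 0), (0.5, 100000), (1.0, 300000),
             (1.5, 800000), (2.0, 1300000), (2.5, 1800000)]
    print(quick_mode_bandwidth(trace, ss_end_time=1.0))
```

Bytes sent during slow-start are excluded from the estimate, which is why a short window can approximate the steady-state rate.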
Salient results
Slow-start can last:
  From 0.2 seconds on low-latency networks
  Up to 5 seconds on long-haul, high-bandwidth networks, where using Iperf in QUICK mode gives the maximum gains
Unless we use QUICK mode, we can never be sure that the connection is out of slow-start.
QUICK mode throughputs differ from those of 20 s Iperf runs by less than 10%.
Tests were even performed on dialup links (as receiver), with good results.
Web100 experiences
A must-use tool (I'm a fan).
User APIs can be improved.
Behaves well for a sampling time of 20 ms.
Possible areas to investigate
Integrate with BW tests.
Perform tests with slow senders.
Empirical estimates immediately after slow-start, using RTT and the rate of increase of the congestion window.
Links
Disk :
Iperf Quick mode :
Documentation and results of tests with all IEPM-BW managed nodes are available from these links.
Other stuff…
Miniperf is a small Iperf-like program written to monitor user-specified Web100 variable(s).
  Allows setting window sizes and test times
  Can include parallel thread functionality
  Generates graphs (rate based, sum based)
  Generates HTML
Created a single Iperf version to run on IPv4/v6 (Web100)/(no Web100).
Thank you!!!