Profiling Grid Data Transfer Protocols and Servers
George Kola, Tevfik Kosar and Miron Livny
University of Wisconsin-Madison, USA
2/33 Motivation
Scientific experiments are generating large amounts of data
Education research & commercial videos are not far behind
Data may be generated and stored at multiple sites
How to efficiently store and process this data?

Application   First Data   Data Volume (TB/yr)   Users
SDSS          1999         10                    100s
LIGO          2002         250                   100s
ATLAS/CMS     2005         5,000                 1000s
WCER          2004         500+                  100s
Source: GriPhyN Proposal, 2000
3/33 Motivation
Grid enables large scale computation
Problems:
Data intensive applications have suboptimal performance
Scaling up creates problems
Storage servers thrash and crash
Users want to reduce failure rate and improve throughput
4/33 Profiling Protocols and Servers
Profiling is a first step
Enables us to understand how time is spent
Gives valuable insights
Helps:
computer architects add processor features
OS designers add OS features
middleware developers optimize the middleware
application designers design adaptive applications
5/33 Profiling
We (middleware designers) are aiming for automated tuning
Tune protocol parameters, concurrency level
Depends on dynamic state of network, storage server
We are developing low overhead online analysis
Detailed offline + online analysis would enable automated tuning
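The automated tuning loop described on this slide can be sketched as a simple search over candidate settings. This is an illustrative sketch only, not the authors' tuner: `measure_throughput` is a hypothetical probe standing in for whatever online measurement the middleware collects about the network and storage server.

```python
def tune_concurrency(measure_throughput, levels=(1, 2, 4, 8, 16)):
    """Pick the concurrency level with the best measured throughput.

    measure_throughput(level) is a hypothetical probe that runs a short
    transfer at the given concurrency and returns a rate in MB/s; a real
    tuner would re-run this as the dynamic state of the system changes.
    """
    best_level, best_rate = None, float("-inf")
    for level in levels:
        rate = measure_throughput(level)
        if rate > best_rate:
            best_level, best_rate = level, rate
    return best_level, best_rate

# Usage with a mock probe whose throughput peaks at concurrency 4
mock_rates = {1: 5.0, 2: 7.0, 4: 9.0, 8: 8.0, 16: 6.0}
print(tune_concurrency(lambda level: mock_rates[level]))  # (4, 9.0)
```

A real tuner would also bound how often it re-probes, since each probe perturbs the very server load it is measuring.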
6/33 Profiling
Requirements:
Should not alter system characteristics
Full system profile
Low overhead
Used OProfile
Based on Digital Continuous Profiling Infrastructure
Kernel profiling
No instrumentation
Low overhead/tunable overhead
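A full-system OProfile run of the kind described here might look as follows, assuming the legacy `opcontrol` interface that OProfile shipped with in this era (the kernel image path is an assumption; it varies by distribution). This is a CLI sketch, not the authors' exact invocation.

```shell
# Point OProfile at the kernel image so kernel-mode samples are attributed
opcontrol --setup --vmlinux=/boot/vmlinux
opcontrol --start            # begin system-wide, sampling-based profiling
# ... run the GridFTP/NeST transfer workload under test ...
opcontrol --stop
opreport --symbols           # per-symbol time breakdown across kernel and daemons
```

Because OProfile samples hardware performance counters rather than instrumenting code, the overhead stays low and tunable via the sampling interval, which is the "should not alter system characteristics" requirement above.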
7/33 Profiling Setup
Two server machines:
Moderate server: 1660 MHz Athlon XP CPU with 512 MB RAM
Powerful server: dual Pentium 4 Xeon 2.4 GHz CPUs with 1 GB RAM
Client machines were more powerful (dual Xeons), to isolate server performance
100 Mbps network connectivity
Linux kernel 2.4.20, GridFTP server 2.4.3, NeST prerelease
8/33 GridFTP Profile
Read rate = 6.45 MB/s, write rate = 7.83 MB/s
=> Writes to the server are faster than reads from it
9/33 GridFTP Profile
Writes to the network more expensive than reads
=> Interrupt coalescing
10/33 GridFTP Profile IDE reads more expensive than writes
11/33 GridFTP Profile File system writes costlier than reads => Need to allocate disk blocks
12/33 GridFTP Profile More overhead for writes because of higher transfer rate
13/33 GridFTP Profile Summary
Writes to the network more expensive than reads
Interrupt coalescing
DMA would help
IDE reads more expensive than writes
Tuning the disk elevator algorithm would help
Writing to the file system is costlier than reading
Need to allocate disk blocks
Larger block size would help
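The summary notes that file-system writes pay a disk-block allocation cost mid-transfer. One way to sidestep that cost, shown here as an illustrative sketch rather than anything the profiled servers do, is to preallocate the file's blocks before streaming data in, e.g. with `os.posix_fallocate` (Linux-only):

```python
import os
import tempfile

def preallocate(path, size):
    """Reserve disk blocks for `size` bytes up front, so later writes
    do not pay block-allocation cost in the middle of a transfer."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        # posix_fallocate allocates the blocks and extends the file length
        os.posix_fallocate(fd, 0, size)
    finally:
        os.close(fd)

# Example: reserve 1 MB before a transfer starts writing into the file
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.close()
preallocate(tmp.name, 1 << 20)
print(os.path.getsize(tmp.name))  # 1048576
os.unlink(tmp.name)
```

Unlike `truncate`, which produces a sparse file, `posix_fallocate` actually assigns blocks, which is the work the slide identifies as making writes costlier than reads.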
14/33 NeST Profile
Read rate = 7.69 MB/s, write rate = 5.5 MB/s
15/33 NeST Profile Similar trend as GridFTP
16/33 NeST Profile More overhead for reads because of higher transfer rate
17/33 NeST Profile
Metadata updates (space allocation) make NeST writes more expensive
18/33 GridFTP versus NeST
GridFTP: read rate = 6.45 MB/s, write rate = 7.83 MB/s
NeST: read rate = 7.69 MB/s, write rate = 5.5 MB/s
GridFTP is 16% slower on reads
GridFTP I/O block size is 1 MB (NeST: 64 KB)
Non-overlap of disk I/O & network I/O
NeST is 30% slower on writes
NeST lots (space reservation/allocation)
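The non-overlap of disk I/O and network I/O blamed here can be removed by double buffering: a reader thread fills a bounded queue while the sender drains it, so the next disk read proceeds while the previous block is on the wire. This is an illustrative sketch, not GridFTP's actual implementation; in-memory streams stand in for the disk file and the socket, and the 64 KB block size mirrors NeST's figure from this slide.

```python
import io
import queue
import threading

BLOCK = 64 * 1024  # 64 KB blocks, per the NeST figure above

def overlapped_copy(src, dst, block_size=BLOCK, depth=4):
    """Copy src -> dst with disk reads overlapped against network writes."""
    q = queue.Queue(maxsize=depth)  # bounded, so the reader cannot run away

    def reader():
        while True:
            chunk = src.read(block_size)
            q.put(chunk)
            if not chunk:        # empty read is the EOF sentinel
                return

    t = threading.Thread(target=reader)
    t.start()
    total = 0
    while True:
        chunk = q.get()
        if not chunk:
            break
        dst.write(chunk)         # "network" write overlaps the next read
        total += len(chunk)
    t.join()
    return total

# Example: copy 256 KB through the double-buffered pipeline
data = b"x" * (256 * 1024)
out = io.BytesIO()
print(overlapped_copy(io.BytesIO(data), out))  # 262144
```

With real file and socket objects, the queue depth bounds memory use while keeping both the disk and the NIC busy at once.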
19/33 Effect of Protocol Parameters
Different tunable parameters:
I/O block size
TCP buffer size
Number of parallel streams
Number of concurrent transfers
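As a worked example for one of these knobs: a common starting point for the TCP buffer size is the bandwidth-delay product of the path. On the testbed's 100 Mbps link with the 10 ms wide-area round-trip time used later in the talk, that comes to 125,000 bytes. This is the standard formula, not a value the paper prescribes.

```python
def bdp_bytes(bandwidth_bps, rtt_seconds):
    """Bandwidth-delay product: bytes that must be in flight to fill the pipe."""
    return int(bandwidth_bps / 8 * rtt_seconds)

# 100 Mbps link, 10 ms round-trip time
print(bdp_bytes(100e6, 0.010))  # 125000
```

With N parallel streams sharing the path, each stream needs roughly BDP/N of buffer, which is one reason parallel streams help when per-connection buffers are capped below the BDP.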
20/33 Read Transfer Rate
21/33 Server CPU Load on Read
22/33 Write Transfer Rate
23/33 Server CPU Load on Write
24/33 Transfer Rate and CPU Load
25/33 Server CPU Load and L2 DTLB misses
26/33 L2 DTLB Misses
Parallelism triggers the kernel to use a larger page size => fewer DTLB misses
27/33 Profiles on the Powerful Server
The next set of graphs was obtained using the powerful server
28/33 Parallel Streams versus Concurrency
29/33 Effect of File Size (Local Area)
30/33 Transfer Rate versus Parallelism in Short Latency (10 ms) Wide Area
31/33 Server CPU Utilization
32/33 Conclusion
Full system profile gives valuable insights
Larger I/O block size may lower transfer rate
Network, disk I/O not overlapped
Parallelism may reduce CPU load
May cause kernel to use larger page size
Processor feature for variable sized pages would be useful
Operating system support for variable page size would be useful
Concurrency improves throughput at increased server load
33/33 Questions
Contact: kola@cs.wisc.edu
www.cs.wisc.edu/condor/publications.html