1
Samuel Wood, Manikandan Punniyakotti
Supervisors: Brad Smith, Katia Obraczka, JJ Garcia-Luna-Aceves
http://gnet.soe.ucsc.edu
2
As genome sequencing becomes cheaper and more frequent, researchers need quick methods of transferring large datasets over long distances for collaboration.
- 1000 Genomes Project
- Data transfers of dozens of genomes between data centers on opposite sides of the United States are daily occurrences.
- We examine genomic data transfer between hosts on a high-speed network such as Internet2, with long round-trip times (RTTs).
- Such networks are called Long Fat Networks (LFNs) because of their large bandwidth-delay product.
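To make the bandwidth-delay product concrete, here is a minimal sketch that computes it for the two RTTs studied later in this deck (about 42 ms and 92 ms). The 1 Gbps link rate is an assumption for illustration, and `bdp_bytes` is a hypothetical helper name.

```python
def bdp_bytes(bandwidth_bps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product in bytes: (bits per second) * (seconds) / 8."""
    return bandwidth_bps * (rtt_ms / 1000.0) / 8.0


if __name__ == "__main__":
    link_bps = 1e9  # assumed 1 Gbps path, for illustration only
    for rtt_ms in (42, 92):
        mib = bdp_bytes(link_bps, rtt_ms) / 2**20
        print(f"{link_bps / 1e9:.0f} Gbps, RTT {rtt_ms} ms -> BDP of about {mib:.1f} MiB")
```

The BDP is the amount of data that must be in flight to keep the path full, which is why it also sets the scale for the end-host buffers.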
3
TCP is popular for unicast communication and ships with commodity operating systems and networking APIs. For long-haul, high-bandwidth networks (Long Fat Networks), however, commodity TCP has been found less suitable because:
(i) TCP's conservative congestion control mechanisms reduce throughput heavily when there are errors
(ii) Reliability is provided through ACKs and retransmissions, so the latency of a packet recovery is at least one RTT
(iii) Huge buffers are needed at the end hosts to fully utilize the capacity
Solutions:
- Tuning the TCP parameters at the end hosts
- Using better congestion control algorithms
- Using sophisticated data transfer tools
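Points (i) and (iii) can be illustrated with two rough throughput bounds. The sketch below is not from the slides: it assumes the standard window/RTT limit and the well-known Mathis et al. approximation for loss-limited TCP throughput, and the window size, MSS, and loss rate are illustrative values.

```python
import math


def window_limited_bps(window_bytes: float, rtt_ms: float) -> float:
    """At most one window of data can be delivered per round trip."""
    return window_bytes * 8 / (rtt_ms / 1000.0)


def mathis_bps(mss_bytes: float, rtt_ms: float, loss_rate: float,
               c: float = math.sqrt(1.5)) -> float:
    """Mathis approximation: rate ~ C * MSS / (RTT * sqrt(p))."""
    return c * mss_bytes * 8 / ((rtt_ms / 1000.0) * math.sqrt(loss_rate))


if __name__ == "__main__":
    rtt_ms = 92  # RTT of the longer path in this study
    # Assumed values: 64 KiB window, 1460-byte MSS, loss rate 1e-4.
    print(f"64 KiB window: {window_limited_bps(64 * 1024, rtt_ms) / 1e6:.1f} Mbps")
    print(f"loss 1e-4:     {mathis_bps(1460, rtt_ms, 1e-4) / 1e6:.1f} Mbps")
```

Both bounds shrink as the RTT grows, which is the reason a configuration that performs well on a LAN can fall far short on an LFN.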
4
Provide a set of guidelines for the end hosts of a large genomic data transfer that reduce the total transmission duration (assuming an immutable intermediate network).
Secondary goals include providing secure encryption and network fairness.
5
TCP tuning refers to adjusting the TCP parameters in the kernel.
- Most applications do not try to understand the network => TCP auto-tuning with pre-configured limits
- Some default values are not optimized for LFNs
- Fasterdata: changes to the TCP kernel settings in /etc/sysctl.conf to improve TCP auto-tuning
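Before changing anything, it helps to see what the kernel is currently using. The following is a minimal sketch, assuming a Linux host where each sysctl key is exposed as a file under /proc/sys (dots in the key become path separators). `sysctl_path` and `read_sysctl` are hypothetical helper names.

```python
from pathlib import Path
from typing import Optional


def sysctl_path(key: str, root: str = "/proc/sys") -> Path:
    """Map a sysctl key such as 'net.ipv4.tcp_rmem' to its /proc/sys file."""
    return Path(root, *key.split("."))


def read_sysctl(key: str, root: str = "/proc/sys") -> Optional[str]:
    """Return the current value as text, or None if it cannot be read."""
    try:
        return sysctl_path(key, root).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    for key in ("net.core.rmem_max", "net.ipv4.tcp_rmem",
                "net.ipv4.tcp_congestion_control"):
        print(f"{key} = {read_sysctl(key)}")
```

On non-Linux systems, or for keys the kernel does not expose, the helper simply reports None.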
6
- Fasterdata: a knowledge base for network administrators transferring large datasets over LFNs; part of ESnet
- SpeedGuide: an online broadband Internet performance guide
7
Parameter                        Meaning
net.core.rmem_max                Maximum OS receive buffer size for all types of connections
net.core.wmem_max                Maximum OS send buffer size for all types of connections
net.ipv4.tcp_rmem                Memory reserved for TCP receive buffers (per-connection default)
net.ipv4.tcp_wmem                Memory reserved for TCP send buffers (per-connection default)
net.ipv4.tcp_congestion_control  Pluggable congestion control algorithms
net.ipv4.tcp_sack                Selective acknowledgement
net.ipv4.tcp_window_scaling      Support for large TCP windows
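As a rough illustration of how these parameters relate to the path, the sketch below generates candidate /etc/sysctl.conf lines with buffer maxima sized to the bandwidth-delay product. The sizing rule (maximum buffer = BDP) and the minimum and default values are assumptions for illustration, not the values Fasterdata recommends; `sysctl_lines` is a hypothetical helper. On Linux, `tcp_rmem` and `tcp_wmem` each take three values: minimum, default, and maximum.

```python
def sysctl_lines(bandwidth_bps: int, rtt_ms: int,
                 min_buf: int = 4096, default_buf: int = 87380) -> list:
    """Build sysctl.conf lines with max buffers sized to the path BDP."""
    bdp = int(bandwidth_bps * rtt_ms / 8000)  # bits/s * ms -> bytes
    max_buf = max(bdp, default_buf)           # never go below the default
    return [
        f"net.core.rmem_max = {max_buf}",
        f"net.core.wmem_max = {max_buf}",
        f"net.ipv4.tcp_rmem = {min_buf} {default_buf} {max_buf}",
        f"net.ipv4.tcp_wmem = {min_buf} {default_buf} {max_buf}",
        "net.ipv4.tcp_sack = 1",
        "net.ipv4.tcp_window_scaling = 1",
    ]


if __name__ == "__main__":
    # Assumed 1 Gbps link with the ~92 ms RTT of the longer path.
    print("\n".join(sysctl_lines(1_000_000_000, 92)))
```

The output is only printed, not written to the system, so it can be reviewed before anything is applied.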
8
perfSONAR: an infrastructure for network performance monitoring that helps solve end-to-end performance problems on paths crossing several networks.
Includes several network monitoring tools:
- BWCTL (Bandwidth Test Controller), which can use Iperf
- OWAMP (One-Way Active Measurement Protocol, i.e. one-way ping)
- NDT (Network Diagnostic Tool)
- ping
- traceroute
10
Dummynet: a link emulator tool (ipfw)
- Runs experiments in user-configurable network environments
- Simulates/enforces queue and bandwidth limitations, delays, packet losses, and multipath effects
11
Tools like SCP and SFTP don't work well in LFNs:
- No parallel streams
- Assume a LAN
Data transfer tools:
- Faster Data Transfer (FDT)
- GridFTP
- paraFetch
12
UCSC to.. (Or) full-factorial experiment* over:
(i) solution application (GridFTP, paraFetch, FDT)
(ii) number of parallel streams
(iii) host TCP settings
(iv) with or without encryption
(v) memory-to-memory versus disk-to-disk transfers
RTT, packet loss, bandwidth, and number of hops remain unchanged within each scenario.

Case  Genome datacenter site                        RTT     No. of hops
A     Baylor College of Medicine in Houston, Texas  ~42 ms  9
B     Broad Institute in Massachusetts              ~92 ms  16
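A full-factorial design runs every combination of the factor levels. The sketch below enumerates such a design with `itertools.product`. The three applications and the two binary factors come from the slide; the stream counts and the TCP-settings labels are hypothetical placeholders, since the slide does not list the levels used.

```python
from itertools import product

FACTORS = {
    "application": ["GridFTP", "paraFetch", "FDT"],
    "parallel_streams": [1, 4, 8],         # hypothetical levels
    "tcp_settings": ["default", "tuned"],  # hypothetical labels
    "encryption": [False, True],
    "transfer": ["mem-to-mem", "disk-to-disk"],
}


def full_factorial(factors: dict) -> list:
    """Return one dict per combination of factor levels."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


if __name__ == "__main__":
    runs = full_factorial(FACTORS)
    print(f"{len(runs)} runs per scenario")
    print(runs[0])
```

The run count is the product of the number of levels per factor, so adding levels to any factor grows the experiment multiplicatively.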
13
Latency      10 ms          40 ms          60 ms          80 ms          100 ms
Packet loss  0 pkt/sec      1 pkt/sec      2 pkt/sec      3 pkt/sec      4 pkt/sec
Bandwidth    100 Mbps       250 Mbps       500 Mbps       750 Mbps       1000 Mbps
Jitter       0              (1/3) Latency  (1/2) Latency  (2/3) Latency  Latency
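Because jitter is specified as a fraction of latency, its absolute value depends on which latency level is in use. The following sketch tabulates the jitter levels for each latency, assuming each fraction applies to the latency of the same emulated scenario; `jitter_levels_ms` is a hypothetical helper.

```python
from fractions import Fraction

LATENCIES_MS = [10, 40, 60, 80, 100]
JITTER_FRACTIONS = [Fraction(0), Fraction(1, 3), Fraction(1, 2),
                    Fraction(2, 3), Fraction(1)]


def jitter_levels_ms(latency_ms: int) -> list:
    """Jitter values in ms for one latency level, one per fraction."""
    return [float(f * latency_ms) for f in JITTER_FRACTIONS]


if __name__ == "__main__":
    for latency in LATENCIES_MS:
        levels = ", ".join(f"{j:.1f}" for j in jitter_levels_ms(latency))
        print(f"latency {latency:>3} ms -> jitter (ms): {levels}")
```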
19-21
[Figures on slides 19-21: Disk-Disk and Mem-Mem panels]
22
- TCP sequence graphs to better understand the performance differences
- Content-specific compression to reduce the transmission duration
- Other tools like ASPERA and UDP blasters
- Changes in network infrastructure
  - Virtual circuits
  - Proxies
  - Multipath
23
ESnet (2011).
SpeedGuide.net (2011).
Ha, S., Rhee, I. & Xu, L. (2008), 'CUBIC: a new TCP-friendly high-speed TCP variant', SIGOPS Oper. Syst. Rev.
Mathis, M., Heffner, J. & Reddy, R. (2003), 'Web100: extended TCP instrumentation for research, education and diagnosis', SIGCOMM Comput. Commun. Rev.
Wang, C. & Zhang, D. (2011), 'A novel compression tool for efficient storage of genome resequencing data', Nucleic Acids Research.
24
Questions?