Research Opportunities in IP Wide Area Storage
  George Porter, Li Yin
  Department of EECS, U.C. Berkeley
  12/26/2018, SAHARA Retreat
Outline
  Trends
  Challenges for wide-area storage
  Programmability inside networks
  Common techniques to hide latency
  Functionality that will benefit applications
  Network support for that functionality
  Reconsidering the programmability model and application space
  Feedback
Storage Wide-Area Networking
  Metro Area
  SAN technology works well in a relatively small (metro-wide) area
  There is a desire to implement storage applications in the wide area
    with performance comparable to small-area storage applications
Challenges in Wide-Area Storage
  Speed of light is constant
    Long distance implies propagation delay
  Network dynamics
    Variation in cross-traffic load
    Route changes
  Increasing storage capacity
    Huge amounts of data must be transmitted across the wide area
Challenges in Wide-Area Storage
  Simple operation: host writes data to the remote target disk
    Send data to the remote target
    Write to disk
  Where is the bottleneck? Disk? Network link? Distance?
  [Figure: performance degradation; timelines comparing the local disk write operation time with the extra delay caused by the network]
Challenges in Wide-Area Storage
  Three cases:
    Case 1: Limited link bandwidth
    Case 2: Small data set with high link bandwidth
    Case 3: Large data set with high link bandwidth
Challenges in Wide-Area Storage
  Case 1: Limited link bandwidth
  [Figure: timelines of transmission time against local disk write operation time as more data is transmitted; the difference is the extra delay caused by the network]
Challenges in Wide-Area Storage
  Case 1: Limited link bandwidth
    As more data is transmitted, the performance degradation caused by transmission delay grows
    Propagation delay does not matter
    As disks get faster, more bandwidth is required to shift the bottleneck away from the network
Challenges in Wide-Area Storage
  Case 2: Small data set with high link bandwidth
    Throughput is very sensitive to distance, especially once the propagation delay is of the order of the disk latency
  [Figure: timelines of local disk write operation time at increasing distance]
Challenges in Wide-Area Storage
  Case 3: Large data set with high link bandwidth
  [Figure: timelines of local disk write operation time as more data is transmitted; the difference is the extra delay caused by the network]
Challenges in Wide-Area Storage
  Case 3: Large data set with high link bandwidth
    The disk is the bottleneck; the network adds only the propagation delay, which becomes negligible as more data is transmitted
    As disks get faster, more bandwidth is required to keep the bottleneck away from the network
Challenges in Wide-Area Storage
  Where is the bottleneck? It depends on:
    Link bandwidth
    Size of data to be transmitted
    Disk speed
  The key issue in wide-area storage is how to reduce latency
    Latency introduced by the network
    Latency introduced by the storage
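A rough way to reason about the three cases is a back-of-the-envelope timing model. The Python sketch below is illustrative only and is not from the talk: the function names, the 5 ms disk latency, the fiber signal speed of roughly 200,000 km/s, and the assumption that the target streams data to disk as it arrives are all assumptions made for this example.

```python
FIBER_KM_PER_S = 200_000.0  # assumed signal speed in optical fiber, about 2/3 of c


def remote_write_times(size_bytes, link_bytes_per_s, disk_bytes_per_s,
                       distance_km, disk_latency_s=0.005):
    """Return (local_s, remote_s, extra_s) for one write under a toy model."""
    disk_transfer = size_bytes / disk_bytes_per_s
    link_transfer = size_bytes / link_bytes_per_s
    propagation = distance_km / FIBER_KM_PER_S  # one-way delay
    local = disk_latency_s + disk_transfer
    # Assumption: the target writes data as it arrives, so the slower of
    # link and disk sets the pace and the propagation delay is paid once.
    remote = propagation + disk_latency_s + max(link_transfer, disk_transfer)
    return local, remote, remote - local


def dominant_term(size_bytes, link_bytes_per_s, disk_bytes_per_s,
                  distance_km, disk_latency_s=0.005):
    """Name the largest contributor: 'link', 'disk', or 'distance'."""
    terms = {
        "link": size_bytes / link_bytes_per_s,
        "disk": disk_latency_s + size_bytes / disk_bytes_per_s,
        "distance": distance_km / FIBER_KM_PER_S,
    }
    return max(terms, key=terms.get)


if __name__ == "__main__":
    # Hypothetical parameters, one per case, all at 4000 km.
    print(dominant_term(1e9, 10e6, 100e6, 4000))   # Case 1: limited link
    print(dominant_term(4096, 1e9, 100e6, 4000))   # Case 2: small data, fast link
    print(dominant_term(1e9, 1e9, 100e6, 4000))    # Case 3: large data, fast link
```

With these assumed numbers, the 1 GB write over a 10 MB/s link is link-bound, the 4 KB write is distance-bound, and the 1 GB write over a 1 GB/s link is disk-bound, with the network adding only the 20 ms propagation delay.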
Common Techniques to Hide Latency
  Caching
  Parallelism
  Pipelining
  Prefetching
  …
  Where and how should these techniques be implemented for wide-area storage applications?
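As one illustration of why pipelining hides latency, the sketch below compares a transfer that sends all chunks and then writes them against one that writes chunk i to disk while chunk i+1 is still on the wire. It is a toy model, not something from the talk: the function names and the assumption of a fixed per-chunk cost for each of the two stages (link, then disk) are hypothetical.

```python
def sequential_time(n_chunks, link_s_per_chunk, disk_s_per_chunk):
    """Send every chunk, then write every chunk: the stages never overlap."""
    return n_chunks * (link_s_per_chunk + disk_s_per_chunk)


def pipelined_time(n_chunks, link_s_per_chunk, disk_s_per_chunk):
    """Overlap the two stages: the slower stage runs back to back,
    and the faster stage adds its cost only once."""
    if n_chunks == 0:
        return 0
    slow = max(link_s_per_chunk, disk_s_per_chunk)
    fast = min(link_s_per_chunk, disk_s_per_chunk)
    return fast + n_chunks * slow


if __name__ == "__main__":
    # Hypothetical costs: 1 s per chunk on the link, 2 s per chunk on disk.
    print(sequential_time(4, 1, 2))  # 12
    print(pipelined_time(4, 1, 2))   # 9
```

In this model the pipelined time approaches the cost of the slower stage alone as the number of chunks grows, which matches the earlier observation that either the link or the disk ends up as the bottleneck.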
Code at the edge vs. in the fabric
  Location of data is separated from use of data
  Idea is to put processing near the data it acts on
  Better visibility into network conditions and dynamics
  Big performance gains if we can act on streams of data in the datapath
  Network processors are more powerful today
  A good match?
Gather
  Synchronous
    Digital animation editing
    Large dataset visualization
  Asynchronous
    N-to-1 disk copies (KaZaa)
    Recreate dataset from multiple sources/disks (scientific experiment)
    Restore backup
Gather
  Synchronous: digital animation editing, large dataset visualization
  Techniques
    Caching
    Parallelism
    Prefetching
  Network Primitives
    FS semantic information
    Store block location state in router
    View into network routes/conditions
    Table lookup in router
    Modify disk requests to point to correct locations
    Join data streams to deliver coherent data to app
    Orthogonal path selection
Gather
  Asynchronous: N-to-1 disk copies (KaZaa), recreate dataset from multiple sources/disks (scientific experiment), restore backup
  Techniques
    Pipelining
    Avoid congestion/optimize for bandwidth
  Network Primitives
    FS semantic information
    Store block location state in router
    View into network routes/conditions
    Table lookup in router
    Modify disk requests to point to correct locations
    Join data streams to deliver coherent data to app
    Orthogonal path selection
    Volume state in routers
    Replicate SCSI requests
    Reorder SCSI responses
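The "join data streams" and "reorder responses" primitives can be pictured as a reorder buffer: blocks gathered from several disks arrive out of order and are released to the application in block order. The sketch below is a hypothetical illustration in plain Python, not a SCSI implementation; the function name and the (block number, data) tuple format are assumptions.

```python
def join_streams(arrivals):
    """Yield (block_no, data) in ascending block order.

    `arrivals` is any iterable of (block_no, data) pairs in arrival order,
    possibly interleaved from several sources. Blocks are assumed to be
    numbered from 0 with no duplicates.
    """
    pending = {}      # blocks that arrived early, keyed by block number
    next_block = 0    # next block the application is waiting for
    for block_no, data in arrivals:
        pending[block_no] = data
        # Release every block that is now contiguous with what was delivered.
        while next_block in pending:
            yield next_block, pending.pop(next_block)
            next_block += 1


if __name__ == "__main__":
    # Hypothetical arrival order from two disks.
    arrivals = [(1, b"B"), (0, b"A"), (3, b"D"), (2, b"C")]
    for block_no, data in join_streams(arrivals):
        print(block_no, data)
```

A block that never arrives stalls everything behind it in this sketch; a real design would need timeouts or re-requests, which are left out here.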
Scatter
  Synchronous
    State dissemination
      CDN/web server updating?
      Gaming?
    Updating mapping tables
  Asynchronous
    Disaster-recovery application
    Experimental data unloading
Scatter
  Synchronous: state dissemination (CDN/web server updating? Gaming?), updating mapping tables
  Techniques
    Delay-sensitive path selection
    Congestion avoidance
    Synchronization
  Network Primitives
    Network monitoring
    FS semantic information
    Store block location state in router
    View into network routes/conditions
    Table lookup in router
    …
Scatter
  Asynchronous: disaster-recovery application, experimental data unloading
  Techniques
    Disk location/selection
    Load balancing
    Physical distance knowledge
  Network Primitives
    Network monitoring
    FS semantic information
    Store block location state in router
    View into network routes/conditions
    Table lookup in router
    …
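Disk selection that combines load balancing with physical distance knowledge could look roughly like the following sketch. Everything in it is an assumption made for illustration: the `pick_target` name, the candidate tuple layout, the fiber speed constant, and the cost estimate of propagation delay plus time to drain the queue and the new write.

```python
FIBER_KM_PER_S = 200_000.0  # assumed signal speed in optical fiber


def pick_target(candidates, size_bytes):
    """Return the name of the disk with the lowest estimated completion time.

    Each candidate is (name, distance_km, queued_bytes, disk_bytes_per_s).
    """
    def estimate(candidate):
        _name, distance_km, queued_bytes, disk_bytes_per_s = candidate
        propagation = distance_km / FIBER_KM_PER_S
        service = (queued_bytes + size_bytes) / disk_bytes_per_s
        return propagation + service

    return min(candidates, key=estimate)[0]


if __name__ == "__main__":
    # Hypothetical sites: a nearby disk with a deep queue and a distant idle one.
    sites = [
        ("near-busy", 50, 500e6, 100e6),
        ("far-idle", 4000, 0, 100e6),
    ]
    print(pick_target(sites, 100e6))  # large write: the idle disk wins
```

For a large write the queue depth dominates and the distant idle disk is chosen; for a small write with empty queues the nearer disk wins, since only propagation delay separates the candidates.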
Useful Network Primitives
  What is reasonable and possible?
    FS semantic information
    Store block location state in router
    View into network routes/conditions
    Table lookup in router
    Modify disk requests to point to correct locations
    Join data streams to deliver coherent data to app
    Orthogonal path selection
    Volume state in routers
    Replicate SCSI requests
    Reorder SCSI responses
    Others?
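Three of these primitives (block location state in the router, table lookup, and modifying disk requests to point to the correct location) fit together naturally. The sketch below shows one hypothetical shape for that combination; `BlockRequest`, `LocationTable`, and the string target addresses are invented for the example and do not describe any real router API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BlockRequest:
    volume: str
    block: int
    target: str  # address of the disk the request is currently aimed at


class LocationTable:
    """Per-router state mapping (volume, block) to the disk that holds it."""

    def __init__(self):
        self._locations = {}

    def update(self, volume, block, target):
        """Record that `block` of `volume` now lives on `target`."""
        self._locations[(volume, block)] = target

    def redirect(self, request):
        """Return the request re-aimed at the known location, if any."""
        target = self._locations.get((request.volume, request.block))
        if target is None:
            return request  # no state for this block: forward unchanged
        return replace(request, target=target)


if __name__ == "__main__":
    table = LocationTable()
    table.update("vol0", 7, "disk-b")
    print(table.redirect(BlockRequest("vol0", 7, "disk-a")))
    print(table.redirect(BlockRequest("vol0", 8, "disk-a")))
```

A per-block dictionary keeps the example short; whether router memory could hold state at that granularity is exactly the "what is reasonable and possible?" question the slide raises.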
Your Feedback?