In-network datacenter computing draft-he-coin-datacenter-00


1 In-network datacenter computing draft-he-coin-datacenter-00
Jeffrey He, Huawei; Rachel Chen, Huawei; Marie-José Montpetit, Triangle Video (ed.)

2 Draft Overview
Informative. Goals: review the state of the art, current research and open questions; initiate a dialogue on the directions of in-network computing in data centers, both for native networking functions and for applications.

3 Recent advances in research
In-Network Computing for data-intensive applications, e.g. machine learning, graph analysis and MapReduce: a potential traffic reduction ranging from 48% to 93% [SAPIO] and an improvement of a factor of 3.24 in latency [Graham]. In-Network Cache for key-value stores: a frontend cache for highly skewed and rapidly changing workloads; throughput can be improved by 3 to 10 times [JIN]. In-Network Computing for high-performance consensus: switches play the role of coordinators and acceptors; NetPaxos achieves a maximum throughput of 57,457 messages/s, while basic Paxos is only able to send 6,369 messages/s [DANG15]. NetCompute 2018: In-Network Computing workshop at SIGCOMM 2018 (this August).

Data-intensive applications whose network load can be improved by in-network computing include machine learning, graph analysis, data analytics and MapReduce. Running aggregation functions in the computing hardware reduces potential network congestion and improves overall application performance; the traffic reduction was shown to range from 48% up to 93% [SAPIO]. Machine learning is a very active research area for in-network computing because of the large datasets it both requires and generates. For example, in TensorFlow [TENSOR], parameter updates are small deltas that change only a subset of the overall tensor and can be aggregated by a vector addition operation. The overlap of the tensor updates, i.e. the portion of tensor elements that are updated by multiple workers at the same time, is representative of the possible data reduction achievable when the updates are aggregated inside the network.

In graph analysis, three algorithms with various characteristics have been considered in [SAPIO]: PageRank, Single-Source Shortest Path (SSSP) and Weakly Connected Components (WCC), each with a commutative and associative aggregation function; experiments show that the potential traffic reduction in all three applications is significant. Finally, for MapReduce, experiments in the same paper show that after in-network aggregation the number of packets received by the reducer decreases by 88% to 90% compared to UDP, and by 40% compared to TCP. There is thus great promise for MapReduce-like applications to take advantage of computing and storage optimization. Optimizing the network fabric can also improve DNN training time [CHANG]. SHArP reports an improvement of a factor of 3.24 in the latency of a 4096-byte MPI_Allreduce() operation [Graham].

References:
[SAPIO] Sapio, A., Abdelaziz, I., Aldilaijan, A., Canini, M. and Kalnis, P., "In-Network Computation is a Dumb Idea Whose Time Has Come", HotNets 2017.
[Graham] Graham, R. L., Lui, P., et al., "Scalable Hierarchical Aggregation Protocol (SHArP): A Hardware Architecture for Efficient Data Reduction", COM-HPC 2016.
[JIN] Jin et al., "NetCache: Balancing Key-Value Stores with Fast In-Network Caching", SOSP 2017.
[DANG15] Dang, H. T., et al., "NetPaxos: Consensus at Network Speed", SOSR 2015.
[CHANG] Chang, M. A., Shenker, S., et al., "Network Evolution for DNNs", SysML, Feb 2018, Palo Alto, California.
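To make the aggregation idea above concrete, here is a minimal sketch, not taken from any of the cited systems, of a switch-side aggregator that sums commutative and associative vector updates from N workers and forwards a single combined packet to the parameter server. All class and method names (`SwitchAggregator`, `on_packet`) are hypothetical illustrations.

```python
from typing import Dict, List, Optional

class SwitchAggregator:
    """Toy model of in-network aggregation (names are hypothetical)."""

    def __init__(self, num_workers: int):
        self.num_workers = num_workers
        self.partial: Dict[str, List[float]] = {}   # running sums per key
        self.seen: Dict[str, int] = {}              # contributions per key
        self.packets_in = 0
        self.packets_out = 0

    def on_packet(self, key: str, update: List[float]) -> Optional[List[float]]:
        self.packets_in += 1
        acc = self.partial.setdefault(key, [0.0] * len(update))
        # Vector addition is commutative and associative, so the switch can
        # fold updates in whatever order packets happen to arrive.
        for i, v in enumerate(update):
            acc[i] += v
        self.seen[key] = self.seen.get(key, 0) + 1
        if self.seen[key] == self.num_workers:
            # All workers contributed: emit one aggregated packet downstream.
            self.seen.pop(key)
            self.packets_out += 1
            return self.partial.pop(key)
        return None  # hold partial state, emit nothing yet

agg = SwitchAggregator(num_workers=4)
out = None
for _ in range(4):
    out = agg.on_packet("tensor-slice-0", [1.0, 2.0, 3.0])
# 4 packets arrive at the switch but only 1 leaves toward the server,
# a 75% traffic reduction for this toy workload.
```

The gains reported in [SAPIO] and [Graham] come from exactly this kind of fold happening at line rate in switch hardware rather than in a host-based model like this one.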

4 Research Topics for COIN in DCs (1)
Fixed-point calculation for current applications vs. floating-point calculation for more complex operations and services: network switches typically do not support floating-point calculation. Is it necessary to introduce this capability, and can it scale to the demand? What are the gains brought by aggregation in distributed and decentralized networks? The built-in buffer of a network device is often limited, while, for example, AI parameters can reach hundreds of megabytes. What is the relationship between the depth of packet inspection and not only performance but also security and privacy? There is a need to determine what application-layer cryptography is ready to expose to other layers and even to "collaborating nodes".
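The fixed- vs. floating-point question above can be illustrated with a small sketch: floats are scaled to fixed-point integers so that an integer-only switch pipeline could aggregate them, then converted back at the end host. The Q16 scale factor and helper names are assumptions for illustration, not anything specified by the draft.

```python
SCALE = 1 << 16  # Q16 fixed point; the choice of scale is an assumption

def to_fixed(x: float) -> int:
    """Quantize a float to a fixed-point integer the switch can add."""
    return int(round(x * SCALE))

def to_float(q: int) -> float:
    """Convert an aggregated fixed-point sum back to a float."""
    return q / SCALE

# Workers quantize their gradient values; the "switch" does only integer adds.
updates = [0.125, -0.5, 0.0625]
switch_sum = sum(to_fixed(u) for u in updates)  # integer-only aggregation
result = to_float(switch_sum)
# These particular values are exactly representable in Q16, so
# result == 0.125 - 0.5 + 0.0625 == -0.3125 with no rounding error.
```

In general the scale factor trades range against precision, and values that are not multiples of 1/SCALE incur rounding error, which is one facet of the "is floating point necessary?" question the slide raises.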

5 Research Topics for COIN in DCs (2)
Protocol design and network architecture: How can in-network optimization interact (or not) with transport- and application-layer optimizations? Can the end-to-end assumptions of existing transports like TCP still apply in the in-network compute era? There is heritage in middlebox interactions with existing flows.

6 Next Steps
Request comments from the COIN mailing list. Generate a v1 for IETF 104 (Prague). (Eventually) have it adopted as a COIN RG document. Contributors welcome!

7 QUESTIONS? Jeffrey.he@huawei.com chenlijuan5@huawei.com

