
Cloud Computing Technology (云计算技术)
Chen Guo (陈果), Associate Professor
Department of Computer Science, School of Information Science and Engineering, Hunan University
Email: guochen@hnu.edu.cn
Homepage: 1989chenguo.github.io

Course website available! https://1989chenguo.github.io/Courses/CloudComputing2018Spring

Notification for group projects: Form a group and tell me whether your group wants to give a presentation. The group leader should email me and CC all TAs (email addresses on the website) before the deadline. The email title should be 云计算技术2018-项目分组报名-[组长姓名] (i.e., "Cloud Computing 2018 - Group Registration - [Leader's Name]"; e.g., 云计算技术2018-项目分组报名-陈果). The email should include: group member information (including the leader): name + class + student ID; who the group leader is; and whether the group will give a presentation. Deadline: 2018/4/30 11:59 PM. DO NOT miss the deadline, otherwise -20 points. Do not miss the deadline!

What we have learned
What is cloud computing: definition, architecture, techniques
Cloud networking physical structure: scale of the cloud, what the cloud physically looks like, data center network topology: Clos networks

Clos network non-blocking types
Re-arrangeable non-blocking: can route any permutation from inputs to outputs. Holds if k ≥ n.
Strict-sense non-blocking: given any current connections through the switch, any unused input can be routed to any unused output. Holds if k ≥ 2n - 1.
(Here n is the number of input ports per ingress-stage switch and k is the number of middle-stage switches.)
Use small, cheap elements to build large, capacity-rich networks.
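To see what these conditions mean numerically, here is a minimal sketch (illustrative only; the function name is mine, not from the lecture) that computes the minimum number of middle-stage switches for a 3-stage Clos network:

```python
def clos_middle_switches(n):
    """Minimum middle-stage switch count k for a 3-stage Clos network whose
    ingress/egress switches each have n host-facing ports."""
    return {
        "strict_sense_nonblocking": 2 * n - 1,  # k >= 2n - 1
        "rearrangeable_nonblocking": n,         # k >= n
    }

# Example: ingress switches with n = 4 ports each
print(clos_middle_switches(4))
# {'strict_sense_nonblocking': 7, 'rearrangeable_nonblocking': 4}
```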

What we have learned
What is cloud computing: definition, architecture, techniques
Cloud networking physical structure: scale of the cloud, what the cloud physically looks like, data center network topology: Fat-tree

Part I: Cloud networking. Applications and network traffic.
Credits: most materials are from the UIUC MOOC by Ankit Singla (ETH Zürich) and P. Brighten Godfrey (UIUC).

Pretty much every popular Web app is backed by a data center. But data centers also run the data analytics that make these apps work, e.g., building the search index. The same infrastructure is also used for "big science" applications like climate modeling. The result: massive amounts of data being moved around inside data centers. (Image: NASA/Goddard/UMBC; public use: https://cds.nccs.nasa.gov/tools-services/merra-analytics-service/)

How a Web search works
Let's take a slightly closer look at how something like a Web search query works: I make a search query. ("Speeding up Distributed Request-Response Workflows", ACM SIGCOMM'13)

How a Web search works
It hits a server in a data center. This server might query several other servers, which in turn might communicate with yet more servers, and so on. (Image: free use, https://pixabay.com/en/datacenter-servers-computers-286386/)

How a Web search works: the scatter-gather traffic pattern
These responses are then collated, and the final search response page is sent to me. This kind of traffic pattern is referred to as "scatter-gather" or "partition-aggregate". For one query, there might be a large number of server-to-server interactions within the data center, each with an extremely short response deadline, on the order of 10 ms per server (missed deadlines mean poor result quality). And really, this illustration is very, very simplified. (Image: free use, https://pixabay.com/en/datacenter-servers-computers-286386/)

"Up to 150 stages, degree of 40, path lengths of 10 or more." This is what Bing's query workflow for producing the first page of results looks like: a request scatters at the top and the response is gathered at the bottom. From the request to the response there may be many stages (getting search results, making snippets, ads, spellcheck, etc.), with a fan-out degree of up to 40 in a stage. Each stage is internally complex: a single stage can query thousands of servers to produce the search results in a partition-aggregate manner. This is not exclusive to search; Facebook also sees many internal requests per user request. (Image source: talk on "Speeding up Distributed Request-Response Workflows" by Virajith Jalaparti at ACM SIGCOMM'13)
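To make the partition-aggregate pattern concrete, here is a minimal sketch (not Bing's or Facebook's code; the worker function and the 10 ms budget are illustrative assumptions) of an aggregator that scatters a query to worker shards and gathers whatever replies arrive before the deadline:

```python
import concurrent.futures
import time

DEADLINE_S = 0.010  # illustrative 10 ms budget for the whole scatter-gather step

def query_worker(worker_id, query):
    # Placeholder for an RPC to one leaf server holding a shard of the index.
    time.sleep(0.002)
    return f"results from shard {worker_id} for '{query}'"

def scatter_gather(query, num_workers=40):
    """Scatter a query to num_workers shards and gather replies within the deadline."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(query_worker, w, query) for w in range(num_workers)]
        done, not_done = concurrent.futures.wait(futures, timeout=DEADLINE_S)
        results = [f.result() for f in done]
        for f in not_done:
            f.cancel()  # best-effort: late shards are excluded from the response
    return results

if __name__ == "__main__":
    answers = scatter_gather("cloud computing")
    print(f"aggregated {len(answers)} of 40 shard responses before the deadline")
```

Replies that miss the deadline are simply left out, which is exactly the "poor result quality on misses" mentioned above.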

Other Web application traffic
Facebook: loading one of their popular pages causes an average of 521 distinct memcache fetches; at the 95th percentile, a single load of that page causes 1,740 distinct memcache fetches.

Big data analytics
Hadoop, Spark, Storm, database joins, and similar workloads: many applications move large amounts of data, and straggling jobs block the whole application and impact result quality. So what does actual measured traffic in these facilities look like?

What does data center traffic look like?
It depends on applications, scale, network design, and more. No "representative" dataset is available, but let's take a look at some of the published data.

Traffic characteristics: growing volume
Both Web workloads and data analytics drive internal traffic, which is growing rapidly: roughly doubling every year at Google (the paper isn't clear whether this is internal traffic only). At Facebook, machine-to-machine traffic inside the data center is more than doubling every year, and is several orders of magnitude larger than what goes out to the Internet.
"Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google's Datacenter Network", Arjun Singh et al. (Google), ACM SIGCOMM'15
"Introducing data center fabric, the next-generation Facebook data center", Facebook official blog, 2014

Traffic characteristics: rack locality
Facebook (all of Facebook's machines during a 24-hour period in January 2015): only 13% of all traffic is rack-local; 58% is cluster-local but not rack-local, and so on. Interestingly, inter-data-center traffic exceeds rack-local traffic. In this data center, Hadoop is the largest single driver of traffic. ("Inside the Social Network's (Datacenter) Network", Arjun Roy et al., ACM SIGCOMM'15)
Google (data from one cluster) reports at a different granularity (rack < block < cluster): block-local traffic is small. For data availability, they spread storage blocks across non-fate-sharing devices (e.g., the power supply for one server block may be fate-sharing). ("Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google's Datacenter Network", Arjun Singh et al., ACM SIGCOMM'15)

Traffic characteristics: rack locality
Benson et al. evaluated 3 university clusters, 2 private enterprise networks, and 5 commercial cloud networks (CLD). The university and private networks have a few hundred to 2,000 servers each; the cloud networks have 10-15k each. CLD 1-3 run many applications (Web, mail, etc.), while CLD 4-5 run mostly MapReduce-style applications. Rack-local traffic is much higher here: 70%+ for CLD 4-5, the supposedly MapReduce-style clusters. There are many possible reasons for the difference with the Facebook and Google data: workload differences (not all MapReduce jobs are the same), different ways of organizing storage and compute, the five-year gap (perhaps people are simply doing things differently now, or application sizes have grown substantially), and different scale. ("Network Traffic Characteristics of Data Centers in the Wild", Theophilus Benson et al., ACM IMC'10)

Traffic characteristics: concurrent flows
Facebook: "Web servers and cache hosts have 100s to 1000s of concurrent connections", while "Hadoop nodes have approximately 25 concurrent connections on average." (The paper also notes that grouping by destination host doesn't reduce these numbers by more than a factor of two.) ("Inside the Social Network's (Datacenter) Network", Arjun Roy et al., ACM SIGCOMM'15)
Very different numbers for Hadoop hosts: another measurement of a 1500-server cluster running MapReduce-style jobs found only 2-4 destinations per server. ("The Nature of Datacenter Traffic: Measurements & Analysis", Srikanth Kandula et al., Microsoft Research, ACM IMC'09)
Microsoft Web search: "median numbers of correspondents for a server are two (other) servers within its rack and four servers outside the rack". ("Data Center TCP (DCTCP)", Mohammad Alizadeh et al., ACM SIGCOMM'10)
Lessons: (a) there are differences across applications; (b) even MapReduce jobs are not all created the same.

Traffic characteristics: flow arrival rate
Facebook: at both Hadoop and Web servers, flow inter-arrival times at a server are "approximately 2 ms" at the median. Note that with a 1,000-server cluster, the cluster-wide inter-arrival time would then be on the order of microseconds (2 ms / 1,000 servers ≈ 2 µs between new flows). ("Inside the Social Network's (Datacenter) Network", Arjun Roy et al., ACM SIGCOMM'15)
The mystery 1500-server cluster has per-server inter-arrival times in the tens of milliseconds, less than one tenth of Facebook's rate. ("The Nature of Datacenter Traffic: Measurements & Analysis", Srikanth Kandula et al., Microsoft Research, ACM IMC'09)

Traffic characteristics: flow sizes
Facebook: most Hadoop flows are very small: the median flow is under 1 KB, and fewer than 5% exceed 1 MB or 100 seconds. For caching, flows are long-lived but transmit data only in bursts. Heavy hitters are not much larger than the median flow rate and are not persistent, i.e., a flow that is an instantaneous heavy hitter is usually not heavy soon after. ("Inside the Social Network's (Datacenter) Network", Arjun Roy et al., ACM SIGCOMM'15)
Microsoft Research 1500-server cluster: more than 80% of flows last less than 10 seconds, and more than 50% of bytes are in flows lasting less than 25 seconds. ("The Nature of Datacenter Traffic: Measurements & Analysis", Srikanth Kandula et al., ACM IMC'09)
Here there is some agreement between the data sets, but the news is largely negative: traffic engineering is difficult. Part of the reason (at least for Web and caching servers) is that application-level load balancing is already doing the work, which is why heavy hitters look similar to the median.

Traffic characteristics: flow sizes
[Figure: flow-size distributions for Web search ("DCTCP", ACM SIGCOMM'10), data mining ("VL2", ACM SIGCOMM'09), and cache/Hadoop workloads ("Inside Facebook DCN", ACM SIGCOMM'15); figure from "MQECN", USENIX NSDI'16.]

What does data center traffic look like?
It depends on applications, scale, network design, and more, and right now not a whole lot of data is available. So where do we go from here? There are some conclusions we can draw from the nature of data center applications and from the points of agreement across the measurements.

Implications for networking (we'll look at some of these in more detail)
1. Data center internal traffic is BIG
2. Tight deadlines for network I/O
3. Congestion and TCP incast
4. Need for isolation across applications
5. Centralized control at the flow level may be difficult

Implications for networking
1. Data center internal traffic is BIG: we need a high-throughput intra-DC network. This growth is driving the need for big, high-capacity data center networks, and we need to build them cheaply, scalably, and in a fault-tolerant manner; we want high-capacity network designs and efficient routing.
"Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google's Datacenter Network", Arjun Singh et al. (Google), ACM SIGCOMM'15
"Introducing data center fabric, the next-generation Facebook data center", Facebook official blog, 2014

Implications for networking
2. Tight deadlines for network I/O: applications like Web search impose tight latency requirements.

Implications for networking
2. Tight deadlines for network I/O. Suppose a server's response time is 10 ms for 99% of requests and 1 s for the remaining 1%. If a request touches 1 server, 1% of requests are 1 s or slower; if it touches 100 servers, 63% of requests are 1 s or slower (1 - 0.99^100 ≈ 0.63). (This is an example from Google's Jeff Dean.) Given what we noted earlier about each user request generating many internal requests, a fan-out of 100 is not that large, and the problem obviously gets worse with more requests. Internal deadlines of ~10 ms are common, and they include the application logic, so the network's envelope is small: we can't afford excessive queuing delays, and tail latency really matters. We need to reduce variability and tolerate some variation. (Measured by me at a Microsoft production DC, 2015: 330 µs at the 50th percentile, 10 ms at the 99th.)
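The fan-out arithmetic above is easy to reproduce; here is a quick sketch (assuming each server is independently slow with probability 1%, as in the example):

```python
def slow_fraction(fanout, p_slow_per_server=0.01):
    """Probability that at least one of `fanout` parallel sub-requests is slow,
    assuming each server is independently slow with probability p_slow_per_server."""
    return 1 - (1 - p_slow_per_server) ** fanout

for n in (1, 10, 100, 1000):
    print(f"fan-out {n:4d}: {slow_fraction(n):.1%} of requests hit at least one slow server")
# fan-out    1: 1.0% ...  fan-out  100: 63.4% ...
```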

Implications for networking
3. Congestion and TCP incast: large numbers of flows share bandwidth, and the scatter-gather pattern also creates incast, where many synchronized responses converge on one receiver (more detail in a later lesson). Long queues increase latencies and their variance. There are various application-layer fixes, but they ultimately complicate the application logic. TCP does not work very well here.
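As a rough illustration of why incast hurts, here is a back-of-envelope sketch (not a measurement: the 64 KB response size and 256 KB per-port buffer are assumptions, and it ignores the buffer draining while the burst arrives):

```python
def incast_burst(num_senders, response_bytes=64 * 1024, port_buffer_bytes=256 * 1024):
    """Worst-case check: do synchronized scatter-gather responses exceed the
    receiver's shallow switch-port buffer if they all arrive at once?"""
    aggregate = num_senders * response_bytes
    return aggregate, aggregate > port_buffer_bytes

for n in (2, 4, 40):
    total, overflow = incast_burst(n)
    print(f"{n:3d} senders -> {total // 1024:4d} KB arriving at once, buffer overflow: {overflow}")
```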

Implications for networking
4. Need for isolation across applications: the network is complex and shared, possibly by multiple tenants (in a cloud setting) and by applications with different objectives.

Implications for networking
5. Centralized control at the flow level may be difficult: with high flow arrival rates and very short flows, any kind of centralized per-flow control is very hard to scale. The alternative: distributed control, perhaps with some centralized tinkering.
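A back-of-envelope sketch of the scaling problem, using the ~2 ms median per-server flow inter-arrival reported earlier for Facebook (the 100,000-server cluster size is an assumption for illustration only):

```python
# New-flow rate a centralized per-flow controller would have to handle.
per_server_interarrival_s = 0.002   # ~2 ms between new flows per server (Facebook, SIGCOMM'15)
servers = 100_000                   # assumed cluster size, for illustration only
flows_per_second = servers / per_server_interarrival_s
print(f"~{flows_per_second:,.0f} new flows per second reach the controller")
# ~50,000,000 new flows per second
```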

Reading materials for group projects
"Inside the Social Network's (Datacenter) Network", ACM SIGCOMM 2015
"The Nature of Datacenter Traffic: Measurements & Analysis", ACM IMC 2009
"Network Traffic Characteristics of Data Centers in the Wild", ACM IMC 2010 (dataset partially available)
"Scaling Memcache at Facebook", USENIX NSDI 2013
"Speeding up Distributed Request-Response Workflows", ACM SIGCOMM 2013
"Profiling Network Performance for Multi-Tier Data Center Applications", USENIX NSDI 2011

Thanks!
Chen Guo (陈果), Associate Professor
Department of Computer Science, School of Information Science and Engineering, Hunan University
Email: guochen@hnu.edu.cn
Homepage: 1989chenguo.github.io