Netslice: Enabling Critical Network Infrastructure with Commodity Routers Prof. Hakim Weatherspoon, Cornell University Joint with Tudor Marian, Ki Suh.


1 Netslice: Enabling Critical Network Infrastructure with Commodity Routers • Prof. Hakim Weatherspoon, Cornell University • Joint with Tudor Marian, Ki Suh Lee • TRUST Autumn 2010 Conference, Stanford University • November 10, 2010

2 Commodity Datacenters • Datacenters are becoming a commodity • Unit of replacement • Datacenter in a box: already set up with commodity hardware & software (Intel, Linux, petabyte of storage) • Plug in network, power & cooling and turn on • Typically connected via optical fiber • May have a network of such datacenters

3 Commodity Datacenters • Titan tech boom, Randy Katz, 2008 • [Footer: 11/10/2010, Critical Network Infrastructure, by Hakim Weatherspoon]

4 Network Of Globally Distributed Datacenters • Cloud computing: datacenters interconnected via fiber • Long Fat Networks (LFN) or λ-networks • Packet processors and extensible routers (middleboxes) • Increase functionality, performance, reliability, and security of the network • E.g. DPI, IDS, PEP, protocol accelerators, overlay routers, multimedia servers, security appliances, and network monitors • [Figure labels: packet processor middleboxes] • [Footer: IBM Visit, Critical Infrastructure, by Hakim Weatherspoon]

5 Network Of Globally Distributed Datacenters • Packet processors • Maelstrom [NSDI’08, TONS’10], SMFS [FAST’09] • FEC, TCP-Split, De-duplication, Network-sync • OS abstractions for packet processing in user-space • Netslice • Lambda networks • Cornell NLR Rings [DSN’10] • SDNA/BiFocals [IMC’10] • [Figure labels: Maelstrom, Netslice, SDNA/BiFocals, TCP-Split, SMFS, Cornell NLR Rings]


7 Challenges • Large traffic volume processed per second (10Gb/s) • Typical packet processors are realized in hardware • They trade off flexibility and programmability for speed • Goal: improve datacenter communication • Packet processors and extensible routers • Abundance of commodity servers readily available • [Figure labels: Commodity hardware, Software, Proprietary specialized hardware]

8 Takeaway • Raw socket cannot take advantage of multicore/multiqueue • Need a new OS abstraction to take advantage of parallelism • [Figure labels: 9.7G, 2.25G]

9 Outline: Packet Processing Abstractions • The case for user-space packet processors • What’s wrong with the raw socket in a multicore environment? • Hardware and software overheads • Contention and lack of application control of resources • Need a new OS abstraction to take advantage of parallelism • Netslice • Evaluate • Group introduction • Conclude

10 The Case Against Low-level Packet Processors • Idiosyncrasies of memory allocator • Small virtual address spaces • Inability to swap out pages • Limit on contiguous memory chunks • Execution contexts and preemptive precedence • Interrupt, bottom half, task/user context • Synchronization primitives • Tightly coupled with execution contexts (e.g. can I block?) • Lack of development tools • Lack of fault isolation • A bug in the kernel is lethal • [Figure labels: Hardware, Operating System Kernel, Network Stack, Application, User-space Application]

11 High-level Packet Processors: Where Have All My Cycles Gone? • Opportunity: exploit hardware parallelism • Overheads • Contention, contention, contention! • Memory wall • OS design overheads • System calls • Context switches • Scheduling • Blocking • [Figure labels: CPU, Memory]

12 Contention: Amdahl’s Law • Bounds maximum expected parallelism speedup • Fraction P of a program parallelized to run on N CPUs • The speedup is 1 / ((1 − P) + P / N) • [Figure labels: Program = serial fraction (1 − P) + parallel fraction (P); serial execution + parallel execution]
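The bound on slide 12 can be computed directly. The following is a minimal sketch in C; the function name `amdahl_speedup` is hypothetical and does not come from the talk.

```c
#include <assert.h>

/* Amdahl's law: upper bound on speedup when a fraction p of the work
 * is parallelized across n CPUs and the remaining (1 - p) stays serial. */
double amdahl_speedup(double p, unsigned n)
{
    assert(p >= 0.0 && p <= 1.0);
    assert(n >= 1);
    return 1.0 / ((1.0 - p) + p / (double)n);
}
```

For example, with P = 0.9 and N = 16 the bound is 1 / (0.1 + 0.05625) = 6.4, well short of 16, which is why the serial (contended) fraction dominates on many-core machines.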

13 Contention: Cache-coherent Architecture • Effects of cache coherency & memory accesses • Cores read and write blocks of data concurrently • Commodity system: Xeon X533 with 4MB L2 cache

14 Contention: Peripheral NIC • Slow cores vs. fast network interface cards (NICs) • More cores exhibit contention and overheads • Hardware transmit/receive multi-queue support • [Figure labels: tx/rx queues]

15 Software overheads in the Conventional Network Stack • Raw socket: all traffic from all NICs to user-space • Hardware and software are loosely coupled • Applications have no (end-to-end) control over resources • [Figure labels: tx/rx queues, Network Stack, Raw socket, Application]

16 Software overheads in the Conventional Network Stack • API too general, hence a complex and bloated network stack • Raw sockets, end-point sockets, files: all the same • Path taken by a packet is unnecessarily expensive • Hides information from applications • Limited functionality: least common denominator API • Inefficient API: issues one system call per packet • [Figure labels: Network Stack, Application, Raw socket, TCP socket, UDP socket, AF_UNIX socket, File API, File access]

17 Netslice: user-space packet processor • Give power to the application • Packet processing in user-space • Four-pronged approach (high level) • Contention prevention • Spatially partition hardware • End-to-end control • Provide fine-grained control over hardware • Streamline path for packets • Export a rich, efficient, backwards compatible API • [Figure labels: Hardware, Operating System Kernel, Network Stack, Netslice, Application, User-space Application]

18 Netslice Spatial Partitioning • Contention prevention • Independent (parallel) execution contexts • Split each Network Interface Controller (NIC) • One NIC queue per NIC per context • Group and split the CPU cores • Implicit resources (bus and memory bandwidth) • [Figure labels: Temporal partitioning (time-sharing), Spatial partitioning (exclusive-access)]

19 Netslice Spatial Partitioning Example • 2x quad core Intel Xeon X5570 (Nehalem) • Two simultaneous hyperthreads per core: OS sees 16 CPUs • Non Uniform Memory Access (NUMA) • QuickPath point-to-point interconnect • Shared L3 cache
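To make the spatial partitioning idea concrete, here is a minimal sketch in C of how 16 CPUs might be carved into disjoint slices, each with one NIC queue index and a kernel/user CPU pair. The `struct slice` layout, the `make_slices` name, and the pairing rule (CPU i with CPU i + ncpus/2) are assumptions for illustration only; the talk does not specify how Netslice pairs CPUs, and a real layout would follow the machine's cache and NUMA topology.

```c
#include <assert.h>

/* Hypothetical description of one slice: a NIC queue index (the same index
 * on every NIC) plus one CPU for kernel-side work and one for user-space. */
struct slice {
    unsigned queue;      /* tx/rx queue index used on each NIC */
    unsigned kernel_cpu; /* CPU that services the queue in kernel context */
    unsigned user_cpu;   /* CPU that runs the user-space packet processor */
};

/* Split ncpus (must be even) into ncpus/2 disjoint slices and return the
 * slice count. Assumed pairing: CPU i with CPU i + ncpus/2. */
unsigned make_slices(unsigned ncpus, struct slice *out)
{
    assert(ncpus >= 2 && ncpus % 2 == 0);
    unsigned nslices = ncpus / 2;
    for (unsigned i = 0; i < nslices; i++) {
        out[i].queue = i;
        out[i].kernel_cpu = i;
        out[i].user_cpu = i + nslices;
    }
    return nslices;
}
```

Calling `make_slices(16, slices)` on the machine from slide 19 yields 8 slices under this assumed pairing; no CPU or queue index appears in more than one slice, which is the exclusive-access property the slide contrasts with time-sharing.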

20 Fine-grained Hardware Control • End-to-end control • App controls NIC queue and CPU slice allocation • NIC hardware interrupt routing & NIC queue • Kernel execution context • User-space execution context • Tight coupling of software and hardware components

21 Streamlined Path for Packets • Inefficient conventional network stack • One network stack “to rule them all” • Performs too many memory accesses • Pollutes cache, context switches, synchronization, system calls, blocking API

22 Netslice API • Expresses fine-grained hardware control • Flexible: based on ioctl • ioctl(fd, NETSLICE_CPUMASK_GET, &mask); • sched_setaffinity(getpid(), sizeof(cpu_set_t), &mask.u_peer); • Backwards compatible (read/write) • fd = open("/dev/netslice-1", O_RDWR); • read(fd, iov, IOVS); • Efficient: batch send/receive (read/write) • Amortize overhead of protection domain crossing
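The batching point on slide 22 (several buffers per protection-domain crossing) can be illustrated with standard POSIX vectored I/O. The sketch below is an analogue only: it uses `readv`/`writev` on an ordinary file descriptor, not the Netslice device or its `read(fd, iov, IOVS)` convention, and the names `send_batch`, `recv_batch`, `BATCH`, and `PKT_SIZE` are hypothetical.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/uio.h>

#define BATCH 4      /* buffers moved per system call (illustrative) */
#define PKT_SIZE 64  /* fixed buffer size (illustrative) */

/* Point each iovec entry at one packet buffer. */
static void fill_iov(struct iovec *iov, char pkts[BATCH][PKT_SIZE])
{
    for (int i = 0; i < BATCH; i++) {
        iov[i].iov_base = pkts[i];
        iov[i].iov_len = PKT_SIZE;
    }
}

/* Send BATCH buffers with one writev() call instead of BATCH write() calls.
 * Returns the number of bytes written, or -1 on error. */
ssize_t send_batch(int fd, char pkts[BATCH][PKT_SIZE])
{
    struct iovec iov[BATCH];
    assert(fd >= 0);
    fill_iov(iov, pkts);
    return writev(fd, iov, BATCH);
}

/* Receive into BATCH buffers with one readv() call.
 * Returns the number of bytes read, or -1 on error. */
ssize_t recv_batch(int fd, char pkts[BATCH][PKT_SIZE])
{
    struct iovec iov[BATCH];
    assert(fd >= 0);
    fill_iov(iov, pkts);
    return readv(fd, iov, BATCH);
}
```

With a batch of four buffers, each direction costs one system call rather than four, so the fixed per-call overhead is spread over the whole batch.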

23 Experimental Setup • R710 packet processors • Dual socket quad core 2.93GHz Xeon X5570 (Nehalem) • 8MB of shared L3 cache and 12GB of RAM • 6GB connected to each of the two CPU sockets • Two Myri-10G NICs • R900 client end-hosts • Four socket 2.40GHz Xeon E7330 (Penryn) • 6MB of L2 cache and 32GB of RAM

24 Netslice Evaluation • Compare against state-of-the-art • RouteBricks in-kernel, Click & pcap-mmap user-space • Additional baseline scenario • All traffic through a single NIC queue (receive-livelock) • What is the basic forwarding performance? • How efficient is the streamlining of one Netslice? • What is the benefit of batching? • How does Netslice scale with the number of cores? • Can we build high-speed, complex packet processors?

25 Simple Packet Routing • End-to-end throughput, MTU (1500 byte) packets • Error bars (always present) denote standard error of the mean • [Figure: throughput bars labeled 9.7G, 2.25G, 7.6G, 7.5G, 5.6G; annotations “74% of kernel” and “1/11 of Netslice”]

26 Single Netslice user and kernel context CPU placement • There are several choices

27 Linear Scaling with CPUs • IPsec with 128-bit key, typically used by VPNs: AES encryption in Cipher-block Chaining mode • [Figure: x-axis “# of CPUs used”; labels 9.1G, 8G]

28 Netslice Implementation of Maelstrom • In-kernel reference version: 8432 lines of C • Netslice version: 1197 lines of user-space C • Forwarding throughput: 8952.04±37.25 Mbps • Maelstrom/Netslice goodput: 6993.69±35.7 Mbps • 27.27% FEC overhead (for r=8, c=3) • 6993.69 Mbps × (1 + c ⁄ (r+c)) = 8901 Mbps
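The arithmetic on slide 28 can be checked with a short sketch in C. The function names `fec_overhead` and `rate_with_fec` are hypothetical; the formulas are the ones printed on the slide, with r data packets and c repair packets per group.

```c
#include <assert.h>

/* Overhead fraction c / (r + c): the share of each r + c packet group
 * that is repair data rather than payload. */
double fec_overhead(unsigned r, unsigned c)
{
    assert(r + c > 0);
    return (double)c / (double)(r + c);
}

/* Scale measured goodput by (1 + c / (r + c)), the expression on the slide. */
double rate_with_fec(double goodput_mbps, unsigned r, unsigned c)
{
    return goodput_mbps * (1.0 + fec_overhead(r, c));
}
```

For r = 8 and c = 3 the overhead is 3/11, about 27.27%, and scaling the 6993.69 Mbps goodput gives roughly 8901 Mbps, close to the 8952.04 Mbps forwarding throughput reported on the same slide.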

29 10Gbps and beyond • Netslice • Flexible API and spatial partitioning • Nehalem CPUs • FSB is no longer the bottleneck • Multiqueue NICs • Each core carves a private slice of every NIC • Batching • Userspace multi-read / multi-write instead of ossified conventional read / write • Traditional tricks • Pin down memory, minimize LLC contention, etc.

30 Conclusion • Network layer is fundamental to datacenter operations • Packet processors enhance network functionality and performance • Improve network performance with software packet processors running on commodity servers in userspace • OS support to build packet processing applications • Harness implicit parallelism of modern hardware to scale • Solution is completely portable; the kernel module loads at runtime

31 Paper Trail • Theme: “Datacenter Middleboxes” • BiFocals/SDNA in IMC-2010 • NLR study in DSN-2010 • SMFS in FAST-2009 • Maelstrom (FEC) in TONS-2010 and NSDI-2008 • FWP, NSDI-2008 Poster Session • More at http://fireless.cs.cornell.edu

32 Questions

