CSE 124 Networked Services, Fall 2009
B. S. Manoj, Ph.D.
http://cseweb.ucsd.edu/classes/fa09/cse124
10/27/2009

Some of these slides are adapted from various sources/individuals, including but not limited to slides from the IEEE/ACM digital libraries. Use of these slides other than for pedagogical purposes in CSE 124 may require explicit permission from the respective sources.
Announcements
First Paper Discussion
– Discussion on 29th October
– Write-up due on 28th October
Midterm: November 5
Programming Project 2 – Network Services Innovation Project
Where is the overhead?
TCP was suspected of being too complex
– In 1989, Clark, Jacobson, and others proved otherwise
The complexity (overhead) lies in the computing environment where TCP operates
– Interrupts
– OS scheduling
– Buffering
– Data movement
Simple solutions that improve performance
– Interrupt moderation
  The NIC waits for multiple packets and notifies the processor once
  Amortizes the high cost of interrupts (a cost sketch follows below)
– Checksum offload
  Checksum calculation on the processor is costly
  Offload checksum calculation to the NIC (in hardware)
– Large segment offload
  Segmenting large chunks of data into smaller segments is expensive
  Offload segmentation and TCP/IP header preparation to the NIC
  Useful for sender-side TCP
– Can support up to ~1 Gbps PHYs
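A back-of-the-envelope sketch of why interrupt moderation helps. The per-interrupt cost below is an assumed illustrative number, not a measured value; the packet rate corresponds to ~1 Gbps of 1500-byte frames.

    # Illustrative sketch: amortizing interrupt cost with interrupt moderation.
    INTERRUPT_COST_US = 5.0      # assumed CPU cost per interrupt (handler + context switch)
    PKTS_PER_SEC = 83_333        # ~1 Gbps of 1500-byte frames

    def cpu_share(pkts_per_interrupt: int) -> float:
        """Fraction of one CPU core spent just taking interrupts."""
        interrupts_per_sec = PKTS_PER_SEC / pkts_per_interrupt
        return interrupts_per_sec * INTERRUPT_COST_US / 1e6

    for batch in (1, 8, 32, 128):
        print(f"{batch:4d} pkts/interrupt -> {cpu_share(batch):.1%} of a core")

With one interrupt per packet roughly 40% of a core goes to interrupt handling under these assumptions; batching even a few dozen packets per interrupt reduces that to a few percent.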
Challenges in detail
OS issues
– Interrupts: interrupt moderation, polling, hybrid interrupts
Memory
– Latency: memory is slower than the processor
– Poor cache locality
  New data entering from the NIC or the application
  Cache misses and CPU stalls are common
Buffering and copying
– Usually two copies are required: an application-to-TCP copy and a TCP-to-NIC copy
– Receive side: the copy count can be reduced to one if posted buffers are provided by the application; mostly two copies are required
– Transmit side: zero copy on transmit (DMA from the application buffer to the NIC) can help; implemented on selected systems (see the sketch below)
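A minimal sketch of transmit-side copy avoidance from user space, assuming a Linux host: socket.sendfile() hands the file to the kernel (os.sendfile under the hood), so the payload is not copied through a user-space buffer first. The host, port, and file name are made up for illustration.

    import socket

    def send_file_zero_copy(path: str, host: str = "127.0.0.1", port: int = 9000) -> None:
        with socket.create_connection((host, port)) as sock, open(path, "rb") as f:
            sock.sendfile(f)   # kernel moves data from the page cache toward the NIC queue

    if __name__ == "__main__":
        send_file_zero_copy("payload.bin")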
TCP/IP Acceleration Methods
Three main strategies
– TCP Offload Engine (TOE)
– TCP onloading
– Stack and NIC enhancements
TCP Offload Engine
– Offloads TCP/IP processing to devices attached to the server's I/O subsystem
– Uses separate processing and memory resources
– Pros
  Improves throughput and utilization
  Useful for bulk data transfer such as IP storage
  Good for a few connections over high-bandwidth links
– Cons
  May not scale well to a large number of connections
  Needs special processors (expensive)
  Needs large memory on the NIC (expensive)
  Store-and-forward in the TOE is suitable only for large transfers
  Latency between the I/O subsystem and main memory is high
  Expensive TOEs or NICs are required
[Figure: host processor and cache memory, with the TCP Offload Engine residing on the NIC device]
TCP onloading
Dedicate TCP/IP processing to one or more general-purpose cores
– High performance
– Cheap
– Main-memory-to-CPU latency is small
Extensible
– Programming tools and implementations exist
– Good for long-term performance
Scalable
– Good for a large number of flows
A sketch of the core partitioning follows below.
[Figure: Cores 0-2 run applications, Core 3 is dedicated to TCP/IP processing (onloading); all cores share cache memory and the NIC device]
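A minimal user-space sketch of the onloading idea, assuming a Linux host: pin the application to cores 0-2 and leave core 3 for a dedicated TCP/IP processing task. In a real onloading design the kernel stack itself would be steered to that core; pinning a user process is only an approximation, and the core numbers are assumptions.

    import os

    APP_CORES = {0, 1, 2}      # assumed application cores
    NET_CORE = {3}             # assumed core reserved for TCP/IP processing (not used directly here)

    def pin_current_process(cores: set) -> None:
        os.sched_setaffinity(0, cores)   # 0 means "the calling process"

    if __name__ == "__main__":
        pin_current_process(APP_CORES)
        print("application running on cores:", os.sched_getaffinity(0))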
Stack and NIC enhancements
Asynchronous I/O
– Asynchronous callbacks on data arrival
– Pre-posting of buffers by the application to avoid copying
Header splitting
– Split headers and data into separate buffers
– Better data pre-fetching
– The NIC can place the header separately from the payload
Receive-side scaling
– Use multiple cores to achieve connection-level parallelism
– Maintain multiple queues in the NIC
– Map each queue to a different processor core
A sketch of the asynchronous-I/O pattern follows below.
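A minimal sketch of asynchronous I/O with a pre-posted buffer, assuming a plain TCP listener on port 9000 (port and buffer size are illustrative). The readiness callback plays the role of the "callback on data arrival"; recv_into() fills a buffer the application posted in advance, avoiding an extra user-space copy.

    import selectors
    import socket

    sel = selectors.DefaultSelector()
    posted_buf = bytearray(64 * 1024)          # buffer pre-posted by the application

    def on_readable(conn: socket.socket) -> None:
        n = conn.recv_into(posted_buf)         # data lands directly in our buffer
        if n == 0:
            sel.unregister(conn)
            conn.close()
        else:
            print(f"received {n} bytes")

    def on_accept(listener: socket.socket) -> None:
        conn, _ = listener.accept()
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ, on_readable)

    listener = socket.socket()
    listener.bind(("0.0.0.0", 9000))
    listener.listen()
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ, on_accept)

    while True:
        for key, _ in sel.select():
            key.data(key.fileobj)              # invoke the registered callback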
Giant-scale TCP (contd.)
TCP for giant-scale services faces two regimes
– High bandwidth, high RTT
– High bandwidth, low RTT
High bandwidth, high RTT
– Wide-area links between data centers of the same organization
– RTT greater than 200 ms
– Bandwidth varies from 100 Mbps to 1 Gbps (even 10 Gbps)
– Solutions require TCP evolution in
  Transmission rate algorithm
  Congestion control approach
  Loss sensitivity
  Fairness
  TCP friendliness
A bandwidth-delay-product sketch for this regime follows below.
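A quick bandwidth-delay-product calculation for the high-bandwidth, high-RTT regime, using the same numbers that appear in the Scalable TCP example later (1 Gbps, 200 ms RTT, 1500-byte MSS).

    BW_BPS = 1e9          # 1 Gbps
    RTT_S = 0.200         # 200 ms
    MSS_BYTES = 1500

    bdp_bytes = BW_BPS * RTT_S / 8
    cwnd_pkts = bdp_bytes / MSS_BYTES
    print(f"BDP = {bdp_bytes / 1e6:.0f} MB, cwnd ≈ {cwnd_pkts:.0f} packets")
    # -> BDP = 25 MB, cwnd ≈ 16667 packets (the slides round this to ~17000)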
Giant-scale TCP (contd.)
High bandwidth, low RTT
– Communication between data center tiers
– Front-end servers and back-end databases
– Very high bandwidth (>1 Gbps, or 10 Gbps)
– Very low RTT (~10 µs per tier, or <100 µs for multi-tier connections)
– Main solution focus
  OS and software efficiency
  Tendency to implement everything in hardware
Case Study: Intel ETA architecture
Embedded Transport Acceleration (ETA)
Partitions the system's cores into
– Host
– Packet Processing Engine (PPE): one or more cores dedicated to the PPE
Host-PPE interface
– Direct Transport Interface (DTI)
– Supports socket commands: connect, listen, and accept
DTI consists of
– Queues for control, data, and synchronization
Addressing the memory challenges
– Lightweight threading
– Direct Cache Access
– Asynchronous memory copies
Lightweight Threading
Direct Cache Access
Asynchronous Memory Copies
Efficient processing choices
Processing choices vary with the type of data to be processed
– Headers, data structures, or payload
Memory Aware Reference Architecture (MARS) for efficient TCP/IP processing
MARS performance
Desired features of TCP for giant-scale services
Scalability
– Scale the bandwidth share to high rates at moderate loss rates (e.g., 10 Gbps at loss rates around 3.5e-8)
RTT fairness
– Throughput fairness proportional to the RTT ratio
TCP friendliness
– Bounded TCP fairness for all window sizes, at both high and low loss rates
Convergence
– Faster convergence to a fair share
TCP response function
Scalable TCP
Traditional TCP (Reno)
– On each ACK in the congestion avoidance phase: cwnd ← cwnd + 1/cwnd
– When congestion is detected: cwnd ← cwnd/2
– Example: RTT = 200 ms, MSS = 1500 B, BW = 1 Gbps, cwnd ≈ 17000 pkts
– On congestion: cwnd = 8500 pkts (0.5 Gbps)
– To regain 1 Gbps can take 8500 RTTs = 8500 × 200 ms ≈ 28 minutes
Scalable TCP
– On each ACK in the congestion avoidance phase: cwnd ← cwnd + a (a = 0.01)
– When congestion is detected: cwnd ← cwnd − b·cwnd (b = 0.125)
– Example: RTT = 200 ms, MSS = 1500 B, BW = 1 Gbps, cwnd ≈ 17000 pkts
– On congestion: cwnd = 14875 pkts (0.875 Gbps)
– To regain 1 Gbps takes only 0.125/0.01 = 12.5 RTTs = 12.5 × 200 ms = 2.5 seconds
(See the recovery-time sketch below.)
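A small sketch reproducing the recovery-time comparison above. Both loops count RTTs until cwnd returns to its pre-loss value, using the standard Reno and Scalable TCP (a = 0.01, b = 0.125) per-RTT growth.

    def recovery_rtts(cwnd_target: float, backoff: float, additive: bool) -> int:
        cwnd = cwnd_target * (1 - backoff)   # window right after the loss event
        rtts = 0
        while cwnd < cwnd_target:
            if additive:
                cwnd += 1.0                  # Reno: +1 MSS per RTT
            else:
                cwnd *= 1.01                 # Scalable TCP: +0.01*cwnd per RTT
            rtts += 1
        return rtts

    RTT_S, CWND = 0.200, 17000
    for name, backoff, additive in [("Reno", 0.5, True), ("Scalable TCP", 0.125, False)]:
        r = recovery_rtts(CWND, backoff, additive)
        print(f"{name:13s}: {r} RTTs ≈ {r * RTT_S:.0f} s")
    # Roughly 8500 RTTs (~28 minutes) for Reno vs ~14 RTTs (~3 s) for Scalable TCP.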
Scalable TCP (contd.)
Response functions (average cwnd as a function of the loss rate L):
– Traditional TCP (Reno): cwnd ≈ 1.22 / √L
– Scalable TCP: cwnd ≈ (a/b) × (1/L), with a = 0.01 and b = 0.125
(A worked example follows below.)
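A quick sanity check, inverting the two response functions above to see what loss rate each protocol would need in order to sustain 10 Gbps; the 200 ms RTT and 1500 B MSS are carried over from the earlier example.

    A, B = 0.01, 0.125
    cwnd = 10e9 * 0.200 / 8 / 1500            # ≈ 166,667 packets for 10 Gbps at 200 ms

    loss_reno = (1.22 / cwnd) ** 2            # from cwnd ≈ 1.22 / sqrt(L)
    loss_scalable = (A / B) / cwnd            # from cwnd ≈ (a/b) * (1/L)
    print(f"Reno needs L ≈ {loss_reno:.1e}, Scalable TCP needs L ≈ {loss_scalable:.1e}")
    # Reno needs a loss rate around 5e-11 (unrealistically low); Scalable TCP around 5e-7.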
Binary search in BIC-TCP
[Figure: window evolution probing between CWNDmin and CWNDmax]
(A simplified sketch of the search follows below.)
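A simplified sketch of BIC-TCP's binary-search window growth, ignoring BIC's additive-increase limit, max probing, and fast-convergence refinements; the decrease factor of 0.2 is an assumption for illustration.

    def bic_step(cwnd: float, wmin: float, wmax: float, loss: bool):
        """Return (new_cwnd, new_wmin, new_wmax) after one RTT."""
        if loss:
            wmax = cwnd                  # the window where loss occurred becomes the ceiling
            cwnd = cwnd * 0.8            # multiplicative decrease (assumed factor)
            wmin = cwnd
        else:
            cwnd = (wmin + wmax) / 2     # jump to the midpoint: binary search for the ceiling
            wmin = cwnd                  # no loss at this window, so raise the floor
        return cwnd, wmin, wmax

    cwnd, wmin, wmax = 100.0, 100.0, 200.0
    for rtt in range(6):
        cwnd, wmin, wmax = bic_step(cwnd, wmin, wmax, loss=False)
        print(f"RTT {rtt}: cwnd ≈ {cwnd:.1f}")   # converges quickly toward CWNDmax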
Response function for TCP
Multi-tier architectures
A client-server model is considered 2-tier
Multi-tier architectures are extensions of the 2-tier model
– 3-tier: Client -> Intermediate tier -> Server
– The 3-tier model introduces a set of applications between the client and the back-end database servers
– This intermediate terminating point between client and server can
  Track users
  Ensure reliability of transactions
  Provide fault tolerance (equipment or software)
  Provide scalability
An N-tier architecture extends the 3-tier model
– Each tier requires its own equipment and software
– A separate TCP/IP connection is required between each pair of adjacent tiers
– The client-server path becomes an N-spliced TCP connection
– Number of transport hops: 2×(N−1)
Why N-tier architecture?
TCP: Pros
– Good for the wild, heterogeneous, and uncontrolled Internet
– Handles high round-trip times and low bandwidth
– Stick with it for the Client -> Edge-server leg
TCP: Cons
– Does not work well within controlled, high-bandwidth, low-latency data center networks
Why not implement TCP in hardware?
– Flexibility is better with software
– Upgrades are possible without hardware replacement
– Heterogeneity of hosts
TCP performance is affected by
– The software implementation
– OS factors
Solution: Multi-tier architectures
– Use TCP at the edge: from the client to the first terminating server
– Use a specially designed transport within the data center
– E.g., an InfiniBand transport solution
TCP connection termination in Data Centers
Where should TCP be terminated?
– At the edge (front end) [the entry point of the data center]
– At the back-end data source [the final tier]
Does it help to terminate TCP at the edge?
– Pros
  Differentiated protocol design: TCP from the client to the edge, custom solutions from the edge to the back end
  Performance improvement
  The number of connection requests and the processing requirements may differ per tier
– Cons
  TCP's end-to-end semantics may be violated
  More complex to implement
Tier-N Data Center: user and connection hierarchy
Large numbers of users connect to the front-end servers (edges)
– E.g., casually browsing users with read-only requirements
– Communication requirements are mostly stateless
Fewer users move on to the higher tiers
– May require additional state information such as user profiles
Even fewer move on to the highest tiers, with deeper state information such as session synchronization
[Figure: the Internet feeds a large number of users into tier 1; progressively fewer users reach tiers 2 through N]
Tier-N Data Center: computing power requirement
Computing requirements are inversely proportional to the number of users
At the edges the requirements are mostly read-only; even web caches can serve them well
As the tiers go higher up, the computing power required grows
[Figure: low computing power at the Internet-facing tier 1, high computing power at tier N]
Case Study: InfiniBand Interface
InfiniBand Trade Association
– Founded in 1999
– Steering group members: IBM, Intel, Mellanox, QLogic, Sun, and Voltaire
InfiniBand™ is an industry-standard specification that defines an input/output architecture to interconnect
– Servers, communications infrastructure equipment, storage, and embedded systems
Targets a bandwidth of 1000 Gbps in the next 3 years
Suitable for N-tier data center architectures
– TCP terminates at the edge
InfiniBand Architecture for a 3-tier Data Center
[Figure: protocol stacks along the path. The personal notebook computer (browser) uses Sockets/TCP/IP/Ethernet; the network-service tier (proxy) bridges TCP/IP/Ethernet on the client side to SDP over InfiniBand HW on the data center side; the blade servers (Apache web server) use Sockets/SDP over InfiniBand HW with OS bypass of the kernel driver. HTTP/HTML runs over TCP/IP at the edge and over the InfiniBand architecture inside the data center.]
Traditional DMA
[Figure: CPU and DMA engine moving data between Buffer 1 and Buffer 2]
1. Buffer copy: the CPU moves the data itself
2. Buffer copy with DMA
   – The CPU programs the DMA engine
   – The DMA engine moves the data
   – The DMA engine notifies the CPU upon completion of the data transfer
Remote DMA
[Figure: Host A's CPU, NIC (DMA engine), and Buffer 1 connected across the network to Host B's CPU, NIC (DMA engine), and Buffer 1]
3. Buffer copy with Remote DMA
   – The CPU programs the RDMA engine in the NIC
   – The RDMA engine moves the data across the network to the target destination
   – The RDMA engine notifies the CPU upon completion of the data transfer
RDMA layers
– RDMA may utilize a TCP/IP-like stack within the NIC
– The InfiniBand architecture utilizes RDMA semantics for faster data transfer between tiers
TCP incast
– Large queries may be split across multiple servers
– K blocks of data are requested from S servers; each block is striped across the S servers
– For each request, a server responds with a fixed amount of data
– The client does not request block j+1 until all fragments of block j have been received (a synchronous read)
– As the number of servers S grows, application goodput drops
– Known as TCP incast, or TCP throughput collapse
(A sketch of the synchronous-read pattern follows below.)
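A minimal sketch of the synchronized-read pattern that triggers incast: the client asks S servers for the fragments of block j in parallel and will not request block j+1 until every fragment has arrived. Hostnames, port, fragment size, and the fetch_fragment() request format are illustrative stand-ins for whatever RPC the storage system actually uses.

    from concurrent.futures import ThreadPoolExecutor
    import socket

    SERVERS = [("server%d.example" % i, 9000) for i in range(8)]   # S = 8 (assumed)
    FRAGMENT_BYTES = 256 * 1024                                    # assumed fragment size

    def fetch_fragment(addr, block_id: int) -> bytes:
        """Hypothetical fragment fetch: one request, fixed-size response."""
        with socket.create_connection(addr) as s:
            s.sendall(f"GET block {block_id}\n".encode())
            buf = bytearray()
            while len(buf) < FRAGMENT_BYTES:
                chunk = s.recv(FRAGMENT_BYTES - len(buf))
                if not chunk:
                    break
                buf.extend(chunk)
            return bytes(buf)

    def read_block(block_id: int) -> bytes:
        # Barrier: all S fragments must arrive before the next block is requested,
        # so one server stuck in a retransmission timeout stalls the whole read.
        with ThreadPoolExecutor(max_workers=len(SERVERS)) as pool:
            fragments = list(pool.map(lambda a: fetch_fragment(a, block_id), SERVERS))
        return b"".join(fragments)

    if __name__ == "__main__":
        for j in range(4):          # K = 4 blocks, requested strictly one after another
            data = read_block(j)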
Data Center TCP fine tuning
Problem
– Goodput drops from over 700 Mbps (S=3) to under 200 Mbps (S=7)
Main reason
– The minimum retransmission timeout, RTO_min = 200 ms
– The default on most Linux systems
(An illustrative calculation follows below.)
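An illustrative calculation of why a 200 ms minimum RTO hurts: if one server in the synchronized read suffers a timeout, the whole block waits for it. The link speed, block size, and the assumption of exactly one timeout per block are made up for illustration.

    LINK_BPS = 1e9
    BLOCK_BYTES = 1 * 1024 * 1024      # one striped block (assumed)
    RTO_MIN_S = 0.200

    ideal_s = BLOCK_BYTES * 8 / LINK_BPS
    with_timeout_s = ideal_s + RTO_MIN_S
    print(f"ideal:        {BLOCK_BYTES * 8 / ideal_s / 1e6:7.0f} Mbps")
    print(f"with one RTO: {BLOCK_BYTES * 8 / with_timeout_s / 1e6:7.0f} Mbps")
    # Roughly 1000 Mbps vs ~40 Mbps once a single minimum-RTO stall is included.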
TCP in Wireless networks
Several factors affect TCP performance over wireless links
– Misinterpretation of packet loss (as congestion)
– Frequent link/path breaks
– Increased link/path length
– Misinterpretation of the congestion window
– Asymmetric link behavior
– Network partitioning and remerging
Summary
– Reading assignment
– Traditional TCP does not work well for giant-scale network services
– Protocol-level changes
– Processing-level efficiency improvements
– New methods of connection termination