ND The research group on Networks & Distributed systems.

ND activities
ICON – Interconnection Networks
  – Interconnection networks are tightly coupled, short-distance networks with extreme demands on bandwidth, latency and delivery.
  – Problem areas: effective routing and topologies, fault tolerance and dynamic reconfiguration, and Quality of Service.
VINNER – End-to-end Internet communications
  – Problem area: network resilience, i.e. the set of methods and techniques that improve the user's perception of network robustness and reliability.

ND activities (continued)
QuA – Support of Quality of Service in component architectures
  – Problem area: how to develop QoS-sensitive applications on a component architecture platform, and how dynamic QoS management and adaptation can be supported.
Relay – Resource utilization in time-dependent distributed systems
  – Problem area: reducing the effects of resource limitations and geographical distance in interactive distributed applications, through a toolkit of kernel extensions, programmable subsystems, protocols and decision methods.

Assessment of Data Path Implementations for Download and Streaming
Pål Halvorsen (1,2), Tom Anders Dalseng (1) and Carsten Griwodz (1,2)
(1) Department of Informatics, University of Oslo, Norway
(2) Simula Research Laboratory, Norway

Overview
  – Motivation
  – Existing mechanisms in Linux
  – Possible enhancements
  – Summary and conclusions

Delivery Systems
[Figure: a delivery system attached to the network, with its internal bus(es)]

Delivery Systems
[Figure: the application in user space; the file system and the communication system in kernel space; the bus(es) below]

Intel Hub Architecture
[Figure: Pentium 4 processor (registers, cache(s)), memory controller hub with RDRAM, and I/O controller hub with PCI slots, disk and network card; the data path between disk and network card passes through the file system, the application and the communication system]
  → several in-memory data movements and context switches

Motivation
  – Data copy operations are expensive:
      – they consume CPU, memory, hub, bus and interface resources (proportional to data size)
      – profiling shows that ~40% of CPU time is consumed by copying data between user and kernel space
      – the gap between memory and CPU speeds keeps increasing
      – different memory banks have different access times
  – System calls cause many switches between user and kernel space

Zero-Copy Data Paths
[Figure: the application hands the kernel a data_pointer instead of the data, so the data moves between the file system and the communication system without crossing into user space]

Motivation (continued)
  – A lot of research has been performed in this area. BUT, what is the status of today's commodity operating systems?

Existing Linux Data Paths

Content Download
[Figure: the application in user space; the file system and the communication system in kernel space; the bus(es) below]

Content Download: read / send
[Figure: read copies the data from the kernel page cache into the application buffer; send copies it into the socket buffer, from which a DMA transfer moves it to the network card]
  → 2n copy operations
  → 2n system calls
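A minimal C sketch of the read / send download loop, assuming Linux, an already connected socket and a 4 KB chunk size chosen only for illustration. The function name `download_read_send` and the `ncalls` counter are hypothetical instrumentation, not the measured implementation; for n chunks the counter reaches 2n, plus one final `read` that detects end of file.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>

#define CHUNK 4096 /* illustrative chunk size, an assumption */

/* Hypothetical helper: content download with read()/send().
 * Every chunk is copied twice: page cache -> application buffer (read)
 * and application buffer -> socket buffer (send).
 * Returns the number of bytes sent, or -1 on error. */
long download_read_send(int file_fd, int sock_fd, long *ncalls)
{
    char buf[CHUNK]; /* the application buffer */
    long total = 0;

    *ncalls = 0;
    for (;;) {
        ssize_t r = read(file_fd, buf, sizeof buf); /* copy 1 */
        ++*ncalls;
        if (r < 0)
            return -1;
        if (r == 0)
            return total; /* end of file */
        for (ssize_t off = 0; off < r;) {
            ssize_t s = send(sock_fd, buf + off, (size_t)(r - off), 0); /* copy 2 */
            ++*ncalls;
            if (s < 0)
                return -1;
            off += s;
        }
        total += r;
    }
}
```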

Content Download: mmap / send
[Figure: the page cache is mapped into the application with mmap; send copies the data into the socket buffer, from which a DMA transfer moves it to the network card]
  → n copy operations
  → 1 + n system calls
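The same download with mmap / send, sketched under the same kind of assumptions (hypothetical `download_mmap_send`, illustrative 4 KB chunks). The file is mapped once, so each chunk is copied only once, by `send`, from the page cache into the socket buffer; the counter tallies `mmap` and `send`, which matches the 1 + n on the slide.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/stat.h>

#define CHUNK 4096 /* illustrative chunk size, an assumption */

/* Hypothetical helper: content download with mmap()/send().
 * ncalls counts mmap + send, the calls the slide counts (1 + n);
 * fstat and munmap are left out of the tally. */
long download_mmap_send(int file_fd, int sock_fd, long *ncalls)
{
    struct stat st;

    *ncalls = 0;
    if (fstat(file_fd, &st) < 0)
        return -1;
    if (st.st_size == 0)
        return 0; /* mmap() rejects zero-length mappings */

    size_t len = (size_t)st.st_size;
    char *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, file_fd, 0);
    ++*ncalls;
    if (map == MAP_FAILED)
        return -1;

    size_t off = 0;
    while (off < len) {
        size_t want = len - off < CHUNK ? len - off : CHUNK;
        ssize_t s = send(sock_fd, map + off, want, 0); /* the only copy */
        ++*ncalls;
        if (s < 0) {
            munmap(map, len);
            return -1;
        }
        off += (size_t)s;
    }
    munmap(map, len);
    return (long)off;
}
```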

Content Download: sendfile
[Figure: sendfile appends a descriptor to the socket buffer; a gather DMA transfer moves the data from the page cache to the network card]
  → 0 copy operations
  → 1 system call
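A sketch of the sendfile variant, using the Linux-specific `sendfile(2)`; the name `download_sendfile` and the `ncalls` counter are hypothetical. The loop exists only because `sendfile` may transfer fewer bytes than requested; when the whole file goes out in one call, exactly one system call is counted, as on the slide.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/stat.h>

/* Hypothetical helper: content download with sendfile().
 * The kernel moves data from the page cache to the socket without
 * passing it through a user-space buffer.
 * Returns the number of bytes sent, or -1 on error. */
long download_sendfile(int file_fd, int sock_fd, long *ncalls)
{
    struct stat st;
    off_t off = 0;

    *ncalls = 0;
    if (fstat(file_fd, &st) < 0)
        return -1;
    while (off < st.st_size) {
        /* sendfile() advances off itself and may send less than asked */
        ssize_t s = sendfile(sock_fd, file_fd, &off, (size_t)(st.st_size - off));
        ++*ncalls;
        if (s <= 0)
            return -1; /* error, or the file shrank underneath us */
    }
    return (long)off;
}
```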

Content Download: Results
[Figure: result panels for UDP and TCP]
Tested transfer of a 1 GB file on Linux 2.6, using both UDP (with enhancements) and TCP.

Streaming
[Figure: the application in user space; the file system and the communication system in kernel space; the bus(es) below]

Streaming: read / send
[Figure: read copies the data from the page cache into the application buffer; send copies it into the socket buffer, from which a DMA transfer moves it to the network card]
  → 2n copy operations
  → 2n system calls
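For streaming, each packet needs an RTP header in front of the payload. The sketch below is an illustration, not the authors' code: it reads the payload into the application buffer directly behind a 12-byte RTP fixed header (RFC 3550) and sends header and payload with one `send`. The helper names, the 1 KB payload, payload type 96, the timestamp step and the SSRC argument are all assumptions made for the example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>

#define RTP_HDR 12   /* RTP fixed header without CSRCs (RFC 3550) */
#define PAYLOAD 1024 /* illustrative payload size, an assumption */

/* Minimal RTP fixed header: V=2, no padding/extension/CSRC, marker 0,
 * payload type 96 (a dynamic type, chosen for illustration). */
static void rtp_header(unsigned char h[RTP_HDR], uint16_t seq, uint32_t ts,
                       uint32_t ssrc)
{
    uint32_t w[3] = {0x80600000u | seq, ts, ssrc};
    for (int i = 0; i < RTP_HDR; i++)
        h[i] = (unsigned char)(w[i / 4] >> (24 - 8 * (i % 4)));
}

/* Hypothetical helper: streaming with read()/send(). read() copies the
 * payload behind the header in the application buffer; send() copies
 * header + payload into the socket buffer. Returns packets or -1. */
long stream_read_send(int file_fd, int sock_fd, uint32_t ssrc, long *ncalls)
{
    unsigned char pkt[RTP_HDR + PAYLOAD];
    long packets = 0;

    *ncalls = 0;
    for (;;) {
        ssize_t r = read(file_fd, pkt + RTP_HDR, PAYLOAD); /* copy 1 */
        ++*ncalls;
        if (r < 0)
            return -1;
        if (r == 0)
            return packets;
        /* timestamp step of 3000 per packet is illustrative only */
        rtp_header(pkt, (uint16_t)packets, (uint32_t)packets * 3000u, ssrc);
        ssize_t s = send(sock_fd, pkt, RTP_HDR + (size_t)r, 0); /* copy 2 */
        ++*ncalls;
        if (s != RTP_HDR + r)
            return -1;
        packets++;
    }
}
```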

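Streaming: read / writev
[Figure: read copies the data into the application buffer; writev copies the RTP header and the data into the socket buffer (an extra copy step), from which a DMA transfer moves it to the network card]
  → 3n copy operations
  → 2n system calls
  → one copy more per packet than read / send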
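The read / writev variant keeps the header and the payload in separate buffers and lets `writev` gather them into one datagram. This is a hedged sketch with hypothetical helper names and an illustrative 1 KB payload and RTP fields. The slide counts 3n copies for this path; one plausible reading is that the header and the payload are copied into the socket buffer as two separate copies, on top of the copy made by `read`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define RTP_HDR 12   /* RTP fixed header without CSRCs (RFC 3550) */
#define PAYLOAD 1024 /* illustrative payload size, an assumption */

/* Minimal RTP fixed header: V=2, no padding/extension/CSRC, marker 0,
 * payload type 96 (a dynamic type, chosen for illustration). */
static void rtp_header(unsigned char h[RTP_HDR], uint16_t seq, uint32_t ts,
                       uint32_t ssrc)
{
    uint32_t w[3] = {0x80600000u | seq, ts, ssrc};
    for (int i = 0; i < RTP_HDR; i++)
        h[i] = (unsigned char)(w[i / 4] >> (24 - 8 * (i % 4)));
}

/* Hypothetical helper: streaming with read()/writev(). Header and
 * payload live in separate buffers; writev() gathers them into one
 * datagram. Returns packets sent, or -1 on error. */
long stream_read_writev(int file_fd, int sock_fd, uint32_t ssrc, long *ncalls)
{
    unsigned char hdr[RTP_HDR], payload[PAYLOAD];
    struct iovec iov[2] = {{.iov_base = hdr, .iov_len = RTP_HDR},
                           {.iov_base = payload, .iov_len = 0}};
    long packets = 0;

    *ncalls = 0;
    for (;;) {
        ssize_t r = read(file_fd, payload, PAYLOAD); /* page cache -> payload */
        ++*ncalls;
        if (r < 0)
            return -1;
        if (r == 0)
            return packets;
        rtp_header(hdr, (uint16_t)packets, (uint32_t)packets * 3000u, ssrc);
        iov[1].iov_len = (size_t)r;
        ssize_t s = writev(sock_fd, iov, 2); /* both buffers -> socket buffer */
        ++*ncalls;
        if (s != RTP_HDR + r)
            return -1;
        packets++;
    }
}
```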

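Streaming: mmap / send
[Figure: the page cache is mapped into the application with mmap; per packet the application calls cork, send (RTP header, copied from the application buffer), send (data, copied from the mapped page cache) and uncork; a DMA transfer moves the datagram to the network card]
  → 2n copy operations
  → 1 + 4n system calls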
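The mmap / send streaming path has to merge two `send` calls (header, then payload) into one UDP datagram. On Linux the `UDP_CORK` socket option does this, which is what the cork and uncork steps on the slide suggest; the sketch below assumes that, plus a connected UDP socket, hypothetical function names and illustrative RTP fields. Per packet it issues cork, send, send, uncork, giving the slide's 1 + 4n calls.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <netinet/udp.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/stat.h>

#ifndef UDP_CORK
#define UDP_CORK 1 /* Linux value, normally supplied by <netinet/udp.h> */
#endif

#define RTP_HDR 12   /* RTP fixed header without CSRCs (RFC 3550) */
#define PAYLOAD 1024 /* illustrative payload size, an assumption */

/* Minimal RTP fixed header: V=2, marker 0, payload type 96 (assumed). */
static void rtp_header(unsigned char h[RTP_HDR], uint16_t seq, uint32_t ts,
                       uint32_t ssrc)
{
    uint32_t w[3] = {0x80600000u | seq, ts, ssrc};
    for (int i = 0; i < RTP_HDR; i++)
        h[i] = (unsigned char)(w[i / 4] >> (24 - 8 * (i % 4)));
}

/* While corked, data from several sends accumulates; uncorking emits
 * it as a single datagram. */
static int udp_cork(int fd, int on)
{
    return setsockopt(fd, IPPROTO_UDP, UDP_CORK, &on, sizeof on);
}

/* Hypothetical helper: streaming with mmap() and corked send()s.
 * Returns packets sent, or -1 on error. */
long stream_mmap_send(int file_fd, int udp_fd, uint32_t ssrc, long *ncalls)
{
    struct stat st;

    *ncalls = 0;
    if (fstat(file_fd, &st) < 0)
        return -1;
    if (st.st_size == 0)
        return 0;
    size_t len = (size_t)st.st_size;
    unsigned char *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, file_fd, 0);
    ++*ncalls; /* the "1" in 1 + 4n */
    if (map == MAP_FAILED)
        return -1;

    unsigned char hdr[RTP_HDR];
    long packets = 0;
    for (size_t off = 0; off < len; off += PAYLOAD, packets++) {
        size_t n = len - off < PAYLOAD ? len - off : PAYLOAD;
        rtp_header(hdr, (uint16_t)packets, (uint32_t)packets * 3000u, ssrc);
        int ok = udp_cork(udp_fd, 1) == 0 &&
                 send(udp_fd, hdr, RTP_HDR, 0) == RTP_HDR &&    /* copy 1 */
                 send(udp_fd, map + off, n, 0) == (ssize_t)n && /* copy 2 */
                 udp_cork(udp_fd, 0) == 0;
        *ncalls += 4;
        if (!ok) {
            munmap(map, len);
            return -1;
        }
    }
    munmap(map, len);
    return packets;
}
```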

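Streaming: mmap / writev
[Figure: the page cache is mapped into the application with mmap; one writev per packet copies the RTP header and the data into the socket buffer, from which a DMA transfer moves it to the network card]
  → 2n copy operations
  → 1 + n system calls
  → three calls fewer per packet than mmap / send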
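With writev, the four per-packet calls collapse into one: a two-element `iovec` points at the RTP header and at the current slice of the mapped file. A sketch with hypothetical names, an illustrative 1 KB payload and illustrative RTP fields; the counter ends at 1 + n as on the slide.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/uio.h>

#define RTP_HDR 12   /* RTP fixed header without CSRCs (RFC 3550) */
#define PAYLOAD 1024 /* illustrative payload size, an assumption */

/* Minimal RTP fixed header: V=2, marker 0, payload type 96 (assumed). */
static void rtp_header(unsigned char h[RTP_HDR], uint16_t seq, uint32_t ts,
                       uint32_t ssrc)
{
    uint32_t w[3] = {0x80600000u | seq, ts, ssrc};
    for (int i = 0; i < RTP_HDR; i++)
        h[i] = (unsigned char)(w[i / 4] >> (24 - 8 * (i % 4)));
}

/* Hypothetical helper: streaming with mmap()/writev(). One writev()
 * per packet gathers the header and a slice of the mapping into one
 * datagram. Returns packets sent, or -1 on error. */
long stream_mmap_writev(int file_fd, int sock_fd, uint32_t ssrc, long *ncalls)
{
    struct stat st;

    *ncalls = 0;
    if (fstat(file_fd, &st) < 0)
        return -1;
    if (st.st_size == 0)
        return 0;
    size_t len = (size_t)st.st_size;
    unsigned char *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, file_fd, 0);
    ++*ncalls; /* the "1" in 1 + n */
    if (map == MAP_FAILED)
        return -1;

    unsigned char hdr[RTP_HDR];
    struct iovec iov[2] = {{.iov_base = hdr, .iov_len = RTP_HDR}};
    long packets = 0;
    for (size_t off = 0; off < len; off += PAYLOAD, packets++) {
        size_t n = len - off < PAYLOAD ? len - off : PAYLOAD;
        rtp_header(hdr, (uint16_t)packets, (uint32_t)packets * 3000u, ssrc);
        iov[1].iov_base = map + off;
        iov[1].iov_len = n;
        ssize_t s = writev(sock_fd, iov, 2); /* header + data -> socket buffer */
        ++*ncalls;
        if (s != (ssize_t)(RTP_HDR + n)) {
            munmap(map, len);
            return -1;
        }
    }
    munmap(map, len);
    return packets;
}
```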

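Streaming: sendfile
[Figure: per packet the application calls cork, send (RTP header, copied into the socket buffer), sendfile (data appended as a descriptor) and uncork; a gather DMA transfer moves header and data from the socket buffer and page cache to the network card]
  → n copy operations
  → 4n system calls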
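The existing sendfile streaming path, sketched with `UDP_CORK` and assuming a connected UDP socket, hypothetical function names and illustrative RTP fields: the header goes out with `send`, the payload is appended with `sendfile`, and uncorking emits one datagram. That is four calls per packet, but only one copy, the header's.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <netinet/udp.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/stat.h>

#ifndef UDP_CORK
#define UDP_CORK 1 /* Linux value, normally supplied by <netinet/udp.h> */
#endif

#define RTP_HDR 12   /* RTP fixed header without CSRCs (RFC 3550) */
#define PAYLOAD 1024 /* illustrative payload size, an assumption */

/* Minimal RTP fixed header: V=2, marker 0, payload type 96 (assumed). */
static void rtp_header(unsigned char h[RTP_HDR], uint16_t seq, uint32_t ts,
                       uint32_t ssrc)
{
    uint32_t w[3] = {0x80600000u | seq, ts, ssrc};
    for (int i = 0; i < RTP_HDR; i++)
        h[i] = (unsigned char)(w[i / 4] >> (24 - 8 * (i % 4)));
}

/* While corked, data accumulates; uncorking emits one datagram. */
static int udp_cork(int fd, int on)
{
    return setsockopt(fd, IPPROTO_UDP, UDP_CORK, &on, sizeof on);
}

/* Hypothetical helper: streaming with cork/send/sendfile/uncork.
 * Returns packets sent, or -1 on error. */
long stream_sendfile(int file_fd, int udp_fd, uint32_t ssrc, long *ncalls)
{
    struct stat st;
    unsigned char hdr[RTP_HDR];
    long packets = 0;

    *ncalls = 0;
    if (fstat(file_fd, &st) < 0)
        return -1;
    for (off_t off = 0; off < st.st_size; packets++) {
        size_t n = st.st_size - off < PAYLOAD ? (size_t)(st.st_size - off)
                                              : PAYLOAD;
        rtp_header(hdr, (uint16_t)packets, (uint32_t)packets * 3000u, ssrc);
        int ok = udp_cork(udp_fd, 1) == 0 &&
                 send(udp_fd, hdr, RTP_HDR, 0) == RTP_HDR &&         /* header: copied */
                 sendfile(udp_fd, file_fd, &off, n) == (ssize_t)n && /* payload; advances off */
                 udp_cork(udp_fd, 0) == 0;                           /* emit datagram */
        *ncalls += 4;
        if (!ok)
            return -1;
    }
    return packets;
}
```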

Streaming: Results
[Figure legend: RTP over UDP; TCP sendfile (content download)]
Tested streaming of a 1 GB file on Linux 2.6.
  – Compared to not sending an RTP header over UDP, we get an increase of 29% (due to the additional send call).
  – More copy operations and system calls are required → potential for improvements.

Enhanced Streaming Data Paths

Enhanced Streaming: mmap / msend
[Figure: per packet the application calls cork, send (RTP header, copied), msend (data from the mapped page cache, appended as a descriptor) and uncork; a gather DMA transfer moves the data to the network card]
  – msend allows sending data from an mmap'ed file without a copy
  → n copy operations
  → 1 + 4n system calls
  → one copy fewer per packet than mmap / send

Enhanced Streaming: mmap / rtpmsend
[Figure: one rtpmsend call per packet copies the RTP header into the socket buffer and appends the data from the mapped page cache as a descriptor; a gather DMA transfer moves both to the network card]
  – the RTP header copy is integrated into the msend system call
  → n copy operations
  → 1 + n system calls
  → three calls fewer per packet than mmap / msend

Enhanced Streaming: mmap / krtpmsend
[Figure: krtpsmsend hands the stream to an RTP engine in the kernel, which adds the RTP headers; data is appended as descriptors and moved by gather DMA transfer]
  – an RTP engine in the kernel adds the RTP headers
  → 0 copy operations
  → 1 system call
  → one copy and one call fewer per packet than mmap / rtpmsend

Enhanced Streaming: rtpsendfile
[Figure: one rtpsendfile call per packet copies the RTP header into the socket buffer and appends the data from the page cache as a descriptor; a gather DMA transfer moves both to the network card]
  – the RTP header copy is integrated into the sendfile system call
  → n copy operations
  → n system calls
  → the existing solution (cork, send, sendfile, uncork) requires three more calls per packet

Enhanced Streaming: krtpsendfile
[Figure: krtpsendfile hands the stream to an RTP engine in the kernel, which adds the RTP headers; data is appended as descriptors and moved by gather DMA transfer]
  – an RTP engine in the kernel adds the RTP headers
  → 0 copy operations
  → 1 system call
  → one copy and one call fewer per packet than rtpsendfile
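The copy and system-call counts from the streaming slides can be collected in one table, which makes the per-packet savings easy to compare. This sketch only restates the slides' figures: the names follow the slide titles, `find_mechanism` and `total` are hypothetical helpers, and msend, rtpmsend, krtpmsend, rtpsendfile and krtpsendfile are the proposed calls, not existing Linux system calls, so nothing here invokes them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* total = per_packet * n + fixed, for streaming a file as n packets */
struct cost {
    long per_packet;
    long fixed;
};

struct mechanism {
    const char *name;
    struct cost copies;
    struct cost calls;
};

/* Figures as stated on the slides (proposed calls are not real syscalls). */
static const struct mechanism MECHANISMS[] = {
    /* existing Linux data paths */
    {"read/send", {2, 0}, {2, 0}},
    {"read/writev", {3, 0}, {2, 0}},
    {"mmap/send", {2, 0}, {4, 1}},
    {"mmap/writev", {2, 0}, {1, 1}},
    {"sendfile", {1, 0}, {4, 0}},
    /* proposed enhancements */
    {"mmap/msend", {1, 0}, {4, 1}},
    {"mmap/rtpmsend", {1, 0}, {1, 1}},
    {"mmap/krtpmsend", {0, 0}, {0, 1}},
    {"rtpsendfile", {1, 0}, {1, 0}},
    {"krtpsendfile", {0, 0}, {0, 1}},
};

static long total(struct cost c, long n)
{
    return c.per_packet * n + c.fixed;
}

/* Hypothetical lookup helper; returns NULL for an unknown name. */
static const struct mechanism *find_mechanism(const char *name)
{
    for (size_t i = 0; i < sizeof MECHANISMS / sizeof MECHANISMS[0]; i++)
        if (strcmp(MECHANISMS[i].name, name) == 0)
            return &MECHANISMS[i];
    return NULL;
}
```

For example, `total(find_mechanism("sendfile")->calls, n) - total(find_mechanism("rtpsendfile")->calls, n)` evaluates to 3n, the three calls per packet that rtpsendfile saves.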

Enhanced Streaming: Results
[Figure legend: RTP over UDP; TCP sendfile (content download); existing mechanism (streaming); mmap-based mechanisms; sendfile-based mechanisms]
Tested streaming of a 1 GB file on Linux 2.6.
  → figure annotations: ~27% improvement, ~25% improvement

Conclusions
  – Current commodity operating systems still pay a high price for streaming services.
  – However, small changes in the system call layer might be sufficient to remove most of the overhead.
  – In conclusion, commodity operating systems still have potential for improvement with respect to streaming support.
  – What can we hope will be supported?
  – Road ahead: optimize the code, make a patch and submit it to kernel.org.

Questions?