Operating System Support for Multi-User, Remote, Graphical Interaction. Alexander Ya-li Wong, Margo Seltzer, Harvard University, Division of Engineering and Applied Sciences.

Presentation transcript:

Operating System Support for Multi-User, Remote, Graphical Interaction. Alexander Ya-li Wong, Margo Seltzer, Harvard University, Division of Engineering and Applied Sciences. Presented by Kim, Byeong Gil, Software & System Lab, Kangwon National University.

Contents
 Abstract
 Introduction
 Background of X Windows and TSE
 Approach
  - Processor, Memory, Network
  - User Behavior
  - Compulsory Load, Dynamic Load
  - Latency
 Conclusion

Abstract
 Processor and memory scheduling algorithms are not tuned for thin-client service.
 Under heavy CPU and memory load, user-perceived latencies rise to as much as 100 times the threshold of human perception.
 TSE's network protocol outperforms X by up to a factor of six.
 A bitmap cache is essential for handling the dynamic elements of modern user interfaces.
 Use of the bitmap cache can reduce network load by up to 2000%.

Introduction
 Modern computer system architectures allow the processor, memory, disk, and display subsystems to be spatially distributed across a network.
 Thin clients
  - Cost and manageability considerations have renewed interest in X Windows-like schemes.
  - Thin-client service has been introduced into major commercial operating systems.
  - Adoption is accelerating with thin-client consumer products.

Background: X Windows vs. TSE

                   X Windows                        TSE
  Library          Xlib GUI                         Win32 GUI
  Class            User-level                       Passes through the kernel
  Multi-user       Yes                              Yes
  Protocol         X                                RDP
  Compression      None                             RLE
  Caching          Toolkit-specific, usually none   Memory & disk
  Platforms        Windows, Unix, Macintosh         Windows; Unix via third-party add-ons

Background (con’t)
 LBX (Low Bandwidth X)
  - is a protocol extension to X
  - is implemented as a proxy
  - takes normal X traffic and applies various compression techniques

Approach
 What is the maximum number of concurrent users, and what is the impact on users at that maximum?
 Interactive
  - Latency, not throughput, is the key performance criterion ("Using Latency to Evaluate Interactive System Performance," Endo, Wang, Chen, and Seltzer, 1996).
 Multi-user
  - Benchmarks must be run on the multi-user system.
 Graphical
  - Graphical output must be considered with respect to latency.
 Remote access
  - The efficiency of the network protocol matters.

The Key Role of Latency
 Latency characteristics
  - Latency tolerances for continuous operations are lower than for discrete operations.
  - Humans are irritated by latencies of 100 ms or greater ("Providing A Low Latency User Experience In A High Latency Application," Holden, L., 1997).
 The degree of irritation grows when
  - latency for any given operation continues to increase,
  - the number of operations that induce perceptible latency increases, and
  - perceptible latency continually changes.

Factors Affecting Latency
 Hardware resources
  - The processor, memory, disk, and network are all relevant.
  - Resource scarcity pushes activity down to slower levels of the memory hierarchy.
 Operating system structure
  - Bad scheduling decisions
  - Inefficient context switches
  - Poor management of resource contention
 User behavior
  - Determines how hard the hardware resources are pushed toward their limits.

Experimental Testbed
 Server
  - 333 MHz Intel Celeron
  - 96 MB SDRAM
  - 4 GB IDE hard disk
  - Bay Networks NetGear FA-310 Ethernet adapter
 Client
  - Intel Pentium II, SDRAM
  - 11 GB IDE hard disk
  - 3Com Ethernet adapter
 Network listening host
  - Intel Pentium, EDO RAM
  - 2 GB IDE hard disk
  - 3Com Ethernet adapter

Processor - Behavior
 From behavior to load
 Multi-user support
  - Incoming session connections
  - Additional per-user kernel state
  - Ownership information
 Remote-access support
  - Interface operations must pass through the network subsystem.
 These compulsory loads are inherent in the operating system.

Processor - Load
 TSE shows greater overall idle-state CPU activity.
  - Listening for and handling incoming client connections
  - Session state management by the NT Virtual Memory, Object, and Process managers

Processor - Latency

Dynamic Latency
 Methodology
 Sink C program (see the sketch below)
  - Never voluntarily yields the processor
  - Each instance should increase the scheduler queue length by one
 Testing program
  - TSE: Notepad
  - X Windows: vim
 Action
  - Engage character repeat on the client machine; the repeat rate was set to 20 Hz.
 Measurement
  - Using tcpdump
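The slides do not include the sink program itself; the following is a minimal sketch of what such a CPU sink could look like, assuming its only job is to spin without blocking so that each running instance lengthens the scheduler's run queue by one.

```c
/* Minimal sketch of a CPU "sink": a process that never voluntarily
 * yields the processor, so each running instance should add one
 * entry to the scheduler's run queue. Illustrative only; the actual
 * sink program used in the study is not shown in the slides. */
int main(void)
{
    volatile unsigned long counter = 0;

    for (;;)            /* spin forever: no I/O, no sleeps, no blocking calls */
        counter++;

    return 0;           /* never reached */
}
```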

Dynamic Latency - Results
 No load on the server
  - The server sends a message to the client every 50 ms.
 Load on the server

Memory
 From behavior to load
 Compulsory load
  - Dynamic memory usage of the kernel, measured while the system is idle with no user sessions: 17 MB for Linux and 19 MB for TSE
  - Memory usage of each user session, measured as a minimal login with no additional user activity

Memory – Compulsory Load (con’t)

Memory - Latency
 From load to latency
  - Opened a simple text-editing application remotely.
  - TSE average latency is about 40 times the perception threshold.
  - Linux average latency is about 11 times the perception threshold.

Network
 How user behavior generates network load
  - Compare the abilities of RDP, X, and LBX.
  - Growing usage of animation
 How network load translates to user-perceived latency
  - Importance of network protocol efficiency
 Terms
  - "Channel": the stream of network messages between the client and server
  - "Input channel": the stream from the client to the server

Network - Behavior
 From behavior to load
  - Network load depends on the design and implementation of the user interface.
  - The increasing richness and sophistication of graphical interfaces make remote display increasingly network intensive.

Network - Load
 Compulsory load
  - Session negotiation and initialization traffic, exchanged before any user-driven traffic
  - Session setup costs: TSE 45,328 bytes; Linux/X 16,312 bytes
 Dynamic load
  - Compares RDP, X, and LBX
  - Testing environment: Corel WordPerfect, Gimp, Netscape Navigator
  - prototap: protocol tracing software based on tcpdump (a sketch of this kind of tool follows below)
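prototap itself is not reproduced in these slides; the sketch below shows the general shape of a libpcap-based tracer in the same spirit, assuming the goal is simply to count packets and bytes matching a filter. The interface name and the RDP-port filter string are illustrative assumptions.

```c
/* Minimal sketch of a tcpdump/libpcap-style protocol tracer in the
 * spirit of the "prototap" tool mentioned on this slide (the real
 * prototap is not shown here). It counts packets and on-the-wire
 * bytes matching a BPF filter. Build with: cc trace.c -lpcap */
#include <pcap.h>
#include <stdio.h>
#include <stdlib.h>

static unsigned long packets = 0;
static unsigned long bytes   = 0;

static void on_packet(u_char *user, const struct pcap_pkthdr *hdr,
                      const u_char *data)
{
    (void)user; (void)data;
    packets++;
    bytes += hdr->len;          /* total on-the-wire length of this packet */
}

int main(void)
{
    char errbuf[PCAP_ERRBUF_SIZE];
    struct bpf_program prog;

    /* Open a live capture; "eth0" is an assumed interface name. */
    pcap_t *pc = pcap_open_live("eth0", 65535, 1, 1000, errbuf);
    if (pc == NULL) {
        fprintf(stderr, "pcap_open_live: %s\n", errbuf);
        return EXIT_FAILURE;
    }

    /* Example filter: RDP traffic on TCP port 3389 (assumed filter). */
    if (pcap_compile(pc, &prog, "tcp port 3389", 1, PCAP_NETMASK_UNKNOWN) == -1 ||
        pcap_setfilter(pc, &prog) == -1) {
        fprintf(stderr, "filter error: %s\n", pcap_geterr(pc));
        return EXIT_FAILURE;
    }

    /* Capture 1000 packets, then report totals and the average message size. */
    pcap_loop(pc, 1000, on_packet, NULL);
    printf("%lu packets, %lu bytes (avg %.1f bytes/packet)\n",
           packets, bytes, packets ? (double)bytes / packets : 0.0);

    pcap_close(pc);
    return EXIT_SUCCESS;
}
```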

Network – Load (con’t)
 RDP is the most efficient protocol.
  - Its load is less than 25% of LBX's and less than 15% of X's.
 LBX messages are compressed.
  - The average message size is just 209 bytes.

Network – Load (con’t)
 Virtual-IP (VIP)
  - Omitting the IP header can reduce overhead.
 LBX has the smallest average message size
  - yet is still less than half as efficient as RDP.

Network – Animations: Bitmap Caching
 RDP outperforms LBX and X.
 X and LBX do not support bitmap caching.
 The TSE client reserves 1.5 MB of memory for a bitmap cache with an LRU eviction policy (see the sketch below).
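As an illustration of the caching behavior described on this slide, here is a minimal sketch of a byte-budgeted bitmap cache with LRU eviction. The structure names, key scheme, and budget handling are assumptions for illustration; this is not the actual TSE client code, and error checking is omitted for brevity.

```c
/* Minimal sketch of a byte-budgeted bitmap cache with LRU eviction,
 * illustrating the policy described on this slide. */
#include <stdlib.h>
#include <string.h>

#define CACHE_BUDGET (3 * 512 * 1024)   /* ~1.5 MB of cached bitmap data */

struct bitmap_entry {
    unsigned key;                         /* identifier for the bitmap */
    unsigned char *pixels;                /* cached bitmap data */
    size_t size;                          /* size of pixels in bytes */
    struct bitmap_entry *prev, *next;     /* LRU list: head = most recent */
};

static struct bitmap_entry *lru_head, *lru_tail;
static size_t cache_bytes;

static void unlink_entry(struct bitmap_entry *e)
{
    if (e->prev) e->prev->next = e->next; else lru_head = e->next;
    if (e->next) e->next->prev = e->prev; else lru_tail = e->prev;
    e->prev = e->next = NULL;
}

static void push_front(struct bitmap_entry *e)
{
    e->prev = NULL;
    e->next = lru_head;
    if (lru_head) lru_head->prev = e; else lru_tail = e;
    lru_head = e;
}

/* Look up a bitmap; a hit moves it to the front of the LRU list. */
unsigned char *cache_lookup(unsigned key)
{
    for (struct bitmap_entry *e = lru_head; e; e = e->next) {
        if (e->key == key) {
            unlink_entry(e);
            push_front(e);
            return e->pixels;
        }
    }
    return NULL;                  /* miss: caller must fetch from the server */
}

/* Insert a bitmap, evicting least-recently-used entries to stay in budget. */
void cache_insert(unsigned key, const unsigned char *pixels, size_t size)
{
    while (lru_tail && cache_bytes + size > CACHE_BUDGET) {
        struct bitmap_entry *victim = lru_tail;     /* least recently used */
        unlink_entry(victim);
        cache_bytes -= victim->size;
        free(victim->pixels);
        free(victim);
    }

    struct bitmap_entry *e = malloc(sizeof *e);
    e->key = key;
    e->size = size;
    e->pixels = malloc(size);
    memcpy(e->pixels, pixels, size);
    push_front(e);
    cache_bytes += size;
}
```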

Network – Cache Effectiveness and CPU Load
 The bitmap cache is critical not only for reducing network load, but also for reducing processor load at the server.

Network – Cache
 Looping animations
  - For values 25 ~ Mbps
  - For all values above Mbps
 LRU is the wrong scheme for handling looping animations (see the simulation sketch below).
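The following is a tiny, self-contained simulation of why LRU breaks down on looping animations: cycling through one more frame than the cache can hold makes LRU evict each frame just before it is needed again, so every access misses. The frame count and cache capacity below are arbitrary illustrative values, not figures from the study.

```c
/* Simulate LRU caching of a looping animation that has one more frame
 * than the cache can hold; the result is a miss on every access. */
#include <stdio.h>
#include <string.h>

#define FRAMES 8        /* frames in the looping animation (assumed) */
#define CAPACITY 7      /* cache holds one frame fewer than the loop */

int main(void)
{
    int cache[CAPACITY];        /* frame ids currently cached */
    long last_used[FRAMES];     /* timestamp of last access per frame */
    int cached = 0;
    long now = 0, misses = 0, accesses = 0;

    memset(last_used, 0, sizeof last_used);

    for (int loop = 0; loop < 100; loop++) {
        for (int frame = 0; frame < FRAMES; frame++, accesses++) {
            now++;
            int hit = 0;
            for (int i = 0; i < cached; i++)
                if (cache[i] == frame) { hit = 1; break; }

            last_used[frame] = now;
            if (hit)
                continue;

            misses++;
            if (cached < CAPACITY) {
                cache[cached++] = frame;      /* room left: just insert */
            } else {
                /* Evict the least-recently-used cached frame. */
                int victim = 0;
                for (int i = 1; i < cached; i++)
                    if (last_used[cache[i]] < last_used[cache[victim]])
                        victim = i;
                cache[victim] = frame;
            }
        }
    }

    printf("LRU on a %d-frame loop with a %d-frame cache: %ld misses in %ld accesses\n",
           FRAMES, CAPACITY, misses, accesses);
    return 0;
}
```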

Network - Latency
 Demonstration (see the sketch below)
  - Two simple C programs establish a TCP connection and send and receive random data.
  - Ran ping for 60 seconds and took the average and variance of the RTT.
  - The default ping size is 64 bytes (approximately keystroke size).
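The two C programs are not reproduced in the slides; below is a minimal sketch of the client half of such a load generator, assuming it simply connects to a peer and streams random bytes while ping measures RTT separately. The peer address, port, and buffer size are illustrative assumptions.

```c
/* Minimal sketch of the client half of a network-load generator in the
 * spirit of the "two simple C programs" on this slide: it opens a TCP
 * connection and streams random data to a peer. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    unsigned char buf[1460];              /* roughly one Ethernet TCP segment */
    struct sockaddr_in peer;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return EXIT_FAILURE; }

    memset(&peer, 0, sizeof peer);
    peer.sin_family = AF_INET;
    peer.sin_port = htons(5001);                         /* assumed port */
    inet_pton(AF_INET, "192.168.1.10", &peer.sin_addr);  /* assumed server address */

    if (connect(fd, (struct sockaddr *)&peer, sizeof peer) < 0) {
        perror("connect");
        return EXIT_FAILURE;
    }

    /* Stream random data until the connection closes or an error occurs;
     * the matching server program would read (and optionally echo) it. */
    for (;;) {
        for (size_t i = 0; i < sizeof buf; i++)
            buf[i] = (unsigned char)rand();
        if (write(fd, buf, sizeof buf) < 0) {
            perror("write");
            break;
        }
    }

    close(fd);
    return EXIT_SUCCESS;
}
```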

Network – Latency (con’t)

Conclusion
 Latency is the paramount performance metric.
 The study highlights important issues relevant to thin-client performance.
 Resource scheduling is not well optimized for thin-client service.
 Under resource saturation, latencies rise well above human-perceptible levels.
 Performed a detailed comparison of the RDP, X, and LBX protocols.
 RDP is more efficient in terms of network load, particularly for animated UI elements.