Measuring PROOF Lite performance in (non)virtualized environment Ioannis Charalampidis, Aristotle University of Thessaloniki Summer Student 2010.

Overview ▫ Introduction ▫ Benchmarks: overall execution time ▫ Benchmarks: in-depth analysis ▫ Conclusion

What am I looking for? There is a known overhead caused by the virtualization process ▫ How big is it? ▫ Where is it located? ▫ How can we minimize it? ▫ Which hypervisor has the best performance? I am using CernVM as the guest

What is CernVM? It's a baseline Virtual Software Appliance for use by the LHC experiments It's available for many hypervisors: Hyper-V, VMware, VirtualBox, KVM / QEMU, XEN

How am I going to find the answers? Using a standard data analysis application (ROOT + PROOF Lite) as the benchmark Test it on different hypervisors And on a varying number of workers/CPUs Compare the performance (physical vs. virtualized)

Problem The benchmark application requires too much time to complete (2 min ~ 15 min) ▫ At least 3 runs are required for reliable results ▫ The in-depth analysis overhead is about 40% ▫ It is not efficient to perform a detailed analysis for every CPU / hypervisor configuration → Create the overall execution time benchmarks first → Then find the best configuration to run the traces on

Benchmarks performed Overall time ▫ Using the time utility and automated batch scripts In-depth analysis ▫ Tracing system calls using Strace and SystemTap ▫ Analyzing the trace files using applications I wrote: BASST (Batch Analyzer based on STrace) and KARBON (general-purpose application profiler based on trace files)

Process description and results

Benchmark Configuration Base machine ▫ Scientific Linux CERN 5 Guests ▫ CernVM 2.1 Software packages from SLC repositories ▫ Linux kernel …el5 ▫ XEN …el5 ▫ KVM …el5 ▫ Python 2.5.4p2 (from AFS) ▫ ROOT …b (from AFS) Base machine hardware ▫ 24 x Intel Xeon X… … GHz with VT-x support (64 bit) ▫ No VT-d or Extended Page Tables (EPT) hardware support ▫ 32 GB RAM

Benchmark Configuration Virtual machine configuration ▫ 1, then 2 to 16 CPUs in steps of 2 ▫ + 1 GB RAM for physical disk and network tests ▫ + 17 GB RAM for RAM disk tests ▫ Disk image for the OS ▫ Physical disk for the data + software Important background services running ▫ NSCD (name service caching daemon)

Benchmark Configuration Caches were cleared before every test ▫ Page cache, dentries and inodes ▫ Using the /proc/sys/vm/drop_caches flag No swap memory was used ▫ Verified by periodically monitoring the free memory
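The cache-clearing step can be sketched as follows (a minimal sketch: writing "3" to /proc/sys/vm/drop_caches drops both the page cache and dentries/inodes, and requires root; the `path` parameter is only there so the function can be exercised without root):

```python
import subprocess

DROP_CACHES = "/proc/sys/vm/drop_caches"

def drop_caches(path=DROP_CACHES):
    """Flush dirty pages, then drop page cache, dentries and inodes.

    Writing "1" drops the page cache, "2" drops dentries and inodes,
    and "3" drops both; syncing first ensures dirty pages are written
    back instead of being pinned in memory.
    """
    subprocess.run(["sync"], check=True)   # write dirty pages back to disk
    with open(path, "w") as f:
        f.write("3\n")
```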

Automated batch scripts The VM batch script runs on the host machine It repeats the following procedure: ▫ Create a new virtual machine ▫ Wait for the machine to finish booting ▫ Connect to the controlling script inside the VM ▫ Drop caches both on the host and the guest ▫ Start the job ▫ Receive and archive the results [Diagram: Client – Server – Hypervisor – Benchmark]
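The loop above can be sketched like this (a sketch, not the actual script: the hypervisor- and SSH-specific steps are injected as callables, and the job name `proofbench.sh` is a hypothetical placeholder):

```python
def run_benchmark_cycle(vm_name, start_vm, wait_for_boot,
                        drop_host_caches, run_remote, archive):
    """One iteration of the VM batch script.

    start_vm / wait_for_boot / run_remote / archive stand in for the
    hypervisor management and remote-shell commands, so the same loop
    works unchanged for every hypervisor under test.
    """
    start_vm(vm_name)              # create a new virtual machine
    wait_for_boot(vm_name)         # wait for the guest to finish booting
    drop_host_caches()             # drop caches on the host...
    run_remote(vm_name, "sync; echo 3 > /proc/sys/vm/drop_caches")  # ...and the guest
    result = run_remote(vm_name, "proofbench.sh")   # start the job (hypothetical name)
    archive(vm_name, result)       # receive and archive the results
    return result
```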

Problem There was a bug in PROOF Lite: it looked up a non-existent hostname during the startup of each worker → Example: 0.2-plitehp24.cern.ch Discovered by detailed system call tracing → The hostname couldn't be cached → The application had to wait for the timeout → The startup time was delayed randomly → Call-tracing applications made this delay even bigger, virtually hanging the application

Problem The problem was resolved in two ways: ▫ A minimal DNS proxy was developed that fakes the existence of the buggy hostname ▫ It was later fixed in the PROOF source [Diagram: Application → Fake DNS Proxy → DNS Server; queries "cernvm.cern.ch?" and "x.x-xxxxxx-xxx-xxx?"]
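The idea behind the fake proxy can be sketched as a tiny UDP responder that answers every A query with a fixed address (a minimal sketch, not the actual proxy: a real deployment would answer only the buggy names locally and forward everything else, such as cernvm.cern.ch, to the real DNS server):

```python
import socketserver
import struct

def build_a_response(query, ip="127.0.0.1"):
    """Answer the first question of a DNS query with a single A record."""
    tid = query[:2]                          # transaction ID, echoed back
    # flags 0x8180: response, recursion desired + available, no error
    header = tid + struct.pack(">HHHHH", 0x8180, 1, 1, 0, 0)
    end = query.index(b"\x00", 12) + 5       # end of QNAME + QTYPE + QCLASS
    question = query[12:end]                 # copy the question verbatim
    # answer: name pointer to offset 12, type A, class IN, TTL 60 s, 4-byte IP
    answer = (b"\xc0\x0c" + struct.pack(">HHIH", 1, 1, 60, 4)
              + bytes(int(octet) for octet in ip.split(".")))
    return header + question + answer

class FakeDNSHandler(socketserver.BaseRequestHandler):
    """UDP handler that resolves every queried name to 127.0.0.1."""
    def handle(self):
        data, sock = self.request
        sock.sendto(build_a_response(data), self.client_address)

# serve with: socketserver.UDPServer(("127.0.0.1", 5353), FakeDNSHandler).serve_forever()
```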

Problem Example: events / sec for different CPU settings, as reported by the buggy benchmark [Charts: before / after the fix]

Results – Physical Disk

Results – Network (XROOTD)

Results – RAM Disk

Results – Relative values [Charts: RAM Disk, Network (XROOTD), Physical Disk; series: Bare metal, KVM, XEN]

Results – Absolute values [Charts: RAM Disk, Network (XROOTD), Physical Disk; series: Bare metal, KVM, XEN]

Results – Comparison chart

Procedure, problems and results

In-depth analysis To get more details, the program execution was monitored and all the system calls were traced and logged Afterwards, the analyzer extracted useful information from the trace files, such as ▫ The time spent in each system call ▫ The filesystem / network activity Tracing adds some overhead, but it is kept out of the overall performance measurements, which were run without tracing

System call tracing utilities STrace ▫ Traces application-wide system calls from user space ▫ Connects to the traced process using the ptrace() system call and monitors its activity Advantages ▫ Traces the application's system calls in real time ▫ Has very verbose output Disadvantages ▫ Creates a big overhead [Diagram: STrace sits between the process and the kernel]
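Driving strace from a batch script can be sketched as follows (the flags are standard strace options; the wrapper itself is a sketch):

```python
import subprocess

def strace_argv(cmd, out_file):
    """Build the strace command line for a timed, fork-following trace.

    -f   follow forks (PROOF Lite spawns one process per worker)
    -T   append the time spent inside each call, e.g. <0.000010>
    -tt  prefix every line with a microsecond wall-clock timestamp
    -o   write the trace to a file instead of stderr
    """
    return ["strace", "-f", "-T", "-tt", "-o", out_file, *cmd]

def trace_command(cmd, out_file):
    """Run cmd under strace and return the completed process."""
    return subprocess.run(strace_argv(cmd, out_file))
```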

System call tracing utilities SystemTap ▫ Traces system-wide kernel activity, asynchronously ▫ Runs as a kernel module Advantages ▫ Can trace virtually everything on a running kernel ▫ Supports scriptable kernel probes Disadvantages ▫ It is not simple to extract detailed information ▫ System calls can be lost under high CPU activity [Diagram: SystemTap runs inside the kernel]

System call tracing utilities Sample STrace output:
  arch_prctl(ARCH_SET_FS, 0x2b5f2bcc27d0) = …
  mprotect(0x34ca54d000, 16384, PROT_READ) = …
  mprotect(0x34ca01b000, 4096, PROT_READ) = …
  munmap(0x2b5f2bc92000, …) = …
  open("/usr/lib/locale/locale-archive", O_RDONLY) = …
  fstat(4, {st_mode=S_IFREG|0644, st_size=…, ...}) = …
  mmap(NULL, …, PROT_READ, MAP_PRIVATE, 4, 0) = 0x2b5f2bcc…
  close(4) = …
  brk(0) = 0x1ad1f…
  brk(0x1ad40000) = 0x1ad…
  open("/usr/share/locale/locale.alias", O_RDONLY) = …
  fstat(4, {st_mode=S_IFREG|0644, st_size=2528, ...}) = …
  read(4, "", 4096) = …
  close(4) = …
  munmap(0x2b5f2f297000, 4096) = …
  wait4(-1, 0x7fff8d813064, WNOHANG, NULL) = -1 ECHILD (No child processes)
  ...
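Trace lines like the sample above can be aggregated into a per-syscall time profile, roughly what BASST does (a sketch, assuming strace was run with -T so each line ends in the elapsed time, e.g. <0.000010>):

```python
import re
from collections import defaultdict

# matches e.g.:  1234  10:00:00.000100 read(4, "", 4096) = 0 <0.000010>
SYSCALL_LINE = re.compile(r"^(?:\d+\s+)?(?:[\d:.]+\s+)?(\w+)\(.*<([\d.]+)>$")

def total_time_per_syscall(trace_lines):
    """Sum the per-call elapsed times (strace -T), keyed by syscall name."""
    totals = defaultdict(float)
    for line in trace_lines:
        match = SYSCALL_LINE.match(line.strip())
        if match:                 # skips signal markers and unfinished calls
            totals[match.group(1)] += float(match.group(2))
    return dict(totals)
```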

KARBON – A trace file analyzer

It is a general-purpose application profiler based on system call trace files It tracks file descriptors and reports detailed I/O statistics for files, network sockets and FIFO pipes It analyzes the child processes and creates process graphs and process trees It can detect the "hot spots" of an application Custom analysis tools can be created on demand using the development API
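The file-descriptor tracking at the heart of this kind of analysis can be sketched as a small state machine over the trace events (a simplified sketch, not KARBON itself: the tuples stand in for parsed open()/read()/write()/close() trace lines):

```python
from collections import defaultdict

def fd_io_stats(events):
    """Attribute read/write byte counts to paths by tracking file descriptors.

    events are simplified tuples: ("open", path, fd), ("read", fd, nbytes),
    ("write", fd, nbytes), ("close", fd).
    """
    fd_table = {}                                    # fd -> currently bound path
    stats = defaultdict(lambda: {"read": 0, "write": 0})
    for event in events:
        kind = event[0]
        if kind == "open":
            _, path, fd = event
            fd_table[fd] = path                      # fd now refers to this path
        elif kind in ("read", "write"):
            _, fd, nbytes = event
            path = fd_table.get(fd, "<unknown fd>")
            stats[path][kind] += nbytes
        elif kind == "close":
            fd_table.pop(event[1], None)             # the fd number may be reused
    return dict(stats)
```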

KARBON – Application block diagram [Components: Source (file or TCP stream), Tokenizer, Router, Preprocessing Tool, Analyzer, Filter, Presenter]

Results Time utilization of the traced application [charts]

Results Overall system call time for filesystem I/O Reminder: kernel buffers were dropped before every test ▫ Possible caching effect inside the hypervisor

  [ms]         Reading   Writing   Seeking   Total
  Bare metal   490,…     …         …         …
  KVM          38,…      …         …         …
  XEN          38,…      …         …         …

Results Overall system call time for UNIX Sockets

  [ms]         Receiving   Sending   Bind, Listen   Connecting   Total
  Bare metal   …           …         …              …            …
  KVM          59,…        …         …              …            …
  XEN          97,…        …         …              …            …

Results Most time-consuming miscellaneous system calls

  System call      Bare metal    KVM           XEN
  wait4()          178,…         …             …
  gettimeofday()   (No trace)    219,780,33    218,018,63
  nanosleep()      (No trace)    12,250,12     12,029,30
  time()           (No trace)    9,081,94      …
  rt_sigreturn()   150,943       1,685,285     9,271,061
  setitimer()      23,245        698,785       223,669

Conclusion Physical disk ▫ KVM can achieve better performance than XEN, reaching …% of the native speed ▫ Best performance achieved on 6 CPUs / 6 workers (7 GB RAM), with 81% of the native speed Network (Xrootd) ▫ XEN can achieve better performance than KVM, reaching …% of the native speed ▫ Best performance achieved again on 6 CPUs / 6 workers (7 GB RAM), with 92% of the native speed

Conclusion Some disk I/O operations (read) appear to be faster inside the virtual machine Some of them appear to be slower (seek, write) ▫ Possible caching effect even on direct disk access Network I/O ▫ TCP under XEN looks fine, whereas with KVM there are some issues ▫ UNIX sockets seem to carry a significant penalty inside the VMs Some miscellaneous system calls take longer inside the VM ▫ Time-related functions (gettimeofday, nanosleep) → Used for the paravirtualized implementation of other system calls?

Other uses of the tools SystemTap could be used by the nightly builds in order to detect hung applications KARBON can be used as a general log-file analysis program

Future work Benchmark VMs with a disk image file residing on a RAID array Benchmark many concurrent KVM virtual machines with total memory exceeding the overall system memory – Exploit NPT Test PCI pass-through for network cards (KVM) – Test VT-d Convert the benchmark application from Python to pure C Repeat the benchmarks with the optimized ROOT input files Test again the KVM network performance with … Recompile the kernel with CONFIG_KVM_CLOCK