High Performance Logging System for Embedded UNIX and GNU/Linux Applications IEEE RTCSA 2013 (8/21/13) Cisco Systems Jaein Jeong.

Presentation transcript:


Introduction - Embedded UNIX in many places
[Figure: Traditional UNIX Logging System. Labels: App Process, syslog, Buffer (KERNEL), syslogd (USER), log, File System.]

Problem Statement - Apps slow down with a large amount of logging
Long latency to the logging daemon
Inefficiency of unbuffered writes to the flash file system
Long latency even with output buffering
[Figure: logging path from App Process through the kernel buffer to syslogd and the Flash File System, shown in several variants; the last variant adds a Flash Logger fed through a named pipe.]

Our Approach
Faster Message Transfer
Compatibility with Existing Logging Apps
Destination-Aware Message Formatting

Organization
Related Work for UNIX Logging Systems
Background
 – Cisco UCS and Virtual Interface Card (VIC)
 – Evolution of VIC Logging System
Design Requirements and Implementation
Evaluation and Optimization
Conclusion

Related Work - Logging Methods for UNIX Apps
Syslog
 – Introduced in the early 1980s
 – Still the most notable one
Syslog-ng
 – An extension based on nsyslogd
 – Reliable transport, encryption, and a richer set of information and filtering
Rsyslog
 – An extension used in the latest distros
 – Multi-threading
None of these is designed for embedded/flash logging
 – Slow message passing (message copying through the kernel)
 – Unbuffered message writes

Background - Cisco UCS and Virtual Interface Card
[Figure: Cisco UCS datacenter server system, Cisco UCS server, and Cisco UCS Virtual Interface Card (VIC)]
Cisco UCS Virtual Interface Card (VIC)
 – 128 Programmable Virtual Interfaces
 – Ethernet NICs
 – Fibre Channel HBAs
 – 10GBASE-KR Unified Network Fabric, 1 to Each Fabric Extender
[Figure: VIC ASIC block diagram with Mgmt CPU, FCPU 0, and FCPU 1]
Mgmt CPU
 – MIPS processor core (500 MHz, MIPS 24Kc)
 – Embedded Linux (Linux kernel rc5)

Background - Evolution of VIC Logging System
Logd – a simple logging daemon
 – Logging from multiple processes
 – Different severity levels
 – Formatting and flash writing
 – Forwards serious messages to switches
Unbuffered syslogd
 – Functional, but with worse write performance
Buffered syslogd
 – Improves flash write performance of unbuffered syslogd
 – Still suffers long latency

Organization
Related Work for UNIX Logging Systems
Background
 – Cisco UCS and Virtual Interface Card (VIC)
 – Evolution of VIC Logging System
Design Requirements & Implementation
Evaluation & Optimization
Conclusion

Design Requirements - Faster Message Transfer
Avoid kernel-to-user-space message copying
[Figure: Syslogd Logging vs. Mqlogd Logging]

Design Requirements - Faster Message Transfer
Reduce message copying from 4 copies to 2
Syslogd Logging
 1. App local copy
 2. Write to kernel buffer
 3. Syslogd local copy
 4. Write to named pipe
Mqlogd Logging
 1'. Write directly to shared memory
 2'. Write from shared memory to named pipe

Design Requirements - Compatibility with Existing Logging Apps
Through the Logging API
 – Replace syslog() with shared memory library calls
Direct Syslog Calls
 – Server receives messages through a UNIX domain datagram socket (labeled "UDP Unix Socket" on the slide)
[Figure: with syslogd as the logging server, both direct syslog() callers (klogd, fls, …) and Logging API callers (mcp, fls, …; log_info(), log_error(), …) reach the server through the syslog() library call and the UNIX socket. With mqlogd as the logging server, direct syslog() callers (klogd, xinetd, …) still use the UNIX socket, while Logging API callers (app1, app2, …) go through the Shared Memory Logging Library.]
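The two client paths on this slide can be sketched as follows. This is a minimal illustration, not the VIC code: the `Logger` class, its transport callback, and the in-process `queue` list standing in for the shared memory library are all assumptions. Only the names `log_info()` and `log_error()` come from the slide. The `<PRI>` prefix follows the standard syslog convention, PRI = facility * 8 + severity.

```python
import socket

# Standard syslog severities: a lower number means a more severe message.
SEVERITY = {"error": 3, "warning": 4, "info": 6, "debug": 7}
FACILITY_USER = 1  # standard "user-level messages" facility


def encode_syslog(severity, text):
    """Build a syslog-style datagram '<PRI>text', with PRI = facility * 8 + severity."""
    pri = FACILITY_USER * 8 + SEVERITY[severity]
    return f"<{pri}>{text}".encode()


def decode_syslog(datagram):
    """Split a '<PRI>text' datagram into (facility, severity, text)."""
    head, _, text = datagram.decode().partition(">")
    pri = int(head.lstrip("<"))
    return pri // 8, pri % 8, text


class Logger:
    """Hypothetical logging API: log_info()/log_error() over a pluggable transport."""

    def __init__(self, send):
        self._send = send  # callable taking (severity, text)

    def log_info(self, text):
        self._send("info", text)

    def log_error(self, text):
        self._send("error", text)


# Legacy path: the client emits syslog datagrams on a UNIX domain datagram socket.
client_sock, server_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
legacy = Logger(lambda sev, text: client_sock.send(encode_syslog(sev, text)))
legacy.log_error("link down")
received = decode_syslog(server_sock.recv(1024))
client_sock.close()
server_sock.close()

# New path: same API, but the transport is a stand-in for the shared memory library.
queue = []
fast = Logger(lambda sev, text: queue.append((SEVERITY[sev], text)))
fast.log_info("link up")
```

Because callers only see `log_info()` and `log_error()`, swapping the transport does not require changes in application code, which is the point of the "Through the Logging API" path.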

Design Requirements - Destination-Aware Message Formatting
Syslogd
 – Working but limited
 – Redundant
 – Coarse time granularity (in seconds)
Mqlogd
 – Destination-aware formatting with space saving
 – Uses system-supported timing (in microseconds)
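A rough sketch of what destination-aware formatting with microsecond timestamps could look like. The slides do not give the actual formats, so the two destinations (`"flash"` and `"remote"`), the field layout, and the `HOSTNAME` value are assumptions chosen only to show the idea: the local flash log omits fields it does not need, while a remote reader gets a self-describing line.

```python
import time

HOSTNAME = "vic-0"  # hypothetical host name, used only for the remote destination


def format_entry(tag, text, destination, ts_us=None):
    """Format one log entry for a destination, using a microsecond timestamp."""
    if ts_us is None:
        ts_us = time.time_ns() // 1000  # microseconds since the epoch
    secs, micros = divmod(ts_us, 1_000_000)
    if destination == "flash":
        # Space-saving form: numeric timestamp, no host name (the log stays local).
        return f"{secs}.{micros:06d} {tag}: {text}"
    if destination == "remote":
        # Self-describing form for a reader that does not know the origin.
        stamp = time.strftime("%Y-%m-%dT%H:%M:%S", time.gmtime(secs))
        return f"{stamp}.{micros:06d}Z {HOSTNAME} {tag}: {text}"
    raise ValueError(f"unknown destination: {destination}")
```

For example, `format_entry("mcp", "link up", "flash", ts_us=1377043200123456)` returns `"1377043200.123456 mcp: link up"`.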

Implementation - Shared Memory and Circular Queue
Notification Mechanism
 – Write-and-select
 – Signal
Locking Mechanism
 – Semaphore lock
 – Pthread lock
[Figure: logging clients enqueue into a circular queue in shared memory; the logging server dequeues after a logging event notification. Queue memory layout: circular queue header with a notification disable flag, followed by header entries and non-header entries.]
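The queue on this slide can be approximated with a small sketch. It is a simplification under several assumptions: an anonymous `mmap` region stands in for the shared memory segment, a `threading.Lock` stands in for the semaphore or pthread lock, and every message occupies exactly one fixed-size slot (the slide's layout distinguishes header and non-header entries, which is not modeled here). The header field names are hypothetical.

```python
import mmap
import struct
import threading

HEADER = struct.Struct("<IIII")  # head index, entry count, capacity, notify-disabled flag
LEN = struct.Struct("<H")        # per-slot payload length
ENTRY_SIZE = 64                  # fixed slot size in bytes (assumed)


class SharedQueue:
    def __init__(self, capacity):
        self.mem = mmap.mmap(-1, HEADER.size + capacity * ENTRY_SIZE)
        self.lock = threading.Lock()  # stand-in for the semaphore / pthread lock
        HEADER.pack_into(self.mem, 0, 0, 0, capacity, 0)

    def enqueue(self, payload):
        """Client side: write straight into the slot. Returns False if dropped."""
        payload = payload[:ENTRY_SIZE - LEN.size]  # truncate oversized messages
        with self.lock:
            head, count, capacity, flag = HEADER.unpack_from(self.mem, 0)
            if count == capacity:
                return False  # queue full: the request is dropped
            off = HEADER.size + ((head + count) % capacity) * ENTRY_SIZE
            LEN.pack_into(self.mem, off, len(payload))
            start = off + LEN.size
            self.mem[start:start + len(payload)] = payload
            HEADER.pack_into(self.mem, 0, head, count + 1, capacity, flag)
            return True

    def dequeue(self):
        """Server side: return the oldest payload, or None if the queue is empty."""
        with self.lock:
            head, count, capacity, flag = HEADER.unpack_from(self.mem, 0)
            if count == 0:
                return None
            off = HEADER.size + head * ENTRY_SIZE
            (n,) = LEN.unpack_from(self.mem, off)
            start = off + LEN.size
            payload = bytes(self.mem[start:start + n])
            HEADER.pack_into(self.mem, 0, (head + 1) % capacity, count - 1, capacity, flag)
            return payload
```

With `q = SharedQueue(2)`, two `enqueue` calls succeed, a third returns `False` until the server calls `dequeue`, and messages come back in FIFO order. Keeping the head index and count inside the mapped header, rather than in Python attributes, mirrors the idea that all queue state lives in the shared region.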

Organization
Related Work for UNIX Logging Systems
Background
 – Cisco UCS and Virtual Interface Card (VIC)
 – Evolution of VIC Logging System
Design Requirements & Implementation
Evaluation & Optimization
Conclusion

Evaluation
Metrics
 – Request Latency
 – Request Drop Rate
Parameters
 – Number of clients
 – Number of iterations (depth of queue size)
 – Locking mechanism
 – Notification mechanism

Performance Results - Performance compared to syslogd
Avg Latency: >10x speed-up
Min Latency: >20x speed-up
Max Latency: >2x speed-up

Performance Results - Effect of Queue Size
No drops within queue size (e.g )
Queue size should be larger than the maximum expected burst size
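The relation between burst size and queue size can be illustrated with a toy simulation. This is not the benchmark from the talk: the queue size of 128 in the usage note and the `drain_interval` model of server progress are made-up parameters for illustration.

```python
from collections import deque


def simulate_burst(burst, queue_size, drain_interval=0):
    """Return how many of `burst` back-to-back messages are dropped.

    drain_interval=0 models a server that does not run during the burst;
    drain_interval=k lets the server dequeue one message after every k enqueues.
    """
    queue = deque()
    dropped = 0
    for i in range(1, burst + 1):
        if len(queue) < queue_size:
            queue.append(i)
        else:
            dropped += 1  # queue full: this request is lost
        if drain_interval and i % drain_interval == 0 and queue:
            queue.popleft()
    return dropped
```

With a queue of 128 entries and an idle server, a burst of 100 messages loses nothing, while a burst of 200 loses 72. If the server drains one message for every two enqueued, the same burst of 200 fits without drops.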

Performance Results - Effect of Multiple Clients
Average request latency increases proportionally
With 2 clients, requests start to drop at a smaller number of iterations

Performance Results - Effect of Notification Mechanisms
Makes little difference

Performance Results - Effect of Lock Mechanisms
Pthread mutex is 40% faster than semaphore.
Semaphore is used in our production code due to a limitation of the pthread mutex lock (Linux kernel rc5).

Performance Results - Effect of Client Interface Type
Logging using the UNIX socket interface
 – The backward-compatible path is no faster
 – About the same level as syslogd
 – For compatibility, not for general use

Optimization - Effects of deferred notification
Sends one notification for a batch of messages
Measured time for host-to-adapter commands (capability & macaddr) with and without logging
2x speed-up in latency
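One possible way to send a single notification per batch is sketched below. The slides do not describe the mechanism in detail, so this is an assumption: it borrows the "notification disable flag" named on the implementation slide and lets the first enqueue after a drain raise the notification, while later enqueues skip it until the server has drained the queue. The class and attribute names are hypothetical, and the notification itself is only counted rather than delivered.

```python
class DeferredNotifier:
    """Count server wake-ups when notification is suppressed while a batch is pending."""

    def __init__(self):
        self.queue = []
        self.notify_disabled = False
        self.notifications = 0

    def enqueue(self, msg):
        self.queue.append(msg)
        if not self.notify_disabled:
            # Real code would wake the server here (for example a pipe write or a signal).
            self.notifications += 1
            self.notify_disabled = True  # later enqueues in this batch stay silent

    def drain(self):
        """Server side: take the whole batch and re-enable notification."""
        batch, self.queue = self.queue, []
        self.notify_disabled = False
        return batch
```

Ten consecutive `enqueue` calls followed by one `drain` produce one notification instead of ten. The trade-off is that a message may wait in the queue until the server gets around to draining the batch.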

Future Work
Reduce kernel message copying even further
Improve performance with a faster lock
Avoid loss of serious messages
[Figure: two designs in which app processes enqueue into, and mqlogd dequeues from, a memory-mapped file. In the first, mqlogd reaches the Flash Logger through a named pipe; in the second, a memory-mapped file takes the place of the named pipe.]

Conclusion
Logging system for embedded UNIX apps
Up to 100x speed-up in latency, 10x throughput
Backward compatibility
Commercially used in Cisco UCS Virtual Interface Cards