High-performance tracing of many-core systems with LTTng

Simon Marchi
Laboratoire DORSAL, Department of Computer Engineering
Operating system kernel

Outline
- Intro to tracing
- Problem description
- Studied platforms
- Characteristics of many-core processors
- Work done and planned work

Intro to tracing (1/2)
- Very high-performance logging
- How?
  - Very compact output format
  - Lockless synchronization
  - Low-level optimization (architecture-dependent)
- Small footprint
- Won't block the application
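The two main ideas in the slide above can be illustrated with a minimal Python sketch. It is not LTTng's implementation. Each event is packed into a small fixed-size binary record instead of a text line, and each CPU owns a fixed-capacity buffer that drops and counts events when it is full instead of blocking the traced code. This is similar in spirit to LTTng's per-CPU buffers and its "discard" channel mode. The record layout (8-byte timestamp, 2-byte event ID, 4-byte payload) and the names `PerCpuBuffer` and `record` are hypothetical. Python cannot show the lock-free atomic operations a real tracer relies on. The per-CPU split only shows why writers on different cores never need to share a lock.

```python
import struct

# Hypothetical compact record: 8-byte timestamp, 2-byte event ID, 4-byte payload.
# "<" means little-endian with no padding, so each record is exactly 14 bytes.
RECORD = struct.Struct("<QHI")


class PerCpuBuffer:
    """Fixed-capacity buffer owned by a single CPU (one writer, no shared lock)."""

    def __init__(self, capacity_records: int) -> None:
        self.data = bytearray(capacity_records * RECORD.size)
        self.capacity = capacity_records
        self.count = 0
        self.lost = 0  # events dropped because the buffer was full

    def record(self, timestamp: int, event_id: int, payload: int) -> bool:
        # Discard mode: never block the traced code; drop and count instead.
        if self.count == self.capacity:
            self.lost += 1
            return False
        RECORD.pack_into(self.data, self.count * RECORD.size,
                         timestamp, event_id, payload)
        self.count += 1
        return True

    def events(self):
        # Decode the records written so far (what a consumer would read out).
        return [RECORD.unpack_from(self.data, i * RECORD.size)
                for i in range(self.count)]


# One buffer per CPU: writers on different cores never touch the same buffer.
buffers = [PerCpuBuffer(capacity_records=4) for _ in range(2)]
for ts in range(6):
    buffers[0].record(ts, event_id=1, payload=ts * 10)
buffers[1].record(100, event_id=2, payload=7)

# For comparison, a human-readable log line carrying the same information.
text_line = "ts=5 event=sched_switch payload=50\n"
print("binary record:", RECORD.size, "bytes; text line:", len(text_line), "bytes")
print("cpu0 stored", buffers[0].count, "lost", buffers[0].lost)
print("cpu1 events:", buffers[1].events())
```

In this sketch CPU 0 keeps its first four events and counts the last two as lost, and the traced loop never waits. LTTng reports lost events in the same way when a channel runs in discard mode.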

Intro to tracing (2/2)
Used by:
- Kernel and application developers
- Sysadmins
- Security analysts
- Educators

LTTng!
- Open-source project started at Polytechnique Montréal
- Linux kernel and userspace application tracer
- Very active development, with many industrial partners
- http://www.lttng.org

LTTng!

Problem description
- Latest generation of many-core processors: Tilera, Intel Xeon Phi, Freescale, Adapteva
- Expected to become more popular:
  - Energy-efficient
  - Best way to use the ever-increasing number of transistors on a chip
- Developers need good tools

LTTng helps developers track down performance problems and bugs related to parallel programming, which makes a tracer an essential tool on a 50-core machine.

Problem description
- Port and optimize LTTng for many-core architectures
- Expected challenges:
  - Limited storage
  - High volume of data generated
  - Highly parallel architectures: performance must scale with the number of cores

We expect these processors to do more work at the same time, so there will necessarily be more to trace.
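This back-of-the-envelope sketch shows why a "high volume of data" follows from the core count. If every core emits events at the same rate, the aggregate trace rate grows linearly with the number of cores. The per-core rate (100,000 events/s) and event size (32 bytes) are hypothetical round numbers chosen for illustration, not measurements. Only the core counts (36, 60, 100) come from the platforms discussed in this talk.

```python
def trace_rate_bytes_per_s(cores: int, events_per_core_per_s: float,
                           bytes_per_event: int) -> float:
    """Aggregate trace throughput, assuming every core traces at the same rate."""
    return cores * events_per_core_per_s * bytes_per_event


# Hypothetical workload parameters (assumptions, not measured values).
EVENTS_PER_CORE = 100_000  # events per second per core
EVENT_SIZE = 32            # bytes per event

for cores in (1, 36, 60, 100):
    rate = trace_rate_bytes_per_s(cores, EVENTS_PER_CORE, EVENT_SIZE)
    print(f"{cores:3d} cores -> {rate / 1e6:7.1f} MB/s of trace data")
```

Under these assumptions, going from 36 to 100 cores roughly triples the data that must be stored or shipped off the chip. This is why limited storage and per-core scaling appear together among the challenges.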

Studied platforms
- Tilera TILE-Gx8036
  - 36 cores (versions with up to 100 cores to come)
  - Target market: cloud computing, packet processing, data mining, multimedia, etc.
  - Already available at the lab
- Intel Xeon Phi
  - 60 x86-compatible cores
  - Target market: coprocessor in servers, high-performance computing
  - Launched November 2012, general availability January 2013; ours is on its way!

Tilera TILE-Gx architecture
[Figure: TILE-Gx architecture diagram. Source: TILE-Gx8036 product brief, http://www.tilera.com/sites/default/files/productbriefs/TILE-Gx8036_PB033-02_0.pdf]

Intel Xeon Phi architecture
[Figure: Xeon Phi architecture diagram. Source: Intel, http://software.intel.com/en-us/articles/intel-xeon-phi-coprocessor-codename-knights-corner]

Common characteristics
- Interconnection network between cores
  - Shared memory becomes a bottleneck
  - TILE-Gx: mesh network
  - Xeon Phi: ring interconnect
  - Very fast: ~1-5 cycles of delay per hop
- Distributed cache architecture
  - Each core has its own L1/L2 cache
  - On an L2 cache miss, the core looks up the other cores' L2 caches (a "virtual L3") over the interconnection network
  - Direct L2-to-I/O transfers avoid going through main memory
  - Tilera: 64 KB L1 + 256 KB L2 per core
  - Xeon Phi: unknown at the time of writing
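To put the "~1-5 cycles per hop" figure in context, the sketch below counts hops between two cores on a 2D mesh and on a bidirectional ring. It assumes dimension-ordered (X-then-Y) routing on the mesh, so the hop count is the Manhattan distance, and it assumes that ring messages take the shorter direction. Treating the 36-core TILE-Gx8036 as a 6 x 6 grid and giving the Xeon Phi ring one stop per core (60 stops) are simplifying assumptions. A real ring also has stops for memory controllers and other agents.

```python
def mesh_hops(a: int, b: int, width: int) -> int:
    """Hops between tiles a and b on a width x width mesh with X-then-Y routing."""
    ax, ay = a % width, a // width
    bx, by = b % width, b // width
    return abs(ax - bx) + abs(ay - by)


def ring_hops(a: int, b: int, stops: int) -> int:
    """Hops between stops a and b on a bidirectional ring (shorter direction)."""
    d = abs(a - b) % stops
    return min(d, stops - d)


def worst_case(hops_fn, n: int, size: int) -> int:
    """Largest hop count over all pairs of the n nodes."""
    return max(hops_fn(a, b, size) for a in range(n) for b in range(n))


def average(hops_fn, n: int, size: int) -> float:
    """Mean hop count over all ordered pairs of distinct nodes."""
    total = sum(hops_fn(a, b, size) for a in range(n) for b in range(n) if a != b)
    return total / (n * (n - 1))


CYCLES_PER_HOP = (1, 5)  # range quoted on the slide

mesh_max = worst_case(mesh_hops, 36, 6)   # assumed 6 x 6 layout
ring_max = worst_case(ring_hops, 60, 60)  # simplified: one ring stop per core
for name, hops in (("mesh 6x6", mesh_max), ("ring 60", ring_max)):
    lo, hi = (hops * c for c in CYCLES_PER_HOP)
    print(f"{name}: worst case {hops} hops, ~{lo}-{hi} cycles")
print(f"average hops: mesh {average(mesh_hops, 36, 6):.2f}, "
      f"ring {average(ring_hops, 60, 60):.2f}")
```

Under these assumptions, the worst-case distance grows with the square root of the core count on a mesh, 2(sqrt(n) - 1), but linearly on a ring, n/2. Where a tracer's shared data structures sit on the chip can therefore affect how expensive it is to access them.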

Common characteristics
- In-memory filesystem
  - 8 GB of memory, no permanent storage
  - The trace has to be stored somewhere else
- High-bandwidth I/O
  - PCI Express link to the host
  - Tilera: 4 x 10GbE network controllers
- Runs a full Linux OS
  - Most standard tools (e.g. gdb, oprofile) already work
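Because the trace cannot stay on the device, two numbers matter. The first is how long the in-memory filesystem can hold the trace. The second is whether the I/O links can drain the trace as fast as it is produced. The sketch below does that arithmetic for the 8 GB and 4 x 10GbE figures on the slide. The 320 MB/s trace rate is a hypothetical input. The sketch also ignores protocol overhead and the memory used by the OS and the traced application, which is a simplifying assumption, so the real headroom is smaller.

```python
GB = 10**9


def seconds_until_full(storage_bytes: float, trace_bytes_per_s: float) -> float:
    """How long a trace can be kept in memory before the filesystem fills up."""
    return storage_bytes / trace_bytes_per_s


def link_utilization(trace_bytes_per_s: float, links: int,
                     gbit_per_link: float) -> float:
    """Fraction of the raw network capacity the trace stream would consume."""
    capacity_bytes_per_s = links * gbit_per_link * 1e9 / 8  # bits -> bytes
    return trace_bytes_per_s / capacity_bytes_per_s


trace_rate = 320e6  # hypothetical: 320 MB/s of trace data
print(f"8 GB fills in {seconds_until_full(8 * GB, trace_rate):.0f} s")
print(f"4 x 10GbE utilization: {link_utilization(trace_rate, 4, 10):.1%}")
```

Under these assumptions, memory alone holds under half a minute of trace, while streaming the trace off the chip uses only a small fraction of the raw network capacity.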

Tilera TILE-Gx characteristics
- Mesh network: developers can use it as a "software" ASIC (lots of small processors connected together by short wires)
- Many hardware accelerators:
  - Packet processor/router
  - Cryptography (SSL, DSA, RSA, IPsec, etc.)
  - Compression (gzip)
- Runs a hypervisor
  - Cores can be dedicated to different, simultaneously running OSes
  - Can run Zero Overhead Linux and bare-metal applications

Work done
- Basic port of LTTng (UST and kernel tracers) to the Tilera
- Only one small fix was needed on the LTTng side
- A few issues reported to Tilera were fixed on their side

Planned work
- Direct port of LTTng to the platforms: just get it to work
- Develop a benchmark suite: various real-life, heavily parallel applications
- Find bottlenecks and optimize:
  - Make use of the special communication hardware
  - Adapt to the architectural features
- Integrate the work: find abstractions that carry over to other many-core platforms
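For the benchmark suite, one common way to report tracer cost is the extra time per event. You run the same workload with tracing off and then on, and divide the time difference by the number of events recorded. The harness below is a hypothetical sketch of that measurement. The names `overhead_ns_per_event`, `run_workload` and `median_ns`, and the list-append "probe" that stands in for a tracepoint, are illustrative assumptions and are not part of LTTng. Timings from Python are only illustrative. On the target hardware, the same structure would wrap the real parallel applications and their tracepoints.

```python
import time
from typing import Callable, Optional


def overhead_ns_per_event(baseline_ns: int, traced_ns: int, events: int) -> float:
    """Extra time per recorded event attributable to tracing."""
    if events <= 0:
        raise ValueError("events must be positive")
    return (traced_ns - baseline_ns) / events


def run_workload(iterations: int, probe: Optional[Callable[[int], None]]) -> int:
    """Toy workload; calls `probe` once per iteration when tracing is enabled."""
    start = time.perf_counter_ns()
    acc = 0
    for i in range(iterations):
        acc += i * i  # the "real" work
        if probe is not None:
            probe(i)  # stand-in for a tracepoint
    return time.perf_counter_ns() - start


def median_ns(iterations: int, probe, repeats: int = 5) -> int:
    """Median of several runs, to reduce noise from the OS and caches."""
    runs = sorted(run_workload(iterations, probe) for _ in range(repeats))
    return runs[len(runs) // 2]


N = 100_000
events: list = []
baseline = median_ns(N, None)
traced = median_ns(N, events.append)
print(f"~{overhead_ns_per_event(baseline, traced, N):.1f} ns/event (illustrative only)")
```

The harness reports the median of several runs because individual runs are noisy. With a very cheap probe, the computed overhead can even come out slightly negative.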

Conclusion
- Problem: many-core means a lot of data to trace
- This requires different approaches than on classic processors:
  - Different bottlenecks and constraints
  - New hardware features and accelerators

Questions?