Optimization for the Linux Kernel and Linux OS
C. Tyler McAdams

The present state of the Kernel

There are many "spork" kernels
Who could forget the iMac line? Beautiful, and it sold well. Optimized?! No. Too much debugger code (riiiight). Slow as crap, but it had iTunes!

Why do distro kernels end up as sporks?
They have to be ready to work with anything. They have no idea what CPU you're going to put them on. And everybody's favorite: backwards compatibility.

What if we could custom build?
Target the CPU architecture. Target the system's intended use. Remove code bulk that is not needed for the design. Use a compiler expressly designed for optimization.

Linux + ICC makes this possible!
Linux is open source, so we can shape it the way an artist shapes clay. ICC employs powerful optimization techniques that can exploit this openness for speed.

So what does ICC add to the mix?
IPO: interprocedural optimization
PGO: profile-guided optimization
High-end vectorization
High-end math algorithms
Optimized threading

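IPO: Interprocedural Optimization
IPO is a heuristics-driven optimization scheme that works across function boundaries, either within a single file (ICC's -ip) or across an entire program (-ipo). Because the compiler sees callers and callees together, it can inline calls, propagate constants, and drop dead code, which eliminates wasted use of CPU registers, SIMD units, and more.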

PGO: Profile-Guided Optimization
PGO uses multiple stages to produce code that runs optimally for whatever the system is actually used for.
Stage 1: build an instrumented binary, run it, and analyze the execution.
Stage 2: rebuild, optimizing with the Stage 1 data.

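Vectorization: MMX, SSE*, etc.
Vectorization was first used back in the '60s as a way for the compiler to find loops in the code and optimize them. Extensions like MMX and SSE bring the same idea to desktop CPUs: one instruction operates on several data elements at once. Today, apps like Photoshop are more practical examples of its use.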

Other ICC essentials
Debugger, Threading Building Blocks (TBB), Integrated Performance Primitives (IPP), Math Kernel Library (MKL)