Published in ACM SIGPLAN, 2010. Heidi Pan (Massachusetts Institute of Technology), Benjamin Hindman (UC Berkeley), Krste Asanović (UC Berkeley)



Outline: Introduction; Cooperative Hierarchical Resource Management; Lithe Primitives; Lithe Interface; Interoperable Parallel Libraries; Evaluation; Conclusion

Introduction: For productivity, programmers would like to reuse parallel libraries by composing them to build applications. Composing parallel libraries efficiently is hard because of poor resource management across the libraries.

Introduction (cont.): Lithe is a low-level substrate that provides the basic primitives for parallel execution and a standard interface for composing arbitrary parallel libraries efficiently. It replaces the virtualized thread abstraction with an unvirtualized hardware thread (hart) primitive that represents a processing resource.

Introduction (cont.): Using Lithe (figure)

Introduction (cont.): Harts must be explicitly allocated and shared among the different libraries. The authors have ported Intel's Threading Building Blocks (TBB) and GNU's OpenMP library to run with Lithe.

Cooperative Hierarchical Resource Management: Applications are built by composing libraries hierarchically. Example: the GraphicsMagick library calls the network graphics library libpng, which in turn calls the compression library zlib.

Lithe Primitives: Lithe provides two basic primitives, harts and contexts, to enable library runtimes to perform parallel execution. The hart primitive prevents uncontrolled oversubscription of the machine. The context primitive prevents undersubscription of the machine.

Lithe Primitives: A hart (hardware thread) represents a processing resource, one hardware thread per physical core. Within an application, there is a fixed one-to-one mapping between harts and the physical hardware thread contexts. A hart must be allocated to a runtime before that runtime can execute code, in contrast to threads, which a runtime can create on demand.
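The hart rule above can be sketched as a small allocation model. This is a toy illustration, not Lithe's API: the class `HartPool`, its methods, and the runtime names are hypothetical. It only shows that a fixed pool can never hand out more harts than exist, which is how the hart primitive rules out oversubscription.

```python
class HartPool:
    """Toy model (hypothetical): a fixed set of harts, one per hardware thread context."""

    def __init__(self, num_harts):
        self.free = set(range(num_harts))   # harts not yet allocated
        self.owner = {}                     # hart id -> name of the owning runtime

    def allocate(self, runtime, count):
        """Grant up to `count` free harts; never more than physically exist."""
        granted = sorted(self.free)[:count]
        for hart in granted:
            self.free.remove(hart)
            self.owner[hart] = runtime
        return granted

    def release(self, hart):
        """Return a hart to the pool so another runtime can use it."""
        del self.owner[hart]
        self.free.add(hart)


pool = HartPool(num_harts=4)
tbb_harts = pool.allocate("tbb", 3)      # gets three harts
omp_harts = pool.allocate("openmp", 3)   # only one hart is left to grant
```

A thread library in the same situation would typically create all six threads and leave the operating system to multiplex them over four cores.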

Lithe Primitives (cont.): Each hart always has an associated context, which acts as the execution vessel for the computation running on the hart. Contexts allow runtimes to interoperate with libraries that may need to block the current computation.

Lithe Interface: The Lithe substrate defines two standard interfaces: (1) a runtime interface that parallel runtimes use to share harts and manipulate contexts, and (2) a scheduler callback interface that each parallel runtime must implement to manage its own harts and interoperate with others. At any point in time, a hart is managed by exactly one scheduler, and each scheduler knows exactly which harts are under its control.
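A minimal sketch of the two-sided interface described above, under stated assumptions: the class `Scheduler` and the method names `request`, `yield_hart`, `on_request`, `on_yield`, and `grant` are invented for illustration and are not the function names Lithe defines. The sketch only demonstrates the invariant that each hart is managed by exactly one scheduler at a time.

```python
class Scheduler:
    """Toy scheduler (hypothetical): tracks exactly which harts it manages."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.harts = set()   # harts currently under this scheduler's control
        self.wanted = {}     # child scheduler -> harts still requested

    # Runtime-interface side: called by the library that owns this scheduler.
    def request(self, count):
        """Ask the parent scheduler for `count` additional harts."""
        self.parent.on_request(self, count)

    def yield_hart(self, hart):
        """Hand a hart back to the parent scheduler."""
        self.harts.remove(hart)
        self.parent.on_yield(self, hart)

    # Callback side: invoked on behalf of child schedulers.
    def on_request(self, child, count):
        self.wanted[child] = self.wanted.get(child, 0) + count

    def on_yield(self, child, hart):
        self.harts.add(hart)

    def grant(self, child):
        """Transfer one hart to a child that asked for it, if one is available."""
        if self.wanted.get(child, 0) == 0 or not self.harts:
            return None
        hart = self.harts.pop()
        self.wanted[child] -= 1
        child.harts.add(hart)
        return hart


base = Scheduler("base")
base.harts = {0, 1, 2, 3}              # the base scheduler starts with every hart
tbb = Scheduler("tbb", parent=base)
omp = Scheduler("openmp", parent=tbb)

tbb.request(2)
base.grant(tbb)
base.grant(tbb)                        # tbb now manages two harts
omp.request(1)
lent = tbb.grant(omp)                  # tbb passes one of its harts down
omp.yield_hart(lent)                   # openmp finishes and returns it
```

Because a hart moves between the `harts` sets rather than being copied, the sets of the three schedulers stay disjoint at every step.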


Lithe Interface (cont.): Base scheduler: the Lithe runtime provides a base scheduler for the application. Runtime functions and callbacks (table).


Lithe Interface (cont.): Contexts (figure)

Example annotations: will spawn N tasks; resume paused; wait until all tasks have completed; satisfy children.

Interoperable Parallel Libraries: TBB, OpenMP, Barrier

Interoperable Parallel Libraries: TBB: each TBB scheduler manages harts rather than threads. OpenMP: each team scheduler multiplexes its workers onto its available harts.

Interoperable Parallel Libraries: Barrier: the authors implemented simple user-level barriers to experiment with how synchronization primitives interact with parallel libraries.
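To make the interaction between a barrier and contexts concrete, here is a hedged sketch that models contexts as Python generators multiplexed onto a single hart. It is not the authors' barrier implementation; `Barrier`, `worker`, and `run_on_one_hart` are hypothetical names. A context that reaches the barrier is paused, and the hart immediately runs another context instead of spinning or blocking.

```python
from collections import deque


class Barrier:
    """Toy user-level barrier (hypothetical): holds contexts paused at it."""

    def __init__(self, parties):
        self.parties = parties
        self.waiting = []


def worker(name, barrier, log):
    """A context: does some work, waits at the barrier, then continues."""
    log.append(f"{name} before")
    yield barrier                      # pause this context at the barrier
    log.append(f"{name} after")


def run_on_one_hart(contexts):
    """Run every context on a single hart, switching only at barriers."""
    ready = deque(contexts)
    while ready:
        ctx = ready.popleft()
        try:
            barrier = next(ctx)        # run until the context pauses or finishes
        except StopIteration:
            continue                   # context finished; pick the next one
        barrier.waiting.append(ctx)
        if len(barrier.waiting) == barrier.parties:
            ready.extend(barrier.waiting)   # last arrival resumes everyone
            barrier.waiting.clear()


log = []
barrier = Barrier(parties=3)
run_on_one_hart([worker(name, barrier, log) for name in "ABC"])
```

All three contexts pass the barrier even though only one hart is available, which is the situation where a spin-waiting barrier on dedicated threads would stall.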

Evaluation: Hardware platform: quad-socket 2.3 GHz AMD Opteron with 4 cores per socket (16 cores total). OS platform: bit Linux kernel, Glibc, TBB 2.1, and OpenMP from GCC 4.4 trunk.

Evaluation (cont.): Ported libraries. Goal: validate that porting libraries to Lithe incurs negligible or no overhead. Benchmarks: cg (conjugate gradient), ft (fast Fourier transform), is (integer sort).

Evaluation (cont.): The application study uses sparse QR factorization (SPQR), which is commonly used in linear least-squares methods for problems such as geodetic surveying, photogrammetry, and surface fitting. In the original out-of-the-box implementation, the algorithm uses TBB to create the parallel tasks, and each task calls parallel BLAS (Basic Linear Algebra Subprograms) matrix routines from MKL (Math Kernel Library). MKL uses OpenMP to parallelize itself.
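The nesting above (TBB tasks that each call OpenMP-parallel MKL routines) is where oversubscription comes from. The arithmetic can be sketched as follows, assuming for illustration that both libraries default to one worker per core; that default and the function names are assumptions, not measurements from the slides.

```python
def software_threads(outer_tasks, inner_threads):
    """Threads alive when every outer task runs its own inner parallel region."""
    return outer_tasks * inner_threads


def oversubscription(outer_tasks, inner_threads, cores):
    """Ratio of runnable software threads to physical cores."""
    return software_threads(outer_tasks, inner_threads) / cores


CORES = 16  # machine size from the evaluation slide

# Assumed defaults: 16 TBB workers, each starting a 16-thread OpenMP team.
naive = oversubscription(outer_tasks=16, inner_threads=16, cores=CORES)

# With harts, the execution resources in use can never exceed the core count.
with_harts = min(software_threads(16, 16), CORES) / CORES
```

Under these assumptions the out-of-the-box composition asks the operating system to schedule 256 threads on 16 cores, while a hart-based composition stays at one execution resource per core.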


Evaluation (cont.): Figure: execution time (seconds) versus number of threads.

Evaluation (cont.): Figure: results versus number of threads.

Evaluation: cache misses (figure)

Evaluation: context switches (figure)


Conclusion: Using Lithe, we are able to implement multiple existing parallel abstractions with no measurable overhead, while enabling parallel libraries to be composed efficiently.