Presentation transcript:

Operating System Support for Pipeline Parallelism on Multicore Architectures
John Giacomoni and Manish Vachharajani
Core Research Lab, University of Colorado at Boulder

Problem
Uniprocessor (UP) performance is at "end of life"
Chip-multiprocessor systems:
– Individual cores less powerful than a uniprocessor
– Asymmetric and heterogeneous
– 10s, 100s, even 1000s of cores
What do we want from a multicore system?
[Slide figures: Intel (2x2-core), MIT RAW (16-core), 100-core, 400-core]

Why Pipeline Parallelism? (Extracting Performance)
Task parallelism
– Desktop workloads
Data parallelism
– Web serving
– Split/Join, MapReduce, etc.
Pipeline parallelism
– Video decoding
– Network processing

Soft Network Processing (Soft-NP)
Gigabit Ethernet (GigE) properties:
– 1,488,095 frames/sec (minimum-size frames)
– 672 ns/frame
– Frame dependencies: how do we protect them?
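The per-frame budget follows directly from Ethernet framing: a minimum-size frame carries 64 bytes, plus an 8-byte preamble and a 12-byte interframe gap, so each frame occupies 84 bytes (672 bits) on the wire:

\[
t_{\mathrm{frame}} = \frac{(64 + 8 + 12)\times 8\ \text{bits}}{10^{9}\ \text{bits/s}} = 672\ \text{ns},
\qquad
\frac{10^{9}\ \text{bits/s}}{672\ \text{bits/frame}} \approx 1{,}488{,}095\ \text{frames/s}
\]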

Frame Shared Memory (Soft-NP)
[Slide figure: frames flowing through an Input (IP) stage and an Output (OP) stage over shared memory]

Low-Overhead Communication
– Syscalls: ~170 ns
– pthread mutex: ~200 ns
At Gigabit Ethernet rates, a single mutex acquisition alone consumes roughly 30% of the 672 ns per-frame budget.

FastForward
Portable software-only framework
– ~35-40 ns/queue operation (2.0 GHz AMD Opteron)
– Architecturally tuned concurrent lock-free (CLF) queues
Works with all memory consistency models
Hides die-die communication
(See poster at PACT)
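The core trick in a FastForward-style queue is that the producer and consumer synchronize through the queue slots themselves (a NULL slot means empty), so the head and tail indices stay core-local and never ping-pong between caches. Below is a minimal single-producer/single-consumer sketch in C; the names are illustrative, and the published implementation additionally relies on memory fences, cache-line padding, and temporal slipping that this sketch omits:

```c
#include <stddef.h>
#include <stdbool.h>

#define QUEUE_SIZE 256                    /* power of two eases wraparound */

typedef struct {
    volatile void *slots[QUEUE_SIZE];     /* NULL marks an empty slot */
    size_t head;                          /* written only by the producer */
    size_t tail;                          /* written only by the consumer */
} ff_queue;

/* Producer: returns false when the queue is full. */
bool ff_enqueue(ff_queue *q, void *item)
{
    if (q->slots[q->head] != NULL)        /* consumer has not drained slot */
        return false;
    q->slots[q->head] = item;             /* publish the item */
    q->head = (q->head + 1) % QUEUE_SIZE;
    return true;
}

/* Consumer: returns NULL when the queue is empty. */
void *ff_dequeue(ff_queue *q)
{
    void *item = (void *)q->slots[q->tail];
    if (item == NULL)
        return NULL;
    q->slots[q->tail] = NULL;             /* mark slot free for producer */
    q->tail = (q->tail + 1) % QUEUE_SIZE;
    return item;
}
```

Because the producer writes only head and the consumer writes only tail, neither index is ever shared; the only shared cache lines are the slots themselves.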

Zero-Stall Guarantee

Gang Scheduling
Dedicate resources to pipeline applications.
Optimize for performance, not system throughput or fairness:
– The computer-utility model maximized system utilization
– A multicore system instead has an excess of resources
Want selective timesharing.
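One way to approximate dedicating cores on a stock OS is to pin each pipeline stage's thread to its own core. A minimal Linux sketch follows; this only approximates dedication, since a true gang scheduler would also keep unrelated work off those cores:

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Pin a pipeline stage's thread to a single core. */
static void pin_to_core(pthread_t thread, int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);    /* restrict the thread to exactly one core */
    pthread_setaffinity_np(thread, sizeof(set), &set);
}
```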

Pipelineable OS Services

Pipelineable OS Services (2)
– Synchronous calls introduce too much overhead
– Asynchronous calls may limit parallelism
Solution: OS services with independent I/O paths
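One way to picture an independent I/O path is an OS service running as its own pipeline stage, communicating only through queues rather than through per-call transitions into the service's context. A hypothetical sketch reusing the ff_queue from the FastForward slide; handle_request is a stand-in for the service's actual work:

```c
/* Hypothetical service stage: pinned to its own core, it pulls requests
 * from an input queue and pushes completions to an output queue, so the
 * application stages never make a synchronous call into the service. */
extern void *handle_request(void *req);   /* stand-in for real service work */

void service_stage(ff_queue *in, ff_queue *out)
{
    for (;;) {
        void *req = ff_dequeue(in);
        if (req == NULL)
            continue;                     /* idle; could pause or yield */
        void *resp = handle_request(req);
        while (!ff_enqueue(out, resp))
            ;                             /* back-pressure: wait for space */
    }
}
```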

Heterogeneous Gang Scheduling
– Mixes stages from multiple process domains
– Can also mix in heterogeneous components
– Needs a single scheduling label for every pipeline stage, ensuring simultaneous scheduling of every necessary resource
In short: scheduling multi-domain entities.
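A single scheduling label might look like a small descriptor naming every stage's process domain and resource in one place, so the scheduler can start and stop the whole pipeline atomically. A purely hypothetical sketch; none of these names come from the talk:

```c
#include <sys/types.h>

#define MAX_STAGES 16

/* Hypothetical gang descriptor: one label covers the whole pipeline,
 * letting the scheduler (de)schedule all of its resources at once. */
typedef struct {
    int   gang_id;               /* the single scheduling label */
    int   n_stages;
    pid_t domain[MAX_STAGES];    /* owning process of each stage */
    int   core[MAX_STAGES];      /* core dedicated to each stage */
} gang_descriptor;
```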

Multi-Domain Entities
– Pipeline-private state
– Stage-private state: shared only with the stage's parent process
The multi-domain application model respects the private-data model implicit in single-domain applications while providing first-class naming for multi-domain pipelines.

Summary of Findings
1) Low-overhead communication
2) Zero-stall guarantee
   – Selective timesharing
3) Pipelineable OS services
4) Heterogeneous gang scheduling
5) Pipelines as multi-domain applications

Questions?

Approach
Turn over old rocks

Security
US
– Health Insurance Portability and Accountability Act
– Gramm-Leach-Bliley
– Sarbanes-Oxley
EU
– Data Protection Directive