Leveling the Field for Multicore Open Systems Architectures Markus Levy President, EEMBC President, Multicore Association.

Slides:



Advertisements
Similar presentations
Multiple Processor Systems
Advertisements

Distributed Systems CS
Computer Abstractions and Technology
Designing and Optimizing Software for Intel® Architecture Multi-core Processors Peter van der Veen QNX Software Systems.
1 Chapter 1 Why Parallel Computing? An Introduction to Parallel Programming Peter Pacheco.
Master/Slave Architecture Pattern Source: Pattern-Oriented Software Architecture, Vol. 1, Buschmann, et al.
The Who, What, Why and How of High Performance Computing Applications in the Cloud Soheila Abrishami 1.
Threads, SMP, and Microkernels Chapter 4. Process Resource ownership - process is allocated a virtual address space to hold the process image Scheduling/execution-
1 Introduction to MIMD Architectures Sima, Fountain and Kacsuk Chapter 15 CSE462.
An Introduction To PARALLEL PROGRAMMING Ing. Andrea Marongiu
Parallel Computing Overview CS 524 – High-Performance Computing.
1 Computer Science, University of Warwick Architecture Classifications A taxonomy of parallel architectures: in 1972, Flynn categorised HPC architectures.
Lecture 37: Chapter 7: Multiprocessors Today’s topic –Introduction to multiprocessors –Parallelism in software –Memory organization –Cache coherence 1.
Measuring zSeries System Performance Dr. Chu J. Jong School of Information Technology Illinois State University 06/11/2012 Sponsored in part by Deer &
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear.
Computer System Architectures Computer System Software
Lecture 4: Parallel Programming Models. Parallel Programming Models Parallel Programming Models: Data parallelism / Task parallelism Explicit parallelism.
OpenMP in a Heterogeneous World Ayodunni Aribuki Advisor: Dr. Barbara Chapman HPCTools Group University of Houston.
1 COMPSCI 110 Operating Systems Who - Introductions How - Policies and Administrative Details Why - Objectives and Expectations What - Our Topic: Operating.
Advanced Operating Systems CIS 720 Lecture 1. Instructor Dr. Gurdip Singh – 234 Nichols Hall –
11 If you were plowing a field, which would you rather use? Two oxen, or 1024 chickens? (Attributed to S. Cray) Abdullah Gharaibeh, Lauro Costa, Elizeu.
 What is an operating system? What is an operating system?  Where does the OS fit in? Where does the OS fit in?  Services provided by an OS Services.
Parallel Programming Models Jihad El-Sana These slides are based on the book: Introduction to Parallel Computing, Blaise Barney, Lawrence Livermore National.
Multiple Processor Systems. Multiprocessor Systems Continuous need for faster and powerful computers –shared memory model ( access nsec) –message passing.
Software Pipelining for Stream Programs on Resource Constrained Multi-core Architectures IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEM 2012 Authors:
Multicore In Real-Time Systems – Temporal Isolation Challenges Due To Shared Resources Ondřej Kotaba, Jan Nowotsch, Michael Paulitsch, Stefan.
Architectural Support for Fine-Grained Parallelism on Multi-core Architectures Sanjeev Kumar, Corporate Technology Group, Intel Corporation Christopher.
Virtualization: Not Just For Servers Hollis Blanchard PowerPC kernel hacker.
Uncovering the Multicore Processor Bottlenecks Server Design Summit Shay Gal-On Director of Technology, EEMBC.
1 COMPSCI 110 Operating Systems Who - Introductions How - Policies and Administrative Details Why - Objectives and Expectations What - Our Topic: Operating.
C OMPUTER O RGANIZATION AND D ESIGN The Hardware/Software Interface 5 th Edition Chapter 1 Computer Abstractions and Technology Sections 1.5 – 1.11.
Hybrid MPI and OpenMP Parallel Programming
Numerical Libraries Project Microsoft Incubation Group Mary Beth Hribar Microsoft Corporation CSCAPES Workshop June 10, 2008 Copyright Microsoft Corporation,
Summary Background –Why do we need parallel processing? Moore’s law. Applications. Introduction in algorithms and applications –Methodology to develop.
Operating System 4 THREADS, SMP AND MICROKERNELS.
Lecture 3 : Performance of Parallel Programs Courtesy : MIT Prof. Amarasinghe and Dr. Rabbah’s course note.
CS- 492 : Distributed system & Parallel Processing Lecture 7: Sun: 15/5/1435 Foundations of designing parallel algorithms and shared memory models Lecturer/
Full and Para Virtualization
Outline Why this subject? What is High Performance Computing?
Using Industry Standards to Exploit the Advantages and Resolve the Challenges of Multicore Technology September 19, 2007 Markus Levy, EEMBC and Multicore.
Shouqing Hao Institute of Computing Technology, Chinese Academy of Sciences Processes Scheduling on Heterogeneous Multi-core Architecture.
3/12/2013Computer Engg, IIT(BHU)1 PARALLEL COMPUTERS- 2.
3/12/2013Computer Engg, IIT(BHU)1 INTRODUCTION-1.
Lecture 27 Multiprocessor Scheduling. Last lecture: VMM Two old problems: CPU virtualization and memory virtualization I/O virtualization Today Issues.
Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9 th Edition Chapter 4: Threads.
SMP Basics KeyStone Training Multicore Applications Literature Number: SPRPxxx 1.
Background Computer System Architectures Computer System Software.
Computer Science and Engineering Parallel and Distributed Processing CSE 8380 April 28, 2005 Session 29.
Introduction Goal: connecting multiple computers to get higher performance – Multiprocessors – Scalability, availability, power efficiency Job-level (process-level)
Page 1 2P13 Week 1. Page 2 Page 3 Page 4 Page 5.
Lecture 13 Parallel Processing. 2 What is Parallel Computing? Traditionally software has been written for serial computation. Parallel computing is the.
Heterogeneous Processing KYLE ADAMSKI. Overview What is heterogeneous processing? Why it is necessary Issues with heterogeneity CPU’s vs. GPU’s Heterogeneous.
Chapter 4: Threads Modified by Dr. Neerja Mhaskar for CS 3SH3.
COMPSCI 110 Operating Systems
4- Performance Analysis of Parallel Programs
Current Generation Hypervisor Type 1 Type 2.
CS427 Multicore Architecture and Parallel Computing
Parallel Programming By J. H. Wang May 2, 2017.
Morgan Kaufmann Publishers
Parallel Programming in C with MPI and OpenMP
EE 193: Parallel Computing
Chapter 4: Threads.
Chapter 4: Threads.
Distributed Systems CS
Hybrid Programming with OpenMP and MPI
CHAPTER 4:THreads Bashair Al-harthi OPERATING SYSTEM
Chapter 4: Threads & Concurrency
Chapter 4: Threads.
Presentation transcript:

Leveling the Field for Multicore Open Systems Architectures Markus Levy President, EEMBC President, Multicore Association

Analyzing the Multicore Ecosystem Embedded Microprocessor Benchmark Consortium® (EEMBC) Industry benchmarks since 1997 Tool for evaluating embedded processors, compilers, systems MultiBench for shredding multicore processors

Enabling the Multicore Ecosystem Initial engagement began in May 2005 Industry-wide participation Current efforts Communications APIs Hypervisors Multicore Programming Practices Resource Management

Multicore Issues to Solve Concurrent programming Communications, synchronization, resource management between/among cores Performance analysis Debugging Distributed power management OS virtualization Modeling and simulation Load balancing Algorithm partitioning

Multicore Benchmarking Rules Do not rely on a single answer Match your application requirements –Small or large data sets –Few or many threads –Dependencies –OS overhead

Benchmarking Multicore – What’s Important? Measuring scalability Memory and I/O bandwidth Inter-core communications OS scheduling support Efficiency of synchronization System-level functionality

EEMBC Multicore Strategy Evaluation and future development of scalable SMP architectures –Includes multicore and manycore Measure impact of parallelization and scalability across both data processing and computationally-intensive tasks

Some Results Huge drop in performance when oversubscribed Nice scaling on networking only workloads Some benchmarks plateau earlier then expected MultiBench Results

Two Processor System Utilizing Single Memory Controller Quad Core Processor 1 Quad Core Processor 2 DDR2 FB DIMM Memory DDR2 Interface Processors 1 and 2 must always arbitrate for memory via their front side bus connection through the North Bridge. North Bridge Intel Front Side Bus

Dual Quads vs. Single Quad 1.Find max values for each workload 2.Ratio of all scores No Workloads Yield 2x Scaling

Two Processor System Utilizing Dual Memory Controllers Quad Core Processor 1 Quad Core Processor 2 Link DDR2 ECC Registered Memory DDR2 ECC Registered Memory DDR2 Interface DDR2 Interface Direct Access Shared Access Doubly Shared Access

Two Processor System Utilizing Single Memory Controller Quad Core Processor 1 Quad Core Processor 2 Link DDR2 ECC Registered Memory DDR2 Interface Processor 1 must always access memory by traversing link to Processor 2 Requires arbitration to access Processor 2’s memory Processor 2 always has prioritized access to this memory since it is directly attached. Affinity can help performance

Dual Quad Cores – Single Memory Controller Dual Quad Cores – Dual Memory Controllers

Rotation by multiple workers cooperating to process a single image. Each worker thread acquires slices and writes them to the output buffer. Potential bottlenecks related to memory interfaces and synchronization between worker threads.

Dual Quad Cores – Single Memory Controller Dual Quad Cores – Dual Memory Controllers

The Multicore Association Roadmap Communications - MCAPI: ultra-light weight Resource Management - Memory management - Basic synchronization - Resource registration - Resource partitioning Task Management -Task scheduling The Four MCA Pillars Virtualization (or OS) Communication Resource Management Task Management Debug Multicore System Adopted stds MCA Foundation APIs Value Added Functions Languages Programming Models Design Environments Application Generators Benchmarks Services Load Balancing System Mgt. Power Mgt. Reliability Quality of Service

Why Virtualize? Hardware consolidation Migration and hosting of legacy applications Resource management and balancing Faster provisioning Fault tolerance

Virtualized Multicore System Fractional CPU assignment for background tasks such as firmware update/system health monitor/power management Benchmarking involves many system-level elements CORE #2 CORE #3CORE #4CORE #1

EEMBC Hypermark Important Metrics Overall performance overhead of a hypervisor, i.e. CPU loading Static footprint (code and data size) Jitter Interrupt latency Comparison to native performance

20 Multicore Programming Practices (MPP) Long term –Continue research into languages, methodologies, etc Short term –How today’s embedded C/C++ code may be written to be “multicore ready” Influence of a group of like-minded methodology experts to ensure completeness, usefulness and industry-wide compatibility Creation of a standard “best practices” guide through a recognized, neutral industry body –Based on capturing current best practices

MPP Scope & Approach 21 Sequential C/C++ Architecture 1 e.g. Shared Memory Architecture 2 e.g. Programmer Managed Memory Architecture 3 e.g. Message Passing Focus on existing C/C++ without extensions, targeting current architectures Draw up framework of common pitfalls when transitioning from serial to parallel Consider solutions or avoidance tactics Analyze performance of solution

Summary Performance analysis highlights benefits and pitfalls EEMBC licensing and membership Portability prevents entrapment Multicore Association standards and working groups