Software Enablement for Multicore Architectures


David Bernstein (Bernstn@il.ibm.com) and Bilha Mendelson (bilha@il.ibm.com)

Technology Scaling: We've Hit The Wall
[Figure: relative device performance (0.2 to 20) versus year (1988 to 2012) for conventional bulk CMOS, SOI (silicon-on-insulator), high-mobility, and double-gate devices; the trend beyond the last points is marked "?".]
11/14/2018

Has This Ever Happened Before?
[Figure: power density (Watts/cm²) versus year (1950 to 2010) for bipolar and CMOS machines. Labeled points: vacuum tubes, IBM 360, IBM 370, IBM 3033, IBM 3081, IBM 4381, Fujitsu M380, CDC Cyber 205, IBM 3090, Fujitsu M-780, NTT, IBM 3090S, Fujitsu VP2000, IBM ES9000, IBM RY4, IBM RY5, IBM RY6, IBM RY7, IBM GP, Apache, Pulsar, Merced, Pentium II (DSIP), Pentium 4. The start of water cooling is marked, and the continuation of the CMOS curve is marked "?".]
Source: Bernie Meyerson, IBM

Industry trends
  - Intel Quad-Core
  - Sun's 8-core chips: T1 (Niagara)
  - Cell Broadband Engine

Hierarchy of Modular Building Blocks
  - Systems will increasingly need to implement a hybrid execution model.
  - New programming systems need to reduce the need for programmer awareness of the topology on which their program executes.
  - The next-generation programming system must support programming simplicity while leveraging the performance of the underlying HW topology.
  Levels, from largest to smallest:
  - Grid/Cluster (high-speed network): hierarchical SMP servers with non-uniform memory access characteristics
  - Rack (high-speed network): hierarchical SMP servers with NUMA characteristics
  - Board (SMP interconnect):
    - homogeneous SMP on board, 2 to 128 HW contexts on board
    - main processor(s) with accelerator(s); master-slave relationship between entities
  - Chip:
    - homogeneous SMP on chip, 2 to 32 HW contexts on chip; various forms of resource sharing
    - heterogeneous collection of processors on chip; heterogeneity at data and control flow level
    - [Diagram labels: cores, cache, interconnect fabric, memory controller, I/O attach, memory]
  - Core: will support multiple HW threads sharing a single cache, exhibiting SMP characteristics

Architecture trends
  - Several processor cores on a chip, plus specialized computing engines
    - XML processing, cryptography, graphics
  - Questions:
    - how to interconnect a large number of processor cores
    - how to provide sufficient memory bandwidth
    - how to structure the multilevel caching subsystem
    - how to balance the general-purpose computing resources with specialized processing engines and all the supporting memory, caching, and interconnect structure, given a constant power budget
  - Software development processes
    - how to program for multicore architectures
    - how to test and evaluate the performance of multithreaded applications

Programming multiprocessor systems
  - Two main directions:
    - explicit manual programming
    - exploiting the combination of compiler optimization, build tool chains, and run-time subsystems
  - In the HPC and embedded communities:
    - the emphasis was more on explicit manual programming and special resources by expert programmers
    - this resulted in numerous home-grown language directives and extensions, internal tools, and obscure run-time systems
    - the results are hardly portable to new generations of hardware

Programming languages
  - Very few new languages were invented in the last 2 decades
    - Java: virtual machine, interpreter, JIT, garbage collection, set of libraries, etc.
  - Can multicore spur development of a new language/environment for parallelism?
    - map-reduce, Cilk, UPC, X10, and STAPL
    - programmers can provide additional information related to parallelism
  - Multicore provides multiple types of parallelism
    - thread-level parallelism (TLP), coarse-grain
      - OpenMP: standard for shared-memory models
      - MPI: standard for distributed-memory models
      - pthreads, Java threads: explicit use
      - automatic parallelization optimizations; most of the original auto-parallelizing compilers focused on FORTRAN
    - data-level parallelism (DLP), fine-grain
      - auto-vectorization, auto-simdification
  - What about asymmetric multicore architectures (like the Cell processor)?
    - is it possible to have a single-source compilation for multiple ISAs? Initial attempts…
    - how OpenMP can be used for programs (streaming)
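To make the map-reduce style of coarse-grain, thread-level parallelism concrete, here is a minimal sketch in Python. It is not taken from the slides: the function names (`chunked`, `partial_sum_of_squares`, `parallel_sum_of_squares`) and the choice of a sum-of-squares workload are assumptions made for illustration. It shows only the structure of the pattern. In default CPython builds the global interpreter lock keeps pure-Python threads from running on several cores at once, so this is not a speedup demonstration.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def chunked(data, n_chunks):
    """Split data into at most n_chunks contiguous slices."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def partial_sum_of_squares(chunk):
    """Map step: each worker processes one chunk independently."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=4):
    """Map the chunks onto a worker pool, then reduce the partial results."""
    chunks = chunked(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    # Reduce step: combine the per-chunk results into one value.
    return reduce(lambda a, b: a + b, partials, 0)
```

The map step shares no mutable state between workers, which is what lets the programmer express the parallelism without locks; the only synchronization point is the reduce step.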

Performance Analysis Tools
  - Profile-based tools: data aggregation
    - FDPR-Pro, Code Analyzer, Diablo
  - Performance evaluation is heavily influenced by thread interaction
    - stalls, locks, races, memory thrashing, polluted hardware counters
  - Trace-based analysis and visualization
    - introduces timeline views and data to deal with communication issues
    - lack of scalability: traces tend to grow fast, making them difficult to manipulate and visualize
    - in the HPC context: selecting an arbitrary subset of cores/threads and arbitrary time intervals
    - tracing might disturb the program's behavior
    - HPCToolkit, TAU, Paraver, VTune, Code Analyzer, PDT, Trace Analyzer
  - Lack of determinism
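The difference between profile-based aggregation and trace-based analysis can be sketched in a few lines of Python. This is an illustrative toy, not the design of any tool named above: `TraceEvent`, `Tracer`, and `aggregate` are hypothetical names. A trace keeps one record per executed region, so it grows with run length, while a profile collapses those records into one entry per region and loses the timeline.

```python
import threading
import time
from collections import defaultdict, namedtuple

# One trace record: which thread ran which region, and when.
TraceEvent = namedtuple("TraceEvent", "thread region start end")


class Tracer:
    """Hypothetical in-memory tracer: appends one event per executed region."""

    def __init__(self):
        self._lock = threading.Lock()
        self.events = []

    def record(self, region, func, *args):
        start = time.perf_counter()
        result = func(*args)
        end = time.perf_counter()
        # The lock protects the shared event list; it is also a source of
        # the perturbation that tracing can introduce.
        with self._lock:
            self.events.append(
                TraceEvent(threading.current_thread().name, region, start, end))
        return result


def aggregate(events):
    """Collapse a trace into a profile: (total time, call count) per region."""
    profile = defaultdict(lambda: [0.0, 0])
    for ev in events:
        entry = profile[ev.region]
        entry[0] += ev.end - ev.start
        entry[1] += 1
    return {region: (total, calls) for region, (total, calls) in profile.items()}
```

Selecting a subset of threads or a time interval, as mentioned for the HPC context, amounts to filtering `events` on the `thread`, `start`, and `end` fields before aggregating or drawing a timeline.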

Performance tools for multi-core: Cell
  - Packages: Visual Performance Analyzer 5.0, Cell SDK 3.0
  - Tools: Profile Analyzer, Code Analyzer, Pipeline Analyzer, Trace Analyzer, PDT, Lock Analyzer
  - Capabilities:
    - infrastructure for collecting profiles on several systems
    - infrastructure for using databases for large data sets
    - set of interconnected views
    - Cell support
    - infrastructure for collecting traces on SDK 3.0 libraries
    - analysis of lock usage
    - input for Trace Analyzer

Debugging and testing tools
  - Concurrency problems constitute about 10% of the bugs
  - Bugs like crashes (races) or freezes (deadlocks) stay in the application, reducing the up-time
  - Testing is done at load testing, very late in the process
  - We have been working on a tool-supported methodology that tries to find concurrency issues as early as possible:
    - teach how to write concurrent code
      - concurrent bug patterns
      - explain the concurrent programming constructs
      - teach general concurrency design patterns
    - reviews: developed a specialized review technique for concurrent code
    - teach how to do unit testing: developed synchronization coverage
    - ConTest: a tool-supported method for measuring contention
    - make the tests more likely to exhibit bugs by changing the internal timing
  - Tools for pinpointing the locations of bugs, given a test that can cause the application to fail some of the time
  - Healing bugs so that the impact will not be seen
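The idea of changing the internal timing to make a test more likely to exhibit a bug can be illustrated with a small Python sketch. This is not ConTest and does not use its API; `Counter`, `random_noise`, and `run_threads` are hypothetical names, and the noise hook is an assumption about how such timing perturbation could be wired into a unit test. The hook runs between the read and the write of a non-atomic increment, widening the window in which another thread can interleave.

```python
import random
import threading
import time


class Counter:
    """Shared counter whose increment is a non-atomic read-modify-write."""

    def __init__(self, noise=None, lock=None):
        self.value = 0
        self._noise = noise or (lambda: None)  # hook called between read and write
        self._lock = lock                      # None leaves the race in place

    def increment(self):
        if self._lock is None:
            self._read_modify_write()
        else:
            with self._lock:
                self._read_modify_write()

    def _read_modify_write(self):
        current = self.value
        self._noise()  # perturb timing where an interleaving would do damage
        self.value = current + 1


def random_noise(max_delay=0.002):
    """Timing noise: sleep for a short random interval."""
    time.sleep(random.uniform(0.0, max_delay))


def run_threads(counter, n_threads=2, increments=1):
    """Run n_threads workers that each increment the counter, then return its value."""
    def worker():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

With `noise=random_noise` and no lock, lost updates become likely but remain probabilistic, which matches the "fails some of the time" situation described above. Passing the `wait` method of a `threading.Barrier(2)` as the hook for two unlocked threads forces both reads to happen before either write, so the lost update occurs on every run; that hook must not be combined with the lock, since the thread holding the lock would wait at the barrier forever.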

Software trends
  - Software enablement system for multicores
    - various directions for providing solutions
  - Active area of research
    - only some early results in the academic and industrial worlds in terms of established standards and technology
    - much more will evolve in the years to come
  - Need:
    - programming models and compiler support for multicores
    - performance evaluation tools
    - testing and debugging tools

Thank You