Presentation transcript:

March 18, 2008, SSE Meeting, Slide 1: Multicore Chips and Parallel Programming. Mary Hall, Dept. of Computer Science and Information Sciences Institute.

March 18, 2008, SSE Meeting, Slide 2: The Multicore Paradigm Shift: Technology Drivers

March 18, 2008, SSE Meeting, Slide 3: Part 1: Technology Trends
What to do with all these transistors? Key ideas:
– Movement away from increasingly complex processor designs and faster clocks
– Replicated functionality (i.e., parallelism) is simpler to design
– Resources are utilized more efficiently
– Huge power-management advantages

March 18, 2008, SSE Meeting, Slide 4: The Architectural Continuum
– Supercomputer: IBM BG/L
– Commodity server: Sun Niagara
– Embedded: Xilinx Virtex-4

March 18, 2008, SSE Meeting, Slide 5: Multicore: Impact on Software
Consequences:
– Individual processors will no longer get faster. At first, they might even get a little slower.
– Today's software, as written, may not perform as well on tomorrow's hardware. And forget about adding capability!
The very future of the computing industry demands successful strategies for applications to exploit parallelism across cores!

March 18, 2008, SSE Meeting, Slide 6: The Multicore Paradigm Shift: Computing Industry Perspective
"We are at the cusp of a transition to multicore, multithreaded architectures, and we still have not demonstrated the ease of programming the move will require… I have talked with a few people at Microsoft Research who say this is also at or near the top of their list [of critical CS research problems]."
– Justin Rattner, CTO, Intel Corporation

March 18, 2008, SSE Meeting, Slide 7: The Rest of this Talk
Convergence of high-end, conventional, and embedded computing
– Application development and compilation strategies for the high end (supercomputers) are now becoming important for the masses
Why?
– Technology trends (motivation)
Looking to the future:
1. Automatically generating parallel code is useful, but insufficient.
2. Parallel computing for the masses demands better parallel programming paradigms.
3. Compiler technology will become increasingly important to deal with a diversity of optimization challenges… and must be engineered for managing complexity and adapting to new architectures.
4. Potential to exploit vast machine resources to automatically compose applications and systematically tune application performance.
5. New tunable library and component technology.

March 18, 2008, SSE Meeting, Slide 8: 1. Automatic Parallelization
Old approaches:
– Limited to loops and array computations (an example loop appears after this slide)
– Difficult to find sufficient granularity (parallel work between synchronizations)
– Success came from fragile, complex software
New ideas in this area:
– Finer granularity of parallelism, which is more plentiful
– Combine with hardware support (e.g., speculation and multithreading)
From Hall et al., "Maximizing Multiprocessor Performance with the SUIF Compiler", IEEE Computer, Dec. 1996.
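To make the target of automatic parallelization concrete, below is a minimal sketch (not from the talk) of the kind of dense array loop such compilers handle: every iteration is independent, so the work can be split across cores. The OpenMP pragma stands in for the parallelization a compiler such as SUIF would try to derive on its own; the array sizes and computation are invented purely for illustration.

```c
#include <stdio.h>

#define N 1000000

int main(void) {
    /* Dense array loop with no cross-iteration dependences: the classic
     * target of automatic parallelization. A parallelizing compiler would
     * prove the iterations independent and insert the equivalent of the
     * pragma below on its own. */
    static double a[N], b[N], c[N];

    for (int i = 0; i < N; i++) {      /* initialize inputs */
        b[i] = (double)i;
        c[i] = 2.0 * i;
    }

    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = b[i] + c[i];            /* each iteration is independent */

    printf("a[42] = %f\n", a[42]);
    return 0;
}
```

Built with `gcc -fopenmp`, the second loop runs across all available cores; without the flag it still compiles and produces the same result sequentially.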

March 18, 2008, SSE Meeting, Slide 9: 2. Parallel Programming State of the Art
Three dominant classes of applications:

Domains | Application Characteristics | Programming Paradigms
Scientific Computing | Very large arrays representing the simulation region; loops; data parallel | MPI dominant; also OpenMP, PGAS (see the MPI sketch below)
Grids & distributed computing, Databases | Queries over large data sets, often distributed | Query languages like SQL
Systems and Embedded Software | Fine-grain threads; small number of processors | Low-level threading such as Pthreads

Domain-specific, intellectually challenging, and low-level programming models are not suitable for the masses.
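As a hedged illustration of the "MPI dominant" entry in the table, the sketch below shows the message-passing style typical of the scientific-computing class: each rank works on its own slice of the data and the partial results are combined with a reduction. The specific computation is invented for illustration and is not from the talk.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Each rank owns a slice of the global index space and computes a
     * partial sum over it (a stand-in for real per-rank simulation work). */
    const int chunk = 1000;
    double local_sum = 0.0;
    for (int i = 0; i < chunk; i++)
        local_sum += (double)(rank * chunk + i);

    /* Combine the partial results on rank 0. */
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
               0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum over %d ranks = %f\n", nprocs, global_sum);

    MPI_Finalize();
    return 0;
}
```

Built with `mpicc` and launched with, for example, `mpirun -np 4 ./a.out`.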

March 18, 2008, SSE Meeting, Slide 10: New Parallel Programming Paradigms
Transactional memory (a small sketch follows this slide)
– A section of code executes atomically, with a subsequent commit or rollback
– Programming model + hardware support
Streams and data-parallel models
– Data streams describe the flow of data
– Well suited for certain applications and hardware (IBM Cell, GPUs)
Domain-specific languages and libraries
– Parallelism is implicit within the implementation
Different applications and users demand different solutions. Convergence is unlikely. Architecture independence?
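To make the transactional-memory bullet concrete, here is a small sketch using GCC's transactional-memory language extension (`__transaction_atomic`, enabled with `-fgnu-tm`). That notation postdates this 2008 talk and is only one of several proposals; the point is simply that the programmer marks an atomic region and the runtime (or hardware) handles commit or rollback, with no explicit locks in the source.

```c
#include <stdio.h>

/* Shared state that must be updated atomically when many threads call
 * transfer() concurrently. */
static long balance_a = 100;
static long balance_b = 0;

static void transfer(long amount) {
    /* The block commits as a unit; on a conflict with another transaction
     * the runtime rolls it back and retries. Compile with: gcc -fgnu-tm */
    __transaction_atomic {
        balance_a -= amount;
        balance_b += amount;
    }
}

int main(void) {
    transfer(40);
    printf("a = %ld, b = %ld\n", balance_a, balance_b);
    return 0;
}
```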

March 18, 2008, SSE Meeting, Slide 11: Engineering a Compiler
Compiler research will play a crucial role in achieving performance and programmability on multicore hardware.
What is the state of compilers today?
– Roughly a five-year lag between introducing a new architecture and having a robust compiler
– Many interesting new architectures fail in the marketplace due to inadequate software tools
Today's compilers are complex and monolithic
– SUIF has ~500K LOC; Open64 has ~12M LOC
The best research ideas do not always make it into practice.

March 18, 2008, SSE Meeting, Slide 12: 3. A New Kind of "Compiler"
Traditional view: [diagram: code and input data feed into a batch compiler]

March 18, 2008, SSE Meeting, Slide 13: 3 & 4. Performance Tuning "Compiler"
[diagram: code and input data (characteristics) feed a code-translation stage and an experiments engine, driven by transformation script(s) and search script(s)]

March 18, 2008, SSE Meeting, Slide 14: Auto-tuner
[diagram: as on the previous slide, code and input data (characteristics) feed code translation and an experiments engine via transformation script(s) and search script(s)]
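A minimal sketch of the search loop behind such an auto-tuner follows. The tiled matrix-multiply variant and the candidate tile sizes are invented for illustration; in the system sketched on the slide the variants would be produced by transformation scripts and the search driven by search scripts, whereas here the "experiments engine" is just a timing loop that keeps the best-performing parameter.

```c
#include <stdio.h>
#include <time.h>

#define N 512

static double a[N][N], b[N][N], c[N][N];

/* One parameterized code variant: matrix multiply blocked by `tile`.
 * A real auto-tuner would generate each variant from a transformation
 * script; here the tile size is simply a run-time argument. */
static void mm_tiled(int tile) {
    for (int ii = 0; ii < N; ii += tile)
        for (int jj = 0; jj < N; jj += tile)
            for (int kk = 0; kk < N; kk += tile)
                for (int i = ii; i < ii + tile && i < N; i++)
                    for (int j = jj; j < jj + tile && j < N; j++)
                        for (int k = kk; k < kk + tile && k < N; k++)
                            c[i][j] += a[i][k] * b[k][j];
}

/* Minimal "experiments engine": run each candidate variant, time it,
 * and remember the best-performing configuration. */
int main(void) {
    const int candidates[] = { 8, 16, 32, 64, 128 };
    const int ncand = sizeof(candidates) / sizeof(candidates[0]);
    int best_tile = candidates[0];
    double best_time = 1e30;

    for (int t = 0; t < ncand; t++) {
        clock_t start = clock();
        mm_tiled(candidates[t]);
        double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
        printf("tile %3d: %.3f s\n", candidates[t], elapsed);
        if (elapsed < best_time) {
            best_time = elapsed;
            best_tile = candidates[t];
        }
    }
    printf("best tile size: %d\n", best_tile);
    return 0;
}
```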

March 18, 2008, SSE Meeting, Slide 15: Heterogeneous: Additional Complexity
[diagram: four device types (Device Type 1–4) attached to a shared memory]
– Staging data to/from global memory (see the staging sketch below)
– Other: utilizing highly tuned libraries; differences in programming models (GPP + FPGA is the extreme example)
– Partitioning: where to execute?
– Managing data movement and synchronization
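The data-staging bullet can be illustrated with the short sketch below. The device-runtime calls (`device_alloc`, `copy_to_device`, and so on) are hypothetical placeholders, backed here by plain host memory so the example runs; the slide names no specific interface, so these names are assumptions standing in for a GPU or FPGA runtime.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical device-runtime calls. They are implemented here with plain
 * host memory so the sketch is runnable; in a real system they would be
 * the accelerator vendor's allocation, copy, and launch routines. */
static void *device_alloc(size_t bytes)                          { return malloc(bytes); }
static void  device_free(void *p)                                { free(p); }
static void  copy_to_device(void *d, const void *s, size_t n)    { memcpy(d, s, n); }
static void  copy_from_device(void *d, const void *s, size_t n)  { memcpy(d, s, n); }
static void  launch_kernel(const float *in, float *out, size_t n) {
    for (size_t i = 0; i < n; i++) out[i] = 2.0f * in[i];        /* placeholder kernel */
}

/* The stage/compute/stage pattern from the slide: move inputs into the
 * device's global memory, run the kernel there, move the results back. */
static void offload_step(const float *host_in, float *host_out, size_t n) {
    size_t bytes = n * sizeof(float);
    float *dev_in  = device_alloc(bytes);
    float *dev_out = device_alloc(bytes);

    copy_to_device(dev_in, host_in, bytes);
    launch_kernel(dev_in, dev_out, n);
    copy_from_device(host_out, dev_out, bytes);

    device_free(dev_in);
    device_free(dev_out);
}

int main(void) {
    float in[4] = { 1.0f, 2.0f, 3.0f, 4.0f }, out[4];
    offload_step(in, out, 4);
    printf("%.1f %.1f %.1f %.1f\n", out[0], out[1], out[2], out[3]);
    return 0;
}
```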

March 18, 2008, SSE Meeting, Slide 16: 5. Libraries and Component Technology
Support for automatic selection, tuning, scheduling, etc.
Traditional view: code (source or binary), an interface (provides/requires), and a data description (types, sizes).
Expanded view: partial code (source or tunable binary) plus a code generator; an abstract provides/requires interface; a data description (types, sizes); an interface capturing device dependencies; a data description mapping data features to optimizations; and performance information (device, data features). (A selection sketch follows this slide.)
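One way to read the expanded view is that a component carries a description of its data and lets the library select, or tune, an implementation variant instead of shipping a single fixed binary. The sketch below illustrates that idea only; the variant names, the `data_desc` fields, and the selection heuristic are all hypothetical and not an interface from the talk.

```c
#include <stdio.h>
#include <stddef.h>

/* A data description travels with the call: the "types, sizes" and data
 * features from the slide, reduced here to a toy struct. */
typedef struct {
    size_t rows, cols;
    int    is_sparse;          /* crude stand-in for richer data features */
} data_desc;

typedef void (*kernel_variant)(const data_desc *d);

/* Several implementation variants of the same abstract operation. */
static void dense_small(const data_desc *d)   { printf("dense/small   %zux%zu\n", d->rows, d->cols); }
static void dense_blocked(const data_desc *d) { printf("dense/blocked %zux%zu\n", d->rows, d->cols); }
static void sparse_impl(const data_desc *d)   { printf("sparse        %zux%zu\n", d->rows, d->cols); }

/* Selection logic: map data features to the variant expected to perform
 * best. A fuller system could run an auto-tuning search here instead of
 * a fixed heuristic. */
static kernel_variant select_variant(const data_desc *d) {
    if (d->is_sparse)                   return sparse_impl;
    if (d->rows * d->cols < 256 * 256)  return dense_small;
    return dense_blocked;
}

int main(void) {
    data_desc d1 = { 100, 100, 0 };
    data_desc d2 = { 4096, 4096, 0 };
    data_desc d3 = { 4096, 4096, 1 };
    select_variant(&d1)(&d1);
    select_variant(&d2)(&d2);
    select_variant(&d3)(&d3);
    return 0;
}
```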

March 18, 2008, SSE Meeting, Slide 17: Summary
Parallel computing is everywhere!
– And we need software tools
– Can we find some common ground?
Strategies:
– Automatic parallelization
– Libraries and domain-specific tools that hide parallelism → component technology
– New programming languages
– Auto-tuners to "test" alternative solutions
General approach to solving challenges:
– Education: CS503, Parallel Programming
– Organize the community to support incremental, LONG TERM development.