Can TM help in addressing the “Multicore Software Scaling Problem?” Microsoft TM Panel, July 2007. Nir Shavit, Tel Aviv University.

Presentation transcript:

Can TM help in addressing the “Multicore Software Scaling Problem?”
Microsoft TM Panel, July 2007
Nir Shavit, Tel Aviv University

Amdahl’s Law

Speedup = 1 / (ParallelPart/N + SequentialPart)

- Pay for N = 8 cores
- SequentialPart = 25%
- Speedup = only 2.9 times!

Must parallelize applications on a very fine grain!

How do we make use of multicores?
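The arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch rather than anything from the talk: the function name `amdahl_speedup` and its parameter names are made up for illustration, and it assumes ParallelPart = 1 - SequentialPart.

```python
def amdahl_speedup(sequential_part: float, n_cores: int) -> float:
    """Amdahl's Law: Speedup = 1 / (ParallelPart / N + SequentialPart).

    Hypothetical helper; assumes ParallelPart = 1 - SequentialPart.
    """
    if not 0.0 <= sequential_part <= 1.0:
        raise ValueError("sequential_part must be a fraction between 0 and 1")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    parallel_part = 1.0 - sequential_part
    return 1.0 / (parallel_part / n_cores + sequential_part)


if __name__ == "__main__":
    # The slide's case: 25% sequential code on 8 cores.
    for n in (1, 2, 4, 8, 16, 64):
        print(f"{n:>3} cores: {amdahl_speedup(0.25, n):.2f}x")
```

With SequentialPart = 25% the result for 8 cores is about 2.9, matching the slide, and no number of cores can push the speedup to 4x, because the sequential quarter alone takes 1/4 of the original running time.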

Need Fine-Grained Locking

[Figure: a workload that is 75% unshared and 25% shared, run across cores under coarse-grained locking and under fine-grained locking. Annotation: the reason we get only 2.9x speedup.]
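The difference between the two locking styles can be sketched in Python. Everything here is illustrative and not taken from the slides: `CoarseCounterTable`, `FineCounterTable`, and `hammer` are hypothetical names, and the bucketed counter is just a stand-in for any shared data structure. Note also that CPython's global interpreter lock keeps this sketch from showing a real speedup; it demonstrates only the structure of the two designs.

```python
import threading


class CoarseCounterTable:
    """One lock guards every bucket: simple, but all threads serialize."""

    def __init__(self, n_buckets: int) -> None:
        self._lock = threading.Lock()
        self._counts = [0] * n_buckets

    def increment(self, key: int) -> None:
        with self._lock:
            self._counts[key % len(self._counts)] += 1

    def total(self) -> int:
        with self._lock:
            return sum(self._counts)


class FineCounterTable:
    """One lock per bucket: threads on different buckets do not contend."""

    def __init__(self, n_buckets: int) -> None:
        self._locks = [threading.Lock() for _ in range(n_buckets)]
        self._counts = [0] * n_buckets

    def increment(self, key: int) -> None:
        i = key % len(self._counts)
        with self._locks[i]:
            self._counts[i] += 1

    def total(self) -> int:
        # Take every lock in a fixed order for a consistent snapshot;
        # the fixed order is what rules out deadlock between callers.
        for lock in self._locks:
            lock.acquire()
        try:
            return sum(self._counts)
        finally:
            for lock in reversed(self._locks):
                lock.release()


def hammer(table, n_threads: int = 8, per_thread: int = 1000) -> int:
    """Run n_threads workers that each increment per_thread times."""

    def work(tid: int) -> None:
        for k in range(per_thread):
            table.increment(tid + k)

    threads = [threading.Thread(target=work, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table.total()
```

Even in this small case the fine-grained version is harder to get right: `total` has to think about lock ordering, while the coarse-grained version needs no such reasoning.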

Traditional Scaling Process

[Figure: the same user code runs on successive traditional uniprocessors; with Moore’s law the speedup goes 1.8x, 3.6x, 7x.]

Ideal Multicore Scaling Process

[Figure: the same user code runs on multicore chips with growing core counts; the speedup goes 1.8x, 3.6x, 7x.]

Only Wishful Thinking!

Lock-based Code Doesn’t Scale

[Figure: user code runs on multicore chips with growing core counts; the speedup goes 1.8x, 2x, 2.9x.]

Vendors must rewrite code for each machine

Lock-based Code Doesn’t Scale

- Locks are an even bigger problem than we think
- Scalability today:
  - Code stays the same, CPUs get faster
  - Simple model for vendors
- Scalability tomorrow:
  - Lock-based synchronization code must be rewritten as the number of cores increases
  - High costs for vendors

Is TM part of the answer?

- Can transactions help maintain the traditional scaling process?
  - At least smooth out the transition points …
- Write code once using transactions (short transactions?)
- Have TM tuned for each machine
- So no need to rewrite software
- Like a VM for synchronization …
- Key point: transactions are the abstraction that is missing …
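The "write once, tune the TM per machine" idea can be sketched as a separation between user code and a swappable runtime. This is only a toy under loud assumptions: `atomic`, `set_backend`, `GlobalLockBackend`, and the bank-transfer example are all hypothetical names, and the backend shown is a single re-entrant lock, not a real transactional memory (it has no conflict detection, no rollback, and no optimistic execution). The point is the shape of the interface, not the implementation.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


class GlobalLockBackend:
    """Stand-in 'TM' (assumption): runs each transaction under one lock.

    A machine-specific backend could replace this class without any
    change to the user code below.
    """

    def __init__(self) -> None:
        # Re-entrant, so a transaction may call another atomic block.
        self._lock = threading.RLock()

    def run(self, txn: Callable[[], T]) -> T:
        with self._lock:
            return txn()


_backend = GlobalLockBackend()


def set_backend(backend) -> None:
    """Swap the synchronization runtime; user code stays the same."""
    global _backend
    _backend = backend


def atomic(txn: Callable[[], T]) -> T:
    """User-facing abstraction: say what is atomic, not how."""
    return _backend.run(txn)


# --- User code: written once, with no locks in sight -----------------
accounts = {"a": 100, "b": 0}


def transfer(src: str, dst: str, amount: int) -> bool:
    def txn() -> bool:
        if accounts[src] < amount:
            return False
        accounts[src] -= amount
        accounts[dst] += amount
        return True

    return atomic(txn)
```

Here `transfer` never names a lock, so the choice of synchronization strategy lives entirely behind `atomic`, which is roughly the sense in which the slide compares TM to a VM for synchronization.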

Can TM Make Scaling Smoother?

[Figure: user code layered on TM code, running on multicore chips with growing core counts; the speedup goes 1.8x, 3.6x, 7x.]

Questions to ponder …

- What needs to be added to the TM designs to make transactional code “machine independent”?
- What needs to be added to compilers? Languages?