1 New Architectures Need New Languages A triumph of optimism over experience! Ian Watson 3rd July 2009

2 ‘Obvious Truths’ Single processors will not get faster; we need to go multi-core. There will be a need for processors with many (> 32?) cores. These will need to support general-purpose applications. Application performance will need to scale with the number of cores.

3 ‘Obvious Truths’ (2) General-purpose parallel computing needs shared memory. Current shared memory requires cache coherence. Cache coherence doesn’t scale beyond 32 cores. Updateable state makes general-purpose parallel programming difficult.

4 ‘Obvious Untruths’ HPC already has all the answers to parallel programming. Message passing is the answer (hardware, software, or both). Conventional languages already have adequate threading and locking facilities. We can program without state.

5 So what next? Simplifying the programming model must be the answer; removing facilities is desirable, e.g. random control transfer, pointer arithmetic, and explicit memory reclamation. Arbitrary state manipulation is the enemy of parallelism; we must restrict it!

6 Half Truths? Functional languages are the answer to parallelism; all we need is to add state (in a controlled way). Transactional memory can replace locking to simplify the handling of parallel state. Transactional memory can remove the need for cache coherence.

7 Functions+Transactions The Microsoft Research Cambridge Haskell work has shown how transactions can be included in a functional language via monads. Is this a style of programming that can be sold to the world as the way ahead for future multi-core (many-core) systems?
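In STM Haskell the transactional API lives in a monad (atomically, TVar, and so on). A rough flavour of that interface can be sketched in Scala; the names TinySTM, TVar, read, write, and atomic below are invented for this illustration, and the single global lock is only a naive stand-in for a real speculative STM implementation.

```scala
// Sketch of an STM-style interface in Scala. The TVar/atomic names echo
// the STM Haskell API; the single global lock is a deliberately naive
// stand-in for a real speculative implementation.
object TinySTM {
  private val lock = new Object
  final class TVar[A](var value: A)
  def newTVar[A](a: A): TVar[A] = new TVar(a)
  def read[A](tv: TVar[A]): A = tv.value          // only meaningful inside atomic
  def write[A](tv: TVar[A], a: A): Unit = tv.value = a
  def atomic[A](body: => A): A = lock.synchronized(body)
}

import TinySTM._

// The classic example: transferring between two accounts atomically,
// so no other transaction can observe the money "in flight".
val from = newTVar(100)
val to   = newTVar(0)
atomic {
  write(from, read(from) - 30)
  write(to,   read(to)   + 30)
}
```

The point of the monadic (here, block-structured) interface is that transactional reads and writes are syntactically distinct from ordinary code, which is exactly the property the later slides argue for.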

8 Selling a New Language It must be capable of expressing everything that people want. It isn’t just a case of producing something which is a good technical solution. It mustn’t be too complex. It probably needs to look familiar. It needs to be efficient to implement.

9 The Problems FP is unfamiliar to many existing programmers. Many people find it hard to understand; even more find monads difficult. In spite of excellent FP compiler technology, imperative programming will probably always be more efficient.

10 Can We Compromise? Pure functional programs can be executed easily in parallel because they don’t update global state. But if we only exploit parallelism at the function level, local manipulation of state within a function causes no problems. Can we work with such a model?

11 What Would We Gain? ‘Easy’ parallelism at the function level; this could be either explicit or implicit. Familiarity of low-level code: we can use iteration, assignment, updateable arrays, etc. Potential increase in efficiency: direct mapping to machine code, explicit memory re-use.
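The compromise can be pictured in plain Scala: a function that is pure at its boundary but freely uses local iteration, assignment, and indexing internally can safely be run in parallel with ordinary Futures. Nothing beyond the standard library is assumed here.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

// Pure from the outside, imperative inside: a local var and a while loop,
// but no global state is read or written.
def sumOfSquares(xs: Array[Int]): Long = {
  var acc = 0L
  var i = 0
  while (i < xs.length) { acc += xs(i).toLong * xs(i); i += 1 }
  acc
}

// Because each call touches no shared state, the calls can run in
// parallel without locks, transactions, or any coordination.
val chunks: Seq[Array[Int]] = (1 to 4).map(k => Array.tabulate(1000)(i => k * i))
val partials = Future.sequence(chunks.map(c => Future(sumOfSquares(c))))
val total: Long = Await.result(partials, Duration.Inf).sum
```

The low-level body maps directly to machine code, while the function-level structure is what the runtime (or programmer) exploits for parallelism.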

12 What Would We Lose? Clearly we lose referential transparency within any imperative code. But this is inevitable if we want to manipulate state, even with monads. Clearly, as described so far, we haven’t got the ability to manipulate global state; we need more.

13 Adding Transactions We should only use shared state when it is really necessary. It should be clear in the language when this is happening. It should be detectable statically. Ideally, it should be possible to automatically check the need for atomic sections.
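One way shared state can be made visible and statically detectable, sketched in Scala: if shared mutable state is only reachable through a distinct wrapper type, every use of it appears explicitly in the source, while ordinary local state needs no wrapper. Shared and atomically are hypothetical names chosen for this illustration, and the lock-based body merely stands in for a transactional implementation.

```scala
// Hypothetical marker type: the ONLY way to touch shared mutable state.
// A checker (or the type system) can find every use of it statically.
final class Shared[A](private var value: A) {
  // Run an update atomically, returning a result computed from the old value.
  def atomically[B](f: A => (A, B)): B = this.synchronized {
    val (next, out) = f(value)
    value = next
    out
  }
}

val hits = new Shared(0)
def recordHit(): Int = hits.atomically(n => (n + 1, n + 1))

// Ordinary local state: no wrapper, no atomic section needed.
def localWork(): Int = { var s = 0; for (i <- 1 to 10) s += i; s }
```

Because the shared path and the local path are different types, the need for an atomic section is decided by the type of the data rather than by programmer discipline.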

14 Memory Architecture With the right underlying programming model we should be able to determine memory regions: read only, thread local, global write-once, and global shared (transactional). This can lead to a simplified, scalable memory architecture.

15 Experiments Using Scala to investigate programming styles: it is open source and has both imperative & functional features, but it is not currently transactional. Using a Simics-based hardware simulator to experiment with memory architectures.

16 Outstanding Questions Data parallelism: how to express it, and how to handle in-place update of parallel data (array) structures. Streaming applications: purely functional? Do we need message-passing constructs? Do we need additions to the memory model?

17 Conclusions None really so far! But I am convinced, from a technical viewpoint, that we need new programming approaches. I am fairly convinced that we need to be pragmatic in order to sell a new approach, even if this requires compromises from ideals.

18 Questions?

19 Transactional Memory A programming model to simplify manipulation of shared state. A speculative model: sections of the program are declared ‘atomic’; they must complete without conflict, or die and restart. They must not alter global state until complete. Needs system support, software or hardware.
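The speculative shape described above can be sketched for a single word of state in Scala using compare-and-swap: read a snapshot, compute the new value without touching global state, then attempt to commit; if another thread committed first, the attempt dies and restarts. The atomically helper is a name invented for this illustration and is not a full TM (it covers one reference, not arbitrary atomic sections).

```scala
import java.util.concurrent.atomic.AtomicReference

// One-word 'transaction': snapshot, speculate, commit-or-retry.
def atomically[A](ref: AtomicReference[A])(f: A => A): A = {
  var committed = false
  var result: A = ref.get
  while (!committed) {
    val snapshot = ref.get           // begin: take a snapshot
    val updated  = f(snapshot)       // speculate: no global state touched yet
    if (ref.compareAndSet(snapshot, updated)) {
      result = updated               // commit succeeded
      committed = true
    }
    // commit failed: a conflicting commit happened; the attempt restarts
  }
  result
}

// Four threads each perform 1000 transactional increments.
// BigInt is used so every update creates a fresh object, which is what
// the reference-equality compare-and-set relies on.
val counter = new AtomicReference(BigInt(0))
val workers = (1 to 4).map(_ =>
  new Thread(() => (1 to 1000).foreach(_ => atomically(counter)(_ + 1))))
workers.foreach(_.start())
workers.foreach(_.join())
```

Note how the model matches the slide: work done inside the attempt is invisible to other threads until the single commit point, and conflict detection is what forces the restart.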

20 Object-Based Transactional Memory Hardware Based on ‘object-aware’ caches. Exploits object structure to simplify transactional memory operations. Advantages over other hardware TM proposals: it handles cache overflow elegantly, and it enables multiple memory banks with distributed commit.

21 TM & Cache Coherence Fine-grain cache coherence is the major impediment to scalable multi-cores. Updates to shared memory only occur when a transaction commits. Caches only need to be updated at commit points (which tend to be coarser grain). If all shared memory is made transactional, the requirement for fine-grain coherence is removed.

22 TM Programming TM constructs can be added to conventional programming languages. But they require careful use to ensure correctness. If transactional & non-transactional operations are allowed on the same data, the behaviour can become hard to understand.

23 New Programming Models? Problems can often be simplified by restricting (unnecessary) programming facilities, e.g. arbitrary control transfer, pointer arithmetic, and explicit memory reclamation. A new approach is needed to simplify parallel programming & hardware.

24 We Need Useable & Efficient Models Shared memory is essential for general-purpose programming. Message passing alone (e.g. MPI, Occam, etc.) is not sufficient. We need shared updateable state; e.g. pure functional programming is not the answer. The languages need to be simple and easily implementable.

25 A Synthesis? Functional programming has something to offer: don’t use state unnecessarily. But don’t be too ‘religious’: local, single-threaded state is simple & efficient. Can all global shared state be handled transactionally?

26 Experiments Using the language Scala, which has both functional and imperative features. Experimenting with applications. Studying how techniques similar to ‘escape analysis’ can identify shared mutable state. Looking at hardware implications, particularly memory architecture.