Memory is the Network
Krste Asanovic, The Parallel Computing Laboratory, EECS Department, UC Berkeley
NoCS Panel, May 13, 2009

One-Slide Version of Talk
- No one can afford to build application-specific chips
  - At least not in feature sizes that warrant a NoC
- All future chips are programmable and parallel
  - Multicore, manycore, GPU, FPGA
- Usable programmable parallel systems are bottlenecked by the memory system
  - In both performance and energy efficiency
- The network is just a path to memory (on-chip and off-chip): work on the entire problem, not just the cabling
- Change the name to "International Symposium on Memory Systems"
- (Large-scale networks between chips/boards/racks are still very interesting to think of as networks, but that's a different problem.)

[Diagram: application-specific chip vs. programmable chip serving applications]

Successful Parallel Programming Models

Actor Networks
- Producer-consumer easy
- Mutual exclusion easy (no implicitly shared state)
- Sharing state cumbersome
- Irregular computation hard
- Examples: Occam, Simulink, StreamIt, Clik, …

Shared-Memory Dynamic Threads/Transactions (fork/join)
- Producer-consumer hard
- Mutual exclusion hard (transactional memory helps)
- Sharing state easy (maybe too easy)
- Examples: Pthreads, Cilk, Java, …

Data-Parallel/SPMD (barrier)
- Producer-consumer easy (handled en masse)
- Mutual exclusion easy
- Sharing state easy
- Irregular computation hard
- Examples: APL, NESL, Matlab, HPF, OpenMP, UPC, CAF, …
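To make the contrast above concrete (this example is not from the talk; the class and names are illustrative), here is a minimal C++ sketch of producer-consumer in the shared-memory threads model: the channel that actor languages provide as a primitive has to be hand-built from a queue, a mutex, and a condition variable.

```cpp
// Producer-consumer in the shared-memory threads model (Pthreads-style,
// written with std::thread). The "channel" is hand-built from a queue,
// a mutex, and a condition variable -- machinery an actor language such
// as Occam or StreamIt would provide as a built-in channel.
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>

class Channel {
public:
    void send(int v) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(v); }
        cv_.notify_one();
    }
    std::optional<int> recv() {                 // blocks until a value arrives or the channel closes
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [&] { return !q_.empty() || closed_; });
        if (q_.empty()) return std::nullopt;    // closed and fully drained
        int v = q_.front(); q_.pop();
        return v;
    }
    void close() {
        { std::lock_guard<std::mutex> lk(m_); closed_ = true; }
        cv_.notify_all();
    }
private:
    std::queue<int> q_;
    std::mutex m_;
    std::condition_variable cv_;
    bool closed_ = false;
};

int main() {
    Channel ch;
    std::thread producer([&] {
        for (int i = 0; i < 8; ++i) ch.send(i * i);
        ch.close();
    });
    std::thread consumer([&] {
        while (auto v = ch.recv()) std::printf("got %d\n", *v);
    });
    producer.join();
    consumer.join();
}
```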

Memory is the Network-on-Chip from Software's View
- Actors: messages are buffered in memory-resident channels until it is convenient to run the actor
- Data-parallel: memory holds the arrays used to interchange data between parallel phases
- Transactional: memory holds the shared database accessed atomically
- Programming with data in flight on wires is too brittle for any large code (sorry, Anant); need flexibility in when and where code gets executed
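As a hedged illustration of the data-parallel bullet (not from the talk), the sketch below uses memory as the only interconnect between two parallel phases: each thread writes its result into a shared array, a barrier marks the phase boundary, and the next phase reads neighbours' results back out of memory. It assumes C++20 for std::barrier; the stencil and problem size are arbitrary.

```cpp
// Data-parallel view: threads never exchange messages directly. They write
// phase-1 results into a shared array in memory, wait at a barrier, then
// read their neighbours' results in phase 2. Memory is the interconnect.
#include <barrier>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    constexpr int N = 8;
    std::vector<double> a(N), b(N);
    std::barrier phase_done(N);              // phase boundary for all N workers

    auto worker = [&](int i) {
        a[i] = i * 1.0;                      // phase 1: produce into shared memory
        phase_done.arrive_and_wait();        // everyone finishes phase 1
        double left  = a[(i + N - 1) % N];   // phase 2: consume neighbours' data
        double right = a[(i + 1) % N];
        b[i] = 0.5 * (left + right);
    };

    std::vector<std::thread> ts;
    for (int i = 0; i < N; ++i) ts.emplace_back(worker, i);
    for (auto& t : ts) t.join();
    for (int i = 0; i < N; ++i) std::printf("b[%d] = %.1f\n", i, b[i]);
}
```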

Fixed-Function Accelerators
- Any programmable chip will have a stack of fixed-function accelerators
  - Crypto, codecs, radios, graphics
- But these won't use a NoC internally, just place-and-route
- They'll connect to the general-purpose portion through memory, for all the reasons given before
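Purely as a hypothetical sketch of what "connect through memory" might look like to software (the talk does not define an interface; the descriptor layout and flag protocol here are invented), a host core hands work to an accelerator through a descriptor in shared memory and a pair of flags, with the accelerator simulated by a thread.

```cpp
// Hypothetical shared-memory mailbox between a host core and a fixed-function
// accelerator (simulated by a thread). All communication goes through memory:
// a descriptor struct plus acquire/release flags, no dedicated NoC interface.
#include <array>
#include <atomic>
#include <cstdio>
#include <thread>

struct WorkDescriptor {                 // made-up descriptor layout
    std::array<int, 4> input;
    std::array<int, 4> output;
    std::atomic<bool> ready{false};     // host -> accelerator
    std::atomic<bool> done{false};      // accelerator -> host
};

void accelerator(WorkDescriptor* d) {   // stand-in for fixed-function hardware
    while (!d->ready.load(std::memory_order_acquire)) { /* poll mailbox */ }
    for (int i = 0; i < 4; ++i) d->output[i] = d->input[i] * 2;  // placeholder "codec" work
    d->done.store(true, std::memory_order_release);
}

int main() {
    WorkDescriptor d;
    d.input = {1, 2, 3, 4};                                   // fill descriptor in memory
    std::thread accel(accelerator, &d);

    d.ready.store(true, std::memory_order_release);           // kick off the job
    while (!d.done.load(std::memory_order_acquire)) { }       // wait for completion
    accel.join();
    for (int v : d.output) std::printf("%d ", v);
    std::printf("\n");
}
```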

Research Directions
- Make memory a better communication channel
  - Richer software interface
  - Better synchronization primitives, e.g.:
    - Atomic message enqueue/dequeue for actor channels
    - Atomic fetch-and-op for data-parallel apps
    - Transactional memory for concurrent apps
  - Better cache-coherence protocols
- Make memory go faster and with lower power
  - New device technologies (e.g., photonics)
  - New microarchitectures and network ideas
  - Must consider on-chip and off-chip to main memory at the same time
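As a sketch of what "atomic message enqueue/dequeue for actor channels" might look like from software today (an assumption about the intent, not a design from the talk), here is a single-producer/single-consumer channel that lives entirely in ordinary shared memory: each enqueue or dequeue is a payload access plus one acquire/release index update, the pair of operations a richer memory interface could fuse into a single primitive.

```cpp
// Single-producer/single-consumer actor channel living entirely in memory.
// Enqueue/dequeue are plain payload accesses plus acquire/release updates of
// head/tail indices -- the kind of operation the slide suggests hardware
// could provide as a single atomic enqueue/dequeue primitive.
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdio>
#include <optional>
#include <thread>

template <typename T, std::size_t N>
class SpscRing {
public:
    bool try_enqueue(const T& v) {              // producer side only
        std::size_t t = tail_.load(std::memory_order_relaxed);
        std::size_t next = (t + 1) % N;
        if (next == head_.load(std::memory_order_acquire)) return false;  // full
        buf_[t] = v;
        tail_.store(next, std::memory_order_release);   // publish the message
        return true;
    }
    std::optional<T> try_dequeue() {            // consumer side only
        std::size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_.load(std::memory_order_acquire)) return std::nullopt;  // empty
        T v = buf_[h];
        head_.store((h + 1) % N, std::memory_order_release);  // free the slot
        return v;
    }
private:
    std::array<T, N> buf_{};
    std::atomic<std::size_t> head_{0}, tail_{0};
};

int main() {
    SpscRing<int, 16> chan;
    std::thread producer([&] {
        for (int i = 0; i < 100; ++i)
            while (!chan.try_enqueue(i)) { /* channel full: retry */ }
    });
    std::thread consumer([&] {
        for (int received = 0; received < 100; ) {
            if (auto v = chan.try_dequeue()) { ++received; /* run actor on *v */ }
        }
    });
    producer.join();
    consumer.join();
    std::printf("all messages delivered through memory\n");
}
```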