Parallel Applications Parallel Hardware Parallel Software 1 The Parallel Computing Laboratory Krste Asanovic, Ras Bodik, Jim Demmel, Tony Keaveny, Kurt.

Slides:



Advertisements
Similar presentations
EECS Electrical Engineering and Computer Sciences B ERKELEY P AR L AB P A R A L L E L C O M P U T I N G L A B O R A T O R Y EECS Electrical Engineering.
Advertisements

UC Berkeley Par Lab 1 Technology Trends: The Datacenter is the Computer, The Cellphone/Laptop is the Computer David Patterson Director, Reliable Adaptive.
RAMP Gold : An FPGA-based Architecture Simulator for Multiprocessors Zhangxi Tan, Andrew Waterman, David Patterson, Krste Asanovic Parallel Computing Lab,
© Chinese University, CSE Dept. Software Engineering / Software Engineering Topic 1: Software Engineering: A Preview Your Name: ____________________.
March 18, 2008SSE Meeting 1 Mary Hall Dept. of Computer Science and Information Sciences Institute Multicore Chips and Parallel Programming.
Parallel Applications Parallel Hardware Parallel Software IT industry Users 1 Par Lab Overview Dave Patterson Parallel Computing Laboratory (Par Lab) U.C.
Some Thoughts on Technology and Strategies for Petaflops.
Software Group © 2006 IBM Corporation Compiler Technology Task, thread and processor — OpenMP 3.0 and beyond Guansong Zhang, IBM Toronto Lab.
BEEKeeper Remote Management and Debugging of Large FPGA Clusters Terry Filiba Navtej Sadhal.
UC Berkeley 1 Time dilation in RAMP Zhangxi Tan and David Patterson Computer Science Division UC Berkeley.
1 Multi - Core fast Communication for SoPC Multi - Core fast Communication for SoPC Technion – Israel Institute of Technology Department of Electrical.
1 Dr. Frederica Darema Senior Science and Technology Advisor NSF Future Parallel Computing Systems – what to remember from the past RAMP Workshop FCRC.
UC Berkeley Par Lab Overview 2 David Patterson Par Lab’s original research “bets” Software platform: data center + mobile client Let compelling applications.
© 2006 Regents University of California. All Rights Reserved RAMP Blue: A Message Passing Multi-Processor System on the BEE2 Andrew Schultz and Alex Krasnov.
Parallel Applications Parallel Hardware Parallel Software 1 The Parallel Computing Landscape: A View from Berkeley 2.0 Krste Asanovic, Ras Bodik, Jim Demmel,
CIS 314 : Computer Organization Lecture 1 – Introduction.
RAMP Gold: ParLab InfiniCore Model Krste Asanovic UC Berkeley RAMP Retreat, January 16, 2008.
Murali Vijayaraghavan MIT Computer Science and Artificial Intelligence Laboratory RAMP Retreat, UC Berkeley, January 11, 2007 A Shared.
1 Memory is the Network Krste Asanovic The Parallel Computing Laboratory EECS Department UC Berkeley NoCS Panel, May 13, 2009.
NPACI: National Partnership for Advanced Computational Infrastructure August 17-21, 1998 NPACI Parallel Computing Institute 1 Cluster Archtectures and.
Using FPGAs with Embedded Processors for Complete Hardware and Software Systems Jonah Weber May 2, 2006.
Virtual Machines. Virtualization Virtualization deals with “extending or replacing an existing interface so as to mimic the behavior of another system”
Introduction to Computer Architecture SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING SUMMER 2015 RAMYAR SAEEDI.
Leveling the Field for Multicore Open Systems Architectures Markus Levy President, EEMBC President, Multicore Association.
Module 9 Review Questions 1. The ability for a system to continue when a hardware failure occurs is A. Failure tolerance B. Hardware tolerance C. Fault.
RSC Williams MAPLD 2005/BOF-S1 A Linux-based Software Environment for the Reconfigurable Scalable Computing Project John A. Williams 1
Computer System Architectures Computer System Software
UC Berkeley 1 The Datacenter is the Computer David Patterson Director, RAD Lab January, 2007.
Parallel Applications Parallel Hardware Parallel Software IT industry (Silicon Valley) Users 1 The Parallel Computing Landscape: A View from Berkeley Dave.
Introduction and Overview Questions answered in this lecture: What is an operating system? How have operating systems evolved? Why study operating systems?
Reconfigurable Devices Presentation for Advanced Digital Electronics (ECNG3011) by Calixte George.
1 Hardware Security Mechanisms Krste Asanovic U.C. Berkeley August 20, 2009.
EET 4250: Chapter 1 Computer Abstractions and Technology Acknowledgements: Some slides and lecture notes for this course adapted from Prof. Mary Jane Irwin.
Lessons Learned The Hard Way: FPGA  PCB Integration Challenges Dave Brady & Bruce Riggins.
Sogang University Advanced Computing System Chap 1. Computer Architecture Hyuk-Jun Lee, PhD Dept. of Computer Science and Engineering Sogang University.
CS/ECE 3330 Computer Architecture Kim Hazelwood Fall 2009.
Tessellation: Space-Time Partitioning in a Manycore Client OS Rose Liu 1,2, Kevin Klues 1, Sarah Bird 1, Steven Hofmeyr 3, Krste Asanovic 1, John Kubiatowicz.
Parallel Applications Parallel Hardware Parallel Software 1 The Parallel Computing Laboratory: A Research Agenda based on the Berkeley View Krste Asanovic,
CE Operating Systems Lecture 3 Overview of OS functions and structure.
Lecture 10: Logic Emulation October 8, 2013 ECE 636 Reconfigurable Computing Lecture 13 Logic Emulation.
Processes Introduction to Operating Systems: Module 3.
Chapter 1 Computer Abstractions and Technology. Chapter 1 — Computer Abstractions and Technology — 2 The Computer Revolution Progress in computer technology.
Computer Organization & Assembly Language © by DR. M. Amer.
Next Generation Operating Systems Zeljko Susnjar, Cisco CTG June 2015.
AEEC405 – Microprocessor Architecture. Some Information Instructor Details Main Book.
BridgePoint Integration John Wolfe / Robert Day Accelerated Technology.
Harmony: A Run-Time for Managing Accelerators Sponsor: LogicBlox Inc. Gregory Diamos and Sudhakar Yalamanchili.
GPUs: Overview of Architecture and Programming Options Lee Barford firstname dot lastname at gmail dot com.
1 Retreat (Advance) John Wawrzynek UC Berkeley January 15, 2009.
B5: Exascale Hardware. Capability Requirements Several different requirements –Exaflops/Exascale single application –Ensembles of Petaflop apps requiring.
Computer Architecture Lecture 26 Past and Future Ralph Grishman November 2015 NYU.
Proposal for an Open Source Flash Failure Analysis Platform (FLAP) By Michael Tomer, Cory Shirts, SzeHsiang Harper, Jake Johns
Lecture 7: Overview Microprocessors / microcontrollers.
Chapter 1 Basic Concepts of Operating Systems Introduction Software A program is a sequence of instructions that enables the computer to carry.
COMPUTER ARCHITECTURE & OPERATIONS I Instructor: Yaohang Li.
CDA-5155 Computer Architecture Principles Fall 2000 Multiprocessor Architectures.
KERRY BARNES WILLIAM LUNDGREN JAMES STEED
3/12/07CS Visit Days1 A Sea Change in Processor Design Uniprocessor SpecInt Performance: From Hennessy and Patterson, Computer Architecture: A Quantitative.
Dr. Philip Cannata 1 Hmm Projects C Structs or Classes or OO in Hmm10 Continuations in Hmm6 Trees, Binary Trees or Linked lists in Hmm3 Prolog in Hmm2.
1 Chapter 2: Operating-System Structures Services Interface provided to users & programmers –System calls (programmer access) –User level access to system.
Compiler Research How I spent my last 22 summer vacations Philip Sweany.
Hardware Architecture
CoDeveloper Overview Updated February 19, Introducing CoDeveloper™  Targeting hardware/software programmable platforms  Target platforms feature.
Tools and Libraries for Manycore Computing Kathy Yelick U.C. Berkeley and LBNL.
Heterogeneous Processing KYLE ADAMSKI. Overview What is heterogeneous processing? Why it is necessary Issues with heterogeneity CPU’s vs. GPU’s Heterogeneous.
Computer Architecture Furkan Rabee
PROGRAMMABLE LOGIC CONTROLLERS SINGLE CHIP COMPUTER
ECE354 Embedded Systems Introduction C Andras Moritz.
Serial Data Hub (Proj Dec13-13).
Virtualization Dr. S. R. Ahmed.
Presentation transcript:

Parallel Applications Parallel Hardware Parallel Software 1 The Parallel Computing Laboratory Krste Asanovic, Ras Bodik, Jim Demmel, Tony Keaveny, Kurt Keutzer, John Kubiatowicz, Edward Lee, Nelson Morgan, George Necula, Dave Patterson, Koushik Sen, John Wawrzynek, David Wessel, and Kathy Yelick March 17, 2008

2 A Parallel Revolution, Ready or Not  Embedded: per product ASIC to programmable platforms  Multicore chip most competitive path  Amortize design costs + Reduce design risk + Flexible platforms  PC, Server: Power Wall + Memory Wall = Brick Wall  End of way built microprocessors for last 40 years  New Moore’s Law is 2X processors (“cores”) per chip every technology generation, but same clock rate  “This shift toward increasing parallelism is not a triumphant stride forward based on breakthroughs …; instead, this … is actually a retreat from even greater challenges that thwart efficient silicon implementation of traditional solutions.” The Parallel Computing Landscape: A Berkeley View, Dec 2006  Sea change for HW & SW industries since changing the model of programming and debugging

3 P.S. Parallel Revolution May Fail  John Hennessy, President, Stanford University, 1/07: “…when we start talking about parallelism and ease of use of truly parallel computers, we're talking about a problem that's as hard as any that computer science has faced. … I would be panicked if I were in industry.” “A Conversation with Hennessy & Patterson,” ACM Queue Magazine, 4:10, 1/07.  100% failure rate of Parallel Computer Companies  Convex, Encore, Inmos (Transputer), MasPar, NCUBE, Kendall Square Research, Sequent, (Silicon Graphics), Thinking Machines, …  What if IT goes from a growth industry to a replacement industry?  If SW can’t effectively use 32, 64,... cores per chip  SW no faster on new computer  Only buy if computer wears out

4 Personal Health Image Retrieval Hearing, Music Speech Parallel Browser Motifs Sketching Legacy Code Schedulers Communication & Synch. Primitives Efficiency Language Compilers Par Lab Research Overview Easy to write correct programs that run efficiently on manycore Legacy OS Multicore/GPGPU OS Libraries & Services RAMP Manycore Hypervisor OS Arch. Productivity Layer Efficiency Layer Correctness Applications Composition & Coordination Language (C&CL) Parallel Libraries Parallel Frameworks Static Verification Dynamic Checking Debugging with Replay Directed Testing Autotuners C&CL Compiler/Interpreter Efficiency Languages Type Systems Diagnosing Power/Performance

5 Compelling Client Applications ImageDatabase Image Query by example 1000’s of images Music/Hearing Robust Speech Input Personal Health Parallel Browser

6   How do compelling apps relate to 13 motifs? Blue Cool “Motif" Popularity (Red Hot  Blue Cool)

7 Developing Parallel Software  2 types of programmers  2 layers  Efficiency Layer (10% of today’s programmers)  Expert programmers build Frameworks & Libraries, Hypervisors, …  “Bare metal” efficiency possible at Efficiency Layer  Productivity Layer (90% of today’s programmers)  Domain experts / Naïve programmers productively build parallel apps using frameworks & libraries  Frameworks & libraries composed to form app frameworks  Effective composition techniques allows the efficiency programmers to be highly leveraged  Create language for Composition and Coordination (C&C)

8 ParLab OS Research Devic e Driver Plug-in Web Browser Root Partition Manager Plug-in 1 Plug-in 2Service Manycore Hardware Web Browser Root Partition Manager Partition Hypervisor Logical System View Physical System View Root Partition implements policy to timeshare partitions - Suspended partition (passive data structure in memory) is not mapped by Root onto physical cores. Suspended

9 InfiniCore Architecture Overview Four separate on-chip network types  Control networks combine 1-bit signals in combinational tree for interrupts & barriers  Active message networks carry register-register messages between cores  L2/Coherence network connects L1 caches to L2 slices and indirectly to memory  Memory network connects L2 slices to memory controllers I/O and accelerators potentially attach to all network types. Flash replaces rotating disks. Only high-speed I/O is network & display. Active Message Network Control/Barrier Network L2/Coherence Network Memory Network Core L1D$ L1I$ L2RAML2Tags L2 Cntl. Core L1D$ L1I$ Accelerators and/or I/O interfaces MEMC DRAM I/O Pins L2RAML2Tags L2 Cntl. MEMC DRAM MEMC Flash

Core “RAMP Blue”  1008 = bit RISC cores / FPGA, 4 FGPAs/board, 21 boards  Simple MicroBlaze soft 90 MHz Full star-connection between modules  NASA Advanced Supercomputing (NAS) Parallel Benchmarks (all class S)  UPC versions (C plus shared-memory abstraction) CG, EP, IS, MG  RAMPants creating HW & SW for many- core community using next gen FPGAs  Chuck Thacker & Microsoft designing next boards  3rd party to manufacture and sell boards: 1H08  Gateware, Software BSD open source  RAMP Gold for Par Lab: new CPU

11 Physical Par Lab - 5th Floor Soda

12 ParLab Summary  Whole IT industry has bet its future on parallelism (!)  Try Apps-Driven vs. CS Solution-Driven Research  Motifs as anti-benchmarks  Efficiency layer for ≈10% today’s programmers  Productivity layer for ≈90% today’s programmers  C&C language to help compose and coordinate  Autotuners vs. Compilers  OS & HW: Primitives  Diagnose Power/Perf.  March 19 announcement UPCRC winner from top 25 CS departments Personal Health Image Retrieval Hearing, Music Speech Parallel Browser Motifs Sketching Legacy Code Schedulers Communication & Synch. Primitives Efficiency Language Compilers Legacy OS Multicore/GPGPU OS Libraries & Services RAMP Manycore Hypervisor OS Arch. Productivity Efficiency Correctness Apps Composition & Coordination Language (C&CL) Parallel Libraries Parallel Frameworks Static Verification Dynamic Checking Debugging with Replay Directed Testing Autotuners C&CL Compiler/Interpreter Efficiency Languages Type Systems Easy to write correct programs that run efficiently and scale up on manycore Diagnosing Power/Performance Bottlenecks