From Yale-45 to Yale-90: Let Us Not Bother the Programmers Guri Sohi University of Wisconsin-Madison Celebrating September 19, 2014.

Slides:



Advertisements
Similar presentations
Exploiting Unbalanced Thread Scheduling for Energy and Performance on a CMP of SMT Processors Matt DeVuyst Rakesh Kumar Dean Tullsen.
Advertisements

Department of Computer Science iGPU: Exception Support and Speculative Execution on GPUs Jaikrishnan Menon, Marc de Kruijf Karthikeyan Sankaralingam Vertical.
Priority Research Direction (I/O Models, Abstractions and Software) Key challenges What will you do to address the challenges? – Develop newer I/O models.
GPGPU Introduction Alan Gray EPCC The University of Edinburgh.
FIU Chapter 7: Input/Output Jerome Crooks Panyawat Chiamprasert
Clouds C. Vuerli Contributed by Zsolt Nemeth. As it started.
Memory Management Questions answered in this lecture: How do processes share memory? What is static relocation? What is dynamic relocation? What is segmentation?
Lecture 1: History of Operating System
1 COMP 206: Computer Architecture and Implementation Montek Singh Mon, Dec 5, 2005 Topic: Intro to Multiprocessors and Thread-Level Parallelism.
11/14/05ELEC Fall Multi-processor SoCs Yijing Chen.
State Machines Timing Computer Bus Computer Performance Instruction Set Architectures RISC / CISC Machines.
Computer Organization and Architecture
Operating Systems Lecture 1 Crucial hardware concepts review M. Naghibzadeh Reference: M. Naghibzadeh, Operating System Concepts and Techniques, iUniverse.
CS 7810 Lecture 24 The Cell Processor H. Peter Hofstee Proceedings of HPCA-11 February 2005.
Energy Model for Multiprocess Applications Texas Tech University.
Single-Chip Multi-Processors (CMP) PRADEEP DANDAMUDI 1 ELEC , Fall 08.
Processes Part I Processes & Threads* *Referred to slides by Dr. Sanjeev Setia at George Mason University Chapter 3.
Operating Systems Should Manage Accelerators Sankaralingam Panneerselvam Michael M. Swift Computer Sciences Department University of Wisconsin, Madison,
Computer System Architectures Computer System Software
OpenMP in a Heterogeneous World Ayodunni Aribuki Advisor: Dr. Barbara Chapman HPCTools Group University of Houston.
COLLABORATIVE EXECUTION ENVIRONMENT FOR HETEROGENEOUS PARALLEL SYSTEMS Aleksandar Ili´c, Leonel Sousa 2010 IEEE International Symposium on Parallel & Distributed.
Lecture#14. Last Lecture Summary Memory Address, size What memory stores OS, Application programs, Data, Instructions Types of Memory Non Volatile and.
Basics and Architectures
LOGO OPERATING SYSTEM Dalia AL-Dabbagh
Operating System Review September 10, 2012Introduction to Computer Security ©2004 Matt Bishop Slide #1-1.
1 The Performance Potential for Single Application Heterogeneous Systems Henry Wong* and Tor M. Aamodt § *University of Toronto § University of British.
1 Computer System Overview Chapter 1. 2 n An Operating System makes the computing power available to users by controlling the hardware n Let us review.
Priority Research Direction (use one slide for each) Key challenges -Fault understanding (RAS), modeling, prediction -Fault isolation/confinement + local.
GPUs and Accelerators Jonathan Coens Lawrence Tan Yanlin Li.
Three fundamental concepts in computer security: Reference Monitors: An access control concept that refers to an abstract machine that mediates all accesses.
3G Single Core Modem A New Telecommunications Device Group 4: Warren Irwin, Austin Beam, Amanda Medlin, Rob Westerman, Brittany Deardian.
Multi-core Programming Introduction Topics. Topics General Ideas Moore’s Law Amdahl's Law Processes and Threads Concurrency vs. Parallelism.
INTRODUCTION Crusoe processor is 128 bit microprocessor which is build for mobile computing devices where low power consumption is required. Crusoe processor.
Introduction 1-1 Introduction to Virtual Machines From “Virtual Machines” Smith and Nair Chapter 1.
1 Computer Architecture Research Overview Rajeev Balasubramonian School of Computing, University of Utah
Processes and OS basics. RHS – SOC 2 OS Basics An Operating System (OS) is essentially an abstraction of a computer As a user or programmer, I do not.
High Performance Computing Processors Felix Noble Mirayma V. Rodriguez Agnes Velez Electric and Computer Engineer Department August 25, 2004.
 Chapter 13 – Dependability Engineering 1 Chapter 12 Dependability and Security Specification 1.
Parallelism: A Serious Goal or a Silly Mantra (some half-thought-out ideas)
Server Virtualization
Chap 1 Introduction. What is OS? OS is a program that interfaces users and computer hardware. Purpose: Provides an environment for users to execute programs.
1 Latest Generations of Multi Core Processors
Dean Tullsen UCSD.  The parallelism crisis has the feel of a relatively new problem ◦ Results from a huge technology shift ◦ Has suddenly become pervasive.
Implications of Emerging Hardware Tom Wenisch (University of Michigan) Nikos Hardavellas (Northwestern University) Sangyeun Cho (University of Pittsburgh)
A few issues on the design of future multicores André Seznec IRISA/INRIA.
Lecture 2: Computer Architecture: A Science ofTradeoffs.
Processor Architecture
Distributed Computing Systems CSCI 6900/4900. Review Distributed system –A collection of independent computers that appears to its users as a single coherent.
Department of Computer Science Get the Parallelism out of my Cloud Karu Sankaralingam and Remzi H. Arpaci-Dusseau University of Wisconsin-Madison
Computer Organization Yasser F. O. Mohammad 1. 2 Lecture 1: Introduction Today’s topics:  Why computer organization is important  Logistics  Modern.
Distributed Computing Systems CSCI 6900/4900. Review Definition & characteristics of distributed systems Distributed system organization Design goals.
Interrupts and Exception Handling. Execution We are quite aware of the Fetch, Execute process of the control unit of the CPU –Fetch and instruction as.
Page 1 2P13 Week 1. Page 2 Page 3 Page 4 Page 5.
1 Chapter 2: Operating-System Structures Services Interface provided to users & programmers –System calls (programmer access) –User level access to system.
CPU (Central Processing Unit). The CPU is the brain of the computer. Sometimes referred to simply as the processor or central processor, the CPU is where.
Operating System Concepts with Java – 7 th Edition, Nov 15, 2006 Silberschatz, Galvin and Gagne ©2007 Chapter 0: Historical Overview.
Heterogeneous Processing KYLE ADAMSKI. Overview What is heterogeneous processing? Why it is necessary Issues with heterogeneity CPU’s vs. GPU’s Heterogeneous.
Fall 2012 Parallel Computer Architecture Lecture 4: Multi-Core Processors Prof. Onur Mutlu Carnegie Mellon University 9/14/2012.
Lynn Choi School of Electrical Engineering
Computer Organization CS224
Parallel Computing in the Multicore Era
Embedded Systems Design
Lecture 5: GPU Compute Architecture
Software Defined Networking (SDN)
Lecture 5: GPU Compute Architecture for the last time
Computer System Overview
ISCA 2005 Panel Guri Sohi.
Single-Chip Multiprocessors: the Rebirth of Parallel Architecture
Parallel Computing in the Multicore Era
Computer Architecture: A Science of Tradeoffs
Presentation transcript:

From Yale-45 to Yale-90: Let Us Not Bother the Programmers Guri Sohi University of Wisconsin-Madison Celebrating September 19, 2014

Outline Where have we come from Where are we are likely going 2

Where From: Hardware Primary goal was performance Continuing increase in performance without demands on software Lots of “under the hood” innovations in cores (e.g., Big MF branch predictors) – Key enabling technique was sequential appearance and precise exceptions 3

Where From: Hardware Put more on core to achieve certain objective – Argument is “improve efficiency” – Multimedia, vectors, 64-bit, etc. – Incremental cost – Cores have become a “catch all” – Good for all, but not the most efficient for any Efficiency became important – Emergence of more efficient, special-purpose solutions (e.g., GPUs) 4

Where From: Software Few applications, few customers “Shrink Wrap” software: few applications and lots of customers Ubiquitous software: lots of diverse applications and lots of software 5

Where From: Software No worries when everything “under the hood” Significant challenges with multicore – Need to parallelize If biting the bullet, might as well go all the way – E.g., GPUs But mostly avoid difficulty and embrace convenience – Even if inefficient 6

Important Lessons When transistor budgets exceed certain amounts, the importance of certain techniques decreases, making room for other techniques Relative importance of special techniques diminishes over time Convenience key to software proliferation Mass volumes drive end result 7

Future Academic Research General-purpose App processing Units (GPAPUs) XY-DRAM 4D integration – Heterogeneity (XY-DRAM) – Dynamically varying distance between 3D layers Revisit everything (e.g., cache design and DRAM scheduling) with 4D integration with GPAPUs 8

Future Primary design goal: energy Hardware: use more transistors to save energy Software: keep doing things “under the hood” 9

Future Novel uniprocessor cores Lower energy devices – prone to errors Customized computation energy reducers (a.k.a. accelerators) – If can use software library, why use on multiple CPUs? Why not on customized hardware? 10

Processor Usage Have OS core, user core Have core that can only run 32/64-bit code – Specialization for 32-bit operands Have core that doesn’t support precise interrupts Many other forms of limited functionality cores – Improve performance – Reduce energy 11

Processor Usage Steady demultiplexing of what was done on a general-purpose core Computation spreading – OS/user Separating specific code to accelerators Other forms of stripping out functionality in general purpose core “Mostly general-purpose” core 12

Hardware Going Forward Multiple mostly general-purpose processing cores – dynamically specialized Some “special-purpose” hardware For more efficient processing Over-provisioning: pool of available (i.e., powered on) resources might change frequently – Now called “dark silicon” Will need to be transparent to software 13

What all is needed? Develop picks and shovels What are the mechanisms to ease software use of diverse hardware? Are we going to have higher level of exceptions/restart? Does the microarchitecture need low level restart? – Precise/non-precise core 14