The 50B Transistor Challenge
Mikko Lipasti
Department of Electrical and Computer Engineering, University of Wisconsin - Madison
IBM T.J. Watson Research Center, July 22 and 23, 2008

50B Transistors on a Chip?
- History: 1997 IEEE Computer special issue on reaching 1B transistors/chip
  - Papers advocating a single fast core: CMU, Michigan, Wisconsin
  - IRAM: Berkeley
  - RAW: MIT
  - SMT: Washington
  - Multicore: Stanford
- 11 years later, 50x more transistors
  - We still need faster cores (computation), but are fundamentally constrained by power
  - We will get more than one core (communication): need efficient interconnects and coherent caches
  - We will get lots of on-chip memory: need to think about new algorithms and new approaches to use it

(1) What Will We Do With 50B Transistors?
- 50B transistors/chip dramatically alters data centers
- E.g., Nokia is moving aggressively into services
  - Google, Yahoo, and MSN each provision ~1M servers
  - Now must provision for a 10x larger installed base (phones vs. PCs); witness recent problems with iPhone/MobileMe
- Impossible to anticipate applications
  - YouTube/Facebook/Flickr/Twitter
  - Unstructured real-world data
  - Organize, search, extract semantic knowledge, mashups, ...
- Existing and future server apps all benefit

(2) How Will We Design Chips with 50B Transistors?
- Three things that processors need to be good at:
  - Computation
  - Communication
  - Storage/Memory
- Focus on the cost and nature of computation
- Focus on the cost of communication
- Shift emphasis to memory

Cost of Computation
- Less than 10% of energy is spent on useful work
  - EPI (energy per instruction) overhead has gotten out of hand
  - Need to rethink operand delivery [ICCD'07], queues [ISLPED'07], caches, register files, control, ...
- Exploit program attributes
  - Solve hard problems via elimination: macro-ops, no single-cycle operations [MICRO'03, HPCA'06]
  - Do the hard parts with narrow values [JILP'07]
- Eliminate redundancy and excessive pipelining
  - Clever clock gating [ISLPED'06, ICCD'07]
  - Remove renaming, register file, clocked scheduler, pipelines [submitted]
- Goal: reduce EPI by 10x at fixed process technology and MIPS (see the worked relation below)
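To make the 10x EPI goal concrete, here is a back-of-the-envelope relation (my illustration with assumed round numbers, not figures from the talk): chip power is energy per instruction times instruction throughput, so cutting EPI by 10x at fixed MIPS cuts compute power by 10x, or equivalently buys roughly 10x more throughput in the same power envelope.

```latex
% Hedged back-of-the-envelope; the 10 nJ/inst and 10^10 inst/s figures are
% assumed round numbers for illustration, not data from the slides.
\begin{align*}
P &= \mathrm{EPI} \times \mathrm{IPS} \\
\text{baseline:}\quad P_0 &= 10\,\mathrm{nJ/inst} \times 10^{10}\,\mathrm{inst/s} = 100\,\mathrm{W} \\
\text{after a } 10\times \text{ EPI reduction:}\quad P_1 &= 1\,\mathrm{nJ/inst} \times 10^{10}\,\mathrm{inst/s} = 10\,\mathrm{W}
\end{align*}
```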

Cost of Communication
- Reduce coherence overhead and speculation
  - Region coherence [ISCA'05, ASPLOS'06, HPCA'08] (a minimal sketch of the idea follows this slide)
- Exploit locality of communication patterns
  - Switched circuits [CA Letters'07, NOCS'08]
  - On-chip multicasting [ISCA'08]
  - Multicast coherence [submitted]
- New technologies
  - Nanophotonic rings [HP Labs collaboration]: massive bandwidth, speed-of-light latency
  - Lots of interesting problems to solve
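To illustrate what region coherence buys, here is a minimal Python sketch of the general idea (the RegionTracker name, the 4 KB region size, and the sharer-set bookkeeping are my assumptions for illustration; the published protocols cited above are considerably more detailed): coherence is tracked at a coarse region granularity, so a miss to a region no other core touches can skip the global snoop entirely.

```python
# Illustrative sketch of coarse-grain (region) coherence tracking:
# record which cores touch each large region so that misses to private
# regions can skip broadcast snoops. Names and sizes are assumptions.

REGION_SIZE = 4096  # assumed region granularity (bytes)

class RegionTracker:
    def __init__(self):
        # region base address -> set of core ids believed to cache lines in it
        self.sharers = {}

    def _region(self, addr):
        return addr & ~(REGION_SIZE - 1)

    def needs_broadcast(self, core_id, addr):
        """A miss needs a global snoop only if another core may hold
        lines in the same region."""
        others = self.sharers.get(self._region(addr), set()) - {core_id}
        return len(others) > 0

    def record_access(self, core_id, addr):
        self.sharers.setdefault(self._region(addr), set()).add(core_id)

    def record_eviction(self, core_id, addr):
        # Conservative: a real design would count lines per region or
        # invalidate lazily before clearing a core from the sharer set.
        self.sharers.get(self._region(addr), set()).discard(core_id)


# Usage: a region private to the requester avoids the snoop broadcast.
tracker = RegionTracker()
tracker.record_access(core_id=0, addr=0x1000)
print(tracker.needs_broadcast(core_id=0, addr=0x1040))  # False: region is private
print(tracker.needs_broadcast(core_id=1, addr=0x1040))  # True: core 0 may share it
```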

Emphasis on Memory
- In future processes, memory will be easier than logic
  - Reliability, variability: well-known solutions (ECC, sparing)
  - Interesting new technologies (PCRAM, etc.)
  - Not caches: diminishing returns
- Return to more regular, "memory-like" devices and logic?
  - Gate arrays, LUTs, PLAs
- The majority of 50B transistors must not be switching
  - Remembering is cheaper than computing: revisit value locality/reuse/memoization? (see the sketch below)
  - New search algorithms: TCAM accelerator [ICCD'08]: logic in memory, but not IRAM!
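As a software analogue of "remembering is cheaper than computing," here is a minimal memoization sketch (my illustration, not a mechanism from the talk): previously computed results sit in a bounded table and are reused when operands repeat, the same storage-for-computation trade the slide proposes to revisit in hardware as value locality/reuse.

```python
# Minimal memoization sketch: trade cheap storage for expensive computation.
# Illustrative software analogue of hardware value reuse, not from the talk.
from functools import lru_cache

@lru_cache(maxsize=4096)   # "remembering": bounded result table
def expensive(x: int) -> int:
    # Stand-in for a costly computation whose inputs repeat often.
    total = 0
    for i in range(1, x + 1):
        total += i * i
    return total

# Repeated operands hit the table instead of re-executing the loop,
# mirroring value locality: a few distinct inputs dominate dynamic behavior.
for x in [10, 20, 10, 10, 20]:
    expensive(x)
print(expensive.cache_info())  # hits=3, misses=2
```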

Unstructured Real-World Data
- The Internet is exploding with data
  - Text
  - Semantic knowledge
  - Photo, video, audio
- It is all in digital form, but all we can do is view and copy it
- Algorithms for analysis range from poor to nonexistent
  - Machine learning?
- Why not learn from nature?

Brains
- Human brain vs. von Neumann machine
  - Face recognition: <500 ms
  - Neurons are slow: the critical path is a handful of "gates" (a back-of-the-envelope follows this slide)
  - Fundamentally different computational model
- Made of shoddy, unreliable parts
  - "…neurons are noisy, unreliable devices, … the nervous system averages over many cells to compensate for these shoddy components." - Christof Koch
- We can build it. We have the technology.
(From the MICRO-40 panel "Computing Beyond Von Neumann," Dec. 3, 2007)
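The "critical path is a handful of gates" point follows from simple arithmetic; the 1-10 ms neuron response time used below is an assumed, commonly cited order of magnitude rather than a number from the slide.

```latex
% Back-of-the-envelope behind "neurons are slow, yet recognition is fast".
\[
  \text{serial depth} \;\approx\; \frac{T_{\text{task}}}{t_{\text{neuron}}}
  \;=\; \frac{500\,\text{ms}}{1\text{--}10\,\text{ms}}
  \;\approx\; 50\text{--}500 \text{ sequential steps,}
\]
\[
  \text{versus}\quad 500\,\text{ms} \times 3\,\text{GHz} \;=\; 1.5\times 10^{9}
  \text{ cycles available to a conventional processor.}
\]
```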

Brains (2)
- Human neocortex:
  - ~20B neurons, ~200T synapses
  - Structurally homogeneous
  - Hypothesis: runs a common algorithm
- Apply Architecture 101?
  - Abstraction layers
  - Hierarchy and replication
  - Simulation/analysis/synthesis
  - Massively parallel, fault-tolerant hardware
- Best news: no need for parallel programming
  - Train vs. program
  - Let's build brains!
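A quick scale check using the slide's own neuron and synapse counts shows how richly connected each "device" is compared with a CMOS gate:

```latex
% Average connectivity implied by ~20B neurons and ~200T synapses.
\[
  \frac{2\times 10^{14}\ \text{synapses}}{2\times 10^{10}\ \text{neurons}}
  \;\approx\; 10^{4}\ \text{synapses per neuron on average,}
\]
% versus a fan-in of only a few inputs for a typical CMOS logic gate.
```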

Summary
- Computation
  - Reduce cost (EPI) by 10x
  - New algorithms
- Communication
  - Streamline coherence protocols and interconnects
  - Exploit new technologies
- Storage/Memory
  - Reliability/variability
  - Logic in memory/new algorithms
- Brain computing for unstructured real-world data

Questions?