Trends in the Infrastructure of Computing: Processing, Storage, Bandwidth CSCE 190: Computing in the Modern World Dr. Jason D. Bakos.


Elements

Semiconductors
Silicon is a group IV element with 4 valence electrons (shell capacities are 2, 8, 18, 32, …; silicon's 14 electrons fill them as 2, 8, 4)
–Forms covalent bonds with four neighboring atoms (3D diamond cubic crystal lattice, lattice constant 543 pm)
–Pure Si is a poor conductor, but its conduction characteristics can be altered
–Adding impurities/dopants (each replaces a silicon atom in the lattice) makes it a better conductor:
Group V element (phosphorus/arsenic) => 5 valence electrons
–Leaves an electron free => n-type semiconductor (electrons, negative carriers)
Group III element (boron) => 3 valence electrons
–Borrows an electron from a neighbor => p-type semiconductor (holes, positive carriers)
[Figure: P-N junction under forward bias and reverse bias]
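The 543 pm lattice constant on this slide is enough to estimate how many silicon atoms sit in a cubic centimetre, which puts dopant concentrations in perspective. A minimal sketch, assuming the standard diamond cubic unit cell of 8 atoms; the dopant lookup only encodes the group V / group III rule stated above, and all names are illustrative.

```python
# Sketch: atomic density of silicon from its lattice constant, plus the
# slide's dopant rule. Assumption: diamond cubic cell with 8 atoms per cell.

LATTICE_CONSTANT_PM = 543.0   # unit-cell edge length quoted on the slide
ATOMS_PER_UNIT_CELL = 8       # diamond cubic structure (assumption stated above)
PM_PER_CM = 1e10              # 1 cm = 10^10 pm

# Hypothetical lookup: periodic-table group of common dopants.
DOPANT_GROUP = {"P": 5, "As": 5, "B": 3}


def atoms_per_cm3(a_pm: float = LATTICE_CONSTANT_PM,
                  atoms_per_cell: int = ATOMS_PER_UNIT_CELL) -> float:
    """Return atoms per cubic centimetre for a cubic lattice."""
    a_cm = a_pm / PM_PER_CM
    return atoms_per_cell / a_cm ** 3


def carrier_type(dopant: str) -> str:
    """Group V donates a free electron (n-type); group III leaves a hole (p-type)."""
    group = DOPANT_GROUP[dopant]
    if group == 5:
        return "n-type"
    if group == 3:
        return "p-type"
    raise ValueError(f"{dopant} is not a group III or group V dopant")


print(f"{atoms_per_cm3():.2e} atoms/cm^3")  # prints 5.00e+22 atoms/cm^3
print("P:", carrier_type("P"), " B:", carrier_type("B"))
```

At roughly 5 × 10^22 atoms/cm^3, even a heavily doped region replaces only a tiny fraction of the silicon atoms.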

MOSFETs
[Figure: NMOS/NFET and PMOS/PFET cross-sections. NMOS: body/bulk tied to GROUND; a HIGH positive gate voltage (Vdd) forms the channel and current flows. PMOS: body/bulk tied to HIGH (Vdd); a gate voltage negative relative to the body (GND) forms the channel. In both, the source/drain-to-body junctions are reverse-biased. A shorter channel length (the distance carriers must travel) gives a faster transistor.]
Metal-Oxide-Semiconductor structures (metal or polysilicon gate) built onto the substrate
–Diffusion: inject dopants into the substrate
–Oxidation: form a layer of SiO2 (glass)
–Deposition and etching: add aluminum/copper wires
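At the logic level, the biasing rules above reduce each transistor to a voltage-controlled switch. A minimal idealised sketch, assuming logic 1 = Vdd and logic 0 = GND and ignoring thresholds, leakage, and timing; the function names are hypothetical.

```python
# Sketch: idealised switch-level model of NMOS and PMOS transistors.
# Assumption: logic 1 = Vdd, logic 0 = GND; NMOS body at GND, PMOS body at Vdd.


def nmos_on(gate: int) -> bool:
    """An NMOS switch conducts when its gate is high (Vdd)."""
    return gate == 1


def pmos_on(gate: int) -> bool:
    """A PMOS switch conducts when its gate is low (negative relative to its Vdd body)."""
    return gate == 0


def cmos_inverter(a: int) -> int:
    """PMOS pulls the output up to Vdd; NMOS pulls it down to GND."""
    pull_up = pmos_on(a)
    pull_down = nmos_on(a)
    # For a valid logic input exactly one of the two transistors conducts.
    assert pull_up != pull_down
    return 1 if pull_up else 0


for a in (0, 1):
    print(a, "->", cmos_inverter(a))
```

Because exactly one transistor conducts in either steady state, there is no static path from Vdd to GND, which is the main reason CMOS pairs the two transistor types.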

Layout: 3-Input NAND

Logic Gates
[Figure: inv, NAND2, NAND3, NOR2]
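The gates in this figure follow one pattern: an NMOS pull-down network and a complementary PMOS pull-up network. For NAND the NMOS devices are in series and the PMOS devices in parallel; NOR swaps the two. A minimal switch-level sketch of that idea, assuming ideal switches; function names are illustrative.

```python
# Sketch: CMOS gates as complementary series/parallel switch networks.
# Assumption: ideal switches, logic 1 = Vdd, logic 0 = GND.
from itertools import product


def nmos_on(gate: int) -> bool:
    return gate == 1


def pmos_on(gate: int) -> bool:
    return gate == 0


def nand(*inputs: int) -> int:
    """NAND: NMOS in series (pull-down), PMOS in parallel (pull-up)."""
    pull_down = all(nmos_on(x) for x in inputs)  # series: every NMOS must conduct
    pull_up = any(pmos_on(x) for x in inputs)    # parallel: any PMOS suffices
    assert pull_up != pull_down                  # networks are complementary
    return 0 if pull_down else 1


def nor(*inputs: int) -> int:
    """NOR: NMOS in parallel (pull-down), PMOS in series (pull-up)."""
    pull_down = any(nmos_on(x) for x in inputs)
    pull_up = all(pmos_on(x) for x in inputs)
    assert pull_up != pull_down
    return 0 if pull_down else 1


def inv(a: int) -> int:
    """A one-input NAND is an inverter: one NMOS and one PMOS."""
    return nand(a)


# Truth table for the 3-input NAND shown in the layout slide.
for a, b, c in product((0, 1), repeat=3):
    print(a, b, c, "->", nand(a, b, c))
```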

Logic Synthesis
Behavior:
–S = A + B
–Assume A is 2 bits, B is 2 bits, S is 3 bits

A       B       S
00 (0)  00 (0)  000 (0)
00 (0)  01 (1)  001 (1)
00 (0)  10 (2)  010 (2)
00 (0)  11 (3)  011 (3)
01 (1)  00 (0)  001 (1)
01 (1)  01 (1)  010 (2)
01 (1)  10 (2)  011 (3)
01 (1)  11 (3)  100 (4)
10 (2)  00 (0)  010 (2)
10 (2)  01 (1)  011 (3)
10 (2)  10 (2)  100 (4)
10 (2)  11 (3)  101 (5)
11 (3)  00 (0)  011 (3)
11 (3)  01 (1)  100 (4)
11 (3)  10 (2)  101 (5)
11 (3)  11 (3)  110 (6)
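Synthesis turns a behavioral description like S = A + B into a network of gates that reproduces the truth table. One possible gate-level result is a 2-bit ripple-carry adder built from two full adders; the sketch below assumes that structure (a synthesis tool may produce a different but equivalent netlist) and checks it against every row of the table.

```python
# Sketch: a gate-level 2-bit ripple-carry adder checked against the
# truth table for S = A + B. The ripple-carry structure is an assumption;
# a real synthesis tool may choose a different equivalent netlist.
from itertools import product


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder from XOR/AND/OR gates; returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def add2(a: int, b: int) -> int:
    """Add two 2-bit values; the result fits in 3 bits."""
    s0, c1 = full_adder(a & 1, b & 1, 0)
    s1, c2 = full_adder((a >> 1) & 1, (b >> 1) & 1, c1)
    return (c2 << 2) | (s1 << 1) | s0


for a, b in product(range(4), repeat=2):
    s = add2(a, b)
    assert s == a + b  # gate network matches the behavioral spec
    print(f"{a:02b} ({a})  {b:02b} ({b})  {s:03b} ({s})")
```

The printed rows reproduce the table above in the same order.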

MIPS Microarchitecture

Synthesized and Placed-and-Routed (P&R) MIPS Architecture

Feature Size
Shrinking the minimum feature size…
–Smaller channel length L decreases carrier transit time and increases drive current
–Therefore, channel width W may also be reduced for a fixed current
–Gate, source, and drain capacitances (Cg, Cs, Cd) are reduced
–The transistor switches faster (roughly linear in feature size)
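The slide's reasoning can be made quantitative with the classic constant-field (Dennard) scaling model, which is an assumption brought in here rather than something the slide states: if every dimension and the supply voltage shrink by a factor k, capacitance and delay fall by 1/k, density rises by k², and power per transistor falls by 1/k². A minimal sketch; names are illustrative.

```python
# Sketch: first-order constant-field (Dennard) scaling between two process
# nodes. Assumption: W, L, oxide thickness and Vdd all scale by 1/k, and
# gate delay is linear in feature size, as the slide suggests.
from dataclasses import dataclass


@dataclass
class ScalingResult:
    k: float                     # linear shrink factor (old / new feature size)
    gate_capacitance: float      # relative to the old node
    gate_delay: float            # relative to the old node
    transistor_density: float    # transistors per unit area, relative
    power_per_transistor: float  # relative, at the scaled clock rate


def constant_field_scaling(old_nm: float, new_nm: float) -> ScalingResult:
    k = old_nm / new_nm
    return ScalingResult(
        k=k,
        gate_capacitance=1 / k,      # C ~ W*L/t_ox -> (1/k)(1/k)/(1/k)
        gate_delay=1 / k,            # linear relationship from the slide
        transistor_density=k ** 2,   # both area dimensions shrink
        power_per_transistor=1 / k ** 2,  # C*V^2*f -> (1/k)(1/k^2)(k)
    )


# Example: 0.18 um (Pentium 4) to 0.09 um (Pentium D) from the next table.
print(constant_field_scaling(180, 90))
```

In practice supply voltage stopped scaling with feature size in the mid-2000s, so real chips did not see the full 1/k² power reduction; this is one reason the table below shifts from clock speed to core count.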

Minimum Feature Size
Year  Processor       Speed             Transistors    Process
1982  i286            … MHz             ~134,000       … µm
1986  i386            16 – 40 MHz       ~270,000       1 µm
1989  i486            … MHz             ~1 million     0.8 µm
1993  Pentium         … MHz             ~3 million     0.6 µm
1995  Pentium Pro     … MHz             ~4 million     0.5 µm
1997  Pentium II      … MHz             ~5 million     0.35 µm
1999  Pentium III     450 – 1400 MHz    ~10 million    0.25 µm
2000  Pentium 4       1.3 – 3.8 GHz     ~50 million    0.18 µm
2005  Pentium D       2 cores/package   ~200 million   0.09 µm
2006  Core 2          2 cores/die       ~300 million   0.065 µm
2008  Core i7         4 cores/die       ~800 million   0.045 µm
2010  "Sandy Bridge"  8 cores/die       ??             0.032 µm
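The transistor counts in this table can be used to estimate the doubling time behind Moore's law. A minimal sketch that fits a least-squares line to log2(transistor count) versus year; it uses only the rows with a transistor count and treats the "~" figures as exact, which is an approximation.

```python
# Sketch: estimate the transistor-count doubling time from the table's
# approximate figures with a least-squares fit of log2(count) vs. year.
import math

# (year, approximate transistor count) taken from the table
DATA = [
    (1982, 134e3), (1986, 270e3), (1989, 1e6), (1993, 3e6),
    (1995, 4e6), (1997, 5e6), (1999, 10e6), (2000, 50e6),
    (2005, 200e6), (2006, 300e6), (2008, 800e6),
]


def doubling_time_years(data: list[tuple[int, float]]) -> float:
    """Return 1/slope of the best-fit line through (year, log2(count))."""
    xs = [year for year, _ in data]
    ys = [math.log2(count) for _, count in data]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var  # doublings per year
    return 1 / slope


print(f"doubling time: {doubling_time_years(DATA):.1f} years")
```

On these approximate figures the fit comes out close to two years per doubling.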

Heterogeneous Computing: Execution Model
[Figure: instructions executed over time. Initialization: 0.5% of run time, 49% of code. "Hot" loop: 99% of run time, 1% of code, offloaded to the co-processor. Clean up: 0.5% of run time, 49% of code.]

Co-Processor Design
[Figure: FPGA design]

HC Execution Model
[Figure: host side: CPU, host memory (~25 GB/s), and X58 chipset connected by QPI; add-in card: co-processor with on-board memory (?????; ~100 GB/s for a GeForce 260), attached to the host over PCIe (~8 GB/s, x16).]
–In general, a co-processor can achieve 10x – 1000x the computational throughput of a CPU
–But you pay a penalty for transferring data between host memory and on-board memory
–The add-in card can have an arbitrary amount of memory bandwidth, since it can use a proprietary memory interface
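The transfer penalty can wipe out a large kernel speedup when the kernel moves a lot of data relative to the computation it performs. A minimal sketch, assuming transfers and computation do not overlap and using the slide's ~8 GB/s PCIe x16 figure as the default link bandwidth; function names and the example sizes are hypothetical.

```python
# Sketch: effective speedup of offloading one kernel, including PCIe
# transfers. Assumptions: transfers do not overlap with computation, and
# the link sustains the slide's ~8 GB/s in both directions.


def offload_time_s(cpu_kernel_time_s: float, kernel_speedup: float,
                   bytes_in: float, bytes_out: float,
                   link_gb_per_s: float = 8.0) -> float:
    """Co-processor compute time plus host<->card transfer time, in seconds."""
    transfer_s = (bytes_in + bytes_out) / (link_gb_per_s * 1e9)
    return cpu_kernel_time_s / kernel_speedup + transfer_s


def effective_speedup(cpu_kernel_time_s: float, kernel_speedup: float,
                      bytes_in: float, bytes_out: float,
                      link_gb_per_s: float = 8.0) -> float:
    return cpu_kernel_time_s / offload_time_s(
        cpu_kernel_time_s, kernel_speedup, bytes_in, bytes_out, link_gb_per_s)


# Hypothetical kernel: 1 s on the CPU, 100x faster on the co-processor.
small = effective_speedup(1.0, 100, 40e6, 40e6)  # 80 MB moved
large = effective_speedup(1.0, 100, 4e9, 4e9)    # 8 GB moved
print(f"80 MB transferred: {small:.1f}x   8 GB transferred: {large:.2f}x")
```

With 80 MB of traffic the transfer time equals the compute time and the 100x kernel becomes 50x overall; with 8 GB the offload is slower than staying on the CPU.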

HC Execution Model

Heterogeneous Computing: Performance
Example:
–Application requires a week of CPU time
–One computation consumes 99% of execution time
[Table: kernel speedup vs. application speedup and execution time (hours)]
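The relationship between kernel speedup and application speedup in this example is Amdahl's law: the 1% of run time that stays on the CPU caps the overall speedup at 100x no matter how fast the kernel becomes. A minimal sketch using the slide's one-week (168-hour) run and 99% hot fraction; the kernel speedups in the loop are illustrative values chosen here, not the slide's.

```python
# Sketch: Amdahl's law for the slide's example (1 week of CPU time,
# 99% spent in one kernel). The kernel speedups listed are illustrative.

HOURS_PER_WEEK = 7 * 24  # 168 hours
KERNEL_FRACTION = 0.99   # share of run time spent in the accelerated kernel


def application_speedup(kernel_fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when only kernel_fraction of run time is accelerated."""
    return 1 / ((1 - kernel_fraction) + kernel_fraction / kernel_speedup)


for s in (1, 10, 100, 1000):
    app = application_speedup(KERNEL_FRACTION, s)
    hours = HOURS_PER_WEEK / app
    print(f"kernel {s:>5}x -> application {app:6.1f}x, {hours:7.2f} hours")
```

Even a 1000x kernel leaves about 1.85 hours, most of it the unaccelerated 1%.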

Heterogeneous Computing with FPGAs

Programming FPGAs

Heterogeneous Computing with GPUs
Graphics Processing Unit (GPU)
–Contains hundreds of small processor cores grouped hierarchically
–Has high bandwidth to on-board memory and to host memory
–Became general-purpose "programmable" only about two years before this lecture
–Gained hardware double-precision support about one year before this lecture
Examples: nVidia GeForce, AMD FireStream; the IBM Cell is a related multicore accelerator rather than a GPU
Advantages over FPGAs:
–Easier to program
–Less expensive (gamers drove high volumes, decreasing cost)
Drawbacks:
–Can't necessarily outperform FPGAs for all types of computations
–Characterizing which computations favor which platform is an open research problem
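The hierarchical grouping of GPU cores is exposed to programmers (for example in NVIDIA's CUDA) as a grid of thread blocks, where each thread computes its own global index as block index × block size + thread index. The Python sketch below only emulates that indexing scheme sequentially for a SAXPY kernel (y = a·x + y); `launch` and `saxpy_kernel` are hypothetical names, and nothing here runs on a GPU.

```python
# Sketch: sequential emulation of GPU-style hierarchical thread indexing.
# Nothing runs in parallel; launch() and saxpy_kernel() are hypothetical.
from typing import Callable


def launch(kernel: Callable[..., None], grid_dim: int, block_dim: int,
           *args) -> None:
    """Run kernel once per (block, thread) pair of a 1-D grid."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


def saxpy_kernel(block_idx: int, block_dim: int, thread_idx: int,
                 n: int, a: float, x: list[float], y: list[float]) -> None:
    i = block_idx * block_dim + thread_idx  # global thread index
    if i < n:  # guard: the last block may contain spare threads
        y[i] = a * x[i] + y[i]


n = 1000
x = [1.0] * n
y = [2.0] * n
block = 256
grid = (n + block - 1) // block  # round up so every element gets a thread
launch(saxpy_kernel, grid, block, n, 3.0, x, y)
print(grid, y[0], y[-1])
```

Each element is handled by exactly one thread, which is what lets real hardware run the blocks in parallel across its cores.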

NVIDIA GPU Architecture

IBM Cell Architecture

Heterogeneous Computing now Mainstream: IBM Roadrunner
–Los Alamos; fastest computer in the world at the time of this lecture
–6,480 AMD Opteron (dual-core) CPUs
–12,960 PowerXCell 8i processors (Cell accelerators, not GPUs)
–Each blade contains 2 Opterons and 4 Cells
–296 racks
–1.71 petaflops peak (1.71 × 10^15 floating-point operations per second)
–2.35 MW (not including cooling)
–Lake Murray hydroelectric plant produces ~150 MW (peak)
–Lake Murray coal plant (McMeekin Station) produces ~300 MW (peak)
–Catawba Nuclear Station near Rock Hill produces 2258 MW
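The numbers on this slide support two quick calculations: energy efficiency at peak, and what share of each nearby plant's output Roadrunner would draw. A minimal sketch using only the slide's figures; like the slide, it excludes cooling power.

```python
# Sketch: Roadrunner's peak efficiency and its power draw relative to the
# plants listed on the slide. All inputs are the slide's figures; cooling
# power is excluded, as on the slide.

PEAK_FLOPS = 1.71e15  # 1.71 petaflops peak
POWER_MW = 2.35       # megawatts, not including cooling

PLANTS_MW = {
    "Lake Murray hydroelectric (peak)": 150,
    "McMeekin Station (peak)": 300,
    "Catawba Nuclear Station": 2258,
}

mflops_per_watt = PEAK_FLOPS / (POWER_MW * 1e6) / 1e6
print(f"{mflops_per_watt:.0f} MFLOPS per watt at peak")  # prints 728

for name, plant_mw in PLANTS_MW.items():
    print(f"{name}: {POWER_MW / plant_mw:.2%} of its output")
```

Roadrunner alone would consume about 1.6% of the hydroelectric plant's peak output.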

Our Group
Past projects:
–Custom FPGA accelerators and components: computational biology, linear algebra
–Multi-FPGA interconnection networks: interface abstractions, adaptive routing algorithms, on-chip router designs
Current projects:
–Design tools: dynamic code analysis, semi-automatic accelerator generation
–GPU simulation and emulation for code tuning