IRAM Vision Microprocessor & DRAM on a single chip:

Presentation transcript:

IRAM Vision
Microprocessor & DRAM on a single chip:
- on-chip memory latency 5-10X, bandwidth 50-100X
- improve energy efficiency 2X-4X (no off-chip bus)
- serial I/O 5-10X v. buses
- smaller board area/volume
- adjustable memory size/width
[Figure labels: Proc, $, L2$, Bus, I/O, logic fab; DRAM, DRAM fab]
Notes: $B for separate lines for logic and memory. Single chip: either processor in DRAM or memory in logic fab.

IRAM Update
- 2 test chips: serial lines (MOSIS) + embedded DRAM/crossbar (LG Semicon)
- Simulator/Architecture Manual completed
- Initial compiler (“VIC”) completed
- Partner for scalar processor (Sandcraft/MIPS)
- LG delays, prospects => stick to plan to re-evaluate options for IRAM prototype
- Foundry: TSMC, UMC
- DRAM companies: IBM, Micron, NEC, Toshiba
- Applications: FFT, segmentation, ...

IRAM App: ISTORE (“Intelligent Storage”)
- 1 IRAM/DRAM + crossbar switch + fast serial link v. conventional SMP
- Move function to data v. data to CPU
[Figure labels: Proc, $, L2$, Bus, I/O, Conventional CPU; IRAM, cross bar]
Notes:
- How does TPC-D scale with dataset size? Compare NCR 5100M 20 node system (each node is 8 133 MHz Pentium CPUs), March 28, 1997; 100 GB, 300 GB, 1000 GB.
- Per 19 queries, all but 2 go up linearly with database size: (3-5 vs 300, 7-15 vs. 1000), e.g., interval time ratios 300/100 = 3.35; 1000/100 = 9.98; 1000/300 = 2.97.
- How much memory for IBM SP2 node? 100 GB: 12 processors with 24 GB; 300 GB: 128 thin nodes with 32 GB total; 256 MB/node (2 boards/processor).
- TPC-D is business analysis vs. business operation.
- 17 read only queries; results in queries per Gigabyte Hour.
- Scale Factor (SF) multiplies each portion of the data: 10 to 10000.
- SF 10 is about 10 GB; indices + temp table increase 3X - 5X.
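
The notes above quote interval-time ratios for the 100 GB, 300 GB, and 1000 GB TPC-D runs and describe the growth as linear. As a rough check, the sketch below divides each quoted time ratio by the matching dataset-size ratio; a result near 1.0 means query time grew in proportion to the data. The three ratios come from the notes; the function name `scaling_report` and the dictionary layout are illustrative choices, not part of the original slides.

```python
# Interval-time ratios quoted in the slide notes (NCR 5100M, TPC-D).
# Keys are (larger dataset in GB, smaller dataset in GB).
measured = {
    (300, 100): 3.35,
    (1000, 100): 9.98,
    (1000, 300): 2.97,
}


def scaling_report(measured_ratios):
    """Compare each quoted time ratio with the ratio of dataset sizes."""
    rows = []
    for (big, small), time_ratio in measured_ratios.items():
        size_ratio = big / small
        # time_ratio / size_ratio == 1.0 would be perfectly linear scaling.
        rows.append((big, small, size_ratio, time_ratio, time_ratio / size_ratio))
    return rows


for big, small, size_ratio, time_ratio, rel in scaling_report(measured):
    print(f"{big} GB vs {small} GB: size x{size_ratio:.2f}, "
          f"time x{time_ratio:.2f}, time/size = {rel:.2f}")
```

All three time/size values land within roughly 12% of 1.0, which is consistent with the "go up linearly" reading in the notes.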

Another Vision of ISTORE
- 1 IRAM/disk + xbar + fast serial link v. conventional SMP, cluster
- Network latency = f(SW overhead), not link distance
- Move function to data v. data to CPU (scan, sort, join, ...)
- Cost/performance, more scalable
[Figure labels: CPU/Memory, cross bar, IRAM ...]

ISTORE Update
- Build prototypes to gain experience, develop software before IRAM chips arrive
- Replace with IRAM chips once available
- ISTORE-0: 2 Sandcraft development boards + Fast Ethernet + real-time OS
- ISTORE-1: Design small board (CPU, DRAM, Ethernet) and place inside disk enclosure, build 64-128 node system (Ethernet switch)
- ISTORE-2: “Intelligent SIMM” module based on Mitsubishi M32RXD (DRAM interface + CPU)

IRAM/ISTORE Schedule
[Schedule chart; rows: IRAM, ISTORE/OS, Compiler]

1998 IRAM/ISTORE Presentations, Articles
- MicroDesign Resources Dinner Meeting, 1/8/98
- Embedded Memory Workshop, Japan, 3/15/98
- Stanford Computer Science Colloquium, 5/6/98
- University of Virginia Distinguished Lecture, 5/19/98
- SIGMOD98 Keynote Address, 6/3/98
Articles
- “New Processor Paradigm: V-IRAM”, Microprocessor Report, 3/9/98, 17-19.
- “A perfect match.” New Scientist, 4/18/98, 36-39.
- “Professor's Idea for Speedy Chip Could Be More Than Academic,” Wall Street Journal, 8/28/98, B1, B4.

VIRAM-1 Specs/Goals
Technology: 0.18-0.20 micron, 5-6 metal layers, fast xtor
Memory: 16-32 MB
Die size: ≈ 250-300 mm^2
Vector pipes/lanes: 4 64-bit (or 8 32-bit or 16 16-bit)

Target               Low Power                      High Performance
Serial I/O           4 lines @ 1 Gbit/s             8 lines @ 2 Gbit/s
Power (university)   ≈2 W @ 1-1.5 volt logic        ≈10 W @ 1.5-2 volt logic
Clock (university)   200 scalar / 200 vector MHz    300 scalar / 300 vector MHz
Perf (university)    1.6 GFLOPS_64 - 6 GOPS_16      2.4 GFLOPS_64 - 10 GOPS_16
Power (industry)     ≈1 W @ 1-1.5 volt logic        ≈10 W @ 1.5-2 volt logic
Clock (industry)     400 scalar / 400 vector MHz    600 scalar / 600 vector MHz
Perf (industry)      3.2 GFLOPS_64 - 12 GOPS_16     4 GFLOPS_64 - 16 GOPS_16
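
The performance figures in the specs can be roughly related to the lane counts and clock rates. The sketch below assumes each lane completes 2 operations per cycle (for example a fused multiply-add) and reads the 64 and 16 suffixes as operand widths in bits; neither assumption is stated in the slide, and `peak_gops` is a hypothetical helper. Under that model the 200, 300, and 400 MHz figures roughly match, while the 600 MHz column quotes lower numbers than the model gives, so treat it as a back-of-envelope estimate only.

```python
def peak_gops(lanes, clock_mhz, ops_per_lane_per_cycle=2):
    """Peak throughput in giga-operations per second.

    ops_per_lane_per_cycle=2 is an assumption (e.g. multiply-add),
    not a figure taken from the slide.
    """
    return lanes * clock_mhz * ops_per_lane_per_cycle / 1000.0


LANES_64BIT = 4    # "4 64-bit" lanes from the spec
LANES_16BIT = 16   # "16 16-bit" lanes from the spec

# Clock in MHz -> (quoted GFLOPS at 64-bit, quoted GOPS at 16-bit) from the slide.
quoted = {200: (1.6, 6), 300: (2.4, 10), 400: (3.2, 12), 600: (4, 16)}

for clock_mhz, (gflops64, gops16) in quoted.items():
    est64 = peak_gops(LANES_64BIT, clock_mhz)
    est16 = peak_gops(LANES_16BIT, clock_mhz)
    print(f"{clock_mhz} MHz: model {est64:.1f} / {est16:.1f}, "
          f"slide {gflops64} / {gops16}")
```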