Published by Yuliani Vera Susman. Modified over 5 years ago.
1
IRAM Vision
Microprocessor & DRAM on a single chip:
  on-chip memory latency 5-10X, bandwidth X
  improve energy efficiency 2X-4X (no off-chip bus)
  serial I/O 5-10X v. buses
  smaller board area/volume
  adjustable memory size/width
$B for separate lines for logic and memory
Single chip: either processor in DRAM or memory in logic fab
[Figure: conventional system built in a logic fab (Proc, $, L2$, Bus, I/O) with separate DRAM chips, versus a single IRAM chip built in a DRAM fab (Proc, Bus, DRAM, I/O)]
2
IRAM Update
2 test chips: serial lines (MOSIS) + Embedded DRAM/Crossbar (LG Semicon)
Simulator/Architecture Manual Completed
Initial Compiler (“VIC”) Completed
Partner for scalar processor (Sandcraft/MIPS)
LG delays, prospects => stick to plan to re-evaluate options for IRAM prototype
  Foundry: TSMC, UMC
  DRAM companies: IBM, Micron, NEC, Toshiba
Applications: FFT, segmentation, ...
3
IRAM App: ISTORE (“Intelligent Storage”)
1 IRAM/DRAM + crossbar switch + fast serial link v. conventional SMP
Move function to data v. data to CPU
[Figure: conventional CPU (Proc, $, L2$, Bus, I/O) versus IRAM nodes connected by a crossbar]
How does TPC-D scale with dataset size?
  Compare NCR 5100M 20 node system (each node is MHz Pentium CPUs), March 28, 1997; 100 GB, 300 GB, 1000 GB
  Per 19 queries, all but 2 go up linearly with database size: (3-5 vs 300, 7-15 vs. 1000)
  e.g., interval time ratios 300/100 = 3.35; 1000/100 = 9.98; 1000/300 = 2.97
How much memory for IBM SP2 node?
  100 GB: 12 processors with 24 GB
  300 GB: 128 thin nodes with 32 GB total; 256 MB/node (2 boards/processor)
TPC-D is business analysis vs. business operation
  17 read only queries; results in queries per Gigabyte Hour
  Scale Factor (SF) multiplies each portion of the data: 10 to 10000
  SF 10 is about 10 GB; indices + temp table increase 3X - 5X
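The scaling figures quoted in the TPC-D notes can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are made up for this example, and it assumes "increase 3X - 5X" means total storage is 3 to 5 times the raw data, which the notes do not state explicitly.

```python
# Interval-time ratios quoted in the notes, keyed by (larger GB, smaller GB).
TIME_RATIOS = {(300, 100): 3.35, (1000, 100): 9.98, (1000, 300): 2.97}


def scaling_report(time_ratios):
    """Map each size pair to (size ratio, time ratio, time ratio / size ratio).

    A last value near 1.0 means query time grew linearly with database size.
    """
    report = {}
    for (big_gb, small_gb), time_ratio in time_ratios.items():
        size_ratio = big_gb / small_gb
        report[(big_gb, small_gb)] = (size_ratio, time_ratio, time_ratio / size_ratio)
    return report


def memory_per_node_mb(total_gb, nodes):
    """Memory per node in MB, using 1 GB = 1024 MB."""
    return total_gb * 1024 / nodes


def storage_range_gb(scale_factor, low=3, high=5):
    """Estimated total storage for a scale factor.

    Assumes about 1 GB of raw data per unit of SF (SF 10 is about 10 GB)
    and that indices plus temp tables bring the total to low..high times that.
    """
    return scale_factor * low, scale_factor * high


if __name__ == "__main__":
    for pair, (size, time, norm) in scaling_report(TIME_RATIOS).items():
        print(pair, f"size x{size:.2f}, time x{time:.2f}, normalized {norm:.2f}")
    print("SP2 300 GB config:", memory_per_node_mb(32, 128), "MB/node")
    print("SF 10 storage estimate (GB):", storage_range_gb(10))
```

The normalized values come out at roughly 1.12, 1.00, and 0.89, which is consistent with the "goes up linearly" claim, and 32 GB spread over 128 thin nodes matches the 256 MB/node figure.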
4
Another Vision of ISTORE
1 IRAM/disk + xbar + fast serial link v. conventional SMP, cluster
Network latency = f(SW overhead), not link distance
Move function to data v. data to CPU (scan, sort, join,...)
Cost/performance, more scalable
[Figure: CPU/Memory node above a hierarchy of crossbars, each connecting several IRAM nodes]
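To make "move function to data v. data to CPU" concrete, here is a toy sketch that counts how many bytes cross the interconnect for a filtering scan in each style. Everything in it is hypothetical: the `Node` class, the fixed `RECORD_BYTES` size, and the integer "records" are stand-ins for illustration, not part of ISTORE.

```python
from dataclasses import dataclass

RECORD_BYTES = 100  # hypothetical fixed record size


@dataclass
class Node:
    """One storage node holding a partition of the data (integers stand in for rows)."""
    records: list


def make_nodes(num_nodes, records_per_node):
    """Partition the integers 0..N-1 evenly across num_nodes nodes."""
    return [
        Node(list(range(i * records_per_node, (i + 1) * records_per_node)))
        for i in range(num_nodes)
    ]


def scan_at_cpu(nodes, predicate):
    """Data to CPU: ship every record to a central processor, then filter there."""
    bytes_moved = 0
    hits = []
    for node in nodes:
        bytes_moved += len(node.records) * RECORD_BYTES
        hits.extend(r for r in node.records if predicate(r))
    return hits, bytes_moved


def scan_at_nodes(nodes, predicate):
    """Function to data: filter on each node; only matches cross the interconnect."""
    bytes_moved = 0
    hits = []
    for node in nodes:
        local_hits = [r for r in node.records if predicate(r)]
        bytes_moved += len(local_hits) * RECORD_BYTES
        hits.extend(local_hits)
    return hits, bytes_moved


if __name__ == "__main__":
    nodes = make_nodes(num_nodes=4, records_per_node=1000)
    selective = lambda r: r % 100 == 0  # keeps 1 record in 100
    _, central_bytes = scan_at_cpu(nodes, selective)
    _, local_bytes = scan_at_nodes(nodes, selective)
    print("bytes moved, data to CPU:", central_bytes)
    print("bytes moved, function to data:", local_bytes)
```

Both scans return the same rows; the difference is traffic, which shrinks in proportion to the selectivity of the predicate when the filter runs next to the data.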
5
ISTORE Update
Build prototypes to gain experience, develop software before IRAM chips arrive
Replace with IRAM chips once available
ISTORE-0: 2 Sandcraft Development boards + Fast Ethernet + Real-time OS
ISTORE-1: Design small board (CPU, DRAM, Ethernet) and place inside disk enclosure, build node system (Ethernet switch)
ISTORE-2: “Intelligent SIMM” module based on Mitsubishi M32RXD (DRAM interface+CPU)
6
IRAM/ISTORE Schedule
[Figure: schedule chart with rows for IRAM, ISTORE/OS, and Compiler]
7
1998 IRAM/ISTORE Presentations
MicroDesign Resources Dinner Meeting, 1/8/98
Embedded Memory Workshop, Japan, 3/15/98
Stanford Computer Science Colloquium, 5/6/98
University of Virginia Distinguished Lecture, 5/19/98
SIGMOD98 Keynote Address, 6/3/98
Articles
“New Processor Paradigm: V-IRAM,” Microprocessor Report, 3/9/98
“A perfect match,” New Scientist, 4/18/98
“Professor's Idea for Speedy Chip Could Be More Than Academic,” Wall Street Journal, 8/28/98, B1, B4
8
VIRAM-1 Specs/Goals
Technology: micron, 5-6 metal layers, fast xtor
Memory: MB
Die size: ≈ mm²
Vector pipes/lanes: bit (or 8 32-bit or bit)

Target                Low Power                     High Performance
Serial I/O            4 1 Gbit/s                    8 2 Gbit/s
Power (university)    ≈ volt logic                  ≈ volt logic
Clock (university)    scalar/200 vector MHz         300 scalar/300 vector MHz
Perf (university)     GFLOPS64 - 6 GOPS             GFLOPS64 - 10 GOPS16
Power (industry)      ≈ volt logic                  ≈ volt logic
Clock (industry)      400 scalar/400 vector MHz     600 scalar/600 vector MHz
Perf (industry)       3.2 GFLOPS64 - 12 GOPS        GFLOPS64 - 16 GOPS16
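One way to read the performance rows is as peak operations per clock cycle. The sketch below uses only the industry low-power figures that survive in the slide (400 MHz vector clock, 3.2 GFLOPS at 64 bits); pairing those two numbers is an assumption made here, and `ops_per_cycle` is a hypothetical helper, not anything defined by the VIRAM-1 project.

```python
def ops_per_cycle(peak_gops, clock_mhz):
    """Peak operations completed per clock cycle.

    peak_gops: peak rate in billions of operations per second
    clock_mhz: clock frequency in MHz
    """
    if clock_mhz <= 0:
        raise ValueError("clock_mhz must be positive")
    return (peak_gops * 1e9) / (clock_mhz * 1e6)


if __name__ == "__main__":
    # Assumed pairing: industry low-power column, 3.2 GFLOPS (64-bit) at 400 MHz.
    print("64-bit flops per cycle:", ops_per_cycle(3.2, 400))
```

Under that pairing the design would need to complete about 8 64-bit floating-point operations every cycle across its vector lanes to reach the quoted peak.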