Berkeley Cluster: Zoom Project
- 3 TB storage system: 370 disks of 8 GB each, 20 PPro PCs at 200 MHz, 100 Mbit switched Ethernet
- System cost is a small delta (~30%) over raw disk cost
- Application: San Francisco Fine Arts Museum server
  - 70,000 art images online
  - Zoom in 32X; try it yourself! www.Thinker.org (statue)

Fine Arts Project
- High-resolution pictures of over 60,000 objects of art stored on PhotoCD
- Database allows users to search the images using keywords (titles, artists' names)
- Images converted to a tiled format (GridPix) to ease zooming and scrolling within an image

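To illustrate why a tiled format eases zooming and scrolling, here is a minimal sketch of how a viewer might pick the tiles that cover the visible region. It is not the GridPix layout: the 256-pixel tile edge, the doubling between zoom levels, and the names `zoom_levels` and `tiles_for_viewport` are all assumptions made for illustration.

```python
from math import log2

TILE = 256  # assumed tile edge in pixels; not taken from the GridPix format


def zoom_levels(max_zoom: int) -> int:
    """Number of resolution levels if each level doubles the scale (assumption)."""
    return int(log2(max_zoom)) + 1


def tiles_for_viewport(x: int, y: int, width: int, height: int, tile: int = TILE):
    """Return (column, row) indices of the tiles that overlap a viewport.

    (x, y) is the top-left corner of the viewport in pixels at the current
    zoom level; width and height are the viewport size in pixels.
    """
    first_col, first_row = x // tile, y // tile
    last_col = (x + width - 1) // tile
    last_row = (y + height - 1) // tile
    return [(col, row)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]


if __name__ == "__main__":
    print(zoom_levels(32))                         # 6 levels for a 32X zoom
    print(tiles_for_viewport(300, 100, 400, 300))  # 4 tiles cover this viewport
```

Under these assumptions a scroll or zoom step touches only the few tiles under the viewport, so the server never has to read or send the whole high-resolution image.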
User Decision Support Demand vs. Processor Speed
- Database demand: 2X / 9-12 months ("Greg's Law")
- CPU speed: 2X / 18 months ("Moore's Law")
- Database-Proc. performance gap: Moore's Law is a laggard
  - 250%/year for Greg
  - 60%/year for Moore
  - 7%/year for DRAM
- Decision support is linear in database size

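The doubling times on this slide can be turned into yearly rates with a short calculation. The sketch below assumes simple exponential growth; the function names are hypothetical. Under that assumption an 18-month doubling is about 1.59X per year, which matches "60%/year for Moore", and a 9-month doubling is about 2.5X per year, which matches "250%/year for Greg" only if that figure is read as a yearly multiplier (an interpretation, not something the slide states).

```python
def annual_factor(doubling_months: float) -> float:
    """Yearly multiplier for a quantity that doubles every `doubling_months`."""
    return 2 ** (12 / doubling_months)


def gap_factor(demand_months: float, cpu_months: float) -> float:
    """Yearly growth of the demand/CPU gap, assuming both grow exponentially."""
    return annual_factor(demand_months) / annual_factor(cpu_months)


if __name__ == "__main__":
    print(round(annual_factor(18), 2))  # 1.59 -> about 60%/year (CPU speed)
    print(round(annual_factor(12), 2))  # 2.0  -> demand, slow end of 9-12 months
    print(round(annual_factor(9), 2))   # 2.52 -> demand, fast end of 9-12 months
    print(round(gap_factor(9, 18), 2))  # 1.59 -> gap widens about 59% per year
```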
Outline
- Technology: Disk, Network, Memory, Processor, Systems
  - Description/Performance Models
  - History/State of the Art/Trends
  - Limits/Innovations
- Technology leading to a New Database Opportunity?
  - Common Themes across 5 Technologies
  - Hardware & Software Alternative to Today
  - Benchmarks

Review technology trends to help?
- Desktop Processor: + SPEC performance; – TPC-C performance; – CPU-Memory perf. gap
- Embedded Processor: + Cost/Perf; + inside disk; – controllers everywhere

             Disk   Memory   Network
Capacity      +       +        …
Bandwidth     +       +        +
Latency       –       –        –
Interface     –       –        –

IRAM: "Intelligent RAM"
- Microprocessor & DRAM on a single chip:
  - on-chip memory latency 5-10X, bandwidth 50-100X
  - serial I/O 5-10X v. buses
  - improve energy efficiency 2X-4X (no off-chip bus)
  - reduce number of controllers
  - smaller board area/volume
- $B for separate lines for logic and memory
- Single chip: either processor in DRAM or memory in logic fab
[Figure: block diagrams with labels Proc, $, L2$, C, Bus, DRAM, I/O]

"Intelligent Disk" (IDISK): Scalable Decision Support?
- Low cost, low power processor & memory included in disk at little extra cost (e.g., Seagate optional track buffer)
- Scalable processing AND communication as the number of disks increases
[Figure: IRAM-equipped disks connected through crossbars]

How does TPC-D scale with dataset size?
- Compare NCR 5100M 20-node system (each node has 8 Pentium CPUs at 133 MHz), March 28, 1997; 100 GB, 300 GB, 1000 GB
- Of the 19 queries, all but 2 go up linearly with database size: (3-5 vs 300, 7-15 vs. 1000)
  - e.g., interval time ratios: 300/100 = 3.35; 1000/100 = 9.98; 1000/300 = 2.97
- How much memory for IBM SP2 node?
  - 100 GB: 12 processors with 24 GB
  - 300 GB: 128 thin nodes with 32 GB total; 256 MB/node (2 boards/processor)
- TPC-D is business analysis vs. business operation
  - 17 read-only queries; results in queries per Gigabyte Hour
  - Scale Factor (SF) multiplies each portion of the data: 10 to 10000
  - SF 10 is about 10 GB; indices + temp tables increase 3X - 5X

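As a quick check on the "linear in database size" claim, the quoted interval time ratios can be compared with the ratios of the database sizes. This is only a sketch: the helper names are hypothetical, and the memory calculation assumes 1 GB = 1024 MB. A value near 1.0 means the time grew in proportion to the data.

```python
# Interval time ratios quoted on the slide, keyed by (larger GB, smaller GB).
MEASURED = {(300, 100): 3.35, (1000, 100): 9.98, (1000, 300): 2.97}


def linearity(measured: dict) -> dict:
    """Measured time ratio divided by the size ratio expected under linear scaling."""
    return {pair: time_ratio / (pair[0] / pair[1])
            for pair, time_ratio in measured.items()}


def memory_per_node_mb(total_gb: float, nodes: int) -> float:
    """Memory per node in MB, assuming 1 GB = 1024 MB."""
    return total_gb * 1024 / nodes


if __name__ == "__main__":
    for pair, value in linearity(MEASURED).items():
        print(pair, round(value, 2))    # 1.12, 1.0, 0.89
    print(memory_per_node_mb(32, 128))  # 256.0, the 256 MB/node on the slide
```

All three ratios fall within roughly 12% of perfectly linear scaling, and 32 GB spread over 128 thin nodes does come to 256 MB per node.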