CS 294-110: Technology Trends August 31, 2015 Ion Stoica and Ali Ghodsi (http://www.cs.berkeley.edu/~istoica/classes/cs294/15/)

“Skate where the puck's going, not where it's been” – Walter Gretzky

Typical Server Node (diagram): CPU connected over the memory bus to RAM; SSD attached via PCI; HDD attached via SATA; Ethernet NIC

Typical Server Node: time to read all data
- RAM (100s of GB) over the memory bus (80 GB/s): tens of seconds
- SSD (TBs, 600 MB/s*) over PCI (1 GB/s): hours
- HDD (10s of TBs, 50-100 MB/s*) over SATA: tens of hours
- Ethernet: 1 GB/s
* multiple channels
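To sanity-check these orders of magnitude, here is a minimal back-of-the-envelope sketch; the capacities and bandwidths are the slide's round figures, not measurements:

```python
# Back-of-the-envelope: time to sequentially read an entire storage tier.
tiers = {
    "RAM": {"capacity_gb": 200,   "bandwidth_gbps": 80},     # 100s of GB @ 80 GB/s
    "SSD": {"capacity_gb": 2000,  "bandwidth_gbps": 0.6},    # TBs @ 600 MB/s
    "HDD": {"capacity_gb": 20000, "bandwidth_gbps": 0.075},  # 10s of TBs @ 50-100 MB/s
}
for name, t in tiers.items():
    seconds = t["capacity_gb"] / t["bandwidth_gbps"]
    print(f"{name}: {seconds:,.0f} s (~{seconds / 3600:.1f} h)")
# -> RAM: seconds; SSD: ~3,333 s (hours); HDD: ~266,667 s (tens of hours)
```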

Moore’s Law Slowing Down
- Stated 50 years ago by Gordon Moore: the number of transistors on a microchip doubles every 2 years
- Today it is “closer to 2.5 years” - Brian Krzanich
(chart: transistor count over time)
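What the longer doubling period means per year follows directly from the exponent (simple arithmetic, not a figure from the slides):

$$2^{1/2} \approx 1.41 \;(+41\%/\text{year}) \qquad \text{vs.} \qquad 2^{1/2.5} \approx 1.32 \;(+32\%/\text{year})$$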

Memory Capacity: DRAM capacity +29% per year (chart, 2002-2017)

Memory Price/Byte Evolution (http://www.jcmit.com/memoryprice.htm)
- 1990-2000: -54% per year
- 2000-2010: -51% per year
- 2010-2015: -32% per year
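Compounding makes the slowdown stark (simple arithmetic on the slide's rates):

$$(1-0.54)^{10} \approx 4\times10^{-4} \qquad (1-0.32)^{5} \approx 0.15$$

That is, price per byte fell roughly 2,400x over the 1990s, but only about 7x over 2010-2015.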

Typical Server Node (diagram recap: RAM capacity ~30% per year)

Normalized single-threaded CPU performance over time (chart: http://preshing.com/20120208/a-look-back-at-single-threaded-cpu-performance/)

Normalized Performance: today +10% per year (Intel personal communications) (chart: http://preshing.com/20120208/a-look-back-at-single-threaded-cpu-performance/)

Number of cores: +18-20% per year (Source: http://www.fool.com/investing/general/2015/06/22/1-huge-innovation-intel-corp-could-bring-to-future.aspx)

CPU Performance Improvement
- Number of cores: +18-20% per year
- Per-core performance: +10% per year
- Aggregate improvement: +30-32% per year
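The aggregate rate is just the two rates compounded:

$$1.18 \times 1.10 \approx 1.30 \qquad 1.20 \times 1.10 = 1.32$$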

Typical Server Node (diagram recap: RAM ~30%, CPU ~30% per year)

SSDs
Performance:
- Reads: 25 µs latency
- Writes: 200 µs latency
- Erases: 1.5 ms
Steady state, when the SSD is full:
- One erase every 64 or 128 writes (depending on page size)
Lifetime: 100,000 to 1 million writes per page
Rule of thumb: writes are 10x more expensive than reads, and erases 10x more expensive than writes
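A minimal sketch of what this implies for the amortized cost of a steady-state write; the latencies are the slide's figures, and the 128-pages-per-block value is an assumption:

```python
# Amortized write cost when every 128th write triggers a 1.5 ms block erase.
READ_US, WRITE_US, ERASE_US = 25, 200, 1500   # latencies from the slide
WRITES_PER_ERASE = 128                        # pages per block (assumption)

amortized_write_us = WRITE_US + ERASE_US / WRITES_PER_ERASE
print(f"amortized write: {amortized_write_us:.1f} us "
      f"(~{amortized_write_us / READ_US:.0f}x a read)")
# -> amortized write: 211.7 us (~8x a read), in line with the ~10x rule of thumb
```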

Cost crosspoint! (chart: SSD vs. HDD cost over time)

SSDs vs. HDDs
- SSDs will soon become cheaper than HDDs
- The transition from HDDs to SSDs will accelerate
- Already most instances in AWS have SSDs; Digital Ocean instances are SSD-only
- Going forward we can assume SSD-only clusters

Typical Server Node (diagram recap: RAM ~30%, CPU ~30% per year)

SSD Capacity
- Leverages Moore’s law
- 3D technologies will help it outpace Moore’s law

Typical Server Node (diagram recap: RAM ~30%, CPU ~30%, SSD capacity >30% per year)

Memory Bus: +15% per year

Typical Server Node (diagram recap: RAM ~30%, CPU ~30%, memory bus 15%, SSD >30% per year)

PCI Bandwidth: +15-20% per year

Typical Server Node (diagram recap: RAM ~30%, CPU ~30%, memory bus 15%, SSD >30%, PCI 15-20% per year)

SATA
- 2003: 1.5 Gbps (SATA 1)
- 2004: 3 Gbps (SATA 2)
- 2008: 6 Gbps (SATA 3)
- 2013: 16 Gbps (SATA 3.2)
+20% per year since 2004
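The +20% figure is the compound annual growth rate over 2004-2013:

$$\left(\tfrac{16}{3}\right)^{1/9} \approx 1.20$$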

Typical Server Node (diagram recap: RAM ~30%, CPU ~30%, memory bus 15%, SSD >30%, PCI 15-20%, SATA 20% per year)

Ethernet Bandwidth: +33-40% per year (chart, 1995-2017)

Typical Server Node (diagram recap: RAM ~30%, CPU ~30%, memory bus 15%, SSD >30%, PCI 15-20%, SATA 20%, Ethernet 33-40% per year)

Summary so Far
- Bandwidth to storage is the bottleneck
- Because capacity grows faster than the bandwidth to read it, it will take longer and longer to read the entire data from RAM or SSD
(diagram recap: RAM ~30%, CPU ~30%, memory bus 15%, SSD >30%, PCI 15-20%, SATA 20%, Ethernet 33-40% per year)
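A minimal sketch of why the gap widens, using the slide's growth rates: a full scan of a tier slows down year over year.

```python
# If capacity compounds faster than the bandwidth feeding it,
# the time to read a full tier grows every year.
CAPACITY_GROWTH = 1.30    # RAM/SSD capacity: ~+30% per year (slide figure)
BANDWIDTH_GROWTH = 1.15   # memory bus: +15% per year (slide figure)

scan_time = 1.0           # normalized time to read the whole tier today
for year in range(1, 6):
    scan_time *= CAPACITY_GROWTH / BANDWIDTH_GROWTH
    print(f"year {year}: full scan takes {scan_time:.2f}x as long")
# -> after 5 years the same "read everything" job takes ~1.8x as long
```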

But wait, there is more…

3D XPoint Technology
- Developed by Intel and Micron
- Released last month!
Exceptional characteristics:
- Non-volatile memory
- 1000x more resilient than SSDs
- 8-10x the density of DRAM
- Performance in the DRAM ballpark!

(https://www.youtube.com/watch?v=IWsjbqbkqh8)

High-Bandwidth Memory Buses
- Today’s DDR4 maxes out at 25.6 GB/sec
- High Bandwidth Memory (HBM), led by AMD and NVIDIA: supports a 1,024-bit-wide bus @ 125 GB/sec
- Hybrid Memory Cube (HMC), a consortium led by Intel: to be released in 2016; claimed that 400 GB/sec is possible!
- Both are based on stacked memory chips
- Limited capacity (won’t replace DRAM), but much higher than on-chip caches
- Example use cases: GPGPUs
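The HBM figure is consistent with simple bus arithmetic; the ~1 GT/s transfer rate below is inferred, not from the slides:

$$1{,}024\ \text{bits} = 128\ \text{B per transfer}, \qquad 128\ \text{B} \times \sim 1\ \text{GT/s} \approx 125\ \text{GB/s}$$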

Example: HBM (diagram)

A Second Summary
3D XPoint promises virtually unlimited memory:
- Non-volatile
- Reads 2-3x slower than RAM
- Writes 2x slower than reads
- 10x higher density
- Main limit for now: the 6 GB/sec interface
High-bandwidth memory promises a 5x or higher increase in memory bandwidth, but with limited capacity, so it won’t replace DRAM

What does this Mean?

Thoughts
- For big data processing, HDDs are virtually dead! Still great for archival, though
- With 3D XPoint, RAM will finally become the new disk
- The gap between memory capacity and bandwidth is still increasing

Thoughts
The storage hierarchy gets more and more complex:
- L1 cache
- L2 cache
- L3 cache
- RAM
- 3D XPoint based storage
- SSD
- (HDD)
We need to design software to take advantage of this hierarchy, as the sketch below illustrates
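A minimal illustration of hierarchy-aware design: read from the fastest tier that has the data and promote hits upward. The Tier class and tier names are hypothetical, for illustration only:

```python
# Hierarchy-aware read path: check tiers fastest-first, promote on a hit.
# (Tier is a stand-in for real device-backed stores; illustrative only.)
class Tier:
    def __init__(self, name):
        self.name, self.data = name, {}
    def get(self, key):
        return self.data.get(key)
    def put(self, key, value):
        self.data[key] = value

tiers = [Tier("RAM"), Tier("3D XPoint"), Tier("SSD")]  # fastest to slowest

def read(key):
    for i, tier in enumerate(tiers):
        value = tier.get(key)
        if value is not None:
            for faster in tiers[:i]:  # promote into faster tiers
                faster.put(key, value)
            return value
    return None

tiers[-1].put("page42", b"...")           # data initially only on SSD
assert read("page42") == b"..."           # served from SSD...
assert tiers[0].get("page42") == b"..."   # ...and now cached in RAM
```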

Thoughts
- The primary way to scale processing power is now adding more cores; per-core performance increases only 10-15% per year
- HBM and HMC technologies will alleviate the bottleneck of getting data to/from multi-cores, including GPUs
- Moore’s law is finally slowing down
- Parallel computation models will become more and more important, both at the node and cluster levels

Thoughts
- Will locality become more or less important?
- New OSes that ignore disks and SSDs?
- Aggressive pre-computation: indexes, views, etc.
- Tradeoff between query latency and result availability
- …