1 Lecture 20: Big Data, Memristors
Today: architectures for big data, memristors

2 FAWN Andersen et al., SOSP'09
Targeted at key-value-pair based workloads; such workloads demand high bandwidth for random data look-ups
Each node tries to maximize query rate for a given power budget (3-6 W): this is achieved with low-EPI (energy-per-instruction) cores and Flash-based storage
A log-structured data store ensures that writes are append-only and sequential (see the sketch below)
In-DRAM hash tables help locate data on reads
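A minimal sketch of this put/get path, assuming illustrative names (FawnStore, datalog.bin) rather than the paper's actual code: puts append sequentially to an on-Flash log and update a DRAM hash table, while gets do one hash lookup in DRAM followed by one random Flash read.

```python
# Minimal FAWN-style datastore sketch (illustrative names, not the paper's code):
# an in-DRAM hash index maps each key to an offset in an append-only log on Flash.
import os

class FawnStore:
    def __init__(self, path="datalog.bin"):
        self.log = open(path, "a+b")   # append-only, log-structured file on Flash
        self.index = {}                # in-DRAM hash table: key -> (offset, length)

    def put(self, key: bytes, value: bytes):
        # Writes are sequential: always append at the tail of the log.
        self.log.seek(0, os.SEEK_END)
        offset = self.log.tell()
        record = len(key).to_bytes(4, "little") + key + value
        self.log.write(record)
        self.log.flush()
        self.index[key] = (offset, len(record))   # older versions become garbage

    def get(self, key: bytes) -> bytes:
        # Reads are random: one DRAM hash lookup, then one Flash read.
        offset, length = self.index[key]
        self.log.seek(offset)
        record = self.log.read(length)
        klen = int.from_bytes(record[:4], "little")
        return record[4 + klen:]
```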

3 Cost Analysis
A workload has a given capacity and bandwidth requirement; once a node design is fixed, you need enough nodes to meet both requirements
The cost of a node is the sum of CapEx and OpEx; low-EPI processors help reduce both
Workloads limited by capacity benefit from disk and Flash, not DRAM; workloads limited by bandwidth prefer DRAM (see the toy calculation below)
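A toy sizing calculation with made-up numbers (not figures from the paper), just to show how the binding constraint determines node count and hence which technology is cheapest:

```python
# Toy cost model (assumed numbers): the node count is set by whichever
# requirement -- capacity or bandwidth -- binds first.
def nodes_needed(workload_capacity_gb, workload_qps, node_capacity_gb, node_qps):
    by_capacity = -(-workload_capacity_gb // node_capacity_gb)   # ceiling division
    by_bandwidth = -(-workload_qps // node_qps)
    return max(by_capacity, by_bandwidth)

def total_cost(n_nodes, capex_per_node, watts_per_node, dollars_per_watt_3yr):
    # 3-year cost of ownership = purchase price (CapEx) + energy (OpEx)
    return n_nodes * (capex_per_node + watts_per_node * dollars_per_watt_3yr)

# Capacity-bound workload: Flash nodes win despite a lower per-node query rate.
flash_nodes = nodes_needed(10_000, 50_000, node_capacity_gb=256, node_qps=10_000)
dram_nodes  = nodes_needed(10_000, 50_000, node_capacity_gb=64,  node_qps=100_000)
print(flash_nodes, dram_nodes)   # 40 vs. 157 nodes for the same workload
```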

4 Cost Analysis

5 RAMCloud Ongaro et al., SOSP'11
Data is stored in DRAM across several nodes
DRAM volatility and frequent node failures in a large cluster force the use of data replication and data striping for fast recovery
Replication is done to Flash in the background using log-based structures; log data is first placed in DRAM buffers on multiple backup nodes and periodically written to Flash (see the sketch below)
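A minimal sketch of this write path, assuming illustrative class names and parameters (SEGMENT_SIZE, REPLICAS) rather than RAMCloud's actual code: the master appends to its in-DRAM log, the same bytes are buffered in DRAM on several backups, and backups flush full segments to Flash in the background.

```python
# RAMCloud-style write path sketch (illustrative, not the real implementation).
SEGMENT_SIZE = 8 * 1024 * 1024   # assumed log segment size
REPLICAS = 3                     # assumed replication factor

class Backup:
    def __init__(self):
        self.buffer = bytearray()   # volatile DRAM buffer
        self.flash = []             # closed segments persisted on Flash

    def append(self, data: bytes):
        self.buffer += data
        if len(self.buffer) >= SEGMENT_SIZE:
            self.flash.append(bytes(self.buffer))   # background write to Flash
            self.buffer = bytearray()

class Master:
    def __init__(self, backups):
        self.log = bytearray()      # primary copy lives entirely in DRAM
        self.hash_table = {}        # key -> offset into the DRAM log
        self.backups = backups

    def write(self, key: bytes, value: bytes):
        record = key + b"=" + value
        offset = len(self.log)
        self.log += record
        self.hash_table[key] = offset
        # The write completes once all replicas have buffered it in DRAM;
        # the Flash writes happen asynchronously inside Backup.append.
        for b in self.backups:
            b.append(record)
```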

6 Thin Servers with Smart Pipes Lim et al., ISCA'13
Designs a custom architecture for Memcached
Memcached: a table of unique keys and their values, stored in DRAM and managed with an LRU policy; GET and SET are used for reads and writes (see the sketch below)
Modern Xeon- and Atom-based designs are both highly under-utilized, primarily because of per-packet processing overheads in the NIC and OS: high icache/iTLB misses and branch mispredictions
A customized NIC and accelerator take care of packet processing and partial hash-table look-ups in hardware
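A minimal sketch of the data structure the slide describes, not the real memcached implementation: a DRAM-resident key-value table with GET/SET and least-recently-used eviction once a capacity limit is reached.

```python
# Memcached-like table sketch: GET/SET over a DRAM hash table with LRU eviction.
from collections import OrderedDict

class MemcachedLike:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.table = OrderedDict()         # insertion order tracks recency

    def set(self, key, value):
        if key in self.table:
            self.table.move_to_end(key)    # refreshed entries become most recent
        self.table[key] = value
        if len(self.table) > self.capacity:
            self.table.popitem(last=False) # evict the least recently used entry

    def get(self, key):
        if key not in self.table:
            return None                    # cache miss
        self.table.move_to_end(key)        # a hit makes the entry most recent
        return self.table[key]
```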

7 HARP Wu et al., ISCA'13
Data partitioning is an important component of many database operations, e.g., joins
Range partitioning is one example, where partitions are defined by value ranges of a given key (see the sketch below)
New software simply loads/drains stream buffers, and a new hardware pipeline reads the streamed inputs and partitions them into different buffers
Full buffers are placed into the streamed outputs
The partitioning logic is a pipeline of comparators
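A software analog of range partitioning (illustrative only, not HARP's hardware design): each record's key is compared against the sorted range boundaries, which in HARP is done by a pipeline of comparators with one stage per boundary, and the record goes to the output buffer for its range.

```python
# Range partitioning sketch: bisect plays the role of the comparator pipeline.
import bisect

def range_partition(records, boundaries):
    """boundaries = sorted splitter keys; yields len(boundaries)+1 partitions."""
    buffers = [[] for _ in range(len(boundaries) + 1)]
    for key, payload in records:
        # Find the first boundary the key falls below, i.e., its partition.
        part = bisect.bisect_right(boundaries, key)
        buffers[part].append((key, payload))
    return buffers

# Example: three splitters define four ranges (<10, 10-19, 20-29, >=30).
parts = range_partition([(5, "a"), (17, "b"), (42, "c"), (23, "d")], [10, 20, 30])
print([len(p) for p in parts])   # [1, 1, 1, 1]
```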

8 HARP

9 Resistive RAM
Most new NVMs are resistive; the term "ReRAM" typically refers to memristors
A metal-oxide material is sandwiched between two electrodes; the resistance depends on the direction of the last current that flowed through the material
High density is achieved with a 0T1R cell implemented in a "cross-point" structure
Density can be increased with 3D stacking: more metal layers → more cells in a given footprint (see the arithmetic below)
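A back-of-the-envelope density calculation under standard assumptions (ideal 4F² cross-point cell, not figures from a specific paper), to make the "more layers, more cells per area" point concrete:

```python
# Density arithmetic sketch: stacking n cross-point layers divides the
# effective footprint per bit by n (assuming an ideal 4F^2 cell).
def bits_per_mm2(feature_nm, layers):
    cell_area_nm2 = 4 * feature_nm ** 2      # ideal 4F^2 cross-point cell
    return layers * 1e12 / cell_area_nm2     # 1 mm^2 = 1e12 nm^2

print(f"{bits_per_mm2(20, 1):.2e}")   # ~6.3e8 bits/mm^2 with one layer
print(f"{bits_per_mm2(20, 4):.2e}")   # ~2.5e9 bits/mm^2 with four stacked layers
```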

10 ReRAM Cross-Point Structure Niu et al., ICCAD’13

11 Memristor Reads/Writes
Each cross-point array has limited size so that sneak currents remain manageable
Writes can either use a 2-phase scheme or write only 1 cell per array; the latter is fine because a cache line can be interleaved across several arrays
HWHB biasing for writes: all unselected wordlines and bitlines are held at half voltage while the selected wordline and bitline are fully biased, so only the selected cell sees the full write voltage and half-selected cells see at most half of it (see the sketch below)
For reads: only the selected wordline carries the read voltage; all other wordlines/bitlines are grounded, and each bitline current indicates the resistance of the cell on the selected row
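A toy model of these two biasing schemes with assumed voltages and array size (illustrative, not a circuit simulation): the voltage across each cell is simply V(wordline) − V(bitline), which shows why only the selected cell sees the full write voltage under HWHB.

```python
# Cross-point biasing sketch: cell voltage = V(wordline) - V(bitline).
import numpy as np

ROWS, COLS, VW = 4, 4, 2.0        # small array, assumed write voltage
sel_row, sel_col = 1, 2

# HWHB write biasing: selected wordline at VW, selected bitline at 0,
# every unselected wordline and bitline at VW/2.
wl = np.full(ROWS, VW / 2); wl[sel_row] = VW
bl = np.full(COLS, VW / 2); bl[sel_col] = 0.0
cell_v = wl[:, None] - bl[None, :]
print(cell_v)
# Selected cell: VW; cells sharing its row or column: VW/2; all others: 0.

# Read biasing: selected wordline at a small read voltage, everything else
# grounded, so each bitline current reflects one cell on the selected row.
VR = 0.3
wl = np.zeros(ROWS); wl[sel_row] = VR
bl = np.zeros(COLS)
read_v = wl[:, None] - bl[None, :]
print(read_v[sel_row])            # only the selected row's cells are biased
```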
