Rethinking Database Algorithms for Phase Change Memory

Presentation transcript:

Rethinking Database Algorithms for Phase Change Memory Shimin Chen* Phillip B. Gibbons* Suman Nath+ *Intel Labs Pittsburgh +Microsoft Research

Introduction
PCM is an emerging non-volatile memory technology. Samsung is already producing a PCM chip for mobile handsets, and PCM is expected to become a common component in the memory/storage hierarchy. Recent computer architecture and systems studies argue that PCM will replace DRAM as main memory. The PCM-DB project exploits PCM for database systems; this paper focuses on algorithm design for PCM-based main memory.

Outline
Phase Change Memory
PCM-Friendly Algorithm Design: B+-Tree Index, Hash Joins
Related Work
Conclusion

Phase Change Memory (PCM)
Byte-addressable non-volatile memory. Two states of the phase change material:
Amorphous: high resistance, representing "0"
Crystalline: low resistance, representing "1"
Operations (current/temperature applied over time):
RESET to amorphous: short, high-temperature pulse, e.g. ~610°C
SET to crystalline: longer, lower-temperature pulse, e.g. ~350°C
READ: low current

Comparison of Technologies

                     DRAM             PCM                  NAND Flash
Page size            64B              64B                  4KB
Page read latency    20-50 ns         ~50 ns               ~25 µs
Page write latency   20-50 ns         ~1 µs                ~500 µs
Write bandwidth      ~GB/s per die    50-100 MB/s per die  5-40 MB/s per die
Erase latency        N/A              N/A                  ~2 ms
Endurance            ∞                10^6 - 10^8          10^4 - 10^5
Read energy          0.8 J/GB         1 J/GB               1.5 J/GB [28]
Write energy         1.2 J/GB         6 J/GB               17.5 J/GB [28]
Idle power           ~100 mW/GB       ~1 mW/GB             1-10 mW/GB
Density              1x               2-4x                 4x

Compared to NAND Flash, PCM is byte-addressable and has orders of magnitude lower latency and higher endurance. Compared to DRAM, PCM has better density and scalability; it has similar read latency but longer write latency.
Sources: [Doller'09] [Lee et al.'09] [Qureshi et al.'09]

Relative Latencies: Read and Write
[Figure: read and write latencies of DRAM, PCM, NAND Flash, and hard disk on a log scale from 10 ns to 10 ms. PCM reads are close to DRAM reads; PCM writes are markedly slower than DRAM writes, but still far faster than NAND Flash or hard disk.]

PCM-Based Main Memory Organizations
PCM is a promising candidate for main memory, as recent computer architecture and systems studies show. Three alternative organizations have been proposed: [Condit et al.'09] [Lee et al.'09] [Qureshi et al.'09]. For algorithm analysis, we focus on PCM main memory and view the optional DRAM as another (transparent or explicit) cache.

Challenge: PCM Writes
Limited endurance: hot spots wear out quickly (10^6 - 10^8 writes per cell).
High energy consumption: a write takes 6-10x more energy than a read.
High latency and low bandwidth: SET/RESET time exceeds READ time, and a PCM chip has a limited instantaneous electric current level, so writing a cache line requires multiple rounds of writes.
This motivates a closer look at the write operation and its hardware optimization.

PCM Write Operation [Cho&Lee'09] [Lee et al.'09] [Yang et al.'07] [Zhou et al.'09]
Baseline: a cache line is written back in several rounds of writes; which bits belong to which rounds is hard-wired.
Optimization: data comparison write.
Goal: write only the modified bits rather than the entire cache line.
Approach: read-compare-write, skipping rounds with no modified bits.
[Figure: a cache line and the PCM array, with write rounds highlighted in different colors; only rounds containing modified bits are written.]
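The read-compare-write idea can be sketched in a few lines. This is a minimal software model, not the hardware mechanism: it assumes 8-byte words within a 64B cache line and skips any word whose bits are unchanged, returning the wear cost in words actually written.

```python
WORD_BYTES = 8  # assumed word size within a 64B cache line

def data_comparison_write(pcm_line: bytearray, new_line: bytes) -> int:
    """Write back a cache line, skipping unmodified words.

    Models the read-compare-write optimization: each word of the old
    line is compared against the new data, and only differing words are
    written. Returns the number of words written (the wear incurred)."""
    assert len(pcm_line) == len(new_line)
    words_written = 0
    for off in range(0, len(pcm_line), WORD_BYTES):
        if pcm_line[off:off + WORD_BYTES] != new_line[off:off + WORD_BYTES]:
            pcm_line[off:off + WORD_BYTES] = new_line[off:off + WORD_BYTES]
            words_written += 1
    return words_written
```

Writing the same data twice therefore costs nothing the second time, which is exactly why the analytical metrics below count *modified* words rather than bytes transferred.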


Algorithm Design Goals
Prior design goals for main-memory algorithms: low computational complexity, good CPU cache performance, and (more recently) power efficiency.
New goal: minimizing PCM writes, to improve endurance, save energy, and reduce latency. Unlike flash, PCM writes happen at word granularity.

PCM Metrics
Algorithm parameters:
number of cache misses (i.e., cache line fetches)
number of cache line write-backs
number of words modified
We propose three analytical metrics:
Total wear (for endurance)
Energy
Total PCM access latency
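To make the three metrics concrete, here is one plausible way to combine the parameters. The device constants and the exact formulas are illustrative stand-ins (not the paper's expressions): wear is counted per modified word thanks to data comparison writes, energy charges each line fetch and each modified word, and latency charges each fetch and each line write-back.

```python
# Illustrative device constants (assumed, not taken from the paper):
PCM_READ_LINE_NS = 50        # ~50 ns to fetch a 64B line
PCM_WRITE_LINE_NS = 1000     # ~1 us to write back a line
PCM_READ_LINE_ENERGY = 1.0   # relative energy units per line fetch
PCM_WRITE_WORD_ENERGY = 6.0  # writes cost roughly 6x reads

def pcm_metrics(n_miss: int, n_wb: int, n_w: int) -> dict:
    """Combine the three algorithm parameters (cache misses, cache line
    write-backs, words modified) into the three analytical metrics."""
    return {
        "total_wear": n_w,  # endurance is consumed per modified word
        "energy": n_miss * PCM_READ_LINE_ENERGY + n_w * PCM_WRITE_WORD_ENERGY,
        "latency_ns": n_miss * PCM_READ_LINE_NS + n_wb * PCM_WRITE_LINE_NS,
    }
```

The point of the model is that two algorithms with identical cache behavior can differ sharply in wear and energy if one modifies fewer words.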

B+-Tree Index
Cache-friendly B+-trees use nodes one or a few cache lines large [Rao&Ross'00] [Chen et al.'01] [Hankins et al.'03]. Sorted nodes are good for binary search and have low instruction overhead. Problem: an insertion or deletion in a sorted node shifts keys and pointers, incurring many writes!
[Figure: a node with a num field, sorted keys 2 4 5 7 8 9, and pointers, before and after an insert/delete.]

Our Proposal: Unsorted Nodes
Use unsorted leaf nodes, but keep non-leaf nodes sorted. In an unsorted node with a bitmap, the bitmap marks which key slots are valid, so an insert or delete touches only one slot and the bitmap instead of shifting keys.
[Figure: a sorted node (num, keys 2 4 5 7 8 9, pointers); an unsorted node (num, keys 5 8 2 9 4 7, pointers); an unsorted node with bitmap 1011 1010 and keys 8 2 9 4 7.]
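A minimal sketch of the unsorted-leaf-with-bitmap scheme (the class name and layout are hypothetical; the paper's nodes are packed cache-line-sized arrays, modeled here with a Python list and an integer bitmap):

```python
class UnsortedLeaf:
    """Unsorted leaf node with a validity bitmap.

    Keys occupy arbitrary slots; bit i of the bitmap is set iff slot i
    holds a valid key. An insert writes one key slot plus the bitmap
    word; a delete flips a single bit. No keys are ever shifted."""

    def __init__(self, capacity: int = 8):
        self.keys = [None] * capacity
        self.bitmap = 0

    def insert(self, key) -> bool:
        for i in range(len(self.keys)):
            if not (self.bitmap >> i) & 1:  # first free slot
                self.keys[i] = key          # one key-sized write
                self.bitmap |= 1 << i       # one bitmap-word write
                return True
        return False  # node full: would trigger a split

    def delete(self, key) -> bool:
        for i in range(len(self.keys)):
            if (self.bitmap >> i) & 1 and self.keys[i] == key:
                self.bitmap &= ~(1 << i)    # single-bit change
                return True
        return False

    def search(self, key) -> bool:
        return any((self.bitmap >> i) & 1 and self.keys[i] == key
                   for i in range(len(self.keys)))
```

Searches become linear scans instead of binary searches, but within a node of a few cache lines that cost is small, while the write savings are large.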

Simulation Platform
Cycle-accurate out-of-order x86-64 simulator: PTLSim, extended with PCM support, including data comparison writes and the details of write-backs in the memory controller. Parameters are based on computer architecture papers, with sensitivity analysis for the parameters.

B+-Tree Index: Results
Setup: node size of 8 cache lines; 50 million entries, 75% full. Three workloads: inserting 500K random keys, deleting 500K random keys, searching 500K random keys. Measured: total wear, energy, execution time.
Unsorted leaf schemes achieve the best performance:
For insert-intensive workloads: unsorted-leaf.
For insert- and delete-intensive workloads: unsorted-leaf with bitmap.

Simple Hash Join
Build a hash table on the smaller (build) relation, then probe the hash table with each record of the larger (probe) relation. Problem: too many cache misses, because the build relation plus hash table far exceed the CPU cache, and record sizes are small.
[Figure: build relation, hash table, probe relation.]
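The textbook build-and-probe structure being discussed looks like this (a generic sketch; the function and key-extractor names are our own):

```python
from collections import defaultdict

def simple_hash_join(build, probe, build_key, probe_key):
    """Textbook hash join: build a hash table on the smaller relation,
    then probe it with every record of the larger one, yielding
    matching (build_record, probe_record) pairs."""
    table = defaultdict(list)
    for r in build:                       # build phase
        table[build_key(r)].append(r)
    for s in probe:                       # probe phase
        for r in table.get(probe_key(s), []):
            yield (r, s)
```

Each probe touches an essentially random hash bucket, which is exactly why the cache-miss problem above arises once the table outgrows the CPU cache.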

Cache Partitioning [Shatdal et al.'94] [Boncz et al.'99] [Chen et al.'04]
Partition both tables into cache-sized partitions, then join each pair of partitions. Problem: too many writes in the partition phase!

Our Proposal: Virtual Partitioning
Instead of copying records into partitions, store compressed record ID lists identifying each virtual partition of the build and probe relations, then join each pair of virtual partitions through the ID lists. This preserves good CPU cache performance while reducing writes.
[Figure: build relation, hash table, probe relation, with compressed record ID lists marking the virtual partitions.]
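A sketch of the idea, under stated simplifications: the paper compresses the record ID lists, while plain Python lists stand in for them here, and the partition function and names are our own.

```python
def virtual_partition_join(build, probe, key, n_parts):
    """Virtual partitioning sketch: record only the record IDs that
    belong to each partition (no record copies, hence few PCM writes),
    then join each pair of virtual partitions by indirecting through
    the ID lists, so each per-partition hash table stays cache-sized."""
    part = lambda r: hash(key(r)) % n_parts
    build_ids = [[] for _ in range(n_parts)]
    probe_ids = [[] for _ in range(n_parts)]
    for i, r in enumerate(build):
        build_ids[part(r)].append(i)   # IDs only, records stay in place
    for j, s in enumerate(probe):
        probe_ids[part(s)].append(j)
    for p in range(n_parts):
        table = {}                     # small: one virtual partition
        for i in build_ids[p]:
            table.setdefault(key(build[i]), []).append(i)
        for j in probe_ids[p]:
            for i in table.get(key(probe[j]), []):
                yield (build[i], probe[j])
```

The partition phase now writes only compact ID lists instead of full record copies, which is where cache partitioning pays its heavy write cost.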

Hash Joins: Results
Setup: 50MB joins 100MB, varying record size from 20B to 100B. Measured: total wear, energy, execution time.
Virtual partitioning achieves the best performance. Interestingly, cache partitioning is the worst in many cases.

Related Work
PCM architecture: hardware design issues such as endurance, write latency, and error correction; our focus is instead PCM-friendly algorithm design.
Byte-addressable NVM-based file systems; battery-backed DRAM.
Main-memory database systems and cache-friendly algorithms: these do not consider the read/write asymmetry of PCM.

Conclusion
PCM is a promising non-volatile memory technology, expected to replace DRAM as future main memory. For algorithm design on PCM-based main memory, the new goal is to minimize PCM writes. We propose three analytical metrics and PCM-friendly B+-tree and hash join algorithms; experimental results show significant improvements.

Thank you! shimin.chen@intel.com