Some Results on Codes for Flash Memory Michael Mitzenmacher Includes work with Hilary Finucane, Zhenming Liu, Flavio Chierichetti.

Flash Memory
Now becoming the standard for many products and devices.
–Even flash hard drives are becoming standard.
But flash memory works differently from traditional memories.
New, interesting questions…

Basics of Flash
Data organized into cells.
–Can “write” at the cell level.
–Cells contain electrons.
–Can ADD electrons at the cell level.
–Typical ranges are 2-4 possible states, but may increase: 256 someday?
Cells organized into blocks.
–Can only ERASE at the block level.
–Blocks can be thousands to hundreds of thousands of cells.

The Problem with Erasures
Erasing a block is expensive.
–In terms of time; solved in part by preemptive moves of data.
–In terms of wear.
Limited life cycles make minimizing block erasures an important goal.

Basics of Flash
Reading and “one-way” writing (adding electrons) is easy.
Writing general values is hard.
What should our data representation look like in such a setting?

Big Underlying Question
How should flash change our underlying algorithms, data structures, and data representations?
–Memory structure and hierarchy have a big impact on performance.
–Algorithmists should care!
Here we focus on the basic question of data representation.

Some History
Write-once memories (WOMs)
–Introduced by Rivest and Shamir, early 1980s.
–Punch cards, optical disks.
–Can turn 0s to 1s, but not back again.
Question: How many punch-card bits do you need to represent t rewrites of a k-bit value?
–Starting point for this kind of analysis.
–Better schemes exist than the naïve kt bits.
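The classic Rivest and Shamir construction can be written out as a minimal sketch: it stores a 2-bit value, written twice, in 3 write-once bits (instead of the naïve kt = 4). The first write uses codewords of weight at most 1, the second write uses their bitwise complements, and the decoder branches on weight.

```python
# Sketch of the Rivest-Shamir WOM code: a 2-bit value, written twice,
# in 3 write-once bits (vs. kt = 4 bits for the naive scheme).

# First-generation codewords: weight <= 1.
FIRST = {(0, 0): (0, 0, 0), (0, 1): (1, 0, 0),
         (1, 0): (0, 1, 0), (1, 1): (0, 0, 1)}
# Second-generation codewords: bitwise complements, weight >= 2.
SECOND = {v: tuple(1 - b for b in w) for v, w in FIRST.items()}

def decode(word):
    """Weight <= 1 means first-generation encoding; otherwise second."""
    table = FIRST if sum(word) <= 1 else SECOND
    return next(v for v, w in table.items() if w == word)

def rewrite(word, value):
    """Second write: may only turn 0s into 1s, never back."""
    if decode(word) == value:
        return word                       # value unchanged: write nothing
    target = SECOND[value]
    assert all(t >= b for t, b in zip(target, word))  # only 0 -> 1 moves
    return target
```

The key point is that the complement of any first-generation codeword dominates every *other* first-generation codeword bitwise, so the second write never needs to clear a 1.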

Floating Codes
Data representation for flash memory.
State is a sequence of n q-ary numbers.
–Represents a block of n cells; each cell holds an electric charge with q states.
State mapped to variable values.
–Gives a sequence of k l-ary numbers.
State changes by increasing one or more cell values, or by resetting the entire block.
–Resets are expensive!

Floating Codes: The Problem
As variable values change, we need the state to track the variables.
How do we choose the mapping function from states to variables AND the transition function from variable changes to state changes to maximize the time between reset operations?
These codes do not correct errors; they are just data representation.
–Errors are a separate issue.

Formal Model: General Codes
[Formal definitions of the decode map D and transition map R appeared here as figures, lost in transcription.]
We usually consider limited variation: one variable changes per step.
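To make the D/R formalism concrete, here is a toy floating code (a hypothetical illustration, not a construction from the talk): n = k cells, one per binary variable, with bit i read off as cells[i] mod 2. It also shows why naïve codes waste capacity: a reset is forced as soon as any single cell fills, regardless of the charge left in the other cells.

```python
# Toy floating code (hypothetical, for illustrating the D/R formalism):
# n = k cells with q levels each; variable i is binary (l = 2).

class NaiveFloatingCode:
    def __init__(self, k, q):
        self.k, self.q = k, q

    def D(self, state):
        """Decode map: cell state -> variable values."""
        return tuple(c % 2 for c in state)

    def R(self, state, i, value):
        """Transition map: record a change to variable i by incrementing
        cell i. Returns the new state, or None to signal a block reset."""
        if self.D(state)[i] == value:
            return state                  # no change needed
        if state[i] == self.q - 1:
            return None                   # cell i is full: must reset
        return state[:i] + (state[i] + 1,) + state[i + 1:]
```

With k = 2 and q = 4, a run of changes to bit 0 alone forces a reset after q - 1 = 3 updates while cell 1 sits empty, so the deficiency can be as bad as (k-1)(q-1).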

Example
Track k = 4 bits (so l = 2) with n = 8 cells having q = 4 states.
[Worked example lost in transcription: a table of cell states under D and R as bit 3, bit 2, bit 1, and bit 1 again change in turn.]

History
Floating codes introduced by Jiang, Bohossian, Bruck (ISIT 2007) as a model for flash memory.
–Designed to maximize worst-case time between resets.
New multidimensional flash codes suggested by Yaakobi, Vardy, Siegel, Wolf at Allerton.
Average case studied by Finucane, Liu, Mitzenmacher (Allerton 2008).

Contribution 1: New Worst-Case Codes
Hilary Finucane’s senior thesis.
–Similar codes also found simultaneously by Yaakobi et al.
Simple construction, best known performance.
Tracks k bits of data, for even k.
Performance measured by deficiency.
–Max possible updates is n(q-1).
–Deficiency is the smallest t such that n(q-1)-t updates are always possible.

Mod-Based Codes
Break the block into groups of k cells. Each group will represent 1 bit.
–At most one active group per bit.
–Parity of the group determines the value of the bit.
–Increase a cell by 1 each time the bit changes.
How do we know which bit each group stores?
–Start with the jth cell within a group to represent bit j.
–As cells fill, go right, wrapping back to the first cell at the end.
–Either the last empty cell is j-1, or the only non-full cell is j-1; either way, we can figure out which bit.
Maximum deficiency: k²q. Independent of n!
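The group-identification rule can be sketched in code. This is one reading of the rule above, not the authors' implementation: writes start at cell j and fill rightward with wraparound, so the nonempty cells form a cyclic run starting at j, and once every cell is nonempty only cell j-1 can be non-full. The bit's value is the parity of the group's total charge.

```python
# Sketch of decoding one group in the mod-based code (a reading of the
# rule above, not the original implementation). cells holds k charge
# levels, each in 0..q-1.

def group_bit_value(cells):
    """The stored bit's value is the parity of the group's total charge."""
    return sum(cells) % 2

def group_bit_index(cells, q):
    """Which bit j this group stores, or None for empty/full groups."""
    k = len(cells)
    if all(c == 0 for c in cells):
        return None                       # empty group: ignore
    if any(c == 0 for c in cells):
        # Not yet wrapped: nonempty cells form a cyclic run starting at
        # j, so j is the cell whose cyclic left neighbor is empty.
        return next(i for i in range(k)
                    if cells[i] > 0 and cells[(i - 1) % k] == 0)
    # Wrapped around: the only possibly non-full cell is j - 1.
    partial = [i for i, c in enumerate(cells) if c < q - 1]
    return (partial[0] + 1) % k if partial else None  # full group: ignore
```

For example, with k = 8 and q = 4, a group whose only charge is in cell 4 stores bit 4 (0-indexed) with value 1.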

Examples
Track k = 8 bits with cells having q = 4 states.
[Cell-state diagrams lost in transcription: groups decoding to values of bit 5, bit 5, bit 1, and bit 4, plus an empty block and a full block, both ignored.]

Further Improvements
Can improve the basic construction by being more careful as the available cells run low.
Can prove O(kq (log₂ k)(log_q k)) deficiency.
–Use smaller blocks of cells, but explicitly write which bit each stores, once the number of cells gets small.

Contribution 2: Average Case
Argument: Worst-case time between resets is not the right design criterion.
–Many resets in a lifetime.
–Mass-produced product.
–Potential to model user behavior.
Statistical performance guarantees are more appropriate.
–Expected time between resets.
–Time with high probability.
–Given a model.

Specific Contributions
Problem definition / model.
Codes for simple cases.

Formal Model: Average Case
As above, with a cost added per write:
–Cost is 0 when R moves to a cell state above the previous one, 1 otherwise.
Assumption: variable changes are given by a Markov chain.
–Example: the ith bit changes with probability p_i.
–Given D, R, this gives a Markov chain on cell states.
–Let π be the equilibrium distribution on cell states.
Goal is to minimize the average cost:
–Same as maximizing the average time between resets.
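Under this model, the expected time between resets for a given code can be estimated by simulation. A minimal Monte Carlo sketch, using a trivial one-cell-per-bit code as a stand-in (not one of the codes from the talk) and the model in which bit i is the next to change with probability p_i:

```python
import random

def mean_writes_until_reset(k, q, p, trials=2000, seed=1):
    """Estimate E[time between resets] for a trivial code storing
    bit i in its own q-level cell (bit value = charge mod 2).
    p[i] = probability that bit i is the next to change."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        cells = [0] * k
        writes = 0
        while True:
            i = rng.choices(range(k), weights=p)[0]
            if cells[i] == q - 1:       # cell i full: reset needed
                break
            cells[i] += 1
            writes += 1
        total += writes
    return total / trials
```

Every run supports at least q-1 and at most k(q-1) writes; skew in p pushes the estimate toward the lower end, which is exactly why average-case design can beat worst-case design.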

Variations
Many possible variations:
–Multiple variables change per step.
–More general random processes for values.
–Rules limiting transitions.
–General costs, optimizations.
Hardness results?
–Conjecture: some variations are NP-hard or worse.

Building Block Code: n = 2, k = 2, l = 2
Two bit values; two cells.
Code based on a striped Gray code.
Expected time (and time with high probability) before reset = 2q - o(q).
Asymptotically optimal for all p, 0 < p < 1.
Worst case optimal: approx 3q/2.
Example values: D(0,0) = 00, D(1,3) = 11, R((1,0),2,1) = (2,0).

Proof Sketch
“Even cells”: down with probability p, right with probability 1-p.
“Odd cells”: right with probability p, down with probability 1-p.
The code hugs the diagonal.
Right/down moves approximately balance for the first 2q - o(q) steps.
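The balance claim can be checked by simulating the walk described above, a sketch under the stated move rules with the parity of x+y standing in for "even/odd cells". Since the parity flips on every step, the probability-p move alternates between down and right, so over any two steps the expected drift is one down and one right, and the walk hugs the diagonal for about 2q steps.

```python
import random

def steps_until_reset(q, p, rng):
    """Walk on the q-by-q grid per the proof sketch: at even cells the
    probability-p move is down, at odd cells it is right. Stop when
    either coordinate hits q - 1 (that cell is full)."""
    x = y = 0
    steps = 0
    while x < q - 1 and y < q - 1:
        # At even parity, down iff rand < p; at odd parity, the reverse.
        move_down = (rng.random() < p) == ((x + y) % 2 == 0)
        if move_down:
            y += 1
        else:
            x += 1
        steps += 1
    return steps

rng = random.Random(7)
q, p, trials = 400, 0.3, 200
avg = sum(steps_until_reset(q, p, rng) for _ in range(trials)) / trials
# Balanced drift keeps the walk alive for 2q - o(q) steps,
# close to the maximum possible 2(q - 1).
```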

A Slightly Better Code Changing the final corner improves things.

Performance Results
[Results table lost in transcription: schemes DWC, DGC, and DGC+ compared.]

Codes for k = l = 2
For larger n, break the cells into Gray-code blocks.
Each bit walks along the diagonal of its own Gray-code block.
At the last block, behave like the n = 2, k = 2, l = 2 code.
Expected deficiency O(sqrt(q)).

Example
Bit 1 changes recorded from the left.
Bit 2 changes recorded from the right.
…
Meet somewhere in the middle, depending on rates.
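A simplified version of this two-sided idea can be written out. This is an illustration only, under assumptions of my own: bit-1 changes fill cells from the left, bit-2 changes from the right, each bit is the parity of its side's total charge, and we reset as soon as the two frontiers would close the last empty gap (the construction above instead switches to the n = 2 building-block code at the end).

```python
# Simplified meet-in-the-middle code for k = l = 2 (an illustration of
# the idea, not the exact construction): bit 1 writes fill cells from
# the left, bit 2 from the right; each bit is the parity of its side's
# charge; decoding needs at least one empty cell as a separator.

class NeedReset(Exception):
    pass

def write(cells, q, bit):
    """Record a change to bit 1 or bit 2 by incrementing the frontier
    cell on that bit's side."""
    idxs = range(len(cells)) if bit == 1 else reversed(range(len(cells)))
    i = next(i for i in idxs if cells[i] < q - 1)   # frontier cell
    if cells[i] == 0 and cells.count(0) == 1:
        raise NeedReset                             # gap would vanish
    cells[i] += 1

def decode(cells):
    """Bits are parities of the charge left and right of the empty gap."""
    empties = [i for i, c in enumerate(cells) if c == 0]
    left, right = cells[:empties[0]], cells[empties[-1] + 1:]
    return sum(left) % 2, sum(right) % 2
```

For example, with n = 6 and q = 4, three bit-1 changes charge the leftmost cell to 3 and decode to (1, 0); a bit-2 change then charges the rightmost cell and decodes to (1, 1).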

Random Codes
Average-case analysis looks at random data.
Natural also to look at random codes (Shannon-style arguments).
We consider random codes in the setting of general transitions.
–All k bits can change simultaneously.
Gives some insight into what may be possible.
–Results in paper.

Conclusions
New questions arising from flash memory.
–How to store data to maximize lifetimes.
–How to code to deal with errors.
–How to optimize algorithms and data structures.
–How to optimize memory hierarchies and variable-type memory systems.
Big question: is this a core science game-changer?
–How much should we be re-thinking?