February 12, 2007 WALCOM '2007 1/22 DiskTrie: An Efficient Data Structure Using Flash Memory for Mobile Devices N. M. Mosharaf Kabir Chowdhury Md. Mostofa.

Slides:



Advertisements
Similar presentations
Mathematical Preliminaries
Advertisements

Visible-Surface Detection(identification)
1 DATA STRUCTURES USED IN SPATIAL DATA MINING. 2 What is Spatial data ? broadly be defined as data which covers multidimensional points, lines, rectangles,
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Addition Facts
CS16: Introduction to Data Structures & Algorithms
SE-292 High Performance Computing
Databasteknik Databaser och bioinformatik Data structures and Indexing (II) Fang Wei-Kleiner.
Singly Linked Lists What is a singly-linked list? Why linked lists?
Data Structures ADT List
Data Structures Using C++
1 Designing Hash Tables Sections 5.3, 5.4, Designing a hash table 1.Hash function: establishing a key with an indexed location in a hash table.
Hash Tables.
David Luebke 1 8/25/2014 CS 332: Algorithms Red-Black Trees.
B-Trees. Motivation When data is too large to fit in the main memory, then the number of disk accesses becomes important. A disk access is unbelievably.
Indexing.
Addition 1’s to 20.
25 seconds left…...
Week 1.
Binary Search Trees Ravi Chugh March 28, Review: Linked Lists Goal: Program that keeps track of friends Problem: Arrays have fixed length Solution:
Rizwan Rehman Centre for Computer Studies Dibrugarh University
SE-292 High Performance Computing Memory Hierarchy R. Govindarajan
Foundations of Data Structures Practical Session #7 AVL Trees 2.
§2 Binary Trees Note: In a tree, the order of children does not matter. But in a binary tree, left child and right child are different. A B A B andare.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
Introduction to Computer Science 2 Lecture 7: Extended binary trees
Michael Alves, Patrick Dugan, Robert Daniels, Carlos Vicuna
IP Routing Lookups Scalable High Speed IP Routing Lookups.
B+-Trees (PART 1) What is a B+ tree? Why B+ trees? Searching a B+ tree
Modern Information Retrieval Chapter 8 Indexing and Searching.
Modern Information Retrieval
CPSC 231 B-Trees (D.H.)1 LEARNING OBJECTIVES Problems with simple indexing. Multilevel indexing: B-Tree. –B-Tree creation: insertion and deletion of nodes.
1 CS 430: Information Discovery Lecture 4 Data Structures for Information Retrieval.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Part B Part A:  Index Definition in SQL  Ordered Indices  Index Sequential.
1 B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Comparing B-trees and AVL-trees Searching a B-tree Insertion in a B-tree.
B + -Trees (Part 1) COMP171. Slide 2 Main and secondary memories  Secondary storage device is much, much slower than the main RAM  Pages and blocks.
B-Trees and B+-Trees Disk Storage What is a multiway tree?
Data Structures & Algorithms Radix Search Richard Newman based on slides by S. Sahni and book by R. Sedgewick.
B + -Trees COMP171 Fall AVL Trees / Slide 2 Dictionary for Secondary storage * The AVL tree is an excellent dictionary structure when the entire.
Indexing structures for files D ƯƠ NG ANH KHOA-QLU13082.
1 Efficient packet classification using TCAMs Authors: Derek Pao, Yiu Keung Li and Peng Zhou Publisher: Computer Networks 2006 Present: Chen-Yu Lin Date:
1 Multiway trees & B trees & 2_4 trees Go&Ta Chap 10.
Indexing. Goals: Store large files Support multiple search keys Support efficient insert, delete, and range queries.
CHAPTER 71 TREE. Binary Tree A binary tree T is a finite set of one or more nodes such that: (a) T is empty or (b) There is a specially designated node.
Database Management 8. course. Query types Equality query – Each field has to be equal to a constant Range query – Not all the fields have to be equal.
1 B Trees - Motivation Recall our discussion on AVL-trees –The maximum height of an AVL-tree with n-nodes is log 2 (n) since the branching factor (degree,
IP Address Lookup Masoud Sabaei Assistant professor
Brought to you by Max (ICQ: TEL: ) February 5, 2005 Advanced Data Structures Introduction.
A Summary of XISS and Index Fabric Ho Wai Shing. Contents Definition of Terms XISS (Li and Moon, VLDB2001) Numbering Scheme Indices Stored Join Algorithms.
Binary Trees, Binary Search Trees RIZWAN REHMAN CENTRE FOR COMPUTER STUDIES DIBRUGARH UNIVERSITY.
Starting at Binary Trees
1 Tree Indexing (1) Linear index is poor for insertion/deletion. Tree index can efficiently support all desired operations: –Insert/delete –Multiple search.
1 Searching Searching in a sorted linked list takes linear time in the worst and average case. Searching in a sorted array takes logarithmic time in the.
Lecture1 introductions and Tree Data Structures 11/12/20151.
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture17.
CompSci 100E 39.1 Memory Model  For this course: Assume Uniform Access Time  All elements in an array accessible with same time cost  Reality is somewhat.
Sets of Digital Data CSCI 2720 Fall 2005 Kraemer.
Binary Search Trees (BSTs) 18 February Binary Search Tree (BST) An important special kind of binary tree is the BST Each node stores some information.
Advanced Data Structure By Kayman 21 Jan Outline Review of some data structures Array Linked List Sorted Array New stuff 3 of the most important.
Week 15 – Wednesday.  What did we talk about last time?  Review up to Exam 1.
B-Trees Katherine Gurdziel 252a-ba. Outline What are b-trees? How does the algorithm work? –Insertion –Deletion Complexity What are b-trees used for?
Chapter 11. Chapter Summary  Introduction to trees (11.1)  Application of trees (11.2)  Tree traversal (11.3)  Spanning trees (11.4)
Unit 9 Multi-Way Trees King Fahd University of Petroleum & Minerals
Tries 07/28/16 11:04 Text Compression
Multiway Search Trees Data may not fit into main memory
IP Routers – internal view
B+ Tree.
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
Presentation transcript:

February 12, 2007 WALCOM '2007 1/22 DiskTrie: An Efficient Data Structure Using Flash Memory for Mobile Devices N. M. Mosharaf Kabir Chowdhury Md. Mostofa Akbar M. Kaykobad

February 12, 2007 WALCOM '2007 2/22 Outline Problem statement Current status and motivation for a new solution Preliminaries DiskTrie Idea Results Limitations Future directions

February 12, 2007 WALCOM '2007 3/22 Problem Statement Let S be a static set of n unique finite strings with the following operations:  Lookup (str) – check if the string str belong to the set S  Prefix-Matching (P) – find all the elements in S that have the same prefix P The Problem: An efficient data structure that can operate in low-spec mobile devices and supports this definition

February 12, 2007 WALCOM '2007 4/22 Current Status At present, use of mobile devices and different sensor networks is increasing rapidly Mobile devices and embedded systems are characterized by –  Low processing power  Low memory (both internal and external)  Low power consumption Data structures and algorithms addressing these devices has huge application

February 12, 2007 WALCOM '2007 5/22 Motivation for a New Solution Use of external memory is necessary Popular external memory data structures for computer include String B-tree, Hierarchy of indexes etc. The problem is still not very well discussed in case of flash memory (Gal and Toledo) Looking for a more space-efficient (both internal and external) data structure that is still competitive in terms of time efficiency

February 12, 2007 WALCOM '2007 6/22 Flash Memory Common memory that is extensively used in mobile/handheld devices Unique read/write/erase behavior than other programmable memories NOR flash memory supports random access and provides byte level addressing NAND flash memory is faster and provides block level access

February 12, 2007 WALCOM '2007 7/22 Trie A trivial trie is an m-ary tree Keys are stored in the leaf level; each unique path from the root to a leaf corresponds to a unique key Its search time can be considered as O(1) Inner Nodes Leaf Nodes

February 12, 2007 WALCOM '2007 8/22 Binary Trie and Path Compression Binary encoding ensures every node to have a maximum degree of two Depth of the trie increases Path-compression is used to reduce this Binary Trie Path-compressed Binary Trie

February 12, 2007 WALCOM '2007 9/22 Patricia Trie & LPC Trie Patricia trie is similar to path-compressed one but needs less memory Finally, level and path-compressed trie reduces the depth but the trie itself does not remain binary anymore Nilsson and Tikkanen has shown that an LPC trie has expected average depth of Θ(log*n) Patricia Trie Level and Path-compressed Trie

February 12, 2007 WALCOM ' /22 DiskTrie Idea Static external memory implementation of the LPC- trie Pre-build the trie in a computer and then transfer it to flash memory Three distinct phases –  Creation in computer  Placement in flash memory  Retrieval

February 12, 2007 WALCOM ' /22 Creation and Placement All the strings are lexicographically sorted and placed contiguously in flash memory Nodes of the DiskTrie are placed separately from the strings and leaf nodes contain pointers to actual strings they represent Page boundaries are always maintained in case of NAND memory All the child nodes of a parent node are placed in sequence to reduce the number of pointers

February 12, 2007 WALCOM ' /22 Retrieval Deals with two types of operations:  Lookup  Prefix-Matching Lookup starts from the root and proceeds until the search string is exhausted Each time a single node is retrieved from the disk in case of NOR flash memory and a whole block for NAND type

February 12, 2007 WALCOM ' /22 Lookup Algorithm procedure Lookup (str) { - currentNode ← root - while ( str is not exhausted & currentNode is NOT a leaf node) - Select childNode using str - currentNode ← childNode - end while - if ( error ) - return false - end if - return CompareStrings (str, currentNode→str) }

February 12, 2007 WALCOM ' /22 Retrieval (Cont.) For Prefix-Matching operation, the searching takes place in two phases:  Identification of a prospective leaf node to find the longest common prefix  Identification of the sub-trie or tries that contain the results

February 12, 2007 WALCOM ' /22 Illustration of the Prefix-Matching Operation (a) ‘P’ ends in a node (b) ‘P’ ends in an arc

February 12, 2007 WALCOM ' /22 Prefix-Matching Algorithm procedure Prefix-Matching (P) { - currentNode ← root - while ( P is not exhausted & currentNode is NOT a leaf node) - Select childNode using str - currentNode ← childNode - end while - if ( error ) - return NULL - end if - lNode ← left-most node in the probable region - rNode ← right-most node in the probable region - return all strings in the range }

February 12, 2007 WALCOM ' /22 Results - Storage Requirement DiskTrie needs two sets of components to be stored in the external memory:  Actual Strings, and  The data structure itself Linear storage space to store all the key strings A Patricia trie holding n strings has (2n – 1) nodes Hence, storage requirement for the total data structure is also linear While storing the nodes, block boundaries must be maintained. It results in some wastage

February 12, 2007 WALCOM ' /22 Results ( Cont. ) - Complexity of the Operations Lookup  Fetch only those nodes from the disk that are on the path to the goal node  The number of disk accesses is bounded by the depth of the trie, which is in turn Θ(log*n). log*n is the iterative logarithm function and defined as,  log*1 = 0  log*n = 1 + log*(ceil (log n)); for n > 1  Minimal internal memory required

February 12, 2007 WALCOM ' /22 Results ( Cont. ) Prefix-Matching  Probable range of the strings starting with the same prefix is identified using methods similar to Lookup. It takes Θ(log*n) disk accesses  In case of a successful search, it takes O(n/B) more disk accesses to retrieve the resultant strings if NAND memory is used (B is the block read size)  Sorted placement of the strings saves a lot of string comparisons  Internal memory requirement is minimal

February 12, 2007 WALCOM ' /22 Limitations Wastage of space in each disk block while storing the DiskTrie nodes In some cases, same disk blocks are accessed more than once

February 12, 2007 WALCOM ' /22 Future Directions More efficient storage management, specially removing the inherent wastage to maintain boundary property Take advantage of spatial locality

February 12, 2007 WALCOM ' /22