Apr 17, 2013 Persistent Data Structures

Presentation transcript:

Apr 17, 2013 Persistent Data Structures

Definitions
An immutable data structure is one that, once created, cannot be modified.
Immutable data structures can (usually) be copied, with modifications, to create a new version.
The modified version takes up as much memory as the original version.
A persistent data structure is one that, when modified, retains both the old and the new values.
Persistent data structures are effectively immutable, in that prior references to them do not see any change.
Modifying a persistent data structure may copy part of the original, but the new version shares memory with the original.
This definition is unrelated to persistent storage, which means keeping a copy of data on disk between program executions.

Why persistent data structures?
Functional programming is based on the idea of immutable data, or persistent data, which is effectively immutable.
The use of immutable data structures greatly simplifies concurrent programming.
Synchronization is expensive, and immutable data structures don’t need to be synchronized.
Copying large data structures is expensive and wastes space, but persistent data structures can use sophisticated structure sharing to reduce the cost.

Lists
Lists are the original persistent data structures, and are very heavily used in functional programming.
[Figure: an original list with elements x, y, z, and the lists that result from “insert w” and “delete x”]
As you can see, persistence is automatic with a list, and requires no additional effort.
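The list slide can be sketched in a few lines of Python. This is only an illustration, assuming that w is inserted at the front and that x is the first element; the names `Cell`, `from_items`, and `to_list` are made up for the example and do not come from the slides.

```python
from typing import NamedTuple, Optional


class Cell(NamedTuple):
    """One immutable list cell: a value plus a reference to the rest of the list."""
    head: object
    tail: Optional["Cell"]


def from_items(*items):
    """Build a linked list from the given items (None is the empty list)."""
    lst = None
    for item in reversed(items):
        lst = Cell(item, lst)
    return lst


def to_list(lst):
    """Collect the values of a linked list into a Python list, for display."""
    out = []
    while lst is not None:
        out.append(lst.head)
        lst = lst.tail
    return out


original = from_items("x", "y", "z")
inserted = Cell("w", original)  # "insert w": one new cell, everything else shared
deleted = original.tail         # "delete x": simply the tail of the original

print(to_list(original), to_list(inserted), to_list(deleted))
```

Neither operation touches the original cells, so every earlier reference to `original` still sees x, y, z.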

Trees and binary trees
Trees and binary trees can also be implemented in a persistent fashion, though it takes a bit more work.
[Figure: a tree with nodes A through N, plus new nodes A’, C’, and G’ belonging to the modified version]
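One common way to get a persistent tree is path copying: an update creates new nodes only along the path from the root to the change and reuses every other subtree. The sketch below applies this to a binary search tree; `Node`, `insert`, and `keys` are hypothetical names, and the slides do not say which technique their figure uses.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    """An immutable binary search tree node."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(node, key):
    """Return a new tree that contains key; the input tree is left unchanged."""
    if node is None:
        return Node(key)
    if key < node.key:
        return node._replace(left=insert(node.left, key))
    if key > node.key:
        return node._replace(right=insert(node.right, key))
    return node  # key already present: reuse the whole subtree


def keys(node):
    """In-order list of keys."""
    if node is None:
        return []
    return keys(node.left) + [node.key] + keys(node.right)


old = None
for k in (50, 30, 70, 20, 40, 60, 80):
    old = insert(old, k)

new = insert(old, 65)  # copies only the nodes 50, 70, and 60 on the search path
```

Here `new.left is old.left` holds: the entire left subtree is shared rather than copied.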

Arrays and vectors
It’s more difficult to implement a persistent array.
The programming language Clojure implements persistent vectors, which are like arrays but can be expanded.
Any location in a vector can be accessed in (almost) O(1) time.
Vectors are represented as “fat trees,” or more precisely, as 32-tries.

Tries
A trie is like a binary search tree, only each node may have many children.
Tries are most often used with strings (and then have up to 26 children per node, one per letter).
Each node of a 32-trie may have 32 children.
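For readers who have not met tries before, here is a minimal string trie. It is an ordinary mutable trie, not a persistent one, and the names `TrieNode`, `add`, and `contains` are invented for this sketch.

```python
class TrieNode:
    """A trie node with one child per following letter."""

    def __init__(self):
        self.children = {}    # letter -> TrieNode
        self.is_word = False  # True if a stored word ends at this node


def add(root, word):
    """Insert word, creating nodes along its path as needed."""
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_word = True


def contains(root, word):
    """Return True only if word was added as a complete word."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.is_word


root = TrieNode()
for w in ("car", "cart", "cat"):
    add(root, w)
```

The three words share the nodes for "c" and "a", which is the same prefix sharing that a 32-trie exploits with 5-bit groups instead of letters.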

Vector implementation I
A persistent vector in Clojure is implemented as an N-level trie (N <= 7), where the root and internal nodes are arrays of 32 references, and the leaves are arrays of 32 values.
The depth of the trie (1 to 7) is also kept as an instance value.
For example, consider accessing location 5000 in a vector.
5000 decimal is 1001110001000 in binary, or 00000 00100 11100 01000 when padded to four groups of 5 bits.
To access element 5000 in a trie of depth 4 (counting from 0):
  Group 4 (00000) says to take the 0th reference.
  Group 3 (00100) says to take the 4th reference.
  Group 2 (11100) says to take the 28th reference.
  Group 1 (01000) says to take the 8th value.
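The index arithmetic can be checked with a short sketch. It models trie nodes as plain Python lists, which is an assumption made for illustration and not Clojure's actual representation; `digits`, `lookup`, and `single_path_trie` are hypothetical helpers.

```python
BITS = 5            # each trie level consumes 5 bits of the index
WIDTH = 1 << BITS   # 32 slots per node
MASK = WIDTH - 1


def digits(index, depth):
    """Split index into `depth` 5-bit groups, most significant group first."""
    return [(index >> (BITS * level)) & MASK for level in range(depth - 1, -1, -1)]


def lookup(root, depth, index):
    """Follow one slot per level from the root down to the stored value."""
    node = root
    for d in digits(index, depth):
        node = node[d]
    return node


def single_path_trie(index, depth, value):
    """Demo helper: a trie that holds only `value` at `index` (other slots are None)."""
    node = value
    for d in reversed(digits(index, depth)):
        level = [None] * WIDTH
        level[d] = node
        node = level
    return node


print(format(5000, "020b"))  # the 20 bits that are split into four groups
print(digits(5000, 4))       # the slot to take at each level, root first
```

Recombining the groups confirms the split: ((0 * 32 + 4) * 32 + 28) * 32 + 8 = 5000.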

Vector implementation II
The trie can be treated as a “fat tree,” with the structure sharing discussed earlier.
Because the trie is fat (many children per node), there is a high proportion of actual data to structure.
Access time is “almost” O(1): a lookup follows one reference per level, so as the size increases, the constant factor grows from 1 to 7 (the depth of the trie).
This design is especially good for appending vectors.
For adding single elements to the end of the vector, there are additional special-case optimizations.
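Structure sharing in a fat trie can be sketched with path copying over nested tuples. This is a simplified model, not Clojure's implementation: it leaves out the append optimizations mentioned above, and `build`, `get`, and `assoc` are names chosen for the example.

```python
BITS = 5
WIDTH = 1 << BITS   # 32 slots per node
MASK = WIDTH - 1


def build(depth, start=0):
    """Build a full trie of the given depth whose value at index i is i."""
    if depth == 1:
        return tuple(range(start, start + WIDTH))
    span = WIDTH ** (depth - 1)
    return tuple(build(depth - 1, start + i * span) for i in range(WIDTH))


def get(node, depth, index):
    """Read the value at index by taking one 5-bit slot per level."""
    for level in range(depth - 1, -1, -1):
        node = node[(index >> (BITS * level)) & MASK]
    return node


def assoc(node, depth, index, value):
    """Return a new trie with index set to value, copying one node per level."""
    slot = (index >> (BITS * (depth - 1))) & MASK
    child = value if depth == 1 else assoc(node[slot], depth - 1, index, value)
    return node[:slot] + (child,) + node[slot + 1:]


old = build(2)                       # 1024 elements in a trie of depth 2
new = assoc(old, 2, 100, "changed")  # copies the root and one leaf only
shared = sum(new[i] is old[i] for i in range(WIDTH))
```

After the update, 31 of the 32 leaves are shared between `old` and `new`, and `old` still reads 100 at index 100.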

Persistent Hash Map
Since (in Java and Clojure) a hash code is a 32-bit integer, a hash map could be implemented just like a vector.
For a vector, the additional space required for the trie structure is a reasonable proportion of the total space.
For a hash map, the additional space required is not reasonable.
There will be a large number of 32-element arrays which contain mostly nulls.
The hard part is to use only as much space as needed.
Basic approach:
  Use arrays of size N <= 32, where N is the number of non-null children.
  Use a 32-bit word to indicate which children are actually present.
  For example, a word with exactly five 1 bits indicates 5 children.
  Find a fast function to map numbers in the range [0, 31] into the range [0, N).
  Many processors have an instruction to count the number of 1 bits in a word.
This would make a good assignment for the next time I teach this course.
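One possible mapping function, sketched under the assumption that bit i of the 32-bit word is set when child i is present: the position of child i in the compact array is the number of 1 bits below bit i. The names `popcount`, `compact_index`, and `child_at` are hypothetical, and the bitmap is an arbitrary example rather than the one from the slide.

```python
def popcount(word):
    """Count the 1 bits in word (the job of a hardware popcount instruction)."""
    return bin(word).count("1")


def compact_index(bitmap, slot):
    """Map a logical slot in [0, 31] to a position in [0, N), or None if absent."""
    bit = 1 << slot
    if (bitmap & bit) == 0:
        return None
    # Present children in lower slots come first in the compact array.
    return popcount(bitmap & (bit - 1))


def child_at(bitmap, children, slot):
    """Look up the child stored for a logical slot, or None if there is none."""
    pos = compact_index(bitmap, slot)
    return None if pos is None else children[pos]


# Example node: children present at logical slots 1, 4, 9, 17, and 30.
bitmap = (1 << 1) | (1 << 4) | (1 << 9) | (1 << 17) | (1 << 30)
children = ["a", "b", "c", "d", "e"]  # only 5 entries instead of 32
```

With this layout, `child_at(bitmap, children, 9)` finds the third entry, because two present slots (1 and 4) lie below slot 9.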

The End
Now this is not the end. It is not even the beginning of the end. But it is, perhaps, the end of the beginning.
--Sir Winston Churchill, Speech in November 1942