1
Data Structures: Hash Table (aka Dictionary)
i206 Fall 2010
John Chuang
Some slides adapted from Marti Hearst, Brian Hayes, Andreas Veneris, Glenn Brookshear, Nicolas Christin, or Ion Stoica.
2
Outline
What is a data structure
Basic building blocks: arrays and linked lists
Data structures (uses, methods, performance):
- List, stack, queue
- Dictionary
- Tree
- Graph
3
Dictionary
Also known as hash table, lookup table, associative array, or map
A searchable collection of key-value pairs
- Each key is associated with one value
Many possible applications, e.g.,
- Address book
- Student record database
- Routing tables
- …
4
Dictionary Methods
get(k): if the dictionary has an entry with key k, return its associated value; else return null
put(k,v): insert entry (k,v) into the dictionary; if key k is not already in the dictionary, return null; else return the old value associated with k
remove(k): if the dictionary has an entry with key k, remove it from the dictionary and return its associated value; else return null
size()
isEmpty()
5
Dictionary in Python
Example:
>>> agents = {'006': 'Jack Giddings', '007': 'James Bond'}
>>> agents['001'] = 'Edward Donne'
>>> agents['007']
'James Bond'
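The dictionary methods listed on the previous slide map onto Python's built-in dict roughly as follows. This is a minimal sketch, not part of the original slides: the wrapper function names (get, put, remove, size, is_empty) are chosen here to mirror the slide, and None stands in for null.

```python
def get(d, k):
    """Return the value stored under k, or None if k is absent."""
    return d.get(k)

def put(d, k, v):
    """Store (k, v); return the old value for k, or None if k is new."""
    old = d.get(k)
    d[k] = v
    return old

def remove(d, k):
    """Delete k and return its value, or None if k is absent."""
    return d.pop(k, None)

def size(d):
    return len(d)

def is_empty(d):
    return len(d) == 0

agents = {'006': 'Jack Giddings', '007': 'James Bond'}
print(put(agents, '001', 'Edward Donne'))  # None (key was new)
print(get(agents, '007'))                  # James Bond
print(remove(agents, '006'))               # Jack Giddings
print(size(agents), is_empty(agents))      # 2 False
```

One caveat of this sketch: because None doubles as the "not found" result, a key whose stored value is None cannot be told apart from a missing key.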
6
Desirable Properties
Fast search, insert and delete (get, put, and remove)
Efficient space usage
Arrays (indexed directly by key): O(1) operations; not space efficient
Linked lists: O(n) operations; space efficient
Binary trees (when balanced): O(log n) operations; space efficient
Can we do better?
7
Hash Table
A generalization of an array that, if properly designed, can realize fast insert/delete/search operations with good space efficiency
- Average run-time of O(1)
- Worst-case run-time of O(n)
A hash table is:
- Good for storing and retrieving key-value pairs
- Not good for iterating through a list of items
8
Hash Table
A hash table consists of two major components:
- Bucket array (for storing entries)
- Hash function (for mapping keys to buckets)
[Figure: a table whose buckets are indexed by hash value 0 to 7, holding Obj1 (key=15), Obj2 (key=36), Obj3 (key=4), Obj4 (key=2), and Obj5 (key=1)]
9
Hash Table Design
Bucket array is an array A of size N, where each cell of A is considered a "bucket"
Hash function h maps each key k to an integer in the range [0, N-1]
Store entry (k,v) in the bucket A[h(k)]
Search for entry (k,v) in the bucket A[h(k)]
Choose hash function such that
- Can take arbitrary objects (keys) as input
- Hash computation is fast
- Key mappings evenly distributed across [0, N-1]
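The store and search steps above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: bucket_index, store, and search are hypothetical names, Python's built-in hash() is used as the underlying hash function, and collisions are deliberately ignored (a later entry simply overwrites an earlier one in the same bucket).

```python
N = 8                    # size of the bucket array
buckets = [None] * N     # bucket array A

def bucket_index(key, n):
    """h(k): map any hashable key to an integer in [0, n-1]."""
    return hash(key) % n

def store(key, value):
    # Store entry (k, v) in bucket A[h(k)]; overwrites on collision.
    buckets[bucket_index(key, N)] = (key, value)

def search(key):
    # Look only in bucket A[h(k)] and confirm the key really matches.
    entry = buckets[bucket_index(key, N)]
    if entry is not None and entry[0] == key:
        return entry[1]
    return None

store(15, "obj1")
store(36, "obj2")
print(search(15), search(36), search(99))

# Arbitrary hashable objects are accepted, and the index stays in range.
for key in ["alice", (3, 4), 2.5]:
    assert 0 <= bucket_index(key, N) < N
```

In CPython, string hashes are randomized per process, so the bucket chosen for a key such as "alice" can differ from run to run; only the range guarantee is stable.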
10
Example Hash Function
h(k) = k mod N
mod stands for modulo, the remainder of the division of two numbers. For example:
- 8 mod 5 = 3
- 9 mod 5 = 4
- 10 mod 5 = 0
- 15 mod 5 = 0
Observe that collisions are possible:
- Two different keys hash to the same value
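The arithmetic above can be checked directly with Python's % operator. The sketch below is illustrative only; the function name h follows the slide, and the grouping step is added here to make the collision visible.

```python
def h(k, n):
    """Example hash function from the slide: h(k) = k mod N."""
    return k % n

N = 5
keys = (8, 9, 10, 15)
for k in keys:
    print(f"{k} mod {N} = {h(k, N)}")

# Group keys by hash value; any group with more than one key is a collision.
by_index = {}
for k in keys:
    by_index.setdefault(h(k, N), []).append(k)

collisions = {i: ks for i, ks in by_index.items() if len(ks) > 1}
print(collisions)  # {0: [10, 15]}
```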
11
Example
h(k) = k mod 5
Insert (2,x): stored at index 2
Insert (21,y): stored at index 1
Insert (34,z): stored at index 4
Table after these three inserts:
index  key  value
0
1      21   y
2      2    x
3
4      34   z
Insert (54,w): There is a collision at array entry #4 ???
12
Dealing with Collisions
Hashing with Chaining: every hash table entry contains a pointer to a linked list of the keys that hash to the same entry
After Insert (54,w):
index  chain
0
1      (21,y)
2      (2,x)
3
4      (54,w) -> (34,z)
After Insert (101,x):
index  chain
0
1      (101,x) -> (21,y)
2      (2,x)
3
4      (54,w) -> (34,z)
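Chaining can be sketched in Python as follows. This is a minimal illustration rather than the slides' own code: the class names Node and ChainedHashTable and the helper chain() are hypothetical, keys are assumed to be integers so that h(k) = k mod N applies, and new entries go at the head of the chain.

```python
class Node:
    """One (key, value) entry in a bucket's singly linked chain."""
    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedHashTable:
    def __init__(self, n=5):
        self.buckets = [None] * n   # bucket array A of size N
        self.count = 0              # number of stored entries

    def _index(self, key):
        return key % len(self.buckets)   # h(k) = k mod N (integer keys)

    def get(self, key):
        node = self.buckets[self._index(key)]
        while node is not None:          # walk the chain for this bucket
            if node.key == key:
                return node.value
            node = node.next
        return None

    def put(self, key, value):
        i = self._index(key)
        node = self.buckets[i]
        while node is not None:          # key already present: replace value
            if node.key == key:
                old, node.value = node.value, value
                return old
            node = node.next
        self.buckets[i] = Node(key, value, self.buckets[i])  # insert at head
        self.count += 1
        return None

    def remove(self, key):
        i = self._index(key)
        prev, node = None, self.buckets[i]
        while node is not None:
            if node.key == key:
                if prev is None:
                    self.buckets[i] = node.next   # unlink the head node
                else:
                    prev.next = node.next         # unlink an interior node
                self.count -= 1
                return node.value
            prev, node = node, node.next
        return None

    def chain(self, i):
        """Return the keys in bucket i, head first (for inspection only)."""
        keys, node = [], self.buckets[i]
        while node is not None:
            keys.append(node.key)
            node = node.next
        return keys


table = ChainedHashTable(5)
for k, v in [(2, "x"), (21, "y"), (34, "z"), (54, "w"), (101, "x")]:
    table.put(k, v)
print(table.chain(4))  # [54, 34]
print(table.chain(1))  # [101, 21]
```

Note that put() in this sketch first scans the chain so that it can return the old value for an existing key; an insert that skips this check and always adds at the head does constant work after hashing.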
13
Dictionary Performance
What is the run time for insert/search/delete?
- Insert: It takes O(1) time to compute the hash function and insert at the head of the linked list
- Search: It is proportional to the length of the linked list
- Delete: Same as search
14
Load Factor
Average length of the linked lists is a function of the load factor
- Load factor = number of items in hash table / array size
In Python, the implementation details are hidden from the programmer (load factor kept under 2/3)
In Java, the programmer can set the initial table size (default = 16) and/or the load factor (default = 0.75)
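To make the definition concrete, the sketch below computes the load factor of a small chained table and rebuilds it with more buckets once a threshold is exceeded. It is an illustration under assumptions: chains are modeled as Python lists of (key, value) pairs, the names load_factor, rehash, and MAX_LOAD are hypothetical, the 0.75 threshold borrows the Java default quoted above, and doubling the array is one common growth policy chosen here for simplicity.

```python
def load_factor(num_items, num_buckets):
    """Load factor = number of items in the table / array size."""
    return num_items / num_buckets

def rehash(buckets, new_size):
    """Rebuild the table with new_size buckets using h(k) = k mod new_size."""
    new_buckets = [[] for _ in range(new_size)]
    for chain in buckets:
        for key, value in chain:
            new_buckets[key % new_size].append((key, value))
    return new_buckets

MAX_LOAD = 0.75  # assumed threshold for this sketch

# Five entries in five buckets, each chain stored as a list.
buckets = [[], [(101, "x"), (21, "y")], [(2, "x")], [], [(54, "w"), (34, "z")]]
num_items = sum(len(chain) for chain in buckets)

before = load_factor(num_items, len(buckets))
if before > MAX_LOAD:
    buckets = rehash(buckets, 2 * len(buckets))
after = load_factor(num_items, len(buckets))

print(before, after)  # 1.0 0.5
```

With chaining, the load factor equals the chain length averaged over all buckets, so keeping it bounded keeps the expected search cost bounded as well. Resizing lowers the average but does not guarantee that any particular chain gets shorter.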
15
Distributed Hash Table (DHT)
Similar to the traditional hash table data structure, except data is stored in distributed peer nodes
- Each node is analogous to a bucket in a hash table
Applications: distributed search, e.g., peer-to-peer networks, content distribution networks (CDNs), etc.
DHTs are typically designed to scale to large numbers of nodes and to handle continual node arrivals and departures/failures.
Put(), Get() interface like a regular hash table:
- put(id, item);
- item = get(id);
16
DHT Example: Chord
Nodes: n1(id=1), n2(id=2), n3(id=0), n4(id=6)
Items inserted: f1(id=7), f2(id=1)
Identifier ring: 0 1 2 3 4 5 6 7
Finger table of n3 (id=0); items stored: 7
i  id+2^i  succ
0  1       1
1  2       2
2  4       6
Finger table of n1 (id=1); items stored: 1
i  id+2^i  succ
0  2       2
1  3       6
2  5       6
Finger table of n2 (id=2)
i  id+2^i  succ
0  3       6
1  4       6
2  6       6
Finger table of n4 (id=6)
i  id+2^i  succ
0  7       0
1  0       0
2  2       2
Slide adapted from Ion Stoica, Nicolas Christin
17
DHT Example: Chord
Upon receiving a query for item id, a node:
- Checks whether the item is stored locally
- If not, forwards the query to the largest node in its successor table that does not exceed id
[Figure: the ring and finger tables from the previous slide, with query(7) being routed]
Slide adapted from Ion Stoica, Nicolas Christin
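The forwarding rule can be traced with a small simulation. This is a hedged sketch, not the Chord reference algorithm: the finger tables and item placement are transcribed from the example (node ids 0, 1, 2, 6), the names FINGERS, ITEMS, clockwise, and lookup are hypothetical, and two assumptions are made that the slide does not spell out. First, "does not exceed id" is interpreted on the ring, i.e. by clockwise distance modulo 8. Second, when no finger qualifies, the query goes to the node's immediate successor, which is then treated as the node responsible for the id. The starting node for query(7) is also an assumption.

```python
RING = 8  # identifier space 0..7, as in the example

# succ column of each node's finger table, keyed by node id.
FINGERS = {0: [1, 2, 6], 1: [2, 6, 6], 2: [6, 6, 6], 6: [0, 0, 2]}
# Items held by each node (each item sits at the successor of its id).
ITEMS = {0: {7}, 1: {1}, 2: set(), 6: set()}

def clockwise(a, b):
    """Clockwise distance from id a to id b on the ring."""
    return (b - a) % RING

def lookup(start, item_id):
    """Route a query hop by hop; return (nodes visited, whether item was found)."""
    node, path = start, [start]
    while item_id not in ITEMS[node]:
        target = clockwise(node, item_id)
        # Fingers lying between this node (exclusive) and the item id (inclusive).
        ahead = [f for f in FINGERS[node] if 0 < clockwise(node, f) <= target]
        if ahead:
            # Forward to the farthest finger that does not pass the item id.
            node = max(ahead, key=lambda f: clockwise(node, f))
            path.append(node)
        else:
            # No finger qualifies: hand over to the immediate successor and stop.
            node = FINGERS[node][0]
            path.append(node)
            break
    return path, item_id in ITEMS[node]

print(lookup(1, 7))  # ([1, 6, 0], True)
```

Starting at node 1, the query for item 7 is forwarded to node 6 and then to node 0, which stores the item. A plain numeric comparison at node 6 would pick node 2 instead, which is why the sketch compares ring distances.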