Dictionaries, Hash Tables and Sets Dictionaries, Hash Tables, Hashing, Collisions, Sets SoftUni Team Technical Trainers Software University

Slides:



Advertisements
Similar presentations
Dictionaries, Hash Tables, Collisions Resolution, Sets Svetlin Nakov Telerik Corporation
Advertisements

Hashing as a Dictionary Implementation
Hashing Chapters What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function.
Sets and Maps Chapter 9. Chapter 9: Sets and Maps2 Chapter Objectives To understand the Java Map and Set interfaces and how to use them To learn about.
1 CSE 326: Data Structures Hash Tables Autumn 2007 Lecture 14.
Hashing General idea: Get a large array
Introducing Hashing Chapter 21 Copyright ©2012 by Pearson Education, Inc. All rights reserved.
(c) University of Washingtonhashing-1 CSC 143 Java Hashing Set Implementation via Hashing.
Software Quality Assurance QA Engineering, Testing, Bug Tracking, Test Automation Software University Technical Trainers SoftUni Team.
FEN 2012UCN Technology - Computer Science 1 Data Structures and Collections Principles revisited.NET: –Two libraries: System.Collections System.Collections.Generics.
C# Advanced Topics Methods, Classes and Objects SoftUni Team Technical Trainers Software University
AngularJS Services Built-in and Custom Services SoftUni Team Technical Trainers Software University
Methods Writing and using methods, overloads, ref, out SoftUni Team Technical Trainers Software University
Software University Curriculum, Courses, Exams, Jobs SoftUni Team Technical Trainers Software University
Dictionaries, Hash Tables, Hashing, Collisions, Sets Svetlin Nakov Telerik Software Academy academy.telerik.com Technical Trainer
Software Testing Lifecycle Exit Criteria Evaluation, Continuous Integration Ivan Yonkov Technical Trainer Software University.
NoSQL Databases NoSQL Concepts SoftUni Team Technical Trainers Software University
Conditional Statements Implementing Control-Flow Logic in C# SoftUni Team Technical Trainers Software University
Loops Repeating Code Multiple Times SoftUni Team Technical Trainers Software University
Methods, Arrays, Lists, Dictionaries, Strings, Classes and Objects
Svetlin Nakov Technical Trainer Software University
Build Processes and Continuous Integration Automating Build Processes Software University Technical Trainers SoftUni Team.
Multidimensional Arrays, Sets, Dictionaries Processing Matrices, Multidimensional Arrays, Dictionaries, Sets SoftUni Team Technical Trainers Software University.
Test-Driven Development Learn the "Test First" Approach to Coding SoftUni Team Technical Trainers Software University
Hashing Hashing is another method for sorting and searching data.
Java Collections Basics Arrays, Lists, Strings, Sets, Maps Svetlin Nakov Technical Trainer Software University
Graphs and Graph Algorithms Fundamentals, Terminology, Traversal, Algorithms SoftUni Team Technical Trainers Software University
Hashing as a Dictionary Implementation Chapter 19.
Arrays, Lists, Stacks, Queues Processing Sequences of Elements SoftUni Team Technical Trainers Software University
Chapter 11 Hash Tables © John Urrutia 2014, All Rights Reserved1.
C# Basics Course Introduction Svetlin Nakov Technical Trainer Software University
Computational Complexity of Fundamental Data Structures, Choosing a Data Structure.
Forms Overview, Query string, Submitting arrays, PHP & HTML, Input types, Redirecting the user Mario Peshev Technical Trainer Software.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
CHAPTER 9 HASH TABLES, MAPS, AND SKIP LISTS ACKNOWLEDGEMENT: THESE SLIDES ARE ADAPTED FROM SLIDES PROVIDED WITH DATA STRUCTURES AND ALGORITHMS IN C++,
Exam Preparation Algorithms Course: Sample Exam SoftUni Team Technical Trainers Software University
Dictionaries, Hash Tables, Collisions Resolution, Sets Svetlin Nakov Telerik Corporation
Java Collections Basics Arrays, Lists, Strings, Sets, Maps Bogomil Dimitrov Technical Trainer Software University
Design Patterns: Structural Design Patterns General and reusable solutions to common problems in software design Software University
Advanced C# Course Introduction SoftUni Team Technical Trainers Software University
Reflection Programming under the hood SoftUni Team Technical Trainers Software University
Mocking with Moq Tools for Easier Unit Testing SoftUni Team Technical Trainers Software University
1 Principles revisited.NET: Two libraries: System.Collections System.Collections.Generics Data Structures and Collections.
Operators and Expressions
Design Patterns: Behavioral Design Patterns General and reusable solutions to common problems in software design Software University
Sets and Maps Chapter 9. Chapter Objectives  To understand the Java Map and Set interfaces and how to use them  To learn about hash coding and its use.
Mocking Unit Testing Methods with External Dependencies SoftUni Team Technical Trainers Software University
Mocking with Moq Mocking tools for easier unit testing Svetlin Nakov Technical Trainer Software University
Programming for Beginners Course Introduction SoftUni Team Technical Trainers Software University
Objects and Classes Using Objects and Classes Defining Simple Classes SoftUni Team Technical Trainers Software University
Sets, Dictionaries SoftUni Team Technical Trainers Software University
Lists and Matrices Lists: Variable-Size Arrays Matrices: Arrays of Arrays (Tables) SoftUni Team Technical Trainers Software University
SoftUni Team Technical Trainers Software University Trees and Tree-Like Structures Trees, Tree-Like Structures, Binary Search Trees,
Advanced Tree Structures Binary Trees, AVL Tree, Red-Black Tree, B-Trees, Heaps SoftUni Team Technical Trainers Software University
Functional Programming Data Aggregation and Nested Queries Ivan Yonkov Technical Trainer Software University
First Steps in PHP Creating Very Simple PHP Scripts SoftUni Team Technical Trainers Software University
Inheritance Class Hierarchies SoftUni Team Technical Trainers Software University
Stacks and Queues Processing Sequences of Elements SoftUni Team Technical Trainers Software University
Generics SoftUni Team Technical Trainers Software University
Sets and Maps Chapter 9.
Sets, Hash table, Dictionaries
Multi-Dictionaries, Nested Dictionaries, Sets
Repeating Code Multiple Times
Fast String Manipulation
Processing Variable-Length Sequences of Elements
Combining Data Structures
Multidimensional Arrays, Sets, Dictionaries
Slides by Steve Armstrong LeTourneau University Longview, TX
Hash Tables, Sets and Dictionaries
Sets and Maps Chapter 9.
Presentation transcript:

Dictionaries, Hash Tables and Sets Dictionaries, Hash Tables, Hashing, Collisions, Sets SoftUni Team Technical Trainers Software University

Table of Contents 1.Dictionary (Map) Abstract Data Type 2.Hash Tables, Hashing and Collision Resolution 3.Dictionary Class 4.Sets: HashSet and SortedSet 2

Dictionaries Data Structures that Map Keys to Values

4  The abstract data type (ADT) "dictionary" maps key to values  Also known as "map" or "associative array"  Holds a set of {key, value} pairs  Dictionary ADT operations:  Add(key, value)  FindByKey(key)  value  Delete(key)  Many implementations  Hash table, balanced tree, list, array,... The Dictionary (Map) ADT

5  Sample dictionary: ADT Dictionary – Example KeyValue C# Modern general-purpose object-oriented programming language for the Microsoft.NET platform PHP Popular server-side scripting language for Web development compiler Software that transforms a computer program to executable machine code ……

Hash Tables What is Hash Table? How it Works?

7  A hash table is an array that holds a set of {key, value} pairshash table  The process of mapping a key to a position in a table is called hashing Hash Table …………………… …m-1 T h(k) Hash table of size m Hash function h : k → 0 … m-1

8  A hash table has m slots, indexed from 0 to m-1  A hash function h(k) maps the keys to positions:  h: k → 0 … m-1  For arbitrary value k in the key range and some hash function h we have h(k) = p and 0 ≤ p < m Hash Functions and Hashing …………………… …m-1 T h(k)

9  Perfect hashing function (PHF)  h(k) : one-to-one mapping of each key k to an integer in the range [0, m-1]  The PHF maps each key to a distinct integer within some manageable range  Finding a perfect hashing function is impossible in most cases  More realistically  Hash function h(k) that maps most of the keys onto unique integers, but not all Hashing Functions

10  A collision comes when different keys have the same hash value h(k 1 ) = h(k 2 ) for k 1 ≠ k 2  When the number of collisions is sufficiently small, the hash tables work quite well (fast)  Several collisions resolution strategies exist  Chaining collided keys (+ values) in a list  Re-hashing (second hash function)  Using the neighbor slots (linear probing)  Many other Collisions in a Hash Table

11 h (" Pesho ") = 4 h (" Kiro ") = 2 h (" Mimi ") = 1 h (" Ivan ") = 2 h (" Lili ") = m-1 Collision Resolution: Chaining Ivan null collision Chaining the elements in case of collisionnullMimiKironullPesho…Lili 01234…m-1 T null

12  Open addressing as collision resolution strategy means to take another slot in the hash-table in case of collision, e.g.  Linear probing: take the next empty slot just after the collision  h(key, i) = h(key) + i  where i is the attempt number: 0, 1, 2, …  Quadratic probing: the i th next slot is calculated by a quadratic polynomial ( c 1 and c 2 are some constants)  h(key, i) = h(key) + c 1 *i + c 2 *i 2  Re-hashing: use separate (second) hash-function for collisions  h(key, i) = h 1 (key) + i*h 2 (key) Collision Resolution: Open Addressing

13  The load factor (fill factor) = used cells / all cells  How much the hash table is filled, e.g. 65%  Smaller fill factor leads to:  Less collisions (faster average seek time)  More memory consumption  Recommended fill factors:  When chaining is used as collision resolution  less than 75%  When open addressing is used  less than 50% How Big the Hash-Table Should Be?

14 Adding Item to Hash Table With Chaining Ivan null nullMimiKironullnull…Lili 01234…m-1 T Add("Tanio") hash("Tanio") % m = 3 Fill factor >= 75%? Resize & rehash yes no map[3] == null? Insert("Tanio") Initiliaze linked list null yes

15 Lab Exercise Implement a Hash-Table with Chaining as Collision Resolution

16  The hash-table performance depends on the probability of collisions  Less collisions  faster add / find / delete operations  How to implement a good (efficient) hash function?  A good hash-function should distribute the input values uniformly  The hash code calculation process should be fast  Integer n  use n as hash value ( n % size as hash-table slot)  Real number r  use the bitwise representation of r  String s  use a formula over the Unicode representation of s Implementing a Good Hash Function

17  All C# / Java objects already have GetHashCode() method  Primitive types like int, long, float, double, decimal, …  Built-in types like: string, DateTime and Guid Built-In Hash Functions in C# / Java int c, hash1 = (5381<<16) ; int hash2 = hash1; char *s = src; while ((c = s[0]) != 0) { hash1 = ((hash1 << 5) + hash1) ^ c; hash1 = ((hash1 << 5) + hash1) ^ c; c = s[1]; c = s[1]; if (c == 0) if (c == 0) break; break; hash2 = ((hash2 << 5) + hash2) ^ c; hash2 = ((hash2 << 5) + hash2) ^ c; s += 2; s += 2;} return hash1 + (hash2 * ); Hash function for System.String

18  What if we have a composite key  E.g. FirstName + MiddleName + LastName ? 1. Convert keys to string and get its hash code: 2. Use a custom hash-code function: Hash Functions on Composite Keys var hashCode = (this.FirstName != null ? this.FirstName.GetHashCode() : 0); hashCode = (hashCode * 397) ^ (this.MiddleName != null ? this.MiddleName.GetHashCode() : 0); this.MiddleName.GetHashCode() : 0); hashCode = (hashCode * 397) ^ (this.LastName != null ? this.LastName.GetHashCode() : 0); this.LastName.GetHashCode() : 0); return hashCode; var key = string.Format("{0}-{1}-{2}", FirstName, MiddleName, LastName);

19  Hash table efficiency depends on:  Efficient hash-functions  Most implementations use the built-in hash-functions in C# / Java  Collisions should be as low as possible  Fill factor (used buckets / all buckets)  Typically 70% fill  resize and rehash  Avoid frequent resizing! Define the hash table capacity in advance  Collisions resolution algorithm  Most implementations use chaining with linked list Hash Tables and Efficiency

20  Hash tables are the most efficient dictionary implementation  Add / Find / Delete take just few primitive operations  Speed does not depend on the size of the hash-table  Amortized complexity O(1) – constant time  Example:  Finding an element in a hash-table holding elements takes average just 1-2 steps  Finding an element in an array holding elements takes average steps Hash Tables and Efficiency

Hash Tables in C# The Dictionary Class

22 Dictionaries:.NET Interfaces and Classes

23  Implements the ADT dictionary as hash table  The size is dynamically increased as needed  Contains a collection of key-value pairs  Collisions are resolved by chaining  Elements have almost random order  Ordered by the hash code of the key  Dictionary relies on:  Object.Equals() – compares the keys  Object.GetHashCode() – calculates the hash codes of the keys Dictionary

24  Major operations:  Add(key, value) – adds an element by key + value  Remove(key) – removes a value by key  this[key] = value – add / replace element by key  this[key] – gets an element by key  Clear() – removes all elements  Count – returns the number of elements  Keys – returns a collection of all keys (in order of entry)  Values – returns a collection of all values (in order of entry) Dictionary (2) Exception when the key already exists Returns true / false Exception on non-existing key

25 Dictionary (3)  Major operations:  ContainsKey(key) – checks if given key exists in the dictionary  ContainsValue(value) – checks whether the dictionary contains given value  Warning: slow operation – O(n)  TryGetValue(key, out value)  If the key is found, returns it in the value parameter  Otherwise returns false

26 Dictionary – Example var studentGrades = new Dictionary (); studentGrades.Add("Ivan", 4); studentGrades.Add("Peter", 6); studentGrades.Add("Maria", 6); studentGrades.Add("George", 5); int peterGrade = studentGrades["Peter"]; Console.WriteLine("Peter's grade: {0}", peterGrade); Console.WriteLine("Is Peter in the hash table: {0}", studentsGrades.ContainsKey("Peter")); studentsGrades.ContainsKey("Peter")); Console.WriteLine("Students and their grades:"); foreach (var pair in studentsGrades) { Console.WriteLine("{0} --> {1}", pair.Key, pair.Value); Console.WriteLine("{0} --> {1}", pair.Key, pair.Value);}

Dictionary Live Demo

28 Counting the Words in a Text string text = "a text, some text, just some text"; var wordsCount = new Dictionary (); string[] words = text.Split(' ', ',', '.'); foreach (string word in words) { int count = 1; int count = 1; if (wordsCount.ContainsKey(word)) if (wordsCount.ContainsKey(word)) count = wordsCount[word] + 1; count = wordsCount[word] + 1; wordsCount[word] = count; wordsCount[word] = count;} foreach(var pair in wordsCount) { Console.WriteLine("{0} -> {1}", pair.Key, pair.Value); Console.WriteLine("{0} -> {1}", pair.Key, pair.Value);}

Counting the Words in a Text Live Demo

30  Data structures can be nested, e.g. dictionary of lists: Dictionary > Nested Data Structures static Dictionary > studentGrades = new Dictionary >(); new Dictionary >(); private static void AddGrade(string name, int grade) { if (! studentGrades.ContainsKey(name)) if (! studentGrades.ContainsKey(name)) { studentGrades[name] = new List (); studentGrades[name] = new List (); } studentGrades[name].Add(grade); studentGrades[name].Add(grade);}

31 Nested Data Structures (2) var countriesAndCities = new Dictionary >(); new Dictionary >(); countriesAndCities["Bulgaria"] = new Dictionary ()); countriesAndCities["Bulgaria"]["Sofia"] = ; countriesAndCities["Bulgaria"]["Plovdiv"] = ; countriesAndCities["Bulgaria"]["Pernik"] = 30000; foreach (var city in countriesAndCities["Bulgaria"]) { Console.WriteLine("{0} : {1}", city.Key, city.Value); Console.WriteLine("{0} : {1}", city.Key, city.Value);} var totalPopulation = countriesAndCities["Bulgaria"].Sum(c => c.Value);.Sum(c => c.Value);Console.WriteLine(totalPopulation);

Dictionary of Lists Live Demo

Balanced Tree-Based Dictionaries The Sorted Dictionary Class

34  SortedDictionary implements the ADT "dictionary" as self-balancing search tree  Elements are arranged in the tree ordered by key  Traversing the tree returns the elements in increasing order  Add / Find / Delete perform log 2 (n) operations  Use SortedDictionary when you need the elements sorted by key  Otherwise use Dictionary – it has better performance SortedDictionary

35 Counting Words (Again) string text = "a text, some text, just some text"; IDictionary wordsCount = new SortedDictionary (); new SortedDictionary (); string[] words = text.Split(' ', ',', '.'); foreach (string word in words) { int count = 1; int count = 1; if (wordsCount.ContainsKey(word)) if (wordsCount.ContainsKey(word)) count = wordsCount[word] + 1; count = wordsCount[word] + 1; wordsCount[word] = count; wordsCount[word] = count;} foreach(var pair in wordsCount) { Console.WriteLine("{0} -> {1}", pair.Key, pair.Value); Console.WriteLine("{0} -> {1}", pair.Key, pair.Value);}

Counting the Words in a Text Live Demo

Comparing Dictionary Keys Using Custom Key Classes in Dictionary and SortedDictionary

38  Dictionary relies on  Object.Equals() – for comparing the keys  Object.GetHashCode() – for calculating the hash codes of the keys  SortedDictionary relies on IComparable for ordering the keys  Built-in types like int, long, float, string and DateTime already implement Equals(), GetHashCode() and IComparable  Other types used when used as dictionary keys should provide custom implementations IComparable

39 Implementing Equals() and GetHashCode() public struct Point { public int X { get; set; } public int X { get; set; } public int Y { get; set; } public int Y { get; set; } public override bool Equals(Object obj) public override bool Equals(Object obj) { if (!(obj is Point) || (obj == null)) return false; if (!(obj is Point) || (obj == null)) return false; Point p = (Point)obj; Point p = (Point)obj; return (X == p.X) && (Y == p.Y); return (X == p.X) && (Y == p.Y); } public override int GetHashCode() public override int GetHashCode() { return (X > 16) ^ Y; return (X > 16) ^ Y; }}

40 Implementing IComparable public struct Point : IComparable public struct Point : IComparable { public int X { get; set; } public int X { get; set; } public int Y { get; set; } public int Y { get; set; } public int CompareTo(Point otherPoint) public int CompareTo(Point otherPoint) { if (X != otherPoint.X) if (X != otherPoint.X) { return this.X.CompareTo(otherPoint.X); return this.X.CompareTo(otherPoint.X); } else else { return this.Y.CompareTo(otherPoint.Y); return this.Y.CompareTo(otherPoint.Y); } }}

Sets Sets of Elements

42  The abstract data type (ADT) "set" keeps a set of elements with no duplicates  Sets with duplicates are also known as ADT "bag"  Set operations:  Add(element)  Contains(element)  true / false  Delete(element)  Union(set) / Intersect(set)  Sets can be implemented in several ways  List, array, hash table, balanced tree,... Set and Bag ADTs

43 Sets:.NET Interfaces and Implementations

44  HashSet implements ADT set by hash table  Elements are in no particular order  All major operations are fast:  Add(element) – appends an element to the set  Does nothing if the element already exists  Remove(element) – removes given element  Count – returns the number of elements  UnionWith(set) / IntersectWith(set) – performs union / intersection with another set HashSet

45 HashSet – Example ISet firstSet = new HashSet ( new string[] { "SQL", "Java", "C#", "PHP" }); new string[] { "SQL", "Java", "C#", "PHP" }); ISet secondSet = new HashSet ( new string[] { "Oracle", "SQL", "MySQL" }); new string[] { "Oracle", "SQL", "MySQL" }); ISet union = new HashSet (firstSet); union.UnionWith(secondSet); foreach (var element in union) { Console.Write("{0} ", element); Console.Write("{0} ", element);}Console.WriteLine();

46  SortedSet implements ADT set by balanced search tree (red-black tree)  Elements are sorted in increasing order  Example: SortedSet ISet firstSet = new SortedSet ( new string[] { "SQL", "Java", "C#", "PHP" }); new string[] { "SQL", "Java", "C#", "PHP" }); ISet secondSet = new SortedSet ( new string[] { "Oracle", "SQL", "MySQL" }); new string[] { "Oracle", "SQL", "MySQL" }); ISet union = new HashSet (firstSet); union.UnionWith(secondSet); PrintSet(union); // C# Java PHP SQL MySQL Oracle

HashSet and SortedSet Live Demo

48 Data StructureInternal Structure Time Compexity (Add/Update/Delete) Dictionary<K,V>HashSet<K> amortized O(1) SortedDictionary<K,V>SortedSet<K>O(log(n)) Dictionaries and Sets Comparison

49  Dictionaries map key to value  Can be implemented as hash table or balanced search tree  Hash-tables map keys to values  Rely on hash-functions to distribute the keys in the table  Collisions needs resolution algorithm (e.g. chaining)  Very fast add / find / delete – O(1)  Sets hold a group of elements  Hash-table or balanced tree implementations Summary

? ? ? ? ? ? ? ? ? Dictionaries, Hash Tables and Sets

License  This course (slides, examples, labs, videos, homework, etc.) is licensed under the "Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International" licenseCreative Commons Attribution- NonCommercial-ShareAlike 4.0 International 51  Attribution: this work may contain portions from  "Fundamentals of Computer Programming with C#" book by Svetlin Nakov & Co. under CC-BY-SA licenseFundamentals of Computer Programming with C#CC-BY-SA  "Data Structures and Algorithms" course by Telerik Academy under CC-BY-NC-SA licenseData Structures and AlgorithmsCC-BY-NC-SA

Free Software University  Software University Foundation – softuni.orgsoftuni.org  Software University – High-Quality Education, Profession and Job for Software Developers  softuni.bg softuni.bg  Software Facebook  facebook.com/SoftwareUniversity facebook.com/SoftwareUniversity  Software YouTube  youtube.com/SoftwareUniversity youtube.com/SoftwareUniversity  Software University Forums – forum.softuni.bgforum.softuni.bg