Hashing & Hash Tables. Sets/Dictionaries Set - Our best efforts to date:


1 Hashing & Hash Tables

2 Sets/Dictionaries Set - Our best efforts to date:

3 Easy Set Fast way to represent set if 0-9 only possible values: one bit per value. Index: 0 1 2 3 4 5 6 7 8 9 / Bit: 0 1 0 0 0 1 0 0 1 1 (the set {1, 5, 8, 9})

4 Easy Set Fast way to represent set if 0-9 only possible values: Could apply to letters A-J via mapping char → int. Index: 0 1 2 3 4 5 6 7 8 9 / Bit: 0 1 0 0 0 1 0 0 1 1
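The bit-per-value idea on the two slides above can be sketched in C++ as follows. This is a minimal illustration, not code from the presentation: the names `DigitSet` and `letterToIndex` are made up for the example, and the range checks are an added assumption.

```cpp
#include <bitset>
#include <cassert>

// Hypothetical helper: a set of the values 0-9 stored as one bit per possible value.
struct DigitSet {
    std::bitset<10> present;  // present[i] is true when i is in the set

    void insert(int value)         { assert(value >= 0 && value < 10); present.set(value); }
    void erase(int value)          { assert(value >= 0 && value < 10); present.reset(value); }
    bool contains(int value) const { assert(value >= 0 && value < 10); return present.test(value); }
};

// Map the letters 'A'..'J' onto 0..9 so the same structure can hold them.
inline int letterToIndex(char c) {
    assert(c >= 'A' && c <= 'J');
    return c - 'A';
}
```

Insert, erase, and lookup each touch a single bit, which is why this representation is fast when the range of possible values is tiny.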

5 Easy Set Fast way to represent set if 0-9 only possible values: How could we apply same strategy to all English words? Index: Aa Ab Ac Ad Ae Af Ag … / Bit: 1 0 0 1 0 0 0 ???

6 Hashing Hash function : maps data onto fixed size value

7 Cryptographic Hashing Desirable traits: – Output is fixed size – Easy to compute – Output varies wildly with small input change – One way

8 Hash Table Hash Table : – Use hash function to map values into array indexes – Constant time to find index and check

9 Hash Table Hash Functions Desirable qualities – Return number 0…(tablesize – 1) (maps values into array indexes) – Efficiently computable (constant time to find index) – Evenly distribute keys over table

10 Hash Table Functions Desirable qualities – Return number 0…(tablesize – 1) – Efficiently computable – Evenly distribute keys over table: don't waste space (mapping is onto, so every index has 1+ keys) and minimize collisions

11 Hash Table Functions Split roles – hash function vs mapping to table: – Hash Function: Evenly distribute keys over space (unsigned ints) – Table mapping: Hash function's result % table size = index
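The split between hashing and table mapping might look like the sketch below. `std::hash` is used only as a stand-in for "some hash function over unsigned integers"; the function names `hashKey` and `tableIndex` are assumptions for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Role 1, the hash function: spread keys over the full unsigned range.
// std::hash stands in for whatever hash function is chosen.
inline std::size_t hashKey(const std::string& key) {
    return std::hash<std::string>{}(key);
}

// Role 2, the table mapping: reduce a hash value to an index 0..tableSize-1.
inline std::size_t tableIndex(std::size_t hashValue, std::size_t tableSize) {
    assert(tableSize > 0);
    return hashValue % tableSize;
}
```

Keeping the two roles apart means the table can be resized without touching the hash function: only the `% tableSize` step changes.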

12 Optimal Hash Functions If all keys and table size known, can compute optimal hash… – Rarely the case

13 Hash Function - Integral For integral types: – Hash(x) = x – Table size should be prime

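14 Hash Function - Integral For integral types: – Hash(x) = x – Table size should be prime Keys often have a pattern – if the pattern is not relatively prime to the table size, you get patterns in the table. Even keys in a table of size 10 use only the even indexes: [0]: 0, 10, 20 / [2]: 2, 12, 22 / [4]: 4, 14, 24 / [6]: 6, 16, 26 / [8]: 8, 18, 28 (indexes 1, 3, 5, 7, 9 stay empty)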
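To see why a prime table size helps, the sketch below counts how many distinct slots the even keys from the slide occupy under `Hash(x) = x`. The helper names `slotsUsed` and `evenKeys` are invented for the illustration, and the key range 0 to 28 is taken from the slide's table.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Count how many distinct table slots a batch of integer keys lands in
// when Hash(x) = x and index = x % tableSize.
inline std::size_t slotsUsed(const std::vector<unsigned>& keys, unsigned tableSize) {
    assert(tableSize > 0);
    std::set<unsigned> used;
    for (unsigned k : keys) used.insert(k % tableSize);
    return used.size();
}

// Sample data: the even keys 0, 2, 4, ..., 28 shown on the slide.
inline std::vector<unsigned> evenKeys() {
    std::vector<unsigned> keys;
    for (unsigned k = 0; k < 30; k += 2) keys.push_back(k);
    return keys;
}
```

With these 15 keys, `slotsUsed(evenKeys(), 10)` is 5 (half the table is never used), while `slotsUsed(evenKeys(), 11)` is 11 because 2 and 11 share no common factor.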

15 Hash Function - String String approach 1 – add up characters: for (i=0;i<key.length();i++) hashVal += key[i]; Problem 1: What if TableSize is 10,000 and all keys are 8 or less characters long? Problem 2: What if keys often contain the same characters (“abc”, “bca”, etc.)?
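Both problems with the additive approach can be checked directly. The sketch below wraps the slide's loop in a function (the name `additiveHash` and the checking function are assumptions) and assumes keys are plain ASCII, so each character contributes at most 127.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Approach 1 from the slide: add up the character codes.
inline unsigned additiveHash(const std::string& key) {
    unsigned hashVal = 0;
    for (std::size_t i = 0; i < key.length(); i++)
        hashVal += static_cast<unsigned char>(key[i]);
    return hashVal;
}

// Hypothetical check illustrating both problems named on the slide.
inline void additiveHashProblems() {
    // Problem 2: keys made of the same characters collide.
    assert(additiveHash("abc") == additiveHash("bca"));
    // Problem 1: 8 ASCII characters sum to at most 8 * 127 = 1016,
    // so most of a 10,000-slot table can never be reached.
    assert(additiveHash("zzzzzzzz") <= 8u * 127u);
}
```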

16 Hash Function - String String approach 2 – multiply each character by different powers of some number: – "apple" : 'a' * 31^4, 'p' * 31^3, 'p' * 31^2, 'l' * 31^1, 'e' * 31^0

17 Hash Function - String String approach 2 – multiply each character by different powers of some number: – "apple" : 'a' * 31^4 + 'p' * 31^3 + 'p' * 31^2 + 'l' * 31^1 + 'e' * 31^0 Efficiently do via bit shifting: for (i=0;i<key.length();i++) hashVal = (hashVal << 6) ^ key[i]; (<< 6 multiplies by 64, so this loop uses base 64 rather than 31)

18 Hash Function - String String approach 2 – multiply each character by different powers of some number: – "apple" : 'a' * 31^4 + 'p' * 31^3 + 'p' * 31^2 + 'l' * 31^1 + 'e' * 31^0 Efficiently do via bit shifting: for (i=0;i<key.length();i++) hashVal = (hashVal << 6) ^ key[i]; (^ is bitwise XOR, used here to mix in each character instead of +)
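The two variants on these slides are related but not identical, so a side-by-side sketch may help. `polynomialHash` evaluates the base-31 sum using Horner's rule (one multiply and one add per character); `shiftXorHash` is the slide's shift-and-XOR loop wrapped in a function. Both function names are assumptions, and unsigned overflow is allowed to wrap, which is well defined in C++.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// key[0]*31^(n-1) + ... + key[n-1]*31^0, computed with Horner's rule.
inline unsigned polynomialHash(const std::string& key) {
    unsigned hashVal = 0;
    for (std::size_t i = 0; i < key.length(); i++)
        hashVal = hashVal * 31u + static_cast<unsigned char>(key[i]);
    return hashVal;
}

// The slide's loop: shift left by 6 bits (multiply by 64), then XOR in the character.
inline unsigned shiftXorHash(const std::string& key) {
    unsigned hashVal = 0;
    for (std::size_t i = 0; i < key.length(); i++)
        hashVal = (hashVal << 6) ^ static_cast<unsigned char>(key[i]);
    return hashVal;
}

// Unlike the additive hash, character order now matters.
inline void orderMatters() {
    assert(polynomialHash("abc") != polynomialHash("bca"));
    assert(shiftXorHash("abc") != shiftXorHash("bca"));
}
```

For example, `polynomialHash("ab")` is 97 * 31 + 98 = 3105, while `shiftXorHash("ab")` is (97 << 6) ^ 98 = 6178.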

19 Collisions Collision: two keys map to same index – 12 and 22 both map to index 2 in a table with indexes 0-9

20 Probing Linear Probing: value goes in next available slot. Table (indexes 0-9): [2] = 12

21 Probing Linear Probing: value goes in next available slot. Table (indexes 0-9): [2] = 12, [3] = 22

22 Probing Linear Probing: value goes in next available slot. Table (indexes 0-9): [2] = 12, [3] = 22, [4] = 32

23 Probing Linear Probing: value goes in next available slot Issue: – No longer constant access. Table (indexes 0-9): [2] = 12, [3] = 22, [4] = 32
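A minimal sketch of linear-probing insertion, following the 12, 22, 32 example above. It assumes non-negative integer keys, `Hash(x) = x`, and a sentinel value `EMPTY` for unused slots; the name `probeInsert` is made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int EMPTY = -1;  // assumed sentinel for an unused slot; keys must be non-negative

// Insert key with linear probing. Returns the slot used, or -1 if the table is full.
inline int probeInsert(std::vector<int>& table, int key) {
    assert(key >= 0 && !table.empty());
    std::size_t n = table.size();
    std::size_t start = static_cast<std::size_t>(key) % n;
    for (std::size_t step = 0; step < n; step++) {
        std::size_t slot = (start + step) % n;  // wrap around at the end of the table
        if (table[slot] == EMPTY || table[slot] == key) {
            table[slot] = key;
            return static_cast<int>(slot);
        }
    }
    return -1;  // every slot is occupied by other keys
}
```

In a 10-slot table, inserting 12, 22, and 32 in that order returns slots 2, 3, and 4. The loop is what makes access no longer constant time: each extra collision adds another probe.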

24 Load Factor (number of keys / table size) – Must be < 1 for linear probing – Performance drops rapidly past 0.5
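To put rough numbers on "drops rapidly", the sketch below uses the classic approximations for linear probing attributed to Knuth, which are not on the slide and assume hash values are uniformly distributed: about ½(1 + 1/(1 − a)) probes for a successful search and ½(1 + 1/(1 − a)²) for an unsuccessful search or insertion at load factor a. The function names are assumptions.

```cpp
#include <cassert>

// Load factor = stored keys / table slots.
inline double loadFactor(int keys, int slots) {
    assert(slots > 0);
    return static_cast<double>(keys) / slots;
}

// Approximate probes for a successful search under linear probing
// (assumes uniformly distributed hash values).
inline double expectedProbesHit(double a) {
    assert(a >= 0.0 && a < 1.0);
    return 0.5 * (1.0 + 1.0 / (1.0 - a));
}

// Approximate probes for an unsuccessful search or an insertion.
inline double expectedProbesMiss(double a) {
    assert(a >= 0.0 && a < 1.0);
    return 0.5 * (1.0 + 1.0 / ((1.0 - a) * (1.0 - a)));
}
```

Under these estimates a miss costs about 2.5 probes at load factor 0.5 but about 50 probes at 0.9.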

25 Clustering Say we go to put in 3: it probes past 22 and 32 and lands in 5. Now 2-5 are blocked – Anything that maps to 2-6 will fill 6. Table (indexes 0-9): [2] = 12, [3] = 22, [4] = 32, [5] = 3

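26 Finding Probing used again to find keys: Find 32 – yep, it's there (probe 2, 3, then 4). Table (indexes 0-9): [2] = 12, [3] = 22, [4] = 32, [5] = 2

27 Finding Probing used again to find keys: Find 42 – nope – probing reaches the empty slot 6, so it must not be in the table. Table (indexes 0-9): [2] = 12, [3] = 22, [4] = 32, [5] = 2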
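Lookup follows the same probe sequence as insertion and stops at the first empty slot. The sketch below assumes non-negative integer keys, `Hash(x) = x`, and a sentinel `kEmptySlot`; `probeFind` and `slideTable` are hypothetical names, and `slideTable` rebuilds the table shown on these two slides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int kEmptySlot = -1;  // assumed sentinel for an unused slot

// Return the slot holding key, or -1 if probing reaches an empty slot first.
inline int probeFind(const std::vector<int>& table, int key) {
    assert(key >= 0 && !table.empty());
    std::size_t n = table.size();
    std::size_t start = static_cast<std::size_t>(key) % n;
    for (std::size_t step = 0; step < n; step++) {
        std::size_t slot = (start + step) % n;
        if (table[slot] == key) return static_cast<int>(slot);
        if (table[slot] == kEmptySlot) return -1;  // key would have been placed here
    }
    return -1;  // scanned the whole table
}

// The table from the slides: 12, 22, 32, 2 in slots 2 through 5.
inline std::vector<int> slideTable() {
    std::vector<int> t(10, kEmptySlot);
    t[2] = 12; t[3] = 22; t[4] = 32; t[5] = 2;
    return t;
}
```

Here `probeFind(slideTable(), 32)` returns 4, and `probeFind(slideTable(), 42)` returns -1 after hitting the empty slot 6.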

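28 Deletion Say we delete 22: Find 32… Table (indexes 0-9): [2] = 12, [3] = empty, [4] = 32, [5] = 2

29 Deletion Say we delete 22: Find 32… not there! The search stops at the now-empty slot 3 and never reaches 32 in slot 4. Table (indexes 0-9): [2] = 12, [3] = empty, [4] = 32, [5] = 2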

30 Tombstone Special value indicating something was there – Search knows to continue – Insertion can use that slot – But need to continue search to avoid duplicate. Table (indexes 0-9): [2] = 12, [3] = # (tombstone), [4] = 32, [5] = 2
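Putting the pieces together, a tombstone-aware table could be sketched as below. This is one possible design, not the presentation's implementation: the class name `ProbingSet`, the three-state enum, and the choice of `Hash(x) = x` for non-negative integer keys are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class ProbingSet {
    enum State { EMPTY, FULL, TOMBSTONE };
    std::vector<int> keys_;
    std::vector<State> state_;

    std::size_t home(int key) const {
        assert(key >= 0);
        return static_cast<std::size_t>(key) % keys_.size();
    }

    // Probe from the home slot; tombstones do not stop the search, empty slots do.
    int findSlot(int key) const {
        std::size_t n = keys_.size(), start = home(key);
        for (std::size_t step = 0; step < n; step++) {
            std::size_t slot = (start + step) % n;
            if (state_[slot] == EMPTY) return -1;
            if (state_[slot] == FULL && keys_[slot] == key) return static_cast<int>(slot);
        }
        return -1;
    }

public:
    explicit ProbingSet(std::size_t size) : keys_(size, 0), state_(size, EMPTY) {
        assert(size > 0);
    }

    bool contains(int key) const { return findSlot(key) >= 0; }

    // Returns false if the key is already present or the table is full.
    bool insert(int key) {
        if (findSlot(key) >= 0) return false;  // finish the search first to avoid a duplicate
        std::size_t n = keys_.size(), start = home(key);
        for (std::size_t step = 0; step < n; step++) {
            std::size_t slot = (start + step) % n;
            if (state_[slot] != FULL) {        // an empty slot or a tombstone can be reused
                keys_[slot] = key;
                state_[slot] = FULL;
                return true;
            }
        }
        return false;
    }

    // Mark the slot as a tombstone instead of emptying it.
    bool erase(int key) {
        int slot = findSlot(key);
        if (slot < 0) return false;
        state_[slot] = TOMBSTONE;
        return true;
    }
};
```

With 12, 22, 32, and 2 inserted into a 10-slot table, erasing 22 leaves a tombstone in slot 3, so a later search for 32 keeps probing and still finds it; inserting 42 afterwards reuses slot 3.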

