CSC 172 DATA STRUCTURES.


1 CSC 172 DATA STRUCTURES

2 Representation of Sets
List: simple, O(n) dictionary operations
Binary Search Trees: O(log n) average time; support range queries and sorting
Characteristic Vector: O(1) dictionary ops, but limited to small sets
Hash Table: O(1) average for dictionary ops; tricky to expand, no range queries

3 Characteristic Vectors
Boolean strings whose positions correspond to the members of some fixed “universal” set
A “1” in a location means that the element is in the set
A “0” means that it is not

4 MUSIC THEORY
A chord is a set of notes played at the same time.
Represented by a 12-bit vector called a “pitch class”, with positions ordered {B,A#,A,G#,G,F#,F,E,D#,D,C#,C}
000010010001 represents C major (C, E, G)
000010001001 represents C minor (C, D#, G)
Rotation is “transposition”
Bit reversal is “inversion”
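The pitch-class idea can be sketched in a few lines of Python. This is only an illustration: it assumes bit 0 (the least significant bit) stands for C and bit 11 for B, so that a printed 12-bit string reads B down to C from left to right, and the function names `chord`, `transpose`, and `members` are made up for this sketch.

```python
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MASK12 = (1 << 12) - 1  # keep every result within 12 bits

def chord(*names):
    """Build a 12-bit pitch-class vector from note names."""
    bits = 0
    for name in names:
        bits |= 1 << NOTES.index(name)
    return bits

def transpose(bits, semitones):
    """Rotate the 12-bit vector left; each step raises every note a semitone."""
    s = semitones % 12
    return ((bits << s) | (bits >> (12 - s))) & MASK12

def members(bits):
    """List the note names whose bits are set."""
    return [n for i, n in enumerate(NOTES) if (bits >> i) & 1]

c_major = chord("C", "E", "G")
print(format(c_major, "012b"))         # 000010010001
print(members(transpose(c_major, 2)))  # ['D', 'F#', 'A']
```

Rotating C major by two positions gives D, F#, A, which is D major: the rotation is the transposition.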

5 UNIX file privileges {user, group, others} x {read, write, execute}
9 possible privileges
Type “ls -l” on UNIX
total 142
-rw-rw-r pawlicki none Jun PKG416.desc
-rw-rw-r pawlicki none Jun PKG416.pdf
-rw-rw-r pawlicki none Jun let.1
-rw-rw-r pawlicki none Apr 2 13:03 out
-rw-rw-r pawlicki none Jun stapp.uu

6 UNIX files
The order is rwx for each of user (owner), group, and others
So, a protection mode of 110100000 (rw-r-----) means that the owner may read and write (but not execute), the group can read only, and others cannot even read
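As a rough illustration of the mode described above (owner may read and write, group may only read, others get nothing), the sketch below treats the nine privileges as a 9-bit characteristic vector. It assumes the user bits are the three highest bits and that each group is ordered r, w, x; `mode_to_string` and `may` are hypothetical helper names, not UNIX calls.

```python
def mode_to_string(mode):
    """Render a 9-bit permission vector as an rwx string (user, group, others)."""
    letters = "rwxrwxrwx"
    out = []
    for i, letter in enumerate(letters):
        bit = 1 << (8 - i)  # the leftmost letter is the highest bit
        out.append(letter if mode & bit else "-")
    return "".join(out)

def may(mode, who, what):
    """Test one privilege; who is one of 'u', 'g', 'o' and what one of 'r', 'w', 'x'."""
    shift = (2 - "ugo".index(who)) * 3 + (2 - "rwx".index(what))
    return bool((mode >> shift) & 1)

mode = 0b110100000
print(mode_to_string(mode))  # rw-r-----
print(may(mode, "g", "r"))   # True
print(may(mode, "o", "r"))   # False
```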

7 GAMBLING A deck has 52 cards {2C,2H,2S,2D,3C, .... KD,AC,AH,AS,AD}
Represent a “hand” as a vector of 52 bits
A vector with 1s in exactly two of the four ace positions is a pair of aces
In “Texas Hold'em” every player gets two “hole” cards and shares 5 “board” cards
We can use bitwise & to find “hands”
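A minimal sketch of finding a hand with bitwise &, under assumptions not stated on the slide: cards are numbered rank by rank in the listed order (2C = 0, 2H = 1, ..., AD = 51), card number i is stored in bit i, and all function names are invented for the example.

```python
RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["C", "H", "S", "D"]

def card_bit(rank, suit):
    """Bit for one card under the assumed numbering 2C=0, 2H=1, ..., AD=51."""
    return 1 << (RANKS.index(rank) * 4 + SUITS.index(suit))

def hand(*cards):
    """Build a 52-bit vector from (rank, suit) pairs."""
    bits = 0
    for rank, suit in cards:
        bits |= card_bit(rank, suit)
    return bits

def rank_mask(rank):
    """Vector with the four cards of one rank set."""
    return 0b1111 << (RANKS.index(rank) * 4)

def count_of_rank(bits, rank):
    """Cards of this rank in the hand: mask with &, then count the 1 bits."""
    return bin(bits & rank_mask(rank)).count("1")

hole = hand(("A", "C"), ("A", "S"))
board = hand(("2", "H"), ("7", "D"), ("K", "C"), ("9", "S"), ("A", "D"))
print(count_of_rank(hole, "A"))          # 2
print(count_of_rank(hole | board, "A"))  # 3
```

The union `hole | board` is the set of seven cards a player can use, so the same mask test works on it.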

8 CV advantages
If the universal set is small, sets can be represented by bits packed 32 to a word
Insert, delete, and lookup are O(1) on the proper bit
Union, intersection, and difference are implemented on a word-by-word basis: O(m), where m is the size of the universal set
Small constant factor (1/32)
Fast machine operations
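The operations listed above can be sketched as a small bit-vector set. This is an illustration under assumptions: Python integers stand in for 32-bit machine words, elements are the integers 0..m-1, and the class name `BitSet` and its methods are made up here.

```python
WORD = 32  # bits packed per word, as on the slide

class BitSet:
    def __init__(self, universe_size):
        self.m = universe_size
        self.words = [0] * ((universe_size + WORD - 1) // WORD)

    def insert(self, i):
        self.words[i // WORD] |= 1 << (i % WORD)

    def delete(self, i):
        self.words[i // WORD] &= ~(1 << (i % WORD))

    def lookup(self, i):
        return bool((self.words[i // WORD] >> (i % WORD)) & 1)

    def _combine(self, other, op):
        # One machine-style operation per word: about m/32 steps in total.
        result = BitSet(self.m)
        result.words = [op(a, b) for a, b in zip(self.words, other.words)]
        return result

    def union(self, other):
        return self._combine(other, lambda a, b: a | b)

    def intersection(self, other):
        return self._combine(other, lambda a, b: a & b)

    def difference(self, other):
        return self._combine(other, lambda a, b: a & ~b)

    def members(self):
        return [i for i in range(self.m) if self.lookup(i)]

a, b = BitSet(100), BitSet(100)
for i in (1, 40, 99):
    a.insert(i)
for i in (40, 64):
    b.insert(i)
print(a.union(b).members())         # [1, 40, 64, 99]
print(a.intersection(b).members())  # [40]
print(a.difference(b).members())    # [1, 99]
```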

9 Hashing
A cool way to get from an element x to the place where x can be found
An array [0..B-1] of buckets
Bucket contains a list of set elements
B = number of buckets
A hash function that takes potential set elements and quickly produces a “random” integer in [0..B-1]

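10 Example
If the set elements are integers then the simplest hash function is usually h(x) = x % B
(The alternative given, h(x) = x - (x%B) with the note “never 0”, is a multiple of B as written, not a bucket index; the buckets below use h(x) = x % B.)
Suppose B = 6 and we wish to store the integers {70, 53, 99, 94, 83, 76, 64, 30}
They belong in the buckets 4, 5, 3, 4, 5, 4, 4, and 0
Note: If B = 7, they belong in the buckets 0, 4, 1, 3, 6, 6, 1, 2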
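The bucket numbers in this example can be checked with a couple of lines of Python, assuming the modular hash h(x) = x % B; the second call uses B = 7 as a comparison.

```python
def h(x, B):
    """Simple modular hash: maps an integer to a bucket index in 0..B-1."""
    return x % B

keys = [70, 53, 99, 94, 83, 76, 64, 30]
print([h(x, 6) for x in keys])  # [4, 5, 3, 4, 5, 4, 4, 0]
print([h(x, 7) for x in keys])  # [0, 4, 1, 3, 6, 6, 1, 2]
```

With B = 6 half of the keys land in bucket 4, while B = 7 spreads the same keys over six different buckets.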

11 Pitfalls of Hash Function Selection
We want to get a uniform distribution of elements into buckets
Beware of data patterns that cause non-uniform distribution

12 Example
If integers were all even, then B = 6 would cause only buckets 0, 2, and 4 to fill
If we hashed words in the UNIX dictionary into 10 buckets by length of word, then 20% go into bucket 7
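The even-integer pitfall is easy to reproduce. The sketch below uses made-up data (the 300 even integers below 600) and a hypothetical helper `bucket_counts`; it compares B = 6 with B = 7, which shares no factor with 2.

```python
from collections import Counter

def bucket_counts(keys, B):
    """Number of keys landing in each bucket 0..B-1 under h(x) = x % B."""
    counts = Counter(x % B for x in keys)
    return [counts.get(b, 0) for b in range(B)]

evens = range(0, 600, 2)        # 300 even integers, invented for the demo
print(bucket_counts(evens, 6))  # [100, 0, 100, 0, 100, 0]
print(bucket_counts(evens, 7))  # every bucket is used, 42 or 43 keys each
```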

13 Dictionary Operations
Lookup: go to the head of bucket h(x) and search the bucket list for x
Insertion: look up x and append it to the bucket list if not found
Deletion: list deletion from the bucket list
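The three dictionary operations can be sketched with one Python list per bucket. This is a minimal illustration that assumes integer keys and h(x) = x % B; the class name `ChainedHashSet` is invented.

```python
class ChainedHashSet:
    def __init__(self, B=6):
        self.B = B
        self.buckets = [[] for _ in range(B)]

    def _bucket(self, x):
        """Go to the head of bucket h(x)."""
        return self.buckets[x % self.B]

    def lookup(self, x):
        return x in self._bucket(x)  # linear search of one bucket list

    def insert(self, x):
        bucket = self._bucket(x)
        if x not in bucket:          # append only if not found
            bucket.append(x)

    def delete(self, x):
        bucket = self._bucket(x)
        if x in bucket:
            bucket.remove(x)         # ordinary list deletion

s = ChainedHashSet(6)
for x in [70, 53, 99, 94, 83, 76, 64, 30]:
    s.insert(x)
print(s.buckets)  # [[30], [], [], [99], [70, 94, 76, 64], [53, 83]]
```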

14 Analysis
If we pick B to be near n, the number of elements in the set, then the average list is O(1) long
Thus, dictionary ops take O(1) time on average
Worst case: all elements go into one bucket, O(n)

15 Managing Hash Table Size
If n gets as high as 2B, create a new hash table with 2B buckets
“Rehash” every element into the new table: O(n) time total
There were at least n/2 inserts since the last “rehash”
Those inserts themselves took O(n) time in total
Thus, we “amortize” the cost of rehashing over the inserts since the last rehash: a constant factor, at worst
So, even with rehashing we get O(1) average time ops
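The doubling rule can be sketched as follows. The class `GrowingHashSet`, its starting size of 4 buckets, and the `rehashes` and `moved` counters are assumptions added only to make the amortized cost visible.

```python
class GrowingHashSet:
    def __init__(self, B=4):
        self.B = B
        self.n = 0
        self.buckets = [[] for _ in range(B)]
        self.rehashes = 0  # how many times the table was rebuilt
        self.moved = 0     # total elements copied during all rehashes

    def lookup(self, x):
        return x in self.buckets[x % self.B]

    def insert(self, x):
        bucket = self.buckets[x % self.B]
        if x in bucket:
            return
        bucket.append(x)
        self.n += 1
        if self.n >= 2 * self.B:  # n got as high as 2B
            self._rehash(2 * self.B)

    def _rehash(self, new_B):
        old = self.buckets
        self.B = new_B
        self.buckets = [[] for _ in range(new_B)]
        for bucket in old:
            for x in bucket:
                self.buckets[x % new_B].append(x)
                self.moved += 1
        self.rehashes += 1

s = GrowingHashSet()
for x in range(100):
    s.insert(x)
print(s.B, s.rehashes, s.moved)  # 64 4 120
```

For 100 inserts the rehashes copy 120 elements in total, a little over one copy per insert, which is the constant factor the amortization argument predicts.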

16 Collisions
A collision occurs when two values in the set hash to the same value
There are several ways to deal with this:
Chaining (using a linked list or some secondary structure)
Open Addressing (Linear Probing, Double Hashing)

17 Chaining
Very efficient time-wise
Other approaches use less space
[Figure: chained hash table with bucket lists holding 70, 53, 99, 94, 83, 76, 64, 30]

18 Open Addressing
When a collision occurs, if the table is not full, find an available space
Linear Probing
Quadratic Probing
Double Hashing

19 Linear Probing
If the current location is occupied, try the next table location
LinearProbingInsert(K) {
    if (table is full) error;
    probe = h(K);
    while (table[probe] is occupied)
        probe = (probe + 1) % M;
    table[probe] = K;
}
Walk along table until an empty spot is found
Uses less memory than chaining (no links)
Takes more time than chaining (long walks)
Deleting is a pain (mark a slot as having been deleted)
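A runnable sketch of the insert routine, extended with the "mark a slot as deleted" idea from the last bullet. It assumes integer keys with h(K) = K % M and M = 13, borrowing the keys from the worked example on the following slides; the `DELETED` marker, the `find` method, and the class name are additions for illustration, and like the pseudocode `insert` does not check for duplicates.

```python
EMPTY = None
DELETED = object()  # tombstone: the slot was used and later deleted

class LinearProbingTable:
    def __init__(self, M=13):
        self.M = M
        self.table = [EMPTY] * M
        self.count = 0  # live keys only

    def insert(self, K):
        if self.count == self.M:
            raise OverflowError("table is full")
        probe = K % self.M
        # Walk along the table until a free (empty or deleted) slot is found.
        while self.table[probe] is not EMPTY and self.table[probe] is not DELETED:
            probe = (probe + 1) % self.M
        self.table[probe] = K
        self.count += 1
        return probe

    def find(self, K):
        probe = K % self.M
        for _ in range(self.M):
            slot = self.table[probe]
            if slot is EMPTY:  # a truly empty slot ends the search
                return -1
            if slot is not DELETED and slot == K:
                return probe
            probe = (probe + 1) % self.M  # keep walking past tombstones
        return -1

    def delete(self, K):
        probe = self.find(K)
        if probe >= 0:
            self.table[probe] = DELETED
            self.count -= 1

t = LinearProbingTable(13)
print([t.insert(K) for K in [18, 41, 22, 59, 32, 31, 73]])  # [5, 2, 9, 7, 6, 8, 10]
t.delete(59)
print(t.find(31))  # 8: the search walks past the tombstone left in slot 7
```

Simply emptying slot 7 would cut the probe path from slot 5 to slot 8 and make 31 unreachable, which is why deletion needs the marker.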

20-30 Linear Probing (worked example)
h(K) = K % 13, table slots 0 to 12
Insert: 18, 41, 22, 59, 32, 31, 73
h(K): 5, 2, 9, 7, 6, 5, 8
18, 41, 22, 59, and 32 go straight to slots 5, 2, 9, 7, and 6
31 hashes to slot 5, which is occupied; slots 6 and 7 are occupied too, so it lands in slot 8
73 hashes to slot 8, which is now occupied; slot 9 is occupied too, so it lands in slot 10
Final table: slot 2 = 41, slot 5 = 18, slot 6 = 32, slot 7 = 59, slot 8 = 31, slot 9 = 22, slot 10 = 73

31 Double Hashing
If the current location is occupied, try another table location
Use two hash functions
If M is prime, eventually will examine every location
DoubleHashInsert(K) {
    if (table is full) error;
    probe = h1(K);
    offset = h2(K);
    while (table[probe] is occupied)
        probe = (probe + offset) % M;
    table[probe] = K;
}
Many of the same (dis)advantages as linear probing
Distributes keys more evenly than linear probing
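A runnable sketch of the routine above. The table is a plain Python list with `None` for empty slots, and the hash functions h1(K) = K % 13 and h2(K) = 8 - K % 8 and the keys are borrowed from the worked example on the later slides; the function name and the full-table check are choices made for this sketch.

```python
def h1(K):
    return K % 13

def h2(K):
    return 8 - K % 8  # always in 1..8, so the step is never 0

def double_hash_insert(table, K):
    M = len(table)
    if all(slot is not None for slot in table):
        raise OverflowError("table is full")
    probe = h1(K)
    offset = h2(K)
    # With M prime, stepping by a nonzero offset visits every slot eventually.
    while table[probe] is not None:
        probe = (probe + offset) % M
    table[probe] = K
    return probe

table = [None] * 13
print([double_hash_insert(table, K) for K in [18, 41, 22, 59, 32, 31, 73]])
# [5, 2, 9, 7, 6, 8, 3]
```

Key 73 collides at slot 8 and then steps by 7 through slots 2 and 9 before settling in slot 3, instead of crowding next to the cluster at slots 5 to 9.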

32 Quadratic Probing
Don't step by 1 each time. Add i^2 to the h(x) hashed location (mod B of course) for i = 1, 2, ...
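A minimal sketch of quadratic probing, assuming integer keys, h(x) = x % B, and a list with `None` for empty slots; the function name is invented. Unlike linear probing, the probe sequence need not reach every slot, so the loop gives up after B attempts (the offsets i^2 mod B repeat after that).

```python
def quadratic_probe_insert(table, x):
    B = len(table)
    home = x % B
    for i in range(B):  # i = 0 is the home slot itself
        probe = (home + i * i) % B
        if table[probe] is None:
            table[probe] = x
            return probe
    raise OverflowError("no free slot on the probe sequence")

table = [None] * 13
print([quadratic_probe_insert(table, x) for x in [18, 41, 22, 59, 32, 31, 73]])
# [5, 2, 9, 7, 6, 1, 8]
```

Here 31 collides at slot 5, then tries 5+1, 5+4, and 5+9 (mod 13), landing in slot 1 on the third step.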

33-35 Double Hashing (worked example)
h1(K) = K % 13, h2(K) = 8 - K % 8, table slots 0 to 12
Insert: 18, 41, 22, 59, 32, 31, 73
h1(K): 5, 2, 9, 7, 6, 5, 8
h2(K): 6, 7, 2, 5, 8, 1, 7
18, 41, 22, 59, and 32 go straight to slots 5, 2, 9, 7, and 6
31 hashes to slot 5, which is occupied; stepping by h2 = 1 it passes slots 6 and 7 and lands in slot 8
73 hashes to slot 8, which is now occupied; stepping by h2 = 7 it finds slots 2 and 9 occupied and lands in slot 3

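36 Theoretical Results
Expected number of probes at load factor $\alpha$:
Chaining: found $1 + \frac{\alpha}{2}$; not found $1 + \alpha$
Linear Probing: found $\frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right)$; not found $\frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^2}\right)$
Double Hashing: found $\frac{1}{\alpha}\ln\frac{1}{1-\alpha}$; not found $\frac{1}{1-\alpha}$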
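To get a feel for these results, the sketch below evaluates the standard textbook approximations for the expected number of probes at a few load factors. The formulas coded here are the usual ones for chaining, linear probing, and double hashing and are an assumption about what the slide tabulates; each function returns a (found, not found) pair.

```python
import math

def chaining(alpha):
    return 1 + alpha / 2, 1 + alpha

def linear_probing(alpha):
    return 0.5 * (1 + 1 / (1 - alpha)), 0.5 * (1 + 1 / (1 - alpha) ** 2)

def double_hashing(alpha):
    return (1 / alpha) * math.log(1 / (1 - alpha)), 1 / (1 - alpha)

for alpha in (0.25, 0.5, 0.75, 0.9):
    row = [f"{v:.2f}" for f in (chaining, linear_probing, double_hashing)
           for v in f(alpha)]
    print(alpha, row)
```

At a load factor of 0.5 an unsuccessful search costs about 1.5 probes with chaining, 2.5 with linear probing, and 2 with double hashing; the open-addressing costs grow without bound as the load factor approaches 1.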

37 Expected Probes
[Plot: expected probes for chaining, linear probing, and double hashing]

