CS 240: Data Structures. Tuesday, July 24th. Searching, Hashing, Graphs.

Assignments
Self-Revisions for Lab 5 and Lab 6 are due next Tuesday, July 31st at 4pm (the writeup is pivotal in terms of a grade). Project 1 revisions need to be submitted by next Thursday, August 2nd at 4pm. New defenses can be scheduled.

The Test
The code for Lab 7 will be released early (before the lab is due). Make sure you understand the questions. Your Lab 7 grade will be based on your answers to the Lab 7 questions on the test. Know the first three sorts and either quicksort or mergesort. You need to understand how to do a radix sort. Know two of the lists from the presentations today; one should be the one you actually did today.

Puppets.

Searches
Linear Search
Binary Search
Not too good with a linked list… This is why we add more links (trees!)
Hashing, an insertion/searching combo
See Chapter 9.3, Hash Tables (p. 530)

Hashing
What is hashing?
Corned beef hash: Hash(Corn(Beef)): (image not preserved)

Hashing
Hash Browns: Hash( ): (image not preserved)

Hashing
So, what does this have to do with anything? Well… Maybe we should look at real hash browns… Much better!

Hashing
The point is: We have no idea what is in corned beef hash. We have no idea what is in hash browns. However, Hash(Corn(Beef)) gives corned beef hash (image not preserved), no matter what Beef is. Beef is generic! The same with hash browns too…

Hashing
Hashing lets us represent some “data” with some smaller “data”. The representation is not perfect! Look at the corned beef hash! But it is consistent. That makes it useful!

Hashing
Ok, back to seriousness for a moment: Remember the algorithmic complexity of our various searches?
Linear Search = O(n)
Binary Search = O(log n)
Balanced Binary Search (tree) = O(log n)
Why do we care if it is balanced? Because an unbalanced tree, where every node has only one child, is as bad as a linear search! We’ll leave fixing this for another time.
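To make the two array searches concrete, here is a minimal sketch, assuming the data lives in a `std::vector<int>`. The function names `linearsearch` and `binarysearch` are made up for illustration and are not from the course code. Linear search may have to look at every element; binary search halves the remaining range on each step, but it only works if the data is already sorted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Linear search: check every element in order (up to n comparisons).
// Returns the index of target, or -1 if it is not present.
int linearsearch(const std::vector<int>& data, int target)
{
  for (std::size_t i = 0; i < data.size(); i++)
  {
    if (data[i] == target) return static_cast<int>(i);
  }
  return -1;
}

// Binary search: data must be sorted in ascending order.
// Each step halves the range, so about log2(n) comparisons.
// Returns the index of target, or -1 if it is not present.
int binarysearch(const std::vector<int>& data, int target)
{
  assert(std::is_sorted(data.begin(), data.end()));
  int low = 0;
  int high = static_cast<int>(data.size()) - 1;
  while (low <= high)
  {
    int mid = low + (high - low) / 2;
    if (data[mid] == target) return mid;
    if (data[mid] < target) low = mid + 1;  // target is in the upper half
    else high = mid - 1;                    // target is in the lower half
  }
  return -1;
}
```

Binary search needs to jump straight to the middle element, which is cheap in an array but not in a linked list. That is the reason it is “not too good” with linked lists.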

Hashing
Other than making corned beef, there are other, more useful, hashing schemes. Consider this: Instead of putting all the records on computers, Binghamton University decides to keep only paper records of grades due to malicious CS students changing their grades each morning. Now, you need some money. You get this cushy work-study job pulling up folders to answer grade requests. Sounds good, right?

Hashing
So, if I ask you for the grades for “El Zilcho” (first name “El”, last name “Zilcho”) how do you find them? Linear search, right? We start from Alan Aardvark! You are a born bureaucrat! You start by going to “Z”. But, how did you know to do that (if nobody suggested this, stop lesson, go home and cry)?

Hashing
Hashing by first letter is a common hash. With a small enough list we can search pretty quickly!
#include <string>
using namespace std;
//firstletterhash represents h(x)
//tohash represents x
//assumes tohash is not empty (at(0) throws otherwise)
int firstletterhash(string tohash)
{
  return(int(tohash.at(0))%26);
}
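One thing to notice about `firstletterhash`: it takes the character code modulo 26, so in ASCII 'A' (65) lands in slot 13 rather than slot 0, and 'a' (97) lands in slot 19. That is still a valid hash, but “Apple” and “apple” end up in different slots. A possible variant, sketched below under the assumption that every key starts with an English letter, maps A/a to 0 through Z/z to 25. The name `firstletterindex` is hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical variant of firstletterhash:
// maps 'A' or 'a' to 0, 'B' or 'b' to 1, ..., 'Z' or 'z' to 25.
// Assumes tohash is not empty and starts with an English letter.
int firstletterindex(const std::string& tohash)
{
  assert(!tohash.empty());
  unsigned char first = static_cast<unsigned char>(tohash.at(0));
  assert(std::isalpha(first));
  return std::toupper(first) - 'A';
}
```

With this version, `firstletterindex("Zilcho")` is 25, so the folder for El Zilcho sits in the last drawer, which matches the “go to Z” intuition.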

Hashing
The first letter implementation requires that we have 26 entries. If we only have a few entries we are wasting space! A tradeoff decision must be made! What are the tradeoffs?

Hashing
Ok, we are done. You know all there is to know about hashing. Cool. A winner is you.
Alright, quick quiz. Let us make a first-letter hash table. Add the following: Apple, Alabama. Uh oh. Now what?

Hashing
We have a collision! One solution is linear probing: if the slot is taken, try the next one. Finding stuff isn’t too much harder. What about deleting stuff?
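Here is one way linear probing could look in code, including an answer to the deletion question. This is a sketch, not the course implementation: the class name `ProbingTable`, the three slot states, and the idea of marking removed slots as DELETED (often called a tombstone) are assumptions made for illustration. The tombstone matters because a find stops at the first EMPTY slot; if removing “Apple” simply emptied its slot, “Alabama” in the next slot would become unreachable.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Slot states: EMPTY (never used), FULL (holds a key), DELETED (tombstone).
enum SlotState { EMPTY, FULL, DELETED };

struct Slot
{
  SlotState state = EMPTY;
  std::string key;
};

// A minimal open-addressing table with linear probing (illustrative only).
class ProbingTable
{
public:
  explicit ProbingTable(std::size_t size) : slots(size) { assert(size > 0); }

  // Put key in its home slot, or in the next free slot after it.
  // Returns false if the key is already stored or the table is full.
  bool insert(const std::string& key)
  {
    if (find(key) != -1) return false;
    std::size_t home = hash(key);
    for (std::size_t step = 0; step < slots.size(); step++)
    {
      Slot& s = slots[(home + step) % slots.size()];
      if (s.state != FULL)  // EMPTY and DELETED slots can both be reused
      {
        s.state = FULL;
        s.key = key;
        return true;
      }
    }
    return false;
  }

  // Returns the slot index holding key, or -1 if it is not stored.
  // Probing stops at an EMPTY slot but continues past DELETED ones.
  int find(const std::string& key) const
  {
    std::size_t home = hash(key);
    for (std::size_t step = 0; step < slots.size(); step++)
    {
      std::size_t i = (home + step) % slots.size();
      if (slots[i].state == EMPTY) return -1;
      if (slots[i].state == FULL && slots[i].key == key) return static_cast<int>(i);
    }
    return -1;
  }

  // Mark the slot DELETED instead of EMPTY so later keys stay reachable.
  bool remove(const std::string& key)
  {
    int i = find(key);
    if (i == -1) return false;
    slots[i].state = DELETED;
    return true;
  }

private:
  // First-letter hash; assumes key is not empty and starts with a letter.
  std::size_t hash(const std::string& key) const
  {
    assert(!key.empty());
    int letter = std::toupper(static_cast<unsigned char>(key.at(0))) - 'A';
    return static_cast<std::size_t>(letter) % slots.size();
  }

  std::vector<Slot> slots;
};
```

With a 26-slot table, inserting “Apple” and then “Alabama” puts them in slots 0 and 1. After removing “Apple”, slot 0 becomes a tombstone and “Alabama” can still be found in slot 1.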

Hashing
Some options:
Larger table
Different collision scheme
Better hash function (MD5?)
Protip: Hash tables should be about 1.5-2 times as large as the number of items to store to keep collisions low.

But… I really like linear probing
The point is: “too bad”. You have more to learn! There is always more. Look how long I’ve been here… No, don’t. It makes me feel old.

But… I really like linear probing
Linear probing can cluster data! You can probe quadratically: i - 1, i + 1, i - 4, i + 4, i - 9, i + 9, … Better… but… How about a secondary hash? These can be really useful! Casting, Mapping, Folding, Shifting
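The quadratic probe order on the slide can be written out as a small function. The sketch below is an illustration under assumptions: the names `quadraticprobes` and `doublehashprobe` are made up, indices wrap around the table with modulo, and the particular secondary hash (`1 + key % (tablesize - 1)`) is just one common choice, not necessarily the one intended in class.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Probe order from the slide: i-1, i+1, i-4, i+4, i-9, i+9, ...
// Returns the first `count` slots to try after home slot i,
// wrapped around so every index stays inside the table.
std::vector<std::size_t> quadraticprobes(std::size_t i, std::size_t tablesize, std::size_t count)
{
  assert(tablesize > 0 && i < tablesize);
  std::vector<std::size_t> probes;
  for (std::size_t k = 1; probes.size() < count; k++)
  {
    std::size_t offset = (k * k) % tablesize;
    probes.push_back((i + tablesize - offset) % tablesize);  // i - k*k
    if (probes.size() < count)
      probes.push_back((i + offset) % tablesize);            // i + k*k
  }
  return probes;
}

// Secondary hash (double hashing): the step between probes comes from
// a second hash of the key, so different keys follow different paths.
// Hypothetical second hash: 1 + (key % (tablesize - 1)), which is never 0.
std::size_t doublehashprobe(unsigned key, std::size_t tablesize, std::size_t attempt)
{
  assert(tablesize > 1);
  std::size_t home = key % tablesize;
  std::size_t step = 1 + (key % (tablesize - 1));
  return (home + attempt * step) % tablesize;
}
```

For home slot 10 in a table of 23 slots, `quadraticprobes(10, 23, 6)` tries slots 9, 11, 6, 14, 1, 19. The probes spread out quickly instead of piling up next to the home slot, which is what reduces clustering.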

More hashes? Are you sold?
Well, some of you may have thought of this: Isn’t this similar to the example we started with?

Hashes
How long should the hash function take? Moreover, why does it matter? No matter what the data is (as long as it is the correct type) the hash function needs to be able to evaluate it!

Hashes
Some theory:
Load Factor = Num Elements in Table / Table Size
When we don’t use a linked list (we use probing) our load factor should be < 0.5
But, if we do use a linked list then we want the load factor to be closer to 1. Why?
Open addressing/Closed Hashing: Use probing
Closed addressing/Open Hashing: Use chaining (linked list)
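A chaining table can be sketched as a vector of linked lists, which also makes the load factor formula easy to compute. Everything named here (`ChainingTable`, `loadfactor`, and the character-sum hash) is a hypothetical illustration, not the course code. Because each slot holds a whole list, the table keeps working even when the load factor goes above 1; the lists just get longer.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Minimal chaining table (closed addressing / open hashing), illustrative only.
class ChainingTable
{
public:
  explicit ChainingTable(std::size_t size) : buckets(size), count(0) { assert(size > 0); }

  // Append key to the list in its slot. Collisions just extend the list.
  // This sketch does not check for duplicates.
  void insert(const std::string& key)
  {
    buckets[hash(key)].push_back(key);
    count++;
  }

  // Linear search, but only through the one list the key hashes to.
  bool contains(const std::string& key) const
  {
    for (const std::string& k : buckets[hash(key)])
      if (k == key) return true;
    return false;
  }

  // Load Factor = Num Elements in Table / Table Size
  double loadfactor() const
  {
    return static_cast<double>(count) / buckets.size();
  }

private:
  // Hypothetical hash: sum of the character codes modulo the table size.
  std::size_t hash(const std::string& key) const
  {
    std::size_t sum = 0;
    for (unsigned char c : key) sum += c;
    return sum % buckets.size();
  }

  std::vector<std::list<std::string>> buckets;
  std::size_t count;
};
```

With 4 slots and 3 keys the load factor is 0.75; adding two more keys brings it to 1.25, which a probing table could never reach because it has only one key per slot.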

Uh oh.
New topic. You will miss the hash. Maybe not.

Yes, you can sort the chaining hash table.

Graphs
So far, all of our Nodes only point to one other node.
This changed today with the linked list presentations:
Next and previous pointer
Multiple pointers based on pieces of data
But, they can point to multiple nodes!

Trees
First, we generally don’t count a previous pointer as a pointer. Our linked lists point to 1 other node (not counting special lists), hence a unary list. However, we can point to two different nodes: a path “next” and “othernext”. For a tree: “left” and “right”.

Graphs…
We will talk more about trees next week. A graph has an unlimited number of pointers that can point anywhere.

Representation
Now, our Node needs new data:
A list of Node* instead of just “next”
Some way to select a “next”
Graphs will often take distance between Nodes into account (so far, our distances have been irrelevant). Hence each Node* is associated with a distance.
We can store this as a “pair” (a Node* together with its distance).
Requires #include <utility>
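Putting those pieces together, a graph node might look like the sketch below. The member names (`links`, `connect`, `nearest`), the use of `int` for both the data and the distance, and the “pick the closest neighbor” rule for selecting a next node are all assumptions made for illustration; the slides do not fix any of them.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Illustrative graph node: a list of (neighbor, distance) pairs
// instead of a single "next" pointer.
struct Node
{
  int data;
  std::vector<std::pair<Node*, int>> links;

  explicit Node(int value) : data(value) {}

  // Add a link from this node to other with the given distance.
  void connect(Node* other, int distance)
  {
    assert(other != nullptr);
    links.push_back(std::make_pair(other, distance));
  }

  // One possible way to select a "next": follow the shortest link.
  // Returns nullptr if this node has no links.
  Node* nearest() const
  {
    Node* best = nullptr;
    int bestdistance = 0;
    for (const std::pair<Node*, int>& link : links)
    {
      if (best == nullptr || link.second < bestdistance)
      {
        best = link.first;
        bestdistance = link.second;
      }
    }
    return best;
  }
};
```

A node whose `links` list always has exactly one entry behaves like an ordinary linked list node, with `links[0].first` playing the role of “next”.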

Linked List
A Linked List is a subset of Graph. It has nodes with only 1 Node* (list size == 1). And the distance between each Node is the same (no value is needed, but we might as well say 0).
