Lecture #30: Data Organization and Binary Search
Data Organization
Problem: huge amounts of information
How do I find
– Information that I know I want
– Information related to what I want
How do I understand
– Particular pieces of information
– The whole collection of information
Limitations
Screen space
Network bandwidth
– Bandwidth: how much information can be transmitted per second
Human attention
Kinds of things to organize
Menu items
– MS Word: about 150 menu items
Text
– Pages in a book
– Documents on the WWW: gazillions
Images
– All of the pictures created in a commercial advertising company
Kinds of things to organize (continued)
Sounds
– Sound tracks to all TV and radio news broadcasts
Video
– A complete collection of classic movies
Structured information (records)
– People
– Cars
– Students
– Electronic appliance parts
A question of scale
– 10 things
– 100 things: a menu
– 1,000 things: files on your computer
– 10,000 things: students at a university
– 1,000,000 things: books in a library
– Gazillions of things: WWW pages
Three ways to find things
Lists
– Arrays
Trees
– Organize into categories
Search
– Describe what you want and have the computer find it
The Phone Book Challenge
How long will it take to find “Bill Lund” in the BYU Directory?
How long will it take to find “ ” in the BYU Directory?
What algorithm did you use to search the phone book?
– Where did you start?
– How many steps did it take?
– Is there a more efficient way?
Binary search for “Goodrich” (Guess is the midpoint, rounded down)
– Lower = 0, Upper = 10, Guess = (0+10)/2 = 5
– Lower = 0, Upper = 5, Guess = (0+5)/2 = 2
– Lower = 2, Upper = 5, Guess = (2+5)/2 = 3
– Lower = 3, Upper = 5, Guess = (3+5)/2 = 4
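The Lower/Upper/Guess bookkeeping above can be written out as a short Python sketch. It assumes a sorted Python list and follows the slides' convention (Upper becomes Guess when the target is below it, Lower becomes Guess when it is above), which differs slightly from the common textbook form that moves to Guess - 1 or Guess + 1. The names after Garcia are hypothetical, chosen so that Goodrich sits at position 4 of 11.

```python
def binary_search(items, target):
    """Return (index, guesses) for target in the sorted list items; index is -1 if absent.

    Keeps the slides' convention: when the target is below the guess,
    Upper becomes Guess; when it is above, Lower becomes Guess.
    """
    guesses = []
    if not items:
        return -1, guesses
    lower, upper = 0, len(items) - 1
    while upper - lower > 1:
        guess = (lower + upper) // 2    # integer division, as in (0+5)/2 = 2
        guesses.append(guess)
        if items[guess] == target:
            return guess, guesses
        if target < items[guess]:
            upper = guess               # target is below the guess
        else:
            lower = guess               # target is above the guess
    # Only Lower and Upper remain; check them directly.
    for i in (lower, upper):
        if items[i] == target:
            return i, guesses
    return -1, guesses


# Hypothetical sorted directory of 11 last names (positions 0 to 10).
names = ["Anderson", "Bilinski", "Clark", "Garcia", "Goodrich", "Hansen",
         "Jones", "Lund", "Miller", "Smith", "Young"]
print(binary_search(names, "Goodrich"))   # (4, [5, 2, 3, 4])
```

On this hypothetical list the function makes the same guesses as the trace: 5, 2, 3, then 4. The loop always ends because each Guess lies strictly between Lower and Upper, so the range shrinks every step.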
Binary search
If there are 64 things in a list, how many times can you divide that list in half?
– 32, 16, 8, 4, 2, 1: 6 times
If there are 1024 things in a list, how many times can you divide that list in half?
– 512, 256, 128, 64, 32, 16, 8, 4, 2, 1: 10 times
If the size of the list doubles, how many more steps are required in a binary search?
– 1
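The halving count can be checked with a small sketch. The function name halvings is illustrative; it counts how many times n can be cut in half (rounding down) before a single item is left.

```python
def halvings(n):
    """Count how many times a list of n items can be halved until 1 item remains."""
    count = 0
    while n > 1:
        n //= 2          # keep one half, discard the other
        count += 1
    return count


for size in (64, 1024, 2048):
    print(size, halvings(size))
# 64 6
# 1024 10
# 2048 11
```

Going from 1024 to 2048 items adds exactly one halving, matching the answer to the doubling question.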
If there are N items in a list, binary search takes about log2(N) steps.
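Estimating log2(N)
– Count the number of digits in N and multiply by 2.5
– A 4-digit N (e.g. 1,000): 4 × 2.5 = 10 steps
– 1,000,000: 7 × 2.5 = 17.5 steps
– 1,000,000,000: 10 × 2.5 = 25 steps
– This is a rough rule that runs low for large N: log2(1,000,000,000) is about 30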
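A sketch comparing the digits × 2.5 rule of thumb with the exact value from Python's math.log2. The helper name estimate_steps is hypothetical.

```python
import math

def estimate_steps(n):
    """Rule of thumb from the slide: number of digits in n times 2.5."""
    return len(str(n)) * 2.5

for n in (1_000, 5_000, 1_000_000, 1_000_000_000):
    print(f"{n:>13,}  estimate {estimate_steps(n):5.1f}  log2 {math.log2(n):5.1f}")
```

The two agree near 1,000, but the rule drifts low as N grows (25 versus about 29.9 for 1,000,000,000), because each extra factor of ten actually adds log2(10) ≈ 3.3 steps rather than 2.5.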
Provo/Orem phone book
How long to find “Bill Lund”? The BYU Directory has about 5,000 entries.
– log2(5,000) ≈ 4 × 2.5 = 10 steps (the exact value is about 12.3)
How to find a phone number
– Best case: 1 step
– Worst case: 11 steps
– Average? About 5 steps, roughly half the list
– Average for N entries? About N/2
Provo/Orem phone book
How many steps to find a phone number?
– 5,000 / 2 = 2,500 on average
How can we improve this?
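For an unsorted list, the only option is to scan from the start. This sketch counts the steps for every possible target, assuming each entry is equally likely to be the one wanted; linear_search and average_steps are illustrative names. The exact average is (N + 1) / 2, which is about N/2 for a large directory.

```python
def linear_search(items, target):
    """Scan from the start; return (index, steps), index is -1 if target is absent."""
    for steps, item in enumerate(items, start=1):
        if item == target:
            return steps - 1, steps
    return -1, len(items)


def average_steps(n):
    """Average steps to find each of n distinct entries, each equally likely."""
    items = list(range(n))
    total = sum(linear_search(items, t)[1] for t in items)
    return total / n


print(average_steps(11))     # 6.0, i.e. (11 + 1) / 2
print(average_steps(5000))   # 2500.5, i.e. about 5,000 / 2
```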
Sort the phone book by phone number
– What if I want to search on both name and number?
Using an Index
[Figure: a Last Name index (Anderson, Bilinski, Clark, Garcia) and a Phone number index, built up one entry at a time]
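The idea behind the figure, storing each record once and keeping separate sorted indices that point into it, can be sketched in Python. The records and phone numbers are invented for illustration; each index here is a sorted list of (key, row) pairs, where row is the record's position in the unsorted list, and build_index is a hypothetical helper.

```python
# Hypothetical directory, stored once, in no particular order.
records = [
    {"last": "Garcia",   "phone": "555-0142"},
    {"last": "Anderson", "phone": "555-0199"},
    {"last": "Clark",    "phone": "555-0107"},
    {"last": "Bilinski", "phone": "555-0160"},
]

def build_index(records, field):
    """Return a sorted list of (key, row) pairs for the given field."""
    return sorted((rec[field], row) for row, rec in enumerate(records))

name_index = build_index(records, "last")
phone_index = build_index(records, "phone")

print(name_index)
# [('Anderson', 1), ('Bilinski', 3), ('Clark', 2), ('Garcia', 0)]
print(phone_index)
# [('555-0107', 2), ('555-0142', 0), ('555-0160', 3), ('555-0199', 1)]
```

Both indices are sorted, so either can be binary-searched, yet the records themselves are never copied or reordered.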
Search for Goodrich (Last Name index)
– Lower = 0, Upper = 10, Guess = 5: below
– Lower = 0, Upper = 5, Guess = 2: above
– Lower = 2, Upper = 5, Guess = 3: above
– Lower = 3, Upper = 5, Guess = 4: above
Search for a phone number (Phone number index)
– Lower = 0, Upper = 10, Guess = 5: above
– Lower = 5, Upper = 10, Guess = 7: below
– Lower = 5, Upper = 7, Guess = 6: MATCH
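Python's standard bisect module does the Lower/Upper/Guess work on a sorted list. A sketch, assuming a hypothetical phone-number index kept as two parallel lists (sorted keys and the row each key points to); the numbers, rows, names, and the lookup helper are made up for illustration.

```python
from bisect import bisect_left

# Hypothetical phone-number index: sorted keys plus the row each one points to.
phone_keys = ["555-0107", "555-0142", "555-0160", "555-0199"]
phone_rows = [2, 0, 3, 1]
# Hypothetical last names, stored once, in no particular order.
records = ["Garcia", "Anderson", "Clark", "Bilinski"]

def lookup(keys, rows, key):
    """Binary-search the sorted keys; return the matching row, or None."""
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return rows[i]
    return None

row = lookup(phone_keys, phone_rows, "555-0160")
print(records[row])                                # Bilinski
print(lookup(phone_keys, phone_rows, "555-0000"))  # None
```

bisect_left only reports where the key would go, so the equality check afterward is what distinguishes a MATCH from a number that is not in the directory.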
Using an Index
What about first name or city?
– Another index
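Another index is built the same way for any field. In this sketch the first-name and city fields and the records are hypothetical (Provo, Orem, and Bill Lund come from the slides), as are the helper names build_index and find_all. Because several people can share a city, bisect_left and bisect_right bracket the whole run of matching keys instead of a single entry.

```python
from bisect import bisect_left, bisect_right

# Hypothetical records, stored once.
records = [
    {"last": "Garcia",   "first": "Ana",  "city": "Provo"},
    {"last": "Anderson", "first": "Ben",  "city": "Orem"},
    {"last": "Clark",    "first": "Cara", "city": "Provo"},
    {"last": "Bilinski", "first": "Dan",  "city": "Orem"},
    {"last": "Lund",     "first": "Bill", "city": "Provo"},
]

def build_index(records, field):
    """Return parallel lists (sorted keys, rows) for one field."""
    pairs = sorted((rec[field], row) for row, rec in enumerate(records))
    return [k for k, _ in pairs], [r for _, r in pairs]

def find_all(keys, rows, key):
    """Return the rows of every record whose indexed field equals key."""
    lo = bisect_left(keys, key)
    hi = bisect_right(keys, key)
    return rows[lo:hi]

city_keys, city_rows = build_index(records, "city")
first_keys, first_rows = build_index(records, "first")

print([records[r]["last"] for r in find_all(city_keys, city_rows, "Provo")])
# ['Garcia', 'Clark', 'Lund']
print([records[r]["last"] for r in find_all(first_keys, first_rows, "Bill")])
# ['Lund']
```

Each new index costs extra memory for the keys and rows, but the records themselves are still stored only once.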
Data Organization Summary
What are we organizing for?
Scale
– From 10 things to gazillions
Lists
– Unsorted: about N/2 steps on average
– Sorted: about log2(N) steps with binary search; estimate by counting the digits of N and multiplying by 2.5
To access the data in many ways
– Use many indices into the same data