Algorithms November 27, 2001
Administrivia Homework Assignment 6 –If you forgot to put your name on it, let me know Homework Assignment 7 –Due next Tuesday Lab 6 (Visual Basic Part 2) –This week; due Friday
The big picture We built a computer We built an operating system to control the computer We attached the computer to a network We wrote a compiler to make programming the computer easier We share CPU and disk across the network Need to talk about algorithms
Algorithms Recipes for doing computations The underpinnings of programming –Think out your algorithm –Show that it works –Determine its efficiency –Write it as a program
What is an algorithm? An algorithm is a recipe. It has –Inputs –Rules –Evaluation Criteria –Output
When do we use algorithms? Always! Assignment 5 –Step 1 -- Create a message of between 150 and 200 characters that you wish to transmit. –Step 2 -- Give an encoding of the alphabet –Step 3 -- Use the compression ideas we discussed to compress your message. –Step 4 -- Write your compressed message as a sequence of hexadecimal digits in this encoding. –Step 5 -- Now you are ready to create the message to be hidden. Your message will … – Step 6 -- We now consider a picture that could be displayed on your web page.
Examples of problems Baking cookies Putting things in alphabetical order Being a web search engine
Chocolate chip cookies
Input –flour (2 ¼ c) –baking soda (1 t) –salt (1 t) –butter (1 c) –granulated sugar (3/4 c) –brown sugar (3/4 c) –vanilla (1 t) –eggs (2) –chocolate chip morsels (2 c) –chopped nuts (1 c) Output –5 dozen cookies
Chocolate chip cookies Steps in the algorithm –Combine flour, baking soda, and salt in small bowl –Beat butter, granulated sugar, brown sugar, and vanilla in large bowl –Add eggs one at a time, beating after adding each egg –Gradually beat in flour mixture –Stir in morsels and nuts –Drop by rounded tablespoons onto ungreased baking sheets –Bake 9-11 minutes –Let stand for 2 minutes
Chocolate chip cookie algorithm Primitives –Inputs Flour, baking soda, salt, butter, brown sugar, granulated sugar, vanilla, egg, morsels, nuts Alternatively, chocolate chip cookie mix Alternatively, wheat, sugar cane, hen, … –Operators Combine, Beat, Gradually beat, Stir, Drop, Bake, Let stand
Chocolate chip cookie algorithm Execution –First 2 steps can be done in parallel? Parbegin (Combine(),Beat()) Parend –Machine dependencies Ovens vary (Bake 9-11 minutes) Ingredients vary and so need to be handled differently
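Parbegin/Parend means: start these steps together, and do not go on until all of them have finished. As a minimal sketch, assuming Python's standard concurrent.futures module as a stand-in for Parbegin/Parend, the first two cookie steps could run in parallel like this. The function names are hypothetical, and each "bowl" is just a list standing in for real work.

```python
from concurrent.futures import ThreadPoolExecutor


def combine_dry():
    # Step 1: combine flour, baking soda, and salt in a small bowl
    return ["flour", "baking soda", "salt"]


def beat_wet():
    # Step 2: beat butter, both sugars, and vanilla in a large bowl
    return ["butter", "granulated sugar", "brown sugar", "vanilla"]


def make_batter():
    # Parbegin: start both independent steps at the same time
    with ThreadPoolExecutor(max_workers=2) as pool:
        small_bowl = pool.submit(combine_dry)
        large_bowl = pool.submit(beat_wet)
        # Parend: result() blocks until that step has finished
        dry, wet = small_bowl.result(), large_bowl.result()
    # Later steps need both bowls, so they run only after Parend:
    # add the eggs, then gradually beat in the flour mixture
    return wet + ["eggs"] + dry


print(make_batter())
```

The two steps can overlap because neither uses the other's bowl; the later steps depend on both, which is exactly the wait that Parend enforces.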
Chocolate chip cookie algorithm Algorithm testing –Proof of the pudding is in the eating –How do we mechanize this?
Chocolate chip cookie algorithm Comparing different algorithms –Quality of input/output map –User time –Machine (oven) time
Putting things in alphabetical order Data set sizes –Course list for COS students –PU directory assistance: 10,000 people –Manhattan phone book: 1 million people –Social Security database: 1 billion records –Long distance call billing records: 100 billion/year Different methods for different tasks –Fast for large –Simple for small
A simple method for sorting Find smallest value -- put it first in list Find second smallest value -- put it second … Find next smallest value – put it next … When no more values, you’re done
How it works
How it works Find smallest value -- put it first in list
How it works Find second smallest value -- put it second
How it works Finish the sorting
A simple method for sorting
To sort array x = {x[1], x[2], …, x[n]}
For I = 1 To n
    For J = I+1 To n
        If (x[I] > x[J]) Then swap their values
    Next J
Next I
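The nested loops above translate almost line for line into Python. This sketch uses Python's 0-based indexing in place of the slide's x[1] through x[n]; simple_sort is a hypothetical name.

```python
def simple_sort(x):
    """Sort the list x in place with the nested-loop method and return it."""
    n = len(x)
    for i in range(n):
        # Compare x[i] with every later value; after this loop,
        # x[i] holds the smallest value among x[i:]
        for j in range(i + 1, n):
            if x[i] > x[j]:
                x[i], x[j] = x[j], x[i]  # swap their values
    return x


print(simple_sort([5, 3, 8, 1, 9, 2]))  # [1, 2, 3, 5, 8, 9]
```

The inner comparison runs n(n-1)/2 times no matter how the input is ordered, which is fine for a course list but not for a billion records.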
Another sorting algorithm Sorting by Merging Key idea It’s easy to merge 2 sorted lists Sort larger lists by –Sort smaller lists –Merge the results How do we sort smaller lists?
Merging 2 sorted lists
Merging 2 sorted lists Start at the top of each list
Merging 2 sorted lists is bigger than 155
Merging 2 sorted lists Record 155 and move the arrow
Merging 2 sorted lists is less than 255
Merging 2 sorted lists Finished when at the end of each list
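The arrow-moving procedure on these slides can be sketched in Python as follows; merge is a hypothetical name, and the indices i and j play the role of the two arrows.

```python
def merge(a, b):
    """Merge two sorted lists a and b into one new sorted list."""
    i = j = 0  # one arrow at the top of each list
    out = []
    while i < len(a) and j < len(b):
        # Record the smaller of the two values, then move that arrow
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    # One arrow has reached the end; the rest of the other list is already sorted
    out.extend(a[i:])
    out.extend(b[j:])
    return out


print(merge([1, 4, 9], [2, 3, 10]))  # [1, 2, 3, 4, 9, 10]
```

Each comparison records one value, so merging lists of sizes p and q takes at most p + q - 1 comparisons.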
Sort then merge Subdivide
Sort then merge Subdivide Sort pieces By merging
Sort then merge Subdivide Sort pieces By merging Merge
SortMerge algorithm
To sort x[1..n], call SortMerge(x, 1, n)
Function SortMerge(x, Lo, Hi)
    If Lo >= Hi Then
        Return
    End If
    Mid = (Lo + Hi) \ 2
    SortMerge(x, Lo, Mid)
    SortMerge(x, Mid + 1, Hi)
    Merge(x, Lo, Mid, Mid + 1, Hi)
End Function
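A runnable Python sketch of SortMerge, with lo and hi as 0-based inclusive bounds (the slide's pseudocode uses 1-based positions). The merge step is written inline so the block stands alone; sort_merge is a hypothetical name.

```python
def sort_merge(x, lo=0, hi=None):
    """Sort x[lo..hi] (inclusive) in place by recursive merging; return x."""
    if hi is None:
        hi = len(x) - 1
    if lo >= hi:  # zero or one element: already sorted, stop recursing
        return x
    mid = (lo + hi) // 2
    sort_merge(x, lo, mid)        # sort the first half
    sort_merge(x, mid + 1, hi)    # sort the second half
    # Merge the sorted halves x[lo..mid] and x[mid+1..hi]
    left, right = x[lo:mid + 1], x[mid + 1:hi + 1]
    i = j = 0
    k = lo
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            x[k] = left[i]
            i += 1
        else:
            x[k] = right[j]
            j += 1
        k += 1
    # Copy the leftovers; at most one of the two slices is non-empty
    for v in left[i:] + right[j:]:
        x[k] = v
        k += 1
    return x


print(sort_merge([38, 27, 43, 3, 9, 82, 10]))  # [3, 9, 10, 27, 38, 43, 82]
```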
Does it work? Have to be careful about stopping (the recursion must end at lists of size 1) There are always a lot of things going on Pending work as the recursion deepens: Sort(n) | Sort(n/2), Merge | Sort(n/4), Merge, Sort(n/2), Merge | Sort(n/8), Merge, Sort(n/4), Merge, Sort(n/2), Merge
Divide and conquer Use recursion –reduce solving for problem of size n to solving two problems of size n/2 –then combine the solutions S(n) = 2 S(n/2) + M(n/2,n/2) Solving a sorting problem of size n requires solving 2 sorting problems of size n/2 and doing a merge of 2 sets of size n/2
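Unrolling the recurrence shows why SortMerge wins for large n. This worked sketch assumes n is a power of 2, counts work in comparisons, and uses the fact that merging two sorted lists of size n/2 takes at most n - 1 comparisons (each comparison records one value), so M(n/2, n/2) < n.

```latex
\begin{aligned}
S(n) &\le 2\,S(n/2) + n, \qquad S(1) = 0 \\
     &\le 4\,S(n/4) + 2n \\
     &\le 8\,S(n/8) + 3n \\
     &\;\;\vdots \\
     &\le 2^k\,S(n/2^k) + k\,n \\
     &= n\,S(1) + n\log_2 n = n\log_2 n \qquad (k = \log_2 n)
\end{aligned}
```

For comparison, the simple nested-loop sort always makes n(n-1)/2 comparisons. At n = 1,000,000 that is about 5 × 10^11 comparisons, versus about 2 × 10^7 for n log2 n.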
Comparing running times [table: N vs. Insertion (ms) and SortMerge (ms)] Reducing 20 hours to 3 seconds
Searching Once a list is in alphabetical order, how do you find things in it? For example, is COS 111 on the list of courses that satisfy the Epistemology and Cognition (EC) requirement?
EC courses PHI 201 PHI 204 PHI 301 PHI 304 PHI 312 PHI 321 PHI 333 PHI 338 PSY 255 PSY 306 PSY 307 PSY 316 AAS 391 ANT 201 COS 302 FRS 135 FRS 137 GER 306 HUM 365 LIN 213 LIN 302 LIN 306 LIN 315 PHI 200
Searching for COS 111 Compare to the middle AAS 391 ANT 201 COS 302 FRS 135 FRS 137 GER 306 HUM 365 LIN 213 LIN 302 LIN 306 LIN 315 PHI 200 PHI 201 PHI 204 PHI 301 PHI 304 PHI 312 PHI 321 PHI 333 PHI 338 PSY 255 PSY 306 PSY 307 PSY 316 COS 111
Searching Compare to the middle If smaller search first half If larger search second half AAS 391 ANT 201 COS 302 FRS 135 FRS 137 GER 306 HUM 365 LIN 213 LIN 302 LIN 306 LIN 315 PHI 200 PHI 201 PHI 204 PHI 301 PHI 304 PHI 312 PHI 321 PHI 333 PHI 338 PSY 255 PSY 306 PSY 307 PSY 316 COS 111
Repeat Compare to the middle If smaller search first half If larger search second half AAS 391 ANT 201 COS 302 FRS 135 FRS 137 GER 306 HUM 365 LIN 213 LIN 302 LIN 306 LIN 315 PHI 200 COS 111
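The compare-to-the-middle procedure is binary search. A minimal Python sketch, using the sorted EC list from these slides; binary_search is a hypothetical name.

```python
# Sorted list of EC courses from the slides
EC = ["AAS 391", "ANT 201", "COS 302", "FRS 135", "FRS 137", "GER 306",
      "HUM 365", "LIN 213", "LIN 302", "LIN 306", "LIN 315", "PHI 200",
      "PHI 201", "PHI 204", "PHI 301", "PHI 304", "PHI 312", "PHI 321",
      "PHI 333", "PHI 338", "PSY 255", "PSY 306", "PSY 307", "PSY 316"]


def binary_search(items, target):
    """Return True if target appears in the sorted list items."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2  # compare to the middle
        if items[mid] == target:
            return True
        if target < items[mid]:
            hi = mid - 1  # smaller: search the first half
        else:
            lo = mid + 1  # larger: search the second half
    return False  # nothing left to search: not on the list


print(binary_search(EC, "COS 111"))  # False
print(binary_search(EC, "COS 302"))  # True
```

Each look at a middle element halves the range, so a list of 24 courses needs at most 5 such looks.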
Building indices PHI 201 PHI 204 PHI 301 PHI 304 PHI 312 PHI 321 PHI 333 PHI 338 PSY 255 PSY 306 PSY 307 PSY 316 AAS 391 ANT 201 COS 302 FRS 135 FRS 137 GER 306 HUM 365 LIN 213 LIN 302 LIN 306 LIN 315 PHI 200 AAS ANT COS FRS GER HUM LIN PHI PSY
Search indices then data PHI 201 PHI 204 PHI 301 PHI 304 PHI 312 PHI 321 PHI 333 PHI 338 PSY 255 PSY 306 PSY 307 PSY 316 AAS 391 ANT 201 COS 302 FRS 135 FRS 137 GER 306 HUM 365 LIN 213 LIN 302 LIN 306 LIN 315 PHI 200 AAS ANT COS FRS GER HUM LIN PHI PSY COS 111
How do we describe algorithms? Pseudocode –Combines English and Visual Basic constructs –Works with various types of primitives Could be + - / * Could be more complex things –Describes how data is organized –Describes operations on the data –Is meant to be higher level than a programming language
Searching with indices (pseudocode) Build the indices –Do this by going through the list and determining where department names change –Store the results in an array called Indices Search the indices –Do a binary search on the array Indices Do this by comparing to the middle element –If smaller, use binary search on the lower half –If larger, use binary search on the upper half Search the data –Use binary search on the part of the list that the matching index points to
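A Python sketch of the pseudocode above, assuming the Indices array is kept as two parallel lists (department names and the position where each department starts) and using the standard library's bisect.bisect_left as the binary search. build_indices and search_with_indices are hypothetical names.

```python
from bisect import bisect_left

# Sorted list of EC courses from the slides
EC = ["AAS 391", "ANT 201", "COS 302", "FRS 135", "FRS 137", "GER 306",
      "HUM 365", "LIN 213", "LIN 302", "LIN 306", "LIN 315", "PHI 200",
      "PHI 201", "PHI 204", "PHI 301", "PHI 304", "PHI 312", "PHI 321",
      "PHI 333", "PHI 338", "PSY 255", "PSY 306", "PSY 307", "PSY 316"]


def build_indices(courses):
    """Scan the sorted list and note where each department name starts."""
    depts, starts = [], []
    for pos, course in enumerate(courses):
        dept = course.split()[0]
        if not depts or depts[-1] != dept:  # department name changes here
            depts.append(dept)
            starts.append(pos)
    return depts, starts


def search_with_indices(courses, depts, starts, target):
    """Binary search the indices, then binary search that department's slice."""
    dept = target.split()[0]
    k = bisect_left(depts, dept)  # search the (short) indices first
    if k == len(depts) or depts[k] != dept:
        return False  # no course from this department is on the list
    lo = starts[k]
    hi = starts[k + 1] if k + 1 < len(starts) else len(courses)
    i = bisect_left(courses, target, lo, hi)  # then search only that slice
    return i < hi and courses[i] == target


depts, starts = build_indices(EC)
print(depts)
print(search_with_indices(EC, depts, starts, "COS 111"))
```

The indices here are a short list (one entry per department), so most of the work happens in a small slice of the data.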
Building a web search engine Crawl the web Organize the results for fast query processing Process queries
Crawl the web Every month use TCP/IP to go to all reachable web pages –1.5B pages at 10 Kbytes/page, so 15 terabytes Can compress an average page to 3 Kbytes (4.5 terabytes in all) Numeracy –Crawl 1.5B pages in 14 days, so crawl roughly 100M pages per day, 4M pages per hour, 1,000 pages per second
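The numeracy on this slide is easy to check. This sketch just redoes the arithmetic with the slide's figures (1.5 billion pages, 10 Kbytes per page, 3 Kbytes compressed, 14 days), taking a kilobyte as 1,000 bytes; the slide's numbers are rounded at each step.

```python
pages = 1.5e9             # reachable web pages (slide's figure)
page_bytes = 10_000       # about 10 Kbytes per page
compressed_bytes = 3_000  # about 3 Kbytes per page after compression
days = 14                 # time allowed for one crawl

raw_tb = pages * page_bytes / 1e12               # 15.0 terabytes
compressed_tb = pages * compressed_bytes / 1e12  # 4.5 terabytes
per_day = pages / days                           # about 107 million
per_hour = per_day / 24                          # about 4.5 million
per_second = per_hour / 3600                     # about 1,240

print(f"{raw_tb:.1f} TB raw, {compressed_tb:.1f} TB compressed")
print(f"{per_day:,.0f}/day, {per_hour:,.0f}/hour, {per_second:,.0f}/second")
```

Without the intermediate rounding the crawler needs somewhat more than 1,000 pages per second, every second, for two weeks.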
Organize the results Put into alphabetical order Build indices Make multiple copies so that searching can proceed in parallel. When you update, you rebuild the indices
Process queries Look up indices Look up words/phrases –Advertiser can buy a word or phrase This search gives you internal addresses of web pages –Look them up to build results page
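The slides do not say how the word lookup is implemented. As a minimal sketch, assuming a toy in-memory inverted index (a dictionary from each word to the sorted page numbers that contain it) and assuming a multi-word query asks for pages containing every word, query processing might look like this. PAGES, TEXT, build_index, and process_query are hypothetical, and the page numbers stand in for the engine's internal addresses.

```python
# Hypothetical toy data: internal page numbers, their addresses, and their text
PAGES = {
    1: "http://example.com/cookies",
    2: "http://example.com/sorting",
    3: "http://example.com/search",
}
TEXT = {
    1: "chocolate chip cookie recipe",
    2: "sorting by merging and binary search",
    3: "binary search of sorted lists",
}


def build_index(text):
    """Map each word to the sorted list of page numbers that contain it."""
    index = {}
    for page, words in text.items():
        for word in set(words.split()):
            index.setdefault(word, []).append(page)
    for page_list in index.values():
        page_list.sort()
    return index


def process_query(index, pages, query):
    """Look up each query word; keep only pages that contain every word."""
    hits = None
    for word in query.split():
        found = set(index.get(word, []))
        hits = found if hits is None else hits & found
    # Turn internal page numbers into addresses for the results page
    return [pages[p] for p in sorted(hits or [])]


index = build_index(TEXT)
print(process_query(index, PAGES, "binary search"))
```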
Searching time Want to answer a query in less than ½ second Use PageRank to get good results
Page Rank The web is a collection of links –A document’s importance is determined by How many pages point to it How important those pages are –This is its PageRank Used for determining –How often to crawl a page –How to order pages presented.
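The slide describes PageRank informally. A common way to compute it is power iteration: start every page with equal rank, then repeatedly let each page pass its rank evenly along its outgoing links. This sketch assumes a damping factor of 0.85 (the slide does not mention one; it models a reader who sometimes jumps to a random page instead of following a link) and a tiny hypothetical four-page web.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank.

    links maps each page to the list of pages it links to. Returns a dict
    of ranks that sum to 1. damping=0.85 is an assumed, commonly used value.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal importance
    for _ in range(iterations):
        # Every page gets a small base share (the random-jump term)
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            if links[p]:
                # A page splits its importance evenly among the pages it points to
                share = damping * rank[p] / len(links[p])
                for q in links[p]:
                    new[q] += share
            else:
                # A page with no links spreads its importance over all pages
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank


# Hypothetical four-page web: A links to B and C, B to C, C to A, D to C
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
print(sorted(ranks, key=ranks.get, reverse=True))  # ['C', 'A', 'B', 'D']
```

C ranks highest because three pages point to it. A comes second with only one incoming link, because that link comes from the important page C: both parts of the slide's definition at work.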
Remaining subtask Matching strings –Is this the word computer? Comparing strings –Did the word computer occur before or after?
How does string matching work? State machines –Move along states as long as you keep matching –Back off when you miss a match
State machine – looking for abcd Sa (Read a) → Sb (Read b) → Sc (Read c) → Sd (Read d) → OK; Other → back to Sa
State machine – looking for abcd What happens if input is abccadbacabcd? Sa Sb Sc Sd Sa Sb Sa Sa Sb Sa Sb Sc Sd OK
State machine – looking for abcd What happens if input is abcabcd? Sa Sb Sc Sd Sa Sa Sa Sa The machine never reaches OK even though abcd occurs: the a read in state Sd was thrown away when the machine backed off to Sa
State machine – looking for abcd (revised) Sa (Read a) → Sb (Read b) → Sc (Read c) → Sd (Read d) → OK; Other → back to Sa; Read a in any state → Sb
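A Python sketch of both machines, assuming the transitions read off the slides: a matching character moves one state forward, any other character goes back to Sa, and (in the revised machine only) a stray a goes to Sb instead. run_machine and its read_a_restarts flag are hypothetical names. The traces it prints reproduce the ones on the slides.

```python
PATTERN = "abcd"
# Sa = waiting for a, Sb = waiting for b, ..., OK = pattern found
STATES = ["Sa", "Sb", "Sc", "Sd", "OK"]


def run_machine(text, read_a_restarts=False):
    """Feed text to the abcd machine and return the states it visits.

    read_a_restarts=False: any miss sends the machine back to Sa.
    read_a_restarts=True: a missed 'a' sends it to Sb instead, since
    that 'a' may be the start of a new match (the revised machine).
    """
    state = 0  # index into STATES; start in Sa
    trace = [STATES[state]]
    for ch in text:
        if ch == PATTERN[state]:
            state += 1  # keep matching: move to the next state
        elif read_a_restarts and ch == "a":
            state = 1  # this 'a' begins a new match: go to Sb
        else:
            state = 0  # miss: back off to Sa
        trace.append(STATES[state])
        if state == len(PATTERN):
            break  # reached OK: found abcd
    return trace


print(" ".join(run_machine("abccadbacabcd")))
# Sa Sb Sc Sd Sa Sb Sa Sa Sb Sa Sb Sc Sd OK
print(" ".join(run_machine("abcabcd")))
# Sa Sb Sc Sd Sa Sa Sa Sa  (misses the match)
print(" ".join(run_machine("abcabcd", read_a_restarts=True)))
# Sa Sb Sc Sd Sb Sc Sd OK
```

The single extra rule is enough for abcd because no proper prefix of abcd reappears inside the pattern. For patterns such as abab, backing off correctly needs more transitions; building them in general is the idea behind the Knuth–Morris–Pratt algorithm.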
Larger search challenges Allow strings to have don't cares –Starts with a and ends with e –Has some number of copies of the substring ab Finding strings close to your string –For spelling correction
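Two standard tools fit these challenges. They are sketched here in Python as an illustration, not as what the lecture prescribes. A regular expression expresses don't cares ("." matches any one character and "*" means "any number of"). Edit distance (Levenshtein distance), the fewest single-character insertions, deletions, or substitutions that turn one string into another, measures how close two strings are. The function names are hypothetical.

```python
import re


def starts_a_ends_e(s):
    """True if s starts with a and ends with e, whatever lies between."""
    # ".*" is the don't-care part: any characters, any number of them
    return re.fullmatch(r"a.*e", s) is not None


def edit_distance(s, t):
    """Fewest single-character insertions, deletions, or substitutions
    needed to turn s into t (Levenshtein distance)."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        cur = [i]  # turning s[:i] into "" takes i deletions
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                # delete cs
                           cur[j - 1] + 1,             # insert ct
                           prev[j - 1] + (cs != ct)))  # substitute if different
        prev = cur
    return prev[-1]


print(starts_a_ends_e("apple"), starts_a_ends_e("algorithm"))  # True False
print(edit_distance("corection", "correction"))  # 1
```

A spelling corrector can then suggest dictionary words within a small edit distance of what was typed.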
Algorithms -- summary Methods of modeling processes Understand at a high level Make sure your reasoning is correct Worry about efficiency in situations where that matters Write as pseudocode
What’s next Problems for which there are no algorithms Problems for which all algorithms run slowly Applications of problems where algorithms run slowly