1
Algorithms (Contd.)
2
How do we describe algorithms?
Pseudocode
  – Combines English, simple code constructs
  – Works with various types of primitives
      Could be + - / *
      Could be more complex operations
  – Describes how data is organized
  – Describes operations on the data
  – Is meant to be higher level than programming
3
Searching with indices (pseudocode)
Build the indices
  – Do this by going through the list and determining where department names change
  – Store the results in an array called Indices
Search the indices
  – Do a binary search on the array Indices
      Do this by comparing to the middle element
  – Then use binary search to compare to the upper half
  – Or use binary search to compare to the lower half
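The pseudocode above can be sketched in Python. This is a minimal illustration, assuming the list holds (department, name) pairs already sorted by department; the function names and the sample records are made up for the example, and only the array name Indices comes from the slide.

```python
def build_indices(records):
    """Record the position where each department name first appears.

    Assumes `records` is sorted by department, so every change of
    department name marks the start of a new block.
    """
    indices = []
    for pos, (dept, _name) in enumerate(records):
        if not indices or indices[-1][0] != dept:
            indices.append((dept, pos))
    return indices


def find_department(indices, dept):
    """Binary search over Indices; returns the start position or None."""
    lo, hi = 0, len(indices) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        mid_dept, start = indices[mid]
        if mid_dept == dept:
            return start
        if mid_dept < dept:
            lo = mid + 1   # continue in the upper half
        else:
            hi = mid - 1   # continue in the lower half
    return None


# Hypothetical sample data, sorted by department.
records = [
    ("Art", "Ames"), ("Art", "Boyd"),
    ("Biology", "Chen"), ("Biology", "Diaz"), ("Biology", "Eady"),
    ("Physics", "Ford"),
]
indices = build_indices(records)
```

Each comparison with the middle element discards half of Indices, so the search takes a number of steps proportional to the logarithm of the number of departments.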
4
Building a web search engine
Crawl/spider the web
Organize the results for fast query processing
Process queries
5
Crawl the web
Every month use networking to go to as many reachable web pages as you can
  – 10B pages, 10 Kbytes/page, so 100 terabytes
Can compress an average page to 3 Kbytes
Numeracy
  – To crawl 10B pages in 100 days:
      Crawl 100M pages per day
      Crawl 4M pages per hour
      Crawl 1,000 pages per second
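The numeracy above can be checked with a few lines of arithmetic. This sketch assumes decimal units (1 Kbyte = 1,000 bytes, 1 terabyte = 10^12 bytes); the variable names are chosen for the example.

```python
PAGES = 10_000_000_000        # 10B pages
BYTES_PER_PAGE = 10_000       # 10 Kbytes per page
COMPRESSED_BYTES = 3_000      # 3 Kbytes per page after compression
CRAWL_DAYS = 100

raw_terabytes = PAGES * BYTES_PER_PAGE / 1e12
compressed_terabytes = PAGES * COMPRESSED_BYTES / 1e12

pages_per_day = PAGES / CRAWL_DAYS
pages_per_hour = pages_per_day / 24
pages_per_second = pages_per_hour / 3600

print(f"raw storage:        {raw_terabytes:.0f} TB")
print(f"compressed storage: {compressed_terabytes:.0f} TB")
print(f"pages per day:      {pages_per_day:,.0f}")
print(f"pages per hour:     {pages_per_hour:,.0f}")
print(f"pages per second:   {pages_per_second:,.0f}")
```

The exact rates come out slightly above the rounded figures on the slide (about 4.2M pages per hour and about 1,160 pages per second).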
6
Organize the results
Put into alphabetical order
Build indices for faster lookup
Make multiple copies so that searching can proceed in parallel
When you update, you rebuild the indices
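One way to picture the alphabetical ordering and the index is a sorted word list where each word points to the pages containing it. The slide does not say exactly what is sorted, so this is a sketch under that assumption; the page ids, texts, and function names are hypothetical.

```python
from bisect import bisect_left


def build_word_index(pages):
    """Map each word to the sorted ids of pages containing it.

    Returns two parallel lists: the words in alphabetical order and,
    for each word, its list of page ids.
    """
    postings = {}
    for page_id, text in pages.items():
        for word in set(text.lower().split()):
            postings.setdefault(word, []).append(page_id)
    words = sorted(postings)
    return words, [sorted(postings[w]) for w in words]


def lookup(index, word):
    """Binary search in the alphabetical word list."""
    words, page_lists = index
    target = word.lower()
    i = bisect_left(words, target)
    if i < len(words) and words[i] == target:
        return page_lists[i]
    return []


# Hypothetical crawl results: page id -> page text.
pages = {
    1: "computer science department",
    2: "science news",
    3: "computer games",
}
index = build_word_index(pages)
```

Because the index is derived entirely from the crawled pages, an update in this sketch simply calls build_word_index again, which matches the "rebuild the indices" step.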
7
Process search queries
Look up indices
Look up words/phrases
  – Advertiser can buy a word or phrase
This search gives you internal addresses of web pages
  – Look them up to build results page
Ranking results: content match, popularity, price paid by advertisers, …
8
Ranking by Popularity
The web is a collection of links
  – A document’s importance is determined by
      How many pages point to it
      How important those pages are
Used for determining
  – How often to crawl a page
  – How to order pages presented
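The circular definition above (a page is important if important pages point to it) can be resolved by repeated averaging. The sketch below is a simplified link-based iteration, not necessarily the formula any real search engine uses; the damping value 0.85, the iteration count, and the tiny link graph are assumptions made for illustration.

```python
def link_importance(links, damping=0.85, iterations=50):
    """Iteratively score pages from the link graph.

    `links` maps a page to the list of pages it points to. Each round,
    every page splits its current score evenly among its targets.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * score[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * score[page] / n
                for t in pages:
                    new[t] += share
        score = new
    return score


# Hypothetical four-page web: A, B and D all point to C; C points to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = link_importance(links)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this example C ranks first because three pages point to it, and A ranks above B and D because its single incoming link comes from the most important page.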
9
Content Relevance
Simple string matching
  – Does the document/string contain the word computer?
More complex string matching
  – Did the word computer occur before or after the word science?
  – Did it appear within 10 words of the word science?
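The three questions above can be answered from word positions. This is a rough sketch that splits on whitespace and strips basic punctuation; the helper names and the sample sentence are assumptions for the example.

```python
def word_positions(text, word):
    """Positions (0-based word offsets) where `word` occurs in `text`."""
    words = [w.strip(".,;:!?") for w in text.lower().split()]
    return [i for i, w in enumerate(words) if w == word.lower()]


def contains(text, word):
    return bool(word_positions(text, word))


def occurs_before(text, first, second):
    """True if some occurrence of `first` precedes some occurrence of `second`."""
    a, b = word_positions(text, first), word_positions(text, second)
    return bool(a and b) and min(a) < max(b)


def within(text, first, second, distance=10):
    """True if the two words appear within `distance` words of each other."""
    a, b = word_positions(text, first), word_positions(text, second)
    return any(abs(i - j) <= distance for i in a for j in b)


doc = "The computer lab is next to the science building."
```

For the sample sentence, contains(doc, "computer") is true, "computer" occurs before "science", and the two words are 6 positions apart, so they are within 10 words but not within 3.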
10
How does string matching work?
State machines
  – Move along states as long as you keep matching
  – Back off when you miss a match
11
State machine – looking for abcd
[Diagram: states Sa, Sb, Sc, Sd, OK; transitions labeled Read a, Read b, Read c, Read d, Other]
What happens if input is abccadbacabcd?
Sa Sb Sc Sd Sa Sb Sa Sa Sb Sa Sb Sc Sd OK
12
State machine – looking for abcd
[Diagram: states Sa, Sb, Sc, Sd, OK; transitions labeled Read a, Read b, Read c, Read d, Other]
What happens if input is abcabcd?
Sa Sb Sc Sd Sa Sa Sa Sa
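The two traces are consistent with a machine that advances one state on the expected letter and falls back to Sa on anything else. The sketch below assumes exactly that behavior (the diagram itself did not survive extraction); the function name is made up for the example.

```python
PATTERN = "abcd"
STATES = ["Sa", "Sb", "Sc", "Sd", "OK"]


def run_naive(text):
    """Return the sequence of states visited while reading `text`.

    On a match, move to the next state; on any other character,
    go all the way back to Sa. Stops as soon as OK is reached.
    """
    state = 0
    trace = [STATES[state]]
    for ch in text:
        if ch == PATTERN[state]:
            state += 1
        else:
            state = 0
        trace.append(STATES[state])
        if STATES[state] == "OK":
            break
    return trace


print(" ".join(run_naive("abccadbacabcd")))
print(" ".join(run_naive("abcabcd")))
```

On abcabcd this machine never reaches OK even though the input ends in abcd: the second a is read in state Sd, sends the machine back to Sa, and is itself thrown away.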
13
State machine – looking for abcd
[Diagram: states Sa, Sb, Sc, Sd, OK; transitions labeled Read a, Read b, Read c, Read d, Other, plus an additional Read a transition]
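A plausible reading of the extra Read a transition is that a mismatching a should count as the start of a new match, so the machine moves to Sb instead of Sa. The sketch below assumes that interpretation; it is sufficient for abcd because all four letters are different, while patterns with repeated letters need more careful back-off rules.

```python
PATTERN = "abcd"
STATES = ["Sa", "Sb", "Sc", "Sd", "OK"]


def run_with_restart(text):
    """Like the simple machine, but a mismatching 'a' restarts at Sb."""
    state = 0
    trace = [STATES[state]]
    for ch in text:
        if ch == PATTERN[state]:
            state += 1
        elif ch == PATTERN[0]:
            state = 1      # the 'a' just read begins a new attempt
        else:
            state = 0
        trace.append(STATES[state])
        if STATES[state] == "OK":
            break
    return trace


print(" ".join(run_with_restart("abcabcd")))
```

With this change the input abcabcd is traced as Sa Sb Sc Sd Sb Sc Sd OK, so the match at the end of the input is found.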
14
Larger search challenges
Allow strings to have don’t cares
  – Starts with a and ends with e
  – Has some number of copies of the substring ab
Finding strings similar to but not the same as your string
  – For spelling correction
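Both challenges have standard tools. The sketch below uses Python's re module for the don't-care patterns and a textbook edit-distance computation for similarity; the slides do not name either technique, so treat the choice of methods, the pattern spellings, and the function name as assumptions.

```python
import re

# "Starts with a and ends with e": anything may appear in between.
STARTS_A_ENDS_E = re.compile(r"a.*e")
# "Some number of copies of ab", read here as one or more back to back.
REPEATED_AB = re.compile(r"(ab)+")


def edit_distance(s, t):
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn s into t."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(
                prev[j] + 1,               # delete cs
                cur[j - 1] + 1,            # insert ct
                prev[j - 1] + (cs != ct),  # substitute, or keep if equal
            ))
        prev = cur
    return prev[-1]


print(bool(STARTS_A_ENDS_E.fullmatch("apple")))
print(bool(REPEATED_AB.fullmatch("ababab")))
print(edit_distance("corection", "correction"))
```

A spelling corrector could suggest the dictionary words with the smallest edit distance to what the user typed.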
15
Algorithms -- summary
Methods for solving problems
Understand at a high level
Make sure your reasoning is correct
Worry about efficiency in situations where that matters
Write as pseudocode
16
Distributed Algorithms
17
Distributed computing
Key idea
  – Buying 1000 machines of speed x is significantly cheaper than buying one machine of speed 1000x
  – No one person has to buy all 1000 machines: a lot of computational, communication and storage resources are already in place and can be harvested for bigger things
Key challenge
  – Making the machines work together for effective speedup. Communication between machines is a key challenge.
Approaches
  – Find problems that can be distributed easily
18
Distributed problems
Problems that can use decentralized computing
  – Weather prediction
      Weather in a location is most affected by weather nearby
  – Movie generation
      Individual frames can be generated separately
  – Google search engine
      10,000s of PCs, all of them cheap, many of them identical
      Can answer over 100,000,000 queries per day in ½ sec or less each
  – Looking for the origin of the universe
      Can be localized like weather prediction
  – File swapping and access (distributed storage)
  – Looking for extraterrestrial intelligence
  – Content caching and distribution
19
Distributed computers
Scales of distributed computing
  – Cluster-in-a-room: hundreds of machines
      All dedicated to the task
  – PCs on a campus: thousands of machines
      Using spare cycles
  – SETI cluster: millions of machines
      Screen saver situation
20
Cluster in a Room
Machines are dedicated to the network
All machines run similar software
Problem is divided into pieces
  – Each piece is assigned to a machine in the cluster
Problem pieces should be loosely linked
  – Computation is faster than communication
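The divide-and-assign pattern can be sketched on one computer, with a pool of worker threads standing in for the machines of the cluster. This is only an illustration of the structure, not a real cluster setup; the function names and the sum-of-squares task are assumptions chosen because the pieces need no communication with each other.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, n_pieces):
    """Divide the problem into at most n_pieces roughly equal pieces."""
    size = max(1, -(-len(items) // n_pieces))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def work_on_piece(piece):
    """The computation one 'machine' performs on its piece."""
    return sum(x * x for x in piece)


def run_on_cluster(items, n_machines=4):
    """Assign one piece per worker, then combine the partial results."""
    pieces = split(items, n_machines)
    with ThreadPoolExecutor(max_workers=n_machines) as pool:
        partials = list(pool.map(work_on_piece, pieces))
    return sum(partials)


total = run_on_cluster(list(range(1000)))
print(total)
```

Each worker receives its piece once and returns a single number, so communication is small compared with the computation, which is what "loosely linked" asks for.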
21
PCs on a Campus
Loosely coupled on a local-area-network
PCs do other things some of the time
When free cycles are available, they’re used
Many more machines, but less of each machine available
22
Workstation Network at Google
Front end 100 machines called www.google.com
Searching machines
Retrieving machines
Fit 40-80 machines in a 7’x2’x3’ rack
23
SETI
Telescope at Arecibo, PR collects data
Data is processed in real time by fast machines
But, no one looks for weak signals
  – Too costly
SETI@Home project built to do this
24
SETI@Home
Receive data from Arecibo
  – 35 Gbytes per day by snail mail
Break into Work Units
  – .25 Mbyte each, so 140,000 WU’s per day
WU takes 20 hours to process
Need about 117,000 dedicated machines to process one day of data
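The machine count follows from the figures above. This sketch assumes 1 Gbyte = 1,000 Mbytes, which is what makes the slide's 140,000 work units come out exactly; the variable names are chosen for the example.

```python
GBYTES_PER_DAY = 35
MBYTES_PER_WORK_UNIT = 0.25
HOURS_PER_WORK_UNIT = 20

work_units_per_day = GBYTES_PER_DAY * 1000 / MBYTES_PER_WORK_UNIT
machine_hours_per_day = work_units_per_day * HOURS_PER_WORK_UNIT
# One dedicated machine supplies 24 machine-hours per day.
machines_needed = machine_hours_per_day / 24

print(f"work units per day: {work_units_per_day:,.0f}")
print(f"machines needed:    {machines_needed:,.0f}")
```

Keeping up with each day's data takes 2.8 million machine-hours, or roughly 117,000 machines running around the clock.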
25
SETI@Home
Get individual users to download software
Machine idle and screen saver runs software
  – Download WU
  – Compute
  – When finished send back result
Database at Berkeley reassembles results
Progress to date -- Seti@HomeStats
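The download, compute, send-back cycle can be simulated in memory. The slides do not describe the real client or its protocol, so everything here is hypothetical: a queue stands in for the server's pending work units, a dictionary stands in for the database that reassembles results, and the "analysis" just picks the strongest sample.

```python
from collections import deque


def analyze(work_unit):
    """Placeholder computation: report the strongest sample in the unit."""
    return max(work_unit["samples"])


def volunteer_loop(pending, results):
    """What one idle machine does while its screen saver is running."""
    while pending:
        work_unit = pending.popleft()                     # download WU
        strongest = analyze(work_unit)                    # compute
        results[work_unit["id"]] = strongest              # send back result


# Hypothetical work units waiting on the server.
pending = deque([
    {"id": 0, "samples": [1, 5, 2]},
    {"id": 1, "samples": [3, 3]},
    {"id": 2, "samples": [7, 0, 4]},
])
results = {}
volunteer_loop(pending, results)
print(results)
```

Because each work unit is processed independently, many volunteers could drain the same queue without needing to talk to one another.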
26
Medical/Biological Applications
Peer-to-Peer Medicine
Cancer Research
…