INFO 344 Web Tools And Development


1 INFO 344 Web Tools And Development
CK Wang University of Washington Spring 2014

2 Announcements
PA3 due TONIGHT, 11PM PST
Please submit by 10:30pm if you do not have any late days!!!
No late days for PA4
I didn’t know Monday 5/26 is a holiday… so I’ll go through things faster…

3 Final Hints

4 Message Passing
Queue to store yet-to-visit URLs
We also need to send “start” and “stop” messages
Start crawler = “start”:
  Disallows => set up filters
  Sitemap => crawl XML and add URLs (in the XML) to queue (read assignment for conditions)
Stop crawler
But how do we reliably pass “start” and “stop” messages?
Cannot use URL queue!! Queue is FIFO. With 1m URLs in queue, “stop” might take days to happen…
Solution = use another storage! (which one? you decide!)
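
A minimal sketch of the "another storage" idea, assuming in-memory deques stand in for the real storage (the slide deliberately leaves that choice open). The names `admin_queue`, `url_queue`, and `next_action` are hypothetical. The point it illustrates: because the admin channel is checked before the URL queue, a “stop” is seen on the next poll no matter how many URLs are waiting.

```python
from collections import deque


def next_action(admin_queue, url_queue, state):
    """Decide what the crawler does on one poll.

    admin_queue: small, separate channel carrying "start"/"stop" (assumed storage)
    url_queue:   FIFO of yet-to-visit URLs
    state:       dict with a boolean "running" flag, updated in place
    """
    # Admin messages are always checked first, so they never wait behind URLs.
    if admin_queue:
        command = admin_queue.popleft()
        if command == "start":
            state["running"] = True
        elif command == "stop":
            state["running"] = False
        return ("admin", command)

    # Only consume URLs while the crawler is running.
    if state["running"] and url_queue:
        return ("crawl", url_queue.popleft())

    return ("idle", None)
```

With a million URLs queued, appending "stop" to `admin_queue` still takes effect on the very next call, which is exactly what a single FIFO URL queue cannot guarantee.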

5 Worker Role
Similar to quiz code
Run()
1. Sleep 500ms
2. Wake up
3. GetMessage() from Admin Storage
4. Handle Admin message
5. GetMessage() from URL Queue
6. Get page title => store {url,pagetitle} to table storage
7. Get URLs in page
8. Remove disallowed ones
9. Remove already visited ones
10. Store URLs into Queue
11. Loop back to #1
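
The worker loop above can be sketched roughly as follows. This is an illustration under stated assumptions, not the assignment's solution: deques and a dict stand in for the admin storage, URL queue, and table storage; `fetch_page` is a hypothetical callable returning `(title, links)`; disallow rules are simplified to URL prefixes; and `max_iterations` exists only so the sketch can terminate.

```python
import time
from collections import deque


def run_worker(admin_queue, url_queue, table, fetch_page, disallowed_prefixes,
               sleep_seconds=0.5, max_iterations=None):
    """One worker's Run() loop, with in-memory stand-ins for the storage."""
    visited = set()
    running = False
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        iterations += 1
        time.sleep(sleep_seconds)              # sleep, then wake up

        if admin_queue:                        # admin message comes first
            command = admin_queue.popleft()
            if command == "start":
                running = True
            elif command == "stop":
                running = False

        if not running or not url_queue:
            continue                           # nothing to crawl this pass

        url = url_queue.popleft()              # next yet-to-visit URL
        if url in visited:
            continue
        visited.add(url)

        title, links = fetch_page(url)         # hypothetical fetch helper
        table[url] = title                     # store {url, pagetitle}

        for link in links:
            if any(link.startswith(p) for p in disallowed_prefixes):
                continue                       # remove disallowed URLs
            if link in visited:
                continue                       # remove already visited URLs
            url_queue.append(link)             # store the rest in the queue
    return visited
```

A quick way to try it is to pass `fetch_page=lambda url: pages[url]` over a small dict of fake pages with `sleep_seconds=0` and a small `max_iterations`.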

6 Questions?

