1
INFO 344 Web Tools And Development
CK Wang University of Washington Spring 2014
2
Announcements
PA3 due TONIGHT, 11PM PST
Please submit by 10:30pm if you do not have any late days!!!
No late days for PA4
I didn’t know Monday 5/26 is a holiday… so I’ll go through things faster…
3
Final Hints
4
Message Passing
Queue to store yet-to-visit URLs
We also need to send “start” and “stop” messages
Start crawler = “start”:
    Disallows => set up filters
    Sitemap => crawl XML and add URLs (in the XML) to queue (read assignment for conditions)
Stop crawler
But how do we reliably pass “start” and “stop” messages?
Cannot use URL queue!!
    Queue is FIFO. With 1m URLs in queue, “stop” might take days to happen…
Solution = use another storage! (which one? you decide!)
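To see why a separate admin channel helps, here is a minimal in-memory sketch in Python (not the assignment's storage). The names `crawl`, `admin`, and `visit` are made up for illustration, and two `deque` objects stand in for whatever storage you pick. The worker checks the admin channel before every URL, so “stop” takes effect right away instead of waiting behind 1m queued URLs.

```python
from collections import deque


def crawl(url_queue, admin_queue, on_visit):
    """Drain url_queue, but check admin_queue before every URL."""
    running = False
    visited = 0
    while True:
        # Admin messages are handled first, no matter how long url_queue is.
        while admin_queue:
            command = admin_queue.popleft()
            running = (command == "start")
        if not running or not url_queue:
            return visited
        on_visit(url_queue.popleft())
        visited += 1


# Hypothetical data: 1m queued URLs and a single "start" message.
urls = deque(f"http://example.com/{i}" for i in range(1_000_000))
admin = deque(["start"])


def visit(url):
    # Pretend an operator sends "stop" while the third URL is being crawled.
    if url.endswith("/2"):
        admin.append("stop")


visited = crawl(urls, admin, visit)
print(visited, len(urls))  # prints "3 999997"
```

If “stop” had been appended to `urls` instead, the worker would have had to crawl every remaining URL before seeing it.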
5
Worker Role
Similar to quiz code
Run()
    1. Sleep 500ms
    2. Wake up
    3. GetMessage() from Admin Storage
    4. Handle Admin message
    5. GetMessage() from URL Queue
    6. Get page title => store {url, pagetitle} to table storage
    7. Get URLs in page
        Remove disallowed ones
        Remove already visited ones
    8. Store URLs into Queue
    9. Loop back to #1
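The worker loop above can be sketched roughly as follows. This is an illustration under assumptions, not the quiz code: plain Python containers stand in for the admin storage, URL queue, and table storage, and `PageParser`, `worker_step`, `run`, `state`, and `PAGES` are hypothetical names. Pages come from a fake `fetch` function so the sketch runs without network access. The `seen` set also tracks URLs that are queued but not yet crawled, which is one possible reading of "already visited".

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects the <title> text and every <a href> value on a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def worker_step(state, fetch):
    """One pass over the admin and URL steps; False means nothing was crawled."""
    while state["admin"]:  # admin messages first
        state["running"] = state["admin"].popleft() == "start"
    if not state["running"] or not state["urls"]:
        return False
    url = state["urls"].popleft()
    parser = PageParser()
    parser.feed(fetch(url))
    state["table"][url] = parser.title.strip()  # store {url, pagetitle}
    for href in parser.links:
        link = urljoin(url, href)
        if any(link.startswith(prefix) for prefix in state["disallowed"]):
            continue  # remove disallowed ones
        if link in state["seen"]:
            continue  # remove already visited (or already queued) ones
        state["seen"].add(link)
        state["urls"].append(link)
    return True


def run(state, fetch, max_loops, sleep_seconds=0.5):
    """Sleep, wake up, do one step, loop back (bounded so the sketch ends)."""
    for _ in range(max_loops):
        time.sleep(sleep_seconds)
        worker_step(state, fetch)


# Made-up pages standing in for real HTTP responses.
PAGES = {
    "http://example.com/": '<title>Home</title><a href="/a">A</a><a href="/private/x">X</a>',
    "http://example.com/a": '<title>Page A</title><a href="/">Home</a>',
}
state = {
    "admin": deque(["start"]),
    "urls": deque(["http://example.com/"]),
    "seen": {"http://example.com/"},
    "disallowed": ["http://example.com/private/"],
    "table": {},
    "running": False,
}
run(state, lambda url: PAGES.get(url, ""), max_loops=3, sleep_seconds=0)
print(state["table"])
```

In this run the `/private/x` link is filtered out by the disallow prefix, and the link back to the home page is skipped because it was already seen.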
6
Questions?