Published by Elmer Pearson. Modified over 9 years ago.
1
SCrawler
Group: Priyanshu Gupta
WHAT WILL I DO?
I will develop a multi-threaded parallel crawler. I will run it in both cross-over and exchange modes and study the results. I will also keep track of the backlink (incoming link) count of each web page I crawl.
2
Cross-over mode: Each C-proc (parallel crawler process) downloads pages within its own partition, but when it runs out of pages in its partition, it also follows inter-partition links. In this mode, the pages downloaded by different C-procs may overlap.
Exchange mode: C-procs periodically and incrementally exchange inter-partition URLs instead of following them. I will use batch-mode communication for transferring links between the two C-procs.
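The difference between the two modes can be sketched in a few lines of Java. This is only an illustration under assumptions that are not in the slides: partitions are taken to be one host each, and the class `CProcSketch`, its fields, and the `example.edu` host names are hypothetical. It does no networking and keeps no visited set; it only shows where a newly discovered URL goes in each mode.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch of one C-proc's URL routing; not the project's actual code.
class CProcSketch {
    enum Mode { CROSS_OVER, EXCHANGE }

    final String ownHost;   // assumption: a partition is a single host
    final Mode mode;
    final int batchSize;    // exchange mode: URLs per batch

    final Queue<String> ownQueue = new ArrayDeque<>();      // URLs inside the partition
    final Queue<String> foreignQueue = new ArrayDeque<>();  // cross-over fallback URLs
    final List<String> outbox = new ArrayList<>();          // exchange mode: pending batch
    final List<List<String>> sentBatches = new ArrayList<>(); // stands in for the other C-proc

    CProcSketch(String ownHost, Mode mode, int batchSize) {
        this.ownHost = ownHost;
        this.mode = mode;
        this.batchSize = batchSize;
    }

    static String hostOf(String url) {
        return URI.create(url).getHost();
    }

    // Route a URL found on a downloaded page.
    void discover(String url) {
        if (ownHost.equalsIgnoreCase(hostOf(url))) {
            ownQueue.add(url);
        } else if (mode == Mode.CROSS_OVER) {
            foreignQueue.add(url);      // followed only when ownQueue is empty
        } else {
            outbox.add(url);            // never followed locally
            if (outbox.size() >= batchSize) {
                flush();
            }
        }
    }

    // Hand the pending batch over to the other C-proc.
    void flush() {
        if (!outbox.isEmpty()) {
            sentBatches.add(new ArrayList<>(outbox));
            outbox.clear();
        }
    }

    // Next URL to download, or null if there is nothing to do.
    String next() {
        String url = ownQueue.poll();
        if (url == null && mode == Mode.CROSS_OVER) {
            url = foreignQueue.poll();
        }
        return url;
    }
}
```

In cross-over mode a foreign URL is eventually downloaded locally, which is where overlap comes from. In exchange mode it is only shipped in a batch, which is where communication overhead comes from.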
3
I will use the Java language for this project. I will crawl the USC CS and EE department websites using 2 parallel crawlers, and I will test the following:
OBJECTIVE 1: Cross-over mode results in overlap, so I will observe the overlap, which is defined as:
(total pages downloaded - total unique pages) / total pages
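As a rough illustration, the overlap formula could be computed from the combined download log of all C-procs. The class and method names are hypothetical, and the sketch assumes that "total pages" in the denominator means total pages downloaded, duplicates included.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; assumes the denominator is total pages downloaded.
class OverlapSketch {
    // downloads: every URL fetched by any C-proc, duplicates included
    static double overlap(List<String> downloads) {
        if (downloads.isEmpty()) {
            return 0.0; // nothing downloaded, so nothing overlaps
        }
        Set<String> unique = new HashSet<>(downloads);
        int total = downloads.size();
        return (total - unique.size()) / (double) total;
    }
}
```

For example, if the two crawlers together download 4 pages and 3 of them are distinct, the overlap is (4 - 3) / 4 = 0.25.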
4
OBJECTIVE 2: Next, I will run the crawler in exchange mode, which results in communication overhead. I will measure the communication overhead when the crawler is run in this mode. Communication overhead is computed as follows:
(total links transferred) / (total pages downloaded)
OBJECTIVE 3: Finally, I will find the number of inbound links to each page and show them in tabular form.
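Both measurements can be sketched with a small statistics class. All names here are hypothetical. The sketch assumes every link occurrence is counted, so a page that links to the same target twice contributes two inbound links unless the caller deduplicates first. A `ConcurrentHashMap` is used because several crawler threads would record links at the same time.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the Objective 2 and Objective 3 bookkeeping.
class CrawlStatsSketch {
    // Objective 2: links transferred between C-procs per downloaded page.
    static double communicationOverhead(long linksTransferred, long pagesDownloaded) {
        if (pagesDownloaded == 0) {
            return 0.0; // avoid division by zero before any page is fetched
        }
        return (double) linksTransferred / pagesDownloaded;
    }

    // Objective 3: inbound-link counter, safe to update from several threads.
    private final Map<String, Integer> inbound = new ConcurrentHashMap<>();

    // Call once for each link found that points at targetUrl.
    void recordInboundLink(String targetUrl) {
        inbound.merge(targetUrl, 1, Integer::sum);
    }

    int inboundCount(String url) {
        return inbound.getOrDefault(url, 0);
    }

    // Plain-text table, one row per URL, sorted by URL.
    String toTable() {
        StringBuilder sb = new StringBuilder();
        sb.append(String.format("%-50s %s%n", "URL", "No. of inbound links"));
        for (Map.Entry<String, Integer> e : new TreeMap<>(inbound).entrySet()) {
            sb.append(String.format("%-50s %d%n", e.getKey(), e.getValue()));
        }
        return sb.toString();
    }
}
```

For instance, transferring 30 links while downloading 120 pages gives an overhead of 30 / 120 = 0.25 links per page.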
5
OBSERVATION TABLES
OBJECTIVE 1: Number of C-procs | Overlap
OBJECTIVE 2: Number of C-procs | Communication overhead
OBJECTIVE 3: URL | No. of inbound links