Published by Wilfred Woods. Modified over 9 years ago.
1
Qing-Cai Chen; Xiao-Hong Yang; Xiao-Long Wang
Machine Learning and Cybernetics (ICMLC), 2011 International Conference on
Year: 2011, Page(s): 1878–1883
Speaker: Chang, Kun-Hsiang
2
Outline
Abstract
P2P based passive web crawling system
Crawler server registering
Content updated notification
Download updated content by P2P network
Website discovering
3
Abstract
This paper proposes an innovative client/server based web crawling system.
Main benefits:
Timely handling of web changes by the crawler.
Savings in website bandwidth.
The ability to download large files and multimedia content.
The ability to protect intellectual property while indexing and searching the content.
4
The basic principle of a crawler
5
P2P based passive web crawling system
6
Responsibilities assignment for crawler server and crawler client
7
Crawler server registering
robots.xml
Port
IP address
8
Content updated notification
A newly registered server has to wait several days or weeks before it is notified to download all of the historical content on the website.
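The registration and notification slides can be illustrated with a minimal sketch. It is not the paper's implementation: the class names `CrawlerServer` and `Website`, the `outbox` list standing in for network sends, and the example address are all hypothetical. It only shows the passive idea, in which the website records each crawler server's IP address and port and pushes a notice when content changes, so a server hears nothing about updates made before it registered.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CrawlerServer:
    """A registered crawler server, identified by IP address and port."""
    ip: str
    port: int


@dataclass
class Website:
    """Hypothetical website-side registry (an assumption, not the paper's design)."""
    servers: set = field(default_factory=set)
    outbox: list = field(default_factory=list)  # stands in for real network sends

    def register(self, server: CrawlerServer) -> None:
        # Registration only records where future notifications should go.
        self.servers.add(server)

    def publish(self, url: str) -> None:
        # On a content update, queue a notice for every server registered so far.
        for server in sorted(self.servers, key=lambda s: (s.ip, s.port)):
            self.outbox.append((server, url))


site = Website()
site.publish("/news/1")  # no server registered yet, so nothing is queued
site.register(CrawlerServer("192.0.2.10", 8080))
site.publish("/news/2")  # the registered server is notified
notified_urls = [url for _, url in site.outbox]
```

In this sketch the update to `/news/1` never reaches the server, which is the gap the slide describes: older content has to be delivered by a separate, later notification.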
9
Download updated content by P2P network
10
Website discovering
11
END