Published by Malcolm Whitehead. Modified over 9 years ago.
1
Predicting Content Change On The Web BY : HITESH SONPURE GUIDED BY : PROF. M. WANJARI
2
OUTLINE: Introduction; Related Work; Main Focus; Problem Formulation and Targets; Foundational Methodologies and Algorithms; Experimental Setup and Results; Application; Conclusions; Further Plans
3
INTRODUCTION The ability to predict key types of changes can be used in a variety of settings. In particular, the content of a page enables better prediction of its change. Pages that are related to the page being predicted may also change in similar ways.
4
Related Work Incremental web crawling setting: recrawling a web page is linked to the probability of its change. User-centric utility: a utility weights each page. Several works use the past change frequency and change recency of a page.
5
Related Work Prediction based on content-based features. Type of correlation structure at the website level, using a sample of web pages from a website. Extension of the above idea by clustering pages based on static and dynamic content features.
6
Focus To predict dynamic content change on the web, so that one can improve a variety of retrieval and web-related components. 1. The task of predicting significant changes rather than any change to a web page. 2. Develop a wide array of dynamic content-based features that may be useful for the more general temporal mining case beyond crawling.
7
Focus 3. Explore a wide variety of methods to identify related pages, including content, web graph distance, and temporal content similarity. 4. Derive a novel expert prediction framework that effectively leverages information from related pages without the need for sampling from the current time slice.
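As one illustration of identifying related pages by content, the sketch below scores two pages by the cosine similarity of their bag-of-words vectors. The slides do not say which similarity measure was used, so the measure, the whitespace tokenisation, and the function name `content_similarity` are all assumptions made for this example.

```python
import math
from collections import Counter


def content_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two page texts.

    Hypothetical helper: the measure and the whitespace tokenisation are
    assumptions for illustration, not taken from the slides.
    """
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    # Dot product over the words that both pages share.
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty page is treated as unrelated to everything
    return dot / (norm_a * norm_b)
```

Pages whose score against the target page exceeds some threshold could then be treated as candidate related pages; the threshold would have to be chosen empirically.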
8
PROBLEM FORMULATION AND TARGETS where o ∈ O at time Types of Web Page Change 1. Whether the page o ∈ O changes significantly. 2. Whether the change in page o ∈ O corresponds to a change from non-relevant previous content to relevant current content. 3. Whether there is a new out-link from a page o ∈ O.
9
Information Settings 1. 1D setting 2. 2D setting 3. 3D setting …..Continued
10
Information Observability 1.Partially Observed 2. Fully Observed …..Continued
11
LEARNING ALGORITHMS BASELINE ALGORITHM: Prediction is based on the probability of the page changing significantly, i.e. $p(h(o_i, t_j) = 1 \mid h(o_i, t_k) \in E)$, where $t_k < t_j$ and $(t_j - t_k) \le l$. SINGLE EXPERT ALGORITHM: Represents each page with a set of features. MULTIPLE EXPERT ALGORITHM: Considers both the page's features and the features of other pages.
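The baseline above can be read as a frequency estimate over the last l time slices. The sketch below is one possible reading, not the authors' implementation: it assumes `history[k]` is 1 when the page changed significantly at slice t_k and 0 otherwise, and it adds Laplace smoothing, which the slides do not mention. The function name `baseline_change_probability` is hypothetical.

```python
from typing import Sequence


def baseline_change_probability(history: Sequence[int], window: int) -> float:
    """Estimate the probability that a page changes at the next time slice.

    history: past observations, 1 = significant change, 0 = no change
             (this encoding is an assumption for the example).
    window:  how many of the most recent slices to use (the l in the slide).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    recent = list(history)[-window:]  # keep only slices with t_j - t_k <= l
    # Laplace smoothing (an assumption) keeps the estimate away from 0 and 1
    # when only a few observations are available.
    return (sum(recent) + 1) / (len(recent) + 2)


if __name__ == "__main__":
    # Last three slices are [1, 0, 1], so the estimate is (2 + 1) / (3 + 2).
    print(baseline_change_probability([0, 1, 1, 0, 1], window=3))  # 0.6
```

The single-expert and multiple-expert variants would replace this frequency count with a learned model over page features; they are not sketched here because the slides do not list the features.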
12
EXPERIMENTAL SETUP RESULTS
13
APPLICATION: Application to Crawling. Maximising Freshness.
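One simple way to connect change prediction to freshness is a greedy recrawl policy: with a fixed crawl budget, refetch the pages most likely to have changed. The slides do not describe the crawl policy that was evaluated, so the greedy rule, the function name `select_pages_to_recrawl`, and the example URLs below are assumptions for illustration.

```python
import heapq
from typing import Dict, List


def select_pages_to_recrawl(change_prob: Dict[str, float], budget: int) -> List[str]:
    """Pick up to `budget` pages with the highest predicted change probability.

    change_prob maps a page identifier to its predicted probability of change.
    The greedy top-k rule is an assumed policy, not taken from the slides.
    """
    if budget <= 0:
        return []
    # nlargest iterates over the dict keys and ranks them by probability.
    return heapq.nlargest(budget, change_prob, key=change_prob.get)


if __name__ == "__main__":
    predicted = {
        "example.org/news": 0.9,
        "example.org/about": 0.1,
        "example.org/blog": 0.5,
    }
    print(select_pages_to_recrawl(predicted, budget=2))
    # ['example.org/news', 'example.org/blog']
```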
14
CONCLUSIONS Tackled the problem of predicting significant content change. The work sheds light on how and why content changes on the web and how it can be predicted. Adding the page's content improves prediction compared with simple frequency-based prediction. Additionally, adding information from the content of related pages improves over using the page's content alone.
15
FURTHER PLANS To apply the prediction analysis in a real-time scenario.
16
REFERENCES E. Adar, J. Teevan, S. Dumais, and J. Elsas. The web changes everything: Understanding the dynamics of web content. In Proc. of WSDM, 2009. J. Cho and H. Garcia-Molina. The evolution of the web and implications for an incremental crawler. In Proc. of VLDB, 2000. J. Cho and H. Garcia-Molina. Estimating frequency of change. TOIT, 3(3):256-290, 2003.
17
REFERENCES D. Fetterly, M. Manasse, M. Najork, and J. L. Wiener. A large-scale study of the evolution of web pages. In Proc. of WWW, 2003. Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer. An efficient boosting algorithm for combining preferences. JMLR, 4:933-969, 2003.
18
L. Getoor and L. Mihalkova. Exploiting statistical and relational information on the web and in social media. In Proc. of WSDM, 2011.
19
THANK YOU !