Page Ranking Algorithms for Digital Libraries
Submitted by: Shikha Singla, MIT-872-2K11, M.Tech (3rd Sem), Information Technology
Need for Ranking Algorithms
Today, the main challenge for a search engine is to present relevant results to the user. To present documents in an ordered manner, page ranking methods are applied, which arrange the documents in order of their relevance and importance.
RANKING ALGORITHMS
Similarity of Documents with User Profile
The similarity between a result document d and a document d' in the user profile is computed using three methods: a content-based method and two citation-based methods. The similarity of d to the whole profile is the sum over the profile's documents:
similarity(d, p) = Σ_{d' ∈ p} similarity(d, d')
where
d = a result document
p = the user profile (the set of documents d')
d' = a document in the user profile
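The summation above can be sketched in Python. The slide does not say which content-based measure is used, so cosine similarity over term-frequency vectors is assumed here; the function names `cosine_similarity` and `profile_similarity` are illustrative only.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(text_a, text_b):
    """Assumed content-based measure: cosine of term-frequency vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def profile_similarity(doc, profile, sim=cosine_similarity):
    """similarity(d, p): sum of similarity(d, d') over every d' in the profile p."""
    return sum(sim(doc, d_prime) for d_prime in profile)


# Hypothetical profile of two documents
profile = ["page ranking for digital libraries", "citation graph analysis"]
score = profile_similarity("ranking papers in digital libraries", profile)
```

Any pairwise measure, including the two citation-based ones, could be passed in through the `sim` parameter.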
Two citation-based methods: bibliographic coupling and co-citation.
Bibliographic coupling: the similarity between two documents is computed from the number of references they share. The more references two documents have in common, the more similar they are.
Co-citation: the relatedness between two papers is based on their co-citation frequency, that is, the number of times the two papers are cited together by other papers. To get this information, the citation graph has to be extracted from the actual library.
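Both measures can be read directly off a citation graph. The sketch below assumes the graph is stored as a dictionary mapping each paper to the set of papers it references; the papers P1 to P5 and their links are hypothetical.

```python
def bibliographic_coupling(refs, a, b):
    """Number of references that papers a and b have in common."""
    return len(refs.get(a, set()) & refs.get(b, set()))


def co_citation(refs, a, b):
    """Number of papers whose reference lists contain both a and b."""
    return sum(1 for cited in refs.values() if a in cited and b in cited)


# Hypothetical citation graph: paper -> set of papers it references
refs = {
    "P1": {"P3", "P4"},
    "P2": {"P3", "P4", "P5"},
    "P3": {"P5"},
    "P4": set(),
    "P5": set(),
}

coupling = bibliographic_coupling(refs, "P1", "P2")  # P1 and P2 share P3 and P4
cocited = co_citation(refs, "P3", "P4")              # P3 and P4 are cited together by P1 and P2
```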
Citation Count Algorithm
A paper is considered more important the more citations it receives.
CC_i = |I_i|
where
CC_i = citation count of publication i
I_i = set of citations to paper i (so |I_i| is their number)
Thus, a paper obtains a high rank if its number of backlinks is high.
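A minimal sketch of the citation count, assuming the same dictionary representation of the citation graph (paper mapped to the set of papers it references). The four-paper graph is hypothetical and is not the graph from the slides.

```python
from collections import Counter


def citation_counts(refs):
    """CC_i = |I_i|: count how many papers reference each paper i."""
    counts = Counter({paper: 0 for paper in refs})  # uncited papers keep a count of 0
    for cited in refs.values():
        counts.update(cited)
    return dict(counts)


# Hypothetical citation graph: paper -> set of papers it references
refs = {"A": {"C", "D"}, "B": {"C"}, "C": {"D"}, "D": set()}
cc = citation_counts(refs)
ranking = sorted(cc, key=cc.get, reverse=True)  # most-cited papers first
```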
EXAMPLE
Time Dependent Citation Count Algorithms
The freshness of citations and the link structure are the factors used to compute the importance of a paper.
Weight = exp[-w(Tp - T)]
where
Tp = present time
T = publication year of the paper
w = time decay factor; if Tp - T is less than w then w = 0, otherwise w = 1
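The weight formula can be sketched as follows. Two points are assumptions rather than statements from the slide: each citation is weighted by the age of the citing paper and the weights are summed per cited paper, and w is treated as a plain numeric parameter (0.1 here). The years for A, B, and C come from the example table; the links between them are hypothetical.

```python
from math import exp


def citation_weight(present_year, publication_year, w):
    """Weight = exp[-w(Tp - T)]."""
    return exp(-w * (present_year - publication_year))


def time_dependent_citation_count(refs, years, present_year, w=0.1):
    """Assumed scoring: sum the weights of all citations a paper receives."""
    scores = {paper: 0.0 for paper in refs}
    for citing, cited_papers in refs.items():
        weight = citation_weight(present_year, years[citing], w)
        for cited in cited_papers:
            scores[cited] = scores.get(cited, 0.0) + weight
    return scores


# Hypothetical links; publication years taken from the example table
refs = {"A": {"C"}, "B": {"C"}, "C": set()}
years = {"A": 2011, "B": 2008, "C": 1998}
tdcc = time_dependent_citation_count(refs, years, present_year=2012, w=0.1)
```

Under these assumptions, a citation from a recent paper such as A contributes more than one from an older paper such as B.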
Example
Paper   Publication year
A       2011
B       2008
C       1998
D       1980
E       2007
F       2000
Page Ranking Algorithm
If a link comes from an important paper, then that link is given a higher weight than links coming from non-important papers.
PR(u) = (1 - d) + d * Σ_{v ∈ B(u)} PR(v) / N_v
where
PR = page rank
d = damping factor (described on the slide as a normalization factor)
N_v = total number of outlinks of paper v
u = the paper being ranked
B(u) = set of papers v that point to u
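The formula is usually evaluated iteratively until the ranks settle. The sketch below makes several assumptions not stated on the slide: d = 0.85 (a commonly used value), every rank starts at 1.0, and a fixed number of iterations is run. The three-paper graph is hypothetical.

```python
def page_rank(outlinks, d=0.85, iterations=50):
    """Iterate PR(u) = (1 - d) + d * sum(PR(v) / N_v) over papers v that point to u."""
    papers = set(outlinks) | {t for targets in outlinks.values() for t in targets}
    pr = {p: 1.0 for p in papers}  # assumed starting value
    for _ in range(iterations):
        new_pr = {}
        for u in papers:
            # Each paper v pointing to u passes on its rank divided by its outlink count
            incoming = sum(
                pr[v] / len(outlinks[v]) for v in outlinks if u in outlinks[v]
            )
            new_pr[u] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Hypothetical citation graph: paper -> set of papers it points to
outlinks = {"A": {"C"}, "B": {"C"}, "C": set()}
ranks = page_rank(outlinks)
```

In this small graph, A and B receive no links and settle at 1 - d, while C collects rank from both.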
Results
Citation count algorithm: CC(C) > CC(D) and CC(F) > CC(E) > CC(A) and CC(B)
Time Dependent Citation Count Algorithms: TDCC(C) > TDCC(F) > TDCC(D) > TDCC(E) > TDCC(A) and TDCC(B)
Page Ranking Algorithm: PR(D) > PR(C) > PR(F) > PR(E) > PR(A) and PR(B)
Conclusion
It is becoming difficult to manage scientific information on the Web and to satisfy users' needs. Ranking algorithms therefore play an important role in ordering the papers in digital libraries, so that the user can retrieve the information most relevant to the query. Depending on the technique used, the ranking algorithms present the resulting papers in a different order.
References
Thanh-Trung Van and Michel Beigbeder. A Comparison of Re-ranking Methods in Digital Libraries using User Profiles.
Sumita Gupta, Neelam Duhan, and Poonam Bansal. A Comparative Study of Page Ranking Algorithms for Online Digital Libraries.
THANK YOU
QUERIES?