Published by Johan Hadiman. Modified over 6 years ago.
1
An Efficient method to recommend research papers and highly influential authors.
VIRAJITHA KARNATAPU
2
Research Idea Overview:
Research output keeps growing, and technologies such as digital libraries and online journals give access to a great many of these research papers. But how easy do they make it for a new researcher to find relevant papers on his or her topics of interest? This is one of the most interesting and challenging problems of recent times.
3
Generalized Solution:
How do you identify a high-quality paper on a particular topic of interest? One well-known answer is citation networks, where the value of a paper is determined by the number of citations it receives. If a paper is cited by many other research papers, it is likely to be considered highly important.
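As a minimal sketch of this idea, the snippet below ranks papers purely by how many times they are cited. The input format (a list of `(citing, cited)` pairs), the function names and the example edges are assumptions made for illustration; they are not taken from the presentation.

```python
from collections import Counter

def citation_counts(citations):
    """Count incoming citations per paper.

    `citations` is an iterable of (citing, cited) pairs (assumed format).
    """
    counts = Counter()
    for citing, cited in citations:
        counts[citing] += 0  # make sure papers that are never cited still appear
        counts[cited] += 1
    return counts

def rank_by_citations(citations):
    """Return paper ids, most cited first (ties broken by paper id)."""
    counts = citation_counts(citations)
    return sorted(counts, key=lambda paper: (-counts[paper], paper))

# Hypothetical toy network: B, C and D cite A; C cites B; D cites C.
example_edges = [("B", "A"), ("C", "A"), ("D", "A"), ("C", "B"), ("D", "C")]
ranking = rank_by_citations(example_edges)
```

Here `ranking` starts with paper `A`, the only paper with three incoming citations.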
4
VIEW OF CITATION NETWORKS
5
Then what’s new? A lot of research has been done, and much is still going on, in this field: the analysis of citation networks. This is because the citations a research paper receives depend on various factors, such as when the paper was published and the type of journal or conference it was published in.
6
Traditional Methods: Traditional metrics such as the h-index, the g-index and the impact factor are used to measure impact from citation counts. All of them are based on the quantity of citations. Each of these metrics is different, and each has its own advantages and disadvantages.
7
Traditional Methods (Contd.,)
For example, the h-index does not account for self-citations, which also influence a paper’s rank to some extent. Moreover, none of these metrics takes into account the quality of the papers being cited, the time when the paper was published, or the type of conference/journal the paper was published in. These metrics only take into account the number of citations.
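To make the "quantity only" point concrete, here is a small sketch of the standard h-index and g-index definitions. Both take nothing but a list of per-paper citation counts; the function names and the sample counts are illustrative assumptions.

```python
def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

def g_index(citation_counts):
    """Largest g such that the top g papers together have at least g*g citations.

    This sketch caps g at the number of papers in the list.
    """
    counts = sorted(citation_counts, reverse=True)
    total = 0
    g = 0
    for rank, cites in enumerate(counts, start=1):
        total += cites
        if total >= rank * rank:
            g = rank
    return g

# Hypothetical citation counts for one author's papers.
sample = [25, 1, 1, 0]
```

For `sample`, `h_index` gives 1 while `g_index` gives 4, because the g-index lets one heavily cited paper compensate for the others. Neither function sees who cited the papers, when they appeared, or where.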
8
Solution? Because of these disadvantages, the metrics above cannot be relied on in every case, and a more general algorithm is needed that gives a more accurate result. Hence, I use a modified version of the page-rank algorithm to rank the research papers, taking into consideration the time when the paper was published and the type of conference it was published in.
9
Why consider time? Time is one of the most important factors missing from previous research. Recently published papers have had less time to be cited, so they have fewer citations, which lowers their rank. So I include time as a factor in my algorithm, with the aim of a more accurate ranking of the research papers.
10
Why consider conference/journal?
The higher the requirements of the journal/conference, the higher the quality of the papers it publishes. So, based on the ranks of the research papers, we rank the conferences/journals. Taking both the ranks of the papers and of the conferences, we calculate an authoritative score for each author, based on which we rank the authors.
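The slides do not give the formulas for the venue rank or the authoritative score, so the following is only one possible sketch. It assumes a venue's score is the mean rank of its papers and an author's score is the sum of paper rank times venue score over that author's papers. All names, formulas and numbers here are hypothetical.

```python
from collections import defaultdict

def venue_scores(paper_rank, paper_venue):
    """Assumed rule: a venue's score is the mean rank of the papers it published."""
    ranks_by_venue = defaultdict(list)
    for paper, rank in paper_rank.items():
        ranks_by_venue[paper_venue[paper]].append(rank)
    return {venue: sum(ranks) / len(ranks) for venue, ranks in ranks_by_venue.items()}

def author_scores(paper_rank, paper_venue, paper_authors):
    """Assumed rule: sum of (paper rank * venue score) over an author's papers."""
    venues = venue_scores(paper_rank, paper_venue)
    scores = defaultdict(float)
    for paper, rank in paper_rank.items():
        for author in paper_authors[paper]:
            scores[author] += rank * venues[paper_venue[paper]]
    return dict(scores)

# Hypothetical inputs: paper ranks, venues and author lists.
paper_rank = {"p1": 0.4, "p2": 0.2, "p3": 0.1}
paper_venue = {"p1": "V1", "p2": "V1", "p3": "V2"}
paper_authors = {"p1": ["Ann"], "p2": ["Ann", "Bob"], "p3": ["Bob"]}
scores = author_scores(paper_rank, paper_venue, paper_authors)
```

With these inputs Ann scores higher than Bob, since both of her papers appeared in the better-ranked venue.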
11
Solution (Contd.,) Alongside finding the ranks of authors using research papers, we can construct a co-authorship network for the same dataset and, using the same algorithm, find the ranks of the authors. But this requires a lot of effort, as almost every research paper written today is multi-authored.
12
VIEW OF CO-AUTHORSHIP NETWORKS
13
Solution (contd.,) Co-authorship networks are similar to citation networks in all aspects, except that the edges in a co-authorship network represent scientific collaboration between two authors. The co-authorship network also helps us to find the collaborations in a research community.
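A co-authorship network of this kind can be sketched from per-paper author lists. In the snippet below, every pair of authors on the same paper gets an edge, weighted by the number of papers they share. The input format, function name and author names are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

def coauthorship_edges(papers):
    """Build weighted, undirected co-authorship edges.

    `papers` is an iterable of author lists (assumed format). The result maps
    an alphabetically ordered author pair to the number of joint papers.
    """
    edges = Counter()
    for authors in papers:
        # set() removes duplicate names; sorted() gives each pair one canonical order
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges

# Hypothetical author lists for three papers.
example_papers = [["Ann", "Bob", "Cy"], ["Ann", "Bob"], ["Dee"]]
edges = coauthorship_edges(example_papers)
```

A three-author paper already contributes three edges, which hints at why multi-authored papers make this network expensive to build and analyse.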
14
Solution - overview Overall, ranking authors individually based on their research papers and the journals they are published in, and finding the highly influential authors from the co-authorship network, helps us identify the most qualified authors in a particular research field.
15
Missing pieces of study in previous research:
All the metrics use citation count as a parameter to determine how important a paper is, but this gives only an approximation of how important the paper is rather than the actual picture. So it is not only the quantity of citations that matters, but also the quality of citations.
16
Contd., So, the rank of a research paper can be measured by taking into account the importance of the citing papers rather than just the citation count. This provides the basis for using the page-rank algorithm to determine the ranks of research papers.
17
Page-rank algorithm This is one of the most widely known link-analysis algorithms in web search. Proposed by Larry Page and Sergey Brin, it is used in the world’s most powerful search engine, Google. It not only counts the number of inlinks, but also considers the number of outlinks to determine the quality and importance of a node.
18
General version of Page-rank algorithm
In general, the page-rank algorithm is represented as:
PR(A) = (1 - d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
where PR(A) is the PageRank of page A, PR(Ti) is the PageRank of pages Ti which link to page A, C(Ti) is the number of outbound links on page Ti and d is a damping factor which can be set between 0 and 1.
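The formula can be evaluated by simple repeated substitution. The sketch below is one way to do that, not the presenter's implementation. The function name, the adjacency-dict input, the default d = 0.85 and the fixed iteration count are all assumptions.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over pages T linking to A.

    `links` maps each page to the pages it links to (assumed format).
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    # Out-links as sets, so repeated links from the same page count once.
    graph = {n: set(links.get(n, ())) for n in nodes}
    ranks = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        new_ranks = {}
        for page in nodes:
            incoming = sum(
                ranks[src] / len(graph[src]) for src in nodes if page in graph[src]
            )
            new_ranks[page] = (1 - d) + d * incoming
        ranks = new_ranks
    return ranks

# Hypothetical toy graph: B and C both link to (cite) A.
example_links = {"B": ["A"], "C": ["A"], "A": []}
ranks = pagerank(example_links)
```

In this toy graph B and C have no inlinks, so each settles at 1 - d, and A receives the full rank of both. Pages with no outlinks simply pass nothing on in this form of the formula.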
19
Contd., The page-rank algorithm does not rank a website as a whole; instead, a rank is determined for each page individually. Further, the rank of a page A is determined recursively by the ranks of the pages that link to A.
20
Contd., We modify this page-rank algorithm by taking into account the importance of the citing journal in which the paper is published. We then further modify this version of the page-rank algorithm by taking the publication dates of the papers into account, and thus rank the papers and then the authors accordingly.
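The slides do not state the modified formula, so the sketch below shows only one possible reading. It assumes each citation is scaled by a weight for the venue of the citing paper, and that the resulting rank is then divided by the paper's age in years so that recent papers are not penalised for having had less time to collect citations. The weights, the age normalisation, the function names and the data are all hypothetical.

```python
def venue_weighted_pagerank(cites, venue_weight, d=0.85, iterations=50):
    """PageRank where each citation is scaled by the citing paper's venue weight.

    `cites` maps a paper to the papers it cites; `venue_weight` maps a paper to
    a weight for the venue it appeared in (assumed inputs, default weight 1.0).
    """
    nodes = set(cites)
    for targets in cites.values():
        nodes.update(targets)
    graph = {n: set(cites.get(n, ())) for n in nodes}
    ranks = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        new_ranks = {}
        for paper in nodes:
            incoming = sum(
                venue_weight.get(src, 1.0) * ranks[src] / len(graph[src])
                for src in nodes
                if paper in graph[src]
            )
            new_ranks[paper] = (1 - d) + d * incoming
        ranks = new_ranks
    return ranks

def age_adjusted(ranks, year, current_year):
    """Assumed time correction: divide each rank by the paper's age in years."""
    return {p: r / (current_year - year[p] + 1) for p, r in ranks.items()}

# Hypothetical data: A is cited from a strong venue, X from a weaker one.
cites = {"B": ["A"], "Y": ["X"]}
venue_weight = {"B": 1.0, "Y": 0.5}
ranks = venue_weighted_pagerank(cites, venue_weight)
```

Here A and X each have exactly one citation, yet A ranks higher because its citation comes from the more heavily weighted venue. `age_adjusted` can then be applied to `ranks` with a publication-year lookup.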
21
Co-authorship networks:
Now, why does co-authorship come into the picture? Co-authorship helps us to find not only the ranks of highly influential authors but also the scientific collaborations. But do these scientific collaborations matter? Are they of any help? Of course, yes!
22
Contd., These help us to determine the collaborations among different scientific communities in a particular field of research. This in turn helps a new researcher to identify how far research has progressed in a particular field and the direction in which it is heading.
23
Contd., Collaboration between different communities provides the opportunity to discover increasing specialization and to combine the different knowledge and skills of various researchers. It also helps different communities to share complex infrastructure by bringing them together, thus reducing the cost of research.
24
Contd., Even in co-authorship networks, it is the quality that should matter and not the quantity. One famous example of this is the Erdős Number Project, a co-authorship network analysis, which points out the importance of quality: what matters is not the number of authors you publish with, but rather whom you publish with.
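An Erdős number is the length of the shortest chain of co-authorships linking an author to Paul Erdős, which a breadth-first search over the co-authorship network can compute. The sketch below assumes the network is given as a dict from author to set of co-authors. The function name and the toy authors are hypothetical.

```python
from collections import deque

def collaboration_distance(coauthors, source):
    """Shortest co-authorship distance from `source` to every reachable author.

    `coauthors` maps each author to the set of authors they have published
    with (assumed format). Unreachable authors are left out of the result.
    """
    distance = {source: 0}
    queue = deque([source])
    while queue:
        author = queue.popleft()
        for other in coauthors.get(author, ()):
            if other not in distance:
                distance[other] = distance[author] + 1
                queue.append(other)
    return distance

# Hypothetical network: A wrote with Erdos, B wrote with A, C has no co-authors.
toy_network = {
    "Erdos": {"A"},
    "A": {"Erdos", "B"},
    "B": {"A"},
    "C": set(),
}
distances = collaboration_distance(toy_network, "Erdos")
```

In this toy network A is at distance 1 and B at distance 2, while C never appears in the result however many papers C has written. The measure depends on whom an author publishes with, not on how many co-authors they have.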
25
Research Idea - Conclusion
There is a mutually reinforcing relationship between the ranks of papers and authors. But, as of today, co-authorship networks have received relatively less attention than citation networks. In this project, I would like to determine the ranks of authors from citation networks, and a list of highly influential authors from co-authorship networks, across various disciplines in Computer Science.
26
Approach: DATASET: Extract the data from the DBLP website.
DBLP is a well-known website that tracks journal articles, conference papers and other publications in Computer Science. The algorithm is implemented in MATLAB.
27
THANK YOU