Presentation is loading. Please wait.

Presentation is loading. Please wait.

“The Rise and Rise of Citation Analysis”

Similar presentations


Presentation on theme: "“The Rise and Rise of Citation Analysis”"— Presentation transcript:

1 “The Rise and Rise of Citation Analysis”
Tanmoy Chakraborty CNeRG, IIT-Kgp, India

2 Twofold Research Interests
Analyzing communities/clusters in complex networks Studying different aspects of citation network

3 “The Rise and Rise of Citation Analysis”
In collaboration with Suhansanu Kumar, Pawan Gowel, Animesh Mukherjee, Niloy Ganguly

4 Mixed Sentiment Sense and non-sense about citation analysis (*6860)
--- T. Opthof, Cardiovascular research, 97 The rise and rise of citation analysis (*1399) --- Lokman I. Meho, Phy. Res., 07 Does citation pay? (*887) --- Fowler & Aksnes, Scientometrics, 07 Think beyond citation analysis (*1009) --- Sarli et al., NIPS, 10

5 Sooner or later, you will definitely be subjected to such an analysis
Raw Citations Count To assess Quality of a paper Prominence of a researcher Success of a collaboration/group Quality of a conference/journal Quality of an Institute Impact of a research area Sooner or later, you will definitely be subjected to such an analysis Only Citation Count

6 Bibliometrics: Raw Citation Count
Journal Impact factor Immediacy factor Eigen factor Altmetric 5 years Impact factor Common assumption Citation

7 Publication Universe Crawled entire Microsoft Academic Search
Papers only in Computer Science domain Basic preprocessing Basic Statistics of papers from Values Number of valid entries 3,473,171 Number of authors 1,186,412 Number of unique venues 6,143 Avg. number of papers per author 5.18 Avg. number of authors per paper 2.49

8 Available Metadata for each paper
Publication Universe Available Metadata for each paper Title Unique ID Named entity disambiguated authors’ name Year of publication Named entity disambiguated publication venue Related research field(s) References Keywords Abstract Citation context

9 Citation Profile An exhaustive analysis of the citation profiles
Papers having at least 10 yrs history and consider at most 20yrs history Scale the entries of the citation profile between 0-1 Use peak-detection heuristics Each peak should be at least 75% of the max peak Two consecutive peak should be separated at least 3 yrs

10 Five Universal Citation Profiles
Avg. behavior Q1 and Q3 represent the first and third quartiles of the data points respectively. Another category: ‘Oth’ => having less than one citation (on avg) per year

11 Five Universal Citation Profiles A deeper look

12 Immediate Questions Is the Journal Impact factor (JIF) formula correct? JIF at year 2000 : Eugene Garfield (1975) # of citations received in 2000 by papers published in that journal in 1998 and 1999 divided by # of papers published in the journal in 1998 and 1999

13 What does JIF really imply?
Immediate Questions What does JIF really imply? Importance of the recent papers in current time period? Relevance of the journal itself in current time period Why not all the citations at current time Why last 2 years?

14 Immediate Questions JIF overlooks the importance of Peak_Late and MonIncr Over-consider Over-consider Under-consider

15 More on the Categories Are they biased on the year of publication? (Aging factor) Ans: No Same age Mean year Year deviation Peak_Int 1994 5.19 Peak_Mul 1992 6.68 Peak_Late 6.54 MonDec 5.43 MonIncr 1993 5.36

16 More on the Categories 2. Are they biased on Journals/conferences?
Peak_Int Peak_Mul Peak_Late MonDec MonIncr % of conferences paper 65 39.03 39.89 60.73 25.26 % of Journal paper 35 60.97 60.11 39.27 74.74

17 More on the Categories 3. Are they affected by self-citation?
Transition matrix showing the transition of categories after removing self citations Least affected by self citations Peak_Int Peak_Mul Peak_Late MonDec MonIncr Oth 0.72 0.10 0.03 0.01 0.15 0.02 0.81 0.04 0.11 0.06 0.86 0.05 0.14 0.41 0.35 0.88 0.09 Most affected by self citation

18 More on the Categories 4. What about Peak_Mul? Avg Peak Height
Peak_Int 5.1 Time 4.2 Peak_Mul = 5.6 Avg Peak Height ( ) = 6.8 ~ ( ) 3.1 2.5 5.3 12.1 Peak_Late 5.3 10.8 Years after publication Peak_Mul Might be Intermediary between Peak_Int and Peak_Late

19 Where does this classification help?
To improve Bibliometrics in scientific research Various prediction systems Future citation prediction system Predicting emerging field/topic Predating future star/seminal papers Paper search and Recommender systems

20 at the Time of Publication
On predicting Future Citation Count at the Time of Publication

21 Problem Definition

22 Traditional Framework
Yan et al., JCDL, 2012

23 Problems in Traditional Framework
Consider initial few years’ statistics after publication Proved to be very effective Lack of time dimension in prediction Suffers a lot from outlier points during regression

24 Problems in Traditional Framework: how to tackle
Consider initial few years’ statistics after publication Proved to be very effective Lack of time dimension in prediction Suffers a lot from outlier points during regression Try to predict citations as early as possible (may be at the time of publication ) Should consider the time dimension Reduce outlier points as much as possible

25 Our Framework: 2-stage Model

26 Features Author-centric (Max/Avg) Productivity H-index Versatility
Sociality Venue-centric Prestige Impact Factor Paper-centric Team-size Reference count Reference diversity Keyword diversity Topic diversity

27 Performance Evaluation
Coefficient of determination (R2) The more, the better Mean squared error (θ) The less, the better Pearson correlation coefficient (ρ) The more, the better

28 Performance of SVM Confusion Matrix

29 Performance of Regression Model

30 Feature Analysis

31 Conclusion Five universal citation profiles
Different analysis on these categories Can help to reframe existing bibliometrics Can be a generic way in machine learning Can enhance the performance of the existing systems

32 Thank You


Download ppt "“The Rise and Rise of Citation Analysis”"

Similar presentations


Ads by Google