WIS/COLLNET’2016 Nancy, France


Quantifying an Individual’s Scientific Output Using the Fractal Dimension of the Whole Citation Curve
Gogoglou A. [1], Sidiropoulos A. [2], Katsaros D. [3], Manolopoulos Y. [1]
[1] Aristotle University of Thessaloniki, Greece
[2] Alexander Technological Educational Institute of Thessaloniki, Greece
[3] University of Thessaly, Volos, Greece

Structure
Introduction
Citation curve
Some theory on dimensionality
Dataset
Experimentation
Outcome

Introduction
Until 2005, the Impact Factor (IF) was used as a main metric for the evaluation of researchers.
In 2005, the h-index was proposed by Hirsch.

Overview of Existing Approaches
The popular h-index and a family of closely related bibliometric indices focus on different parts of the citation curve.
Standard measures are the publication count and the citation count.
A number of approaches have attempted to characterize the distribution of citations, but across a network of citations instead of individual citation curves.
Power laws, Tsallis distributions, the Yule law and various other exponential distributions have been examined as possible fits to the citation distribution.
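As a minimal sketch of the standard measures named above, the snippet below builds a citation curve (citation counts sorted in decreasing order) and derives the publication count p, the total citation count Ctot, the maximum Cmax and the h-index from it. The function names and the sample counts are illustrative assumptions, not taken from the presentation.

```python
def citation_curve(citations):
    """Return the citation counts sorted from most to least cited."""
    return sorted(citations, reverse=True)


def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    h = 0
    for rank, count in enumerate(citation_curve(citations), start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical citation counts for one scientist (illustration only).
counts = [3, 10, 4, 8, 5]
p = len(counts)        # publication count
c_tot = sum(counts)    # total citation count
c_max = max(counts)    # citations of the most cited paper
print(p, c_tot, c_max, h_index(counts))  # prints: 5 30 10 4
```

The h-index reads a single point off the curve, which is why two scientists with very different curves can share the same value.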

(Maximum) Citation Curve

(Maximum) Citation Curve properties
The more a citation curve differs from the maximum citation curve, the more skewed it becomes.
Citation curves that differ significantly from line t and lie closer to the origin of the axes represent a heavy-tailed and skewed publishing behavior.
The citation curve is not in reality a continuous curve but a set of discrete points.
The fractal dimension can represent it better than any metric that attempts to quantify parts of the citation curve and the relationship between them.

Contribution: the Fractal Dimension
First, given the current state of a scientist (i.e., p, Cmax, Ctot), the fractal dimension expresses how much this particular state differs from the maximum citation curve.
Second, the distinguishing power of the fractal dimension, especially for common values of p, Ctot and h-index, makes it an appropriate index for several data mining tasks performed on bibliometric data (extracting top scientists from a group, ranking, clustering scientists in groups, skyline operation, etc.).

Dimensions of a Point Set (1)
Definition 1: The embedding dimension E of a dataset is the dimension of its address space. In other words, it is the number of attributes of the dataset.
Definition 2: The intrinsic dimension D of a dataset is the dimension of the object represented by the dataset, regardless of the space where it is embedded.
A dataset can have an intrinsic dimension lower than the dimension of the space where it is embedded. E.g., a line has an intrinsic dimension of 1, even if it is represented in a higher dimensional space.

Dimensions of a Point Set (2)
Property 1: The fractal dimension of a Euclidean object corresponds to its Euclidean dimension and is always an integer.
Property 2: The fractal dimension of a dataset cannot be higher than the embedding dimension.
A point has a fractal dimension of 0, whereas a line has a fractal dimension of 1.
The citation curve lies between a set of points and a line; as a result, its fractal dimension will lie in the range [0,1].

Fractal Dimension: Definition
For a set of points, the fractal dimension provides a statistical index of its complexity, comparing how detail in a geometrical pattern changes with the scale at which it is measured.
The box-counting method is used to calculate the fractal dimension: N(r) is the number of boxes of size r that are needed to cover the geometrical object.
The fractal dimension is read from the slope of the doubly logarithmic plot of N(r) versus r. Since N(r) decreases as r grows, the dimension is the magnitude of that slope.
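The box-counting procedure can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the points have already been scaled into the unit square, that the grid is anchored at the origin, and that the dimension is taken as minus the least-squares slope of log N(r) against log r. The function names and the choice of box sizes are hypothetical.

```python
import math


def box_count(points, r):
    """Number of grid boxes of side r that contain at least one point."""
    return len({(math.floor(x / r), math.floor(y / r)) for x, y in points})


def box_counting_dimension(points, sizes):
    """Estimate the dimension as minus the slope of log N(r) versus log r."""
    xs = [math.log(r) for r in sizes]
    ys = [math.log(box_count(points, r)) for r in sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# Assumed box sizes: successive halvings of the unit square.
sizes = [1 / 2, 1 / 4, 1 / 8, 1 / 16, 1 / 32]

# A densely sampled diagonal line should give a dimension close to 1,
# a single point a dimension of 0.
line = [(i / 1024, i / 1024) for i in range(1024)]
print(box_counting_dimension(line, sizes))
print(box_counting_dimension([(0.3, 0.3)], sizes))
```

For a real citation curve the result depends on how ranks and citation counts are normalised and on the range of box sizes, so those choices would need to be fixed before values are compared across scientists.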

Connection to Power Law
The calculation of the fractal dimension is based on a power law relationship between the number of boxes N and their respective sizes r.
However, it is not necessary that the entire set of points itself follows a power law.
The fractal dimension measures how self-similar, dynamic and skewed a geometrical object is.
The fractal dimension of a point set is rarely an integer, as it connects the point set to a higher dimension than the dimensional space where the set is embedded.
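Written out, the power law between box count and box size takes the standard box-counting form below. The symbols D (fractal dimension) and c (a constant) are introduced here for illustration; the slide does not state the formula explicitly.

```latex
N(r) \propto r^{-D}
\quad\Longrightarrow\quad
\log N(r) = -D \log r + c
\quad\Longrightarrow\quad
D = -\frac{\mathrm{d}\,\log N(r)}{\mathrm{d}\,\log r}
```

In practice D is estimated by fitting a straight line to the (log r, log N(r)) pairs over the range of box sizes where the relationship is approximately linear.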

Dataset Description
More than 9,000,000 publications and over 38,000,000 citations collected from MAS (Microsoft Academic Search).
30,000 computer scientists during the years 1970-2013 with h2013 >= 8.
Awarded scientists:
ACM Turing 1980-2015
ACM SIGMOD 1992-2015
ACM SIGCOMM 1992-2015
ACM Fellows 1980-2013

Correlation with Other Indices (1)
A set of popular indices was compared in q-q plots with the values of the fractal dimension:
Average citation count, total citation count, number of papers
h, g, hw, hI, hnor, v and PI indices
The more the points deviate from the 45° line, the less correlated the two samples (index values) are.
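One way to put a number on "deviation from the 45° line" is sketched below. It assumes two samples of equal length, rescales each to [0, 1] so that indices on different scales are comparable, pairs their order statistics as a q-q plot does, and averages the distance from the diagonal. The rescaling and the summary statistic are assumptions made for illustration; the presentation only shows the plots.

```python
def rescale(values):
    """Map values linearly onto [0, 1]; a constant sample maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def qq_points(sample_a, sample_b):
    """Pair the sorted, rescaled values of two equal-length samples."""
    if len(sample_a) != len(sample_b):
        raise ValueError("this sketch assumes samples of equal length")
    return list(zip(sorted(rescale(sample_a)), sorted(rescale(sample_b))))


def mean_deviation_from_diagonal(points):
    """Average absolute gap between each q-q point and the 45-degree line."""
    return sum(abs(x - y) for x, y in points) / len(points)


# Hypothetical index values for three scientists (illustration only).
fractal_dim = [0.2, 0.5, 0.8]
other_index = [10, 25, 40]
print(mean_deviation_from_diagonal(qq_points(fractal_dim, other_index)))
```

A value near zero means the two rescaled distributions have a similar shape; larger values mean the q-q points drift further from the diagonal.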

Correlation with Other Indices (2)
Indices that take into account the whole curve (like the hI and v indices) are more correlated with the fractal dimension than the ones focusing on the h-core.

Scientist Ranking (1)
Explore the distinguishing power of the fractal dimension for a set of high impact scientists.
Also investigate whether it can distinguish moderately performing scientists with academic potential.

Scientist Ranking (2)
Identified the scientists with the highest fractal dimension value for each distinct h-index value in the range [26,50].
The set contains awarded scientists (marked with an asterisk) as well as acknowledged high impact scientists who have not been awarded yet.
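The selection step described above can be sketched as a simple grouping: for every h-index value in the chosen range, keep the scientist with the largest fractal dimension. The record layout (name, h-index, fractal dimension), the function name and the sample values are hypothetical and serve only to illustrate the idea.

```python
def top_by_h(scientists, h_min=26, h_max=50):
    """For each h-index in [h_min, h_max], keep the highest fractal dimension.

    `scientists` is an iterable of (name, h_index, fractal_dimension) tuples.
    Returns a dict mapping h-index -> (name, fractal_dimension).
    """
    best = {}
    for name, h, fd in scientists:
        if h_min <= h <= h_max and (h not in best or fd > best[h][1]):
            best[h] = (name, fd)
    return best


# Hypothetical records (illustration only, not data from the study).
records = [
    ("Scientist A", 26, 0.61),
    ("Scientist B", 26, 0.74),
    ("Scientist C", 30, 0.58),
    ("Scientist D", 51, 0.90),  # outside the range, so ignored
]
for h, (name, fd) in sorted(top_by_h(records).items()):
    print(h, name, fd)
```

Grouping by h-index first is what lets the fractal dimension act as a tie-breaker among scientists that the h-index alone cannot separate.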

Merits of the Fractal Dimension
Distinguishes high impact scientists.
A high fractal dimension value for moderate citation counts (and h-index values) indicates academic potential and may assist peer decisions in award or grant allocation and in tenure committees.
A high h-index combined with a high fractal dimension constitutes a pattern for increased academic impact and complies with the criteria of peer assessment.
Challenge: distinguishing scientists from the most highly populated groups of computer scientists, those with 15 < h < 35.

Conclusions & Future Work
We introduce a single-number metric to convey the information expressed by the entire citation curve as a geometric object.
The fractal dimension constitutes a complementary metric to other indices, representing a scientist’s portfolio in a more complete way.
Future challenges include exploring its distinguishing power in different groups, identifying the particular qualities of scientific impact it focuses on, and expanding the concept to journals, institutions, publications, etc.

Thank you for your attention!