Quality criteria for data aggregation used in academic rankings. IREG FORUM on University Rankings: Methodologies under Scrutiny, 16-17 May 2013, Warsaw, Poland


1 Quality criteria for data aggregation used in academic rankings. IREG FORUM on University Rankings: Methodologies under Scrutiny, 16-17 May 2013, Warsaw, Poland. Michaela Saisana, European Commission, Joint Research Centre, Econometrics and Applied Statistics Unit

2 Outline  Global rankings at the forefront of the policy debate  Overview of two global university rankings (ARWU, THES)  Statistical Coherence Tests  Uncertainty analysis  Policy Implications  Conclusions

3 Outline  Global rankings at the forefront of the policy debate  Overview of two global university rankings (ARWU, THES)  Statistical Coherence Tests  Uncertainty analysis  Policy Implications  Conclusions

4 Global rankings at the forefront of the policy debate
The definition of the university is broad: a university, as the name suggests, tends to encompass a broad range of purposes and dimensions, focus and missions → difficult to condense into a compact measure.
Still, for reasons of governance, accountability and transparency, there is increasing interest among policymakers as well as practitioners in measuring and benchmarking "excellence" across universities. The growing mobility of students and researchers has also created a market for these measures among prospective students and their families.

5 Global rankings at the forefront of the policy debate
Global rankings have raised debates and policy responses (EU, national level):
 to improve the positioning of a country within the existing measures,
 to create new measures,
 to discuss regional performance (e.g. to show that the USA is well ahead of Europe in terms of cutting-edge university research).

6 Global rankings at the forefront of the policy debate
A 10-fold increase in the last 10 years. Guess how many contain the words "THES ranking" or "ARWU ranking"? 20%

7 Global rankings at the forefront of the policy debate
1. Academic Ranking of World Universities (ARWU) (Shanghai Jiao Tong University)
2. Webometrics (Spanish National Research Council)
3. World University Ranking (Times Higher Education/Quacquarelli Symonds), 2004–09
4. Performance Ranking of Scientific Papers for Research Universities (HEEACT)
5. Leiden Ranking (Centre for Science & Technology Studies, University of Leiden)
6. World's Best Colleges and Universities (US News and World Report)
7. SCImago Institutional Rankings
8. Global University Rankings (RatER) (Rating of Educational Resources, Russia)
9. Top University Rankings (Quacquarelli Symonds)
10. World University Ranking (Times Higher Education/Thomson Reuters, THE-TR)
11. U-Multirank (European Commission), 2011
Over 60 countries have introduced national rankings, and there are numerous regional, specialist and professional rankings.

8 Global rankings at the forefront of the policy debate
University rankings are used to judge the performance of university systems … whether intended or not by their proponents.

9 Global rankings at the forefront of the policy debate
France:
 Creation of 10 centres of HE excellence
 The Minister of Education set a target of at least 10 French universities among the ARWU top 100 by 2012
 The President has put French standing in these international rankings at the forefront of the policy debate (Le Monde, 2008)
Italy: 0 universities in the ARWU top 100 → seen as a failure of the national educational system.
Spain: 1 university in the ARWU top 200 → hailed as a great national achievement.

10 Global rankings at the forefront of the policy debate
An OECD study shows that, worldwide, university leaders are concerned about ranking systems, with consequences for the strategic and operational decisions they take to improve their research performance (Hazelkorn, 2007).
There are over 16,000 HEIs, yet some of the global rankings capture merely the top 100 universities – less than 1% (Hazelkorn, 2013).

11 Global rankings at the forefront of the policy debate
An extreme impact of global rankings:
What – THES created a major controversy in Malaysia: the country's top two universities slipped by almost 100 places compared to the previous year.
Why – a change in the ranking methodology (a fact not well known, and of limited comfort).
Impact – a Royal commission of inquiry was set up to investigate the matter. A few weeks later, the Vice-Chancellor of the University of Malaya stepped down.

12  Global rankings at the forefront of the policy debate  Overview of two global university rankings (ARWU, THES)  Statistical Coherence Tests  Uncertainty analysis  Policy Implications  Conclusions

13 Overview – 2007 ARWU ranking
PROS and CONS
+ 6 « objective » indicators
- Focus on research performance; overlooks other university missions
- Biased towards hard-science institutions
- Favours large institutions
METHODOLOGY
 6 indicators
 Best performing institution = 100; scores of other institutions calculated as a percentage of the best
 Weighting scheme chosen by the rankers
 Linear aggregation of the 6 indicators

14 Overview – 2007 THES ranking
PROS and CONS
+ Attempt to take into account teaching quality
- Two expert-based indicators: 50% of total (subjective indicators, lack of transparency)
- Yearly changes in methodology
- Measures research quantity
METHODOLOGY
 6 indicators
 z-score calculated for each indicator; best performing institution = 100; scores of other institutions calculated as a percentage
 Weighting scheme chosen by the rankers
 Linear aggregation of the 6 indicators
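The shared recipe (rescale each indicator against the best performer, then take a weighted linear sum with weights chosen by the rankers) can be sketched as follows. This is a minimal ARWU-style illustration: the institutions, scores and weights below are invented, not taken from either ranking.

```python
# Minimal sketch of best-performer-=-100 rescaling followed by linear
# aggregation, as described on the ARWU/THES methodology slides.
# All data are hypothetical.

def rescale_best_100(column):
    """Score each institution as a percentage of the best performer."""
    best = max(column)
    return [100.0 * v / best for v in column]

def composite(scores, weights):
    """Weighted linear aggregation of the rescaled indicator columns."""
    n_ind = len(weights)
    columns = [rescale_best_100([row[j] for row in scores]) for j in range(n_ind)]
    return [sum(w * columns[j][i] for j, w in enumerate(weights))
            for i in range(len(scores))]

raw = [
    [92.0, 81.0, 70.0],  # "Univ A" (hypothetical)
    [85.0, 90.0, 60.0],  # "Univ B"
    [70.0, 65.0, 95.0],  # "Univ C"
]
print(composite(raw, [0.4, 0.4, 0.2]))
```

Note that the final ordering depends on the weights as much as on the raw scores, which is exactly the sensitivity probed later in the talk.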

15 Overview – Comparison (2007)
1 – Same top 10: Harvard, Cambridge, Princeton, Caltech, MIT and Columbia
2 – Greater variations in the middle to lower end of the rankings
3 – Europe is lagging behind in both the ARWU (also known as SJTU) and THES rankings
4 – THES favours UK universities: all UK universities below the line (in red)

16 University rankings – published yearly
+ Very appealing for capturing a university's multiple missions in a single number
+ Allow one to situate a given university in the worldwide context
- Can lead to misleading and/or simplistic policy conclusions

17 Question: Can we say something about the quality of the university rankings and the reliability of the results?

18  Global rankings at the forefront of the policy debate  Overview of two global university rankings (ARWU, THES)  Statistical Coherence Tests  Uncertainty analysis  Policy Implications  Conclusions

19 Statistical coherence
The Stiglitz report (p. 65): "[…] a general criticism that is frequently addressed at composite indicators, i.e. the arbitrary character of the procedures used to weight their various components. […] The problem is not that these weighting procedures are hidden, non-transparent or non-replicable – they are often very explicitly presented by the authors of the indices, and this is one of the strengths of this literature. The problem is rather that their normative implications are seldom made explicit or justified."

20 Question: Can we say something about the quality of the university rankings and the reliability of the results?

21 Statistical coherence – Dean's example
Y = 0.5·x1 + 0.5·x2, where x1: hours of teaching, x2: number of publications.
Estimated R1² = 0.076, R2² = 0.826, corr(x1, x2) = −0.151, V(x1) = 116, V(x2) = 614, V(y) = 162.
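The figures on this slide can be checked analytically: given only the two weights, the two variances and the correlation, the composite's variance and both R² values follow directly. A minimal sketch (the printed values reproduce the slide's numbers up to rounding of the inputs):

```python
# Analytic check of the dean's example: y = 0.5*x1 + 0.5*x2.
# Despite equal weights, publications (x2) dominate because V(x2) >> V(x1).
import math

v1, v2, rho = 116.0, 614.0, -0.151
w1 = w2 = 0.5
cov12 = rho * math.sqrt(v1 * v2)

vy = w1**2 * v1 + w2**2 * v2 + 2 * w1 * w2 * cov12
cov_y1 = w1 * v1 + w2 * cov12   # Cov(y, x1)
cov_y2 = w1 * cov12 + w2 * v2   # Cov(y, x2)
r2_1 = cov_y1**2 / (vy * v1)    # R^2 of y regressed on x1 (teaching)
r2_2 = cov_y2**2 / (vy * v2)    # R^2 of y regressed on x2 (publications)
print(round(vy), round(r2_1, 3), round(r2_2, 3))
```

The asymmetry (0.08 vs 0.83) despite the symmetric weights is the point of the example: nominal weights are not importance.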

22 Statistical coherence – Dean's example
To obviate this, the dean substitutes the model Y = 0.5·x1 + 0.5·x2 with Y = 0.7·x1 + 0.3·x2.
A professor comes by, looks at the last formula, and complains that publishing is disregarded in the department …
(x1: hours of teaching, x2: number of publications)

23 Statistical coherence – Si: a ruler for 'importance'
Using these points we can compute a statistic that tells us how important each indicator is for the composite score.
Example: Si = 0.88 → we could reduce the variation of the ARWU scores by 88% by fixing 'Papers in Nature & Science'.
[Scatter plot: ARWU score vs. indicator]

24 Statistical coherence
Our suggestion: to assess the quality of a composite indicator using, instead of Ri² (the Pearson product-moment correlation coefficient of the regression of y on xi), its non-parametric equivalent, the first-order sensitivity index (Pearson's correlation ratio):
Si = V(E(y | xi)) / V(y)
where E(y | xi) is estimated by a smoothed curve of y versus xi, and V(y) is the unconditional variance.

25 Statistical coherence
Pearson's correlation ratio ‐ first-order effect ‐ top marginal variance ‐ main effect …
Features:
 it offers a precise definition of importance, that is, 'the expected reduction in variance of the CI that would be obtained if a variable could be fixed';
 it can be used regardless of the degree of correlation between variables;
 it is model-free, in that it can be applied also to non-linear aggregations;
 it is not invasive, in that no changes are made to the CI or to the correlation structure of the indicators (unlike what we will see next on uncertainty analysis).
Source: Paruolo, Saisana, Saltelli, 2013, J. Royal Stat. Society A
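A crude way to see the estimator at work: simulate two equally weighted variables with very different variances and estimate Si = V(E(y|xi))/V(y). The paper fits a smoothed regression curve; the equal-count binning below is only a stand-in for that, and all data here are synthetic.

```python
# Nonparametric estimate of the first-order index S_i = V(E[y|x_i]) / V(y),
# using a crude equal-count binning of x_i in place of a smoothed curve.
import random

random.seed(1)
n = 20000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 4) for _ in range(n)]          # larger spread -> dominates
y = [0.5 * a + 0.5 * b for a, b in zip(x1, x2)]      # equal nominal weights

def s_i(x, y, bins=50):
    """Estimate V(E[y|x])/V(y): average y within equal-count bins of x."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    size = len(x) // bins
    means = []
    for b in range(bins):
        idx = order[b * size:(b + 1) * size]
        means.append(sum(y[i] for i in idx) / len(idx))
    ybar = sum(y) / len(y)
    v_cond = sum((m - ybar) ** 2 for m in means) / bins
    v_y = sum((v - ybar) ** 2 for v in y) / len(y)
    return v_cond / v_y

print(round(s_i(x1, y), 2), round(s_i(x2, y), 2))
```

With equal weights, the theoretical indices are about 0.06 and 0.94: fixing x2 would remove almost all of the composite's variance, which is the 'ruler for importance' the slides describe.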

26 Statistical coherence
One can hence compare the importance of an indicator as given by its nominal weight (assigned by the developers) with the importance as measured by the first-order effect (Si), to test the index for coherence.

27 Statistical coherence – ARWU
The Si's are more similar to each other than the nominal weights, ranging between 0.14 and 0.19 (Si's normalized to unit sum; CV estimates), whereas the nominal weights are either 0.10 or 0.20.
Source: Paruolo, Saisana, Saltelli, 2013, J. Royal Stat. Society A

28 Statistical coherence – THES
The combined importance of the peer-review variables (recruiters and academics) appears larger than stipulated by the developers, indirectly supporting the hypothesis of linguistic bias at times addressed to THES.
The teacher/student ratio, a key variable aimed at capturing the teaching dimension, is much less important than it should be (normalized Si is 0.09; the nominal weight is 0.20).
Source: Paruolo, Saisana, Saltelli, 2013, J. Royal Stat. Society A

29  Global rankings at the forefront of the policy debate  Overview of two global university rankings (ARWU, THES)  Statistical Coherence Tests  Uncertainty analysis  Policy Implications  Conclusions

30 Uncertainty analysis – Why?
Notwithstanding recent attempts to establish good practice in composite indicator construction (OECD, 2008), "there is no recipe for building composite indicators that is at the same time universally applicable and sufficiently detailed" (Cherchye et al., 2007).
Booysen (2002, p.131) summarises the debate on composite indicators by noting that "not one single element of the methodology of composite indexing is above criticism".
Andrews et al. (2004) argue that "many indices rarely have adequate scientific foundations to support precise rankings: […] typical practice is to acknowledge uncertainty in the text of the report and then to present a table with unambiguous rankings".

31 Uncertainty analysis – How?
Space of alternatives: including/excluding variables, normalisation, missing data, weights, aggregation.
Model averaging: whenever a choice in the composite's set-up may not be strongly supported, or if you may not trust one single model, we recommend using more models.

32 Uncertainty analysis – How?
[Figures: 'How to shake coupled stairs' vs. 'How coupled stairs are shaken in most of the available literature']

33 Uncertainty analysis – ARWU & THES
Question: Can we say something about the quality of the university rankings and the reliability of the results?
Objective of UA:
 NOT to verify whether the two global university rankings are legitimate models to measure university performance
 To test whether the rankings and/or their associated inferences are robust or volatile with respect to changes in the methodological assumptions, within a plausible and legitimate range
Source: Saisana, D'Hombres, Saltelli, 2011, Research Policy 40, 165–177

34 Uncertainty analysis – ARWU & THES
Activate simultaneously different sources of uncertainty that cover a wide spectrum of methodological assumptions: imputation, weighting, normalisation, number of indicators, aggregation → 70 scenarios.
Estimate the FREQUENCY of the university ranks obtained in the different simulations.

35 Uncertainty analysis – ARWU
 Harvard, Stanford, Berkeley, Cambridge, MIT: top 5 in more than 75% of our simulations.
 Univ. California: original rank 18th, but could be ranked anywhere between the 6th and 100th position.
 Impact of assumptions: much stronger for the middle-ranked universities.

36 Uncertainty analysis – THES
 The impact of the uncertainties on the university ranks is even more apparent.
 MIT: ranked 9th, but confirmed in only 13% of simulations (plausible range [4, 35]).
 Very high volatility also for universities ranked in the 10th–20th positions, e.g. Duke Univ., Johns Hopkins Univ., Cornell Univ.

37 Uncertainty analysis – ARWU results

38 Uncertainty analysis – THES results

39 Policy implications
1. HEIs provide an array of services and positive externalities to society (universal education, innovation and growth, active citizens, capable entrepreneurs and administrators, etc.), which call for multi-dimensional measures of effectiveness and/or efficiency.
2. A clear statement of the purpose of any such measure is also needed, as measuring scientific excellence is not the same as measuring e.g. employability or innovation potential, or deciding where to study, or how to reform the university system so as to increase the visibility of national universities.

40 Policy implications
3. Indicators and league tables are enough to start a discussion on higher education issues BUT not sufficient to conclude it.
4. The rank assigned to a university largely depends on the methodological assumptions made in compiling the rankings. 9 in 10 universities shift over 10 positions in the 2008 SJTU: 92 positions (Univ. Autonoma Madrid) and 277 positions (Univ. Zaragoza) in Spain; 71 positions (Univ. Milan) and 321 positions (Polytechnic Inst. Milan) in Italy; 22 positions (Univ. Paris 06) and 386 positions (Univ. Nancy 1) in France.

41 Policy implications
5. A multi-modelling approach can offer a representative picture of the classification of universities by ranking institutions within a range bracket, as opposed to assigning a specific rank that is not representative of the plurality of opinions on how to assess university performance.
6. The compilation of university rankings should always be accompanied by coherence tests & robustness analysis.

42 Conclusions
'rankings are here to stay, and it is therefore worth the time and effort to get them right' (Alan Gilbert, Nature News, 2007)
'because they define what "world-class" is to the broadest audience, these measures cannot be ignored by anyone interested in measuring the performance of tertiary education institutions' (Jamil Salmi, 2009)

43 Conclusions
'rankings are here to stay' (Sanoff, 1998)
'ranking systems are clearly here to stay' (Merisotis, 2002)
'tables: they may be flawed but they are here to stay' (Leach, 2004)
'they are here to stay' (Hazelkorn, 2007)
'like them or not, rankings are here to stay' (Olds, 2010)
'whether or not colleagues and universities agree with the various ranking systems and league table findings is insignificant, rankings are here to stay' (UNESCO, 2010)
'educationalists are well able to find fault with rankings on numerous grounds and may reject them outright. However, given that they are here to stay…' (Tofallis, 2012)
'while many institutions had reservations about the methodologies used by the rankings compilers, there was a growing recognition that rankings and classifications were here to stay' (Osborne, 2013)

44 More at: (or simply Google "composite indicators" – 1st hit)

45 References and Related Reading
1. Paruolo P., Saisana M., Saltelli A., 2013, Ratings and rankings: voodoo or science? J. Royal Statistical Society A 176(2).
2. Saisana M., D'Hombres B., Saltelli A., 2011, Rickety numbers: Volatility of university rankings and policy implications. Research Policy 40, 165–177.
3. Saisana M., D'Hombres B., 2008, Higher Education Rankings: Robustness Issues and Critical Assessment, EUR 23487, Joint Research Centre, Publications Office of the European Union, Italy.
4. Saisana M., Saltelli A., Tarantola S., 2005, Uncertainty and sensitivity analysis techniques as tools for the analysis and validation of composite indicators. J. Royal Statistical Society A 168(2).
5. OECD/JRC, 2008, Handbook on Constructing Composite Indicators: Methodology and User Guide, OECD Publishing.