SAINT Toolkit for Applied Scientometrics Edwin Horlings August 2012

Structure
−Applied scientometrics
−The SAINT Toolkit
−Examples of applications
−Collaboration
Edwin Horlings | 2 / 28 | Patenting in the Netherlands

APPLIED SCIENTOMETRICS

Applied scientometrics
We are living in the age of Big Data
−continuous increase in the amount of data on science, technology and innovation
−Web of Science: c. 45 million scientific publications; PATSTAT: c. 67 million patent applications, with detailed information on every record
−enormous expansion of web data (e.g. Twitter, blogs)
−we now have the computing power to mine and analyse those data
Increasing call for evidence-based policy
−support policy and politics with reliable information
−applied scientometrics can help understand the effects of policy

Applied scientometrics
Applying advanced quantitative methods to large heterogeneous datasets in order to extract patterns that show the structure and development of science, technology and innovation

What sort of patterns do we look for?
Networks
−co-authors of scientific papers
−inventors and assignees of patents
−members of the same associations
−researchers working on or talking about the same topics
Clustering of similar items
−publications about the same topic or in the same specialisation
−similar patents by different organisations
−clusters in collaboration networks
Statistical analysis of patterns (e.g. Social Network Analysis)
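
As a minimal sketch of the first pattern type, a co-authorship network can be derived from publication records by linking every pair of authors who appear on the same paper. The records and author names below are invented for illustration and are not taken from the SAINT data; the function name is likewise hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical publication records: each entry lists the authors of one paper.
papers = [
    ["Author A", "Author B", "Author C"],
    ["Author A", "Author B"],
    ["Author C", "Author D"],
]


def coauthor_edges(papers):
    """Count how often each pair of authors appears on the same paper."""
    edges = Counter()
    for authors in papers:
        # Sort and de-duplicate so each pair has one canonical (a, b) form.
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges


edges = coauthor_edges(papers)
for (a, b), weight in sorted(edges.items()):
    print(f"{a} - {b}: {weight} joint paper(s)")
```

The same pairing logic would apply to inventors on a patent or institutes on a publication; only the input lists change.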

SAINT TOOLKIT

SAINT Toolkit
The Science Assessment Integrated Network Toolkit
Main components at the moment
−ISI Parser: converts raw Web of Science data into a relational database
−Word Splitter: cuts full text into words, eliminates stop words, and shortens words to their stems using different algorithms
−Network Tools: identify clusters in a network using one of the best clustering algorithms (Blondel et al. 2008)
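
The three steps attributed to the Word Splitter (splitting, stop-word removal, stemming) can be sketched roughly as follows. This is not the toolkit's code: the stop-word list is a tiny illustrative sample, and the crude suffix stripping in `naive_stem` is an assumed stand-in for whichever stemming algorithms the toolkit actually offers.

```python
import re

# Assumption: a tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "in", "of", "on", "the", "to", "with"}

# Assumption: crude suffix stripping stands in for a real stemming algorithm.
SUFFIXES = ("ations", "ation", "ings", "ing", "ies", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def split_words(text):
    """Lower-case the text, split it into words, drop stop words, and stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


stems = split_words("Clustering of publications in the collaboration networks")
print(stems)
```

Reducing words to stems lets "network" and "networks" count as the same term when publications are later compared by vocabulary.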

SAINT Toolkit
Under development
−Integrating all tools into a Work Flow Manager
−Word splitting using Natural Language Processing to extract terms rather than words
−Adding alternative clustering algorithms for network analysis
−Improving the Parser for new data sources, such as Scopus and online data
−Developing tools for disambiguation of authors and addresses

APPLICATIONS

Author disambiguation
Thousands of researchers have identical names (e.g. Y. Zhang): how do we tell them apart?
Important for evaluation and for research
Developed an algorithm with % accuracy
Now developing a software tool with University of Paris Est (ESIEE)

Portfolio of individual researchers
How do individual researchers develop their scientific portfolio?
−different stages in their career
−different problem areas, often simultaneous
−coherence of their portfolio
−author position
Developed a scientometric method to visualise and statistically analyse
Edwin Horlings & Thomas Gurney | 12 / 28 | Search strategies along the academic lifecycle

Ronald Plasterk, former Minister of Education, Culture and Science
Barend van der Meulen | 13 / 28 | SAINT Toolkit for applied scientometrics

Bio-energy worldwide (8,414 publications)
Figure legend: primary strength of China; secondary strength of China; primary strength of the Netherlands; secondary strength of the Netherlands
Edwin Horlings | 14 / 22 | Science policy for the bio-economy

Advantage of having a large facility
Does a large-scale facility provide a home advantage to local research groups?
−accumulating reputation
−opening up new avenues of research
−developing social networks
−producing scientific
Examine for high-field magnet laboratories, such as those in Hefei and in Nijmegen

Collaboration networks in graphene
Collaboration between institutes in graphene research worldwide (17,968 publications)
Figure label: institution (e.g. university)

Collaboration networks in graphene
Clusters of institutes that collaborate more with each other than with other institutes
Figure label: cluster
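
One common way to quantify "collaborate more with each other than with other institutes" is the modularity score, which is the quantity the Blondel et al. (2008) algorithm optimises. The sketch below only scores a given partition and does not search for the best one. The six institutes, their links and the function name are hypothetical.

```python
from collections import defaultdict


def modularity(edges, cluster_of):
    """Modularity Q of a partition of an undirected weighted network.

    edges: dict mapping (node_a, node_b) -> collaboration weight
    cluster_of: dict mapping node -> cluster label
    """
    total_weight = sum(edges.values())
    internal = defaultdict(float)  # edge weight inside each cluster
    degree = defaultdict(float)    # summed node strength per cluster
    for (a, b), w in edges.items():
        degree[cluster_of[a]] += w
        degree[cluster_of[b]] += w
        if cluster_of[a] == cluster_of[b]:
            internal[cluster_of[a]] += w
    # Per cluster: observed internal share minus the share expected
    # if the same link weights were distributed at random.
    return sum(
        internal[c] / total_weight - (degree[c] / (2 * total_weight)) ** 2
        for c in degree
    )


# Hypothetical network: two triangles of institutes joined by a single link.
edges = {
    ("I1", "I2"): 1, ("I2", "I3"): 1, ("I1", "I3"): 1,
    ("I4", "I5"): 1, ("I5", "I6"): 1, ("I4", "I6"): 1,
    ("I3", "I4"): 1,
}
two_clusters = {"I1": "X", "I2": "X", "I3": "X", "I4": "Y", "I5": "Y", "I6": "Y"}
one_cluster = {node: "X" for node in two_clusters}

q_two = modularity(edges, two_clusters)
q_one = modularity(edges, one_cluster)
print(round(q_two, 3), round(q_one, 3))
```

Splitting the network into its two triangles scores higher than treating it as one cluster, which is the sense in which a cluster contains more internal collaboration than chance would predict.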

Collaboration networks in graphene
Networks of scientific collaboration in graphene are highly regionally clustered
Figure labels: EU, North America, South East Asia

Collaboration networks in graphene
All Chinese institutions in the network highlighted in black

Collaboration in graphene research in China

How Dutch universities work on scientific topics
Celiac Disease Consortium
Figure labels: topic, university

How Dutch universities work on scientific topics
Celiac Disease Consortium
−Denser network
−More institutions involved
−More coherent: more universities work on the same small set of topics