SAINT: Tools for large scale bibliometric analysis André Somers | 1 September 3, 2009.


1 SAINT: Tools for large scale bibliometric analysis André Somers | 1 September 3, 2009

2 Targets
Large data sets
Fast
Flexible: structured database
Easy to use
Open (Source)
Get it from: http://www.rathenau.nl/tools

3 Structured database
Structured database: different queries possible
Standard relational database: SQL for combining data
Special tools for things that are impossible, hard or slow in SQL
Currently: MS Access only
Other backends soon!

4 Data processing workflow
Into a relational database
ISI Parser (demo)
Word Splitter (demo)
Component Identifier
Record Grouper
(Pre defined) SQL
Relation Calculator (demo)
Matrix Builder (demo)
Pajek
Working on: GeoPlotter

5 Structure data: ISI Data Importer
Download set of articles from ISI Web of Knowledge
Selected on keywords, journals, authors, years, …
Import as many as you want
Optionally filter by type
Grouping authors can be turned off
Demo time…
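
The import step can be pictured with a small sketch. This is not SAINT's ISI Parser; it assumes the usual ISI plain-text export layout (a two-letter field tag, a space, then the value; indented continuation lines; `ER` closing a record and `EF` closing the file). The function name `parse_isi` and the sample data are hypothetical.

```python
def parse_isi(text):
    """Parse ISI tagged export text into a list of {tag: [values]} records.

    Assumed layout: 'TG value' lines, indented continuation lines,
    'ER' ends a record, 'EF' ends the file.
    """
    records, current, tag = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            continue                      # skip blank separator lines
        head, body = line[:2], line[3:].strip()
        if head == "ER":                  # end of record
            records.append(current)
            current, tag = {}, None
        elif head == "EF":                # end of file
            break
        elif head.strip():                # a new two-letter field tag
            tag = head
            current.setdefault(tag, []).append(body)
        elif tag is not None:             # continuation of the previous field
            current[tag].append(body)
    return records


# Made-up sample data, for illustration only.
SAMPLE = """PT J
AU Smith, J
   Jones, A
TI Mapping science with
   co-word analysis
PY 2008
ER

PT J
AU Jones, A
TI Networks of citation
PY 2009
ER

EF
"""

records = parse_isi(SAMPLE)
print(len(records), records[0]["AU"])
```

Each record keeps multi-line fields as a list, so a wrapped title can be rejoined with `" ".join(record["TI"])` before it is written to a database table.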

6 Refine data: Word Splitter
Split titles, abstracts, etc. into separate words
Optionally use stop word lists
Or even regular expressions
Result: table with words, and tables with data on which word is used where
Uses: Co-title word analysis, identify topics in a field, etc.
Demo time…
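
A minimal sketch of the word-splitting idea, not the Word Splitter itself: titles are tokenised with a regular expression, stop words are dropped, and the result is a word table plus a usage table linking records to words. The stop word list, the token pattern and the name `split_words` are illustrative assumptions.

```python
import re

# Illustrative stop word list; a real list would be much longer.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "with"}


def split_words(titles, stop_words=STOP_WORDS, pattern=r"[a-z]+(?:-[a-z]+)*"):
    """Return (words, usage): word -> word id, and (record id, word id) pairs."""
    words = {}   # the "table with words"
    usage = []   # the "which word is used where" table
    for rec_id, title in titles.items():
        for token in re.findall(pattern, title.lower()):
            if token in stop_words:
                continue
            word_id = words.setdefault(token, len(words) + 1)
            usage.append((rec_id, word_id))
    return words, usage


titles = {1: "Mapping the structure of science", 2: "Co-word analysis of science"}
words, usage = split_words(titles)
print(words)
print(usage)
```

Because both titles share the word "science", the usage table links records 1 and 2 to the same word id, which is the raw material for a co-title word analysis.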

7 Query data
Use one of the simple, pre-defined queries, or
Construct your own queries
  – Requires you to work with SQL
Or use the Calculator tool to create and run well-known analyses
  – Work in progress
  – Drag & drop construction of analysis
  – Store your analysis for re-use
  – Run-time selection of data sources
  – Multi-threaded: fast on multi-core computers
Demo time…

8 Output data: Matrix Compiler
Output data to a Pajek-readable format
Based on the assumption that:
  – One table or view/query contains the information on the relations you want to visualize in the network (edges or arcs)
  – Optionally (but recommended!) another table or query contains information about the nodes, like the labels
Different kinds of matrices supported
Output to .net matrix format (Pajek)
Output size limited by memory and disk space only
Demo time…
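
To make the two-table assumption concrete, here is a hedged sketch that turns a node table and a relation table into Pajek `.net` text. It writes the edge-list form (`*Vertices`, then `*Edges` or `*Arcs`) rather than a full matrix, and it is not the Matrix Compiler's code; `to_pajek_net` and the sample rows are hypothetical.

```python
def to_pajek_net(nodes, relations, directed=False):
    """Build Pajek .net text from node labels and weighted relations.

    nodes:     {node id: label}, e.g. rows of a node table
    relations: [(source id, target id, weight)], e.g. rows of a query
    """
    # Pajek numbers vertices 1..n, so map database IDs to that range.
    index = {node_id: i for i, node_id in enumerate(nodes, start=1)}
    lines = [f"*Vertices {len(nodes)}"]
    for node_id, label in nodes.items():
        lines.append(f'{index[node_id]} "{label}"')
    lines.append("*Arcs" if directed else "*Edges")
    for source, target, weight in relations:
        lines.append(f"{index[source]} {index[target]} {weight}")
    return "\n".join(lines) + "\n"


# Made-up co-authorship data for illustration.
nodes = {101: "Smith, J", 205: "Jones, A", 307: "Lee, K"}
relations = [(101, 205, 3), (205, 307, 1)]
net_text = to_pajek_net(nodes, relations)
print(net_text)
```

Keeping the node table separate is what lets the output carry readable labels instead of bare database IDs.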

9 Possible outputs
Basically anything that is supported by the data is possible.
Co-authorships
Co-citation relations
Clustering of authors based on their keyword usage
Clustering of journals based on the authors that publish in them, or vice versa
…
You come up with new ideas!
Salton, Cosine, Jaccard indices
All these can be expressed in SQL!
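
As an illustration of the claim that these indices can be expressed in SQL, the sketch below computes the Jaccard index, $|A \cap B| / |A \cup B|$, and Salton's cosine, $|A \cap B| / \sqrt{|A|\,|B|}$, for authors based on shared keywords. It uses Python's built-in `sqlite3` as a stand-in backend; the slides mention MS Access, whose SQL dialect differs. The table `author_keyword` and its rows are made up, and `sqrt` is registered from Python because not every SQLite build ships math functions.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.create_function("sqrt", 1, math.sqrt)  # ensure sqrt() is available in SQL
conn.executescript("""
CREATE TABLE author_keyword (author TEXT, keyword TEXT);
INSERT INTO author_keyword VALUES
  ('Smith', 'networks'), ('Smith', 'citation'), ('Smith', 'mapping'),
  ('Jones', 'networks'), ('Jones', 'citation'),
  ('Lee',   'policy');
""")

QUERY = """
WITH totals AS (              -- number of distinct keywords per author
    SELECT author, COUNT(DISTINCT keyword) AS n
    FROM author_keyword
    GROUP BY author
),
shared AS (                   -- number of keywords each author pair shares
    SELECT a.author AS a1, b.author AS a2,
           COUNT(DISTINCT a.keyword) AS n_shared
    FROM author_keyword AS a
    JOIN author_keyword AS b
      ON a.keyword = b.keyword AND a.author < b.author
    GROUP BY a.author, b.author
)
SELECT s.a1, s.a2,
       1.0 * s.n_shared / (t1.n + t2.n - s.n_shared) AS jaccard,
       s.n_shared / sqrt(1.0 * t1.n * t2.n)          AS salton
FROM shared AS s
JOIN totals AS t1 ON t1.author = s.a1
JOIN totals AS t2 ON t2.author = s.a2
ORDER BY s.a1, s.a2
"""

rows = conn.execute(QUERY).fetchall()
for a1, a2, jaccard, salton in rows:
    print(a1, a2, round(jaccard, 3), round(salton, 3))
```

Only pairs with at least one shared keyword appear in the result, so the output table is already a sparse edge list that could feed a network export.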

10 Many plans…
There are already more tools, such as:
  – Grouping records (like similar words, addresses, names…)
  – Identifying network components at different thresholds
  – Importing other data sources
  – Interact with BibTechMon
  – Calculator: calculate coefficients like Jaccard with ease
Plans for extensions to existing tools:
  – Matrix Compiler output including attributes
  – Have Record Grouper use Relation Calculator
  – Have Relation Calculator use GPU for calculations (CUDA)
New tools: Integrate into a shell, harvest book data, GeoPlotter…
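
One way to read "identifying network components at different thresholds" is: keep only the relations whose weight reaches a threshold, then find the connected components of what remains. The sketch below does this with a union-find structure; it is an assumption about the idea, not the Component Identifier's actual algorithm, and the function name and edge data are hypothetical.

```python
def components(edges, threshold):
    """Connected components using only edges with weight >= threshold."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, weight in edges:
        find(a)                            # register both nodes, even if
        find(b)                            # the edge is below the threshold
        if weight >= threshold:
            parent[find(a)] = find(b)      # merge the two components

    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    # Largest components first, ties broken alphabetically.
    return sorted(groups.values(), key=lambda g: (-len(g), sorted(g)))


# Made-up weighted relations for illustration.
edges = [("A", "B", 5), ("B", "C", 2), ("C", "D", 4), ("D", "E", 1)]
for threshold in (1, 3, 5):
    print(threshold, components(edges, threshold))
```

Raising the threshold breaks the single chain into progressively smaller groups, which is the effect one would look for when choosing a cut-off for clustering.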

11 Remember…
Demo worked with ISI data, but most tools are equally applicable to other structured data
  – Many tools require an integer-type ID field
  – Some other limitations apply
Recent version on your USB stick, but download the latest from the internet

12 Open & Free
Open source (GPL 3.0)
Open issue tracker, your input is very welcome!
Open source code repository (Git)
Free as in beer, free as in freedom, but please cite…
http://www.rathenau.nl/tools

13 Thank you
Questions?

14 Database structure

15 Result in Pajek

16 Title word – cited reference cooccurrence 1995-1998
Edwin Horlings and Peter van den Besselaar | Where is e-social science going?
Title word-cited reference combinations 1995-1998
Partitioned by domain using Pajek; top cluster, 814 nodes; Kamada Kawai, separate components, circular starting positions
Labels in the image:
  – cellular automata models for traffic simulation
  – game theory in physics and theoretical biology
  – simulation in chemistry (lattice gas simulation)
  – cellular automata in topics relating to computer science, chemistry, physics, biology, medicine
  – applications of neural networks and genetic algorithms; also learning in neural network and machine learning
  – interface between learning and agent-based modeling
  – some geography papers interspersed in CA (urban studies; spatial dynamics; land use)
  – interface between learning and neural networks (neural learning and control)
  – theoretical and technical heart of neural networks and genetic algorithms (math and computer science)
  – cellular automata applied to animal and human behaviour (self-organisation)
Image by Edwin Horlings

17 Title word-cited reference combinations 2005-2008
Partitioned by domain using Pajek; all connected clusters, 3,430 nodes; Kamada Kawai, separate components, circular starting positions
Labels in the image:
  – clear geography cluster using CA, neural networks, multi-agent systems
  – simulation in materials science
  – social network analysis and game theory
  – cellular automata models for traffic simulation, now including crowd behaviour
  – learning meets game theory and multi-agent analysis
  – applications of neural networks and genetic algorithms
  – multi-agent systems
Image by Edwin Horlings

18 Title word-cited reference combinations 2005-2008
Partitioned by domain using Pajek; all connected clusters, 3,430 nodes; Kamada Kawai, separate components, circular starting positions
Domains: physics; computer & information science; biology, ecology; economics; psychology; other social science
Image by Edwin Horlings

19 Journal citation environment 2007
Edwin Horlings and Peter van den Besselaar | Where is computational social science going?
Similarity between citation structures of journals mapped in 2D-space (Kamada-Kawai)
J Math Sociol, J Math Econ, Math Soc Sci, J Math Psych, J Econ Dyn Control
ISI, Journal Citation Reports, 0.5% threshold
Headings in the image: APPLICATION AREAS; general areas and problem-specific niches; TECHNICAL AND MATHEMATICAL FOUNDATIONS
Labels in the image: computer science; physics; fuzzy systems; Nature and PNAS; neuroscience; psychology 1; psychology 2; psychology 3; mathematical computer modeling; operational research; statistics; sociology; geography; finance; management and organisation; environmental economics; game theory; mathematical economics; econometrics; political science
Image by Edwin Horlings

