Mapping science using Bibexcel and Pajek By Olle Persson
Relations
Units of analysis
- document level
- aggregated level: authors, universities, countries, journals …
Citation-based relations
- direct citations
- shared references
- co-citations
Co-occurrences
- co-authorships
- co-word
Citation-based relations between documents
A cites C = direct citation
A and C both cite B = bibliographic coupling
A and C are both cited by D = co-citation
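The three relations can be sketched in a few lines of Python. This is only an illustration on made-up toy data (the documents A, B, C, D and the function names are assumptions for this example), not something Bibexcel exposes.

```python
# Hypothetical toy data: each document maps to the set of documents it cites.
cites = {
    "A": {"B", "C"},
    "C": {"B"},
    "D": {"A", "C"},
}

def direct_citation(x, y):
    """True if x cites y."""
    return y in cites.get(x, set())

def bibliographic_coupling(x, y):
    """References shared by x and y."""
    return cites.get(x, set()) & cites.get(y, set())

def co_citation(x, y):
    """Documents that cite both x and y."""
    return {d for d, refs in cites.items() if x in refs and y in refs}

print(direct_citation("A", "C"))
print(bibliographic_coupling("A", "C"))
print(co_citation("A", "C"))
```

With this data A cites C directly, A and C share the reference B, and A and C are cited together by D.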
Similarity measures
Frequencies (raw counts)
- n of direct citations
- n of co-occurrences
- n of shared references
Normalized measures
- Salton’s index
- Jaccard’s index
- Pearson’s correlation
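As a rough illustration of two of the normalized measures, here is a minimal sketch using the usual textbook definitions: Salton's index as the co-occurrence count divided by the geometric mean of the two totals, and Jaccard's index as the co-occurrence count divided by the size of the union. The function and parameter names are assumptions for this example, and whether Bibexcel computes the measures in exactly this way is not stated in the slides.

```python
import math

def salton(c_ij, c_i, c_j):
    """Salton's (cosine) index: c_ij / sqrt(c_i * c_j)."""
    return c_ij / math.sqrt(c_i * c_j)

def jaccard(c_ij, c_i, c_j):
    """Jaccard's index: c_ij / (c_i + c_j - c_ij)."""
    return c_ij / (c_i + c_j - c_ij)

# Hypothetical counts: two papers cited 20 and 50 times, co-cited 10 times.
print(round(salton(10, 20, 50), 3))
print(round(jaccard(10, 20, 50), 3))
```

Both functions assume positive counts; a pair that never occurs together simply gets the value 0.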
Mapping science
- Preparing data
- Calculating measures
- Making maps
Good if you have some experience with Pajek. You will learn the basics of Bibexcel in this tutorial!
You will need this material
A set of data
Bibexcel software
Pajek
Reading material 1st chapter in:
Preparing data
1. Convert to Dialog format
Topic=(co-citation* OR cocitation*) Databases=SCI-EXPANDED, SSCI, A&HCI Timespan=All Years.
We have already searched and downloaded 569 records on co-citation analysis from Web of Science, and we have already replaced line feeds with carriage returns in the downloaded file using Bibexcel: Edit doc-file/Replace line feed with carriage return.
The file to be used is cocit569.tx2.
Put Bibexcel.exe in c:\Bibexcel and cocit569.tx2 in c:\Bibexcel\Data.
Start Bibexcel.exe. Next we have to convert the file to the Dialog format that Bibexcel is designed for.
You can open Bibexcel and make all steps in this presentation!
Select the cocit569.tx2 file and run Misc/Convert to Dialog format/Convert from Web of Science
Select cocit569.doc and press View file
Two letter field tag
; = Separates units
| = End of field
|| = End of record
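To make the delimiters concrete, here is a minimal parsing sketch. It assumes that each field looks like a two-letter tag followed by "- " and the content (for example "AU- Name|"); that separator, the sample records and the function name are assumptions for illustration, so check them against your own doc-file.

```python
def parse_dialog(text):
    """Split Dialog-style text into records: {tag: [unit, unit, ...]}."""
    records = []
    for raw in text.split("||"):            # || = end of record
        record = {}
        for field in raw.split("|"):        # |  = end of field
            field = field.strip()
            if len(field) < 2:
                continue
            tag = field[:2]                 # two letter field tag
            body = field[2:].lstrip("- ").strip()
            # ; separates units inside a field
            record[tag] = [u.strip() for u in body.split(";") if u.strip()]
        if record:
            records.append(record)
    return records

# Hypothetical sample with two records.
sample = (
    "AU- Alpha A|CD- Beta B, 1963, V14, P10; Gamma C, 1955, V122, P108||"
    "AU- Delta D; Epsilon EF|CD- Alpha A, 1973, V24, P265||"
)
for rec in parse_dialog(sample):
    print(rec)
```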
2. Extracting data from CD- field (cited documents)
Let’s start! Put tag here Units are separated by semicolon
cocit569.out has the cited documents
This is the reference list of doc nr 1
3. Refining the out-file
To improve data quality the Edit out-files menu has several options. For example, you may wish to reduce variation by only allowing the 1st initial in author names.
Select cocit569.out and run Edit out-files/Keep only author’s first initial
Look at cocit569.1st and you can see that EOM SB is changed to EOM S
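The effect of this step can be imitated with a small sketch. It assumes a cited reference string starts with "SURNAME INITIALS" followed by a comma, as in the EOM SB example; the function name and the sample strings are made up for illustration and this is not Bibexcel's own code.

```python
def keep_first_initial(ref):
    """Cut the author's initials down to the first one, e.g. EOM SB -> EOM S."""
    author, sep, rest = ref.partition(",")
    parts = author.split()
    if len(parts) >= 2:
        parts[-1] = parts[-1][0]   # assumes the last token holds the initials
    return " ".join(parts) + sep + rest

print(keep_first_initial("EOM SB"))
print(keep_first_initial("EOM SB, 1996, J HYPOTHETICAL, V1, P1"))
```

Merging EOM SB and EOM S reduces spelling variation, at the price of sometimes merging different authors who share a surname and first initial.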
Let’s improve a little bit more: Select cocit569.1st and run Edit outfiles/Convert Upper lower Case/Good for Cited reference strings
Look at cocit569.low. I think this looks much nicer compared to the out-file!
Calculating data
1. Looking at frequencies
Select cocit569.low. Tick here Choose Whole string Press Start!
Look at cocit569.cit which has the cited references in decreasing frequency! For anyone familiar with co-citation research, the top 3 papers shouldn’t come as a surprise.
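What the cit-file contains can be sketched as a plain frequency count. The example below assumes that a reference is counted once per citing document; the toy reference names R1 to R3 and the function name are hypothetical.

```python
from collections import Counter

def cited_frequencies(docs):
    """Count in how many documents each cited reference appears,
    sorted by decreasing frequency."""
    counts = Counter(ref for refs in docs for ref in set(refs))
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical reference lists of three citing documents.
docs = [["R1", "R2"], ["R1", "R3"], ["R1", "R2"]]
for ref, n in cited_frequencies(docs):
    print(n, ref)
```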
2. Making co-citations
Select the cocit569.cit-file and press View file. In The List, mark cited references down to frequency=30 and then press Copy, then Clear and then Paste. These are the references for which you want co-citations.
Select the cocit569.low-file, and run Analyze/Co-occurrence/Make pairs via listbox, and answer No to the next question, and OK for the question after that!
The cocit569.coc has the co-citation frequencies. We will use that file for mapping!
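Making pairs via the listbox amounts to counting, for each citing document, every pair of selected references that appear together in its reference list. A minimal sketch on toy data (the names are hypothetical, and Bibexcel's own implementation may differ in detail):

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(docs, selected):
    """Count how often two selected references occur in the same reference list."""
    selected = set(selected)
    pairs = Counter()
    for refs in docs:
        kept = sorted(set(refs) & selected)   # only references from The List
        pairs.update(combinations(kept, 2))   # every unordered pair once per document
    return pairs

docs = [["R1", "R2"], ["R1", "R3"], ["R1", "R2"]]
print(cocitation_counts(docs, ["R1", "R2", "R3"]))
```

Here R1 and R2 are co-cited twice, R1 and R3 once, and R2 and R3 never.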
Select cocit569.coc and run Mapping/Create net-file for Pajek … be sure to answer No to the question about directed arcs, since co-citation links have no direction.
The file can be opened from within Pajek, NetDraw, MapEquation, etc. for drawing maps.
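For readers curious about what a net-file looks like, here is a small sketch that writes undirected pairs in Pajek's plain-text format: a *Vertices section with numbered labels, followed by an *Edges section with "from to value" lines. The function name and toy pairs are assumptions, and the exact layout Bibexcel writes may differ.

```python
def to_pajek_net(pairs):
    """Turn {(label_a, label_b): weight} into the text of an undirected Pajek net-file."""
    labels = sorted({v for pair in pairs for v in pair})
    index = {label: i for i, label in enumerate(labels, start=1)}
    lines = [f"*Vertices {len(labels)}"]
    lines += [f'{i} "{label}"' for label, i in index.items()]
    lines.append("*Edges")   # use *Arcs instead for directed links
    lines += [f"{index[a]} {index[b]} {w}" for (a, b), w in sorted(pairs.items())]
    return "\n".join(lines) + "\n"

print(to_pajek_net({("R1", "R2"): 12, ("R1", "R3"): 4}))
```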
Mapping with Pajek
Open file in Pajek, and then Draw/Draw
This is the first layout with randomly ordered nodes. To the upper left, choose Layout/Energy/Kamada-Kawai/Separate components or just press Ctrl-K
The Kamada-Kawai layout is better, but there are perhaps still too many lines in the graph, since almost every node is connected to all the others
To reduce complexity minimize the draw window and then run Net/Transform/Remove/Lines with Value/lower than/ and put 10 in the box and answer yes to Make new network. After that run Draw/Draw again!
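The removal step is just a threshold on line values. A tiny sketch with made-up edges, where the function name is hypothetical:

```python
def remove_lines_below(edges, threshold):
    """Keep only lines whose value is not lower than the threshold."""
    return [(a, b, w) for a, b, w in edges if w >= threshold]

edges = [("R1", "R2", 12), ("R1", "R3", 4), ("R2", "R3", 10)]
print(remove_lines_below(edges, 10))
```

A line with value exactly 10 survives, since only values lower than 10 are removed.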
This map has more structure. We find older papers to the left and newer ones to the right. You can press Ctrl-K several times to see what happens
Making vectors
Making circles on nodes based on citation frequencies. Go to Bibexcel and select cocit569.cit and then run Mapping/Create vec-file. Below you can see that cocit569.vec is created
Go back to Pajek. Open the Vector file cocit569.vec
and then run Draw/Draw-Vector
Now you can see that circles correspond to n of citations
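A Pajek vec-file is a plain list of one value per vertex, in the same order as the vertices of the net-file. The sketch below shows the idea; the function name, labels and frequencies are made up, and Bibexcel takes care of the vertex order for you.

```python
def to_pajek_vec(labels, freq):
    """Write one value per vertex, following the vertex order of the net-file."""
    lines = [f"*Vertices {len(labels)}"]
    lines += [str(freq.get(label, 0)) for label in labels]
    return "\n".join(lines) + "\n"

labels = ["R1", "R2", "R3"]                # vertex order of the net-file
freq = {"R1": 120, "R2": 45, "R3": 31}     # hypothetical citation frequencies
print(to_pajek_vec(labels, freq))
```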
Making partitions
If you wish, you can create a clu-file using Bibexcel that indicates the publication year, or decade, of the cited documents.
Select cocit569.cit and run Edit out-file/Extract publication year from references and you will get a file named cocit569.dpy.
Select cocit569.dpy and run Mapping/Create clu-file and you will get a file named cocit569.clu.
Go to Pajek and open cocit569.clu as partition.
Run Draw/Draw-Partition-Vector and then, in the draw window, Layers/In y direction.
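The two steps, pulling a year out of each cited reference and writing one class number per vertex, can be sketched as follows. The regular expression, the function names and the sample references are assumptions for illustration; a Pajek clu-file itself is simply a *Vertices line followed by one integer per vertex.

```python
import re

def publication_year(ref):
    """Return the first four-digit year (1500-2099) found in a reference, or 0."""
    m = re.search(r"\b(1[5-9]\d\d|20\d\d)\b", ref)
    return int(m.group(1)) if m else 0

def to_pajek_clu(refs, by_decade=False):
    """One class number (year or decade) per vertex, in vertex order."""
    years = [publication_year(r) for r in refs]
    if by_decade:
        years = [y // 10 * 10 for y in years]
    return "\n".join([f"*Vertices {len(refs)}"] + [str(y) for y in years]) + "\n"

# Hypothetical cited reference strings.
refs = ["Alpha A, 1973, J Hypothetical, V24, P265", "Beta B, 1981, V32, P163"]
print(to_pajek_clu(refs))
print(to_pajek_clu(refs, by_decade=True))
```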
Makes sense?
Using Options/Lines/Different Widths and GreyScale and Options/Size/Of lines = 0.25
This could be a chronological reading list for reviewers and students
Bibexcel makes so many files….
cocit569.tx2: text-file where LF was replaced by CR
cocit569.doc: converted to Dialog-format
cocit569.out: out-file based on CD-field
cocit569.1st: keep only author’s first initial
cocit569.low: convert to upper and lower case
cocit569.cit: frequencies
cocit569.coc: co-occurrences
net-file: to be opened in Pajek
cocit569.vec: vec-file to be opened as Vectors in Pajek
cocit569.clu: clu-file to be opened as Partitions in Pajek
cocit569.vel: vertices for net-file for use by Bibexcel
…. but better to have them than not!
All author co-citation analysis using Scopus records
“It’s always better not to limit to 1st cited author as in WoS”
Get scopuscocit.ris from
Select scopuscocit.ris and run Edit doc-file/Replace line feed with carriage return
Select scopuscocit.tx2 and run Misc/Convert to Dialog format/Convert from Scopus RIS format
Select scopuscocit.doc, put CD in Old tag, choose “Any ; separated field” and press Prep
Select scopuscocit.out and run Edit out-file/Scopus tools/Extract all authors from Scopus references
Select scopuscocit.sco and run Edit out-file/Decompress outfile
Select scopuscocit.nnu, choose Whole string, mark Remove duplicates and Make new out-file, and then press Start
Select scopuscocit.oux, mark Sort descending and press Start
Select scopuscocit.cit and press View file and select units down to frequencies=30, and be sure only these are in The List
Select scopuscocit.oux and run Analyze/Co-occurrences/Make pairs via list box
Select the scopuscocit.coc file and then run Mapping/Create net-file for Pajek…
Select scopuscocit.cit and run Mapping/Create vec-file
Go to Pajek and open as Network and scopuscocit.vec as Vectors
Run Draw/Draw-Vector…
To reduce complexity minimize the draw window and then run Net/Transform/Remove/Lines with Value/lower than/ and put 10 in the box and answer yes to Make new network. After that run Draw/Draw-Vector and then Ctrl-K
Griffith BC would probably not show up in 1st author analysis
Webometrics
Go back and fix this variant!
For vector graphic quality. At the Draw window run Export/2D/SVG/General and save as allauthormap.htm Get Inkscape free from and open allauthormap.htm, edit and export to png-format
Analyzing direct citations on Web of Science records
Select cocit569.low and run Analyze/Citations among docs/Make citation links. This will make cocit569.lin, which has the citing docnr in the first column and the cited docnr in the second column.
Of course you need to label the doc numbers. Select cocit569.ddc and double-click in the box at “Type new file name here” and the path to cocit569.ddc should appear.
Select cocit569.lin and run Add data classify/Add labels to docnr-docnr pairs. Answer No to the questions about swapping, self-related pairs, overlapping sets, and about writing doc numbers in addition to labels.
Select cocit569.add and then run Mapping/Create net-file for Pajek and answer Yes for directed graphs!
Open in Pajek and Draw/Draw.
You will need to reduce complexity: Run Net/Transform/Reduction/Degree/Input and set value=15. Then Draw!
If you would like to have different circle sizes: Minimize the Draw window and then run Net/Vector/Summing up values of lines/Input. A vector is created that has the number of inlinks to each node. Then Draw/Draw-vector…
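The idea behind the citation links can be sketched like this: a link exists when one document's cited reference matches the label of another document in the set. The sketch assumes exact string matching and drops self-links; both choices, the function names and the toy data are assumptions, and Bibexcel's matching rules may be more elaborate.

```python
from collections import Counter

def citation_links(doc_labels, doc_refs):
    """Return (citing docnr, cited docnr) pairs among the documents in the set."""
    number_of = {label: nr for nr, label in doc_labels.items()}
    links = []
    for citing, refs in doc_refs.items():
        for ref in refs:
            cited = number_of.get(ref)
            if cited is not None and cited != citing:   # skip self-links (assumption)
                links.append((citing, cited))
    return links

def inlinks(links):
    """Number of inlinks per cited document, usable as circle sizes."""
    return Counter(cited for _, cited in links)

# Hypothetical labels (how each document would look as a cited reference).
doc_labels = {1: "Alpha A, 1973, V1, P1", 2: "Beta B, 1981, V2, P2", 3: "Gamma C, 1990, V3, P3"}
doc_refs = {
    1: ["Other X, 1960, V9, P9"],
    2: ["Alpha A, 1973, V1, P1"],
    3: ["Alpha A, 1973, V1, P1", "Beta B, 1981, V2, P2", "Other X, 1960, V9, P9"],
}
links = citation_links(doc_labels, doc_refs)
print(links)
print(inlinks(links))
```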
Analyzing using Weighted Direct Citations (WDC)
We can add the number of shared outlinks and inlinks to each direct citation, to give each direct citation a different strength.
Select cocit569.lin and run Analyze/Citations among docs/Weighted Direct Citations (WDC). The cocit569.wdc file has the WDC values for each docnr-docnr pair.
Again you need to label the doc numbers. Select cocit569.ddc and double-click in the box at “Type new file name here” and the path to cocit569.ddc should appear.
Select cocit569.wdc and run Add data classify/Add labels to freq-docnr-docnr/making freq-label-label. Answer No to the questions about swapping, self-related pairs, and overlapping sets.
Select the cocit569.cdd file and run Edit out-file/Sort numeric/Descending by first column and you will see which are the strongest links by the WDC measure.
Select cocit569.cdd and run Mapping/Create net-file for Pajek, and answer Yes for directed arcs!
In Pajek use Net/Transform/Remove/Lines with Values/Lower than=10! Then Draw/Draw and you will see one big network component, several smaller ones and quite a few isolates.
You can zoom in on the bigger one by pressing the right mouse button and drawing.
If you go back to the Pajek main window and run Net/Components/Weak and type size=20 you will get 1 component, and then with Operations/Extract from network/Partition=1 you will get a new network with the big component. Then Draw that network!
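One plausible reading of the WDC idea is sketched below: each direct citation counts as 1, plus the number of outlinks the two documents share, plus the number of inlinks they share. The base value of 1 and the equal weighting are assumptions made for this illustration; this is not necessarily the exact formula Bibexcel uses.

```python
def weighted_direct_citations(links):
    """links: (citing, cited) pairs. Returns {(citing, cited): weight}."""
    out, inn = {}, {}
    for citing, cited in links:
        out.setdefault(citing, set()).add(cited)   # outlinks of each document
        inn.setdefault(cited, set()).add(citing)   # inlinks of each document
    wdc = {}
    for citing, cited in links:
        shared_out = out.get(citing, set()) & out.get(cited, set())
        shared_in = inn.get(citing, set()) & inn.get(cited, set())
        # Assumed weighting: 1 for the direct link plus the shared links.
        wdc[(citing, cited)] = 1 + len(shared_out) + len(shared_in)
    return wdc

# Hypothetical docnr-docnr links.
links = [(2, 1), (3, 1), (3, 2), (4, 2), (4, 3)]
for pair, value in weighted_direct_citations(links).items():
    print(pair, value)
```

In this toy set the link from 3 to 2 is the strongest: both documents cite 1 and both are cited by 4.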
…further improvement by saving major component and adding new partitions and vectors
Be sure to mark the main component (with 63 nodes).
Then File/Network/Save and then overwrite.
In Bibexcel select the and run Mapping/Create vel-file from net-file
Select the cocit569.ddc file and run Edit out-file/Extract publication year from references
Select cocit569.dpy and run Mapping/Create clu-file
Open cocit569.clu as Partition in Pajek and then Draw/Draw-partition and then Layers/In y direction
If you would like to have different circle sizes: Minimize the Draw window and then run Net/Vector/Summing up values of lines/Input. A vector is created that has the sum of WDC values of inlinks to each node. Then Draw/Draw-Partition-Vector…
…reduce direct citations by citation year lag
Select cocit569.cdd and run Analyze/Calculate year lags in pairs and answer Yes to add year lag values, which will come in column 1. Column 2 has a normalization (col. 3 divided by col. 3), col. 3 has the WDC value, col. 4 the citing doc and col. 5 the cited doc.
Select cocit569.lag and, to get year lags of 0-2 years, put 2 in the Max number box and then run Edit out-files/Delete values high frequencies
Select cocit569.max, put 3/4/5 in The Box and run Edit out-file/Select columns
Now cocit569.col has WDC values only for links no older than 2 years!
Select cocit569.col and run Mapping/Create net-file for Pajek
Go to Pajek and open the net-file and the vec-file!
Remove lines with values less than 5, then Net/Components/Weak (min 20), then extract and save the major component to file
In Bibexcel, select cocit569.cdd, put 1/3 in The Box and run Edit out-files/Select columns, and then select cocit569.col and make frequencies with whole string; then cocit569.cit will have the number of times a paper is cited.
In Bibexcel select and run Mapping/Create vel-file from net-file and then select cocit569.cit and run Mapping/Create vec-file
Back to Pajek and open the vec-file, and then Draw/Draw-vector
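The year lag filter can be sketched as follows: the lag is the publication year of the citing document minus that of the cited one, and only links with a lag from 0 up to the maximum are kept. The function name, the column order of the output and the toy data are assumptions for illustration.

```python
def keep_recent_links(links, years, max_lag):
    """links: (citing, cited, value). Keep links with 0 <= year lag <= max_lag."""
    kept = []
    for citing, cited, value in links:
        lag = years[citing] - years[cited]
        if 0 <= lag <= max_lag:
            kept.append((lag, value, citing, cited))
    return kept

# Hypothetical publication years and WDC values.
years = {"A": 2005, "B": 2004, "C": 1995}
links = [("A", "B", 12), ("A", "C", 30), ("B", "C", 7)]
print(keep_recent_links(links, years, 2))
```

Only the link from A to B survives here, because the other two reach back about ten years.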
Time dimension is here!
…also, you can reduce co-citations by citation year lag
Select cocit569.coc and run Analyze/Calculate year lags in pairs and answer Yes to add year lag values
Select cocit569.lag and, to get year lags of 0-5 years, put 5 in the Max number box and then run Edit out-files/Delete values high frequencies
Select cocit569.max, put 1/4/5 in The Box and run Edit out-file/Select columns
Now cocit569.col has co-citation values only for pairs no older than 5 years!
Select cocit569.col and run Mapping/Create net-file for Pajek
Also select cocit569.cit and run Mapping/Create vec-file
Go to Pajek and open the net-file and the vec-file!
The same graph as previous, but now ordered in year layers and edited using Inkscape
The End