Tutorial 2: Using Matlab for network construction, ranking, clustering, topic modeling, and path finding
Erjia Yan

2
- Network construction
- Ranking
- Clustering
- Topic modeling
- Path finding

4  Bibliographical data

5  Paper-to-paper citation network is the base
- Web of Science cited references format: First Author, Year Of Publication, Abbreviated Journal Name, Volume Number, Beginning Page Number
- Example: AANESTAD M, 2011, J STRATEGIC INF SYST, V20, P161
- All fields can be found in the “full record + cited references” download option.
- Some of the newer records also include a DOI. For better matching, remove the DOI from the cited references.
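As a rough illustration of the format above, the following sketch assembles a cited-reference string from the fields of one citing paper and strips a trailing DOI field from a cited reference. The variable names are hypothetical, the DOI is a made-up placeholder, and the assumption that the DOI appears as a final ", DOI ..." field should be checked against your own download.

```matlab
% Hypothetical fields of one citing paper (values taken from the example above).
firstAuthor = 'Aanestad, M';
year        = 2011;
journalAbbr = 'J Strategic Inf Syst';
volume      = 20;
beginPage   = 161;

% Drop the comma after the surname and upper-case the name.
authorKey = upper(strrep(firstAuthor, ',', ''));

% Assemble "AUTHOR, YEAR, JOURNAL, Vvolume, Ppage".
citedRef = sprintf('%s, %d, %s, V%d, P%d', ...
                   authorKey, year, upper(journalAbbr), volume, beginPage);

% Assumption: a DOI, when present, is the last field and starts with "DOI".
% The DOI below is a placeholder, not a real identifier.
rawRef   = 'AANESTAD M, 2011, J STRATEGIC INF SYST, V20, P161, DOI 10.0000/example';
cleanRef = regexprep(rawRef, ',\s*DOI\s.*$', '');

disp(citedRef);
disp(cleanRef);
```

Once both sides are normalized this way, citing papers and cited references can be matched by plain string comparison.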

6  For citing papers, extract these fields and format them into the Web of Science cited reference format.
- Now the citing papers and the cited references have the same format.
- Using these two fields, construct an internal citation network that contains only those cited references that are themselves citing papers in the data set.

7  If you can write an app for this, it would be great! Otherwise, you can follow these instructions:
- Convert each record from the form
    CP1    CR1; CR2; CR3
  into citation pairs:
    CP1    CR1
    CP1    CR2
    CP1    CR3
- Use Access to construct the network:
  - Have a table for citing papers
  - Import the converted citation pairs into Access
  - Use a query to extract those pairs whose papers are in the table
- Now you have the node info and link info
- Import both into Matlab
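The same steps can also be sketched directly in Matlab, without Access. This is a minimal sketch under assumptions: the paper labels are placeholders, the cited references of each citing paper are assumed to be joined by semicolons, and the variable names are hypothetical.

```matlab
% Hypothetical input: one entry per citing paper, with its cited references
% joined by ";" (placeholder labels, not real records).
citing = {'P1'; 'P2'; 'P3'};
refs   = {'P2; X9'; 'P3; P1; X7'; ''};

% Expand each record into (citing, cited) pairs.
src = {};
dst = {};
for i = 1:numel(citing)
    if isempty(refs{i}), continue; end
    parts = strtrim(strsplit(refs{i}, ';'));
    for j = 1:numel(parts)
        src{end+1, 1} = citing{i};
        dst{end+1, 1} = parts{j};
    end
end

% Keep only pairs whose cited reference is itself a citing paper.
[isInternal, dstIdx] = ismember(dst, citing);
[~, srcIdx]          = ismember(src, citing);

% Sparse adjacency matrix: C(i,j) = 1 means paper i cites paper j.
n = numel(citing);
C = sparse(srcIdx(isInternal), dstIdx(isInternal), 1, n, n);
C = double(C > 0);   % collapse duplicate pairs to a single link

disp(full(C));
```

Here `citing` plays the role of the node table and the nonzero entries of `C` are the link info.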

8  Now we have paper-to-paper citation networks, but in order to construct, for instance, author-to-author citation or author co-citation networks, we need to use adjacency matrices.
- In the paper-author matrix, rows are papers and columns are authors; a cell value of 1 at (i,j) indicates that paper i is written by author j.
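One common way to combine the two matrices is through matrix products. The sketch below is illustrative only: the small matrices are hypothetical, and the co-citation count used here (the number of citing papers that cite both authors) is one of several possible definitions.

```matlab
% Hypothetical paper-to-paper citation matrix: C(i,j) = 1 means paper i cites paper j.
C = [0 1 0;
     1 0 1;
     0 0 0];

% Hypothetical paper-author matrix: A(i,j) = 1 means paper i is written by author j.
A = [1 1 0;
     0 1 0;
     0 0 1];

% Author-to-author citation: entry (a,b) counts citation links from
% papers written by author a to papers written by author b.
authorCit = A' * C * A;

% Author co-citation: first mark which authors each paper cites ...
citesAuthor = double(C * A > 0);
% ... then count, for each author pair, the papers that cite both.
authorCoCit = citesAuthor' * citesAuthor;
authorCoCit = authorCoCit - diag(diag(authorCoCit));   % ignore self-pairs

disp(authorCit);
disp(authorCoCit);
```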

9  Convert each record from the form
    ID1    AU1; AU2; AU3
into paper-author pairs:
    ID1    AU1
    ID1    AU2
    ID1    AU3
- Add the paper IDs (ID1, ID2, ……, IDn) to the beginning of the file
- Use Txt2Pajek on the linkage file
- Import the edge section of the file into Matlab
- Select M(1:n,n+1:m), where m is the column size. The selection is our author-paper adjacency matrix.
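To make the selection step concrete, the following sketch builds the full node-by-node matrix from a few paper-author pairs and then takes the M(1:n,n+1:m) block. It assumes that the papers are numbered 1..n and the authors n+1..m; the labels and variable names are placeholders, and the node numbering that Txt2Pajek actually produces should be checked before relying on this.

```matlab
% Hypothetical (paper, author) pairs with placeholder labels.
paperOfPair  = {'ID1'; 'ID1'; 'ID2'; 'ID3'};
authorOfPair = {'AU1'; 'AU2'; 'AU2'; 'AU3'};
paperIds     = {'ID1'; 'ID2'; 'ID3'};   % listed first, so they become nodes 1..n

% Number the nodes: papers first, then authors in sorted order.
authors = unique(authorOfPair);
labels  = [paperIds; authors];
n = numel(paperIds);
m = numel(labels);

[~, from] = ismember(paperOfPair, labels);
[~, to]   = ismember(authorOfPair, labels);

% Full m-by-m matrix, as it might look after importing the edge section.
M = sparse(from, to, 1, m, m);

% Rows 1..n are papers and columns n+1..m are authors.
A = full(M(1:n, n+1:m));
disp(A);
```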




14  By David Gleich of Purdue University
- xchange/11613-pagerank
- pagerank(M,options)
- options.c: the teleportation coefficient [double | {0.85}]
- options.v: the personalization vector [vector | {uniform: 1/n}]
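To show what the two options control, here is a minimal power-iteration sketch of PageRank. It is not the code from the package above; the graph, the variable names, and the handling of nodes without out-links are assumptions made for illustration.

```matlab
% Small hypothetical directed graph: M(i,j) = 1 means node i links to node j.
M = [0 1 1;
     0 0 1;
     1 0 0];
n = size(M, 1);
c = 0.85;              % teleportation coefficient (same default value as above)
v = ones(n, 1) / n;    % uniform personalization vector

% Row-normalize to get transition probabilities; flag nodes without out-links.
outdeg   = sum(M, 2);
dangling = (outdeg == 0);
outdeg(dangling) = 1;              % avoid division by zero
P = diag(1 ./ outdeg) * M;

x = v;
for iter = 1:1000
    % Follow links with probability c; teleportation and the mass of
    % dangling nodes are redistributed according to v.
    xNew = c * (P' * x) + (c * sum(x(dangling)) + (1 - c)) * v;
    converged = norm(xNew - x, 1) < 1e-10;
    x = xNew;
    if converged, break; end
end

disp(x');
```

A non-uniform `v` biases the ranking toward the nodes it favors, and a smaller `c` makes the scores closer to `v`.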

16
- K-means
  - IDX = kmeans(X,k)
  - ml
- Hierarchical clustering
  - -clustering.html

17  By MIT Strategic Engineering
- matlab_networks
- [modules,module_hist,Q] = newmangirvan(adj,k)
- [groups_hist,Q] = newman_comm_fast(adj)
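Both functions return a modularity value Q. As a reminder of what that number measures, the sketch below computes Newman's modularity for one given partition of a small undirected graph. It is an independent illustration, not the toolbox code; the graph and the partition are hypothetical.

```matlab
% Hypothetical undirected graph: two triangles joined by one edge.
adj   = zeros(6);
edges = [1 2; 1 3; 2 3; 4 5; 4 6; 5 6; 3 4];
for e = 1:size(edges, 1)
    adj(edges(e, 1), edges(e, 2)) = 1;
    adj(edges(e, 2), edges(e, 1)) = 1;
end
groups = [1 1 1 2 2 2];      % candidate partition to score

k    = sum(adj, 2);          % node degrees
twoM = sum(k);               % twice the number of edges

% same(i,j) is true when nodes i and j are in the same group.
same = bsxfun(@eq, groups', groups);

% Q = (1/2m) * sum over same-group pairs of (A_ij - k_i*k_j/2m)
B = adj - (k * k') / twoM;
Q = sum(B(same)) / twoM;

fprintf('Q = %.4f\n', Q);
```

Higher Q means more edges fall inside groups than a random graph with the same degrees would be expected to have.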

18  By Nees van Eck and Ludo Waltman of Leiden University
- A variant of the modularity-based clustering technique
- [X, cluster_size, V] = VOS_clustering(A, P)

20  By Mark Steyvers of University of California Irvine
- a/toolbox.htm
- Input: a bag-of-words representation containing the number of times each word occurs in each document.
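A bag-of-words representation can be sketched as a document-by-word count matrix. The toy corpus and variable names below are hypothetical, and the exact input variables the toolbox expects are not shown here, so treat this only as an illustration of the counting step.

```matlab
% Hypothetical toy corpus: each cell holds one tokenized document.
docs = {{'network', 'citation', 'network'}, ...
        {'topic', 'model', 'citation'}};

% Vocabulary: sorted unique words across all documents.
allWords = [docs{:}];
vocab    = unique(allWords);

% counts(d, w) = number of times word w occurs in document d.
counts = zeros(numel(docs), numel(vocab));
for d = 1:numel(docs)
    [~, idx] = ismember(docs{d}, vocab);
    counts(d, :) = accumarray(idx(:), 1, [numel(vocab), 1])';
end

disp(vocab);
disp(counts);
```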

22  aphshortestpath.html
- [dist, path, pred] = graphshortestpath(G,S,T): find the shortest path from S to T in graph G
- [dist] = graphallshortestpaths(G): find all shortest paths in graph G; dist is a matrix of shortest-path distances between each pair of nodes
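If these functions are not available in your installation, the all-pairs distance matrix can be approximated with a few lines of base Matlab. The sketch below uses the Floyd-Warshall algorithm on a hypothetical weighted directed graph; it is a stand-in for illustration, not the implementation behind the functions above.

```matlab
% Hypothetical weighted directed graph: G(i,j) > 0 is the length of edge i -> j.
G = [0 1 4 0;
     0 0 2 6;
     0 0 0 3;
     0 0 0 0];
n = size(G, 1);

% Start from direct edges; missing edges are Inf, self-distances are 0.
dist = G;
dist(G == 0) = Inf;
dist(1:n+1:end) = 0;

% Floyd-Warshall: allow each node k in turn as an intermediate stop.
for k = 1:n
    viaK = repmat(dist(:, k), 1, n) + repmat(dist(k, :), n, 1);
    dist = min(dist, viaK);
end

disp(dist);
```

Entry (i,j) of `dist` is the length of the shortest path from node i to node j, with Inf marking unreachable pairs.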

