Network Pajek.


Introduction
Pajek is a Windows program for the analysis and visualization of large networks with thousands or even millions of vertices. In Slovenian, the word pajek means spider.

Application
Pajek provides tools for the analysis and visualization of networks such as: collaboration networks, organic molecules in chemistry, protein-receptor interaction networks, genealogies, Internet networks, citation networks, diffusion (AIDS, news, innovations) networks, data-mining (2-mode networks), etc.
See also the collection of large networks at: http://vlado.fmf.uni-lj.si/pub/networks/data/

Main goals
to support abstraction by (recursive) decomposition of a large network into several smaller networks that can be treated further using more sophisticated methods;
to provide the user with some powerful visualization tools;
to implement a selection of efficient (subquadratic) algorithms for analysis of large networks.

Six data structures in Pajek
network: main object (vertices and lines: arcs, edges); graph, valued network, 2-mode or temporal network.
partition: nominal property of vertices. Default extension: .clu
vector: numerical property of vertices. Default extension: .vec
permutation: reordering of vertices. Default extension: .per
cluster: subset of vertices (e.g. a class from a partition). Default extension: .cls
hierarchy: hierarchically ordered clusters and vertices. Default extension: .hie

Network – .net
A network can be defined in different ways in the input file. We look at three of them:
1. List of neighbours (Arcslist / Edgeslist) (see test 1.net)
*Vertices 5
1 "a"
2 "b"
3 "c"
4 "d"
5 "e"
*Arcslist
1 2 4
2 3
3 1 4
4 5
*Edgeslist
1 5

Explanation
Data must be prepared in an input (ASCII) file. NotePad can be used for editing; a much better choice is the shareware editor TextPad.
Words starting with * must always be written in the first column of the line. They indicate the start of a definition of vertices or lines.
Using *Vertices 5 we define a network with 5 vertices. This must always be the first statement in the definition of a network.
The definition of vertices follows after that: each vertex is given a label, which is written between double quotes.
Using *Arcslist, a list of directed lines from selected vertices is declared (1 2 4 means that there are two lines from vertex 1, one to vertex 2 and another to vertex 4).
Similarly, *Edgeslist declares a list of undirected lines from a selected vertex.
No empty lines are allowed in the file: an empty line means the end of the network.
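The rules above can be tried out with a small reader. This is a minimal Python sketch, not Pajek's own parser: the function name parse_pajek_lists is hypothetical, and it assumes that vertex labels contain no spaces.

```python
def parse_pajek_lists(text):
    """Parse a network given as *Arcslist / *Edgeslist sections.

    Returns (labels, arcs, edges): labels maps vertex number to label,
    arcs holds directed (source, target) pairs, edges undirected pairs.
    Assumption of this sketch: labels contain no spaces.
    """
    labels, arcs, edges = {}, [], []
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            break  # an empty line means the end of the network
        if line.startswith("*"):
            section = line.split()[0].lower()
            continue
        parts = line.split()
        if section == "*vertices":
            labels[int(parts[0])] = parts[1].strip('"')
        elif section in ("*arcslist", "*edgeslist"):
            source = int(parts[0])
            pairs = [(source, int(target)) for target in parts[1:]]
            (arcs if section == "*arcslist" else edges).extend(pairs)
    return labels, arcs, edges


example = """*Vertices 5
1 "a"
2 "b"
3 "c"
4 "d"
5 "e"
*Arcslist
1 2 4
2 3
3 1 4
4 5
*Edgeslist
1 5
"""

labels, arcs, edges = parse_pajek_lists(example)
print(labels)
print(arcs)
print(edges)
```

Each line after *Arcslist is expanded into one pair per listed neighbour, which is why this format cannot carry line values.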

Network – .net
2. Pairs of lines (Arcs / Edges) (see test 2.net)
*Vertices 5
1 "a"
2 "b"
3 "c"
4 "d"
5 "e"
*Arcs
1 2 1
1 4 1
2 3 2
3 1 1
3 4 2
4 5 1
*Edges
1 5 1

Explanation
Directed lines are defined using *Arcs, undirected lines are defined using *Edges.
The third number in the rows defining arcs/edges gives the value/weight of the arc/edge.
In the previous format (Arcslist / Edgeslist) values of lines are not defined, so that format is suitable only if all values of lines are 1.
If values of lines are not important, the third number can be omitted (all lines get value 1).
No empty lines are allowed in the file: an empty line means the end of the network.
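Going the other way, a file in the Arcs / Edges format can be generated from data held in a program. The sketch below is an illustration only; the function name to_pajek_pairs and the in-memory layout (label dictionary plus lists of (from, to, value) triples) are assumptions, not part of Pajek.

```python
def to_pajek_pairs(labels, arcs, edges):
    """Build the text of a .net file in the *Arcs / *Edges format.

    labels: dict vertex number -> label
    arcs, edges: lists of (from, to, value) triples
    """
    lines = [f"*Vertices {len(labels)}"]
    for number in sorted(labels):
        lines.append(f'{number} "{labels[number]}"')
    lines.append("*Arcs")
    lines += [f"{u} {v} {value}" for u, v, value in arcs]
    lines.append("*Edges")
    lines += [f"{u} {v} {value}" for u, v, value in edges]
    # No empty line is written inside the network definition.
    return "\n".join(lines) + "\n"


labels = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}
arcs = [(1, 2, 1), (1, 4, 1), (2, 3, 2), (3, 1, 1), (3, 4, 2), (4, 5, 1)]
edges = [(1, 5, 1)]

net_text = to_pajek_pairs(labels, arcs, edges)
print(net_text)
```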

Network – .net
3. Matrix (see test 3.net)
*Vertices 5
1 "a"
2 "b"
3 "c"
4 "d"
5 "e"
*Matrix
0 1 0 1 1
0 0 2 0 0
1 0 0 2 0
0 0 0 0 1
1 0 0 0 0

Explanation
In this format directed lines (arcs) are given in matrix form (*Matrix): the entry in row i and column j is the value of the arc from vertex i to vertex j, and 0 means no arc.
If we want to transform bidirected arcs to edges, we can use “Network>Create New Network>Transform>Arcs to Edges>Bidirected only”.
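The relation between the matrix and the arcs, and the idea behind "Arcs to Edges > Bidirected only", can be sketched in Python. The function names are hypothetical. Taking the minimum of the two arc values as the edge value is an arbitrary choice for this sketch, so it should not be read as a description of what Pajek does with the values.

```python
matrix = [
    [0, 1, 0, 1, 1],
    [0, 0, 2, 0, 0],
    [1, 0, 0, 2, 0],
    [0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0],
]


def matrix_to_arcs(matrix):
    """Row i, column j holds the value of the arc i -> j (0 = no arc)."""
    return [
        (i + 1, j + 1, value)
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value != 0
    ]


def bidirected_to_edges(arcs):
    """Replace each pair of opposite arcs by one edge; keep all other arcs."""
    values = {(u, v): value for u, v, value in arcs}
    remaining, edges = [], []
    for (u, v), value in values.items():
        if u != v and (v, u) in values:
            if u < v:  # emit each bidirected pair only once
                # Assumption: the edge value is the smaller of the two arc values.
                edges.append((u, v, min(value, values[(v, u)])))
        else:
            remaining.append((u, v, value))
    return remaining, edges


arcs, edges = bidirected_to_edges(matrix_to_arcs(matrix))
print(arcs)
print(edges)
```

For the matrix above, only the pair 1 -> 5 and 5 -> 1 is bidirected, so the result has the same six arcs and one edge as the Arcs / Edges example.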

Additional definition of network
Additionally, Pajek enables precise definition of the elements used for drawing networks (coordinates of vertices, shapes and colors of vertices and lines, ...).
Example: (see test 4.net)
*Vertices 5
1 "a" box
2 "b" ellipse
3 "c" diamond
4 "d" triangle
5 "e" empty
...

Draw: Layout of networks
Energy: the network is treated as a physical system, and we search for the state with minimal energy.
Kamada-Kawai: with separate components, connected components can be tiled in a plane.
Fruchterman-Reingold: draws in a plane or in space, with a selectable repulsion factor.
EigenValues: 2 or 3 eigenvectors are selected to become the coordinates of vertices. This can produce nice pictures.

Partition – .clu
Partitions are used to describe nominal properties of vertices, e.g., 1 – men, 2 – women.
Definition in input file (see test.clu):
*Vertices 5
1
2
...

Vector – .vec
Vectors are used to describe numerical properties of vertices (e.g., centralities).
Definition in input file (see test.vec):
*Vertices 5
0.58
0.25
0.08
...
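Partition and vector files share the same layout: a *Vertices header followed by one value per vertex, in vertex order. The following Python sketch reads both. The function name parse_vertex_values and the three-vertex sample data are hypothetical and are not the contents of test.clu or test.vec.

```python
def parse_vertex_values(text, cast):
    """Read a *Vertices header followed by one value per vertex.

    cast is int for a partition (.clu) and float for a vector (.vec).
    Returns a dict: vertex number -> value.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    keyword, count = lines[0].split()[:2]
    if keyword.lower() != "*vertices":
        raise ValueError("expected a *Vertices header")
    values = [cast(item) for item in lines[1:]]
    if len(values) != int(count):
        raise ValueError("number of values does not match the header")
    return {number: value for number, value in enumerate(values, start=1)}


# Hypothetical sample data for three vertices.
partition = parse_vertex_values("*Vertices 3\n1\n2\n1\n", int)
vector = parse_vertex_values("*Vertices 3\n0.5\n0.25\n0.75\n", float)
print(partition)
print(vector)
```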

Pajek project files
It is time consuming to load objects one by one. Therefore it is convenient to store all data in one file, called a Pajek project file (.paj) (see test.paj).
Project files can be produced manually by using “File>Pajek Project File>Save”.
To load objects stored in a Pajek project file, select “File>Pajek Project File>Read”.

Menu structure
Commands are placed in menus according to the following criterion:
commands that need only a network as input are available in menu Net,
commands that need two networks as input are available in menu Networks,
commands that need two objects as input (e.g., network and partition) are available in menu Operations,
commands that need only a partition as input are available in menu Partition,
...

Global and local views on network
Local view is obtained by extracting the sub-network induced by a selected cluster of vertices.
Global view is obtained by shrinking the vertices in the same cluster to a new (compound) vertex. In this way, relations among clusters of vertices are shown.
Contextual view combines the local and global views: relations among clusters of vertices and selected vertices are shown.
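The two views can be illustrated on a small valued network. This is a Python sketch under stated assumptions: the function names and the sample partition are hypothetical, the shrunk network adds up the values of the merged lines, and lines inside a cluster are dropped. Pajek's own handling of values and loops when shrinking may differ.

```python
from collections import defaultdict


def extract_subnetwork(lines, partition, cluster):
    """Local view: keep the lines whose two endpoints are both in the cluster."""
    return [
        (u, v, value)
        for u, v, value in lines
        if partition[u] == cluster and partition[v] == cluster
    ]


def shrink_network(lines, partition):
    """Global view: one compound vertex per cluster.

    Assumption: values of merged lines are summed and
    lines inside a cluster are dropped.
    """
    totals = defaultdict(int)
    for u, v, value in lines:
        cluster_u, cluster_v = partition[u], partition[v]
        if cluster_u != cluster_v:
            totals[(cluster_u, cluster_v)] += value
    return dict(totals)


lines = [(1, 2, 1), (1, 4, 1), (2, 3, 2), (3, 1, 1), (3, 4, 2), (4, 5, 1)]
partition = {1: 1, 2: 1, 3: 2, 4: 2, 5: 2}  # hypothetical clusters

local_view = extract_subnetwork(lines, partition, 2)
global_view = shrink_network(lines, partition)
print(local_view)
print(global_view)
```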

Example
Imports and exports among 80 countries in 1994 are given, in units of 1000 USD (see Country_Imports.net).
Partition according to continents (see Country_Continent.clu): 1 – Africa, 2 – Asia, 3 – Europe, 4 – N. America, 5 – Oceania, 6 – S. America.
Operations>Extract from Network>Partition
Operations>Shrink Network>Partition

Extracting Subnetwork Operations>Extract from Network>Partition

Shrinking Network Operations>Shrink Network>Partition

Removing lines with low values Network>Info>Line Values

Removing lines with low values Network>Create New Network>Transform>Remove>Lines with value>lower than (340000)

Resources
Download: the latest version of Pajek is freely available, for non-commercial use, at its home page: http://vlado.fmf.uni-lj.si/pub/networks/pajek/
Text file into Pajek: http://vlado.fmf.uni-lj.si/pub/networks/pajek/howto/text2pajek.htm
WoS to Pajek: http://vlado.fmf.uni-lj.si/pub/networks/pajek/WoS2Pajek/default.htm
Tutorial: Exploratory Social Network Analysis with Pajek
Visit the Pajek wiki for more information: http://pajek.imfm.si/doku.php

http://pajek.imfm.si/doku.php?id=wos2pajek/ WOS to pajek

Web of Science S519

Output S519

wos2pajek
The download link: http://pajek.imfm.si/doku.php?id=wos2pajek
The new tutorial slides: http://pajek.imfm.si/lib/exe/fetch.php?media=faq:wos:wos2pajek07.pdf

MontyLingua
Download from: http://web.media.mit.edu/~hugo/montylingua/
Unpack it and copy ‘montylingua-2.1’ to C:\Python26\Lib\site-packages
Set up a new environment variable named ‘MONTYLINGUA’ and set its value to c:\Python26\Lib\site-packages\MontyLingua-2.1\Python

wos2pajek
Download the latest version of WoS2Pajek: http://pajek.imfm.si/doku.php?id=wos2pajek
Unpack it, and double-click on WoS2Pajek.py to show the main interface of the program.

You can also put all WoS files in one folder.

WoS2Pajek Program
The current version of WoS2Pajek requires the following parameters to be given by the user:
MontyLingua directory: path to the directory in which the MontyLingua package is installed;
project directory: where the output files are saved;
WoS file;
maxnum: estimate of the number of all vertices (number of records + number of cited works), approximately 30 * number of records;
step: prints info about each k*step record as a trace; step = 0 means no trace;
use ISI name / short name;
make a clean WoS file without duplicates;
boolean list [DE, ID, TI, AB] specifying which fields are sources of keywords.

Wos-pajek.txt

Cite.net
Network/Info/General
Network/Create New Network/Transform/Remove/Loops
Network/Create New Network/Transform/Remove/Multiple lines/Single line

CiteNew.net
Paper citation network
Questions:
Which articles are highly cited?
What is the diameter of the network?
What are the major clusters?
More questions?

Strong component of cite network
Network/Create Partition/Components/Strong [2]
Operations/Network+Partition/Extract SubNetwork [1-*]
Operations/Network+Partition/Transform/Remove Lines/Between Cluster
Save citestrong.clu

Co-author network
Read WA.net
Network/2-mode network/2-mode to 1-mode/Columns
Network/Create Partition/Components/Weak [2]
Operations/Network+Partition/Extract SubNetwork [1-*]
Network/Create New Network/Transform/Remove/Loops
WANew.net (which is a co-author network)
Question: Which author has the most co-authors?

Bibliographic coupling network
[Read Cite.net]
Network/Create New Network/Transform/1-mode to 2-mode
Network/2-mode Network/2-mode to 1-mode/Rows
Network/Create Partition/Components/Weak [2]
Operations/Network+Partition/Extract SubNetwork [1-*]

Co-citation network
[Read Cite.net]
Network/Create Partitions/Degree/Output
Operations/Network+Partition/Extract SubNetwork [1-*]
Network/Create New Network/Transform/1-mode to 2-mode
Network/2-mode network/2-mode to 1-mode/Columns
Network/Create Partition/Components/Weak [2]
Operations/Network+Partition/Extract SubNetwork [1-*]
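What the Rows and Columns projections of a citation network compute can be sketched in Python. This is an illustration, not the Pajek procedure: the function name and the sample citations are hypothetical, and each arc is assumed to point from the citing paper to the cited paper.

```python
from collections import Counter, defaultdict
from itertools import combinations


def coupling_and_cocitation(citations):
    """citations: list of (citing, cited) pairs (assumed arc direction).

    Bibliographic coupling: number of references two papers share.
    Co-citation: number of papers that cite both papers.
    """
    references, citers = defaultdict(set), defaultdict(set)
    for citing, cited in citations:
        references[citing].add(cited)
        citers[cited].add(citing)

    coupling = Counter()
    for a, b in combinations(sorted(references), 2):
        shared = len(references[a] & references[b])
        if shared:
            coupling[(a, b)] = shared

    cocitation = Counter()
    for a, b in combinations(sorted(citers), 2):
        shared = len(citers[a] & citers[b])
        if shared:
            cocitation[(a, b)] = shared
    return coupling, cocitation


# Hypothetical citations: A and B are citing papers; C, D, E are cited papers.
citations = [("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("B", "E")]
coupling, cocitation = coupling_and_cocitation(citations)
print(dict(coupling))
print(dict(cocitation))
```

Both results are valued one-mode networks: the line value counts the shared items.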

Network analysis

Two-mode network
One-mode network: each vertex can be related to each other vertex.
Two-mode network: vertices are divided into two sets, and vertices can only be related to vertices in the other set.

Example
Suppose we have data as below:
P1: Au1, Au2, Au5
P2: Au2, Au4, Au5
P3: Au4
P4: Au1, Au5
P5: Au2, Au3
P6: Au3
P7: Au1, Au5
P8: Au1, Au2, Au4
P9: Au1, Au2, Au3, Au4, Au5
P10: Au1, Au2, Au5
The corresponding two-mode network file (see two_mode.net):
*vertices 15 10
1 "P1"
2 "P2"
3 "P3"
4 "P4"
5 "P5"
6 "P6"
7 "P7"
8 "P8"
9 "P9"
10 "P10"
11 "Au1"
12 "Au2"
13 "Au3"
14 "Au4"
15 "Au5"
*edgeslist
1 11 12 15
2 12 14 15
3 14
4 11 15
5 12 13
6 13
7 11 15
8 11 12 14
9 11 12 13 14 15
10 11 12 15

Transforming to valued networks The network is transformed into an ordinary network, where the vertices are elements from the first subset, using “Network>2 mode network>2-Mode to 1-Mode>Rows”.

Transforming to valued networks If we want to get a network with elements from the second subset we use “Network>2 mode network>2-Mode to 1-Mode>Columns”.
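The two transformations can be sketched in Python for the paper and author data from the example. The function names are hypothetical. The sketch assumes that the value of a line in the one-mode network is the number of shared items, and it leaves out loops.

```python
from collections import Counter
from itertools import combinations

papers = {
    "P1": ["Au1", "Au2", "Au5"],
    "P2": ["Au2", "Au4", "Au5"],
    "P3": ["Au4"],
    "P4": ["Au1", "Au5"],
    "P5": ["Au2", "Au3"],
    "P6": ["Au3"],
    "P7": ["Au1", "Au5"],
    "P8": ["Au1", "Au2", "Au4"],
    "P9": ["Au1", "Au2", "Au3", "Au4", "Au5"],
    "P10": ["Au1", "Au2", "Au5"],
}


def project_columns(rows):
    """Second subset (authors): value = number of papers written together."""
    weights = Counter()
    for members in rows.values():
        for a, b in combinations(sorted(set(members)), 2):
            weights[(a, b)] += 1
    return weights


def project_rows(rows):
    """First subset (papers): value = number of shared authors."""
    weights = Counter()
    for (p, authors_p), (q, authors_q) in combinations(rows.items(), 2):
        shared = len(set(authors_p) & set(authors_q))
        if shared:
            weights[(p, q)] = shared
    return weights


coauthors = project_columns(papers)
paper_links = project_rows(papers)
print(coauthors[("Au1", "Au5")])   # papers co-written by Au1 and Au5
print(paper_links[("P1", "P10")])  # authors shared by P1 and P10
```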

Basic information about a network
Basic information can be obtained with “Network>Info>General”, which is available in the main window of the program. We get:
number of vertices
number of arcs, number of directed loops
number of edges, number of undirected loops
density of lines
Additionally, we must answer the question:
Input 1 or 2 numbers: +/highest, -/lowest
where we enter the number of lines with the highest/lowest value, or the interval of values, that we want to output.
If we enter 10, the 10 lines with the highest value will be displayed.
If we enter -10, the 10 lines with the lowest value will be displayed.
If we enter 3 10, the lines with the highest values from rank 3 to 10 will be displayed.
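A rough Python sketch of these quantities for a directed network is shown below. The function names are hypothetical. The density formula used here (arcs without loops divided by n(n-1)) is one common definition and is an assumption of this sketch; it is not taken from the Pajek report.

```python
def network_info(n_vertices, arcs):
    """Counts and density for a directed network given as (from, to, value)."""
    loops = sum(1 for u, v, _ in arcs if u == v)
    # Assumption: density = non-loop arcs / (n * (n - 1)).
    density = (len(arcs) - loops) / (n_vertices * (n_vertices - 1))
    return {"vertices": n_vertices, "arcs": len(arcs),
            "loops": loops, "density": density}


def ranked_lines(arcs, number):
    """number > 0: that many highest-valued lines; number < 0: lowest."""
    ordered = sorted(arcs, key=lambda line: line[2], reverse=number > 0)
    return ordered[:abs(number)]


arcs = [(1, 2, 1), (1, 4, 1), (2, 3, 2), (3, 1, 1), (3, 4, 2), (4, 5, 1)]
info = network_info(5, arcs)
print(info)
print(ranked_lines(arcs, 2))
print(ranked_lines(arcs, -2))
```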

Metformin Network Load metformin network to Pajek

EntityMetrics Entitymetrics is defined as using entities (i.e., evaluative entities or knowledge entities) in the measurement of impact, knowledge usage, and knowledge transfer, to facilitate knowledge discovery. Ding, Y., Song, M., Han, J., Yu, Q., Yan, E., Lin, L., & Chambers, T. (2013). Entitymetrics: Measuring the impact of entities. PLoS One, 8(8): 1-14.

Diameter of the network Network/Create New Network/SubNetwork with Paths/Info on Diameter Pajek returns only the two vertices that are the furthest away.
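The diameter is the length of the longest shortest path in the network. A minimal Python sketch for an undirected, unweighted network follows; the function names and the sample edges are hypothetical. It runs a breadth-first search from every vertex, so it suits small examples only.

```python
from collections import deque


def distances_from(start, adjacency):
    """Breadth-first search: number of lines on a shortest path from start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(vertices, edges):
    """Return (length, u, v) for one pair of vertices that is furthest apart."""
    adjacency = {v: set() for v in vertices}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    best = (0, None, None)
    for source in vertices:
        for target, d in distances_from(source, adjacency).items():
            if d > best[0]:
                best = (d, source, target)
    return best


edges = [(1, 2), (2, 3), (3, 4), (2, 5)]  # hypothetical network
result = diameter([1, 2, 3, 4, 5], edges)
print(result)
```

Like the Pajek command, the sketch reports only one pair of vertices, even if several pairs are equally far apart.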

Component
Strongly connected components: Network>Create Partition>Components>Strong
Weakly connected components: Network>Create Partition>Components>Weak
The result is represented by a partition: vertices that belong to the same component have the same number in the partition.
Example: component.net
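Both kinds of components can be sketched in Python with simple reachability searches. The function names are hypothetical, and the sketch has no minimum component size option. A vertex is in the same strong component as u if it can be reached from u and can also reach u. For weak components the direction of arcs is ignored.

```python
from collections import defaultdict


def reachable(start, adjacency):
    """All vertices that can be reached from start (including start)."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adjacency[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def components(vertices, arcs, strong):
    """Return a partition: vertex -> component number."""
    forward, backward, both = defaultdict(set), defaultdict(set), defaultdict(set)
    for u, v in arcs:
        forward[u].add(v)
        backward[v].add(u)
        both[u].add(v)
        both[v].add(u)
    partition, number = {}, 0
    for u in vertices:
        if u in partition:
            continue
        number += 1
        if strong:
            members = reachable(u, forward) & reachable(u, backward)
        else:
            members = reachable(u, both)
        for member in members:
            partition[member] = number
    return partition


arcs = [(1, 2), (1, 4), (2, 3), (3, 1), (3, 4), (4, 5)]
strong_partition = components([1, 2, 3, 4, 5], arcs, strong=True)
weak_partition = components([1, 2, 3, 4, 5], arcs, strong=False)
print(strong_partition)
print(weak_partition)
```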

Component.net

Weak Component
Go to the weak component partition: Partition>make network>random network>Input
Visualize the new random network.

Strong Component

Bicomponent A cut-vertex is a vertex whose deletion increases the number of components in the network. A bi-component is a component of minimum size 3 that does not contain a cut-vertex.
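The definition of a cut-vertex can be tested directly: delete each vertex in turn and count the components again. The Python sketch below does that for a small undirected network. The function names and the sample edges are hypothetical. It is a brute-force illustration, not an efficient bi-component algorithm.

```python
def count_components(vertices, edges):
    """Number of connected components of an undirected network."""
    adjacency = {v: set() for v in vertices}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, count = set(), 0
    for start in vertices:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adjacency[u] - seen:
                seen.add(v)
                stack.append(v)
    return count


def cut_vertices(vertices, edges):
    """Vertices whose deletion increases the number of components."""
    base = count_components(vertices, edges)
    result = []
    for v in vertices:
        rest = [x for x in vertices if x != v]
        kept = [(a, b) for a, b in edges if v not in (a, b)]
        if count_components(rest, kept) > base:
            result.append(v)
    return result


# Hypothetical network: two triangles that share vertex 3.
edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 3)]
cuts = cut_vertices([1, 2, 3, 4, 5], edges)
print(cuts)
```

In this example the two triangles {1, 2, 3} and {3, 4, 5} are the bi-components, and the cut-vertex 3 belongs to both.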

Bicomponent example

Bicomponent
Network/Create New Network/......with Bi-Connected Components stored as Relation Numbers
Bicomponents are stored in a hierarchy.
Load USAir97.net
Get the bicomponents (14 of them) with component size >3

Bicomponent
The largest bicomponent contains 244 airports.

Bicomponents Hierarchy>Extract Cluster (13), then result is stored in cluster Draw the cluster

Bicomponents Operations>Network+Cluster>Extract SubNetwork

Bicomponents Operations>Network+Cluster>Extract SubNetwork The info about the largest cluster (244)

Bicomponents Network>Create Partition>Degree>Input Busy airports

K-Cores
A subset of vertices is called a k-core if every vertex from the subset is connected to at least k vertices from the same subset.
K-Cores can be computed using “Network>Create Partitions>K-Core” and selecting Input, Output or All core.
The result is a partition: for every vertex its core number is given.
In most cases we are interested in the highest core(s) only. The corresponding subnetwork can be extracted using “Operations>Extract from Network>Partition” and typing the lower and upper limit for the core number.
Example: see k_core.net
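Core numbers can be computed by repeatedly removing a vertex of smallest remaining degree. The Python sketch below does this for an undirected network, which roughly corresponds to the All option. The function name and the sample edges are hypothetical and are not the contents of k_core.net.

```python
def core_numbers(vertices, edges):
    """Core number of every vertex of an undirected network."""
    adjacency = {v: set() for v in vertices}
    for u, v in edges:
        if u != v:  # loops are ignored
            adjacency[u].add(v)
            adjacency[v].add(u)
    degree = {v: len(adjacency[v]) for v in vertices}
    remaining = set(vertices)
    core, k = {}, 0
    while remaining:
        # Remove a vertex with the smallest remaining degree.
        v = min(remaining, key=lambda x: degree[x])
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for neighbour in adjacency[v]:
            if neighbour in remaining:
                degree[neighbour] -= 1
    return core


# Hypothetical network: vertices 1-4 are all connected to each other,
# vertex 5 hangs on vertex 4, and vertex 6 hangs on vertex 5.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6)]
cores = core_numbers([1, 2, 3, 4, 5, 6], edges)
print(cores)
```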

K_core.net

Clustering Coefficients
Show how three nodes are connected.
Calculation of local clustering coefficients: Network>Create Vector>Clustering Coefficients>CC1
Example: K_core.net
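A local clustering coefficient compares the number of lines among the neighbours of a vertex with the largest possible number. The Python sketch below uses the common formula 2 * links / (k * (k - 1)) for a vertex with k neighbours. It is assumed here that this corresponds to CC1. Returning 0 for vertices with fewer than two neighbours is also a choice of this sketch, and the sample edges are hypothetical.

```python
def clustering_cc1(vertices, edges):
    """Local clustering coefficient of every vertex (undirected network)."""
    adjacency = {v: set() for v in vertices}
    for u, v in edges:
        if u != v:
            adjacency[u].add(v)
            adjacency[v].add(u)
    result = {}
    for v in vertices:
        neighbours = adjacency[v]
        k = len(neighbours)
        if k < 2:
            result[v] = 0.0  # assumption: no triangle is possible, report 0
            continue
        # Count each line among the neighbours once (a < b).
        links = sum(
            1 for a in neighbours for b in neighbours
            if a < b and b in adjacency[a]
        )
        result[v] = 2 * links / (k * (k - 1))
    return result


edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6)]
cc = clustering_cc1([1, 2, 3, 4, 5, 6], edges)
print(cc)
```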

Degree Centrality
Network>Create Partition>Degree, or Network/Create Vector/Centrality/Degree
Example: Metformin network

Betweenness Centrality
Shows how nodes connect different clusters.
Network>Create Vector>Centrality>Betweenness

Betweenness Centrality The betweenness centrality value for each node
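Betweenness counts, for each node, the share of shortest paths between other pairs of nodes that pass through it. The Python sketch below follows Brandes' algorithm for an undirected, unweighted network and returns raw (unnormalized) values. The function name and the sample path network are hypothetical, and Pajek may report values on a different scale.

```python
from collections import deque


def betweenness(vertices, edges):
    """Raw betweenness of every vertex (undirected, unweighted network)."""
    adjacency = {v: set() for v in vertices}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    score = {v: 0.0 for v in vertices}
    for source in vertices:
        order = []                              # vertices by increasing distance
        predecessors = {v: [] for v in vertices}
        paths = {v: 0 for v in vertices}        # number of shortest paths
        paths[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adjacency[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    paths[w] += paths[u]
                    predecessors[w].append(u)
        dependency = {v: 0.0 for v in vertices}
        for w in reversed(order):
            for u in predecessors[w]:
                dependency[u] += paths[u] / paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each unordered pair was counted from both of its ends.
    return {v: score[v] / 2 for v in vertices}


edges = [(1, 2), (2, 3), (3, 4)]  # hypothetical path 1 - 2 - 3 - 4
bc = betweenness([1, 2, 3, 4], edges)
print(bc)
```

A common normalization for an undirected network divides each raw value by (n - 1)(n - 2) / 2, which is 3 for this four-vertex example.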

Closeness Centrality
Shows how close one node is to all other nodes in the network.
Network>Create Vector>Centrality>Closeness
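Closeness can be sketched in Python as the number of other nodes divided by the sum of the distances to them. The function name and the sample network are hypothetical. The sketch assumes a connected, undirected, unweighted network and simply returns 0 for a node that cannot reach all others, which may differ from how Pajek treats such nodes.

```python
from collections import deque


def closeness(vertices, edges):
    """Closeness = (n - 1) / sum of distances to all other vertices."""
    adjacency = {v: set() for v in vertices}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    result = {}
    for source in vertices:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adjacency[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        if len(dist) == len(vertices) and total > 0:
            result[source] = (len(vertices) - 1) / total
        else:
            result[source] = 0.0  # assumption: unreachable vertices give 0
    return result


edges = [(1, 2), (2, 3), (3, 4)]  # hypothetical path 1 - 2 - 3 - 4
cl = closeness([1, 2, 3, 4], edges)
print(cl)
```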

Shortest Path
Network/Create New Network/SubNetwork with Paths/One Shortest Path between Two Vertices
Enter two vertices.
Forget values on lines: Yes, if the search for the shortest path is based on lengths; No, if the search is based on the values of lines.
Identify vertices in source network: No.
The result will be a new subnetwork containing the two selected vertices.
Layout>Energy>Kamada Kawai>Fix first and last
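The effect of the "Forget values on lines" choice can be seen in a small Python sketch based on Dijkstra's algorithm. The function name and the sample arcs are hypothetical. The sketch assumes that line values are non-negative and are treated as lengths. With forget_values=True every line counts as length 1.

```python
import heapq


def shortest_path(arcs, source, target, forget_values):
    """Return (length, path) of one shortest path, or None if there is none.

    arcs: list of (from, to, value); values are assumed non-negative.
    """
    adjacency = {}
    for u, v, value in arcs:
        adjacency.setdefault(u, []).append((v, 1 if forget_values else value))
    queue = [(0, source, [source])]
    done = set()
    while queue:
        length, u, path = heapq.heappop(queue)
        if u == target:
            return length, path
        if u in done:
            continue
        done.add(u)
        for v, value in adjacency.get(u, []):
            if v not in done:
                heapq.heappush(queue, (length + value, v, path + [v]))
    return None


# Hypothetical arcs: the direct line 1 -> 3 has a high value.
arcs = [(1, 2, 1), (2, 3, 1), (1, 3, 5)]
by_length = shortest_path(arcs, 1, 3, forget_values=True)
by_value = shortest_path(arcs, 1, 3, forget_values=False)
print(by_length)
print(by_value)
```

When values are forgotten the direct line wins, because it is a single step. When values are used the two-step route is shorter.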

Shortest path Network/Create New Network/SubNetwork with Paths/.. ...One Shortest Path between Two Vertices (17-7045)