ILPnet2 social network analysis


ILPnet2 social network analysis
Miha Grčar
Course in Knowledge Management
Lecturer: prof. dr. Nada Lavrač
Ljubljana, January 2007

Outline of the presentation
- Data preprocessing
- Directing the network
- Social vs. structural prestige
- Correlation between the two
- Triad census of strong components in the co-authorship network
- Hierarchy of authors with respect to co-authorship
- Conclusions

Data preprocessing
# citations
# (joint) publications

Data preprocessing
Pajek network file
SQL


Directing the network
- Create a complete directed network
- Take logarithms of the values and normalize them
- Allow each author to keep at most k outgoing arcs: the ones with the highest weights
- Calculate proximity prestige for several different values of k and a, and determine its correlation with the social prestige represented by the number of citations
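The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure actually run in Pajek: the nested-dictionary network format, the function names, and the choice of log(1 + w) scaled by the maximum are all hypothetical, and the parameter a is left out because the slides do not define it. Proximity prestige follows the usual definition: the share of other vertices that can reach a vertex, divided by their mean distance to it.

```python
import math
from collections import deque


def log_normalize(weights):
    """Assumed scaling: log(1 + w), divided by the largest logged value."""
    logged = {s: {t: math.log1p(w) for t, w in ts.items()}
              for s, ts in weights.items()}
    top = max((w for ts in logged.values() for w in ts.values()), default=0.0)
    if top == 0.0:
        return logged
    return {s: {t: w / top for t, w in ts.items()} for s, ts in logged.items()}


def keep_top_k_arcs(weights, k):
    """Keep at most k outgoing arcs per vertex, the ones with the highest weights."""
    pruned = {}
    for source, targets in weights.items():
        # Sort by descending weight; ties are broken by target label.
        ranked = sorted(targets.items(), key=lambda item: (-item[1], item[0]))
        pruned[source] = dict(ranked[:k])
    return pruned


def proximity_prestige(arcs, vertices):
    """Share of vertices in the input domain divided by their mean distance."""
    incoming = {v: [] for v in vertices}
    for source, targets in arcs.items():
        for target in targets:
            incoming[target].append(source)
    n = len(vertices)
    prestige = {}
    for v in vertices:
        # Breadth-first search along reversed arcs finds everyone who can reach v.
        dist = {v: 0}
        queue = deque([v])
        while queue:
            current = queue.popleft()
            for pred in incoming[current]:
                if pred not in dist:
                    dist[pred] = dist[current] + 1
                    queue.append(pred)
        domain = len(dist) - 1
        if domain == 0 or n < 2:
            prestige[v] = 0.0
        else:
            mean_distance = sum(dist.values()) / domain
            prestige[v] = (domain / (n - 1)) / mean_distance
    return prestige


# Toy network with made-up weights (not ILPnet2 data).
toy = {"a": {"b": 5, "c": 1}, "b": {"c": 3}, "c": {}}
toy_arcs = keep_top_k_arcs(log_normalize(toy), k=1)
toy_prestige = proximity_prestige(toy_arcs, ["a", "b", "c"])
```

In the toy network, pruning with k=1 leaves the arcs a→b and b→c, so c is reachable from both other vertices and receives the highest prestige.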

Correlation
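The slides do not say which correlation coefficient was used. As a hedged illustration only, the sketch below computes a Pearson correlation between two equally long lists, for example proximity prestige scores and citation counts per author. The function name and the sample numbers are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up values for illustration only.
prestige_scores = [0.10, 0.25, 0.40, 0.55]
citation_counts = [3, 8, 12, 20]
r = pearson(prestige_scores, citation_counts)
```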

Strong components triad census for k=3, a=1
--------------------------------------------------------------------------------
Type        Number of triads (ni)   Expected (ei)   (ni-ei)/ei   Model
--------------------------------------------------------------------------------
 3 - 102                        0           61.84        -1.00   Balance
16 - 300                        0            0.00        -1.00
 1 - 003                  2985835      2984491.39         0.00   Clusterability
 4 - 021D                      10           61.84        -0.84   Ranked Clusters
 5 - 021U                    1534           61.84        23.80
 9 - 030T                      28            0.33        85.14
12 - 120D                       0            0.00        -1.00
13 - 120U                       0            0.00        -1.00
 2 - 012                    44402        47062.30        -0.06   Transitivity
14 - 120C                       0            0.00        -1.00   Hierarchical Clusters
15 - 210                        0            0.00        -1.00
 6 - 021C                      55          123.69        -0.56   Forbidden
 7 - 111D                       0            0.33        -1.00
 8 - 111U                       0            0.33        -1.00
10 - 030C                       0            0.11        -1.00
11 - 201                        0            0.00        -1.00
--------------------------------------------------------------------------------
Chi-Square: 37695.2629***
10 cells (62.50%) have expected frequencies less than 5.
The minimum expected cell frequency is 0.00.
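The last numeric column and the chi-square statistic can be recomputed from the observed and expected counts. The sketch below is an illustration, not Pajek's own code. It uses the rounded values printed above, so its results only approximate the reported figures, and it skips cells whose expected frequency is printed as 0.00, because the rounded value cannot be used as a divisor. The function names are hypothetical.

```python
def relative_deviation(observed, expected):
    """(ni - ei) / ei for one triad type."""
    return (observed - expected) / expected


def chi_square(cells):
    """Sum of (ni - ei)^2 / ei over cells with a positive expected count."""
    return sum((n - e) ** 2 / e for n, e in cells if e > 0)


# (ni, ei) pairs copied from the census table, rounded as printed.
census = {
    "102": (0, 61.84), "300": (0, 0.00), "003": (2985835, 2984491.39),
    "021D": (10, 61.84), "021U": (1534, 61.84), "030T": (28, 0.33),
    "120D": (0, 0.00), "120U": (0, 0.00), "012": (44402, 47062.30),
    "120C": (0, 0.00), "210": (0, 0.00), "021C": (55, 123.69),
    "111D": (0, 0.33), "111U": (0, 0.33), "030C": (0, 0.11),
    "201": (0, 0.00),
}

deviation_021u = relative_deviation(*census["021U"])
approx_chi_square = chi_square(census.values())
```

Most of the statistic comes from the 021U and 030T triads, which occur far more often than expected and both belong to the ranked clusters model.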

Strong components in k=3, a=1
Net>Hierarchical Decomposition>Symmetric-acyclic
(Net>Transform>Remove>Loops)
Net>Partitions>Depth>Acyclic
---------------------------------------------------------------
Net>Components>Strong
- 1st partition: depth partition of the shrunk network
- 2nd partition: strong components
Partitions>Expand>First according to Second (Shrink)
Operations>Transform>Remove Lines>Between Clusters
Net>Transform>Arcs->Edges>Bidirected only
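For readers without Pajek, the strong components step can be approximated in Python. The sketch below uses Kosaraju's two-pass algorithm, which is a swapped-in technique chosen for brevity and not necessarily what Pajek implements. The adjacency-dictionary format and the function name are assumptions.

```python
def strong_components(arcs):
    """Map every vertex to the id of its strongly connected component.

    arcs: dict {vertex: iterable of successors} (assumed input format).
    """
    vertices = set(arcs)
    for targets in arcs.values():
        vertices.update(targets)

    # Pass 1: iterative depth-first search, recording finishing order.
    order, seen = [], set()
    for root in vertices:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(arcs.get(root, ())))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(arcs.get(nxt, ()))))
                    break
            else:
                # All successors handled: the vertex is finished.
                order.append(node)
                stack.pop()

    # Pass 2: search the reversed network in decreasing finishing order.
    reverse = {v: [] for v in vertices}
    for source, targets in arcs.items():
        for target in targets:
            reverse[target].append(source)
    component, label = {}, 0
    for root in reversed(order):
        if root in component:
            continue
        component[root] = label
        stack = [root]
        while stack:
            node = stack.pop()
            for pred in reverse[node]:
                if pred not in component:
                    component[pred] = label
                    stack.append(pred)
        label += 1
    return component


# Toy network: {a, b} and {c, d} are strong components, e is on its own.
toy_network = {"a": ["b"], "b": ["a", "c"], "c": ["d"], "d": ["c"], "e": []}
toy_components = strong_components(toy_network)
```

Shrinking each component to a single vertex yields an acyclic network, which is what makes a depth partition possible.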

Strong components, hierarchical view

People, ranked clusters
1. Remove inter-cluster arcs
2. Convert bidirected intra-cluster arcs into edges
3. Remove all remaining arcs
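The three steps above amount to keeping only reciprocated ties inside each cluster. A minimal Python sketch follows, assuming the network is an adjacency dictionary and the clustering is a vertex-to-cluster dictionary; both formats and the function name are hypothetical. Loops are ignored, which is also an assumption.

```python
def intra_cluster_edges(arcs, cluster):
    """Return undirected edges for reciprocated arcs within a cluster.

    arcs: dict {vertex: iterable of successors}
    cluster: dict {vertex: cluster id}
    """
    arc_set = {(s, t) for s, targets in arcs.items() for t in targets if s != t}
    edges = set()
    for source, target in arc_set:
        # Step 1: arcs between different clusters are dropped.
        if cluster[source] != cluster[target]:
            continue
        # Step 2: a bidirected pair of arcs becomes one undirected edge.
        if (target, source) in arc_set:
            edges.add(frozenset((source, target)))
        # Step 3: unreciprocated arcs are simply not copied.
    return edges


# Toy example with two clusters (made-up data).
toy_arcs_by_author = {"a": ["b", "c"], "b": ["a"], "c": ["d"], "d": ["c", "a"]}
toy_clusters = {"a": 0, "b": 0, "c": 1, "d": 1}
toy_edges = intra_cluster_edges(toy_arcs_by_author, toy_clusters)
```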

People, hierarchical view

Conclusions
- A (typical) data-mining preprocessing process was presented
- We have shown that some directed network models reflect the ranking of authors by number of citations quite well
- We have shown that Pajek can be used to explore rankings and hierarchies in social networks
- Slovene ILP team rocks!