Topical Scientific Community: A combined perspective of topic and topology. Jin Mao, Postdoc, School of Information, University of Arizona. Sept 4, 2015.

Presentation transcript:


Complex Network
Multidisciplinary: Mathematics (graph theory), Physics, Social Science, Informetrics, …
Network Science: an emergent cross-disciplinary area

Examples of complex networks
Internet, WWW, transport networks, protein interaction networks, social networks, ...

Graph Theory
Leonhard Euler's paper on the “Seven Bridges of Königsberg”
Other problems in modern science

Definition of Graph (Network)
G = {V, E}
V is a set of nodes, points, or vertices.
E is a set of edges, lines, ties, or connections.
An adjacency matrix represents which vertices (nodes) of a graph are adjacent to which other vertices.
Example: V := {1,2,3,4,5,6}, E := {{1,2},{1,5},{2,3},{2,5},{3,4},{4,5},{4,6}}
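The adjacency matrix for the example graph on this slide can be built in a few lines. This is a minimal sketch in plain Python; the function name `adjacency_matrix` is chosen here for illustration and does not come from the slides.

```python
# Node and edge sets copied from the slide.
V = [1, 2, 3, 4, 5, 6]
E = [{1, 2}, {1, 5}, {2, 3}, {2, 5}, {3, 4}, {4, 5}, {4, 6}]


def adjacency_matrix(nodes, edges):
    """Return a symmetric 0/1 matrix for an undirected, unweighted graph."""
    index = {node: i for i, node in enumerate(nodes)}
    size = len(nodes)
    matrix = [[0] * size for _ in range(size)]
    for edge in edges:
        u, v = tuple(edge)
        # Undirected graph: mark both (u, v) and (v, u).
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1
    return matrix


A = adjacency_matrix(V, E)
for row in A:
    print(row)
```

Row i of the matrix lists the neighbours of node i, so the row sums give the node degrees.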

Types of Graph
Undirected graph and directed graph
Unweighted graph and weighted graph: in a weighted graph, each edge has an associated weight, usually given by a weight function w: E → R.

Graph Structures
Path, connectivity, component

Structural Measures
Degree centrality: the number of edges incident on a node
In-degree: the number of edges entering a node
Out-degree: the number of edges leaving a node

Structural Measures
Length of shortest path; diameter; node scale; edge scale; density; betweenness centrality; closeness centrality; eigenvector centrality; edge betweenness; clustering coefficient
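A few of these measures are easy to compute by hand on a small graph. The sketch below reuses the six-node example graph from the graph definition slide and assumes an undirected, unweighted, connected graph; the helper names are illustrative, not taken from the talk.

```python
from collections import deque

# Example edge list from the graph definition slide.
EDGES = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]


def build_neighbours(edges):
    """Map each node to the set of its neighbours (undirected graph)."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    return neighbours


def shortest_path_lengths(neighbours, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def diameter(neighbours):
    """Longest shortest path; assumes the graph is connected."""
    return max(max(shortest_path_lengths(neighbours, n).values())
               for n in neighbours)


def density(neighbours):
    """Share of possible edges that are present: 2m / (n(n-1))."""
    n = len(neighbours)
    m = sum(len(adj) for adj in neighbours.values()) / 2
    return 2 * m / (n * (n - 1))


def clustering_coefficient(neighbours, node):
    """Share of neighbour pairs that are themselves linked (0 if degree < 2)."""
    adj = sorted(neighbours[node])
    k = len(adj)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if adj[j] in neighbours[adj[i]])
    return 2 * links / (k * (k - 1))


G = build_neighbours(EDGES)
degree = {node: len(adj) for node, adj in G.items()}
print(degree, diameter(G), density(G), clustering_coefficient(G, 1))
```

For larger networks a graph library would normally be used instead of hand-written traversals.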

Research in network science
To generalize statistical properties of complex networks:
Small-world network: small diameter, large clustering coefficient
Scale-free network: power-law degree distribution
…
Research paradigm: model or reflect complex circumstances with networks
Open question: what are the physical meanings of these generalized statistical properties in a specific domain?

Community Structure in Complex Network
Girvan, M., & Newman, M. E. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12).
In social network analysis: clique, clan, k-shell, … are used to identify interesting social groups.
Definition: for a subgraph, the internal degree is larger than the external degree.
Radicchi, F., Castellano, C., Cecconi, F., Loreto, V., & Parisi, D. (2004). Defining and identifying communities in networks. PNAS, 101(9).
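The degree-based definition can be checked directly on a candidate node set. The sketch below follows the strong and weak readings from Radicchi et al. (2004): strong means every member has more internal than external links, weak means the summed internal degree exceeds the summed external degree. It reuses the six-node example graph from the graph definition slide; the function names are illustrative.

```python
# Example edge list from the graph definition slide.
EDGES = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]


def build_neighbours(edges):
    """Map each node to the set of its neighbours (undirected graph)."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    return neighbours


def degree_split(neighbours, members, node):
    """Return (internal degree, external degree) of node w.r.t. members."""
    inside = len(neighbours[node] & members)
    return inside, len(neighbours[node]) - inside


def is_strong_community(neighbours, members):
    """Every member has strictly more internal than external links."""
    return all(k_in > k_out
               for k_in, k_out in (degree_split(neighbours, members, n)
                                   for n in members))


def is_weak_community(neighbours, members):
    """Summed internal degree strictly exceeds summed external degree."""
    splits = [degree_split(neighbours, members, n) for n in members]
    return sum(k_in for k_in, _ in splits) > sum(k_out for _, k_out in splits)


G = build_neighbours(EDGES)
print(is_strong_community(G, {1, 2, 5}))  # triangle 1-2-5
print(is_strong_community(G, {3, 4, 6}), is_weak_community(G, {3, 4, 6}))
```

In this graph, node 3 has one link inside {3, 4, 6} and one outside, so that set passes only the weak test.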

Community Structure in Scientific Communications
The scientific community is a diverse network of interacting scientists. It includes many "sub-communities" working on particular scientific fields. (Wikipedia)
Sociology of Science; Philosophy of Science
Kuhn, The Structure of Scientific Revolutions
The concept is abstract and vague when it comes to identifying members. What exactly is it: a research group, a research team, or a research institution?

Community Structure in Scientific Communications
Implications of the complex network research paradigm: model the scientific system with networks, and detect scientific communities from the networks of scientists.
Scholarly networks: coauthor network, author citation network, author coupling network, …
Semantic methods: author clustering
Topology-based community detection and topic-based community detection: combine them both (Ding, 2010)

Community Structure in Scientific Communication
In scientometrics, the community structure is held to reflect the structure of science.
It is an approach to understanding researchers, topics, publications, … and their relations.

Community Structure in Scientific Communication
Methods for discovering scientific communities exist, and research on detecting them continues by exploring various features of the scientific circumstance.
We need to go further:
Where does a scientific community come from?
How does a scientific community emerge?
What are the properties of a scientific community, both statistical and on the ground?
What is the role of a scientific community?
In this paper, we have observed a new form of scientific community with topic constraints, i.e., the topical scientific community, and attempted to investigate its properties as rooted in the scientific circumstance.

Topical Scientific Community
Definition: in the research progress on a specific topic, a group of researchers forms a topical scientific community to address the research questions of that topic by collaborating with each other intensively.
Two significant features:
The members interact in the same semantic space.
The members form collaborations.

Topical Scientific Community Figure 1. The conception of topical scientific community

Topical Scientific Community Detection Approach
Dataset: Web of Science (WoS), metrics field (~2014), 6959 papers
Fields: title, abstract, author, address, corresponding author, and year
Author name disambiguation:
1. Standardize: surname plus the initials of all the words in the given names, e.g., “Strotmann, Andreas” is transformed into “Strotmann, A”.
2. Extract the affiliations of the author names and keep the organizations: “School of Library and Information Science, Indiana University” is extracted as “Indiana University”.
3. Disambiguate:
a) Generally, one name with the same organization is treated as a distinct author name. However, one author can have many organizations in practice.
b) In a particular paper, the same standardized name with multiple organizations is assumed to be the same author.
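The three disambiguation steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the organization is assumed to be the last comma-separated part of the affiliation string (as in the slide's example), the class `AuthorRegistry` and its methods are hypothetical names, and the second affiliation in the usage example is made up.

```python
def standardize_name(full_name):
    """'Strotmann, Andreas' -> 'Strotmann, A' (surname plus given-name initials)."""
    surname, _, given = full_name.partition(",")
    initials = "".join(word[0].upper() for word in given.split())
    return f"{surname.strip()}, {initials}"


def extract_organization(affiliation):
    """Assumption: the organization is the last comma-separated part."""
    return affiliation.split(",")[-1].strip()


class AuthorRegistry:
    """Hypothetical helper: (name, organization) keys, merged per rule 3b."""

    def __init__(self):
        self.parent = {}

    def find(self, key):
        """Return the representative identity for a (name, organization) key."""
        self.parent.setdefault(key, key)
        while self.parent[key] != key:
            self.parent[key] = self.parent[self.parent[key]]
            key = self.parent[key]
        return key

    def add_paper(self, bylines):
        """bylines: list of (full_name, [affiliation strings]) for one paper."""
        for full_name, affiliations in bylines:
            name = standardize_name(full_name)
            keys = [(name, extract_organization(a)) for a in affiliations]
            if not keys:
                continue
            # Rule 3b: same name with several organizations in one paper
            # is treated as one author, so merge those keys.
            root = self.find(keys[0])
            for key in keys[1:]:
                self.parent[self.find(key)] = root


registry = AuthorRegistry()
registry.add_paper([
    ("Strotmann, Andreas",
     ["School of Library and Information Science, Indiana University",
      "Department of Examples, Example University"]),  # second entry is made up
])
print(standardize_name("Strotmann, Andreas"))
print(registry.find(("Strotmann, A", "Example University")))
```

Keys that never co-occur in a paper stay separate, which matches rule 3a: one name with one organization counts as a distinct author name.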

Topical Scientific Community Detection Approach
Discover topics: in machine learning and natural language processing, a topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents.
Blei D M, Ng A Y, Jordan M I. Latent Dirichlet allocation[J]. Journal of Machine Learning Research, 2003, 3.
We get:
Topic z: term distribution, P(w|z)
Document d: topic distribution, P(z|d)
Figure: LDA graphical model
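The two distributions on this slide combine into the probability of a word in a document. As a reminder of the standard topic-model mixture (with K denoting the number of topics, a symbol assumed here):

```latex
P(w \mid d) = \sum_{z=1}^{K} P(w \mid z)\, P(z \mid d)
```

Each document is thus a mixture over the K topics, and each topic is a distribution over the vocabulary.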

Topical Scientific Community Detection Approach
Author-Topic model (AT): authors have word preferences pertaining to their research topics, and to write a paper is to generate words from the topics of its authors.
Rosen-Zvi M, Griffiths T, Steyvers M, et al. The author-topic model for authors and documents[C]//Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence. AUAI Press, 2004.
We get:
Topic z: term distribution, P(w|z)
Author x: topic distribution, P(z|x)
We can infer:
Document d: topic distribution, P(z|d)
Figure: AT graphical model
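One simple way to obtain P(z|d) from the author distributions is to average P(z|x) over the document's authors, which follows from the AT model's generative assumption that each word's author is drawn uniformly from the paper's author list. The slides do not say which inference the authors used, so the sketch below is only one plausible reading; the function name and the toy numbers are made up.

```python
def document_topic_distribution(author_topics, doc_authors):
    """Average the authors' topic distributions P(z|x) to get P(z|d).

    author_topics: dict mapping author -> list of topic probabilities
    doc_authors:   list of the document's authors (all present in author_topics)
    """
    num_topics = len(author_topics[doc_authors[0]])
    totals = [0.0] * num_topics
    for author in doc_authors:
        for z, prob in enumerate(author_topics[author]):
            totals[z] += prob
    return [total / len(doc_authors) for total in totals]


# Toy example with two topics and two authors (values are illustrative).
author_topics = {"A": [0.8, 0.2], "B": [0.2, 0.8]}
p_z_given_d = document_topic_distribution(author_topics, ["A", "B"])
print(p_z_given_d)
```

The result is again a probability distribution, since it is a convex combination of distributions.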

Topical Scientific Community Detection Approach
1) Construct topical scientific collaboration networks for the topics.
The authors pertaining to the topic become the nodes of the network.
The collaborations between the authors form the edges of the network.
2) Detect components as topical scientific communities.
Any two components are isolated from each other, showing no collaboration between their authors.
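The two steps can be sketched as below. Two points are assumptions, since the slides do not specify them: an author "pertains" to a topic when P(z|x) reaches a hypothetical `threshold`, and an edge is weighted by the number of papers the two authors share. All names and the toy data are illustrative.

```python
from collections import defaultdict, deque
from itertools import combinations


def topical_network(papers, author_topics, topic, threshold=0.1):
    """Step 1: nodes are the topic's authors, edges are their coauthorships."""
    members = {author for author, dist in author_topics.items()
               if dist[topic] >= threshold}
    weights = defaultdict(int)
    for authors in papers:
        present = sorted(set(authors) & members)
        for u, v in combinations(present, 2):
            weights[(u, v)] += 1  # edge weight = number of shared papers
    return members, dict(weights)


def connected_components(nodes, edges):
    """Step 2: each connected component is one candidate community."""
    neighbours = {node: set() for node in nodes}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    component.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components


# Toy data: author lists per paper and P(z|x) for two topics.
papers = [["A", "B"], ["B", "C"], ["D", "E"], ["A", "F"]]
author_topics = {"A": [0.6, 0.4], "B": [0.5, 0.5], "C": [0.9, 0.1],
                 "D": [0.7, 0.3], "E": [0.3, 0.7], "F": [0.05, 0.95]}
nodes, edges = topical_network(papers, author_topics, topic=0)
communities = connected_components(nodes, edges)
print(communities)
```

Whether single-author components should count as communities is left open here; the sketch returns them like any other component.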

Results: Topics
AT, LDA
The optimal range for K seems to be from about 40 to …; … topics are reported.

Results: Topics
Topic 2: Geograph    Topic 10: Patent    Topic 14: Trends
Terms:
countri              technolog           new
output               patent              chang
nation               innov               develop
european             industri            emerg
world                sector              gener
usa                  compani             histori
Authors:
Musavi,SM            Zhang,Y             Block,JA
Miguel,S             Heimeriks,G         Babu,AR
Lindqvist,OV         Erfanmanesh,M       Bhavnani,SK
Anwar,MA             Tang,J              Marcondes,CH 0.2143

Results: Topical Scientific Collaboration Networks
Table: metrics (# of nodes, # of edges, edge weights, density) summarized by Min, Max, Avg, and Std. dev.; the density values are on the order of E-04.

Results: Topical Scientific Communities
Table: number of communities (# of com.) per topic
Total: … communities in 50 topics

Results: Topical Scientific Communities
Figure 5. The network of topical scientific community “C12_41”
7 member authors have coauthored 2 papers in this topic, forming 19 internal collaborations.

The Characteristics of Topical Scientific Community vs. Global Scientific Community
Network metrics, columns: Topical Collaboration Networks (Avg.), Metrics, Slovenian Scientists, Synthetic Chemistry
No. of Nodes: 2648,10612,6096,645
No. of Communities: 1021,
Avg. No. of Nodes in Communities
A global scientific community is detected from the collaboration network for a discipline or disciplines.
Modularity-based approaches: maximize inner links and minimize outer links for the communities.

The Characteristics of Topical Scientific Community vs. Global Scientific Community

The Characteristics of Topical Scientific Community
Statistical properties
Table: metrics (nodes, edges, edge weights, density) summarized by Min, Max, Avg, and Std. dev.
A topical scientific community is a kind of meso-level structure emerging from the collaboration of researchers.
Its members interact intensively.

The Characteristics of Topical Scientific Community
The contributions of topical scientific communities:
Fewer authors, but a significant portion of the papers
Improved author productivity

Discussion
Topical scientific communities emerge from author collaborations in research activities.
Topical coherence partly drives the collaborations between researchers.
A topical scientific community reflects some kind of research organization with high productivity.
Implications
Limitations:
One dataset
Topics are latent and unsupervised; the approach still needs to be compared with other methods.
Future study:
Generalize and go further on its characteristics
Dynamics: growth law, coevolution with topics, …

Beyond the paper
Graph representations are used in text mining.
Some tasks can be addressed by using network measures: ranking entities/texts, keyword/key phrase extraction, feature selection, …
Implications

Thank you! Q&A