Understanding outside collaborations of the Chinese Academy of Science using Jensen-Shannon divergence Visualization and Data Analysis 2009 San Jose, California,

Slides:



Advertisements
Similar presentations
Hierarchical Clustering
Advertisements

Hierarchical Clustering. Produces a set of nested clusters organized as a hierarchical tree Can be visualized as a dendrogram – A tree-like diagram that.
Albert Gatt Corpora and Statistical Methods Lecture 13.
Evaluation of Clustering Techniques on DMOZ Data  Alper Rifat Uluçınar  Rıfat Özcan  Mustafa Canım.
1 Machine Learning: Lecture 10 Unsupervised Learning (Based on Chapter 9 of Nilsson, N., Introduction to Machine Learning, 1996)
Creating Marketplaces for Science The Emerging Characteristics of Modern Science Cyberinfrastructures Russell J. Duhon: Creating Marketplaces for Science.
Mutual Information Mathematical Biology Seminar
Interfaces for Selecting and Understanding Collections.
Clustering Ram Akella Lecture 6 February 23, & 280I University of California Berkeley Silicon Valley Center/SC.
Hadoop Team: Role of Hadoop in the IDEAL Project ●Jose Cadena ●Chengyuan Wen ●Mengsu Chen CS5604 Spring 2015 Instructor: Dr. Edward Fox.
Routing State Distance: A Path-based Metric for Network Analysis Natali Ruchansky Gonca Gürsun, Evimaria Terzi, and Mark Crovella.
Title1 UN GAID e-SDDC Implementation Plan Overview Prof. LIU Chuang Co-Chair, UN GAID e-SDDC Executive Committee Leading Professor of Global Change Information.
Computational Scientometrics Studying science by scientific means Dr. Katy Börner Cyberinfrastructure for Network Science Center, Director Information.
Creating Marketplaces for Science 8th Regional Congress on Health Sciences Information Rio de Janeiro, Brazil 18 September 2008 Russell J. Duhon: Creating.
CyberInfrastructure Shell: A Plug-and-Play Macroscopes Framework Chin Hua Kong Sr. System Architect / Project Manager Cyberinfrastructure for Network Science.
Temporal Analysis using Sci2 Ted Polley and Dr. Katy Börner Cyberinfrastructure for Network Science Center Information Visualization Laboratory School.
Computational Scientometrics Dr. Katy Börner Cyberinfrastructure for Network Science Center, Director Information Visualization Laboratory, Director School.
Computational Scientometrics: Mapping the Structure and Evolution of Science Katy Börner & the InfoVis Lab School of Library and Information Science.
The Scholarly Database and Its Utility for Scientometrics Research Dr. Katy Börner Cyberinfrastructure for Network Science Center, Director Information.
Network Workbench A Workbench for Network Scientists Download at
Course Work Project Project title “Data Analysis Methods for Microarray Based Gene Expression Analysis” Sushil Kumar Singh (batch ) IBAB, Bangalore.
The 1st Global Tech Mining Conference, Atlanta, USA Analyzing Technology Evolution of Graphene Sensor Based on Patent Documents Fang Shu 1, Hu Zhengyin.
Hierarchical Clustering Produces a set of nested clusters organized as a hierarchical tree Can be visualized as a dendrogram – A tree like diagram that.
Funding Opportunities and Partnerships Dr. Katy Börner Cyberinfrastructure for Network Science Center, Director Information Visualization Laboratory, Director.
CyberInfrastructure Shell: A Plug-and-Play Macroscopes Framework P632: Object Oriented Software Development Adam Simpson & Steve Corenflos Software Developers.
Clustering / Scaling. Cluster Analysis Objective: – Partitions observations into meaningful groups with individuals in a group being more “similar” to.
Chapter 2: Tools of Environmental Science
Database and Data API. Database Different data sources: International Offices Web Design the data schema based on the data found Transform all the data.
SIMS 202, Marti Hearst Content Analysis Prof. Marti Hearst SIMS 202, Lecture 15.
VIVO Social Network Visualizations
Data Mining for Expertise: Using Scopus to Create Lists of Experts for U.S. Department of Education Discretionary Grant Programs Good afternoon, my name.
Workshop 4: Developing a one page business case
Chairman, Danish Council for Research Policy
(Workshop Title) Working Pack
Measuring Scholarly and Public Impact: Let’s Talk Metrics
Working with Scholarly Articles
Institute of Economics, University of Campinas, Brazil.
Analysis of University Researcher Collaboration Network Using Co-authorship Jiadi Yao School of Electronic and Computer Science,
Databases Chapter 16.
CSE5544 Final Project Proposal
PV 2009 December 3, 2009 The Data Conservancy: Building Sustainable Infrastructure for Interdisciplinary Scientific Data Curation and Preservation.
Matt Link Associate Vice President (Acting) Director, Systems
SECTION 3: Taking Action
VIVO: Faculty Research Information System and Discovery
L2L The Professional Development Framework through the lens of Libraries & Librarians.
Worksite Network Structure
Chen Jimena Melisa Parodi Menashe Shalom
Soma Mukherjee for LIGO Science Collaboration
Good talks – some hints Henning Schulzrinne
Topic 3: Cluster Analysis
Product Behaviour Analytics - RetailReco
Finding Communities by Clustering a Graph into Overlapping Subgraphs
Advanced Quantitative Analysis
Cultural Industries and Identity Canada Today Pages 22-23
K-means and Hierarchical Clustering
Tableau Data Visualizations and Collection Analysis for Shared Print
Facts and Data Search on the Web

I’m the consumer product lead at Yelp
Do Now (7 min) 8/26 Interpreting Maps pg. 3-4
Analyzing Two Participation Strategies in an Undergraduate Course Community Francisco Gutierrez Gustavo Zurita
Computational Thinking
SAN Website Richard Verhoeven 27-Apr-19
Topic 5: Cluster Analysis
Clustering The process of grouping samples so that the samples are similar within each group.
Evaluation of Clustering Techniques on DMOZ Data
Bibliometric Services at the Masaryk University
Temple BETT Technology Applications
Hierarchical Clustering
Yining ZHAO Computer Network Information Center,
Presentation transcript:

Understanding outside collaborations of the Chinese Academy of Science using Jensen-Shannon divergence Visualization and Data Analysis 2009 San Jose, California, USA 19 January 2009 Add dendrogram or map Russell J. Duhon rduhon@indiana.edu Cyberinfrastructure for Network Science Center School of Library and Information Science Indiana University Bloomington, IN, USA Russell J. Duhon: Understanding Outside Collaborations 01/28

Table of Contents Collaboration Entropy Collaboration and Entropy Information Insight (Perhaps) Applications in the Small and Large Russell J. Duhon: CIShell, Network Workbench, and the Scholarly Database 02/28

What is Collaboration? Collaboration 03/28 Russell J. Duhon: Collaboration 03/28

What is Collaboration? Collaboration Good question . . . but a lot of collaboration is pretty easy to recognize. Co-authorship is a common sort looked at in scientometrics. Russell J. Duhon: Collaboration 04/28

How do we measure collaboration? In a lot of ways, some of which we’ve heard about earlier today. Returning to co-authorship, one way is to make networks and analyze those. Russell J. Duhon: Collaboration 05/28

What if we don’t have a nice co-authorship network? Collaboration What if we don’t have a nice co-authorship network? In other words, how do we uncover less obvious characteristics of interest? Russell J. Duhon: Collaboration 06/28

Entropy Russell J. Duhon: Entropy 07/28

What if we don’t have a nice co-authorship network? Entropy What if we don’t have a nice co-authorship network? In other words, how do we uncover less obvious characteristics of interest? Russell J. Duhon: Entropy 08/28

Entropy Many conceptualizations, and in one sense helps measure distinguishability. Russell J. Duhon: Entropy 09/28

Entropy Shannon Entropy Russell J. Duhon: Entropy 10/28

Jensen-Shannon Divergence Entropy Jensen-Shannon Divergence where H is the Shannon Entropy. Russell J. Duhon: Entropy 11/28

Collaboration and Entropy So, given how a set of collaborators co-author with some control set, we can treat those as distributions and uncover the distinguishability of those distributions. Russell J. Duhon: Collaboration and Entropy 12/28

Collaboration and Entropy Then, given a pairwise distinguishability metric, algorithms taking advantage of metric similarity become available. Russell J. Duhon: Collaboration and Entropy 13/28

Collaboration and Entropy Take those pair-wise metrics, and do Agglomerative Hierarchical Clustering with Ward’s Algorithm to look for structure. Russell J. Duhon: Collaboration and Entropy 14/28

Information Russell J. Duhon: Information 15/28

Information Ignoring Beijing-only collaborators, there are two primary clusters: Russell J. Duhon: Information 16/28

Information More developed, more politically normalized countries, Russell J. Duhon: Information 17/28

Information and less developed, less politically normalized countries. Russell J. Duhon: Information 18/28

Information Within the cluster of more developed, more politically normalized countries, there are again two clear clusters: Russell J. Duhon: Information 19/28

Information Larger, more developed countries, including the up-and-coming BRIC members, Russell J. Duhon: Information 20/28

Information and much smaller countries, for the most part 21/28 Russell J. Duhon: Information 21/28

Information Looking more closely, there are many small clusters that show expected patterns. Russell J. Duhon: Information 22/28

Insight (Perhaps) Starting to approach insight, the locations of Chinese collaborators hint at potential explanations. Russell J. Duhon: Insight (Perhaps) 23/28

Insight (Perhaps) ? , , One of the strongest potential applications will be identifying possible sites for the application of policy tools to strengthen, sustain, and transform collaborative structures. Russell J. Duhon: Insight (Perhaps) 24/28

Applications in the Small and Large Remember the motivation: understanding structure when there is less information than ideal. Russell J. Duhon: Applications in the Small and Large 25/28

Applications in the Small and Large Small research groups generally know who their members collaborate with and how much, but rarely how those people collaborate with each other. Russell J. Duhon: Applications in the Small and Large 26/28

Applications in the Small and Large Large organizations often have similar information limitations, particularly due to identification and data dirtiness. Russell J. Duhon: Applications in the Small and Large 27/28

Applications in the Small and Large In both cases, this procedure may be very useful. The insights hint at potential relations among outside collaborators, and comparison of distributions minimizes the impact of dirty data. Russell J. Duhon: Applications in the Small and Large 28/28

NSF IIS-0534909 and IIS-0513650 awards Thank You Russell Duhon rduhon@indiana.edu NSF IIS-0534909 and IIS-0513650 awards