Topical Analysis and Visualization of (Network) Data Using Sci2 Ted Polley Research & Editorial Assistant Cyberinfrastructure for Network Science Center.

Slides:



Advertisements
Similar presentations
PubMed/How to Search, Display, Download & (module 4.1)
Advertisements

EndNote Web Reference Management Software (module 5.1)
In the Format section, we have activated the Bibliographic style drop down menu. From this page, you can choose a specific journal or format (e.g. BMC.
EndNote Web Reference Management Software (module 5)
Dr. Henry Hexmoor Department of Computer Science Southern Illinois University Carbondale Network Theory: Computational Phenomena and Processes Social Network.
Linked data: P redicting missing properties Klemen Simonic, Jan Rupnik, Primoz Skraba {klemen.simonic, jan.rupnik,
Online Search Mehdi Osooli M.S.C in Epidemiology Department of Epidemiology & Biostatistics School of Public Health Tehran University of Medical Sciences.
Relationship Mining Network Analysis Week 5 Video 5.
Network Visualization using Gephi Ted Polley and Dr. Katy Börner Cyberinfrastructure for Network Science Center Information Visualization Laboratory School.
Data Structure and Algorithms (BCS 1223) GRAPH. Introduction of Graph A graph G consists of two things: 1.A set V of elements called nodes(or points or.
INFORMATION MURAL A technique for displaying and navigating large information spaces Dean F. Jerding and John T. Stasko Graphics, Visualization, and Usability.
 Copyright 2011 Digital Enterprise Research Institute. All rights reserved. Digital Enterprise Research Institute Enabling Networked Knowledge.
Garland Library Online Orientation. Introduction  This portion of the Online orientation is intended to help library users gain the basic knowledge and.
Graph & BFS.
Using Structure Indices for Efficient Approximation of Network Properties Matthew J. Rattigan, Marc Maier, and David Jensen University of Massachusetts.
CS 376b Introduction to Computer Vision 04 / 08 / 2008 Instructor: Michael Eckmann.
Journal Citation Reports on the Web. Copyright 2006 Thomson Corporation 2 Introduction JCR distills citation trend data for 7,600+ journals from more.
CSE 222 Systems Programming Graph Theory Basics Dr. Jim Holten.
Lecture 4 Unsupervised Learning Clustering & Dimensionality Reduction
Tuple – InfoVis Publication Browser CS533 Project Presentation by Alex Gukov.
Unsupervised Learning
Memoplex Browser: Searching and Browsing in Semantic Networks CPSC 533C - Project Update Yoel Lanir.
Search on Journal of Dairy Science ® An Overview April
Garland Library Online Orientation. Introduction  This portion of the Online orientation is intended to help library users gain the basic knowledge and.
PubMed/How to Search, Display, Download & (module 4.1)
Department of Computer Science, University of California, Irvine Site Visit for UC Irvine KD-D Project, April 21 st 2004 The Java Universal Network/Graph.
Projects ( ) Ida Mele. Rules Students have to work in teams (max 2 people). The project has to be delivered by the deadline that will be published.
By LaBRI – INRIA Information Visualization Team. Tulip 2010 – version Tulip is an information visualization framework dedicated to the analysis.
Bibliometric Analysis with Sci2: Choose Your Own Adventure Laura Ridenour School of Library and Information Science, Indiana University.
Tutorial session 1 Network generation Exploring PPI networks using Cytoscape EMBO Practical Course Session 8 Nadezhda Doncheva and Piet Molenaar.
Data Science for VIVO Dr. Brand Niemann Director and Senior Data Scientist Semantic Community
PubMed/How to Search, Display, Download & (module 4.1)
Network Analysis using the Network Workbench (NWB) Tool and the Science of Science (Sci2) Tool Ted Polley and Dr. Katy Börner CNS & IVL, SLIS, Indiana.
PubMed Overview From the HINARI Content page, we can access PubMed by clicking on Search inside HINARI full-text using PubMed. Note: If you do not properly.
PubMed/How to Search, Display, Download & (module 4.1)
Part 1 – PubMed Interface, Display options, Saving, Printing, and ing results. Instructions This part of the course is a PowerPoint demonstration.
Social Network Analysis (1) LING 575 Fei Xia 01/04/2011.
OARE Module 5A: Scopus (Elsevier). Table of Contents About Scopus (Elsevier) Using Scopus Search Page Results/Refine Search Pages Download, PDF, Export,
A Graph-based Friend Recommendation System Using Genetic Algorithm
Temporal Analysis using Sci2 Ted Polley and Dr. Katy Börner Cyberinfrastructure for Network Science Center Information Visualization Laboratory School.
Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 April 25, 2012.
CellFateScout step- by-step tutorial for a case study Version 0.94.
Special Topics in Educational Data Mining HUDK5199 Spring 2013 March 25, 2012.
Mathematics of Networks (Cont)
Semantic Wordfication of Document Collections Presenter: Yingyu Wu.
1. 2 CIShell Features A framework for easy integration of new and existing algorithms written in any programming language. CIShell Sci2 Tool NWB Tool.
Most of contents are provided by the website Graph Essentials TJTSD66: Advanced Topics in Social Media.
Clustering More than Two Million Biomedical Publications Comparing the Accuracies of Nine Text-Based Similarity Approaches Boyack et al. (2011). PLoS ONE.
Disciplinary Maps of Sustainability Science Dr. Katy Börner Cyberinfrastructure for Network Science Center, Director Information Visualization Laboratory,
Support.ebsco.com The EBSCOhost Android Application Tutorial.
Introduction to KE EMu Unit objectives: Introduction to Windows Use the keyboard and mouse Use the desktop Open, move and resize a.
Chapter 5 Introduction To Form Builder. Lesson A Objectives  Display Forms Builder forms in a Web browser  Use a data block form to view, insert, update,
PubMed/How to Search, Display, Download & (module 4.1)
Informatics tools in network science
Instance Discovery and Schema Matching With Applications to Biological Deep Web Data Integration Tantan Liu, Fan Wang, Gagan Agrawal {liut, wangfa,
Reference Management Module I: Introduction By Rehema Chande-Mallya(PhD)
VIVO Social Network Visualizations
Gephi Gephi is a tool for exploring and understanding graphs. Like Photoshop (but for graphs), the user interacts with the representation, manipulate the.
Network analysis.
Google Scholar, ShareLaTeX, and Gephi
Network Science: A Short Introduction i3 Workshop
Google Scholar, OverLeaf, and Gephi
Practical Applications Using igraph in R Roger Stanton
PubMed Database Interface (Basic Course: Module 4)
The Overview Panel on Gephi 0.9.1
PubMed/How to Search, Display, Download & (module 4.1)
Citation databases and social networks for researchers: measuring research impact and disseminating results - exercise Elisavet Koutzamani
PubMed/How to Search, Display, Download & (module 4.1)
Presentation transcript:

Topical Analysis and Visualization of (Network) Data Using Sci2 Ted Polley Research & Editorial Assistant Cyberinfrastructure for Network Science Center Library and Information Science Please download Sci2 at See documentation at 1

This hands-on session introduces topical analysis and visualization of network data. Specifically, we will use the Sci2 tool to extract co-word occurrence networks and to generate science map overlays. Overview 2

The topic or semantic coverage of a unit of science can be derived from the text associated with it. Topical aggregations (e.g., over journal volumes, scientific disciplines, or institutions) are common. Topical analysis extracts the set of unique words or word profiles and their frequency from a text corpus. Stop words, such as 'the' and 'of' are removed, and stemming can be applied.stemming Stemming example: fishing, fished, fish, and fisher to the stem “fish” Introduction to Topic Analysis 3

What is a Network? Graph – network visualized Nodes (vertices) Edges (links) Introduction to Network Analysis 4

Graph Metrics - Nodes Betweenness Centrality – number of shortest paths a node sits between Degree Isolates Introduction to Network Analysis 5

Graph Metrics - Edges Shortest paths – shortest distance between two nodes Weight – strength of tie Directionality – is the connection one-way or two- way (in-degree vs. out-degree)? Bridge – deleting would change structure Introduction to Network Analysis 6

Extracting Word Co-Occurrence Network from SDB Data Word Co-Occurrence Network from the abstracts of articles from MEDLINE with the keyword “mesothelioma” in the title… 7

Extracting Word Co-Occurrence Network from SDB Data If you have registered for the Scholarly Database then go to and login… If you do not have an account type in and nwb for the 8

Extracting Word Co-Occurrence Network from SDB Data Do a keyword search in Title for “mesothelioma” and check MEDLINE… 9

Extracting Word Co-Occurrence Network from SDB Data We will only download the first 1000 results to minimize the runtime for the algorithms used in this workflow. Make sure to check MEDLINE master table since that will have all of the bibliographic data we need for this analysis. Your download limit will initially capped at 2000 records at a time. To increase this limit, please 10

Extracting Word Co-Occurrence Network from SDB Data Save the file somewhere on your computer for use later in this tutorial… 11

Extracting Word Co-Occurrence Network from SDB Data Extract Word Co-Occurrence The topic similarity of basic and aggregate units of science can be calculated via an analysis of the co-occurrence of words in associated texts. Units that share more words in common are assumed to have higher topical overlap and are connected via linkages and/or placed in closer proximity. Extract Word Co-Occurrence Network creates a weighted network where each node is a word and edges connect words to each other, where the strength of an edge represents how often two words occur in the same body of text together. This algorithm is a shortcut for extracting a directed network using Extract Directed Network, and then performing bibliographic coupling using Extract Reference Co-Occurrence (Bibliographic Coupling) Network. Extract Directed NetworkExtract Reference Co-Occurrence (Bibliographic Coupling) Network 12

Open Sci2 and load the MEDLINE_master_table.csv file as a Standard CSV file… Extracting Word Co-Occurrence Network from SDB Data 13

Normalize the titles by running Preprocessing > Topical > Lowercase, Tokenize, Stem, and Stopword Text and select “Abstract”…Preprocessing > Topical > Lowercase, Tokenize, Stem, and Stopword Text Extracting Word Co-Occurrence Network from SDB Data 14

Extracting Word Co-Occurrence Network from SDB Data Run Extract Word Co-Occurrence Network and set the parameters as shown below…Extract Word Co-Occurrence Network 15

Extracting Word Co-Occurrence Network from SDB Data To see more information about your network run Analysis > Networks > Network Analysis Toolkit…Analysis > Networks > Network Analysis Toolkit 16

The results from Network Analysis Toolkit show that there are 579 isolated nodes… Extracting Word Co-Occurrence Network from SDB Data This graph claims to be undirected. Nodes: 4846 Isolated nodes: 1000 Node attributes present: label, references Edges: No self loops were discovered. No parallel edges were discovered. Edge attributes: Did not detect any nonnumeric attributes. Numeric attributes: minmaxmean weight This network seems to be valued. Average degree: This graph is not weakly connected. There are 1001 weakly connected components. (1000 isolates) The largest connected component consists of 3846 nodes. Did not calculate strong connectedness because this graph was not directed. Density (disregarding weights): Additional Densities by Numeric Attribute 17

Extracting Word Co-Occurrence Network from SDB Data 18 Delete the isolate nodes by running Preprocessing > Networks > Delete IsolatesPreprocessing > Networks > Delete Isolates

Extracting Word Co-Occurrence Network from SDB Data Apply Visualization > Networks > DrL (VxOrd) and words that are similar will be plotted relatively close to each other. Set the parameters to those shown below…Visualization > Networks > DrL (VxOrd) 19

Extracting Word Co-Occurrence Network from SDB Data Laying out the network with Drl (VxOrd) may take some time, but once the algorithm is complete you will want to keep only the strongest edges, so select the “Laid out with DrL” and run Preprocessing > Networks > Extract Top Edges using the parameters shown below…Preprocessing > Networks > Extract Top Edges 20

Extracting Word Co-Occurrence Network from SDB Data Once edges have been removed, the network "top 1000 edges by weight" can be visualized by running Visualization > Networks > GUESS…Visualization > Networks > GUESS 21

In order to make use of the DrL (VxOrd) force directed layout we applied, we need to change to the interpreter at the bottom of the screen and type in the following commands… Extracting Word Co-Occurrence Network from SDB Data 22

Note, GUESS will not necessarily display the graph in the middle of the screen, you may have to scroll around the screen to find the graph. Extracting Word Co-Occurrence Network from SDB Data 23

Overlaying ISI Data on the Map of Science via Journals The Map of Science is a visual representation of 554 sub-disciplines within 13 disciplines of science and their relationships to one another, shown as points and lines connecting those points respectively. Over top this visualization is drawn the result of mapping a dataset's journals to the underlying sub- discipline(s) those journals contain. Mapped sub-disciplines are shown with size relative to the number matching journals and color from the discipline. For more information on maps of science, see As of the Sci2 v1.0 alpha release there is a plugin for Sci2 that allows users to visualize their own data overlaid on the Map of Science 24

Load the FourNetSciResearchers.isi file in the ISI flat format… Overlaying ISI Data on the Map of Science via Journals 25

Overlaying ISI Data on the Map of Science via Journals To visualize dataset overlaid on the Map of Science run Visualization > Topical > Map of Science via JournalsVisualization > Topical > Map of Science via Journals 26

Overlaying ISI Data on the Map of Science via Journals 27

The journals titles are used to determine which records fit into what subdiscipline. You can view the journal titles found and those not found from the data manager of Sci2. A single journal can belong to more than one subdiscipline and thus so can the record associated with that journal. So the circle sizes are proportional to the number of fractionally assigned records. Overlaying ISI Data on the Map of Science via Journals 28

29 Questions?