Katy IUB 2003.01.17. Analysis and Visualization of Knowledge Domains

Presentation transcript:

Introduction
My main research area is Information Visualization and, more recently, the Visualization of Knowledge Domains.
Analysis and Visualization of Knowledge Domains draws on research in:
- Information Retrieval
- Data Analysis / Data Mining / Knowledge Discovery
- Bibliometrics / Scientometrics / Webometrics
- Information Visualization & Interaction Design
- Visual Perception, Human-Computer Interaction
- Philosophy of Science (for interpretation)
- Knowledge Elicitation (for evaluation)

Analysis and Visualization of Knowledge Domains
- Requires large amounts of data (e.g., publications, patents, and grants) and advanced analysis and visualization techniques.
- Can be utilized to objectively identify major research areas, experts, institutions, grants, publications, journals, etc.
- Can be utilized to identify interconnections, the import and export of research between fields, the dynamics (speed of growth, diversification) of scientific fields, scientific and social networks, and the impact of strategic and applied research funding programs, among others.

Knowledge Domain Visualizations in Education
- [4.75] Provide an overview of the knowledge domain areas covered in the course (road map).
- [4.25] Show the interrelationship of the course topic to other related knowledge domains.
- [3.25] Identify major experts and their fields of expertise.
- [3.5] Identify major publications, what they cite, and who cites them.
- [4.25] Show the influence of one theory on subsequent work.
- [3.75] Show the evolution of the knowledge domain (based on publications) over the last, say, 10 years.
- [4.0] Show research frontiers (e.g., based on areas containing young but highly cited papers).
- [4.5] Find research related to a certain topic, e.g., learning.
- [3.75] Identify material to be covered in a talk, course, textbook, encyclopedia, etc.

Domain Visualizations are Facilitated by
- The explosion of information available digitally.
- Decreasing cost of storage and computing power.
- Larger hard disk sizes easing fast access to data.
- Fast graphics processors.
- High resolution color monitors.
- Expanding connectivity between systems.
- The mismatch between computer displays and the human perceptual system.
- The mismatch between computer controls and human motor functions.

Sample Applications
- "Science of science": to study science with scientific means (Price, 1965).
- Synthesis of specialty narratives from co-citation clusters (Small, 1986).
- Detect advances of scientific knowledge via "longitudinal mapping" (Garfield, 1994).
- Knowledge discovery in unconnected literature (Swanson & Smalheiser, 1997).
- Identify cross-disciplinary fertilization via "passages through science" (Small, 1999, 2000).
- Understand scholarly information foraging (Sandstrom, 2001).

- Determine areas of expertise for a specific researcher or research group via "invisible colleges" (note that a researcher's self-definition might differ from how the field defines him/her) (Crane, 1972).
- Identify profiles of authors, also called CAMEOS, to be used for document retrieval, to map an author's subject matter and study his/her publishing career, or to map the social and intellectual networks evident in citations to and from authors and in co-authorships (White, 2001).
- Learn how to write highly cited papers (van Dalen & Henkens, 2001).

- Identification of scientific frontiers: frontiers.com/
- ISI's Essential Science Indicators.
- Import-export studies (Stigler, 1994).
- Evaluation of 'big science' facilities using 'converging partial indicators' (Martin, 1996; Martin & Irvine, 1983).
- Input (levels of funding, expertise of scientists, facilities used) - output (publications, patents, Nobel prizes, improved health, reduced environmental insults, etc.) studies, influenced by political, economic, financial, and legal factors (Kostroff & DelRio, 2001).
- Determine influence of funding on research output (Boyack & Börner, 2002).

Visualizing Knowledge Domains
The content and figures in this part were taken from:
Katy Börner, Chaomei Chen, & Kevin Boyack (2003) Visualizing Knowledge Domains. In Blaise Cronin (Editor), Annual Review of Information Science and Technology, Volume 37. Medford, NJ: Information Today, Inc./American Society for Information Science and Technology. Pp (published November, 2002).
Color versions of many images can be found at res.html

The paper aims
(1) To give a literature review of research on 'visualizing knowledge domains' by applying knowledge domain visualization techniques to analyze and visualize this domain.
(2) To provide a tutorial on how to design visualizations of knowledge domains.
Unique feature: Utilizes the ARIST data set to compare different approaches.

Design = Analysis + Visualization + Interaction

The ARIST Data Set

SEARCH TERM USED                                                Number of matching articles
Topic Citation Analysis:
  citation analysis                                             596
  cocitation OR co-citation                                     177
  co-occurrence AND (term OR word)                              77
  co-term OR co-word                                            52
  science map[ping] OR mapping science OR map[ping] of science  32
Topic Semantics:
  semantic analysis OR semantic index OR semantic map           331
Topic Bibliometrics:
  bibliometric                                                  818
  scientometric                                                 327
Topic Visualization:
  data visualization OR visualization of data                   275
  information visualization OR visualization of information     113
  scientific visualization                                      268

Retrieved from Science Citation Index (SCI) and Social Science Citation Index (SSCI). The 2764 unique articles match citation analysis, semantics, bibliometrics, and visualization related terms in titles, abstracts, and terms for the years 1977-July 27,

Numbers of articles in the ARIST data set by year with terms (ISI keywords) or abstracts.

Number of articles by journal in the ARIST set (10 or more articles per journal).

The figure shows a dramatic increase in publishing in citation analysis and bibliometrics starting in the late 1980s, and the birth of information visualization (IV) around the same time. Citation counts dropped throughout the 1990s. The most recent articles are cited infrequently because of their young age.

The Importance of Good Data
It is extremely important to choose an appropriate data source for retrieval, one whose data are likely to provide answers to the questions one wishes to answer using domain visualization.
Limitations of the ARIST Data Set
- No abstracts or terms prior to
- Terms are available for only 71%.
- Abstracts are available for 81% of the articles published since
- Limited book, journal, conference coverage.
- No patents, policy changes, media coverage, Nobel prizes, quality of graduate programs, ...

Structure of Knowledge Domain Visualization Research (based on the ARIST data set)
Three different kinds of visualizations:
1. GSA/StarWalker use Principal Component Analysis to break down the domain into components.
2. ET-Maps and Cartographic Self Organizing Maps display overall domain structure as adjacent regions.
3. VxInsight uses a modified Force Directed Placement algorithm named VxOrd to display a 'data landscape'.
The different visualizations provide different views of the domain and enable a comparison of algorithms.
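To make the idea of force directed placement concrete, here is a minimal sketch of a generic spring-style layout in the spirit of Fruchterman-Reingold. It is not VxOrd, whose modifications are not described in these slides; the function name, parameters, and toy graph are all hypothetical.

```python
import math
import random

def force_directed_layout(nodes, edges, iterations=200, seed=0):
    """Generic force directed placement (NOT VxOrd): nodes repel, edges attract."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    k = 1.0 / math.sqrt(len(nodes))  # assumed ideal edge length
    for step in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsive force between every pair of nodes.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                force = k * k / dist
                disp[a][0] += dx / dist * force
                disp[a][1] += dy / dist * force
                disp[b][0] -= dx / dist * force
                disp[b][1] -= dy / dist * force
        # Attractive force along each edge, scaled by the edge weight.
        for (a, b), weight in edges.items():
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            force = weight * dist * dist / k
            disp[a][0] -= dx / dist * force
            disp[a][1] -= dy / dist * force
            disp[b][0] += dx / dist * force
            disp[b][1] += dy / dist * force
        # Cap each move by a "temperature" that cools linearly to zero.
        temperature = 0.1 * (1.0 - step / iterations)
        for n in nodes:
            length = max(math.hypot(disp[n][0], disp[n][1]), 1e-9)
            move = min(length, temperature)
            pos[n][0] += disp[n][0] / length * move
            pos[n][1] += disp[n][1] / length * move
    return pos

# Hypothetical toy graph: two linked pairs of documents.
toy_nodes = ["d1", "d2", "d3", "d4"]
toy_edges = {("d1", "d2"): 1.0, ("d3", "d4"): 1.0}
layout = force_directed_layout(toy_nodes, toy_edges)
```

In a landscape-style display, the density of points around each position could then be rendered as terrain height; that rendering step is outside this sketch.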

(1) GSA/StarWalker
- Author co-citation analysis
- Document co-citation analysis
Procedure:
- Select a set of highly cited authors/documents (at least 10 citations).
- Compute co-citation frequencies.
- Apply Pathfinder Network Scaling to determine interconnectivity structure.
- Apply factor analysis to define intellectual groupings (e.g., mapping science, social studies of science, bibliometrics).
- Determine and display citation impact factor atop the intellectual groupings.
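The first two steps of the procedure can be sketched as follows. This is a minimal illustration, assuming each citing paper is available as a list of cited labels; the function name, the threshold value, and the toy labels other than Small73 are hypothetical, and Pathfinder scaling and factor analysis are not shown.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(reference_lists, min_citations=1):
    """Return (citation counts, co-citation counts) for items above a threshold."""
    # Count citations per item; each citing paper counts at most once.
    citations = Counter(item for refs in reference_lists for item in set(refs))
    # Step 1: keep only items that reach the citation threshold.
    kept = {item for item, n in citations.items() if n >= min_citations}
    # Step 2: count how often two kept items share a reference list.
    pairs = Counter()
    for refs in reference_lists:
        selected = sorted(set(refs) & kept)
        for a, b in combinations(selected, 2):
            pairs[(a, b)] += 1
    return citations, pairs

# Hypothetical toy data: each inner list is one citing paper's reference list.
toy_papers = [
    ["Small73", "Garfield79", "Price65"],
    ["Small73", "Garfield79"],
    ["Small73", "White81"],
]
citations, pairs = cocitation_counts(toy_papers, min_citations=2)
```

With the threshold set to 2, only Small73 and Garfield79 survive selection, and they are co-cited by two of the three toy papers.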

The Author Co-citation Map ( ) consists of 380 authors with 9 or more citations. The map is dominated by the largest specialty, citation indexing. No strong concentration of other specialties is found, which suggests that the domain is diverse.
Color code:
- Red: mapping science
- Green: social studies of science
- Blue: bibliometrics
The three factors cumulatively explain 63% of the variance.

Landscape View of Author Co-citation Map
The height of a citation bar indicates the number of citations for the corresponding author. The spectrum of colors on each citation bar shows the time when citations were made. Authors with more than 50 citations are displayed with semi-transparent labels.

The Document Co-citation Analysis Map
The height of a bar represents the number of citations to a publication. Labels indicate articles in clusters, for example, Small73 for an article by Small in 1973. Multiple publications within the same year are not distinguished at this level. For example, Small73 includes all of Small's publications in 1973.

The Document Co-citation Analysis Map
Top-down view. Hand labeling of major clusters.
Color code:
- Red: mapping science
- Green: social studies of science
- Blue: bibliometrics

(2a) ET Map of ARIST Data Set by Bin Zhu and Hsinchun Chen, U Arizona
- A 10 x 10 node map was trained using the ID/keyword data of the ARIST data set.
- After training, each node is associated with a list of documents that are semantically similar to each other.
- Each document list is labeled by its most frequently occurring keyword.
- Spatial proximity on the map indicates semantic proximity.
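The training and labeling steps can be illustrated with a minimal self-organizing map. This is a generic sketch, not the ET-Map implementation; the grid size, decay schedules, function names, and toy keyword vectors are assumptions made for illustration.

```python
import math
import random
from collections import Counter

def best_matching_unit(weights, vector):
    """Grid node whose weight vector is closest to the input vector."""
    return min(weights, key=lambda node: sum(
        (w - x) ** 2 for w, x in zip(weights[node], vector)))

def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols SOM on equally sized numeric vectors."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # learning rate decays
        radius = max(radius0 * (1.0 - frac), 0.5)  # neighbourhood shrinks
        for vector in rng.sample(vectors, len(vectors)):
            bmu = best_matching_unit(weights, vector)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                influence = math.exp(-d2 / (2.0 * radius * radius))
                for i in range(dim):
                    w[i] += lr * influence * (vector[i] - w[i])
    return weights

def label_nodes(weights, vectors, keyword_lists):
    """Label each occupied node with its documents' most frequent keyword."""
    counts = {}
    for vector, keywords in zip(vectors, keyword_lists):
        node = best_matching_unit(weights, vector)
        counts.setdefault(node, Counter()).update(keywords)
    return {node: c.most_common(1)[0][0] for node, c in counts.items()}

# Hypothetical toy data: binary keyword vectors over a three-word vocabulary.
vocab = ["citation", "semantic", "visualization"]
toy_keywords = [["citation"], ["citation"], ["semantic"], ["visualization"]]
toy_vectors = [[1.0 if v in kws else 0.0 for v in vocab] for kws in toy_keywords]
som = train_som(toy_vectors, rows=3, cols=3, epochs=20)
assignment = [best_matching_unit(som, v) for v in toy_vectors]
labels = label_nodes(som, toy_vectors, toy_keywords)
```

Documents with identical keyword vectors always land on the same node, which is what lets each node carry a document list and a label.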

The size of the subject area is not necessarily related to the number of documents in an ET-map, but rather denotes the amount of space between areas based on the number of nodes used to generate the map.

(2b) SOM Map of ARIST Data by André Skupin, U New Orleans
- SOMs are used to generate domain visualizations in cartographic fashion.
- A 40 x 55 node SOM was trained based on the ID/keyword list of the ARIST data set.
- ArcGIS is used to generate the visualization.
- The dominance of a cluster corresponds to the number of articles it contains.
- Higher elevation (i.e., percentage) indicates a very organized, focused, and coherent portion of the information space.
- Labels are automatically assigned based on highly frequent keywords and are drawn within ArcGIS.

SOM Map of ARIST Data by André Skupin

(3) VxInsight, Sandia National Labs
Next slides show:
1. VxInsight citation maps of ARIST data for four different time segments.
2. VxInsight co-term and LSA maps of ARIST data.
3. VxInsight co-classification map of ARIST data.
4. Comparison of maps.
Dot color legend: WHITE: citation analysis, GREEN: bibliometrics, BLUE: semantics, MAGENTA: visualization.

VxInsight Interface

1. VxInsight citation maps of ARIST data for four different time segments
A citation-based map using direct and co-citation linkages, following the combined linkage method of Small (1997) with a direct:cocitation weighting factor of 20:1. The maps show the growth of the different areas.
Dot color legend: WHITE: citation analysis, GREEN: bibliometrics, BLUE: semantics, MAGENTA: visualization.
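One simple reading of a 20:1 direct:cocitation weighting is sketched below. It is an assumption made for illustration only: the exact formulation and any normalization used in Small (1997) are not given in these slides and are not reproduced here. The function name and the toy paper ids are hypothetical.

```python
from collections import Counter
from itertools import combinations

def combined_linkage(citations, direct_weight=20, cocitation_weight=1):
    """Link weights from direct citations plus co-citations (assumed additive).

    `citations` maps each citing paper id to the list of paper ids it cites.
    """
    weights = Counter()
    for citing, cited in citations.items():
        cited = sorted(set(cited) - {citing})
        # Direct linkage: the citing paper and each paper it cites.
        for target in cited:
            weights[tuple(sorted((citing, target)))] += direct_weight
        # Co-citation linkage: every pair of papers cited together.
        for a, b in combinations(cited, 2):
            weights[(a, b)] += cocitation_weight
    return weights

# Hypothetical toy citation data.
toy_citations = {"P3": ["P1", "P2"], "P4": ["P1", "P2", "P3"]}
linkage = combined_linkage(toy_citations)
```

Here P1 and P3 end up with weight 21: one direct citation (20) plus one co-citation by P4 (1). The resulting pair weights could serve as edge weights for a layout algorithm.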

A: bibliometrics
B: visualization
C: semantic analysis
D: citation analysis, bibliometrics, visualization are mixed

2. VxInsight co-term and LSA maps of ARIST data
- The co-term map is based on a cosine similarity using ISI keywords.
- LSA was applied over title words to generate a document-by-document similarity matrix. Only similarity values > 0.9 were used in VxOrd FDP to generate the map.
Dot color legend: WHITE: citation analysis, GREEN: bibliometrics, BLUE: semantics, MAGENTA: visualization.
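A minimal sketch of the co-term idea: documents become keyword count vectors, pairs are scored by cosine similarity, and only pairs above a threshold are kept as edges for layout. The 0.9 cut-off is the value quoted above for the LSA map and is reused here purely for illustration; the LSA step itself (a singular value decomposition of the term-document matrix) is omitted. Function names and toy keywords are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similarity_edges(doc_terms, threshold=0.9):
    """Return {(doc_a, doc_b): similarity} for pairs above the threshold."""
    vectors = {doc: Counter(terms) for doc, terms in doc_terms.items()}
    edges = {}
    for a, b in combinations(sorted(vectors), 2):
        score = cosine(vectors[a], vectors[b])
        if score > threshold:
            edges[(a, b)] = score
    return edges

# Hypothetical toy keyword lists.
toy_docs = {
    "d1": ["citation", "analysis", "map"],
    "d2": ["citation", "analysis", "map"],
    "d3": ["visualization", "map"],
}
edges = similarity_edges(toy_docs)
```

Only the pair with identical keyword sets survives the cut-off; d3 shares a single term with the others and is dropped.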

3. VxInsight co-classification map of ARIST data
Based on a cosine similarity from the ISI journal classifications for each article.
Dot color legend: WHITE: citation analysis, GREEN: bibliometrics, BLUE: semantics, MAGENTA: visualization.

4. VxInsight: Comparison of Maps
A: Cartographic-SOM
B: ET-Map
C: Co-term
D: LSA
Dot color legend: WHITE: citation analysis, GREEN: bibliometrics, BLUE: semantics, MAGENTA: visualization.

- Obvious visual differences between layouts.
- SOMs tend to fill space more uniformly.
- Citation analysis (yellow) and bibliometrics (green) are always found together.
- Visualization (magenta) and semantics (blue) are mostly by themselves.

Strong co-term linkages based on cosine similarity for the three term-based document maps.

Conclusions
- Research on KDVs grows out of semantic analysis/indexing/mapping, citation analysis, bibliometrics, and visualization.
- There is interaction between the groups of researchers and their literature in citation analysis and bibliometrics.
- Visualization and semantics are mostly by themselves.
- KDVs could be beneficially used by diverse user groups and in diverse information seeking tasks, but their design and usage are limited by:
  - Availability of data and scalable code.
  - Required processing power.
  - Complexity/usability of today's KDVs.
Research Collaboration: The TOPIC model by Griffiths and Steyvers is incremental, scalable, and generative. It produces good labels and can hopefully be applied recursively to generate maps of different resolutions.

Acknowledgements
We greatly appreciate the time and effort Bin Zhu, Hsinchun Chen, and André Skupin put into the generation, discussion, and comparison of the ET-Map and Cartographic SOM map. We wish to thank Katherine W. McCain, Blaise Cronin, Ralf Shaw, Henry Small, and Pamela Sandstrom for their very insightful comments. Ben Shneiderman and Alan Porter commented on an earlier version of the ARIST chapter.
We gratefully acknowledge support for this work by The Council for Museums Archives and Libraries in the UK (RE/089), Laboratory Directed Research and Development, Sandia National Laboratories, U.S. Department of Energy (DE-AC04-94AL85000), and an NIH/NIA demonstration fund for Mapping Aging Research.

Challenges and Opportunities
Top Ten List of Challenges (adapted from Chen, 2002)
1. Domain Specific vs. Domain Independent - how much domain knowledge is needed to do the analysis?
2. Quality vs. Timeliness - quality comes from collective expert views expressed in quickly outdated publications.
3. Interdisciplinary Nature - many areas contribute to the analysis, visualization, and interpretation of KDVs.
4. Validation - understanding the strengths and weaknesses of different techniques.
5. Design Metaphor - what metaphors are most effective?
6. Coverage - expand citation indexing databases to cover proceedings, technical reports, etc.

7. Scale-up - algorithmically, but also in terms of design & validation.
8. Automatic Labeling - requires proper classification & categorization.
9. Individual Differences
10. Ethical Constraints - KDVs make it possible to understand scientific networks, the influence of scholars, etc., and hence to quickly gain the knowledge that distinguishes an expert from a newcomer.

In Work: IV/IR Computing Infrastructure at IU
- 1 TB data space connected to parallel computing facilities running diverse data analysis, retrieval, and visualization services.
- Online data, computing, and services access for researchers, educators, and society.
- Will facilitate sharing of code and (derivative) data and the generation of IVs/KDVs from large scale data sets.

About 4,500,000 documents (books, journals, proceedings, doctoral and masters theses, technical reports, patents, grants) covering cross-disciplinary research as well as domain specific documents from Computer Science, Physics, Mathematics, and Medicine.

Opportunities
Knowledge domain visualizations can:
- Become a valuable tool for scientists, philosophers of science, sociologists of knowledge, librarians, government agencies, and others to grasp crucial developments in science and technology.
- Help discover topical relationships, research trends, and complementary capabilities, thereby facilitating research.
- Help translate among disciplines and bridge the gap between mutually unintelligible jargons.
- Study science using the scientific methods of science, as suggested by Derek J. de Solla Price.

Related Conferences, Symposia, ...
- International Symposium on Knowledge Domain Visualization IV02-KDViz at IV 2003, London UK, July 16-18, demon.co.uk/IV02/KDViz.htm
- International Conference on Scientometrics and Informetrics, Beijing, P.R. of China,
- Sackler Colloquium on Mapping Knowledge Domains, NAS' Beckman Center, Irvine, CA, May 9-11,

Common Database
A major goal of the colloquium is to demonstrate and compare different techniques, algorithms, and approaches that can be utilized to map knowledge domains. Registered participants are eligible to utilize the so-called PNAS Data Set. The data set comprises full text documents from the Proceedings of the National Academy of Sciences covering to (148 issues containing some 93,000 journal pages).
Paper Submission
2-page abstract by March 1st,
Student packages are available (registration fee plus $500 support).

Question & Answer Session

Springer will publish a selected set of extended papers from the 2001 and 2002 JCDL workshops on "Visual Interfaces to Digital Libraries" in its Lecture Notes in Computer Science (LNCS) series. The edited book aims to provide comprehensive coverage of the topic to a wider audience.