Visual analytic tools for monitoring and understanding the emergence and evolution of innovations in science & technology Links from this talk: bit.ly/stmwant.

Presentation transcript:

Visual analytic tools for monitoring and understanding the emergence and evolution of innovations in science & technology. Links from this talk: bit.ly/stmwant. Cody Dunne, Dept. of Computer Science and Human-Computer Interaction Lab, University of Maryland, cdunne@cs.umd.edu. OECD KNOWINNO Workshop, November 14-15, 2011, Alexandria, VA, USA. QR codes from: http://chart.apis.google.com/chart?chs=400x400&cht=qr&chl=http://bit.ly/stmwant

Outline: 1. Academic literature exploration. 2. Case study: Tree visualization techniques. 3. Case study: Business intelligence news. 4. Case study: Pennsylvania innovations. 5. STICK approach.

1. Academic literature exploration. Users are looking for: foundations; emerging research topics; state of the art and open problems; collaborations & relationships between communities; field evolution; easily understandable surveys.

Action Science Explorer

User requirements. Control over the paper collection: choose a custom subset via query, then iteratively drill down, filter, & refine. Overview either as visualization or text statistics: orient within the subset. Easy-to-understand metrics for identifying interesting papers: ranking & filtering. Create groups & annotate with findings: organize the discovery process; share results.

Action Science Explorer: bibliometric lexical link mining to create a citation network and citation context; network clustering and multi-document summarization to extract key points; potent network analysis and visualization tools. www.cs.umd.edu/hcil/ase

2. Case study: Tree visualization. Problem: traditional 2D node-link diagrams of trees become too large. Solutions: Treemaps (nested rectangles); Cone Trees (3D interactive animations); Hyperbolic Trees (focus + context). Measures: papers, articles, patents, citations, …; press releases, blog posts, tweets, …; users, downloads, sales, …

Treemaps: nested rectangles www.cs.umd.edu/hcil/treemap-history
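As a rough illustration of how nested rectangles can be produced, the sketch below uses slice-and-dice layout, which alternates the split direction at each tree level. It is an assumption-laden example, not the layout used by any tool shown in this talk: the `(label, size, children)` tuple format, the function name, and the toy tree are all hypothetical.

```python
def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Lay out a tree as nested rectangles.

    node is a hypothetical (label, size, children) tuple.
    Returns a list of (label, x, y, w, h) rectangles, parents before children.
    """
    if out is None:
        out = []
    label, _size, children = node
    out.append((label, x, y, w, h))
    total = sum(child[1] for child in children)
    offset = 0.0  # fraction of the parent already consumed by earlier siblings
    for child in children:
        share = child[1] / total if total else 0.0
        if depth % 2 == 0:
            # Even depth: place children side by side (split the width).
            slice_and_dice(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:
            # Odd depth: stack children vertically (split the height).
            slice_and_dice(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out


# Toy hierarchy (made-up sizes): root has two children, "a" has two leaves.
tree = ("root", 8, [("a", 4, [("a1", 2, []), ("a2", 2, [])]), ("b", 4, [])])
rects = slice_and_dice(tree, 0, 0, 100, 50)
for rect in rects:
    print(rect)
```

Each child receives an area proportional to its size, and every rectangle lies inside its parent, which is what makes the picture "nested".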

SmartMoney MarketMap, Feb 27, 2007. The green cell is A&P trading company, which bought the Pathmark chain of stores; it was up 6% that day. smartmoney.com/marketmap

Cone trees: 3D interactive animations Robertson, G. G., Card, S. K., and Mackinlay, J. D., Information visualization using 3D interactive animation, Communications of the ACM, 36, 4 (1993), 51-71. Robertson, G. G., Mackinlay, J. D., and Card, S. K., Cone trees: Animated 3D visualizations of hierarchical information, Proc. ACM SIGCHI Conference on Human Factors in Computing Systems, ACM Press, New York, (April 1991), 189-194.

Hyperbolic trees: focus & context Lamping, J. and Rao, R., Laying out and visualizing large trees using a hyperbolic space, Proc. 7th Annual ACM symposium on User Interface Software and Technology, ACM Press, New York (1994), 13-14. Lamping, J., Rao, R., and Pirolli, P., A focus+context technique based on hyperbolic geometry for visualizing large hierarchies, Proc. SIGCHI Conference on Human Factors in Computing Systems, ACM Press, New York (1995), 401-408.

Tree visualization publishing (charts). Legend: TM = Treemaps, CT = Cone Trees, HT = Hyperbolic Trees. Panels: Trade Press Articles, Academic Papers, Patents.

Tree visualization citations (charts). Legend: TM = Treemaps, CT = Cone Trees, HT = Hyperbolic Trees. Panels: Academic Papers, Patents.

Insights: Emerging ideas may benefit from open access. Compelling demonstrations with familiar applications help. There are many components to commercial success. 2D visualizations with spatial stability have been successful. Term disambiguation & data cleaning are hard. Shneiderman, B., Dunne, C., Sharma, P. & Wang, P. (2011), "Innovation trajectories for information visualizations: Comparing treemaps, cone trees, and hyperbolic trees", Information Visualization. http://www.cs.umd.edu/localphp/hcil/tech-reports-search.php?number=2010-16

3. Case study: Business intelligence news
ProQuest 2000-2009, term frequency:
hyperion: 3122
data mining: 889
business intelligence: 434
knowledge mgmt.: 221
data warehouse: 207
data warehousing: 139
cognos: 112
competitive intelligence: 86
electronic data itrch.: 69
decision support system: 39
business process reengineering: 36
data mart: 29
business analytics: 21
text mining: 19
predictive analytics: 18
business performance mgmt: 6
online analytical processing: 5
knowledge discovery in database: 1
meta data, ad hoc query: frequencies not shown

PQ Business Intelligence 2000-2009: co-occurrence of concepts with organizations (line chart). Concept: Data Mining. Series: National Security Agency, NSA, White House, FBI, AT&T, American Civil Liberties Union, Electronic Frontier Foundation, Dept. of Homeland Security, CIA. Axes: Frequency, Year.

PQ Business Intelligence 2000-2009: co-occurrence of organizations (line chart). Series: NSA, Natl. Security Agency, White House, AT&T, FBI, EFF, ACLU, CIA, Pentagon. Axes: Frequency, Year.

Business Intelligence 2000-2009 Matrix showing Co-Occurrence of concepts and orgs.

Business Intelligence 2000-2009: (subset)

Business Intelligence 2000-2009: Data mining: NSA, CIA, FBI, White House, Pentagon, DOD, DHS, AT&T, ACLU, EFF, Senate Judiciary Committee

Business Intelligence 2000-2009: Tech1: Google, Yahoo, Stanford, Apple. Tech2: IBM, Cognos, Microsoft, Oracle. Finance: NASDAQ, NYSE, SEC. NCR, MicroStrategy.

Business Intelligence 2000-2009: Air Force, Army, Navy, GSA, UMD*

Insights: Useful groupings in PQ BI terms based on events and long-term collaborators. Interactive line charts are useful for looking at co-occurrence relationships over time. Clustered heatmaps are useful for overall co-occurrence relationships. stick.ischool.umd.edu

4. Case study: Pennsylvania innovations. Innovation relationships during 1990: state & federal funding; patents (both strong and weak ties); location. Connecting: state & federal agencies; universities; firms; inventors.

Legend: Patent Tech; SBIR (federal); PA DCED (state); Related patent. 2: Federal agency; 3: Enterprise; 5: Inventors; 9: Universities; 10: PA DCED; 11/12: Phil/Pitt metro cnty; 13-15: Semi-rural/rural cnty; 17: Foreign countries; 19: Other states.

Figure labels: Pharmaceutical/Medical; No Location; Philadelphia; Navy; Pharmaceutical/Medical; Pittsburgh Metro; Westinghouse Electric. Legend: Patent Tech; SBIR (federal); PA DCED (state); Related patent. 2: Federal agency; 3: Enterprise; 5: Inventors; 9: Universities; 10: PA DCED; 11/12: Phil/Pitt metro cnty; 13-15: Semi-rural/rural cnty; 17: Foreign countries; 19: Other states.

Insights: Meta-layouts are useful for showing groups (clusters, attributes, manual) and the relationships between them. User comments: "We've never been able to see anything like this"; "This is going to be huge". www.terpconnect.umd.edu/~dempy/

5. STICK approach. NSF SciSIP Program: Science of Science & Innovation Policy. Goal: a scientific approach to science policy. The STICK Project: Science & Technology Innovation Concept Knowledge-base. Goal: Monitoring, Understanding, and Advancing the (R)Evolution of Science & Technology Innovations.

STICK approach (cont.): A scientific, data-driven way to track innovations, versus current expert-based, time-consuming approaches (e.g., Gartner's Hype Cycle, tire track diagrams). Includes both concept and product forms. Study the innovation ecosystem: relationships between organizations & people, both those producing and those using innovations. stick.ischool.umd.edu

STICK Process (overview). Identify concepts: business intelligence, cloud computing, customer relationship management, health IT, web 2.0, electronic health records, biotech. Query data sources: news, dissertation, academic, patent, blogs. Processing: automatic entity recognition; crowd-sourced verification; co-occurrence networks. Visualizing & analyzing: overall statistics; network evolution. Sharing results.

Process (diagram labels): Cleaning, Collecting, Processing, Visualizing & Analyzing, Collaborating.

Collecting. Identify concepts: begin with target concepts (Business Intelligence, Health IT, Cloud Computing, Customer Relationship Management, Web 2.0, Personal Health Records, Nanotechnology); develop 20-30 sub-concepts from domain experts and wikis. Data sources: news, dissertation, academic, patent, blogs.

Collecting (2). Form & expand queries, for example: ABS( "customer relationship management" OR "customers relationship management" OR "customer relation management" ) OR TEXT(…) OR SUB(…) OR TI(…). Scrape results.
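Query expansion of this kind is easy to script. The following is a minimal sketch, assuming that every field clause repeats the same list of phrase variants; the slide elides the contents of TEXT, SUB, and TI, so that reuse is an assumption, and the function names are hypothetical.

```python
def field_clause(field, phrases):
    """Build one clause such as ABS( "a" OR "b" ) from a field code and phrases."""
    quoted = " OR ".join(f'"{phrase}"' for phrase in phrases)
    return f"{field}( {quoted} )"


def expand_query(fields, phrases):
    """OR together one clause per field.

    Assumption: each field searches the same phrase variants.
    """
    return " OR ".join(field_clause(field, phrases) for field in fields)


# Phrase variants taken from the slide; field codes as they appear there.
variants = [
    "customer relationship management",
    "customers relationship management",
    "customer relation management",
]
query = expand_query(["ABS", "TEXT", "SUB", "TI"], variants)
print(query)
```

Keeping the variants in one list means a new spelling only has to be added once to reach every field.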

Processing. Automatic entity recognition: BBN IdentiFinder. Crowd-sourced verification: extract the most frequent 25% of organization names; assign to CrowdFlower; workers check organization names and sample sentences. Each worker (MTurker) is given a string and the sentence in which it appears, and marks whether the string is an organization name. Three workers vote on each automatically extracted name together with a sentence containing it; if at least two vote that it is an organization name, it is kept as a correctly formatted name. People's names do not appear to be used in relation extraction at present.
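The verification rule can be sketched in a few lines. This is an illustration under stated assumptions, not the project's code: "most frequent 25%" is read here as the top quarter of distinct names by mention count, the function names are hypothetical, and the sample names and votes are made up.

```python
from collections import Counter


def top_fraction(names, fraction=0.25):
    """Return the most frequent `fraction` of distinct names, most frequent first.

    Assumption: the 25% cut is taken over distinct names, not over mentions.
    """
    counts = Counter(names)
    keep = max(1, int(len(counts) * fraction))
    return [name for name, _ in counts.most_common(keep)]


def verified(votes, min_yes=2):
    """Keep names that at least `min_yes` workers marked as an organization."""
    return [name for name, ballots in votes.items() if sum(ballots) >= min_yes]


# Made-up extractor output: one entry per mention found in the text.
extracted = (
    ["IBM"] * 4 + ["Monday"] * 3 + ["Oracle"] * 2
    + ["Cognos", "Microsoft", "SEC", "NYSE", "Google"]
)
candidates = top_fraction(extracted)

# Made-up ballots from three workers per candidate (True = "is an organization").
votes = {"IBM": [True, True, True], "Monday": [False, True, False]}
accepted = verified(votes)
print(candidates, accepted)
```

With three voters and a threshold of two, the rule is a simple majority vote, so one mistaken worker cannot change the outcome on their own.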

Processing (2). Compute co-occurrence networks: overall edge weights; slice by time to see network evolution. Output: CSV, GraphML.
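A co-occurrence network of this kind can be sketched as follows. The example assumes that an edge weight is the number of documents mentioning both entities and that time slices are calendar years; both are assumptions, as are the function names and the toy documents. Only CSV output is shown, using the standard library.

```python
import csv
import io
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence(docs):
    """Count how many documents mention each unordered pair of entities."""
    weights = Counter()
    for entities in docs:
        # sorted(set(...)) gives each pair one canonical order and ignores repeats.
        for a, b in combinations(sorted(set(entities)), 2):
            weights[(a, b)] += 1
    return weights


def cooccurrence_by_year(dated_docs):
    """Build one co-occurrence network per year to follow network evolution."""
    by_year = defaultdict(list)
    for year, entities in dated_docs:
        by_year[year].append(entities)
    return {year: cooccurrence(docs) for year, docs in sorted(by_year.items())}


def to_csv(weights):
    """Serialize an edge list as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["source", "target", "weight"])
    for (a, b), weight in sorted(weights.items()):
        writer.writerow([a, b, weight])
    return buf.getvalue()


# Toy input (made up): (year, entities recognized in one article).
dated_docs = [
    (2005, ["data mining", "NSA", "AT&T"]),
    (2006, ["data mining", "NSA"]),
    (2006, ["data mining", "NSA", "ACLU"]),
]
overall = cooccurrence(entities for _, entities in dated_docs)
slices = cooccurrence_by_year(dated_docs)
print(to_csv(overall))
```

The overall network and the per-year slices come from the same counting routine, so the yearly weights of an edge always sum to its overall weight.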

Visualizing & Analyzing. Spotfire: import CSV, database; standard charts; multiple coordinated views; highly scalable. NodeXL: import CSV, Spigots, GraphML; Automate feature for batch analysis & visualization; Excel 2007/2010 template.

Shared data & analysis repositories. Online research community: share data, tools, results; data & analysis downloads; Spotfire Web Player; communication; co-creation, co-authoring. stick.ischool.umd.edu/community

Ongoing Work. Collecting: additional data sources and queries. Processing: improving entity recognition accuracy. Visualizing & Analyzing: visualizing network evolution; co-occurrence network sliced by time. Collaborating: develop the STICK Open Community site; motivate user participation; improve the resources available; invitation-only testing.

Outline: 1. Academic literature exploration: citation networks and text summarization. 2. Case study: Tree visualization techniques: papers, patents, and trade press articles. 3. Case study: Business intelligence news: news term co-occurrence. 4. Case study: Pennsylvania innovations: patents, funding, and locations. 5. STICK approach: tracking innovations across papers, patents, news articles, and blog posts.

Take Away Messages. Easier scientific, data-driven innovation analysis: automatic collection & processing of innovation data; easy access to visual analytic tools for finding clusters, trends, outliers; communities for sharing data, tools, & results.

Visual analytic tools for monitoring and understanding the emergence and evolution of innovations in science & technology. Links from this talk: bit.ly/stmwant. Cody Dunne, Dept. of Computer Science and Human-Computer Interaction Lab, University of Maryland, cdunne@cs.umd.edu. This work has been partially supported by NSF grants IIS 0705832 (ASE) and SBE 0915645 (STICK). QR codes from: http://chart.apis.google.com/chart?chs=400x400&cht=qr&chl=http://bit.ly/stmwant