Data Science for VIVO Dr. Brand Niemann Director and Senior Data Scientist Semantic Community

Slides:



Advertisements
Similar presentations
EndNote Web Reference Management Software (module 5.1)
Advertisements

EndNote Web Reference Management Software (module 5)
Student Getting Started Guide Updated June Ensure that you are connected to the Internet. 2. Launch your web browser (Internet Explorer, Firefox,
Centra Quick Tips Press button or Ctrl Key to speak Use button to ask questions Use button for Yes, button for No Use buttons for feedback - Step Out Text.
SqoolTools and….. Your Own Virtual Classroom What is SqoolTools? SqoolTools is a free, hosted site which allows you to create your own virtual classroom.
Build VIVO in the Cloud NIH Workshop on Value Added Services for VIVO Brand Niemann Semantic Community March 25-26,
1 Using Scopus for Literature Research. 2 Why Scopus?  A comprehensive abstract and citation database of peer- reviewed literature and quality web sources.
1 CIS101 Introduction to Computing Week 01 Professor Catherine Dwyer.
Data Sources & Using VIVO Data Visualizing Scholarship VIVO provides network analysis and visualization tools to maximize the benefits afforded by the.
Corporation For National Research Initiatives NSF SMETE Library Building the SMETE Library: Getting Started William Y. Arms.
1 CIS101 Introduction to Computing Week 01 Professor Catherine Dwyer.
Presented by Janine Termine Welcome to E-Learning.
Introduction to MVNU Technology aka IT 101. Welcome to IT 101! Our plan is to introduce you to a few basic technology items. We designed this to be short.
M AKING E - RESOURCE ACCESSIBLE FROM ONLINE CATALOG *e-books *serials Yan Wang Senior Librarian Head of Cataloging & Database Maintenance Central Piedmont.
State of Connecticut Core-CT Project Query 4 hrs Updated 1/21/2011.
NLM-Semantic Medline Data Science Data Publication Commons Dr. Brand Niemann Director and Senior Data Scientist/Data Journalist Semantic Community Data.
NIST Scientific Data for Data Science United Nations Open Data / Open Government Conference, April 26-28, Abu Dhabi
Federal Big Data Working Group Meetup Dr. Brand Niemann Director and Senior Data Scientist Semantic Community
Linked Data Visualizations for Eurostat Linked Data Dr. Brand Niemann Director and Senior Data Scientist Semantic Community
This is my Kindergarten Electronic Portfolio From ~ Gardner Math, Science, Technology Magnet School Click Here To Begin.
Discovery Education Streaming Overview. Log in Screen.
BY OBAJE, ALFRED MICHAEL (DBA., B.SC., M.L.S.) MEDICAL SCIENCES LIBRARIAN 1 Library E-resources Sensitization and Demonstrations for 400 Level Medical.
ALISE 2014 Conference Jeonghyun Kim & William E. Moen
NASAA EFD: Industry Training THURSDAY– FEBRUARY 05, 2015: 2 PM EST Send any questions to
Introduction. » How the course works ˃Homework ˃Project ˃Exams ˃Grades » prerequisite ˃CSCI 6441: Mandatory prerequisite ˃Take the prereq or get permission.
© Paradigm Publishing, Inc. 5-1 Chapter 5 Application Software Chapter 5 Application Software.
Presented by Janine Termine Welcome to E-Learning.
Computational Scientometrics Studying science by scientific means Dr. Katy Börner Cyberinfrastructure for Network Science Center, Director Information.
Farm Data Dashboards: USDA and Microsoft Innovation Challenge Dr. Brand Niemann Director and Senior Data Scientist/Data Journalist Semantic Community Data.
Presented by Janine Termine Welcome 095 Basic Algebra.
1 Scopus as a Research Tool March Why Scopus?  A comprehensive abstract and citation database of peer-reviewed literature and quality web sources.
Presented by ESC 7 Advanced Academic Services. Click on Set up new account and follow the directions. Return to this page to log in and register for.
This presentation is designed to help assist you in registering and creating an account to do online homework using the MyMathLab program via CourseCompass.
Resource Curation and Automated Resource Discovery.
Scientific Research in Biotechnology 5.03 – Demonstrate the use of the scientific method in the planning and development of an experimental SAE.
What is Cyberinfrastructure? Russ Hobby, Internet2 Clemson University CI Days 20 May 2008.
Temporal Analysis using Sci2 Ted Polley and Dr. Katy Börner Cyberinfrastructure for Network Science Center Information Visualization Laboratory School.
Overviews of ITCS 6161/8161: Advanced Topics on Database Systems Dr. Jianping Fan Department of Computer Science UNC-Charlotte
Introduction. » How the course works ˃Homework ˃Project ˃Exams ˃Grades » prerequisite ˃CSCI 6441: Mandatory prerequisite ˃Take the prereq or get permission.
Syllabus CS479(7118) / 679(7112): Introduction to Data Mining Spring-2008 course web site:
Introduction to Science Informatics Lecture 1. What Is Science? a dependence on external verification; an expectation of reproducible results; a focus.
LHC Database Developers ’ Workshop (2D) January 24 – , CERN.
Scottish Centre for Regeneration (SCR) – Learning Networks quick guide to the online forum platform.
Robin Mullinix Systems Analyst GeorgiaFIRST Financials PeopleSoft Query: The Next Step.
CIS101 Introduction to Computing Week 01. Agenda What is CIS101? Class Introductions Using your Pace Introduction to Blackboard and online learning.
Create speaking avatars and use them as an effective learning tool.
NGA Demo Participant Collaboration Dr. Brand Niemann Director and Senior Enterprise Architect – Data Scientist Semantic Community
VIVO and Scholarly Repositories: Synergistic Opportunities.
LIVE INTERACTIVE YOUR DESKTOP Wednesday, November 17, 2010 ExploraVision Online Resources.
Data Science for NIST Big Data Framework Dr. Brand Niemann Director and Senior Data Scientist/Data Journalist Semantic Community
1 CIS101 Introduction to Computing Week 01 Dr. Catherine Dwyer Information Systems.
ACIS Introduction to Data Analytics & Business Intelligence Database s Benefits & Components.
CSCI 6442 Database Management II INTRODUCTION Copyright 2016 David C. Roberts, all rights reserved.
CIS101 Introduction to Computing Week 01. Agenda What is CIS101? Class Introductions Using your Pace Introduction to Blackboard and online learning.
Data Science for Global Ebola Response Data Dr. Brand Niemann Director and Senior Data Scientist/Data Journalist Semantic Community Data Science Data Science.
Topical Analysis and Visualization of (Network) Data Using Sci2 Ted Polley Research & Editorial Assistant Cyberinfrastructure for Network Science Center.
Presented by Janine Termine Welcome 095 Basic Algebra.
HRM 498 ASSIST Teaching Effectiverly/hrm498assist.com FOR MORE CLASSES VISIT
Copyright 2013 Exostar LLC.| All Rights Reserved.| Proprietary and Confidential1 Identity Proofing Service United Technologies Corporation September 26.
1 e-Resources on Social Sciences: Scopus. 2 Why Scopus?  A comprehensive abstract and citation database of peer-reviewed literature and quality web sources.
Data Sources & Using VIVO Data Visualizing Science VIVO provides network analysis and visualization tools to maximize the benefits afforded by the data.
INF 103 MART Successful Learning/inf103mart.com
Researching for your Literature Review
Janine Von Furst, Instructor American High School 2012
Orientation for Online Study
Introduction to computers
Decision Support Systems Professor Pat Paulson
Citation databases and social networks for researchers: measuring research impact and disseminating results - exercise Elisavet Koutzamani
Presentation transcript:

Data Science for VIVO Dr. Brand Niemann Director and Senior Data Scientist Semantic Community February 25,

Overview Data Fair Port Unconference 2014: – Discussed at Federal Big Data Working Group Meetup March 4th on Joint NSF-NIH Biomedical Big Data Research – Dr. Barend Mons referred to ORCID (Open Researcher and Contributor ID) VIVO in October 2012 Presentation – See Dr. Barend Mons 2010 VIVO Conference Video "The Next Step in Knowledge Evolution; Colonization of Brains..." My Previous Work Was Data Science for VIVO!: – Build VIVO in the Cloud – Spotfire on Scholarly Databases – An Interface to a Digital Library of the Atlas of Science 2010 Science and Engineering Indicators Nobel Prizes VIVOs Evolution from Semantic Web Application to DuraSpace: – VIVO is an open community, an information model, and an open source semantic web application. – VIVO is joining DuraSpace as the VIVO Project, to realize a shared mission through an established legal and financial framework compatible with VIVO’s own vision and goals. – DuraSpace is an independent 501(c)(3) not-for-profit born from a vision to help save our shared scholarly, scientific, and cultural record. Indiana University Evolution from Information Visualization to Data Science: – This course provides an overview about the state of the art in information visualization. It teaches the process of producing effective visualizations that take the needs of users into account. – This year, the course can be taken for three Indiana University credits as part of the Online Data Science Program just announced by the School of Informatics and Computing. Students interested in applying to the program can find more information here. – Everyone who registers gains free access to the Scholarly Database (26 million paper, patent, and grant records) and the Sci2 Tool (100+ algorithms and tools). My Note: But not all of it! 2

Vivoweb.org 3

Indiana University CNS: Vivo Data and Visualizations 4

Indiana University CNS: VIVO Presentation Information 5

Indiana University CNS: 2.5 Sample Data Sets 6

Cyberinfrastructure for Network Science Center 7

Information Visualization MOOC 8

Pre-Questionnaire for the IVMOOC 9 My Note: How did I do?

s for the IVMOOC IVMOOC: Welcome to the Sci2 Tool – Thank you for registering for the Information Visualization MOOC. As part of this course, you will be using the Science of Science (Sci2) tool, an open source data mining and visualization software package developed by the team that runs the course. – We have created an account on the Sci2 website for you. Please use the link below to activate the account. IVMOOC: SDB Account Confirmation – Thank you for registering for the Information Visualization MOOC. As part of this course, you will be asked to use data from the Scholarly Database, an open data source maintained by the team that runs the course. Our records show that your address is already registered for the Scholarly Database. Please take a moment to visit the below site and confirm that you remember your login information. IVMOOC: Welcome to the Information Visualization MOOC: – Thank you for registering for the Information Visualization (IVMOOC) course. This is to confirm receipt of your registration information. Unit one of the course will open on January 28, 2014, with another Unit opening each week. The course introduction posted on week one will include all the information about what you will need to do in order to earn the Mozilla Open Badge. – You should receive two s in addition to this one that contain your login information for the Sci2 Tool and Scholarly Database, two free tools that you will be using throughout the course. 10

IVMOOC - Schedule 11

IVMOOC - Week 5 Many of you did a great job on the Mid- Term! All students who submitted the Mid- Term will receive a pointer to the correct answers shortly. For your information, the Mid-Term will be graded on a curve, see: – Week 5 materials are now available at: – – Enjoy analyzing and visualizing trees. 12

IVMOOC – Student Locations 13

Indiana University CNS: Vivo Data and Visualizations Overview 14

The UCSD Map of Science 15

Published But Not the Raw Data 16

Indiana University CNS: Vivo Data and Visualizations Sample Data 17 My Note: These are all very small compared to the UCSD Map of Science and Scholarly Database!

Temporal Visualization: Grants Over Time 18 My Note: Not Impressed With This!

Statements I Disagree With! Visualization brings large amounts of data together in a single graphic to allow for greater understanding. All data is stored in the same format, via triples (subject, predicate, and object). There is no table structure to guide data collection in a triple store. Instead, you have the ontology. The ontology defines how data is linked together, and is critical when it comes to retrieving that data. With some understanding of the ontology, we’re ready to start writing SPARQL queries to get a data table for the visualization. VIVO does not store information based on the name, as we would consider it. Instead, it assigns every entity a Uniform Resource Identifier (or URI). Here is the core data behind this, see the problem? – – Excel is generally the one that most people are familiar with, but it is rather limited in the ability to replicate work easily. In the end, your final output can only be as good as the data that went into it. 19

Exercises and Data Sets Burst Detection: – Use publication_result.csv – Normalizing publication title by using Lowercase, Tokenize, Stem, Stopword Text plugin. – Save as burst.csv Map of Science: – Use publication_result.csv – The Journal Naming Convention plugin tries to normalize these names and link them to lookup tables within Sci2 to create the best matching possible. Network: – Use publication_result.csv – Extract the Co-Author network using Extract Co-Occurrence Network plugin. – Visualize Using GUESS and Gephi. My Note: Use NodeXL 20

iNRN 21 A centralized place for sharing and accessing networks, data, knowledge, and expertise. Further explores the evolution of VIVO. My Note: Still No Data!

Theory Unit Structure Each theory unit comprises: – Examples of best visualizations – Visualization goals – Key terminology – General visualization types and their names – Workflow design Read data Analyze Visualize – Discussion of specific algorithms 22