Daniel R. Harris Center for Clinical and Translational Sciences

Presentation transcript:

Augmenting Data with Semantics for Visualization (Big Data Analytics Competition)
Daniel R. Harris
Center for Clinical and Translational Sciences
Institute of Pharmaceutical Outcomes and Policy

Augmenting Data with Semantics for Visualization
We focus on augmenting data with semantics via concept extraction. Specifically, from the call:
What insights can you provide by analyzing the dataset? Your insights and suggestions are expected to be creative.
Based on this dataset, what are the most common medical and health applications where patent development is occurring?
How frequently are patents being filed with the same title? How would you improve this dataset to better distinguish unique patents with duplicate titles?
What additional data / metadata would you include in this dataset to help researchers more efficiently locate relevant medical and health patents?
What conclusions can you draw from this data? What trends, if any, have formed over the past decade? Where are the trends moving? Consider both health industry and patent filing perspectives.
What anomalies can you find in this data? Is there anything that affects the integrity of the data?

Augmenting Data with Semantics for Visualization
What is the original data set? A curated collection of patents selected from BHI keywords.
How did we augment this data set? We mapped each invention title to UMLS concepts (CUIs) using MetaMap (https://metamap.nlm.nih.gov/).
How is this helpful? Each concept has a semantic type that can drive visualizations, helping us understand the nature of the data and how trends change over time.

A Quick Example
Original: A CARD GAME HAVING CARDS WITH GRAPHIC AND PICTORIAL ILLUSTRATIONS OF GEOGRAPHIC, HISTORICAL AND HEALTH RELATED FACTS
Idea or Concept (1): (A CARD GAME HAVING CARDS WITH GRAPHIC AND PICTORIAL ILLUSTRATIONS OF GEOGRAPHIC, HISTORICAL AND HEALTH RELATED FACTS)
Intellectual Product (3): (((A CARD GAME) HAVING CARDS WITH GRAPHIC AND PICTORIAL ILLUSTRATIONS) OF GEOGRAPHIC, HISTORICAL AND HEALTH RELATED FACTS)
Finding (2): A CARD GAME HAVING CARDS WITH (GRAPHIC AND PICTORIAL ILLUSTRATIONS) OF (GEOGRAPHIC, HISTORICAL AND HEALTH RELATED FACTS)
Qualitative Concept (1): A CARD GAME HAVING CARDS WITH GRAPHIC AND PICTORIAL ILLUSTRATIONS OF (GEOGRAPHIC, HISTORICAL AND HEALTH RELATED) FACTS
Spatial Concept (1): A CARD GAME HAVING CARDS WITH GRAPHIC AND PICTORIAL ILLUSTRATIONS OF (GEOGRAPHIC), HISTORICAL AND HEALTH RELATED FACTS
Each of the next slides is a visualization (Tableau) produced using the augmented data.

Aggregation
Concepts provide simple buckets to use when aggregating data.
The chart on the right shows the frequency of concepts extracted from the patent database.
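The bucketing idea can be sketched in a few lines of Python. This is only an illustration, not the Tableau workflow used for the slides: the `augmented` mapping and the second title are hypothetical, and the first entry simply reuses the semantic-type counts from the card-game example.

```python
from collections import Counter

# Hypothetical augmented data: invention title -> semantic types of the
# concepts extracted from that title (one entry per extracted concept).
augmented = {
    # Counts taken from the card-game example in this deck.
    "A CARD GAME HAVING CARDS WITH GRAPHIC AND PICTORIAL ILLUSTRATIONS "
    "OF GEOGRAPHIC, HISTORICAL AND HEALTH RELATED FACTS": [
        "Idea or Concept",
        "Intellectual Product", "Intellectual Product", "Intellectual Product",
        "Finding", "Finding",
        "Qualitative Concept",
        "Spatial Concept",
    ],
    # Made-up second record, for illustration only.
    "EXAMPLE TITLE (HYPOTHETICAL)": ["Pharmacologic Substance", "Finding"],
}


def semantic_type_frequency(records):
    """Count how often each semantic type occurs across all titles."""
    counts = Counter()
    for types in records.values():
        counts.update(types)
    return counts


freq = semantic_type_frequency(augmented)
for semantic_type, count in freq.most_common():
    print(f"{semantic_type}: {count}")
```

Counting concept mentions (rather than distinct patents) is a design choice made here for simplicity; counting each patent once per semantic type would be an equally reasonable bucket.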

Aggregation
This can be paired with any other facet of information available in the original data set, such as time of filing.

Temporal Considerations
We can filter by semantic type. The chart on the right shows the frequency of patents having the semantic type “Pharmacologic Substance”.
Time is important because we can use previous data points to forecast future data points.
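A minimal sketch of filtering by semantic type and pairing the result with filing year follows. The record layout (`title`, `year`, `types`) and the three sample records are assumptions made up for illustration; they are not taken from the patent data set.

```python
from collections import defaultdict

# Hypothetical augmented records; field names and values are illustrative.
patents = [
    {"title": "T1", "year": 2010, "types": ["Pharmacologic Substance", "Finding"]},
    {"title": "T2", "year": 2010, "types": ["Medical Device"]},
    {"title": "T3", "year": 2011,
     "types": ["Pharmacologic Substance", "Pharmacologic Substance"]},
]


def patents_per_year(records, semantic_type):
    """Count patents per filing year that contain the given semantic type.

    A patent is counted once per year even if the type occurs several
    times in its title.
    """
    counts = defaultdict(int)
    for record in records:
        if semantic_type in record["types"]:
            counts[record["year"]] += 1
    return dict(sorted(counts.items()))


pharma_by_year = patents_per_year(patents, "Pharmacologic Substance")
print(pharma_by_year)
```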

Trends
Trend lines can work in conjunction with temporal data (linear regression shown).
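For readers who want to reproduce a trend line outside of Tableau, an ordinary least-squares fit can be sketched as below. The yearly counts are invented for illustration, and `fit_line` is a hypothetical helper name, not part of any tool used in the slides.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up yearly patent counts for one semantic type.
years = [2010, 2011, 2012, 2013]
counts = [10, 12, 14, 16]

slope, intercept = fit_line(years, counts)
# Extending the fitted line one year ahead gives a naive forecast.
forecast_2014 = slope * 2014 + intercept
print(slope, forecast_2014)
```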

Comparing
Different semantic types experience different trends.
We can visualize them side by side to ask questions that might help us understand how the data fluctuates across time.
We can merge this with forecasting and trend lines.

Stratify Change
We can stratify our data by sections of time to see how our forecasts change when considering only the last X years.
This is helpful when specific policy changes, funding changes, or discoveries impact the filing of patents.
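One way to see the effect of restricting to the last X years is to compare the fitted slope over the whole series with the slope over a recent window. The sketch below assumes a simple year-to-count mapping with invented numbers; `slope_last_years` and its `window` parameter are hypothetical names.

```python
def slope(points):
    """Least-squares slope for a list of (year, count) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def slope_last_years(series, window):
    """Slope computed from only the most recent `window` years."""
    latest = max(series)
    recent = [(year, count) for year, count in series.items()
              if year > latest - window]
    return slope(recent)


# Made-up series: growth until 2012, then a decline.
series = {2010: 10, 2011: 12, 2012: 14, 2013: 13, 2014: 12, 2015: 11}

overall = slope(list(series.items()))
recent = slope_last_years(series, 3)
print(overall, recent)
```

In this invented series the overall trend is slightly upward while the last three years trend downward, which is the kind of difference a windowed forecast is meant to surface.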

Drill Down
We can still leverage specific information about each invention.
For QA purposes, the concepts extracted from each invention title are easily listed and reviewed.
Research question: Can we automate this evaluation?

Landscape Analysis
Peaks and valleys are easily identifiable, but explaining why they occur is a more difficult challenge.
We can overlay this with annotations corresponding to significant law, policy, or regulatory changes to see the impact of such changes.

Identify Areas of Improvement
Aggregating at a conceptual level exposes both areas that are popular and areas where additional opportunity exists.

Example
Using the last X years of data, we can forecast that the number of medical device patents will initially decrease and then remain stable.
We can also see that the number of manufactured objects decreases, with a possible return.
If these manufactured objects are not new medical devices, what are they (IoT, surveillance, communication, etc.)?

Conjunction
We can recover the invention titles given the semantic type.
We can also see how this semantic type coincides with other semantic types.
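Both operations on this slide can be sketched over the augmented records: looking up titles by semantic type, and counting how often pairs of semantic types appear in the same title. The record layout and sample values below are hypothetical and exist only to make the example runnable.

```python
from collections import Counter
from itertools import combinations

# Hypothetical augmented records; titles and types are illustrative.
patents = [
    {"title": "T1", "types": ["Medical Device", "Manufactured Object"]},
    {"title": "T2", "types": ["Medical Device", "Finding"]},
    {"title": "T3", "types": ["Manufactured Object", "Medical Device", "Finding"]},
]


def titles_with_type(records, semantic_type):
    """Recover the invention titles that contain a given semantic type."""
    return [r["title"] for r in records if semantic_type in r["types"]]


def cooccurrence(records):
    """Count how many titles contain each unordered pair of semantic types."""
    pairs = Counter()
    for record in records:
        # Deduplicate and sort so each pair has one canonical ordering.
        for a, b in combinations(sorted(set(record["types"])), 2):
            pairs[(a, b)] += 1
    return pairs


device_titles = titles_with_type(patents, "Medical Device")
pairs = cooccurrence(patents)
print(device_titles)
print(pairs.most_common())
```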

Conclusions
We focus on augmenting the data set with semantic knowledge extracted from the invention titles.
This additional knowledge gives abstract buckets to compare and contrast patent trends at a higher level.
We give example visualizations to demonstrate the potential expressive power.