The Pragmatics of Ontology and Heterogeneous Data Sources The Ins and Outs of CTSAsearch David Eichmann School of Library and Information Science University.


The Pragmatics of Ontology and Heterogeneous Data Sources The Ins and Outs of CTSAsearch David Eichmann School of Library and Information Science University of Iowa

Research Networking Programmatic support for discovery and use of research and scholarly information regarding people and resources. They are essentially special purpose institutional knowledge management systems.

Representative RN Systems Profiles (Harvard) VIVO (VIVO Consortium) Loki (Iowa) SciVal Experts (aka Pure – Elsevier) A number of others

Why Bother with VIVO (the ontology)? Words in a profile are just sequences of characters carrying no meaning –Try asking Google Scholar what grant funded a given hit… With structure and relationship comes meaning, aka semantics –Enter the Semantic Web!
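A minimal sketch of the point about structure, assuming made-up identifiers (`ex:pub1`, `ex:grant7`, `ex:fundedBy`, `ex:label`) that are illustrative only and not VIVO terms: once facts are held as subject-predicate-object triples, "what grant funded this paper?" becomes a two-hop lookup rather than a text search.

```python
# Hypothetical triples; the ex: names are illustrative, not real VIVO-ISF terms.
TRIPLES = [
    ("ex:pub1", "ex:title", "A study of something"),
    ("ex:pub1", "ex:fundedBy", "ex:grant7"),
    ("ex:grant7", "ex:label", "Example award"),
    ("ex:pub2", "ex:title", "Another study"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object o such that (subject, predicate, o) is asserted."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def funding_labels(publication, triples=TRIPLES):
    """Follow publication -> grant -> label, a two-hop graph traversal."""
    labels = []
    for grant in objects(publication, "ex:fundedBy", triples):
        labels.extend(objects(grant, "ex:label", triples))
    return labels


print(funding_labels("ex:pub1"))
```

A publication with no funding triple simply yields an empty list, which is the structured equivalent of "no answer available".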

Connecting the Dots The real challenge here is translation of information already in existence in scattered sources –Research networking tools –Citation databases (e.g., PubMed) –Award databases (e.g., NIH RePORTER) –Curated archives (e.g., GenBank) –Locked up in text (the research literature)

CTSAsearch – version 1 10 SPARQL endpoints 19 institutions 124,945 individuals Proved challenging for some sites to handle the queries

CTSAsearch – version 1
subclass | count
NonFacultyAcademic |
FacultyMember |
NonAcademic |
EmeritusFaculty | 2134
EmeritusProfessor | 2070
Postdoc | 1226
Librarian | 232
Student | 89
GraduateStudent | 71
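The slides do not show the query behind these counts. As a hedged sketch, a per-subclass tally could be requested from each endpoint with a query along these lines. The query text, the `query_url` helper, and the example endpoint address are all assumptions, and the pattern only works if the endpoint exposes `rdfs:subClassOf` triples linking the person classes to `foaf:Person`.

```python
from urllib.parse import urlencode

# Assumed query: count distinct individuals per direct subclass of foaf:Person.
COUNT_QUERY = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?subclass (COUNT(DISTINCT ?person) AS ?count)
WHERE {
  ?person rdf:type ?subclass .
  ?subclass rdfs:subClassOf foaf:Person .
}
GROUP BY ?subclass
ORDER BY DESC(?count)
""".strip()


def query_url(endpoint, query=COUNT_QUERY):
    """Build a SPARQL Protocol GET URL; the query travels in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


# Hypothetical endpoint address, used only to show the URL shape.
print(query_url("http://example.org/sparql")[:60])
```

Aggregate queries of this kind touch every person record, which is consistent with the remark that some sites found the queries hard to handle.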

CTSAsearch – version 2 10 SPARQL endpoints (19 institutions) 15 VIVO sites –Harvested with customized crawler 14 Profile sites –Harvested with customized crawler

CTSAsearch – version 2
subclass | count
NonFacultyAcademic |
FacultyMember |
NonAcademic |
Student |
GraduateStudent |
EmeritusFaculty | 3096
EmeritusProfessor | 2072
Postdoc | 1410
Librarian | 264

CTSAsearch – architecture 1 VIVO-based SPARQL harvester 2(!) VIVO-based crawlers 1 Profiles-based crawler 2 Platform-specific HTML crawlers 1 CSV-based loader

CTSAsearch – architecture

CTSAsearch – current 45,456,417 VIVO-derived triples 48,569,115 Profiles-derived triples

Recent Work Cross-linkage across sites –Resolving ‘stubs’ –Formation of a single ecosystem Macro concerns –Institution-scale analytics –Pondering reflection

Current “profile”

CTSAsearch/Polyglot – version x Temporary SPARQL endpoint: – Shared visualization widgets –Intended for embedding in institutional sites Community-wide sameAs assertions
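One way to picture community-wide sameAs assertions: each assertion pairs two URIs that denote the same person, and the pairs collapse into clusters. The sketch below uses a union-find structure for that grouping. It is an illustration only, not the CTSAsearch implementation, and the URIs in the usage line are invented.

```python
def same_as_clusters(pairs):
    """Group URIs connected by sameAs pairs into sorted clusters (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for uri in list(parent):
        clusters.setdefault(find(uri), set()).add(uri)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


# Invented URIs standing in for profile pages at different sites.
print(same_as_clusters([("siteA:p1", "siteB:p9"), ("siteB:p9", "siteC:p4")]))
```

Because sameAs is transitive, two profiles never asserted equal directly can still land in the same cluster, which is what lets stubs at one site resolve to full profiles at another.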

Pattuelli’s Spectrum of Relationships (2012)

Pattuelli’s Spectrum of Relationships (2012) RN Tools

Pattuelli’s Spectrum of Relationships (2012) RN Tools LinkedIn

Pattuelli’s Spectrum of Relationships (2012) Ontologies used –foaf (Friend of a Friend) –rel (Relationship) –mo (Music) Echoes of Trigg’s link taxonomy –Trigg, R. (1983). A Network-Based Approach to Text Handling for the Online Scientific Community. Ph.D. dissertation, Department of Computer Science, University of Maryland, technical report TR-1346

Connecting the Dots – Take 2 Figure courtesy of Melissa Haendel, OHSU

PubMed Central Open Access 886,172 papers (as of 1/1/15) 423,764 with acknowledgements 994,931 sentences 4,329,972 parses

The Simple Cases PMCID: SeqNum: 2 SentNum: 6 Sentence: EK analysed the data. POS: [EK/NNP, analysed/VBD, the/DT, data/NNS,./.] Parse: [S [NP EK/NNP ] [VP analysed/VBD [NP the/DT data/NNS ] ]./. ]
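For readers who want to work with the POS line shown on this slide, here is a small sketch that turns the bracketed `word/TAG` list into pairs. The `parse_pos` name and the comma-splitting approach are assumptions based on the one example shown; a token that is itself a comma would need extra handling.

```python
def parse_pos(tagged):
    """Parse '[EK/NNP, analysed/VBD, ...]' into a list of (word, tag) pairs."""
    inner = tagged.strip()
    if inner.startswith("[") and inner.endswith("]"):
        inner = inner[1:-1]
    pairs = []
    for item in inner.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the last slash so words containing '/' keep their text.
        word, tag = item.rsplit("/", 1)
        pairs.append((word, tag))
    return pairs


tokens = parse_pos("[EK/NNP, analysed/VBD, the/DT, data/NNS,./.]")
# Proper nouns (NNP) are the candidate author initials in this sentence.
print([word for word, tag in tokens if tag == "NNP"])
```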

And the Not So Simple… PMCID: Sentence: We thank Sheila Harvey, Clinical Trials Unit Manager at ICNARC, and Ruth Canter, Trials Administrator at ICNARC, for their assistance in chasing completed surveys; Dr Kevin Gunning for early advice and project development; Drs Neill K. J. Adhikari and Gordon D. Rubenfeld for feedback and discussion of analysis plan; Dr Chris AKY Chong for his valuable comments on the initial draft of this manuscript; and our Responders: Addenbrooke’s Hospital ( Dr Kevin Gunning ), Airedale General Hospital ( Dr John Scriven ), Alexandra Hospital ( Dr Tracey Leach ), Arrowe Park Hospital ( Dr Lawrence Wilson ), Barnet Hospital ( Dr AH Wolff ), … 8,245 character long sentence

Extract Entities/Relationships with Syntactic Queries [S [NP:Author NN:Author ] [VP NN [NP:Person ] [PP ], [PP ] ] ] S <1NP:Author <2[VP <1/thank/ <2(NP) <3(PP) ] –For the sentence having this pattern, match the object noun phrase and the next prepositional phrase NP <#2 <1(NNP) <2(NNP) –For the noun phrase, extract two proper nouns PP <#2 <1DT <2(NP) –For the prepositional phrase, match the noun phrase
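The query notation on this slide is not defined in the transcript. Reading `<n` as "the n-th child" (an assumption), the "thank" pattern can be approximated by hand over a parse tree held as nested tuples. The tree encoding, the function names, and the sample tree are all hypothetical and stand in for whatever the actual extraction engine does.

```python
def is_leaf(node):
    """A leaf is (tag, word); an inner node is (label, child, child, ...)."""
    return len(node) == 2 and isinstance(node[1], str)


def words(node):
    """Collect the words under a node, left to right."""
    if is_leaf(node):
        return [node[1]]
    out = []
    for child in node[1:]:
        out.extend(words(child))
    return out


def match_thank(tree):
    """Match S -> NP, VP(thank, NP, PP); return the thanked NP and the PP body."""
    if is_leaf(tree) or tree[0] != "S":
        return None
    kids = tree[1:]
    if len(kids) < 2 or kids[0][0] != "NP" or kids[1][0] != "VP":
        return None
    vp = kids[1][1:]
    if len(vp) < 3:
        return None
    verb, obj, pp = vp[0], vp[1], vp[2]
    if not (is_leaf(verb) and verb[1].lower().startswith("thank")):
        return None
    if obj[0] != "NP" or pp[0] != "PP":
        return None
    # Drop the preposition (first child of the PP) and keep the rest.
    body = " ".join(w for child in pp[2:] for w in words(child))
    return {"person": " ".join(words(obj)), "pp": body}


# Hypothetical parse of "We thank Jeff Vieira for the gift."
THANK_TREE = (
    "S",
    ("NP", ("PRP", "We")),
    ("VP",
     ("VBP", "thank"),
     ("NP", ("NNP", "Jeff"), ("NNP", "Vieira")),
     ("PP", ("IN", "for"), ("NP", ("DT", "the"), ("NN", "gift")))),
    (".", "."),
)
print(match_thank(THANK_TREE))
```

Sentences that do not fit the shape, such as "EK analysed the data.", return `None` and would be left for other patterns.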

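Person Results Snippet
ID | Title | First Name | Middle Name | Last Name
76 | | Hans | | Matrin
77 | | Jeff | | Vieira
78 | | P. | | ZAMORE
79 | Prof. | Eric | | Schon
80 | | Carlos | | Lois
81 | | Andrea | | Möll
82 | | Elena | | Govorkova
83 | | K. | M. | Pollard
84 | Dr. | Michael | | Berton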
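The slides do not say how an extracted noun phrase is split into these columns. A simple heuristic, offered only as a sketch with an assumed title list and a hypothetical `split_name` helper, would peel off a leading title and treat the first and last remaining tokens as first and last name.

```python
# Assumed list of titles; the real system may recognize more.
TITLES = {"Dr", "Dr.", "Drs", "Drs.", "Prof", "Prof."}


def split_name(text):
    """Split a person mention into title, first, middle, and last name parts."""
    tokens = text.split()
    title = ""
    if tokens and tokens[0] in TITLES:
        title = tokens.pop(0)
    first = tokens[0] if tokens else ""
    last = tokens[-1] if len(tokens) > 1 else ""
    middle = " ".join(tokens[1:-1])
    return {"title": title, "first": first, "middle": middle, "last": last}


print(split_name("Prof. Eric Schon"))
```

Names such as "Chris AKY Chong" from the long acknowledgement example show why any such heuristic needs review: the middle token there is a run of initials rather than a name.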

Relationships for Person 77
PMCID | Category | PP
 | Support | the kind gift of rKSHV
 | Support | the kind gift of rKSHV.219 and for helpful discussions
 | Collaboration | helpful discussions

Relationships for Person 79
PMCID | Category | PP
 | Resource | the rabbit polyclonal antibody
 | Resource | the ECFP and EYFP plasmids
 | Collaboration | his helpful advice and discussions

Category Frequencies
Category | Count
Collaboration | 47,052
 | 46,327
Technique | 33,598
Resource | 8,894
Support | 6,836
Event | 3,744
Project | 854
Place Name | 229
Publication Component | 210
Place | 186
Organization | 93

Next Steps Continue slogging through extraction pattern definition Define patterns for –funding declarations –chairs, fellowships, etc. Merge data into CTSAsearch visualizations Align current category scheme with Melissa Haendel’s current draft ontology for CASRAI taxonomy and then merge with VIVO-ISF

In the Next Year Joint work with Melissa Haendel (OHSU) on administrative supplement to OHSU’s CTSA bridging RNs and NIH’s SciENcv –Map SciENcv data model to VIVO-ISF –Enable bi-directional data exchange –Integrate clinical/trial data sources –Integrate SciENcv, ORCID data into CTSAsearch –Multi-granularity search and visualization

Questions?