Experience with the development and operation of the Neuroscience Information Framework (NIF) portal Maryann E. Martone, Ph. D. University of California,

Presentation transcript:

Experience with the development and operation of the Neuroscience Information Framework (NIF) portal Maryann E. Martone, Ph. D. University of California, San Diego

NIF is an initiative of the NIH Blueprint consortium of institutes
– What types of resources (data, tools, materials, services) are available to the neuroscience community?
– How many are there?
– What domains do they cover? What domains do they not cover?
– Where are they? Web sites, databases, literature, supplementary material, PDF files, desk drawers
– Who uses them?
– Who creates them?
– How can we find them?
– How can we make them better in the future?

The Neuroscience Information Framework
NIF has developed a production technology platform for researchers to:
– Discover
– Share
– Analyze
– Integrate
neuroscience-relevant information
Since 2008, NIF has assembled the largest searchable catalog of neuroscience data and resources on the web
Cost-effective and innovative strategy for managing data assets
“This unique data depository serves as a model for other Web sites to provide research data.” - Choice Reviews Online
NIF is poised to capitalize on the new tools and emphasis on big data and open science

NIF searches across 3 main indices: Registry, Federation and Literature
– Data Federation: 200 databases / 400M records
– Registry: 6300 resources (2500 databases)
– Literature: 22 million articles

NIF must work with the ecosystem as it is today
NIF was one of the first projects to attempt data integration in the neurosciences on a large scale
NIF is supported by a contract that specified the number of resources to be added per year
– Designed to be populated rapidly; set up process for progressive refinement
– No budget was allocated to retrofit existing resources; had to work with them in their current state
– We designed a system that required little to no cooperation or work from providers
– Supports many formats: relational, XML, RDF

NIF Semantic Framework: NIFSTD ontology
NIF covers multiple structural scales and domains of relevance to neuroscience
Aggregate of community ontologies with some extensions for neuroscience, e.g., Gene Ontology, ChEBI, Protein Ontology
[Diagram: NIFSTD modules: Organism, NS Function, Molecule, Investigation, Subcellular structure, Macromolecule, Gene, Molecule Descriptors, Techniques, Reagent, Protocols, Cell, Resource, Instrument, Dysfunction, Quality, Anatomical Structure]

How do resources get added to the NIF?
– NIF curators
– Nomination by the community
– Semi-automated text mining pipelines
NIF Registry
– Requires no special skills
– Site map available for local hosting
NIF Data Federation
– DISCO interop
– Requires some programming skill
– Open Source Brain < 2 hr
Two-tiered system: low barrier to entry + progressive refinement

Inside the federation: What are the connections of the hippocampus?
Hippocampus OR “Cornu Ammonis” OR “Ammon’s horn”
– Query expansion: synonyms and related concepts
– Boolean queries
– Data sources categorized by “data type” and level of nervous system
– Common views across multiple sources
– Tutorials for using full resource when getting there from NIF
– Link back to record in original source
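The query expansion shown on this slide can be sketched roughly as follows. This is a minimal illustration, not NIF's implementation: the `SYNONYMS` table and the `expand_query` function are hypothetical names, and only the three hippocampus labels come from the slide.

```python
# Hypothetical synonym table. Only the three hippocampus labels appear on the
# slide; the lookup structure itself is an assumption for illustration.
SYNONYMS = {
    "hippocampus": ["Cornu Ammonis", "Ammon's horn"],
}


def expand_query(term, synonyms=SYNONYMS):
    """Return a Boolean OR query over a term and its known synonyms."""
    labels = [term] + synonyms.get(term.lower(), [])
    # Quote multi-word labels so a search engine treats them as phrases.
    quoted = [f'"{label}"' if " " in label else label for label in labels]
    return " OR ".join(quoted)


print(expand_query("Hippocampus"))
```

A term with no entry in the table is returned unchanged, so expansion never narrows the user's original query.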

Progressive Refinement of Data
Stages: Discoverability, Accessibility, Web of Data
– Data specified via simple semantics
– Data in a usable form
– Semantically-enabled search
– Enhanced semantics
– Standardized representation
– Linked Open Data - RDF
– Data resources simply described
– Automated data harvesting technologies
– Common resource registry
Processing of data from different sources can be tailored to requirements…

390 million records! NIF Concept Mapper June10, Aligns sources to the NIF semantic framework

Column level mapping: Reducing false positives Gene Organism Anatomy
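One way to picture how column-level mapping reduces false positives: a term is matched only against columns mapped to the relevant entity type. The sketch below is an assumption-laden illustration. The three entity types come from the slide, while the column names, the sample record and the `search` function are hypothetical.

```python
# Column name -> entity type. The types (Gene, Organism, Anatomy) are from the
# slide; the column names are hypothetical.
COLUMN_TYPES = {"gene_symbol": "Gene", "species": "Organism", "region": "Anatomy"}


def search(records, term, entity_type, column_types=COLUMN_TYPES):
    """Return records where `term` occurs in a column mapped to `entity_type`."""
    columns = [col for col, kind in column_types.items() if kind == entity_type]
    needle = term.lower()
    return [
        rec for rec in records
        if any(needle in str(rec.get(col, "")).lower() for col in columns)
    ]


# Hypothetical record, for illustration only.
records = [{"gene_symbol": "ExampleGene1", "species": "mouse", "region": "hippocampus"}]

print(search(records, "mouse", "Organism"))  # matches the species column
print(search(records, "mouse", "Anatomy"))   # no match: anatomy columns only
```

Restricting the match to typed columns means a string that happens to occur in an unrelated field does not produce a hit.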

Entity mapping birnlex_1732 Brodmann.1 Explicit mapping of database content helps disambiguate non-unique and custom terminology: Google Refine tools and services
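The explicit mapping described here can be sketched as a lookup keyed on both the source and its local term. This is a minimal sketch under stated assumptions: only the pairing of Brodmann.1 with birnlex_1732 comes from the slide, and the source name `example_db`, the `ENTITY_MAP` table and `map_entity` are hypothetical.

```python
# (source, local term) -> ontology identifier. Only the Brodmann.1 ->
# birnlex_1732 pairing appears on the slide; "example_db" is a hypothetical
# source name used for illustration.
ENTITY_MAP = {
    ("example_db", "Brodmann.1"): "birnlex_1732",
}


def map_entity(source, local_term, entity_map=ENTITY_MAP):
    """Return the ontology ID for a source-specific term, or None if unmapped."""
    return entity_map.get((source, local_term))


print(map_entity("example_db", "Brodmann.1"))
print(map_entity("other_db", "Brodmann.1"))
```

Keying on the source as well as the term reflects the point about non-unique terminology: the same custom label in another database is not assumed to mean the same thing until it has been mapped.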

Working with and extending ontologies: Neurolex.org
Semantic MediaWiki
– Provide a simple interface for defining the concepts required
– Lightweight semantics
– Good teaching tool for learning about semantic integration and the benefits of a consistent semantic framework
– Community based: anyone can contribute their terms, concepts, things; anyone can edit; anyone can link
– Accessible: searched by Google
– Growing into a significant knowledge base for neuroscience
International Neuroinformatics Coordinating Facility
Demo D03
Larson et al, Frontiers in Neuroinformatics, in press

WHO USES NIF?

Usage of NIF

Building knowledge in the web
Because their pages have static URLs, wikis are searchable by Google

Data Services (Current / Planned)
1. Vocabulary
   1. NITRC (autocomplete)
   2. Neuroscience.com (annotate)
   3. INCF Atlasing tools
2. Data Summary (NIF Navigator)
   1. NIDA, Blueprint
   2. NeuroLex
3. Individual Data Sources
   1. DOMEO
   2. OneMind
   3. Eagle i
4. DISCO Services (LinkOut)
   1. PubMed
Anita Bandrowski, Jeff Grethe

People use NIF to...
Find resources
– “Where can I find a translation of Talairach to MNI coordinates?” - NIF Forum
– “Where are all the biospecimen repositories?” - One Mind for Research
– “What biospecimen banks are available with tissues from opiate addicts?” - NIH
Find answers
– “What is the amount of data published on males vs females?” - NIH
– “Where is my gene expressed?” - Researcher
– “What projects to the ventral lateral geniculate nucleus?” - Researcher
Track resource utilization
– What projects are using my antibody/mouse/database?
Serve as a springboard
– NIF ontologies, tools and data resources are used by many groups (20,000 hits/month on NIF services)
– NIF technologies and expertise jumpstart related efforts
[Figure: Top search terms; Top NIF sources; Data, Literature, Registry]

NIF Analytics: The Neuroscience Landscape
Exposing knowledge gaps and biases
Where are the data?
[Figure labels: Brain region (Striatum, Hypothalamus, Olfactory bulb, Cerebral cortex, Brain); Data source; Funding]

Metrics of success
Return visits:
– >1000 per month come more than once
196 sites have NIF links on their pages
> 100 research libraries list us in their library guides
– ~350 colleges and universities visit per month
3 professional societies
> 21 papers that use NIF
– 2 that mention uses of data
– 5 that use NIF ontology

NIF RESOURCE SERVICES: LINKING AND TRACKING RESOURCE UTILIZATION

Tracking resource growth and utilization
Promoting resources in our federation
– Webinar series
– NIF Digest
Providing resource growth statistics
Making sure resources are “alive”
NIF provides support for resource providers; they are our customers too
[Figure: Top 25 databases for August 2013]

DISCO Dashboard
– Management of registry resources through a single administrative dashboard
– Associated discovery pipeline
– Tools to manage data updates
– Change tracking
– Globally unique identifier creation
Luis Marenco, Rixin Wang, Perry Miller, Gordon Shepherd, Yale University

Cross-index and resource integration
NIF Data Federation
PubMed Link Out via NIF Link Out Broker
Literature annotation with unique identifiers via DOMEO

CHALLENGES AND OPPORTUNITIES

User Requests and Challenges
Make it easier to find resources:
– Spell checking
– More expressive search, e.g., questions
Make it easier to track usage
– Resource identification project: NIF Registry identifiers into published literature
Personalized services
– Send updates/new sources that are relevant
– “other people who searched for X looked at Y”
Ranking of data federation results
Keeping contents current
Measuring success:
– Are we reaching our core audience; how can we tell?
NIF recently redesigned the portal to decouple the front end and the back end to make NIF more flexible

Resource View Snippets constructed from data record Combines Registry and selected databases

Keeping the Registry Current
– NIF employs an automated link checker
– Last analysis: 478/6100 invalid URLs (~8%)
– 199 could not be located at another university or URL: out of service (~3%)
– Bigger issue: 30-50% of resources are no longer updated or maintained
NIF has 5 years of data tracking digital resources
[Figure: Resources added; Last updated]
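The percentages quoted for the last link-check analysis can be reproduced from the counts on the slide. The snippet below is only a worked check of that arithmetic; the `percent` helper and the variable names are illustrative.

```python
def percent(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


registry_size = 6100   # registry entries checked (from the slide)
invalid_urls = 478     # entries whose URL was invalid
out_of_service = 199   # entries that could not be located elsewhere

print(percent(invalid_urls, registry_size))    # 7.8, reported as ~8%
print(percent(out_of_service, registry_size))  # 3.3, reported as ~3%
```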

Keeping content up to date
– New tags come into existence, e.g., Connectome, Tractography, Epigenetics
– New resource types come into existence, e.g., mobile apps
– Resources add new types of content
– Resources change name
– Resources change scope
> 7000 updates to the registry last year
It’s a challenge to keep the registry up to date; sitemaps, curation, ontologies, community review

Musings from the NIF
Every resource is resource limited: few have enough time, money, staff or expertise required to do everything they would like
– If the market can support 11 MRI databases, fine
– Some consolidation, coordination is usually warranted
Big, broad and messy beats small, narrow and neat
– Without trying to integrate a lot of data, we will not know what needs to be done
– Progressive refinement; addition of complexity through layers
Be flexible and opportunistic
– A single optimal technology/container for all types of scientific data and information does not exist; technology is changing
Think globally; act locally:
– No source, not even NIF, is THE source; we are all a source
– Think about interoperation from the inception

NIF team (past and present)
Jeff Grethe, UCSD, Co-Investigator, Interim PI
Amarnath Gupta, UCSD, Co-Investigator
Anita Bandrowski, NIF Project Leader
Gordon Shepherd, Yale University
Perry Miller, Luis Marenco, Rixin Wang
David Van Essen, Washington University
Erin Reid
Paul Sternberg, Cal Tech
Arun Rangarajan, Hans Michael Muller, Yuling Li
Giorgio Ascoli, George Mason University
Sridevi Polavaram, Fahim Imam, Larry Lui, Andrea Arnaud Stagg, Jonathan Cachat, Jennifer Lawrence, Svetlana Sulima, Davis Banks, Vadim Astakhov, Xufei Qian, Chris Condit, Mark Ellisman, Stephen Larson, Willie Wong
Tim Clark, Harvard University
Paolo Ciccarese
Karen Skinner, NIH, Program Officer (retired)
Jonathan Pollock, NIH, Program Officer
And my colleagues in Monarch, dkNet, 3DVC, Force 11